Panebench dashboard
Compare score, tokens, cost, and duration across the latest benchmark runs, then inspect the exact problem statement and test cases behind each run.
To populate this dashboard, run `bun run bench` first, then refresh this page.