Panebench dashboard

Model results at a glance.

Compare score, tokens, cost, and duration across the latest benchmark runs, then inspect the exact problem statement and test cases behind each run.

Score