Leaderboard

Per-cell results across the full benchmark run. Each row is one (agent, model) pair scored on the same 100 tasks. Scores are percentage points on a 0–100 scale; the combined score is the harmonic mean of the geometry-similarity score and the CAD/spec-consistency score.
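As a rough sketch of the scoring rule described above (a hypothetical helper, not the benchmark's actual code; note the table's Combined column is an average over 100 per-task scores, so it will not exactly equal the harmonic mean of the two column averages):

```python
def combined_score(geom: float, spec: float) -> float:
    """Harmonic mean of two 0-100 scores.

    Returns 0 if either component is 0, since the harmonic
    mean is dominated by the smaller of the two values.
    """
    if geom == 0 or spec == 0:
        return 0.0
    return 2 * geom * spec / (geom + spec)
```

Because the harmonic mean punishes imbalance, a run with geometry 60 and spec 90 combines to 72, below the arithmetic mean of 75.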

| Rank | Model | Agent | Agent Ver. | Geom Score | Spec Score | Combined | Tokens (mean) | Tokens (total) | Cost (mean) | Cost (total) | Count | Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | gpt-5.5 | codex | 0.13.0 | 80.8 | 93.9 | 83.2 | 1.12M | 112.39M | $1.700 | $170.00 | 100 | 2026-05-13 |
| 2 | google/gemini-3.1-pro-preview | mini-swe-agent | 2.2.8 | 74.2 | 95.4 | 79.0 | 589.37K | 58.94M | $0.708 | $70.82 | 100 | 2026-05-13 |
| 3 | gpt-5.5 | mini-swe-agent | 2.2.8 | 71.4 | 89.3 | 74.4 | 132.47K | 13.25M | $0.423 | $42.35 | 100 | 2026-05-13 |
| 4 | claude-opus-4-7 | mini-swe-agent | 2.2.8 | 69.3 | 89.0 | 73.4 | 240.7K | 24.07M | $0.298 | $29.84 | 100 | 2026-05-13 |
| 5 | gemini-3.1-pro-preview | gemini-cli | 0.42.0 | 68.9 | 83.9 | 72.9 | 741.76K | 74.18M | $0.513 | $51.28 | 100 | 2026-05-13 |
| 6 | claude-opus-4-7 | claude-code | 2.1.140 | 62.4 | 83.3 | 65.5 | 709.03K | 70.9M | $0.732 | $73.25 | 100 | 2026-05-13 |
| 7 | claude-sonnet-4-6 | claude-code | 2.1.140 | 47.8 | 68.3 | 51.8 | 1.17M | 117.13M | $1.014 | $96.34 | 100 | 2026-05-13 |
| 8 | claude-haiku-4-5 | claude-code | 2.1.140 | 23.3 | 54.1 | 28.4 | 2.09M | 209.02M | $0.247 | $24.67 | 100 | 2026-05-13 |
| 9 | google/gemini-3.1-flash-lite-preview | mini-swe-agent | 2.2.8 | 19.6 | 47.6 | 24.1 | 177.73K | 17.77M | $0.030 | $3.00 | 100 | 2026-05-13 |
| 10 | gemini-3.1-flash-lite-preview | gemini-cli | 0.42.0 | 10.0 | 24.2 | 11.9 | 1.4M | 139.84M | $0.076 | $7.63 | 100 | 2026-05-13 |