SWE-bench Pro

Coding

A harder version of SWE-bench, focused on professional coding tasks.

Metrics
Resolved (%)
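
As a minimal sketch of how a Resolved (%) figure can be computed, assume each task instance has already been judged resolved or not resolved by its tests. The function name resolved_rate and the instance IDs below are hypothetical and are not part of any official harness.

```python
from typing import Mapping


def resolved_rate(results: Mapping[str, bool]) -> float:
    """Return the percentage of task instances marked as resolved.

    `results` maps a task instance ID to True (resolved) or False.
    This is an illustrative helper, not the official scoring code.
    """
    if not results:
        raise ValueError("no task instances to score")
    resolved = sum(1 for ok in results.values() if ok)
    return 100.0 * resolved / len(results)


# Hypothetical per-instance outcomes, used only for illustration.
example = {
    "task-001": True,
    "task-002": False,
    "task-003": True,
    "task-004": False,
}
print(f"Resolved: {resolved_rate(example):.1f}%")  # prints "Resolved: 50.0%"
```

Every instance counts equally in the denominator, so under this assumption an instance that fails to produce a patch at all scores the same as one whose patch fails its tests.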

How to Run

Same as SWE-bench Verified, but with a harder test subset.

Leaderboard

Rank  Model             Provider  Parameters  Score
1     GPT-5.2 Thinking  OpenAI    Unknown     55.6%