SWE-bench Pro
Category: Coding
A harder version of SWE-bench, built from professional coding tasks.
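Metrics
- Resolved (%): percentage of task instances for which the model's patch makes the target failing tests pass without breaking previously passing tests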
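The Resolved (%) metric can be sketched as a simple ratio. This is a minimal illustration assuming the standard SWE-bench resolution rule (all FAIL_TO_PASS tests pass and all PASS_TO_PASS tests still pass); the `InstanceResult` type and its field names are hypothetical, not part of any official harness.

```python
from dataclasses import dataclass


@dataclass
class InstanceResult:
    """Hypothetical per-instance outcome; field names are illustrative only."""
    instance_id: str
    fail_to_pass_ok: bool  # every FAIL_TO_PASS test now passes
    pass_to_pass_ok: bool  # every PASS_TO_PASS test still passes


def is_resolved(r: InstanceResult) -> bool:
    # Resolved only if the patch fixes the target tests without regressions.
    return r.fail_to_pass_ok and r.pass_to_pass_ok


def resolved_percent(results: list[InstanceResult], total_instances: int) -> float:
    """Resolved instances as a percentage of ALL benchmark instances.

    Instances with no result (e.g. no patch submitted) count as unresolved,
    because the denominator is the full benchmark size.
    """
    if total_instances <= 0:
        raise ValueError("total_instances must be positive")
    resolved = sum(1 for r in results if is_resolved(r))
    return 100.0 * resolved / total_instances


# Hypothetical example: 2 of 4 instances resolved.
example = [
    InstanceResult("org__repo-1", True, True),
    InstanceResult("org__repo-2", True, False),  # regression in PASS_TO_PASS
    InstanceResult("org__repo-3", False, True),  # target tests still failing
    InstanceResult("org__repo-4", True, True),
]
print(resolved_percent(example, total_instances=4))  # 50.0
```

Dividing by the full benchmark size rather than by the number of submitted patches keeps models that skip hard instances from inflating their score.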
How to Run
Follow the same procedure as SWE-bench Verified, but run it on the harder task subset.
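Since the run procedure matches SWE-bench Verified, model outputs would presumably be collected in the usual SWE-bench predictions layout: one JSON object per line with `instance_id`, `model_name_or_path`, and `model_patch`. The sketch below assumes SWE-bench Pro accepts that same layout; the instance IDs, model name, and patch text are hypothetical placeholders, and the harness invocation itself is not shown because the dataset identifier for the harder subset is not given here.

```python
import json
import tempfile
from pathlib import Path


def write_predictions(patches: dict[str, str], model_name: str, out_path: Path) -> int:
    """Write predictions as JSON Lines in the SWE-bench predictions layout.

    Assumption: SWE-bench Pro uses the same keys as SWE-bench Verified.
    Returns the number of records written.
    """
    count = 0
    with out_path.open("w", encoding="utf-8") as f:
        for instance_id, patch in sorted(patches.items()):
            record = {
                "instance_id": instance_id,
                "model_name_or_path": model_name,
                "model_patch": patch,  # unified diff produced by the model
            }
            # json.dumps escapes newlines, so each record stays on one line.
            f.write(json.dumps(record) + "\n")
            count += 1
    return count


def read_predictions(path: Path) -> list[dict]:
    """Load a JSON Lines predictions file, skipping blank lines."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Example with hypothetical instance IDs and placeholder patches.
with tempfile.TemporaryDirectory() as tmp:
    preds_path = Path(tmp) / "predictions.jsonl"
    written = write_predictions(
        {
            "org__repo-2": "diff --git a/b.py b/b.py\n",
            "org__repo-1": "diff --git a/a.py b/a.py\n",
        },
        model_name="my-model",
        out_path=preds_path,
    )
    rows = read_predictions(preds_path)
```

Sorting by `instance_id` is only for reproducible output files; the evaluation matches predictions to tasks by ID, not by line order.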
Leaderboard
| Rank | Model | Provider | Parameters | Score (Resolved %) |
|---|---|---|---|---|
| 1 | GPT-5.2 Thinking | OpenAI | Unknown | 55.6% |