SWE-bench Verified

Coding

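Real GitHub issues from popular open-source Python repositories. For each task, the model receives the issue text and the repository at the matching commit and must produce a patch. The patch is judged by running the repository's tests. The Verified subset consists of 500 tasks that human annotators checked for a clear problem statement and fair tests. It is one of the most widely cited benchmarks for coding agents.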

Metrics
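Resolved (%): the percentage of task instances for which the model's patch applies cleanly, makes the issue's target tests pass (FAIL_TO_PASS), and leaves previously passing tests passing (PASS_TO_PASS).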
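To make the metric concrete, here is a minimal Python sketch that computes Resolved (%) from per-instance outcomes. The InstanceResult record and its field names are hypothetical and do not match the harness's actual report format. The sketch assumes the denominator is the full 500-instance Verified set, so tasks with no submitted patch count as unresolved.

```python
from dataclasses import dataclass
from typing import Iterable

VERIFIED_SIZE = 500  # number of task instances in SWE-bench Verified


@dataclass
class InstanceResult:
    """Hypothetical per-task outcome; not the harness's report schema."""
    instance_id: str
    fail_to_pass_ok: bool  # all tests the fix is meant to repair now pass
    pass_to_pass_ok: bool  # all tests that passed before still pass


def is_resolved(result: InstanceResult) -> bool:
    # A task counts only if both test groups succeed.
    return result.fail_to_pass_ok and result.pass_to_pass_ok


def resolved_percentage(results: Iterable[InstanceResult],
                        total_instances: int = VERIFIED_SIZE) -> float:
    """Percentage of all instances resolved; missing instances count as failures."""
    if total_instances <= 0:
        raise ValueError("total_instances must be positive")
    # Deduplicate by instance ID so a task is never counted twice.
    resolved_ids = {r.instance_id for r in results if is_resolved(r)}
    return 100.0 * len(resolved_ids) / total_instances


if __name__ == "__main__":
    demo = [
        InstanceResult("repo__task-1", True, True),
        InstanceResult("repo__task-2", True, False),  # fix broke an existing test
        InstanceResult("repo__task-3", False, True),  # target test still fails
    ]
    print(resolved_percentage(demo, total_instances=4))  # 25.0
```

Under this definition, 404 resolved instances out of 500 would be reported as 80.8%.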

How to Run

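git clone https://github.com/princeton-nlp/SWE-bench
cd SWE-bench
pip install -e .
python -m swebench.harness.run_evaluation --dataset_name princeton-nlp/SWE-bench_Verified --predictions_path <path_to_predictions> --max_workers <num_workers> --run_id <run_id>

The harness runs each task in its own Docker container, so Docker must be installed and running. Passing --predictions_path gold evaluates the reference patches instead of a model's, which is a quick way to check the setup before scoring a model.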
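The harness scores a predictions file rather than calling a model itself. Below is a minimal Python sketch for producing one, assuming the JSON Lines layout described in the SWE-bench README: one record per task with instance_id, model_name_or_path, and model_patch keys. The instance ID, model name, and diff are hypothetical placeholders; check the README of the version you install for the exact format it accepts.

```python
import json
import tempfile
from pathlib import Path
from typing import Mapping, Union


def write_predictions(patches: Mapping[str, str], path: Union[str, Path],
                      model_name: str) -> Path:
    """Write one JSON record per task instance to a .jsonl file."""
    out = Path(path)
    with out.open("w", encoding="utf-8") as f:
        for instance_id, patch in patches.items():
            record = {
                "instance_id": instance_id,        # task identifier from the dataset
                "model_name_or_path": model_name,  # label used in the run's reports
                "model_patch": patch,              # unified diff applied to the repo
            }
            # json.dumps escapes newlines in the patch, so each record is one line.
            f.write(json.dumps(record) + "\n")
    return out


if __name__ == "__main__":
    # Hypothetical instance ID and patch, for illustration only.
    demo_patch = (
        "diff --git a/pkg/module.py b/pkg/module.py\n"
        "--- a/pkg/module.py\n"
        "+++ b/pkg/module.py\n"
        "@@ -1 +1 @@\n"
        "-VALUE = 1\n"
        "+VALUE = 2\n"
    )
    tmp = Path(tempfile.mkdtemp()) / "preds.jsonl"
    written = write_predictions({"owner__repo-1234": demo_patch}, tmp, "my-model")
    print(written.read_text(encoding="utf-8"))
```

The path of the resulting file is what goes after --predictions_path.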

Leaderboard

Rank  Model            Provider   Parameters  Score (% Resolved)
1     Claude Opus 4.5  Anthropic  Unknown     80.9%
2     GPT-5.2          OpenAI     Unknown     80.0%
3     Gemini 3 Pro     Google     Unknown     76.2%
4     DeepSeek V3      DeepSeek   671B        67.1%