HumanEval

Coding

Function-level code generation with unit tests.
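As a rough illustration of what "function-level generation with unit tests" means, the sketch below joins a prompt (signature plus docstring), a model completion (the function body), and a test function, then runs the result. The helper name `passes_tests` and the toy `add` problem are invented for this example and are not taken from the dataset. A real harness would run this step in an isolated process with a timeout rather than calling `exec` directly.

```python
def passes_tests(prompt: str, completion: str, test: str, entry_point: str) -> bool:
    """Return True if prompt + completion passes the problem's unit tests."""
    program = prompt + completion + "\n\n" + test + "\n\n" + f"check({entry_point})\n"
    namespace: dict = {}
    try:
        # WARNING: exec runs arbitrary code. This is only acceptable for a toy
        # example; untrusted model output should be sandboxed.
        exec(program, namespace)
    except Exception:
        # Any error (syntax error, failed assertion, runtime error) counts as a fail.
        return False
    return True


# Toy problem, invented for illustration.
prompt = 'def add(a, b):\n    """Return the sum of a and b."""\n'
test = (
    "def check(candidate):\n"
    "    assert candidate(2, 3) == 5\n"
    "    assert candidate(-1, 1) == 0\n"
)

good_completion = "    return a + b\n"
bad_completion = "    return a - b\n"

print(passes_tests(prompt, good_completion, test, "add"))
print(passes_tests(prompt, bad_completion, test, "add"))
```

A completion counts as correct only if every assertion passes, so partially correct code scores the same as code that does not run at all.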

Metrics
pass@1 (%)
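pass@1 is usually computed as the k = 1 case of the unbiased pass@k estimator: with n samples per problem, of which c pass the tests, pass@k = 1 - C(n-c, k) / C(n, k), which reduces to c/n when k = 1. The sketch below shows that calculation and the averaging over problems. The function names and the sample counts are made up for illustration and assume per-problem (n, c) counts are already available.

```python
from math import comb
from typing import Sequence, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the probability that a random
    # size-k subset of the samples contains no correct sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results: Sequence[Tuple[int, int]], k: int = 1) -> float:
    """Mean pass@k over all problems, as a percentage. results holds (n, c) pairs."""
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Made-up (n, c) counts for three problems, for illustration only.
example = [(10, 10), (10, 3), (10, 0)]
print(round(benchmark_score(example, k=1), 1))
```

With a single greedy sample per problem (n = 1), pass@1 is simply the percentage of problems whose one completion passes.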

How to Run

pip install human-eval && generate_samples MODEL && evaluate_functional_correctness samples.jsonl
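The generation step has to produce the `samples.jsonl` file that the evaluation command reads. A minimal sketch of writing that file follows, assuming the JSON Lines layout described in the human-eval README (one object per line with `task_id` and `completion` fields). The `write_samples` and `stub_generate` names are hypothetical, the stub stands in for a real model call, and the two prompts are toy placeholders rather than dataset entries.

```python
import json
import os
import tempfile
from typing import Callable, Dict


def write_samples(
    problems: Dict[str, str],
    generate: Callable[[str], str],
    path: str,
    n_samples: int = 1,
) -> int:
    """Write n_samples completions per problem as JSON Lines; return the line count."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for task_id, prompt in problems.items():
            for _ in range(n_samples):
                record = {"task_id": task_id, "completion": generate(prompt)}
                f.write(json.dumps(record) + "\n")
                count += 1
    return count


def stub_generate(prompt: str) -> str:
    # Hypothetical stand-in for a model call; replace with real generation code.
    return "    pass\n"


# Toy prompts keyed by task ID; real prompts come from the dataset.
problems = {"HumanEval/0": "def f():\n", "HumanEval/1": "def g():\n"}

out_path = os.path.join(tempfile.mkdtemp(), "samples.jsonl")
written = write_samples(problems, stub_generate, out_path, n_samples=2)

# Read the file back to confirm the layout.
with open(out_path, encoding="utf-8") as f:
    records = [json.loads(line) for line in f]
print(written, records[0])
```

Each completion should contain only the continuation of the prompt (the function body), since the harness concatenates prompt and completion before running the tests.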

Leaderboard

Rank  Model        Provider  Parameters  Score (pass@1)
1     DeepSeek V3  DeepSeek  Unknown     91.5%