Benchmarks

Japanese

ELYZA-tasks-100

A set of 100 questions, created by ELYZA, that require Japanese knowledge and reasoning.

Metrics: Score (0-5)
Japanese

JCommonsenseQA

A Japanese commonsense-reasoning QA dataset of five-choice questions.

Metrics: Accuracy (%)
Japanese

JGLUE

Japanese General Language Understanding Evaluation: text classification, sentence-pair tasks, and question answering.

Metrics: Accuracy (%)
Japanese

Japanese MT-Bench

A Japanese version of MT-Bench for evaluating multi-turn conversation.

Metrics: Score (1-10)
Japanese

Nejumi 4

A comprehensive Japanese LLM evaluation covering reasoning, knowledge, coding, and safety.

Metrics: Score (0-1)