Benchmark Performance
OpenAI reports that the o3 model has achieved unprecedented results across several benchmarks:
Coding Proficiency: o3 sets new performance records in coding. On the SWE-bench Verified benchmark it scores 22.8 percentage points higher than its predecessor, o1, and in competitive programming it even outperforms OpenAI’s Chief Scientist.
Mathematical Reasoning: On the 2024 American Invitational Mathematics Examination (AIME), o3 came within one question of a perfect score. It also solved 25.2% of the problems in Epoch AI’s FrontierMath benchmark, a significant leap over previous models, none of which had exceeded 2%.
Scientific Understanding: o3 scored 87.7% on the GPQA Diamond benchmark, which consists of graduate-level questions in biology, physics, and chemistry.