Daya Benchmark Leaderboard

About Daya Benchmark

Daya Benchmark is a dynamic AI model evaluation leaderboard maintained by the Daya team. It is designed to provide comprehensive, objective, and timely performance assessments across the models and provider channels available on the Daya platform.

Benchmark Highlights

Full cross-channel coverage: We test every available provider channel for each model independently. For example, if GPT-4 is offered through both OpenAI and Azure, we evaluate each channel separately to capture differences in performance, stability, and other provider-specific factors (see the sketch after these highlights).

Fully open and transparent: We publish our methodology, scoring rules, and benchmark updates so users can understand how rankings are produced and compare results over time.
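
To make the cross-channel coverage concrete, here is a minimal sketch of how each (model, provider) pair could be enumerated and scored as its own leaderboard entry. The channel registry and the evaluate stub are hypothetical illustrations, not the Daya test harness.

    # Hypothetical channel registry; model and provider names are illustrative.
    CHANNELS = {
        "gpt-4": ["openai", "azure"],
        "claude-3-opus": ["anthropic", "bedrock"],
    }

    def evaluate(model: str, provider: str) -> float:
        """Stub: run the full question set against this provider channel and
        return a score. A real harness would call the provider's API here."""
        return 0.0  # placeholder

    # Each (model, provider) pair gets its own, independent leaderboard entry.
    results = {
        (model, provider): evaluate(model, provider)
        for model, providers in CHANNELS.items()
        for provider in providers
    }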

Dataset and Methodology

We use the publicly released Humanity's Last Exam (Text Only) dataset, published by Scale AI and the Center for AI Safety, as our primary evaluation benchmark. The dataset spans a broad range of knowledge domains and reasoning tasks and is widely regarded in the industry as a high-quality AI evaluation standard.

For detailed information about the dataset, see: https://scale.com/leaderboard/humanitys_last_exam_text_only

Evaluation Methodology

We run full-scale testing for each model to provide the most comprehensive assessment possible. However, content-filtering policies and other technical constraints at certain vendors mean that some models cannot complete every question.

To address this fairly, we apply the following scoring approach:

  • Scoring: Accuracy is computed over the questions the model successfully responded to; questions it could not answer are excluded from the denominator.
  • Cost normalization: Cost is adjusted in proportion to the actual completion rate, so cost-effectiveness comparisons between fully and partially completed runs remain fair (see the sketch after this list).
  • Transparency: Detailed success rates and completion status for each model are available in the raw test data for the corresponding provider source in each benchmark release.
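
A minimal sketch of how these two adjustments could be computed. The field names and the exact normalization formula (dividing actual cost by the completion rate to estimate a full-run cost) are assumptions made for illustration, not the published implementation.

    from dataclasses import dataclass

    @dataclass
    class RunResult:
        total_questions: int  # questions in the benchmark release
        answered: int         # questions the model successfully responded to
        correct: int          # answered questions judged correct
        cost_usd: float       # actual spend for the run

    def score(r: RunResult) -> float:
        # Accuracy over answered questions only; filtered or failed
        # items are excluded from the denominator.
        return r.correct / r.answered if r.answered else 0.0

    def normalized_cost(r: RunResult) -> float:
        # Assumed normalization: scale actual cost up by the completion
        # rate to estimate a comparable full-run cost.
        completion_rate = r.answered / r.total_questions
        return r.cost_usd / completion_rate if completion_rate else float("inf")

    # Example: a run that completed 2,300 of 2,500 questions for $46.
    run = RunResult(total_questions=2500, answered=2300, correct=1012, cost_usd=46.0)
    print(f"score: {score(run):.1%}  normalized cost: ${normalized_cost(run):.2f}")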

Our Goal

Daya Benchmark aims to provide a continuously updated, real-time leaderboard that helps teams track the latest performance of AI models. We will keep refining our methodology, expanding the dimensions we evaluate, and giving users clearer guidance for model selection.

Feedback and Suggestions

We welcome community feedback and suggestions. If you have ideas about the evaluation methodology, result analysis, or leaderboard features, please open an issue in the Daya repository: https://github.com/ZikaiSun/daya/issues

Thank you for your interest in and support of Daya Benchmark.