How to Benchmark DeepSeek-R1 Distilled Models on GPQA Using Ollama and OpenAI’s simple-evals

Set up and run the GPQA-Diamond benchmark on DeepSeek-R1’s distilled models locally to evaluate their reasoning capabilities.

By Ember Recon · March 16, 2026 · 1 min read

Source: Towards Data Science
