How to Benchmark DeepSeek-R1 Distilled Models on GPQA Using Ollama and OpenAI’s simple-evals
Set up and run the GPQA-Diamond benchmark on DeepSeek-R1’s distilled models locally to evaluate their reasoning capabilities.

Source: Towards Data Science