DeepSeek-R1-Lite-Preview seems to beat DeepSeek V3 on multiple benchmarks, so why is V3 getting so much more hype?
Disclaimer: I'm having trouble finding direct comparisons between these models, which is odd given they're from the same company. I got my numbers from https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSeek_V3.pdf and https://api-docs.deepseek.com/news/news1120
It looks like a mixed bag. For example, R1-Lite-Preview scores 52.5 on AIME while V3 scores 39.2.
On the other hand, V3 edges out R1-Lite-Preview on GPQA Diamond, 59.5 to 58.5.
On Codeforces, the two are reported in different units: R1-Lite-Preview is listed with a 1450 rating, while V3 is listed at the 51.6th percentile. Judging from the rating distribution on the Codeforces website, a 1450 rating sits above the 51.6th percentile, so I think R1-Lite-Preview wins here, but I could be wrong.
I understand that R1-Lite-Preview is marketed as a reasoning model, but if you read the V3 paper, they say V3 is also trained as a reasoner via distillation from R1 (I think they distill from the full R1 rather than the preview, but I couldn't tell from the paper).
Anyway, this isn't an attack on DeepSeek; they've made two amazing models.