DeepSeek-R1-Lite-Preview seems to beat DeepSeek V3 on multiple benchmarks, so why is V3 getting so much more hype?
Disclaimer: I'm having trouble finding direct comparisons between these models, which is odd given they're from the same company. I got my numbers from https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSeek_V3.pdf and https://api-docs.deepseek.com/news/news1120
It looks like a mixed bag. For example, R1-Lite-Preview scores 52.5 on AIME while V3 scores 39.2.
On the other hand, V3 edges out R1-Lite-Preview on GPQA Diamond, 59.5 to 58.5.
On Codeforces, the two are reported in different units: R1-Lite-Preview is listed with a 1450 rating, while V3 is listed at the 51.6th percentile. Judging from the rating distribution on the Codeforces website, a 1450 rating sits above the 51.6th percentile, so I think R1-Lite-Preview wins here, but I could be wrong.
I understand that R1-Lite-Preview is marketed as a reasoning model, but if you read the V3 paper, they say V3 is also trained as a reasoner via distillation from R1 (I think they distill from the full R1 rather than the preview, but I couldn't tell from the paper).
Anyway, this isn't an attack on DeepSeek; they've made two amazing models.