Paper + OSS Models

https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf

Training

[Figure: deepseek_r1.png]

Cold Start (Fine-Tune)

RL Stage 1

Rejection Sampling

Fine-Tune
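
The rejection sampling step listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. The names `rejection_sample`, `generate`, `is_acceptable`, `toy_generate`, and `toy_is_correct` are all hypothetical. `generate` stands in for sampling from the checkpoint produced by the RL stage, and `is_acceptable` stands in for whatever correctness or quality filter is applied. The pairs that survive the filter would then serve as data for the fine-tuning step.

```python
import random
from typing import Callable, List, Tuple


def rejection_sample(
    prompts: List[str],
    generate: Callable[[str], str],
    is_acceptable: Callable[[str, str], bool],
    num_samples: int = 4,
) -> List[Tuple[str, str]]:
    """Draw num_samples responses per prompt and keep only the accepted ones."""
    kept: List[Tuple[str, str]] = []
    for prompt in prompts:
        for _ in range(num_samples):
            response = generate(prompt)  # hypothetical: sample from the RL checkpoint
            if is_acceptable(prompt, response):  # hypothetical: rule-based or model-based check
                kept.append((prompt, response))
    return kept


# Toy stand-ins (assumptions for illustration only): the "model" adds two
# integers and is sometimes off by one; the filter checks the exact sum.
def toy_generate(prompt: str) -> str:
    a, b = prompt.split("+")
    return str(int(a) + int(b) + random.choice([0, 0, 1]))


def toy_is_correct(prompt: str, response: str) -> bool:
    a, b = prompt.split("+")
    return response == str(int(a) + int(b))


if __name__ == "__main__":
    sft_data = rejection_sample(["2+2", "3+5"], toy_generate, toy_is_correct)
    print(sft_data)  # only correct (prompt, response) pairs remain
```

Passing the generator and the filter as callables keeps the loop independent of any particular model or reward check, so the same sketch applies whether acceptance is decided by a rule or by another model.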