
DeepSeek-R1 is a family of open reasoning models with performance approaching that of leading models such as O3 and Gemini 2.5 Pro.

Thinking model · sizes: 1.5b, 7b, 8b, 14b, 32b, 70b, 671b

Readme

DeepSeek-R1 has received a minor version upgrade to DeepSeek-R1-0528, covering the 8-billion-parameter distilled model and the full 671-billion-parameter model. In this update, DeepSeek-R1 has significantly improved its reasoning and inference capabilities, demonstrating outstanding performance across benchmark evaluations in mathematics, programming, and general logic. Its overall performance now approaches that of leading models such as O3 and Gemini 2.5 Pro.


Models

DeepSeek-R1-0528-Qwen3-8B

ollama run deepseek-r1

DeepSeek-R1

ollama run deepseek-r1:671b

Note: to update the model from an older version, run ollama pull deepseek-r1
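Beyond the CLI, a locally running Ollama server exposes these same tags over its REST API. Below is a minimal sketch, assuming the official `ollama` Python client is installed (`pip install ollama`) and the model has already been pulled:

```python
# Minimal chat call against a local Ollama server.
# Assumes `ollama pull deepseek-r1` has already been run.
import ollama

response = ollama.chat(
    model="deepseek-r1",  # or a specific tag such as "deepseek-r1:671b"
    messages=[{"role": "user", "content": "How many primes are below 100?"}],
)

# R1 models may include <think>...</think> reasoning before the answer.
print(response["message"]["content"])
```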

Distilled models

The DeepSeek team has demonstrated that the reasoning patterns of larger models can be distilled into smaller models, yielding better performance than the reasoning patterns discovered through RL on small models directly.

Below are models created by fine-tuning several dense models widely used in the research community on reasoning data generated by DeepSeek-R1. The evaluation results show that these distilled, smaller dense models perform exceptionally well on benchmarks.

DeepSeek-R1-0528-Qwen3-8B

ollama run deepseek-r1:8b

DeepSeek-R1-Distill-Qwen-1.5B

ollama run deepseek-r1:1.5b

DeepSeek-R1-Distill-Qwen-7B

ollama run deepseek-r1:7b

DeepSeek-R1-Distill-Qwen-14B

ollama run deepseek-r1:14b

DeepSeek-R1-Distill-Qwen-32B

ollama run deepseek-r1:32b

DeepSeek-R1-Distill-Llama-70B

ollama run deepseek-r1:70b
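R1-style models emit their chain of thought inside `<think>...</think>` tags before the final answer. As a rough sketch (assuming that tag format; the regex splitting below is illustrative, not part of any official API), the two parts can be separated like this:

```python
# Sketch: split R1-style output into reasoning and final answer.
# Assumes the <think>...</think> convention used by DeepSeek-R1 models.
import re

import ollama

response = ollama.chat(
    model="deepseek-r1:8b",
    messages=[{"role": "user", "content": "Is 2^31 - 1 prime?"}],
)
text = response["message"]["content"]

# Extract the reasoning block if present; everything else is the answer.
match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
reasoning = match.group(1).strip() if match else ""
answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

print("Reasoning excerpt:", reasoning[:200])
print("Answer:", answer)
```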

License

The model weights are licensed under the MIT License. The DeepSeek-R1 series supports commercial use and allows any modifications and derivative works, including, but not limited to, distillation for training other LLMs. Please note that:

The Qwen distilled models are derived from the Qwen-2.5 series, originally licensed under the Apache 2.0 License, and are fine-tuned with 800k samples curated with DeepSeek-R1.

The Llama 8B distilled model is derived from Llama-3.1-8B-Base, originally licensed under the Llama 3.1 license.

The Llama 70B distilled model is derived from Llama-3.3-70B-Instruct, originally licensed under the Llama 3.3 license.