Authors:
DeepSeek-AI,
Anyi Xu,
Bangcai Lin,
Bing Xue,
Bingxuan Wang,
Bingzheng Xu,
Bochao Wu,
Bowei Zhang,
Chaofan Lin,
Chen Dong,
Chenchen Ling,
Chengda Lu,
Chenggang Zhao,
Chengqi Deng,
Chengyu Hou,
Chenhao Xu,
Chenze Shao,
Chong Ruan,
Conner Sun,
Damai Dai,
Daya Guo,
Dejian Yang,
Deli Chen,
Donghao Li,
Dongjie Ji,
Erhang Li,
Fang Wei,
Fangyun Lin,
Fangzhou Yuan,
Feiyu Xia,
Fucong Dai,
Guangbo Hao,
Guanting Chen,
Guoai Cao,
Guolai Meng,
Guowei Li,
Han Yu,
Han Zhang,
Hanwei Xu,
Hao Li,
Haofen Liang,
Haoling Zhang,
Haoming Luo,
Haoran Wei,
Haotian Yuan,
Haowei Zhang,
Haowen Luo,
Haoyu Chen,
Haozhe Ji,
Hengqing Zhang,
Honghui Ding,
Hongxuan Tang,
Huanqi Cao,
Huazuo Gao,
Hui Qu,
Hui Zeng,
J Yang,
JQ Zhu,
Jia Luo,
Jia Song,
Jia Yu,
Jialiang Huang,
Jialu Cai,
Jian Liang,
Jiangting Zhou,
Jiasheng Ye,
Jiashi Li,
Jiaxin Xu,
Jiewen Hu,
Jieyu Yang,
Jin Chen,
Jin Yan,
Jingchang Chen,
Jingli Zhou,
Jingting Xiang,
Jingyang Yuan,
Jingyuan Cheng,
Jingzi Zhou,
Jinhua Zhu,
Jiping Yu,
Joseph Sun,
Jun Ran,
Junguang Jiang,
Junjie Qiu,
Junlong Li,
Junmin Zheng,
Junxiao Song,
Kai Dong,
Kaige Gao,
Kang Guan,
Kexing Zhou,
Kezhao Huang,
Kuai Yu,
Lean Wang,
Lecong Zhang,
Lei Wang,
Leyi Xia,
Li Zhang,
Liang Zhao,
Lihua Guo,
Lingxiao Luo,
Linwang Ma,
Linyan Zhu,
Litong Wang,
Liyu Cai,
Liyue Zhang,
Longhao Chen,
MS Di,
MY Xu,
Max Mei,
Miaojun Wang,
Mingchuan Zhang,
Minghua Zhang,
Minghui Tang,
Mingming Li,
Mingxu Zhou,
Minmin Han,
Ning Wang,
Panpan Huang,
Panpan Wang,
Peixin Cong,
Peiyi Wang,
Peng Zhang,
Qiancheng Wang,
Qihao Zhu,
Qingyang Li,
Qinyu Chen,
Qiushi Du,
Qiwei Jiang,
Rui Tian,
Ruifan Xu,
Ruijie Lu,
Ruiling Xu,
Ruiqi Ge,
Ruisong Zhang,
Ruizhe Pan,
Runji Wang,
Runqian Chen,
Runqiu Yin,
Runxin Xu,
Ruomeng Shen,
Ruoyu Zhang,
Ruyi Chen,
SH Liu,
Shanghao Lu,
Shangmian Sun,
Shangyan Zhou,
Shanhuang Chen,
Shaofei Cai,
Shaoheng Nie,
Shaoqing Wu,
Shaoyuan Chen,
Shengding Hu,
Shengyu Liu,
Shiqiang Hu,
Shirong Ma,
Shiyu Wang,
Shuiping Yu,
Shunfeng Zhou,
Shuting Pan,
Shuying Yu,
Songyang Zhou,
Tao Ni,
Tao Yun,
Tian Jin,
Tian Pei,
Tian Ye,
Tianle Lin,
Tianran Ji,
Tianyi Cui,
Tianyuan Yue,
Tingting Yu,
Tun Wang,
W Zhang,
WL Xiao,
Wangding Zeng,
Wei An,
Weilin Zhao,
Wen Liu,
Wenfeng Liang,
Wenjie Pang,
Wenjing Luo,
Wenjing Yao,
Wenjun Gao,
Wenkai Yang,
Wenlve Huang,
Wenqing Hou,
Wentao Zhang,
Wenting Ma,
Xi Gao,
Xiang He,
Xiangwen Wang,
Xianzu Wang,
Xiao Bi,
Xiaodong Liu,
Xiaohan Wang,
Xiaokang Chen,
Xiaokang Zhang,
Xiaotao Nie,
Xiaowen Sun,
Xiaoxiang Wang,
Xin Cheng,
Xin Liu,
Xin Xie,
Xingchao Liu,
Xingchen Liu,
Xingkai Yu,
Xingyou Li,
Xinyu Yang,
Xinyu Zhang,
Xu Chen,
Xuanyu Wang,
Xuecheng Su,
Xueyin Chen,
Xuheng Lin,
Xuwei Fu,
YC Yan,
YQ Wang,
YW Ma,
Yanfeng Luo,
Yang Zhang,
Yanhong Xu,
Yanru Ma,
Yanwen Huang,
Yao Li,
Yao Li,
Yao Xu,
Yao Zhao,
Yaofeng Sun,
Yaohui Wang,
Yi Qian,
Yi Shao,
Yi Yu,
Yichao Zhang,
Yifan Ding,
Yifan Shi,
Yijia Wu,
Yiliang Xiong,
Yiling Ma,
Ying He,
Ying Tang,
Ying Zhou,
Yingjia Luo,
Yinmin Zhong,
Yishi Piao,
Yisong Wang,
Yixiang Zhang,
Yixiao Chen,
Yixuan Tan,
Yixuan Wei,
Yiyang Ma,
Yiyuan Liu,
Yonglun Yang,
Yongqiang Guo,
Yongtong Wu,
Yu Wu,
YuKun Li,
Yuan Cheng,
Yuan Ou,
Yuanfan Xu,
Yuanhao Li,
Yuduan Wang,
Yuehan Yang,
Yuer Xu,
Yuhan Wu,
Yuhao Meng,
Yuheng Zou,
Yukun Zha,
Yunfan Xiong,
Yupeng Chen,
Yuping Lin,
Yuqian Cao,
Yuqian Wang,
Yushun Zhang,
Yuting Yan,
Yutong Lin,
Yuxian Gu,
Yuxiang Luo,
Yuxiang You,
Yuxuan Liu,
Yuxuan Zhou,
Yuyang Zhou,
Yuzhen Huang,
ZF Wu,
Zehao Wang,
Zehua Zhao,
Zehui Ren,
Zekai Zhang,
Zhangli Sha,
Zhe Fu,
Zhe Ju,
Zhean Xu,
Zhenda Xie,
Zhengyan Zhang,
Zheren Gao,
Zhewen Hao,
Zhibin Gou,
Zhicheng Ma,
Zhigang Yan,
Zhihong Shao,
Zhixian Huang,
Zhixuan Chen,
Zhiyu Wu,
Zhizhou Ren,
Zhongyu Wu,
Zhuoshu Li,
Zhuping Zhang,
Zian Xu,
Zihao Wang,
Zihua Qu,
Zihui Gu,
Zijia Zhu,
Zilin Li,
Zipeng Zhang,
Ziwei Xie,
Ziyi Gao,
Ziyi Wan,
Zizheng Pan,
Zongqing Yao
et al. (289 additional authors not shown)
We present a preview version of the DeepSeek-V4 series, including two strong Mixture-of-Experts (MoE) language models -- DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated) -- both supporting a context length of one million tokens. The DeepSeek-V4 series incorporates several key upgrades in architecture and optimization: (1) a hybrid attention architecture that combines Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to improve long-context efficiency; (2) Manifold-Constrained Hyper-Connections (mHC) that enhance conventional residual connections; and (3) the Muon optimizer for faster convergence and greater training stability. We pre-train both models on more than 32T diverse and high-quality tokens, followed by a comprehensive post-training pipeline that unlocks and further enhances their capabilities. DeepSeek-V4-Pro-Max, the maximum reasoning effort mode of DeepSeek-V4-Pro, redefines the state of the art for open models, outperforming its predecessors in core tasks. Meanwhile, the DeepSeek-V4 series is highly efficient in long-context scenarios. In the one-million-token context setting, DeepSeek-V4-Pro requires only 27% of the single-token inference FLOPs and 10% of the KV cache of DeepSeek-V3.2. This enables us to routinely support one-million-token contexts, thereby making long-horizon tasks and further test-time scaling more feasible. The model checkpoints are available at https://huggingface.co/collections/deepseek-ai/deepseek-v4.
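To put the headline figures in perspective, the short sketch below derives the fraction of parameters activated per token for each model and converts the quoted relative costs into reduction factors. Only the parameter counts and the 27% and 10% figures come from the abstract; the dictionary layout and the function names `activated_fraction` and `reduction_factor` are illustrative assumptions, not part of any released DeepSeek code.

```python
# Figures quoted in the abstract; names and structure are hypothetical.
MODELS = {
    "DeepSeek-V4-Pro": {"total_params": 1.6e12, "activated_params": 49e9},
    "DeepSeek-V4-Flash": {"total_params": 284e9, "activated_params": 13e9},
}

# Cost of DeepSeek-V4-Pro relative to DeepSeek-V3.2 at a 1M-token context.
RELATIVE_FLOPS = 0.27
RELATIVE_KV_CACHE = 0.10


def activated_fraction(total_params: float, activated_params: float) -> float:
    """Share of the model's parameters used for a single token."""
    return activated_params / total_params


def reduction_factor(relative_cost: float) -> float:
    """Convert 'X% of the baseline cost' into an 'N times cheaper' factor."""
    return 1.0 / relative_cost


if __name__ == "__main__":
    for name, cfg in MODELS.items():
        frac = activated_fraction(**cfg)
        print(f"{name}: {frac:.1%} of parameters activated per token")
    print(f"Single-token FLOPs reduction: {reduction_factor(RELATIVE_FLOPS):.1f}x")
    print(f"KV cache reduction: {reduction_factor(RELATIVE_KV_CACHE):.1f}x")
```

Under these numbers, roughly 3.1% (Pro) and 4.6% (Flash) of the parameters are active per token, and the quoted savings correspond to about a 3.7x reduction in single-token FLOPs and a 10x reduction in KV cache size.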
Subjects: Computation and Language, Artificial Intelligence
Published: 2026-04-26 14:49:33 UTC