arXiv:2401.02954


DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

Authors: DeepSeek-AI: Xiao Bi; Deli Chen; Guanting Chen; Shanhuang Chen; Damai Dai; Chengqi Deng; Honghui Ding; Kai Dong; Qiushi Du; Zhe Fu; Huazuo Gao; Kaige Gao; Wenjun Gao; Ruiqi Ge; Kang Guan; Daya Guo; Jianzhong Guo; Guangbo Hao; Zhewen Hao; Ying He; Wenjie Hu; Panpan Huang; Erhang Li; Guowei Li; Jiashi Li; Yao Li; Y. K. Li; Wenfeng Liang; Fangyun Lin; A. X. Liu; Bo Liu; Wen Liu; Xiaodong Liu; Xin Liu; Yiyuan Liu; Haoyu Lu; Shanghao Lu; Fuli Luo; Shirong Ma; Xiaotao Nie; Tian Pei; Yishi Piao; Junjie Qiu; Hui Qu; Tongzheng Ren; Zehui Ren; Chong Ruan; Zhangli Sha; Zhihong Shao; Junxiao Song; Xuecheng Su; Jingxiang Sun; Yaofeng Sun; Minghui Tang; Bingxuan Wang; Peiyi Wang; Shiyu Wang; Yaohui Wang; Yongji Wang; Tong Wu; Y. Wu; Xin Xie; Zhenda Xie; Ziwei Xie; Yiliang Xiong; Hanwei Xu; R. X. Xu; Yanhong Xu; Dejian Yang; Yuxiang You; Shuiping Yu; Xingkai Yu; B. Zhang; Haowei Zhang; Lecong Zhang; Liyue Zhang; Mingchuan Zhang; Minghua Zhang; Wentao Zhang; Yichao Zhang; Chenggang Zhao; Yao Zhao; Shangyan Zhou; Shunfeng Zhou; Qihao Zhu; Yuheng Zou

The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling laws described in previous literature reach varying conclusions, which casts a dark cloud over scaling LLMs. We delve into the study of scaling laws and present our distinctive findings that facilitate the scaling of large-scale models in two commonly used open-source configurations, 7B and 67B. Guided by the scaling laws, we introduce DeepSeek LLM, a project dedicated to advancing open-source language models with a long-term perspective. To support the pre-training phase, we have developed a dataset that currently consists of 2 trillion tokens and is continuously expanding. We further conduct supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) on the DeepSeek LLM Base models, resulting in the DeepSeek Chat models. Our evaluation results demonstrate that DeepSeek LLM 67B surpasses LLaMA-2 70B on various benchmarks, particularly in the domains of code, mathematics, and reasoning. Furthermore, open-ended evaluations reveal that DeepSeek LLM 67B Chat outperforms GPT-3.5.
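The abstract names Direct Preference Optimization (DPO) as the step applied after SFT to produce the Chat models. As a rough illustration only, the sketch below computes the standard DPO objective of Rafailov et al. (2023) for preference pairs: loss = -log sigmoid(beta * [(log pi_theta(y_w|x) - log pi_ref(y_w|x)) - (log pi_theta(y_l|x) - log pi_ref(y_l|x))]). It is not DeepSeek's training code; the helper names (`dpo_loss`, `mean_dpo_loss`) and the default `beta = 0.1` are assumptions, not values taken from the paper.

```python
import math
from typing import Sequence, Tuple


def _neg_log_sigmoid(z: float) -> float:
    """Return -log(sigmoid(z)) = log(1 + exp(-z)), computed without overflow."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen y_w, rejected y_l) pair.

    Each argument is the summed log-probability log pi(y | x) of a full
    response under the trainable policy or the frozen reference model.
    beta = 0.1 is an illustrative assumption, not a value from the paper.
    """
    # Implicit reward of each response: how much the policy has moved
    # its log-probability relative to the reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    return _neg_log_sigmoid(beta * (chosen_logratio - rejected_logratio))


def mean_dpo_loss(pairs: Sequence[Tuple[float, float, float, float]],
                  beta: float = 0.1) -> float:
    """Average DPO loss over a batch of
    (policy_chosen, policy_rejected, ref_chosen, ref_rejected) tuples."""
    return sum(dpo_loss(*p, beta=beta) for p in pairs) / len(pairs)


if __name__ == "__main__":
    # Policy identical to the reference: loss equals log 2.
    print(dpo_loss(0.0, 0.0, 0.0, 0.0))
    # Policy already prefers the chosen response more than the reference does.
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

When the policy still matches the reference, every pair contributes log 2 (about 0.693); the loss drops below that as the policy raises the chosen response's log-ratio relative to the rejected one, which is the direction DPO optimizes without training a separate reward model.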

Subjects: Computation and Language; Artificial Intelligence; Machine Learning

Published: 2024-01-05 18:59:13 UTC