DeepSeek-V2: High-Performing Open-Source LLM With MoE Architecture
Introduction
However, the path of progress is not without its challenges. MoE models grapple with issues such as balancing computational cost against the growing demand for high-quality outputs, and their memory requirements and fine-tuning needs pose significant hurdles of their own. To overcome these challenges, DeepSeek-AI, a team dedicated to advancing the capabilities of AI, developed DeepSeek-V2.
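The compute savings that MoE models trade against these challenges come from activating only a few experts per token instead of the whole network. A minimal sketch of top-k expert routing illustrates the idea; the names, shapes, and plain linear experts here are illustrative assumptions, not DeepSeek-V2's actual DeepSeekMoE layer:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def moe_layer(x, gate_w, expert_ws, top_k=2):
    """Route each token to its top_k experts and mix their outputs.

    x:         (n_tokens, d_model) token activations
    gate_w:    (d_model, n_experts) router weights
    expert_ws: list of (d_model, d_model) expert weight matrices
    """
    probs = softmax(x @ gate_w)                       # (n_tokens, n_experts)
    top_idx = np.argsort(-probs, axis=-1)[:, :top_k]  # pick top_k experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        w = probs[t, top_idx[t]]
        w = w / w.sum()                               # renormalize over chosen experts
        for weight, e in zip(w, top_idx[t]):
            # only the selected experts run for this token -- the source
            # of MoE's compute savings
            out[t] += weight * (x[t] @ expert_ws[e])
    return out

d_model, n_experts, n_tokens = 8, 4, 3
gate_w = rng.standard_normal((d_model, n_experts))
expert_ws = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
y = moe_layer(rng.standard_normal((n_tokens, d_model)), gate_w, expert_ws)
print(y.shape)  # (3, 8)
```

Each token pays for only `top_k` expert matmuls rather than all `n_experts`, which is why total parameter count can grow far faster than per-token compute.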
What is DeepSeek-V2?
Model Variant(s)
source - https://arxiv.org/pdf/2405.04434
DeepSeek-V2 Chat (SFT) and DeepSeek-V2 Chat (RL) have also been
evaluated on open-ended benchmarks. Notably, DeepSeek-V2 Chat
(RL) achieves a 38.9 length-controlled win rate on AlpacaEval 2.0, an
8.97 overall score on MT-Bench, and a 7.91 overall score on
AlignBench. These evaluations demonstrate that DeepSeek-V2 Chat
(RL) has top-tier performance among open-source chat models. In
Chinese, DeepSeek-V2 Chat (RL) outperforms all open-source models
and even beats most closed-source models.
source - https://arxiv.org/pdf/2405.04434
Finally, it’s worth mentioning that while certain prior studies incorporate SFT data during the pre-training stage, DeepSeek-V2 was never exposed to SFT data during pre-training. Even so, its chat versions demonstrate substantial improvements on GSM8K, MATH, and HumanEval compared with the base model. This progress can be attributed to the SFT data used during fine-tuning, which comprises a considerable volume of math- and code-related content. In addition, DeepSeek-V2 Chat (RL) further boosts performance on math and code benchmarks.
However, it’s important to note that these limitations are part of the
current state of AI and are areas of active research. Future work by
DeepSeek-AI and the broader AI community will focus on addressing
these challenges, continually pushing the boundaries of what’s possible
with AI.
Conclusion
Source
Research paper: https://arxiv.org/abs/2405.04434
Research document: https://arxiv.org/pdf/2405.04434
GitHub repo : https://github.com/deepseek-ai/DeepSeek-V2
Model weights:
https://huggingface.co/deepseek-ai/DeepSeek-V2
https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat
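Since the weights above are published on the Hugging Face Hub, they can be loaded with the `transformers` library. This is a minimal sketch, assuming a machine with enough GPU memory for the full model; the model id comes from the links above, and the import is kept inside the function so the sketch can be read without `transformers` installed:

```python
MODEL_ID = "deepseek-ai/DeepSeek-V2-Chat"

def load_model(model_id=MODEL_ID):
    # Lazy import: the sketch is inspectable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # trust_remote_code is needed because the DeepSeek-V2 architecture ships
    # its own modeling code on the Hub; device_map="auto" spreads the model
    # across available GPUs.
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        trust_remote_code=True,
        device_map="auto",
    )
    return tokenizer, model

if __name__ == "__main__":
    tokenizer, model = load_model()
```

Swap in `deepseek-ai/DeepSeek-V2` for the base (non-chat) weights.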