DeepSeek, a Chinese AI company established by Liang Wenfeng, unveiled its most recent model, DeepSeek V3.
The model, released on Thursday, is one of the most capable open-source AI models available, making the release a milestone for the field.
Performance and specifications
DeepSeek V3 boasts an impressive 671 billion parameters, surpassing the previous open-model record of 405 billion held by Meta's Llama 3.1.
Trained on 14.8 trillion tokens, the model is proficient across tasks such as coding, translation, and essay writing.
According to DeepSeek's internal benchmarks, it outperforms both open and closed models, including OpenAI's GPT-4o and Meta's Llama 3.1, on coding problems from platforms such as Codeforces.
The model’s architecture employs a Mixture-of-Experts (MoE) design, activating only a subset of its parameters for each token during inference, which keeps compute costs manageable despite the large total parameter count.
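DeepSeek has not published the routing code referenced here; the sketch below is a generic, minimal illustration of the MoE idea, showing how a router sends each token to only a few expert networks so that just a fraction of the layer's parameters run per token. The class name ToyMoELayer, the dimensions, and the top-k value are illustrative assumptions, not DeepSeek V3's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Minimal Mixture-of-Experts layer (illustrative only): a router scores
    experts per token and only the top-k experts are evaluated for that token."""

    def __init__(self, d_model=64, d_hidden=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # per-token expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):
        # x: (tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)        # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)    # keep only the top-k experts per token
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                     # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

# Only top_k of num_experts expert networks run per token, so the parameters
# active for any given input are far fewer than the layer's total parameter count.
layer = ToyMoELayer()
tokens = torch.randn(4, 64)
print(layer(tokens).shape)  # torch.Size([4, 64])
```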
DeepSeek V3 is reportedly three times faster than its predecessor and fully open source, allowing developers to modify and use it for commercial applications under a permissive licence.
Innovations and future developments
Alongside the launch of DeepSeek V3, the company is working on an innovative feature called Deep Roles. This feature will allow users to create and share personalised roles within the DeepSeek ecosystem, similar to Custom GPTs.
Although still in development, it promises to enhance user interaction with the model by enabling tailored experiences in both Chinese and English.
DeepSeek’s commitment to open-source development reflects its strategy of challenging closed-source competitors such as OpenAI while encouraging broader participation in AI research and technological advancement.
Despite facing regulatory requirements in China that filter responses on sensitive topics, DeepSeek continues to push the boundaries of AI research and application.
The development of DeepSeek V3 was achieved with remarkable efficiency: the model was trained on Nvidia H800 GPUs in just two months for $5.5 million, a fraction of the cost of other major models such as GPT-4.
This efficiency underscores DeepSeek’s resilience in working within international restrictions on advanced chip acquisition.