DeepSeek Unveils Its Most Powerful Open-source Innovation, Challenging GPT-5 and Gemini 3
Source: TMTPost (钛媒体APP)
2025-12-05 08:17:41

Source: DeepSeek official WeChat

Meanwhile, DeepSeek also stated that, compared with Kimi-K2-Thinking, the recently released model from Chinese large-model developer Moonshot AI, DeepSeek V3.2 produces significantly shorter outputs, greatly reducing computational overhead and user wait time. In agent benchmarks, V3.2 also outperformed open-source models such as Kimi-K2-Thinking and MiniMax M2, making it the strongest open-source large model to date, with overall performance now extremely close to that of the top closed-source models.

Image from DeepSeek official WeChat

What’s even more noteworthy is V3.2’s performance in certain Q&A scenarios and general agent tasks. In a specific case involving travel advice, for example, V3.2 leveraged deep reasoning along with web crawling and search engine tools to provide highly detailed and accurate travel tips and recommendations. The latest API update for V3.2 also supports tool usage in “thinking mode” for the first time, greatly enriching the usefulness and breadth of answers users receive.
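
To make tool use in thinking mode concrete, below is a minimal sketch of passing a tool definition to a reasoning model through an OpenAI-compatible API. The base URL matches DeepSeek's documented endpoint, but the model name, the web_search tool schema, and the exact thinking-mode behavior are illustrative assumptions, not details confirmed by this article.

```python
# Minimal sketch: a tool definition passed to a reasoning model via an
# OpenAI-compatible API. Model name and tool schema are assumptions.
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.deepseek.com")

# Declare a web-search tool the model may decide to call while reasoning.
tools = [{
    "type": "function",
    "function": {
        "name": "web_search",  # hypothetical tool, executed by the caller
        "description": "Search the web and return the top results as text.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="deepseek-reasoner",  # assumed name for the thinking-mode model
    messages=[{"role": "user", "content": "Plan a 3-day trip to Chengdu."}],
    tools=tools,
)

# If the model chose to call the tool, the request appears here; the caller
# runs the search and sends the result back as a "tool" role message.
print(response.choices[0].message.tool_calls)
```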

In addition, DeepSeek specifically emphasized that V3.2 was not specially trained on the tools featured in these evaluation datasets.

We’ve observed that while benchmark scores for large models keep climbing, these models often make basic factual errors in everyday use (a criticism leveled at GPT-5 in particular upon its release). Against this backdrop, DeepSeek has emphasized with each update that it avoids relying solely on correct answers as a reward signal, and has thus avoided producing a so-called “super-intelligent brain” that looks clever on benchmarks yet fails at the simple tasks and questions that matter to ordinary users: a “low-EQ” AI agent.

Overcoming this challenge at a fundamental level—becoming a large model with both high IQ and high EQ—is the key to developing a truly versatile, reliable, and efficient AI agent. DeepSeek also believes that V3.2 can demonstrate strong generalization capabilities in real-world application scenarios.

To balance computational efficiency, strong reasoning, and agent performance, DeepSeek has optimized across the training, integration, and application layers. According to its technical paper, V3.2 introduces DSA (DeepSeek Sparse Attention), which significantly reduces computational complexity in long-context scenarios while maintaining model performance.
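
As rough intuition for what sparse attention buys, here is a toy, single-head sketch of top-k sparse attention in PyTorch. This is not DeepSeek's DSA implementation (the paper describes a separate lightweight indexer that selects tokens before attention is computed); it only illustrates the general idea that each query attends to a small, fixed number of past tokens rather than the whole context.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k=16):
    """Single-head causal attention where each query keeps only its top_k keys.

    Toy version: it still builds the full (L, L) score matrix, so it saves no
    compute. A real design (e.g. DSA's indexer) selects tokens first, cutting
    long-context cost from O(L^2) toward O(L * top_k).
    """
    L, d = q.shape
    scores = (q @ k.T) / d**0.5                                # (L, L) scores
    causal = torch.triu(torch.ones(L, L, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(causal, float("-inf"))         # no future tokens
    # Per-row threshold: the value of each query's top_k-th best score.
    kth = scores.topk(min(top_k, L), dim=-1).values[:, -1:]
    scores = scores.masked_fill(scores < kth, float("-inf"))   # drop the rest
    return F.softmax(scores, dim=-1) @ v

q, k, v = (torch.randn(128, 32) for _ in range(3))
print(topk_sparse_attention(q, k, v).shape)  # torch.Size([128, 32])
```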

At the same time, to integrate reasoning capabilities into tool-using scenarios, DeepSeek has developed a new synthesis pipeline that enables systematic, large-scale generation of training data. This approach facilitates scalable agent post-training optimization, substantially improving generalization in complex, interactive environments as well as the model’s ability to follow instructions.
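
The paragraph above describes the pipeline only at a high level, so the skeleton below is purely hypothetical: it shows the sample-rollout-verify-filter loop such a synthesis pipeline typically has, with every function a stand-in for components the article does not detail.

```python
import random

def sample_task(seed: int) -> str:
    """Hypothetical task generator; real pipelines template tasks at scale."""
    rng = random.Random(seed)
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return f"Use the calculator tool to add {a} and {b}."

def run_agent(task: str) -> dict:
    """Stub rollout; a real pipeline runs the model with tools in a sandbox."""
    return {"task": task, "steps": ["think", "call_tool", "answer"], "ok": True}

def verify(episode: dict) -> bool:
    """Stub verifier; real pipelines use rule-based checks or judge models."""
    return episode["ok"]

# Generate at scale, then keep only verified episodes as post-training data.
episodes = (run_agent(sample_task(seed)) for seed in range(10_000))
dataset = [ep for ep in episodes if verify(ep)]
print(f"{len(dataset)} verified trajectories collected")
```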

In addition, as mentioned earlier, V3.2 is also the first model from DeepSeek to incorporate reasoning into tool usage, greatly enhancing the model’s generalization capabilities.

If V3.2 focuses on “saying things that make sense and getting things done”, a balancing act for practical intelligent agents, then the “special forces” variant, V3.2 Speciale, is positioned to push the reasoning ability of open-source models to the limit and probe the boundaries of model capability through extended reasoning.

It’s worth noting that a major highlight of V3.2 Speciale is its integration of the theorem-proving capabilities of DeepSeek-Math-V2, the company’s most powerful mathematical model, released just last week.

Math-V2 not only achieved gold-medal-level performance in the 2025 International Mathematical Olympiad and the 2024 China Mathematical Olympiad, but also outperformed Gemini 3 in the IMO-Proof Bench benchmark evaluation.

Moreover, in the same spirit as the approach described above, this mathematical model also works to move beyond correct-answer reward mechanisms and the “test-taker” mold by adopting a self-verification process. In doing so, it seeks to break through the current bottlenecks in AI’s deep reasoning, enabling large models to truly understand mathematics and logical derivation and thereby achieve more robust, reliable, and versatile theorem proving.
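
To make the self-verification idea concrete, here is a hedged sketch of a generate-then-verify loop: a prover drafts a proof, a verifier critiques it, and only drafts that pass verification are accepted. Both prove() and check() are assumed placeholders, not DeepSeek-Math-V2's actual interface.

```python
def prove(statement: str, feedback: str = "") -> str:
    """Placeholder prover: drafts a proof, revising when given feedback."""
    return f"Proof of {statement}." + (" [revised per feedback]" if feedback else "")

def check(proof: str) -> tuple[bool, str]:
    """Placeholder verifier: judges the rigor of each step, not just the answer."""
    ok = "[revised" in proof            # pretend the first draft has a gap
    return ok, "" if ok else "Step 2 needs justification."

def self_verified_prove(statement: str, max_rounds: int = 4):
    feedback = ""
    for _ in range(max_rounds):
        draft = prove(statement, feedback)
        ok, feedback = check(draft)
        if ok:
            return draft                # accept only verified proofs
    return None                         # unverified drafts are discarded

print(self_verified_prove("a**2 + b**2 >= 2*a*b"))
```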

With its greatly enhanced reasoning abilities, V3.2 Speciale has achieved Gemini 3.0 Pro-level results in mainstream reasoning benchmarks. However, V3.2 Speciale’s performance advantages come at the cost of consuming a large number of tokens, which significantly increases its operational costs. As a result, it currently does not support tool calls or everyday conversation and writing, and is intended for research use only.

From OCR to Math-V2, then to V3.2 and V3.2 Speciale, each of DeepSeek’s recent product launches has been met with widespread praise. At the same time, these releases have not only brought significant improvements in overall capabilities, but also continually clarified the main development trajectories of “practicality” and “generalization”.

In the second half of 2025, with GPT-5, Gemini 3, and Claude Opus 4.5 launching one after another—each outperforming the last in benchmark tests—and with DeepSeek rapidly catching up, the race to be crowned the “most powerful large model” is already getting crowded. Leading large models are now showing clear distinctions in their training approaches as well as their unique characteristics in real-world performance, setting the stage for an even more exciting competition among large models in 2026. (Author: Hu Jiameng; Editor: Li Chengcheng)
