DeepSeek's new AI model appears to be one of the best 'open' challengers.


A Chinese lab has created one of the most powerful “open” AI models to date.

Called DeepSeek V3, the model was developed by AI firm DeepSeek and released on Wednesday under a license that allows developers to download and modify it for most applications, including commercial ones.

DeepSeek V3 can handle a range of text-based workloads and tasks, such as coding, translating, and writing essays and emails.

According to DeepSeek's internal benchmark testing, DeepSeek V3 outperforms both downloadable, "openly" available models and "closed" AI models that can only be accessed through an API. In a subset of coding competitions hosted on Codeforces, a platform for programming contests, DeepSeek V3 beats other models, including Meta's Llama 3.1 405B, OpenAI's GPT-4o, and Alibaba's Qwen 2.5 72B.

DeepSeek V3 also crushes the competition on Aider Polyglot, a test designed to measure whether a model can successfully write new code that integrates with existing code.

DeepSeek V3 was trained on a dataset of 14.8 trillion tokens, DeepSeek claims. In data science, tokens are used to represent bits of raw data; 1 million tokens is equal to about 750,000 words.
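As a rough illustration of that ratio, the back-of-the-envelope conversion implied by the figures above (about 0.75 words per token, which is an approximation, not a property of any particular tokenizer) can be sketched as:

```python
# Rough tokens-to-words estimate, assuming the ~750,000 words
# per 1 million tokens ratio cited above.
WORDS_PER_TOKEN = 750_000 / 1_000_000  # 0.75

def estimated_words(tokens: int) -> int:
    """Estimate the word count represented by a token count."""
    return int(tokens * WORDS_PER_TOKEN)

# DeepSeek V3's claimed training set of 14.8 trillion tokens
# works out to roughly 11.1 trillion words by this estimate.
print(estimated_words(14_800_000_000_000))
```

The real ratio varies by language and tokenizer; this is only the rule of thumb the article uses.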

It's not just the training set that's massive. DeepSeek V3 itself is enormous: 685 billion parameters. (Parameters are the internal variables models use to make predictions or decisions.) That's around 1.6 times the size of Llama 3.1 405B, which contains 405 billion parameters.

Parameter count often (but not always) correlates with skill; models with more parameters tend to outperform models with fewer. But large models also require beefier hardware to run: an unoptimized version of DeepSeek V3 would need a bank of high-end GPUs to answer queries at reasonable speeds.

To be sure, it's not the most practical model. But DeepSeek V3 is an achievement in some respects. DeepSeek was able to train the model in around two months using a data center of Nvidia H800 GPUs — GPUs that Chinese companies were recently restricted by the US Department of Commerce from purchasing. The company also claims it spent just $5.5 million to train DeepSeek V3, a fraction of the development cost of models like OpenAI's GPT-4.

The downside is that the model's responses on political topics are censored. Ask DeepSeek V3 about Tiananmen Square, for instance, and it won't answer.

DeepSeek is a Chinese company, and it's subject to benchmarking by China's internet regulator to ensure its models' responses "embody core socialist values." Many Chinese AI systems decline to respond to topics that might raise the ire of regulators, such as speculation about the Xi Jinping regime.

DeepSeek, which recently unveiled DeepSeek-R1, an answer to OpenAI's o1 "reasoning" model, is a curious organization. It's backed by High-Flyer Capital Management, a Chinese quantitative hedge fund that uses AI to inform its trading decisions.

DeepSeek's models have forced competitors like ByteDance, Baidu, and Alibaba to cut usage prices for some of their models — and to make others completely free.

High-Flyer builds its own server clusters for model training; one of the most recent reportedly has 10,000 Nvidia A100 GPUs and cost 1 billion yuan (~$138 million). Founded by Liang Wenfeng, a computer science graduate, High-Flyer aims to achieve "superintelligent" AI through its DeepSeek organization.

In an interview earlier this year, Liang described open source as a "cultural act" and characterized closed-source AI like OpenAI's as a "temporary" moat. "Even OpenAI's closed-source approach hasn't stopped others from catching up," he noted.

Indeed.




