Elon Musk agrees that AI training data is exhausted.


Elon Musk agrees with other AI experts that there is little real-world data left on which to train AI models.

“We've now exhausted basically the cumulative sum of human knowledge … in AI training,” Musk said during a livestreamed conversation with Stagwell chairman Mark Penn that aired on X late Wednesday. “That happened basically last year.”

Musk, who owns AI company xAI, echoed themes that former OpenAI chief scientist Ilya Sutskever touched on during an address at NeurIPS, the machine learning conference, in December. Sutskever, who said the AI industry had reached what he called “peak data,” predicted that a lack of training data will force a shift away from the way models are developed today.

Indeed, Musk suggested that synthetic data, meaning data generated by AI models themselves, is the path forward. “The only way to supplement (real-world data) is with synthetic data, where the AI creates (training data),” he said. “With synthetic data … (AI) will sort of grade itself and go through this process of self-learning.”

Other companies, including tech giants such as Microsoft, Meta, OpenAI and Anthropic, are already using synthetic data to train flagship AI models. Gartner estimates that 60% of the data used for AI and analytics projects in 2024 was synthetically generated.

Microsoft's Phi-4, which was open-sourced early Wednesday, was trained on synthetic data alongside real-world data. So were Google's Gemma models. Anthropic used some synthetic data to develop one of its most performant systems, Claude 3.5 Sonnet. And Meta fine-tuned its most recent Llama series of models using AI-generated data.

Training on synthetic data has other advantages, such as lower cost. AI startup Writer says its Palmyra X 004 model, which was developed using almost entirely synthetic sources, cost just $700,000 to develop, compared with estimates of about $4.6 million for a comparably sized OpenAI model.

But there are disadvantages as well. Some research suggests that synthetic data can lead to model collapse, in which a model's outputs become less “creative” and more biased, eventually seriously compromising its functionality. Because models create synthetic data, if the data used to train those models has biases and limitations, their outputs will be similarly tainted.
