By Milcah Tanimu
Tesla CEO Elon Musk has said that all available human data for training artificial intelligence (AI), including books and internet content, was exhausted in 2024. His statement echoes observations from other industry experts, including former OpenAI chief scientist Ilya Sutskever, who had previously said that the AI industry had reached “peak data.”
Musk, who also leads the AI company xAI, said during a livestream on X with Stagwell chairman Mark Penn that the next phase of AI development will rely on synthetic data. In his view, such data, generated by AI itself, is the only option left for further training now that human-sourced information has been exhausted.
He noted that AI has progressed in both hardware and software, but that the shortage of human-generated data now requires a shift to synthetic data. “We’ve literally run out of the entire internet, all books ever written, and all interesting videos. We’ve now exhausted the cumulative sum of human knowledge in AI training,” Musk said.
Synthetic Data Challenges
Despite its promise, synthetic data presents challenges, chief among them verifying the accuracy of what AI produces. Musk expressed concern over “hallucinated” answers, in which AI generates false or misleading information, making it difficult to tell whether a given response is correct.
Researchers have also warned that reliance on synthetic data could lead to “model collapse,” in which AI models trained on AI-generated output become less creative and more biased, reducing their usefulness.
Industry Shift to Synthetic Data
Several tech giants are already using synthetic data to enhance their AI models. Microsoft, Meta, OpenAI, and Anthropic have all incorporated AI-generated data into their training processes. Gartner estimated that 60% of the data used in AI and analytics projects in 2024 would be synthetically generated.