This artificial problem seeks artificial solutions.
Massive shortage
As AI companies continue to build bigger and better models, they are running into a common problem: soon, the internet won’t be big enough to provide all the data they need.
As the Wall Street Journal reports, with the internet becoming too small, some companies are looking to alternative sources of training data, with options ranging from publicly available video transcripts to AI-generated “synthetic data.”
While some companies, such as Dataology, founded by former Meta and Google DeepMind researcher Ari Morcos, are working on how to train larger, smarter models with less data and fewer resources, most of the big companies are looking into novel and controversial means of data training.
OpenAI, for example, has discussed training GPT-5 on transcriptions of public YouTube videos, according to the Wall Street Journal’s sources, even as the company’s own CTO, Mira Murati, struggles to answer questions about whether its Sora video generator was trained using YouTube data.
Don’t panic
Meanwhile, in recent months, researchers have found that training AI models on AI-generated data would amount to a digital form of “inbreeding” that would ultimately lead to “model collapse” or “Habsburg AI.”
Some companies, like OpenAI and Anthropic (which was launched in 2021 by former OpenAI employees seeking to build safer and more ethical AI than their former employer), are looking to prevent this by creating synthetic data that is said to be of higher quality, without, of course, revealing the secrets of how they do it.
In fact, Anthropic admitted when announcing its Claude 3 LLM that the model was trained on “data we generate internally,” and in an interview with the Wall Street Journal, Jared Kaplan, the company’s chief scientist, said he thinks synthetic data also has good use cases.
Concerns about AI running out of data appear to have been plaguing researchers for some time. Still, researcher Pablo Villalobos told the newspaper that while his firm, Epoch, estimates that AI will exhaust the available training data within the next few years, there is no reason to panic yet.
“The biggest uncertainty,” Villalobos said, “is what breakthrough you will see.”
Then again, there’s another obvious solution to this artificial problem: AI companies could stop trying to create bigger and better models because, in addition to being short of training data, those models use massive amounts of electricity and expensive computing chips that require the mining of rare earth minerals.