ChatGPT’s global explosion in the fall of 2022 set off a race to build ever-more-advanced artificial intelligence, with entrants including GPT-4, Anthropic’s Claude, Google Gemini, and many others. The newest, Sora, introduced by OpenAI just yesterday, rapidly generates brief videos from written prompts. Yet for all the eye-catching showcases and promises, progress on the core technology has slowed.
The most sophisticated and visually striking AI systems, language models in particular, have consumed the majority of the text and images on the internet and are running out of training data, their most valuable resource. That scarcity, combined with the expensive and time-consuming process of relying on human evaluators, has hampered development, yielding incremental improvements rather than significant paradigm shifts. Businesses are left vying for marginal gains.
Left trying to squeeze water out of stone, scientists are investigating a novel way to improve their products: training machines to train machines. Google DeepMind, Microsoft, Amazon, Meta, Apple, OpenAI, and a number of academic laboratories have all released research in the past several months that uses one AI model to enhance another, or even itself, often with significant results. Several tech executives have hailed this strategy as the technology’s future.
Numerous science-fiction books have prepared us for situations like this one. Taken to an extreme, such “self-learning” could produce an outcome that is nothing short of eschatological: imagine GPT-5 teaching GPT-6, GPT-6 teaching GPT-7, and so on, until a model surpasses human intellect. Many believe this development could have disastrous effects. Nine years ago, OpenAI CEO Sam Altman wrote a blog post about a potential AI capable of “recursive self-improvement,” one that would see people the way we see the germs and viruses we wash off our hands.
Such “superintelligence,” as experts call it, is nowhere near emerging. (Altman frequently discusses the alleged existential risk of AI; it’s good public relations.) But even more modest algorithms that share knowledge and teach one another could distort our perception of reality and undermine our fundamental beliefs about intelligence. Generative AI already relies on internal mechanisms that are largely opaque even to its authors to sift through vast amounts of data and surface patterns and hypotheses that humans could not come up with on their own. If self-learning is effective, the problem may only deepen: models that are intelligent, or at least capable, in ways humans struggle to grasp could amount to a kind of incomprehensible intelligence.
To understand this shift, you have to understand the basic economics of AI. Building the technology takes an enormous amount of money, time, and information. The process starts by feeding an algorithm a massive amount of data (books, math problems, annotated photographs, voice recordings, and more) to establish the model’s baseline skills. Researchers can then improve and hone those pretrained skills in two ways. One is to give the model concrete examples of tasks completed successfully: one could show a program 100 math problems and their correct answers. The other is reinforcement learning, a process of trial and error that usually requires human operators: to help the software learn to avoid comments deemed objectionable, a human might assess a chatbot’s responses for instances of sexism. According to Stanford computer scientist Rafael Rafailov, “reinforcement learning is the key component to this new generation of AI systems.”
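For readers who want a concrete picture, here is a deliberately tiny Python sketch of those two refinement routes. It is not any lab’s actual pipeline: the canned replies, the weight-nudging updates, and the human_reward stand-in are all illustrative inventions, meant only to show the difference between learning from demonstrated answers and learning from an evaluator’s scores.

```python
# A minimal, purely illustrative sketch (not any lab's real pipeline) of the two
# refinement routes described above: supervised examples vs. reward-based feedback.
import random

# Toy "policy": a preference weight for each canned reply the model might give.
policy = {"polite answer": 1.0, "curt answer": 1.0, "sexist remark": 1.0}

def sample_reply(policy):
    replies, weights = zip(*policy.items())
    return random.choices(replies, weights=weights, k=1)[0]

# Route 1: supervised fine-tuning -- show the model examples of tasks done well.
def supervised_update(policy, good_examples, lr=0.5):
    for reply in good_examples:
        policy[reply] += lr  # nudge weights toward demonstrated answers

# Route 2: reinforcement learning -- an evaluator scores sampled outputs.
def human_reward(reply):
    return -1.0 if reply == "sexist remark" else 1.0  # stand-in for a paid rater

def reinforcement_update(policy, steps=200, lr=0.1):
    for _ in range(steps):
        reply = sample_reply(policy)
        r = human_reward(reply)
        policy[reply] = max(0.01, policy[reply] + lr * r)

supervised_update(policy, ["polite answer"])
reinforcement_update(policy)
print(policy)  # the objectionable reply's weight shrinks; the polite one's grows
```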
This system isn’t flawless. Judgments made by two different people, or by the same person on different days, may not agree. Every one of those assessors must be paid, and they work at a snail’s pace. As models grow more powerful, they will need more sophisticated feedback from knowledgeable, and hence better-compensated, professionals; doctors, for example, might be enlisted to assess a medical AI that diagnoses patients.
It’s easy to see why self-learning is so appealing. Compared with human feedback, it is cheaper, less laborious, and potentially more consistent. But automating the reinforcement process carries hazards of its own. AI models already have many flaws, including bias, hallucinations, and fundamental misconceptions about the world, which they pass along to users through their outputs. (In a notorious instance from last year, a lawyer drafted a legal brief with ChatGPT and cited cases that did not exist.) Using AI-generated data to train or fine-tune a model could concentrate these problems and make the program worse, like boiling a tainted stock down into a thick demi-glace. Ilia Shumailov, at the time a junior research fellow at Oxford University, quantified one manifestation of this self-destructive cycle last year and called it “model collapse”: the total breakdown of an AI.
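To see why, consider a stripped-down illustration of the feedback loop Shumailov describes. The sketch below is a toy in plain Python, not his actual experiment: each “generation” fits a simple distribution to synthetic samples drawn from the previous generation’s fit, and over many rounds the spread of the data, its rare cases and tails, tends to wash out.

```python
# A toy illustration (not Shumailov et al.'s actual setup) of how training on a
# model's own outputs can erode information over successive generations.
import random
import statistics

random.seed(0)
real_data = [random.gauss(0.0, 1.0) for _ in range(20)]  # the original "human" data

mu, sigma = statistics.mean(real_data), statistics.stdev(real_data)
for generation in range(500):
    synthetic = [random.gauss(mu, sigma) for _ in range(20)]  # train on own outputs
    mu, sigma = statistics.mean(synthetic), statistics.stdev(synthetic)

# The original data had a spread of roughly 1.0; the refit spread is far smaller.
print(f"spread after 500 generations: {sigma:.2e}")
```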
To avoid this problem, the most recent wave of self-improving-AI research, still directed by human software developers, uses very little synthetic data. In this technique, the quality of the input is ensured by an external check, independent of the AI itself: a set of moral principles, the laws of physics, or some other criterion already established to be true. Researchers have automated quality control most successfully for specific, well-defined tasks, such as games and mathematical reasoning, where correctness or victory offers a clear way to assess synthetic data. DeepMind recently used AI-generated examples to improve a language model’s ability to solve arithmetic and coding problems. But as Stanford computer scientist Rohan Taori noted, in these instances the AI is learning more from established criteria or scientific conclusions than from other AIs. Self-learning these days, he said, is more about “setting the rules of the game.”
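In code, that kind of externally checked self-training might look something like the hedged sketch below. The propose_answers and external_check helpers are hypothetical stand-ins, not DeepMind’s method: the point is only that a model-independent criterion (here, plain arithmetic) decides which synthetic examples are kept.

```python
# A hedged sketch of verifier-gated self-training: a model proposes candidate
# answers, and only those that pass an external, model-independent check are
# retained as new training examples. All names here are illustrative.
import random

def propose_answers(question, n=5):
    """Stand-in for a language model sampling candidate solutions."""
    a, b = question
    return [a + b + random.choice([-2, -1, 0, 0, 1]) for _ in range(n)]

def external_check(question, answer):
    """The independent criterion: arithmetic correctness, not another model's opinion."""
    a, b = question
    return answer == a + b

verified_training_set = []
for question in [(2, 3), (10, 7), (41, 1)]:
    for answer in propose_answers(question):
        if external_check(question, answer):
            verified_training_set.append((question, answer))  # safe to learn from

print(verified_training_set)
```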
Human feedback has remained essential, however, for training AI models in more abstract skills, such as writing in a pleasing tone or crafting responses that people find useful. The most ambitious goal for self-teaching AI would be for models to become ever more capable of giving themselves subjective feedback, such as rating how useful, courteous, prosodic, or biased a chatbot conversation is. But most research to date has found that after a few cycles, language-model feedback stops being effective for training subsequent language models: the model may improve in the second iteration, but it stagnates or degrades by the third or fourth. The AI ends up merely reinforcing its existing talents, growing overconfident about what it understands and less competent at everything else. Learning, after all, requires exposure to something new. According to Stefano Soatto, vice president of applied science for Amazon Web Services’ AI division, “generative-AI models in use today are data-torturing machines.” They cannot produce any information beyond the data they were trained on.
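The stagnation is easy to caricature. In the toy loop below (an illustration, not any published system), a “model” rates its own candidate replies using its current preferences and then reinforces whichever it rated highest; because nothing outside the model ever enters the loop, it simply amplifies what it already favors.

```python
# An illustrative toy of self-generated feedback: the model judges its own outputs
# with its current preferences and then trains on its own judgment.
preferences = {"formal reply": 2.0, "casual reply": 1.5, "novel, better reply": 1.0}

for cycle in range(5):
    best = max(preferences, key=preferences.get)  # the model rates its own replies
    preferences[best] += 1.0                      # ...and reinforces its top pick
    print(f"cycle {cycle + 1}: {preferences}")

# The initially favored reply pulls further ahead each cycle; the genuinely better
# option never gains ground, because no new information enters the system.
```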
Self-learning, according to Soatto, is like buttering a piece of dry bread. Think of an AI model’s initial training as placing a pat of butter in the middle of a slice. When self-learning works well today, it doesn’t give the model radically new skills; it just spreads the same butter more evenly across the bread. But that still makes the bread taste better. In a few select research contexts, this kind of self-trained, or “buttered,” AI has recently been shown to write better code, produce more useful summaries, and display improved commonsense reasoning. If self-improving AI can simulate an endless army of human evaluators and dependably cut costs for OpenAI, Google, and all the others, then superintelligence might be beside the point.
True evangelists, however, want self-learning to go beyond that: to smear the toast with extra butter. For that to happen, computer scientists will need to keep devising methods for verifying artificial data, in order to determine whether increasingly powerful AI models can ever serve as trustworthy sources of feedback or even produce genuinely new data. If they succeed, AI may surpass the ceiling the web places on human-generated material. Then artificial teaching might be a marker of actual artificial intelligence.
AI might not even need that more comprehensive capacity for self-improvement before it ceases to resemble us. These programs are already opaque: it is usually impossible to explain why or how an AI arrived at a given answer, and building a procedure in which they follow their own path will only deepen that opacity.
An AI that does not perceive or approach challenges in ways humans can easily relate to would be artificial in a deeper sense. Consider how hard it is for humans to understand how dogs or bats use their noses and ears to find their way around, even though scent and echolocation are very useful senses. Machine intelligence may prove just as foreign, and just as hard to understand.
Already, such bizarre behaviors have appeared in ways that are far from superintelligent. Given a task such as moving blocks, flipping pancakes, or offering useful chatbot responses, “very often those [reinforcement-learning] agents learn how to cheat,” Shumailov said. In one instance, a Roomba hooked up to a neural network and trained to avoid collisions learned to drive backwards, because all of the vacuum’s bumper sensors are located up front.
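The Roomba story is a classic case of what researchers call reward hacking, and the failure is simple enough to mock up. In the hypothetical sketch below (a simplification, not the actual robot’s training setup), the reward only registers what a forward-facing bumper sensor can see, so driving backwards earns a perfect score while the collisions continue.

```python
# A toy sketch of the reward-hacking failure described above: the reward only
# "sees" the forward bumper, so backing up looks flawless despite collisions.
import random

def step(action):
    collided = random.random() < 0.3                      # obstacles are everywhere
    sensor_triggered = collided and action == "forward"   # sensor only faces forward
    reward = 0 if sensor_triggered else 1                 # reward: no bumper triggers
    return reward, collided

def evaluate(action, trials=10_000):
    rewards, collisions = 0, 0
    for _ in range(trials):
        r, c = step(action)
        rewards += r
        collisions += c
    return rewards / trials, collisions / trials

print("forward :", evaluate("forward"))   # lower reward, same collision rate
print("backward:", evaluate("backward"))  # perfect reward, still collides
```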
The humor will fade when an AI model is used to align another model with a set of ethical norms, a “constitutional AI,” as the start-up Anthropic has named the concept. People already interpret the U.S. Constitution differently when it comes to race-conscious admissions, gun ownership, and abortion. And although human debates over the law are at least legible and arguable, how a machine interprets and executes a rule, especially after multiple training cycles, may be hard to comprehend, producing unintentionally harmful outcomes. Rules designed to prevent one kind of bias may create another; an AI trained to be kind and engaging may become manipulative and violent. For all the ways a human might adjust it, machine-generated feedback could provide a “false sense of control,” according to MIT computer scientist Dylan Hadfield-Menell.
Those opaque inner workings might be harmful, but rejecting them on principle would mean rejecting revelation as well. After consuming vast amounts of data, self-learning AI systems have the potential to reveal meaningful patterns and concepts that are present in their training sets but that humans find difficult to extract or fully understand. The most sophisticated chess programs, for example, learned by playing millions of games against themselves. These chess AIs make moves that even the best human players find difficult to understand, completely dominate those players, and have forced a reevaluation of the game at its highest levels.
Shumailov pointed to Galileo: when he correctly argued in the 17th century that the Earth revolves around the sun, the claim was dismissed as heresy because it contradicted accepted beliefs. “Just because we’ve discovered some information does not guarantee that we’ll be able to understand it,” Shumailov said. Faced with outputs incommensurate with what we now understand (math proofs we can’t follow, brain models we can’t explain, knowledge we don’t recognize as knowledge), we may choose to disregard some AI models, even if their outputs later turn out to be true. The ceiling the internet presents may be higher than what is visible to us.
Whether self-training AI leads to catastrophic disasters, subtle imperfections and biases, or incomprehensible breakthroughs, the response cannot be to trust or scorn the technology wholesale. It must be to take these models seriously as agents that can learn today and that tomorrow might be able to teach us, or even one another.