When OpenAI unveiled GPT-4 last year, it achieved the previous major leap in artificial intelligence by dramatically scaling up the size of its models. Today the company signaled a change in strategy, unveiling a model that can “reason” logically through a range of difficult problems and is noticeably smarter than existing AI, without requiring a major scale-up.
The new model, called OpenAI o1, can solve problems that stump existing AI models, including GPT-4o, OpenAI’s most powerful model to date. Rather than producing an answer in a single pass, as a large language model typically does, it works through a problem step by step, effectively reasoning aloud the way a person might, before arriving at the correct result.
This is what OpenAI considers the new paradigm for these models, says Mira Murati, the company’s chief technology officer, adding that it is much better at tackling very complex reasoning tasks.
According to OpenAI, the new model, code-named Strawberry, is a complement to GPT-4o rather than its replacement.
OpenAI is currently developing its next flagship model, GPT-5, which will be significantly larger than its predecessor, Murati says. Although the company still believes that scale will help wring new capabilities out of AI, GPT-5 is likely to include the reasoning technology introduced today. Murati says there are two paradigms, scaling and this new one, and that the company expects to bring them together.
LLMs typically produce their answers using huge neural networks fed enormous quantities of training data. They can display remarkable linguistic and logical abilities, but they have traditionally struggled with surprisingly simple problems, such as basic math questions, that require reasoning.
Murati says OpenAI o1 uses reinforcement learning, which involves giving a model positive feedback when it answers correctly and negative feedback when it does not, to improve its reasoning process. The model, she says, sharpens its thinking and fine-tunes the strategies it uses to arrive at an answer. Reinforcement learning has enabled computers to play games with superhuman skill and to do useful tasks such as designing computer chips. The technique is also a key ingredient in turning an LLM into a helpful, well-behaved chatbot.
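OpenAI has not disclosed how this training works in practice, but the feedback loop Murati describes is simple to sketch in miniature. The toy Python below illustrates only that loop, not OpenAI’s method: the candidate strategies, the reward scheme (+1 for a correct answer, -1 otherwise), and the learning rate are all assumptions made up for the example. A softmax policy gradually learns to prefer whichever strategy earns more reward.

```python
import math
import random

# Toy reinforcement-learning loop (illustrative only; not OpenAI's method).
# A "policy" keeps a preference score per candidate strategy; strategies
# that earn positive reward become more likely to be sampled.

STRATEGIES = ["guess", "step_by_step"]  # hypothetical action space
preferences = {s: 0.0 for s in STRATEGIES}
LEARNING_RATE = 0.1

def sample_strategy():
    """Pick a strategy with probability softmax(preferences)."""
    weights = [math.exp(preferences[s]) for s in STRATEGIES]
    return random.choices(STRATEGIES, weights=weights)[0]

def reward(strategy):
    """Hypothetical grader: careful reasoning is right 90% of the time,
    guessing only 30%. Correct answers earn +1, wrong answers -1."""
    p_correct = 0.9 if strategy == "step_by_step" else 0.3
    return 1.0 if random.random() < p_correct else -1.0

for _ in range(1000):
    chosen = sample_strategy()
    r = reward(chosen)
    # REINFORCE-style update for a softmax policy: raise the preference of
    # the chosen strategy on positive reward, lower it on negative reward.
    total = sum(math.exp(preferences[s]) for s in STRATEGIES)
    for s in STRATEGIES:
        prob = math.exp(preferences[s]) / total
        grad = (1.0 if s == chosen else 0.0) - prob
        preferences[s] += LEARNING_RATE * r * grad

print(preferences)  # "step_by_step" ends up with a much higher preference
```

After a thousand trials the preference for “step_by_step” dominates, mirroring in miniature how positive feedback can push a model toward answers reached by careful reasoning.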
Mark Chen, vice president of research at OpenAI, demonstrated the new model, using it to solve several problems that its predecessor, GPT-4o, could not. These included an advanced chemistry question and the following mathematical riddle: “The princess is as old as the prince will be when the princess is twice as old as the prince was when the princess’s age was half the sum of their present ages. What are the ages of the prince and princess?” The correct answer: the prince is 30 and the princess is 40.
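The riddle unwinds into simple arithmetic, so the answer is easy to check. The short Python sketch below (the variable names are mine, not from OpenAI’s demo) verifies each clause for the pair 30 and 40; strictly speaking, the clauses pin down only the 4:3 ratio of the princess’s age to the prince’s, of which 40 and 30 is the intended whole-number solution.

```python
# Verify that prince = 30, princess = 40 satisfies every clause of the riddle.
prince, princess = 30, 40

# "...when the princess's age was half the sum of their present ages"
past_princess = (prince + princess) / 2       # 35, i.e. 5 years ago
years_ago = princess - past_princess          # 5
past_prince = prince - years_ago              # the prince was 25 then

# "...when the princess is twice as old as the prince was [then]"
future_princess = 2 * past_prince             # 50, i.e. 10 years from now
years_ahead = future_princess - princess      # 10
future_prince = prince + years_ahead          # the prince will be 40 then

# "The princess is as old as the prince will be [then]"
assert princess == future_prince              # 40 == 40, so the answer holds
print("checks out:", princess, "==", future_prince)
```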
According to Chen, the new model is learning to think for itself, rather than trying to mimic the way humans think, as a conventional LLM does.
OpenAI says the new model performs dramatically better than its predecessors on a range of problem sets, including ones focused on coding, math, physics, biology, and chemistry. On the American Invitational Mathematics Examination (AIME), a test for math students, GPT-4o solved on average 12 percent of the problems, the company reports, while o1 got 83 percent right.
OpenAI cautions that the new model is slower than GPT-4o and does not outperform it in every case, in part because it is not multimodal, meaning it cannot parse images or audio, and because it cannot search the web.
Improving the reasoning abilities of LLMs has been a hot topic in research circles for some time, and OpenAI’s rivals are pursuing similar lines of work. In July, Google unveiled AlphaProof, a project that combines language models with reinforcement learning to solve difficult math problems.
AlphaProof learned to reason about math problems by studying correct answers. The difficulty in broadening that kind of learning is that correct answers do not exist for everything a model might encounter. Chen claims that OpenAI has succeeded in building a much more general reasoning system. “I do believe we have made some progress there; I think it is part of our edge,” Chen says. “It actually does a pretty decent job of reasoning across all domains.”
The key to more generalized training may lie in using a “carefully prompted language model and handcrafted data,” says Noah Goodman, a Stanford professor who has published work on improving the reasoning abilities of LLMs. Being able to consistently trade slower results for greater accuracy, he adds, would be a “nice advance.”
Yoon Kim, an assistant professor at MIT, says that although LLMs can solve problems step by step, there may be important, still poorly understood differences between how they and humans do so. That could matter as the technology comes into wider use. These are systems that could be making decisions affecting many people, he notes, and the larger question is whether we need to be confident in how a computational model arrives at its decisions.
The technique OpenAI unveiled today may also help ensure that AI models behave well. Murati says the new model has proved better at avoiding unpleasant or potentially harmful output because it can reason about the consequences of its actions. If you think about teaching children, she says, they learn to conform to certain norms, behaviors, and values much more readily once they can reason about why they are doing something.
Oren Etzioni, a professor emeritus of artificial intelligence at the University of Washington, says it is critical to enable LLMs to use tools and to work through complex, multistep problems. “Pure scale up will not deliver this,” he adds. But Etzioni cautions that further challenges remain: even if reasoning were solved, the problems of hallucination and factuality would persist.
OpenAI’s Chen says the company’s new reasoning approach demonstrates that advancing AI need not require absurd amounts of compute. One of the exciting things about the paradigm, he says, is that it should allow OpenAI to ship intelligence more cheaply, and he believes that is really the company’s core mission.