At the International Mathematical Olympiad (IMO) this year, an AI from Google DeepMind won a silver medal, marking the first time an AI has placed on the podium.
The IMO is regarded as the most prestigious competition in the world for young mathematicians, and answering its questions correctly demands mathematical skills that AI systems have generally lacked.
In January, Google DeepMind demonstrated AlphaGeometry, an AI system that could answer some IMO geometry problems as well as human competitors. But it was not entered in a live competition, and it could not handle problems from the other mathematical fields, such as combinatorics, algebra and number theory, that are needed to earn an IMO medal.
Now, Google DeepMind has unveiled AlphaProof, a new AI system, alongside an enhanced version of AlphaGeometry that can solve a wider range of geometry problems.
When the team tested the two systems together on this year’s IMO questions, they answered four of the six correctly, scoring 28 out of 42 points. That was one point short of this year’s gold-medal threshold, but enough for a silver medal.
At the competition, held in Bath, UK, last week, 58 human competitors took home gold and another 123 took silver.
IMO president Gregor Dolinar says that while it is widely expected that AI will one day solve most mathematical problems better than humans can, the pace at which AI is developing is astounding, and coming within a single point of a gold medal at IMO 2024 is very impressive.
Timothy Gowers of the University of Cambridge, who helped grade AlphaProof’s answers, said at a press conference that the AI’s performance was unexpected: it appeared to find the “magic keys” that unlock problems, much as human solvers do. In one or two cases he had assumed those keys would be slightly beyond the software’s reach, and was surprised to see it find them.
AlphaProof works in a similar way to Google DeepMind’s earlier AIs that can beat the best human chess and Go players. All of them are built on reinforcement learning, a trial-and-error method in which the system teaches itself to solve problems through repeated attempts. But this approach needs a huge variety of problems written in a language the AI can both understand and verify, whereas IMO-style problems are typically stated in English.
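That description amounts to a generate-and-verify loop: propose candidate answers, have a checker accept or reject them, and keep the verified attempts as experience to learn from. The sketch below is a deliberately toy illustration of that idea, not DeepMind’s actual training code; the problem, checker and “curriculum” are hypothetical placeholders chosen only to make the loop concrete.

```python
import random

# Toy sketch of trial-and-error learning with a verifier (illustrative only,
# not AlphaProof): a solver guesses candidates, a checker accepts or rejects
# each one, and accepted attempts are stored as experience.

def checker(candidate: int, problem: int) -> bool:
    """Stand-in for a formal verifier: does the candidate solve the problem?"""
    return candidate * candidate == problem  # toy task: find an integer square root

def attempt_problem(problem: int, tries: int = 10_000) -> int | None:
    """Guess repeatedly; return the first candidate the checker accepts."""
    for _ in range(tries):
        candidate = random.randint(0, 100)
        if checker(candidate, problem):
            return candidate
    return None

experience = []
for problem in [16, 81, 144]:                    # a small "curriculum" of easy problems
    solution = attempt_problem(problem)
    if solution is not None:
        experience.append((problem, solution))   # verified attempts become training data

print(experience)
```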
To get around this, Thomas Hubert at Google DeepMind and his colleagues used Google’s Gemini AI, a language model similar to the one behind ChatGPT, to translate these problems into a programming language called Lean, giving AlphaProof material it could learn to solve.
At the press conference, Hubert said that at first the system could only handle the easiest problems, then applied what it learned from those to tackle progressively harder ones. Because it also writes its answers in Lean, their correctness can be verified instantly.
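To make the Lean side of this concrete, here is a toy example of what a formally stated fact and a machine-checkable proof look like in Lean 4. It is purely illustrative, far simpler than any IMO problem, and not an output of AlphaProof; the theorem name is our own.

```lean
-- Toy illustration (not an AlphaProof output): a statement written in Lean 4
-- together with a proof. Once both are in Lean, the proof checker confirms
-- the argument automatically, which is the "instantaneous verification"
-- described above.
theorem swap_sum (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b   -- appeal to commutativity of addition on the naturals
```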
Despite its impressive performance, AlphaProof took up to three days to find some of its answers, whereas human competitors get 4.5 hours for each set of three questions. It also failed to answer either of the two questions on combinatorics, the mathematics of counting and arrangement. Alex Davies at Google DeepMind says the researchers are still trying to work out why, which should help them improve the system.
Gowers noted that because AlphaProof’s Lean proofs can be translated into English, it is straightforward to check that they are correct, but it remains unclear how the system arrives at them, or whether it draws on the same mathematical intuitions as humans.
Geordie Williamson at the University of Sydney, Australia, calls the result significant and impressive, noting that many previous attempts to apply reinforcement learning to formal proofs have failed.
Yang-Hui He at the London Institute for Mathematical Sciences says practising mathematicians could find AlphaProof useful for developing proofs, but notes that it obviously cannot help with identifying problems in the first place, which takes up a large share of academics’ time.
Hubert said his team hopes AlphaProof could help improve Google’s large language models, such as Gemini, by reducing the number of incorrect answers they give.
The trading firm XTX Markets is offering a $5 million prize, the AI Mathematical Olympiad (AIMO) Prize, for an AI capable of winning a gold medal at the IMO, but AlphaProof is ineligible because it is not publicly available. Alex Gerko of XTX Markets says they hope DeepMind’s advances will motivate other teams to enter the AIMO Prize, and that they would certainly welcome a public entry from DeepMind itself.