AI Is Outperforming Humans at Predicting the Future

Every three months, contestants in the Metaculus Forecasting Cup compete to predict future events in hopes of winning a prize pot of around $5,000. The forecasting website Metaculus poses geopolitically significant questions like “Will Thailand have a military takeover before September 2025?” and “Will Israel attack the Iranian military once more before September 2025?”

Weeks to months ahead of time, forecasters estimate the probability that each event will occur, a more informative bet than a simple “yes” or “no,” and they frequently do so with surprising precision. Metaculus users predicted that Roe v. Wade would be overturned about two months before it happened, and they accurately anticipated the date of Russia’s invasion of Ukraine two weeks in advance.
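Probability forecasts like these are typically evaluated with proper scoring rules rather than simple right/wrong tallies. Metaculus's actual scoring is more elaborate than this, but a minimal Brier-score sketch in Python illustrates why a calibrated probability beats a bare yes/no:

```python
def brier_score(forecast_prob: float, outcome: bool) -> float:
    """Squared error between a probability forecast and the actual 0/1 outcome.
    Lower is better; a coin-flip 0.5 forecast always scores 0.25."""
    actual = 1.0 if outcome else 0.0
    return (forecast_prob - actual) ** 2

# A confident, correct forecast scores near the ideal 0.0, while the same
# confidence on a wrong call is penalized heavily.
print(round(brier_score(0.9, True), 4))   # confident and right -> 0.01
print(round(brier_score(0.9, False), 4))  # confident and wrong -> 0.81
print(brier_score(0.5, True))             # pure hedging -> 0.25
```

Averaged over many questions, this rewards forecasters who are both accurate and well calibrated, which is what makes the two-months-early Roe v. Wade call meaningful.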

Even so, the forecasters were taken aback by one of the top 10 finishers in the Summer Cup, whose winners were revealed on Wednesday: an AI. “It’s truly quite astounding,” says Toby Shevlane, CEO of Mantic, the newly revealed UK firm that built the AI. In June, participants in the competition predicted that the top bot would reach about 40 percent of the top human score; Mantic exceeded 80 percent.

“Forecasting—it’s everywhere, right?” says Nathan Manzotti, who has worked on AI and data analytics for the General Services Administration, the Department of Defense, and about a half-dozen other U.S. government agencies. “If you pick a government agency, they most likely have some sort of forecasting going on.”

Forecasting helps organizations anticipate the future, and it also lets them alter it, says Anthony Vassalo, co-director of the Forecasting Initiative at RAND, a nonprofit policy think tank. Predicting geopolitical events weeks or months ahead of time helps “avoid surprise” and “assist decision makers in being able to make decisions,” he says. Forecasters can also condition their projections on hypothetical policy interventions, to estimate how a given policy would change future outcomes. If decision makers are headed in an unwelcome direction, Vassalo says, forecasters can help them with “changing the scenario they’re in.”

Predicting major global events, however, is notoriously difficult. The most renowned forecasters may spend days, and tens of thousands of dollars, on a single question. For organizations like RAND, which track multiple topics across many geopolitical regions, “it would take months for human forecasters to do an initial forecast on all those questions, let alone update them regularly,” Vassalo says.

Weather forecasting, quantitative trading, and other fields with large amounts of well-structured data have long benefited from machine learning. “When it comes to geopolitics or technological advancements, there are many intricate, interconnected factors that human judgment can be more accessible and affordable at predicting,” says Deger Turan, CEO of Metaculus.

Large language models can replicate this human judgment, working from the same messy data as human forecasters. They also improve much the way people do: by predicting a wide range of questions, monitoring the results, and revising their forecasting techniques accordingly, but at a scale far beyond what any human can manage.

“Our key insight was that forecasting the future is a verifiable problem, since that’s how people learn,” says Ben Turtel, CEO of LightningRod, a company whose forecasting AIs have achieved competitive results in Metaculus AI competitions. The company trained a new model on 100,000 forecasting questions.

That training shows up in the rankings. In June, Metaculus’ top-ranked bot, built on OpenAI’s o1 reasoning model, finished 25th in the cup. Mantic now sits eighth out of 549 entrants, the first time a bot has cracked the top ten in the competition series.

The results should be interpreted with caution, says Ben Wilson, an engineer at Metaculus who benchmarks AIs against humans on forecasting questions. The contest has a small sample size of just 60 questions. Furthermore, the majority of the roughly 600 participants are novices, and some forecast only a handful of questions during the competition, which drags down their scores.

Finally, the machines hold an unfair edge. Participants earn points not only for accuracy but also for “coverage”: how early they make predictions, how many questions they forecast, and how frequently they update their estimates. Even an AI that is less accurate than its human rivals can climb the rankings by continually revising its estimates in response to breaking news, something humans cannot do around the clock.
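The coverage effect described above can be sketched in a few lines. This is a hypothetical simplification, not Metaculus's actual formula: it just weights a per-question accuracy score by the fraction of the question's open period during which the forecaster had a standing prediction, which is enough to show why a tireless bot benefits:

```python
def coverage_weighted_score(accuracy_score: float,
                            hours_with_active_forecast: float,
                            question_lifetime_hours: float) -> float:
    """Hypothetical illustration of coverage weighting: a forecaster's
    accuracy score on a question is discounted by how little of the
    question's lifetime their forecast was active."""
    coverage = min(1.0, hours_with_active_forecast / question_lifetime_hours)
    return accuracy_score * coverage

# Two forecasters with identical accuracy on a question open for 30 days:
bot = coverage_weighted_score(10.0, 720, 720)    # forecast from day one -> 10.0
human = coverage_weighted_score(10.0, 72, 720)   # joined in the last 3 days -> 1.0
print(bot, human)
```

Under a rule like this, a bot that posts early and updates constantly keeps its coverage near 1.0, so it can out-rank a more accurate human who participates sporadically.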

In Vassalo’s view, the AIs’ unfair advantage addresses his biggest unmet need: getting reasonably accurate forecasts on every topic he requires predictions for. “I don’t really need it to reach the level of a superforecaster,” he adds, referring to the title bestowed on the most accomplished forecasters.

That is harder than it may sound, because one of the most reliable predictions on the site is the Metaculus Community Prediction, an aggregate of all users’ forecasts on each question. The wisdom of the crowd is such that if the Community Prediction were a person, it would rank fourth on the site; in the Quarterly Cup, Mantic trailed it by five positions.
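A crowd aggregate like the Community Prediction is hard to beat because individual forecasters' errors in opposite directions cancel out. Metaculus's real aggregation is more elaborate (it weights recent predictions more heavily, for instance), but a bare-bones median illustrates the idea:

```python
from statistics import median

def community_prediction(user_probs: list[float]) -> float:
    """Simplified stand-in for a crowd forecast: the median of all users'
    probability estimates on one question. The median resists being dragged
    around by a few wildly over- or under-confident forecasters."""
    return median(user_probs)

# Five users' probabilities for the same question; one outlier at 0.9
# barely moves the aggregate.
print(community_prediction([0.2, 0.35, 0.4, 0.6, 0.9]))  # 0.4
```

Any single forecaster, human or bot, has to be consistently better than this error-canceling aggregate to outrank it, which is why even a top-ten bot can still trail the crowd.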

A competent AI forecaster could track hundreds of questions at once, letting Vassalo deploy his top human forecasters only on those the AI flags as important enough to investigate further.

Manzotti notes that forecasting, or predictive analytics, is mostly used to support decision-making, and many leaders will disregard the numbers if their gut tells them otherwise. That is a problem AI cannot solve.
