Why Do Large ML Models Struggle At Maths?

In 1960, the Hungarian-American physicist and Nobel laureate Eugene Wigner wrote about the ‘unreasonable effectiveness of mathematics in the natural sciences’. Mathematics is called the language of nature for a reason. That’s why the ‘Is math invented or discovered?’ debate never gets old. Mathematics exerts its influence on virtually every field.

Mathematics is also the building block of machine learning models. ML practitioners use mathematics to analyse a problem, pick out suitable heuristics, and combine the two to generate an answer. Despite the critical role mathematics plays in machine learning, even state-of-the-art models struggle at maths.

A new study by researchers at the University of California, Berkeley, has now introduced the MATH dataset. The team said the dataset provides a detailed assessment of a model’s mathematical ability across difficulty levels and subjects.

What Is the MATH Dataset?

The MATH dataset consists of 12,500 problems taken from various high school mathematics competitions. It measures the problem-solving ability of large, general-purpose language models: given a problem from MATH, a model generates a solution sequence that encodes the final answer, which is then checked against the reference answer.

MATH problems are labelled from 1 to 5 by difficulty and span seven subjects: prealgebra, algebra, number theory, counting and probability, geometry, intermediate algebra, and precalculus. For geometry problems, diagrams can be specified with the Asymptote language.
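
For a concrete picture of what the dataset contains, each problem can be thought of as a small record pairing the statement with its difficulty level, subject, and worked solution. The sketch below shows one such record in Python; the field names follow the released JSON files as they are commonly distributed and should be treated as an assumption, and the sample problem itself is a made-up, easy example rather than an actual MATH item.

```python
# A sketch of one MATH-style record (field names assumed from the released JSON files;
# the problem itself is a hypothetical, easy example).
example_problem = {
    "problem": "What is the value of $3 + 4 \\cdot 2$?",
    "level": "Level 1",        # difficulty labels run from Level 1 to Level 5
    "type": "Prealgebra",      # one of the seven subject areas
    # By convention, the final answer inside a solution is wrapped in \boxed{...}.
    "solution": "Multiplication comes before addition, so $3 + 4 \\cdot 2 = 3 + 8 = \\boxed{11}$.",
}
```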

Since step-by-step solutions accompany every problem, language models can learn to produce worked solutions for questions they haven’t been exposed to before. The step-by-step approach allows models to perform intermediate computations instead of giving the final answer immediately.
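
Because each worked solution marks its final answer with LaTeX’s \boxed{...} command, a model’s generated solution can be scored by pulling out the boxed expression and comparing it with the one in the reference solution. The helpers below are a minimal, illustrative sketch of that idea, assuming the \boxed{} convention; the official evaluation code performs more careful answer normalisation.

```python
import re

def extract_boxed_answer(text):
    """Return the contents of the last \\boxed{...} in a solution string, or None."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None

def is_correct(generated_solution, reference_solution):
    """Exact-match comparison of the boxed final answers (no normalisation)."""
    predicted = extract_boxed_answer(generated_solution)
    gold = extract_boxed_answer(reference_solution)
    return predicted is not None and predicted == gold

# The model's step-by-step output ends with a boxed answer, which is compared
# against the boxed answer in the reference solution.
print(is_correct(
    "First, $4 \\cdot 2 = 8$, so the result is $3 + 8 = \\boxed{11}$.",
    "Multiplication comes before addition, so $3 + 4 \\cdot 2 = \\boxed{11}$.",
))  # True
```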

Recognising that models need to be trained on maths fundamentals before being exposed to MATH, which covers advanced problem-solving techniques, the team also released the Auxiliary Mathematics Problems and Solutions (AMPS) dataset. The ‘pretraining corpus’ has over 100,000 problems from Khan Academy with step-by-step solutions, plus about 5 million problems generated with Mathematica scripts from 100 hand-designed modules.
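
AMPS itself is generated with Mathematica scripts, but the idea of a hand-designed module that emits many problem–solution pairs is straightforward to picture. The sketch below uses Python with SymPy as a stand-in to generate simple differentiation exercises; derivative_module is a hypothetical illustration of the approach, not the actual AMPS code.

```python
import random
import sympy as sp

def derivative_module(num_problems=3, seed=0):
    """Hypothetical stand-in for one hand-designed AMPS module: emit
    (question, step-by-step answer) pairs for polynomial derivatives."""
    random.seed(seed)
    x = sp.symbols("x")
    problems = []
    for _ in range(num_problems):
        # Draw a random polynomial a*x**n + b*x and differentiate it symbolically.
        a, b, n = random.randint(1, 9), random.randint(1, 9), random.randint(2, 5)
        f = a * x**n + b * x
        answer = sp.diff(f, x)
        question = f"Differentiate ${sp.latex(f)}$ with respect to $x$."
        solution = f"Applying the power rule term by term gives $\\boxed{{{sp.latex(answer)}}}$."
        problems.append((question, solution))
    return problems

for question, solution in derivative_module():
    print(question, "->", solution)
```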

Results

When large language models, including GPT-3, were tested on the MATH dataset, their accuracies were abysmally low, ranging from 2.9 percent to 6.9 percent. On the flip side, the models achieved up to 15 percent accuracy on the easiest difficulty level. For a human baseline, a PhD student with no particular specialisation in mathematics attained 40 percent, while a three-time Olympiad gold medalist scored 90 percent.

Further, having the models generate a step-by-step solution before producing the final answer actually reduced accuracy: while many of the generated steps were related to the question, they were often not logically sound.

The researchers found that simply increasing training time and the number of parameters proved extremely costly, although it did improve performance in a few cases. They have open-sourced both MATH and AMPS to encourage and facilitate further research in this direction.

Predecessors

OpenAI recently introduced GPT-f, an automated prover and proof assistant for the Metamath formalisation language. Metamath is a language that expresses theorems in abstract mathematics along with proofs that a computer program can validate.

Last year, Facebook built an AI system that can solve complex mathematical problems using symbolic reasoning. The team developed a way to represent mathematical expressions as a kind of language and then treated solving them as a translation problem for sequence-to-sequence neural networks.
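
In that work (Lample and Charton’s research on deep learning for symbolic mathematics), each expression tree is written out as a flat token sequence, typically in prefix (Polish) notation, so that a standard sequence-to-sequence model can read and emit expressions the way it would sentences. The sketch below shows one way to flatten a SymPy expression tree into such a sequence; to_prefix_tokens is an illustrative helper, not Facebook’s actual implementation.

```python
import sympy as sp

def to_prefix_tokens(expr):
    """Flatten a SymPy expression tree into prefix-notation tokens so that the
    expression can be fed to a sequence-to-sequence model as plain text."""
    if expr.is_Symbol or expr.is_Number:
        return [str(expr)]
    tokens = [type(expr).__name__]          # operator name, e.g. Add, Mul, Pow, sin
    for arg in expr.args:
        tokens.extend(to_prefix_tokens(arg))
    return tokens

x = sp.symbols("x")
expr = sp.sin(x**2) + 3 * x
print(to_prefix_tokens(expr))
# e.g. ['Add', 'Mul', '3', 'x', 'sin', 'Pow', 'x', '2'] (argument order may vary)
```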

Wrapping Up

“While most other text-based tasks are already nearly solved by enormous Transformers, MATH is notably different. We showed that accuracy is slowly increasing and, if trends continue, the community will need to discover conceptual and algorithmic breakthroughs to attain strong performance on MATH,” the researchers stated.
