New algorithm masters university math course questions

Machine learning models have long been stumped by topics such as multivariable calculus, differential equations, and linear algebra, material that many MIT students master with ease. Until now, the best models could answer only elementary- or high-school-level math questions, and even then they did not always find the right answers.

Now, a multidisciplinary team of researchers from MIT and elsewhere, led by Iddo Drori, a lecturer in MIT’s Department of Electrical Engineering and Computer Science (EECS), has used a neural network to solve university-level math problems in a few seconds at a human level.

The model also automatically explains its solutions and can generate new problems on university math topics. When the researchers showed these machine-generated questions to university students, the students could not tell whether the questions were generated by a human or by the algorithm.

This work could streamline content generation for courses, especially large residential courses and massive open online courses (MOOCs) that enroll thousands of students. The system could also act as an automated tutor, showing students the steps required to solve undergraduate math problems.

“We believe this will enhance higher education,” states Drori, the lead author of the work and an adjunct associate professor in Columbia University’s Department of Computer Science, who will join the faculty at Boston University this summer. The work will help students improve, he says, help teachers create new content, and could help raise the difficulty level in some courses.

It also enables the team to create a graph of questions and courses, which helps reveal the relationships between courses and their prerequisites based on data, rather than on historical consideration alone.

Students, researchers, and faculty from MIT, Columbia University, Harvard University, and the University of Waterloo collaborated on the project. Gilbert Strang, an MIT mathematics professor, is the senior author. This week, the findings were reported in the Proceedings of the National Academy of Sciences.

“Aha!” moment

Drori, his students, and colleagues had been working on this project for almost two years. They found that models pretrained on text alone achieved only 8% accuracy on high-school math problems, while those trained using graph neural networks could ace machine-learning course questions but took a week to train.

Then Drori had his “aha,” or eureka, moment: he decided to try taking questions from MIT’s undergraduate math courses, along with one from Columbia University, that the model had never seen before, turning them into programming tasks, and applying techniques called program synthesis and few-shot learning.

Converting a question into a programming task can be as simple as rephrasing it, turning “find the distance between two points” into “write a program that finds the distance between two points,” or providing several question-program pairs as examples.
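As a rough illustration, a few-shot prompt built from question-program pairs might look like the following. The format and the helper `build_prompt` are assumptions for this sketch, not the exact prompts used in the study.

```python
# A minimal sketch of few-shot prompting for program synthesis.
# The prompt format and examples are illustrative assumptions,
# not the study's actual prompts.

FEW_SHOT_PROMPT = '''\
Question: Find the distance between the points (1, 2) and (4, 6).
Program:
import math
print(math.dist((1, 2), (4, 6)))

Question: Find the derivative of x**3 + 2*x with respect to x.
Program:
import sympy as sp
x = sp.symbols("x")
print(sp.diff(x**3 + 2*x, x))

Question: {new_question}
Program:
'''

def build_prompt(new_question: str) -> str:
    """Append an unseen question after the question-program examples."""
    return FEW_SHOT_PROMPT.format(new_question=new_question)

print(build_prompt("Solve the equation x**2 - 5*x + 6 = 0."))
```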

Before feeding those programming tasks to a neural network, the researchers added a new step that enabled the model to vastly outperform their previous attempts.

Previously, these researchers, like everyone else working on the problem, used a neural network such as GPT-3 that was pretrained only on text, meaning it was trained on millions of text examples to learn the patterns of natural language. In the new approach, they used a neural network pretrained on text and also “fine-tuned” on code. OpenAI created this network, dubbed Codex. Fine-tuning is essentially another pretraining step, one that can enhance a machine-learning model’s performance.

The pretrained model was shown millions of code examples from online repositories. Because its training data included both millions of natural-language words and millions of lines of code, the model learns the relationships between pieces of text and pieces of code.

Many math problems can be solved using a computational graph or tree, Drori explains, but it is difficult to convert a problem written in text into this kind of representation. Because this model understands the relationships between text and code, it can translate a question into code when given just a few question-code examples, and then execute the code to answer the problem.
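For instance, the kind of program such a model might produce for a differential-equations question could look like the following. This is a hypothetical example; SymPy is an illustrative choice of package, not one named in the source.

```python
# Hypothetical generated program for the question:
# "Solve the differential equation y'(x) = y(x) with y(0) = 2."
import sympy as sp

x = sp.symbols("x")
y = sp.Function("y")

# Solve y' = y subject to the initial condition y(0) = 2.
solution = sp.dsolve(sp.Eq(y(x).diff(x), y(x)), y(x), ics={y(0): 2})
print(solution)  # Eq(y(x), 2*exp(x))
```

Running the generated code produces the answer; the hard step of turning text into a formal representation is handled by the code generation itself.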

It is difficult for a machine-learning model to arrive at an answer when you just ask a question in text, he says, even if the answer is contained in that text. This work fills that gap by using code and program synthesis.

This is the first study to solve undergraduate math problems, and it improves accuracy from 8% to over 80%, according to Drori.

Adding context

Converting math questions into programming tasks is not always easy, Drori says. Some problems require the researchers to add context before the neural network can process the question correctly. A student would pick up this context while taking the course, but the network has no such background knowledge unless the researchers supply it.

For example, they might need to clarify that the term “network” in a question’s text refers to neural networks rather than communications networks. Or they might have to tell the model which programming package to use, or supply certain definitions; in a question about poker hands, for instance, they may need to inform the model that a deck contains 52 cards.
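To make the role of added context concrete, here is a hypothetical generated program for a poker-hand question. The question itself is illustrative; the point is that the 52-card deck size is context the model must be told rather than knowledge it can assume.

```python
# Hypothetical generated program: probability of being dealt a
# full house (three cards of one rank, two of another) in
# five-card poker. The 52-card deck size comes from the context
# the researchers added to the question.
from math import comb

full_houses = 13 * comb(4, 3) * 12 * comb(4, 2)  # choose ranks, then suits
all_hands = comb(52, 5)                          # 52 cards supplied as context

print(full_houses / all_hands)  # ~0.00144
```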

They feed these programming tasks, along with the added context and examples, to the pretrained and fine-tuned neural network, which outputs a program that usually produces the right answer. The model answered more than 80% of the questions correctly.
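As a minimal sketch of that pipeline, assuming hypothetical helper names (`generate_program` stands in for a query to the code-generation model, which the paper identifies as Codex), the system assembles context, examples, and the question into one prompt, generates a program, and executes it:

```python
# A minimal, hypothetical sketch of the pipeline:
# context + question-program examples + new question
#   -> generated program -> executed answer.
import contextlib
import io

def generate_program(prompt: str) -> str:
    """Stand-in for a call to a code-generation model such as Codex;
    deliberately left unimplemented in this sketch."""
    raise NotImplementedError("call a code-generation model here")

def solve(question: str, context: str, examples: str) -> str:
    # Assemble the prompt from the added context, the few-shot
    # question-program examples, and the new question.
    prompt = f"{context}\n\n{examples}\n\nQuestion: {question}\nProgram:\n"
    program = generate_program(prompt)

    # Execute the generated program and capture whatever it prints.
    # (A production system would sandbox this step.)
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(program, {})
    return buffer.getvalue().strip()
```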

The researchers also used their model to generate questions by feeding the neural network a series of math problems on a specific topic and then asking it to generate a new one.

The results surprised them in some areas, Drori explains. For instance, there were questions about quantum detection of horizontal and vertical lines, and the model generated new questions about quantum detection of diagonal lines; it was not simply creating new questions by swapping the values and variables of existing ones.
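A rough sketch of this generation step, under the assumption (not stated in the source) that it uses the same few-shot prompting style: list several course questions on a topic and ask the model to continue the list.

```python
# Hypothetical sketch of question generation: show the model a few
# questions on one topic and prompt it to continue the numbered list.
TOPIC_QUESTIONS = [
    "Find the eigenvalues of the matrix [[2, 1], [1, 2]].",
    "Determine whether (1, 0, 1) and (2, 0, 2) are linearly independent.",
    "Compute the rank of the matrix [[1, 2], [2, 4]].",
]

def build_generation_prompt(questions: list[str]) -> str:
    """Number the existing questions and leave the next slot open,
    inviting the model to write a new question in the same style."""
    lines = [f"{i + 1}. {q}" for i, q in enumerate(questions)]
    lines.append(f"{len(questions) + 1}.")
    return "\n".join(lines)

print(build_generation_prompt(TOPIC_QUESTIONS))
```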

Human vs. machine-generated questions

The researchers put the machine-generated questions to the test by showing them to university students. For each undergraduate math course, they presented students with 10 questions in random order; five were written by humans and five were machine-generated.

Students could not tell whether the machine-generated questions had been produced by an algorithm or by a human, and they rated human-generated and machine-generated questions similarly for difficulty and appropriateness for the course.

Drori emphasizes that this work is not meant to replace human professors.

Automation is currently at about 80%, he says, and it will never be 100% accurate: whenever something is solved, somebody will come up with a more complex question. But this work opens the way for people to begin tackling ever more complex questions with machine learning, and he believes it will have a significant impact on higher education.

Encouraged by the success of their approach, the team has expanded the work to handle mathematical proofs, but there are limitations they still intend to address. Currently, the model cannot answer questions that have a visual component, and it cannot solve problems that are computationally intractable.

Beyond overcoming these hurdles, they are working to scale the model up to hundreds of courses. Those courses will generate much more data, which can help improve automation and provide insights into course design and curricula.