The most significant unsolved problem in computer science is the question “Does P = NP?”, a topic that frequently comes up when computer scientists get together at cocktail parties.
The question of whether P equals NP, first posed some 50 years ago, is a profound reflection on what will ultimately be possible with computers. The subject, which has repercussions for fields such as cryptography and quantum computing, has resisted a definitive answer despite decades of intensive research. Generative AI is now contributing to that effort.
The lead author of a paper titled “Large Language Model for Science: A Study on P vs. NP,” Qingxiu Dong, and colleagues program OpenAI’s GPT-4 large language model using what they term a Socratic Method, which involves multiple turns of prompt-based conversation with GPT-4.
The team’s approach essentially entails spoon-feeding GPT-4 points from a prior paper in an effort to elicit insightful responses.
Dong and his team report that GPT-4 presents an argument that P does not, in fact, equal NP. Furthermore, they assert that the work demonstrates that large language models can do more than just spit out enormous amounts of text; they can also “discover novel insights” that could lead to “scientific discoveries,” a possibility they dub “LLMs for Science.”
It’s crucial to have a basic understanding of the P = NP problem in order to understand what the authors are doing.
The P vs. NP problem, or “P = NP” as it is frequently written, was independently formulated in the 1970s by computer scientists Leonid Levin and Stephen Cook. It asks, broadly, how hard it is for a computer to solve a given problem. The letter P stands for problems known to be tractable, meaning the time required to compute a solution grows only modestly (polynomially) with the size of the input, and whose solutions are also easy to verify, meaning it is straightforward to check whether a result is correct.
In contrast, the initials NP stand for problems whose answers are likewise easy to verify, exactly like P, but for which no efficient way to compute a solution is known. Sudoku is frequently used as an example of NP because, while any filled-in Sudoku grid can be checked for correctness quite easily, the difficulty of finding a solution is believed to grow exponentially as the grid gets larger.
The question “Does P = NP?” therefore asks whether NP problems, which we believe are hard to solve but know are easy to check, might actually turn out to be P problems, ones that can be both easily verified and easily solved.
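To make that asymmetry concrete, here is a minimal Python sketch, not taken from the paper, using Boolean satisfiability (another canonical NP problem) instead of Sudoku purely to keep the code short: checking a proposed answer takes one quick pass over the formula, while the naive search for an answer may try all 2^n possible assignments.

```python
from itertools import product

# A tiny Boolean formula in conjunctive normal form (CNF): each clause is a list of
# literals, where the integer i means "variable i is true" and -i means "variable i is false".
FORMULA = [[1, 2], [-1, 3], [-2, -3]]
NUM_VARS = 3

def verify(assignment):
    """Checking a candidate answer is cheap: one pass over the clauses (polynomial time)."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in FORMULA
    )

def brute_force_solve():
    """Finding an answer the naive way tries up to 2**NUM_VARS assignments (exponential time)."""
    for bits in product([False, True], repeat=NUM_VARS):
        assignment = {var: bits[var - 1] for var in range(1, NUM_VARS + 1)}
        if verify(assignment):
            return assignment
    return None

print(brute_force_solve())  # prints a satisfying assignment, if one exists
```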
If the answer is no, i.e., P does not equal NP, then some problems can never be solved efficiently by computers, no matter how large the computational budget; this would amount to an upper bound on what computing can achieve. Such problems would remain effectively intractable, cracking certain forms of encryption among them.
Dong and his team draw on a recent trend of “reasoning” with large language models to address the P = NP problem. As demonstrated in the 2022 work of Takeshi Kojima and team at The University of Tokyo and Google Research, it is possible to improve the performance of large language models on certain tasks simply by adding the phrase “Let’s think step by step” at the beginning of the prompt. They found that phrase alone was enough to cause the language model to produce a “chain-of-thought” trace of its reasoning.
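As a rough illustration of that trick, and not the authors’ actual prompts, a zero-shot chain-of-thought call might look like the sketch below; it assumes the OpenAI Python client (version 1 or later) with an API key set in the environment, and the model name and question are placeholders.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

question = (
    "A farm has chickens and cows, 20 heads and 56 legs in total. "
    "How many cows are there?"
)

# Zero-shot chain-of-thought: appending "Let's think step by step" nudges the
# model into writing out its intermediate reasoning before the final answer.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": question + "\nLet's think step by step."}],
)

print(response.choices[0].message.content)
```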
Dong and his team pursue the same line of reasoning with their Socratic Method. Statements such as “You are a wise philosopher” and “You are a mathematician skilled in probability theory” are prefixed to each of the authors’ 97 prompt rounds to condition GPT-4. In other words, the authors play the now-familiar game of having GPT-4 adopt a role, or “persona,” to stylize its text generation.
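In code, that conditioning is just a persona carried into every round of the dialogue; the sketch below is illustrative, with the persona wording borrowed from the examples above rather than reproduced from the paper’s 97 prompts.

```python
from openai import OpenAI

client = OpenAI()

# The persona set here conditions every subsequent turn of the dialogue.
history = [
    {"role": "system",
     "content": "You are a wise philosopher and a mathematician skilled in probability theory."}
]

def socratic_turn(prompt):
    """One round of the dialogue: send the prompt with the accumulated history, keep the reply."""
    history.append({"role": "user", "content": prompt})
    response = client.chat.completions.create(model="gpt-4", messages=history)
    answer = response.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer

first = socratic_turn("Let us examine whether P could equal NP. What would you assume first?")
```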
They guide GPT-4 through a technique known as “proof by contradiction” to argue that P does not, in fact, equal NP: first assume that it does, construct an example under that assumption, and then find a way to make the example fail.
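In schematic terms, and independent of the paper’s particular construction, such an argument has the standard logical shape:

```latex
% Proof by contradiction, schematically: to establish \neg A, show that assuming A leads to absurdity.
% Here A is the statement P = NP.
\[
  \bigl(\, \mathrm{P} = \mathrm{NP} \;\Longrightarrow\; \bot \,\bigr)
  \;\;\vdash\;\;
  \mathrm{P} \neq \mathrm{NP}
\]
```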
The intriguing aspect is that two of the study’s authors, Ke Xu and Guangyan Zhou, this month separately published a paper of their own in which they tackle P = NP head-on using conventional, formal mathematical arguments. In that paper, they conclude that P does not equal NP.
Dong, Xu, Zhou, and the rest of the team are essentially rebuilding that formal mathematics paper by guiding GPT-4 through the language of their own thinking, one prompt at a time. In fact, of the document’s 73 pages, 67 are a comprehensive printout of each of the 97 prompts along with GPT-4’s full responses. The whole thing reads like a massive exercise in prompt engineering to reconstruct an argument as a conversation.
Because Xu and Zhou’s paper is so recent, it is difficult to say whether the results Dong and team obtained with GPT-4 genuinely demonstrate that P does not equal NP. Apart from their own publication with Dong and team, the paper has not yet received any citations on Semantic Scholar, the website that tracks paper citations.
The world, in other words, has not yet accepted their argument.
More significant for Gen AI enthusiasts, the authors contend, is that the dialogue in the prompts illustrates the potential for large language models to do more than merely imitate human textual creations.
According to the authors, this study “highlights the potential capability of GPT-4 to collaborate with humans in exploring exceptionally complex and expert-level problems.” They highlight paragraphs that they consider to be “the insightful parts” of what GPT-4 generates across the 67 pages of questions and answers.
Just how insightful those responses are is presumably a subject for further research. Some researchers have found large language models to be especially weak at connecting citations and descriptions.
An intriguing telltale detail appears in the paper’s margins, where Dong and his team have annotated the GPT-4 responses with their thoughts on their quality.
In one of those parenthetical remarks, the authors note that each new prompt incorporates all of the previous GPT-4 responses as background, except where they chose to edit those responses to retain only the most pertinent information.
They note in the margin on page 7 that “we only include the most valuable solution in the conversation history if the model offers multiple solutions.” By concentrating on relevant information, this tactic helps GPT-4 be more productive and efficient overall.
To put it another way, the way GPT-4 made use of the prior history in what is referred to as its “context window” (all of the earlier rounds on which it can draw) was very helpfully curated. To lead GPT-4 through the course of a discussion, Dong and his team engaged in highly selective prompt engineering. That has implications for the technique known as “retrieval-augmented generation,” or “RAG,” which reflects the current interest in feeding old chat data back to a large language model as fresh input.
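That curation step can be sketched as a small filter on the conversation history before each new round; the pick_most_valuable helper below is hypothetical, a crude stand-in for the human judgment Dong and team applied by hand.

```python
def pick_most_valuable(reply_text):
    """Hypothetical stand-in for the authors' manual step: when a reply offers
    several candidate solutions, carry only one of them forward."""
    candidates = reply_text.split("\n\n")   # crude split into candidate chunks
    return max(candidates, key=len)         # placeholder heuristic; the paper relied on human judgment

def add_round(history, user_prompt, model_reply):
    """Grow the context window with the new round, but only with curated content."""
    history.append({"role": "user", "content": user_prompt})
    history.append({"role": "assistant", "content": pick_most_valuable(model_reply)})
    return history
```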
That may be one of the most important contributions of the entire exercise: whether or not it resolves P = NP, a new direction in prompt engineering could bring programs closer to RAG and make chat sessions more substantive. Think back only a little while, and chat conversations had a habit of being vapid and frequently veering off topic.
There’s something to be said for the fact that Dong and the team managed to keep the machine on target for the entire 97 rounds.