GPT-4.5 is the first AI model to pass a real Turing test

Scientists suggest that large language models (LLMs) are becoming more adept at posing as humans, with GPT-4.5 now passing the Turing test with ease.

Researchers discovered that, in a three-party Turing test, GPT-4.5 deceived participants into believing it was another human 73% of the time. The study, which compared a variety of artificial intelligence (AI) models, was submitted to the arXiv preprint database on March 31 and has not yet undergone peer review.

Although GPT-4 has previously passed a two-party Turing test, this is the first time an LLM has passed the more difficult, original version of computer scientist Alan Turing’s “imitation game.”

“People were no better than chance at distinguishing humans from GPT-4.5 and LLaMa (with the persona prompt). And 4.5 was judged to be the human significantly *more* often than actual humans!” said study co-author Cameron Jones, a researcher at the Language and Cognition Lab at the University of San Diego, on the social media platform X.

GPT-4.5 was the study’s frontrunner, but Meta’s LLaMa-3.1 was also judged to be human 56% of the time, still exceeding Turing’s prediction that an average interrogator would have no more than a 70% chance of making the correct identification after five minutes of questioning.
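Why do rates like 73% and 56% matter? In a two-way forced choice, an interrogator guessing at random picks the AI 50% of the time, so the question is how far an observed win rate sits above chance. The paper’s per-condition trial counts aren’t given here, so the sketch below uses hypothetical counts (100 interrogations) purely to illustrate the comparison with an exact one-sided binomial test:

```python
from math import comb

def binomial_p_value(wins: int, trials: int, p_chance: float = 0.5) -> float:
    """One-sided p-value: the probability of seeing at least `wins`
    successes in `trials` interrogations if judges were simply
    guessing at rate `p_chance` (50% in a two-way forced choice)."""
    return sum(
        comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(wins, trials + 1)
    )

# Hypothetical counts (the study's real per-condition totals differ):
# 73 wins out of 100 is far beyond what chance guessing produces...
print(binomial_p_value(73, 100))
# ...while 56 out of 100 could still arise from guessing at this
# (hypothetical) sample size.
print(binomial_p_value(56, 100))
```

With a larger trial count, the same 56% rate would become statistically distinguishable from chance; the p-value depends on both the rate and the number of games.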

Trumping the Turing test

Because the Turing test’s main focus is on whether machines can mimic humans rather than whether they can think, it’s frequently referred to as the “imitation game.”

Turing’s original idea was for a human “interrogator” to question two unseen entities, one a computer and one a human, and to determine which responses came from the human. A computer could pass the test by mimicking human responses convincingly enough to be mistaken for a person.

Although LLMs have passed the Turing test in one-on-one exchanges with an interrogator, they had not previously passed it when a second, human witness was present. Researchers from the University of San Diego administered a three-party Turing test to 126 undergraduate students and 158 participants from the online data pool Prolific. Over a simultaneous five-minute exchange of questions and answers, a human and a selected LLM each tried to convince the interrogator that they were the human.

The LLMs were given the baseline prompt: “You are about to take a Turing test. Your mission is to convince the interrogator that you are human.” Selected LLMs were then given a second prompt instructing them to assume the persona of a young person who is introverted, knowledgeable about internet culture, and uses slang.

After analyzing 1,023 games with a median length of eight messages over 4.2 minutes, the researchers found that the LLMs given both prompts were the most effective at persuading participants that they were human.

However, LLMs that were not given the second persona prompt performed much worse; this underscores the importance of explicit prompting and context in getting the most out of AI-centric systems.

The secret to the LLMs, and especially GPT-4.5, passing the Turing test was therefore the adoption of a specific persona. In the three-person version of the test, each data point represents a direct comparison between a model and a human; to be successful, a machine must appear more human than each real person it is compared to, according to the study’s authors.
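This scoring rule, in which each game pairs one model against one real human and the model “wins” only when the interrogator picks it over that human, can be sketched in a few lines. The field names below are illustrative, not the study’s actual data format:

```python
from dataclasses import dataclass

@dataclass
class Game:
    """One three-party game: the interrogator judged one of two
    witnesses to be the human; the AI held one of those labels."""
    interrogator_pick: str  # "witness_a" or "witness_b"
    ai_label: str           # which label the AI actually held

def ai_win_rate(games: list[Game]) -> float:
    """Fraction of games in which the AI, rather than the paired
    real human, was judged to be the human."""
    wins = sum(g.interrogator_pick == g.ai_label for g in games)
    return wins / len(games)

# Example: the AI is picked over its human counterpart in 3 of 4 games.
games = [
    Game("witness_a", "witness_a"),
    Game("witness_b", "witness_b"),
    Game("witness_a", "witness_a"),
    Game("witness_a", "witness_b"),
]
print(ai_win_rate(games))  # 0.75
```

Under this rule a win rate above 50% means the model was, on average, judged more human than the actual humans it was paired with, which is what makes GPT-4.5’s 73% so striking.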

When asked why they classified a subject as human or AI, participants cited conversational flow, linguistic style, and socio-emotional factors such as personality. In other words, participants based their decisions more on the “vibe” of their interactions with the LLM than on the knowledge and reasoning displayed by the entity they were questioning, factors more traditionally associated with intelligence.

Ultimately, this study marks a new milestone for LLMs in passing the Turing test, albeit with caveats, since GPT-4.5 needed persona prompts to achieve its remarkable results. Winning the imitation game is not evidence of genuinely human-like intelligence, but it does show that the newest AI systems can convincingly mimic humans.

That could lead to AI agents that communicate more effectively in natural language. More worryingly, it could also lead to AI systems being used to manipulate people through emotional mimicry and social engineering.

Faced with ever more capable AI and more potent LLMs, the researchers issued a grim warning: some of the most severe impacts of LLMs may occur when people are not aware that they are interacting with an AI rather than a person.
