ChatGPT passes the famous “Turing Test”

  • Scientists claim ChatGPT-4 is the first AI to pass a two-player Turing test
  • In 54 percent of cases, the AI was able to deceive a human interlocutor



Since it was first proposed in 1950, passing the “Turing Test” has been considered one of the highest goals of artificial intelligence.

But now researchers claim that ChatGPT is the first AI to pass this famous test of human intelligence.

The test, proposed by computing pioneer Alan Turing, holds that an AI should be considered truly intelligent if people cannot tell whether they are talking to a human or a machine.

Cognitive scientists at UC San Diego argue in a preprint that ChatGPT-4 can deceive human test subjects in more than half the cases.

However, the researchers believe that this may say more about the Turing test than about the intelligence of modern AI.

ChatGPT-4 has passed the famous “Turing Test”, which was developed to determine whether computers have human-like intelligence
Overview of the Turing Test: A human questioner (C) asks questions to an AI (A) and another human (B) and evaluates the answers. The questioner does not know who is who. If the AI makes the questioner believe that its answers were generated by a human, it passes the test.

What is the Turing Test?

The Turing Test was introduced in 1950 by World War II codebreaker Alan Turing.

He predicted that computers could one day be programmed to have capabilities equal to human intelligence.

He proposed the test to determine whether a computer can think.

One person, called the interrogator, conducts a text-based conversation with another person and a computer – and has to figure out who is who.

If the interrogator cannot reliably tell them apart, the computer has passed the test.

In 1950, World War II British codebreaker Alan Turing created what he believed would be the ultimate test of computer intelligence.

He imagined a human participant sitting in front of a screen and talking to either a human or a computer through a text-only interface.

According to Turing, if the computer could not be distinguished from a human on a wide range of possible topics, we would have to admit that it is just as intelligent as a human.

The researchers repeated this famous test, asking 500 participants to chat with one of four agents: three AIs and one human.

The AIs were ChatGPT-4 and ChatGPT-3.5 – two models behind OpenAI’s popular chatbot – as well as a chat program from the 1960s called ELIZA.

Participants were placed in a chat room and told that they had to either convince the other person that they were human or figure out who they were talking to.

Participants chatted for five minutes and were then asked whether they thought they were talking to a computer or a human, why they thought so, and how confident they were.

Participants were placed in a chat room with a human or a computer and asked to guess who they were talking to.
The experiment was a replica of the experiment designed by Alan Turing (pictured) in the 1950s.
Turing Test: Can you tell the difference? One of these conversations is with a human and the other three are with AIs. Read them carefully and guess – the answers are in the box below.

While the outdated ELIZA only deceived participants in 22 percent of cases, ChatGPT-3.5 remained undetected in 50 percent of conversations.

The more advanced version, ChatGPT-4, performed even better: In 54 percent of cases, participants described it as human.


This means that ChatGPT-4 was judged to be human more often than not: in a two-player test, where the interrogator talks to a single unknown partner, random guessing would put that figure at 50 percent.

And if that number seems low to you, it’s worth noting that participants correctly identified people in only 67 percent of conversations.
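Whether a 54 percent pass rate genuinely beats the 50 percent chance baseline depends on how many conversations were run. As a minimal sketch, the Python snippet below runs a one-sided proportion test; the per-agent sample size is an assumption on our part (the article says only that 500 participants were split across four agents), not a figure from the study.

```python
from math import sqrt, erf

# Assumed figures: 500 participants split across four agents gives
# roughly 125 conversations per agent. The article does not report the
# actual per-agent counts, so n here is hypothetical.
n = 125            # assumed number of ChatGPT-4 conversations
pass_rate = 0.54   # share of judges who rated ChatGPT-4 as human
chance = 0.50      # coin-flip baseline for a two-player test

# One-sided z-test of a proportion against the 50 percent baseline.
z = (pass_rate - chance) / sqrt(chance * (1 - chance) / n)
p_value = 0.5 * (1 - erf(z / sqrt(2)))  # upper-tail normal probability

print(f"z = {z:.2f}, one-sided p = {p_value:.3f}")
```

With these assumed numbers the margin over chance is modest, which is why the exact sample sizes reported in the preprint matter when weighing the 54 percent figure against the 50 percent baseline.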

The researchers write that these results “provide the first robust empirical evidence that any artificial system passes a two-player interactive Turing test.”

It should be noted that this is a preprint that has not yet completed peer review, so the results should be viewed with some caution.

However, if the results are confirmed, this would be the first strong evidence that an AI has ever passed the Turing test as Alan Turing envisioned.

Nell Watson, an AI researcher at the Institute of Electrical and Electronics Engineers (IEEE), told Live Science: “Machines can confabulate and cobble together plausible justifications for things after the fact, just like humans do.”

“All of these elements mean that AI systems have human-like weaknesses and quirks. This makes them more human-like than previous approaches, which offered little more than a list of ready-made answers.”

People were correctly identified as human in 67 percent of cases (blue bar), while ChatGPT-4 was able to deceive its interlocutors in 54 percent of cases

Turing Test – Answers

Chat A: ChatGPT-4

Chat B: Human

Chat C: ChatGPT-3.5

Chat D: ELIZA

Notably, the poor performance of the ELIZA program underscores the significance of these results.

While it may seem strange to include a program from the 1960s in a test of cutting-edge technology, this model was included to test something called the “ELIZA effect.”

The ELIZA effect describes the tendency of humans to attribute human-like qualities even to very simple systems.

But the fact that people were fooled by ChatGPT and not by ELIZA suggests that this result is “non-trivial.”

The researchers also point out that the changing public perception of AI may have changed the expected results of the Turing test.

They write: “At first glance, humans’ low success rate might be surprising.

“If the test measures human likeness, shouldn’t the value be 100%?”

This is the first time an AI has passed the test invented by Alan Turing in 1950, according to the new study. The life of this early computer pioneer and the invention of the Turing test were dramatized in The Imitation Game, starring Benedict Cumberbatch (pictured).


In 1950, this assumption would have made sense, since in a world without advanced AI, we would assume that anything that sounds human is human.

But as public awareness of and trust in AI grow, we become more likely to mistakenly take humans for AI.

This could mean that the small gap between the success rate of humans and ChatGPT-4 is even more compelling evidence of computer intelligence.

In February of this year, researchers from Stanford found that ChatGPT was able to pass a version of the Turing Test in which the AI answered a widely used personality test.

Although those researchers found ChatGPT-4’s results indistinguishable from those of humans, this latest work marks one of the first times that an AI has passed a robust, conversation-based, two-player Turing Test.

However, the researchers also admit that there has been long-standing and justified criticism of the Turing test.

The researchers point out that “stylistic and socioemotional factors play a greater role in passing the Turing Test than traditional notions of intelligence.”

The researchers say that this does not necessarily mean that the AI has become more intelligent, only that it has become better at pretending to be human (stock image).

When explaining why they judged an interlocutor to be a robot, interrogators cited style, personality and tone significantly more often than anything related to intelligence.

One of the most successful strategies for identifying robots was to ask about human experiences. This strategy worked in 75 percent of cases.

This suggests that the Turing Test does not actually prove the intelligence of a system, but rather measures its ability to imitate or deceive humans.

The researchers say this provides, at best, “probabilistic” support for the claim that ChatGPT is intelligent.

Participants identified the AI based on its personality and the personal information it offered rather than on its displayed intelligence.


However, this does not mean that the Turing Test is worthless: the researchers point out that the ability to impersonate a human will have enormous economic and social consequences.

The researchers say that sufficiently convincing AIs “could take on economically valuable customer-facing tasks previously reserved for human workers, mislead the public or their own human operators, and undermine societal trust in authentic human interactions.”

Ultimately, the Turing Test may represent only part of what we need to assess when developing an AI system.

Ms Watson says: “Pure intellect only goes so far. What really counts is being intelligent enough to understand a situation and the capabilities of others and having the empathy needed to put those elements together.”

“Skills are only a small part of AI’s value – its ability to understand the values, preferences and limitations of others is equally important.”