• chicken@lemmy.dbzer0.com

    From Wikipedia:

    The Turing test, originally called the imitation game by Alan Turing in 1950,[2] is a test of a machine’s ability to exhibit intelligent behaviour equivalent to that of a human. In the test, a human evaluator judges a text transcript of a natural-language conversation between a human and a machine. The evaluator tries to identify the machine, and the machine passes if the evaluator cannot reliably tell them apart.

    This isn’t as hard a test as the one you’re describing, and there’s research showing LLMs pass very similar tests. From the abstract of Jones & Bergen (2025), “Large Language Models Pass the Turing Test”:

    …randomised, controlled, and pre-registered Turing tests on independent populations. Participants had 5 minute conversations simultaneously with another human participant and one of these systems before judging which conversational partner they thought was human. When prompted to adopt a humanlike persona, GPT-4.5 was judged to be the human 73% of the time: significantly more often than interrogators selected the real human participant. LLaMa-3.1, with the same prompt, was judged to be the human 56% of the time – not significantly more or less often than the humans they were being compared to – while baseline models (ELIZA and GPT-4o) achieved win rates significantly below chance (23% and 21% respectively). The results constitute the first empirical evidence that any artificial system passes a standard three-party Turing test.
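
    For a sense of what “significantly” means in those win rates: a guessing interrogator picks the AI 50% of the time in a three-party test, so the pass criterion reduces to a binomial test of the win rate against chance. A minimal sketch in Python, with a made-up sample size (the study reports its own N):

        from scipy.stats import binomtest

        n_judgments = 100                    # hypothetical N; an assumption, not the paper's figure
        ai_wins = round(0.73 * n_judgments)  # GPT-4.5 was judged "the human" 73% of the time

        # Exact two-sided binomial test against the 50% chance level
        result = binomtest(ai_wins, n_judgments, p=0.5, alternative="two-sided")
        print(f"win rate: {ai_wins}/{n_judgments}, p = {result.pvalue:.4g}")

    With a sample like this, 73% sits far from the chance level, while a rate like LLaMa-3.1’s 56% would not, which is consistent with the quoted “not significantly more or less often than the humans” result.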

    That’s not quite the same as LLMs imitating humans so well that a trained expert has no possible edge in telling them apart, but it is a major milestone, and I think it’s technically accurate to say “AI has passed the Turing Test” at this point.