In the field of artificial intelligence, the Turing Test has emerged as both a highly influential and widely criticized benchmark for a machine's ability to exhibit intelligent behavior. In the test, introduced by Alan Turing in 1950, a human judge holds a natural conversation with both a machine and a human; the machine passes if the judge cannot reliably distinguish it from the human.
The University of Reading has now reported that it conducted such a test in which the 65-year-old Turing Test was passed for the very first time. University officials claim programmers had created a simulation that convinced part of a panel of judges that they were actually engaging with a 13-year-old boy.
Associate Professor Mark Riedl of the School of Interactive Computing offers his perspective on the results and test.
It was never clear that Turing’s Imitation Game, as a thought experiment, was ever meant to be run. Much of the test was left vague from an experimental methodology standpoint, meaning different scientists could have implemented parts of it differently and still be considered faithful to the original writing. While the University of Reading event appears to have been conducted faithfully, as far as I can tell from the press release, it is not clear that the Turing Test is the best methodology for proving that a software program is intelligent. Indeed, because a chatbot beat the test, it could be argued that the Turing Test, as implemented, was not appropriate. While I don’t know the precise nature of the Eugene chatbot, most chatbots use keyword matching and scripted template responses. Indeed, Joseph Weizenbaum (in the mid-1960s) built the first chatbot as part of an argument to refute the possibility of artificial intelligence, because chatbots could imitate humans for short amounts of time without implementing any processes that would be recognized as being ‘intelligent.’
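To illustrate the point about keyword matching and scripted templates, here is a minimal sketch of how such a chatbot works. The rules and responses are purely illustrative, not taken from Eugene or any real system: the program pattern-matches on keywords in the user's input and fills in a canned template, with no understanding involved.

```python
import random
import re

# Illustrative keyword-matching rules: each pattern maps to a list of
# scripted response templates. Captured text is spliced back into the reply.
RULES = [
    (re.compile(r"\bi feel (.+)", re.IGNORECASE),
     ["Why do you feel {0}?", "How long have you felt {0}?"]),
    (re.compile(r"\bmy (\w+)", re.IGNORECASE),
     ["Tell me more about your {0}."]),
]

# Fallback replies when no keyword matches.
DEFAULTS = ["I see.", "Please go on.", "That is interesting."]

def respond(utterance: str) -> str:
    """Return a canned response by matching keywords in the input."""
    for pattern, templates in RULES:
        match = pattern.search(utterance)
        if match:
            # Fill the captured text into a randomly chosen template.
            return random.choice(templates).format(*match.groups())
    return random.choice(DEFAULTS)
```

For example, `respond("I feel sad today")` echoes the captured phrase back ("Why do you feel sad today?"), which can appear conversational for a few turns even though no reasoning is taking place.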
The Turing Test was meant to demonstrate Artificial General Intelligence, meaning that a software agent could emulate a full range of human cognitive processes. This requires everything from natural language (including sarcasm, irony, and word play) to emotion, commonsense reasoning (Doris found a $10,000 diamond earring in her husband’s car; why did she weep?), creativity (tell me a version of Little Red Riding Hood in which Little Red and the Wolf become best friends), humor, and more. While chatbots can fool people for short periods of time, when topics turn to those above, chatbots falter, and harder tests can always be constructed that would demonstrate the limitations of those chatbots (or any technology lacking artificial general intelligence). However, limited-form Turing-style tests that delve into a single topic (e.g., storytelling) can be a useful way of evaluating artificial intelligence technologies.
Even though chatbots are not considered very intelligent by artificial general intelligence standards, they do have uses. A chatbot was recently used to help identify predatory behavior on the Internet. Chatbots can also provide limited online customer support. Within computer games, chatbots can potentially help maintain player suspension of disbelief for limited periods of time.
I think the push-back has already started and we will see quite a few more opinions against the original claim emerging.
For more information, or to schedule an interview, please contact: