The Success of Conversational AI and the AI Evaluation Challenge it Reveals
By Ian Beaver
Research interest in Conversational AI has grown massively over the last few years, and several recent advancements have enabled systems to produce conversational turns that are as rich and varied as those of humans. However, this apparent creativity also creates a real challenge for the objective evaluation of such systems: authors are becoming reliant on crowd worker opinions as the primary measure of success, and so far few papers report everything necessary for others to compare against in their own crowd experiments. This challenge is not unique to Conversational AI; it demonstrates that, as AI systems mature into more "human" tasks involving creativity and variation, evaluation strategies need to mature with them.