Vol 41 No 3: Fall 2020 | Published: 2020-09-24
Challenges of Human-Aware AI Systems
From its inception, artificial intelligence (AI) has had a rather ambivalent relationship with humans, swinging between their augmentation and their replacement. Now, as AI technologies enter our everyday lives at an ever-increasing pace, there is a greater need for AI systems to work synergistically with humans. To do this effectively, AI systems must pay more attention to aspects of intelligence that help humans work with each other, including social intelligence. I will discuss the research challenges in designing such human-aware AI systems, including modeling the mental states of humans-in-the-loop and recognizing their desires and intentions, providing proactive support, exhibiting explicable behavior, giving cogent explanations on demand, and engendering trust. I will survey the progress made so far on these challenges, and highlight some promising directions. I will also touch on the additional ethical quandaries that such systems pose. I will end by arguing that the quest for human-aware AI systems broadens the scope of the AI enterprise; necessitates and facilitates true interdisciplinary collaborations; and can go a long way toward increasing public acceptance of AI technologies.
Arizona State University
Conversational Intelligence Challenge: Accelerating Research with Crowd Science and Open Source
Development of conversational systems is one of the most challenging tasks in natural language processing, and it is especially hard in the case of open-domain dialogue. The main factors that hinder progress in this area are the lack of training data and the difficulty of automatic evaluation. Thus, to reliably evaluate the quality of such models, one needs to resort to time-consuming and expensive human evaluation. We tackle these problems by organizing the Conversational Intelligence Challenge (ConvAI), an open competition of dialogue systems. Our goals are threefold: to work out a good design for human evaluation of open-domain dialogue, to grow an open-source code base for conversational systems, and to harvest and publish new datasets. Over the course of the ConvAI1 and ConvAI2 competitions, we developed a framework for the evaluation of chatbots in messaging platforms and used it to evaluate over 30 dialogue systems in two conversational tasks: discussion of short text snippets from Wikipedia and personalized small talk. These large-scale evaluation experiments were performed by recruiting volunteers as well as paid workers. As a result, we succeeded in collecting a dataset of around 5,000 long, meaningful human-to-bot dialogues and gained many insights into the organization of human evaluation. This dataset can be used to train an automatic evaluation model or to improve the quality of dialogue systems. Our analysis of the ConvAI1 and ConvAI2 competitions shows that future work in this area should center on more active participation of volunteers in the assessment of dialogue systems. To achieve that, we plan to make the evaluation setup more engaging.
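The core of a human-evaluation framework like the one described, pairing evaluators with competing bots and aggregating ratings per system, could be sketched as follows. This is a minimal hypothetical illustration: the class and method names (`EvaluationPool`, `start_dialogue`, `mean_rating`) are my own and are not taken from the ConvAI code base.

```python
import random
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DialogueRecord:
    """One human-to-bot conversation plus the evaluator's quality rating."""
    bot_id: str
    turns: list = field(default_factory=list)
    rating: Optional[int] = None

class EvaluationPool:
    """Pairs human evaluators with competing bots and stores rated dialogues."""

    def __init__(self, bot_ids, seed=0):
        self.bot_ids = list(bot_ids)
        self.rng = random.Random(seed)
        self.records = []

    def start_dialogue(self) -> DialogueRecord:
        # Sample a bot uniformly so each system receives comparable traffic.
        return DialogueRecord(bot_id=self.rng.choice(self.bot_ids))

    def finish_dialogue(self, record: DialogueRecord, rating: int) -> None:
        record.rating = rating
        self.records.append(record)

    def mean_rating(self, bot_id: str) -> Optional[float]:
        scores = [r.rating for r in self.records if r.bot_id == bot_id]
        return sum(scores) / len(scores) if scores else None
```

Collected `DialogueRecord` objects would then serve double duty: ranking systems in the competition and forming the published dialogue dataset.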
Adaptable Conversational Machines
In recent years we have witnessed a surge in machine learning methods that provide machines with conversational abilities. Most notably, neural-network-based systems have set the state of the art for difficult tasks such as speech recognition, semantic understanding, dialogue management, language generation, and speech synthesis. Still, unlike in the ancient game of Go, for instance, we are far from achieving human-level performance in dialogue. The reasons for this are numerous. One property of human-human dialogue that stands out is the infinite number of ways of expressing oneself during a conversation, even when the topic of the conversation is restricted. A typical solution to this problem has been to scale up the data; the most prominent mantra in speech and language technology has been "There is no data like more data." However, researchers are now focused on building smarter algorithms: algorithms that can learn efficiently from just a few examples. This is an intrinsic property of human behavior: an average human sees during their lifetime only a fraction of the data that we nowadays present to machines. A human can even have an intuition about a solution before ever experiencing an example solution. This human-inspired ability to adapt may be one of the keys to pushing dialogue systems toward human performance. This article reviews advancements in dialogue systems research with a focus on adaptation methods for dialogue modeling, and ventures a glance at the future of research on adaptable conversational machines.
Carel van Niekerk
Automated Assignment of Helpdesk Email Tickets: An AI Lifecycle Case Study
In this article, we present an end-to-end automated helpdesk email ticket assignment system driven by high accuracy, coverage, business continuity, scalability, and optimal usage of computational resources. The primary objective of the system is to determine the problem described in an incoming email ticket and then automatically dispatch it to an appropriate resolver group with high accuracy. While meeting this objective, it should also be able to operate at the desired accuracy levels in the face of changing business needs by automatically adapting to those changes. The proposed system uses a system of classifiers with separate strategies for handling frequent and sparse resolver groups, augmented with a semiautomatic rule engine and retraining strategies, to ensure that it is accurate, robust, and adaptive to changing business needs. Our system has been deployed in production at six major service providers in diverse service domains and currently assigns 100,000 emails per month on average, with an accuracy close to ninety percent while covering at least ninety percent of email tickets. This translates to human-level accuracy and results in net savings of more than 50,000 person-hours of effort per annum. To date, our deployed system has served more than two million tickets in production.
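The hybrid design described above, a learned classifier for frequent resolver groups backed by a rule engine for sparse ones, could be sketched roughly as follows. This is a simplified hypothetical illustration under assumed interfaces (`TicketRouter`, a classifier callable returning a group and a confidence, and keyword rules), not the authors' implementation.

```python
from typing import Callable, List, Optional, Tuple

class TicketRouter:
    """Routes an email ticket to a resolver group.

    Frequent groups are handled by a trained classifier; when its
    confidence is low (typical for sparse groups), a keyword rule
    engine is consulted instead.
    """

    def __init__(self,
                 classifier: Callable[[str], Tuple[str, float]],
                 rules: List[Tuple[str, str]],
                 confidence_threshold: float = 0.8):
        self.classifier = classifier              # text -> (group, confidence)
        self.rules = rules                        # (keyword, group) pairs
        self.confidence_threshold = confidence_threshold

    def route(self, email_text: str) -> Optional[str]:
        group, confidence = self.classifier(email_text)
        if confidence >= self.confidence_threshold:
            return group
        # Low-confidence prediction: fall back to the rule engine.
        text = email_text.lower()
        for keyword, rule_group in self.rules:
            if keyword in text:
                return rule_group
        return None  # leave for manual triage rather than misroute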
Large Scale Personalized Categorization of Financial Transactions
A major part of financial accounting involves organizing business transactions using a customizable filing system that accountants call a "chart of accounts." This task must be carried out for every financial transaction, and hence automation is of significant value to the users of accounting software. In this article we present a large-scale recommendation system used by millions of small businesses in the USA, UK, Australia, Canada, India, and France to organize billions of financial transactions each year. The system uses machine learning to combine fragments of information from millions of users in a manner that allows us to accurately recommend chart-of-accounts categories even when users have created their own or named them using abbreviations or in foreign languages. Transactions are handled even if a given user has never categorized a transaction like that before. The development and testing of such a system at scale over billions of transactions is a first in the financial industry.
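The personalization idea described, preferring a user's own categorization history while pooling evidence across users for cold-start cases, could be sketched in simplified form as follows. All names here (`CategoryRecommender`, `record`, `recommend`) are hypothetical; the deployed system is far more sophisticated than this frequency-based sketch.

```python
from collections import Counter, defaultdict
from typing import Optional

class CategoryRecommender:
    """Recommends a chart-of-accounts category for a transaction description,
    preferring the user's own history and falling back to behavior pooled
    across all users."""

    def __init__(self):
        self.user_history = defaultdict(Counter)    # (user, desc) -> category counts
        self.global_history = defaultdict(Counter)  # desc -> category counts

    @staticmethod
    def normalize(description: str) -> str:
        # Collapse case and whitespace so 'AMAZON  Mktp' matches 'amazon mktp'.
        return " ".join(description.lower().split())

    def record(self, user: str, description: str, category: str) -> None:
        key = self.normalize(description)
        self.user_history[(user, key)][category] += 1
        self.global_history[key][category] += 1

    def recommend(self, user: str, description: str) -> Optional[str]:
        key = self.normalize(description)
        personal = self.user_history.get((user, key))
        if personal:
            return personal.most_common(1)[0][0]
        pooled = self.global_history.get(key)
        return pooled.most_common(1)[0][0] if pooled else None
```

The two-level lookup mirrors the abstract's claim: a user's custom category names dominate when they exist, while new users still get recommendations distilled from the crowd.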
Improving the Accuracy and Transparency of Underwriting with AI to Transform the Life Insurance Industry
Life insurance provides trillions of dollars of financial security for hundreds of millions of individuals and families worldwide. To simultaneously offer affordable products while managing this financial ecosystem, life-insurance companies use an underwriting process to assess the mortality risk posed by individual applicants. Traditional underwriting is largely based on examining an applicant’s health and behavioral profile. This manual process is incompatible with expectations of a rapid customer experience through digital capabilities. Fortunately, the availability of large historical data sets and the emergence of new data sources provide an unprecedented opportunity for artificial intelligence to transform underwriting in the life-insurance industry with standard measures of mortality risk. We combined one of the largest application data sets in the industry with a responsible artificial intelligence framework to develop a mortality model and life score. We describe how the life score serves as the primary risk-driving engine of deployed algorithmic underwriting systems and demonstrate its high level of accuracy, yielding a nine-percent reduction in claims within the healthiest pool of applicants. Additionally, we argue that, by embracing transparency, the industry can build consumer trust and respond to a dynamic regulatory environment focused on algorithmic decision-making. We present a consumer-facing tool that uses a state-of-the-art method for interpretable machine learning to offer transparency into the life score.
Scaling Up Data-Driven Pilot Projects
Conducting pilot projects is a common approach for organizations to test and evaluate new technology. A pilot project is often conducted to remove uncertainties from a large-scale project and should be limited in time and scope. Nowadays, many organizations are testing and evaluating artificial intelligence techniques and more advanced forms of analytics via pilot projects. Unfortunately, many organizations experience problems in scaling up the findings from pilot projects to the rest of the organization. Hence, results from pilot projects become siloed, with limited business value. In this article, we present an overview of barriers to conducting and scaling up data-driven pilot projects. Lack of senior management support is a frequently mentioned top barrier in the literature. In response, we present our recommendations on what types of activities can be performed to increase the chances of getting a positive response from senior management regarding scaling up the use of artificial intelligence and advanced analytics within an organization.
University of Skövde
The Reproducibility Crisis Is Real
The reproducibility crisis is real, and it is not only the field of psychology that has to deal with it. All the sciences are affected; the field of artificial intelligence is not an exception.
Odd Erik Gundersen
Addendum: Special Thanks to This Issue's Guest Editors
We would like to acknowledge the following guest editors for their contributions to this issue’s Conversational AI Special Track articles:
- Alborz Geramifard, Facebook AI
- Dilek Hakkani-Tur, Amazon Alexa AI
- Peter Henderson, Stanford University
- Alex Rudnicky, Carnegie Mellon University
- Asli Celikyilmaz, Microsoft Research
Thank you for your continued time and effort, which have made this Conversational AI Special Track a success.