Researchers from Google AI released two new dialog datasets for natural-language processing (NLP) development: Coached Conversational Preference Elicitation (CCPE) and Taskmaster-1. The datasets contain thousands of conversations as well as labels and annotations for training digital assistants to better determine users’ preferences and intentions.
The team gave an overview of the two datasets in a recent blog post. Motivated by the lack of high-quality dialog data for training digital assistants, the researchers captured dialogs between two humans in conversation. According to the blog:
The conversations we might observe with today’s digital assistants don’t reach the level of dialog complexity we need to model human-level understanding.
Instead of capturing dialogs between a single human and a chatbot, the team used trained human agents who simulated a digital assistant on a “Wizard of Oz” platform while interacting with crowd-sourced users. After capture, the dialogs were annotated with metadata about the conversation, which is key for using the data in machine learning.
The Taskmaster-1 dataset is described in detail in a conference paper. The data focuses on common digital-assistant automation scenarios, including ordering pizza, scheduling a ride-share, or purchasing movie tickets. To increase the diversity of conversation flows, users were encouraged to “change their minds” mid-conversation, and agents were told to sometimes respond that requested items were unavailable. This results in ambiguous references, a common challenge to overcome when designing robust NLP systems. To simplify the annotation process, only the API argument values for each type of conversation were labeled; for example, the movie name and showtime for a movie ticket purchase. In addition to two-person dialogs, Taskmaster-1 also includes “self-dialogs,” in which workers wrote out both sides of an imagined dialog based on an assigned scenario. The resulting dialogs, according to the authors, are “surprisingly rich in content.”
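To illustrate the labeling approach, here is a minimal sketch of what an annotated utterance might look like. The record layout and label names (`movie.name`, `movie.showtime`, `movie.num_tickets`) are hypothetical, not the dataset's exact schema; the point is that only spans corresponding to API argument values are marked.

```python
# Hypothetical, simplified record in the spirit of Taskmaster-1's
# annotation scheme: only API argument values inside the utterance
# are labeled, not full dialog acts. Label names are illustrative.
utterance = {
    "speaker": "USER",
    "text": "Two tickets for Inception at 7:30 pm, please.",
    "segments": [
        {"value": "Two", "label": "movie.num_tickets"},
        {"value": "Inception", "label": "movie.name"},
        {"value": "7:30 pm", "label": "movie.showtime"},
    ],
}

def extract_arguments(utt):
    """Collect the labeled API argument values from one utterance."""
    return {seg["label"]: seg["value"] for seg in utt["segments"]}

args = extract_arguments(utterance)
# args now maps each argument label to the span of user text,
# e.g. "movie.name" -> "Inception"
```

A downstream slot-filling model would be trained to recover exactly this mapping from the raw text.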
The team contrasted their Taskmaster-1 dataset with MultiWOZ, a benchmark dialog dataset. Taskmaster-1 has around 13,000 dialogs compared with MultiWOZ’s roughly 10,000, and contains almost 10x as many named entities. When several deep-learning NLP models were trained on each dataset, the models trained on Taskmaster-1 yielded higher perplexities and lower BLEU scores than those trained on MultiWOZ, which suggests the Taskmaster-1 dialogs are harder to model; therefore, chatbots that master these dialogs should be able to handle a wider range of human interaction.
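For readers unfamiliar with the metric, perplexity is the exponential of the average negative log-likelihood a model assigns to held-out text; higher perplexity means the text is harder to predict. A minimal sketch (the probability values below are made up for illustration):

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-likelihood,
    where token_probs are the model-assigned probabilities of
    each token in a held-out utterance."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A model that is confident about each token scores low perplexity;
# one that is uncertain (as on harder dialogs) scores high.
confident = perplexity([0.9, 0.8, 0.9, 0.85])
uncertain = perplexity([0.2, 0.1, 0.3, 0.15])
# A uniform 0.5 probability per token gives a perplexity of exactly 2.0.
```

Higher perplexity on Taskmaster-1 is thus evidence that its dialogs carry more unpredictable, varied language than MultiWOZ's.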
Besides booking movie tickets and restaurant reservations, digital assistants are often asked to recommend movies, restaurants, or other entertainment. Current dialog systems based on common filtering metadata often require “unnatural or tedious dialog” to determine users’ preferences in order to make useful recommendations; for example, movie-goers’ preferences are often based on subjective details such as plot or levels of violence instead of objective facts such as directors or genres. As described in another conference paper, the CCPE dataset is intended to capture how “real people describe their preferences when encouraged to do so naturally in a conversational setting.” The dataset, like Taskmaster-1, consists of dialogs between a trained human agent and a crowd-sourced user. The agent was given a set of questions to ask in order to identify the user’s movie preferences:
- What sort of movies does the user like?
- What is an example of a liked movie?
- What in particular was appealing?
- What is an example of a disliked movie?
- What in particular was not appealing?
- Select example movies, and for each:
  - Has the user heard of / seen it?
  - If so, what are similar preferences?
The dialogs focus on preference elicitation instead of slot-filling. The resulting dataset comprises 502 conversations containing 11,972 utterances with 15,646 annotations. The annotations include anchor items, which are the names of movies, people, or other entities, and preferences expressed by the user about those anchor items.
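The anchor/preference structure can be sketched as follows. The field names and annotation types here are illustrative, not the dataset's exact schema; the idea is that each preference statement is tied back to a named anchor entity.

```python
# Illustrative sketch (not the CCPE release format): each annotation
# either names an anchor item or records a preference about one.
annotations = [
    {"type": "ENTITY_NAME", "anchor": "The Dark Knight"},
    {"type": "PREFERENCE", "anchor": "The Dark Knight",
     "text": "I loved how intense the plot was"},
    {"type": "ENTITY_NAME", "anchor": "Christopher Nolan"},
]

def preferences_for(anchor, anns):
    """Return the preference statements attached to one anchor item."""
    return [a["text"] for a in anns
            if a["type"] == "PREFERENCE" and a["anchor"] == anchor]

likes = preferences_for("The Dark Knight", annotations)
```

Linking subjective statements ("intense plot") to concrete entities is what lets a recommender reason from naturally expressed preferences rather than metadata filters alone.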