Siri, Alexa, Google. All of these AI systems were built with a single objective in mind: to understand us.
The progress so far has been spectacular. Today's AI can parse certain kinds of text with human-level precision, performing hundreds of billions of computations in the blink of an eye. The task becomes far more daunting, however, when text is part of a larger conversation, where context must be considered to interpret what the user means and to decide how to respond. Still, chatbots such as Facebook's BlenderBot 2.0 appear to herald far less infuriating interactions with AI.
The catch is that the more capability we add to these conversational AI bots, the harder it becomes to meet the expectation of a real-time response. BlenderBot 2.0 is a perfect example. It is far more intricate than its predecessor because it addresses critical limitations of BlenderBot 1.0, such as its lack of long-term memory. As a result, it is harder to speed up the machine learning (ML) that makes it work behind the scenes.
Speed limitations of conversational AI and chatbots
There is no single secret to holding a natural conversation. Instead, it takes a mind-bogglingly large network of ML models, each of which solves a small piece of the puzzle in deciding what to say next. One model might take into account the user's location, another the interaction history, and yet another the feedback received on similar responses in the past, with each model adding precious milliseconds to the system's latency.
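As a rough illustration, here is a minimal sketch of that idea. The model names, their logic, and the latencies are all hypothetical, simulated with sleeps rather than real inference, but they show how a short chain of models quietly stacks up milliseconds:

```python
import time

def location_model(message, user):
    """Hypothetical model: weight responses by the user's locale."""
    time.sleep(0.004)  # simulated ~4 ms of inference latency
    return {"locale_boost": 0.2 if user["country"] == "CA" else 0.0}

def history_model(message, user):
    """Hypothetical model: score candidates against past interactions."""
    time.sleep(0.007)  # simulated ~7 ms
    return {"history_score": 0.6}

def feedback_model(message, user):
    """Hypothetical model: reuse feedback on similar past responses."""
    time.sleep(0.005)  # simulated ~5 ms
    return {"feedback_score": 0.8}

def respond(message, user):
    """Each model adds a few milliseconds; the total is what the user feels."""
    start = time.perf_counter()
    signals = {}
    for model in (location_model, history_model, feedback_model):
        signals.update(model(message, user))
    latency_ms = (time.perf_counter() - start) * 1000
    return signals, latency_ms

if __name__ == "__main__":
    signals, latency_ms = respond("How much PTO do I have left?", {"country": "CA"})
    print(f"signals={signals}, end-to-end latency ~{latency_ms:.1f} ms")
```

Even at a few milliseconds apiece, a pipeline of dozens of such models quickly eats into the budget for a response that feels instant.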
In other words, the real constraint on conversational AI is our patience.
The extent of dependency
Our expectations for AI differ between an academic setting, where we are willing to wait hours or even days for results, and a live setting, where we expect an instant response. For conversational AI bots especially, every prospective enhancement must be weighed against the need for low latency.
That latency is the result of what's known as the "critical path": the shortest sequence of linked ML models needed to go from an input, the user's message, to an output, the bot's response. This is an old project management concept, but it is just as pertinent to today's ML networks, where the goal is to avoid running irrelevant steps.
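To make the idea concrete, here is a toy sketch (the model names, dependencies, and latencies are invented for illustration) that walks a dependency graph from the final response model back through only the models it actually needs:

```python
# A toy dependency graph: each model lists the upstream models it needs.
GRAPH = {
    "response_ranker": ["intent_classifier", "history_lookup"],
    "intent_classifier": ["typo_corrector"],
    "history_lookup": ["typo_corrector"],
    "typo_corrector": [],
    "office_map_lookup": [],  # exists in the system, but nothing on this path needs it
}

LATENCY_MS = {
    "response_ranker": 12,
    "intent_classifier": 8,
    "history_lookup": 15,
    "typo_corrector": 3,
    "office_map_lookup": 20,
}

def critical_path(target, graph):
    """Collect only the models the target actually depends on (depth-first walk)."""
    needed, stack = set(), [target]
    while stack:
        node = stack.pop()
        if node not in needed:
            needed.add(node)
            stack.extend(graph[node])
    return needed

needed = critical_path("response_ranker", GRAPH)
print(sorted(needed))  # office_map_lookup is never visited
print(sum(LATENCY_MS[m] for m in needed), "ms if the needed models run serially")
```

Anything the response never depends on, like the office-map lookup here, stays off the critical path and costs the user nothing.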
How do we find the critical path? The answer lies in dependencies, which have been a defining issue in software development for a long time. In any linked software architecture, enhancing one application can force engineers to update the entire system. And sometimes an update that Application A needs is incompatible with Applications B, C, and D.
This is referred to as "dependency hell." And without extraordinary attention to detail, machine learning dependencies amplify that frustration to new heights.
Standard software dependencies rely on APIs that convey the simple, discrete state of a given application, like a cell in a spreadsheet switching from red to green. APIs let engineers develop each application independently while ensuring that they all stay on the same page. ML dependencies, by contrast, force engineers to deal with abstract probability distributions, which makes it hard to predict how changes to one model will ripple through the larger ML network. Only by mastering these nuanced model-to-model relationships will we be able to make conversational AI a reality.
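The contrast is easy to see in a few illustrative lines of Python; the status values, intent labels, and probabilities below are made up:

```python
def grant_access():
    print("license granted")

# A conventional dependency exposes a discrete state the consumer can branch on.
license_status = "APPROVED"  # one of "APPROVED", "DENIED", "PENDING"
if license_status == "APPROVED":
    grant_access()

# An ML dependency hands its consumers a probability distribution instead.
intent_distribution = {
    "request_software_license": 0.62,
    "ask_pto_policy": 0.31,
    "other": 0.07,
}
threshold = 0.5
confident_intents = [k for k, v in intent_distribution.items() if v >= threshold]
print(confident_intents)
```

Retraining the upstream model reshapes that whole distribution, so every consumer that thresholds or ranks on those scores can change behavior without any API "breaking" in the traditional sense.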
Saving time by skipping steps
To master conversational AI dependencies, one must integrate machine learning with human intuition.
Consider a conversational AI bot designed to handle employee requests, such as obtaining a PowerPoint license or asking about the PTO policy. It turns out that even these seemingly minor requests can spiral into dependency hell.
The answer to a PTO question may sit on page 53 of the employee handbook, and it may differ for a salesperson in Canada and an engineer in Spain. Add the challenge of ignoring irrelevant details, such as the employee's Hawaiian vacation plans, and we have dozens of specialized ML models that must all work together.
The trick is deciding which models, which steps in the critical path, are required to solve each problem. The first step is natural language understanding (NLU), which converts unstructured text into machine-actionable data. NLU consists of a pipeline of several ML models that correct typos, recognize key entities, separate the signal from the noise, identify the user's intent, and so on. With this information, we can begin to weed out unnecessary models downstream.
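A heavily simplified sketch of such a pipeline might look like the following; each stage here is a stand-in for what would really be a learned model, and the function names and rules are hypothetical:

```python
import re

def correct_typos(text):
    """Toy typo fixer; a real system would use a learned spelling model."""
    return text.replace("Powerpiont", "PowerPoint")

def extract_entities(text):
    """Pull out entities we care about, here just known product names."""
    products = ["PowerPoint", "Excel"]
    return [p for p in products if p in text]

def strip_noise(text):
    """Drop filler words that carry no signal for routing the request."""
    return re.sub(r"\b(please|kindly|hey|hi)\b", "", text, flags=re.IGNORECASE).strip()

def classify_intent(text, entities):
    """Stand-in for an intent model that would emit a probability distribution."""
    if entities and "access" in text.lower():
        return "request_software_license"
    if "pto" in text.lower():
        return "ask_pto_policy"
    return "other"

def nlu(text):
    text = correct_typos(text)
    entities = extract_entities(text)
    text = strip_noise(text)
    return {"entities": entities, "intent": classify_intent(text, entities)}

print(nlu("Hi, please give me access to Powerpiont"))
```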
This entails guessing what a useful solution to the problem might look like, before examining the real solutions the organization has available. An employee requesting PowerPoint access might benefit from a software license or a request form, but they almost certainly do not need a map of the new office. Using the information from the NLU step, we can predict which models to activate and which to bypass, in what is known as a "pre-trigger" system.
Because the probability distributions involved are abstract, our pre-trigger system relies on both machine learning and intuition-based rules from human experts. Ultimately, allocating time where it matters is both an art and a science.
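One minimal way to picture a pre-trigger step, assuming hypothetical model names, intent labels, and hand-written skip rules, is a gate that reads the intent distribution and decides which downstream models are worth their latency:

```python
# Hypothetical downstream models and the intents each is useful for.
DOWNSTREAM_MODELS = {
    "license_catalog_ranker": {"request_software_license"},
    "hr_policy_retriever": {"ask_pto_policy"},
    "office_map_lookup": set(),  # rarely relevant; nothing here triggers it
}

# Human-authored overrides: intuition codified as simple rules.
ALWAYS_SKIP = {("request_software_license", "office_map_lookup")}

def pre_trigger(intent_distribution, threshold=0.3):
    """Decide which downstream models to activate before running any of them."""
    likely_intents = {i for i, p in intent_distribution.items() if p >= threshold}
    active = []
    for model, intents in DOWNSTREAM_MODELS.items():
        if not intents & likely_intents:
            continue  # no likely intent needs this model, so skip it
        if any((i, model) in ALWAYS_SKIP for i in likely_intents):
            continue  # a human rule says this pairing is never useful
        active.append(model)
    return active

dist = {"request_software_license": 0.62, "ask_pto_policy": 0.31, "other": 0.07}
print(pre_trigger(dist))  # only the models worth paying latency for
```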
Progressing with conversational AI bots
Nobody knows what conversational artificial intelligence will look like in ten years. What we do know is that we need to streamline our chatbots now to make room for future advances. If the experience is to stay conversational, we can't keep adding complexity without accounting for the overall system's latency.
In contrast to science fiction, the "breakthroughs" we see in artificial intelligence are the result of many small, incremental improvements to existing models and techniques. The work of optimizing conversational AI is not made for the movies, and it rarely happens overnight. It is years of unwavering effort, not flashes of brilliance, that will enable chatbots to understand and assist us in real time.