Adoption of emerging technologies

“Some people worry that artificial intelligence will make us feel inferior, but then, anybody in his right mind should have an inferiority complex every time he looks at a flower.”

— Alan Kay

This chapter is about the emerging technologies starting to enter our world. Here I look at IBM's Watson, the deep learning work being done by Google and others, and the related techniques of predictive analytics and machine learning.

Getting to Know Smart Machines

As we look at AI systems, we tend to want them to fail. It would be easier for us emotionally and philosophically if we lived in a world in which we were smarter than our machines.

The problem with this view is that much may be gained from the world of intelligent machines. They help us solve a wide range of problems, and they do so in a way that avoids the biases and prejudices that often stand between us and the answers we need. For example:

Consider what it would be like to have a system that could look at massive volumes of text, apply thousands of rules that link together questions and possible answers, and build up evidence for believing an idea. That’s Watson.

Imagine starting with an incredibly detailed set of features that makes no sense to you and having a system recognize and categorize things in a way that does make sense. That’s deep learning.

Envision being able to anticipate, and therefore avoid, problems before they arise. That’s predictive analytics.

This chapter looks at all three.

Gathering Evidence: Watson

Watson is the only software system to have its global debut broadcast on national television. Going head-to-head against acclaimed brainiacs Ken Jennings and Brad Rutter, Watson demonstrated that it could not only be smarter than people, it could be smarter than people in an arena that seemed to define smartness. Even more important, it was smarter using techniques that seemed uncannily similar to the way our own minds work.

Watson is an evidence engine. Starting with a question, a list of symptoms, or a set of financial goals, it provides an answer, a diagnosis, or advice by building up an argument for the truth of competing responses. It fires off thousands of rules that map its information needs onto patterns of answers in the text that it reads. Each rule has a weight associated with it so that rules with the same answer can reinforce each other while rules with different answers compete. At the end of it all, the answers with the best overall value bubble to the top.

Watson starts with language and has a wide range of techniques for mapping questions and queries onto a focus for its own reasoning. These include rules of syntax (such as knowing that the subject of a verb comes before it), semantic components (such as knowing that France is a country), and some special rules for certain domains such as knowing that diagnostic queries consist of lists of symptoms. These techniques allow Watson to determine its focus.

When looking at a question like “What is the best financial instrument for long‐term retirement planning?” Watson understands that “what” means it is looking for a thing, “financial instrument” defines the class of things it is looking for, and “long‐term retirement planning” defines the role this object has to play. This is the focus.
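To make that concrete, here is a minimal sketch of what such a focus might look like as a data structure. The field names and the hand-built values are illustrative assumptions, not Watson's actual internals.

```python
# Illustrative only: a hypothetical representation of a question "focus".
# Watson's real analysis is far richer; these fields are assumptions.
from dataclasses import dataclass

@dataclass
class Focus:
    answer_type: str   # the kind of thing being asked for ("what" -> a thing)
    answer_class: str  # the class the answer must belong to
    role: str          # the role the answer must play

question = "What is the best financial instrument for long-term retirement planning?"

# A hand-built focus for this question; a real system would derive it
# from syntactic and semantic analysis of the text.
focus = Focus(
    answer_type="thing",
    answer_class="financial instrument",
    role="long-term retirement planning",
)

print(focus)
```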

Watson then applies rules for finding information in the corpus of text it has available. These rules look for patterns in the text that link elements of the query to possible answers. Each rule, with the information from the focus, is sent out to find patterns in the text and propose an answer.

The weights for each of the rules that point to any given answer are summed up, giving the score for each possible answer. The answer with the highest score wins.

Some rules may match against patterns where the answer slot is filled by "Roth IRA," while others match against text where the answer seems to be "401(k)." Depending on how many rules provide evidence for each answer, and on what their weights are, one of these ends up the winner.
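A toy sketch of this evidence-weighting idea appears below. It assumes each fired rule simply proposes a candidate answer along with its weight; the rules, weights, and candidates are invented for illustration and are not drawn from Watson itself.

```python
from collections import defaultdict

# Each "rule" here is reduced to (proposed_answer, weight); in a real system
# a rule would match a text pattern against the focus before proposing anything.
rule_firings = [
    ("Roth IRA", 0.8),
    ("401(k)", 0.6),
    ("Roth IRA", 0.5),
    ("401(k)", 0.9),
    ("savings account", 0.2),
]

# Sum the weights of all rules pointing at each candidate answer.
scores = defaultdict(float)
for answer, weight in rule_firings:
    scores[answer] += weight

# The answer with the highest total evidence wins.
best = max(scores, key=scores.get)
print(dict(scores))      # totals per candidate answer
print("winner:", best)   # "401(k)" has the most total evidence here
```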

Watson has a learning component. It learns the weights of its rules by looking at questions with known answers and adjusting each rule's weight according to how well that rule leads to the correct answer. In effect, Watson learns how well each piece of its own reasoning is working and rewards the pieces that work best.
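One simple way to picture that self-tuning is sketched below: rules that voted for the known correct answer get their weights nudged up, and rules that voted for something else get nudged down. The update scheme is a deliberate simplification, not Watson's actual training procedure.

```python
# Illustrative weight update: reward rules that voted for the known correct
# answer, penalize rules that voted for something else.
def update_weights(weights, votes, correct_answer, learning_rate=0.1):
    """weights: dict rule_id -> weight; votes: dict rule_id -> proposed answer."""
    for rule_id, proposed in votes.items():
        if proposed == correct_answer:
            weights[rule_id] += learning_rate
        else:
            weights[rule_id] -= learning_rate
        weights[rule_id] = max(weights[rule_id], 0.0)  # keep weights non-negative
    return weights

weights = {"pattern_rule_1": 0.8, "pattern_rule_2": 0.6, "pattern_rule_3": 0.5}
votes = {"pattern_rule_1": "Roth IRA", "pattern_rule_2": "401(k)", "pattern_rule_3": "Roth IRA"}
print(update_weights(weights, votes, correct_answer="Roth IRA"))
```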

Although Watson uses rules, it uses them in a very different way than the “if‐then” model we often think of when considering AI. Watson makes its inferences by building up and evaluating evidence. When Watson learns, it doesn’t learn about the world. It learns about itself.

The only caveat about these types of systems is that, although they are quite adept at explaining the world around them, they still struggle to explain themselves. To truly partner with us, they need to learn to talk to us about the method behind their madness in a language we can understand.

Networking: Deep Learning

While Watson is trying to infer answers to questions by looking for evidence in vast stores of text, companies such as Google and Facebook are trying to recognize and categorize objects, words, and relationships by taking huge feature sets and learning to assess them.

In order to index images effectively (“this is a house” rather than “this is a collection of pixels”) or recognize words in audio signals, you need to determine how the initial features can be mapped to more complex features. For example, you need to map pixels or waveforms to lines, curves, or the sounds of “p” and “q.” You have to transform features that are too detailed into more recognizable features that support indexing and inference. This is deep learning.

Deep learning is based on reasoning using neural nets. This kind of learning makes use of layers of input nodes sending signals to a series of internal layers of nodes (called hidden layers), each of which sends signals to the next layer until the output layer is reached. No single node does all the work, and the network as a whole produces the result for any given input. Work in deep learning is inspired by the layering of computation that takes place in the cerebral cortex of the human brain.

Each of the connections between the nodes has a weight associated with it that is adjusted during learning. On the input side we might have all the pixel values of an image with output values that stand for a category like “cat” or “house.” If the output determined by the passing of values through these links is not the same as the output value set by the category, each node failing to match sends a signal back indicating that there was an error and the weights on the relevant links must change.

Over time, these tiny changes steer the network toward the set of weights that enables the network to correctly assess that a new input is in the appropriate category. The activations sent from one side of the network result in the right values at the other end.
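The sketch below shows that weight-adjustment loop on the smallest possible scale: a one-hidden-layer network trained by backpropagation on the classic XOR problem. Real deep learning stacks many more layers over far larger inputs, but the mechanism is the same in spirit.

```python
import numpy as np

# A minimal one-hidden-layer network trained by backpropagation on a toy
# problem (XOR). The core idea -- error signals nudging link weights -- is
# the same one used at much larger scale in deep learning.
rng = np.random.default_rng(0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # input patterns
y = np.array([[0], [1], [1], [0]], dtype=float)              # target category

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> hidden layer
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output layer

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(10000):
    # Forward pass: activations flow from the input layer to the output layer.
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)

    # Backward pass: mismatches between output and target send an error
    # signal back, and the weights on the relevant links change a little.
    d_out = (y - output) * output * (1 - output)
    d_hid = (d_out @ W2.T) * hidden * (1 - hidden)
    W2 += 0.5 * hidden.T @ d_out
    b2 += 0.5 * d_out.sum(axis=0)
    W1 += 0.5 * X.T @ d_hid
    b1 += 0.5 * d_hid.sum(axis=0)

print(np.round(output, 2))  # converges toward [[0], [1], [1], [0]]
```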

Deep learning systems learn in either supervised or unsupervised modes:

In supervised mode, the network is taught with examples used as input while the output layer is locked down to values associated with a category. The output layer is exactly that: the output.

In unsupervised mode, both the input and the output are locked down to the example being processed. The inner layers are more compressed than the outer layers so the network has to learn to compress its features such that they can still be used to represent the example. Here the inner or hidden layers end up being the output.

With enough time and enough examples, deep learning systems can learn new features that support inference by combining lower-level features that cannot do so on their own.
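As a companion to the supervised sketch above, here is a rough illustration of the unsupervised mode: both the input and the output are clamped to the same example, and a narrow hidden layer is forced to learn a compressed code for it. The 8-3-8 setup is a classic textbook exercise, chosen here for brevity rather than realism.

```python
import numpy as np

# Unsupervised (autoencoder-style) training: each example is its own target,
# and 8 one-hot patterns are squeezed through 3 hidden units, forcing the
# hidden layer to learn a compressed representation.
rng = np.random.default_rng(1)
X = np.eye(8)                                  # input and target are the same

W1, b1 = rng.normal(scale=0.5, size=(8, 3)), np.zeros(3)
W2, b2 = rng.normal(scale=0.5, size=(3, 8)), np.zeros(8)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(20000):
    code = sigmoid(X @ W1 + b1)                # compressed hidden-layer features
    recon = sigmoid(code @ W2 + b2)            # attempted reconstruction of input
    d_out = (X - recon) * recon * (1 - recon)
    d_hid = (d_out @ W2.T) * code * (1 - code)
    W2 += 0.5 * code.T @ d_out
    b2 += 0.5 * d_out.sum(axis=0)
    W1 += 0.5 * X.T @ d_hid
    b1 += 0.5 * d_hid.sum(axis=0)

# The hidden-layer activations are the learned, compressed features.
print(np.round(code, 2))
```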

The only downside of using techniques such as these is that they are somewhat impenetrable. It is hard for these systems to report on the features they have discovered. They lack the crucial ability to explain themselves, which means they may end up making decisions for you based on analysis that you can never see or understand.

Anticipating the Outcome: Predictive Analytics

Just as Watson is aimed at inference and deep learning is aimed primarily at assessment, predictive analytics is aimed at the third piece of our practical AI puzzle: prediction.

The three integral components of AI are:

Assessment

Inference

Prediction

But prediction is the prize!

Like both Watson and deep learning, work in predictive analytics leverages learning. The goal of the work is to craft a relationship between some set of visible features and outcomes. We start with what we want to know, such as when a cell phone user is going to drop our service, when someone is laundering money, or when someone is going to default on a loan. We want to identify the features that can predict (or perhaps explain) these events so that we can anticipate them.

The work of predictive analytics makes use of a wide variety of techniques. On the more formal side, systems use techniques such as regression analysis, aimed at crafting a mathematical model that links a source (the features you see) to a target (the feature you care about). For situations where a single metric is tracked over time, time series analysis allows a system to project that metric's path into the future based on its past behavior. And we often see machine learning techniques such as Naïve Bayes, in which individual features and their predictive power are combined to produce a single result, or even neural nets like those used in deep learning.
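To ground the machine-learning flavor of this, here is a small sketch using scikit-learn's Naïve Bayes classifier on made-up churn data. The features (minutes used, support calls, months on contract) and every value in the table are invented for illustration.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Invented training data: each row is a customer described by features we can
# see (monthly minutes used, support calls, months on contract), and the label
# is the outcome we want to predict (1 = dropped the service, 0 = stayed).
X = np.array([
    [620, 1, 36], [110, 6, 4], [540, 0, 28], [90, 8, 3],
    [700, 2, 40], [150, 5, 6], [480, 1, 22], [130, 7, 5],
])
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])

# Naive Bayes combines each feature's evidence about the outcome
# into a single prediction.
model = GaussianNB().fit(X, y)

# Predict for a new customer: low usage, many support calls, short tenure.
new_customer = np.array([[120, 6, 4]])
print(model.predict(new_customer))        # e.g. [1] -> likely to churn
print(model.predict_proba(new_customer))  # class probabilities
```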

Although some of these techniques have their roots in statistics and others in artificial intelligence, the difference between them is primarily the words used to describe them. In statistics, you use examples to calculate the probabilities that link features together. In AI, you learn them. But for both, you end up with a link between features you know and a prediction you want to make.

Predictive analytics is useful in a wide variety of places: for example, fraud detection, customer retention, risk management, direct marketing, and even cross‐selling. The core techniques can be used anywhere there is historical data that includes both features that you know and features that you want predicted.

The issue of applicability comes down to whether you have the data and whether a real relationship exists between what you have and what you need to know. With the appropriate data, companies reap tremendous benefits from applying predictive analytics to their business goals. But, and this is crucial, you must have the data.

These techniques, as with all of the others I have discussed, depend on the volume, quality, and appropriateness of the data. They are not magic; they are simply the intelligent application of techniques against data.
