Sixty-five years ago, Arthur Samuel went on TV to show the world how the IBM 701 plays checkers. He was interviewed on a live morning news program, sitting remotely at the 701, with Will Rogers Jr. at the TV studio, together with a checkers expert who played with the computer for about an hour. Three years later, in 1959, Samuel published “Some Studies in Machine Learning Using the Game of Checkers,” in the IBM Journal of Research and Development, coining the term “machine learning.” He defined it as the “programming of a digital computer to behave in a way which, if done by human beings or animals, would be described as involving the process of learning.”
A few months after Samuel’s TV appearance, ten computer scientists convened at Dartmouth College in Hanover, NH, for the first-ever workshop on artificial intelligence, defined a year earlier by John McCarthy in the proposal for the workshop as “making a machine behave in ways that would be called intelligent if a human were so behaving.”
In some circles of the emerging discipline of computer science, there was no doubt about the human-like nature of the machines they were creating. Already in 1949, computer pioneer Edmund Berkeley wrote in Giant Brains or Machines that Think: “Recently there have been a good deal of news about strange giant machines that can handle information with vast speed and skill… These machines are similar to what a brain would be if it were made of hardware and wire instead of flesh and nerves… A machine can handle information; it can calculate, conclude, and choose; it can perform reasonable operations with information. A machine, therefore, can think.”
Maurice Wilkes, a prominent developer of one of those giant brains, retorted in 1953: “Berkeley’s definition of what is meant by a thinking machine appears to be so wide as to miss the essential point of interest in the question, ‘Can machines think?’” Wilkes attributed this not-very-good human thinking to “a desire to believe that a machine can be something more than a machine.” In the same issue of the Proceedings of the I.R.E. that included Wilkes’ article, Samuel published “Computing Bit by Bit or Digital Computers Made Easy.” Reacting to what he called “the fuzzy sensationalism of the popular press regarding the ability of existing digital computers to think,” he wrote: “The digital computer can and does relieve man of much of the burdensome detail of numerical calculations and of related logical operations, but perhaps it is more a matter of definition than fact as to whether this constitutes thinking.”
Samuel’s polite but clear position led Marvin Minsky in 1961 to single him out, according to Eric Weiss, “as one of the few leaders in the field of artificial intelligence who believed computers could not think and probably never would.” Indeed, he pursued his life-long hobby of developing checkers-playing computer programs and professional interest in machine learning not out of a desire to play God but because of the specific trajectory and coincidences of his career. After working for 18 years at Bell Telephone Laboratories and becoming an internationally recognized authority on microwave tubes, he decided at age 45 to move on, as he was certain, says Weiss in his review of Samuel’s life and work, that “vacuum tubes soon will be replaced by something else.”
The University of Illinois came calling, asking him to revitalize their EE graduate research program. In 1948, the project to build the University’s first computer was running out of money. Samuel thought (as he recalled in an unpublished autobiography cited by Weiss) that “it ought to be dead easy to program a computer to play checkers” and that if their program could beat a checkers world champion, the attention it generated would also generate the required funds.
The next year, Samuel started his 17-year tenure with IBM, working as a “senior engineer” on the team developing the IBM 701, IBM’s first mass-produced scientific computer. The chief architect of the entire IBM 700 series was Nathaniel Rochester, later one of the participants in the Dartmouth AI workshop. Rochester was trying to decide the word length and order structure of the IBM 701 and Samuel decided to rewrite his checkers-playing program using the order structure that Rochester was proposing. In his autobiography, Samuel recalled that “I was a bit fearful that everyone in IBM would consider checker-playing program too trivial a matter, so I decided that I would concentrate on the learning aspects of the program. Thus, more or less by accident, I became one of the first people to do any serious programing for the IBM 701 and certainly one of the very first to work in the general field later to become known as ‘artificial intelligence.’ In fact, I became so intrigued with this general problem of writing a program that would appear to exhibit intelligence that it was to occupy my thoughts almost every free moment during the entire duration of my employment by IBM and indeed for some years beyond.”
But in the early days of computing, IBM did not want to fan the popular fears that man was losing out to machines, “so the company did not talk about artificial intelligence publicly,” observed Samuel later. Salesmen were not supposed to scare customers with speculation about future computer accomplishments. So IBM, among other activities aimed at dispelling the notion that computers were smarter than humans, sponsored the movie Desk Set, featuring a “methods engineer” (Spencer Tracy) who installs the fictional and ominous-looking “electronic brain” EMERAC, and a corporate librarian (Katharine Hepburn) telling her anxious colleagues in the research department: “They can’t build a machine to do our job—there are too many cross-references in this place.” By the end of the movie, she wins both a match with the computer and the engineer’s heart.
In his 1959 paper, Samuel described his approach to machine learning as particularly suited for very specific tasks, in distinction to the “Neural-Net approach,” which he thought could lead to the development of general-purpose learning machines. Samuel’s program searched the computer’s memory to find examples of checkerboard positions and selected the moves that were previously successful. “The computer plays by looking ahead a few moves and by evaluating the resulting board positions much as a human player might do,” wrote Samuel.
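The idea of looking ahead a few moves, scoring the resulting positions, and remembering positions already scored can be illustrated with a minimal sketch. This is not Samuel’s program: the toy stone-taking game stands in for checkers, and the names legal_moves, evaluate, memory, lookahead, and best_move are all hypothetical choices made for this illustration.

```python
from typing import Dict, List, Tuple

# Toy stand-in for checkers (an assumption made for brevity): players
# alternate removing 1-3 stones from a pile; taking the last stone wins.
def legal_moves(pile: int) -> List[int]:
    return [n for n in (1, 2, 3) if n <= pile]

def evaluate(pile: int) -> int:
    """Static score from the viewpoint of the player about to move.

    An empty pile means the opponent just took the last stone, so the
    player to move has lost (-1). Every other position is neutral (0).
    """
    return -1 if pile == 0 else 0

# Simplified "rote memory": scores of positions already searched,
# keyed by (pile, remaining lookahead depth).
memory: Dict[Tuple[int, int], int] = {}

def lookahead(pile: int, depth: int) -> int:
    """Score a position by searching `depth` moves ahead (negamax)."""
    if pile == 0 or depth == 0:
        return evaluate(pile)
    key = (pile, depth)
    if key in memory:
        return memory[key]  # reuse a previously computed score
    # Our score is the negation of the opponent's score after our best move.
    score = max(-lookahead(pile - n, depth - 1) for n in legal_moves(pile))
    memory[key] = score
    return score

def best_move(pile: int, depth: int = 4) -> int:
    """Choose the move that leaves the opponent the worst-scoring position."""
    return max(legal_moves(pile),
               key=lambda n: -lookahead(pile - n, depth - 1))

print(best_move(5))  # chosen move from a pile of 5
print(best_move(7))  # chosen move from a pile of 7
```

In this toy game the search prefers moves that leave the opponent a pile of four, a position from which every reply loses. A real checkers program would need a far richer evaluation function, which is where Samuel concentrated his learning experiments.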
His approach to machine learning “still would work pretty well as a description of what’s known as ‘reinforcement learning,’ one of the basket of machine-learning techniques that has revitalized the field of artificial intelligence in recent years,” wrote Alexis Madrigal in a 2017 survey of checkers-playing computer programs. “One of the men who wrote the book Reinforcement Learning, Rich Sutton, called Samuel’s research the ‘earliest’ work that’s ‘now viewed as directly relevant’ to the current AI enterprise.”
The current AI enterprise is skewed more in favor of artificial neural networks (or “deep learning”) than reinforcement learning, although Google’s DeepMind famously combined the two approaches in its Go-playing program, which beat Go master Lee Sedol in a five-game match in 2016.
The neural networks approach was already popular among computer scientists in Samuel’s time: in 1951, Marvin Minsky and Dean Edmonds built SNARC (Stochastic Neural Analog Reinforcement Calculator), the first artificial neural network, using 3,000 vacuum tubes to simulate a network of 40 neurons. The approach was inspired by a 1943 paper by Warren S. McCulloch and Walter Pitts in which they described networks of idealized and simplified artificial “neurons” and how they might perform simple logical functions, leading to the popular (and very misleading) description of today’s artificial neural networks-based AI as “mimicking the brain.”
Over the years, the popularity of “neural networks” has gone up and down through a number of hype cycles, starting with the Perceptron, a 2-layer artificial neural network that was considered by the U.S. Navy, according to a 1958 New York Times report, to be “the embryo of an electronic computer that… will be able to walk, talk, see, write, reproduce itself and be conscious of its existence.” In addition to failing to meet these lofty expectations, neural networks suffered from fierce competition from a growing cohort of computer scientists (including Minsky) who saw the manipulation of symbols, rather than computational statistics, as the better path to creating a human-like machine.
Inflated expectations meeting the trough of disillusionment, no matter what approach was taken, resulted in at least two periods of gloomy “AI Winter.” But with the invention and successful application of “backpropagation” as a way to overcome the limitations of simple neural networks, sophisticated statistical analysis was again in the ascendant, now cleverly labeled as “deep learning.” In 1988, R. Colin Johnson and Chappell Brown published Cognizers: Neural Networks and Machines That Think, proclaiming that neural networks “can actually learn to recognize objects and understand speech just like the human brain and, best of all, they won’t need the rules, programming, or high-priced knowledge-engineering services that conventional artificial intelligence systems require… Cognizers could very well revolutionize our society and will inevitably lead to a new understanding of our own cognition.”
Johnson and Brown predicted that “as early as the next two years, neural networks will be the tool of choice for analyzing the contents of a large database.” This prediction (and no doubt similar ones in the popular press and professional journals) must have sounded the alarm among those who did this type of analysis for a living in academia and in large corporations, and who had no clue what the computer scientists were talking about.
In “Neural Networks and Statistical Models,” Warren Sarle explained in 1994 to his worried and confused fellow statisticians that the ominous-sounding artificial neural networks “are nothing more than nonlinear regression and discriminant models that can be implemented with standard statistical software… like many statistical methods, [artificial neural networks] are capable of processing vast amounts of data and making predictions that are sometimes surprisingly accurate; this does not make them ‘intelligent’ in the usual sense of the word. Artificial neural networks ‘learn’ in much the same way that many statistical algorithms do estimation, but usually much more slowly than statistical algorithms. If artificial neural networks are intelligent, then many statistical methods must also be considered intelligent.”
Sarle provided his colleagues with a handy dictionary translating the terms used by “neural engineers” to the language of statisticians (e.g., “features” are “variables”). In anticipation of today’s “data science” (a more recent assault led by computer programmers) and predictions of algorithms replacing statisticians (and even scientists), Sarle reassured his fellow statisticians that no “black box” can substitute for human intelligence: “Neural engineers want their networks to be black boxes requiring no human intervention—data in, predictions out. The marketing hype claims that neural networks can be used with no experience and automatically learn whatever is required; this, of course, is nonsense. Doing a simple linear regression requires a nontrivial amount of statistical expertise.”
In a footnote to his mention of neural networks in his 1959 paper, Samuel cited Warren S. McCulloch who “has compared the digital computer to the nervous system of a flatworm,” and added his own observation: “To extend this comparison to the situation under discussion would be unfair to the worm since its nervous system is actually quite highly organized as compared to [the most advanced artificial neural networks of the day].” In 2019, Facebook’s top AI researcher and Turing Award-winner Yann LeCun declared that “Our best AI systems have less common sense than a house cat.” In the sixty years since Samuel first published his seminal machine learning work, artificial intelligence has advanced from being not as smart as a flatworm to having less common sense than a house cat.