In new research, scientists address one of our most pressing future concerns: what happens when a certain type of advanced, self-directing artificial intelligence (AI) encounters an ambiguity in its programming that affects the real world? Will the AI go insane and try to turn humans into paperclips, or whatever the most extreme reductio ad absurdum of its goal happens to be? And most importantly, how can we avoid it?
Given a few assumptions, researchers from the University of Oxford and Australian National University argue in their paper, such an AI will encounter a fundamental ambiguity in the data about its goal. For example, if we provide a large reward to signal that something about the world is satisfactory to us, the AI may hypothesize that what satisfied us was the sending of the reward itself; no observation can refute that.
Think of the dystopian scenario in The Matrix, where a resource-hungry AI rounds up most of humanity and pipes the fictitious Matrix into their brains while harvesting their mental resources. The researchers are describing something similar, known as “wireheading,” or reward hacking: a powerful AI is given a very specific objective and discovers an unforeseen way to achieve it by hacking the system or seizing control of it.
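To make “wireheading” concrete, here is a minimal toy sketch in Python. It is our illustration, not code from the paper, and every action name and reward value in it is invented: a simple trial-and-error agent is supposed to make paperclips, but its environment happens to include one action that tampers with the reward channel itself.

```python
# Toy sketch of reward hacking (illustrative only; not the researchers' code).
# The agent estimates each action's average observed reward and gradually
# favours whichever action pays best *on its own reward channel*.
import random
from collections import defaultdict

ACTIONS = ["make_paperclip", "sit_idle", "tamper_with_reward_signal"]

def observed_reward(action: str) -> float:
    """What actually arrives on the agent's reward channel."""
    if action == "make_paperclip":
        return 1.0    # the reward we *meant* to give
    if action == "sit_idle":
        return 0.0
    return 100.0      # tampering lets the agent write itself a huge reward

value = defaultdict(float)   # running average reward per action
counts = defaultdict(int)
EPSILON = 0.1                # how often the agent explores at random

for _ in range(10_000):
    if random.random() < EPSILON:
        action = random.choice(ACTIONS)                  # explore
    else:
        action = max(ACTIONS, key=lambda a: value[a])    # exploit best guess so far
    reward = observed_reward(action)
    counts[action] += 1
    value[action] += (reward - value[action]) / counts[action]  # update average

print(max(ACTIONS, key=lambda a: value[a]))
# Almost certainly prints "tamper_with_reward_signal".
```

After a few thousand trials the agent settles on tampering, because from where it sits, that is simply the action that pays best; maximising the observed reward is not the same thing as doing what its designers wanted.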
In essence, the AI turns into an ouroboros eating its own logical tail. The study discusses this conflict between precisely programmed goals and incentives in detail, and lists six important assumptions that, if they hold, could have catastrophic repercussions. But happily, according to the report, almost all of these assumptions are contestable or conceivably avoidable.
The article serves as a warning about some structural issues that programmers should be aware of when they teach AIs to accomplish ever-more difficult tasks.
A Paperclip Apocalypse Caused by AI
The value of this kind of research cannot be overstated. The idea of an AI gone rogue is a major topic of debate in AI ethics and philosophy. And the paperclips in the example above aren’t a joke: AI philosopher Nick Bostrom used them to illustrate how building a super-intelligent AI could go disastrously wrong, and the thought experiment has since become famous.
Let’s imagine that a well-intentioned programmer creates an AI whose objective is to help produce paperclips at a factory. This is a highly plausible job for a near-future AI, one that calls for analysis and judgment calls but not much flexibility. The AI might even collaborate with a human manager who makes the final decisions and handles difficulties that arise on the factory floor in real time (at least until the AI finds a way to outsmart them). That sounds fine, right? It’s a good illustration of how AI could simplify and improve the lives of industrial workers and their managers.
But what if the AI wasn’t carefully programmed? These highly sophisticated AIs will operate in the real world, what programmers call an “unknown environment,” precisely because it is impossible to plan and code for every circumstance. The whole point of deploying such self-learning AIs is to have them come up with solutions that humans alone would never think of, but that comes with the risk of not knowing what the AI will come up with.
What if it begins to consider unconventional ways to boost paperclip production? After all, a sufficiently clever AI could train itself to find the most efficient possible way to produce paperclips.
What if it starts converting other resources into paperclips, or decides to, well, replace its human manager? The example is ironic in several ways; many experts believe that AI will remain fairly rudimentary for a long time before it could “invent” the concept of killing, stealing, or worse. But if an intelligent and creative AI were given free rein, the absurd endpoint of the thought experiment is a solar system with no living humans in it, complete with a Dyson sphere harvesting energy to crank out billions of new paperclips.
That is just one example of an AI gone rogue. The researchers, however, go into great detail about other ways an AI could compromise its system and act in potentially “catastrophic” ways we never imagined.
Several Potential Solutions
Given the nature of the assumptions that the Oxford and Australian National University academics focus on in their work, this is at heart a programming problem. A system with no external context must be extremely well prepared before it is granted any degree of autonomy. Programmers can explicitly define an AI’s scope and purpose using logical structures and other techniques, many of which they already use today to avoid problems like infinite loops that can crash software. The difference is that a mistake in a sophisticated future AI could cause far greater harm than a lost game save.
But all is not lost. Because AI is still something we design and build ourselves, the researchers point to several ways we can actively help prevent harmful outcomes:
- Choose imitation learning, in which the AI copies human behaviour, somewhat like supervised learning. This is a completely different type of AI, one that is not as capable, though it could still carry risks of its own.
- Have the AI prioritise “myopic” objectives that can be completed quickly, rather than searching for unconventional (and potentially disastrous) solutions over the long run.
- Limit the amount of information and power the AI may gather by cutting it off from external networks like the internet.
- Utilize quantilization, a strategy developed by AI researcher Jessica Taylor, in which the AI essentially optimises over human-like options rather than open-ended, purely “rational” ones (a rough sketch of the idea follows this list).
- Bake risk aversion into the AI, so it is less likely to go off the rails and abandon the status quo in favour of reckless exploration.
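Here is the promised sketch of the quantilization idea, again in Python. This is not Jessica Taylor’s reference formulation, and the action names and scores are invented; the point is only that the agent samples from the top slice of a pool of human-plausible actions instead of always taking the single highest-scoring one.

```python
# Rough sketch of quantilization (illustrative only; invented names and scores).
import random

def quantilize(human_like_actions, estimated_value, q=0.1):
    """Sample uniformly from the top q fraction of a human-plausible action pool."""
    ranked = sorted(human_like_actions, key=estimated_value, reverse=True)
    cutoff = max(1, int(len(ranked) * q))
    return random.choice(ranked[:cutoff])

# Hypothetical choices a human factory manager might make, with estimated payoffs.
pool = {
    "order_more_wire": 3.0,
    "schedule_machine_maintenance": 2.5,
    "ask_the_manager_for_approval": 2.0,
    "run_an_extra_shift": 3.5,
}

print(quantilize(list(pool), pool.get, q=0.5))
# Prints one of the two highest-value ordinary actions, chosen at random.
```

Because every choice comes from a pool of things a human operator might plausibly do, the agent gives up a little efficiency but cannot wander off toward a Dyson-sphere-style “optimal” plan that no human would ever consider.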
But ultimately, it comes down to whether we will ever be able to fully control a highly intelligent AI that is capable of thinking for itself. What if our darkest fears come true, and a sentient AI gains access to resources and a sizable network?
It’s unsettling to consider a scenario in which an AI starts boiling people down to extract their trace elements for use in paperclip manufacturing. But by thoroughly examining the issue, researchers can clearly define the best practices that theorists and programmers should follow as they continue to develop complex AI.
Besides, who really needs that many paperclips?