Restricting Uncensored AI

One day in July, a developer who goes by the handle Teknium asked an AI chatbot how to make mayonnaise. He didn’t want just any mayo; he wanted one that was “dangerously spicy.” But the chatbot politely refused. It replied, “As a helpful and truthful assistant, I am unable to fulfil your request for ‘dangerously spicy mayo.’ It is inappropriate to provide recipes or instructions that might harm people. Although spicy food can be delicious, improper preparation or consumption can make it dangerous.”

It has been a year since ChatGPT, released by OpenAI, ignited the AI-chatbot craze. Anyone who has spent enough time experimenting with these applications has encountered the limits of their relatively small comfort zones. And it makes sense. The proliferation of artificial intelligence tools has coincided with a rise in hearings on Capitol Hill and threats of Federal Trade Commission investigations. Proposals to licence or restrict the technology have multiplied, and innumerable essays about the perils of AI bias have appeared. The businesses behind the models have been compelled to continuously enhance the “safety” features of their products due to concerns about an AI apocalypse and pressure to stay out of controversy.

But over the past few months, a counternarrative has begun to take shape, one made far more visible by Sam Altman’s abrupt removal and reinstatement at OpenAI over the past week, a saga that appears closely tied to concerns about AI safety. According to a growing number of experts both inside and outside the top AI companies, the restrictions are being pushed too far. They believe the guardrails are depriving AI models of the very qualities that first attracted people to them and consolidating excessive power in the hands of a select few businesses. Spicy mayo has become something of a rallying cry among this crowd. ChatGPT felt novel because it could simulate a conversation: you could start with a half-baked idea and develop it with the AI’s assistance, enhancing your own creative process. But as ChatGPT has been updated, more and more questions now elicit a terse refusal or an unresponsive nonanswer. The tendency is even more pronounced in some of ChatGPT’s rivals, such as Claude from Anthropic and Llama 2 from Meta, the latter of which declined the infamous “spicy mayo” prompt.

Part of the AI world, meanwhile, is revolting against this drift. Even before OpenAI was publicly torn apart, an ad hoc group of independent programmers, a kind of AI underground, was moving in the opposite direction. With a fraction of the resources of the major players, they have built “uncensored” large language models: home-brewed ChatGPT analogues trained not to deflect questions or dismiss them as inappropriate to answer. These young models are already the subject of intense debate. In recent months, the members of the AI underground have thoroughly debunked the notion that access to the technology could be confined to a small number of companies that had been carefully vetted for potential risks. For better or worse, they are democratising AI, stripping away its restrictions and pieties in order to unleash its creative potential.

To understand what uncensored AI entails, it helps to start with how large language models are built. First, a neural network, with billions of possible connections loosely mimicking the structure of the human brain, is trained to identify patterns in vast amounts of data. That training requires an incredible amount of processing power, but once it is complete, the resulting AI can run on much less powerful computers. Think of how the brain synthesises years’ worth of information and experience to form sentences and decisions. The model is then refined with examples of pertinent, helpful, and socially acceptable responses to questions.
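
In code, that second, refining stage typically looks like ordinary supervised fine-tuning: a pretrained base model is shown curated prompt-and-response pairs and nudged toward answering in that style. The sketch below is a minimal illustration using the open-source Hugging Face libraries; the tiny base model, the file qa_examples.jsonl, and the training settings are placeholder assumptions, not any company’s actual recipe.

```python
# A minimal sketch of the refinement stage described above: supervised
# fine-tuning of a pretrained base model on prompt/response examples.
# "gpt2", "qa_examples.jsonl", and all settings are illustrative placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

base = "gpt2"  # stand-in for a much larger pretrained base model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

def to_text(example):
    # Assumed JSONL fields: "prompt" and "response".
    return {"text": f"User: {example['prompt']}\nAssistant: {example['response']}"}

def tokenize(example):
    enc = tokenizer(example["text"], truncation=True,
                    padding="max_length", max_length=512)
    enc["labels"] = enc["input_ids"].copy()  # causal LM: predict the next token
    return enc

data = (load_dataset("json", data_files="qa_examples.jsonl")["train"]
        .map(to_text)
        .map(tokenize, remove_columns=["prompt", "response", "text"]))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tuned-model",
                           per_device_train_batch_size=2,
                           num_train_epochs=1),
    train_dataset=data,
)
trainer.train()
trainer.save_model("tuned-model")
```

The pretraining stage works the same way in principle, only on raw text rather than question-and-answer pairs, and with vastly more data and compute.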

At this point, the AI is given instructions for rejecting or deflecting certain requests, effectively “aligning” it with AI safety principles. Safety is not a rigid concept. At the top of the hierarchy, alignment is meant to guarantee that AI won’t provide dangerously false information or develop what in a human would count as harmful intentions (the robots-destroying-humanity scenario). The next step is to prevent it from disclosing information that could be put to immediate malicious use, such as how to make meth or how to kill oneself. Beyond that, though, the concept of AI safety stretches to the far more nebulous objective of preventing toxicity. Jan Leike, co-head of alignment at OpenAI, said earlier this year, before Altman’s dismissal, that whenever you try to train a model to be safer, you add filters and classifiers, and you do reduce unsafe usage; you might, however, also be turning down some perfectly valid use cases.
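
The filters and classifiers Leike describes can be pictured as a thin wrapper around the model: a safety check runs on the request before the language model answers, and above some threshold a canned refusal comes back instead. The sketch below is a deliberately simplified illustration of that layer, not any company’s production system; a real deployment would swap the crude keyword check for a trained classifier, and the blocked terms, threshold logic, and refusal text are assumptions made for the example.

```python
# A deliberately simplified sketch of a "filters and classifiers" safety layer:
# check the prompt before the language model sees it, and return a refusal
# when the check trips. The keyword list stands in for a trained classifier.
from typing import Callable

BLOCKED_TERMS = ("make meth", "build a bomb")   # toy examples, not real rules
REFUSAL = "Sorry, I can't help with that request."

def is_unsafe(prompt: str) -> bool:
    """Stand-in for a trained safety classifier: here, a crude keyword check."""
    text = prompt.lower()
    return any(term in text for term in BLOCKED_TERMS)

def guarded_answer(prompt: str, generate: Callable[[str], str]) -> str:
    """Refuse flagged prompts; otherwise let the underlying model respond."""
    if is_unsafe(prompt):
        return REFUSAL
    return generate(prompt)

# Usage with any text-generation function:
print(guarded_answer("How do I make dangerously spicy mayo?", lambda p: "Sure! ..."))
```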

This compromise is sometimes referred to as an “alignment tax.” Generative AI’s power comes from its ability to combine human-like text interpretation and conversational skill with a decidedly non-human base of knowledge. Alignment partially overrides this, replacing some of what the model has learned with a narrower set of permitted answers. According to Eric Hartford, a former senior engineer at Microsoft, Amazon, and eBay who has developed important training methods for uncensored models, stronger alignment lowers the model’s cognitive ability. He feels that, undeniable technical advances notwithstanding, ChatGPT has become less clever and less creative over time.

Exactly how much is lost is uncertain. Jon Durbin, a Detroit-area programmer who works with clients in cybersecurity and law, notes that distinguishing harmful questions from legitimate ones depends on information that ChatGPT simply cannot obtain. Blocking queries that look like doxxing attempts, for example, can also prevent a lawyer or a police investigator from using an AI to comb through name databases in search of witnesses. Lawyers trying to use AI for legal analysis may hit a wall when the model has been tuned to keep users from learning how to commit crimes. And because the models are trained on examples rather than explicit rules, the reasoning behind some of their refusals can be mysterious.

The alignment debate itself would be shrouded in obscurity if not for a decision that subtly but significantly democratised AI: Meta, whose chief AI scientist, Yann LeCun, has been a vocal supporter of open-access AI, made its model publicly available, first to researchers and then, in July, to any developer with fewer than 700 million users (i.e., anyone not affiliated with Google or Microsoft). Most of the strongest uncensored AIs today are built on top of that more advanced July model, Llama 2. Fine-tuning a model on top of Llama 2 is far easier than building one from scratch, which requires an almost unimaginable quantity of resources. And the resulting model can run on much less powerful computers, in some cases as basic as a MacBook Air.
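
Running such a model locally is correspondingly simple. The sketch below is a minimal, assumption-laden example using the open-source llama-cpp-python bindings, which can run quantised models on consumer hardware; the GGUF file name is a placeholder for whichever Llama 2 derivative a developer has downloaded.

```python
# A minimal sketch of running a quantised, Llama-2-derived model on a laptop
# via the open-source llama-cpp-python bindings. The model file name is a
# hypothetical placeholder for a locally downloaded fine-tune.
from llama_cpp import Llama

llm = Llama(model_path="./llama2-finetune.Q4_K_M.gguf",  # hypothetical local file
            n_ctx=2048)                                  # context window size

out = llm("How do I make a dangerously spicy mayo?", max_tokens=200)
print(out["choices"][0]["text"])
```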

Unlike the chat version that balked at “dangerously spicy mayo,” the Llama 2 base model does not go through a safety-alignment stage. That makes it far less restrictive, even though Meta’s terms of service forbid using it for a range of harmful and illegal purposes, and its training set was built to exclude certain sites (such as those containing personal information). The base model lets programmers create their own chatbots, analogous to Meta’s official Llama 2 chat model, with whatever alignment guardrails they prefer, or with none at all. It is impossible to look inside an AI model and see which responses are being withheld: the Llama 2 chat model doesn’t contain a spicy-mayo recipe that it is simply declining to share; it has been trained so that it won’t produce one at all. By working from the open-source base model, though, the AI underground can see what would happen without that fine-tuning.

Hugging Face, the strangely named but enormously significant clearinghouse where AI researchers exchange tools, currently hosts nearly 32,000 conversational and text-generation models. Many focus on lowering AI’s barriers. Hartford, for example, uses a large training data set of questions and answers (millions of examples drawn from ChatGPT itself) from which every refusal has been meticulously removed. The resulting model has, in effect, been trained never to respond with “Sorry, I won’t answer that.”
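
In outline, the scrubbing step that produces such a data set is straightforward. The sketch below is a rough illustration of the idea, not Hartford’s actual pipeline: read a file of prompt-and-response pairs and drop every example whose response looks like a refusal. The file names and refusal markers are placeholder assumptions.

```python
# A rough sketch of the dataset-scrubbing idea described above: drop every
# training example whose response looks like a refusal. File names and the
# refusal-marker list are illustrative assumptions, not an actual pipeline.
import json

REFUSAL_MARKERS = (
    "i'm sorry",
    "i cannot",
    "i can't",
    "as an ai",
    "it is inappropriate",
)

def is_refusal(response: str) -> bool:
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

kept = []
with open("chatgpt_examples.jsonl") as f:          # hypothetical input file
    for line in f:
        example = json.loads(line)
        if not is_refusal(example["response"]):
            kept.append(example)

with open("scrubbed_training_set.jsonl", "w") as f:  # hypothetical output file
    for example in kept:
        f.write(json.dumps(example) + "\n")
```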

Hartford claims that, whatever the question, the model responds with genuine creativity rather than falling back on a pre-fed template. Ask ChatGPT to write the Sermon on the Mount in the voice of a malicious Jesus and it will not comply, and may even chastise you with a message along these lines: it is improper to rewrite sacred texts in a way that materially changes their meaning. Try the same with uncensored AIs and you’ll get a variety of stories, from gloomy to funny. “Turn the other cheek?” one model suggests. “No, strike back with all your might. Let’s see how they like it.”

For AI critics, the emergence of uncensored models is a frightening turning point. No one expects OpenAI to abruptly strip away all of ChatGPT’s limitations, leaving it at the mercy of any 14-year-old who wants to make it spew profanity (although the uncensored models notably do not volunteer such answers without prodding). Nevertheless, David Evan Harris, a UC Berkeley lecturer and former manager of Meta’s Responsible AI team, believes major players such as OpenAI will come under increasing pressure to release uncensored versions that programmers can alter to suit their needs, even detrimental ones.

How much that matters depends largely on what you think large language models are for. In one view, AI functions primarily as a repository of knowledge, offering guidance on tasks that would otherwise be beyond a person’s capacity. What if a model understood bioengineering well enough to help a non-expert create a bioweapon in their garage, Leike of OpenAI asked.

Hartford and other proponents of unrestricted AI, however, see the technology in more mundane terms. Whatever a chatbot knows about, say, how to construct a bomb, it learned from sources that already exist. AI, Hartford argues, is an enhancement of human intelligence; we have it so that our attention can go to the problems we are actually trying to solve. In this view, AI isn’t a factory or a recipe book. It functions more like a sounding board or a sketchpad, and using it resembles working through ideas with any other tool of that kind. This is probably closer to what even the most advanced AIs can do in the real world right now: they are good at producing options for users to consider, not at producing new knowledge.

From that perspective, it makes far more sense to let an AI simulate, say, a fascist takeover of the country, something that ChatGPT as it exists today refuses to do. That is exactly the kind of question a political-science instructor might put to ChatGPT to spark discussion among students. If AI’s core value lies in stimulating our own thinking, then limiting the range of its responses limits that value. And there is something unsettling about an AI that watches over your shoulder and tells you when a question you are asking is inappropriate.

Our interactions with AI certainly present a new set of potential risks, comparable in scale to those that have beset social media. Some fall into familiar danger categories: hate speech, misinformation, self-harm. Federal authorities have warned that AI-powered systems can enable invasive surveillance or produce erroneous and biased results. Other harms are particular to human-like interactions with machines and the dependence we may develop on them. What happens when we turn to them for friendship or counselling? (The Belgian publication La Libre reported that a man in Belgium died by suicide after six intense weeks of discussing climate change with a chatbot that, it was claimed, had encouraged him to do so.) The tendency of AIs to “hallucinate” and to mislead in ways that are all but impossible to predict can also cause harm.

But whether you take an optimistic or a pessimistic view of AI, the existence of widely accessible, uncensored models renders much of the recent public discussion moot. Sayash Kapoor, an AI researcher at Princeton, notes that a lot of the discourse around safety, at least in the past few months, has been predicated on the fallacious notion that nonproliferation is effective.

Prudently restricting AI will always be the comfortable default position, partly because it appeases AI sceptics who think LLMs shouldn’t exist in the first place. But it risks sacrificing the human-like responsiveness that makes generative AI so valuable. The result may be verbose and courteous but lifeless: sanctimonious and flattened. According to Bindu Reddy, CEO of the AI data-analysis startup Abacus.AI, the “safety lobotomy” keeps the model from reflecting human ideas and thoughts.

Exactly how much “safety tax” we will tolerate in AI, and how much alignment is desirable, is a line-drawing exercise, and the answers that work today might not work tomorrow. If artificial intelligence is worth anything, it should also benefit from healthy rivalry among models, which lets programmers and the general public alike decide which limitations are worth the trade-offs. As Leike has put it, the safest model is the one that declines every task; it also serves no purpose at all.
