A report released on Monday asserted that employees at some of the world’s leading AI firms harbor serious concerns about the safety of their work and the incentives driving their leadership.
The report, which the State Department commissioned from Gladstone AI, argues that advanced AI poses serious national security threats and offers several recommendations for how the United States should respond.
Over two hundred experts were consulted in compiling the report, including staff members at top AI labs such as OpenAI, Google DeepMind, Meta, and Anthropic, all of which are aiming to create “artificial general intelligence,” a theoretical system that would be able to accomplish most tasks on par with or better than human ability. Without identifying the individuals or the particular labs they work for, the authors published snippets of private conversations in which employees at several of these labs shared their concerns. OpenAI, Google, Meta, and Anthropic did not immediately respond to requests for comment.
Gladstone CEO and report author Jeremie Harris says the project served as a de facto clearinghouse for the concerns of frontier researchers who are not convinced that their organizations’ default trajectory will avoid disastrous outcomes.
One employee at an unnamed AI lab told the report’s authors of what the study described as a “lax approach to safety” there, saying the firm’s efforts to develop more powerful systems were being accelerated rather than slowed down. Another person raised the concern that, even though the lab believes AGI is a near-term possibility, it does not have sufficient containment measures in place to keep an AGI from escaping its control.
Others expressed worries about cybersecurity. “According to the subjective view of many of their own technical personnel, many frontier AI laboratories’ security procedures are unable to withstand a persistent IP exfiltration campaign by a sophisticated attacker,” the paper claims. Given the current state of frontier lab security, it adds, such model exfiltration attempts are likely to succeed in the absence of direct US government cooperation, if they haven’t already.
As Harris notes, many of those who voiced these concerns did so while grappling with the realization that speaking out publicly would likely cost them their future ability to influence key decisions. The level of concern some workers in these labs feel about the decision-making process, and about how management incentives shape key decisions, is difficult to overstate, he says. Often, the people tracking the risk side of things most closely, who are in many cases the best informed about it, are the ones with the greatest concern.
According to the authors, the fact that today’s AI systems have not yet caused catastrophic consequences for humanity does not mean that larger systems will be safe in the future. Edouard Harris, Gladstone’s chief technology officer and a co-author of the paper, notes that one of the main themes they heard from those working at the frontier, about the technology being built behind closed doors, is that it is something of a game of Russian roulette: well, we pulled the trigger and we’re alright, so let’s pull the trigger again.
Over the past year, governments around the world have woken up to the risks posed by powerful AI systems. President Biden signed an executive order in October establishing safety guidelines for AI labs located in the United States, and in November the United Kingdom hosted an AI Safety Summit at which world leaders pledged to collaborate on global standards for the technology. Because Congress has not yet passed an AI law, however, there are currently few legal limits on what AI labs can and cannot do when building sophisticated models.
Biden’s executive order directs the National Institute of Standards and Technology (NIST) to establish “rigorous standards” for the testing that AI systems must undergo before being released to the general public. However, the Gladstone report advises government authorities not to place undue reliance on these AI evaluations, which are already widely used to determine whether an AI system exhibits potentially harmful behaviors or capabilities. The report claims that assessments “can be undermined and manipulated easily” because developers can “fine tune,” or make superficial adjustments to, their models to pass assessments if the questions are known ahead of time. Crucially, it is easier to train a model with such adjustments to conceal dangerous behaviors than it is to eliminate them entirely.
According to the study, one expert with “direct knowledge” of one AI lab’s operations concluded that the unnamed lab was already gaming evaluations in this way. “AI evaluations can only reveal the presence, but not confirm the absence, of dangerous capabilities,” the paper argues, warning that “AI developers and regulators may become overly complacent due to an over-reliance on AI evaluations.”