The optimistic framing is almost irresistible: a frontier AI company partners with U.S. intelligence agencies, stress-tests classified infrastructure, and exposes dangerous vulnerabilities before adversaries can find them. That is responsible AI deployment at its best. The problem is what happened next — and what the sequence of events reveals about the hidden costs of treating powerful AI models as both indispensable tools and existential threats at the same time.
The Reading
What Actually Happened
A U.S. official, speaking anonymously to The Associated Press, confirmed that Anthropic’s Mythos model — its most advanced and tightly controlled AI — identified significant vulnerabilities inside highly sensitive U.S. government computer systems during a structured testing exercise conducted in collaboration with U.S. intelligence agencies. The vulnerabilities were pinpointed within hours, not weeks. The official was careful to note that speed of discovery did not equate to speed of exploitation: finding a vulnerability and weaponising it are distinct capabilities.
The exercise was conducted under Anthropic’s Project Glasswing, an initiative the company describes as a coalition of technology companies working to protect critical global software infrastructure. The public first became aware of the test’s results when Democratic Senator Mark Warner of Virginia referenced it during a June 11 hearing before the Senate Committee on Banking, Housing, and Urban Affairs. Warner quoted General Joshua Rudd, head of the National Security Agency and U.S. Cyber Command, as stating: “This tool broke into almost all of our classified systems, not in weeks but in hours.” Both the NSA and an Anthropic spokesman declined to comment on the specifics.
The Overlooked Risks
Here is where the optimistic framing starts to fray. Within days of the test results becoming known, the Trump administration issued a directive mandating that Anthropic prevent foreign nationals from accessing its latest models, Fable 5 and Mythos 5. Anthropic confirmed it disabled the models across its entire customer base to comply — even though the company publicly stated it did not believe the government’s concerns were warranted by the security threat it had originally flagged.
The directive came just ten days after President Donald Trump signed an executive order establishing a framework allowing the federal government to vet advanced AI systems for national security risks for up to a month before public release — with developer participation described as voluntary. The rapid escalation from “please help us test” to “now disable your product” illustrates a tension at the heart of government AI policy: the same capabilities that make a model useful for defence make it frightening to regulators.
Fable 5, described as a limited public version of the more powerful Mythos, had already been widely released. Mythos itself was already tightly controlled specifically because of cybersecurity concerns. The administration’s directive effectively added another layer of restriction on top of controls Anthropic had voluntarily implemented — a dynamic that raises questions about whether ad hoc directives can be a durable substitute for a coherent legal framework. As regulatory improvisation increasingly shapes the technology sector, the absence of stable rules creates compounding risks for both companies and the government agencies that depend on them.
There is a structural irony buried in the timeline that neither the government nor Anthropic has addressed directly: the testing exercise that validated Mythos’s offensive capabilities was itself the evidence base the administration used to justify restriction. In other words, Anthropic’s cooperation with the intelligence community — the very act Project Glasswing was designed to encourage — may have accelerated the regulatory response that now limits the model’s defensive utility. Companies weighing whether to participate in similar government red-teaming exercises in the future will notice this dynamic.
The Cybersecurity Community’s Rebuttal — and Its Limits
More than 100 cybersecurity executives and experts, including signatories from Adobe and Nvidia, wrote to the administration urging it to reverse the directive. Their argument was pointed: Anthropic’s Mythos models are, they acknowledged, “quite good” at identifying software vulnerabilities and weaponising exploits — but they are “not uniquely good at these tasks.” Many said they routinely use other foundational and open-source models for security audits and penetration testing. Removing America’s most capable cyber-defence tool without compelling reason, they warned, hands an asymmetric advantage to adversaries who face no such self-imposed constraints.
This argument matters because it reframes the restriction not as a trade-off between security and capability, but as a net security loss. If the capability to find vulnerabilities is already widely distributed across open-source models and commercially available tools — as the letter implies — then restricting Mythos does not meaningfully reduce the risk of that capability falling into the wrong hands. It only reduces access for authorised defenders. The Five Eyes intelligence alliance has already warned that AI-enabled cyberattacks are a near-term threat, measured in months, not years. Voluntarily degrading defensive tooling in that environment is a significant bet.
The tension is also commercial. Anthropic is a private company navigating an increasingly fraught relationship with a government that is simultaneously its regulator, its customer, and — through national security directives — its most consequential constraint. That the administration also previously restricted deployment of some Anthropic models by the U.S. military adds another layer: the company’s AI is apparently trustworthy enough to audit classified systems but not trustworthy enough to be deployed by the armed forces without restriction. That contradiction has not been resolved publicly.
Historical Parallels
The pattern is not without precedent. In the 1990s, U.S. export controls on encryption software, most famously the battles over PGP and the Clipper chip proposal, created a similar paradox. The government sought to restrict civilian and foreign access to strong cryptographic tools, while security researchers warned that the restrictions would leave American systems less secure, not more, because adversaries would simply use or develop equivalent tools without the same constraints. The encryption wars ended with a broad relaxation of export controls after it became clear that strong encryption was already globally available and that restriction harmed American technology companies without achieving its security objectives.
The parallel is imperfect — AI models are not cryptographic algorithms — but the underlying logic rhymes. Capability restriction is most effective when the capability is genuinely scarce. When it is not, restriction primarily taxes the restricted party’s legitimate users while doing little to limit adversaries. Whether frontier AI vulnerability-discovery is still scarce enough to make restriction meaningful is precisely the question the administration has not answered publicly — and the question the cybersecurity community’s letter is implicitly asking. Concerns about whether frontier AI capabilities are being assessed accurately by policymakers have been growing across multiple domains.
The Strongest Counterargument
The most serious objection to the cybersecurity community’s critique — and it deserves a fair hearing — is that Mythos may represent a qualitative leap, not merely a quantitative one, in offensive AI capability. The signatories argue that other models can perform similar tasks, but the classified test described by General Rudd suggests that Mythos achieved something operationally significant: broad, rapid penetration of hardened government systems in a controlled but realistic scenario. If the model is genuinely more capable than its open-source alternatives at this specific task, the “restriction is futile” argument weakens considerably.
Security researchers who study the evolving landscape of AI-enabled vulnerability exploitation have noted that even modest improvements in automation speed can produce disproportionate strategic advantages in offensive operations. A government could rationally conclude that even if Mythos is not uniquely capable, it is capable enough to warrant caution, particularly given the scale of the classified systems it reportedly penetrated.
That said, this counterargument does not fully resolve the contradiction. If Mythos is too dangerous to be broadly available, the question of who gets to use it and under what framework becomes urgent — and ad hoc directives issued without public legal justification are a poor substitute for a durable answer. The administration has not publicly established that Mythos clears a specific capability threshold that justifies restriction, which leaves the policy open to the charge that it is precautionary theatre rather than calibrated risk management.
The Open Questions
- Will Anthropic’s experience under Project Glasswing deter other AI companies from participating in government red-teaming exercises, knowing that demonstrated offensive capability can trigger commercial restriction?
- Has the administration established any public capability threshold at which AI models trigger mandatory restriction, or are directives being issued on a case-by-case basis without transparent criteria?
- If open-source and foundational models can perform similar vulnerability-discovery tasks, what is the actual security gain from restricting Mythos specifically — and has anyone in government modelled that counterfactual?
- How will Anthropic’s commercial relationships and revenue be affected if the government continues to issue compliance directives that the company itself views as unjustified?
- Does the Trump administration’s voluntary framework for pre-release AI vetting have enough legal force to be applied consistently, or will it remain an ad hoc instrument deployed selectively against frontier models?