Anthropic's Mythos Is Too Dangerous to Ship Publicly

Anthropic just announced the most capable AI model it has ever built — and immediately decided the public can't have it.

Claude Mythos Preview, the frontier model at the heart of a new cybersecurity coalition called Project Glasswing, has already done something that decades of security tooling couldn't: it autonomously surfaced thousands of high-severity zero-day vulnerabilities across every major operating system and web browser, including a 27-year-old flaw in OpenBSD and a 16-year-old bug in FFmpeg that survived five million automated test runs without detection. The model didn't just flag these issues — it developed working exploit chains, entirely on its own, without a human in the loop.

That combination of capability and autonomy is precisely why Anthropic isn't putting this one on the Claude API shelf for anyone with a credit card.

What Anthropic Built — and Why It Won't Release It

Mythos wasn't purpose-built for hacking. It was trained to be an exceptional general-purpose reasoning and coding model — essentially a step above the existing Opus line in both intelligence and scale. The cybersecurity capabilities emerged as a byproduct of that coding prowess, which makes the situation simultaneously more impressive and more unsettling.

On the CyberGym evaluation benchmark, Mythos Preview scores 83.1% versus 66.6% for Anthropic's next-best model, Claude Opus 4.6. On SWE-bench Verified, it hits 93.9% compared to Opus 4.6's 80.8%. These aren't incremental gains — they represent a meaningful capability jump, the kind that changes what an adversary with access to the model could realistically accomplish.

Anthropic's own framing is unusually candid for a product launch: Newton Cheng, the company's frontier red team cyber lead, told reporters that similar capabilities will "proliferate, potentially beyond actors who are committed to deploying them safely" within months. Dario Amodei echoed that in a launch video, noting that more powerful models are coming from Anthropic and from competitors alike, which is precisely why a reactive approach won't cut it.

The answer Anthropic landed on is Project Glasswing — a controlled-access coalition of more than 40 organizations, including Amazon, Apple, Google, Microsoft, Nvidia, Cisco, CrowdStrike, JPMorgan Chase, the Linux Foundation, and Palo Alto Networks. These partners get private access to Mythos Preview for defensive security work only: scanning their own codebases, patching vulnerabilities before bad actors find them, and collectively building a body of knowledge about what AI-enabled threat detection actually looks like in production.

Anthropic is backing the initiative with up to $100 million in usage credits and $4 million in direct donations to open-source security foundations — $2.5 million to Alpha-Omega and OpenSSF via the Linux Foundation, and $1.5 million to the Apache Software Foundation.

The Linux Foundation's CEO Jim Zemlin put the stakes plainly: open-source maintainers, whose software underpins critical global infrastructure, have historically been left to navigate security largely without resources or expertise. Project Glasswing, at least in theory, is an attempt to close that gap at scale before adversaries close it for them.

The Disclosure Problem No One Has Solved at This Scale

Finding thousands of zero-days is one thing. Handling the aftermath responsibly is a genuinely hard operational problem that Anthropic deserves credit for thinking through — even if the approach will face stress tests that no bug disclosure program has previously encountered.

Anthropic says it has built a triage pipeline that routes all findings through professional human validators before anything reaches a maintainer. The goal is to avoid flooding unpaid open-source contributors with an automated firehose of unverified reports. The company also says it won't dump large batches of bugs on a single project without agreeing on a pace the maintainers can sustain, and it aims to include candidate patches labeled by AI provenance with each report.

For vulnerabilities still in remediation, Anthropic is publishing cryptographic hashes now, with full technical disclosure coming 45 days after a patch ships — shorter if details are already public, longer if the deployment complexity demands it.

These are reasonable principles, borrowed intelligently from coordinated vulnerability disclosure norms. But the sheer volume here is unprecedented. A 45-day window assumes maintainers can actually produce, test, and ship fixes for complex Linux kernel-level bugs in that timeframe, which isn't guaranteed. And sustaining a human-validated triage operation across thousands of findings simultaneously requires a level of operational rigor that, frankly, Anthropic hasn't demonstrated flawlessly of late.

The Trust Problem Anthropic Can't Patch With a Press Release

Here's the awkward part: the same week Anthropic announced a model it describes as capable of autonomously compromising hardened operating systems, the company is also answering questions about two recent operational security failures.

In late March, a draft blog post about Mythos — then internally called "Capybara" — was left in an unsecured, publicly searchable data store, exposing roughly 3,000 internal assets. Days later, a packaging error in Claude Code version 2.1.88 briefly exposed more than 512,000 lines of Anthropic's original source code to anyone running a standard install command. The subsequent cleanup attempt caused thousands of GitHub repositories to go dark.

Anthropic's response — that these were publishing and packaging errors, not breaches of core model infrastructure — is technically accurate. Neither incident touched model weights or training systems. But the distinction is harder to sustain as a public argument when the company is simultaneously asking Fortune 500 CISOs and government officials to trust it as the sole gatekeeper of a model that can autonomously chain Linux kernel exploits.

Dianne Penn, Anthropic's head of product management, told The Verge that the company is "taking steps in terms of solidifying our processes." That's the right answer, but it's also the minimum acceptable answer. For an organization whose core value proposition in this initiative is responsible stewardship of dangerous capabilities, operational credibility is load-bearing.

This puts pressure on Anthropic's IPO narrative, which the company is reportedly evaluating for as early as October 2026. A high-profile government-adjacent initiative backed by blue-chip partners pairs well with the $30 billion annualized revenue milestone the company also disclosed this week. But institutional investors doing diligence will notice the gap between the company's safety positioning and its recent track record of preventable operational errors.

What This Means

The core bet Anthropic is making with Project Glasswing is that transparent, structured access to dangerous capabilities — shared early with defenders under careful restrictions — will outpace the inevitable moment when similar capabilities land in less careful hands. Cheng says the window is measured in months, not years.

For security engineers and CISOs: This is a genuine signal to revisit assumptions baked into existing vulnerability management workflows. AI-driven autonomous exploit development changes the economics of attack in ways that perimeter-focused or patch-cadence-dependent defenses weren't designed for. If you're not in the Glasswing coalition, start thinking about what the equivalent capability access looks like for your organization.
For open-source maintainers: The $4 million in foundation donations and the triage pipeline are meaningful gestures, but the real ask here is capacity. If Anthropic's disclosures arrive faster than your team can triage and ship patches, even well-intentioned AI security tooling becomes a burden. Push for clear SLAs and pace agreements before you're in the middle of an avalanche.
For founders building in the security space: A coalition of Microsoft, Google, CrowdStrike, and Palo Alto Networks coalescing around a single AI security layer is a competitive forcing function. The incumbents are moving toward AI-native defense faster than the funding cycles of most startups. The opportunity is in vertical specialization and workflow integration, not replicating what Mythos does at the infrastructure layer.
For everyone watching AI's trajectory: The most consequential detail in the Glasswing announcement isn't the benchmark scores or the partner list — it's the timeline. When Anthropic's own red team lead says frontier capabilities will proliferate to adversaries within months, that's a statement about the pace of the underlying technology race, not just this product. The question of whether the glasswing's wings were ever opaque enough to delay what's coming doesn't have a reassuring answer yet.

Written by

Daily Neural Team