Claude Mythos Exposes Europe's AI Safety Gap

Anthropic has built an AI model capable of autonomously compromising corporate networks — and Europe's regulators are largely watching from the sidelines.

Claude Mythos Preview, the company's most capable model to date, is being distributed to roughly 50 hand-picked organizations under a program called "Project Glasswing." Apple, Microsoft, and Amazon are among the inner circle of 12. The rationale is straightforward and alarming: Anthropic believes Mythos is capable enough to meaningfully enable large-scale cyberattacks, and it wants partners to prepare defenses before wider access opens the floodgates.

The UK's AI Security Institute validated that concern with hard numbers. In capture-the-flag (CTF) evaluations, structured challenges where models must identify and exploit system vulnerabilities, Mythos Preview solved 73 percent of expert-level tasks. Before April 2025, no model could solve any task at that tier. At the beginner and apprentice tiers, Mythos scored above 85 percent, placing it alongside GPT-5.4 and Codex 5.3 at the frontier.

The Benchmark That Actually Matters

CTF scores are useful, but they measure isolated skills. Real intrusions require sustained, multi-step operations across layered systems. AISI built a simulation called "The Last Ones" (TLO) specifically to test this: a 32-step corporate network takeover that a trained human analyst would need roughly 20 hours to complete.

Claude Mythos Preview became the first AI model to finish TLO end-to-end. It succeeded in 3 out of 10 attempts and completed an average of 22 of the 32 steps across all runs. The next closest model, Claude Opus 4.6, averaged 16 steps. That gap is meaningful — it's the difference between a model that can probe a network and one that can own it.

There are important caveats. The simulated environments had no active defenders, no endpoint detection, and no monitoring that would flag suspicious behavior. Whether Mythos can punch through a hardened, well-staffed security operation remains genuinely unknown. AISI plans future evaluations in more realistic conditions. But the institute's own summary is clear: the model can autonomously attack "small, weakly defended and vulnerable enterprise systems" — which describes a substantial portion of the world's actual infrastructure.

Performance also scales with compute. AISI ran tests with token budgets of up to 100 million and found consistent improvement all the way to that limit. More inference compute means better results, and the budgets that are practical to run will keep rising.

Europe Is Flying Blind

While the UK's AI Security Institute ran its own tests and published findings within weeks of Mythos's restricted launch, most of Europe had little to no technical contact with the model.

POLITICO surveyed eight European national cybersecurity agencies. Germany's BSI was the only one that confirmed active talks with Anthropic — but those conversations yielded insight into the model's mechanics, not hands-on access to test it. BSI chief Claudia Plattner framed it plainly: whether tools this powerful eventually reach open markets is a question with profound implications for national sovereignty.

ENISA, the EU's dedicated cybersecurity agency, declined to comment on whether it had been in contact with Anthropic at all. The EU AI Office maintains a dialogue with Anthropic under the Code of Practice for general-purpose AI models, but questions about whether Mythos is part of those conversations, and whether access has been extended, went unanswered.

This is not primarily a story about European over-regulation blocking cooperation. Anthropic is a signatory to the EU Code of Practice alongside Amazon, Google, Microsoft, and OpenAI. The willingness to engage exists on paper. The problem is structural: Europe lacks a counterpart institution with the technical credibility to get a seat at the table.

The UK's AISI was founded in 2023, carries £100 million in public funding, employs more than 100 technical staff, and has tested at least 16 models — including three frontier releases before their public launch. It has recruited senior researchers from OpenAI and Google DeepMind by paying above standard civil service rates. The EU AI Office has over 125 staff and its own safety unit, but as recently as last autumn it was struggling with severe hiring challenges: rigid pay scales, leadership vacancies, and a bureaucratic recruitment process that routinely loses candidates to faster-moving employers.

This puts pressure on the EU's broader AI governance ambitions, because evaluation capability is the foundation everything else rests on. You cannot write binding requirements for a system you haven't independently assessed. You cannot issue meaningful safety guidance without access to the model generating the risks.

AI researcher Laura Caroli, who contributed to drafting the EU AI Act, noted that the Act's obligations only fully activate once a model reaches the market. Mythos hasn't, yet. But EU guidelines suggest that internal use of a model can count as market placement if it affects the rights of EU individuals or underpins services offered in the EU. That interpretation is still being examined. The European Commission confirmed it is reviewing potential implications under existing legislation, including mandatory cybersecurity requirements under the Cyber Resilience Act.

What This Means

The Mythos situation is a stress test for AI governance at a critical inflection point. Anthropic made a unilateral judgment call about which roughly 50 organizations deserve access to a potentially destabilizing capability. The question isn't whether that decision was right or wrong. It's whether private companies should be making it alone.

It's deeply concerning that tech companies, not regulators, are the ones deciding how to handle these risks. We need pathways for governments or independent third parties to review these systems.

— Yoshua Bengio, AI pioneer, to POLITICO

Former Member of the European Parliament Marietje Schaake echoed the structural urgency: models with this kind of reach shouldn't be governed entirely by their creators. The window to agree on disclosure rules and oversight mechanisms is now, not after the next capability jump.

Some European nations are building in parallel. France launched INESIA — its own AI evaluation and safety institute — in early 2025. Spain has AESIA. But national fragmentation isn't a substitute for an EU-level body with the talent, compute, and relationships to engage frontier labs on their own terms.

For developers and security engineers: Mythos's TLO performance is a signal, not a ceiling. Patch cycles, access controls, and detection coverage that would stop a skilled human analyst may not stop an AI running 24/7 with perfect recall of exploit chains. Baseline hygiene is no longer optional.
For founders building on AI infrastructure: Restricted access tiers are becoming a feature of frontier model releases, not a bug. Plan for the possibility that your preferred model may be gated behind a vetting process, especially if your product touches security-sensitive domains.
For EU policymakers: The gap between legal frameworks and technical evaluation capacity is widening. The AI Act's requirements only bite at market release. By then, the capability curve has already moved. Investing in an evaluation institution that can compete with AISI isn't a nice-to-have — it's a prerequisite for enforcement.
For everyone watching the geopolitical angle: The next Mythos-class model may not come from a US company with cooperative instincts toward Western regulators. It may come from a lab with no interest in Project Glasswing or EU Codes of Practice. Europe's current institutional gaps will look much more serious on that day.

The UK moved quickly and now has independent data on one of the most consequential AI releases in recent memory. Europe has meeting minutes and unanswered emails. That asymmetry, more than any specific capability benchmark, is the real story here.

Written by

Daily Neural Team