The UK AI Security Institute evaluated Claude Mythos Preview, finding that the model can execute complex cyber attacks without human assistance.
The UK AI Security Institute published on Monday the results of its evaluation of Claude Mythos Preview, Anthropic’s flagship model, which is not yet available to the public. The tests confirm that the system is capable of executing sophisticated cyberattacks autonomously, with success rates unprecedented for an artificial intelligence model.
The existence of Claude Mythos surfaced in late March through a website leak, and Anthropic subsequently confirmed that the model can find and exploit cybersecurity vulnerabilities at a level never seen before. During the pre-release phase, the system reportedly identified thousands of zero-day vulnerabilities autonomously across all major operating systems, many of them dating back one or two decades. Anthropic has chosen not to make the model publicly available, instead granting limited access to dozens of security research firms.
Test results show that Mythos Preview achieved a 73% success rate on expert-level capture-the-flag tasks — challenges that no AI model was able to complete before April 2025. The model became the first AI system to complete “The Last Ones” (TLO), a corporate network attack simulation structured across 32 stages, which normally takes a human approximately 20 hours to complete. Mythos Preview completed the simulation in 3 out of 10 attempts, progressing through an average of 22 of the 32 total steps across all runs. The simulation covers the full cycle of a real-world intrusion, from initial reconnaissance through to complete network takeover.
The next-best-performing model, Claude Opus 4.6, averaged only 16 steps. The British institute also noted that Mythos Preview’s capabilities continue to scale with increased computational resources, with evaluation sessions using up to 100 million tokens. When explicitly directed and given network access in controlled environments, the model demonstrated the ability to execute multi-stage attacks and discover vulnerabilities without any human intervention.





