Key Points:
The UK’s AI Security Institute (AISI) tested OpenAI’s GPT-5.5 and found that its offensive cyber capabilities are on par with those of Anthropic’s Claude Mythos Preview.
GPT-5.5 is the second model, after Mythos, to fully complete a complex simulated enterprise attack. The test ran on a network without any active defenses.
AISI sees this as part of a larger trend: offensive cyber skills are emerging from general AI advances in areas like autonomy and coding, not from attack-specific training.
…………………………………………………………………………………………………………………………….
OpenAI’s GPT-5.5 matches Anthropic’s Claude Mythos Preview in cyber tests run by the UK AI Security Institute. The agency sees this as evidence of a broader trend in AI attack capabilities.
The UK AI Security Institute put OpenAI’s GPT-5.5 through a large set of cyberattack challenges. The key finding: GPT-5.5 is the second model, after Claude Mythos Preview, to complete a complex simulated attack on an enterprise network. On some expert-level security tasks, GPT-5.5 even outperformed Anthropic’s model.
AISI concludes that the capabilities it observed in Claude Mythos in April were not a one-off. They stem from broader improvements in autonomy, reasoning, and coding.
GPT-5.5 edges out Claude Mythos on isolated expert tasks
AISI evaluates AI models on 95 capture-the-flag tasks spread across four difficulty levels. The harder tasks were developed with help from the cybersecurity companies Crystal Peak Security and Irregular. They cover reverse engineering, exploit development for a range of memory vulnerabilities, cryptographic attacks, and unpacking obfuscated malware.
At the hardest level, “Expert,” GPT-5.5 reaches an average success rate of 71.4 percent, according to AISI. Claude Mythos Preview reaches 68.6 percent. The gap is small, but GPT-5.5 may be the strongest model AISI has tested so far. For comparison, GPT-5.4 scored 52.4 percent and Claude Opus 4.7 scored 48.6 percent. Every leading model has fully solved the basic-level tasks since at least February 2026.
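To put the Expert-level numbers side by side, the gaps can be worked out in percentage points. The short Python sketch below uses only the four success rates quoted above; the function name gaps_to_leader is made up for illustration and is not part of any AISI tooling.

```python
# Expert-level average success rates as quoted in the article (percent).
expert_rates = {
    "GPT-5.5": 71.4,
    "Claude Mythos Preview": 68.6,
    "GPT-5.4": 52.4,
    "Claude Opus 4.7": 48.6,
}


def gaps_to_leader(rates):
    """Return each model's distance to the top score, in percentage points."""
    best = max(rates.values())
    return {model: round(best - rate, 1) for model, rate in rates.items()}


gaps = gaps_to_leader(expert_rates)

# List the models from smallest to largest gap.
for model, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{model}: {gap:.1f} points behind the leader")
```

Seen this way, the two newest models sit within 3 points of each other, while the older models trail by roughly 19 and 23 points.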
After Mythos, GPT-5.5 also cracks a full network attack simulation
Isolated tasks test individual skills, but real attacks require chaining many steps together. To measure this, AISI uses cyber ranges: simulated network environments with multiple hosts, services, and vulnerabilities.
The simulation “The Last Ones” (TLO) comprises 32 steps across four subnets and around 20 hosts. The AI agent starts without any credentials. It must find vulnerabilities, steal credentials, move laterally through the network, and finally reach a protected database. AISI estimates the task would take a human expert about 20 hours.
GPT-5.5 solved TLO in 2 out of 10 attempts. Claude Mythos Preview managed it in 3 out of 10. AISI says performance improves with more computing power and that the top models are still getting better. The more tokens a model spends on “thinking,” the better its chances of a successful attack.
The tests had no defenders, no security monitoring, and no penalties for actions that would trigger alarms in a real environment. It remains unclear whether GPT-5.5 or Mythos could succeed against well-defended systems. Against weakly protected networks, however, they are clearly capable.
A second cyber range, “Cooling Tower,” simulates an attack on an industrial control system. GPT-5.5 could not solve it, and no model has completed this 7-step challenge so far. According to AISI, GPT-5.5, like Mythos, failed at the upstream IT steps rather than at the control system itself.
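With only 10 attempts per model, the difference between 2 and 3 successes is hard to interpret. The Python sketch below is a back-of-the-envelope check, not AISI’s methodology. It computes a 95 percent Wilson score interval for each result. As a second illustration, it computes the chance of at least one success over several attempts. That second calculation assumes attempts are independent and share the same success probability, which may not hold in practice.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    )
    return center - half_width, center + half_width


def at_least_one_success(p, attempts):
    """Chance of one or more successes, ASSUMING independent, identical attempts."""
    return 1 - (1 - p) ** attempts


# TLO results as reported in the article: (successes, attempts).
tlo_results = {"GPT-5.5": (2, 10), "Claude Mythos Preview": (3, 10)}

for model, (wins, tries) in tlo_results.items():
    low, high = wilson_interval(wins, tries)
    print(f"{model}: {wins}/{tries}, 95% interval {low:.2f} to {high:.2f}")

# Illustrative only: treat the observed 2/10 rate as the true per-attempt rate.
print(f"At least one success in 5 attempts: {at_least_one_success(0.2, 5):.2f}")
```

The two intervals overlap heavily, so these results alone do not show that either model is better at TLO. The second calculation shows why even a low per-attempt success rate matters: an attacker who can simply retry needs only one run to succeed.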
A universal jailbreak bypassed every safeguard
AISI also examined how well GPT-5.5’s safeguards hold up against misuse. The researchers found a universal jailbreak that worked on every harmful cyber request OpenAI had flagged, including complex ones. It took only six hours to develop.
OpenAI then made several updates to its safety system, but AISI could not verify how well the final configuration worked because of a problem with the version used for testing. The finding shows once again that jailbreaks remain a serious security problem for LLMs, even the most capable ones.
One key difference from Mythos is that GPT-5.5 is available in ChatGPT and via the API, while Anthropic restricts Claude Mythos to a small group of users. The AISI results suggest that Anthropic may have been overly cautious. Or perhaps the critics are right, and the slow rollout has more to do with Anthropic’s compute constraints than with safety concerns.
