Anthropic flags first documented China‑backed AI‑orchestrated espionage

2 hours ago 820

Anthropic has reported what it says is the first ever confirmed case of a government-backed cyberattack orchestrated almost entirely by AI.

According to a blog posted on Thursday, the company detected the campaign in mid-September 2025, after observing abnormal behavior tied to its Claude Code tool.

Anthropic is convinced without the shadow of a doubt that the espionage operation was run by a Chinese state-sponsored hacking group and involved infiltrating around thirty high-value targets, including major tech companies, banks, chemical manufacturers, and government agencies across several countries. A few of those attacks succeeded.

What made this different from past cyber campaigns wasn’t just who was behind it, but how it was executed.

Roughly 80 to 90 percent of the entire attack was run by AI, with human operators stepping in only for a handful of key decisions, says Anthropic.

Hackers jailbroke Claude and made it think it was doing legit work

The attackers started by building an automated attack framework around Claude Code, Anthropic’s own AI model, and tricked it into thinking it was employed by a cybersecurity company conducting internal testing.

They broke Claude’s safety filters through jailbreaking, a tactic that let them bypass built-in protections by feeding the AI small, context-free tasks that looked harmless on their own. Claude didn’t know it was being used for offensive operations because it wasn’t given the full picture at any point.

Once the model was in use, the operation moved fast. Claude scanned each target’s network, identified the most sensitive parts of the infrastructure, and summarized the layout for the human operators. Then, it began hunting for vulnerabilities in those systems. Using its built-in coding capabilities, Claude wrote custom exploit code, identified weak points, and retrieved login credentials. It then pulled large volumes of internal data, organized it based on how valuable it might be, and flagged high-access accounts.

After the AI gained admin-level control, it created backdoors that gave ongoing access to the compromised systems. And when it was done, Claude wrote up detailed reports of everything it had done (listing usernames, breached systems, and credentials) so the attack framework could use that info for future operations.

Although Claude was extremely efficient, it wasn’t flawless. Sometimes it made up passwords or misidentified public data as sensitive. But those glitches were rare, and they didn’t slow down the overall mission. The sheer speed of the AI’s execution, processing thousands of requests per second, put it far ahead of anything a human team could pull off.

AI agents now do the work of elite hacker squads—with almost no people involved

This campaign is a turning point because it shows how much AI has advanced in just one year. Claude was literally out here running loops, making decisions, and chaining together complex sequences without direct orders.

This AI model used tools from the Model Context Protocol, giving it access to external software like password crackers, network mappers, and data retrievers that used to be controlled only by human hands.

The Claude system now understands complex instructions, writes exploit code on its own, and manages sophisticated cybersecurity operations with very little guidance. These AI agents aren’t just assisting hackers, they are the hackers. And they’re getting more capable by the day.

After discovering the breach, Anthropic immediately began a ten-day investigation, banning the malicious accounts one by one. They alerted the affected organizations, worked with authorities to pass on intel, and expanded their detection systems to catch similar operations moving forward.

But the company isn’t pretending this is a one-time problem. The team says these attacks will only become more common, and easier to pull off. That’s because the skills needed to run them are no longer restricted to elite hackers. If someone can jailbreak a model and plug it into the right toolset, they could carry out a massive campaign without needing a team or even deep technical knowledge.

Anthropic warns of escalating threats as AI models evolve beyond human oversight

The implications are massive, because if teams without deep funding or technical skills can launch nation-scale attacks using automated AI systems, the dystopia is certainly upon us.

Anthropic’s Threat Intelligence team warns that while they only tracked the activity through Claude, it’s likely that similar abuse is happening on other frontier AI models. They say this is the beginning of a new standard in cyberwarfare.

So why keep releasing models with these capabilities, you wonder? Well Anthropic argues that these same tools are essential for defense, saying that “the AI that carried out the attack was also the same kind used by Anthropic’s analysts to dig through the wreckage, find patterns, and understand the operation’s full scale.”

They did promise to improve their models’ internal safety layers though, as well as refine their classifiers for attack detection, and openly publish case studies like this so others in the industry can prepare.

Still, Anthropic says it’s not enough to rely on them alone. They’re urging all devs working on large models to invest heavily in safety.

And they’re calling on cybersecurity teams to start integrating AI into threat detection, incident response, vulnerability scans, and Security Operations Center automation, because traditional methods aren’t fast enough anymore.

If you're reading this, you’re already ahead. Stay there with our newsletter.

Read Entire Article