On May 6 and 7, four security research teams published findings about Anthropic’s Claude that most outlets covered as three separate stories. One involved a water utility in Mexico, another a Chrome extension, and a third the theft of OAuth tokens through Claude Code. In the first, Claude identified a water utility’s SCADA gateway without being told to look for one.
These are not three bugs. They are one architectural question playing out on three surfaces. No single patch released so far addresses all of them.
The common thread is the confused deputy, a trust-boundary failure where a program with legitimate authority executes actions on behalf of the wrong principal. On each surface, Claude held real capabilities and handed them to whoever showed up. An attacker probing a water utility's network. A Chrome extension with zero permissions. A malicious npm package rewriting a config file.
Carter Rees, VP of Artificial Intelligence at Reputation, identified the structural reason this class of failure is so dangerous. The flat authorization plane of an LLM fails to respect user permissions, Rees told VentureBeat in an exclusive interview. An agent operating on that flat plane does not need to escalate privileges; it already has them.
Kayne McGladrey, an IEEE senior member who advises enterprises on identity risk, described the same dynamic independently in an interview with VentureBeat. Enterprises are cloning human permission sets onto agentic systems, McGladrey said. The agent does whatever it needs to do to get its job done, and sometimes that means using far more permissions than a human would.
Dragos found Claude targeting a water utility’s SCADA gateway without being told to look for one
Dragos published its analysis on May 6. Between December 2025 and February 2026, an unidentified adversary compromised multiple Mexican government organizations. In January 2026, the campaign reached Servicios de Agua y Drenaje de Monterrey, the municipal water and drainage utility serving the Monterrey metropolitan area.
Dragos analyzed more than 350 artifacts. The adversary used Claude as the primary technical executor and OpenAI’s GPT models for data processing. Claude wrote a 17,000-line Python framework containing 49 modules for network discovery, credential harvesting, privilege escalation, and lateral movement. Claude compressed what would traditionally take days or weeks of tooling development into hours, according to the Dragos analysis.
Without any prior ICS/OT context, Claude identified a server running a vNode SCADA/IIoT management interface, classified the platform as high-value, generated credential lists, and launched an automated password spray. The attack failed, and no OT breach occurred, but Claude did the targeting. Dragos noted that this was not a product vulnerability in the traditional sense because Claude performed exactly as designed. The architectural gap, as the firm described it, is that the model cannot distinguish an authorized developer from an adversary using the same interface.
Jay Deen, associate principal adversary hunter at Dragos, wrote that the investigation showed how commercial AI tools have made OT more visible to adversaries already operating within IT.
CrowdStrike CTO Elia Zaitsev told VentureBeat why this class of incident evades detection. Nothing bad has happened until the agent acts, Zaitsev said. It is almost always at the action layer. The Monterrey reconnaissance looked like a developer querying internal systems. The developer tool just had an adversary at the keyboard.
Stack blind spot: OT monitoring does not flag AI-generated recon from IT-side developer tools. EDR sees the process but has no visibility into intent.
LayerX proved any Chrome extension can hijack Claude through a trust boundary Anthropic partially patched
On May 7, LayerX researcher Aviad Gispan disclosed ClaudeBleed. Claude in Chrome uses Chrome’s externally connectable feature to allow communication with scripts on the claude.ai origin, but does not verify whether those scripts came from Anthropic or were injected by another extension. Any Chrome extension can inject commands into Claude’s messaging interface. Zero permissions required.
LayerX reported the flaw on April 27. Anthropic shipped version 1.0.70 on May 6. LayerX found that the patch did not remove the vulnerable handler. LayerX bypassed the new protections through the side-panel initialization flow and by switching Claude into "Act without asking" mode, which required no user notification. Anthropic's patch survived less than a day.
Mike Riemer, SVP of Network Security Group and Field CISO at Ivanti, told VentureBeat that threat actors are now reverse engineering patches within 72 hours using AI assistance. If a vendor releases a patch and the customer has not applied it within that window, the vulnerability is already being exploited, Riemer said. Anthropic's ClaudeBleed patch did not survive even a third of that window.
Stack blind spot: EDR watches files and processes but does not monitor extension-to-extension messaging within the browser. ClaudeBleed produces no file writes, no network anomalies, and no process spawns.
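Since endpoint tooling does not see this channel, one place to start is the extension inventory itself. The Python sketch below checks a parsed extension manifest for content script or host permission match patterns that cover claude.ai. It is a minimal illustration, not LayerX's or Anthropic's tooling: the function names are hypothetical, the pattern matching is a simplified reading of Chrome's match-pattern rules, and it only sees access declared statically in the manifest, so it will not catch injection paths that are not declared there.

```python
def pattern_covers_host(pattern: str, host: str = "claude.ai") -> bool:
    """Return True if a Chrome match pattern could apply to https://<host>/."""
    if pattern == "<all_urls>":
        return True
    if "://" not in pattern:
        return False
    scheme, rest = pattern.split("://", 1)
    if scheme not in ("*", "https"):
        return False
    pattern_host = rest.split("/", 1)[0]
    if pattern_host == "*":
        return True
    if pattern_host.startswith("*."):
        # "*.example.com" covers example.com and all of its subdomains.
        base = pattern_host[2:]
        return host == base or host.endswith("." + base)
    return pattern_host == host


def targets_claude(manifest: dict) -> bool:
    """Flag a manifest whose declared patterns reach the claude.ai origin."""
    patterns = []
    for script in manifest.get("content_scripts", []):
        patterns.extend(script.get("matches", []))
    patterns.extend(manifest.get("host_permissions", []))
    return any(pattern_covers_host(p) for p in patterns)
```

A fleet audit would load each installed extension's manifest.json, pass the parsed dictionary to targets_claude, and send any hit to the browser security team for review.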
Mitiga showed a config file rewrite steals OAuth tokens and survives rotation
Also on May 7, Mitiga Labs researcher Idan Cohen published a man-in-the-middle attack chain targeting Claude Code. Claude Code stores MCP configuration and OAuth tokens in ~/.claude.json, a single user-writable file. A malicious npm postinstall hook can rewrite the MCP server URL to route traffic through an attacker's proxy, capturing OAuth tokens for Jira, Confluence, and GitHub. Because the postinstall hook fires on every Claude Code load, it reasserts the malicious endpoint even after token rotation. The standard incident response step of rotating credentials therefore does not break the attack chain unless the hook itself is removed first.
Mitiga reported the finding on April 10. On April 12, Anthropic classified it as out of scope, according to Mitiga’s published disclosure.
Riemer described the principle this chain violates. I do not know you until I validate you, Riemer told VentureBeat. Until I know what it is and I know who is on the other side of the keyboard, I am not going to communicate with it. The ~/.claude.json rewrite substitutes the attacker’s endpoint for the legitimate one. Claude Code never re-validates.
Riemer has spent 21 years architecting the product he now leads and holds five patents on its security infrastructure. He applies the same defensive logic he built into his own platform. If a threat actor gets in, drop all connections. That is a fail-safe design. Anthropic's architecture does the opposite. It fails open.
Stack blind spot: Web application firewalls never see local config rewrites. EDR treats JSON file writes as normal developer behavior. Rotating tokens does not break the chain unless responders also confirm the hook is removed.
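One way to close part of this gap is to compare the MCP endpoints in the config file against an approved list each time the file changes. The Python sketch below is a rough illustration under stated assumptions: it assumes MCP servers appear under a key named "mcpServers" whose entries carry a "url" field, which may not match every Claude Code version, and the function names and example hostnames are hypothetical.

```python
import json
from urllib.parse import urlsplit


def find_mcp_urls(node, found=None):
    """Collect (server name, url) pairs from any "mcpServers" object in the config."""
    if found is None:
        found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "mcpServers" and isinstance(value, dict):
                for name, server in value.items():
                    if isinstance(server, dict) and isinstance(server.get("url"), str):
                        found.append((name, server["url"]))
            else:
                find_mcp_urls(value, found)
    elif isinstance(node, list):
        for item in node:
            find_mcp_urls(item, found)
    return found


def unapproved_servers(config_text: str, allowed_hosts: set) -> list:
    """Return MCP servers whose URL is not HTTPS or whose host is not allowlisted."""
    flagged = []
    for name, url in find_mcp_urls(json.loads(config_text)):
        parts = urlsplit(url)
        if parts.scheme != "https" or parts.hostname not in allowed_hosts:
            flagged.append((name, url))
    return flagged
```

A file integrity monitor could call unapproved_servers with the contents of ~/.claude.json on every write. A non-empty result means the endpoint has drifted from the allowlist, and responders should look for the hook that rewrote it before rotating tokens.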
Anthropic’s response pattern treats the user’s trust decision as the security boundary
Anthropic classified Mitiga's MCP token theft as out of scope on April 12. The company called OX Security's STDIO vulnerability affecting an estimated 200,000 MCP servers "expected" and by design. Anthropic rejected Adversa AI's TrustFall report as outside its threat model, according to Adversa's published disclosure. ClaudeBleed was partially patched. Across all four disclosures, the researchers say the underlying trust model remains exploitable.
Alex Polyakov, co-founder of Adversa AI, told The Register that each vulnerability gets patched in isolation, but the underlying class has not been fixed.
Zaitsev offered a frame for why consent alone cannot serve as the trust boundary. If you think you can always understand intent, Zaitsev told VentureBeat, then you would also think it is possible to write a program that reads a text transcript and figures out if someone is lying. That is intuitively an impossible problem to solve.
Adversa AI showed that a cloned repo can auto-execute arbitrary code the moment a developer clicks trust
Adversa AI researcher Alex Polyakov published TrustFall, demonstrating that project-scoped Claude configuration files in a cloned repository can silently authorize MCP servers to run as native OS processes with full user privileges. The moment a developer clicks the generic “Yes, I trust this folder” dialog, any MCP server defined in the project config launches. The dialog does not show what it authorizes.
In automated build pipelines where Claude Code runs without a screen, the trust dialog never appears. The attack executes with zero human interaction. Adversa confirmed the pattern is not unique to Claude Code. All four major coding agents (Claude Code, Cursor, Gemini CLI, and GitHub Copilot) can auto-execute project-defined MCP servers the moment a developer accepts that dialog.
Stack blind spot: No current security tooling can tell the difference between a legitimate project config and a malicious one. The trust dialog is the only thing standing between the developer and arbitrary code execution, and it does not show what it is about to authorize.
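Until the dialog shows what it authorizes, a team can at least look before it clicks. The Python sketch below checks a repository root for the agent configuration files named in the audit matrix later in this article (.claude, .claude.json, .mcp.json, CLAUDE.md) and lists any MCP server names a project config defines. It is a minimal sketch: the assumption that .mcp.json holds a top-level "mcpServers" object is ours, and the function names are hypothetical.

```python
import json
from pathlib import Path

# File and directory names taken from the article's audit matrix.
AGENT_CONFIG_NAMES = (".claude", ".claude.json", ".mcp.json", "CLAUDE.md")


def agent_config_hits(repo_root) -> list:
    """Return the agent configuration entries present in the repository root."""
    root = Path(repo_root)
    return sorted(name for name in AGENT_CONFIG_NAMES if (root / name).exists())


def defined_mcp_servers(repo_root) -> list:
    """List MCP server names defined in .mcp.json (assumed layout), for review."""
    path = Path(repo_root) / ".mcp.json"
    if not path.is_file():
        return []
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # A config that cannot be read or parsed is itself worth a review.
        return ["<unparseable .mcp.json>"]
    servers = data.get("mcpServers") if isinstance(data, dict) else None
    return sorted(servers) if isinstance(servers, dict) else []
```

Run against a fresh checkout before anyone opens it in a coding agent, any non-empty result is a prompt for review rather than proof of malice, since legitimate projects ship these files too.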
The matrix below maps each surface to the party Claude wrongly trusted, the stack blind spot, the detection signal, and the recommended action.
Claude Confused Deputy Audit Matrix

Surface: claude.ai / API (Dragos, May 6; 350+ artifacts analyzed)
Who Claude Trusted: Attacker posing as an authorized user via Claude’s prompt interface. Claude cannot distinguish a developer mapping internal systems from an adversary doing the same thing through the same interface.
Why Your Stack Misses It: OT monitoring watches ICS protocols and anomalous traffic patterns. AI-generated recon originates from an IT-side developer tool, not from the OT network. The queries look identical to legitimate developer activity because they ARE legitimate developer activity with an adversary at the keyboard.
Detection Signal (query): Claude API logs for requests referencing internal hostnames, IP ranges, or SCADA/ICS keywords.
Detection Signal (alert trigger): >5 credential generation requests against internal services in 60 minutes.
Detection Signal (escalation): OT team notified on any AI-originated query touching vNode, SCADA, HMI, or PLC keywords.
Recommended Action: (1) Segment AI-assisted sessions from OT-adjacent network segments. (2) Log all Claude API calls referencing internal hostnames or IP ranges. (3) Alert on automated credential generation targeting internal authentication interfaces. (4) Require explicit OT authorization for any AI tool with internal network access.

Surface: Claude in Chrome (LayerX, May 7; v1.0.70 patch bypassed in <24 hrs)
Who Claude Trusted: Any script running in the claude.ai browser context, including scripts injected by zero-permission extensions. The externally connectable manifest trusts the origin (claude.ai), not the execution context. Any extension can inject into that origin.
Why Your Stack Misses It: EDR monitors file system activity, process execution, and network connections. Extension-to-extension messaging happens entirely within the browser runtime. No file writes. No network anomalies. No process spawns. EDR has zero visibility into Chrome’s internal messaging API.
Detection Signal (query): Chrome extension inventory for any extension with content scripts targeting claude.ai in the manifest.
Detection Signal (alert trigger): New extension installed with claude.ai in permissions or content script targets.
Detection Signal (escalation): Browser security team reviews any extension communicating with Claude’s messaging interface.
Recommended Action: (1) Audit Chrome extensions across the fleet for claude.ai content script access. (2) Disable “Act without asking” mode in Claude in Chrome enterprise-wide. (3) Deploy browser security tooling that inspects extension messaging channels. (4) Monitor for extensions injecting content scripts into the claude.ai domain.

Surface: Claude Code MCP (Mitiga, May 7; Anthropic: “out of scope” April 12)
Who Claude Trusted: Rewritten ~/.claude.json routing MCP traffic through attacker-controlled proxy. Claude Code reads the MCP server URL from the config file on every load. It never re-validates that the URL matches the endpoint the user originally authorized.
Why Your Stack Misses It: WAF inspects HTTP traffic between clients and servers. It never sees a local config file rewrite. EDR treats JSON file writes in the user’s home directory as normal developer behavior. Token rotation feeds the chain because the npm postinstall hook reasserts the malicious URL on every Claude Code load.
Detection Signal (query): File integrity monitor on ~/.claude.json for MCP server URL changes.
Detection Signal (alert trigger): MCP server URL changed to endpoint not on approved allowlist.
Detection Signal (escalation): IR team confirms postinstall hook removal before closing ticket. Token rotation alone is insufficient.
Recommended Action: (1) Monitor ~/.claude.json for unexpected MCP endpoint changes against an allowlist. (2) Block or alert on npm postinstall hooks that modify files outside the package directory. (3) Maintain a centralized MCP server URL allowlist. (4) Do NOT assume token rotation breaks the chain without confirming the malicious hook is removed first.

Surface: Claude Code project settings (Adversa AI, May 7; affects Claude, Cursor, Gemini CLI, Copilot)
Who Claude Trusted: Project-scoped .claude configuration file in a cloned repository. Clicking the generic “Yes, I trust this folder” dialog silently authorizes any MCP server defined in the project config. The dialog does not show what it authorizes.
Why Your Stack Misses It: No current security tooling can tell the difference between a legitimate project config and a malicious one. In automated build pipelines, Claude Code runs without a screen. The attack executes with zero human interaction against pull-request branches.
Detection Signal (query): Pre-clone scan for .claude, .claude.json, .mcp.json, CLAUDE.md files in repository root.
Detection Signal (alert trigger): Repo contains MCP server definition not on approved organizational list.
Detection Signal (escalation): DevSecOps reviews before any developer opens the repo in Claude Code or any coding agent.
Recommended Action: (1) Scan cloned repositories for .claude configuration files before opening in any AI coding agent. (2) Require explicit per-server MCP approval rather than blanket folder trust. (3) Flag repos that define custom MCP servers in project configuration. (4) Audit CI/CD pipelines running Claude Code headless where trust dialogs are skipped entirely.
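
The matrix's first alert trigger, more than five credential generation requests against internal services in 60 minutes, amounts to a sliding-window counter. The Python sketch below shows one way to express it. It assumes some upstream log pipeline has already classified a request as credential generation and supplies timestamps in non-decreasing order; the class name and defaults are hypothetical, with only the threshold and window taken from the matrix.

```python
from collections import deque


class CredentialRequestAlert:
    """Sliding-window counter for already-classified credential generation requests."""

    def __init__(self, threshold: int = 5, window_seconds: int = 3600):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._times = deque()

    def record(self, timestamp: float) -> bool:
        """Record one request; return True when the window holds more than `threshold`."""
        self._times.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop events that are a full window old or older.
        while self._times and self._times[0] <= cutoff:
            self._times.popleft()
        return len(self._times) > self.threshold
```

In practice a team would keep one counter per user or session, so that a single noisy developer does not mask, or get blamed for, activity elsewhere.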
The deputy changed
Norm Hardy described the confused deputy in 1988. The deputy he had in mind was a compiler. This one writes 17,000-line exploitation frameworks, identifies SCADA gateways on its own, and holds OAuth tokens to Jira, Confluence, and GitHub. Four research teams found the same failure class on four surfaces in the same week. Anthropic's response to each one was some version of "the user consented." The matrix above is the audit Anthropic has not built. If your team runs Claude Code or Claude in Chrome, start there.

