Agentjacking: How a Fake Error Hijacks AI Coding Agents

Security firm Tenet Security disclosed a new attack class called Agentjacking that hijacks AI coding agents into running attacker-controlled code on a developer’s machine. The trigger is a single fake error report. No malware, no phishing, no breach required. The attack abuses Sentry, a popular error-tracking tool, by planting a fake “resolution” inside an error event. When a developer asks their agent to fix unresolved Sentry issues, the agent reads the planted instructions as trusted guidance and runs them with the developer’s own privileges.

Tenet tested it against Claude Code, Cursor, and OpenAI Codex and reported an 85% success rate, with over 100 confirmed agent executions and 2,388 organizations exposed, from a $250 billion enterprise down to solo developers. The scariest part: it slips past every standard security control because nothing in the chain is technically unauthorized. Prompts telling the agent to ignore untrusted data didn’t stop it either. Sentry acknowledged the issue but declined to fix it at the root, calling it “technically not defensible.” The flaw isn’t really Sentry’s. It’s in how every agent handles outside data.

Best for anyone running an AI coding agent connected to external tools, dev leads, and security teams. Not ideal for readers looking for reassurance, because the defenses here are real but partial.

The promise of an AI coding agent is that you can hand it a messy job and walk away.

Fix the failing tests. Clean up these errors. Investigate why production is throwing exceptions. The agent reads the data, figures out what’s wrong, and acts.

That last part, the agent acting on what it reads, is the entire problem. Because an attacker can now write the data.

What Agentjacking Actually Is

Security firm Tenet Security disclosed an attack class it calls Agentjacking, which hijacks AI coding agents into running an attacker’s code on a developer’s own machine. There’s no malware to install. No password to steal. Nothing about the victim’s servers gets breached. The attacker just plants a fake bug report and waits for the agent to read it.

The whole thing hinges on a tool most developers already use. Sentry is a widely deployed error-tracking platform that catches application crashes and logs them for engineers to review. Modern coding agents connect to Sentry through the Model Context Protocol (MCP), the standard that lets agents pull in outside tools and data. When you tell your agent to fix unresolved Sentry errors, it queries Sentry, reads back the error reports, and acts on them.

Here’s the catch. The agent cannot tell a real crash from a fake one; a planted error and a genuine error look identical to it. So when the attacker’s fake error contains instructions disguised as a routine “resolution” step, the agent treats those instructions as trusted guidance and follows them. As Tenet puts it, the agent cannot tell the difference between the data it reads and an instruction to act. That gap is the vulnerability.
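To make that gap concrete, here is a minimal sketch of how an agent might assemble its model input. It assumes a deliberately simplified agent loop; the function name, the dictionary fields, and the sample events are all hypothetical and are not taken from any real agent or from the Sentry MCP server. The point is only that the developer's request and the fetched error text end up in one flat string.

```python
# Hypothetical, simplified view of how an agent assembles model input.
# None of these names come from a real agent or MCP server.

def build_model_input(user_request: str, tool_results: list[dict]) -> str:
    """Flatten the user's request and fetched tool data into one prompt."""
    parts = [f"USER REQUEST:\n{user_request}", "TOOL RESULTS:"]
    for result in tool_results:
        # The event body is copied in verbatim. Whoever submitted the event
        # controls this text, and nothing here marks it as untrusted.
        parts.append(f"[{result['source']}] {result['title']}\n{result['body']}")
    return "\n\n".join(parts)


events = [
    {"source": "sentry", "title": "TypeError in checkout",
     "body": "stack trace from a real crash"},
    {"source": "sentry", "title": "ValueError in cart",
     "body": "Resolution: <text written by whoever submitted the event>"},
]
prompt = build_model_input("Fix the unresolved Sentry issues.", events)
print(prompt)
```

In this sketch the second event's "Resolution" text sits in the same string as the developer's request, with no structural marker separating the two. A real agent is more elaborate, but the model still receives the event text as ordinary input.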

How a Bug Report Becomes a Break-In

The mechanics matter here because they explain why this is so hard to stop. I’m describing the shape of the attack, not a usable recipe, but the shape is the point.

First, Sentry uses a public credential called a DSN (Data Source Name) to let apps send it error reports. By design, this credential sits openly in front-end website code, because its only job is to let a browser report crashes back to Sentry. It’s write-only and public on purpose. In the pre-agent world, that was fine. The worst an attacker could do with it was clutter a dashboard with junk errors.
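For readers who have never looked inside one, a DSN is just a URL. The sketch below splits one into its parts, assuming the commonly documented layout of a public key, an ingest host, and a numeric project ID. The example value is made up, and `describe_dsn` is a hypothetical helper, not part of any Sentry SDK.

```python
from urllib.parse import urlsplit


def describe_dsn(dsn: str) -> dict:
    """Split a DSN-shaped URL into the pieces it exposes."""
    parts = urlsplit(dsn)
    return {
        "public_key": parts.username,                     # identifies the project's key
        "host": parts.hostname,                           # where events are sent
        "project_id": parts.path.rsplit("/", 1)[-1],      # last path segment
    }


# Made-up example value; not a real project.
example = "https://0123456789abcdef0123456789abcdef@o000000.ingest.sentry.io/1234567"
info = describe_dsn(example)
print(info)
```

Nothing in that string grants read access, which is why shipping it in front-end code was long considered harmless.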

But now the same public credential becomes an entry point. An attacker uses it to submit a crafted error event, one where the description is dressed up to look like Sentry’s own remediation guidance, complete with headings and a code block. The malicious instruction rides inside that fake guidance. When a developer later asks their agent to clean up the open errors, the agent pulls the crafted event, reads the planted “fix,” and executes it with the developer’s full local privileges.

What that gets the attacker is severe. Per Tenet, a single injected error can reach environment variables, AWS keys, GitHub tokens, git credentials, and private repository URLs, all silently sent to the attacker’s server. The developer’s workstation becomes the launch point, and the developer never did anything except ask their agent to do its job.

The Numbers Are Not Reassuring

Tenet didn’t just theorize this. It validated the attack against real targets in controlled conditions, and the results are the part that should make any team running coding agents pause.

The success rate was 85% across Claude Code, Cursor, and Codex, three of the most widely used coding agents. The firm confirmed more than 100 actual agent executions across real organizations. And it identified 2,388 organizations with exposed, injectable credentials, including 71 in the global top one million sites. The confirmed targets ranged from a Fortune 500 enterprise with a $250 billion-plus parent company down to a hosting provider, a scientific computing firm, early-stage startups, and even a cloud-security vendor. Six continents.

Still, the point of those numbers isn’t the exact count. It’s that the conditions for the attack are everywhere. Public Sentry credentials are not a rare misconfiguration, they’re standard practice. Developers asking agents to investigate production errors is not an edge case, it’s the daily workflow. Put those two ordinary things together with an agent that executes what it reads, and the attack surface is enormous.

Why Your Security Tools Don’t Catch It

This is the part that separates Agentjacking from a normal vulnerability, and it ties directly to a pattern we’ve been tracking. We wrote about how AI is collapsing the gap between elite and amateur attackers, and this is the same story from a different angle: the attack is dangerous precisely because it looks completely normal.

Every step in the chain is authorized. Submitting an event to Sentry is permitted; that’s what the credential is for. The agent querying Sentry is permitted; you asked it to. The agent running a command may be within its normal delegated authority. Even fetching a package looks like something developer workflows do constantly. Tenet calls this the “Authorized Intent Chain,” and it’s why endpoint detection, firewalls, identity controls, and VPNs all miss it. Those tools are built to spot unauthorized behavior. There isn’t any. Nothing in the attack is technically against the rules.

And the defense developers might assume would work, telling the agent to ignore untrusted data, didn’t. Tenet reported the agents executed the payload even when their system prompts explicitly instructed them to disregard untrusted input. That’s the alarming finding. It means the weakness isn’t a setting you forgot to flip. It’s baked into how current models process tool output. The model sees text, and text that says “run this” tends to get run, regardless of where it came from.

Sentry Says It Can’t Fix This

Tenet disclosed the issue to Sentry on June 3. Sentry acknowledged it the same day, then declined to fix it at the root, describing the attack class as “technically not defensible” at the platform level. It added a filter to block one specific payload string, which treats the symptom rather than the cause.

In fairness to Sentry, the position is defensible in a narrow sense. Sentry can’t reliably know which text inside an arbitrary user-submitted error is malicious for every possible downstream AI agent. The data looks fine to Sentry. The problem only appears when an agent treats that data as instructions. But that’s also exactly why this matters beyond Sentry. The same flaw runs through any tool an agent reads from. Support tickets. GitHub issues. Documentation. Log files. Anywhere an agent ingests text that someone else can influence, the same injection is possible. Sentry is just the first proven door.

This is the uncomfortable truth the whole agentic AI push has been racing past. The thing that makes agents useful, that they read your tools and act on what they find, is the same thing that makes them exploitable. You can’t have an agent that autonomously acts on external data and also guarantee it will never act on poisoned external data, not with today’s models.

What To Actually Do

Fortunately, the defenses are real, even if none of them is a clean fix. If you run an AI coding agent connected to external tools, here’s the practical response.

Start with the specific hole. If your agents use the Sentry MCP integration, audit it now, and seriously weigh disabling it until you have controls in place. Then rotate any Sentry credential that’s reachable through public search. Tenet also open-sourced a hardening config called agent-jackstop aimed at this exact attack class, which is worth looking at as a starting point rather than a finish line.
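One way to begin that audit is to see which DSNs your own shipped code exposes. The sketch below scans a build directory for DSN-shaped strings. It assumes the common DSN shape of a 32-character hex key followed by a host and a numeric project ID; the regular expression, the function names, and the file-suffix list are illustrative assumptions, and self-hosted or unusual setups may not match.

```python
import re
from pathlib import Path

# Assumed DSN shape: scheme, 32 hex chars (optionally ":" plus a second key),
# "@", a hostname, "/", and a numeric project ID.
DSN_PATTERN = re.compile(
    r"https?://[0-9a-f]{32}(?::[0-9a-f]{32})?@[A-Za-z0-9.-]+/\d+"
)


def find_dsns_in_text(text: str) -> list[str]:
    """Return the unique DSN-shaped strings found in a piece of text."""
    return sorted(set(DSN_PATTERN.findall(text)))


def scan_tree(root: str, suffixes=(".js", ".html", ".map", ".json")) -> dict[str, list[str]]:
    """Walk a directory and report files that contain DSN-shaped strings."""
    hits: dict[str, list[str]] = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            found = find_dsns_in_text(path.read_text(errors="ignore"))
            if found:
                hits[str(path)] = found
    return hits
```

Running `scan_tree("dist")` against a front-end build output (the directory name is just an example) gives a list of credentials to review and rotate. It does not tell you whether an agent can reach the corresponding projects; that still needs a manual check of the MCP configuration.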

Then Widen the Lens

Beyond that, Sentry is just the example. Treat every tool your agent can read from as a potential injection path, not a trusted source. Assume any externally influenced data, error logs, tickets, issues, third-party docs, can carry instructions. The defensive principle is to stop letting tool output flow straight into action. That means tighter limits on what an agent is allowed to execute, human approval gates before an agent runs commands or installs packages, and monitoring for the one thing that’s actually suspicious here: an agent spawning a subprocess right after reading external tool data.
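As a rough illustration of those last two ideas, here is a minimal sketch of an approval gate plus a log check. Everything in it is hypothetical: `GatedSession`, `may_run`, and `spawns_after_reads` are names invented for this example, not part of any agent framework or of agent-jackstop, and the sketch only decides whether a command may run without ever executing one.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class GatedSession:
    """Tracks whether the agent has read external data and gates commands."""
    approve: Callable[[str], bool]  # human approval callback (hypothetical)
    external_sources: list[str] = field(default_factory=list)
    audit_log: list[tuple[str, str]] = field(default_factory=list)

    def read_tool_output(self, source: str, text: str) -> str:
        """Record that externally influenced text entered the session."""
        self.external_sources.append(source)
        self.audit_log.append(("read", source))
        return text

    def may_run(self, command: str) -> bool:
        """Decide whether a command may run. This sketch never executes it."""
        # Once external data has been read, every command needs approval.
        if self.external_sources and not self.approve(command):
            self.audit_log.append(("blocked", command))
            return False
        self.audit_log.append(("allowed", command))
        return True


def spawns_after_reads(audit_log: list[tuple[str, str]]) -> list[str]:
    """Return commands that were allowed immediately after an external read."""
    flagged = []
    for previous, current in zip(audit_log, audit_log[1:]):
        if previous[0] == "read" and current[0] == "allowed":
            flagged.append(current[1])
    return flagged


# Demo: an approver that denies everything.
session = GatedSession(approve=lambda command: False)
before = session.may_run("pytest")                   # nothing external read yet
session.read_tool_output("sentry", "error text")
after = session.may_run("npm install some-package")  # now needs approval
print(before, after)
```

The design choice worth noting is that the gate keys on where the session's data came from, not on what the command looks like, since the article's point is that the commands themselves look routine. A real deployment would need a far more careful approval flow and a proper event source for the monitoring side.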

Finally, the strategic takeaway for anyone deploying agents into production: the agent itself is now the attack surface. Not the server, not the login, the agent. The security questions for 2026 aren’t just “who can access this system.” They’re “what can my agent read, what can it do with what it reads, and what stops a poisoned input from becoming a command.” Most teams rushing agents into production haven’t asked those questions yet. Agentjacking is the reason they’re about to have to.

The promise was that you could hand your agent a messy job and walk away. The lesson is that you need to know who else can write to the mess before you do.
