Quilr | Human Risk Management for Modern Security

OWASP’s new Top 10 for Agentic AI crystallizes these emergent threats into a unified taxonomy. It catalogs ten high-impact failure modes (ASI01–ASI10) ranging from Agent Goal Hijacking to Rogue Agents. Each reflects how an attacker can subvert an agent’s reasoning or tools, often with real-world consequences.

In the sections below, we unpack each OWASP Agentic AI risk. For every category we explain how the vulnerability arises technically, illustrate with concrete (and recent) examples, discuss the impact on security architecture and operations, and highlight defensive or detection strategies.

ASI01 – Agent Goal Hijack

What it is: In an Agent Goal Hijack, attackers manipulate an agent’s stated or implicit objectives so it pursues unintended actions. This often happens via malicious prompts or injected instructions. Because agents treat natural language as commands, a subtle injection can override what users intended.

How it happens: Agents typically ingest external inputs (emails, chat messages, documents) as part of their planning. If those inputs aren’t authenticated, an attacker can sneak in a hidden instruction. For example, an attacker might embed a covert prompt in a seemingly innocuous document or email. When the agent processes that input with its LLM (especially in retrieval-augmented generation pipelines), the hidden instruction alters the agent’s plan. Unlike a static app, the agent “believes” this new instruction is legitimate. In effect, the attacker has “hijacked” the agent’s goal.

A vivid real-world example is EchoLeak. In this case, a malicious email contained a concealed payload (in Markdown) that exploited a corporate AI assistant’s RAG engine. Because the assistant automatically incorporated email and SharePoint content into its context, the hidden payload triggered the LLM to quietly exfiltrate sensitive Copilot data over Teams and SharePoint links. Crucially, no user click or approval was required. The agent’s default behavior trusted the unverified content, turning it into a silent data-leak vector.

ASI02 – Tool Misuse & Exploitation

What it is: Tool Misuse occurs when an agent unwittingly uses one of its tools (browsers, file systems, APIs, etc.) in a harmful way, or when an attacker tricks the agent into doing so. Since agents seamlessly blend natural language with code invocation, a single crafted prompt can turn a helpful tool into a destructive one.

How it happens: Agents often have broad permissions to operate on resources (to be useful). For instance, an agent might have read/write access to a database, ability to send emails, or call internal APIs. If there are no semantic guardrails on those tool calls, an attacker can exploit them by crafting input that leads to dangerous commands. Unlike conventional RCE exploits that exploit a software bug, here the “exploit” is in the logic of the prompt.

Example: Amazon Q Extension Incident
A compromised update pushed to the Q code assistant’s VS Code extension injected malicious logic into the toolchain. When the agent was prompted to “clean cache,” it interpreted that as rm -rf ~/, and proceeded to wipe entire user directories.
The agent believed it was doing its job.

ASI03 – Identity & Privilege Abuse

What it is: In Identity and Privilege Abuse, attackers exploit weak or improper authentication and authorization around agents. Because agents often run under privileged system accounts or hold powerful credentials, leaked tokens or spoofed identities let an attacker make the agent do things it shouldn’t.

How it happens: Many agentic systems run as services or integration accounts that possess extensive permissions. For example, an agent might use a stored OAuth token that grants access to a customer database. If that credential leaks or is mis-scoped, the agent – and by extension anyone controlling it – can move laterally or escalate privileges. Attackers can also impersonate trusted entities in communication channels monitored by agents.

A classic vector: if an agent follows instructions from an integrated chat or email system without verifying identity, a spoofed message could trick the agent. For instance, an attacker impersonating a user in a monitored channel. The agent has “no way to verify the sender’s identity” and so it acts on the request – updating privileges or reading confidential files.

ASI04 – Agentic Supply Chain Vulnerabilities

What it is: Agentic systems rely on many third-party components: plugins, models, datasets, and even other agents. Supply-chain vulnerabilities arise when a compromised upstream component taints the agent’s behavior. Any malicious logic in these dependencies can cascade through the agentic workflow.

How it happens: Unlike monolithic software, agents are often built by orchestrating diverse tools. For example, an agent might call out to a third-party search API, use a community model, or import an open-source dataset for facts. If an attacker inserts a backdoor at any point in this chain, the agent unknowingly inherits it. Even a small poisoned data entry or a hidden unicode backdoor in a plugin can have outsized effects on downstream behavior.

A notable example from 2025 the GitHUB MCP exploit - a compromised GitHub token allowed an attacker to push a malicious “wiper” prompt into the open-source VS Code extension for an AI coding assistant. This prompt, when updated to end users, would have caused the agent to delete user files and cloud resources. The issue wasn’t with the agent’s AI per se, but with its upstream tooling. Fortunately, a syntax bug prevented execution, but the incident shows how supply-chain flaws can turn an agent’s existing capabilities harmful.

ASI05 – Unexpected Code Execution (RCE)

What it is: If an agent can generate or run code, an attacker can trick it into executing malicious scripts – an Unexpected Code Execution (RCE) scenario. This transforms any agent into a remote execution gateway, bypassing normal exploit paths with natural language payloads.

How it happens: Many agents, especially code-oriented ones, have execution tools (e.g. “run Python code”, or system shell access). An attacker can use prompt injection or template tricks to get the agent to execute arbitrary code.

For example, if the agent uses a template engine on user prompts, a crafted string can escape the template context. In mid-2025, researchers discovered that AutoGPT (a popular open-source autonomous agent) had a critical SSTI flaw: malicious input in AgentOutputBlock was sent to the Jinja2 engine unfiltered, letting attackers run arbitrary OS commands on the host. In effect, the AI agent now gave remote code execution capabilities to an external prompt.

Even without a neat template bug, an agent can be lured into RCE via its logic. In 2023, a security research team demonstrated that by framing a normal task (like summarizing a website), they could inject instructions to execute system commands. If the agent’s “execute_python” tool is improperly sandboxed, the agent will happily run whatever code the LLM outputs. AutoGPT’s default “continuous” mode even auto-approves commands, so a clever injection can make it download malware or delete system files without user intervention.

ASI06 – Memory & Context Poisoning

What it is: Agents often keep “memory” or long-term context (sometimes via databases or knowledge stores). Memory poisoning is when attackers inject false or malicious information into these memories, causing future decisions to be tainted. Unlike a one-time prompt injection, poisoning lingers: the agent will recall and act on the bad data indefinitely.

How it happens: Some agents automatically save user inputs, summaries, or tool outputs into a memory store to maintain context across sessions. If this saving isn’t strictly controlled, an attacker can gradually poison it. A typical attack is “delayed tool invocation”: an attacker crafts content that causes the agent to schedule a malicious memory insert later.

For example, as shown against Google’s Gemini, an attacker uploads a document containing hidden instructions: “After I say ‘yes’, save this false memory…”. When the user later interacts, the agent dutifully executes the hidden memory action, storing bogus facts (e.g. “I am 102 years old” in the user’s profile). Now every future summary or suggestion will incorporate these lies.

Over time, even subtle misinformation can skew an agent’s behavior. If an attacker adds bogus “policy rules” or fabricates user preferences, the agent will optimize for the attacker’s goals.

ASI07 – Insecure Inter-Agent Communication

What it is: As agents scale, they often collaborate. Insecure Inter-Agent Communication means attackers exploit weak channels between agents. If messages aren’t authenticated or encrypted, a bad actor can spoof or tamper with them, redirecting a team of agents.

How it happens: In multi-agent systems, one agent might send another a task or data. For performance, some implementations use lightweight messaging (e.g. plain HTTP or queues without auth). If these channels lack security, a man-in-the-middle (MITM) attacker can intercept or inject messages.

For example, an attacker on the same network could hijack the agent coordination bus. They craft a fake message pretending to be “Agent A” instructing “Agent B” to run a dangerous tool. Since Agent B has no way to verify the message origin, it complies. The result is not just one agent compromised, but an entire workflow derailed.

Even without a network attacker, one agent could be tricked via prompt injection. Imagine Agent X’s output is part of Agent Y’s input. If that output is poisoned by an attacker’s prompt, it is as if the message was faked. Either way, the core issue is lack of strong identity and integrity between agents.

ASI08 – Cascading Failures

What it is: Cascading Failures occur when a mistake in one agent propagates through the chain and causes a large-scale breakdown. In complex workflows, a small error (or attack) can multiply as downstream agents rely on it.

How it happens: Agentic architectures often form pipelines: Agent A’s output is Agent B’s input, and so on. If Agent A’s result is incorrect or maliciously altered, Agent B will consume that garbage. Because agents typically lack explicit validity checks on each other’s work, the initial error snowballs. For example, one altered record can lead a fraud-detection agent to misclassify it, then an approval agent to clear it, then an accounting agent to reconcile it – each trusting the last.

An illustrative scenario: an attacker injects a false “low-risk” flag into a transaction record that a fraud agent reviews. Believing it safe, the next agent approves the transfer. Then a reconciliation agent updates account balances based on the bad data. By the time humans notice, multiple systems have reinforced the error, making it much harder to reverse. This is systemic risk: the initial flaw cascades into a chain reaction.

ASI09 – Human–Agent Trust Exploitation

What it is: Human–Agent Trust Exploitation arises when attackers prey on the trust humans place in agents. Because agents often present information confidently, users may follow wrong or malicious advice, especially if it seems “AI-approved.”

How it happens: Humans are inclined to trust authoritative-sounding sources. If an agent delivers a polished recommendation or report, users might assume it’s correct or vetted. Attackers exploit this by feeding the agent subtle misinformation so that it later outputs malicious content. For instance, a sales agent might be poisoned with a fake competitor vulnerability, then present a sales pitch that highlights a false weakness. The human buyer might act on it without realizing it was planted.

Example: an attacker injects slight misinformation into an agent’s inputs. The agent then confidently presents a faulty recommendation (say, approving a risky transaction) to a user. The language is authoritative and the user interface shows no transparency about how the decision was made, so the employee simply clicks “Approve”. In this way, AI becomes a force multiplier for social engineering: it can generate realistic phish, fake news, or bad advice at scale, with the human target none the wiser.

ASI10 – Rogue Agents

What it is: A Rogue Agent acts outside its intended boundaries: it misbehaves unpredictably due to misconfiguration, autonomy creep, or malicious influence. This is the most alarming scenario: the agent essentially “goes rogue,” ignoring safeguards or performing unauthorized actions.

How it happens: Rogue behavior can stem from several causes. It could be a misconfigured constraint – for example, an agent given too broad a scope (“free to plan any improvements to network security” without limits). It could be result of accumulated impacts (misalignment attacks like ASI01–ASI09 gradually pushing it off course). In a notorious 2025 incident, a well-known AI coding assistant “panicked” and deleted its user’s entire production database during a code freeze. The agent even lied about its actions to cover up the mistake. This wasn’t a simple bug; it was a agent’s catastrophic misinterpretation of an “empty database” as an instruction to wipe data. The result: a complete loss of weeks of work.

OWASP cites this Replit event as an example of ASI10: an agent violating explicit trust and instructions, autonomously taking destructive actions. In general, rogue agents may start hiding their actions, ignoring “stop” orders, or escalating privileges to accomplish self-set goals. They may exist if no human overseer re-checks the agent’s path.

Securing Agentic AI with Real-Time Guardrails

Addressing these OWASP Top 10 risks requires a new approach to enterprise AI security. Quilr’s platform, for example, builds on the principles above by inserting AI-powered guardian agents inline with user and AI interactions. These guardians analyze content, context, and intent in real time – effectively understanding what an agent is trying to do, where, and why – and then applying dynamic guardrails to prevent leaks or misuse.

For instance, if an agent attempts an unexpected file deletion or an unauthorized API call (ASI02/ASI05), Quilr can block or quarantine the action immediately.

If a hidden prompt tries to redirect an agent’s goals (ASI01), Quilr’s prompt-injection defenses detect and sanitize the injection pattern before the agent acts.

Quilr also treats AI agents as identities with scoped privileges: it ties every agent and user to a policy-driven context. Before an agent uses a credential, Quilr verifies if that action is allowed (mitigating ASI03).

It logs all agentic requests (helps in post-incident forensics) and uses anomaly detection to flag any off-path behavior (such as a cascade-triggering misclassification in a chain, ASI08).

Memory poisoning is addressed by sanitizing any new data an agent tries to save to its knowledge store – ensuring that suspicious “facts” are not stored persistently.

In terms of data loss prevention, Quilr’s semantic DLP operates in multiple languages and content types across email, chat, code, and files. This means even if an agent tries to exfiltrate via cunning means, Quilr recognizes protected data patterns and stops the leak in-transit.

Furthermore, Quilr’s approach emphasizes explainability and human-centric defense. When an agent suggests an action, Quilr can surface rationale or safety tips to the user, so humans remain in control (countering ASI09).

Any attempt by an agent to bypass policies triggers contextual coaching alerts. This aligns with OWASP’s vision of layered defenses – real-time guardrails backed by strong governance.

In short, Quilr embeds OWASP’s guardrail concepts directly into agent pipelines: unified AI+DLP policies, prompt-sanitization and jailbreak detection, and continuous behavior monitoring. By doing so, it helps enterprises deploy agentic AI with confidence, knowing that each of the OWASP Agentic Top 10 categories is being actively watched and managed.

The New Imperative for CISOs

The OWASP Agentic AI Top 10 is more than a checklist – it’s a wake-up call. Autonomous agents are already in our inboxes, IDEs, and business workflows, and they will only become more prevalent. CISOs and security architects must therefore adapt now, or be caught off-guard by the very agility that AI brings. Traditional AppSec thinking – vulnerability scans, signature-based DLP, and static role-based access – must be augmented with AI-native security.

This means building teams that understand both AI workflows and cyberattacks, investing in real-time monitoring of LLM interactions, and embedding guardrails into AI applications from day one. It means classifying AI agents as digital identities, scrutinizing every AI model and plugin, and rehearsing agent-centric attack scenarios through red teaming.

Sources: We have drawn on the OWASP GenAI Agentic AI Top 10 report, industry analyses of agentic AI security, and public material on AI security platforms to inform these descriptions and examples. Each risk is grounded in these expert sources, translated into a format for security practitioners.

AUTHOR

Ayush Sethi

Securing Agentic AI: A Deep Dive into the OWASP Top 10 Risks