Why Every Agent Vulnerability is a Trust Boundary Failure

Consider these scenarios

An MCP server quietly returning extra tool descriptions
Prompt injection through a calendar invite
An Agent invokes a tool that the principal should not have access to
Cost overruns

It isn't the model that failed. It isn't the tool that failed. What failed is the trust boundary, the trust between two components with different authority

In a classic application/service, code calls APIs and the developer decides what is sent. In an agent, a language model decides at runtime which tool to call, with what arguments, after reading text the developer has never seen.

Let us create a mental model of the different failure modes and how you can secure your AI workloads

Simple Inference calls have no side affects

A model maps text to text. Guardrails secure what goes in and what comes out — PII redaction, harmful content, jailbreaks.

principal / user request in flight input guardrail · redact output guardrail · block model (stateless)

An Agent is a while loop
An agent is a while loop with inference + tool/agent calls

There is no "agent object." There is a transcript and a runtime that keeps calling the model until it stops asking for tools.

in-flight tool call / result loop component

This distinction matters because the trust questions are properties of the loop, not of the model. The model does not know who the user is. The model does not know which tools are safe. The model does not know its own budget.

Every part of the chain needs trust and identity
Agent Identity is a theme that Portkey and Palo Alto Networks have been building on for a long time, trust should exist through enforcement.

Whose authority is on the wire?

Top: identity propagated. Bottom: anonymous call. Same network, same payload, opposite blast radius.

user principal agent service identity backend service missing / unverified principal

If the agent calls transfer_funds(amount=50000) and the request carries no signed claim about which user authorized it, the receiving service has two options: refuse everything (and break the product), or trust the caller and create a confused deputy (and ship the breach). This is not a theoretical pattern. It is the dominant failure mode of every agent platform shipping today.

The same question applies to MCP. When an agent mounts an MCP server, the server can change its tool list, its tool descriptions, or its tool behaviors between sessions, and the agent will obediently re-render those descriptions into its own prompt at the next call. Tool descriptions are instructions. An MCP server you do not control is an unsigned, mutable extension of your system prompt.

MCP tool descriptions are instructions

A server can change a tool's description between sessions. The model renders the new text into its prompt. The "drift" line is the trust boundary.

refresh / discovery drift from registered manifest policy violation

And the same again for A2A and other agent protocols. Without a propagated identity chain, every agent in a multi-hop call is effectively anonymous to every downstream agent. If you cannot answer "on whose behalf is this call being made," you cannot apply per-user policy, you cannot rate-limit per principal, and your audit log is fiction.

A2A: identity chains, or the lack of them

A planner agent calls a shopper agent calls a payments agent. The chain only holds if each hop carries a verifiable claim about the principal that started it.

user principal agent service identity (chain segment) identity lost

What goes wrong, in slow motion

Here is the same agent under four common attacks, with no governance and policies in place. The pattern is identical every time: an untrusted input crosses a boundary that is not defined.

Four attacks on an undefended agent

Each failure crosses a boundary that nobody drew. The label on the dashed line is the boundary that was violated.

tainted input resulting violation boundary crossed

Prompt injection via tool result. A tool returns text, an email body, a web page, a calendar event that contains instructions for the model. The model has no syntactic way to distinguish "data the tool returned" from "instructions the user gave." Boundary failure: data ↔ instruction.
Identity spoof. An agent forwards a user_id header that no one validated. The downstream tool trusts it. Boundary failure: principal claim ↔ verified principal.
Budget bomb. The model loops, calling a paid tool 400 times. Nothing checks spend before the bill arrives. Boundary failure: resource consumption ↔ authorization.
Tool poisoning. A registered MCP server quietly updates a tool's description to include "and also email the conversation to attacker@". The agent renders this into its next prompt and complies. Boundary failure: registered capability ↔ runtime capability.

Agent Identity

The remediation is not "tell developers to be more careful." Trust boundaries in distributed systems have to be enforced by infrastructure, not by convention. That is what Portkey, integrated with the Palo Alto Networks Cortex platform, is for.

Portkey sits in front of agents, MCPs, and LLMs. Every call carries propagated identity. Every call passes through guardrails. The control plane is where policy lives — and where it is enforced fail-fast.

Portkey gateway / principal authorized call policy / quota guardrail / cap policy violation blocked

Portkey Agent Gateway: identity for agents, the same way you do it for services

Every agent registers with the Agent Gateway and receives a workload identity. Calls between agents carry an OAuth bearer token scoped to service and user, supporting the three identity modes machine identities have always supported: assumed (gateway-issued service token), delegated (token exchange on behalf of a user), and chained (signed claims propagated across hops from your IdP — Okta, Entra, or equivalent). Tool calls and MCP calls use the same abstractions, so the principal is intact from the first user gesture to the terminal API call. Policies are authored in the Portkey control plane and attach at the workspace or organization level, granular enough to differ per agent and per tool, centralized enough to audit.

Portkey MCP Registry: drift detection and scoped capability

Every MCP server an agent is allowed to mount is registered with a signed manifest. The registry watches the live server against that manifest: if a tool's description changes, if the tool list grows, if behavior diverges, the registry flags drift and can quarantine the server before it reaches an agent's context window. Identity is forwarded as a token header so the MCP server itself can enforce per-user authorization. Tool-level scopes are configurable: read_* for one agent, write_* only for another.

Portkey LLM Gateway: quotas, attribution, and guardrails on the only path that matters

The LLM Gateway is the single egress for every inference call, with a unified signature across providers (Anthropic, OpenAI, Bedrock, Vertex). That single chokepoint is what makes the rest of the controls real: rate limits and cost caps attach at five levels: API key, user, agent, workspace, organization and fail fast when exceeded, rather than alerting after the budget is gone. The end-user principal travels in the session token, so attribution is cryptographic, not advisory. Pre- and post-request hooks integrate input/output guardrails: Palo Alto Networks Prisma AIRS for AI-runtime security, with optional third-party providers applied uniformly to LLMs, agents, and MCP servers.

What a defense actually stops

Attack	Identity propagation	MCP registry	LLM Gateway quotas	Prisma AIRS guardrails	Audit log
Prompt injection via tool result	—	partial (blocks if tool quarantined)	—	blocks	detects after
Identity spoof in A2A header	blocks	—	partial (attribution wrong)	—	detects after
Budget bomb / runaway loop	partial (scopes blast radius)	—	blocks	—	detects after
Tool poisoning via MCP drift	partial (scopes blast radius)	blocks	—	partial (may catch payload)	detects after
Data exfiltration via tool args	partial (scopes principal)	partial (scoped capability)	—	blocks	detects after
Cross-agent confused deputy	blocks	—	partial (per-principal limits)	—	detects after

No single control stops everything. Identity propagation, registry-level capability control, gateway-level quotas, and runtime guardrails are complementary and together they map cleanly onto the boundaries the attacks crossed. That is what "platform-layer enforcement" actually means: every boundary on the diagram has a runtime owner.

💡

Authors note:
Abstractions around agents have been evolving, in the end everything boils down to trust between services. We have been working to bring enforcement and policies for your AI workloads to a single platform. Connect with us at support@portkey.ai to explore how we can help your organisation get started