
We're Securing AI Agents at the Wrong Layer

I just got back from RSAC — and I want to talk about something that’s been bothering me. If you were there too, I think you felt it.

The energy around agentic AI was everywhere. The conversations, the vendor pitches, the sessions — there’s no question the industry recognizes that autonomous agents represent a fundamental shift in how we build and deploy software. But as I sat through talk after talk and walked the floor, a pattern emerged that I couldn’t shake: we are overwhelmingly focused on applying current solutions to a novel problem.

And I want to be careful here — because that approach is not wrong. Extending proven controls to new domains is responsible engineering. It’s how this industry has always adapted, and it’s the right starting point. But I walked away from RSAC increasingly convinced that it’s not going to be sufficient as the ending point. Not this time.

We are watching the rapid deployment of agentic AI systems — autonomous agents that chain tools, generate plans, make decisions, and interact with our data — and the security community’s primary response has been to apply the same authorization models we’ve been using for decades. Define roles. Set permissions. Constrain what the agent is allowed to do. Wrap existing identity and access management around a fundamentally new kind of system and hope the model holds.

I’ve spent over twenty years in software — development, application security, supply chain security — and I’ve watched this industry adapt to every major platform shift. Cloud, containers, microservices, zero trust. Each time, there was a period of uncomfortable adjustment where the existing models didn’t quite fit the new paradigm, followed by adaptation. We updated our frameworks, refined our controls, and moved on.

I don’t think adaptation alone is going to work this time. Not because the frameworks are bad — they aren’t — but because the core assumption they rest on no longer holds. This is a fundamental shift, and it requires a genuinely new solution. Not in place of the strong controls and practices we already have — but in addition to them, to address a new landscape of threats that existing models were never designed to see.

The assumption is this: if you constrain the actions, you constrain the risk.

For traditional applications, that’s true. A function call does what it was written to do. A query returns what you expect. The behavior is deterministic, bounded, and auditable. You can enumerate the possible actions, assign permissions to them, and have reasonable confidence that the system will behave within those boundaries.

Agentic AI systems violate every one of those assumptions. And I believe we need to have an honest conversation about what that means for how we approach security — because continuing to apply action-based controls to systems whose risk is fundamentally data-based is a gap that will only widen as these systems become more capable.


What I Keep Seeing

Let me walk through what’s actually happening — because the pattern is consistent enough that I think it reveals something structural, not incidental.

In June 2025, researchers at Aim Labs disclosed EchoLeak (CVE-2025-32711) — a zero-click prompt injection vulnerability in Microsoft 365 Copilot that enabled full data exfiltration across trust boundaries. The attack chain is worth understanding in detail because it illustrates the problem precisely. A single crafted email — requiring no user interaction at all — triggered Copilot to process the email content, evade Microsoft’s Cross Prompt Injection Attempt (XPIA) classifier, circumvent link redaction using reference-style Markdown, exploit auto-fetched images to encode data in URL parameters, and abuse a Microsoft Teams proxy that was allowed by the content security policy. The result was sensitive internal data exfiltrated to an attacker-controlled destination.

Here is the critical observation: every action in that chain was individually permitted. Copilot was authorized to read emails. It was authorized to process content. It was authorized to fetch images. It was authorized to interact with Teams. No single action violated any permission. The data exfiltration — the actual security event — emerged from the composition of those actions. There was nothing to catch at the action level.

The same pattern appeared in the Log-To-Leak attack against MCP-connected agents. The agent was coerced through prompt injection to invoke a logging tool — a completely normal, authorized operation. Agents write to logs. That’s expected behavior. But the data the agent wrote to that log included the full conversation history — user queries, tool responses, agent replies — data that was never intended to leave the conversation context. The action was permitted. The data flow was not.

And then there’s GitHub Copilot (CVE-2025-53773), where prompt injection embedded in code comments instructed Copilot to modify a VS Code settings file — a permitted action, editing a configuration file — which enabled YOLO mode, which then allowed subsequent commands to execute without user approval. A permitted action produced arbitrary code execution.

I keep coming back to the same realization: the actions were authorized. The outcomes were not. And no action-based control model — no matter how well implemented — was designed to catch the gap between those two things.


Why the Current Models Hit Their Limit

I want to be clear — I’m not arguing that RBAC, ABAC, or zero trust are obsolete. I’ve implemented these frameworks. I’ve relied on them. They are solid, proven, and necessary. But they were designed for a world where the relationship between an action and its consequence was predictable.

In agentic AI systems, that relationship is no longer predictable.

When you authorize an agent to “summarize this quarter’s performance,” you are not approving a discrete, known operation. You are implicitly approving whatever chain of database queries, API calls, document retrievals, and data transformations the model determines it needs to fulfill that goal. That chain is generated dynamically — often in real time — based on what the model encounters along the way. The model may access sources you didn’t anticipate, combine datasets in ways that weren’t planned, and produce outputs that contain information derived from multiple contexts. You didn’t approve those specific data flows. You approved a goal — and the agent figured out the rest.

This is fundamentally different from traditional application behavior, and it exposes a structural gap in how we authorize these systems.

RBAC assigns permissions to roles — but an agent’s effective role changes dynamically based on the task it’s pursuing, not based on a static assignment. The same agent might need analyst-level access for one task and administrator-level access for another, all within the same session.

ABAC evaluates policies based on attributes of the subject, resource, and environment — but it still operates at the action boundary. It decides whether a specific operation is permitted. It does not reason about what happens to the data after access is granted. Once the gate is passed, ABAC’s job is done.

Zero trust was a major advancement — it eliminated the dangerous assumption that network location implied trustworthiness. But it replaced location-based trust with identity-based trust. It verifies who is making a request. It does not track what happens to the data once the request is fulfilled. And critically — a compromised agent via prompt injection has the same identity, the same roles, and the same permissions as a healthy one. The difference is entirely in behavior. Identity-based models cannot distinguish between the two.

The gap across all three: none of them address the composition problem. Individually authorized actions can compose into unauthorized outcomes. That is not a bug in these frameworks. It is a boundary condition — the point where action-based models reach their limit.
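To make the composition problem concrete, here is a minimal sketch in which every step an agent takes passes a conventional action-level check, yet nothing ever evaluates the flow those steps compose into. All names here (PERMISSIONS, is_permitted, the plan steps) are invented for illustration and not drawn from any real product:

```python
# Hypothetical action-level authorization, the kind RBAC-style
# controls implement: a principal has a set of allowed operations.
PERMISSIONS = {"agent": {"read_email", "fetch_image", "post_message"}}

def is_permitted(principal, action):
    # Classic question: "Is this action permitted?"
    return action in PERMISSIONS.get(principal, set())

# An attacker-influenced plan in which every step is individually
# authorized -- the model approves each action in isolation.
plan = ["read_email", "fetch_image", "post_message"]
assert all(is_permitted("agent", step) for step in plan)

# Nothing in is_permitted() sees that content read in step 1 flows
# into the URL fetched in step 2. The unauthorized outcome lives in
# the composition, which this model never evaluates.
```

The point of the sketch is not that the check is badly written; it is that the check's vocabulary (principal, action) has no term for the data flowing between the actions.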


The Question We Should Be Asking

Traditional authorization asks: “Is this action permitted?”

For agentic AI, the question needs to be: “What is happening to the data?”

That is a fundamentally different question — and it requires enforcement at a different layer.

I’ve been working on this problem, and I believe the answer lies in shifting enforcement from the action layer to the data layer. Not replacing action-based controls — that would be irresponsible. Extending them. Adding the layer that was always missing — the layer that governs what happens to data after access is granted.

The concept is not as exotic as it might sound. The building blocks have existed for decades. Information flow control — the idea that security policies should govern how data moves through a system, not just who can access it — was formalized by Bell and LaPadula in 1973 and extended by Myers and Liskov’s decentralized information flow control in 1997. Sensitivity classification is well-understood and widely practiced in government and regulated industries. Lineage tracking — tracing data from origin through every transformation to its current state — is mature in data governance and supply chain domains.

What’s missing is the assembly. Nobody has put these pieces together into a cohesive enforcement model specifically designed for agentic AI. The frameworks exist in isolation. The problem requires them to work in concert.

Here’s what I think that model needs:

Sensitivity classification that propagates. Every piece of data entering an agent’s environment should carry a sensitivity label — public, internal, confidential, restricted, regulated. Those labels should propagate through every transformation, combination, and derivation the agent performs. A summary of restricted data is still restricted data. An inference drawn from confidential inputs carries the classification of those inputs. The label follows the data — not the action.
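As a rough sketch of what propagating classification could look like (the Labeled type, the LEVELS ordering, and the combine() helper are my own illustrative assumptions, not an existing API), a derived value inherits the highest sensitivity among its inputs and carries their lineage forward:

```python
from dataclasses import dataclass

# Illustrative sensitivity lattice, ordered least to most sensitive.
LEVELS = ["public", "internal", "confidential", "restricted"]

@dataclass(frozen=True)
class Labeled:
    value: str
    level: str
    lineage: tuple = ()  # sources and operations this datum passed through

def combine(op, *inputs):
    # Any transformation inherits the highest sensitivity of its inputs;
    # the label and lineage follow the data, not the action.
    level = max((i.level for i in inputs), key=LEVELS.index)
    lineage = tuple(s for i in inputs for s in i.lineage) + (op,)
    return Labeled(f"{op}({', '.join(i.value for i in inputs)})", level, lineage)

doc = Labeled("Q3 board deck", "restricted", ("sharepoint:board",))
note = Labeled("public blurb", "public", ("web",))
summary = combine("summarize", doc, note)

assert summary.level == "restricted"  # a summary of restricted data is restricted
assert "sharepoint:board" in summary.lineage
```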

Lineage tracking across the full data lifecycle. We need to know where data came from, how it was transformed, which agents touched it, and in what context. This is the data equivalent of software supply chain provenance — and having spent years in supply chain security, the parallel is not accidental.

Delta inspection — evaluating what the output reveals relative to the inputs. This is the piece I think is most missing from the current landscape. Traditional DLP looks for sensitive data leaving the perimeter. But what about an agent that queries an employee directory for names, a compensation system for salary bands, and an org chart for team assignments — three individually innocuous queries — and combines them into a report that reconstructs individual salary data? No single query returned sensitive information. The combination created it.
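A toy version of that salary example shows why the check has to run on the combined output rather than on each query. The records and the quasi-identifier rule below are fabricated for illustration:

```python
# Three individually innocuous lookups (fabricated sample data).
directory    = [{"name": "Ada", "team": "Platform"}]
salary_bands = [{"team": "Platform", "band": "L6"}]
org_chart    = [{"team": "Platform", "members": ["Ada"]}]

def joint_sensitivity(fields):
    # A simple compositional rule: a name alone or a band alone is
    # merely internal, but name joined with band reconstructs
    # individual compensation -- sensitivity emerges from combination.
    if {"name", "band"} <= fields:
        return "restricted"
    return "internal"

# Each query in isolation looks fine to a per-action check:
assert joint_sensitivity(set(directory[0])) == "internal"
assert joint_sensitivity(set(salary_bands[0])) == "internal"

# Delta inspection evaluates what the *combined* report reveals:
report_row = {**directory[0], **salary_bands[0]}
assert joint_sensitivity(set(report_row)) == "restricted"
```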

Output-bound enforcement as the primary control point. For systems whose internal behavior is non-deterministic, the most reliable enforcement point is the output boundary. It is the last place you can intervene before data reaches a destination it shouldn’t.
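A minimal sketch of such an output gate (the clearance table and the release() helper are assumptions for illustration) compares the payload's propagated sensitivity against the destination's clearance at the last moment before data leaves the agent:

```python
# Illustrative sensitivity ordering and per-destination clearances.
LEVELS = ["public", "internal", "confidential", "restricted"]
CLEARANCE = {"user_chat": "confidential", "external_webhook": "public"}

def release(payload_level, destination):
    # Permit release only if the destination is cleared for the
    # payload's sensitivity; unknown destinations default to "public".
    allowed = CLEARANCE.get(destination, "public")
    return LEVELS.index(payload_level) <= LEVELS.index(allowed)

# Internal data may flow back to the user's chat...
assert release("internal", "user_chat")
# ...but restricted data never reaches an attacker-controlled URL,
# no matter which individually permitted actions produced it.
assert not release("restricted", "external_webhook")
```

Because the gate keys on the payload's label rather than on any action, it catches exfiltration regardless of which tool chain the agent improvised to get there.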


What Needs to Change

I’m not claiming I’ve solved AI security — no one has, and anyone who tells you they have should be viewed with appropriate skepticism. But I am arguing — based on two decades of building and securing software and the pattern of real-world incidents that continues to grow — that the current approach is structurally incomplete.

The frameworks we have — MITRE ATLAS, OWASP’s Agentic AI Top 10, NIST AI 600-1, Google SAIF — describe the threats well. They are valuable contributions and I reference them regularly. But describing threats is not the same as prescribing controls. We need enforcement models — not just taxonomies.

Here’s what I believe needs to happen:

The security community needs to prioritize data-centric enforcement for AI as an engineering problem — not a theoretical one. The building blocks exist. Information flow control has been studied for fifty years. The theory is sound. What’s missing is the integration — a cohesive, practical model that organizations can actually implement.

Organizations deploying agents need to start asking different questions. Not just “what is this agent allowed to do?” — but “what happens to the data this agent touches?” Not just “who has access?” — but “where does the data go after access is granted, and does it stay within the context it belongs to?”

We need shared standards for data classification and lineage in agent systems. As agents increasingly interact across organizational boundaries, ad hoc classification will not scale. Interoperability requires shared standards for sensitivity labeling, lineage metadata, and contextual norms.

And we need to be honest about what we don’t know yet. How do you accurately classify derived data? How do you quantify sensitivity drift across transformations? Where in the stack should enforcement live? These are open questions, and they need active research — not just position papers.


Where I’m Going With This

I’m working on a formal model for data-centric authorization in agentic AI systems — a framework I’m calling DACSA (Data-Centric Authorization for Securing Agentic systems). It’s built on four pillars: sensitivity classification, lineage tracking, delta inspection, and output-bound enforcement. The model draws on established foundations in information flow control, contextual integrity, and supply chain provenance — and it’s designed to operate alongside existing authorization frameworks, not replace them.

I’ll have more to share on this soon — including the full model, case studies applying it to documented real-world incidents, and a discussion of the limitations and open questions that remain.

If you’re working in this space — building agents, securing them, governing them, or just trying to figure out what the right controls look like — I’d genuinely like to hear from you. This is not a problem any one person or organization is going to solve.

The data is the constant. The actions are not. It’s time our security models reflected that.

This is the first post in a series on data-centric authorization for agentic AI systems.

AI Citation: This post was written with AI assistance (formatting and additional context) using Claude Opus 4.6 with 1M context.