Simon Willison's "lethal trifecta" identifies the three conditions that make AI agents vulnerable to prompt injection: access to private data, exposure to untrusted content, and the ability to communicate externally. When all three combine, a single injected instruction can exfiltrate secrets, manipulate outputs, or make the agent act on an attacker's behalf.
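The trifecta can be read as a simple conjunction over an agent's capabilities: remove any one leg and the exfiltration path breaks. A minimal sketch of that idea (the class and field names here are illustrative, not from Willison's post):

```python
from dataclasses import dataclass

@dataclass
class AgentCapabilities:
    """Hypothetical capability flags for an AI agent deployment."""
    reads_private_data: bool           # e.g. email, files, internal docs
    processes_untrusted_content: bool  # e.g. web pages, inbound messages
    communicates_externally: bool      # e.g. HTTP requests, sending email

def has_lethal_trifecta(caps: AgentCapabilities) -> bool:
    # Vulnerable only when all three conditions hold at once;
    # dropping any single capability closes the attack path.
    return (caps.reads_private_data
            and caps.processes_untrusted_content
            and caps.communicates_externally)

# A browsing assistant that also reads your email:
assistant = AgentCapabilities(True, True, True)
print(has_lethal_trifecta(assistant))  # True

# The same assistant with outbound network access removed:
sandboxed = AgentCapabilities(True, True, False)
print(has_lethal_trifecta(sandboxed))  # False
```

The point of the predicate is the mitigation it implies: since the vulnerability requires the conjunction, the practical defense is to deny at least one capability per deployment rather than trying to filter injected instructions out of the text.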
In my previous post, I argued that text doesn't bind agent behavior: governance through instructions, policies, and system prompts operates in a fundamentally different channel from the actions it's trying to constrain. That was a theoretical argument. Now there's empirical evidence.