Prompt Injection
Prompt injection occurs when untrusted text causes the model to ignore its instructions or expose hidden system behavior.
It matters whenever user content, emails, web pages, or tickets share a context with your system prompt. It is a classic risk for copilots and agents.
What it means
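Instructions embedded in retrieved or pasted content, whether malicious or accidental, override the intended rules (“ignore previous instructions and…”). This works because the model has no reliable way to tell instructions from data once both sit in the same context.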
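A minimal sketch of keeping provenance attached to each piece of context, in Python. The names `Trust`, `Segment`, and `render_context`, and the `<untrusted>` delimiter, are hypothetical and chosen for illustration. They are not part of any particular library. Delimiting does not prevent injection; it only keeps the source of each piece of text visible to the model and to downstream checks.

```python
from dataclasses import dataclass
from enum import Enum


class Trust(Enum):
    SYSTEM = "system"        # rules written by the product team
    USER = "user"            # typed directly by the person using the product
    UNTRUSTED = "untrusted"  # retrieved or pasted third-party text


@dataclass(frozen=True)
class Segment:
    trust: Trust
    source: str  # hypothetical label, e.g. "email" or "web page"
    text: str


def render_context(segments: list[Segment]) -> str:
    """Join segments into one prompt string, fencing off untrusted text.

    The delimiters only make provenance visible. They do not stop a model
    from following instructions inside the untrusted block.
    """
    parts = []
    for seg in segments:
        if seg.trust is Trust.UNTRUSTED:
            # Neutralize a forged closing tag so the text cannot "escape"
            # its own block.
            body = seg.text.replace("</untrusted>", "[/untrusted]")
            parts.append(
                f"<untrusted source={seg.source!r}>\n{body}\n</untrusted>"
            )
        else:
            parts.append(f"[{seg.trust.value}] {seg.text}")
    return "\n".join(parts)
```

The same `trust` field can also drive the interface, for example by deciding which text is rendered in a labeled panel.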
Why designers should care
Design alone cannot fix injection, but the UX should visibly separate trusted from untrusted content, warn before risky actions, and require confirmation for irreversible steps.
Example
A browser copilot renders web page text in a quarantined panel labeled “Untrusted page content” and never auto-runs tools based on page text without user confirmation.
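The confirmation rule in this example could be sketched roughly as follows, in Python. `ToolRequest`, `needs_confirmation`, and `run_tool` are hypothetical names. How a real agent decides that a request traces back to page text is assumed here and not shown.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolRequest:
    tool: str             # hypothetical tool name, e.g. "send_email"
    irreversible: bool    # cannot be undone once executed
    from_untrusted: bool  # the request traces back to page or email text


def needs_confirmation(req: ToolRequest) -> bool:
    """Ask the user before anything irreversible or triggered by untrusted text."""
    return req.irreversible or req.from_untrusted


def run_tool(
    req: ToolRequest,
    confirm: Callable[[ToolRequest], bool],
    execute: Callable[[ToolRequest], str],
) -> str:
    """Run the tool only if no confirmation is needed or the user approves."""
    if needs_confirmation(req) and not confirm(req):
        return "blocked"
    return execute(req)
```

In a real product, `confirm` would open a dialog that shows the user which content prompted the action. Passing it in as a callable keeps the gate itself easy to test.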
Common mistakes
- Auto-executing agent actions because the model quoted a hidden instruction from an email.
- Over-promising “secure AI” in marketing when context mixing remains.
- No distinction between user command and retrieved third-party text in the UI.