GlossarySafety and trust

Prompt Injection

Prompt injection is when untrusted text tricks the model into ignoring its instructions or exposing hidden system behavior.

It matters whenever user content, emails, web pages, or tickets sit in the same context as your system prompt. This is a classic copilot and agent risk.

What it means

Malicious or accidental instructions embedded in retrieved or pasted content override intended rules (“ignore previous instructions and…”).

Why designers should care

Design cannot fix injection alone, but UX should separate trusted vs untrusted content, warn on risky actions, and require confirmation for irreversible steps.

Example

A browser copilot renders web page text in a quarantined panel labeled “Untrusted page content” and never auto-runs tools based on page text without user confirmation.

Common mistakes

  • Auto-executing agent actions because the model quoted a hidden instruction from an email.
  • Over-promising “secure AI” in marketing when context mixing remains.
  • No distinction between user command and retrieved third-party text in the UI.

Explore more

Weekly AI UX notes

Patterns, prompts, and glossary updates for designers building AI products on Substack. No spam.

Subscribe on Substack