Moderation
Moderation is the process of detecting and handling harmful, abusive, off-brand, or policy-violating content in AI inputs and outputs.
Products with user-generated prompts, public generations, or community features need moderation UX for reporters, authors, and moderators.
What it means
Automated classifiers and human reviewers flag content by category (hate, violence, sexual content, PII), and the product then applies an enforcement action (block, blur, warn, or escalate).
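The category-to-action step can be sketched as a small policy table. This is a minimal illustration, not a reference implementation: the thresholds, the severity ordering, the category keys, and the assumption that classifiers return scores between 0 and 1 are all hypothetical and would come from a product's own policy.

```python
from enum import Enum
from typing import Dict, List, Optional, Tuple


class Action(Enum):
    ALLOW = "allow"
    WARN = "warn"
    BLUR = "blur"
    ESCALATE = "escalate"
    BLOCK = "block"


# Assumed ordering from least to most severe; a real policy may differ.
SEVERITY: List[Action] = [
    Action.ALLOW, Action.WARN, Action.BLUR, Action.ESCALATE, Action.BLOCK
]

# Hypothetical policy: per category, (minimum score, action) rules,
# listed from the highest threshold to the lowest.
POLICY: Dict[str, List[Tuple[float, Action]]] = {
    "hate": [(0.9, Action.BLOCK), (0.6, Action.ESCALATE)],
    "violence": [(0.9, Action.BLOCK), (0.5, Action.BLUR)],
    "sexual": [(0.8, Action.BLOCK), (0.5, Action.BLUR)],
    "pii": [(0.7, Action.WARN)],
}


def decide(scores: Dict[str, float]) -> Tuple[Action, Optional[str]]:
    """Return the most severe triggered action and the category behind it."""
    best_action, best_category = Action.ALLOW, None
    for category, score in scores.items():
        for threshold, action in POLICY.get(category, []):
            if score >= threshold:
                # Keep only the most severe action across all categories.
                if SEVERITY.index(action) > SEVERITY.index(best_action):
                    best_action, best_category = action, category
                break  # first matching rule wins within a category
    return best_action, best_category
```

Returning the category alongside the action matters for the UX: it is what lets the interface later tell an author why their content was limited.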
Why designers should care
Moderation shapes the tone of refusals, appeal flows, age gates, and trauma-sensitive defaults, especially in consumer-facing creative AI.
Example
A public gallery blurs flagged images, shows why content was limited, and gives creators an appeal form with status tracking, rather than issuing a permanent, unexplained ban.
Common mistakes
- Opaque bans with no category, timestamp, or appeal path.
- Moderation messages that leak sensitive classifier details to attackers.
- Different moderation standards between free and paid tiers without disclosure.
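
One way to avoid the first two mistakes together is to keep the internal decision record separate from the notice shown to the author. The sketch below is illustrative only: the `Decision` fields, the `user_notice` helper, and the `/appeals/new` path are hypothetical names, not part of any particular moderation API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict


@dataclass(frozen=True)
class Decision:
    decision_id: str
    category: str            # e.g. "violence"
    action: str              # e.g. "blur"
    decided_at: datetime
    classifier_score: float  # internal only: never shown to the author
    classifier_version: str  # internal only: never shown to the author


def user_notice(decision: Decision) -> Dict[str, str]:
    """Build the author-facing notice from an internal decision.

    Includes the category, timestamp, and an appeal path, and deliberately
    omits scores and model versions that could help someone probe the filter.
    """
    return {
        "action": decision.action,
        "category": decision.category,
        "decided_at": decision.decided_at.isoformat(),
        # Hypothetical route; substitute the product's real appeal URL.
        "appeal_path": f"/appeals/new?decision={decision.decision_id}",
    }


example = Decision(
    decision_id="d-123",
    category="violence",
    action="blur",
    decided_at=datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
    classifier_score=0.62,
    classifier_version="v7",
)
notice = user_notice(example)
```

Building the notice from an explicit allowlist of fields, instead of removing sensitive fields from the full record, means a newly added internal field stays private by default.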