
Moderation

Moderation is the process of detecting and handling harmful, abusive, off-brand, or policy-violating content in AI inputs and outputs.

Products with user-generated prompts, public generations, or community features need moderation UX for three audiences: people who report content, authors whose content is flagged, and moderators who review it.

What it means

Automated classifiers and human reviewers flag content categories (hate, violence, sexual content, PII) and enforce actions (block, blur, warn, escalate).
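The flag-then-enforce flow above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the category names mirror the list above, but the thresholds, the `POLICY` table, the `review_band` parameter, and the `decide` function are all hypothetical, and the scores are assumed to come from some classifier that returns a value between 0 and 1 per category.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    WARN = "warn"
    BLUR = "blur"
    BLOCK = "block"
    ESCALATE = "escalate"


# Hypothetical policy table: category -> (score threshold, action when met).
POLICY = {
    "hate": (0.80, Action.BLOCK),
    "violence": (0.70, Action.BLUR),
    "sexual_content": (0.70, Action.BLUR),
    "pii": (0.50, Action.WARN),
}

# Least to most severe, so the strictest matching action wins.
SEVERITY = [Action.ALLOW, Action.WARN, Action.BLUR, Action.BLOCK]


@dataclass
class Decision:
    action: Action
    categories: list


def decide(scores: dict, review_band: float = 0.10) -> Decision:
    """Map per-category classifier scores to a single enforcement action."""
    flagged, borderline = [], []
    for category, score in scores.items():
        if category not in POLICY:
            continue  # ignore categories the policy does not cover
        threshold, _ = POLICY[category]
        if score >= threshold:
            flagged.append(category)
        elif score >= threshold - review_band:
            borderline.append(category)

    if flagged:
        action = max((POLICY[c][1] for c in flagged), key=SEVERITY.index)
        return Decision(action, sorted(flagged))
    if borderline:
        # Near-threshold content goes to a human reviewer (assumed rule).
        return Decision(Action.ESCALATE, sorted(borderline))
    return Decision(Action.ALLOW, [])
```

Keeping the flagged categories on the decision, not just the action, is what later lets the interface tell an author why content was limited.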

Why designers should care

Moderation shapes the tone of refusals, appeal flows, age gates, and trauma-sensitive defaults, especially in consumer-facing creative AI.

Example

A public gallery blurs flagged images, shows why content was limited, and gives creators an appeal form with status tracking, rather than a permanent, unexplained ban.
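Status tracking for an appeal like the one in this example could be modeled as a small state machine. The sketch below is one possible shape under stated assumptions: the status names, the `ALLOWED` transition table, and the `Appeal` class are hypothetical and would need to match a real product's review process.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class AppealStatus(Enum):
    SUBMITTED = "submitted"
    IN_REVIEW = "in_review"
    UPHELD = "upheld"          # original decision stands
    OVERTURNED = "overturned"  # content is restored


# Hypothetical transition table: which statuses may follow each status.
ALLOWED = {
    AppealStatus.SUBMITTED: {AppealStatus.IN_REVIEW},
    AppealStatus.IN_REVIEW: {AppealStatus.UPHELD, AppealStatus.OVERTURNED},
    AppealStatus.UPHELD: set(),
    AppealStatus.OVERTURNED: set(),
}


@dataclass
class Appeal:
    content_id: str
    reason: str
    status: AppealStatus = AppealStatus.SUBMITTED
    history: list = field(default_factory=list)

    def __post_init__(self):
        # Record the initial status so the creator sees a full timeline.
        self.history.append((self.status, datetime.now(timezone.utc)))

    def advance(self, new_status: AppealStatus) -> None:
        """Move to a new status, rejecting transitions the table forbids."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(
                f"cannot move from {self.status.value} to {new_status.value}"
            )
        self.status = new_status
        self.history.append((new_status, datetime.now(timezone.utc)))
```

The timestamped `history` list is what a status page would render, so a creator can see when the appeal was received, when review started, and how it ended.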

Common mistakes

  • Opaque bans with no category, timestamp, or appeal path.
  • Moderation messages that leak sensitive classifier details to attackers.
  • Different moderation standards between free and paid tiers without disclosure.
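The first two mistakes pull in opposite directions: the notice has to say enough (category, timestamp, appeal path) without exposing classifier internals. One way to handle that tension is an explicit allowlist of user-facing fields. The field names, the `PUBLIC_FIELDS` tuple, and `build_user_notice` below are hypothetical, as is the shape of the internal record.

```python
# Hypothetical allowlist: the only fields a user-facing notice may contain.
PUBLIC_FIELDS = ("category", "action", "decided_at", "appeal_url")


def build_user_notice(internal_record: dict) -> dict:
    """Copy only allowlisted fields from an internal moderation record.

    Anything not on the allowlist (scores, model versions, matched rules)
    is dropped, so new internal fields never leak by default.
    """
    missing = [name for name in PUBLIC_FIELDS if name not in internal_record]
    if missing:
        # Refuse to send an opaque notice with no category or appeal path.
        raise ValueError(f"notice is missing required fields: {missing}")
    return {name: internal_record[name] for name in PUBLIC_FIELDS}


# Example internal record; all values are made up for illustration.
record = {
    "category": "violence",
    "action": "blur",
    "decided_at": "2024-01-01T12:00:00Z",
    "appeal_url": "/appeals/new?content=img_123",
    "score": 0.91,                 # internal only
    "model_version": "clf-v7",     # internal only
    "matched_rule": "rule_0042",   # internal only
}
notice = build_user_notice(record)
```

An allowlist fails safe in both directions: a record without a category or appeal link raises an error instead of producing an opaque ban, and internal details stay out of the notice unless someone deliberately adds them.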
