
Streaming Response

Streaming means the AI sends its answer incrementally, as tokens are generated, instead of waiting until the full reply is ready.

Streaming improves perceived latency because the first words appear almost immediately. It also changes how you handle layout, stop controls, and errors that occur mid-generation.

What it means

The client renders partial text as it arrives, typically over HTTP (for example, Server-Sent Events) or a WebSocket, and keeps updating until the model signals completion or the user stops generation.
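A minimal sketch of the client side, assuming the server uses Server-Sent Events, sends each text delta as a plain-text `data:` line, and ends the stream with a `[DONE]` payload. The sentinel is a convention some APIs use, not part of the SSE format, and many APIs send JSON per event instead. The function names are hypothetical.

```python
from typing import Iterable, Iterator

# Hypothetical end-of-stream payload; real APIs use their own convention.
DONE_SENTINEL = "[DONE]"


def sse_data_events(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each Server-Sent Event.

    A blank line ends an event; several `data:` lines in one event are
    joined with newlines. Other fields (event:, id:, retry:) and comment
    lines starting with ':' are ignored in this sketch.
    """
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Strip only one leading space, so tokens like " world" keep theirs.
            if value.startswith(" "):
                value = value[1:]
            buffer.append(value)
    # An event not terminated by a blank line is discarded, as in the SSE format.


def accumulate(lines: Iterable[str]) -> tuple[str, bool]:
    """Build the partial message; return (text, completed)."""
    text = ""
    for payload in sse_data_events(lines):
        if payload == DONE_SENTINEL:
            return text, True
        text += payload  # a UI would re-render the draft message here
    return text, False  # stream ended with no completion signal
```

A `False` second value is worth surfacing in the interface: the connection closed before the model signaled completion, which is not the same as a finished answer.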

Why designers should care

Design for partial states: content that makes the page jump as it grows, users who want to edit while text is still streaming, a visible Stop button, and a clear distinction between the in-progress draft and the final, locked message.
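One way to keep "draft" and "final" distinct is to give each message an explicit status and lock its text once it leaves the streaming state. This is a minimal sketch; the class, the status names, and the choice to drop chunks that arrive after Stop are illustrative assumptions, not a prescribed model.

```python
from enum import Enum


class Status(Enum):
    STREAMING = "streaming"  # draft: still growing, style it as in progress
    DONE = "done"            # model signaled completion
    STOPPED = "stopped"      # user pressed Stop; text is partial but locked
    ERROR = "error"          # stream failed mid-generation


class StreamingMessage:
    """A chat message that is a mutable draft while streaming, then locked."""

    def __init__(self) -> None:
        self.text = ""
        self.status = Status.STREAMING

    @property
    def is_final(self) -> bool:
        return self.status is not Status.STREAMING

    def append(self, delta: str) -> None:
        # Chunks that arrive after Stop or completion are dropped,
        # so a locked message never changes under the user.
        if not self.is_final:
            self.text += delta

    def finish(self) -> None:
        if not self.is_final:
            self.status = Status.DONE

    def stop(self) -> None:
        # Keep what has streamed so far, marked as partial.
        if not self.is_final:
            self.status = Status.STOPPED

    def fail(self) -> None:
        if not self.is_final:
            self.status = Status.ERROR
```

The first transition out of `STREAMING` wins, so a late completion signal cannot silently turn a stopped message into a "done" one.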

Example

A copilot streams bullet suggestions; users can Stop, Edit last bullet, or Accept all. As the list grows, the cursor stays put, and code blocks render only once their closing fence has arrived.
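Holding back an unclosed code block can be as simple as counting fences in the streamed Markdown. This sketch assumes backtick fences only; it ignores `~~~` fences and CommonMark's rule that a closing fence must be at least as long as the opening one. The function name is hypothetical.

```python
# A Markdown code fence, built this way so it does not close this example.
FENCE = "`" * 3


def split_closed_fences(text: str) -> tuple[str, str]:
    """Split streamed Markdown into (renderable, pending).

    If the text ends inside an unclosed code fence, everything from that
    fence onward is returned as `pending`, so the UI can show it as plain
    or placeholder text instead of a half-rendered code block.
    """
    lines = text.split("\n")
    open_at = None  # index of the line that opened the current fence
    for i, line in enumerate(lines):
        if line.lstrip().startswith(FENCE):
            # A fence line toggles between "inside" and "outside" a block.
            open_at = i if open_at is None else None
    if open_at is None:
        return text, ""
    return "\n".join(lines[:open_at]), "\n".join(lines[open_at:])
```

Re-running the split on every chunk is cheap for chat-sized messages and keeps the rule easy to reason about: only closed blocks get syntax highlighting and copy buttons.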

Common mistakes

  • Auto-scroll that yanks users back down while they are reading earlier paragraphs.
  • Actions enabled on incomplete JSON or half-rendered Markdown.
  • No indication of whether the stream has stalled or the model is still thinking.
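Each mistake above maps to a small check the UI can run on every update. This is a sketch under stated assumptions: the 40-pixel "near bottom" threshold and the 8-second stall timeout are hypothetical values to tune, and a timer since the last chunk is only a heuristic for "stalled", since a model can legitimately pause.

```python
import json


def should_autoscroll(scroll_top: float, viewport_height: float,
                      content_height: float, threshold: float = 40.0) -> bool:
    """Follow new tokens only if the user is already near the bottom."""
    distance_from_bottom = content_height - (scroll_top + viewport_height)
    return distance_from_bottom <= threshold


def is_stalled(last_chunk_at: float, now: float, stall_after: float = 8.0) -> bool:
    """True when no chunk has arrived for `stall_after` seconds (heuristic)."""
    return now - last_chunk_at >= stall_after


def actions_enabled(is_final: bool, expects_json: bool, text: str) -> bool:
    """Enable Accept/Apply only on a complete and, if needed, parseable result."""
    if not is_final:
        return False
    if expects_json:
        try:
            json.loads(text)
        except json.JSONDecodeError:
            return False
    return True
```

When `is_stalled` turns true, the UI can switch from a "thinking" indicator to a "taking longer than usual" state with a retry option, instead of leaving an unchanging spinner.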
