Streaming Response for AI Designers: Definition, Examples, and UX Tips

What it means

A streaming response is rendered incrementally: the client shows partial text as it arrives over HTTP or WebSocket, until the model signals completion or the stream is stopped.
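As a rough illustration of that loop, the sketch below accumulates chunks and yields the text-so-far after each one. The `DONE` sentinel and the `render_stream` name are assumptions made for this example; real transports signal completion in their own ways.

```python
from typing import Iterable, Iterator, Optional

# Hypothetical end-of-stream marker; real APIs use their own completion signal.
DONE = None


def render_stream(chunks: Iterable[Optional[str]]) -> Iterator[str]:
    """Yield the full text-so-far after each chunk, stopping at DONE."""
    text = ""
    for chunk in chunks:
        if chunk is DONE:
            break  # completion signal: stop rendering, ignore anything after it
        text += chunk
        yield text  # a UI would repaint the message with this snapshot


snapshots = list(render_stream(["Hel", "lo, ", "world", DONE, "ignored"]))
```

Each snapshot is what the user would see at that moment, which is why every intermediate state needs to be readable on its own.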

Why designers should care

Design for partial states: scroll position that jumps as content grows, editing while text is still streaming, a stop button, and a clear distinction between the in-progress draft stream and the final, locked message.

Example

A copilot streams bullet suggestions; users can Stop, Edit last bullet, or Accept all. The cursor stays stable and code blocks render only after the fence closes.
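One way to hold back a code block until its fence closes is to split the buffer into a renderable part and a pending part. This is a minimal sketch under simplifying assumptions: it only counts triple-backtick fences and does not handle a fence marker that is itself split across two chunks. The name `split_renderable` is hypothetical.

```python
from typing import Tuple

FENCE = "`" * 3  # markdown code fence marker


def split_renderable(buffer: str) -> Tuple[str, str]:
    """Return (renderable, pending).

    If the buffer contains an unclosed fence, everything from that
    opening fence onward is held back as pending.
    """
    fence_count = 0
    last_open = -1
    pos = buffer.find(FENCE)
    while pos != -1:
        fence_count += 1
        if fence_count % 2 == 1:
            last_open = pos  # odd-numbered fences open a block
        pos = buffer.find(FENCE, pos + len(FENCE))
    if fence_count % 2 == 1:
        return buffer[:last_open], buffer[last_open:]
    return buffer, ""


partial = "Intro\n" + FENCE + "py\nprint(1)\n"
shown, held = split_renderable(partial)
closed_shown, closed_held = split_renderable(partial + FENCE)
```

While the fence is open, the UI could show a placeholder such as a skeleton block in place of the held-back text, which also helps keep layout and cursor position steady.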

Common mistakes

• Auto-scroll that fights users reading earlier paragraphs.
• Actions enabled on incomplete JSON or half-rendered markdown.
• No indication of whether the stream has stalled or the model is still thinking.
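Two of the mistakes above can be guarded against with small checks. The sketch below is illustrative only: the five-second threshold, the status labels, and both function names are assumptions, and a production client would likely also time out the "thinking" state.

```python
import json

STALL_AFTER_SECONDS = 5.0  # hypothetical threshold; tune per product


def stream_status(first_token_seen: bool, seconds_since_last_chunk: float) -> str:
    """Label the stream so the UI can show a distinct indicator for each state."""
    if not first_token_seen:
        return "thinking"  # nothing received yet: model or tools still working
    if seconds_since_last_chunk >= STALL_AFTER_SECONDS:
        return "stalled"  # text started, then went quiet for too long
    return "streaming"


def actions_enabled(buffer: str, stream_done: bool) -> bool:
    """Enable actions only when the stream has finished and the JSON parses."""
    if not stream_done:
        return False
    try:
        json.loads(buffer)
    except json.JSONDecodeError:
        return False
    return True
```

For example, `stream_status(True, 7.0)` gives `"stalled"`, and `actions_enabled('{"a": 1', True)` gives `False` because the payload is truncated.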

Related terms

Latency: Network, queue, retrieval, tool calls, and generation length all add wait time before users can read, edit, or act on output.
Token: Models process and bill text as tokens; a rough rule is ~¾ of a word per token in English, but code and symbols can consume more.
Inference: Each user request triggers inference: tokens in, tokens out, optionally interleaved with tool calls and retrieval.
Large Language Model (LLM): An LLM reads a sequence of tokens (words and symbols) and generates the next tokens, producing paragraphs, code, JSON, or tool requests from natural-language instructions.
Workflow: Defined stages (inputs → generate → review → export) often spanning people, systems, and multiple model calls with clear deliverables.
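The token rule of thumb above can be turned into a quick estimate for sizing streamed output. This is a back-of-the-envelope sketch, not a tokenizer: `estimate_tokens` is a hypothetical helper that applies the ~¾ word-per-token ratio to English prose, and it will undercount for code and symbol-heavy text.

```python
import math

WORDS_PER_TOKEN = 0.75  # rough English-prose ratio from the rule of thumb


def estimate_tokens(text: str) -> int:
    """Estimate token count from whitespace-separated word count."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


estimate = estimate_tokens("one two three four five six")
```

Here six words come out to an estimate of eight tokens; use a real tokenizer when billing or context limits depend on the exact number.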
