Streaming Response
Streaming is when the AI sends its answer incrementally as tokens are generated, instead of waiting for the full reply to finish.
Streaming improves perceived latency, but it changes how layout, stop controls, and error handling must behave while a response is still generating.
What it means
The client renders partial text as it arrives over HTTP or WebSocket until the model signals completion or stop.
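As a rough illustration of that loop, here is a minimal Python sketch. The `Chunk` and `MessageState` shapes and the `signal` values are hypothetical, chosen for this example, and do not correspond to any particular provider's API.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Chunk:
    """One streamed event (hypothetical shape, not a real provider API)."""
    text: str = ""
    signal: Optional[str] = None  # assumed values: "complete" or "stop"


@dataclass
class MessageState:
    text: str = ""
    status: str = "streaming"  # streaming | complete | stopped | interrupted


def consume(chunks: Iterable[Chunk]) -> MessageState:
    """Append partial text until a chunk carries a completion or stop signal."""
    state = MessageState()
    for chunk in chunks:
        state.text += chunk.text  # a real client would re-render here
        if chunk.signal == "stop":
            state.status = "stopped"
            return state
        if chunk.signal == "complete":
            state.status = "complete"
            return state
    # The transport ended without any signal: surface this as its own state
    # instead of silently treating the partial text as final.
    state.status = "interrupted"
    return state
```

Keeping `interrupted` separate from `complete` gives the interface a distinct state to design for when a connection drops mid-answer.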
Why designers should care
Design for partial states: scroll position that jumps as content grows, editing while the stream is active, a stop button, and a clear distinction between the in-progress draft and the final, locked message.
Example
A copilot streams bullet suggestions; users can Stop, Edit last bullet, or Accept all. The cursor stays stable and code blocks render only after the fence closes.
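One way to hold back a code block until its fence closes is to split the streamed buffer into a part that is safe to render and a pending part. The sketch below is a simplification under stated assumptions: `split_renderable` is a hypothetical helper, and it treats any line starting with three backticks as a fence toggle, which is looser than the full markdown rules.

```python
from typing import Tuple

FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def split_renderable(buffer: str) -> Tuple[str, str]:
    """Return (renderable, pending); pending holds an unclosed fenced block."""
    lines = buffer.split("\n")
    open_at = None  # index of the line that opened a still-unclosed fence
    for i, line in enumerate(lines):
        if line.lstrip().startswith(FENCE):
            # First fence opens a block, the next one closes it.
            open_at = i if open_at is None else None
    if open_at is None:
        return buffer, ""
    return "\n".join(lines[:open_at]), "\n".join(lines[open_at:])
```

The pending part can be shown as a placeholder (for example, a "writing code" skeleton) and swapped for the rendered block once the closing fence arrives.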
Common mistakes
- Auto-scroll that fights users who are reading earlier paragraphs.
- Actions enabled on incomplete JSON or half-rendered markdown.
- No indication of whether the stream has stalled or the model is still thinking.
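The three mistakes above can each be guarded against with a small check. The following Python sketch is illustrative only: the function names, the 48-pixel threshold, and the 8-second stall timeout are assumptions made for this example, not recommended values.

```python
import json


def should_auto_scroll(scroll_top: float, viewport_height: float,
                       content_height: float, threshold_px: float = 48) -> bool:
    """Follow the stream only if the user is already near the bottom."""
    distance_from_bottom = content_height - (scroll_top + viewport_height)
    return distance_from_bottom <= threshold_px


def can_enable_actions(payload: str) -> bool:
    """Enable actions only once the streamed payload parses as JSON."""
    try:
        json.loads(payload)
    except json.JSONDecodeError:
        return False
    return True


def stream_status(seconds_since_last_chunk: float, tokens_received: int,
                  stall_after_s: float = 8.0) -> str:
    """Classify the stream so the interface can label what is happening."""
    if seconds_since_last_chunk >= stall_after_s:
        return "stalled"
    # Within the timeout: no tokens yet reads as thinking, otherwise streaming.
    return "thinking" if tokens_received == 0 else "streaming"
```

A purely time-based stall check is a guess made on the client side. If the server can send periodic heartbeat events, those would give a more reliable signal than a timeout alone.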