Latency
Latency is the delay between a user action and a usable AI response: time to first token, time to complete answer, or time to finish an agent run.
Perceived speed often matters more than raw model speed; streaming, skeleton UI, and background jobs shape whether AI feels instant or broken.
What it means
Network, queue, retrieval, tool calls, and generation length all add wait time before users can read, edit, or act on output.
Why designers should care
Without latency-aware UX, users double-submit prompts, abandon flows, or distrust “thinking” states that hang with no progress signal.
Example
A report generator streams section headings first, shows elapsed time, and lets users cancel while heavy charts render in a background step with email notify.
Common mistakes
- • Blocking the entire screen with no partial output during long runs.
- • Promising “real-time” when retrieval plus tools routinely exceed two seconds.
- • Ignoring latency on mobile or low-bandwidth users in agent loops.