Token Burn Rate

Token burn rate is how fast a product consumes tokens over time: per request, per user session, or per agent run.

High burn rates show up as cost overruns, rate limits, and sluggish agent loops long before finance sends a warning.

What it means

Burn rate is roughly the tokens sent plus the tokens generated per call, multiplied by how often calls happen and how many times they are retried. Long context, verbose prompts, and chained inference each compound the total.
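The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a standard formula: `CallProfile`, its field names, and the sample numbers are all assumptions made for the example, and retries are modeled as an average number of extra attempts per call.

```python
from dataclasses import dataclass


@dataclass
class CallProfile:
    """Hypothetical description of one kind of LLM call in a product."""
    input_tokens: int         # tokens sent per attempt (prompt plus context)
    output_tokens: int        # tokens generated per attempt
    calls_per_hour: float     # how often the call is triggered
    retry_rate: float = 0.0   # assumed average extra attempts per call, e.g. 0.25


def tokens_per_hour(profile: CallProfile) -> float:
    """(tokens in + tokens out) x call frequency x (1 + retries)."""
    tokens_per_attempt = profile.input_tokens + profile.output_tokens
    attempts_per_hour = profile.calls_per_hour * (1 + profile.retry_rate)
    return tokens_per_attempt * attempts_per_hour


# Illustrative numbers only: 3,500 tokens per attempt, 40 calls an hour,
# and one retry for every four calls.
chat = CallProfile(input_tokens=3000, output_tokens=500,
                   calls_per_hour=40, retry_rate=0.25)
print(tokens_per_hour(chat))  # 175000.0
```

Doubling the context sent on each call roughly doubles the result, which is why long context tends to dominate the total.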

Why designers should care

Design choices such as auto-expanding context, unlimited regenerate, and verbose chain-of-thought (CoT) shown by default directly raise burn. Users need budgets, summaries, and caps that scale with the value of the task.
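One way to back a budget or cap in the UI is a small per-session tracker. The sketch below is hypothetical: the `TokenBudget` class, the 80% warning threshold, and the status labels are assumptions chosen for illustration, not a prescribed design.

```python
class TokenBudget:
    """Hypothetical per-session token budget with a soft warning and a hard cap."""

    def __init__(self, limit: int, warn_ratio: float = 0.8):
        self.limit = limit            # hard cap for the session (assumed value)
        self.warn_ratio = warn_ratio  # fraction of the cap that triggers a warning
        self.used = 0

    def record(self, tokens: int) -> None:
        """Add the tokens consumed by a completed call."""
        self.used += tokens

    def can_afford(self, estimated_tokens: int) -> bool:
        """Check an action (for example, a regenerate) before running it."""
        return self.used + estimated_tokens <= self.limit

    def status(self) -> str:
        """Return 'ok', 'warn', or 'blocked' for the UI to display."""
        if self.used >= self.limit:
            return "blocked"
        if self.used >= self.limit * self.warn_ratio:
            return "warn"
        return "ok"


budget = TokenBudget(limit=10_000)
budget.record(8_000)
print(budget.status())           # warn
print(budget.can_afford(3_000))  # False
```

Checking `can_afford` before a regenerate lets the interface explain the cost up front instead of failing after the call.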

Example

An agent dashboard shows “~12k tokens this task” with a breakdown by step; users can switch to a compact mode that summarizes tool results before the next LLM call.
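A rough sketch of how such a dashboard label and breakdown might be produced follows. The step names and token counts are invented for the example, and `compact_tool_result` uses plain truncation as a stand-in for summarization; a real compact mode would more likely call a summarizer.

```python
def format_task_total(step_tokens: dict[str, int]) -> str:
    """Build the headline label from per-step token counts."""
    total = sum(step_tokens.values())
    if total >= 1000:
        return f"~{round(total / 1000)}k tokens this task"
    return f"~{total} tokens this task"


def breakdown(step_tokens: dict[str, int]) -> list[tuple[str, int]]:
    """List steps from most to least expensive."""
    return sorted(step_tokens.items(), key=lambda item: item[1], reverse=True)


def compact_tool_result(text: str, max_chars: int = 400) -> str:
    """Stand-in for summarization: keep the start of the result, note what was cut."""
    if len(text) <= max_chars:
        return text
    omitted = len(text) - max_chars
    return f"{text[:max_chars]} [... {omitted} characters omitted]"


# Hypothetical per-step counts for one agent task.
steps = {"plan": 1800, "search tool": 6400, "draft answer": 3900}
print(format_task_total(steps))  # ~12k tokens this task
for name, tokens in breakdown(steps):
    print(f"{name}: {tokens}")
```

Sorting the breakdown by cost puts the step most worth compacting at the top of the list.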

Common mistakes

  • Agent UIs that silently re-run full context on every minor retry.
  • No session or org-level visibility until users hit hard rate limits.
  • Optimizing latency without measuring token cost per successful task completion.
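The last mistake is easier to avoid once the metric is written down. The sketch below is one possible definition, stated as an assumption: total tokens across all runs, including failed ones, divided by the number of runs that succeeded. `TaskRun` and the sample numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskRun:
    """Hypothetical record of one agent run."""
    tokens: int      # total tokens consumed, including retries
    succeeded: bool  # whether the task was completed successfully


def tokens_per_successful_task(runs: list[TaskRun]) -> Optional[float]:
    """Total tokens spent (failures included) per successful completion."""
    successes = sum(1 for run in runs if run.succeeded)
    if successes == 0:
        return None  # no successes yet, so the metric is undefined
    total_tokens = sum(run.tokens for run in runs)
    return total_tokens / successes


runs = [TaskRun(12_000, True), TaskRun(9_000, False), TaskRun(15_000, True)]
print(tokens_per_successful_task(runs))  # 18000.0
```

Counting failed runs in the numerator means a change that speeds up responses but lowers the success rate shows up as a higher cost per completed task.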
