GlossaryProduct and performance

Compute

Compute is the processing power (GPUs, TPUs, cloud instances) used to train models and run inference when users generate, classify, or embed content.

Compute limits affect what your product can ship: model size, batch size, region availability, and whether features run on-device or in the cloud.

What it means

Every AI request consumes compute cycles on hardware that has cost, capacity, and queue time, especially for large models, image diffusion, or long agent runs.

Why designers should care

Compute constraints drive tiering (“Pro uses larger model”), queue UX, offline modes, and honest messaging when features are unavailable in a region or on weak devices.

Example

A video summarizer offers “Fast (cloud)” vs “Private (on-device)” with estimated wait times and a note when the device cannot run the local model.

Common mistakes

  • Unlimited heavy features with no queue, tier, or degrade path when compute is saturated.
  • Hiding cloud-only dependency from users who expect on-device privacy.
  • No capacity planning UX before launch traffic spikes queue times.

Weekly AI UX notes

Patterns, prompts, and glossary updates for designers building AI products on Substack. No spam.

Subscribe on Substack