
Context Window

The context window is the maximum amount of text (in tokens) a model can consider in one request: your prompt, system instructions, retrieved docs, and chat history combined.

When products “forget” earlier messages or drop attachments, it is usually because they have hit this limit, not out of arbitrary malice.

What it means

Everything sent to the model must fit inside the context window; overflow is truncated, summarized, or rejected depending on product policy.
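To make the overflow behavior concrete, here is a minimal sketch of one possible policy: keep the system prompt, keep as many of the newest turns as fit, and report the rest as dropped. The names fit_to_window, estimate_tokens, and FitResult are hypothetical, and the word-count estimate is only a stand-in for a real tokenizer, which would give different numbers.

```python
from __future__ import annotations

from dataclasses import dataclass


def estimate_tokens(text: str) -> int:
    """Assumption: approximate tokens by counting whitespace-separated words.

    A real product would use the model's own tokenizer instead.
    """
    return len(text.split())


@dataclass
class FitResult:
    kept: list[str]     # turns sent to the model, oldest first
    dropped: list[str]  # older turns that did not fit, oldest first


def fit_to_window(system_prompt: str, turns: list[str], max_tokens: int) -> FitResult:
    """Keep the system prompt plus as many of the newest turns as fit."""
    budget = max_tokens - estimate_tokens(system_prompt)
    if budget < 0:
        raise ValueError("system prompt alone exceeds the context window")

    cut = len(turns)  # index of the oldest turn that still fits
    for i in range(len(turns) - 1, -1, -1):  # walk from newest to oldest
        cost = estimate_tokens(turns[i])
        if cost > budget:
            break  # stop at the first turn that no longer fits
        budget -= cost
        cut = i
    return FitResult(kept=turns[cut:], dropped=turns[:cut])
```

Returning the dropped turns alongside the kept ones is a deliberate choice in this sketch: it gives the interface something to show users instead of truncating silently.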

Why designers should care

Design explicit context management: pin key facts, summarize long threads, show users what is included, and retrieve relevant passages instead of relying on endless pasting.

Example

A long research session shows a “Session summary” chip users can edit; new questions attach the summary plus only the three most relevant source chunks, not the full 200-page upload.
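The pattern in this example could be sketched roughly as follows. The functions select_chunks and build_context are hypothetical, and the word-overlap score is an assumption standing in for whatever relevance ranking (typically embedding-based search) a real product would use.

```python
from __future__ import annotations

import re


def _words(text: str) -> set[str]:
    """Lowercased alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(question: str, chunk: str) -> int:
    """Assumption: score a chunk by how many distinct words it shares with the question."""
    return len(_words(question) & _words(chunk))


def select_chunks(question: str, chunks: list[str], top_k: int = 3) -> list[str]:
    """Return the top_k chunks by relevance; ties keep their original order."""
    ranked = sorted(chunks, key=lambda c: relevance(question, c), reverse=True)
    return ranked[:top_k]


def build_context(summary: str, chunks: list[str], question: str, top_k: int = 3) -> str:
    """Assemble the text sent to the model: summary, a few sources, then the question."""
    selected = select_chunks(question, chunks, top_k)
    parts = ["Session summary:", summary, "", "Relevant sources:"]
    parts += [f"[{i}] {chunk}" for i, chunk in enumerate(selected, start=1)]
    parts += ["", "Question:", question]
    return "\n".join(parts)
```

Because the summary is passed in as plain text, whatever the user edits in the “Session summary” chip is exactly what the model receives on the next question.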

Common mistakes

  • Infinite scroll chat with no strategy for older turns.
  • Silent truncation without telling users what dropped out.
  • Expecting the model to recall facts never included in context.
