Multimodal for AI Designers: Definition, Examples, and UX Tips

What it means

A multimodal model accepts or produces more than one media type, such as images or audio alongside text, often within a single conversation or agent run.

Why designers should care

Multimodal features need affordances specific to each input type, clear latency expectations, accessibility fallbacks, and honest statements of limits when a modality is read-only (accepted as input but never generated) or not supported at all.

Example

A design copilot accepts PNG mockups and returns annotated feedback, plus an optional text-to-code output for one component. Every output carries a clear label saying whether it is vision-based or inferred.
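
One way to make the labeling in this example concrete is to attach a source tag to every piece of feedback and render it in the UI. The sketch below is only an illustration: the names `Source`, `FeedbackItem`, and `render_label`, and the sample feedback strings, are hypothetical and not taken from any real product.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    # Hypothetical categories matching the two labels in the example.
    VISION = "vision-based"  # read directly from the uploaded mockup
    INFERRED = "inferred"    # guessed from context, not visible in the image


@dataclass
class FeedbackItem:
    text: str
    source: Source


def render_label(item: FeedbackItem) -> str:
    """Prefix a piece of feedback with the label the user will see."""
    return f"[{item.source.value}] {item.text}"


# Made-up feedback, for illustration only.
items = [
    FeedbackItem("Primary button contrast looks low.", Source.VISION),
    FeedbackItem("This screen is probably part of a checkout flow.", Source.INFERRED),
]
labels = [render_label(item) for item in items]
for line in labels:
    print(line)
```

Keeping the source as a required field, rather than free text inside the feedback, means the interface cannot show an output without also showing its label.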

Common mistakes

• Marketing a product as “multimodal” when it only accepts text and URLs.
• Offering no fallback when vision or audio fails on low-quality inputs.
• Using the same UI chrome for text chat and heavy media uploads, with no progress indicator or size guidance.
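
The second and third mistakes can be avoided with a small pre-upload check that tells the user the limit and offers a text fallback. This is a minimal sketch under assumed values: the 10 MB limit, the supported types, and the names `UploadCheck` and `check_upload` are all hypothetical, and a real product would take its limits from its model provider.

```python
from dataclasses import dataclass

# Hypothetical limits, chosen only for illustration.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024
SUPPORTED_TYPES = {"image/png", "image/jpeg"}


@dataclass
class UploadCheck:
    accepted: bool
    message: str  # text shown next to the upload control


def check_upload(mime_type: str, size_bytes: int) -> UploadCheck:
    """Validate a file before upload and explain the outcome to the user."""
    if mime_type not in SUPPORTED_TYPES:
        # Unsupported modality: say so and point to a text fallback.
        return UploadCheck(
            False,
            f"{mime_type} is not supported. Try PNG or JPEG, or describe the screen in text.",
        )
    if size_bytes > MAX_UPLOAD_BYTES:
        limit_mb = MAX_UPLOAD_BYTES // (1024 * 1024)
        return UploadCheck(False, f"File is too large. The limit is {limit_mb} MB.")
    return UploadCheck(True, "Ready to upload.")


print(check_upload("image/png", 2_000_000).message)
print(check_upload("audio/wav", 2_000_000).message)
print(check_upload("image/png", 11 * 1024 * 1024).message)
```

Running the check before the upload starts lets the interface state its limits up front instead of failing after a long wait.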

Related terms

Multimodal Input: The interface accepts multiple input modalities in a single turn: type a question, attach a photo, paste a link, or record audio, then send one combined prompt.
Large Language Model (LLM): An LLM reads a sequence of tokens (words and symbols) and generates the next tokens, producing paragraphs, code, JSON, or tool requests from natural-language instructions.
Token: Models process and bill text as tokens; a rough rule of thumb is that one token is about ¾ of an English word, but code and symbols can consume more tokens.
Inference: Each user request triggers inference: tokens in, tokens out, optionally interleaved with tool calls and retrieval.
Tool Use: The model emits structured calls (name + arguments) that your app executes, then feeds results back into the conversation for the next step.
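
The Tool Use and Token entries above can be sketched together: the app receives a structured call, runs the matching function, and packages the result to send back to the model. Everything here is an assumption made for illustration, including the `estimate_tokens` tool, the JSON shape of the call, and the `role: "tool"` message format. Real model APIs each define their own schema.

```python
import json
import math
from typing import Any, Callable


def estimate_tokens(text: str) -> dict[str, Any]:
    """Hypothetical tool applying the rough '3/4 of a word per token' rule."""
    words = len(text.split())
    return {"words": words, "estimated_tokens": math.ceil(words / 0.75)}


# Registry of the tools the app is willing to execute, keyed by name.
TOOLS: dict[str, Callable[..., dict[str, Any]]] = {
    "estimate_tokens": estimate_tokens,
}


def run_tool_call(call_json: str) -> dict[str, Any]:
    """Execute one structured call (name + arguments) and build the reply message."""
    call = json.loads(call_json)
    tool = TOOLS.get(call["name"])
    if tool is None:
        # Unknown tools return an error result instead of crashing the app.
        result = {"error": f"unknown tool: {call['name']}"}
    else:
        result = tool(**call["arguments"])
    # Assumed message shape for feeding the result back into the conversation.
    return {"role": "tool", "name": call["name"], "content": json.dumps(result)}


# Stand-in for what a model might emit; not output from a real model.
model_output = '{"name": "estimate_tokens", "arguments": {"text": "multimodal design copilot"}}'
tool_message = run_tool_call(model_output)
print(tool_message)
```

The registry is the design choice that matters: the model can only request tools the app has explicitly listed, and the app decides what actually runs.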
