ChatGPT's composer teardown

Updated June 12, 2026

The composer starts calm on purpose. Capability shows up in layers: open the + menu, pick a mode, see a chip, maybe grab a starter. Familiarity first, power when you ask for it.

That works when the mode teaches you something before you send. It falls apart when the only feedback is a chip. Think is where we would push hardest.

Calm default

Empty bar, no chips. Starter pills already hint at three jobs.

What works

No tool vocabulary on first load. You can just type.
Starter pills teach three common jobs without opening the + menu.

What we would push on

Starter pills and the + menu are two front doors. Some users may never open the + menu.
Attach and advanced modes need another way in if pills stay the primary path.

Takeaway

Approachable surface, hidden depth. Fine if you accept the discoverability tradeoff.

Tools & modes

Photos, image, Think, research, and web search on the first screen. More is for overflow.

What works

One + menu groups attach, creation, and behavioral modes. You pick intent, not infrastructure.
Open the + menu and the first screen lists attach, Create image, Thinking, Deep research, and Web search. More and Projects hold overflow.

What we would push on

Does web search need to be a mode? Most people already ask in plain language ("What is the stock price for SpaceX today?"). A mode earns its place when the output contract changes: citations, live results, slower run, a chip that confirms web use.
"Look something up" on the starter pill and Web search in the + menu are the same job, two doors.
Think and Deep research sit side by side with one line each. They are different jobs, and the composer only teaches the difference after you pick Deep research.
Every row carries the same visual weight. Nothing signals that Deep research takes minutes while Think is a slower single reply.

Takeaway

The + menu organizes well. It does not help you choose between modes that look equal but behave very differently.

Attachments

Thumbnail preview with dismiss. Attach is inside the + menu, not on the bar.

What works

Inline thumbnail with dismiss. You see what ships and can fix mistakes without resetting the thread.
Attach stacks with mode chips without clearing them. Flexible for power users.

What we would push on

Attach inside the + menu fights the paperclip habit from email and Slack. Sending a file is not an edge case.
File + mode chip + custom text makes the composer dense fast.

Takeaway

Steal the preview. Question hiding attach behind the + menu unless you have another strong discovery path.

Deep research

Chip, pre-filled prompt, report starters. This is what a pre-send contract looks like.

What works

Chip, placeholder ("Get a detailed report"), credible starters below. You know the job before you type.
Pre-fill teaches prompting without trapping you. Edit, remove chip, still fine.
Apps and Sites filters scope the research run inside the composer.

Takeaway

This is the bar for behavioral modes: name the outcome, suggest a first prompt, keep scope in the bar.

Think mode

A chip appears. Nothing else changes. No pre-fill, no starters, no hint about what Think actually does.

What works

Removable chip shows scope before send.
Placeholder stays editable. No locked template.

What we would push on

Thinnest mode in the set. Research and Search rewrite placeholder and starters. Think adds a chip. Send looks like default chat.
Deep research pre-fills "Get a detailed report," adds Apps and Sites filters, and shows report starters. Think stays on "Ask anything." Same output shape, supposedly smarter reasoning. The product gap is real. The UI does not show it.
The test: can you explain the tradeoff before send? Think fails. Deep research passes.

Takeaway

Steal the chip. Do not steal the label-only treatment as differentiation. Behavioral modes need a pre-send contract.
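
One way to make the pre-send contract concrete is to write down what each mode changes in the composer and check whether anything besides the chip moves. The sketch below is a hypothetical model, not ChatGPT's implementation: the ModeContract type, its field names, and teachesBeforeSend are illustrative assumptions. Only the placeholder strings and the Apps and Sites filters come from the observations above.

```typescript
// Hypothetical model of what a mode changes in the composer before send.
// Type and field names are assumptions made for illustration.
interface ModeContract {
  name: string;
  chip: boolean;            // removable chip in the bar
  placeholder: string;      // text shown in the empty field
  starters: boolean;        // mode-specific starters below the bar
  scopeControls: string[];  // in-bar controls that narrow the run
}

const DEFAULT_PLACEHOLDER = "Ask anything";

const deepResearch: ModeContract = {
  name: "Deep research",
  chip: true,
  placeholder: "Get a detailed report",
  starters: true,
  scopeControls: ["Apps", "Sites"],
};

const think: ModeContract = {
  name: "Think",
  chip: true,
  placeholder: DEFAULT_PLACEHOLDER,
  starters: false,
  scopeControls: [],
};

// A mode "teaches before send" if it changes something besides the chip.
function teachesBeforeSend(mode: ModeContract): boolean {
  return (
    mode.placeholder !== DEFAULT_PLACEHOLDER ||
    mode.starters ||
    mode.scopeControls.length > 0
  );
}

console.log(teachesBeforeSend(deepResearch)); // true
console.log(teachesBeforeSend(think));        // false
```

A check like this is crude, but it turns "label-only" into something a design review can flag: a mode whose only composer change is the chip fails.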

Web search

Search chip, web placeholder, trending starters.

What works

Chip, web-specific placeholder, trending starters. Live web is in scope before send.
Same rhythm as Deep research. Feels learnable once selected.

What we would push on

Did you need to pick a mode at all? Default chat handles "search for X" in plain language for many queries.
Research and Search overlap for anyone who wants fresh info. The UI does not help you choose.
"Look something up" on the empty state points at the same job without explaining how it differs from Web search here.

Takeaway

Execution is solid after selection. The open question is whether search should be a mode or just something you say.

Image mode

Chip in the bar. Edit, style, and Explore ideas open up below.

What works

Scope in the bar, exploration below. You commit to making an image without inventing a prompt from zero.
"Explore ideas" helps people who want inspiration, not a template to paste.
Starters swap when the tool changes. Image prompts do not bleed into Search.

Takeaway

One of the clearer modes. Chip plus contextual starters teach the job before send.

Image controls

Aspect ratio sits in the bar, only while Image is active.

What works

Aspect ratio inline when Image is active. Message-scoped, not global settings.
Progressive disclosure. The bar earns density only after you pick a generative mode.

What we would push on

Chip, ratio dropdown, and attach preview can stack. Hide controls when the chip goes away.

Takeaway

Good pattern for send-time output controls. Watch crowding at the high end.
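
The "hide controls when the chip goes away" rule can be sketched as composer state where the ratio control only exists while Image is active. Everything here is an assumption for illustration: the ComposerState shape, the function names, and the "1:1" default ratio are hypothetical, not taken from the product.

```typescript
// Hypothetical composer state. Names and the "1:1" default are assumptions.
type Mode = "image" | "think" | "search" | "research";

interface ComposerState {
  text: string;
  mode: Mode | null;
  attachments: string[];
  aspectRatio: string | null; // only set while mode === "image"
}

// Picking a mode keeps attachments; the ratio control appears only for Image.
function selectMode(state: ComposerState, mode: Mode): ComposerState {
  return {
    ...state,
    mode,
    aspectRatio: mode === "image" ? (state.aspectRatio ?? "1:1") : null,
  };
}

// Removing the chip also removes every control that was scoped to the mode.
function removeChip(state: ComposerState): ComposerState {
  return { ...state, mode: null, aspectRatio: null };
}

// What the bar has to render, in order. A rough proxy for density.
function visibleControls(state: ComposerState): string[] {
  const controls: string[] = [];
  if (state.mode) controls.push(`chip:${state.mode}`);
  if (state.aspectRatio) controls.push(`ratio:${state.aspectRatio}`);
  for (const file of state.attachments) controls.push(`file:${file}`);
  return controls;
}

const start: ComposerState = {
  text: "",
  mode: null,
  attachments: ["photo.png"],
  aspectRatio: null,
};
const withImage = selectMode(start, "image");
const cleared = removeChip(withImage);

console.log(visibleControls(withImage)); // chip, ratio, and file stack up
console.log(visibleControls(cleared));   // only the file remains
```

Keeping the ratio inside message state rather than global settings is what lets it disappear cleanly with the chip, while the attachment survives the mode change.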

Voice in the composer

Dictation replaces the text field. Waveform, cancel, confirm.

What works

Mic on the right, + menu on the left. Separate jobs, separate active states.
Dictation replaces typing instead of stacking on top.
Waveform plus confirm/cancel gives a clear listening contract.

What we would push on

Two voice paths exist: inline dictation and full voice session. The bar mic does not tell you the richer mode exists.

Takeaway

Dictation in the composer is well scoped. The product-level voice map is not.

Voice session

Full-screen voice. Conversation mode, not text replacement.

What works

Full viewport fits back-and-forth talk. Different job from speak once, edit, send.
"Start talking" and a visible End button. Clear exit.

What we would push on

Dictation feeds the composer. Session bypasses it. Naming and placement should make that split impossible to miss.

Takeaway

Right surface for conversation mode. The problem is that users have to discover it separately from the bar mic.

How it fits together

The pattern

Calm default, + menu for capability, chip for scope, starters when the mode needs them, send.
Attach and modes share one + menu. Voice and send stay on the right.

Where it breaks

Research, Search, and Image teach before send. Think does not.
That inconsistency makes the chip system harder to trust.
Elsewhere: dual discovery (starter pills and the + menu), hidden attach, and two voice paths with no map.

Takeaway

One of the better composer architectures for hiding power without cluttering the bar. Mode quality is uneven. Steal the structure, not every mode treatment.