ChatGPT's composer teardown

Updated June 12, 2026

The composer starts calm on purpose. Capability shows up in layers: open the + menu, pick a mode, see a chip, maybe grab a starter. Familiarity first, power when you ask for it.

That works when the mode teaches you something before you send. It falls apart when the only feedback is a chip. Think is where we would push hardest.

Calm default

Empty bar, no chips. Starter pills already hint at three jobs.

What works

  • No tool vocabulary on first load. You can just type.
  • Starter pills teach three common jobs without opening the + menu.

What we would push on

  • Starter pills and the + menu are two front doors. Some users may never open the + menu.
  • Attach and advanced modes need another way in if pills stay the primary path.

Takeaway

Approachable surface, hidden depth. Fine if you accept the discoverability tradeoff.

Tools & modes

Photos, image, Think, research, and web search on the first screen. More is for overflow.

What works

  • One + menu groups attach, creation, and behavioral modes. You pick intent, not infrastructure.
  • Open the + menu and the first screen lists attach, Create image, Thinking, Deep research, and Web search. More and Projects hold overflow.

What we would push on

  • Does web search need to be a mode? Most people already ask in plain language ("What is the stock price for SpaceX today?"). A mode earns its place when the output contract changes: citations, live results, slower run, a chip that confirms web use.
  • "Look something up" on the starter pill and Web search in the + menu are the same job, two doors.
  • Think and Deep research sit side by side with one line each. Different jobs, same visual weight, and the composer only teaches the difference after you pick Deep research.
  • Every row carries the same visual weight. Nothing signals that Deep research takes minutes while Think is a slower single reply.

Takeaway

The + menu organizes well. It does not help you choose between modes that look equal but behave very differently.

Attachments

Thumbnail preview with dismiss. Attach is inside the + menu, not on the bar.

What works

  • Inline thumbnail with dismiss. You see what ships and can fix mistakes without resetting the thread.
  • Attach stacks with mode chips without clearing them. Flexible for power users.

What we would push on

  • Attach inside the + menu fights the paperclip habit from email and Slack. Sending a file is not an edge case.
  • File + mode chip + custom text makes the composer dense fast.

Takeaway

Steal the preview. Question hiding attach behind the + menu unless you have another strong discovery path.

Deep research

Chip, pre-filled prompt, report starters. This is what a pre-send contract looks like.

What works

  • Chip, placeholder ("Get a detailed report"), credible starters below. You know the job before you type.
  • Pre-fill teaches prompting without trapping you. Edit, remove chip, still fine.
  • Apps and Sites filters scope the research run inside the composer.

Takeaway

This is the bar for behavioral modes: name the outcome, suggest a first prompt, keep scope in the bar.

Think mode

A chip appears. Nothing else changes. No pre-fill, no starters, no hint about what Think actually does.

What works

  • Removable chip shows scope before send.
  • Placeholder stays editable. No locked template.

What we would push on

  • Thinnest mode in the set. Research and Search rewrite placeholder and starters. Think adds a chip. Send looks like default chat.
  • Deep research pre-fills "Get a detailed report," adds Apps and Sites filters, and shows report starters below. Think stays on "Ask anything." Same output shape, supposedly smarter reasoning. The product gap is real. The UI gap is invisible.
  • The test: can you explain the tradeoff before you send? With Think, no. With Deep research, yes.

Takeaway

Steal the chip. Do not steal a label as the only differentiation. Behavioral modes need a pre-send contract.
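
One way to make the pre-send contract concrete is to treat each mode as a small config and check whether anything besides the chip changes. This is a rough sketch, not how ChatGPT is built: every type and function name here is hypothetical, and the starter strings are stand-ins because this teardown does not quote the real ones. Only the chip labels, the two placeholders, and the Apps and Sites filters come from the observations above.

```typescript
// Hypothetical model of a mode's pre-send contract. Names are illustrative only.
interface PreSendContract {
  chipLabel: string;      // removable chip shown in the bar
  placeholder: string;    // text shown in the empty input
  starters: string[];     // suggested first prompts below the bar
  scopeFilters: string[]; // in-bar scope controls
}

const DEFAULT_PLACEHOLDER = "Ask anything";

const deepResearch: PreSendContract = {
  chipLabel: "Deep research",
  placeholder: "Get a detailed report",
  // Stand-in text: the actual starter wording is not given in this teardown.
  starters: ["(report starter A)", "(report starter B)"],
  scopeFilters: ["Apps", "Sites"],
};

const think: PreSendContract = {
  chipLabel: "Think",
  placeholder: DEFAULT_PLACEHOLDER, // unchanged from default chat
  starters: [],
  scopeFilters: [],
};

// A mode teaches before send if something other than the chip changes.
function teachesBeforeSend(mode: PreSendContract): boolean {
  return (
    mode.placeholder !== DEFAULT_PLACEHOLDER ||
    mode.starters.length > 0 ||
    mode.scopeFilters.length > 0
  );
}

console.log(teachesBeforeSend(deepResearch)); // true
console.log(teachesBeforeSend(think));        // false
```

A check like this could run as a design lint: any mode that returns false is chip-only and probably needs a placeholder or starters before it ships.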

Image mode

Chip in the bar. Edit, style, and Explore ideas open up below.

What works

  • Scope in the bar, exploration below. You commit to making an image without inventing a prompt from zero.
  • "Explore ideas" helps people who want inspiration, not a template to paste.
  • Starters swap when the tool changes. Image prompts do not bleed into Search.

Takeaway

One of the clearer modes. Chip plus contextual starters teach the job before send.

Image controls

Aspect ratio sits in the bar, only while Image is active.

What works

  • Aspect ratio inline when Image is active. Message-scoped, not global settings.
  • Progressive disclosure. The bar earns density only after you pick a generative mode.

What we would push on

  • Chip, ratio dropdown, and attach preview can stack. Hide controls when the chip goes away.

Takeaway

Good pattern for send-time output controls. Watch crowding at the high end.
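
A minimal sketch of the rule "controls live and die with the chip," assuming the bar derives its visible controls from composer state instead of toggling them one by one. The state shape, mode names, and control names are all assumptions made for illustration.

```typescript
// Hypothetical composer state. Not taken from any real implementation.
type Mode = "image" | "think" | "deepResearch" | "webSearch";
type BarControl = "modeChip" | "aspectRatio" | "attachmentPreview";

interface ComposerState {
  activeMode: Mode | null; // null means default chat, no chip
  attachments: string[];   // file names queued for this message
}

// Derive what the bar shows from state, so a control can never outlive its chip.
function visibleControls(state: ComposerState): BarControl[] {
  const controls: BarControl[] = [];
  if (state.activeMode !== null) controls.push("modeChip");
  // Aspect ratio is message-scoped and only makes sense for Image.
  if (state.activeMode === "image") controls.push("aspectRatio");
  if (state.attachments.length > 0) controls.push("attachmentPreview");
  return controls;
}

const withImage: ComposerState = { activeMode: "image", attachments: ["photo.png"] };
console.log(visibleControls(withImage));
// [ 'modeChip', 'aspectRatio', 'attachmentPreview' ]

// Dismissing the chip removes the ratio dropdown but keeps the attachment.
console.log(visibleControls({ ...withImage, activeMode: null }));
// [ 'attachmentPreview' ]
```

Deriving the list also gives one place to cap density, for example by collapsing controls once more than two are visible.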

Voice in the composer

Dictation replaces the text field. Waveform, cancel, confirm.

What works

  • Mic on the right, + menu on the left. Separate jobs, separate active states.
  • Dictation replaces typing instead of stacking on top.
  • Waveform plus confirm/cancel gives a clear listening contract.

What we would push on

  • Two voice paths exist: inline dictation and full voice session. The bar mic does not tell you the richer mode exists.

Takeaway

Dictation in the composer is well scoped. The product-level voice map is not.

Voice session

Full-screen voice. Conversation mode, not text replacement.

What works

  • Full viewport fits back-and-forth talk. Different job from speak once, edit, send.
  • "Start talking" and a visible End button. Clear exit.

What we would push on

  • Dictation feeds the composer. Session bypasses it. Naming and placement should make that split impossible to miss.

Takeaway

Right surface for conversation mode. The miss is that users have to discover it separately from the bar mic.

How it fits together

The pattern

  • Calm default, + menu for capability, chip for scope, starters when the mode needs them, send.
  • Attach and modes share one + menu. Voice and send stay on the right.

Where it breaks

  • Research, Search, and Image teach before send. Think does not.
  • That inconsistency makes the chip system harder to trust.
  • Dual discovery (starter pills and the + menu), hidden attach, two voice paths with no map.

Takeaway

One of the better composer architectures for hiding power without cluttering the bar. Mode quality is uneven. Steal the structure, not every mode treatment.

Steal this

  • One + menu for attach, tools, and modes
  • Removable chips that show scope before send
  • Pre-send contract: placeholder and starters that match the mode
  • Outcome labels ("Create image," "Deep research") instead of model names
  • Inline output controls only while the relevant chip is on

Skip this

  • Modes that are only a chip (Think)
  • A dedicated search mode when plain prompts already route to the web
  • Starter pills and the + menu as parallel discovery with no bridge between them
  • Two voice entry points without explaining dictation vs conversation
  • Every menu row looking equally important when some modes are slow or costly

Original gallery pages: Tool Switching in Composer · Dictation Mode