Google Gemini

Multimodal Input

Gemini supports multiple input types including text, images, and files in a unified composer interface, allowing users to attach and interact with various content types simultaneously.

Composer interface with text input and multimodal (+) icon


What's happening

Gemini places a dedicated multimodal (+) icon in the composer for attaching text, images, and files, and for importing code from GitHub or external URLs. It draws on the Google ecosystem by integrating directly with Google Workspace (Docs, Drive, Gmail) and by promoting specialized tools such as NotebookLM for source-based grounding and research. The (+) icon is separate from the adjacent tools icon, which keeps the attachment workflow focused.

Patterns

Multimodal (+) icon opens a menu to attach files, images, or import from GitHub/Workspace/NotebookLM

Context Chip Management

Attached items from local storage, Workspace, or NotebookLM are shown as preview cards, each with a remove option.
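The chip pattern above can be modeled as a small piece of composer state. The following is a minimal sketch for illustration only, not Gemini's implementation: the `Chip` and `Composer` names, the source labels, and the `preview` method are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical source labels drawn from the description above; not a real Gemini API.
SOURCES = {"local", "workspace", "notebooklm", "github", "url"}


@dataclass(frozen=True)
class Chip:
    """One attached item, rendered in the UI as a preview card."""
    chip_id: int
    source: str
    label: str


@dataclass
class Composer:
    """Holds the draft text plus every attachment that will be sent with it."""
    text: str = ""
    chips: list[Chip] = field(default_factory=list)
    _ids: count = field(default_factory=count, repr=False)

    def attach(self, source: str, label: str) -> Chip:
        # Reject sources the (+) menu does not offer in this sketch.
        if source not in SOURCES:
            raise ValueError(f"unknown source: {source}")
        chip = Chip(next(self._ids), source, label)
        self.chips.append(chip)
        return chip

    def remove(self, chip_id: int) -> None:
        # The remove control on a card drops only that card.
        self.chips = [c for c in self.chips if c.chip_id != chip_id]

    def preview(self) -> list[str]:
        # What the user sees before sending: one line per remaining chip.
        return [f"{c.source}: {c.label}" for c in self.chips]


composer = Composer(text="Summarize these")
doc = composer.attach("workspace", "Q3 plan (Docs)")
composer.attach("local", "chart.png")
composer.remove(doc.chip_id)
print(composer.preview())
```

Keeping attachments from every source in one list is what lets a single row of cards show mixed local, Workspace, and NotebookLM items, and what makes the preview an accurate picture of what will be sent.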


UX Insights

• Unified interface for all input types including ecosystem-specific sources
• Deep integration with Google Workspace leverages existing user data as context
• Cross-promotion of specialized AI products like NotebookLM for specific use cases like research
• Visual previews confirm what will be sent
• Supports multiple attachments simultaneously from varied sources

Design Decisions

The unified composer approach makes multimodal input feel natural and integrated. The Workspace and NotebookLM integrations show how a large ecosystem can supply richer, grounded AI context without the user leaving the chat interface.
