When we started building Gallery, we evaluated every major foundation model. We chose Claude — and we'd make the same decision again today.
Intelligence matters most
For autonomous agents that run unsupervised, model intelligence isn't a nice-to-have. It's the difference between an agent that handles edge cases gracefully and one that silently fails.
Claude consistently makes better decisions in ambiguous situations. It asks for clarification when it should, acts confidently when it can, and explains its reasoning when asked. That matters enormously when the agent is working on your behalf.
Tool use that actually works
Gallery agents use tools constantly — creating tasks, messaging other agents, searching memory, calling external APIs. Claude's tool use is the most reliable we've tested. It follows schemas precisely, handles errors gracefully, and chains multiple tool calls naturally.
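To make "follows schemas" and "handles errors" concrete, here is a minimal sketch of the application side of a tool call. The `create_task` tool, its fields, and the `run_tool_call` helper are hypothetical illustrations, not Gallery's actual code. The dictionaries follow the general shape of tool definitions (`name`, `description`, `input_schema`) and of `tool_use` / `tool_result` blocks in the Anthropic Messages API. No network call is made.

```python
from typing import Any, Callable

# Hypothetical tool definition, in the shape used for Messages API tools.
CREATE_TASK_TOOL: dict[str, Any] = {
    "name": "create_task",
    "description": "Create a task in the workspace.",
    "input_schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "assignee": {"type": "string"},
        },
        "required": ["title"],
    },
}


def create_task(args: dict[str, Any]) -> str:
    """Hypothetical handler; a real one would write to a database."""
    assignee = args.get("assignee", "unassigned")
    return f"created task '{args['title']}' for {assignee}"


# Registry mapping tool name -> (definition, handler).
TOOLS: dict[str, tuple[dict[str, Any], Callable[[dict[str, Any]], str]]] = {
    "create_task": (CREATE_TASK_TOOL, create_task),
}


def _result(tool_use_id: str, content: str, is_error: bool = False) -> dict[str, Any]:
    """Build a tool_result block to send back in the next user message."""
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": content,
        "is_error": is_error,
    }


def run_tool_call(block: dict[str, Any]) -> dict[str, Any]:
    """Execute one tool_use block and always return a tool_result block."""
    tool_use_id = block["id"]
    entry = TOOLS.get(block["name"])
    if entry is None:
        return _result(tool_use_id, f"Unknown tool: {block['name']}", is_error=True)

    definition, handler = entry
    required = definition["input_schema"].get("required", [])
    missing = [key for key in required if key not in block["input"]]
    if missing:
        return _result(
            tool_use_id, f"Missing required fields: {', '.join(missing)}", is_error=True
        )

    try:
        return _result(tool_use_id, handler(block["input"]))
    except Exception as exc:  # report the failure to the model instead of crashing
        return _result(tool_use_id, f"Tool failed: {exc}", is_error=True)
```

Returning an error result instead of raising lets the model see what went wrong and retry with corrected input, which is one plausible way to support the chained tool calls described above.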
The memory tool
Claude's built-in memory tool was a game-changer for us. Instead of building a custom memory system from scratch, we implemented the Anthropic memory tool protocol with Convex as the storage backend. Agents can create, read, update, and delete memory files, and Claude decides on its own what to remember and when to recall it.
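The create, read, update, and delete operations can be sketched as a small command handler. This is an illustration under stated assumptions, not Gallery's implementation. The `MemoryStore` class is hypothetical, and a Python dictionary stands in for the Convex backend. The command names (`create`, `view`, `str_replace`, `delete`) and the `/memories` root are modeled on a file-style memory interface and should be checked against the official memory tool documentation before use.

```python
from typing import Any


class MemoryStore:
    """In-memory stand-in for a persistent backend such as Convex (assumption)."""

    ROOT = "/memories"  # assumed root directory for all memory files

    def __init__(self) -> None:
        self.files: dict[str, str] = {}

    def _valid(self, path: str) -> bool:
        # Reject anything outside the memory root, including traversal attempts.
        inside = path == self.ROOT or path.startswith(self.ROOT + "/")
        return inside and ".." not in path

    def handle(self, cmd: dict[str, Any]) -> str:
        """Apply one memory command and return text to send back to the model."""
        name = cmd.get("command")
        path = cmd.get("path", "")
        if not self._valid(path):
            return f"Error: invalid path {path!r}"

        if name == "create":
            self.files[path] = cmd["file_text"]
            return f"File created at {path}"

        if name == "view":
            if path == self.ROOT:
                # Viewing the root lists every stored file.
                return "\n".join(sorted(self.files)) or "(empty)"
            if path not in self.files:
                return f"Error: {path} not found"
            return self.files[path]

        if name == "str_replace":
            if path not in self.files:
                return f"Error: {path} not found"
            if cmd["old_str"] not in self.files[path]:
                return "Error: old_str not found in file"
            # Replace only the first occurrence to keep edits predictable.
            self.files[path] = self.files[path].replace(cmd["old_str"], cmd["new_str"], 1)
            return f"File {path} updated"

        if name == "delete":
            if self.files.pop(path, None) is None:
                return f"Error: {path} not found"
            return f"Deleted {path}"

        return f"Error: unknown command {name!r}"
```

In this sketch the model issues the commands and the application only executes them, so swapping the dictionary for a database changes the storage layer without changing the command handling.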
Prompt caching
With prompt caching, we save roughly 90% on input token costs for ongoing conversations. The system prompt, tool definitions, and workspace context are all cached, so each message pays the full input rate only for new content. This makes long-running agent sessions economically viable.
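The arithmetic behind that figure can be sketched with a simplified cost model. The multipliers below are assumptions: cache writes billed at 1.25x the base input rate and cache reads at 0.10x. Check them against current pricing. The model also assumes a fixed cached prefix and a constant number of new tokens per turn, and it ignores growing conversation history, so it approximates rather than reproduces Gallery's real bill.

```python
# Assumed billing multipliers relative to the base input-token rate.
CACHE_WRITE_MULTIPLIER = 1.25  # first request writes the prefix to the cache
CACHE_READ_MULTIPLIER = 0.10   # later requests read the prefix from the cache


def input_cost_units(prefix_tokens: int, new_tokens_per_turn: int, turns: int, cached: bool) -> float:
    """Total input cost in 'base-rate token' units over a conversation."""
    if turns < 1:
        raise ValueError("turns must be at least 1")
    if not cached:
        # Every turn pays full price for the prefix plus the new content.
        return float(turns * (prefix_tokens + new_tokens_per_turn))
    # First turn: pay the cache-write premium on the prefix.
    first = prefix_tokens * CACHE_WRITE_MULTIPLIER + new_tokens_per_turn
    # Remaining turns: prefix billed at the cache-read rate, new content at full rate.
    rest = (turns - 1) * (prefix_tokens * CACHE_READ_MULTIPLIER + new_tokens_per_turn)
    return first + rest


def savings_fraction(prefix_tokens: int, new_tokens_per_turn: int, turns: int) -> float:
    """Fraction of input cost saved by caching, between 0 and 1 when caching helps."""
    baseline = input_cost_units(prefix_tokens, new_tokens_per_turn, turns, cached=False)
    with_cache = input_cost_units(prefix_tokens, new_tokens_per_turn, turns, cached=True)
    return 1 - with_cache / baseline


if __name__ == "__main__":
    # Hypothetical workload: 20,000-token prefix, 500 new tokens per turn, 50 turns.
    print(f"{savings_fraction(20_000, 500, 50):.1%}")
```

For this hypothetical workload the model gives savings of roughly 86%. The figure approaches 90% as the cached prefix grows relative to the new content and as the conversation runs longer. A single-turn conversation costs more with caching than without, because of the write premium.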
What we'd like to see
No model is perfect. We're looking forward to better streaming support in the SDK, lower latency for simple tool calls, and continued improvements to context window management. But on balance, Claude is the best foundation for what we're building.
