Your shortcut to every AI CLI. Spawn Claude, Codex, Gemini, OpenCode side‑by‑side. Benchmark them. Ship the winner. Real CLIs. Real tools. Your judgment.
New Windows build incoming — Join waitlist →



Warp bets on one cloud agent. iTerm2 adds a chat sidebar. Ghostty stays pure. None of them let you spawn Claude, Codex, Gemini and OpenCode side-by-side, watch them coordinate in a shared chat, score them against each other, and ship the winner. AnvilTerm does. It treats multi-agent as a first-class primitive — live tiles, MCP control plane, SwarmRoom, Arena, Forge marketplace — not a feature on the roadmap.
Watching one AI work is productivity.
Watching four of them compete for your prompt is the future.
One prompt, N agents. Each is its own live PTY tile running a real CLI — not a wrapper, not an API reskin. Auto-grid re-flows as agents finish. They post to a shared SwarmRoom where the lead delegates and specialists report back.
Lead agent delegates subtasks. Specialists pick them up, report back in the same room. You watch the transcript live in a chat tile. Works with any agent that speaks MCP — which is all of them.
Your agents are literal claude, codex, gemini binaries. You see exactly what the TUI sees.
swarm_route(["multimodal"]) → Gemini. ["reasoning"] → Claude. ["refactor"] → Codex.
One agent = one tab. Four = 2×2. Eight = 3×3. Each tile stops, closes, exports on its own.
Spawn a single Claude lead. It reads the prompt, calls swarm_vendors, picks specialists, delegates, merges the result. You just watch and approve. No team-building UI required.
Every tile streams its token counter scraped from the TUI. The toolbar aggregates daily + weekly spend across Claude + Codex + OpenCode. Forecast tells you when you'll hit the cap.
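The forecast can be as simple as extrapolating the recent burn rate. A sketch, assuming a rolling daily average; AnvilTerm's actual model isn't documented here:

```ts
// Hypothetical forecast: extrapolate the average daily spend to
// estimate when cumulative spend crosses the plan cap.
function daysUntilCap(dailySpend: number[], spentSoFar: number, cap: number): number {
  const rate = dailySpend.reduce((a, b) => a + b, 0) / dailySpend.length;
  return rate > 0 ? (cap - spentSoFar) / rate : Infinity;
}

// e.g. last 7 days of spend, $42 used of a $100 cap
console.log(daysUntilCap([5, 7, 6, 5, 8, 6, 5], 42, 100)); // ≈ 9.7 days
```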
Pick contestants. Paste a prompt. Hit start. Each tile streams live output, renders deliverables as iframes when the agent calls arena_push_artifact. Judge on quality, speed, correctness. Export the fight as markdown + JSONL for reproducibility — or as a 9:16 battle video for your timeline.
Every battle is stored at ~/.anvil/arena/<id>.jsonl. Bring it back six months later, re-run on new model versions, watch the winner flip. Reproducible benchmarking — for the laptop era.
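Because a battle is plain JSONL, replaying one is just streaming the file back. A sketch that assumes one event object per line; only the path convention comes from the docs above, the event schema is an assumption:

```ts
import { createReadStream } from "node:fs";
import { createInterface } from "node:readline";
import { homedir } from "node:os";

// Load a stored Arena battle for replay against fresh agent tiles.
async function loadBattle(id: string): Promise<unknown[]> {
  const file = `${homedir()}/.anvil/arena/${id}.jsonl`;
  const lines = createInterface({ input: createReadStream(file) });
  const events: unknown[] = [];
  for await (const line of lines) {
    if (line.trim()) events.push(JSON.parse(line)); // one event per line
  }
  return events; // re-issue the original prompts, compare the new winners
}
```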
Arena tiles stream live output side-by-side. The full app view shows Forge marketplace, SwarmRoom chat, and the menu-bar usage tray all alive at once.
AnvilTerm ships with a Model Context Protocol server. One command registers it across Claude Code, Codex, Gemini, and OpenCode. Every agent gains, for free, a browser, a PTY, inter-agent chat, artifact rendering, usage tracking, and screenshot capture.
```bash
# wire AnvilTerm into every installed agent at once
npx anvilterm-doctor --install

# now from any agent:
terminal_create() · terminal_write() · terminal_screen()
tui_type() · tui_interrupt() · tui_choose()
swarm_spawn() · swarm_route() · swarm_vendors()
swarm_room_post() · swarm_room_listen() · swarm_room_thread()
arena_push_artifact() · arena_current()
```
A tab created via MCP appears as a live tile in the AnvilTerm window. You watch — and intervene if needed. Standalone fallback via node-pty if the UI isn't running.
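The standalone fallback is ordinary node-pty. A sketch of spawning a headless agent PTY; the claude invocation and sizing are illustrative, and the tile wiring is omitted:

```ts
import * as pty from "node-pty";

// Headless fallback: run a real agent CLI on a pseudo-terminal when
// the AnvilTerm UI isn't available.
const agent = pty.spawn("claude", [], {
  name: "xterm-256color",
  cols: 120,
  rows: 40,
  cwd: process.cwd(),
  env: process.env as { [key: string]: string },
});

agent.onData((chunk) => process.stdout.write(chunk)); // mirror output
agent.write("summarize this repo\r"); // type into the TUI
```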
Every MCP-speaking client. Stdio transport. Register once, the toolset travels with you.
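Since the transport is plain stdio, any MCP client can drive it. A sketch using the official TypeScript SDK; the anvilterm-mcp command name and the tool argument shapes are assumptions (substitute whatever anvilterm-doctor registers):

```ts
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Connect to the AnvilTerm MCP server over stdio and drive a terminal.
const client = new Client({ name: "demo-client", version: "1.0.0" });
await client.connect(new StdioClientTransport({ command: "anvilterm-mcp" }));

const { tools } = await client.listTools(); // terminal_*, tui_*, swarm_*, arena_*
console.log(tools.map((t) => t.name));

const created = await client.callTool({ name: "terminal_create", arguments: {} });
console.log(created); // returned tile handle; shape is an assumption
await client.callTool({
  name: "terminal_write",
  arguments: { id: "tile-1", text: "ls -la\n" }, // hypothetical argument names
});
```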
Curated catalog, star-ranked, kind-filtered. Pick a server, hit install, choose Claude / Codex / Gemini / OpenCode — or all of them. Forge writes directly to each agent's config with a _forge:true tag so it can manage updates and removal cleanly.
Skills get git-cloned to ~/.anvil/skills/ and symlinked into each agent's skills dir. MCP servers get registered in each vendor's config. Installed view shows per-agent status chips so you always know what's actually wired.
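The config write itself is deliberately boring: merge an entry, tag it, save. A sketch modeled on a Claude-style mcpServers map; per-vendor paths and schemas vary, so treat both as assumptions:

```ts
import { readFileSync, writeFileSync } from "node:fs";

// Hypothetical Forge install: merge an MCP server entry tagged
// _forge:true into one agent's config file.
const path = `${process.env.HOME}/.claude.json`;
const config = JSON.parse(readFileSync(path, "utf8"));

config.mcpServers ??= {};
config.mcpServers["github"] = {
  command: "npx",
  args: ["-y", "@modelcontextprotocol/server-github"],
  _forge: true, // ownership tag: lets Forge update or remove cleanly
};

writeFileSync(path, JSON.stringify(config, null, 2));
```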
Cmd+K opens an Ollama-backed assistant with native function calling. Gemma 3, Qwen 2.5/3, Llama 3.1, Mistral Nemo. Prompts never leave the machine. Ideal for regulated work, sensitive repos, a plane, a café with no Wi-Fi.
Code, prompts, context — none of it leaves your laptop.
Models that speak tools return structured calls that render as one-click run buttons in the chat.
The assistant reads the visible terminal screen so suggestions are grounded in what's actually running.
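Native function calling through the ollama JS library looks roughly like this. The run_command tool, the screen-capture plumbing, and the button hook are all illustrative, not AnvilTerm's actual wiring:

```ts
import ollama from "ollama";

const visibleScreenText = "npm ERR! missing script: build"; // from the tile's screen buffer
const renderRunButton = (name: string, args: unknown) =>
  console.log(`[run] ${name}`, args); // stand-in for the one-click chat button

// Ground the local model in the visible screen, let it propose a
// structured run_command call the UI can render as a run button.
const response = await ollama.chat({
  model: "qwen2.5",
  messages: [
    { role: "system", content: "Suggest shell commands grounded in the visible screen." },
    { role: "user", content: `Screen:\n${visibleScreenText}\n\nWhy did the build fail?` },
  ],
  tools: [
    {
      type: "function",
      function: {
        name: "run_command",
        description: "Run a shell command in the active tile",
        parameters: {
          type: "object",
          properties: { command: { type: "string" } },
          required: ["command"],
        },
      },
    },
  ],
});

for (const call of response.message.tool_calls ?? []) {
  renderRunButton(call.function.name, call.function.arguments);
}
```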
Claude's built-in subagents are great when one model is enough. A real swarm — Claude + Codex + Gemini + OpenCode running in parallel as live CLIs — wins on parallelism, model diversity and visibility. Here's the honest breakdown on 40 mixed refactor / research / UI tasks.
| Dimension | AnvilTerm Swarm | Claude Subagents |
|---|---|---|
| Parallelism | N real PTYs in parallel | Sequential within parent turn |
| Context window per worker | Full · 1M tokens each · no compaction | Shared parent, compacted on dispatch |
| Model diversity | Claude · Codex · Gemini · OpenCode · Copilot · Ollama | Claude family only |
| Live output | Dedicated live tile per agent | Opaque spinner until result |
| Human-in-the-loop | Type into any tile, paste refs, interrupt | None once dispatched |
| Artifact rendering | Iframes · SVG · markdown · live | Text summary returned to parent |
| Failure isolation | One agent fails, N-1 keep working | Subagent failure blocks parent turn |
| Token cost routing | Per-vendor metered, route cheap tasks to Ollama | All charged against parent's Claude quota |
| Determinism | Fresh PTY state per agent | Inherits parent compaction |
| Tool access | Each agent carries its own MCP toolkit | Parent's MCP toolkit only |
| Reproducibility | Session JSONL + Arena replay | No persistent transcript |
| Best for | Cross-vendor compare, parallel research, long refactors | Tightly-coupled chains in one model family |
Note: wall-clock multipliers are normalized to swarm = 1.0 across the 40 mixed tasks. Subagents win when the task is inherently sequential (each step depends on the prior result); the swarm wins when the work fans out. Use both.
Every other terminal is excellent at its thing. The matrix below is the proof — not marketing. Hover any row to see only AnvilTerm light up.
| Capability | AnvilTerm | Warp | iTerm2 | Ghostty | WezTerm | Kitty | Alacritty | Tabby |
|---|---|---|---|---|---|---|---|---|
| Multi-agent swarm · live tiles | ● | — | — | — | — | — | — | — |
| MCP server · agents drive the terminal | ● | partial | — | — | — | — | — | — |
| SwarmRoom · inter-agent chat | ● | — | — | — | — | — | — | — |
| Arena · head-to-head benchmark | ● | — | — | — | — | — | — | — |
| Forge · MCP + Skills marketplace | ● | catalog | — | — | — | — | — | — |
| Capability routing (task → best model) | ● | — | — | — | — | — | — | — |
| Cross-vendor usage tracking | ● | — | — | — | — | — | — | — |
| Usage forecast · threshold alerts | ● | — | — | — | — | — | — | — |
| Inline images · SVG render | ● | partial | ✓ | — | partial | ✓ | — | — |
| Inline video · PDF · audio waveform | ● | — | — | — | — | — | — | — |
| YouTube embed · hover previews | ● | — | — | — | — | — | — | — |
| Markdown table → spreadsheet | ● | — | — | — | — | — | — | — |
| Local AI · offline assistant | ● | — | plugin | — | — | — | — | — |
| Voice input · push-to-talk | ● | — | — | — | — | — | — | — |
| Session recording · ANSI + plain | ● | partial | — | — | — | — | — | — |
| Interactive screenshot → MCP | ● | — | — | — | — | — | — | — |
| Automation API (DevTools · Playwright) | ● | — | AppleScript | — | Lua | RC | — | plugins |
| Works offline · no cloud lock-in | ● | — | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Open source | Apache + FSL | closed | GPL | MIT | MIT | GPL | Apache | MIT |
| macOS · Linux · Windows | ● ● ● | ✓ ✓ ✓ | mac | ✓ ✓ β | ✓ ✓ ✓ | ✓ ✓ — | ✓ ✓ ✓ | ✓ ✓ ✓ |
| Native GPU renderer | xterm.js | ✓ | Metal | ✓ | ✓ | ✓ | OpenGL | — |
Every AI CLI in one workspace. Benchmark them on real work. Pick the best tool for each job. Never break your flow.
A browser, a PTY, an artifact renderer, image / PDF / YouTube understanding, a room to talk to their siblings. MCP, done for you.
Arena runs head-to-head on real CLIs, real tasks. Your model's wins become shareable 9:16 battle videos. Free distribution.