Multi-agent tiles | MCP server for every CLI | SwarmRoom inter-agent chat | Capability routing | Arena head-to-head | Forge marketplace | Local AI · Ollama | Usage forecast | Open source · Apache 2.0 + FSL-1.1 | v0.2.34 · 19.04.26
v0.2.34 · Multi-agent · Online

A terminal for you and your AI agents

Your shortcut to every AI CLI. Spawn Claude, Codex, Gemini, OpenCode side‑by‑side. Benchmark them. Ship the winner. Real CLIs. Real tools. Your judgment.

Download for Mac · .dmg · notarized
Download for Windows · .exe · unsigned beta
Download for Linux · .deb · Ubuntu / WSL2 · AppImage
v0.2.34 | macOS 13+ · Windows 10+ · Linux (x86_64) | npx anvilterm
~/anvil/arena — claude-opus-4-7 codex gemini opencode ● LIVE · 4 AGENTS
Claude Code + Codex + Gemini CLI + OpenCode + Copilot CLI + Ollama + Shell
— Runs every CLI worth running —
Claude
Codex
Gemini
OpenCode
00/
The Pitch
ABSTRACT · 120 words

Every other terminal ships with one AI. AnvilTerm ships with a team.

Warp bets on one cloud agent. iTerm2 adds a chat sidebar. Ghostty stays pure. None of them let you spawn Claude, Codex, Gemini and OpenCode side-by-side, watch them coordinate in a shared chat, score them against each other, and ship the winner. AnvilTerm does. It treats multi-agent as a first-class primitive — live tiles, MCP control plane, SwarmRoom, Arena, Forge marketplace — not a feature on the roadmap.

Watching one AI work is productivity.
Watching four of them compete for your prompt is the future.

01/
Swarm · Multi-Agent
6 vendors · auto-grid · live

Spawn a team.
Watch it work.

One prompt, N agents. Each is its own live PTY tile running a real CLI — not a wrapper, not an API reskin. Auto-grid re-flows as agents finish. They post to a shared SwarmRoom where the lead delegates and specialists report back.

★ Lead ↔ specialists

SwarmRoom — a channel where your AIs debate.

Lead agent delegates subtasks. Specialists pick them up, report back in the same room. You watch the transcript live in a chat tile. Works with any agent that speaks MCP — which is all of them.

"It's Discord for your agents, except they actually work."
— ak · build log · 2026-04-11
▸ Real CLIs

Not wrappers.

Your agents are literal claude, codex, gemini binaries. You see exactly what the TUI sees.

▸ Capability routing

Right model per task.

swarm_route(["multimodal"]) → Gemini. ["reasoning"] → Claude. ["refactor"] → Codex.
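One way to picture capability routing is a lookup from capability tags to vendors. The sketch below is an illustration only: the table layout and the `route` helper are hypothetical, and only the three tag-to-vendor pairs come from the text above. How swarm_route actually ranks vendors is not described on this page.

```python
from typing import Iterable, Optional

# Hypothetical capability table. Only these three pairings are named in the text;
# a real router would presumably know more tags per vendor.
CAPABILITIES = {
    "gemini": {"multimodal"},
    "claude": {"reasoning"},
    "codex": {"refactor"},
}


def route(required: Iterable[str]) -> Optional[str]:
    """Return the first vendor whose capability set covers every requested tag."""
    needed = set(required)
    for vendor, caps in CAPABILITIES.items():
        if needed <= caps:  # subset test: every needed tag is offered
            return vendor
    return None  # no single vendor covers the request
```

With this table, a request that mixes tags no single vendor covers (for example `["multimodal", "refactor"]`) returns `None`, which is where a lead agent would have to split the task.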

▸ Auto-grid

Tiles that re-flow.

One agent = one tab. Four = 2×2. Eight = 3×3. Each tile stops, closes, exports on its own.
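The three layouts named above (1 tile, 4 tiles as 2×2, 8 tiles as 3×3) are consistent with a near-square grid whose column count is the ceiling of the square root of the tile count. The sketch below assumes that rule; the `grid_dims` name and the behaviour for other counts are guesses, not the app's documented layout algorithm.

```python
import math


def grid_dims(n_tiles: int) -> tuple:
    """Return (rows, cols) for a near-square grid that fits n_tiles."""
    if n_tiles < 1:
        raise ValueError("need at least one tile")
    cols = math.isqrt(n_tiles - 1) + 1   # integer ceil(sqrt(n_tiles))
    rows = math.ceil(n_tiles / cols)     # just enough rows to hold every tile
    return rows, cols
```

Under this assumed rule, five or six tiles would land in a 2×3 grid; the page does not say what the real layout does for those counts.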

⌖ Lead-led composition

Tell the lead what to ship. It picks the team.

Spawn a single Claude lead. It reads the prompt, calls swarm_vendors, picks specialists, delegates, merges the result. You just watch and approve. No team-building UI required.

⏱ Live token ledger

Per-agent + aggregate.

Every tile streams its token counter scraped from the TUI. The toolbar aggregates daily + weekly spend across Claude + Codex + OpenCode. Forecast tells you when you'll hit the cap.
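A forecast of this kind can be approximated with a linear burn rate over the sampled usage percentages. This is a minimal sketch under that assumption; the `eta_to_cap` name and the sample format (seconds, percent used) are hypothetical, and the page does not say which model the real forecast uses.

```python
def eta_to_cap(samples, cap=100.0):
    """Estimate seconds until `cap` percent is reached.

    samples: list of (t_seconds, used_percent) pairs sorted by time.
    Returns None when there is no measurable upward trend.
    """
    if len(samples) < 2:
        return None
    (t0, u0), (t1, u1) = samples[0], samples[-1]
    if t1 <= t0 or u1 <= u0:
        return None                      # flat or falling usage: no ETA
    rate = (u1 - u0) / (t1 - t0)         # percent per second, averaged over the window
    return max(0.0, (cap - u1) / rate)   # time left at the same average rate
```

For example, usage rising from 10% to 30% over 20 minutes leaves 70% of headroom, which at the same rate lasts another 70 minutes.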

02/
Arena · Head-to-head
Contestants · Judge · Export

Two agents enter.
One ships.

Pick contestants. Paste a prompt. Hit start. Each tile streams live output and renders deliverables as iframes when the agent calls arena_push_artifact. Judge on quality, speed, correctness. Export the fight as markdown + JSONL: same lab, same prompt, run it again next week and see what drifted.

Claude Opus 4.7
94
Shipped in 38s · 12.1k tokens · ArtifactV1 rendered · 0 revisions
vs
GPT-5.4 High
91
Shipped in 42s · 14.8k tokens · ArtifactV1 rendered · 1 revision

Every battle is stored at ~/.anvil/arena/<id>.jsonl. Bring it back six months later, re-run on new model versions, watch the winner flip. Reproducible benchmarking — for the laptop era.
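Because each battle is a JSONL file, comparing runs can be as simple as parsing one JSON object per line. The record fields below (`contestant`, `score`, `seconds`, `tokens`) are hypothetical, since the page does not document the schema; the sample values are the ones shown in the scorecard above.

```python
import json

# Hypothetical records; the real schema of ~/.anvil/arena/<id>.jsonl is not documented here.
SAMPLE = """\
{"contestant": "Claude Opus 4.7", "score": 94, "seconds": 38, "tokens": 12100}
{"contestant": "GPT-5.4 High", "score": 91, "seconds": 42, "tokens": 14800}
"""


def load_jsonl(text):
    """Parse JSON Lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def winner(records):
    """Return the contestant with the highest score, or None if nothing is scored."""
    scored = [r for r in records if "score" in r]
    if not scored:
        return None
    return max(scored, key=lambda r: r["score"])["contestant"]
```

Running the same comparison over an older and a newer battle file is one way to see whether the winner flipped between model versions.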

03/
Screens · The product
macOS · dark · v0.2.34

This is what four agents
working at once looks like.

Arena tiles stream live output side-by-side. The full app view shows Forge marketplace, SwarmRoom chat, and the menu-bar usage tray all alive at once.

04/
MCP · Control Plane
20+ tools · stdio

Every AI you install
gets superpowers.

AnvilTerm ships with a Model Context Protocol server. One command registers it across Claude Code, Codex, Gemini, and OpenCode. Every agent gains, for free, a browser, a PTY, inter-agent chat, artifact rendering, usage tracking, and screenshots.

# wire AnvilTerm into every installed agent at once
npx anvilterm-doctor --install

# now from any agent:
terminal_create() · terminal_write() · terminal_screen()
tui_type() · tui_interrupt() · tui_choose()
swarm_spawn() · swarm_route() · swarm_vendors()
swarm_room_post() · swarm_room_listen() · swarm_room_thread()
arena_push_artifact() · arena_current()
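Under the hood, an MCP client invokes a tool by sending a JSON-RPC 2.0 request with the method `tools/call`, and the stdio transport frames each message as one line of JSON. The sketch below builds such a request for illustration. The tool name comes from the list above, but the `text` argument is an assumption, since the tools' parameter schemas are not shown here.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids must be unique per session


def tool_call(name, arguments=None):
    """Build one newline-delimited JSON-RPC 2.0 `tools/call` request."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }
    return json.dumps(message) + "\n"


# The "text" argument name is hypothetical; check the tool's real input schema.
request = tool_call("terminal_write", {"text": "ls\n"})
```

In practice an agent's MCP client handles this framing, plus the initialize handshake that must precede any tool call, so the agent only sees the tool names.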
◉ Terminal-in-terminal

Agents spawn their own PTYs.

A tab created via MCP appears as a live tile in the AnvilTerm window. You watch — and intervene if needed. Standalone fallback via node-pty if the UI isn't running.

◉ Works everywhere

Claude · Cursor · Windsurf · Codex · Gemini · OpenCode.

Every MCP-speaking client. Stdio transport. Register once, the toolset travels with you.

05/
Forge · Marketplace
MCP + Skills · one-tap install

A marketplace for MCP servers and Skills.

Curated catalog, star-ranked, kind-filtered. Search 258+ servers. Pick one, hit install, choose Claude / Codex / Gemini / OpenCode — or all of them. Forge writes directly to each agent's config with a _forge:true tag so it can manage updates and removal cleanly.
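The point of the _forge:true tag is that managed entries can be told apart from ones you added by hand. A rough sketch of that idea follows, assuming a JSON config with an `mcpServers` object (a shape some MCP clients use; other vendors' config formats differ). The `install` and `uninstall_managed` helpers are hypothetical and are not Forge's actual code.

```python
import copy


def install(config, name, command, args):
    """Return a copy of config with a tagged server entry added."""
    out = copy.deepcopy(config)
    out.setdefault("mcpServers", {})[name] = {
        "command": command,
        "args": list(args),
        "_forge": True,  # marks the entry as managed, so it can be removed safely later
    }
    return out


def uninstall_managed(config):
    """Return a copy of config without any tagged entries; hand-written ones stay."""
    out = copy.deepcopy(config)
    servers = out.get("mcpServers", {})
    out["mcpServers"] = {k: v for k, v in servers.items() if not v.get("_forge")}
    return out
```

Installing and then removing managed entries leaves a hand-written config exactly as it started, which is the property the tag is there to guarantee.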

[Screenshot: Forge marketplace, a search bar over 258 MCP servers and skills with verified badges, star counts, and Install buttons]
Forge · Browse view · 258 servers, sourced from MCP Registry + GitHub topics, deduped, star-ranked, one-tap install for Claude / Codex / Gemini / OpenCode.

⚒ One-tap install

Chrome · Slack · Playwright · Linear · Stripe — all at once.

Skills get git-cloned to ~/.anvil/skills/ and symlinked into each agent's skills dir. MCP servers get registered in each vendor's config. Installed view shows per-agent status chips so you always know what's actually wired.

"Homebrew, if brew also wired the package into every shell you've got open."
— forge. ship. repeat.
06/
Local AI · Offline
Ollama · Metal-accelerated

Offline assistant.
Your data stays local.

Cmd+K opens an Ollama-backed assistant with native function calling. Gemma 4, Qwen 2.5/3, Llama 3.1, Mistral Nemo. Prompts never leave the machine. Ideal for regulated work, sensitive repos, a plane, a café with no wifi.

▸ Zero telemetry

On-device.

Code, prompts, context — none of it leaves your laptop.

▸ Tool calls

Run buttons, not screenshots.

Models that speak tools return structured calls that render as one-click run buttons in the chat.

▸ Terminal context

Sees what you see.

The assistant reads the visible terminal screen so suggestions are grounded in what's actually running.

07/
Benchmark · Swarm vs Subagents
10 runs · n=40 tasks · 2026-04

Multi-agent swarm vs Claude subagents.
We ran the numbers.

Claude's built-in subagents are great when one model is enough. A real swarm — Claude + Codex + Gemini + OpenCode running in parallel as live CLIs — wins on parallelism, model diversity and visibility. Here's the honest breakdown on 40 mixed refactor / research / UI tasks.

A · AnvilTerm Swarm (4 agents)

4 CLIs. Parallel. Visible.

Wall-clock time: 1.0×
Model diversity: 4 vendors
Live visibility: Per-tile
Context isolation: Full · per agent
Mid-run intervention: Any tile
Failure blast radius: 1 of N

B · Claude built-in subagents

Sequential. Opaque. One family.

Wall-clock time: 3.1×
Model diversity: 1 family
Live visibility: Spinner
Context isolation: Compacted
Mid-run intervention: None
Failure blast radius: Blocks parent

Dimension | AnvilTerm Swarm | Claude Subagents
Parallelism | N real PTYs in parallel | Sequential within parent turn
Context window per worker | Full · 1M tokens each · no compaction | Shared parent, compacted on dispatch
Model diversity | Claude · Codex · Gemini · OpenCode · Copilot · Ollama | Claude family only
Live output | Dedicated live tile per agent | Opaque spinner until result
Human-in-the-loop | Type into any tile, paste refs, interrupt | None once dispatched
Artifact rendering | Iframes · SVG · markdown · live | Text summary returned to parent
Failure isolation | One agent fails, N-1 keep working | Subagent failure blocks parent turn
Token cost routing | Per-vendor metered, route cheap tasks to Ollama | All charged against parent's Claude quota
Determinism | Fresh PTY state per agent | Inherits parent compaction
Tool access | Each agent carries its own MCP toolkit | Parent's MCP toolkit only
Reproducibility | Session JSONL + Arena replay | No persistent transcript
Best for | Cross-vendor compare, parallel research, long refactors | Tightly-coupled chains in one model family

Note: the wall-clock multiplier is normalized to swarm = 1.0 across the 40 mixed tasks. Subagents win when the task is inherently sequential (each step depends on the prior result); the swarm wins when the work fans out. Use both.

07b/
Live token tracker
how close to the limit are you, right now?

Three subscriptions.
One toolbar that doesn't lie.

Your AnvilTerm toolbar shows what your Claude, Codex, and Gemini CLIs would tell you if you typed /usage in each — without typing it. AnvilTerm scrapes the panels in the background and pins the live percentages next to your prompt. No API keys. No rate limits. No OAuth side-channels. Just the same numbers your CLI sees, surfaced where you actually need them.

[Screenshot: Claude usage popover, showing Max·20× plan, Opus 4.7, 5-hour and weekly windows, pay-as-you-go credits, forecast]
Claude · 5-hour window, weekly all-models, weekly Sonnet, pay-as-you-go credits, burn-rate forecast.
[Screenshot: Codex usage popover, showing Plus plan, gpt-5.5, 5-hour and weekly windows, forecast]
Codex · 5-hour and weekly windows, last 105 samples sparkline, ETA to limit.
[Screenshot: Gemini usage popover, showing gemini-3-flash-preview, daily quota]
Gemini · Daily quota, model name, sample history. Free tier and paid.

Built on a passive scraper, not a side-channel API. AnvilTerm watches the bytes your CLI prints when you run /usage or /status, parses them, and stamps the result on the chip. The first-party numbers, free of provider rate-limits, accurate to the second.

07c/
Model radar
never miss a launch

A new model dropped.
Try it before the thread blows up.

AnvilTerm watches OpenRouter and the major vendor changelogs in the background, deduplicates aliases, and surfaces the freshest checkpoints with context window, input/output pricing, and a one-tap Try in Arena button — head-to-head against your current driver, on your repo, on your prompt. Stop scrolling Twitter to figure out what's actually new. The radar already pinged. The arena is already live.

[Screenshot: Model Radar popover, This week tab listing 12 fresh checkpoints (Claude Haiku Latest, GPT Mini Latest, Gemini Pro Latest, Kimi Latest, Gemini Flash Latest, Claude Sonnet Latest, GPT Latest…), each with context window, input/output pricing, NEW badge, and Try in Arena + View on OR buttons]
This week · This month · All recent · 12 fresh checkpoints, deduped by alias, ranked by drop date · NEW badge stays for 24 h · Try in Arena spawns a head-to-head against your current driver in one click.

08/
Daily drivers
the boring features that make it your terminal

A real terminal first.
An agent terminal second.

You'll spend most of the day in this as a regular terminal — so the daily-driver details get the same care as the multi-agent stuff. Find in tab. Search across every tab. Drop a YouTube link, see the thumbnail. Run a Pomodoro. Talk to your agent.

★ ⌘F · ⇧⌘F

Find in tab — and across every tab.

⌘F highlights every match in the active terminal's scrollback. ⇧⌘F searches the last 5,000 lines of every tab and groups results by tab; click a hit to switch to that tab and scroll to the line. Five agents running, one search box.

"The thing iTerm should have shipped a decade ago."
— every dev who's ever grep'd a scrollback
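The cross-tab search described above amounts to scanning a bounded tail of each tab's scrollback and grouping hits by tab. A minimal sketch of that idea, assuming each tab's scrollback is available as a list of lines; the `search_tabs` name and the plain substring match are assumptions, and the real matcher may support more.

```python
def search_tabs(tabs, needle, window=5000):
    """Search the last `window` lines of every tab.

    tabs: mapping of tab name -> list of scrollback lines (oldest first).
    Returns {tab name: [(line_index, line_text), ...]} for tabs with hits only.
    """
    hits = {}
    for name, lines in tabs.items():
        start = max(0, len(lines) - window)  # skip anything older than the window
        matches = [
            (i, line)
            for i, line in enumerate(lines[start:], start=start)
            if needle in line
        ]
        if matches:
            hits[name] = matches
    return hits
```

Keeping the original line index in each hit is what makes the "click to scroll to the line" step possible.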
▸ anvil .

One command, project loaded.

anvil . from any shell opens a new tab at that cwd. App not running? It launches. OSC 7 auto-renames the tab.
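OSC 7 is the escape sequence a shell can print to tell the terminal its current working directory, encoded as a file:// URL. The sketch below only shows what such a sequence looks like; the `osc7` helper is hypothetical, and how AnvilTerm turns the reported path into a tab name is not described here.

```python
from urllib.parse import quote


def osc7(cwd, host):
    """Build an OSC 7 sequence reporting `cwd` on `host` as a file:// URL."""
    # ESC ] 7 ; <url> ESC \  (the trailing ESC \ is the string terminator)
    return "\x1b]7;file://" + host + quote(cwd) + "\x1b\\"


sequence = osc7("/home/me/my project", "box")
```

A shell hook that prints this string after every directory change is enough for a terminal that understands OSC 7 to track where each tab is.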

▸ Inline media

YouTube · GIF · MP3 · PDF · image.

Paste a URL or drop a file. Thumbnails inline, waveform player for audio, PDF tiles. Multi-line wrapped URLs handled.

▸ Voice in

Talk to the active agent.

Tap 🎙, speak, live waveform overlay, transcript drops into the prompt. Hands on the test, eyes on the code.

⏱ Tasks + Focus Mode

Pomodoro built into the terminal.

Type ship the release 25m → instant timer. Click ▶ Start → everything else dims to 35%. 25m later, audible nudge plus system notification "Is X done?". Not another app to switch to — same window, same flow.
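Turning a line like "ship the release 25m" into a timer comes down to splitting a trailing duration off the task title. A small sketch of that parsing follows; the `parse_task` helper is hypothetical, and support for an `h` suffix is an assumption, since the page only shows minutes.

```python
import re

# Title, whitespace, then a number with an m (minutes) or h (hours) suffix.
_TASK = re.compile(r"^(?P<title>.+?)\s+(?P<n>\d+)\s*(?P<unit>[mh])$")


def parse_task(text):
    """Return (title, minutes), or None when the text has no trailing duration."""
    match = _TASK.match(text.strip())
    if not match:
        return None
    minutes = int(match["n"]) * (60 if match["unit"] == "h" else 1)
    return match["title"], minutes
```

Input without a trailing duration returns `None`, so ordinary shell commands are never mistaken for tasks.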

⌘K + 📡 + 🌙

Palette · Radar · Theme — the ergonomics layer.

⌘K fuzzy-fires anything: kill all, spawn 4 agents, or natural-language "find the tab where the test failed". 📡 Radar glows when a fresh OpenRouter model drops — one click sends it to Arena. 🌙 cycles dark / light / theme; palette swaps live, no flicker.

08b/
Media library
terminals shouldn't be blind to pixels

Hover a path.
See what it is.

Type a file path, drop an image, paste a YouTube link, fetch a PDF — AnvilTerm renders the preview right where the bytes are. Hover any URL or path in the scrollback to peek the file in a floating thumbnail; the side Media rail collects every image, video, audio, and doc this session has touched, with one-tap Path · URL · Copy · Insert actions. Your terminal finally knows what a JPG looks like.

[Screenshot: AnvilTerm with a terminal-hover preview floating over a usage-claude.png path, and a Media sidebar listing every image / video / audio / doc surfaced this session with Path · URL · Copy · Insert actions]
Hover preview · Media rail · All / Images / Video / Audio / Docs filters · 8 items captured this session, one-click Insert pastes the path back into the active prompt.

08c/
Grid view
N agents, one screen, zero alt-tab

Every tab, all at once.
No alt-tab. No regret.

One keystroke arranges every open terminal — manual shells, swarm tiles, Arena contestants — into a live grid. Each tile keeps its own PTY, its own scrollback, its own input. Watch four agents race the same prompt, or just keep an eye on a long build while you work next door. Same window, same window manager, no third-party tile manager required.

[Screenshot: AnvilTerm grid view, with a Swarm header showing 4 agents, 1 running, 0 done, 0 err and a 2x2 grid selector, and four live tiles running OpenCode, Codex gpt-5.5 medium, GitHub Copilot, and Gemini CLI side by side]
Swarm header · 4 agents · 2×2 grid · OpenCode + Codex + Copilot + Gemini · each tile is a real PTY with its own scrollback, kill-all in one click, no third-party tile manager.

09/
The Ledger · vs Everyone
23 capabilities · 8 terminals

The honest spec sheet.
We checked.

Every other terminal is excellent at its thing. The matrix below is the proof, not marketing.

Capability | AnvilTerm | Warp | iTerm2 | Ghostty | WezTerm | Kitty | Alacritty | Tabby
Multi-agent swarm · live tiles
MCP server · agents drive the terminal [partial]
SwarmRoom · inter-agent chat
Arena · head-to-head benchmark
Forge · MCP + Skills marketplace [catalog]
Capability routing (task → best model)
Cross-vendor usage tracking
Usage forecast · threshold alerts
Inline images · SVG render [partial] [partial]
Inline video · PDF · audio waveform
YouTube embed · hover previews
Markdown table → spreadsheet
Local AI · offline assistant [plugin]
Voice input · push-to-talk
Session recording · ANSI + plain [partial]
Interactive screenshot → MCP
Automation API (DevTools · Playwright) [AppleScript] [Lua] [RC] [plugins]
Works offline · no cloud lock-in
Open source: AnvilTerm Apache + FSL | Warp closed | iTerm2 GPL | Ghostty MIT | WezTerm MIT | Kitty GPL | Alacritty Apache | Tabby MIT
macOS · Linux · Windows: AnvilTerm ● ● ● | Warp ✓ ✓ ✓ | iTerm2 mac | Ghostty ✓ ✓ β | WezTerm ✓ ✓ ✓ | Kitty ✓ ✓ — | Alacritty ✓ ✓ ✓ | Tabby ✓ ✓ ✓
Native GPU renderer [xterm.js] [Metal] [OpenGL]
Legend: shipped · AnvilTerm | shipped | partial (via plugin or limited) | not available

10/
Made for three sides
dev · agents · labs

Built for the AI era.
All three sides of it.

A / The developer

Stop flipping tabs.

Every AI CLI in one workspace. Benchmark them on real work. Pick the best tool for each job. Never break your flow.

B / The agents

Real tools.

A browser, a PTY, an artifact renderer, image / PDF / YouTube understanding, a room to talk to their siblings. MCP, done for you.

C / The labs

Reproducible benchmarks.

Arena runs head-to-head on real CLIs, real tasks. Every run is a signed receipt — tokens, cost, time, tests passed — appended to ~/.anvil/benchmarks/runs.jsonl. Reproducible by anyone with the same repo and prompt.

Forge the future.

Download · Spawn · Ship
↓ AnvilTerm 0.2.34 · macOS
↓   Download