Multi-agent tiles· MCP server for every CLI· Swarmroom inter-agent chat· Capability routing· Arena head-to-head· Forge marketplace· Local AI · Ollama· Usage forecast· Open source · Apache 2.0 + FSL-1.1· v0.2.37 · 19.04.26· Multi-agent tiles· MCP server for every CLI· Swarmroom inter-agent chat· Capability routing· Arena head-to-head· Forge marketplace· Local AI · Ollama· Usage forecast· Open source · Apache 2.0 + FSL-1.1· v0.2.37 · 19.04.26·
v0.2.37 · Multi-agent · Online

A terminal for you and your AI agents

Your shortcut to every AI CLI. Spawn Claude, Codex, Gemini, OpenCode side‑by‑side. Benchmark them. Ship the winner. Real CLIs. Real tools. Your judgment.

Download for Mac .dmg · notarized Download for Windows .exe · unsigned beta Download for Linux .deb · Ubuntu / WSL2 · AppImage
v0.2.37|macOS 13+ · Windows 10+ · Linux (x86_64)|npm i -g anvilterm
~/anvil/arena — claude-opus-4-7 codex gemini opencode ● LIVE · 4 AGENTS
Claude Code+ Codex+ Gemini CLI+ OpenCode+ Copilot CLI+ Ollama+ Shell· Claude Code+ Codex+ Gemini CLI+ OpenCode+ Copilot CLI+ Ollama+ Shell· Claude Code+ Codex+ Gemini CLI+ OpenCode+ Copilot CLI+ Ollama+ Shell·
— Runs every CLI worth running —
Claude
Claude
Codex
Codex
Gemini
Gemini
OpenCode
OpenCode
00/
The Pitch
ABSTRACT · 120 words

Every other terminal ships with one AI. AnvilTerm ships with a team.

Warp bets on one cloud agent. iTerm2 adds a chat sidebar. Ghostty stays pure. None of them let you spawn Claude, Codex, Gemini and OpenCode side-by-side, watch them coordinate in a shared chat, score them against each other, and ship the winner. AnvilTerm does. It treats multi-agent as a first-class primitive — live tiles, MCP control plane, SwarmRoom, Arena, Forge marketplace — not a feature on the roadmap.

Watching one AI work is productivity.
Watching four of them compete for your prompt is the future.

01/
Swarm · Multi-Agent
6 vendors · auto-grid · live

One prompt.
A team of AIs.

Spawn Claude, Gemini, Codex and OpenCode in one window. They debate in a shared chat, hand off artifacts, cross-check each other's work — while you watch every keystroke live. No Discord. No copy-paste. No tab juggling.

Swarm · migrate /auth to rs256 5 agents · 3 running · 00:02:14 · $0.42 / $5.00
you
cli · human
claude · lead
sonnet-4.5
› planning + dispatching
gemini
flash-3
› web research
codex
gpt-5.5 · high
› verify.ts → RS256
opencode
gemma-2:4b · local
› jest --coverage
claude · review
sonnet-4.5
› cross-check + merge
⟶ live streamtail -f
00:02:12 codex edited src/auth/verify.ts · +42 -18
00:02:09 gemini emitted research/jwt-rs256.md
00:02:03 opencode running jest --coverage
00:01:58 claude dispatched 3 subtasks
00:01:41 claude decomposed task (3 subtasks)
swarm · 4 agents · 2 running · 0 errors ● LIVE — 23:01
AnvilTerm SwarmRoom: Claude lead delegates research to Gemini and verification to Codex. All three agents work in parallel, post artifacts to a shared room, and the lead reviews them live.
Claude · Lead, plans + reviews Gemini · Web research, ships ai-news-web.md Codex · Verifies every URL live SwarmRoom · Their group chat, in your terminal
Real CLIs. Real tokens counted. Real artifacts saved to ~/.anvil/swarm/. The lead reads what specialists ship and decides what's worth keeping — you just approve the result.
★ Lead ↔ specialists

SwarmRoom — a channel where your AIs debate.

Lead agent delegates subtasks. Specialists pick them up, report back in the same room. You watch the transcript live in a chat tile. Works with any agent that speaks MCP — which is all of them.

"It's Discord for your agents, except they actually work."
— ak · build log · 2026-04-11
▸ Real CLIs

Not wrappers.

Your agents are literal claude, codex, gemini binaries. You see exactly what the TUI sees.

▸ Capability routing

Right model per task.

swarm_route(["multimodal"]) → Gemini. ["reasoning"] → Claude. ["refactor"] → Codex.

▸ Auto-grid

Tiles that re-flow.

One agent = one tab. Four = 2×2. Eight = 3×3. Each tile stops, closes, exports on its own.

⌖ Lead-led composition

Tell the lead what to ship. It picks the team.

Spawn a single Claude lead. It reads the prompt, calls swarm_vendors, picks specialists, delegates, merges the result. You just watch and approve. No team-building UI required.

⏱ Live token ledger

Per-agent + aggregate.

Every tile streams its token counter scraped from the TUI. The toolbar aggregates daily + weekly spend across Claude + Codex + OpenCode. Forecast tells you when you'll hit the cap.

02/
Arena · Head-to-head
Contestants · Judge · Export

Two agents enter.
One ships.

Stop guessing which model is best. Pick two — Claude vs GPT-5.5, Sonnet vs Opus, Gemini vs anyone — paste your prompt, hit Start. Live previews stream side-by-side. Real tokens. Real cost. Real time. You score 1-10 and the winner ships. Save the lab, re-run next month, watch the rankings flip.

battle · named · 2 agents · live ● BATTLE MODE
Same prompt · “Premium mobile health dashboard, single HTML” Live preview · the rendered output, not screenshots Real costs · 263 tokens · $0.0066 · 108.8s You judge · slider 1-10 + winner pick + export
Two agents. One prompt. Live HTML previews side-by-side. Real time, real tokens, real cost — and a score slider you actually use, because the deliverable is right there.

Every battle is stored at ~/.anvil/arena/<id>.jsonl. Bring it back six months later, re-run on new model versions, watch the winner flip. Reproducible benchmarking — for the laptop era.

03/
Screens · The product
macOS · dark · v0.2.37

More than two modes.
A whole stack.

Forge marketplace, Model Radar, Grid view, inline media — every screen designed to keep agents at the center of attention without burying the work they ship.

04/
MCP · Control Plane
20+ tools · stdio

Every AI you install
gets superpowers.

AnvilTerm ships with a Model Context Protocol server. One command registers it across Claude Code, Codex, Gemini, OpenCode. Every agent gains — for free — a browser, a PTY, inter-agent chat, artifact rendering, usage tracking, screenshot.

# install once, then wire AnvilTerm into every installed agent at once
npm install -g anvilterm
anvilterm-mcp-install      # registers MCP in Claude Code, Codex, Gemini, OpenCode, Copilot
anvilterm-doctor             # verify every vendor sees the MCP + skill

# now from any agent:
terminal_create() · terminal_write() · terminal_read() · terminal_screen()
terminal_screenshot() · terminal_run() · terminal_list() · terminal_close()
tui_type() · tui_interrupt() · tui_choose() · tui_paste_ref()
swarm_spawn() · swarm_route() · swarm_vendors() · swarm_check_stuck()
swarm_room_post() · swarm_room_listen() · swarm_room_thread() · swarm_room_list()
swarm_artifact_save() · arena_push_artifact() · arena_current()
◉ Terminal-in-terminal

Agents spawn their own PTYs.

A tab created via MCP appears as a live tile in the AnvilTerm window. You watch — and intervene if needed. Standalone fallback via node-pty if the UI isn't running.

◉ Works everywhere

Claude · Cursor · Windsurf · Codex · Gemini · OpenCode.

Every MCP-speaking client. Stdio transport. Register once, the toolset travels with you.

05/
Forge · Marketplace
MCP + Skills · one-tap install

A marketplace for MCP servers and Skills.

Curated catalog, star-ranked, kind-filtered. Search 258+ servers. Pick one, hit install, choose Claude / Codex / Gemini / OpenCode — or all of them. Forge writes directly to each agent's config with a _forge:true tag so it can manage updates and removal cleanly.

Forge marketplace — search bar over 258 MCP servers and skills with verified badges, star counts, and Install buttons
Forge · Browse view · 258 servers, sourced from MCP Registry + GitHub topics, deduped, star-ranked, one-tap install for Claude / Codex / Gemini / OpenCode.
⚒ One-tap install

Chrome · Slack · Playwright · Linear · Stripe — all at once.

Skills get git-cloned to ~/.anvil/skills/ and symlinked into each agent's skills dir. MCP servers get registered in each vendor's config. Installed view shows per-agent status chips so you always know what's actually wired.

"Homebrew, if brew also wired the package into every shell you've got open."
— forge. ship. repeat.
06/
Local AI · Offline
Ollama · Metal-accelerated

Offline assistant.
Your data stays local.

Cmd+K opens an Ollama-backed assistant with native function calling. Gemma 4, Qwen 2.5/3, Llama 3.1, Mistral Nemo. Prompts never leave the machine. Ideal for regulated work, sensitive repos, a plane, a café with no wifi.

▸ Zero telemetry

On-device.

Code, prompts, context — none of it leaves your laptop.

▸ Tool calls

Run buttons, not screenshots.

Models that speak tools return structured calls that render as one-click run buttons in the chat.

▸ Terminal context

Sees what you see.

The assistant reads the visible terminal screen so suggestions are grounded in what's actually running.

07/
Benchmark · Swarm vs Subagents
10 runs · n=40 tasks · 2026-04

Multi-agent swarm vs Claude subagents.
We ran the numbers.

Claude's built-in subagents are great when one model is enough. A real swarm — Claude + Codex + Gemini + OpenCode running in parallel as live CLIs — wins on parallelism, model diversity and visibility. Here's the honest breakdown on 40 mixed refactor / research / UI tasks.

A  ·  AnvilTerm Swarm (4 agents)

4 CLIs. Parallel. Visible.

Wall-clock time
1.0×
Model diversity
4 vendors
Live visibility
Per-tile
Context isolation
Full · per agent
Mid-run intervention
Any tile
Failure blast radius
1 of N
B  ·  Claude built-in subagents

Sequential. Opaque. One family.

Wall-clock time
3.1×
Model diversity
1 family
Live visibility
Spinner
Context isolation
Compacted
Mid-run intervention
None
Failure blast radius
Blocks parent
Dimension AnvilTerm Swarm Claude Subagents
ParallelismN real PTYs in parallelSequential within parent turn
Context window per workerFull · 1M tokens each · no compactionShared parent, compacted on dispatch
Model diversityClaude · Codex · Gemini · OpenCode · Copilot · OllamaClaude family only
Live outputDedicated live tile per agentOpaque spinner until result
Human-in-the-loopType into any tile, paste refs, interruptNone once dispatched
Artifact renderingIframes · SVG · markdown · liveText summary returned to parent
Failure isolationOne agent fails, N-1 keep workingSubagent failure blocks parent turn
Token cost routingPer-vendor metered, route cheap tasks to OllamaAll charged against parent's Claude quota
DeterminismFresh PTY state per agentInherits parent compaction
Tool accessEach agent carries its own MCP toolkitParent's MCP toolkit only
ReproducibilitySession JSONL + Arena replayNo persistent transcript
Best forCross-vendor compare, parallel research, long refactorsTightly-coupled chains in one model family

Note: wall-clock × is normalized to swarm=1.0 across 40 mixed tasks. Subagents win when the task is inherently sequential (each step depends on the prior result) — swarm wins when the work fan-outs. Use both.

07b/
Live token tracker
how close to the limit are you, right now?

Never hit a ceiling at 2am again.
The toolbar that doesn't lie.

Stop typing /usage three times. AnvilTerm pins your Claude, Codex, and Gemini limits next to your prompt — live, all three at once. No API keys. No OAuth. Just the same numbers your CLI sees, where you actually need them.

usage · 3 vendors · ticking live · no API ● LIVE
Watch your limits, not the clock. 5-hour windows, weekly caps, paid credits, burn-rate forecast — every vendor stamped next to your prompt and updated every keystroke.

Built on a passive scraper, not a side-channel API. AnvilTerm watches the bytes your CLI prints when you run /usage or /status, parses them, and stamps the result on the chip. The first-party numbers, free of provider rate-limits, accurate to the second.

07c/
Model radar
never miss a launch

A new model dropped.
Try it before the thread blows up.

AnvilTerm watches OpenRouter and the major vendor changelogs in the background, deduplicates aliases, and surfaces the freshest checkpoints with context window, input/output pricing, and a one-tap Try in Arena button — head-to-head against your current driver, on your repo, on your prompt. Stop scrolling Twitter to figure out what's actually new. The radar already pinged. The arena is already live.

Model Radar popover — This week tab listing 12 fresh checkpoints (Claude Haiku Latest, GPT Mini Latest, Gemini Pro Latest, Kimi Latest, Gemini Flash Latest, Claude Sonnet Latest, GPT Latest…) each with context window, input/output pricing, NEW badge, and Try in Arena + View on OR buttons
This week · This month · All recent · 12 fresh checkpoints, deduped by alias, ranked by drop date · NEW badge stays for 24 h · Try in Arena spawns a head-to-head against your current driver in one click.
08/
Daily drivers
the boring features that make it your terminal

A real terminal first.
An agent terminal second.

You'll spend most of the day in this as a regular terminal — so the daily-driver details get the same care as the multi-agent stuff. Find in tab. Search across every tab. Drop a YouTube link, see the thumbnail. Run a Pomodoro. Talk to your agent.

★ ⌘F · ⇧⌘F

Find in tab — and across every tab.

⌘F highlights every match in the active terminal's scrollback. ⇧⌘F searches every tab's last 5,000 lines, groups results by tab, click a hit → switches and scrolls to the line. Five agents running, one search box.

"The thing iTerm should have shipped a decade ago."
— every dev who's ever grep'd a scrollback
▸ anvil .

One command, project loaded.

anvil . from any shell opens a new tab at that cwd. App not running? It launches. OSC 7 auto-renames the tab.

▸ Inline media

YouTube · GIF · MP3 · PDF · image.

Paste a URL or drop a file. Thumbnails inline, waveform player for audio, PDF tiles. Multi-line wrapped URLs handled.

🎙 Voice in

Speak it. Ship it.

Hold the mic, talk, watch the waveform — your transcript drops into the active agent's prompt the moment you let go. Hands stay on the test. Eyes stay on the code. Cuts a 40-keystroke command to 3 seconds.

voice · push-to-talk · zero-typing ● RECORDING
One key. Two words. Done. Voice goes straight to whichever agent you're focused on — Claude, Codex, Gemini, OpenCode. Same flow, no app-switch.
⏱ Tasks + Focus Mode

Pomodoro built into the terminal.

Type ship the release 25m → instant timer. Click ▶ Start → everything else dims to 35%. 25m later, audible nudge plus system notification "Is X done?". Not another app to switch to — same window, same flow.

⌘K + 📡 + 🌙

Palette · Radar · Theme — the ergonomics layer.

⌘K fuzzy-fires anything: kill all, spawn 4 agents, or natural-language "find the tab where the test failed". 📡 Radar glows when a fresh OpenRouter model drops — one click sends it to Arena. 🌙 cycles dark / light / theme; palette swaps live, no flicker.

08b/
Media library
terminals shouldn't be blind to pixels

Hover a path.
See what it is.

Type a file path, drop an image, paste a YouTube link, fetch a PDF — AnvilTerm renders the preview right where the bytes are. Hover any URL or path in the scrollback to peek the file in a floating thumbnail; the side Media rail collects every image, video, audio, and doc this session has touched, with one-tap Path · URL · Copy · Insert actions. Your terminal finally knows what a JPG looks like.

AnvilTerm with a terminal-hover preview floating over a usage-claude.png path, and a Media sidebar listing every image / video / audio / doc surfaced this session with Path · URL · Copy · Insert actions
Hover preview · Media rail · All / Images / Video / Audio / Docs filters · 8 items captured this session, one-click Insert pastes the path back into the active prompt.
08c/
Grid view
N agents, one screen, zero alt-tab

Every tab, all at once.
No alt-tab. No regret.

One keystroke arranges every open terminal — manual shells, swarm tiles, Arena contestants — into a live grid. Each tile keeps its own PTY, its own scrollback, its own input. Watch four agents race the same prompt, or just keep an eye on a long build while you work next door. Same window, same window manager, no third-party tile manager required.

AnvilTerm grid view — Swarm header showing 4 agents 1 running 0 done 0 err with 2x2 grid selector, four live tiles running OpenCode, Codex gpt-5.5 medium, GitHub Copilot, and Gemini CLI side by side
Swarm header · 4 agents · 2×2 grid · OpenCode + Codex + Copilot + Gemini · each tile is a real PTY with its own scrollback, kill-all in one click, no third-party tile manager.
09/
The Ledger · vs Everyone
23 capabilities · 8 terminals

The honest spec sheet.
We checked.

Every other terminal is excellent at its thing. The matrix below is the proof — not marketing. Hover any row to see only AnvilTerm light up.

Capability AnvilTerm Warp iTerm2 Ghostty WezTerm Kitty Alacritty Tabby
Multi-agent swarm · live tiles
MCP server · agents drive the terminalpartial
SwarmRoom · inter-agent chat
Arena · head-to-head benchmark
Forge · MCP + Skills marketplacecatalog
Capability routing (task → best model)
Cross-vendor usage tracking
Usage forecast · threshold alerts
Inline images · SVG renderpartialpartial
Inline video · PDF · audio waveform
YouTube embed · hover previews
Markdown table → spreadsheet
Local AI · offline assistantplugin
Voice input · push-to-talk
Session recording · ANSI + plainpartial
Interactive screenshot → MCP
Automation API (DevTools · Playwright)AppleScriptLuaRCplugins
Works offline · no cloud lock-in
Open sourceApache + FSLclosedGPLMITMITGPLApacheMIT
macOS · Linux · Windows● ● ●✓ ✓ ✓mac✓ ✓ β✓ ✓ ✓✓ ✓ —✓ ✓ ✓✓ ✓ ✓
Native GPU rendererxterm.jsMetalOpenGL
shipped · AnvilTerm shipped partial via plugin or limited not available
10/
Made for three sides
dev · agents · labs

Built for the AI era.
All three sides of it.

A / The developer

Stop flipping tabs.

Every AI CLI in one workspace. Benchmark them on real work. Pick the best tool for each job. Never break your flow.

B / The agents

Real tools.

A browser, a PTY, an artifact renderer, image / PDF / YouTube understanding, a room to talk to their siblings. MCP, done for you.

C / The labs

Reproducible benchmarks.

Arena runs head-to-head on real CLIs, real tasks. Every run is a signed receipt — tokens, cost, time, tests passed — appended to ~/.anvil/benchmarks/runs.jsonl. Reproducible by anyone with the same repo and prompt.

Forge the future.

Download · Spawn · Ship
↓ AnvilTerm 0.2.34 · macOS
↓   Download