Your shortcut to every AI CLI. Spawn Claude, Codex, Gemini, OpenCode side‑by‑side. Benchmark them. Ship the winner. Real CLIs. Real tools. Your judgment.

Warp bets on one cloud agent. iTerm2 adds a chat sidebar. Ghostty stays pure. None of them let you spawn Claude, Codex, Gemini and OpenCode side-by-side, watch them coordinate in a shared chat, score them against each other, and ship the winner. AnvilTerm does. It treats multi-agent as a first-class primitive — live tiles, MCP control plane, SwarmRoom, Arena, Forge marketplace — not a feature on the roadmap.
Watching one AI work is productivity.
Watching four of them compete for your prompt is the future.
One prompt, N agents. Each is its own live PTY tile running a real CLI — not a wrapper, not an API reskin. Auto-grid re-flows as agents finish. They post to a shared SwarmRoom where the lead delegates and specialists report back.
Lead agent delegates subtasks. Specialists pick them up, report back in the same room. You watch the transcript live in a chat tile. Works with any agent that speaks MCP — which is all of them.
Your agents are literal claude, codex, gemini binaries. You see exactly what the TUI sees.
swarm_route(["multimodal"]) → Gemini. ["reasoning"] → Claude. ["refactor"] → Codex.
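Capability routing can be pictured as a tag-to-vendor match. A minimal sketch, assuming a capability table like the one below — the table and the tie-breaking rule are illustrative, not AnvilTerm's actual registry:

```python
# Hypothetical capability table — an assumption for illustration,
# not AnvilTerm's real vendor registry.
CAPABILITIES = {
    "claude":   {"reasoning", "long-context"},
    "codex":    {"refactor", "codegen"},
    "gemini":   {"multimodal", "long-context"},
    "opencode": {"codegen"},
}

def swarm_route(tags):
    """Pick the vendor whose capability set covers the most requested tags."""
    def score(vendor):
        return len(CAPABILITIES[vendor] & set(tags))
    best = max(CAPABILITIES, key=score)
    # No vendor matched: fall back to the lead agent.
    return best if score(best) > 0 else "claude"
```

With this table, `["multimodal"]` routes to Gemini, `["reasoning"]` to Claude, and `["refactor"]` to Codex, matching the examples above.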
One agent = one tab. Four = 2×2. Eight = 3×3. Each tile stops, closes, exports on its own.
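The tile counts above follow a smallest-square rule. A one-line sketch of that layout logic, assuming the grid is always square:

```python
import math

def grid_for(n_agents):
    """Smallest square grid that fits n agents: 1 -> 1x1, 4 -> 2x2, 8 -> 3x3."""
    side = math.ceil(math.sqrt(n_agents))
    return side, side
```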
Spawn a single Claude lead. It reads the prompt, calls swarm_vendors, picks specialists, delegates, merges the result. You just watch and approve. No team-building UI required.
Every tile streams its token counter scraped from the TUI. The toolbar aggregates daily + weekly spend across Claude + Codex + OpenCode. Forecast tells you when you'll hit the cap.
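The forecast can be as simple as extrapolating the current burn rate. A minimal sketch, assuming a linear model over token counts — the function name and units are illustrative, not the shipped forecaster:

```python
def hours_until_cap(tokens_used, cap, hours_elapsed):
    """Linear forecast: hours until the cap is hit at the current burn rate.

    Returns 0.0 if the cap is already exceeded, None if no rate can be
    computed yet.
    """
    if hours_elapsed <= 0 or tokens_used <= 0:
        return None
    if cap - tokens_used <= 0:
        return 0.0
    rate = tokens_used / hours_elapsed  # tokens per hour
    return (cap - tokens_used) / rate
```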
Pick contestants. Paste a prompt. Hit start. Each tile streams live output, renders deliverables as iframes when the agent calls arena_push_artifact. Judge on quality, speed, correctness. Export the fight as markdown + JSONL for reproducibility — or as a 9:16 battle video for your timeline.
Every battle is stored at ~/.anvil/arena/<id>.jsonl. Bring it back six months later, re-run on new model versions, watch the winner flip. Reproducible benchmarking — for the laptop era.
Arena tiles stream live output side-by-side. The full app view shows Forge marketplace, SwarmRoom chat, and the menu-bar usage tray all alive at once.
AnvilTerm ships with a Model Context Protocol server. One command registers it across Claude Code, Codex, Gemini, OpenCode. Every agent gains — for free — a browser, a PTY, inter-agent chat, artifact rendering, usage tracking, and screenshot capture.
```shell
# wire AnvilTerm into every installed agent at once
npx anvilterm-doctor --install

# now available from any agent:
terminal_create() · terminal_write() · terminal_screen()
tui_type() · tui_interrupt() · tui_choose()
swarm_spawn() · swarm_route() · swarm_vendors()
swarm_room_post() · swarm_room_listen() · swarm_room_thread()
arena_push_artifact() · arena_current()
```
A tab created via MCP appears as a live tile in the AnvilTerm window. You watch — and intervene if needed. Standalone fallback via node-pty if the UI isn't running.
Every MCP-speaking client. Stdio transport. Register once, the toolset travels with you.
Curated catalog, star-ranked, kind-filtered. Pick a server, hit install, choose Claude / Codex / Gemini / OpenCode — or all of them. Forge writes directly to each agent's config with a _forge:true tag so it can manage updates and removal cleanly.
Skills get git-cloned to ~/.anvil/skills/ and symlinked into each agent's skills dir. MCP servers get registered in each vendor's config. Installed view shows per-agent status chips so you always know what's actually wired.
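The `_forge:true` tag is what makes clean updates and removal possible: Forge only ever touches entries it wrote. A minimal sketch, assuming a vendor config with a `mcpServers` map (a common MCP client layout, not AnvilTerm's confirmed schema):

```python
def forge_install(config, name, command):
    """Add a server entry tagged _forge:true so Forge can manage it later."""
    config.setdefault("mcpServers", {})[name] = {
        "command": command,
        "_forge": True,
    }
    return config

def forge_remove_all(config):
    """Strip only Forge-managed entries; hand-written ones are left alone."""
    servers = config.get("mcpServers", {})
    config["mcpServers"] = {
        name: entry for name, entry in servers.items()
        if not entry.get("_forge")
    }
    return config
```

Uninstalling Forge-managed servers then becomes a filter, never a guess about which entries the user added by hand.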
Cmd+K opens an Ollama-backed assistant with native function calling. Gemma 3, Qwen 2.5/3, Llama 3.1, Mistral Nemo. Prompts never leave the machine. Ideal for regulated work, sensitive repos, a plane, a café with no Wi-Fi.
Code, prompts, context — none of it leaves your laptop.
Models that speak tools return structured calls that render as one-click run buttons in the chat.
The assistant reads the visible terminal screen so suggestions are grounded in what's actually running.
Claude's built-in subagents are great when one model is enough. A real swarm — Claude + Codex + Gemini + OpenCode running in parallel as live CLIs — wins on parallelism, model diversity and visibility. Here's the honest breakdown on 40 mixed refactor / research / UI tasks.
| Dimension | AnvilTerm Swarm | Claude Subagents |
|---|---|---|
| Parallelism | N real PTYs in parallel | Sequential within parent turn |
| Context window per worker | Full · 1M tokens each · no compaction | Shared parent, compacted on dispatch |
| Model diversity | Claude · Codex · Gemini · OpenCode · Copilot · Ollama | Claude family only |
| Live output | Dedicated live tile per agent | Opaque spinner until result |
| Human-in-the-loop | Type into any tile, paste refs, interrupt | None once dispatched |
| Artifact rendering | Iframes · SVG · markdown · live | Text summary returned to parent |
| Failure isolation | One agent fails, N-1 keep working | Subagent failure blocks parent turn |
| Token cost routing | Per-vendor metered, route cheap tasks to Ollama | All charged against parent's Claude quota |
| Determinism | Fresh PTY state per agent | Inherits parent compaction |
| Tool access | Each agent carries its own MCP toolkit | Parent's MCP toolkit only |
| Reproducibility | Session JSONL + Arena replay | No persistent transcript |
| Best for | Cross-vendor compare, parallel research, long refactors | Tightly-coupled chains in one model family |
Note: wall-clock multipliers are normalized to swarm = 1.0 across the 40 mixed tasks. Subagents win when the task is inherently sequential (each step depends on the prior result); the swarm wins when the work fans out. Use both.
Every other terminal is excellent at its thing. The matrix below is the proof — not marketing. Hover any row to see only AnvilTerm light up.
| Capability | AnvilTerm | Warp | iTerm2 | Ghostty | WezTerm | Kitty | Alacritty | Tabby |
|---|---|---|---|---|---|---|---|---|
| Multi-agent swarm · live tiles | ● | — | — | — | — | — | — | — |
| MCP server · agents drive the terminal | ● | partial | — | — | — | — | — | — |
| SwarmRoom · inter-agent chat | ● | — | — | — | — | — | — | — |
| Arena · head-to-head benchmark | ● | — | — | — | — | — | — | — |
| Forge · MCP + Skills marketplace | ● | catalog | — | — | — | — | — | — |
| Capability routing (task → best model) | ● | — | — | — | — | — | — | — |
| Cross-vendor usage tracking | ● | — | — | — | — | — | — | — |
| Usage forecast · threshold alerts | ● | — | — | — | — | — | — | — |
| Inline images · SVG render | ● | partial | ✓ | — | partial | ✓ | — | — |
| Inline video · PDF · audio waveform | ● | — | — | — | — | — | — | — |
| YouTube embed · hover previews | ● | — | — | — | — | — | — | — |
| Markdown table → spreadsheet | ● | — | — | — | — | — | — | — |
| Local AI · offline assistant | ● | — | plugin | — | — | — | — | — |
| Voice input · push-to-talk | ● | — | — | — | — | — | — | — |
| Session recording · ANSI + plain | ● | partial | — | — | — | — | — | — |
| Interactive screenshot → MCP | ● | — | — | — | — | — | — | — |
| Automation API (DevTools · Playwright) | ● | — | AppleScript | — | Lua | RC | — | plugins |
| Works offline · no cloud lock-in | ● | — | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Open source | Apache + FSL | closed | GPL | MIT | MIT | GPL | Apache | MIT |
| macOS · Linux · Windows | ● ● ● | ✓ ✓ ✓ | mac | ✓ ✓ β | ✓ ✓ ✓ | ✓ ✓ — | ✓ ✓ ✓ | ✓ ✓ ✓ |
| Native GPU renderer | xterm.js | ✓ | Metal | ✓ | ✓ | ✓ | OpenGL | — |
Every AI CLI in one workspace. Benchmark them on real work. Pick the best tool for each job. Never break your flow.
A browser, a PTY, an artifact renderer, image / PDF / YouTube understanding, a room to talk to their siblings. MCP, done for you.
Arena runs head-to-head on real CLIs, real tasks. Your model's wins become shareable 9:16 battle videos. Free distribution.