Introduction

Swarm is a multi-agent orchestration framework for AI coding agents. It coordinates multiple agents working in parallel on a shared codebase: each agent runs in an isolated git worktree, communicates through a SQLite-backed mailbox, and is managed by a supervisor that merges results back into the base branch.

Who Is This For?

Swarm is built for developers who want to:

  • Parallelize large coding tasks across multiple AI agents, each with a distinct role (backend, frontend, reviewer, etc.)
  • Maintain isolation between agents so they don't interfere with each other's work
  • Coordinate agent work through messaging, shared issue tracking, and supervised merging
  • Monitor progress in real time through a terminal UI

Key Capabilities

  • Parallel agent sessions — Run multiple AI agents simultaneously, each in its own git worktree branch
  • SQLite messaging — Agents communicate via a durable, threaded mailbox with urgency levels
  • State machine lifecycle — Each agent follows a well-defined state machine (Initializing → Running → Stopped) with automatic error recovery and backoff
  • TUI dashboard — Real-time terminal UI showing agent status, logs, and events
  • Git worktree isolation — Each agent works on an isolated branch; changes are merged back on stop
  • Configurable permissions — Fine-grained allow/ask/deny rules per tool, per agent
  • MCP integration — Connect external tool servers via the Model Context Protocol
  • WASM sandboxed tools — Run untrusted tool code in a WebAssembly sandbox with resource limits
  • Hooks system — Execute custom scripts on lifecycle events (session start, tool calls, etc.)
  • Beads issue tracking — Built-in integration with the bd CLI for task management across agents
  • Workflow pipelines — Define multi-stage workflows with gates and approvals
  • Iteration engine — Run repeated task-solving loops with configurable progress detection

Architecture at a Glance

┌─────────────────────────────────────────────────────┐
│                    Orchestrator                     │
│   (session management, periodic tasks, shutdown)    │
├──────────┬──────────┬──────────┬────────────────────┤
│ Agent 1  │ Agent 2  │ Agent N  │    Supervisor      │
│ worktree │ worktree │ worktree │    worktree        │
│ branch   │ branch   │ branch   │    branch          │
├──────────┴──────────┴──────────┴────────────────────┤
│               SQLite Mailbox + Router               │
│        (message delivery, urgent interrupts)        │
├─────────────────────────────────────────────────────┤
│ AgentBackend (Anthropic API / pluggable providers)  │
└─────────────────────────────────────────────────────┘

The orchestrator creates a session that tracks the base commit, agent list, and process ID. Each agent runs through a state machine loop: build prompt → spawn backend session → run → handle results → repeat. Agents communicate by sending messages through the SQLite mailbox, and a router polls for urgent messages to trigger interrupts.

When you stop a session, the supervisor merges (or squashes/discards) each agent's branch back into the base branch.

When to Use Swarm

Swarm is a good fit when:

  • Your task can be decomposed into independent or loosely-coupled subtasks (e.g., "backend builds API endpoints while frontend builds the UI")
  • You want automated coordination between agents rather than manual copy-paste between chat windows
  • You need durability — agent work persists in git branches even if the process crashes
  • You want observability into what each agent is doing via the TUI

Swarm may not be the right tool if:

  • Your task is small enough for a single agent session
  • You need agents to share the same working directory in real time (swarm uses branch isolation)

Quick Start

Install, configure, and run your first swarm session in minutes.

Prerequisites

Before you begin, ensure you have:

  • Rust toolchain — latest stable — check: rustc --version
  • Git — 2.20 or newer — check: git --version
  • Anthropic API key — check: echo $ANTHROPIC_API_KEY

Swarm uses git worktrees, which require git 2.20 or newer. If your version is older, swarm will exit with a VersionTooOld error at startup.

Install

Clone the repository and build:

git clone <repo-url> swarm
cd swarm
cargo build --release

The binary is at target/release/swarm. Add it to your PATH or use cargo install --path swarm.

To enable WASM sandboxed tools (optional):

cargo build --release --features wasm-sandbox

Initialize a Project

Navigate to a git repository and run:

cd /path/to/your-project
swarm init

This creates ~/.swarm/settings.json with a starter configuration for your project. The config is keyed by the absolute, canonicalized path to your project directory.

Tip: You can also pass --path /path/to/project to initialize a different directory.

Configure Agents

Open ~/.swarm/settings.json and define your agents. Here is a minimal two-agent configuration:

{
  "version": 2,
  "/home/user/my-project": {
    "providers": {
      "default": {
        "type": "anthropic",
        "api_key_env": "ANTHROPIC_API_KEY"
      }
    },
    "defaults": {
      "model": "sonnet"
    },
    "agents": [
      {
        "name": "backend",
        "prompt": "You are a backend engineer. Work on server-side code, APIs, and database logic."
      },
      {
        "name": "frontend",
        "prompt": "You are a frontend engineer. Work on UI components, styling, and client-side logic."
      }
    ]
  }
}

Each agent needs:

  • name — Unique identifier matching [a-z][a-z0-9-]*
  • prompt — System prompt text, or @path/to/file to load from a file
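As a sketch, the name rule can be checked without pulling in a regex engine. This is illustrative only, not swarm's actual config-validation code:

```rust
/// Sketch: check a candidate agent name against the documented pattern
/// [a-z][a-z0-9-]* (illustrative; swarm's real validation may differ).
fn is_valid_agent_name(name: &str) -> bool {
    let mut chars = name.chars();
    // First character must be a lowercase ASCII letter.
    match chars.next() {
        Some(c) if c.is_ascii_lowercase() => {}
        _ => return false,
    }
    // Remaining characters: lowercase letters, digits, or hyphens.
    chars.all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '-')
}
```

Under this rule, backend and frontend-2 pass, while Backend and 2nd-agent are rejected.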

See Writing Agents for the full configuration guide.

Set Your API Key

Export your Anthropic API key:

export ANTHROPIC_API_KEY="sk-ant-..."

The environment variable name can be customized per provider via the api_key_env field.

Start a Session

swarm start

This launches the orchestrator, which:

  1. Loads and validates your configuration
  2. Checks git prerequisites (clean tree, version)
  3. Creates a session with ID format YYYYMMDD-XXXX
  4. Creates a git worktree per agent at .swarm/worktrees/<name>
  5. Opens the SQLite mailbox at .swarm/messages.db
  6. Spawns all agents in parallel
  7. Opens the TUI dashboard

If your working tree has uncommitted changes, use --stash to auto-stash them:

swarm start --stash

To run without the TUI (log output to terminal instead):

swarm start --no-tui

The TUI Dashboard

The TUI displays a panel for each agent showing:

  • Agent name and current state (e.g., Running, CoolingDown)
  • Session sequence number
  • Live log output

Key Bindings

  • Tab / Shift+Tab — Cycle focus between agent panels
  • 1–9 — Jump to agent panel by index
  • l — Toggle log viewer overlay
  • e — Toggle event viewer overlay
  • q — Quit TUI (session keeps running)
  • : — Open input bar for commands

The TUI refreshes at approximately 30 FPS (33ms frame interval).

Send Messages to Agents

From a separate terminal, send a message to a specific agent:

swarm send backend "Add a health check endpoint at GET /health"

Or broadcast to all agents:

swarm broadcast "Please commit your current work"

For messages that should interrupt an agent immediately:

swarm send backend "Stop what you're doing and fix the failing tests" --urgent

Urgent messages trigger the router interrupt — the agent's current session is cancelled and it restarts with the urgent message in its prompt.

Check Status

swarm status

This shows each agent's current state, session sequence, and error counts. Add --json for machine-readable output.

Stop the Session

When you're done, stop the session and merge all agent work:

swarm stop --merge

Stop modes:

  • --merge — Merge each agent's worktree branch into the base branch
  • --squash — Squash-merge each agent's work into a single commit
  • --discard — Discard all agent work and clean up worktrees

If no flag is provided, swarm prompts for your choice.

The stop sequence:

  1. Signals all agents to stop
  2. Waits for graceful shutdown
  3. Applies the chosen merge strategy
  4. Removes worktrees and session artifacts
  5. Cleans up the lockfile

View Logs

To view an agent's logs:

swarm logs backend

Follow logs in real time:

swarm logs backend --follow

View logs from a specific session:

swarm logs backend --session 2

Clean Up

If a session crashed or left stale artifacts:

swarm clean

Use --force to remove artifacts without confirmation.

Architecture

This page describes swarm's high-level architecture: the crate structure, module map, data flow between components, and key design decisions.

Crate Structure

Swarm is organized as a Cargo workspace with a single member crate:

swarm/                  # Workspace root
├── Cargo.toml          # Workspace manifest (members = ["swarm"])
└── swarm/              # Main crate
    ├── Cargo.toml      # Crate manifest
    └── src/
        ├── lib.rs      # Module declarations and crate-level docs
        ├── main.rs     # Binary entry point (CLI parsing → orchestrator)
        └── ...         # All modules below

The crate exposes a library (swarm_lib) and a binary (swarm). The binary is a thin wrapper that parses CLI arguments and delegates to the orchestrator.

Module Map

  • cli — CLI argument parsing via clap (commands, flags, subcommands)
  • config — Settings file loading, validation, and resolution (raw → resolved types)
  • orchestrator — Top-level session lifecycle: start (13-step flow), stop, status
  • session — Session ID generation, session.json management, PID-based liveness
  • agent::state — Agent state machine (AgentState, AgentEvent, SideEffect)
  • agent::runner — Agent lifecycle loop driver (prompt → spawn → run → repeat)
  • agent::registry — Central registry of all running agents and their handles
  • backend — AgentBackend trait abstraction for LLM providers (Anthropic, mock)
  • prompt — 14-section prompt assembly pipeline (build_prompt())
  • mailbox — SQLite-backed per-agent message broker with threading and urgency
  • router — Async message router that polls for urgent messages and sends interrupts
  • tools — Tool trait, ToolRegistry, and all built-in tools
  • tools::wasm — WASM sandboxed tool execution (feature-gated: wasm-sandbox)
  • permissions — Permission rules, sets, modes, and evaluation logic
  • skills — Skill discovery, frontmatter parsing, argument substitution, resolution
  • mcp — Model Context Protocol client, transport (HTTP/SSE/Stdio), and manager
  • hooks — Hook configuration, event types, and script execution
  • worktree — Git worktree creation, cleanup, merging, and recovery
  • tui — Terminal UI application (agent panels, log viewer, event viewer, input)
  • liveness — Agent liveness monitoring (idle nudges, stall detection, warnings)
  • iteration — Iteration engine for repeated task-solving loops
  • workflow — Workflow pipeline definitions and execution
  • conversation — Conversation history management
  • context_window — Context window size tracking and management
  • supervisor — Supervisor agent logic and merge-focused prompt
  • tasks — Task system integration
  • modes — Agent execution modes (code, delegate, etc.)
  • logging — Structured logging setup
  • errors — Error types for all subsystems
  • history — Session history and archiving

Data Flow

Session Start (13-Step Flow)

CLI (swarm start)
  │
  ├── 1. Load config (~/.swarm/settings.json)
  ├── 2. Validate git prerequisites (version, repo, not detached)
  ├── 3. Handle --init flag
  ├── 4. Handle --stash or require clean working tree
  ├── 5. Check for stale session + recovery
  ├── 6. Create session (session.json + lockfile)
  ├── 7. Create worktrees (one per agent + supervisor)
  ├── 8. Initialize SQLite mailbox database
  ├── 9. Create agent runners + registry
  ├── 10. Start message router (100ms poll loop)
  ├── 11. Start periodic tasks (WAL checkpoint, message prune)
  ├── 12. Launch TUI or headless mode
  └── 13. Await shutdown signal → graceful shutdown

Agent Lifecycle Loop

Each agent runs independently through its state machine:

Initializing → BuildingPrompt → Spawning → Running → SessionComplete
                    ↑                          │            │
                    │   ┌──── CoolingDown ◄────┘ (on error) │
                    │   │ (exponential backoff)             │
                    │   ↓                                   │
                    └───┴───────────────────────────────────┘
                                               (next session)

The runner loop for each agent:

  1. Build prompt — Assembles a 14-section system prompt with environment info, role, tools, pending messages, beads tasks, etc.
  2. Spawn backend session — Sends the prompt to the configured LLM provider (Anthropic API)
  3. Run — The backend session executes, making tool calls that the runner handles
  4. Handle exit — On success, transition to SessionComplete; on error, enter CoolingDown with exponential backoff
  5. Repeat — After cooldown or session complete, rebuild prompt and spawn again

Message Flow

Agent A                    SQLite DB                    Agent B
   │                          │                            │
   ├── send(to=B, body) ─────►│                            │
   │                          ├── INSERT INTO messages ────►│
   │                          │                            │
   │                     Router (100ms poll)                │
   │                          ├── poll_urgent() ───────────►│
   │                          │   (if urgent)     InterruptSignal
   │                          │                            │
   │                          │◄── consume() ──────────────┤
   │                          │   (next prompt build)      │

Shutdown Flow

SIGTERM received (or operator stop)
  │
  ├── Signal all agents: OperatorStop event
  ├── Wait for all agents to reach Stopped state
  ├── Auto-commit any dirty worktrees
  ├── Merge/squash/discard agent branches (based on StopMode)
  ├── Remove worktrees and prune
  ├── Delete session branches
  ├── Remove session.json + lockfile
  └── Exit

Key Dependencies

  • tokio — Async runtime (ADR-001)
  • clap — CLI argument parsing
  • serde / serde_json — Configuration and message serialization
  • rusqlite — SQLite mailbox (ADR-002)
  • ratatui — Terminal UI rendering (ADR-007)
  • tracing — Structured logging
  • reqwest — HTTP client for Anthropic API and MCP transports
  • chrono — Timestamp handling
  • anyhow / thiserror — Error handling (ADR-009)
  • wasmtime — WASM sandbox runtime (optional, feature-gated)
  • libc — Process liveness checks (kill signal 0)

Design Decisions

The architecture is shaped by several key decisions documented in ADRs, including ADR-001 (Tokio async runtime), ADR-002 (SQLite-backed mailbox), ADR-007 (ratatui terminal UI), and ADR-009 (anyhow/thiserror error handling).

Component Interactions

                    ┌──────────────┐
                    │     CLI      │
                    └──────┬───────┘
                           │
                    ┌──────▼───────┐
                    │ Orchestrator │──────────────────┐
                    └──────┬───────┘                  │
                           │                          │
              ┌────────────┼────────────┐      ┌──────▼──────┐
              │            │            │      │   Session   │
        ┌─────▼────┐ ┌─────▼────┐ ┌─────▼───┐  │  Management │
        │ Agent 1  │ │ Agent 2  │ │ Agent N │  └─────────────┘
        │ Runner   │ │ Runner   │ │ Runner  │
        └────┬─────┘ └────┬─────┘ └────┬────┘
             │            │            │
        ┌────▼────────────▼────────────▼────┐
        │           Agent Registry          │
        └────┬─────────────────────────┬────┘
             │                         │
      ┌──────▼───────┐         ┌──────▼───────┐
      │   Backend    │         │   Mailbox    │
      │  (Anthropic) │         │   (SQLite)   │
      └──────────────┘         └──────┬───────┘
                                      │
                               ┌──────▼───────┐
                               │    Router    │
                               │ (100ms poll) │
                               └──────────────┘

Agent Lifecycle

Each swarm agent follows a deterministic state machine that drives its lifecycle from initialization through multiple backend sessions to eventual shutdown. The state machine is defined in agent::state and executed by the runner loop in agent::runner.

Agent States

The AgentState enum defines 8 observable states:

  • Initializing — Agent registered; waiting for its git worktree to be ready
  • BuildingPrompt — Assembling the system prompt (environment, role, tools, messages, tasks)
  • Spawning — Prompt stored; launching a backend session with the LLM provider
  • Running { session_seq } — Backend session is active; session_seq tracks which session iteration
  • Interrupting { session_seq } — Graceful cancellation requested (urgent message received); waiting for session exit
  • SessionComplete — Backend session exited successfully; ready for next iteration
  • CoolingDown { until } — Session failed; waiting for exponential backoff to elapse
  • Stopped — Terminal state; the agent will not run again

Check if an agent has reached its terminal state with AgentState::is_terminal(), which returns true only for Stopped.

Agent Events

The AgentEvent enum defines the events that drive state transitions:

  • WorktreeReady — Git worktree created and ready for use
  • PromptReady(String) — System prompt assembled successfully
  • SessionStarted(u32) — Backend session launched (carries the session sequence number)
  • SessionExited(ExitOutcome) — Backend session ended: Success, Error(String), or Timeout
  • UrgentMessage — Router detected an urgent message for this agent
  • GraceExceeded — Interruption grace period expired without session exit
  • BackoffElapsed — CoolingDown timer expired
  • OperatorStop — Operator requested shutdown (global; valid from any state)
  • FatalError(String) — Unrecoverable error (global; valid from any state)

Side Effects

Each transition returns a SideEffect telling the runner what action to take:

  • None — No action needed
  • StorePrompt(String) — Save the assembled prompt for the next spawn
  • CancelSession — Request graceful cancellation of the current backend session
  • ForceStopSession — Force-stop the session immediately (grace period exceeded)
  • IncrementSession — Bump the session sequence counter and loop back to BuildingPrompt
  • LogFatal(String) — Log the fatal error message; the agent is now Stopped

State Diagram

                    ┌──────────────┐
                    │ Initializing │
                    └──────┬───────┘
                           │ WorktreeReady
                   ┌───────▼────────┐
              ┌───►│ BuildingPrompt │◄─────────────────────┐
              │    └───────┬────────┘                      │
              │            │ PromptReady                   │
              │     ┌──────▼───────┐                       │
              │     │   Spawning   │──── SessionExited ────┤
              │     └──────┬───────┘     (Error/Timeout)   │
              │            │ SessionStarted                │
              │     ┌──────▼───────┐             ┌─────┴───────┐
              │     │   Running    │─ Error/Timeout ►│ CoolingDown │
              │     └──┬────┬──────┘             └─────┬───────┘
              │        │    │ UrgentMessage            │ BackoffElapsed
              │        │ ┌──▼───────────┐              │
              │        │ │ Interrupting │─────────────►│
              │        │ └──────────────┘              │
              │        │ SessionExited(Success)        │
              │ ┌──────▼────────┐                      │
              │ │SessionComplete│                      │
              │ └──────┬────────┘                      │
              │        │ WorktreeReady                 │
              └────────┘◄──────────────────────────────┘

        ── OperatorStop or FatalError from ANY state ──► Stopped
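The diagram can be condensed into a transition function. The following is an illustrative sketch, not the real agent::state implementation (which also carries session sequences, error counters, and backoff deadlines):

```rust
// Condensed sketch of the state machine above. Illustrative only: the real
// types carry extra data such as session_seq and the CoolingDown deadline.
#[derive(Debug, Clone, Copy, PartialEq)]
enum AgentState {
    Initializing, BuildingPrompt, Spawning, Running,
    Interrupting, SessionComplete, CoolingDown, Stopped,
}

#[derive(Debug, Clone, Copy)]
enum AgentEvent {
    WorktreeReady, PromptReady, SessionStarted,
    SessionExitedOk, SessionExitedErr, UrgentMessage,
    BackoffElapsed, OperatorStop,
}

#[derive(Debug, PartialEq)]
enum SideEffect { None, StorePrompt, CancelSession, IncrementSession }

fn transition(state: AgentState, event: AgentEvent) -> (AgentState, SideEffect) {
    use AgentEvent::*;
    use AgentState::*;
    match (state, event) {
        // Global events are valid from any state.
        (_, OperatorStop) => (Stopped, SideEffect::None),
        (Initializing, WorktreeReady) => (BuildingPrompt, SideEffect::None),
        (BuildingPrompt, PromptReady) => (Spawning, SideEffect::StorePrompt),
        (Spawning, SessionStarted) => (Running, SideEffect::None),
        (Spawning, SessionExitedErr) => (CoolingDown, SideEffect::None),
        (Running, SessionExitedOk) => (SessionComplete, SideEffect::None),
        (Running, SessionExitedErr) => (CoolingDown, SideEffect::None),
        (Running, UrgentMessage) => (Interrupting, SideEffect::CancelSession),
        // After an interrupt, any session exit leads back to a fresh prompt.
        (Interrupting, SessionExitedOk | SessionExitedErr) => (BuildingPrompt, SideEffect::None),
        (CoolingDown, BackoffElapsed) => (BuildingPrompt, SideEffect::None),
        (SessionComplete, WorktreeReady) => (BuildingPrompt, SideEffect::IncrementSession),
        // Events that are invalid for the current state are ignored.
        (s, _) => (s, SideEffect::None),
    }
}
```

For example, transition(Running, UrgentMessage) yields (Interrupting, CancelSession), matching the interrupt flow described under Messaging.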

Error Thresholds and Backoff

The state machine tracks two error counters:

  • consecutive_errors — limit 5 (max_consecutive_errors); reset to 0 on SessionStarted or SessionExited(Success)
  • total_errors — limit 20 (max_total_errors); never reset, accumulates across all sessions

When either counter reaches its limit, the agent transitions to Stopped with a LogFatal side effect.

Backoff Formula

When an error occurs, the agent enters CoolingDown with exponential backoff:

duration_ms = min(2000 * 2^(n-1), 60000)

Where n is consecutive_errors (after increment). Examples:

  • 1 error → 2,000 ms
  • 2 errors → 4,000 ms
  • 3 errors → 8,000 ms
  • 4 errors → 16,000 ms
  • 5 errors → 32,000 ms
  • 6+ errors → 60,000 ms (cap)
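As a sketch, the formula and its cap can be written directly (illustrative; the real computation lives in agent::state):

```rust
/// Backoff after the n-th consecutive error, per the formula above:
/// duration_ms = min(2000 * 2^(n-1), 60000). Illustrative sketch only.
fn backoff_ms(consecutive_errors: u32) -> u64 {
    let n = consecutive_errors.max(1);
    // Clamp the shift so the multiplication cannot overflow; the 60 s cap
    // dominates long before that anyway.
    let raw = 2_000u64 << (n - 1).min(16);
    raw.min(60_000)
}
```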

Agent Registry

The AgentRegistry (agent::registry) is the central data structure that tracks all running agents:

  • AgentHandle — Bundles an agent's resolved config, state watch channel, interrupt sender, and task join handle
  • Registrationregister() adds a new agent handle; each agent gets a unique name
  • State queriesstates() returns a snapshot of all agent states; state_of(name) queries a single agent
  • Interrupt deliveryinterrupt_senders() returns a map of interrupt channels for the router
  • Shutdownshutdown() sends OperatorStop to all agents and awaits their task handles

Runner Loop

The run_agent() function in agent::runner is the top-level entry point for each agent's lifecycle:

  1. Setup — Create worktree, initialize environment variables, fire SessionStart hook
  2. State machine loop — Process events, execute side effects, manage the backend session
  3. Session iteration — On SessionComplete + WorktreeReady, increment sequence and rebuild prompt
  4. Interrupt handling — On UrgentMessage, cancel the session with a grace period; force-stop on GraceExceeded
  5. Cleanup — On Stopped, archive session logs, fire SessionEnd hook, prune old logs

The runner manages environment variables injected into each backend session:

  • SWARM_AGENT_ID — The agent's name
  • SWARM_SESSION_ID — The current session ID
  • SWARM_DB_PATH — Path to the SQLite mailbox database
  • SWARM_AGENTS — Comma-separated list of all agent names

Messaging

Swarm agents communicate through a SQLite-backed mailbox system. Messages are stored durably in a shared database, delivered to recipients on their next prompt build, and can trigger real-time interrupts for urgent communications.

Design

The messaging system uses SQLite in WAL (Write-Ahead Logging) mode as the message store. This choice (ADR-002) provides:

  • Durability — Messages survive process crashes
  • Concurrent access — WAL mode allows multiple readers with a single writer
  • No external dependencies — No message broker or network service required
  • Simplicity — A single file at .swarm/messages.db

Message Structure

The Message struct represents a single message:

  • id (i64) — Auto-incrementing primary key
  • thread_id (Option<i64>) — ID of the root message in the thread (for grouping)
  • reply_to (Option<i64>) — ID of the message this is a direct reply to
  • sender (String) — Name of the sending agent (or "operator" for CLI messages)
  • recipient (String) — Name of the receiving agent
  • msg_type (MessageType) — Discriminator: Message, Task, Status, or Nudge
  • urgency (Urgency) — Normal or Urgent
  • body (String) — The message content
  • created_at (i64) — Epoch nanoseconds when the message was created
  • delivered_at (Option<i64>) — Epoch nanoseconds when consumed; NULL while pending

MessageType

  • Message — General inter-agent communication
  • Task — Task assignment or delegation
  • Status — Status updates between agents
  • Nudge — Liveness nudge from the monitoring system

Urgency

  • Normal — Delivered on the recipient's next prompt build
  • Urgent — Triggers an interrupt via the router, causing the recipient to restart its session

Mailbox Operations

The Mailbox struct provides per-agent messaging operations:

  • send(recipient, body, msg_type, urgency) — Send a message to another agent (self-send rejected)
  • reply(original_id, body, msg_type, urgency) — Reply to an existing message, inheriting thread context
  • broadcast(recipients, body, msg_type, urgency) — Send to multiple agents in a single transaction
  • consume() — Atomically read and mark all pending messages as delivered
  • thread(thread_id) — Retrieve all messages in a conversation thread
  • outbox(limit) — Get recently sent messages
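The consume() contract can be modeled in memory. This sketch captures only the semantics (take everything pending for one agent, mark it delivered); the real implementation runs as a SQLite transaction:

```rust
/// In-memory model of consume(): return every pending message for `agent`
/// and mark it delivered, in one pass. Illustrative only; swarm does this
/// against the messages table, not a Vec.
struct StoredMessage {
    recipient: String,
    body: String,
    delivered: bool,
}

fn consume(mailbox: &mut [StoredMessage], agent: &str) -> Vec<String> {
    mailbox
        .iter_mut()
        .filter(|m| m.recipient == agent && !m.delivered)
        .map(|m| {
            m.delivered = true; // a second consume() returns nothing
            m.body.clone()
        })
        .collect()
}
```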

Free functions are also available for use outside the Mailbox context:

  • send_message() — Send a message using a raw connection
  • broadcast_message() — Broadcast to multiple recipients
  • consume_messages() — Consume pending messages for an agent

Message Router

The router module runs an async polling loop that watches for urgent messages:

Router Loop (every 100ms):
  1. poll_urgent(conn) → Vec<UrgentMessage>
  2. For each urgent message:
     a. Skip if already signalled (deduplication via HashSet)
     b. Send InterruptSignal to recipient's mpsc channel
     c. Add to signalled set
  3. Sleep 100ms
  4. Exit on shutdown signal
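The deduplication step can be sketched with a HashSet (illustrative; the real router delivers InterruptSignals on per-agent mpsc channels):

```rust
use std::collections::HashSet;

/// Sketch of the router's dedup step: an urgent message id is signalled at
/// most once, even though the 100ms poll keeps seeing it until it is consumed.
fn newly_signalled(pending: &[i64], signalled: &mut HashSet<i64>) -> Vec<i64> {
    pending
        .iter()
        .copied()
        // HashSet::insert returns false when the id was already present.
        .filter(|id| signalled.insert(*id))
        .collect()
}
```

A second poll that sees the same still-unconsumed ids produces no further interrupts; only genuinely new urgent messages do.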

When the router sends an InterruptSignal, the agent's runner receives an UrgentMessage event, which triggers the interrupt flow:

  1. RunningInterrupting (with CancelSession side effect)
  2. The backend session is gracefully cancelled
  3. On session exit → BuildingPrompt (the new prompt will include the urgent message)

Message Threading

Messages can be organized into threads using thread_id and reply_to:

  • When you send a new message, thread_id and reply_to are NULL
  • When you reply to a message, the reply inherits the original's thread_id (or uses the original's id as the thread root)
  • The thread() method retrieves all messages sharing the same thread_id
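As a sketch, the inheritance rule looks like this (field names follow the Message struct above; this is illustrative, not swarm's actual reply() code):

```rust
/// Thread fields for a reply to `original`, per the rules above: inherit the
/// original's thread_id, or fall back to the original's own id as the thread
/// root. Illustrative sketch only.
struct MessageRef {
    id: i64,
    thread_id: Option<i64>,
}

/// Returns (thread_id, reply_to) for the new reply.
fn reply_fields(original: &MessageRef) -> (Option<i64>, Option<i64>) {
    let thread_id = original.thread_id.or(Some(original.id));
    let reply_to = Some(original.id);
    (thread_id, reply_to)
}
```

Replying to a root message makes that message the thread root; replying deeper in the thread keeps the existing thread_id.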

Database Configuration

The SQLite database is configured with these PRAGMAs:

  • journal_mode = WAL — Concurrent reads, single writer
  • busy_timeout = 5000 ms — Wait up to 5 seconds on lock contention

Periodic maintenance tasks run in the background:

  • WAL checkpoint — every 60 seconds — PRAGMA wal_checkpoint(TRUNCATE) reclaims WAL file space
  • Message prune — every 300 seconds — delete old delivered messages, keeping the most recent 1000

Message Flow in Practice

  1. Agent A calls the mailbox tool to send a message to Agent B
  2. The message is INSERTed into the messages table with delivered_at = NULL
  3. If the message is urgent, the router detects it within 100ms and sends an InterruptSignal to Agent B
  4. Agent B's runner cancels its current session and rebuilds the prompt
  5. On the next prompt build, consume() marks all pending messages as delivered and includes them in the system prompt
  6. Agent B reads the messages in its prompt context and responds accordingly

Orchestration

The orchestrator is the top-level component that manages the entire swarm lifecycle. It implements the 13-step start flow, handles shutdown, and coordinates all subsystems.

Session Management

Each swarm run creates a session represented by a SessionInfo struct:

  • id (String) — Format YYYYMMDD-XXXX (date + 4 random hex chars, e.g. 20250115-a3f2)
  • base_commit (String) — The HEAD commit hash at session start
  • agents (Vec<String>) — List of agent names from the config
  • started_at (DateTime<Utc>) — UTC timestamp of session creation
  • pid (u32) — Process ID of the orchestrator (used for liveness checks)

Session state is persisted in .swarm/session.json alongside a lockfile containing the PID. Both files are written atomically (temp file then rename).
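The id format can be sanity-checked with a small predicate. A hedged sketch, assuming the four hex characters are lowercase as in the example above:

```rust
/// Sketch: does `id` match the documented YYYYMMDD-XXXX shape (eight digits,
/// a hyphen, four lowercase hex characters)? Illustrative only; the real
/// generator lives in swarm's `session` module.
fn looks_like_session_id(id: &str) -> bool {
    let b = id.as_bytes();
    b.len() == 13
        && b[..8].iter().all(|c| c.is_ascii_digit())
        && b[8] == b'-'
        && b[9..].iter().all(|c| c.is_ascii_digit() || (b'a'..=b'f').contains(c))
}
```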

Stale Session Detection

A session is considered stale if its owning process no longer exists. This is checked using libc::kill(pid, 0):

  • Returns 0 — process alive, session is active
  • Returns -1 with ESRCH — process gone, session is stale

Stale sessions are automatically recovered before creating a new one.
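Swarm itself calls libc::kill(pid, 0); a dependency-free approximation of the same probe on Linux is to test for the process's /proc entry. This sketch is illustrative and Linux-specific, not what swarm does:

```rust
use std::path::Path;

/// Linux-only approximation of the liveness check described above: a PID is
/// alive if its /proc entry exists. Swarm actually uses libc::kill(pid, 0),
/// the portable Unix form of the same probe.
fn process_alive(pid: u32) -> bool {
    Path::new(&format!("/proc/{pid}")).exists()
}
```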

The 13-Step Start Flow

When you run swarm start, the orchestrator executes these steps in order:

Step 1: Load Configuration

Read ~/.swarm/settings.json, validate the version, look up the project by its canonicalized path, and resolve all defaults into a ResolvedConfig.

Step 2: Validate Git Prerequisites

  • Check git version >= 2.20
  • Verify the project is a git repository
  • Confirm HEAD is not detached

Step 3: Handle --init Flag

If --init is set and the repo needs initialization, run init_git_repo(). If the repo already exists, this is a no-op.

Step 4: Handle Working Tree State

  • If --stash is set: auto-stash uncommitted changes (git stash push --include-untracked -m "swarm auto-stash")
  • Otherwise: require a clean working tree (git status --porcelain must be empty)

Step 5: Check for Stale Session

If .swarm/session.json exists:

  • If the process is alive: bail with "session already active"
  • If the process is dead: recover the stale session (auto-commit, remove worktrees, delete branches)

Step 6: Create Session

Generate a session ID, write session.json and lockfile atomically.

Step 7: Create Worktrees

For each agent and the supervisor, create a git worktree:

git worktree add .swarm/worktrees/<name> -b swarm/<session_id>/<name> <base_commit>

Lock each worktree to prevent accidental pruning.

Step 8: Initialize SQLite

Open (or create) the mailbox database at .swarm/messages.db with WAL mode enabled.

Step 9: Create Agent Runners and Registry

For each resolved agent config:

  1. Create an AgentHandle with state channels and interrupt sender
  2. Spawn the run_agent() task on Tokio
  3. Register in the AgentRegistry

Step 10: Start Message Router

Launch the async router loop that polls for urgent messages every 100ms and delivers InterruptSignals to the appropriate agent channels.

Step 11: Start Periodic Tasks

  • WAL checkpoint: Every 60 seconds, run PRAGMA wal_checkpoint(TRUNCATE)
  • Message prune: Every 300 seconds, delete old delivered messages (keep recent 1000)

Step 12: Launch TUI or Headless Mode

  • Default: Launch the TUI with agent panels, log viewer, and command input
  • --no-tui: Run in headless mode, logging to stdout

Step 13: Await Shutdown

Block until a shutdown signal is received (SIGTERM, TUI quit, or all agents stopped), then execute graceful shutdown.

Stop Modes

When a session is stopped (swarm stop), agent branches are handled according to the stop mode:

  • Merge (--merge, default) — git merge --no-ff each agent branch into the base branch, in config order
  • Squash (--squash) — git merge --squash each agent branch, creating a single commit per agent
  • Discard (--discard) — Delete agent branches without merging any changes

The merge order is: agent branches first (in the order defined in settings.json), then the supervisor branch.

Shutdown Sequence

The graceful shutdown sequence runs inside the orchestrator process:

  1. Signal all agents — Send OperatorStop to each agent via the registry
  2. Wait for agents — Wait for all agents to reach the Stopped state
  3. Stop router — Signal the router's shutdown channel
  4. Auto-commit — For each worktree (agents + supervisor), commit any dirty changes
  5. Merge branches — Apply the selected stop mode (merge/squash/discard)
  6. Remove worktrees — Unlock and remove each worktree
  7. Prune worktrees — Run git worktree prune to clean stale references
  8. Delete branches — Remove all swarm/<session_id>/* branches
  9. Remove session — Delete session.json and lockfile
  10. Exit

When swarm stop is run from a separate terminal:

  1. Load the session from .swarm/session.json
  2. Send SIGTERM to the orchestrator PID
  3. Wait up to 60 seconds for the process to exit
  4. If session files remain after exit, perform cleanup from the stop side

Status Command

swarm status provides a snapshot of the current session:

Session: 20250115-a3f2 (active)
Started: 2025-01-15T10:30:00Z (2h 15m ago)
Base commit: abc123def456
PID: 12345

Agents:
  ● backend          Running (1h 23m)
  ● frontend         Running (45m 12s)
  ● reviewer         SessionComplete (idle 5m)

Beads: 3 ready, 2 claimed, 8 closed

With --json, the output is a structured JSON object including agent states, liveness data, and beads summary.

Prompt Pipeline

The prompt pipeline assembles a comprehensive system prompt for each agent session. It gathers environment information, role instructions, tool descriptions, pending messages, beads tasks, and session context into a structured multi-section prompt that guides the agent's behavior.

Overview

The build_prompt() function in prompt.rs orchestrates prompt assembly. Each time an agent transitions to BuildingPrompt, a fresh prompt is assembled from current state — there is no cached or incremental prompt. This ensures the agent always sees the latest messages, tasks, and environment.

Prompt Sections

The prompt is assembled from up to 14 numbered sections, each conditionally included based on available context:

| # | Section | Content | Conditional |
|---|---------|---------|-------------|
| 1 | Identity | Agent name, swarm context | Always |
| 2 | Agent Role | The agent's configured prompt text | Always |
| 3 | Mode Instructions | Behavior rules for the agent's execution mode (code, delegate, etc.) | When mode is set |
| 4 | Workflow Context | Current workflow stage, inputs, and constraints | When in a workflow |
| 5 | Project Instructions | Contents of AGENTS.md from the project root | When file exists |
| 6 | Core Mandates | Universal rules: commit discipline, branch hygiene, communication protocol | Always |
| 7 | Doing Tasks | How to approach coding tasks, use tools, handle errors | Always |
| 8 | Tool Usage Policy | Rules for tool selection, permission handling | Always |
| 9 | Swarm Workflow | Inter-agent communication protocol, when to message teammates | Always |
| 10 | Tone & Style | Output formatting guidelines | Always |
| 11 | Environment | Platform, OS, git status, recent commits, working directory | Always |
| 12 | Messages | Pending messages from other agents (consumed from mailbox) | When messages exist |
| 13 | Beads Tasks | Available tasks from bd ready --json | When beads is available |
| 14 | Session Context | Session ID, session sequence, interrupt context | Always |

PromptContext

The PromptContext struct carries all the data needed for prompt assembly:

| Field | Description |
|-------|-------------|
| agent_name | Name of the agent being prompted |
| agent_prompt | The agent's configured prompt text |
| mode | Agent execution mode |
| session_id | Current session ID |
| session_seq | Session iteration number |
| db_path | Path to the SQLite mailbox database |
| worktree_path | Agent's worktree directory |
| agent_names | List of all agents in the session |
| workflow_context | Optional workflow stage context |
| interrupt_context | Optional interrupt reason |

Environment Information

The EnvironmentInfo struct gathers runtime context:

| Field | Source |
|-------|--------|
| platform | std::env::consts::OS |
| os_version | uname -r output |
| shell | $SHELL environment variable |
| cwd | Current working directory |
| git_status | git status --short output |
| recent_commits | git log --oneline -5 output |
| date | Current date |

Message Formatting

Pending messages are consumed from the mailbox and formatted with urgency labels:

## Messages from teammates

[URGENT] From backend (2m ago):
The API endpoint /users is returning 500 errors, please check the database migration.

From frontend (5m ago):
I've finished the login page UI, ready for API integration.

Urgent messages are prefixed with [URGENT] to draw the agent's attention.
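As a rough sketch (the names here are hypothetical, not swarm's actual internals), the labeling logic amounts to:

```rust
// Hypothetical sketch of the urgency labeling described above; the real
// formatter lives in the prompt pipeline and reads from the SQLite mailbox.
struct Message {
    from: String,
    body: String,
    urgent: bool,
    age: String, // pre-rendered relative age, e.g. "2m"
}

fn format_messages(msgs: &[Message]) -> String {
    let mut out = String::from("## Messages from teammates\n");
    for m in msgs {
        // Urgent messages get the [URGENT] prefix; others have no label.
        let label = if m.urgent { "[URGENT] " } else { "" };
        out.push_str(&format!("\n{}From {} ({} ago):\n{}\n", label, m.from, m.age, m.body));
    }
    out
}
```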

Beads Task Integration

When the bd CLI is available, the prompt pipeline runs bd ready --json with a timeout to discover available tasks:

## Available Tasks (from beads)

- SWARM-42: [open] Implement user authentication endpoint
- SWARM-43: [open] Add input validation to signup form
- SWARM-44: [in_progress] Write integration tests for login flow

This allows agents to autonomously pick up and work on tracked issues.

Interrupt Context

When an agent is interrupted (due to an urgent message), the interrupt context is included in the rebuilt prompt:

## Interrupt Context

You were interrupted by an urgent message. Your previous session was cancelled
so you could process this message. Review the messages section above and respond
to the urgent request.
  • Agent Lifecycle — When prompts are built in the state machine
  • Messaging — How messages are consumed into the prompt
  • Configuration — Agent prompt configuration
  • Skills — How skills inject into the prompt context

Skills

Skills are markdown-based prompt templates that extend agent capabilities. Each skill is a markdown file with YAML frontmatter defining metadata and a body containing the instructions injected into the agent's prompt when invoked.

Skill File Format

A skill file has two parts:

---
name: review-pr
description: Review a pull request
user-invocable: true
argument-hint: "<PR number>"
---

Review pull request $ARGUMENTS and provide feedback on:
1. Code quality
2. Test coverage
3. Security concerns

Frontmatter Fields

The YAML frontmatter (between --- delimiters) supports these fields:

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| name | String | No | Derived from filename | Skill identifier |
| description | String | No | "" | Human-readable description |
| user-invocable | bool | No | false | Whether users can invoke this skill directly |
| allowed-tools | Vec<String> | No | [] | Tools the skill is allowed to use |
| model | String | No | null | Model override for this skill |
| context | String | No | null | Additional context instructions |
| agent | String | No | null | Target agent for the skill |
| hooks | Vec<String> | No | [] | Hook events this skill responds to |
| argument-hint | String | No | null | Hint text shown when skill expects arguments |
| unsafe | bool | No | false | Whether the skill performs potentially dangerous operations |

Frontmatter uses kebab-case field names (e.g., user-invocable) which are deserialized to snake_case internally.

If frontmatter parsing fails, the skill still loads with default values — this is non-fatal.
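A minimal sketch of the two-part split (delimiters as described above; the real loader additionally deserializes the YAML and falls back to defaults on error):

```rust
// Split an optional "---\n...\n---\n" frontmatter block from the skill body.
// Sketch only: returns the raw YAML text and the remaining body.
fn split_frontmatter(src: &str) -> (Option<&str>, &str) {
    if let Some(rest) = src.strip_prefix("---\n") {
        if let Some(end) = rest.find("\n---\n") {
            // "\n---\n" is 5 bytes long; the body starts right after it.
            return (Some(&rest[..end]), &rest[end + 5..]);
        }
    }
    // No frontmatter: the whole input is the body.
    (None, src)
}
```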

Argument Substitution

Skill bodies support argument placeholders that are replaced at invocation time:

| Placeholder | Replacement |
|-------------|-------------|
| $ARGUMENTS | The full argument string passed to the skill |
| $ARGUMENTS0 through $ARGUMENTS9 | Positional arguments (0-indexed) |
| $0 through $9 | Shorthand for positional arguments |

Example:

---
name: compare
description: Compare two files
---

Compare the files $0 and $1, highlighting the differences.

Invoked as /compare src/old.rs src/new.rs, this becomes:

Compare the files src/old.rs and src/new.rs, highlighting the differences.
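A sketch of the substitution, assuming positional arguments are split on whitespace (the function name is illustrative, not swarm's actual API):

```rust
// Replace $ARGUMENTS, $ARGUMENTS0..$ARGUMENTS9, and $0..$9 in a skill body.
fn substitute(body: &str, args: &str) -> String {
    let parts: Vec<&str> = args.split_whitespace().collect();
    let mut out = body.to_string();
    // Positional forms first, so "$ARGUMENTS0" is not clobbered by "$ARGUMENTS".
    for (i, part) in parts.iter().enumerate().take(10) {
        out = out.replace(&format!("$ARGUMENTS{i}"), part);
        out = out.replace(&format!("${i}"), part);
    }
    out.replace("$ARGUMENTS", args)
}
```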

Resolution Order

When a skill is invoked by name, swarm searches three directories in priority order:

| Priority | Path | Style |
|----------|------|-------|
| 1 | .claude/skills/<name>/SKILL.md | Project-local, directory-style |
| 2 | .skills/<name>.md | Project-local, flat files (backward-compatible) |
| 3 | ~/.claude/skills/<name>/SKILL.md | Global user skills |

The first match wins. This means project-local skills override global ones.

Skill Names

Valid skill names contain only: [a-zA-Z0-9_:-]. Names with other characters are rejected.
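The character check above can be sketched as (illustrative helper, not swarm's actual function name):

```rust
// Accept only [a-zA-Z0-9_:-], rejecting empty names and anything else.
fn is_valid_skill_name(name: &str) -> bool {
    !name.is_empty()
        && name.chars().all(|c| c.is_ascii_alphanumeric() || matches!(c, '_' | ':' | '-'))
}
```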

Skill Discovery

The discover_skills() function scans all three directories and returns a BTreeMap<String, SkillSummary> of available skills:

| Field | Description |
|-------|-------------|
| name | Skill name |
| description | From frontmatter |
| user_invocable | Whether the skill can be invoked by users |

Because results are returned in a BTreeMap, skills are listed in alphabetical order, giving consistent ordering across runs.

Skill Resolution

The resolve() function finds a skill by name and returns a fully-resolved SkillDefinition:

| Field | Description |
|-------|-------------|
| path | File path where the skill was found |
| frontmatter | Parsed SkillFrontmatter |
| body | Skill body text (with frontmatter stripped) |

Tools

Tools are the primary mechanism through which agents interact with the environment. Each tool implements a common trait and is registered in a central registry that the backend session uses for tool calls.

Tool Trait

The Tool trait defines the interface every tool must implement:

pub trait Tool: Send + Sync {
    fn name(&self) -> &str;
    fn description(&self) -> &str;
    fn input_schema(&self) -> serde_json::Value;
    fn execution_mode(&self) -> ExecutionMode { ExecutionMode::Native }
    fn execute(&self, input: Value, ctx: ToolContext)
        -> Pin<Box<dyn Future<Output = ToolResult> + Send + '_>>;
}
| Method | Description |
|--------|-------------|
| name() | Unique identifier for the tool (e.g., "bash", "read") |
| description() | Human-readable description shown to the LLM |
| input_schema() | JSON Schema defining the expected input parameters |
| execution_mode() | Native (in-process) or Sandboxed (WASM) — defaults to Native |
| execute() | Async execution function taking input JSON and a ToolContext |

ToolResult

Every tool execution returns a ToolResult:

| Field | Type | Description |
|-------|------|-------------|
| content | Vec<ToolResultContent> | One or more content blocks |
| is_error | bool | Whether this result represents an error (signals retry to the LLM) |

Content blocks can be:

  • Text — String content
  • Image — Base64-encoded image with media type

Helper constructors:

  • ToolResult::text(s) — Creates a successful text result
  • ToolResult::error(s) — Creates an error text result with is_error = true
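A minimal sketch of these constructors, reduced to the text-only case (the real type also supports image content):

```rust
// Sketch: text-only subset of ToolResult and its helper constructors.
enum ToolResultContent {
    Text(String),
}

struct ToolResult {
    content: Vec<ToolResultContent>,
    is_error: bool,
}

impl ToolResult {
    // Successful text result.
    fn text(s: impl Into<String>) -> Self {
        Self { content: vec![ToolResultContent::Text(s.into())], is_error: false }
    }

    // Error text result: same content shape, but flagged as an error.
    fn error(s: impl Into<String>) -> Self {
        Self { is_error: true, ..Self::text(s) }
    }
}
```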

ExecutionMode

| Mode | Description |
|------|-------------|
| Native | Runs in-process with full system access |
| Sandboxed | Runs in a WASM sandbox with resource limits and capability restrictions |

ToolContext

The ToolContext struct provides execution context to tools:

| Field | Description |
|-------|-------------|
| working_dir | The agent's worktree directory |
| agent_name | Name of the executing agent |
| session_id | Current session ID |
| env_vars | Environment variables to inject |
| cancellation_token | Token to check if the session has been cancelled |
| permissions | Optional PermissionEvaluator for permission checks |

The context supports fluent building with with_env() and with_permissions().

ToolRegistry

The ToolRegistry manages tool registration and lookup:

| Method | Description |
|--------|-------------|
| register(tool) | Add a tool to the registry (insertion-order preserved) |
| get(name) | Look up a tool by name |
| names() | List all registered tool names |
| definitions() | Return tool definitions (name, description, schema) for the LLM |
| execute(name, input, ctx) | Execute a tool by name |
| retain(predicate) | Remove tools that don't match a predicate |
| register_mcp_tools(tools) | Register tools from MCP servers |
| register_wasm_tools(tools) | Register WASM sandboxed tools (feature-gated) |

Default Registry

default_registry() creates a registry pre-populated with all built-in tools.
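One common way to get the insertion-order-preserving lookup described above is a vector plus an index map; this is a simplified sketch (strings stand in for the Box<dyn Tool> trait objects the real registry stores):

```rust
use std::collections::HashMap;

// Sketch of a registry that preserves insertion order while allowing
// O(1) lookup by name. Simplified: tool names only, no trait objects.
struct Registry {
    order: Vec<String>,
    index: HashMap<String, usize>,
}

impl Registry {
    fn new() -> Self {
        Self { order: Vec::new(), index: HashMap::new() }
    }

    fn register(&mut self, name: &str) {
        self.index.insert(name.to_string(), self.order.len());
        self.order.push(name.to_string());
    }

    // Names come back in registration order.
    fn names(&self) -> &[String] {
        &self.order
    }
}
```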

Built-in Tools

Swarm ships with these native tools:

| Tool | Description |
|------|-------------|
| bash | Execute shell commands |
| read | Read file contents |
| write | Write/create files |
| edit | Edit files with search-and-replace |
| glob | Find files by pattern |
| grep | Search file contents with regex |
| notebook | Edit Jupyter notebook cells |
| web_fetch | Fetch and process web content |
| web_search | Search the web |
| ask_user | Ask the operator a question |
| mailbox | Send messages to other agents |
| sub_agent | Delegate tasks to sub-agents |
| task | Interact with the task system |
| skill | Execute a skill by name |
| mcp_proxy | Proxy tool calls to MCP servers |
| workflow_output | Report outputs from workflow stages |

WASM Tools

When the wasm-sandbox feature is enabled, additional tools can be loaded from compiled WebAssembly components. These run in a sandboxed environment with configurable resource limits and capabilities. See WASM Tools for details.

MCP Integration

Swarm integrates with external tool servers via the Model Context Protocol (MCP). This allows agents to use tools provided by external processes, expanding capabilities beyond the built-in tool set.

Architecture

The MCP integration consists of three layers:

| Component | Module | Purpose |
|-----------|--------|---------|
| McpClient | mcp::client | Single-server JSON-RPC client for tool discovery and invocation |
| McpManager | mcp::manager | Multi-server lifecycle manager (start, route, shutdown) |
| Transports | mcp::transport | Connection layer (Stdio, HTTP, SSE) |

Transports

Swarm supports three MCP transport types:

Stdio

Launches the MCP server as a subprocess. Communication happens over stdin/stdout using JSON-RPC.

{
  "transport": {
    "type": "stdio",
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/dir"]
  }
}

HTTP

Connects to an MCP server over HTTP. Each JSON-RPC request is a POST.

{
  "transport": {
    "type": "http",
    "url": "http://localhost:3000/mcp",
    "headers": {
      "Authorization": "Bearer token123"
    }
  }
}

SSE (Server-Sent Events)

Connects via SSE for server-push capabilities, with HTTP POST for client requests.

{
  "transport": {
    "type": "sse",
    "url": "http://localhost:3000/sse",
    "headers": {}
  }
}

McpClient

The McpClient wraps a transport and provides the MCP protocol methods:

| Method | Description |
|--------|-------------|
| initialize() | Send the MCP initialization handshake |
| list_tools() | Discover available tools from the server |
| call_tool(name, args) | Invoke a tool with JSON arguments |
| shutdown() | Gracefully close the connection |

Each client uses an IdGenerator for unique JSON-RPC request IDs.

McpToolDefinition

Tools discovered from MCP servers are represented as:

| Field | Type | Description |
|-------|------|-------------|
| name | String | Tool name as reported by the server |
| description | Option<String> | Human-readable tool description |
| input_schema | Value | JSON Schema for tool inputs |

McpManager

The McpManager manages the lifecycle of multiple MCP server connections:

| Method | Description |
|--------|-------------|
| start_all(configs) | Connect to all configured servers; skip failures with warnings |
| connect_server(name, config) | Connect to a single server |
| all_tool_definitions() | Collect and prefix tools from all servers |
| call_tool(prefixed_name, args) | Route a tool call to the correct server |
| shutdown_all() | Gracefully shutdown all server connections |

Tool Namespacing

To avoid name collisions between different MCP servers and built-in tools, MCP tools are prefixed with the server name:

mcp__<server_name>__<tool_name>

For example, a tool read_file from a server named filesystem becomes:

mcp__filesystem__read_file

The parse_prefixed_name() function extracts the server and tool names from this format. The PrefixedToolDefinition struct carries the original tool definition along with the prefixed name.
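A sketch of that parsing, splitting on the first double-underscore after the prefix (server names containing "__" would need extra care in a real implementation):

```rust
// Parse "mcp__<server>__<tool>" into (server, tool); None if not MCP-prefixed.
fn parse_prefixed_name(name: &str) -> Option<(&str, &str)> {
    let rest = name.strip_prefix("mcp__")?;
    let sep = rest.find("__")?;
    Some((&rest[..sep], &rest[sep + 2..]))
}
```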

Configuration

MCP servers are configured in the mcpServers section of settings.json:

{
  "mcpServers": {
    "filesystem": {
      "transport": {
        "type": "stdio",
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-filesystem", "."]
      },
      "env": {
        "NODE_PATH": "/usr/local/lib/node_modules"
      }
    },
    "api": {
      "transport": {
        "type": "http",
        "url": "http://localhost:8080/mcp"
      }
    }
  }
}

Each server entry has:

| Field | Type | Description |
|-------|------|-------------|
| transport | McpTransport | Connection configuration (stdio/http/sse) |
| env | HashMap<String, String> | Optional environment variables for the server process |

Server Lifecycle

  1. During orchestrator start (Step 9), the McpManager connects to all configured servers
  2. Tool definitions are fetched and registered in the ToolRegistry with prefixed names
  3. During agent sessions, MCP tool calls are routed through the manager to the correct server
  4. On shutdown, all servers are gracefully disconnected

Failed server connections are logged as warnings but don't prevent the swarm from starting.

Configuration

Swarm uses a centralized configuration file at ~/.swarm/settings.json that defines projects, agents, providers, permissions, and more. The config system distinguishes between raw (as-written) and resolved (fully-defaulted) types.

Settings File Location

~/.swarm/settings.json

The file is created automatically by swarm init, or can be written by hand. It is JSON with a required version field and project entries keyed by absolute path.

File Structure

{
  "version": 2,
  "/absolute/path/to/project": {
    "providers": { ... },
    "agents": [ ... ],
    "supervisor": { ... },
    "defaults": { ... },
    "permissions": { ... },
    "hooks": { ... },
    "mcpServers": { ... },
    "wasm_tools": [ ... ],
    "sub_agent_defaults": { ... }
  }
}

Version

The version field must be 1 or 2. Version 2 is the current schema. Versions above the supported maximum are rejected.

Providers

Named provider blocks describe how to reach an LLM API:

"providers": {
  "default": {
    "type": "anthropic",
    "api_key_env": "ANTHROPIC_API_KEY",
    "base_url": null,
    "max_retries": null,
    "timeout": null
  }
}
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| type | String | Yes | | Provider type (currently only "anthropic") |
| api_key_env | String | No | "ANTHROPIC_API_KEY" | Environment variable holding the API key |
| base_url | String | No | null | Custom API base URL |
| max_retries | u32 | No | null | Max retries for transient failures |
| timeout | u64 | No | null | Request timeout in seconds |

If no providers block is defined, an implicit "default" Anthropic provider is created.

Agents

An array of agent definitions (at least one required):

"agents": [
  {
    "name": "backend",
    "prompt": "You are a backend engineer. Focus on API and data layer.",
    "model": "sonnet",
    "provider": "default",
    "permissions": { ... },
    "delegate_mode": false,
    "mode": "code"
  }
]
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| name | String | Yes | | Unique name matching [a-z][a-z0-9-]* |
| prompt | String | Yes | | System prompt text, or @path/to/file to load from file |
| model | String | No | defaults.model or "sonnet" | Model identifier |
| provider | String | No | defaults.provider or "default" | Provider name |
| permissions | PermissionsConfig | No | null | Agent-level permission overrides |
| delegate_mode | bool | No | false | Legacy flag for delegate mode |
| mode | String | No | See resolution | Agent execution mode |

Prompt resolution: If the prompt starts with @, the remainder is treated as a file path relative to the project root, and the file contents are loaded as the prompt.
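A sketch of the @-prefix rule (the function name is illustrative; the real loader also reports richer errors):

```rust
use std::{fs, io, path::Path};

// Resolve a prompt value: "@rel/path" loads a file relative to the project
// root; anything else is used verbatim as the prompt text.
fn resolve_prompt(raw: &str, project_root: &Path) -> io::Result<String> {
    match raw.strip_prefix('@') {
        Some(rel) => fs::read_to_string(project_root.join(rel)),
        None => Ok(raw.to_string()),
    }
}
```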

Mode resolution priority: agent.mode > defaults.mode > delegate_mode compat ("delegate" if true) > "code"

Supervisor

Optional supervisor configuration:

"supervisor": {
  "prompt": "Custom supervisor prompt",
  "model": "sonnet"
}
| Field | Type | Required | Default |
|-------|------|----------|---------|
| prompt | String | No | Built-in merge-focused supervisor prompt |
| model | String | No | defaults.model |

Defaults

Project-wide defaults applied when agent-level values are not specified:

"defaults": {
  "model": "sonnet",
  "provider": "default",
  "session_timeout": null,
  "commit_interval": 300,
  "max_consecutive_errors": 5,
  "max_total_errors": 20,
  "mode": "code",
  "liveness": { ... }
}
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| model | String | "sonnet" | Default model for agents and supervisor |
| provider | String | "default" | Default provider name |
| session_timeout | u64 | null | Session timeout in seconds (none = no timeout) |
| commit_interval | u64 | 300 | Auto-commit interval in seconds |
| max_consecutive_errors | u32 | 5 | Consecutive errors before agent stops |
| max_total_errors | u32 | 20 | Total errors before agent stops |
| mode | String | null | Default agent mode |
| liveness | LivenessConfig | See below | Liveness monitoring settings |

Liveness Configuration

Controls idle detection, nudging, and stall monitoring:

"liveness": {
  "enabled": true,
  "idle_nudge_after_secs": 120,
  "idle_nudge_interval_secs": 300,
  "max_nudges": 3,
  "idle_warn_after_secs": 600,
  "stall_timeout_secs": 900,
  "auto_interrupt_stalled": false
}
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| enabled | bool | true | Enable/disable liveness monitoring |
| idle_nudge_after_secs | u64? | 120 | Seconds idle before first nudge |
| idle_nudge_interval_secs | u64? | 300 | Seconds between subsequent nudges |
| max_nudges | u32 | 3 | Maximum nudge messages per idle episode |
| idle_warn_after_secs | u64? | 600 | Seconds idle before warning hook fires |
| stall_timeout_secs | u64? | 900 | Seconds without heartbeat before stall detection |
| auto_interrupt_stalled | bool | false | Auto-interrupt stalled agents |

Permissions

Project-level permission rules (also available per-agent):

"permissions": {
  "allow": ["Bash(npm run *)"],
  "ask": ["Bash(rm *)"],
  "deny": ["Bash(curl *)"],
  "default_mode": "ask"
}

See Permissions for full details.

Resolution Cascade

When the configuration is loaded, raw values are resolved into fully-defaulted types:

Agent model    = agent.model    ?? defaults.model    ?? "sonnet"
Agent provider = agent.provider ?? defaults.provider ?? "default"
Agent mode     = agent.mode     ?? defaults.mode     ?? (delegate_mode ? "delegate" : "code")

The ResolvedConfig struct has no Option fields — every value is filled in.
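The cascade above can be sketched with Option-typed raw fields (struct and function names here are illustrative, not the exact types in the codebase):

```rust
// Sketch of the raw (as-written) config shapes relevant to resolution.
struct RawAgent {
    model: Option<String>,
    mode: Option<String>,
    delegate_mode: bool,
}

struct Defaults {
    model: Option<String>,
    mode: Option<String>,
}

// agent.model ?? defaults.model ?? "sonnet"
fn resolve_model(agent: &RawAgent, defaults: &Defaults) -> String {
    agent.model.clone()
        .or_else(|| defaults.model.clone())
        .unwrap_or_else(|| "sonnet".to_string())
}

// agent.mode ?? defaults.mode ?? (delegate_mode ? "delegate" : "code")
fn resolve_mode(agent: &RawAgent, defaults: &Defaults) -> String {
    agent.mode.clone()
        .or_else(|| defaults.mode.clone())
        .unwrap_or_else(|| {
            let m = if agent.delegate_mode { "delegate" } else { "code" };
            m.to_string()
        })
}
```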

Validation Rules

The config loader validates:

  1. Version — Must be 1 or 2 (no higher)
  2. At least one agent — The agents array cannot be empty
  3. Agent names — Must match [a-z][a-z0-9-]* and be unique
  4. Provider references — All agent.provider and defaults.provider values must reference a defined provider
  5. Provider types — Cannot be empty strings
  6. WASM tool names — Must match [a-z][a-z0-9_-]*, be unique, have non-empty paths
  7. WASM capabilities — Must be one of: Logging, WorkspaceRead, HttpRequest, ToolInvoke, SecretCheck

Worktrees

Swarm uses git worktrees to provide each agent with an isolated working directory and branch. This prevents file conflicts between agents working in parallel and enables clean merging of results back to the main branch.

Prerequisites

  • Git 2.20+ — Required for git worktree lock/unlock support. Checked automatically at startup.
  • Not detached HEAD — Swarm requires a branch checkout for merge-back to work.
  • Clean working tree — Either commit/stash changes first, or use --stash to auto-stash.

Path and Branch Conventions

| Item | Pattern | Example |
|------|---------|---------|
| Swarm directory | <repo>/.swarm/ | /home/user/project/.swarm/ |
| Worktree path | <repo>/.swarm/worktrees/<name> | .swarm/worktrees/backend |
| Agent branch | swarm/<session_id>/<name> | swarm/20250115-a3f2/backend |
| Supervisor branch | swarm/<session_id>/supervisor | swarm/20250115-a3f2/supervisor |
| Beads branch | swarm/<session_id>/beads | swarm/20250115-a3f2/beads |

The .swarm/ directory is automatically added to .git/info/exclude so it doesn't appear in git status.

Worktree Creation

During session start (Step 7), the orchestrator creates worktrees for each agent and the supervisor:

# For each agent:
git worktree add .swarm/worktrees/<name> -b swarm/<session_id>/<name> <base_commit>
git worktree lock .swarm/worktrees/<name>

# For the supervisor:
git worktree add .swarm/worktrees/supervisor -b swarm/<session_id>/supervisor <base_commit>
git worktree lock .swarm/worktrees/supervisor

# Shared beads branch:
git branch swarm/<session_id>/beads <base_commit>

Each worktree starts from the same base commit (HEAD at session start), ensuring all agents begin with identical codebases.

Worktree Locking

Worktrees are locked immediately after creation using git worktree lock. This prevents git worktree prune from accidentally removing them during the session. Worktrees are unlocked during cleanup.

Worktree Cleanup

When a session stops, worktrees are cleaned up in order:

  1. Auto-commit dirty — If the worktree has uncommitted changes, stage and commit them:

    git -C .swarm/worktrees/<name> add -A
    git -C .swarm/worktrees/<name> commit -m "swarm: auto-commit on stop"
    
  2. Unlock — git worktree unlock .swarm/worktrees/<name>

  3. Remove — git worktree remove .swarm/worktrees/<name>

  4. Prune — git worktree prune to clean stale references

  5. Delete branches — Remove all swarm/<session_id>/* branches

Merge Operations

After cleanup, agent branches are merged based on the stop mode:

Merge (--merge, default)

Non-fast-forward merge in config order (agents first, then supervisor):

git merge --no-ff swarm/<session_id>/backend -m "Merge agent: backend"
git merge --no-ff swarm/<session_id>/frontend -m "Merge agent: frontend"
git merge --no-ff swarm/<session_id>/supervisor -m "Merge supervisor"

Squash (--squash)

Squash-merge each agent branch into a single commit:

git merge --squash swarm/<session_id>/backend
git commit -m "Squash agent: backend"

Discard (--discard)

Delete branches without merging — all agent work is discarded.

Recovery

Swarm handles crash recovery for stale sessions:

Stale Session Detection

A session is stale when its pid (from session.json) no longer corresponds to a running process. This is detected via libc::kill(pid, 0).

Recovery Flow

When a stale session is detected (during swarm start or swarm stop):

  1. Auto-commit any dirty worktrees
  2. Remove all session worktrees (unlock + remove)
  3. Prune stale worktree references
  4. Delete all swarm/<session_id>/* branches
  5. Remove session.json and lockfile

Clean Command

swarm clean provides manual cleanup:

swarm clean          # Interactive — asks for confirmation
swarm clean --force  # Removes artifacts without confirmation

This handles cases where automatic recovery isn't sufficient (e.g., corrupted worktree state).

Permissions

The permission system controls which tools agents can use and under what conditions. It uses a layered rule evaluation model with project-level and agent-level overrides.

Permission Rules

A permission rule is a string in the format "ToolName(specifier)":

Bash(npm run *)     — Allow any npm run command
Read(./.env)        — Match reading .env file
WebFetch(domain:*.example.com) — Match fetching from example.com subdomains
Bash(rm -rf *)      — Match rm -rf commands

Rule Format

| Component | Description |
|-----------|-------------|
| Tool name | Case-insensitive tool identifier (e.g., Bash, Read, Write) |
| Specifier | Optional pattern inside parentheses — glob/prefix matching for commands and paths |

Special specifier prefixes:

  • domain: — For WebFetch, matches against the URL's host using glob patterns
  • No prefix — Matches against the tool's primary input (command for Bash, path for Read/Write)

If no specifier is provided (e.g., just "Bash"), the rule matches all invocations of that tool.

PermissionSet

A PermissionSet contains three ordered lists of rules:

{
  "allow": ["Bash(npm run *)", "Read(*)"],
  "ask": ["Bash(rm *)"],
  "deny": ["Bash(curl *)"]
}
| List | Effect |
|------|--------|
| allow | Tool call is permitted without asking the user |
| ask | Tool call requires user approval |
| deny | Tool call is blocked outright |

Permission Modes

The PermissionMode enum defines 5 evaluation modes that change the default behavior:

| Mode | Description | Default Decision |
|------|-------------|------------------|
| Default | Standard mode — explicit rules apply, unmatched calls require asking | Ask |
| AcceptEdits | Automatically allow file edits (Write, Edit, NotebookEdit) | Ask (except edits) |
| Plan | Read-only mode — denies all write/execute tools (Bash, Write, Edit, NotebookEdit) | Deny for writes |
| DontAsk | Never prompt the user — unmatched calls are allowed | Allow |
| BypassPermissions | All tool calls are allowed regardless of rules | Allow |

Evaluation Order

When a tool call is evaluated, the PermissionEvaluator follows this 6-step process:

Step 1: Mode Short-Circuit

  • BypassPermissions → Allow immediately
  • Plan mode and tool is bash, write, edit, or notebookedit → Deny immediately

Step 2: Agent-Specific Overrides

If the agent has its own permissions block in the config, evaluate those rules first:

  • Check deny rules → Deny if matched
  • Check ask rules → Ask if matched
  • Check allow rules → Allow if matched

Step 3: Global Deny Rules

Check project-level deny rules → Deny if matched

Step 4: Global Ask Rules

Check project-level ask rules → Ask if matched

Step 5: Global Allow Rules

Check project-level allow rules → Allow if matched

Step 6: Mode Default

If no rule matched, apply the mode's default decision:

  • Default → Ask
  • AcceptEdits → Allow for edit tools, Ask for others
  • DontAsk → Allow
  • BypassPermissions → Allow

Permission Decision

The evaluation returns one of three decisions:

| Decision | Behavior |
|----------|----------|
| Allow | Tool call proceeds |
| Ask | Tool call requires user approval (via TUI or hook) |
| Deny | Tool call is blocked; error returned to the agent |
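The deny-then-ask-then-allow ordering can be sketched as follows. This is a simplified model: the matcher below only handles trailing-`*)` prefix rules, while the real evaluator uses full glob patterns, mode short-circuits, and agent-level overrides.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Decision {
    Allow,
    Ask,
    Deny,
}

struct Rules<'a> {
    deny: &'a [&'a str],
    ask: &'a [&'a str],
    allow: &'a [&'a str],
}

// Simplified matcher: "Bash(rm *)" matches any "Bash(rm ...)" call.
fn matches(rule: &str, call: &str) -> bool {
    if let Some(prefix) = rule.strip_suffix("*)") {
        call.starts_with(prefix) && call.ends_with(')')
    } else {
        call == rule
    }
}

// Deny wins over ask, ask over allow; otherwise fall back to the mode default.
fn evaluate(rules: &Rules, call: &str, mode_default: Decision) -> Decision {
    if rules.deny.iter().any(|r| matches(r, call)) {
        return Decision::Deny;
    }
    if rules.ask.iter().any(|r| matches(r, call)) {
        return Decision::Ask;
    }
    if rules.allow.iter().any(|r| matches(r, call)) {
        return Decision::Allow;
    }
    mode_default
}
```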

Configuration

Permissions are configured at two levels:

Project Level

{
  "permissions": {
    "allow": ["Bash(npm run *)", "Bash(cargo test *)"],
    "ask": ["Bash(git push *)"],
    "deny": ["Bash(rm -rf /)"],
    "default_mode": "default"
  }
}

Agent Level

{
  "agents": [
    {
      "name": "reviewer",
      "prompt": "...",
      "permissions": {
        "allow": ["Read(*)"],
        "deny": ["Write(*)", "Edit(*)", "Bash(*)"]
      }
    }
  ]
}

Agent-level rules take precedence over project-level rules because they are evaluated first.

  • Configuration — How permissions are configured
  • Tools — The tools that permissions control
  • Hooks — Hook-based permission decisions

TUI

Swarm's terminal user interface (TUI) is the primary way to monitor and interact with a running swarm session. It provides real-time visibility into agent states, logs, events, and session metadata.

Design Philosophy

The TUI is a first-class component, not an afterthought (ADR-007). It's built with ratatui (a Rust TUI framework) on top of crossterm for terminal handling, rendering at approximately 30 FPS (33ms frame interval).

For environments where a TUI isn't suitable (CI, remote servers, testing), the --no-tui flag runs swarm in headless mode, logging to stdout instead.

Components

TuiApp

The TuiApp struct holds all mutable TUI state:

| Field | Type | Description |
|-------|------|-------------|
| agents | Vec<AgentEntry> | Ordered list of agents (sorted by name) |
| selected | usize | Index of the currently selected agent |
| log_viewer | LogViewer | Log file viewer for the selected agent |
| event_viewer | EventViewer | Real-time streaming event viewer |
| input | String | Current contents of the command input bar |
| session_id | String | Displayed in the status bar |
| quit_requested | bool | Set to true when user requests quit |
| context_info | HashMap<String, ContextInfo> | Per-agent context window usage |
| show_task_list | bool | Task list overlay visibility (Ctrl+T) |
| tasks | Vec<Task> | Snapshot of tasks from the task store |
| show_workflow_panel | bool | Workflow panel visibility (Ctrl+W) |
| show_iteration_panel | bool | Iteration panel visibility (Ctrl+I) |

AgentEntry

Each agent in the TUI is represented as an AgentEntry:

| Field | Type | Description |
|-------|------|-------------|
| name | String | Agent name |
| state | AgentState | Current state from the state machine |
| log_path | PathBuf | Path to the agent's log file |
| liveness | Option<AgentLiveness> | Liveness monitoring data (idle time, stall, nudges) |

Agent Panel

The left panel shows a list of all agents with their current state:

┌─ Agents ──────────────────┐
│ ● backend      Running    │
│ ● frontend     Running    │
│ ○ reviewer     CoolingDown│
│ ■ supervisor   Stopped    │
└───────────────────────────┘

Each agent shows:

  • A state indicator icon
  • The agent name
  • The current state label
  • Liveness suffix when available (running duration, idle time, stall warnings)

Log/Event Viewer

The right panel displays output for the selected agent:

  • Event viewer — Real-time streaming events from the agent's backend session
  • Log viewer — Fallback log file viewer when event streaming isn't available

The viewer auto-scrolls to follow new output.

Input Bar

The bottom of the screen shows a command input bar where you can type messages to send to agents. The input bar is always visible.

Overlays

Toggle-able overlay panels:

| Overlay | Toggle | Content |
|---------|--------|---------|
| Task list | Ctrl+T | Tasks from the beads task store |
| Workflow progress | Ctrl+W | Active/recent workflow runs and stages |
| Iteration progress | Ctrl+I | Iteration loop runs and status |

Context Window

Per-agent context window usage is tracked and displayed:

| Field | Description |
|-------|-------------|
| used_tokens | Tokens consumed in the current session |
| max_tokens | Maximum context window size |
| usage_percent | Percentage of context used |

Keyboard Shortcuts

| Key | Action |
|-----|--------|
| Up / Down | Select previous/next agent |
| Ctrl+T | Toggle task list overlay |
| Ctrl+W | Toggle workflow panel |
| Ctrl+I | Toggle iteration panel |
| q / Ctrl+C | Quit (triggers graceful shutdown) |

State Refresh

The TUI refreshes agent states each frame by calling AgentRegistry::states(). Liveness data is refreshed from a watch channel provided by the liveness monitor. This keeps the display current without expensive polling.

Headless Mode

When --no-tui is specified:

  • No terminal UI is rendered
  • Agent state changes are logged to stdout via tracing
  • The orchestrator still runs all the same subsystems (router, periodic tasks, etc.)
  • Shutdown is triggered by SIGTERM only (no interactive quit)

Writing Agents

How to define, configure, and tune agents in swarm.

Overview

Agents are the core unit of work in swarm. Each agent runs in its own git worktree with its own backend session, system prompt, and permissions. You define agents in ~/.swarm/settings.json under the agents array of your project configuration.

Step 1: Create the Settings File

If you haven't already, initialize the configuration:

swarm init --path /path/to/your-project

This creates ~/.swarm/settings.json with a skeleton project config. Open it in your editor.

Step 2: Define a Provider

Providers specify how swarm connects to a model API. At minimum, you need one provider:

{
  "version": 2,
  "/home/user/my-project": {
    "providers": {
      "default": {
        "type": "anthropic",
        "api_key_env": "ANTHROPIC_API_KEY"
      }
    }
  }
}

Provider fields:

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| type | String | Yes | | Provider type ("anthropic") |
| api_key_env | String | No | "ANTHROPIC_API_KEY" | Environment variable holding the API key |
| base_url | String | No | null | Custom API base URL |
| max_retries | u32 | No | null | Max retries for transient errors |
| timeout | u64 | No | null | Request timeout in seconds |

You can define multiple providers and reference them by name in agent or defaults config.

Step 3: Set Project Defaults

The defaults section provides fallback values for all agents:

{
  "defaults": {
    "model": "sonnet",
    "provider": "default",
    "commit_interval": 300,
    "max_consecutive_errors": 5,
    "max_total_errors": 20,
    "liveness": {
      "enabled": true,
      "idle_nudge_after_secs": 120,
      "idle_nudge_interval_secs": 300,
      "max_nudges": 3
    }
  }
}

Key defaults:

| Field | Default | Description |
|---|---|---|
| model | "sonnet" | Default model identifier |
| provider | "default" | Default provider name |
| session_timeout | null | Per-session timeout (seconds) |
| commit_interval | 300 | Auto-commit interval (seconds) |
| max_consecutive_errors | 5 | Errors before an agent stops |
| max_total_errors | 20 | Lifetime errors before an agent stops |

Step 4: Define Agents

Each agent needs a name and a prompt. All other fields are optional:

{
  "agents": [
    {
      "name": "backend",
      "prompt": "You are a senior backend engineer. Focus on API design, database schemas, and server-side logic. Write tests for all new endpoints.",
      "model": "sonnet"
    },
    {
      "name": "frontend",
      "prompt": "@prompts/frontend.md",
      "model": "sonnet"
    },
    {
      "name": "reviewer",
      "prompt": "You are a code reviewer. Read changes from other agents and provide feedback via messages.",
      "model": "sonnet"
    }
  ]
}

Agent fields:

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| name | String | Yes |  | Unique identifier ([a-z][a-z0-9-]*) |
| prompt | String | Yes |  | System prompt text or @path/to/file |
| model | String | No | defaults.model | Model identifier |
| provider | String | No | defaults.provider | Provider name reference |
| permissions | PermissionsConfig | No | null | Agent-level permission overrides |
| mode | String | No | See cascade | Agent execution mode |

Prompt Loading

Prompts can be specified in two ways:

  • Inline: Write the prompt directly as a string value
  • File reference: Use @path/to/file to load from a file relative to the project root

File references are useful for long prompts and allow version-controlling prompts alongside your code:

prompts/
  backend.md
  frontend.md
  reviewer.md

Name Rules

Agent names must:

  • Start with a lowercase letter
  • Contain only lowercase letters, digits, and hyphens
  • Be unique within the project

Invalid names cause a ValidationError at startup.

Step 5: Configure Permissions

Permissions control what tools each agent can use. They're defined at the project level and optionally overridden per agent.

Project-Level Permissions

{
  "permissions": {
    "allow": ["Read(*)", "Glob(*)", "Grep(*)"],
    "deny": ["Bash(rm -rf *)"],
    "default_mode": "default"
  }
}

Agent-Level Overrides

{
  "name": "frontend",
  "prompt": "...",
  "permissions": {
    "allow": ["Bash(npm *)"],
    "deny": ["Bash(rm *)"]
  }
}

Agent permissions are evaluated after project permissions. See Permissions for the full evaluation order.

Rule Format

Rules follow the pattern Tool(specifier):

| Example | Meaning |
|---|---|
| Read(*) | Allow reading any file |
| Bash(npm *) | Allow any npm command |
| Bash(rm *) | Match any rm command |
| Edit(src/*.rs) | Match editing Rust files in src/ |

Step 6: Choose Agent Modes

The mode field controls how an agent interacts with tools and the operator. Available modes:

| Mode | Description |
|---|---|
| default | Standard mode — tools require permission checks |
| accept-edits | Auto-accept file edits without confirmation |
| plan | Planning mode — agent proposes changes but doesn't execute |
| dont-ask | Skip all permission prompts (auto-allow) |
| bypass-permissions | Bypass the permission system entirely |

Set a default mode for all agents:

{
  "defaults": {
    "mode": "dont-ask"
  }
}

Or override per agent:

{
  "name": "reviewer",
  "prompt": "...",
  "mode": "plan"
}

Step 7: Add a Supervisor (Optional)

The supervisor generates the final merge commit message when stopping with --merge or --squash:

{
  "supervisor": {
    "prompt": "Summarize all agent changes into a concise merge commit message.",
    "model": "sonnet"
  }
}

If omitted, swarm uses a built-in merge prompt.

Complete Example

A full three-agent configuration:

{
  "version": 2,
  "/home/user/my-project": {
    "providers": {
      "default": {
        "type": "anthropic",
        "api_key_env": "ANTHROPIC_API_KEY"
      }
    },
    "defaults": {
      "model": "sonnet",
      "commit_interval": 300,
      "max_consecutive_errors": 5,
      "max_total_errors": 20,
      "mode": "dont-ask",
      "liveness": {
        "enabled": true,
        "idle_nudge_after_secs": 120,
        "idle_nudge_interval_secs": 300,
        "max_nudges": 3,
        "stall_timeout_secs": 900
      }
    },
    "agents": [
      {
        "name": "backend",
        "prompt": "@prompts/backend.md",
        "model": "sonnet"
      },
      {
        "name": "frontend",
        "prompt": "@prompts/frontend.md",
        "model": "sonnet",
        "permissions": {
          "allow": ["Bash(npm *)", "Bash(npx *)"],
          "deny": ["Bash(rm -rf *)"]
        }
      },
      {
        "name": "reviewer",
        "prompt": "@prompts/reviewer.md",
        "model": "sonnet",
        "mode": "plan"
      }
    ],
    "supervisor": {
      "prompt": "@prompts/supervisor.md"
    },
    "permissions": {
      "allow": ["Read(*)", "Glob(*)", "Grep(*)"],
      "default_mode": "default"
    }
  }
}

Troubleshooting

"config validation failed: agent names must be unique"

Two agents have the same name. Each agent must have a distinct name.

"config validation failed: agents list cannot be empty"

You must define at least one agent in the agents array.

"config file not found"

Run swarm init to create the settings file, or verify the path in ~/.swarm/settings.json matches your project's canonicalized absolute path.

Agent keeps entering CoolingDown state

Check the agent's logs with swarm logs <name>. Common causes:

  • Invalid API key (check the api_key_env environment variable)
  • Model name not recognized by the provider
  • Prompt too large for the model context window

The backoff formula is min(2000 * 2^(n-1), 60000) ms, where n is consecutive errors. After max_consecutive_errors (default 5), the agent stops.
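
As a sketch, the schedule works out as follows. Only the formula comes from the docs; the function name is ours:

```rust
/// Backoff delay in milliseconds after `n` consecutive errors:
/// min(2000 * 2^(n-1), 60000). Illustrative sketch of the documented formula.
fn backoff_ms(n: u32) -> u64 {
    let exp = n.saturating_sub(1).min(32); // clamp the shift to stay in range
    (2000u64 << exp).min(60_000)
}

fn main() {
    // Delays double per error: 2000, 4000, 8000, ... capped at 60000 ms.
    for n in 1..=6 {
        println!("after error {n}: wait {} ms", backoff_ms(n));
    }
}
```

With the default max_consecutive_errors of 5, an agent therefore waits at most 32 seconds before its final retry.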

Custom Skills

Creating custom skills to extend agent capabilities.

Overview

Skills are reusable prompt fragments that agents can invoke during their sessions. Each skill is a Markdown file with optional YAML frontmatter that controls how the skill behaves — who can invoke it, what tools it can use, and how arguments are substituted.

Skill File Format

A skill file has two parts: frontmatter (optional) and body (required).

---
name: my-skill
description: A brief description of what this skill does
user-invocable: true
argument-hint: "<required-arg> [optional-arg]"
---

You are now executing the "my-skill" skill.

The user asked: $ARGUMENTS

Perform the requested action and report results.

Step 1: Choose a Location

Skills are discovered from three paths, in priority order:

| Priority | Path | Style |
|---|---|---|
| 1 (highest) | <project>/.claude/skills/<name>/SKILL.md | Directory |
| 2 | <project>/.skills/<name>.md | Flat file |
| 3 (lowest) | ~/.claude/skills/<name>/SKILL.md | Global directory |

Directory style is preferred — each skill gets its own directory, which can contain supporting files:

.claude/skills/
  commit/
    SKILL.md
  review/
    SKILL.md
    checklist.md

Flat file style is available for backward compatibility:

.skills/
  commit.md
  review.md

If a skill with the same name exists at multiple paths, the highest-priority path wins. First match is used; duplicates at lower priorities are ignored.
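
The first-match rule amounts to a straight scan down the priority list. A minimal sketch (names are ours; the filesystem check is abstracted as a closure so the logic is testable):

```rust
/// Return the first candidate path that exists, in priority order.
/// Illustrative sketch of the documented first-match rule.
fn resolve_skill<'a>(candidates: &[&'a str], exists: impl Fn(&str) -> bool) -> Option<&'a str> {
    candidates.iter().copied().find(|&p| exists(p))
}

fn main() {
    let candidates = [
        ".claude/skills/commit/SKILL.md",            // project directory style (highest)
        ".skills/commit.md",                          // project flat file
        "/home/user/.claude/skills/commit/SKILL.md",  // global directory (lowest)
    ];
    // Pretend only the flat file exists:
    let found = resolve_skill(&candidates, |p| p == ".skills/commit.md");
    println!("{found:?}");
}
```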

Step 2: Write the Frontmatter

The YAML frontmatter block is delimited by --- lines at the top of the file. All fields are optional:

| Field | Type | Default | Description |
|---|---|---|---|
| name | String | Filename/directory name | Skill identifier (overrides the inferred name) |
| description | String | None | One-line description shown in skill discovery |
| user-invocable | bool | false | If true, appears in the / slash-command menu |
| disable-model-invocation | bool | false | If true, only the user can invoke this skill (not the model) |
| allowed-tools | String[] | None | Restrict which tools the skill can access |
| model | String | None | Override the model for this skill's execution |
| context | String | None | Set to "fork" for subagent execution |
| agent | String | None | Subagent type when context: fork |
| argument-hint | String | None | Usage hint displayed to the user |
| hooks | HooksConfig | None | Lifecycle hooks specific to this skill |
| unsafe | bool | None | Reserved for future WASM sandbox flag |

Invocation Control

| Setting | Effect |
|---|---|
| user-invocable: true | Appears in / menu; user and model can invoke |
| user-invocable: false | Hidden from menu; model-only unless model invocation also disabled |
| disable-model-invocation: true | Only the user can invoke via /skill-name |

Example Frontmatter

---
name: commit
description: Create a conventional commit
user-invocable: true
disable-model-invocation: false
allowed-tools:
  - Bash
  - Read
  - Grep
argument-hint: "[commit message]"
model: claude-sonnet-4-5-20250929
---

Step 3: Write the Body

The body is everything after the closing ---. It's injected into the agent's prompt when the skill is invoked. Use argument placeholders to make the skill dynamic:

Argument Substitution

| Placeholder | Replaced With |
|---|---|
| $ARGUMENTS | The full argument string passed by the invoker |
| $ARGUMENTS[0] through $ARGUMENTS[9] | Positional argument (space-split) |
| $0 through $9 | Shorthand for $ARGUMENTS[N] |

Substitution is single-pass (left-to-right) — output from one replacement is never re-processed. Missing positional arguments are replaced with an empty string. Only indices 0–9 are supported.
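
These rules can be pictured with a sketch of a single-pass expander. This is our illustration of the documented behavior, not swarm's actual code:

```rust
/// Single-pass placeholder expansion: replacement text is appended to the
/// output and never re-scanned; missing positionals become empty strings.
fn expand(body: &str, args: &str) -> String {
    let positional: Vec<&str> = args.split_whitespace().collect();
    let pos = |i: usize| positional.get(i).copied().unwrap_or("");
    let mut out = String::new();
    let mut rest = body;
    while let Some(idx) = rest.find('$') {
        out.push_str(&rest[..idx]);
        let tail = &rest[idx + 1..];
        // Longest placeholder first: $ARGUMENTS[N], then $ARGUMENTS, then $N.
        if let Some(after) = tail.strip_prefix("ARGUMENTS[") {
            let mut cs = after.chars();
            if let (Some(d), Some(']')) = (cs.next(), cs.next()) {
                if d.is_ascii_digit() {
                    out.push_str(pos(d as usize - '0' as usize));
                    rest = &after[2..];
                    continue;
                }
            }
        }
        if let Some(after) = tail.strip_prefix("ARGUMENTS") {
            out.push_str(args);
            rest = after;
            continue;
        }
        if let Some(d) = tail.chars().next().filter(|c| c.is_ascii_digit()) {
            out.push_str(pos(d as usize - '0' as usize));
            rest = &tail[1..];
            continue;
        }
        out.push('$'); // a lone '$' passes through unchanged
        rest = tail;
    }
    out.push_str(rest);
    out
}

fn main() {
    println!("{}", expand("Run tests for `$0`. Context: $ARGUMENTS", "src/main.rs --verbose"));
}
```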

Example Body

---
name: test-file
description: Run tests for a specific file
user-invocable: true
argument-hint: "<file-path>"
---

Run the tests for the file at `$0`.

Steps:
1. Read the file to understand what it does
2. Find the corresponding test file
3. Run the tests with `cargo test`
4. If tests fail, analyze the failures and suggest fixes
5. Report results

Full context from user: $ARGUMENTS

When invoked as /test-file src/main.rs, the placeholders expand to:

  • $0src/main.rs
  • $ARGUMENTSsrc/main.rs

When invoked as /test-file src/main.rs --verbose, the placeholders expand to:

  • $0src/main.rs
  • $1--verbose
  • $ARGUMENTSsrc/main.rs --verbose

Step 4: Add Skill-Level Hooks (Optional)

Skills can define their own hooks that fire during skill execution:

---
name: deploy
description: Deploy to staging
user-invocable: true
hooks:
  pre_tool_use:
    - matcher: Bash
      hooks:
        - type: command
          command: ./scripts/validate-deploy.sh
          timeout: 10
---

Hook configuration follows the same format as project-level hooks. See Hooks for details.

Step 5: Test Your Skill

Verify Discovery

Skills are discovered automatically when agents build their prompts. To verify your skill is found, check that:

  1. The file exists at one of the three resolution paths
  2. The filename or name field matches what you expect
  3. If user-invocable: true, it should appear in the agent's available slash commands

Test Resolution

The resolution order is deterministic:

  1. Swarm checks <project>/.claude/skills/<name>/SKILL.md
  2. If not found, checks <project>/.skills/<name>.md
  3. If not found, checks ~/.claude/skills/<name>/SKILL.md
  4. If not found at any path, the skill is not available

Validate the Name

Skill names must contain only [a-zA-Z0-9_:-]. Names with other characters (including /, .., spaces) are rejected to prevent path traversal.
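
The character rule is a simple per-character check, as in this sketch (the function name is ours, and rejecting the empty name is our assumption):

```rust
/// Sketch of the documented rule: a skill name may only contain
/// characters from [a-zA-Z0-9_:-]. Empty names are rejected here too
/// (an assumption; the docs do not state that case explicitly).
fn is_valid_skill_name(name: &str) -> bool {
    !name.is_empty()
        && name
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || matches!(c, '_' | ':' | '-'))
}

fn main() {
    for name in ["commit", "ns:deploy", "../etc/passwd", "my skill"] {
        println!("{name}: {}", is_valid_skill_name(name));
    }
}
```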

Examples

Simple Commit Skill

---
name: commit
description: Create a git commit with conventional format
user-invocable: true
allowed-tools:
  - Bash
  - Read
  - Grep
argument-hint: "[commit message]"
---

Create a git commit following the conventional commits format.

If no message was provided, analyze the staged changes and generate an appropriate message.

User input: $ARGUMENTS

Steps:
1. Run `git diff --cached` to see staged changes
2. Generate or use the provided commit message
3. Ensure the message follows conventional commits (feat:, fix:, chore:, etc.)
4. Create the commit

Code Review Skill

---
name: review
description: Review code changes in the current branch
user-invocable: true
disable-model-invocation: true
argument-hint: "[focus area]"
---

Review the code changes in the current branch compared to main.

Focus area: $ARGUMENTS

Steps:
1. Run `git diff main...HEAD` to see all changes
2. For each changed file, analyze:
   - Correctness: Are there bugs or logic errors?
   - Security: Any vulnerabilities introduced?
   - Performance: Any obvious performance issues?
   - Style: Does the code follow project conventions?
3. Provide a summary of findings

Subagent Skill

---
name: research
description: Research a topic using a subagent
user-invocable: true
context: fork
agent: researcher
argument-hint: "<topic>"
---

Research the following topic thoroughly: $ARGUMENTS

Provide a detailed summary with references.

See Also

  • Skills Concept — How the skill system works internally
  • Hooks — Lifecycle hooks for skills and agents
  • Permissions — How allowed-tools interacts with permissions

MCP Servers

Connecting and configuring Model Context Protocol (MCP) servers to extend agent capabilities.

Overview

MCP integration allows swarm to connect to external tool servers using the Model Context Protocol. Tools from MCP servers are registered in the tool registry with prefixed names and appear as regular tools to all agents.

How It Works

  1. Startup — The orchestrator reads mcpServers from your configuration and connects to each server
  2. Discovery — Each server's tools are listed and registered with prefixed names: mcp__<server>__<tool>
  3. Execution — When an agent calls an MCP tool, the McpProxyTool routes the call to the correct server
  4. Shutdown — All MCP servers are gracefully shut down when the session ends

MCP servers are started once per orchestrator session and shared across all agents.

Step 1: Add MCP Servers to Configuration

Add the mcpServers section to your project config in ~/.swarm/settings.json:

{
  "version": 2,
  "/home/user/my-project": {
    "agents": [ ... ],
    "mcpServers": {
      "server-name": {
        "transport": { ... },
        "env": { ... }
      }
    }
  }
}

Each server has a unique name (the key) and a configuration object with transport and optional env fields.

Step 2: Choose a Transport

Swarm supports three MCP transport types:

Stdio Transport

Spawns a child process and communicates via JSON-RPC over stdin/stdout. Best for locally installed tools.

{
  "mcpServers": {
    "filesystem": {
      "transport": {
        "type": "stdio",
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-filesystem", "."]
      }
    }
  }
}

| Field | Type | Required | Description |
|---|---|---|---|
| type | "stdio" | Yes | Transport discriminator |
| command | String | Yes | Command to execute |
| args | String[] | No | Command arguments |

HTTP Transport

Connects over Streamable HTTP. Best for remote servers with request-response semantics.

{
  "mcpServers": {
    "github": {
      "transport": {
        "type": "http",
        "url": "https://api.example.com/mcp/",
        "headers": {
          "Authorization": "Bearer ${GITHUB_TOKEN}"
        }
      }
    }
  }
}

| Field | Type | Required | Description |
|---|---|---|---|
| type | "http" | Yes | Transport discriminator |
| url | String | Yes | Server URL |
| headers | Map<String, String> | No | HTTP headers |

SSE Transport

Connects over Server-Sent Events (legacy EventSource). Best for servers that push updates.

{
  "mcpServers": {
    "events": {
      "transport": {
        "type": "sse",
        "url": "https://sse.example.com/events",
        "headers": {
          "X-Token": "abc123"
        }
      }
    }
  }
}

| Field | Type | Required | Description |
|---|---|---|---|
| type | "sse" | Yes | Transport discriminator |
| url | String | Yes | SSE endpoint URL |
| headers | Map<String, String> | No | HTTP headers |

Step 3: Configure Environment Variables (Optional)

MCP servers launched via stdio transport can receive environment variables:

{
  "mcpServers": {
    "database": {
      "transport": {
        "type": "stdio",
        "command": "npx",
        "args": ["-y", "@bytebase/dbhub"]
      },
      "env": {
        "DB_URL": "sqlite:///path/to/db",
        "DB_READ_ONLY": "true"
      }
    }
  }
}

The env map is injected into the child process environment when the server is spawned.

Step 4: Use MCP Tools

Once configured, MCP tools appear in the agent's tool registry with prefixed names:

mcp__<server_name>__<tool_name>

For example, a server named github that exposes a search_repos tool creates:

mcp__github__search_repos

Agents can call these tools like any other tool. The McpProxyTool handles routing the call to the correct server, serializing the input as JSON-RPC, and returning the result.

Tool Discovery

Each MCP server advertises its tools via the tools/list JSON-RPC method. Swarm calls this during startup and registers each tool with:

  • Name: mcp__<server>__<tool> (prefixed to avoid conflicts with built-in tools)
  • Description: As provided by the server
  • Input schema: As provided by the server
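
The prefixing itself is plain string composition, as in this sketch (the function name is ours):

```rust
/// Compose the registry name for an MCP tool: mcp__<server>__<tool>.
fn mcp_tool_name(server: &str, tool: &str) -> String {
    format!("mcp__{server}__{tool}")
}

fn main() {
    // A server named "github" exposing "search_repos" registers as:
    println!("{}", mcp_tool_name("github", "search_repos")); // mcp__github__search_repos
}
```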

Permission Integration

MCP tools respect the permission system. You can allow or deny specific MCP tools:

{
  "permissions": {
    "allow": ["mcp__filesystem__read_file(*)"],
    "deny": ["mcp__filesystem__write_file(*)"]
  }
}

JSON-RPC Protocol

Swarm communicates with MCP servers using JSON-RPC 2.0. The key message exchanges:

Initialize

// Request
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "initialize",
  "params": {
    "protocolVersion": "2025-03-26",
    "capabilities": {},
    "clientInfo": { "name": "swarm", "version": "0.1.0" }
  }
}

// Response
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "protocolVersion": "2025-03-26",
    "capabilities": { "tools": {} },
    "serverInfo": { "name": "server-name", "version": "1.0" }
  }
}

List Tools

// Request
{ "jsonrpc": "2.0", "id": 2, "method": "tools/list" }

// Response
{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "tools": [
      {
        "name": "search_repos",
        "description": "Search GitHub repositories",
        "inputSchema": { "type": "object", "properties": { ... } }
      }
    ]
  }
}

Call Tool

// Request
{
  "jsonrpc": "2.0",
  "id": 3,
  "method": "tools/call",
  "params": {
    "name": "search_repos",
    "arguments": { "query": "swarm" }
  }
}

// Response
{
  "jsonrpc": "2.0",
  "id": 3,
  "result": {
    "content": [
      { "type": "text", "text": "Found 3 repositories..." }
    ]
  }
}

Complete Example

A configuration with two MCP servers:

{
  "version": 2,
  "/home/user/my-project": {
    "agents": [
      {
        "name": "backend",
        "prompt": "You are a backend engineer with access to the database and filesystem tools."
      }
    ],
    "mcpServers": {
      "filesystem": {
        "transport": {
          "type": "stdio",
          "command": "npx",
          "args": ["-y", "@modelcontextprotocol/server-filesystem", "."]
        }
      },
      "database": {
        "transport": {
          "type": "stdio",
          "command": "npx",
          "args": ["-y", "@bytebase/dbhub"]
        },
        "env": {
          "DB_URL": "sqlite:///home/user/my-project/data.db"
        }
      }
    }
  }
}

This gives the backend agent access to:

  • mcp__filesystem__read_file
  • mcp__filesystem__write_file
  • mcp__filesystem__list_directory
  • mcp__database__query
  • mcp__database__list_tables
  • (and any other tools the servers expose)

Error Handling

  • Failed server: If an MCP server fails to start, swarm logs a warning and continues. Other servers and agents are not affected.
  • Failed tool listing: If tools/list fails for a server, its tools are not registered but the server remains connected.
  • Tool call error: Transport and JSON-RPC errors are returned as tool execution errors to the agent.
  • Transport retry: Transport errors are retried once before being returned as failures.

Troubleshooting

"Failed to connect to MCP server"

  • Verify the command is installed and in your PATH (for stdio transport)
  • Check that the URL is reachable (for http/sse transport)
  • Review the server's stderr output in swarm logs

Tools not appearing

  • Ensure the server's tools/list response is valid
  • Check that the server name doesn't contain characters that would break the prefix format
  • Verify the server completes initialization within the timeout

Permission denied for MCP tools

MCP tools use the prefixed name format for permission rules. Use mcp__<server>__<tool>(*) in your permission rules.

WASM Tools

Building and using WebAssembly sandboxed tools in swarm.

Overview

Swarm supports running tools as WebAssembly components in a sandboxed environment. WASM tools execute with strict resource limits and explicit capability grants, providing isolation from the host system. This feature is gated behind the wasm-sandbox Cargo feature flag.

Prerequisites

  • Rust toolchain with wasm32-wasip2 target installed
  • Swarm built with the wasm-sandbox feature flag
  • A WASM component file (.wasm) compiled from a WIT interface

Build Swarm with WASM Support

cargo build --release --features wasm-sandbox

Install the WASM Target

rustup target add wasm32-wasip2

Architecture

WASM tools run inside a Wasmtime runtime with the Component Model enabled. Each tool invocation gets a fresh Store with its own HostState, ensuring complete isolation between calls.

Agent → ToolRegistry → WasmTool → Wasmtime Store → WASM Component
                                       ↑
                                   HostState (capabilities, limits, secrets)

Key components:

| Component | Role |
|---|---|
| WasmRuntime | Engine lifecycle, component compilation, caching |
| WasmTool | Tool trait implementation, bridges registry to WASM |
| HostState | Per-invocation state with capability gating |
| CredentialInjector | Secret injection into headers, response redaction |
| ResourceLimits | Memory, CPU, I/O quotas |

Step 1: Define the WIT Interface

WASM tools implement the tool world defined in wit/tool.wit. The interface exposes these host functions:

| Host Function | Required Capability | Description |
|---|---|---|
| log(level, message) | Logging | Write a log entry |
| read-workspace-file(path) | WorkspaceRead | Read a file from the workspace |
| make-http-request(request) | HttpRequest | Make an HTTP request |
| invoke-tool(name, params) | ToolInvoke | Call another tool in the registry |
| secret-exists(name) | SecretCheck | Check if a secret exists |

The tool must export:

| Export | Signature | Description |
|---|---|---|
| name() | () -> String | Tool name |
| description() | () -> String | Tool description |
| schema() | () -> String | JSON Schema for input |
| execute(input) | (String) -> Result<String, String> | Execute with JSON input, return JSON result |

Step 2: Implement the Tool

Write your tool in Rust (or any language targeting WASM Component Model):

// In your WASM component crate
wit_bindgen::generate!({
    world: "tool",
    path: "../wit/tool.wit",
});

struct MyTool;

impl Guest for MyTool {
    fn name() -> String {
        "my-tool".to_string()
    }

    fn description() -> String {
        "A custom sandboxed tool".to_string()
    }

    fn schema() -> String {
        serde_json::json!({
            "type": "object",
            "properties": {
                "query": { "type": "string", "description": "Search query" }
            },
            "required": ["query"]
        }).to_string()
    }

    fn execute(input: String) -> Result<String, String> {
        let params: serde_json::Value = serde_json::from_str(&input)
            .map_err(|e| e.to_string())?;

        let query = params["query"].as_str().ok_or("missing query")?;

        // Use host functions (if capabilities are granted)
        host::log(LogLevel::Info, &format!("Searching for: {query}"));

        Ok(serde_json::json!({
            "result": format!("Results for: {query}")
        }).to_string())
    }
}

export!(MyTool);

Step 3: Compile to WASM

Build the component:

cargo build --target wasm32-wasip2 --release

The output .wasm file will be at target/wasm32-wasip2/release/my_tool.wasm.

Step 4: Configure the Tool

Add the tool to wasm_tools in your project configuration:

{
  "wasm_tools": [
    {
      "name": "my-tool",
      "path": "./tools/my_tool.wasm",
      "capabilities": ["Logging", "HttpRequest"],
      "limits": {
        "max_memory_bytes": 67108864,
        "fuel_limit": 1000000000,
        "execution_timeout_secs": 30
      },
      "secrets": ["API_TOKEN"],
      "workspace_prefixes": ["src/", "data/"],
      "endpoint_allowlist": ["api.example.com"]
    }
  ]
}

Configuration Fields

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| name | String | Yes |  | Tool name ([a-z][a-z0-9_-]*) |
| path | String | Yes |  | Path to .wasm component file |
| capabilities | String[] | No | [] | Granted capabilities |
| limits | WasmLimitsConfig | No | Defaults | Resource limit overrides |
| secrets | String[] | No | [] | Secret names the tool may query |
| workspace_prefixes | String[] | No | [] | Readable workspace path prefixes |
| endpoint_allowlist | String[] | No | [] | Allowed HTTP endpoint patterns |
| tool_aliases | Map<String, String> | No | {} | Aliases for invoke-tool calls |

Capabilities

Each capability unlocks a specific host function. Tools without a capability cannot call the corresponding function — attempts result in a CapabilityDenied error.

| Capability | Host Function | Description |
|---|---|---|
| Logging | log() | Write log entries (up to max_log_entries) |
| WorkspaceRead | read-workspace-file() | Read files within allowed prefixes |
| HttpRequest | make-http-request() | Make HTTP requests to allowed endpoints |
| ToolInvoke | invoke-tool() | Call other registered tools |
| SecretCheck | secret-exists() | Check if a named secret exists |

Grant only the capabilities your tool actually needs.

Resource Limits

Every WASM tool runs with resource limits to prevent runaway computation or excessive I/O:

| Limit | Default | Maximum | Description |
|---|---|---|---|
| max_memory_bytes | 64 MiB | 512 MiB | Maximum memory allocation |
| fuel_limit | 1,000,000,000 |  | Computation fuel (instruction budget) |
| execution_timeout_secs | 30 | 300 | Wall-clock timeout |
| max_log_entries | 1,000 |  | Max log entries per invocation |
| max_http_requests | 50 |  | Max HTTP requests per invocation |
| max_tool_invocations | 20 |  | Max tool calls per invocation |
| max_file_read_bytes | 10 MiB |  | Cumulative file read limit |

Exceeding a limit produces the corresponding error:

  • Memory: wasmtime memory allocation trap
  • Fuel: FuelExhausted error
  • Timeout: TimeoutExceeded error (enforced via epoch interruption at 100ms intervals)
  • Counters: RateLimitExceeded error

Security Model

File Access

The read-workspace-file function enforces multiple security layers:

  1. Path traversal prevention — Rejects paths containing .. components
  2. Canonicalization — Resolves symlinks and normalizes paths
  3. Boundary check — Canonical path must be within the workspace root
  4. Prefix matching — If workspace_prefixes is configured, the file must match at least one prefix
  5. Symlink escape detection — Detects symlinks pointing outside the workspace
  6. Size validation — Checks cumulative bytes read against max_file_read_bytes

HTTP Requests

The make-http-request function enforces:

  1. Endpoint allowlist — URL host must match a glob pattern in endpoint_allowlist. Empty allowlist denies all requests
  2. HTTPS enforcement — Non-HTTPS URLs are rejected (except localhost, 127.*, ::1)
  3. Credential injection$SECRET_NAME patterns in headers are replaced with actual values from the environment
  4. Response scanning — Leaked secret values in response bodies are redacted as [REDACTED:<name>]

Secrets

Secrets are sourced from environment variables, restricted to the names listed in secrets:

{
  "secrets": ["API_TOKEN", "DB_PASSWORD"]
}
  • secret-exists("API_TOKEN") → checks if API_TOKEN is set in the environment
  • HTTP header "Authorization": "Bearer $API_TOKEN" → injected with the actual value
  • Secret values appearing in HTTP response bodies are automatically redacted
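
Injection and redaction can be pictured with a small sketch (illustrative only; the function names are ours, not swarm's API):

```rust
/// Replace a `$NAME` placeholder in an outgoing header with the secret value.
fn inject(header_value: &str, name: &str, secret: &str) -> String {
    header_value.replace(&format!("${name}"), secret)
}

/// Mask any occurrence of the secret value in a response body.
fn redact(body: &str, name: &str, secret: &str) -> String {
    body.replace(secret, &format!("[REDACTED:{name}]"))
}

fn main() {
    let header = inject("Bearer $API_TOKEN", "API_TOKEN", "s3cr3t");
    let body = redact("server echoed: s3cr3t", "API_TOKEN", "s3cr3t");
    println!("{header}\n{body}");
}
```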

Error Types

WASM tools can produce these errors:

| Error | Cause |
|---|---|
| CompilationFailed | .wasm file failed to compile |
| InstantiationFailed | Missing imports or initialization failure |
| ExecutionTrapped | Guest code triggered a trap |
| FuelExhausted | Instruction budget exceeded |
| TimeoutExceeded | Epoch deadline exceeded |
| HostFunctionError | Host function returned an error |
| CapabilityDenied | Tool called a function without the required capability |
| RateLimitExceeded | Resource quota (HTTP requests, tool invocations, etc.) exceeded |

Complete Example

A weather lookup tool with HTTP access and logging:

Configuration:

{
  "wasm_tools": [
    {
      "name": "weather",
      "path": "./tools/weather.wasm",
      "capabilities": ["Logging", "HttpRequest"],
      "limits": {
        "max_http_requests": 5,
        "execution_timeout_secs": 15
      },
      "secrets": ["WEATHER_API_KEY"],
      "endpoint_allowlist": ["api.openweathermap.org"]
    }
  ]
}

Tool implementation (pseudocode):

fn execute(input: String) -> Result<String, String> {
    let params: Value = serde_json::from_str(&input).map_err(|e| e.to_string())?;
    let city = params["city"].as_str().ok_or("missing city")?;

    host::log(LogLevel::Info, &format!("Looking up weather for {city}"));

    let response = host::make_http_request(HttpRequest {
        method: HttpMethod::Get,
        url: format!(
            "https://api.openweathermap.org/data/2.5/weather?q={city}&appid=$WEATHER_API_KEY"
        ),
        headers: vec![],
        body: None,
    })?;

    Ok(response.body)
}

The $WEATHER_API_KEY placeholder in the URL is automatically injected, and any leaked key values in the response body are redacted.

Troubleshooting

"CompilationFailed"

  • Verify the .wasm file was built with wasm32-wasip2 target
  • Ensure the Component Model is enabled in your build
  • Check that all WIT imports are satisfied

"CapabilityDenied"

  • Add the required capability to the capabilities array in your config
  • Check spelling: capabilities are case-sensitive (Logging, not logging)

"RateLimitExceeded"

  • Increase the relevant limit in the limits config
  • Optimize your tool to make fewer requests/invocations

Tool not appearing in registry

  • Ensure swarm was built with --features wasm-sandbox
  • Verify the path is correct and the file exists
  • Check that name follows the [a-z][a-z0-9_-]* pattern

Hooks

Lifecycle hooks and event handling in swarm.

Overview

Hooks are shell commands that execute in response to lifecycle events during a swarm session. They can observe events (logging, notifications), modify behavior (blocking tool calls), or perform side effects (running scripts). Hooks are configured in ~/.swarm/settings.json under the hooks section.

How Hooks Work

  1. An event fires (e.g., PreToolUse before a tool executes)
  2. Swarm finds matching hooks by checking each MatcherGroup for the event
  3. Matching hooks receive a HookContext JSON object on stdin
  4. Sync hooks block until they return a HookDecision; async hooks fire and forget
  5. If any sync hook returns Deny or Block, the action is prevented

Step 1: Add Hooks to Configuration

Hooks are organized by event type in the hooks section:

{
  "hooks": {
    "pre_tool_use": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "./scripts/validate-bash.sh",
            "timeout": 10
          }
        ]
      }
    ],
    "session_start": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "./scripts/on-session-start.sh",
            "async": true
          }
        ]
      }
    ]
  }
}

Event Types

Swarm supports 13 hook event types:

| Event Type | Config Key | When It Fires |
|---|---|---|
| SessionStart | session_start | A new agent session begins |
| PreToolUse | pre_tool_use | Before a tool executes |
| PostToolUse | post_tool_use | After a tool executes successfully |
| PostToolUseFailure | post_tool_use_failure | After a tool execution fails |
| Stop | stop | Session stop is initiated |
| SessionEnd | session_end | An agent session ends |
| SubagentStart | subagent_start | A subagent is spawned |
| SubagentStop | subagent_stop | A subagent stops |
| TeammateIdle | teammate_idle | An agent has been idle past the nudge threshold |
| TeammateIdleWarning | teammate_idle_warning | An agent has been idle past the warning threshold |
| TaskCompleted | task_completed | An agent completes a task |
| Notification | notification | A notification is generated |
| StallDetected | stall_detected | Agent stall detected (no heartbeat past threshold) |

Step 2: Define Matcher Groups

Each event type has an array of MatcherGroup entries. A matcher group specifies:

  • matcher (optional) — A regex pattern matched against the tool name or event source. If omitted, matches all events of that type.
  • hooks — An array of hook handlers to execute when matched.
{
  "pre_tool_use": [
    {
      "matcher": "Bash",
      "hooks": [ ... ]
    },
    {
      "matcher": "Edit|Write",
      "hooks": [ ... ]
    },
    {
      "hooks": [ ... ]
    }
  ]
}

The first group matches only Bash tool calls, the second matches Edit or Write, and the third matches all tool calls (no matcher = match all).
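
The matcher semantics can be approximated with ordinary regular expressions. This is an illustrative sketch, not the framework's implementation; whether swarm uses full-match or substring-match semantics is an internal detail, so full-match is assumed here:

```python
import re
from typing import Optional

def matches(matcher: Optional[str], tool_name: str) -> bool:
    """Approximate MatcherGroup semantics: no matcher means match everything."""
    if matcher is None:
        return True
    return re.fullmatch(matcher, tool_name) is not None

# The three groups from the example above:
groups = ["Bash", "Edit|Write", None]
print([matches(m, "Bash") for m in groups])  # [True, False, True]
print([matches(m, "Edit") for m in groups])  # [False, True, True]
```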

Step 3: Write Hook Handlers

Each hook handler is a command type with these fields:

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| type | "command" | Yes | | Handler type (only command is supported) |
| command | String | Yes | | Shell command to execute (run via sh -c) |
| timeout | u64 | No | 30 | Timeout in seconds |
| async | bool | No | false | If true, fire-and-forget (no decision collected) |
| status_message | String | No | null | Display message while hook runs |

Example:

{
  "type": "command",
  "command": "./scripts/lint-check.sh",
  "timeout": 15,
  "status_message": "Running lint check..."
}

Step 4: Write Hook Scripts

Hook scripts receive a HookContext JSON object on stdin and optionally output a HookDecision JSON object on stdout.

HookContext (stdin)

The context object sent to every hook:

{
  "session_id": "20250115-a3f2",
  "cwd": "/home/user/my-project/.swarm/worktrees/backend",
  "hook_event_name": "PreToolUse",
  "tool_name": "Bash",
  "tool_input": {
    "command": "npm test"
  },
  "tool_response": null
}

| Field | Type | Present | Description |
|---|---|---|---|
| session_id | String | Always | Active session ID |
| cwd | String | Always | Agent working directory |
| hook_event_name | String | Always | Event name (e.g., "PreToolUse") |
| tool_name | String | Tool events only | Name of the tool |
| tool_input | JSON | Tool events only | Tool input payload |
| tool_response | JSON | PostToolUse only | Tool response payload |

Fields that don't apply to the event type are omitted from the JSON (not set to null).

Specialized Event Inputs

Some events provide additional context beyond the standard HookContext:

TeammateIdleWarning:

{
  "event": "TeammateIdleWarning",
  "agent_name": "backend",
  "session_id": "20250115-a3f2",
  "idle_duration_secs": 300,
  "nudge_count": 2,
  "last_state": "Running"
}

StallDetected:

{
  "event": "StallDetected",
  "agent_name": "backend",
  "session_id": "20250115-a3f2",
  "stall_duration_secs": 900,
  "last_heartbeat_event": "PostToolUse"
}

HookDecision (stdout)

Sync hooks can output a decision JSON to control the action:

{ "decision": "allow" }
{ "decision": "deny", "reason": "Command not allowed by policy" }
{ "decision": "block", "reason": "Security violation detected" }

| Decision | Effect |
|---|---|
| allow | Permits the action to proceed |
| deny | Prevents the action (with reason) |
| block | Prevents the action (with reason) |

Both deny and block prevent the action. The reason field is optional and logged for diagnostics.

Exit Code Behavior

| Exit Code | Behavior |
|---|---|
| 0 | Parse stdout for decision JSON. No output = implicit allow |
| 2 | Blocking error — treated as deny with stderr as reason |
| Other | Non-blocking error — logged but action proceeds |
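
Because exit code 2 carries the denial reason on stderr, a hook can enforce policy without emitting decision JSON at all. Hooks may be written in any language; this Python sketch assumes only the stdin/exit-code contract described above, and the ".env" rule is a hypothetical policy for illustration:

```python
import json
import sys
from typing import Tuple

def decide(context: dict) -> Tuple[int, str]:
    """Map a HookContext dict to (exit_code, stderr_reason).

    Exit 0 with no stdout is an implicit allow; exit 2 is a blocking
    error whose stderr output becomes the denial reason.
    """
    command = context.get("tool_input", {}).get("command", "")
    if ".env" in command:  # hypothetical policy rule
        return 2, "Policy: commands may not touch .env files"
    return 0, ""

# In a real hook script this would be driven by stdin:
#   code, reason = decide(json.load(sys.stdin))
#   if reason: print(reason, file=sys.stderr)
#   sys.exit(code)
```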

Sync vs Async Hooks

| Mode | Blocks Action | Returns Decision | Use Case |
|---|---|---|---|
| Sync (async: false) | Yes | Yes | Validation, policy enforcement |
| Async (async: true) | No | No | Logging, notifications, side effects |

Sync hooks within a single event run in parallel. If any sync hook returns deny or block, the action is prevented and all denial reasons are aggregated.

Async hooks are spawned as background tasks. They don't contribute decisions and their success or failure doesn't affect the action.

Example Scripts

Pre-Tool-Use Validator

Block dangerous bash commands:

#!/bin/bash
# scripts/validate-bash.sh
# Blocks rm -rf and other dangerous commands

CONTEXT=$(cat)
COMMAND=$(echo "$CONTEXT" | jq -r '.tool_input.command // empty')

if echo "$COMMAND" | grep -qE 'rm\s+-rf\s+/'; then
  echo '{"decision": "deny", "reason": "Refusing to run rm -rf on root paths"}'
  exit 0
fi

echo '{"decision": "allow"}'

Session Start Logger

Log session starts to a file:

#!/bin/bash
# scripts/on-session-start.sh
CONTEXT=$(cat)
SESSION_ID=$(echo "$CONTEXT" | jq -r '.session_id')
echo "[$(date)] Session started: $SESSION_ID" >> /tmp/swarm-sessions.log

Post-Tool-Use Notifier

Send a notification after tool failures:

#!/bin/bash
# scripts/notify-failure.sh
CONTEXT=$(cat)
TOOL=$(echo "$CONTEXT" | jq -r '.tool_name')
echo "[$(date)] Tool failed: $TOOL" >> /tmp/swarm-failures.log

Stall Detector

Auto-restart stalled agents:

#!/bin/bash
# scripts/handle-stall.sh
CONTEXT=$(cat)
AGENT=$(echo "$CONTEXT" | jq -r '.agent_name')
DURATION=$(echo "$CONTEXT" | jq -r '.stall_duration_secs')
echo "[$(date)] Agent $AGENT stalled for ${DURATION}s" >> /tmp/swarm-stalls.log

Complete Configuration Example

{
  "hooks": {
    "session_start": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "./scripts/on-session-start.sh",
            "async": true
          }
        ]
      }
    ],
    "pre_tool_use": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "./scripts/validate-bash.sh",
            "timeout": 10,
            "status_message": "Validating command..."
          }
        ]
      }
    ],
    "post_tool_use_failure": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "./scripts/notify-failure.sh",
            "async": true
          }
        ]
      }
    ],
    "stall_detected": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "./scripts/handle-stall.sh",
            "timeout": 5
          }
        ]
      }
    ]
  }
}

Troubleshooting

Hook not firing

  • Verify the event type key in config matches the event (e.g., pre_tool_use, not preToolUse)
  • Check the matcher regex matches the tool name or event source
  • Ensure the script is executable (chmod +x scripts/my-hook.sh)

Hook timing out

  • The default timeout is 30 seconds. Increase it with the timeout field.
  • On timeout, the child process is killed and the hook is treated as a non-blocking error.

Decision not being applied

  • Async hooks cannot return decisions — set async: false for policy enforcement
  • Ensure exit code is 0 and stdout contains valid JSON
  • Exit code 2 is a special blocking error; other non-zero codes are non-blocking

Multiple hooks conflicting

Sync hooks for the same event run in parallel. If any returns deny or block, the action is prevented. Reasons from all denying hooks are aggregated.

Beads Workflow

Using beads for issue tracking and workflow management in swarm.

Overview

Beads (bd) is an external CLI tool used by swarm for issue tracking and task management. It provides a git-native issue tracking system where issues are stored as files in a dedicated branch, making them accessible to both humans and AI agents without requiring an external service.

Swarm integrates with beads at two levels:

  1. Prompt injection — Available tasks are included in each agent's system prompt
  2. Agent workflow — Agents use bd commands to claim and close tasks during their sessions

Prerequisites

Beads must be installed before using swarm:

# Install beads
cargo install bd

# Verify installation
bd --version

Swarm checks for the bd binary at startup. If it's not found, swarm exits with an error and installation instructions.

How It Works

Task Discovery

During prompt building, swarm runs bd ready --json to fetch available (unclaimed) tasks. The results are injected into section 13 of the prompt pipeline:

## Available Tasks (from `bd ready`)

- #42: Implement health check endpoint [priority: high]
- #43: Add input validation to API routes [priority: medium]
- #51: Write integration tests for auth flow [priority: low]

This gives each agent awareness of what work is available without any manual coordination.
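
The injected section could be assembled from the JSON output with a few lines of code. The following sketch assumes a task shape of `id`/`title`/`priority` fields, which is purely illustrative; check the actual output of `bd ready --json` against your installed version:

```python
import json

# Hypothetical `bd ready --json` output; real field names may differ.
raw = """[
  {"id": 42, "title": "Implement health check endpoint", "priority": "high"},
  {"id": 43, "title": "Add input validation to API routes", "priority": "medium"}
]"""

def render_tasks(tasks_json: str) -> str:
    """Render ready tasks as the markdown list injected into agent prompts."""
    lines = ["## Available Tasks (from `bd ready`)", ""]
    for task in json.loads(tasks_json):
        lines.append(f"- #{task['id']}: {task['title']} [priority: {task['priority']}]")
    return "\n".join(lines)

print(render_tasks(raw))
```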

Agent Workflow

Agents interact with beads using these commands:

| Command | Description |
|---|---|
| bd ready | List available (unclaimed) tasks |
| bd show <id> | View full task details |
| bd update <id> --status in_progress | Claim a task |
| bd close <id> | Mark a task as complete |
| bd sync | Synchronize with the shared beads branch |

A typical agent workflow:

  1. Agent sees available tasks in its prompt
  2. Claims a task with bd update <id> --status in_progress
  3. Works on the task (writing code, running tests)
  4. Closes the task with bd close <id>
  5. Next prompt cycle picks up remaining tasks

Shared Beads Branch

All agents in a session share a beads branch at:

swarm/<session-id>/beads

This branch uses optimistic concurrency — agents read and write independently, and conflicts are resolved on sync. This works well because:

  • Issue files are small and rarely edited by multiple agents simultaneously
  • bd sync handles merge conflicts automatically
  • The worst case is a brief delay before an agent sees another's claim

Getting Started

Step 1: Initialize Beads

In your project directory:

bd onboard

This sets up the beads branch and initial configuration.

Step 2: Create Tasks

Create tasks for your project:

bd create "Implement health check endpoint" --priority high
bd create "Add input validation to API routes" --priority medium
bd create "Write integration tests for auth flow" --priority low

Or create tasks from spec documents:

bd create "Implement feature X per specs/042-feature-x.md"

Step 3: Start a Swarm Session

swarm start

Agents will automatically see available tasks in their prompts and can claim work.

Step 4: Monitor Progress

Check task status:

bd ready        # See unclaimed tasks
bd list         # See all tasks
bd show <id>    # View task details

Spec-Driven Workflow

Swarm's recommended workflow (from CLAUDE.md) is spec-driven:

  1. Plan — Design the feature or fix
  2. Write specs — Create spec documents in specs/ capturing contracts
  3. File beads tasks — Create tasks referencing the specs
  4. Run swarm — Agents pick up tasks and implement them
  5. Review — Verify agent work against the specs

This approach ensures agents have clear, well-defined tasks with acceptance criteria, rather than vague instructions.

Example Spec-to-Task Flow

  1. Write a spec:
<!-- specs/042-health-check.md -->
# Health Check Endpoint

## Contract
- GET /health returns 200 with JSON body `{"status": "ok"}`
- Response includes `uptime_seconds` field
- Endpoint requires no authentication
  2. File tasks from the spec:
bd create "Implement GET /health endpoint per specs/042-health-check.md"
bd create "Add health check integration test per specs/042-health-check.md"
  3. Start swarm — agents see the tasks, read the referenced spec, and implement accordingly.

Session Completion

When ending a work session, the recommended workflow is:

  1. File issues for remaining work — Create beads tasks for anything unfinished
  2. Update issue status — Close finished tasks, update in-progress items
  3. Sync and push:
bd sync
git push

This ensures the next session (or the next developer) has clear context on what's done and what remains.

Commands Reference

| Command | Description |
|---|---|
| bd onboard | Initialize beads for a repository |
| bd create <title> | Create a new task |
| bd ready | List unclaimed tasks |
| bd ready --json | List unclaimed tasks as JSON (used by prompt builder) |
| bd list | List all tasks |
| bd show <id> | Show task details |
| bd update <id> --status <status> | Update task status |
| bd close <id> | Close a completed task |
| bd sync | Sync with shared beads branch |

Troubleshooting

"bd: command not found"

Install beads with cargo install bd and ensure it's in your PATH.

Tasks not appearing in agent prompts

  • Verify bd ready returns tasks when run manually
  • Check that bd ready --json produces valid JSON output
  • Ensure beads is initialized in the project (bd onboard)

Agents claiming the same task

This can happen due to the optimistic concurrency model. The impact is usually minor — both agents do the same work, and the duplicate can be discarded at review time. To reduce this:

  • Use specific task descriptions that make it clear which agent should handle them
  • Structure tasks around agent specializations (backend tasks for the backend agent, etc.)

Sync conflicts

bd sync handles most conflicts automatically. If a conflict can't be auto-resolved:

  1. Check bd list for the current state
  2. Manually resolve any remaining conflicts
  3. Run bd sync again

CLI Reference

Complete reference for all swarm CLI commands and flags.

Usage

swarm <COMMAND>

Commands

swarm init

Initialize swarm configuration for a project. Creates ~/.swarm/settings.json with a template entry if it doesn't exist.

| Flag | Type | Default | Description |
|---|---|---|---|
| --path | PathBuf | . | Path to the project directory |

swarm start

Start a swarm session. Executes the 13-step orchestrator start flow.

| Flag | Type | Default | Description |
|---|---|---|---|
| --stash | bool | false | Auto-stash uncommitted changes before starting |
| --init | bool | false | Initialize git repo if not already initialized |
| --no-tui | bool | false | Run in headless mode (no terminal UI) |

swarm stop

Stop a running swarm session. Sends SIGTERM to the orchestrator and waits for graceful shutdown.

| Flag | Type | Default | Description |
|---|---|---|---|
| --merge | bool | false | Merge agent branches into base (default if no flag set) |
| --squash | bool | false | Squash-merge agent branches |
| --discard | bool | false | Discard agent branches without merging |

The --merge, --squash, and --discard flags are mutually exclusive. If none are specified, merge is the default behavior.

swarm status

Show session status including agent states, uptime, and beads summary.

| Flag | Type | Default | Description |
|---|---|---|---|
| --json | bool | false | Output in JSON format |

swarm logs

View agent logs.

| Argument | Type | Required | Description |
|---|---|---|---|
| agent | String | Yes | Agent name |

| Flag | Type | Default | Description |
|---|---|---|---|
| --follow | bool | false | Tail the log (like tail -f) |
| --session | u32 | null | View archived session log instead of current |

swarm send

Send a message to a specific agent.

| Argument | Type | Required | Description |
|---|---|---|---|
| agent | String | Yes | Recipient agent name |
| message | String | Yes | Message body |

| Flag | Type | Default | Description |
|---|---|---|---|
| --urgent | bool | false | Mark as urgent (triggers interrupt) |

swarm broadcast

Send a message to all agents.

| Argument | Type | Required | Description |
|---|---|---|---|
| message | String | Yes | Message body |

| Flag | Type | Default | Description |
|---|---|---|---|
| --urgent | bool | false | Mark as urgent |

swarm config

Show the resolved configuration for the current project.

| Flag | Type | Default | Description |
|---|---|---|---|
| --json | bool | false | Output raw JSON instead of formatted |

swarm clean

Clean stale swarm artifacts (worktrees, branches, session files).

| Flag | Type | Default | Description |
|---|---|---|---|
| --force | bool | false | Remove artifacts without confirmation |

swarm workflow

Manage workflow definitions and runs. This is a subcommand group:

swarm workflow list

List available workflow definitions.

swarm workflow run

Start a workflow run.

| Argument | Type | Required | Description |
|---|---|---|---|
| name | String | Yes | Workflow name |

| Flag | Type | Default | Description |
|---|---|---|---|
| --input | KEY=VALUE | [] | Input key=value pairs (repeatable) |

swarm workflow status

Show status of running/completed workflows.

| Argument | Type | Required | Description |
|---|---|---|---|
| run_id | String | No | Specific run ID (omit to list all) |

| Flag | Type | Default | Description |
|---|---|---|---|
| --json | bool | false | Output in JSON format |

swarm workflow approve

Approve a human-approval gate.

| Argument | Type | Required | Description |
|---|---|---|---|
| run_id | String | Yes | Workflow run ID |
| stage | String | Yes | Stage name |

swarm workflow reject

Reject a gate with optional feedback.

| Argument | Type | Required | Description |
|---|---|---|---|
| run_id | String | Yes | Workflow run ID |
| stage | String | Yes | Stage name |

| Flag | Type | Default | Description |
|---|---|---|---|
| --feedback | String | null | Feedback message |

swarm workflow retry

Manually retry a failed stage.

| Argument | Type | Required | Description |
|---|---|---|---|
| run_id | String | Yes | Workflow run ID |
| stage | String | Yes | Stage name |

swarm workflow cancel

Cancel a running workflow.

| Argument | Type | Required | Description |
|---|---|---|---|
| run_id | String | Yes | Workflow run ID |

swarm workflow show

Show detailed run info including outputs.

| Argument | Type | Required | Description |
|---|---|---|---|
| run_id | String | Yes | Workflow run ID |

swarm workflow validate

Validate a workflow definition.

| Argument | Type | Required | Description |
|---|---|---|---|
| name | String | Yes | Workflow name |

swarm iterate

Run an iteration loop from a configuration file.

| Flag | Type | Default | Description |
|---|---|---|---|
| --config | String | null | Iteration config name (looked up in .swarm/iterations/<name>.yml) |
| --config-file | PathBuf | null | Path to iteration config YAML file |
| --resume | String | null | Resume a previous iteration run |
| --max-iterations | u32 | null | Override max_iterations from config |
| --dry-run | bool | false | Validate config and show plan without executing |
| --no-tui | bool | false | Run without TUI |
| --json | bool | false | Output progress as JSON lines |

The --config and --config-file flags are mutually exclusive.

Config Schema

Complete reference for the ~/.swarm/settings.json configuration file.

Top-Level Structure

{
  "version": 2,
  "<project_path>": { ... }
}

| Field | Type | Required | Description |
|---|---|---|---|
| version | u64 | Yes | Schema version (must be 1 or 2) |
| <project_path> | ProjectConfig | Yes | Project config keyed by absolute, canonicalized path |

Multiple projects can be configured in the same file. Each is keyed by its absolute path.

ProjectConfig

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| agents | AgentConfig[] | Yes | | At least one agent definition |
| supervisor | SupervisorConfig | No | null | Supervisor configuration |
| defaults | DefaultsConfig | No | null | Project-wide defaults |
| providers | Map<String, ProviderConfig> | No | Implicit "default" | Named provider definitions |
| permissions | PermissionsConfig | No | null | Project-level permission rules |
| hooks | HooksConfig | No | null | Hook event handlers |
| mcpServers | Map<String, McpServerConfig> | No | null | MCP server connections |
| wasm_tools | WasmToolConfig[] | No | null | WASM sandboxed tools |
| sub_agent_defaults | SubAgentLimits | No | null | Default sub-agent spawning limits |

AgentConfig

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| name | String | Yes | | Unique name ([a-z][a-z0-9-]*) |
| prompt | String | Yes | | System prompt text or @path/to/file |
| model | String | No | defaults.model / "sonnet" | Model identifier |
| provider | String | No | defaults.provider / "default" | Provider name reference |
| permissions | PermissionsConfig | No | null | Agent-level permission overrides |
| delegate_mode | bool | No | false | Legacy delegate mode flag |
| mode | String | No | See cascade | Agent execution mode |

SupervisorConfig

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| prompt | String | No | Built-in merge prompt | Custom supervisor prompt or @path |
| model | String | No | defaults.model | Model identifier |

DefaultsConfig

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| model | String | No | "sonnet" | Default model |
| provider | String | No | "default" | Default provider reference |
| session_timeout | u64 | No | null | Session timeout (seconds) |
| commit_interval | u64 | No | 300 | Auto-commit interval (seconds) |
| max_consecutive_errors | u32 | No | 5 | Consecutive error limit |
| max_total_errors | u32 | No | 20 | Total error limit |
| mode | String | No | null | Default agent mode |
| liveness | LivenessConfig | No | See below | Liveness monitoring |

ProviderConfig

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| type | String | Yes | | Provider type ("anthropic") |
| api_key_env | String | No | "ANTHROPIC_API_KEY" | API key env var name |
| base_url | String | No | null | Custom API base URL |
| max_retries | u32 | No | null | Max retries for transient errors |
| timeout | u64 | No | null | Request timeout (seconds) |

PermissionsConfig

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| allow | String[] | No | [] | Allow rules (e.g., "Bash(npm run *)") |
| ask | String[] | No | [] | Ask rules |
| deny | String[] | No | [] | Deny rules |
| default_mode | String | No | null | Default permission mode |

LivenessConfig

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| enabled | bool | No | true | Enable liveness monitoring |
| idle_nudge_after_secs | u64 | No | 120 | Seconds before first nudge |
| idle_nudge_interval_secs | u64 | No | 300 | Seconds between nudges |
| max_nudges | u32 | No | 3 | Max nudges per idle episode |
| idle_warn_after_secs | u64 | No | 600 | Seconds before warning hook |
| stall_timeout_secs | u64 | No | 900 | Seconds before stall detection |
| auto_interrupt_stalled | bool | No | false | Auto-interrupt on stall |

McpServerConfig

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| transport | McpTransport | Yes | | Transport configuration |
| env | Map<String, String> | No | {} | Environment variables |

McpTransport (tagged union)

Stdio:

| Field | Type | Required | Description |
|---|---|---|---|
| type | "stdio" | Yes | Transport discriminator |
| command | String | Yes | Command to execute |
| args | String[] | No | Command arguments |

HTTP:

| Field | Type | Required | Description |
|---|---|---|---|
| type | "http" | Yes | Transport discriminator |
| url | String | Yes | Server URL |
| headers | Map<String, String> | No | HTTP headers |

SSE:

| Field | Type | Required | Description |
|---|---|---|---|
| type | "sse" | Yes | Transport discriminator |
| url | String | Yes | SSE endpoint URL |
| headers | Map<String, String> | No | HTTP headers |

WasmToolConfig

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| name | String | Yes | | Tool name ([a-z][a-z0-9_-]*) |
| path | String | Yes | | Path to .wasm component file |
| capabilities | String[] | No | [] | Granted capabilities |
| limits | WasmLimitsConfig | No | null | Resource limit overrides |
| secrets | String[] | No | [] | Secret names the tool may query |
| workspace_prefixes | String[] | No | [] | Readable workspace paths |
| endpoint_allowlist | String[] | No | [] | Allowed HTTP endpoints |
| tool_aliases | Map<String, String> | No | {} | Tool name aliases |

WasmLimitsConfig

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| max_memory_bytes | usize | No | Runtime default | Max memory allocation |
| fuel_limit | u64 | No | Runtime default | Computation fuel limit |
| execution_timeout_secs | u64 | No | Runtime default | Execution timeout |
| max_log_entries | usize | No | Runtime default | Max log entries |
| max_http_requests | usize | No | Runtime default | Max HTTP requests |
| max_tool_invocations | usize | No | Runtime default | Max tool invocations |
| max_file_read_bytes | usize | No | Runtime default | Max file read size |

Valid Capabilities

  • Logging — Write log entries
  • WorkspaceRead — Read files from workspace
  • HttpRequest — Make HTTP requests
  • ToolInvoke — Invoke other tools
  • SecretCheck — Check secret existence

Complete Example

{
  "version": 2,
  "/home/user/my-project": {
    "providers": {
      "default": {
        "type": "anthropic",
        "api_key_env": "ANTHROPIC_API_KEY"
      }
    },
    "defaults": {
      "model": "sonnet",
      "commit_interval": 300,
      "max_consecutive_errors": 5,
      "liveness": {
        "enabled": true,
        "idle_nudge_after_secs": 120
      }
    },
    "agents": [
      {
        "name": "backend",
        "prompt": "@prompts/backend.md",
        "model": "sonnet"
      },
      {
        "name": "frontend",
        "prompt": "You are a frontend engineer.",
        "permissions": {
          "allow": ["Bash(npm *)"],
          "deny": ["Bash(rm *)"]
        }
      }
    ],
    "permissions": {
      "allow": ["Read(*)", "Glob(*)"],
      "default_mode": "default"
    },
    "mcpServers": {
      "filesystem": {
        "transport": {
          "type": "stdio",
          "command": "npx",
          "args": ["-y", "@modelcontextprotocol/server-filesystem", "."]
        }
      }
    }
  }
}

Environment Variables

Reference for all environment variables used by swarm — both user-provided and runtime-injected.

User-Provided Variables

These variables must be set in your environment before running swarm:

| Variable | Required | Description |
|---|---|---|
| ANTHROPIC_API_KEY | Yes (default) | API key for the Anthropic provider. The env var name can be overridden per-provider via api_key_env in config. |
| BRAVE_SEARCH_API_KEY | No | API key for the Brave Search tool (used by web_search) |
| HOME | Yes | User home directory. Used to locate ~/.swarm/settings.json and global skills at ~/.claude/skills/. |

Runtime-Injected Variables

These variables are set by swarm and injected into each agent's backend session environment:

| Variable | Value | Description |
|---|---|---|
| SWARM_AGENT_ID | Agent name (e.g., "backend") | Identifies the current agent within the swarm session |
| SWARM_SESSION_ID | Session ID (e.g., "20250115-a3f2") | Unique identifier for the current session |
| SWARM_DB_PATH | Path (e.g., .swarm/messages.db) | Absolute path to the SQLite mailbox database |
| SWARM_AGENTS | Comma-separated names (e.g., "backend,frontend,reviewer") | List of all agent names in the session |

These variables allow agent tools and hooks to interact with the swarm infrastructure programmatically.
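
As an illustration, a tool or hook script might collect the injected variables like this. The fallback values are the author's choice for running outside a swarm session, not part of the contract:

```python
import os

def swarm_env() -> dict:
    """Collect the SWARM_* variables, tolerating a non-swarm environment."""
    agents = os.environ.get("SWARM_AGENTS", "")
    return {
        "agent_id": os.environ.get("SWARM_AGENT_ID", "unknown"),
        "session_id": os.environ.get("SWARM_SESSION_ID", "unknown"),
        "db_path": os.environ.get("SWARM_DB_PATH", ".swarm/messages.db"),
        "agents": [a for a in agents.split(",") if a],
    }

# Example: simulate the injected environment, then read it back.
os.environ.update({
    "SWARM_AGENT_ID": "backend",
    "SWARM_AGENTS": "backend,frontend,reviewer",
})
env = swarm_env()
print(env["agent_id"], env["agents"])
```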

Logging

| Variable | Default | Description |
|---|---|---|
| RUST_LOG | info | Controls log verbosity via the tracing crate. Set to debug for detailed output, warn for quieter operation. Supports per-module filtering (e.g., swarm=debug,rusqlite=warn). |

Provider-Specific Variables

The API key environment variable name is configurable per provider:

{
  "providers": {
    "default": {
      "type": "anthropic",
      "api_key_env": "ANTHROPIC_API_KEY"
    },
    "custom": {
      "type": "anthropic",
      "api_key_env": "CUSTOM_ANTHROPIC_KEY"
    }
  }
}

MCP Server Variables

MCP servers configured with the env field receive those environment variables when launched:

{
  "mcpServers": {
    "myserver": {
      "transport": { "type": "stdio", "command": "my-mcp-server" },
      "env": {
        "MY_SERVER_TOKEN": "secret123"
      }
    }
  }
}

Error Types

Swarm defines 6 error enums using thiserror, each covering a distinct subsystem. All error types implement std::error::Error and can be converted to anyhow::Error at the orchestrator level.

ConfigError

Configuration errors — fatal at startup.

| Variant | Display Message | When It Occurs |
|---|---|---|
| MissingFile { path } | config file not found at {path} | ~/.swarm/settings.json doesn't exist |
| ParseError { reason } | failed to parse config: {reason} | JSON syntax error, file read failure, or path canonicalization failure |
| ValidationError { reason } | config validation failed: {reason} | Empty agents list, invalid names, duplicate names, bad provider references, invalid WASM capabilities |
| VersionMismatch { found, expected } | config version {found} is not supported (expected {expected}) | Config version is newer than supported |

GitError

Git prerequisite and worktree operation errors — fatal at startup or during worktree ops.

| Variant | Display Message | When It Occurs |
|---|---|---|
| NotARepo { path } | {path} is not a git repository | Project directory is not a git repo |
| DirtyTree | working tree has uncommitted changes; commit or stash first | Uncommitted changes without --stash flag |
| WorktreeOp { reason } | git worktree operation failed: {reason} | Worktree create/remove/lock/unlock/merge failure, HEAD is detached |
| VersionTooOld { found, required } | git version {found} is too old; swarm requires git >= {required} | Git version below 2.20 |

MailboxError

Mailbox and SQLite errors — DB-open is fatal, transient locks are retried.

| Variant | Display Message | When It Occurs |
|---|---|---|
| DbOpen { reason } | failed to open mailbox database: {reason} | SQLite connection failure, schema creation failure, invalid enum values |
| DbLocked { reason } | mailbox database is locked: {reason} | WAL lock contention, query failure |
| UnknownAgent { name } | unknown agent: {name} | Message sent to non-existent agent |
| SelfSend | agent cannot send a message to itself | Agent tries to send a message to itself |
| NotFound { id } | message not found: {id} | Reply to a non-existent message ID |

AgentError

Agent lifecycle errors — spawn failures may be retried with backoff.

| Variant | Display Message | When It Occurs |
|---|---|---|
| SpawnFailed { name, reason } | failed to spawn agent {name}: {reason} | Backend session failed to start |
| Timeout { name, seconds } | agent {name} timed out after {seconds}s | Agent session exceeded timeout |

WorkflowError

Workflow definition errors — parse, validation, or inheritance failures.

| Variant | Display Message | When It Occurs |
|---|---|---|
| NotFound { name } | workflow not found: {name} | Referenced workflow doesn't exist |
| ParseError { reason } | failed to parse workflow: {reason} | YAML/JSON parse failure |
| ValidationError { reason } | workflow validation failed: {reason} | General validation failure |
| CyclicDependency { cycle } | circular dependency detected in workflow stages: {cycle} | Stage dependency graph has a cycle |
| CyclicInheritance { chain } | circular inheritance detected: {chain} | Workflow extends chain has a cycle |
| DuplicateStage { name } | duplicate stage name: {name} | Two stages have the same name |
| InvalidStageName { name } | invalid stage name '{name}': must match [a-z][a-z0-9_-]* | Stage name contains invalid characters |
| UnknownDependency { stage, dep } | unknown dependency '{dep}' in stage '{stage}' | Stage depends on non-existent stage |
| ParentNotFound { child, parent } | parent workflow '{parent}' not found for extends in '{child}' | extends references missing workflow |
| CyclicWorkflowRef { chain } | circular workflow_ref detected: {chain} | Nested workflow references form a cycle |
| MissingWorkflowRef { stage } | workflow_ref stage '{stage}' is missing required 'workflow' field | Stage type is workflow_ref but no workflow specified |
| IoError { reason } | I/O error: {reason} | File system errors during workflow loading |

SessionError

Session lifecycle errors — stale sessions prompt user action.

| Variant | Display Message | When It Occurs |
|---|---|---|
| ActiveSession { id, pid } | session {id} is already active (pid {pid}) | Attempting to create session when one already exists |
| StaleLockfile { path } | stale lockfile found at {path}; another session may be running | Lockfile exists but process state is ambiguous |
| RecoveryNeeded | previous session did not shut down cleanly; recovery is needed | Session artifacts remain from crashed session |
| IoError { reason } | session I/O error: {reason} | File read/write/delete failures |

Error Handling Strategy

Swarm follows ADR-009:

  • Library code uses thiserror enums for precise error types
  • Binary/orchestrator uses anyhow for ergonomic error propagation
  • All thiserror types automatically convert to anyhow::Error via the ? operator

State Transitions

Complete transition table for the agent state machine defined in agent::state.

Transition Table

| From State | Event | To State | Side Effect |
|---|---|---|---|
| Initializing | WorktreeReady | BuildingPrompt | None |
| BuildingPrompt | PromptReady(prompt) | Spawning | StorePrompt(prompt) |
| Spawning | SessionStarted(seq) | Running { session_seq: seq } | None (resets consecutive_errors to 0) |
| Spawning | SessionExited(Error) | CoolingDown or Stopped | None or LogFatal (if threshold reached) |
| Spawning | SessionExited(Timeout) | CoolingDown or Stopped | None or LogFatal (if threshold reached) |
| Running | SessionExited(Success) | SessionComplete | None (resets consecutive_errors to 0) |
| Running | SessionExited(Error) | CoolingDown or Stopped | None or LogFatal (if threshold reached) |
| Running | SessionExited(Timeout) | CoolingDown or Stopped | None or LogFatal (if threshold reached) |
| Running | UrgentMessage | Interrupting { session_seq } | CancelSession |
| Interrupting | SessionExited(*) | BuildingPrompt | None (no error increment) |
| Interrupting | GraceExceeded | BuildingPrompt | ForceStopSession |
| SessionComplete | WorktreeReady | BuildingPrompt | IncrementSession (bumps session_seq) |
| CoolingDown | BackoffElapsed | BuildingPrompt | None |
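
The happy-path rows of the table can be sketched as a lookup (states and events reduced to strings, error counters and side effects omitted; the real implementation is a Rust state machine in agent::state, and treating unknown combinations as no-ops is a simplification made here):

```python
# (from_state, event) -> to_state, for a subset of the transition table
TRANSITIONS = {
    ("Initializing", "WorktreeReady"): "BuildingPrompt",
    ("BuildingPrompt", "PromptReady"): "Spawning",
    ("Spawning", "SessionStarted"): "Running",
    ("Running", "SessionExited(Success)"): "SessionComplete",
    ("Running", "UrgentMessage"): "Interrupting",
    ("Interrupting", "SessionExited"): "BuildingPrompt",
    ("SessionComplete", "WorktreeReady"): "BuildingPrompt",
    ("CoolingDown", "BackoffElapsed"): "BuildingPrompt",
}

def step(state: str, event: str) -> str:
    if event in ("OperatorStop", "FatalError"):    # valid from any state
        return "Stopped"
    return TRANSITIONS.get((state, event), state)  # unknown combo: stay put

# Walk one successful session lifecycle.
state = "Initializing"
for ev in ["WorktreeReady", "PromptReady", "SessionStarted", "SessionExited(Success)"]:
    state = step(state, ev)
print(state)  # SessionComplete
```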

Global Transitions

These events are valid from any state:

| Event | To State | Side Effect |
|---|---|---|
| OperatorStop | Stopped | CancelSession if currently Running or Interrupting; None otherwise |
| FatalError(msg) | Stopped | LogFatal(msg) |

Error Counters

The state machine maintains two counters that affect transitions:

| Counter | Incremented On | Reset On | Fatal Threshold |
|---|---|---|---|
| consecutive_errors | Any SessionExited(Error/Timeout) from Running or Spawning | SessionStarted or SessionExited(Success) | Default: 5 (max_consecutive_errors) |
| total_errors | Any SessionExited(Error/Timeout) from Running or Spawning | Never | Default: 20 (max_total_errors) |

When a counter reaches its threshold, the state transitions to Stopped with LogFatal instead of CoolingDown.

Note: SessionExited(*) from Interrupting does not increment error counters — interrupts are intentional, not errors.

Backoff Formula

The CoolingDown duration uses exponential backoff:

duration_ms = min(2000 * 2^(n-1), 60000)

Where n = consecutive_errors after increment.

| n | Duration |
|---|---|
| 1 | 2,000 ms (2s) |
| 2 | 4,000 ms (4s) |
| 3 | 8,000 ms (8s) |
| 4 | 16,000 ms (16s) |
| 5 | 32,000 ms (32s) |
| 6+ | 60,000 ms (60s, cap) |
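
The formula and table agree, as a one-liner confirms:

```python
def backoff_ms(n: int) -> int:
    """CoolingDown duration after the n-th consecutive error (n >= 1)."""
    return min(2000 * 2 ** (n - 1), 60_000)

print([backoff_ms(n) for n in range(1, 8)])
# [2000, 4000, 8000, 16000, 32000, 60000, 60000]
```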

State Diagram

Initializing ──WorktreeReady──► BuildingPrompt ◄──BackoffElapsed── CoolingDown
                                     │                                  ▲
                                     │ PromptReady                      │
                                     ▼                                  │
                                  Spawning ──Error/Timeout──────────────┤
                                     │                                  │
                                     │ SessionStarted                   │
                                     ▼                                  │
                                  Running ──Error/Timeout───────────────┘
                                   │   │
                      Success      │   │ UrgentMessage
                         │         │   ▼
                         ▼         │ Interrupting ──SessionExited──► BuildingPrompt
                  SessionComplete  │              ──GraceExceeded──► BuildingPrompt
                         │         │
                         │ WorktreeReady
                         └──────────────────────────────────────────► BuildingPrompt

                  OperatorStop or FatalError from ANY state ──────► Stopped

ExitOutcome

The ExitOutcome enum describes how a backend session ended:

| Variant | Description |
|---|---|
| Success | Session completed normally |
| Error(String) | Session failed with an error message |
| Timeout | Session exceeded its timeout |

Message Schema

Complete SQLite database schema for the swarm messaging system.

Database Location

.swarm/messages.db

PRAGMAs

Set on every connection open:

PRAGMA journal_mode = WAL;
PRAGMA busy_timeout = 5000;

| PRAGMA | Value | Purpose |
|---|---|---|
| journal_mode | WAL | Write-Ahead Logging for concurrent readers |
| busy_timeout | 5000 | Wait up to 5 seconds on lock contention before returning SQLITE_BUSY |

Table: messages

CREATE TABLE IF NOT EXISTS messages (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    thread_id    INTEGER REFERENCES messages(id),
    reply_to     INTEGER REFERENCES messages(id),
    sender       TEXT    NOT NULL,
    recipient    TEXT    NOT NULL,
    msg_type     TEXT    NOT NULL DEFAULT 'message',
    urgency      TEXT    NOT NULL DEFAULT 'normal',
    body         TEXT    NOT NULL,
    created_at   INTEGER NOT NULL,
    delivered_at INTEGER
);

Column Details

| Column | Type | Nullable | Default | Description |
|---|---|---|---|---|
| id | INTEGER | No | Auto-increment | Primary key |
| thread_id | INTEGER | Yes | NULL | References root message in thread |
| reply_to | INTEGER | Yes | NULL | References the message being replied to |
| sender | TEXT | No | | Sending agent name or "operator" |
| recipient | TEXT | No | | Receiving agent name |
| msg_type | TEXT | No | 'message' | One of: message, task, status, nudge |
| urgency | TEXT | No | 'normal' | One of: normal, urgent |
| body | TEXT | No | | Message content |
| created_at | INTEGER | No | | Epoch nanoseconds (creation time) |
| delivered_at | INTEGER | Yes | NULL | Epoch nanoseconds (delivery time); NULL = pending |
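
For illustration, one row maps naturally onto a Rust struct. The field names below simply mirror the columns; this is a sketch, not the codebase's actual type:

```rust
/// A Rust-side view of one `messages` row (illustrative sketch only).
#[derive(Debug)]
struct Message {
    id: i64,
    thread_id: Option<i64>,    // NULL for thread roots
    reply_to: Option<i64>,     // NULL unless this is a reply
    sender: String,
    recipient: String,
    msg_type: String,          // message | task | status | nudge
    urgency: String,           // normal | urgent
    body: String,
    created_at: i64,           // epoch nanoseconds
    delivered_at: Option<i64>, // None (NULL) = still pending
}

impl Message {
    /// A message is pending until delivered_at is set.
    fn is_pending(&self) -> bool {
        self.delivered_at.is_none()
    }
}

fn main() {
    let m = Message {
        id: 1, thread_id: None, reply_to: None,
        sender: "backend".into(), recipient: "frontend".into(),
        msg_type: "message".into(), urgency: "normal".into(),
        body: "hello".into(), created_at: 0, delivered_at: None,
    };
    println!("pending: {}", m.is_pending());
}
```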

Enum Values

msg_type:

  • message — General inter-agent communication
  • task — Task assignment or delegation
  • status — Status updates
  • nudge — Liveness nudge

urgency:

  • normal — Standard delivery on next prompt build
  • urgent — Triggers router interrupt

Indexes

CREATE INDEX IF NOT EXISTS idx_messages_recipient_pending
    ON messages (recipient, delivered_at)
    WHERE delivered_at IS NULL;

CREATE INDEX IF NOT EXISTS idx_messages_urgency_pending
    ON messages (urgency, delivered_at)
    WHERE delivered_at IS NULL AND urgency = 'urgent';

CREATE INDEX IF NOT EXISTS idx_messages_thread
    ON messages (thread_id) WHERE thread_id IS NOT NULL;

| Index | Columns | Filter | Used By |
|---|---|---|---|
| idx_messages_recipient_pending | (recipient, delivered_at) | delivered_at IS NULL | consume() — fetch pending messages for an agent |
| idx_messages_urgency_pending | (urgency, delivered_at) | delivered_at IS NULL AND urgency = 'urgent' | poll_urgent() — router polling for urgent messages |
| idx_messages_thread | (thread_id) | thread_id IS NOT NULL | thread() — retrieve all messages in a thread |

Key Queries

Send Message

INSERT INTO messages (sender, recipient, msg_type, urgency, body, created_at)
VALUES (?1, ?2, ?3, ?4, ?5, ?6)

Reply to Message

INSERT INTO messages (thread_id, reply_to, sender, recipient, msg_type, urgency, body, created_at)
VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8)

The thread_id is inherited from the original message's thread (or uses the original's ID as the root).
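
That inheritance rule can be sketched as a pure function (`thread_root` is an illustrative name, not the actual function):

```rust
/// Compute the thread_id to store on a reply: inherit the original
/// message's thread root, or use the original's own id when it is
/// itself the root of the thread. Illustrative sketch only.
fn thread_root(original_id: i64, original_thread_id: Option<i64>) -> i64 {
    original_thread_id.unwrap_or(original_id)
}

fn main() {
    // Replying to a thread root (id 42, thread_id NULL): root is 42.
    println!("{}", thread_root(42, None));
    // Replying to a reply (id 57, thread_id 42): root stays 42.
    println!("{}", thread_root(57, Some(42)));
}
```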

Consume Pending Messages

-- Read pending messages
SELECT * FROM messages WHERE recipient = ?1 AND delivered_at IS NULL ORDER BY created_at;

-- Mark as delivered
UPDATE messages SET delivered_at = ?1 WHERE recipient = ?2 AND delivered_at IS NULL;

Both operations happen in a single transaction for atomicity.

Poll Urgent Messages

SELECT id, recipient FROM messages
WHERE urgency = 'urgent' AND delivered_at IS NULL;

Used by the router every 100ms to find messages that need interrupt delivery.

Prune Old Messages

DELETE FROM messages
WHERE delivered_at IS NOT NULL
  AND id NOT IN (
    SELECT id FROM messages
    WHERE delivered_at IS NOT NULL
    ORDER BY delivered_at DESC
    LIMIT 1000
  );

Keeps the 1000 most recently delivered messages; runs every 300 seconds.

WAL Checkpoint

PRAGMA wal_checkpoint(TRUNCATE);

Runs every 60 seconds to reclaim WAL file space.

ADR-001: Tokio Async Runtime

Status

Accepted

Context

Swarm orchestrates multiple concurrent agent processes, polls SQLite for messages, manages timers (backoff, checkpoints), and handles OS signals. We need an async runtime to avoid blocking the main thread.

Decision

Use Tokio with full features (rt-multi-thread, macros, process, signal, time, sync, io-util, fs).

Alternatives Considered

| Alternative | Why rejected |
|---|---|
| async-std | Smaller ecosystem, less mature process/signal support |
| Threads only (no async) | Would require manual thread pools; select-like logic is painful |
| smol | Lighter, but we need Tokio's process::Command and signal support |

Consequences

  • All I/O-heavy functions are async fn.
  • rusqlite is synchronous — wrap calls in tokio::task::spawn_blocking or accept brief blocking on the runtime (acceptable since DB ops are <1ms with WAL).
  • Process spawning uses tokio::process::Command.
  • The TUI event loop must integrate with Tokio (crossterm polling + tokio::select).

Tradeoffs

  • Tokio is a large dependency but well-maintained.
  • Learning curve for contributors unfamiliar with async Rust.

ADR-002: SQLite WAL for Inter-Agent Messaging

Status

Accepted

Context

Agents need to send messages to each other. The messaging system must support:

  • Direct messages (agent-to-agent)
  • Broadcasts (one-to-all)
  • Urgency levels (normal vs urgent)
  • Delivery tracking (consumed vs pending)
  • Concurrent readers/writers (orchestrator + N agents + swarm-msg)

Decision

Use SQLite in WAL mode as the messaging transport, stored at .swarm/messages.db.

Alternatives Considered

| Alternative | Why rejected |
|---|---|
| File-based (one file per message) | Race conditions, no atomicity, glob storms |
| Unix domain sockets | Requires a broker process, more complex |
| Redis/NATS | External dependency, overkill for local orchestration |
| Named pipes / FIFOs | No persistence, no multi-reader, fragile |
| Shared memory | Complex, no persistence |

Key Pragmas

PRAGMA journal_mode = WAL;       -- concurrent reads during writes
PRAGMA busy_timeout = 5000;      -- wait up to 5s if locked
PRAGMA synchronous = NORMAL;     -- safe with WAL, faster than FULL

Consequences

  • Single file for all messaging state — easy to inspect, backup, clean.
  • swarm-msg binary can write directly to DB without going through orchestrator.
  • WAL allows concurrent reads (agents polling) while writer inserts.
  • Must checkpoint periodically to prevent WAL file growth.
  • Must prune old delivered messages to bound DB size.

Invariants

  • A message is inserted once and never modified except to mark delivered_at.
  • consume_messages() reads + marks delivered in a single transaction.
  • Self-send is rejected at insert time (sender == recipient).
  • Broadcast creates N-1 rows (one per recipient, excluding sender).
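
The last two invariants can be sketched as pure checks (function names here are illustrative, assumed not to match the codebase):

```rust
/// Reject self-sends at insert time, per the invariant above.
fn validate_send(sender: &str, recipient: &str) -> Result<(), String> {
    if sender == recipient {
        return Err(format!("self-send rejected: {sender}"));
    }
    Ok(())
}

/// Broadcast fans out to every registered agent except the sender,
/// producing N-1 recipients for N agents.
fn broadcast_recipients<'a>(sender: &str, agents: &'a [&'a str]) -> Vec<&'a str> {
    agents.iter().copied().filter(|a| *a != sender).collect()
}

fn main() {
    let agents = ["backend", "frontend", "reviewer"];
    println!("{:?}", broadcast_recipients("backend", &agents));
    println!("{:?}", validate_send("backend", "backend"));
}
```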

ADR-003: AgentBackend Trait Abstraction

Status

Accepted

Context

Currently we target Claude Code CLI (claude -p) as the agent backend. However, we want the architecture to support future backends (e.g., other LLM CLIs, HTTP APIs, local models) without rewriting the agent engine.

Decision

Define an AgentBackend trait and an AgentProcess trait. The orchestrator interacts only with these traits.

Trait Design

trait AgentBackend: Send + Sync {
    fn id(&self) -> &str;
    fn spawn_session(&self, config: &SessionConfig) -> Result<Box<dyn AgentProcess>>;
}

trait AgentProcess: Send {
    fn pid(&self) -> u32;
    async fn wait(&mut self) -> ExitOutcome;
    fn terminate(&self) -> Result<()>;  // SIGTERM
    fn kill(&self) -> Result<()>;       // SIGKILL
    fn supports_injection(&self) -> bool;
}

Current Implementation: ClaudeBackend

  • Spawns claude -p <prompt> --dangerously-skip-permissions --model <model> --output-format json
  • Sets working directory to agent's worktree
  • Injects SWARM_* environment variables
  • Isolates process in own process group via setsid (pre_exec)
  • Redirects stdout+stderr to agent's current.log

Alternatives Considered

| Alternative | Why rejected |
|---|---|
| Direct Claude coupling | No testing, no future extensibility |
| Plugin system (dylib) | Over-engineered for v1 |
| Config-driven command template | Less type-safe, harder to test |

Consequences

  • Easy to write a MockBackend for integration tests.
  • Adding a new backend requires only implementing two traits.
  • supports_injection() returns false for Claude (no stdin injection); future backends may return true.
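
To illustrate the first point, a mock backend might look like the following. This is a simplified, synchronous sketch with scripted outcomes; the real traits are async, and all names here are assumptions:

```rust
use std::collections::VecDeque;

#[derive(Debug, Clone, PartialEq)]
enum ExitOutcome {
    Success,
    Error { code: Option<i32>, signal: Option<i32> },
    Timeout,
}

/// A mock "process" that immediately reports a scripted outcome,
/// letting orchestrator logic be exercised without spawning a real CLI.
struct MockProcess {
    outcome: ExitOutcome,
}

impl MockProcess {
    fn wait(&mut self) -> ExitOutcome {
        self.outcome.clone()
    }
}

struct MockBackend {
    scripted: VecDeque<ExitOutcome>,
}

impl MockBackend {
    fn new(outcomes: Vec<ExitOutcome>) -> Self {
        Self { scripted: outcomes.into() }
    }

    /// Each spawned session consumes the next scripted outcome;
    /// an exhausted script defaults to Success.
    fn spawn_session(&mut self) -> MockProcess {
        let outcome = self.scripted.pop_front().unwrap_or(ExitOutcome::Success);
        MockProcess { outcome }
    }
}

fn main() {
    let mut backend = MockBackend::new(vec![
        ExitOutcome::Success,
        ExitOutcome::Error { code: Some(1), signal: None },
    ]);
    println!("{:?}", backend.spawn_session().wait());
    println!("{:?}", backend.spawn_session().wait());
}
```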

ExitOutcome

enum ExitOutcome {
    Success,
    Error { code: Option<i32>, signal: Option<i32> },
    Timeout,
}

ADR-004: Fresh Agent Sessions (No --resume)

Status

Accepted

Context

Claude Code CLI supports --resume to continue a previous conversation. We must decide whether agents resume or start fresh each iteration.

Decision

Use fresh sessions every time. Each agent invocation is claude -p <prompt> with no --resume flag.

Rationale

  • Prompt injection is our primary control mechanism — we assemble a new prompt each iteration with current messages, beads state, and context.
  • --resume would carry stale context and make it harder to steer agents.
  • Fresh sessions give us a clean slate with precise control over what the agent sees.
  • Session sequence number (session_seq) tracks iterations for logging.

Tradeoffs

| Aspect | Fresh sessions | --resume |
|---|---|---|
| Context control | Full | Partial (can't unsay things) |
| Token cost | Higher (re-inject system prompt) | Lower |
| Simplicity | Simpler orchestrator logic | More complex state tracking |
| Failure recovery | Clean restart | Must handle corrupt resume state |

Consequences

  • Every prompt must be self-contained (all context re-injected).
  • The 5-stage prompt pipeline (system + user prompt + messages + beads + session context) runs every iteration.
  • session_seq increments per spawn, used for log archival naming.
  • No need to manage conversation IDs or resume tokens.

ADR-005: Foreground Process (Not Daemon)

Status

Accepted

Context

The orchestrator could run as a background daemon or a foreground process attached to a terminal.

Decision

Run as a foreground process. The user starts swarm start in a terminal and the TUI takes over. The process exits when the user quits or sends SIGINT/SIGTERM.

Rationale

  • Simpler lifecycle management — no PID files, no orphan daemons.
  • TUI requires a terminal anyway.
  • Users can use tmux/screen for persistence.
  • --no-tui mode still runs in foreground, just without the UI.

Alternatives Considered

| Alternative | Why rejected |
|---|---|
| Daemon with swarm attach | Complex (double-fork, PID management, socket for TUI attach) |
| systemd service | Over-engineered, not portable |

Consequences

  • Session lockfile contains the orchestrator PID for staleness detection.
  • swarm stop from another terminal sends SIGTERM to the orchestrator PID.
  • Signal handler catches SIGINT/SIGTERM and triggers graceful shutdown.
  • swarm status checks if the PID is alive to determine session liveness.

ADR-006: Git Worktree Isolation Per Agent

Status

Accepted

Context

Multiple agents work on the same repository concurrently. They must not interfere with each other's file changes.

Decision

Each agent gets its own git worktree branching from the base commit. The supervisor also gets its own worktree for integration.

Branch Naming

  • Agent branches: swarm/<session-id>/<agent-name>
  • Supervisor branch: swarm/<session-id>/supervisor
  • Beads branch: swarm/<session-id>/beads (shared, optimistic concurrency)
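
The naming scheme can be sketched as small helpers (illustrative names, not the actual functions):

```rust
/// Branch names derived from the session id, per the documented scheme.
/// Illustrative helpers; the real codebase's functions may differ.
fn agent_branch(session_id: &str, agent: &str) -> String {
    format!("swarm/{session_id}/{agent}")
}

fn supervisor_branch(session_id: &str) -> String {
    format!("swarm/{session_id}/supervisor")
}

fn beads_branch(session_id: &str) -> String {
    format!("swarm/{session_id}/beads")
}

fn main() {
    println!("{}", agent_branch("abc123", "backend"));
    println!("{}", supervisor_branch("abc123"));
    println!("{}", beads_branch("abc123"));
}
```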

Worktree Location

Worktrees are created under .swarm/worktrees/<agent-name>/.

Lifecycle

  1. swarm start: Create worktrees for all agents + supervisor from HEAD.
  2. During session: Agents commit to their own branches freely.
  3. swarm stop:
    • Auto-commit any dirty worktrees.
    • Based on stop mode (merge/squash/discard), combine branches.
    • Remove worktrees and delete branches.

Locking

Worktrees are locked (git worktree lock) during active sessions to prevent accidental removal.

Alternatives Considered

| Alternative | Why rejected |
|---|---|
| Separate clones | Wastes disk space, slow |
| Shared working directory with stash | Race conditions, impossible with concurrent agents |
| Docker containers per agent | Heavy, slow startup |

Consequences

  • Requires git >= 2.20 for reliable worktree support.
  • .swarm/ must be in .git/info/exclude to avoid tracking.
  • Recovery must handle orphaned worktrees from crashed sessions.
  • Worktree creation/removal is async (tokio::process::Command).

Stop Modes

| Mode | Behavior |
|---|---|
| --merge | Merge each agent branch into the original branch (default) |
| --squash | Squash-merge each agent's work into a single commit |
| --discard | Delete branches without merging |

ADR-007: Full TUI From Day One

Status

Accepted

Context

We could build a headless-only MVP first and add a TUI later, or build the TUI as a core feature from the start.

Decision

Build the full Ratatui TUI as a core feature. swarm start launches the TUI by default; --no-tui provides headless mode.

Rationale

  • Observability is critical for a multi-agent system — users need to see what agents are doing in real time.
  • Building the TUI later would require retrofitting state observation APIs.
  • Ratatui is lightweight and well-suited for this use case.

Stack

  • Ratatui for layout/rendering
  • Crossterm as the terminal backend

Consequences

  • TUI event loop must integrate with Tokio (poll crossterm events alongside async tasks).
  • Agent states are exposed via watch::Receiver<AgentState> channels for the TUI to observe.
  • Log viewing is file-based (tail agent's current.log).
  • The TUI is the primary shutdown trigger (user presses q).

ADR-008: Beads CLI as External Dependency

Status

Accepted

Context

Swarm agents need a task/work-item system to claim, work on, and close units of work. The beads CLI (bd) provides this functionality.

Decision

Require beads CLI (bd) to be pre-installed. Swarm does not bundle or install it. If bd is not found at startup, error with installation instructions.

Integration Points

  1. Prompt builder (Stage 4): Runs bd ready --json to get available tasks, includes in agent prompt.
  2. Agent prompt instructions: Agents are told to use bd claim, bd close as part of their workflow.
  3. Status command: May query beads summary for swarm status output.
  4. Shared beads branch: swarm/<session-id>/beads with optimistic concurrency.

Alternatives Considered

| Alternative | Why rejected |
|---|---|
| Built-in task system | Reinventing the wheel; beads already works well |
| Bundle beads as a library | Tight coupling, harder to update independently |
| Make beads optional | Core workflow depends on task assignment |

Consequences

  • Pre-flight check at swarm start: verify bd is in PATH and functional.
  • Beads state is captured as a subprocess call (stdout parsing).
  • Beads branch conflicts are handled by optimistic retry.

ADR-009: Error Handling Strategy

Status

Accepted

Context

The swarm orchestrator has multiple error domains (config, git, messaging, agents, sessions) that surface at different layers. We need a consistent strategy for error propagation, user-facing messages, and recovery.

Decision

Use anyhow for application-level error propagation and thiserror for domain-specific error enums in errors.rs.

Domain Error Types

// All defined in swarm/src/errors.rs
enum ConfigError    { MissingFile, ParseError, ValidationError, VersionMismatch }
enum GitError       { NotARepo, DirtyTree, WorktreeOp, VersionTooOld }
enum MessagingError { DbOpen, DbLocked, UnknownAgent, SelfSend }
enum AgentError     { SpawnFailed, BinaryNotFound, Timeout }
enum SessionError   { StaleLockfile, RecoveryNeeded }

Each variant carries a human-readable message via #[error("...")].

Propagation Rules

  1. Library-style modules (config.rs, messaging.rs, session.rs, worktree.rs) return Result<T, SpecificError> using domain error types.
  2. Orchestrator and runner use anyhow::Result and convert via ? (anyhow's blanket From impl accepts any std::error::Error, which thiserror derives).
  3. CLI layer (main.rs) catches anyhow::Error, prints user-friendly messages, and sets exit codes.
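
A dependency-free sketch of this layering, using std's Error trait and Box<dyn std::error::Error> as stand-ins for thiserror and anyhow (the variant shown borrows from the GitError enum above; the function names are assumptions):

```rust
use std::fmt;

// Typed domain error. In the real code, thiserror derives Display/Error.
#[derive(Debug)]
enum GitError {
    VersionTooOld(String),
}

impl fmt::Display for GitError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            GitError::VersionTooOld(v) => {
                write!(f, "Git version {v} is too old. Swarm requires git >= 2.20.")
            }
        }
    }
}

impl std::error::Error for GitError {}

/// Library-style check returning the typed error (name assumed).
fn check_git_version(version: &str) -> Result<(), GitError> {
    let mut parts = version.split('.').map(|p| p.parse::<u32>().unwrap_or(0));
    let (major, minor) = (parts.next().unwrap_or(0), parts.next().unwrap_or(0));
    if (major, minor) >= (2, 20) {
        Ok(())
    } else {
        Err(GitError::VersionTooOld(version.to_string()))
    }
}

/// Application layer: `?` converts GitError into the boxed error,
/// just as thiserror types convert into anyhow::Error.
fn preflight(version: &str) -> Result<(), Box<dyn std::error::Error>> {
    check_git_version(version)?;
    Ok(())
}

fn main() {
    if let Err(e) = preflight("2.17") {
        eprintln!("{e}");
    }
}
```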

User-Facing Error Messages

  • All errors surfaced to the user include: what failed, why, and what to do.
  • Example: "Git version 2.17 is too old. Swarm requires git >= 2.20. Please upgrade git."
  • Internal errors (panics, unexpected states) are logged with full context to orchestrator.log and shown to the user as "internal error, see .swarm/orchestrator.log".

Recovery vs Fatal

| Category | Behavior |
|---|---|
| Config errors | Fatal at startup. Fix config and retry. |
| Git prereq errors | Fatal at startup. Fix environment and retry. |
| Agent spawn errors | Per-agent retry with backoff (see state machine). |
| Messaging DB errors | Fatal if DB can't open. Transient locks retried via busy_timeout. |
| Session stale | Prompt user: recover or clean and restart. |

Alternatives Considered

  1. anyhow only (no domain types): Simpler but loses structured matching for tests and recovery logic.
  2. Custom error type with enum: More boilerplate than thiserror for the same result.
  3. eyre + color-eyre: Better panic reports but adds dependency; anyhow is sufficient with tracing.

Consequences

  • Domain errors are testable via pattern matching.
  • anyhow provides clean ?-chaining in orchestrator code.
  • Error messages are consistent and actionable.
  • thiserror derives keep boilerplate minimal.

ADR-010: Shared Beads Branch with Optimistic Concurrency

Status

Accepted

Context

Swarm agents need to coordinate task assignment through beads. Each agent runs in its own git worktree on its own branch. The beads database (.beads/) must be accessible to all agents for claiming, closing, and querying tasks.

The question: how do agents share beads state when they're on different branches?

Decision

Use a shared beads branch (swarm/<session-id>/beads) that all agent worktrees can access. Beads operations use optimistic concurrency: agents read from the branch, make changes, and push. If a push fails (another agent pushed first), the agent rebases and retries.

Mechanism

  1. At session start, create_beads_branch() creates swarm/<session>/beads from HEAD.
  2. Each agent's prompt includes beads state read from this shared branch.
  3. When an agent runs bd claim or bd close, beads writes to the local .beads/ directory and commits to the shared beads branch.
  4. If two agents try to update beads simultaneously, one push will fail. The beads CLI handles the retry (pull-rebase-push).
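
The retry loop in step 4 can be sketched abstractly over closures. An illustrative sketch; the real retry logic lives in the beads CLI, not in swarm:

```rust
/// Optimistic pull-rebase-push: try to push; on rejection, rebase onto
/// the winner's changes and retry, up to a bounded number of attempts.
fn push_with_retry(
    mut try_push: impl FnMut() -> bool,
    mut pull_rebase: impl FnMut(),
    max_attempts: u32,
) -> bool {
    for _ in 0..max_attempts {
        if try_push() {
            return true;
        }
        pull_rebase(); // another agent pushed first; re-apply on top
    }
    false
}

fn main() {
    // Simulate two rejected pushes before the third one lands.
    let mut failures_left = 2;
    let ok = push_with_retry(
        || {
            if failures_left > 0 {
                failures_left -= 1;
                false
            } else {
                true
            }
        },
        || {},
        5,
    );
    println!("pushed: {ok}");
}
```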

Why not per-agent beads?

Per-agent beads would require a merge step and could lead to conflicting claims (two agents claiming the same bead). A shared branch with optimistic concurrency prevents double-claims at the git level.

Alternatives Considered

1. Beads in SQLite (alongside messages)

  • Pro: No git contention, fast.
  • Con: Beads is an external tool with its own storage. Duplicating its state in SQLite would require constant syncing. Beads' git-native design is a feature, not a limitation.

2. Per-agent beads with supervisor merge

  • Pro: No contention between agents.
  • Con: Double-claims are possible (two agents claim same bead before supervisor merges). Adds complexity to supervisor and delays visibility.

3. File-locking beads

  • Pro: Simple mutual exclusion.
  • Con: Worktrees are on different filesystem paths. File locks don't work across worktrees easily. Also fragile under crash scenarios.

Consequences

  • Agents may occasionally see a brief retry delay when beads pushes conflict.
  • The shared branch is created/deleted per session (no permanent branch pollution).
  • The beads CLI must be pre-installed and support the shared branch workflow.
  • The prompt builder reads beads state from the shared branch (Stage 4 of prompt pipeline).

Tradeoffs

| Aspect | Impact |
|---|---|
| Simplicity | Medium — leverages git's existing conflict resolution |
| Correctness | High — git push prevents double-claims |
| Performance | Slightly slower than SQLite for high-contention scenarios |
| Crash safety | Good — git reflog provides recovery |

Development

Development setup, build process, and contribution workflow for swarm.

Prerequisites

| Tool | Version | Purpose |
|---|---|---|
| Rust | Latest stable | Build the project |
| Git | 2.20+ | Required for worktree operations |
| bd (beads) | Latest | Issue tracking |

Clone and Build

git clone <repo-url> swarm
cd swarm
cargo build

The workspace has a single member crate at swarm/.

Build with WASM Support

To enable the WASM sandbox feature:

cargo build --features wasm-sandbox

This requires wasmtime v41 and adds significant compile time. Only enable it if you're working on WASM tool functionality.

Release Build

cargo build --release

Project Structure

swarm/
├── swarm/                    # Main crate
│   ├── src/
│   │   ├── lib.rs            # Module declarations (22+ pub modules)
│   │   ├── main.rs           # Entry point
│   │   ├── cli.rs            # CLI argument parsing (clap)
│   │   ├── config.rs         # Configuration loading and validation
│   │   ├── orchestrator.rs   # Session lifecycle management
│   │   ├── session.rs        # Session info and lockfile
│   │   ├── router.rs         # Message routing and urgent polling
│   │   ├── mailbox.rs        # SQLite messaging
│   │   ├── worktree.rs       # Git worktree operations
│   │   ├── prompt.rs         # 14-section prompt pipeline
│   │   ├── errors.rs         # 6 error enums (thiserror)
│   │   ├── agent/
│   │   │   ├── state.rs      # Agent state machine (8 states)
│   │   │   ├── runner.rs     # Agent execution loop
│   │   │   └── registry.rs   # Agent registry
│   │   ├── tools/
│   │   │   ├── mod.rs        # Tool trait, ToolResult, ExecutionMode
│   │   │   ├── registry.rs   # ToolRegistry
│   │   │   ├── context.rs    # ToolContext
│   │   │   ├── bash.rs       # BashTool
│   │   │   ├── read.rs       # ReadTool
│   │   │   ├── write.rs      # WriteTool
│   │   │   ├── edit.rs       # EditTool
│   │   │   ├── glob.rs       # GlobTool
│   │   │   ├── grep.rs       # GrepTool
│   │   │   ├── web_fetch.rs  # WebFetchTool
│   │   │   ├── web_search.rs # WebSearchTool
│   │   │   ├── mailbox.rs    # MailboxTool
│   │   │   ├── task.rs       # TaskTool
│   │   │   ├── ask_user.rs   # AskUserTool
│   │   │   ├── skill.rs      # SkillTool
│   │   │   ├── notebook_edit.rs # NotebookEditTool
│   │   │   └── wasm/         # WASM sandbox (feature-gated)
│   │   ├── skills/           # Skill discovery and resolution
│   │   ├── permissions/      # Permission evaluation
│   │   ├── mcp/              # MCP server integration
│   │   ├── hooks/            # Lifecycle hooks
│   │   └── tui/              # Terminal UI (ratatui)
│   └── Cargo.toml
├── specs/                    # Specification documents
├── docs/                     # mdBook documentation
├── wit/                      # WASM Interface Types
├── Cargo.toml                # Workspace root
└── CLAUDE.md                 # Agent instructions

Key Dependencies

| Crate | Purpose |
|---|---|
| tokio | Async runtime (full features) |
| clap | CLI argument parsing |
| ratatui + crossterm | Terminal UI |
| rusqlite (bundled) | SQLite for messaging |
| anthropic | Anthropic API SDK |
| reqwest | HTTP client (rustls-tls) |
| serde + serde_json | Serialization |
| anyhow + thiserror | Error handling |
| tracing | Structured logging |
| wasmtime (optional) | WASM runtime |

Running Tests

cargo test

Run tests for a specific module:

cargo test --lib tools::bash

Run with logging:

RUST_LOG=debug cargo test -- --nocapture

Logging

Swarm uses the tracing crate with tracing-subscriber. Control verbosity via RUST_LOG:

# Default
RUST_LOG=info cargo run -- start

# Debug output
RUST_LOG=debug cargo run -- start

# Per-module filtering
RUST_LOG=swarm=debug,rusqlite=warn cargo run -- start

# Quiet
RUST_LOG=warn cargo run -- start

Error Handling Conventions

Swarm follows ADR-009:

  • Library code: Use thiserror enums for precise, typed errors. There are 6 error enums in errors.rs covering config, git, mailbox, agent, workflow, and session errors.
  • Binary/orchestrator: Use anyhow for ergonomic error propagation with the ? operator.
  • All thiserror types automatically convert to anyhow::Error.

Contribution Workflow

Swarm uses a spec-first workflow. See Spec Workflow for full details.

Quick Summary

  1. Pick a task: bd ready
  2. Claim it: bd update <id> --status in_progress
  3. If the task involves new design, write specs first in specs/
  4. Implement the code
  5. Write tests
  6. Run quality gates: cargo test && cargo clippy
  7. Close the task: bd close <id>
  8. Commit and push:
    git add <files>
    git commit -m "feat: description of change"
    git pull --rebase
    bd sync
    git push

Commit Messages

Use conventional commits:

| Prefix | When to Use |
|---|---|
| feat: | New feature |
| fix: | Bug fix |
| refactor: | Code restructuring without behavior change |
| docs: | Documentation only |
| test: | Adding or updating tests |
| chore: | Build, CI, tooling changes |

Session Completion

When ending a work session, you must complete the landing sequence:

  1. File issues for remaining work (bd create)
  2. Run quality gates (cargo test, cargo clippy)
  3. Update issue status (bd close, bd update)
  4. Push to remote:
    git pull --rebase
    bd sync
    git push
    git status  # Must show "up to date with origin"
    

Work is not complete until git push succeeds.

CI

The project uses GitHub Actions for CI. The docs pipeline is configured in .github/workflows/docs.yml for building and deploying the mdBook site.

Quality gates that should pass before pushing:

  • cargo build — Successful compilation
  • cargo test — All tests pass
  • cargo clippy — No lint warnings

Spec Workflow

How to create and manage specification documents in the swarm project.

Overview

Swarm uses a spec-first development process. Before implementing a feature or making significant changes, you write specification documents that capture the design, contracts, and invariants. These specs serve as the authoritative source of truth for implementation and are stored in specs/.

Why Specs First?

  1. Clarity — Forces you to think through the design before writing code
  2. Reviewability — Easier to review a design document than a large code PR
  3. Agent-friendly — AI agents can read specs to understand what to implement
  4. Traceability — Each implementation task links back to its spec

Spec Categories

The specs/ directory contains three categories of documents:

Architecture Decision Records (ADRs)

ADRs document significant architectural choices with rationale. They live in docs/src/adr/ and are numbered sequentially:

docs/src/adr/
  001-tokio-async.md
  002-sqlite-messaging.md
  003-agent-backend-trait.md
  ...

Write an ADR when you need to:

  • Choose between competing approaches (e.g., SQLite vs Redis for messaging)
  • Make a decision that affects the whole system (e.g., async runtime choice)
  • Document why something was done a certain way for future contributors

Contract Documents

Contracts define the precise interface and behavior of a module or subsystem. They're the most common spec type:

specs/
  contract-config-schema.md
  contract-agent-state-machine.md
  contract-mailbox-schema.md
  ...

A contract includes:

  • File location (which source files implement it)
  • Type definitions (structs, enums, traits)
  • Method signatures and behavior
  • Invariants and validation rules
  • Error handling

Plan Documents

Plans describe a multi-step implementation strategy. They're used for larger efforts that span multiple sessions:

specs/
  plan-sdk-integration.md
  plan-mailbox-migration.md

Writing a Spec

Step 1: Choose the Right Type

| Situation | Spec Type |
|---|---|
| Choosing between approaches | ADR |
| Defining a module's interface | Contract |
| Planning a multi-session effort | Plan |

Step 2: Create the File

Use a descriptive name with the appropriate prefix:

# Contract
touch specs/contract-my-feature.md

# ADR (in docs)
touch docs/src/adr/011-my-decision.md

Step 3: Write the Content

Contract Template

# Contract: [Feature Name]

## File Location

`swarm/src/my_module.rs`

## Overview

Brief description of what this module does and why it exists.

## Type Definitions

```rust
pub struct MyStruct {
    pub field: Type,
}

pub enum MyEnum {
    Variant1,
    Variant2 { data: String },
}
```

Methods

| Method | Signature | Description |
|---|---|---|
| new() | fn new() -> Self | Create instance |
| process() | async fn process(&self, input: Value) -> Result<Output> | Process input |

Invariants

  1. [Rule that must always hold]
  2. [Another rule]

Error Handling

| Error | When | Recovery |
|---|---|---|
| MyError::NotFound | Item doesn't exist | Return error to caller |

  • [Link to related spec or ADR]

ADR Template

```markdown
# ADR-NNN: [Title]

## Status

Accepted | Proposed | Superseded by ADR-XXX

## Context

What problem are we solving? What constraints exist?

## Decision

What did we decide?

## Consequences

### Positive
- Benefit 1
- Benefit 2

### Negative
- Tradeoff 1

### Neutral
- Observation
```

Step 4: File Tasks from the Spec

After writing the spec, create beads tasks that reference it:

bd create "Implement MyStruct per specs/contract-my-feature.md"
bd create "Add tests for MyStruct per specs/contract-my-feature.md"
bd create "Add MyStruct to registry per specs/contract-my-feature.md"

Break large specs into small, independently implementable tasks. Each task should:

  • Reference the spec file
  • Describe a specific deliverable
  • Be completable in a single session

The Full Workflow

1. Identify need       →  "We need feature X"
2. Write spec          →  specs/contract-feature-x.md
3. Review spec         →  Get feedback, iterate
4. File tasks          →  bd create "Implement X per specs/..."
5. Implement           →  Pick up tasks with bd ready
6. Verify              →  Check implementation against spec
7. Update spec         →  If the implementation diverged, update the spec

Spec Conventions

Do

  • Be precise about types — Use real Rust type signatures, not pseudocode
  • Include invariants — Document rules that must always hold
  • List error cases — Every error variant with when it occurs
  • Link related specs — Cross-reference ADRs and other contracts
  • Keep specs updated — When implementation changes, update the spec

Don't

  • Don't write implementation code — Specs describe what, not how
  • Don't duplicate — Reference other specs instead of copying
  • Don't over-specify — Leave implementation details to the implementer
  • Don't write stale specs — Archive or update specs that no longer apply

Existing Specs

The specs/ directory currently contains 59 documents:

  • 10 ADRs — Major architectural decisions
  • ~46 contracts — Module interfaces and behavior
  • 3 plans — Multi-session implementation strategies

Use ls specs/ to see all spec files, or check specs/README.md for an index.

Adding Tools

How to add new built-in tools to swarm.

Overview

Tools are the primary way agents interact with the outside world. Each tool implements the Tool trait and is registered in the ToolRegistry. This guide walks through adding a new native tool from scratch.

The Tool Trait

Every tool implements this trait from swarm/src/tools/mod.rs:

pub trait Tool: Send + Sync {
    /// Unique tool name (e.g., "bash", "read", "my_tool")
    fn name(&self) -> &str;

    /// Human-readable description for the model
    fn description(&self) -> &str;

    /// JSON Schema describing the tool's input parameters
    fn input_schema(&self) -> Value;

    /// Whether the tool runs natively or in a sandbox (default: Native)
    fn execution_mode(&self) -> ExecutionMode {
        ExecutionMode::Native
    }

    /// Execute the tool with the given input and context
    fn execute<'a>(
        &'a self,
        input: Value,
        ctx: &'a ToolContext,
    ) -> Pin<Box<dyn Future<Output = Result<ToolResult>> + Send + 'a>>;
}

The five methods:

| Method | Required | Purpose |
|---|---|---|
| name() | Yes | Unique identifier used in tool calls |
| description() | Yes | Shown to the model to explain what the tool does |
| input_schema() | Yes | JSON Schema the model uses to construct input |
| execution_mode() | No (default: Native) | Native or Sandboxed |
| execute() | Yes | Performs the actual work |

Step 1: Create the Tool File

Create a new file in swarm/src/tools/:

touch swarm/src/tools/my_tool.rs

Step 2: Implement the Tool Trait

use anyhow::Result;
use serde_json::Value;
use std::future::Future;
use std::pin::Pin;

use super::{Tool, ToolResult};
use super::context::ToolContext;

pub struct MyTool;

impl MyTool {
    pub fn new() -> Self {
        Self
    }
}

impl Tool for MyTool {
    fn name(&self) -> &str {
        "my_tool"
    }

    fn description(&self) -> &str {
        "A brief description of what this tool does. \
         The model reads this to decide when to use the tool."
    }

    fn input_schema(&self) -> Value {
        serde_json::json!({
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query"
                },
                "limit": {
                    "type": "integer",
                    "description": "Maximum number of results",
                    "default": 10
                }
            },
            "required": ["query"]
        })
    }

    fn execute<'a>(
        &'a self,
        input: Value,
        ctx: &'a ToolContext,
    ) -> Pin<Box<dyn Future<Output = Result<ToolResult>> + Send + 'a>> {
        Box::pin(async move {
            // Parse input
            let query = input["query"]
                .as_str()
                .ok_or_else(|| anyhow::anyhow!("missing required field: query"))?;

            let limit = input["limit"].as_u64().unwrap_or(10);

            // Use context for working directory, agent info, etc.
            let _working_dir = &ctx.working_dir;

            // Check permissions if needed
            let decision = ctx.check_permission("my_tool", &input);
            // Handle decision...

            // Do the actual work
            let result = format!("Found results for '{}' (limit: {})", query, limit);

            Ok(ToolResult::text(result))
        })
    }
}

Key Points

  • ToolResult::text(s) — Creates a successful result with text content
  • ToolResult::error(s) — Creates an error result (displayed to the model as an error)
  • ctx.working_dir — The agent's working directory (its worktree)
  • ctx.agent_name — The name of the calling agent
  • ctx.session_id — The current session ID
  • ctx.is_cancelled() — Check if the agent's session has been cancelled
  • ctx.check_permission(tool, input) — Check permission for this operation
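
The check_permission call in Step 2 left its decision unhandled. A hedged sketch of one way to branch on an allow/ask/deny outcome — the Decision enum and handle function here are hypothetical stand-ins, not swarm's actual PermissionEvaluator API:

```rust
// Hypothetical stand-in for swarm's permission decision type.
enum Decision {
    Allow,
    Ask,
    Deny,
}

// Map a decision to a tool-level outcome; in a real tool the Err case
// would typically become a ToolResult::error for the model to see.
fn handle(decision: Decision, op: &str) -> Result<(), String> {
    match decision {
        Decision::Allow => Ok(()),
        // A real implementation would surface an approval prompt here.
        Decision::Ask => Err(format!("operation '{}' requires approval", op)),
        Decision::Deny => Err(format!("operation '{}' denied by policy", op)),
    }
}
```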

Step 3: Register the Module

Add the module to swarm/src/tools/mod.rs:

mod my_tool;

Then register the tool in the default_registry() function in swarm/src/tools/registry.rs:

pub fn default_registry() -> ToolRegistry {
    let mut registry = ToolRegistry::new();
    registry.register(Box::new(bash::BashTool::new()));
    registry.register(Box::new(read::ReadTool::new()));
    // ... existing tools ...
    registry.register(Box::new(my_tool::MyTool::new()));
    registry
}
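
Internally, a registry like this is typically a map from tool name to boxed trait object. A minimal self-contained sketch, using a simplified trait — the real ToolRegistry in swarm/src/tools/registry.rs may differ:

```rust
use std::collections::HashMap;

// Simplified stand-in for the Tool trait; the real trait has more methods.
trait NamedTool {
    fn name(&self) -> &str;
}

struct Registry {
    tools: HashMap<String, Box<dyn NamedTool>>,
}

impl Registry {
    fn new() -> Self {
        Self { tools: HashMap::new() }
    }

    // Later registrations with the same name replace earlier ones.
    fn register(&mut self, tool: Box<dyn NamedTool>) {
        self.tools.insert(tool.name().to_string(), tool);
    }

    // Look up a tool by the name the model used in its tool call.
    fn get(&self, name: &str) -> Option<&dyn NamedTool> {
        self.tools.get(name).map(|b| b.as_ref())
    }
}
```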

Step 4: Write Tests

Add tests in the same file or in a separate test module:

#[cfg(test)]
mod tests {
    use super::*;
    use crate::tools::context::ToolContext;
    use serde_json::json;
    use std::path::PathBuf;

    #[tokio::test]
    async fn test_my_tool_basic() {
        let tool = MyTool::new();
        let ctx = ToolContext::new(
            PathBuf::from("/tmp/test"),
            "test-agent".to_string(),
            "test-session".to_string(),
        );

        let input = json!({
            "query": "hello"
        });

        let result = tool.execute(input, &ctx).await.unwrap();
        assert!(!result.is_error);
    }

    #[tokio::test]
    async fn test_my_tool_missing_query() {
        let tool = MyTool::new();
        let ctx = ToolContext::new(
            PathBuf::from("/tmp/test"),
            "test-agent".to_string(),
            "test-session".to_string(),
        );

        let input = json!({});
        let result = tool.execute(input, &ctx).await;
        assert!(result.is_err());
    }

    #[test]
    fn test_schema_is_valid() {
        let tool = MyTool::new();
        let schema = tool.input_schema();
        assert_eq!(schema["type"], "object");
        assert!(schema["properties"]["query"].is_object());
    }
}

Step 5: Run Tests

cargo test --lib tools::my_tool

ToolResult Details

The ToolResult struct supports text and image content:

pub struct ToolResult {
    pub content: Vec<ToolResultContent>,
    pub is_error: bool,
}

pub enum ToolResultContent {
    Text(String),
    Image { media_type: String, data: String },
}

Helper constructors:

  • ToolResult::text("output") — Successful text result
  • ToolResult::error("message") — Error result displayed to the model
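
These helpers can be pictured as thin wrappers over the struct shown above. A hedged sketch of plausible implementations — the signatures are assumptions, and the actual code in swarm may differ:

```rust
// Redeclared here so the sketch is self-contained; mirrors the
// struct definitions shown in the section above.
pub enum ToolResultContent {
    Text(String),
    Image { media_type: String, data: String },
}

pub struct ToolResult {
    pub content: Vec<ToolResultContent>,
    pub is_error: bool,
}

impl ToolResult {
    // Successful single-text result.
    pub fn text(s: impl Into<String>) -> Self {
        Self {
            content: vec![ToolResultContent::Text(s.into())],
            is_error: false,
        }
    }

    // Error result; the same content shape, with the error flag set.
    pub fn error(s: impl Into<String>) -> Self {
        Self {
            content: vec![ToolResultContent::Text(s.into())],
            is_error: true,
        }
    }
}
```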

For richer results, construct ToolResult directly:

#![allow(unused)]
fn main() {
Ok(ToolResult {
    content: vec![
        ToolResultContent::Text("Found 3 files:".to_string()),
        ToolResultContent::Text("- src/main.rs\n- src/lib.rs\n- Cargo.toml".to_string()),
    ],
    is_error: false,
})
}

ToolContext Details

The ToolContext provides execution context:

pub struct ToolContext {
    pub working_dir: PathBuf,           // Agent's worktree directory
    pub agent_name: String,             // Name of the calling agent
    pub session_id: String,             // Current session ID
    pub env_vars: HashMap<String, String>, // Agent environment variables
    pub cancellation_token: CancellationToken, // Cancellation signal
    pub permissions: Option<Arc<PermissionEvaluator>>, // Permission checker
}

Use ctx.is_cancelled() to check for cancellation in long-running tools. This is important for tools that perform loops or wait for external resources.
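
The shape of a cancellation-aware loop can be sketched with the standard library. This is illustrative only: an AtomicBool stands in for the CancellationToken, and the "work" is summing numbers; a real tool would call ctx.is_cancelled() between iterations instead.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Process items one at a time, checking the cancel flag before each step.
// Returns (partial_result, was_cancelled) so callers can report early exit.
fn process_items(items: &[u32], cancelled: &AtomicBool) -> (u64, bool) {
    let mut sum = 0u64;
    for &item in items {
        // Bail out early on cancellation, keeping the partial work done so far.
        if cancelled.load(Ordering::Relaxed) {
            return (sum, true);
        }
        sum += u64::from(item);
    }
    (sum, false)
}
```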

Walkthrough: Existing Tool

Here's how the built-in GlobTool is structured (simplified):

pub struct GlobTool;

impl Tool for GlobTool {
    fn name(&self) -> &str { "glob" }

    fn description(&self) -> &str {
        "Fast file pattern matching tool that works with any codebase size"
    }

    fn input_schema(&self) -> Value {
        json!({
            "type": "object",
            "properties": {
                "pattern": {
                    "type": "string",
                    "description": "Glob pattern (e.g., **/*.rs)"
                },
                "path": {
                    "type": "string",
                    "description": "Directory to search in"
                }
            },
            "required": ["pattern"]
        })
    }

    fn execute<'a>(
        &'a self,
        input: Value,
        ctx: &'a ToolContext,
    ) -> Pin<Box<dyn Future<Output = Result<ToolResult>> + Send + 'a>> {
        Box::pin(async move {
            let pattern = input["pattern"].as_str()
                .ok_or_else(|| anyhow::anyhow!("missing pattern"))?;

            let base_dir = input["path"].as_str()
                .map(PathBuf::from)
                .unwrap_or_else(|| ctx.working_dir.clone());

            let full_pattern = base_dir.join(pattern);
            let matches = glob::glob(full_pattern.to_str().unwrap())?;

            let files: Vec<String> = matches
                .filter_map(|entry| entry.ok())
                .map(|p| p.display().to_string())
                .collect();

            Ok(ToolResult::text(files.join("\n")))
        })
    }
}

Pattern to follow:

  1. Parse input from JSON, validating required fields
  2. Use ctx.working_dir as default base path
  3. Perform the operation
  4. Return ToolResult::text() for success or ToolResult::error() for failures
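
The same four-step shape works for any file-oriented tool. A self-contained illustration using a plain extension filter in place of real glob matching — the function name and behavior here are illustrative only, not part of swarm:

```rust
use std::path::{Path, PathBuf};

// Step 2-3 of the pattern in miniature: take a base directory, perform the
// operation (list files matching an extension), and return the results for
// the caller to wrap in a ToolResult.
fn list_with_extension(base: &Path, ext: &str) -> std::io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    for entry in std::fs::read_dir(base)? {
        let path = entry?.path();
        if path.extension().map_or(false, |e| e == ext) {
            files.push(path);
        }
    }
    Ok(files)
}
```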

Checklist

Before submitting your tool:

  • Implements all required Tool trait methods
  • Input schema has "type": "object" with documented properties
  • Required fields are listed in "required" array
  • Description is clear enough for the model to know when to use it
  • Tool handles missing/invalid input gracefully
  • Module is declared in tools/mod.rs
  • Tool is registered in default_registry() in tools/registry.rs
  • Tests cover basic usage and error cases
  • cargo test passes
  • cargo clippy has no warnings