# Architecture
This page describes swarm's high-level architecture: the crate structure, module map, data flow between components, and key design decisions.
## Crate Structure
Swarm is a Rust project organized as a Cargo workspace with a single member crate:
```
swarm/                  # Workspace root
├── Cargo.toml          # Workspace manifest (members = ["swarm"])
└── swarm/              # Main crate
    ├── Cargo.toml      # Crate manifest
    └── src/
        ├── lib.rs      # Module declarations and crate-level docs
        ├── main.rs     # Binary entry point (CLI parsing → orchestrator)
        └── ...         # All modules below
```
The crate exposes a library (`swarm_lib`) and a binary (`swarm`). The binary is a thin wrapper that parses CLI arguments and delegates to the orchestrator.
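As a rough illustration of that thin-wrapper split, the shape is something like the sketch below. All names here (`Command`, `parse_command`) are invented for illustration and are not swarm's actual API; the real binary uses clap rather than hand-rolled parsing.

```rust
// Hypothetical sketch of the thin-wrapper pattern: the binary's main()
// does nothing but turn arguments into a command and delegate into the
// library crate, where the orchestrator lives.

#[derive(Debug, PartialEq)]
enum Command {
    Start,
    Stop,
    Status,
}

fn parse_command(arg: Option<&str>) -> Result<Command, String> {
    match arg {
        Some("start") => Ok(Command::Start),
        Some("stop") => Ok(Command::Stop),
        Some("status") => Ok(Command::Status),
        other => Err(format!("unknown command: {:?}", other)),
    }
}

// In the real crate, main() is roughly:
//   parse CLI args (clap) → hand the resolved command to the orchestrator.
// No session logic lives in the binary itself.
```

Keeping the binary this small means all session logic is testable through the library without spawning a process.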
## Module Map
| Module | Purpose |
|---|---|
| `cli` | CLI argument parsing via clap (commands, flags, subcommands) |
| `config` | Settings file loading, validation, and resolution (raw → resolved types) |
| `orchestrator` | Top-level session lifecycle: start (13-step flow), stop, status |
| `session` | Session ID generation, session.json management, PID-based liveness |
| `agent::state` | Agent state machine (AgentState, AgentEvent, SideEffect) |
| `agent::runner` | Agent lifecycle loop driver (prompt → spawn → run → repeat) |
| `agent::registry` | Central registry of all running agents and their handles |
| `backend` | AgentBackend trait abstraction for LLM providers (Anthropic, mock) |
| `prompt` | 14-section prompt assembly pipeline (build_prompt()) |
| `mailbox` | SQLite-backed per-agent message broker with threading and urgency |
| `router` | Async message router that polls for urgent messages and sends interrupts |
| `tools` | Tool trait, ToolRegistry, and all built-in tools |
| `tools::wasm` | WASM sandboxed tool execution (feature-gated: wasm-sandbox) |
| `permissions` | Permission rules, sets, modes, and evaluation logic |
| `skills` | Skill discovery, frontmatter parsing, argument substitution, resolution |
| `mcp` | Model Context Protocol client, transport (HTTP/SSE/Stdio), and manager |
| `hooks` | Hook configuration, event types, and script execution |
| `worktree` | Git worktree creation, cleanup, merging, and recovery |
| `tui` | Terminal UI application (agent panels, log viewer, event viewer, input) |
| `liveness` | Agent liveness monitoring (idle nudges, stall detection, warnings) |
| `iteration` | Iteration engine for repeated task-solving loops |
| `workflow` | Workflow pipeline definitions and execution |
| `conversation` | Conversation history management |
| `context_window` | Context window size tracking and management |
| `supervisor` | Supervisor agent logic and merge-focused prompt |
| `tasks` | Task system integration |
| `modes` | Agent execution modes (code, delegate, etc.) |
| `logging` | Structured logging setup |
| `errors` | Error types for all subsystems |
| `history` | Session history and archiving |
## Data Flow

### Session Start (13-Step Flow)
```
CLI (swarm start)
 │
 ├── 1.  Load config (~/.swarm/settings.json)
 ├── 2.  Validate git prerequisites (version, repo, not detached)
 ├── 3.  Handle --init flag
 ├── 4.  Handle --stash or require clean working tree
 ├── 5.  Check for stale session + recovery
 ├── 6.  Create session (session.json + lockfile)
 ├── 7.  Create worktrees (one per agent + supervisor)
 ├── 8.  Initialize SQLite mailbox database
 ├── 9.  Create agent runners + registry
 ├── 10. Start message router (100ms poll loop)
 ├── 11. Start periodic tasks (WAL checkpoint, message prune)
 ├── 12. Launch TUI or headless mode
 └── 13. Await shutdown signal → graceful shutdown
```
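One useful property of this flow is that it is strictly sequential and fail-fast: any step that errors aborts the whole start. A minimal sketch of that composition, with stubbed step functions whose names are illustrative rather than swarm's real ones:

```rust
// Illustrative sketch (not swarm's actual code): each start step is a
// fallible function, and `?` aborts the flow at the first failure.

type Step = Result<(), String>;

fn load_config() -> Step { Ok(()) }     // step 1, stubbed for illustration
fn validate_git() -> Step { Ok(()) }    // step 2, stubbed
fn create_session() -> Step { Ok(()) }  // step 6, stubbed

fn start() -> Step {
    load_config()?;     // ~/.swarm/settings.json
    validate_git()?;    // version, repo, not detached
    // ...steps 3-5 elided...
    create_session()?;  // session.json + lockfile
    // ...steps 7-13 elided...
    Ok(())
}
```

Because every step returns a `Result`, a failure in step 2 never leaves a half-created session behind from step 6.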
### Agent Lifecycle Loop
Each agent runs independently through its state machine:
```
Initializing → BuildingPrompt → Spawning → Running → SessionComplete
     ↑                                        │             │
     │                                        │             │
     │    ┌─────── CoolingDown ←──────────────┘ (on error)  │
     │    │   (exponential backoff)                         │
     │    ↓                                                 │
     └────┴─────────────────────────────────────────────────┘
                        (next session)
```
The runner loop for each agent:
- Build prompt — Assembles a 14-section system prompt with environment info, role, tools, pending messages, beads tasks, etc.
- Spawn backend session — Sends the prompt to the configured LLM provider (Anthropic API)
- Run — The backend session executes, making tool calls that the runner handles
- Handle exit — On success, transition to SessionComplete; on error, enter CoolingDown with exponential backoff
- Repeat — After cooldown or session complete, rebuild prompt and spawn again
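The loop above can be sketched as a pure transition function. The real types live in `agent::state`; the variant and event names below are inferred from the diagram for illustration only, and the real machine also emits `SideEffect`s, which this sketch omits.

```rust
// Hypothetical sketch of the lifecycle state machine. Variant names are
// inferred from the diagram, not copied from agent::state.

#[derive(Debug, Clone, Copy, PartialEq)]
enum AgentState {
    Initializing,
    BuildingPrompt,
    Spawning,
    Running,
    CoolingDown,
    SessionComplete,
}

#[derive(Debug, Clone, Copy)]
enum AgentEvent {
    Initialized,
    PromptReady,
    Spawned,
    SessionOk,
    SessionErr,
    CooldownElapsed,
    NextSession,
}

fn transition(state: AgentState, event: AgentEvent) -> AgentState {
    use AgentEvent::*;
    use AgentState::*;
    match (state, event) {
        (Initializing, Initialized) => BuildingPrompt,
        (BuildingPrompt, PromptReady) => Spawning,
        (Spawning, Spawned) => Running,
        (Running, SessionOk) => SessionComplete,
        (Running, SessionErr) => CoolingDown, // backoff timer runs here
        (CoolingDown, CooldownElapsed) => BuildingPrompt, // retry
        (SessionComplete, NextSession) => BuildingPrompt, // next session
        (s, _) => s, // events that don't apply in this state are ignored
    }
}
```

Keeping the transition a pure function of `(state, event)` makes the machine trivially unit-testable without spawning any backend session.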
### Message Flow
```
Agent A                      SQLite DB                      Agent B
   │                             │                             │
   ├── send(to=B, body) ────────►│                             │
   │                             ├── INSERT INTO messages      │
   │                             │                             │
   │   Router (100ms poll)       │                             │
   │                             ├── poll_urgent() ───────────►│
   │                             │   (if urgent) InterruptSignal
   │                             │                             │
   │                             │◄── consume() ───────────────┤
   │                             │   (next prompt build)       │
```
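The semantics of that exchange can be modeled without SQLite at all. The sketch below is an in-memory stand-in that only illustrates the broker's behavior (`send` / `poll_urgent` / `consume`); the real mailbox persists rows in a WAL-mode SQLite database via rusqlite, and all field and method names here are illustrative.

```rust
// In-memory stand-in for the SQLite mailbox, modeling its semantics only.

#[derive(Debug, Clone)]
struct Message {
    to: String,
    body: String,
    urgent: bool,
    consumed: bool,
}

#[derive(Default)]
struct Mailbox {
    messages: Vec<Message>,
}

impl Mailbox {
    /// What Agent A does: durably enqueue a message for another agent.
    fn send(&mut self, to: &str, body: &str, urgent: bool) {
        self.messages.push(Message {
            to: to.into(),
            body: body.into(),
            urgent,
            consumed: false,
        });
    }

    /// What the router's 100ms poll asks: any unconsumed urgent mail?
    fn poll_urgent(&self, agent: &str) -> bool {
        self.messages
            .iter()
            .any(|m| m.to == agent && m.urgent && !m.consumed)
    }

    /// What Agent B does at its next prompt build: drain its inbox.
    fn consume(&mut self, agent: &str) -> Vec<String> {
        let mut out = Vec::new();
        for m in self
            .messages
            .iter_mut()
            .filter(|m| m.to == agent && !m.consumed)
        {
            m.consumed = true;
            out.push(m.body.clone());
        }
        out
    }
}
```

Note that urgent messages trigger an interrupt but are not removed by the router; they are still delivered through the normal `consume` path at the next prompt build.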
### Shutdown Flow
```
SIGTERM received (or operator stop)
 │
 ├── Signal all agents: OperatorStop event
 ├── Wait for all agents to reach Stopped state
 ├── Auto-commit any dirty worktrees
 ├── Merge/squash/discard agent branches (based on StopMode)
 ├── Remove worktrees and prune
 ├── Delete session branches
 ├── Remove session.json + lockfile
 └── Exit
```
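The branch-handling step is driven by `StopMode`. A hedged sketch of how that choice might map to git operations follows; the variant names and the `branch_action` helper are invented for illustration and do not mirror swarm's actual code.

```rust
// Hypothetical sketch: StopMode picks what happens to each agent branch
// during shutdown. The command strings are the conceptual git equivalents.

#[derive(Debug, Clone, Copy)]
enum StopMode {
    Merge,   // merge each agent branch back into the base branch
    Squash,  // collapse each agent branch into a single commit
    Discard, // drop agent work entirely
}

fn branch_action(mode: StopMode) -> &'static str {
    match mode {
        StopMode::Merge => "git merge --no-ff <agent-branch>",
        StopMode::Squash => "git merge --squash <agent-branch>",
        StopMode::Discard => "git branch -D <agent-branch>",
    }
}
```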
## Key Dependencies
| Dependency | Used For |
|---|---|
| `tokio` | Async runtime (ADR-001) |
| `clap` | CLI argument parsing |
| `serde` / `serde_json` | Configuration and message serialization |
| `rusqlite` | SQLite mailbox (ADR-002) |
| `ratatui` | Terminal UI rendering (ADR-007) |
| `tracing` | Structured logging |
| `reqwest` | HTTP client for Anthropic API and MCP transports |
| `chrono` | Timestamp handling |
| `anyhow` / `thiserror` | Error handling (ADR-009) |
| `wasmtime` | WASM sandbox runtime (optional, feature-gated) |
| `libc` | Process liveness checks (kill signal 0) |
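The "kill signal 0" liveness trick deserves a note: signal 0 performs POSIX error checking without delivering any signal, so a return of 0 means the PID exists. A minimal Unix-only sketch (declaring `kill` directly rather than pulling in the libc crate, and treating a non-zero return as dead, which ignores the EPERM "exists but not ours" case):

```rust
// Sketch of PID-based liveness via kill(pid, 0). Unix-only; the real code
// uses the libc crate and also distinguishes the EPERM case.

extern "C" {
    fn kill(pid: i32, sig: i32) -> i32;
}

fn process_alive(pid: u32) -> bool {
    // Signal 0: no signal is sent, but existence/permission checks run.
    unsafe { kill(pid as i32, 0) == 0 }
}
```

This is how a stale `session.json` is detected at start: if the recorded PID is no longer alive, the previous session is treated as crashed and recovery kicks in.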
## Design Decisions
The architecture is shaped by several key decisions documented in ADRs:
- ADR-001: Tokio Async Runtime — All I/O is async via Tokio, enabling concurrent agent runners, router polling, and periodic tasks in a single process.
- ADR-002: SQLite WAL for Messaging — Inter-agent messaging uses a single SQLite database in WAL mode, providing durability without a separate message broker.
- ADR-003: Agent Backend Abstraction — The `AgentBackend` trait decouples agent logic from LLM providers, enabling mock backends for testing.
- ADR-004: Fresh Sessions — Each session starts fresh; there is no `--resume` flag. Agents rebuild context from prompt + messages.
- ADR-005: Foreground Process — Swarm runs as a foreground process (not a daemon), simplifying lifecycle management and TUI integration.
- ADR-006: Git Worktree Isolation — Each agent gets its own git worktree and branch, preventing file conflicts between parallel agents.
- ADR-007: TUI First — The TUI is the primary interface, with headless mode as an alternative (`--no-tui`).
- ADR-008: Beads Integration — The `bd` CLI is used for issue tracking, injected into agent prompts as available tasks.
- ADR-009: Error Handling — `thiserror` for library error types, `anyhow` at the binary/orchestrator level.
- ADR-010: Shared Beads Branch — All agents share a single `beads` branch with optimistic concurrency for issue tracking data.
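To make ADR-003 concrete, the backend abstraction boils down to a trait the runner depends on, with the provider behind it swappable. The sketch below is hedged: the method signature is invented for illustration (swarm's real `AgentBackend` is async and carries far more session state).

```rust
// Hedged sketch of the ADR-003 idea. The real trait is async and richer;
// this shows only why a trait boundary enables mock backends in tests.

trait AgentBackend {
    /// Run one session with the given system prompt; return the transcript.
    fn run_session(&self, prompt: &str) -> Result<String, String>;
}

/// Test double standing in for the Anthropic-backed implementation.
struct MockBackend;

impl AgentBackend for MockBackend {
    fn run_session(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("mock reply to {} bytes of prompt", prompt.len()))
    }
}

fn drive_one_session(backend: &dyn AgentBackend) -> Result<String, String> {
    // The runner only ever sees the trait, so swapping providers (or
    // injecting a mock in tests) never touches agent logic.
    backend.run_session("system prompt goes here")
}
```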
## Component Interactions
```
                ┌──────────────┐
                │     CLI      │
                └──────┬───────┘
                       │
                ┌──────▼───────┐
                │ Orchestrator │──────────────────┐
                └──────┬───────┘                  │
                       │                          │
          ┌────────────┼────────────┐      ┌──────▼──────┐
          │            │            │      │   Session   │
     ┌────▼─────┐ ┌────▼─────┐ ┌────▼────┐ │ Management  │
     │ Agent 1  │ │ Agent 2  │ │ Agent N │ └─────────────┘
     │  Runner  │ │  Runner  │ │  Runner │
     └────┬─────┘ └────┬─────┘ └────┬────┘
          │            │            │
     ┌────▼────────────▼────────────▼────┐
     │          Agent Registry           │
     └────┬─────────────────────────┬────┘
          │                         │
   ┌──────▼───────┐          ┌──────▼───────┐
   │   Backend    │          │   Mailbox    │
   │ (Anthropic)  │          │   (SQLite)   │
   └──────────────┘          └──────┬───────┘
                                    │
                             ┌──────▼───────┐
                             │    Router    │
                             │ (100ms poll) │
                             └──────────────┘
```
## Related
- Agent Lifecycle — Detailed state machine walkthrough
- Messaging — SQLite mailbox design and message flow
- Orchestration — The 13-step start flow in detail
- Configuration — Settings file structure and resolution