Orchestration

The orchestrator is the top-level component that manages the entire swarm lifecycle. It implements the 13-step start flow, handles shutdown, and coordinates all subsystems.

Session Management

Each swarm run creates a session represented by a SessionInfo struct:

Field	Type	Description
`id`	`String`	Format `YYYYMMDD-XXXX`: the date followed by four random lowercase hex characters (e.g. `20250115-a3f2`)
`base_commit`	`String`	The HEAD commit hash at session start
`agents`	`Vec<String>`	List of agent names from the config
`started_at`	`DateTime<Utc>`	UTC timestamp of session creation
`pid`	`u32`	Process ID of the orchestrator (used for liveness checks)
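A minimal sketch of the struct and of the `YYYYMMDD-XXXX` ID format, assuming only the Rust standard library. Because the sketch avoids external crates, `started_at` is plain Unix seconds instead of chrono's `DateTime<Utc>`, the date is derived with Howard Hinnant's days-to-civil algorithm, and the random part borrows the randomly seeded keys of `RandomState`. The helpers `civil_from_days`, `format_session_id`, and `new_session_id` are hypothetical names, not the project's API.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};
use std::time::{SystemTime, UNIX_EPOCH};

/// Mirrors the SessionInfo fields in the table above. `started_at` is plain Unix
/// seconds because this sketch avoids the chrono crate (the real field is DateTime<Utc>).
#[derive(Debug, Clone)]
pub struct SessionInfo {
    pub id: String,
    pub base_commit: String,
    pub agents: Vec<String>,
    pub started_at: u64,
    pub pid: u32,
}

/// Days since 1970-01-01 -> (year, month, day), using Howard Hinnant's
/// days-to-civil algorithm for the proleptic Gregorian calendar.
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468;
    let era = (if z >= 0 { z } else { z - 146_096 }) / 146_097;
    let doe = z - era * 146_097; // day of era, 0..=146_096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, counted from March 1
    let mp = (5 * doy + 2) / 153; // month index, March = 0
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let year = yoe + era * 400 + (if month <= 2 { 1 } else { 0 });
    (year, month, day)
}

/// Formats `YYYYMMDD-XXXX` from a Unix timestamp (UTC) and a 16-bit random value.
pub fn format_session_id(unix_secs: u64, rand16: u16) -> String {
    let (y, m, d) = civil_from_days((unix_secs / 86_400) as i64);
    format!("{:04}{:02}{:02}-{:04x}", y, m, d, rand16)
}

/// New ID for the current time. std has no RNG, so this borrows the randomly
/// seeded keys of RandomState; a real implementation would use an RNG crate.
pub fn new_session_id() -> String {
    let secs = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0);
    let mut hasher = RandomState::new().build_hasher();
    hasher.write_u64(secs);
    format_session_id(secs, hasher.finish() as u16)
}
```

For example, `format_session_id(1_736_899_200, 0xa3f2)` yields `20250115-a3f2`, matching the sample ID in the table.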

Session state is persisted in .swarm/session.json alongside a lockfile containing the PID. Both files are written atomically (temp file then rename).
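The temp-file-then-rename pattern can be sketched with the standard library alone. The `<name>.tmp` naming and the `write_atomic` helper are assumptions for illustration. The idea is that a rename within one filesystem replaces the destination in a single step, so a reader sees either the old file or the complete new one, never a partial write.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Writes `contents` to `path` atomically: write a sibling temp file, flush it to
/// disk, then rename it over the target.
pub fn write_atomic(path: &Path, contents: &[u8]) -> io::Result<()> {
    // Hypothetical naming scheme: "<name>.tmp" in the same directory as the target,
    // so the rename never crosses a filesystem boundary.
    let mut tmp_name = path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?
        .to_os_string();
    tmp_name.push(".tmp");
    let tmp_path = path.with_file_name(tmp_name);

    let mut file = File::create(&tmp_path)?;
    file.write_all(contents)?;
    file.sync_all()?; // make sure the bytes are on disk before the rename
    drop(file);
    fs::rename(&tmp_path, path)
}
```

The same helper would serve both files: once with the serialized `SessionInfo` for `session.json`, and once with the PID as text for the lockfile.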

Stale Session Detection

A session is considered stale if its owning process no longer exists. This is checked with libc::kill(pid, 0); signal 0 sends nothing but still reports whether the process exists:

Returns 0 — process alive, session is active
Returns -1 with ESRCH — process gone, session is stale
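A sketch of the liveness check on Unix, assuming no external crates: it declares `kill(2)` by hand instead of using `libc::kill`, which requires Rust 1.82 or newer for the `unsafe extern` syntax. Two details go beyond the text above and are assumptions of this sketch. A `-1` result with `EPERM` is treated as alive, because the process exists but belongs to another user. PIDs that would make `kill` target a process group are rejected. `is_process_alive` and `session_is_stale` are hypothetical names.

```rust
use std::io;

// Hand-written binding to the C library's kill(2) so the sketch needs no crates;
// the project itself calls libc::kill. `unsafe extern` needs Rust 1.82 or newer.
unsafe extern "C" {
    fn kill(pid: i32, sig: i32) -> i32;
}

/// errno value for "no such process" (3 on Linux and macOS).
const ESRCH: i32 = 3;

/// Returns true if a process with `pid` exists (Unix only).
pub fn is_process_alive(pid: u32) -> bool {
    // pid 0 and values above i32::MAX (which turn negative when cast) would make
    // kill() target a process group or every process, so reject them.
    if pid == 0 || pid > i32::MAX as u32 {
        return false;
    }
    // Signal 0 delivers nothing; the kernel only checks existence and permissions.
    let rc = unsafe { kill(pid as i32, 0) };
    if rc == 0 {
        return true;
    }
    // EPERM means the process exists but belongs to another user, so only
    // ESRCH is treated as "gone" (an assumption of this sketch).
    io::Error::last_os_error().raw_os_error() != Some(ESRCH)
}

/// A session is stale when its recorded orchestrator PID is no longer running.
pub fn session_is_stale(recorded_pid: u32) -> bool {
    !is_process_alive(recorded_pid)
}
```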

Stale sessions are recovered automatically before a new session is created.

The 13-Step Start Flow

When you run swarm start, the orchestrator executes these steps in order:

Step 1: Load Configuration

Read ~/.swarm/settings.json, validate the version, look up the project by its canonicalized path, and resolve all defaults into a ResolvedConfig.

Step 2: Validate Git Prerequisites

Check git version >= 2.20
Verify the project is a git repository
Confirm HEAD is not detached
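The version check can be sketched by parsing the output of `git --version` and comparing `(major, minor)` tuples, which Rust compares lexicographically. The function names are hypothetical, and the parser assumes the usual `git version X.Y.Z` output, including vendor suffixes such as `(Apple Git-146)` or `.windows.1`.

```rust
use std::process::Command;

/// Parses "git version 2.39.2" (also "git version 2.39.3 (Apple Git-146)" or
/// "git version 2.43.0.windows.1") into (major, minor).
pub fn parse_git_version(output: &str) -> Option<(u32, u32)> {
    let rest = output.trim().strip_prefix("git version ")?;
    let version = rest.split_whitespace().next()?;
    let mut parts = version.split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// True if `version` is at least `min`; tuples compare field by field.
pub fn meets_minimum(version: (u32, u32), min: (u32, u32)) -> bool {
    version >= min
}

/// Runs `git --version` and enforces the 2.20 minimum.
pub fn check_git_version() -> Result<(u32, u32), String> {
    let out = Command::new("git")
        .arg("--version")
        .output()
        .map_err(|e| format!("failed to run git: {}", e))?;
    let text = String::from_utf8_lossy(&out.stdout);
    let v = parse_git_version(&text)
        .ok_or_else(|| format!("unrecognized git version output: {}", text.trim()))?;
    if meets_minimum(v, (2, 20)) {
        Ok(v)
    } else {
        Err(format!("git {}.{} found; 2.20 or newer is required", v.0, v.1))
    }
}
```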

Step 3: Handle `--init` Flag

If --init is set and the repo needs initialization, run init_git_repo(). If the repo already exists, this is a no-op.

Step 4: Handle Working Tree State

If --stash is set: auto-stash uncommitted changes (git stash push --include-untracked -m "swarm auto-stash")
Otherwise: require a clean working tree (git status --porcelain must be empty)
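The decision in this step can be sketched as a pure function over the output of `git status --porcelain`, where any non-empty output means uncommitted or untracked changes. `TreeAction`, `working_tree_action`, and the error wording are hypothetical. Skipping the stash when the tree is already clean is also an assumption of this sketch.

```rust
/// What Step 4 decides to do with the working tree.
#[derive(Debug, PartialEq)]
pub enum TreeAction {
    /// The tree is already clean; nothing to do.
    Proceed,
    /// Dirty tree with --stash: run `git` with these arguments first.
    Stash(Vec<&'static str>),
}

/// `porcelain` is the stdout of `git status --porcelain`.
pub fn working_tree_action(porcelain: &str, stash: bool) -> Result<TreeAction, String> {
    if porcelain.trim().is_empty() {
        return Ok(TreeAction::Proceed);
    }
    if stash {
        // Same command as described above: git stash push --include-untracked -m "swarm auto-stash"
        Ok(TreeAction::Stash(vec![
            "stash",
            "push",
            "--include-untracked",
            "-m",
            "swarm auto-stash",
        ]))
    } else {
        // Hypothetical error text.
        Err("working tree has uncommitted changes; commit them or pass --stash".to_string())
    }
}
```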

Step 5: Check for Stale Session

If .swarm/session.json exists:

If the process is alive: bail with "session already active"
If the process is dead: recover the stale session (auto-commit, remove worktrees, delete branches)

Step 6: Create Session

Generate a session ID, then write session.json and the lockfile atomically.

Step 7: Create Worktrees

For each agent and the supervisor, create a git worktree:

git worktree add .swarm/worktrees/<name> -b swarm/<session_id>/<name> <base_commit>

Lock each worktree to prevent accidental pruning.
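A sketch of building the `git worktree add` and `git worktree lock` invocations from the layout above, with the commands run through `std::process::Command`. The helper names are hypothetical, and running the commands from the repository root is an assumption. `git worktree lock` marks the worktree so that `git worktree prune` will not remove its administrative files.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Arguments for `git worktree add`: path `.swarm/worktrees/<name>`,
/// new branch `swarm/<session_id>/<name>` starting at `base_commit`.
pub fn worktree_add_args(name: &str, session_id: &str, base_commit: &str) -> Vec<String> {
    vec![
        "worktree".into(),
        "add".into(),
        format!(".swarm/worktrees/{}", name),
        "-b".into(),
        format!("swarm/{}/{}", session_id, name),
        base_commit.into(),
    ]
}

/// Arguments for `git worktree lock`, which protects the entry from pruning.
pub fn worktree_lock_args(name: &str) -> Vec<String> {
    vec!["worktree".into(), "lock".into(), format!(".swarm/worktrees/{}", name)]
}

/// Creates and locks one worktree, running git from the repository root `repo`.
pub fn create_locked_worktree(repo: &Path, name: &str, session_id: &str, base_commit: &str) -> io::Result<()> {
    for args in [worktree_add_args(name, session_id, base_commit), worktree_lock_args(name)] {
        let status = Command::new("git").current_dir(repo).args(&args).status()?;
        if !status.success() {
            return Err(io::Error::new(io::ErrorKind::Other, format!("git {} failed", args.join(" "))));
        }
    }
    Ok(())
}
```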

Step 8: Initialize SQLite

Open (or create) the mailbox database at .swarm/messages.db with WAL mode enabled.

Step 9: Create Agent Runners and Registry

For each resolved agent config:

Create an AgentHandle with state channels and an interrupt sender
Spawn the run_agent() task on the Tokio runtime
Register in the AgentRegistry

Step 10: Start Message Router

Launch the async router loop that polls for urgent messages every 100ms and delivers InterruptSignals to the appropriate agent channels.
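A sketch of the router's poll-and-deliver loop. It substitutes a standard-library thread and `mpsc` channels for the async Tokio task. The shapes of `InterruptSignal` and `UrgentMessage` are hypothetical, and the closure `fetch_urgent` stands in for the SQLite query that reads urgent messages. The shutdown receiver doubles as the 100 ms poll timer, so a stop request interrupts the wait immediately.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, Sender};
use std::time::Duration;

/// Hypothetical interrupt payload; the real InterruptSignal may carry more.
#[derive(Debug, Clone, PartialEq)]
pub struct InterruptSignal {
    pub from: String,
    pub body: String,
}

/// An urgent message as it might be read from the mailbox (hypothetical shape).
pub struct UrgentMessage {
    pub to: String,
    pub from: String,
    pub body: String,
}

/// One polling pass: deliver each urgent message to its recipient's channel.
/// Returns how many were delivered; unknown or stopped recipients are skipped.
pub fn route_once(agents: &HashMap<String, Sender<InterruptSignal>>, batch: Vec<UrgentMessage>) -> usize {
    let mut delivered = 0;
    for msg in batch {
        if let Some(tx) = agents.get(&msg.to) {
            if tx.send(InterruptSignal { from: msg.from, body: msg.body }).is_ok() {
                delivered += 1;
            }
        }
    }
    delivered
}

/// Router loop: poll every 100 ms until the shutdown channel fires or is dropped.
pub fn run_router<F>(agents: HashMap<String, Sender<InterruptSignal>>, mut fetch_urgent: F, shutdown: Receiver<()>)
where
    F: FnMut() -> Vec<UrgentMessage>,
{
    loop {
        route_once(&agents, fetch_urgent());
        match shutdown.recv_timeout(Duration::from_millis(100)) {
            Err(mpsc::RecvTimeoutError::Timeout) => continue,
            // Ok(()) means shutdown was requested; Disconnected means the sender was dropped.
            _ => break,
        }
    }
}
```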

Step 11: Start Periodic Tasks

WAL checkpoint: Every 60 seconds, run PRAGMA wal_checkpoint(TRUNCATE)
Message prune: Every 300 seconds, delete old delivered messages, keeping the most recent 1000
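Both periodic tasks follow the same pattern: run some work on a fixed period until shutdown. The sketch below uses a standard-library thread with `recv_timeout` in place of a Tokio interval. Unlike `tokio::time::interval`, whose first tick fires immediately, this version first runs after one full period. `spawn_periodic` is a hypothetical helper, and the task bodies (executing the PRAGMA and the prune query) are left to the caller.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// The checkpoint statement named in Step 11.
pub const WAL_CHECKPOINT_SQL: &str = "PRAGMA wal_checkpoint(TRUNCATE);";

/// Runs `task` every `period` on a background thread until `stop` receives a
/// value or its sender is dropped. The first run happens after one full period.
pub fn spawn_periodic<F>(period: Duration, stop: Receiver<()>, mut task: F) -> JoinHandle<()>
where
    F: FnMut() + Send + 'static,
{
    thread::spawn(move || loop {
        match stop.recv_timeout(period) {
            Err(RecvTimeoutError::Timeout) => task(),
            // A stop message or a dropped sender both end the loop.
            _ => break,
        }
    })
}
```

In this sketch the orchestrator would start two of these, one with `Duration::from_secs(60)` that executes `WAL_CHECKPOINT_SQL` and one with `Duration::from_secs(300)` that prunes delivered messages, and send on each stop channel during shutdown.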

Step 12: Launch TUI or Headless Mode

Default: Launch the TUI with agent panels, log viewer, and command input
--no-tui: Run in headless mode, logging to stdout

Step 13: Await Shutdown

Block until a shutdown signal is received (SIGTERM, TUI quit, or all agents stopped), then execute graceful shutdown.
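A sketch of waiting for the first of the three shutdown triggers. All sources feed one channel. SIGTERM and TUI quit end the wait at once, and agent stop events are counted until every agent has stopped. The `Event` and `ShutdownReason` types are hypothetical. How a SIGTERM handler pushes into the channel is outside this sketch, and returning `SourcesClosed` when every sender hangs up is an assumption.

```rust
use std::collections::HashSet;
use std::sync::mpsc::Receiver;

/// Events from the hypothetical shutdown sources.
pub enum Event {
    Sigterm,
    TuiQuit,
    AgentStopped(String),
}

#[derive(Debug, PartialEq, Clone, Copy)]
pub enum ShutdownReason {
    Sigterm,
    TuiQuit,
    AllAgentsStopped,
    /// Every event source hung up (an assumption of this sketch).
    SourcesClosed,
}

/// Blocks until a shutdown trigger arrives.
pub fn await_shutdown(events: Receiver<Event>, agent_names: &[&str]) -> ShutdownReason {
    let mut running: HashSet<String> = agent_names.iter().map(|s| s.to_string()).collect();
    if running.is_empty() {
        return ShutdownReason::AllAgentsStopped;
    }
    // Iterating a Receiver blocks on each recv() and ends when all senders drop.
    for event in events {
        match event {
            Event::Sigterm => return ShutdownReason::Sigterm,
            Event::TuiQuit => return ShutdownReason::TuiQuit,
            Event::AgentStopped(name) => {
                running.remove(&name);
                if running.is_empty() {
                    return ShutdownReason::AllAgentsStopped;
                }
            }
        }
    }
    ShutdownReason::SourcesClosed
}
```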

Stop Modes

When a session is stopped (swarm stop), agent branches are handled according to the stop mode:

Mode	Flag	Behavior
Merge	`--merge` (default)	`git merge --no-ff` each agent branch into the base branch, in config order
Squash	`--squash`	`git merge --squash` each agent branch, creating a single commit per agent
Discard	`--discard`	Delete agent branches without merging any changes

The merge order is: agent branches first (in the order defined in settings.json), then the supervisor branch.
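A sketch of turning a stop mode into an ordered list of git invocations run on the base branch: agent branches in config order, then the supervisor branch. Several details are assumptions. The supervisor's worktree name is a parameter because this page does not give it. `--no-edit` is added so the merge commit does not open an editor. Because `git merge --squash` only stages changes, each squash is followed by a `git commit` whose message is hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum StopMode {
    Merge,
    Squash,
    Discard,
}

/// Git argument lists for the chosen stop mode, in execution order.
pub fn merge_plan(mode: StopMode, session_id: &str, agents: &[&str], supervisor: &str) -> Vec<Vec<String>> {
    let mut plan = Vec::new();
    // Agent branches first (config order), then the supervisor branch.
    for name in agents.iter().copied().chain(std::iter::once(supervisor)) {
        let branch = format!("swarm/{}/{}", session_id, name);
        match mode {
            StopMode::Merge => {
                // --no-edit keeps git from opening an editor (added in this sketch).
                plan.push(vec!["merge".into(), "--no-ff".into(), "--no-edit".into(), branch]);
            }
            StopMode::Squash => {
                // `git merge --squash` only stages the changes; a commit must follow.
                plan.push(vec!["merge".into(), "--squash".into(), branch]);
                plan.push(vec!["commit".into(), "-m".into(), format!("swarm: squash {}", name)]);
            }
            // Discard merges nothing; the branches are deleted later in shutdown.
            StopMode::Discard => {}
        }
    }
    plan
}
```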

Shutdown Sequence

The graceful shutdown sequence runs inside the orchestrator process:

Signal all agents — Send OperatorStop to each agent via the registry
Wait for agents — Wait for all agents to reach the Stopped state
Stop router — Signal the router's shutdown channel
Auto-commit — For each worktree (agents + supervisor), commit any dirty changes
Merge branches — Apply the selected stop mode (merge/squash/discard)
Remove worktrees — Unlock and remove each worktree
Prune worktrees — Run git worktree prune to clean stale references
Delete branches — Remove all swarm/<session_id>/* branches
Remove session — Delete session.json and lockfile
Exit

When swarm stop is run from a separate terminal:

Load the session from .swarm/session.json
Send SIGTERM to the orchestrator PID
Wait up to 60 seconds for the process to exit
If session files still exist after the orchestrator exits, swarm stop performs the cleanup itself
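The "wait up to 60 seconds" step can be sketched as a bounded polling loop. `wait_for_exit` is a hypothetical helper. Its `is_alive` closure would typically wrap a `kill(pid, 0)` liveness check, and sending SIGTERM beforehand is left out of this sketch. The poll interval is an assumption, since the page does not state one.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Polls `is_alive` until it reports false or `timeout` elapses.
/// Returns true if the process exited in time.
pub fn wait_for_exit<F: FnMut() -> bool>(mut is_alive: F, timeout: Duration, poll: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if !is_alive() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(poll);
    }
}
```

A stop command built this way would call `wait_for_exit(|| is_process_alive(pid), Duration::from_secs(60), Duration::from_millis(200))` after signalling the orchestrator, then check whether `.swarm/session.json` is still present to decide whether to clean up.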

Status Command

swarm status provides a snapshot of the current session:

Session: 20250115-a3f2 (active)
Started: 2025-01-15T10:30:00Z (2h 15m ago)
Base commit: abc123def456
PID: 12345

Agents:
  ● backend          Running (1h 23m)
  ● frontend         Running (45m 12s)
  ● reviewer         SessionComplete (idle 5m)

Beads: 3 ready, 2 claimed, 8 closed

With --json, the output is a structured JSON object including agent states, liveness data, and beads summary.
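The human-readable durations in the sample output (`2h 15m`, `45m 12s`, `idle 5m`) can be produced by a small formatter. The rules below are inferred from that one sample and are an assumption: hours and minutes once an hour has passed, minutes and seconds below that, and zero seconds omitted. `format_elapsed` is a hypothetical name.

```rust
/// Formats an elapsed time like the sample status output: "2h 15m", "45m 12s", "5m".
/// The exact rules are inferred from that sample.
pub fn format_elapsed(total_secs: u64) -> String {
    let (h, m, s) = (total_secs / 3600, (total_secs % 3600) / 60, total_secs % 60);
    if h > 0 {
        format!("{}h {}m", h, m)
    } else if m > 0 && s > 0 {
        format!("{}m {}s", m, s)
    } else if m > 0 {
        format!("{}m", m)
    } else {
        format!("{}s", s)
    }
}
```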

See Also

Architecture — How the orchestrator fits in the system
Agent Lifecycle — Agent state machine details
Worktrees — Git worktree operations
ADR-005: Foreground Process — Why swarm runs in the foreground