Orchestration

The orchestrator is the top-level component that manages the entire swarm lifecycle. It implements the 13-step start flow, handles shutdown, and coordinates all subsystems.

Session Management

Each swarm run creates a session represented by a SessionInfo struct:

| Field | Type | Description |
|---|---|---|
| id | String | Format YYYYMMDD-XXXX (date + 4 random hex chars, e.g. 20250115-a3f2) |
| base_commit | String | The HEAD commit hash at session start |
| agents | Vec<String> | List of agent names from the config |
| started_at | DateTime<Utc> | UTC timestamp of session creation |
| pid | u32 | Process ID of the orchestrator (used for liveness checks) |
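
As an illustration of the id format, the following sketch builds a YYYYMMDD-XXXX string from a Unix timestamp and 16 random bits. The helper names (civil_from_days, format_session_id) are hypothetical, and the sketch uses only the standard library rather than whatever date and random-number crates the orchestrator actually depends on.

```rust
/// Convert days since 1970-01-01 into a (year, month, day) UTC civil date.
/// Standard days-to-civil conversion; valid for dates on or after the epoch.
fn civil_from_days(days: u64) -> (u64, u64, u64) {
    let z = days + 719_468; // shift the origin to 0000-03-01
    let era = z / 146_097; // number of complete 400-year cycles
    let doe = z - era * 146_097; // day of era, 0..=146096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, counted from March 1
    let mp = (5 * doy + 2) / 153; // month index, 0 = March
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// Hypothetical helper: format `YYYYMMDD-XXXX` from a Unix timestamp (seconds)
/// and a 16-bit random suffix rendered as 4 lowercase hex characters.
fn format_session_id(unix_secs: u64, suffix: u16) -> String {
    let (y, m, d) = civil_from_days(unix_secs / 86_400);
    format!("{y:04}{m:02}{d:02}-{suffix:04x}")
}
```

Calling format_session_id(1_736_899_200, 0xa3f2) yields 20250115-a3f2, the example id from the table.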

Session state is persisted in .swarm/session.json alongside a lockfile containing the PID. Both files are written atomically (temp file then rename).
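
The temp-file-then-rename pattern can be sketched as follows. This is a minimal illustration using only the standard library; the function name write_atomic and the .tmp suffix are assumptions, not names taken from the codebase.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Write `contents` to `path` by writing a sibling temp file first and then
/// renaming it over the target, so readers never observe a partial file.
fn write_atomic(path: &Path, contents: &[u8]) -> io::Result<()> {
    // Hypothetical temp-file naming: append ".tmp" to the target path.
    let mut tmp_name = path.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp = PathBuf::from(tmp_name);

    let mut file = File::create(&tmp)?;
    file.write_all(contents)?;
    file.sync_all()?; // flush to disk before the rename makes the data visible
    drop(file);

    // On POSIX systems a rename within one filesystem replaces the target atomically.
    fs::rename(&tmp, path)
}
```

Keeping the temp file in the same directory as the target matters: a rename across filesystems is not atomic.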

Stale Session Detection

A session is considered stale if its owning process no longer exists. This is checked using libc::kill(pid, 0):

  • Returns 0 — process alive, session is active
  • Returns -1 with ESRCH — process gone, session is stale

Stale sessions are automatically recovered before creating a new one.
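
A minimal sketch of the liveness check, assuming a Unix target. To stay dependency-free it declares kill directly instead of importing the libc crate; libc::kill binds the same POSIX function. The Liveness enum and both function names are hypothetical. POSIX also allows kill to fail with EPERM (the process exists but belongs to another user); this document does not say how the orchestrator treats that case, so the sketch reports it separately rather than guessing.

```rust
use std::io;

// Direct declaration of the POSIX function that `libc::kill` wraps.
extern "C" {
    fn kill(pid: i32, sig: i32) -> i32;
}

#[derive(Debug, PartialEq)]
enum Liveness {
    Alive,
    Gone,
    /// Any errno other than ESRCH, e.g. EPERM.
    Unknown(i32),
}

const ESRCH: i32 = 3; // "no such process" on Linux and macOS

/// Map the return value and errno of `kill(pid, 0)` to a verdict.
fn classify(ret: i32, errno: i32) -> Liveness {
    match (ret, errno) {
        (0, _) => Liveness::Alive,
        (_, ESRCH) => Liveness::Gone,
        (_, other) => Liveness::Unknown(other),
    }
}

fn check_pid(pid: u32) -> Liveness {
    // Signal 0 runs the existence and permission checks without delivering a signal.
    let ret = unsafe { kill(pid as i32, 0) };
    // errno is only meaningful when ret is -1; classify ignores it otherwise.
    let errno = io::Error::last_os_error().raw_os_error().unwrap_or(0);
    classify(ret, errno)
}
```

A session would then be treated as stale only when check_pid(session.pid) returns Liveness::Gone.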

The 13-Step Start Flow

When you run swarm start, the orchestrator executes these steps in order:

Step 1: Load Configuration

Read ~/.swarm/settings.json, validate the version, look up the project by its canonicalized path, and resolve all defaults into a ResolvedConfig.

Step 2: Validate Git Prerequisites

  • Check git version >= 2.20
  • Verify the project is a git repository
  • Confirm HEAD is not detached
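
The version check could look roughly like this. The sketch assumes the version is read from the output of git --version (for example "git version 2.39.2"); both function names are hypothetical.

```rust
/// Parse the (major, minor) pair from `git --version` output,
/// e.g. "git version 2.39.2" or "git version 2.39.2 (Apple Git-143)".
fn parse_git_version(output: &str) -> Option<(u32, u32)> {
    let version = output.trim().strip_prefix("git version ")?;
    let mut parts = version.split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// True when the reported version is at least 2.20.
fn meets_minimum(output: &str) -> bool {
    // Tuples compare lexicographically, so (3, 0) >= (2, 20) holds.
    matches!(parse_git_version(output), Some(v) if v >= (2, 20))
}
```

Comparing (major, minor) as a tuple avoids the common mistake of comparing version strings or floats, where "2.9" would sort above "2.20".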

Step 3: Handle --init Flag

If --init is set and the repo needs initialization, run init_git_repo(). If the repo already exists, this is a no-op.

Step 4: Handle Working Tree State

  • If --stash is set: auto-stash uncommitted changes (git stash push --include-untracked -m "swarm auto-stash")
  • Otherwise: require a clean working tree (git status --porcelain must be empty)

Step 5: Check for Stale Session

If .swarm/session.json exists:

  • If the process is alive: bail with "session already active"
  • If the process is dead: recover the stale session (auto-commit, remove worktrees, delete branches)

Step 6: Create Session

Generate a session ID, write session.json and lockfile atomically.

Step 7: Create Worktrees

For each agent and the supervisor, create a git worktree:

git worktree add .swarm/worktrees/<name> -b swarm/<session_id>/<name> <base_commit>

Lock each worktree to prevent accidental pruning.
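
As a sketch, the argument lists for these two git invocations can be built like this. The function names are hypothetical, and using a separate git worktree lock call (rather than some other locking mechanism) is an assumption about how the lock is applied.

```rust
/// Arguments for `git worktree add`, following the command shown above.
fn worktree_add_args(session_id: &str, name: &str, base_commit: &str) -> Vec<String> {
    vec![
        "worktree".into(),
        "add".into(),
        format!(".swarm/worktrees/{name}"),
        "-b".into(),
        format!("swarm/{session_id}/{name}"),
        base_commit.into(),
    ]
}

/// Arguments for `git worktree lock`, which protects a worktree from pruning.
/// Assumption: the lock is applied with a separate command after creation.
fn worktree_lock_args(name: &str) -> Vec<String> {
    vec![
        "worktree".into(),
        "lock".into(),
        format!(".swarm/worktrees/{name}"),
    ]
}
```

Each list can be passed to std::process::Command::new("git").args(...) with the project root as the working directory.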

Step 8: Initialize SQLite

Open (or create) the mailbox database at .swarm/messages.db with WAL mode enabled.

Step 9: Create Agent Runners and Registry

For each resolved agent config:

  1. Create an AgentHandle with state channels and interrupt sender
  2. Spawn the run_agent() task on Tokio
  3. Register in the AgentRegistry

Step 10: Start Message Router

Launch the async router loop that polls for urgent messages every 100ms and delivers InterruptSignals to the appropriate agent channels.
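
To illustrate the delivery step only, here is one pass of a router written against std::sync::mpsc instead of Tokio channels, so that it runs without external crates. The real router is async and repeats this on a 100ms interval; the struct fields and the route_once name are assumptions made for the sketch.

```rust
use std::collections::HashMap;
use std::sync::mpsc::Sender;

/// Hypothetical shape of the signal delivered to an agent.
#[derive(Debug, PartialEq)]
struct InterruptSignal {
    from: String,
    body: String,
}

/// Hypothetical shape of an urgent message read from the mailbox.
struct UrgentMessage {
    to: String,
    from: String,
    body: String,
}

/// Deliver each pending urgent message to its recipient's channel.
/// Returns how many signals were delivered; messages addressed to unknown
/// agents or to closed channels are skipped.
fn route_once(
    pending: Vec<UrgentMessage>,
    agents: &HashMap<String, Sender<InterruptSignal>>,
) -> usize {
    let mut delivered = 0;
    for msg in pending {
        if let Some(tx) = agents.get(&msg.to) {
            let signal = InterruptSignal { from: msg.from, body: msg.body };
            if tx.send(signal).is_ok() {
                delivered += 1;
            }
        }
    }
    delivered
}
```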

Step 11: Start Periodic Tasks

  • WAL checkpoint: Every 60 seconds, run PRAGMA wal_checkpoint(TRUNCATE)
  • Message prune: Every 300 seconds, delete old delivered messages, keeping the 1000 most recent

Step 12: Launch TUI or Headless Mode

  • Default: Launch the TUI with agent panels, log viewer, and command input
  • --no-tui: Run in headless mode, logging to stdout

Step 13: Await Shutdown

Block until a shutdown signal is received (SIGTERM, TUI quit, or all agents stopped), then execute graceful shutdown.

Stop Modes

When a session is stopped (swarm stop), agent branches are handled according to the stop mode:

| Mode | Flag | Behavior |
|---|---|---|
| Merge | --merge (default) | git merge --no-ff each agent branch into the base branch, in config order |
| Squash | --squash | git merge --squash each agent branch, creating a single commit per agent |
| Discard | --discard | Delete agent branches without merging any changes |

The merge order is: agent branches first (in the order defined in settings.json), then the supervisor branch.
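
The mapping from stop mode to git commands, and the merge order, might be modeled as below. This is a sketch under assumptions: the StopMode enum, the function names, the squash commit message, and the supervisor branch name "supervisor" are all hypothetical. The explicit commit after git merge --squash reflects that git stages a squash merge without committing it. Discard returns no commands here because branch deletion happens later in the shutdown sequence.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum StopMode {
    Merge,
    Squash,
    Discard,
}

impl StopMode {
    /// Git invocations (as argument lists) to apply this mode to one branch.
    fn git_commands(self, branch: &str) -> Vec<Vec<String>> {
        let cmd = |args: &[&str]| args.iter().map(|s| s.to_string()).collect::<Vec<String>>();
        // Hypothetical commit message for the squash case.
        let msg = format!("Squash merge of {branch}");
        match self {
            StopMode::Merge => vec![cmd(&["merge", "--no-ff", branch])],
            StopMode::Squash => vec![
                cmd(&["merge", "--squash", branch]),
                // `git merge --squash` only stages changes; commit them explicitly.
                cmd(&["commit", "-m", msg.as_str()]),
            ],
            // Nothing is merged; the branch is removed in a later cleanup step.
            StopMode::Discard => Vec::new(),
        }
    }
}

/// Branches in merge order: agents in config order, then the supervisor.
/// Assumption: the supervisor's branch uses the name "supervisor".
fn merge_order(session_id: &str, agents: &[&str]) -> Vec<String> {
    agents
        .iter()
        .map(|name| format!("swarm/{session_id}/{name}"))
        .chain(std::iter::once(format!("swarm/{session_id}/supervisor")))
        .collect()
}
```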

Shutdown Sequence

The graceful shutdown sequence runs inside the orchestrator process:

  1. Signal all agents — Send OperatorStop to each agent via the registry
  2. Wait for agents — Wait for all agents to reach the Stopped state
  3. Stop router — Signal the router's shutdown channel
  4. Auto-commit — For each worktree (agents + supervisor), commit any dirty changes
  5. Merge branches — Apply the selected stop mode (merge/squash/discard)
  6. Remove worktrees — Unlock and remove each worktree
  7. Prune worktrees — Run git worktree prune to clean stale references
  8. Delete branches — Remove all swarm/<session_id>/* branches
  9. Remove session — Delete session.json and lockfile
  10. Exit

When swarm stop is run from a separate terminal:

  1. Load the session from .swarm/session.json
  2. Send SIGTERM to the orchestrator PID
  3. Wait up to 60 seconds for the process to exit
  4. If session files remain after exit, perform cleanup from the stop side
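
The wait in step 3 amounts to polling a liveness check against a deadline. A minimal sketch, with the liveness check passed in as a closure so the loop can be exercised without a real process; the function name and the poll interval are assumptions.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Poll `is_alive` until it returns false or `timeout` elapses.
/// Returns true if the process exited within the timeout.
fn wait_for_exit(
    mut is_alive: impl FnMut() -> bool,
    timeout: Duration,
    poll: Duration,
) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if !is_alive() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(poll);
    }
}
```

For the stop command, the closure would wrap the kill(pid, 0) check and the timeout would be Duration::from_secs(60). A false result leaves it to the caller to decide what happens next.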

Status Command

swarm status provides a snapshot of the current session:

Session: 20250115-a3f2 (active)
Started: 2025-01-15T10:30:00Z (2h 15m ago)
Base commit: abc123def456
PID: 12345

Agents:
  ● backend          Running (1h 23m)
  ● frontend         Running (45m 12s)
  ● reviewer         SessionComplete (idle 5m)

Beads: 3 ready, 2 claimed, 8 closed

With --json, the output is a structured JSON object including agent states, liveness data, and beads summary.