Orchestration
The orchestrator is the top-level component that manages the entire swarm lifecycle. It implements the 13-step start flow, handles shutdown, and coordinates all subsystems.
Session Management
Each swarm run creates a session represented by a SessionInfo struct:
| Field | Type | Description |
|---|---|---|
| id | String | Format YYYYMMDD-XXXX (date + 4 random hex chars, e.g. 20250115-a3f2) |
| base_commit | String | The HEAD commit hash at session start |
| agents | Vec<String> | List of agent names from the config |
| started_at | DateTime<Utc> | UTC timestamp of session creation |
| pid | u32 | Process ID of the orchestrator (used for liveness checks) |
Session state is persisted in .swarm/session.json alongside a lockfile containing the PID. Both files are written atomically (to a temporary file first, then renamed into place).
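A minimal, dependency-free sketch of the temp-file-then-rename write described above. The helper name write_atomic and the .tmp suffix are assumptions, not swarm's actual code. Writing the temp file next to the target keeps both on one filesystem, which is what lets rename(2) replace the destination in a single step: a reader sees either the old file or the complete new one, never a partial write.

```rust
use std::ffi::OsString;
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical helper: atomically replace `path` with `contents`.
fn write_atomic(path: &Path, contents: &[u8]) -> io::Result<()> {
    // ".swarm/session.json" -> ".swarm/session.json.tmp" (suffix is an assumption).
    // Same directory means same filesystem, so the rename below is atomic.
    let mut tmp_name: OsString = path.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp = PathBuf::from(tmp_name);

    {
        let mut file = File::create(&tmp)?;
        file.write_all(contents)?;
        // Flush the data to disk before the rename makes it visible.
        file.sync_all()?;
    }

    // Replace the destination in one step.
    fs::rename(&tmp, path)
}
```

For example, the session would be written with write_atomic(Path::new(".swarm/session.json"), json.as_bytes()), and the lockfile with the orchestrator's PID as text (the lockfile's exact path is not specified here).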
Stale Session Detection
A session is considered stale if its owning process no longer exists. This is checked using libc::kill(pid, 0):
- Returns 0: process alive, session is active
- Returns -1 with ESRCH: process gone, session is stale
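The liveness probe can be sketched as follows. To stay dependency-free it declares kill(2) by hand instead of going through the libc crate, and it assumes pid_t is i32 and ESRCH is 3 (true on Linux and macOS). It also handles a case the two outcomes above leave open: kill can fail with EPERM when the process exists but belongs to another user, so this sketch treats any error other than ESRCH as alive. That choice is an assumption, made to avoid clobbering a live session.

```rust
use std::io;

// Hand-written FFI declaration of POSIX kill(2); swarm itself uses libc::kill.
// pid_t is assumed to be i32 (Linux, macOS).
unsafe extern "C" {
    fn kill(pid: i32, sig: i32) -> i32;
}

/// errno for "no such process" on Linux and macOS (assumed elsewhere).
const ESRCH: i32 = 3;

#[derive(Debug, PartialEq)]
enum Liveness {
    Alive,
    Stale,
}

/// Classify a session by probing its recorded orchestrator PID with signal 0.
fn session_liveness(pid: u32) -> Liveness {
    // kill() treats 0 and negative PIDs as process groups, so they can never
    // name a single orchestrator; treat them as stale.
    if pid == 0 || pid > i32::MAX as u32 {
        return Liveness::Stale;
    }
    // Signal 0 runs the existence and permission checks without sending anything.
    let rc = unsafe { kill(pid as i32, 0) };
    if rc == 0 {
        return Liveness::Alive;
    }
    match io::Error::last_os_error().raw_os_error() {
        Some(ESRCH) => Liveness::Stale,
        // EPERM (process exists, owned by someone else) or anything unexpected.
        _ => Liveness::Alive,
    }
}
```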
Stale sessions are automatically recovered before creating a new one.
The 13-Step Start Flow
When you run swarm start, the orchestrator executes these steps in order:
Step 1: Load Configuration
Read ~/.swarm/settings.json, validate the version, look up the project by its canonicalized path, and resolve all defaults into a ResolvedConfig.
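A hedged sketch of the project lookup in Step 1. The ProjectEntry type, the find_project name, and the assumption that settings.json keys are stored as canonical paths are all hypothetical. The point is that std::fs::canonicalize resolves symlinks and . / .. components, so different spellings of the same directory find the same entry.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for one project's entry in settings.json.
#[derive(Debug)]
struct ProjectEntry {
    agents: Vec<String>,
}

/// Look up a project by the canonical form of `dir`, so "./repo",
/// "/home/me/repo/sub/..", and a symlink to the repo all map to one key.
/// Fails if `dir` does not exist (canonicalize needs a real path).
fn find_project<'a>(
    projects: &'a HashMap<PathBuf, ProjectEntry>,
    dir: &Path,
) -> io::Result<Option<&'a ProjectEntry>> {
    let canonical = fs::canonicalize(dir)?;
    Ok(projects.get(&canonical))
}
```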
Step 2: Validate Git Prerequisites
- Check git version >= 2.20
- Verify the project is a git repository
- Confirm HEAD is not detached
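The version gate can be sketched as a small parser over the output of git --version. The function names and the MIN_GIT constant are illustrative assumptions. Comparing (major, minor) tuples gives the right ordering for free, which a string comparison would not ("2.9" sorts after "2.20" as text). For the detached-HEAD check, git symbolic-ref -q HEAD exits non-zero without printing an error when HEAD is detached; whether swarm uses that command is not stated here.

```rust
/// Minimum supported git version from the step above.
const MIN_GIT: (u32, u32) = (2, 20);

/// Parse major and minor from `git --version` output, e.g.
/// "git version 2.39.2" or "git version 2.39.2 (Apple Git-143)".
fn parse_git_version(output: &str) -> Option<(u32, u32)> {
    let version = output.trim().strip_prefix("git version ")?;
    let mut parts = version.split(|c: char| c == '.' || c.is_whitespace());
    let major: u32 = parts.next()?.parse().ok()?;
    let minor: u32 = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// Tuples compare lexicographically: (3, 0) >= (2, 20) and (2, 19) < (2, 20).
fn meets_minimum(found: (u32, u32), minimum: (u32, u32)) -> bool {
    found >= minimum
}
```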
Step 3: Handle --init Flag
If --init is set and the repo needs initialization, run init_git_repo(). If the repo already exists, this is a no-op.
Step 4: Handle Working Tree State
- If --stash is set: auto-stash uncommitted changes (git stash push --include-untracked -m "swarm auto-stash")
- Otherwise: require a clean working tree (git status --porcelain must be empty)
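The Step 4 decision can be expressed as a pure function over git status --porcelain output. TreeAction, plan_tree_action, STASH_ARGS, and the error text are hypothetical, and so is one detail: when --stash is set but the tree is already clean, this sketch skips the stash rather than running git stash push.

```rust
#[derive(Debug, PartialEq)]
enum TreeAction {
    /// Tree is clean: continue with the start flow.
    Proceed,
    /// Run `git` with STASH_ARGS before continuing.
    Stash,
}

/// Arguments for the auto-stash command quoted in Step 4.
const STASH_ARGS: [&str; 5] = ["stash", "push", "--include-untracked", "-m", "swarm auto-stash"];

/// Decide how to handle the working tree given `git status --porcelain` output.
fn plan_tree_action(porcelain: &str, stash_flag: bool) -> Result<TreeAction, String> {
    // Empty porcelain output means no modified, staged, or untracked files.
    let dirty = !porcelain.trim().is_empty();
    match (dirty, stash_flag) {
        (false, _) => Ok(TreeAction::Proceed),
        (true, true) => Ok(TreeAction::Stash),
        (true, false) => Err(format!(
            "working tree has {} uncommitted change(s); commit them or pass --stash",
            porcelain.lines().filter(|l| !l.trim().is_empty()).count()
        )),
    }
}
```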
Step 5: Check for Stale Session
If .swarm/session.json exists:
- If the process is alive: bail with "session already active"
- If the process is dead: recover the stale session (auto-commit, remove worktrees, delete branches)
Step 6: Create Session
Generate a session ID, write session.json and lockfile atomically.
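The session ID format can be sketched with std alone. The date is passed in as parameters; in practice it would come from the session's UTC timestamp (with chrono, something like Utc::now().format("%Y%m%d")). random_suffix is an assumption: it hashes with a randomly keyed RandomState, which is enough to avoid collisions between sessions started on the same day but is not a cryptographic RNG.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// YYYYMMDD-XXXX: zero-padded UTC date plus four lowercase hex digits.
fn format_session_id(year: u32, month: u32, day: u32, suffix: u16) -> String {
    format!("{:04}{:02}{:02}-{:04x}", year, month, day, suffix)
}

/// Hypothetical std-only source of 16 random bits (not cryptographic).
fn random_suffix() -> u16 {
    // RandomState::new() is seeded with random keys, so finish() varies per run.
    let mut hasher = RandomState::new().build_hasher();
    hasher.write_u32(std::process::id());
    hasher.finish() as u16
}
```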
Step 7: Create Worktrees
For each agent and the supervisor, create a git worktree:
git worktree add .swarm/worktrees/<name> -b swarm/<session_id>/<name> <base_commit>
Lock each worktree to prevent accidental pruning.
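The two git invocations per worktree can be built as argument lists, ready to pass to std::process::Command or to log. worktree_commands is a hypothetical helper; the path and branch templates follow the command above. git also accepts git worktree add --lock, which creates and locks in one step without a race; whether swarm uses it is not stated.

```rust
/// Build `git` argument lists for one worktree: `worktree add` with a new
/// branch at the session's base commit, then `worktree lock` so that
/// `git worktree prune` leaves it alone.
fn worktree_commands(session_id: &str, name: &str, base_commit: &str) -> (Vec<String>, Vec<String>) {
    let path = format!(".swarm/worktrees/{}", name);
    let branch = format!("swarm/{}/{}", session_id, name);
    let add = vec![
        "worktree".to_string(),
        "add".to_string(),
        path.clone(),
        "-b".to_string(),
        branch,
        base_commit.to_string(),
    ];
    let lock = vec!["worktree".to_string(), "lock".to_string(), path];
    (add, lock)
}
```

Each list would then run as Command::new("git").args(&add) from the repository root.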
Step 8: Initialize SQLite
Open (or create) the mailbox database at .swarm/messages.db with WAL mode enabled.
Step 9: Create Agent Runners and Registry
For each resolved agent config:
- Create an AgentHandle with state channels and an interrupt sender
- Spawn the run_agent() task on Tokio
- Register it in the AgentRegistry
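Step 9, together with the first two shutdown steps, can be sketched as a small registry. swarm runs agents as Tokio tasks; this sketch swaps in std threads and std::sync::mpsc so it runs without dependencies, and it models the state channel as a shared Mutex. AgentHandle, AgentRegistry, run_agent, OperatorStop, and Stopped come from the text; their fields and signatures here are assumptions.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

#[derive(Debug)]
enum InterruptSignal {
    OperatorStop,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum AgentState {
    Running,
    Stopped,
}

struct AgentHandle {
    interrupt_tx: Sender<InterruptSignal>,
    state: Arc<Mutex<AgentState>>, // stand-in for a state channel
    task: JoinHandle<()>,
}

/// Placeholder agent loop: run until OperatorStop arrives (or every sender
/// is dropped), then publish Stopped.
fn run_agent(interrupts: Receiver<InterruptSignal>, state: Arc<Mutex<AgentState>>) {
    *state.lock().unwrap() = AgentState::Running;
    for signal in interrupts.iter() {
        match signal {
            InterruptSignal::OperatorStop => break,
        }
    }
    *state.lock().unwrap() = AgentState::Stopped;
}

#[derive(Default)]
struct AgentRegistry {
    agents: HashMap<String, AgentHandle>,
}

impl AgentRegistry {
    /// Create the handle's channels, spawn the agent, and register it by name.
    fn spawn(&mut self, name: &str) {
        let (interrupt_tx, interrupt_rx) = mpsc::channel();
        let state = Arc::new(Mutex::new(AgentState::Running));
        let task_state = Arc::clone(&state);
        let task = thread::spawn(move || run_agent(interrupt_rx, task_state));
        self.agents.insert(name.to_string(), AgentHandle { interrupt_tx, state, task });
    }

    /// Send OperatorStop to every agent, then wait for each to finish.
    fn stop_all(self) -> Vec<(String, AgentState)> {
        for handle in self.agents.values() {
            // A send error only means the agent has already exited.
            let _ = handle.interrupt_tx.send(InterruptSignal::OperatorStop);
        }
        let mut states = Vec::new();
        for (name, handle) in self.agents {
            handle.task.join().expect("agent thread panicked");
            let state = *handle.state.lock().unwrap();
            states.push((name, state));
        }
        states.sort_by(|a, b| a.0.cmp(&b.0)); // deterministic order for callers
        states
    }
}
```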
Step 10: Start Message Router
Launch the async router loop that polls for urgent messages every 100ms and delivers InterruptSignals to the appropriate agent channels.
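One router pass can be sketched as a pure delivery step: look up each urgent message's recipient and push an InterruptSignal onto that agent's channel. The message and signal fields, route_urgent, and the choice to leave undeliverable messages for the next poll are assumptions; std mpsc stands in for swarm's async channels. In the real loop this pass would repeat every 100 ms.

```rust
use std::collections::HashMap;
use std::sync::mpsc::Sender;

// Hypothetical shapes: the real mailbox rows and signal payload are not documented here.
#[derive(Debug, Clone, PartialEq)]
struct UrgentMessage {
    id: u64,
    recipient: String,
    body: String,
}

#[derive(Debug, PartialEq)]
struct InterruptSignal {
    message_id: u64,
    body: String,
}

/// Forward each urgent message to its recipient's interrupt channel and
/// return the IDs that were delivered, so the caller can mark them in the
/// mailbox. Messages for unknown or exited agents stay pending.
fn route_urgent(
    urgent: &[UrgentMessage],
    channels: &HashMap<String, Sender<InterruptSignal>>,
) -> Vec<u64> {
    let mut delivered = Vec::new();
    for msg in urgent {
        if let Some(tx) = channels.get(&msg.recipient) {
            let signal = InterruptSignal { message_id: msg.id, body: msg.body.clone() };
            // send() fails only if the agent's receiver has been dropped.
            if tx.send(signal).is_ok() {
                delivered.push(msg.id);
            }
        }
    }
    delivered
}
```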
Step 11: Start Periodic Tasks
- WAL checkpoint: Every 60 seconds, run PRAGMA wal_checkpoint(TRUNCATE)
- Message prune: Every 300 seconds, delete older delivered messages, keeping the 1000 most recent
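Both periodic tasks follow one pattern: wait one period unless shutdown arrives first, then do the work. This sketch uses a std thread and Receiver::recv_timeout in place of Tokio's timers; run_periodic is a hypothetical name, and only the checkpoint SQL is taken from the text (the prune statement is omitted because the mailbox schema is not documented here).

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// SQL for the WAL checkpoint task, as given above.
const WAL_CHECKPOINT_SQL: &str = "PRAGMA wal_checkpoint(TRUNCATE)";

/// Run `task` every `period` until a shutdown message arrives or the
/// sender is dropped. Returns how many times the task ran.
fn run_periodic<F: FnMut()>(period: Duration, shutdown: &Receiver<()>, mut task: F) -> usize {
    let mut runs = 0;
    loop {
        match shutdown.recv_timeout(period) {
            // No shutdown within one period: do the work.
            Err(RecvTimeoutError::Timeout) => {
                task();
                runs += 1;
            }
            // Explicit shutdown, or every sender dropped.
            Ok(()) | Err(RecvTimeoutError::Disconnected) => return runs,
        }
    }
}
```

The orchestrator would run two such loops, with periods of 60 s and 300 s. One design difference: waiting a full period after each run lets the schedule drift by the task's duration, whereas tokio::time::interval keeps a fixed cadence.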
Step 12: Launch TUI or Headless Mode
- Default: Launch the TUI with agent panels, log viewer, and command input
- --no-tui: Run in headless mode, logging to stdout
Step 13: Await Shutdown
Block until a shutdown signal is received (SIGTERM, TUI quit, or all agents stopped), then execute graceful shutdown.
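Waiting on several shutdown sources fits a single multi-producer channel: each source holds a Sender clone and the orchestrator blocks on the first message. ShutdownReason, await_shutdown, and the rule that a fully disconnected channel counts as "all agents stopped" are assumptions. Turning SIGTERM into a message would need a signal-handling facility such as tokio::signal::unix or the signal-hook crate, which this std-only sketch leaves out.

```rust
use std::sync::mpsc::{self, Receiver, Sender};

/// The three shutdown triggers named in Step 13.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ShutdownReason {
    Sigterm,
    TuiQuit,
    AllAgentsStopped,
}

/// One channel, many producers: each trigger source gets its own Sender clone.
fn shutdown_channel() -> (Sender<ShutdownReason>, Receiver<ShutdownReason>) {
    mpsc::channel()
}

/// Block until the first trigger fires. If every sender is dropped without
/// sending, treat that as "all agents stopped" (an assumption).
fn await_shutdown(rx: &Receiver<ShutdownReason>) -> ShutdownReason {
    rx.recv().unwrap_or(ShutdownReason::AllAgentsStopped)
}
```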
Stop Modes
When a session is stopped (swarm stop), agent branches are handled according to the stop mode:
| Mode | Flag | Behavior |
|---|---|---|
| Merge | --merge (default) | git merge --no-ff each agent branch into the base branch, in config order |
| Squash | --squash | git merge --squash each agent branch, creating a single commit per agent |
| Discard | --discard | Delete agent branches without merging any changes |
The merge order is: agent branches first (in the order defined in settings.json), then the supervisor branch.
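The stop modes can be sketched as a plan of git invocations in the documented order. merge_plan, the supervisor's worktree name, and the squash commit message are hypothetical, and the plan assumes it runs in the main checkout with the base branch checked out. Squash mode needs a git commit after each git merge --squash, because --squash stages the combined changes without committing them.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum StopMode {
    Merge,
    Squash,
    Discard,
}

fn args(parts: &[&str]) -> Vec<String> {
    parts.iter().map(|s| s.to_string()).collect()
}

/// Agent branches in config order, then the supervisor branch.
fn merge_plan<'a>(mode: StopMode, session_id: &str, agents: &[&'a str], supervisor: &'a str) -> Vec<Vec<String>> {
    let mut plan = Vec::new();
    for name in agents.iter().copied().chain(std::iter::once(supervisor)) {
        let branch = format!("swarm/{}/{}", session_id, name);
        match mode {
            StopMode::Merge => plan.push(args(&["merge", "--no-ff", branch.as_str()])),
            StopMode::Squash => {
                // Stage the agent's combined changes, then record them as one commit.
                let message = format!("swarm: squash {}", name); // hypothetical message
                plan.push(args(&["merge", "--squash", branch.as_str()]));
                plan.push(args(&["commit", "-m", message.as_str()]));
            }
            // Nothing to merge; branches are deleted later in the shutdown sequence.
            StopMode::Discard => {}
        }
    }
    plan
}
```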
Shutdown Sequence
The graceful shutdown sequence runs inside the orchestrator process:
- Signal all agents: Send OperatorStop to each agent via the registry
- Wait for agents: Wait for all agents to reach the Stopped state
- Stop router: Signal the router's shutdown channel
- Auto-commit: For each worktree (agents + supervisor), commit any dirty changes
- Merge branches: Apply the selected stop mode (merge/squash/discard)
- Remove worktrees: Unlock and remove each worktree
- Prune worktrees: Run git worktree prune to clean stale references
- Delete branches: Remove all swarm/<session_id>/* branches
- Remove session: Delete session.json and lockfile
- Exit
When swarm stop is run from a separate terminal:
- Load the session from .swarm/session.json
- Send SIGTERM to the orchestrator PID
- Wait up to 60 seconds for the process to exit
- If session files remain after exit, perform cleanup from the stop side
Status Command
swarm status provides a snapshot of the current session:
Session: 20250115-a3f2 (active)
Started: 2025-01-15T10:30:00Z (2h 15m ago)
Base commit: abc123def456
PID: 12345
Agents:
● backend Running (1h 23m)
● frontend Running (45m 12s)
● reviewer SessionComplete (idle 5m)
Beads: 3 ready, 2 claimed, 8 closed
With --json, the output is a structured JSON object including agent states, liveness data, and beads summary.
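The relative durations in the sample output (2h 15m, 45m 12s, 5m) can be produced by a small formatter. The rules here are inferred from those samples only: at most two units, with a zero seconds component dropped. format_compact is a hypothetical name, and swarm's real edge-case behavior is not documented here.

```rust
use std::time::Duration;

/// Compact duration in the style of the status output:
/// "2h 15m", "45m 12s", "5m", "30s".
fn format_compact(d: Duration) -> String {
    let total = d.as_secs();
    let (h, m, s) = (total / 3600, (total % 3600) / 60, total % 60);
    if h > 0 {
        // Hours present: show hours and minutes, drop seconds.
        format!("{}h {}m", h, m)
    } else if m > 0 && s > 0 {
        format!("{}m {}s", m, s)
    } else if m > 0 {
        // Whole minutes: omit the zero seconds component.
        format!("{}m", m)
    } else {
        format!("{}s", s)
    }
}
```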
Related
- Architecture — How the orchestrator fits in the system
- Agent Lifecycle — Agent state machine details
- Worktrees — Git worktree operations
- ADR-005: Foreground Process — Why swarm runs in the foreground