Agent Lifecycle

Each swarm agent follows a deterministic state machine that drives its lifecycle from initialization through multiple backend sessions to eventual shutdown. The state machine is defined in agent::state and executed by the runner loop in agent::runner.

Agent States

The AgentState enum defines 8 observable states:

State	Description
`Initializing`	Agent registered; waiting for its git worktree to be ready
`BuildingPrompt`	Assembling the system prompt (environment, role, tools, messages, tasks)
`Spawning`	Prompt stored; launching a backend session with the LLM provider
`Running { session_seq }`	Backend session is active; `session_seq` identifies the current session iteration
`Interrupting { session_seq }`	Graceful cancellation requested (urgent message received); waiting for session exit
`SessionComplete`	Backend session exited successfully; ready for next iteration
`CoolingDown { until }`	Session failed; waiting for exponential backoff to elapse
`Stopped`	Terminal state — agent will not run again

Check if an agent has reached its terminal state with AgentState::is_terminal(), which returns true only for Stopped.
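
As a rough illustration of the table above, the enum might look like the following sketch. The variant names and the behavior of is_terminal() come from this page; the field types (u32 for session_seq, Instant for until) and the derives are assumptions, not the definitions in agent::state.

```rust
use std::time::Instant;

/// Sketch of the eight observable states.
/// Field types are assumed for illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum AgentState {
    Initializing,
    BuildingPrompt,
    Spawning,
    Running { session_seq: u32 },
    Interrupting { session_seq: u32 },
    SessionComplete,
    CoolingDown { until: Instant },
    Stopped,
}

impl AgentState {
    /// Returns true only for `Stopped`, the single terminal state.
    pub fn is_terminal(&self) -> bool {
        matches!(self, AgentState::Stopped)
    }
}
```

A caller could use this to decide when to stop polling an agent, for example by looping until is_terminal() returns true.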

Agent Events

The AgentEvent enum defines the events that drive state transitions:

Event	Trigger
`WorktreeReady`	Git worktree created and ready for use
`PromptReady(String)`	System prompt assembled successfully
`SessionStarted(u32)`	Backend session launched (carries the session sequence number)
`SessionExited(ExitOutcome)`	Backend session ended — `Success`, `Error(String)`, or `Timeout`
`UrgentMessage`	Router detected an urgent message for this agent
`GraceExceeded`	Interruption grace period expired without session exit
`BackoffElapsed`	CoolingDown timer expired
`OperatorStop`	Operator requested shutdown (global — valid from any state)
`FatalError(String)`	Unrecoverable error (global — valid from any state)
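
The event table can be read as the following pair of enums. This is a sketch built from the names listed above; the is_global() helper is hypothetical and only mirrors the "global" remark in the table.

```rust
/// How a backend session ended.
#[derive(Debug, Clone, PartialEq)]
pub enum ExitOutcome {
    Success,
    Error(String),
    Timeout,
}

/// Events that drive state transitions.
#[derive(Debug, Clone, PartialEq)]
pub enum AgentEvent {
    WorktreeReady,
    PromptReady(String),
    SessionStarted(u32),
    SessionExited(ExitOutcome),
    UrgentMessage,
    GraceExceeded,
    BackoffElapsed,
    OperatorStop,
    FatalError(String),
}

impl AgentEvent {
    /// Hypothetical helper: true for the events the table marks as
    /// valid from any state.
    pub fn is_global(&self) -> bool {
        matches!(self, AgentEvent::OperatorStop | AgentEvent::FatalError(_))
    }
}
```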

Side Effects

Each transition returns a SideEffect telling the runner what action to take:

SideEffect	Runner Action
`None`	No action needed
`StorePrompt(String)`	Save the assembled prompt for the next spawn
`CancelSession`	Request graceful cancellation of the current backend session
`ForceStopSession`	Force-stop the session immediately (grace period exceeded)
`IncrementSession`	Bump session sequence counter and loop back to BuildingPrompt
`LogFatal(String)`	Log the fatal error message; agent is now Stopped
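
To make the runner's side of this contract concrete, here is a minimal sketch of a dispatcher over SideEffect. The enum variants come from the table; RunnerCtx and apply_side_effect are hypothetical stand-ins that only record what a real runner would do (store the prompt, signal the backend, write a log line).

```rust
/// Actions the state machine asks the runner to perform.
#[derive(Debug, Clone, PartialEq)]
pub enum SideEffect {
    None,
    StorePrompt(String),
    CancelSession,
    ForceStopSession,
    IncrementSession,
    LogFatal(String),
}

/// Hypothetical runner-side bookkeeping, for illustration only.
#[derive(Debug, Default)]
pub struct RunnerCtx {
    pub stored_prompt: Option<String>,
    pub session_seq: u32,
    pub cancel_requested: bool,
    pub force_stopped: bool,
    pub fatal_log: Vec<String>,
}

/// Applies one side effect to the runner context.
pub fn apply_side_effect(ctx: &mut RunnerCtx, effect: SideEffect) {
    match effect {
        SideEffect::None => {}
        SideEffect::StorePrompt(prompt) => ctx.stored_prompt = Some(prompt),
        SideEffect::CancelSession => ctx.cancel_requested = true,
        SideEffect::ForceStopSession => ctx.force_stopped = true,
        SideEffect::IncrementSession => ctx.session_seq += 1,
        SideEffect::LogFatal(msg) => ctx.fatal_log.push(msg),
    }
}
```

Returning a value that describes the action, rather than performing I/O inside the transition, keeps the state machine deterministic and easy to test in isolation.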

State Diagram

                    ┌──────────────┐
                    │ Initializing │
                    └──────┬───────┘
                           │ WorktreeReady
                    ┌──────▼─────────┐
              ┌────►│ BuildingPrompt │◄─────────────────────┐
              │     └──────┬─────────┘                      │
              │            │ PromptReady                    │
              │     ┌──────▼───────┐                        │
              │     │   Spawning   │──── SessionExited ─────┤
              │     └──────┬───────┘    (Error/Timeout)     │
              │            │ SessionStarted                 │
              │     ┌──────▼───────┐                 ┌──────┴──────┐
              │     │   Running    │── Error/Timeout►│ CoolingDown │
              │     └──┬───┬───────┘                 └──────┬──────┘
              │        │   │ UrgentMessage                  │ BackoffElapsed
              │        │   │                                │
              │        │ ┌─▼────────────┐                   │
              │        │ │ Interrupting │──────────────────►│
              │        │ └──────────────┘                   │
              │        │ SessionExited(Success)             │
              │ ┌──────▼────────┐                           │
              │ │SessionComplete│                           │
              │ └──────┬────────┘                           │
              │        │ WorktreeReady                      │
              └────────┘◄───────────────────────────────────┘

        ── OperatorStop or FatalError from ANY state ──► Stopped
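
The edges in the diagram can be sketched as a pure transition function. This is an illustration, not the code in agent::state, and it makes several assumptions: CoolingDown drops its until field, error counting is left out, GraceExceeded keeps the agent in Interrupting until the session exits, an exit from Interrupting returns to BuildingPrompt, and any pair not listed leaves the state unchanged. The State Transitions page remains the authoritative table.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum ExitOutcome { Success, Error(String), Timeout }

#[derive(Debug, Clone, PartialEq)]
pub enum AgentState {
    Initializing,
    BuildingPrompt,
    Spawning,
    Running { session_seq: u32 },
    Interrupting { session_seq: u32 },
    SessionComplete,
    CoolingDown, // the `until` deadline is omitted in this sketch
    Stopped,
}

#[derive(Debug, Clone, PartialEq)]
pub enum AgentEvent {
    WorktreeReady,
    PromptReady(String),
    SessionStarted(u32),
    SessionExited(ExitOutcome),
    UrgentMessage,
    GraceExceeded,
    BackoffElapsed,
    OperatorStop,
    FatalError(String),
}

#[derive(Debug, Clone, PartialEq)]
pub enum SideEffect {
    None,
    StorePrompt(String),
    CancelSession,
    ForceStopSession,
    IncrementSession,
    LogFatal(String),
}

/// Maps (state, event) to (next state, side effect) for the diagram's edges.
pub fn transition(state: AgentState, event: AgentEvent) -> (AgentState, SideEffect) {
    match (state, event) {
        // Global events are checked first: they apply from any state.
        (_, AgentEvent::OperatorStop) => (AgentState::Stopped, SideEffect::None),
        (_, AgentEvent::FatalError(msg)) => (AgentState::Stopped, SideEffect::LogFatal(msg)),
        (AgentState::Initializing, AgentEvent::WorktreeReady) => {
            (AgentState::BuildingPrompt, SideEffect::None)
        }
        (AgentState::BuildingPrompt, AgentEvent::PromptReady(prompt)) => {
            (AgentState::Spawning, SideEffect::StorePrompt(prompt))
        }
        (AgentState::Spawning, AgentEvent::SessionStarted(seq)) => {
            (AgentState::Running { session_seq: seq }, SideEffect::None)
        }
        // A failed spawn or a failed running session both cool down.
        (
            AgentState::Spawning | AgentState::Running { .. },
            AgentEvent::SessionExited(ExitOutcome::Error(_) | ExitOutcome::Timeout),
        ) => (AgentState::CoolingDown, SideEffect::None),
        (AgentState::Running { .. }, AgentEvent::SessionExited(ExitOutcome::Success)) => {
            (AgentState::SessionComplete, SideEffect::None)
        }
        (AgentState::Running { session_seq }, AgentEvent::UrgentMessage) => {
            (AgentState::Interrupting { session_seq }, SideEffect::CancelSession)
        }
        // Assumption: stay in Interrupting until the session actually exits.
        (AgentState::Interrupting { session_seq }, AgentEvent::GraceExceeded) => {
            (AgentState::Interrupting { session_seq }, SideEffect::ForceStopSession)
        }
        // Assumption: an interrupted session leads back to a fresh prompt.
        (AgentState::Interrupting { .. }, AgentEvent::SessionExited(_)) => {
            (AgentState::BuildingPrompt, SideEffect::None)
        }
        (AgentState::SessionComplete, AgentEvent::WorktreeReady) => {
            (AgentState::BuildingPrompt, SideEffect::IncrementSession)
        }
        (AgentState::CoolingDown, AgentEvent::BackoffElapsed) => {
            (AgentState::BuildingPrompt, SideEffect::None)
        }
        // Assumption: unlisted pairs are ignored rather than rejected.
        (unchanged, _) => (unchanged, SideEffect::None),
    }
}
```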

Error Thresholds and Backoff

The state machine tracks two error counters:

Counter	Default Limit	Behavior
`consecutive_errors`	5 (`max_consecutive_errors`)	Reset to 0 on `SessionStarted` or `SessionExited(Success)`
`total_errors`	20 (`max_total_errors`)	Never reset; accumulates across all sessions

When either counter reaches its limit, the agent transitions to Stopped with a LogFatal side effect.
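
The counter rules can be sketched as follows. The field names and default limits come from the table above; the ErrorCounters struct, its method names, and the use of a greater-or-equal comparison for "reaches its limit" are assumptions made for illustration.

```rust
/// Hypothetical container for the two error counters and their limits.
#[derive(Debug, Clone, PartialEq)]
pub struct ErrorCounters {
    pub consecutive_errors: u32,
    pub total_errors: u32,
    pub max_consecutive_errors: u32,
    pub max_total_errors: u32,
}

impl Default for ErrorCounters {
    fn default() -> Self {
        Self {
            consecutive_errors: 0,
            total_errors: 0,
            max_consecutive_errors: 5,
            max_total_errors: 20,
        }
    }
}

impl ErrorCounters {
    /// Records one failed session. Returns true when either limit has been
    /// reached, meaning the agent should move to Stopped.
    pub fn record_error(&mut self) -> bool {
        self.consecutive_errors += 1;
        self.total_errors += 1;
        self.consecutive_errors >= self.max_consecutive_errors
            || self.total_errors >= self.max_total_errors
    }

    /// Called on SessionStarted or SessionExited(Success).
    /// Only the consecutive counter resets; total_errors never does.
    pub fn reset_consecutive(&mut self) {
        self.consecutive_errors = 0;
    }
}
```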

Backoff Formula

When an error occurs, the agent enters CoolingDown with exponential backoff:

duration_ms = min(2000 * 2^(n-1), 60000)

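Where n is the value of consecutive_errors after the increment. If the agent stops as soon as consecutive_errors reaches the default limit of 5, the rows for 5 and above are reachable only with a higher max_consecutive_errors. Examples: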

Consecutive Errors	Backoff Duration
1	2,000 ms
2	4,000 ms
3	8,000 ms
4	16,000 ms
5	32,000 ms
6+	60,000 ms (cap)
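
The formula and table translate directly into a small function. This is a sketch; the function name backoff_duration and the constant names are hypothetical, and the shift is clamped so that very large counts cannot overflow.

```rust
use std::time::Duration;

/// Computes min(2000 * 2^(n-1), 60000) milliseconds, where n is the
/// consecutive error count after the increment (expected to be >= 1).
pub fn backoff_duration(consecutive_errors: u32) -> Duration {
    const BASE_MS: u64 = 2_000;
    const CAP_MS: u64 = 60_000;
    // Clamp the exponent: anything past 5 already exceeds the cap.
    let exponent = consecutive_errors.saturating_sub(1).min(16);
    Duration::from_millis((BASE_MS << exponent).min(CAP_MS))
}
```

For n = 6 the uncapped value would be 64,000 ms, which is why the cap first applies at the sixth consecutive error.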

Agent Registry

The AgentRegistry (agent::registry) is the central data structure that tracks all running agents:

AgentHandle — Bundles an agent's resolved config, state watch channel, interrupt sender, and task join handle
Registration — register() adds a new agent handle; each agent gets a unique name
State queries — states() returns a snapshot of all agent states; state_of(name) queries a single agent
Interrupt delivery — interrupt_senders() returns a map of interrupt channels for the router
Shutdown — shutdown() sends OperatorStop to all agents and awaits their task handles
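
The registry operations above can be sketched with the standard library alone. This is not the real agent::registry: it assumes threads instead of async tasks, a Mutex-guarded state instead of a watch channel, an mpsc sender as the interrupt channel, a two-variant State, and that register() rejects duplicate names. The resolved config field is omitted, and spawn_stub_agent is a hypothetical stand-in for a real agent task.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum State { Running, Stopped }

#[derive(Debug)]
pub enum Signal { OperatorStop }

/// Bundles what the registry needs to observe and control one agent.
pub struct AgentHandle {
    state: Arc<Mutex<State>>,
    interrupt_tx: Sender<Signal>,
    task: Option<JoinHandle<()>>,
}

#[derive(Default)]
pub struct AgentRegistry {
    agents: HashMap<String, AgentHandle>,
}

impl AgentRegistry {
    /// Adds a handle; duplicate names are rejected (assumed behavior).
    pub fn register(&mut self, name: &str, handle: AgentHandle) -> Result<(), String> {
        if self.agents.contains_key(name) {
            return Err(format!("agent '{}' is already registered", name));
        }
        self.agents.insert(name.to_string(), handle);
        Ok(())
    }

    /// Snapshot of every agent's current state.
    pub fn states(&self) -> HashMap<String, State> {
        self.agents.iter().map(|(n, h)| (n.clone(), *h.state.lock().unwrap())).collect()
    }

    pub fn state_of(&self, name: &str) -> Option<State> {
        self.agents.get(name).map(|h| *h.state.lock().unwrap())
    }

    /// Cloned interrupt channels, keyed by agent name, for the router.
    pub fn interrupt_senders(&self) -> HashMap<String, Sender<Signal>> {
        self.agents.iter().map(|(n, h)| (n.clone(), h.interrupt_tx.clone())).collect()
    }

    /// Signals every agent to stop, then waits for each task to finish.
    pub fn shutdown(&mut self) {
        for handle in self.agents.values() {
            let _ = handle.interrupt_tx.send(Signal::OperatorStop);
        }
        for handle in self.agents.values_mut() {
            if let Some(task) = handle.task.take() {
                let _ = task.join();
            }
        }
    }
}

/// Stand-in agent: blocks until it receives a signal, then marks itself Stopped.
pub fn spawn_stub_agent() -> AgentHandle {
    let state = Arc::new(Mutex::new(State::Running));
    let (interrupt_tx, rx) = mpsc::channel::<Signal>();
    let shared = Arc::clone(&state);
    let task = thread::spawn(move || {
        let _ = rx.recv();
        *shared.lock().unwrap() = State::Stopped;
    });
    AgentHandle { state, interrupt_tx, task: Some(task) }
}
```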

Runner Loop

The run_agent() function in agent::runner is the top-level entry point for each agent's lifecycle:

Setup — Create worktree, initialize environment variables, fire SessionStart hook
State machine loop — Process events, execute side effects, manage the backend session
Session iteration — On SessionComplete + WorktreeReady, increment sequence and rebuild prompt
Interrupt handling — On UrgentMessage, cancel the session with a grace period; force-stop on GraceExceeded
Cleanup — On Stopped, archive session logs, fire SessionEnd hook, prune old logs
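
The ordering of these phases can be illustrated with a heavily simplified, synchronous sketch. It is not run_agent(): the LoopEvent enum, the function name, and the log strings are hypothetical, and the real runner drives the full state machine rather than a scripted slice of events. The sketch only shows that setup happens once, that the loop reacts to completion and interrupt events, and that cleanup always runs last.

```rust
/// Hypothetical, reduced event set used only by this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LoopEvent {
    SessionCompleted,
    UrgentMessage,
    GraceExceeded,
    Stopped,
}

/// Replays scripted events and returns the actions a runner would take.
pub fn run_agent_sketch(events: &[LoopEvent]) -> Vec<&'static str> {
    let mut log = Vec::new();

    // Setup
    log.push("create worktree");
    log.push("initialize environment variables");
    log.push("fire SessionStart hook");

    // State machine loop
    let mut interrupting = false;
    for &event in events {
        match event {
            // Session iteration
            LoopEvent::SessionCompleted => {
                interrupting = false;
                log.push("increment session_seq and rebuild prompt");
            }
            // Interrupt handling: graceful cancel first
            LoopEvent::UrgentMessage => {
                interrupting = true;
                log.push("cancel session (grace period starts)");
            }
            // Force-stop only if a cancellation is still pending
            LoopEvent::GraceExceeded => {
                if interrupting {
                    interrupting = false;
                    log.push("force-stop session");
                }
            }
            LoopEvent::Stopped => break,
        }
    }

    // Cleanup
    log.push("archive session logs");
    log.push("fire SessionEnd hook");
    log.push("prune old logs");
    log
}
```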

The runner manages environment variables injected into each backend session:

SWARM_AGENT_ID — The agent's name
SWARM_SESSION_ID — The current session ID
SWARM_DB_PATH — Path to the SQLite mailbox database
SWARM_AGENTS — Comma-separated list of all agent names
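
A sketch of how these four variables could be assembled and attached to a child process follows. The variable names come from the list above; the helper names, the argument shapes, and the idea that the backend runs as a child process launched through std::process::Command are assumptions.

```rust
use std::process::Command;

/// Builds the four SWARM_* variables for one backend session.
pub fn session_env(
    agent_id: &str,
    session_id: &str,
    db_path: &str,
    agents: &[&str],
) -> Vec<(&'static str, String)> {
    vec![
        ("SWARM_AGENT_ID", agent_id.to_string()),
        ("SWARM_SESSION_ID", session_id.to_string()),
        ("SWARM_DB_PATH", db_path.to_string()),
        // Comma-separated list of all agent names
        ("SWARM_AGENTS", agents.join(",")),
    ]
}

/// Prepares (but does not spawn) a command with the variables applied.
pub fn session_command(program: &str, env: &[(&'static str, String)]) -> Command {
    let mut cmd = Command::new(program);
    cmd.envs(env.iter().map(|(key, value)| (*key, value.as_str())));
    cmd
}
```

Code running inside the session could then read its own identity with std::env::var("SWARM_AGENT_ID").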

State Transitions — Full transition table
Architecture — How agents fit in the overall system
Orchestration — How the orchestrator manages agent runners