Agent Lifecycle

Each swarm agent follows a deterministic state machine that drives its lifecycle from initialization through multiple backend sessions to eventual shutdown. The state machine is defined in agent::state and executed by the runner loop in agent::runner.

Agent States

The AgentState enum defines 8 observable states:

State                         Description
----------------------------  -----------
Initializing                  Agent registered; waiting for its git worktree to be ready
BuildingPrompt                Assembling the system prompt (environment, role, tools, messages, tasks)
Spawning                      Prompt stored; launching a backend session with the LLM provider
Running { session_seq }       Backend session is active; session_seq tracks which session iteration
Interrupting { session_seq }  Graceful cancellation requested (urgent message received); waiting for session exit
SessionComplete               Backend session exited successfully; ready for next iteration
CoolingDown { until }         Session failed; waiting for exponential backoff to elapse
Stopped                       Terminal state — agent will not run again

Check if an agent has reached its terminal state with AgentState::is_terminal(), which returns true only for Stopped.
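As a rough illustration of the table above, the enum and its terminal check could look like the sketch below. The variant names and is_terminal() come from this page; the field types (u32 for session_seq, std::time::Instant for until) and the derives are assumptions, not the actual definitions in agent::state.

```rust
use std::time::Instant;

/// Sketch of the eight lifecycle states. Field types are assumed.
#[derive(Debug, Clone, PartialEq)]
pub enum AgentState {
    Initializing,
    BuildingPrompt,
    Spawning,
    Running { session_seq: u32 },
    Interrupting { session_seq: u32 },
    SessionComplete,
    CoolingDown { until: Instant },
    Stopped,
}

impl AgentState {
    /// Returns true only for `Stopped`, the single terminal state.
    pub fn is_terminal(&self) -> bool {
        matches!(self, AgentState::Stopped)
    }
}
```

A supervisor could poll is_terminal() to decide when an agent's task can be joined.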

Agent Events

The AgentEvent enum defines the events that drive state transitions:

Event                       Trigger
--------------------------  -------
WorktreeReady               Git worktree created and ready for use
PromptReady(String)         System prompt assembled successfully
SessionStarted(u32)         Backend session launched (carries the session sequence number)
SessionExited(ExitOutcome)  Backend session ended — Success, Error(String), or Timeout
UrgentMessage               Router detected an urgent message for this agent
GraceExceeded               Interruption grace period expired without session exit
BackoffElapsed              CoolingDown timer expired
OperatorStop                Operator requested shutdown (global — valid from any state)
FatalError(String)          Unrecoverable error (global — valid from any state)
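The events listed above map naturally onto a Rust enum. The sketch below mirrors the variant names and payloads from the table; the derives and the is_global() helper are hypothetical additions for illustration and may not exist in agent::state.

```rust
/// How a backend session ended (variants as listed for SessionExited).
#[derive(Debug, Clone, PartialEq)]
pub enum ExitOutcome {
    Success,
    Error(String),
    Timeout,
}

/// Events that drive state transitions.
#[derive(Debug, Clone, PartialEq)]
pub enum AgentEvent {
    WorktreeReady,
    PromptReady(String),
    SessionStarted(u32),
    SessionExited(ExitOutcome),
    UrgentMessage,
    GraceExceeded,
    BackoffElapsed,
    OperatorStop,
    FatalError(String),
}

impl AgentEvent {
    /// Hypothetical helper: true for the two events that are valid
    /// from any state (OperatorStop and FatalError).
    pub fn is_global(&self) -> bool {
        matches!(self, AgentEvent::OperatorStop | AgentEvent::FatalError(_))
    }
}
```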

Side Effects

Each transition returns a SideEffect telling the runner what action to take:

SideEffect           Runner Action
-------------------  -------------
None                 No action needed
StorePrompt(String)  Save the assembled prompt for the next spawn
CancelSession        Request graceful cancellation of the current backend session
ForceStopSession     Force-stop the session immediately (grace period exceeded)
IncrementSession     Bump session sequence counter and loop back to BuildingPrompt
LogFatal(String)     Log the fatal error message; agent is now Stopped
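To make the runner's side of this contract concrete, here is a minimal sketch of dispatching on a SideEffect. The enum variants come from the table; RunnerCtx and its fields are hypothetical stand-ins for whatever bookkeeping the real runner keeps, and the real actions (cancelling a session, writing logs) are reduced to flag updates.

```rust
/// Side effects returned by a transition (variants as listed above).
#[derive(Debug, Clone, PartialEq)]
pub enum SideEffect {
    None,
    StorePrompt(String),
    CancelSession,
    ForceStopSession,
    IncrementSession,
    LogFatal(String),
}

/// Hypothetical runner-side bookkeeping, for illustration only.
#[derive(Debug, Default)]
pub struct RunnerCtx {
    pub stored_prompt: Option<String>,
    pub session_seq: u32,
    pub cancel_requested: bool,
    pub force_stopped: bool,
    pub fatal_log: Vec<String>,
}

impl RunnerCtx {
    /// Applies one side effect to the runner's bookkeeping.
    pub fn apply(&mut self, effect: SideEffect) {
        match effect {
            SideEffect::None => {}
            SideEffect::StorePrompt(prompt) => self.stored_prompt = Some(prompt),
            SideEffect::CancelSession => self.cancel_requested = true,
            SideEffect::ForceStopSession => self.force_stopped = true,
            SideEffect::IncrementSession => self.session_seq += 1,
            SideEffect::LogFatal(msg) => self.fatal_log.push(msg),
        }
    }
}
```

Keeping the transition function free of I/O and letting the runner interpret the returned effect is what makes the state machine easy to test in isolation.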

State Diagram

                    ┌──────────────┐
                    │ Initializing │
                    └──────┬───────┘
                           │ WorktreeReady
                    ┌──────▼─────────┐
              ┌────►│ BuildingPrompt │◄─────────────────────┐
              │     └──────┬─────────┘                      │
              │            │ PromptReady                    │
              │     ┌──────▼───────┐                        │
              │     │   Spawning   │──── SessionExited ─────┤
              │     └──────┬───────┘    (Error/Timeout)     │
              │            │ SessionStarted                 │
              │     ┌──────▼───────┐                 ┌──────┴──────┐
              │     │   Running    │─ Error/Timeout─►│ CoolingDown │
              │     └──┬───┬───────┘                 └──────┬──────┘
              │        │   │ UrgentMessage                  │ BackoffElapsed
              │        │   │                                │
              │        │ ┌─▼────────────┐                   │
              │        │ │ Interrupting │──────────────────►│
              │        │ └──────────────┘                   │
              │        │ SessionExited(Success)             │
              │ ┌──────▼────────┐                           │
              │ │SessionComplete│                           │
              │ └──────┬────────┘                           │
              │        │ WorktreeReady                      │
              └────────┘◄───────────────────────────────────┘

        ── OperatorStop or FatalError from ANY state ──► Stopped
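One way to read the diagram is as a pure function from (state, event) to (next state, side effect). The sketch below is a simplified interpretation, not the code in agent::state: it omits error counters and the CoolingDown deadline, and several choices are assumptions, namely that a session exit during Interrupting returns to BuildingPrompt, that a plain OperatorStop produces no side effect, that only the SessionComplete loop emits IncrementSession, and that any pair not shown in the diagram leaves the state unchanged.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum ExitOutcome { Success, Error(String), Timeout }

#[derive(Debug, Clone, PartialEq)]
pub enum AgentState {
    Initializing, BuildingPrompt, Spawning,
    Running { session_seq: u32 },
    Interrupting { session_seq: u32 },
    SessionComplete,
    CoolingDown, // the `until` deadline is omitted in this sketch
    Stopped,
}

#[derive(Debug, Clone, PartialEq)]
pub enum AgentEvent {
    WorktreeReady, PromptReady(String), SessionStarted(u32),
    SessionExited(ExitOutcome), UrgentMessage, GraceExceeded,
    BackoffElapsed, OperatorStop, FatalError(String),
}

#[derive(Debug, Clone, PartialEq)]
pub enum SideEffect {
    None, StorePrompt(String), CancelSession,
    ForceStopSession, IncrementSession, LogFatal(String),
}

/// Simplified transition function following the diagram above.
pub fn transition(state: AgentState, event: AgentEvent) -> (AgentState, SideEffect) {
    match (state, event) {
        // Global events: valid from any state.
        (_, AgentEvent::OperatorStop) => (AgentState::Stopped, SideEffect::None),
        (_, AgentEvent::FatalError(msg)) => (AgentState::Stopped, SideEffect::LogFatal(msg)),

        (AgentState::Initializing, AgentEvent::WorktreeReady) =>
            (AgentState::BuildingPrompt, SideEffect::None),
        (AgentState::BuildingPrompt, AgentEvent::PromptReady(prompt)) =>
            (AgentState::Spawning, SideEffect::StorePrompt(prompt)),
        (AgentState::Spawning, AgentEvent::SessionStarted(seq)) =>
            (AgentState::Running { session_seq: seq }, SideEffect::None),
        (AgentState::Spawning,
         AgentEvent::SessionExited(ExitOutcome::Error(_) | ExitOutcome::Timeout)) =>
            (AgentState::CoolingDown, SideEffect::None),

        (AgentState::Running { .. }, AgentEvent::SessionExited(ExitOutcome::Success)) =>
            (AgentState::SessionComplete, SideEffect::None),
        (AgentState::Running { .. }, AgentEvent::SessionExited(_)) =>
            (AgentState::CoolingDown, SideEffect::None),
        (AgentState::Running { session_seq }, AgentEvent::UrgentMessage) =>
            (AgentState::Interrupting { session_seq }, SideEffect::CancelSession),

        (AgentState::Interrupting { session_seq }, AgentEvent::GraceExceeded) =>
            (AgentState::Interrupting { session_seq }, SideEffect::ForceStopSession),
        // Assumption: an interrupted session goes straight back to prompt building.
        (AgentState::Interrupting { .. }, AgentEvent::SessionExited(_)) =>
            (AgentState::BuildingPrompt, SideEffect::None),

        (AgentState::SessionComplete, AgentEvent::WorktreeReady) =>
            (AgentState::BuildingPrompt, SideEffect::IncrementSession),
        (AgentState::CoolingDown, AgentEvent::BackoffElapsed) =>
            (AgentState::BuildingPrompt, SideEffect::None),

        // Assumption: unlisted pairs are ignored and the state is kept.
        (unchanged, _) => (unchanged, SideEffect::None),
    }
}
```

Because the function takes and returns plain values, each edge of the diagram can be checked with a one-line assertion.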

Error Thresholds and Backoff

The state machine tracks two error counters:

Counter             Default Limit               Behavior
------------------  --------------------------  --------
consecutive_errors  5 (max_consecutive_errors)  Reset to 0 on SessionStarted or SessionExited(Success)
total_errors        20 (max_total_errors)       Never reset; accumulates across all sessions

When either counter reaches its limit, the agent transitions to Stopped with a LogFatal side effect.
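The two counters and their limits can be modelled with a small struct, sketched below. The field names and default limits follow the table; the ErrorCounters type, its method names, and the boolean return value are hypothetical, and "reaches its limit" is interpreted as greater than or equal.

```rust
/// Hypothetical container for the two error counters and their limits.
#[derive(Debug)]
pub struct ErrorCounters {
    pub consecutive_errors: u32,
    pub total_errors: u32,
    pub max_consecutive_errors: u32,
    pub max_total_errors: u32,
}

impl Default for ErrorCounters {
    fn default() -> Self {
        Self {
            consecutive_errors: 0,
            total_errors: 0,
            max_consecutive_errors: 5, // default limit from the table
            max_total_errors: 20,      // default limit from the table
        }
    }
}

impl ErrorCounters {
    /// Records one failed session. Returns true when either counter has
    /// reached its limit, meaning the agent should move to Stopped.
    pub fn record_error(&mut self) -> bool {
        self.consecutive_errors += 1;
        self.total_errors += 1;
        self.consecutive_errors >= self.max_consecutive_errors
            || self.total_errors >= self.max_total_errors
    }

    /// Called on SessionStarted or SessionExited(Success):
    /// only the consecutive counter is reset.
    pub fn record_success(&mut self) {
        self.consecutive_errors = 0;
    }
}
```

With the defaults, five failures in a row stop the agent, and so does the twentieth failure overall even if every failure was followed by a successful session.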

Backoff Formula

When an error occurs, the agent enters CoolingDown with exponential backoff:

duration_ms = min(2000 * 2^(n-1), 60000)

Here n is the value of consecutive_errors after it has been incremented for the current error. Examples:

Consecutive Errors  Backoff Duration
------------------  ----------------
1                   2,000 ms
2                   4,000 ms
3                   8,000 ms
4                   16,000 ms
5                   32,000 ms
6+                  60,000 ms (cap)
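The formula translates directly into a small function. The sketch below is one possible implementation, not necessarily the one in the codebase: the function and constant names are hypothetical, a count of 0 is treated like 1, and the exponent is clamped so that large counts cannot overflow the shift.

```rust
use std::time::Duration;

const BASE_MS: u64 = 2_000;
const CAP_MS: u64 = 60_000;

/// Computes min(2000 * 2^(n-1), 60000) milliseconds, where n is the
/// number of consecutive errors after the increment.
pub fn backoff(consecutive_errors: u32) -> Duration {
    // Assumption: treat 0 like 1 so the exponent never goes negative.
    let n = consecutive_errors.max(1);
    // Clamp the exponent; anything this large is far above the cap anyway.
    let exponent = (n - 1).min(20);
    let ms = BASE_MS.saturating_mul(1u64 << exponent).min(CAP_MS);
    Duration::from_millis(ms)
}
```

Calling backoff(3) yields 8,000 ms and backoff(6) yields the 60,000 ms cap, matching the table above.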

Agent Registry

The AgentRegistry (agent::registry) is the central data structure that tracks all running agents:

  • AgentHandle — Bundles an agent's resolved config, state watch channel, interrupt sender, and task join handle
  • Registration — register() adds a new agent handle; each agent gets a unique name
  • State queries — states() returns a snapshot of all agent states; state_of(name) queries a single agent
  • Interrupt delivery — interrupt_senders() returns a map of interrupt channels for the router
  • Shutdown — shutdown() sends OperatorStop to all agents and awaits their task handles
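The registry operations above can be sketched with standard-library primitives. This is a deliberately simplified model, not the real agent::registry: threads and std::sync::mpsc stand in for the task handles and channels, a shared Mutex replaces the state watch channel, the resolved config is omitted, and the State and Signal enums, the bool returned by register(), the map returned by shutdown(), and spawn_stub_agent() are all hypothetical.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum State { Running, Stopped }

#[derive(Debug)]
pub enum Signal { UrgentMessage, OperatorStop }

/// Simplified handle: shared state, interrupt sender, task join handle.
pub struct AgentHandle {
    state: Arc<Mutex<State>>,
    interrupt: Sender<Signal>,
    task: JoinHandle<()>,
}

#[derive(Default)]
pub struct AgentRegistry { agents: HashMap<String, AgentHandle> }

impl AgentRegistry {
    /// Adds a handle; returns false if the name is already taken.
    pub fn register(&mut self, name: &str, handle: AgentHandle) -> bool {
        if self.agents.contains_key(name) { return false; }
        self.agents.insert(name.to_string(), handle);
        true
    }

    pub fn state_of(&self, name: &str) -> Option<State> {
        self.agents.get(name).map(|h| *h.state.lock().unwrap())
    }

    /// Snapshot of every agent's current state.
    pub fn states(&self) -> HashMap<String, State> {
        self.agents.iter().map(|(n, h)| (n.clone(), *h.state.lock().unwrap())).collect()
    }

    /// Cloned interrupt channels, keyed by agent name, for the router.
    pub fn interrupt_senders(&self) -> HashMap<String, Sender<Signal>> {
        self.agents.iter().map(|(n, h)| (n.clone(), h.interrupt.clone())).collect()
    }

    /// Sends OperatorStop to all agents, joins them, returns final states.
    pub fn shutdown(self) -> HashMap<String, State> {
        for h in self.agents.values() {
            let _ = h.interrupt.send(Signal::OperatorStop);
        }
        let mut finals = HashMap::new();
        for (name, h) in self.agents {
            let _ = h.task.join();
            finals.insert(name, *h.state.lock().unwrap());
        }
        finals
    }
}

/// Stand-in agent: idles until OperatorStop arrives or the channel closes.
pub fn spawn_stub_agent() -> AgentHandle {
    let state = Arc::new(Mutex::new(State::Running));
    let (tx, rx) = channel();
    let shared = Arc::clone(&state);
    let task = thread::spawn(move || {
        while let Ok(signal) = rx.recv() {
            if matches!(signal, Signal::OperatorStop) { break; }
        }
        *shared.lock().unwrap() = State::Stopped;
    });
    AgentHandle { state, interrupt: tx, task }
}
```

Sending the stop signal to every agent before joining any of them lets all agents wind down in parallel instead of one at a time.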

Runner Loop

The run_agent() function in agent::runner is the top-level entry point for each agent's lifecycle:

  1. Setup — Create worktree, initialize environment variables, fire SessionStart hook
  2. State machine loop — Process events, execute side effects, manage the backend session
  3. Session iteration — On SessionComplete + WorktreeReady, increment sequence and rebuild prompt
  4. Interrupt handling — On UrgentMessage, cancel the session with a grace period; force-stop on GraceExceeded
  5. Cleanup — On Stopped, archive session logs, fire SessionEnd hook, prune old logs
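Step 4 is the part of the loop with real timing behaviour, so it is worth a small sketch: request cancellation, wait up to a grace period for the session to report its exit, and force-stop if the wait times out. The function, its parameters, and the InterruptResult type are hypothetical; the real runner is probably asynchronous, whereas this version uses a blocking std::sync::mpsc channel. Treating a disconnected channel as an exit is also an assumption.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

#[derive(Debug, PartialEq)]
pub enum InterruptResult {
    /// The session exited before the grace period ran out.
    ExitedInGrace,
    /// The grace period expired (GraceExceeded) and the session was force-stopped.
    ForceStopped,
}

/// Cancels a session gracefully, escalating to a force-stop on timeout.
/// `exited` receives one `()` when the backend session has ended.
pub fn interrupt_session(
    request_cancel: impl FnOnce(),
    force_stop: impl FnOnce(),
    exited: &Receiver<()>,
    grace: Duration,
) -> InterruptResult {
    // Corresponds to the CancelSession side effect.
    request_cancel();
    match exited.recv_timeout(grace) {
        Ok(()) => InterruptResult::ExitedInGrace,
        // Assumption: a dropped sender means the session task is gone.
        Err(RecvTimeoutError::Disconnected) => InterruptResult::ExitedInGrace,
        Err(RecvTimeoutError::Timeout) => {
            // Corresponds to the ForceStopSession side effect.
            force_stop();
            InterruptResult::ForceStopped
        }
    }
}
```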

The runner manages environment variables injected into each backend session:

  • SWARM_AGENT_ID — The agent's name
  • SWARM_SESSION_ID — The current session ID
  • SWARM_DB_PATH — Path to the SQLite mailbox database
  • SWARM_AGENTS — Comma-separated list of all agent names
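If the backend session is launched as a child process, the four variables could be attached with std::process::Command, as in the sketch below. The variable names and meanings come from the list above; the SessionEnv struct, the function name, and the idea that the backend is spawned through Command are assumptions made for illustration.

```rust
use std::process::Command;

/// Hypothetical bundle of the values injected into a backend session.
pub struct SessionEnv<'a> {
    pub agent_id: &'a str,
    pub session_id: &'a str,
    pub db_path: &'a str,
    pub all_agents: &'a [&'a str],
}

/// Sets the SWARM_* variables on a command before it is spawned.
pub fn apply_session_env(cmd: &mut Command, env: &SessionEnv) {
    cmd.env("SWARM_AGENT_ID", env.agent_id)
        .env("SWARM_SESSION_ID", env.session_id)
        .env("SWARM_DB_PATH", env.db_path)
        // Agent names are joined into one comma-separated value.
        .env("SWARM_AGENTS", env.all_agents.join(","));
}
```

Inside the session, a tool can then read std::env::var("SWARM_AGENT_ID") to learn which agent it is running for.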