Agent Lifecycle
Each swarm agent follows a deterministic state machine that drives its lifecycle from initialization through multiple backend sessions to eventual shutdown. The state machine is defined in agent::state and executed by the runner loop in agent::runner.
Agent States
The AgentState enum defines 8 observable states:
| State | Description |
|---|---|
| Initializing | Agent registered; waiting for its git worktree to be ready |
| BuildingPrompt | Assembling the system prompt (environment, role, tools, messages, tasks) |
| Spawning | Prompt stored; launching a backend session with the LLM provider |
| Running { session_seq } | Backend session is active; session_seq identifies the current session iteration |
| Interrupting { session_seq } | Graceful cancellation requested (urgent message received); waiting for the session to exit |
| SessionComplete | Backend session exited successfully; ready for the next iteration |
| CoolingDown { until } | Session failed; waiting for the exponential backoff to elapse |
| Stopped | Terminal state; the agent will not run again |
Check if an agent has reached its terminal state with AgentState::is_terminal(), which returns true only for Stopped.
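As a rough illustration of the table above, the enum could look like the following minimal sketch. The field types (`u32` for `session_seq`, `std::time::Instant` for `until`) and the derives are assumptions made for this example; the actual definition in agent::state may differ.

```rust
use std::time::Instant;

/// Sketch of the eight lifecycle states listed in the table.
/// Field types are assumptions, not taken from the real source.
#[derive(Debug, Clone, PartialEq)]
pub enum AgentState {
    Initializing,
    BuildingPrompt,
    Spawning,
    Running { session_seq: u32 },
    Interrupting { session_seq: u32 },
    SessionComplete,
    CoolingDown { until: Instant },
    Stopped,
}

impl AgentState {
    /// Returns true only for `Stopped`, the single terminal state.
    pub fn is_terminal(&self) -> bool {
        matches!(self, AgentState::Stopped)
    }
}
```

A supervisor can poll `is_terminal()` to decide when an agent's task handle is safe to join.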
Agent Events
The AgentEvent enum defines the events that drive state transitions:
| Event | Trigger |
|---|---|
| WorktreeReady | Git worktree created and ready for use |
| PromptReady(String) | System prompt assembled successfully |
| SessionStarted(u32) | Backend session launched (carries the session sequence number) |
| SessionExited(ExitOutcome) | Backend session ended; the outcome is Success, Error(String), or Timeout |
| UrgentMessage | Router detected an urgent message for this agent |
| GraceExceeded | Interruption grace period expired without session exit |
| BackoffElapsed | CoolingDown timer expired |
| OperatorStop | Operator requested shutdown (global: valid from any state) |
| FatalError(String) | Unrecoverable error (global: valid from any state) |
Side Effects
Each transition returns a SideEffect telling the runner what action to take:
| SideEffect | Runner Action |
|---|---|
| None | No action needed |
| StorePrompt(String) | Save the assembled prompt for the next spawn |
| CancelSession | Request graceful cancellation of the current backend session |
| ForceStopSession | Force-stop the session immediately (grace period exceeded) |
| IncrementSession | Bump session sequence counter and loop back to BuildingPrompt |
| LogFatal(String) | Log the fatal error message; agent is now Stopped |
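To make the interplay of states, events, and side effects concrete, here is a simplified sketch of a pure transition function. It covers only the happy path and the two global events; the error path through CoolingDown is left out. The names `State`, `Event`, and `transition`, the `Option` return type, the choice to stay in Interrupting after GraceExceeded, and the `None` side effect on OperatorStop are all assumptions for illustration, not the real agent::state API.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum State {
    Initializing,
    BuildingPrompt,
    Spawning,
    Running { session_seq: u32 },
    Interrupting { session_seq: u32 },
    SessionComplete,
    Stopped,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ExitOutcome {
    Success,
    Error(String),
    Timeout,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    WorktreeReady,
    PromptReady(String),
    SessionStarted(u32),
    SessionExited(ExitOutcome),
    UrgentMessage,
    GraceExceeded,
    OperatorStop,
    FatalError(String),
}

#[derive(Debug, Clone, PartialEq)]
pub enum SideEffect {
    None,
    StorePrompt(String),
    CancelSession,
    ForceStopSession,
    IncrementSession,
    LogFatal(String),
}

/// Returns the next state and the side effect for the runner,
/// or `None` for (state, event) pairs this sketch does not model.
pub fn transition(state: &State, event: Event) -> Option<(State, SideEffect)> {
    use State::*;
    Some(match (state, event) {
        // Global events are checked first: they are valid from any state.
        (_, Event::OperatorStop) => (Stopped, SideEffect::None),
        (_, Event::FatalError(msg)) => (Stopped, SideEffect::LogFatal(msg)),
        (Initializing, Event::WorktreeReady) => (BuildingPrompt, SideEffect::None),
        (BuildingPrompt, Event::PromptReady(p)) => (Spawning, SideEffect::StorePrompt(p)),
        (Spawning, Event::SessionStarted(seq)) => {
            (Running { session_seq: seq }, SideEffect::None)
        }
        (Running { session_seq }, Event::UrgentMessage) => {
            (Interrupting { session_seq: *session_seq }, SideEffect::CancelSession)
        }
        // Assumption: the agent stays in Interrupting until the session exits.
        (Interrupting { session_seq }, Event::GraceExceeded) => {
            (Interrupting { session_seq: *session_seq }, SideEffect::ForceStopSession)
        }
        (Running { .. }, Event::SessionExited(ExitOutcome::Success)) => {
            (SessionComplete, SideEffect::None)
        }
        (SessionComplete, Event::WorktreeReady) => {
            (BuildingPrompt, SideEffect::IncrementSession)
        }
        _ => return None,
    })
}
```

Keeping the function free of I/O means the runner is the only place that touches the backend, which makes the state machine straightforward to unit test.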
State Diagram
          ┌──────────────┐
          │ Initializing │
          └──────┬───────┘
                 │ WorktreeReady
          ┌──────▼─────────┐
    ┌────►│ BuildingPrompt │◄──────────────────────┐
    │     └──────┬─────────┘                       │
    │            │ PromptReady                     │
    │     ┌──────▼───────┐                         │
    │     │   Spawning   │──── SessionExited ──────┤
    │     └──────┬───────┘    (Error/Timeout)      │
    │            │ SessionStarted                  │
    │     ┌──────▼───────┐                  ┌──────┴──────┐
    │     │   Running    │── Error/Timeout─►│ CoolingDown │
    │     └──┬───┬───────┘                  └──────┬──────┘
    │        │   │ UrgentMessage                   │ BackoffElapsed
    │        │   │                                 │
    │        │ ┌─▼────────────┐                    │
    │        │ │ Interrupting │───────────────────►│
    │        │ └──────────────┘                    │
    │        │ SessionExited(Success)              │
    │ ┌──────▼────────┐                            │
    │ │SessionComplete│                            │
    │ └──────┬────────┘                            │
    │        │ WorktreeReady                       │
    └────────┘◄────────────────────────────────────┘

    ── OperatorStop or FatalError from ANY state ──► Stopped
Error Thresholds and Backoff
The state machine tracks two error counters:
| Counter | Default Limit | Behavior |
|---|---|---|
| consecutive_errors | 5 (max_consecutive_errors) | Reset to 0 on SessionStarted or SessionExited(Success) |
| total_errors | 20 (max_total_errors) | Never reset; accumulates across all sessions |
When either counter reaches its limit, the agent transitions to Stopped with a LogFatal side effect.
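The two counters can be modeled with a small struct like the one below. This is a sketch under assumptions: the struct name `ErrorCounters` and the methods `record_error`, `record_progress`, and `limit_reached` are hypothetical, and "reaches its limit" is interpreted as a `>=` comparison.

```rust
/// Hypothetical bookkeeping for the two error counters.
#[derive(Debug, Clone, PartialEq)]
pub struct ErrorCounters {
    pub consecutive_errors: u32,
    pub total_errors: u32,
    pub max_consecutive_errors: u32,
    pub max_total_errors: u32,
}

impl Default for ErrorCounters {
    /// Uses the default limits from the table: 5 consecutive, 20 total.
    fn default() -> Self {
        Self {
            consecutive_errors: 0,
            total_errors: 0,
            max_consecutive_errors: 5,
            max_total_errors: 20,
        }
    }
}

impl ErrorCounters {
    /// Records one failed session and reports whether a limit was reached.
    pub fn record_error(&mut self) -> bool {
        self.consecutive_errors += 1;
        self.total_errors += 1;
        self.limit_reached()
    }

    /// Call on SessionStarted or SessionExited(Success):
    /// only the consecutive counter resets.
    pub fn record_progress(&mut self) {
        self.consecutive_errors = 0;
    }

    /// True when the agent should transition to Stopped.
    pub fn limit_reached(&self) -> bool {
        self.consecutive_errors >= self.max_consecutive_errors
            || self.total_errors >= self.max_total_errors
    }
}
```

Because total_errors never resets, an agent that alternates between failing and recovering still stops eventually, even if it never fails five times in a row.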
Backoff Formula
When an error occurs, the agent enters CoolingDown with exponential backoff:
duration_ms = min(2000 * 2^(n-1), 60000)
Where n is consecutive_errors (after increment). Examples:
| Consecutive Errors | Backoff Duration |
|---|---|
| 1 | 2,000 ms |
| 2 | 4,000 ms |
| 3 | 8,000 ms |
| 4 | 16,000 ms |
| 5 | 32,000 ms |
| 6+ | 60,000 ms (cap) |
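The formula translates directly into code. The sketch below is one possible implementation; the function name `backoff` and the constant names are assumptions, and an input of 0 is treated the same as 1 because the formula is only defined for n >= 1.

```rust
use std::time::Duration;

const BASE_MS: u64 = 2_000;
const CAP_MS: u64 = 60_000;

/// Computes min(2000 * 2^(n-1), 60000) milliseconds,
/// where n is the consecutive error count after the increment.
pub fn backoff(consecutive_errors: u32) -> Duration {
    // Clamp the exponent so the shift cannot overflow a u64.
    let exp = consecutive_errors.saturating_sub(1).min(31);
    let ms = BASE_MS.saturating_mul(1u64 << exp).min(CAP_MS);
    Duration::from_millis(ms)
}
```

For example, `backoff(3)` yields 8,000 ms and `backoff(6)` yields the 60,000 ms cap, matching the table above.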
Agent Registry
The AgentRegistry (agent::registry) is the central data structure that tracks all running agents:
- AgentHandle: Bundles an agent's resolved config, state watch channel, interrupt sender, and task join handle
- Registration: register() adds a new agent handle; each agent gets a unique name
- State queries: states() returns a snapshot of all agent states; state_of(name) queries a single agent
- Interrupt delivery: interrupt_senders() returns a map of interrupt channels for the router
- Shutdown: shutdown() sends OperatorStop to all agents and awaits their task handles
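The registry's query surface can be sketched with the standard library alone. This is a heavily simplified illustration: the real AgentHandle holds a watch channel and a task join handle, whereas this version stores a plain state value and a `std::sync::mpsc::Sender`. The reduced `State` enum, the `Interrupt` type, and all signatures are assumptions, and shutdown() is omitted.

```rust
use std::collections::HashMap;
use std::sync::mpsc::Sender;

/// Reduced state set, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum State {
    Initializing,
    Running,
    Stopped,
}

/// Hypothetical interrupt signal sent by the router.
#[derive(Debug, Clone, PartialEq)]
pub struct Interrupt;

/// Simplified handle: the latest known state plus an interrupt channel.
pub struct AgentHandle {
    pub state: State,
    pub interrupt_tx: Sender<Interrupt>,
}

#[derive(Default)]
pub struct AgentRegistry {
    agents: HashMap<String, AgentHandle>,
}

impl AgentRegistry {
    /// Adds a handle; rejects duplicate names so each agent stays unique.
    pub fn register(&mut self, name: &str, handle: AgentHandle) -> Result<(), String> {
        if self.agents.contains_key(name) {
            return Err(format!("agent '{}' is already registered", name));
        }
        self.agents.insert(name.to_string(), handle);
        Ok(())
    }

    /// Snapshot of every agent's state.
    pub fn states(&self) -> HashMap<String, State> {
        self.agents.iter().map(|(n, h)| (n.clone(), h.state)).collect()
    }

    /// State of a single agent, if it is registered.
    pub fn state_of(&self, name: &str) -> Option<State> {
        self.agents.get(name).map(|h| h.state)
    }

    /// Cloned interrupt senders, keyed by agent name, for the router.
    pub fn interrupt_senders(&self) -> HashMap<String, Sender<Interrupt>> {
        self.agents
            .iter()
            .map(|(n, h)| (n.clone(), h.interrupt_tx.clone()))
            .collect()
    }
}
```

Handing the router cloned senders rather than a reference to the registry keeps interrupt delivery decoupled from registry locking.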
Runner Loop
The run_agent() function in agent::runner is the top-level entry point for each agent's lifecycle:
- Setup: Create the worktree, initialize environment variables, fire the SessionStart hook
- State machine loop: Process events, execute side effects, manage the backend session
- Session iteration: On SessionComplete + WorktreeReady, increment the sequence and rebuild the prompt
- Interrupt handling: On UrgentMessage, cancel the session with a grace period; force-stop on GraceExceeded
- Cleanup: On Stopped, archive session logs, fire the SessionEnd hook, prune old logs
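The "execute side effects" step of the loop might be organized as in the sketch below. The `RunnerContext` struct and its fields are hypothetical, and the session cancellation and force-stop actions are replaced by log entries so the example stays self-contained; the real runner would talk to the backend instead.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum SideEffect {
    None,
    StorePrompt(String),
    CancelSession,
    ForceStopSession,
    IncrementSession,
    LogFatal(String),
}

/// Hypothetical per-agent runner state.
#[derive(Debug, Default)]
pub struct RunnerContext {
    pub session_seq: u32,
    pub stored_prompt: Option<String>,
    pub log: Vec<String>,
}

impl RunnerContext {
    /// Applies one side effect returned by a state transition.
    pub fn apply(&mut self, effect: SideEffect) {
        match effect {
            SideEffect::None => {}
            // Keep the prompt until the next spawn needs it.
            SideEffect::StorePrompt(prompt) => self.stored_prompt = Some(prompt),
            // Stand-ins for the real backend calls.
            SideEffect::CancelSession => self.log.push("cancel requested".to_string()),
            SideEffect::ForceStopSession => self.log.push("force stop".to_string()),
            SideEffect::IncrementSession => self.session_seq += 1,
            SideEffect::LogFatal(msg) => self.log.push(format!("fatal: {}", msg)),
        }
    }
}
```

Matching exhaustively on SideEffect means the compiler flags the runner whenever a new side effect is added to the enum.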
The runner manages environment variables injected into each backend session:
- SWARM_AGENT_ID: The agent's name
- SWARM_SESSION_ID: The current session ID
- SWARM_DB_PATH: Path to the SQLite mailbox database
- SWARM_AGENTS: Comma-separated list of all agent names
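Assembling these variables could look like the following sketch. The function name `session_env` and its parameters are assumptions, as is the format of the session ID; only the four variable names come from the list above.

```rust
use std::collections::BTreeMap;

/// Builds the environment map injected into a backend session.
/// The parameter shapes are illustrative, not the runner's real signature.
pub fn session_env(
    agent_name: &str,
    session_id: &str,
    db_path: &str,
    all_agents: &[&str],
) -> BTreeMap<String, String> {
    let mut env = BTreeMap::new();
    env.insert("SWARM_AGENT_ID".to_string(), agent_name.to_string());
    env.insert("SWARM_SESSION_ID".to_string(), session_id.to_string());
    env.insert("SWARM_DB_PATH".to_string(), db_path.to_string());
    // Agent names are joined with commas, without surrounding spaces.
    env.insert("SWARM_AGENTS".to_string(), all_agents.join(","));
    env
}
```

If the backend runs as a child process, a map like this can be passed to `std::process::Command::envs` when spawning it.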
Related
- State Transitions — Full transition table
- Architecture — How agents fit in the overall system
- Orchestration — How the orchestrator manages agent runners