Messaging

Swarm agents communicate through a SQLite-backed mailbox system. Messages are stored durably in a shared database and delivered to recipients on their next prompt build. Urgent messages additionally trigger an interrupt, so the recipient sees them without waiting for its current session to finish.

Design

The messaging system uses SQLite in WAL (Write-Ahead Logging) mode as the message store. This choice (ADR-002) provides:

  • Durability — Messages survive process crashes
  • Concurrent access — WAL mode allows multiple readers with a single writer
  • No external dependencies — No message broker or network service required
  • Simplicity — A single file at .swarm/messages.db

Message Structure

The Message struct represents a single message:

Field         Type         Description
id            i64          Auto-incrementing primary key
thread_id     Option<i64>  ID of the root message in the thread (for grouping)
reply_to      Option<i64>  ID of the message this is a direct reply to
sender        String       Name of the sending agent (or "operator" for CLI messages)
recipient     String       Name of the receiving agent
msg_type      MessageType  Discriminator: Message, Task, Status, or Nudge
urgency       Urgency      Normal or Urgent
body          String       The message content
created_at    i64          Epoch nanoseconds when the message was created
delivered_at  Option<i64>  Epoch nanoseconds when consumed; NULL while pending

MessageType

Variant  Usage
Message  General inter-agent communication
Task     Task assignment or delegation
Status   Status updates between agents
Nudge    Liveness nudge from the monitoring system

Urgency

Variant  Behavior
Normal   Delivered on the recipient's next prompt build
Urgent   Triggers an interrupt via the router, causing the recipient to restart its session
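
As a rough illustration, the Message, MessageType, and Urgency definitions above might map onto Rust types like the following. The field names and types come from the tables; the derives and the is_pending helper are assumptions added for the example and may not match the actual source.

```rust
/// Kind of message, mirroring the MessageType variants listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MessageType {
    Message,
    Task,
    Status,
    Nudge,
}

/// Delivery urgency, mirroring the Urgency variants listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Urgency {
    Normal,
    Urgent,
}

/// One row of the messages table.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Message {
    pub id: i64,
    pub thread_id: Option<i64>,
    pub reply_to: Option<i64>,
    pub sender: String,
    pub recipient: String,
    pub msg_type: MessageType,
    pub urgency: Urgency,
    pub body: String,
    /// Epoch nanoseconds.
    pub created_at: i64,
    /// Epoch nanoseconds; None (NULL in the database) while pending.
    pub delivered_at: Option<i64>,
}

impl Message {
    /// Hypothetical helper (not from the source): a message is pending
    /// until delivered_at has been set.
    pub fn is_pending(&self) -> bool {
        self.delivered_at.is_none()
    }
}
```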

Mailbox Operations

The Mailbox struct provides per-agent messaging operations:

Operation                                       Description
send(recipient, body, msg_type, urgency)        Send a message to another agent (self-send rejected)
reply(original_id, body, msg_type, urgency)     Reply to an existing message, inheriting thread context
broadcast(recipients, body, msg_type, urgency)  Send to multiple agents in a single transaction
consume()                                       Atomically read and mark all pending messages as delivered
thread(thread_id)                               Retrieve all messages in a conversation thread
outbox(limit)                                   Get recently sent messages

Free functions are also available for use outside the Mailbox context:

  • send_message() — Send a message using a raw connection
  • broadcast_message() — Broadcast to multiple recipients
  • consume_messages() — Consume pending messages for an agent
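
To make the send and consume semantics concrete, here is a minimal in-memory sketch. It is not the real Mailbox: the Store type, StoredMessage, SendError, and the simplified signatures are hypothetical, and a Vec stands in for the shared SQLite database. It only shows two behaviours described above: self-sends are rejected, and consume marks every pending message for an agent as delivered in one step, so a second call returns nothing.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Trimmed-down message row (hypothetical; the real struct has more fields).
#[derive(Debug, Clone, PartialEq)]
pub struct StoredMessage {
    pub id: i64,
    pub sender: String,
    pub recipient: String,
    pub body: String,
    pub delivered_at: Option<i64>,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum SendError {
    SelfSend,
}

/// Hypothetical in-memory stand-in for the shared SQLite database.
#[derive(Default)]
pub struct Store {
    rows: Vec<StoredMessage>,
}

/// Current time in epoch nanoseconds (0 if the clock is before the epoch).
fn now_nanos() -> i64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos() as i64)
        .unwrap_or(0)
}

impl Store {
    /// Store a pending message; sending to yourself is rejected.
    pub fn send(&mut self, sender: &str, recipient: &str, body: &str) -> Result<i64, SendError> {
        if sender == recipient {
            return Err(SendError::SelfSend);
        }
        let id = self.rows.len() as i64 + 1;
        self.rows.push(StoredMessage {
            id,
            sender: sender.to_string(),
            recipient: recipient.to_string(),
            body: body.to_string(),
            delivered_at: None,
        });
        Ok(id)
    }

    /// Return all pending messages for `agent` and mark them delivered.
    /// `&mut self` gives exclusive access here, which plays the role that
    /// a database transaction would play in a SQLite-backed version.
    pub fn consume(&mut self, agent: &str) -> Vec<StoredMessage> {
        let now = now_nanos();
        let mut out = Vec::new();
        for row in self.rows.iter_mut() {
            if row.recipient == agent && row.delivered_at.is_none() {
                row.delivered_at = Some(now);
                out.push(row.clone());
            }
        }
        out
    }
}
```

In a database-backed version, the read and the update would need to share one transaction so that two concurrent consumers cannot both receive the same message.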

Message Router

The router module runs an async polling loop that watches for urgent messages:

Router Loop (every 100ms):
  1. poll_urgent(conn) → Vec<UrgentMessage>
  2. For each urgent message:
     a. Skip if already signalled (deduplication via HashSet)
     b. Send InterruptSignal to recipient's mpsc channel
     c. Add to signalled set
  3. Sleep 100ms
  4. Exit on shutdown signal
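
Step 2 of the loop (deduplicate, signal, record) can be sketched as a single function. This is a simplified, synchronous illustration rather than the actual router: it uses std::sync::mpsc in place of whatever async channel the router uses, and the fields of UrgentMessage and InterruptSignal, the route_once name, and the channel map are all assumptions.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::mpsc::Sender;

/// Hypothetical shape of a row returned by poll_urgent.
#[derive(Debug, Clone, PartialEq)]
pub struct UrgentMessage {
    pub id: i64,
    pub recipient: String,
}

/// Hypothetical payload delivered to the recipient's runner.
#[derive(Debug, PartialEq)]
pub struct InterruptSignal {
    pub message_id: i64,
}

/// One pass over the urgent messages; returns how many signals were sent.
pub fn route_once(
    urgent: &[UrgentMessage],
    signalled: &mut HashSet<i64>,
    channels: &HashMap<String, Sender<InterruptSignal>>,
) -> usize {
    let mut sent = 0;
    for msg in urgent {
        // (a) Skip messages that were already signalled on an earlier pass.
        if signalled.contains(&msg.id) {
            continue;
        }
        // (b) Send an interrupt to the recipient's channel, if it has one.
        if let Some(tx) = channels.get(&msg.recipient) {
            if tx.send(InterruptSignal { message_id: msg.id }).is_ok() {
                // (c) Record the id so the next pass does not signal again.
                // Assumption: failed sends stay unrecorded and are retried.
                signalled.insert(msg.id);
                sent += 1;
            }
        }
    }
    sent
}
```

A surrounding loop would call poll_urgent, pass the result to a function like this, sleep for 100 ms, and stop once the shutdown signal arrives.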

When the router sends an InterruptSignal, the agent's runner receives an UrgentMessage event, which triggers the interrupt flow:

  1. Running → Interrupting (with CancelSession side effect)
  2. The backend session is gracefully cancelled
  3. On session exit → BuildingPrompt (the new prompt will include the urgent message)
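
The interrupt flow can be read as a small state machine. The sketch below assumes step 1 describes a transition from a Running state to an Interrupting state. The state and side-effect names follow the text, but the Event variants, the transition function, and the rule that all other combinations leave the state unchanged are assumptions made for illustration.

```rust
/// Runner states named in the interrupt flow (the real runner may have more).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Running,
    Interrupting,
    BuildingPrompt,
}

/// Events relevant to the interrupt flow (SessionExited is a hypothetical name).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    UrgentMessage,
    SessionExited,
}

/// Side effects the runner must perform after a transition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SideEffect {
    CancelSession,
}

/// Pure transition function: returns the next state and an optional side effect.
pub fn transition(state: State, event: Event) -> (State, Option<SideEffect>) {
    match (state, event) {
        // Step 1: an urgent message interrupts a running session.
        (State::Running, Event::UrgentMessage) => {
            (State::Interrupting, Some(SideEffect::CancelSession))
        }
        // Step 3: once the cancelled session exits, rebuild the prompt.
        (State::Interrupting, Event::SessionExited) => (State::BuildingPrompt, None),
        // Assumption: any other combination leaves the state unchanged.
        (other, _) => (other, None),
    }
}
```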

Message Threading

Messages can be organized into threads using thread_id and reply_to:

  • When you send a new message, thread_id and reply_to are NULL
  • When you reply to a message, the reply inherits the original's thread_id (or uses the original's id as the thread root)
  • The thread() method retrieves all messages sharing the same thread_id
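
The threading rules above reduce to a short piece of logic, sketched here on a trimmed-down record. ThreadRef, new_message, and make_reply are hypothetical names. The thread function also matches the root by its id, since the root's own thread_id is NULL; whether the real thread() returns the root is an assumption.

```rust
/// Only the fields that matter for threading (hypothetical, trimmed-down type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ThreadRef {
    pub id: i64,
    pub thread_id: Option<i64>,
    pub reply_to: Option<i64>,
}

/// A new message starts with no thread context.
pub fn new_message(id: i64) -> ThreadRef {
    ThreadRef { id, thread_id: None, reply_to: None }
}

/// A reply inherits the original's thread_id, or uses the original's id
/// as the thread root when the original is itself unthreaded.
pub fn make_reply(original: &ThreadRef, new_id: i64) -> ThreadRef {
    ThreadRef {
        id: new_id,
        thread_id: Some(original.thread_id.unwrap_or(original.id)),
        reply_to: Some(original.id),
    }
}

/// All messages in a thread. Assumption: the root is matched by id,
/// because its own thread_id is None.
pub fn thread(all: &[ThreadRef], thread_id: i64) -> Vec<&ThreadRef> {
    all.iter()
        .filter(|m| m.id == thread_id || m.thread_id == Some(thread_id))
        .collect()
}
```

With this rule, a reply to a reply still carries the root's id as its thread_id, while reply_to records the immediate parent.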

Database Configuration

The SQLite database is configured with these PRAGMAs:

PRAGMA        Value    Purpose
journal_mode  WAL      Concurrent reads, single writer
busy_timeout  5000 ms  Wait up to 5 seconds on lock contention

Periodic maintenance tasks run in the background:

Task            Interval     Action
WAL checkpoint  60 seconds   PRAGMA wal_checkpoint(TRUNCATE), which reclaims WAL file space
Message prune   300 seconds  Delete old delivered messages, keeping the most recent 1000
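
The prune rule can be illustrated in memory as follows. This sketch makes two assumptions that the text does not settle: "most recent" is ordered by id, and the limit applies to delivered messages only, so pending messages are never removed. Row and prune_delivered are hypothetical names; the real task presumably runs a DELETE against the messages table.

```rust
use std::collections::HashSet;

/// Only the fields that matter for pruning (hypothetical, trimmed-down type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Row {
    pub id: i64,
    pub delivered_at: Option<i64>,
}

/// Remove delivered rows beyond the `keep` most recent ones (highest ids).
/// Pending rows are left alone. Returns the number of rows removed.
pub fn prune_delivered(rows: &mut Vec<Row>, keep: usize) -> usize {
    // Collect the ids of delivered rows, newest first.
    let mut delivered: Vec<i64> = rows
        .iter()
        .filter(|r| r.delivered_at.is_some())
        .map(|r| r.id)
        .collect();
    delivered.sort_unstable_by(|a, b| b.cmp(a));

    // Everything after the first `keep` delivered ids is old enough to drop.
    let doomed: HashSet<i64> = delivered.into_iter().skip(keep).collect();

    let before = rows.len();
    rows.retain(|r| !doomed.contains(&r.id));
    before - rows.len()
}
```

The maintenance task described above would correspond to calling this with keep set to 1000 every 300 seconds.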

Message Flow in Practice

  1. Agent A calls the mailbox tool to send a message to Agent B
  2. The message is INSERTed into the messages table with delivered_at = NULL
  3. If the message is urgent, the router detects it within 100ms and sends an InterruptSignal to Agent B
  4. For an urgent message, Agent B's runner cancels its current session and moves on to rebuild the prompt; a normal message waits until the next prompt build
  5. On the next prompt build, consume() atomically marks all pending messages as delivered and returns them for inclusion in the system prompt
  6. Agent B reads the messages in its prompt context and responds accordingly