ADR-002: SQLite WAL for Inter-Agent Messaging

Status

Accepted

Context

Agents need to send messages to each other. The messaging system must support:

  • Direct messages (agent-to-agent)
  • Broadcasts (one-to-all)
  • Urgency levels (normal vs urgent)
  • Delivery tracking (consumed vs pending)
  • Concurrent readers/writers (orchestrator + N agents + swarm-msg)

Decision

Use SQLite in WAL mode as the messaging transport, stored at .swarm/messages.db.

Alternatives Considered

Alternative                       | Why rejected
File-based (one file per message) | Race conditions, no atomicity, glob storms
Unix domain sockets               | Requires a broker process, more complex
Redis/NATS                        | External dependency, overkill for local orchestration
Named pipes / FIFOs               | No persistence, no multi-reader, fragile
Shared memory                     | Complex, no persistence

Key Pragmas

PRAGMA journal_mode = WAL;       -- concurrent reads during writes
PRAGMA busy_timeout = 5000;      -- wait up to 5s if locked
PRAGMA synchronous = NORMAL;     -- no corruption risk under WAL, faster than FULL; the latest commits may be lost on power failure
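
As a minimal sketch of how a process might apply these pragmas when it opens the database, the following uses Python's standard sqlite3 module. The language and the helper name open_messages_db are assumptions for illustration; the ADR does not say how the orchestrator or swarm-msg actually open the file. Note that journal_mode = WAL is persisted in the database file, while busy_timeout and synchronous apply per connection and so need to be set on every open.

```python
import sqlite3


def open_messages_db(path=".swarm/messages.db"):
    """Open the messaging DB and apply the three pragmas (hypothetical helper)."""
    # isolation_level=None puts the connection in autocommit mode, so
    # transactions only start when the caller issues BEGIN explicitly.
    conn = sqlite3.connect(path, isolation_level=None)

    # PRAGMA journal_mode returns the mode actually in effect; verify it,
    # because some storage (e.g. in-memory databases) cannot use WAL.
    mode = conn.execute("PRAGMA journal_mode = WAL").fetchone()[0]
    if mode.lower() != "wal":
        conn.close()
        raise RuntimeError(f"could not enable WAL, journal_mode is {mode!r}")

    conn.execute("PRAGMA busy_timeout = 5000")   # milliseconds
    conn.execute("PRAGMA synchronous = NORMAL")
    return conn
```

The parent directory (.swarm in the default path) is assumed to exist already; sqlite3.connect creates the database file but not its directory.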

Consequences

  • Single file for all messaging state: easy to inspect, back up, and clean.
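  • The swarm-msg binary can write directly to the DB without going through the orchestrator.
  • WAL lets readers (agents polling) proceed while a write is in progress. Writers are still serialized, one at a time; busy_timeout covers brief lock contention between them.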
  • Must checkpoint periodically to prevent WAL file growth.
  • Must prune old delivered messages to bound DB size.
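
The two maintenance duties above (checkpointing and pruning) could be combined into one periodic task. The sketch below is illustrative only: the table name messages, the storage of delivered_at as a Unix timestamp, and the function prune_and_checkpoint with its max_age_seconds parameter are all assumptions, not part of this ADR.

```python
import sqlite3
import time


def prune_and_checkpoint(conn, max_age_seconds, now=None):
    """Delete old delivered messages, then truncate the WAL (hypothetical helper).

    Assumes a `messages` table whose `delivered_at` column holds a Unix
    timestamp, or NULL while the message is still pending.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds

    # `with conn` commits on success and rolls back on an exception.
    with conn:
        cur = conn.execute(
            "DELETE FROM messages "
            "WHERE delivered_at IS NOT NULL AND delivered_at < ?",
            (cutoff,),
        )
        deleted = cur.rowcount

    # Run the checkpoint outside any open transaction. TRUNCATE copies the
    # WAL back into the main file and resets the WAL file to zero length.
    # The pragma returns one row: (busy, wal_frames, checkpointed_frames).
    busy, _wal_frames, _checkpointed = conn.execute(
        "PRAGMA wal_checkpoint(TRUNCATE)"
    ).fetchone()
    return deleted, busy
```

A nonzero busy value means the checkpoint could not complete, typically because another connection held a lock; a caller would probably just retry on the next maintenance tick. Pending messages are never pruned because of the delivered_at IS NOT NULL guard.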

Invariants

  • A message is inserted once and never modified, except to set delivered_at.
  • consume_messages() reads pending messages and marks them delivered in a single transaction.
  • Self-send (sender == recipient) is rejected at insert time.
  • A broadcast among N agents creates N-1 rows (one per recipient, excluding the sender).
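
To make the invariants concrete, here is one way they might be enforced, sketched in Python with the standard sqlite3 module. Only consume_messages() and delivered_at are named in this ADR. The schema, the other column names, the connect/send/broadcast helpers, and the urgent-first ordering are hypothetical and may differ from the real implementation.

```python
import sqlite3
import time

# Hypothetical schema: the CHECK constraint backs up the self-send rule
# at the database level, in addition to the check in send().
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    sender       TEXT NOT NULL,
    recipient    TEXT NOT NULL,
    urgent       INTEGER NOT NULL DEFAULT 0,
    body         TEXT NOT NULL,
    created_at   REAL NOT NULL,
    delivered_at REAL,
    CHECK (sender <> recipient)
)
"""


def connect(path):
    # Autocommit mode: transactions are opened explicitly with BEGIN.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("PRAGMA journal_mode = WAL")
    conn.execute("PRAGMA busy_timeout = 5000")
    conn.execute(SCHEMA)
    return conn


def send(conn, sender, recipient, body, urgent=False):
    """Insert one message row; self-send is rejected at insert time."""
    if sender == recipient:
        raise ValueError("self-send is not allowed")
    conn.execute(
        "INSERT INTO messages (sender, recipient, urgent, body, created_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (sender, recipient, int(urgent), body, time.time()),
    )


def broadcast(conn, sender, agents, body, urgent=False):
    """Insert one row per recipient (N-1 rows for N agents), atomically."""
    recipients = [a for a in agents if a != sender]
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    try:
        for recipient in recipients:
            send(conn, sender, recipient, body, urgent)
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return len(recipients)


def consume_messages(conn, recipient):
    """Read pending messages and mark them delivered in one transaction."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        rows = conn.execute(
            "SELECT id, sender, urgent, body FROM messages "
            "WHERE recipient = ? AND delivered_at IS NULL "
            "ORDER BY urgent DESC, id",  # urgent-first order is an assumption
            (recipient,),
        ).fetchall()
        conn.executemany(
            "UPDATE messages SET delivered_at = ? WHERE id = ?",
            [(time.time(), row[0]) for row in rows],
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return rows
```

BEGIN IMMEDIATE is used here so that the read and the update happen under the same write lock; with a plain deferred BEGIN, two consumers polling for the same recipient could both read a row before either marks it delivered.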