
Retention

How long each artefact type lives, and how the retention daemon enforces it.

Artefact classes and default TTLs

| Artefact | Location | Default TTL | Env var |
|---|---|---|---|
| Conversation JSONL | pipeline_runs/<id>/conversations/*.jsonl | 90 days | SWARM_RETENTION_CONVERSATION_JSONL_DAYS |
| Run events | run_events table | 365 days | SWARM_RETENTION_RUN_EVENTS_DAYS |
| Permission denials | permission_denials table | never pruned (invariant) | (none) |
| Shadow predictions | shadow_predictions table | 30 days | SWARM_RETENTION_SHADOW_PREDICTIONS_DAYS |
| Audit PDFs | pipeline_runs/<id>/audit/*.pdf | 2555 days (7 years) | SWARM_RETENTION_AUDIT_PDF_DAYS |
| Model artefacts | pipeline_runs/<id>/models/ | 365 days | SWARM_RETENTION_MODEL_DAYS |
| Batch results | batch_runs/<id>/results.jsonl | 90 days | SWARM_RETENTION_BATCH_RESULTS_DAYS |
| Cron output logs | ~/.swarm/cron/output/*.log | 30 days | SWARM_RETENTION_CRON_OUTPUT_DAYS |
| Metrics (Prometheus) | external | 30-90 days (Prometheus config) | (none) |
| Traces (OTel) | external | tracing backend's retention | (none) |
| Logs | external | log store's retention | (none) |

Invariants

Two artefact classes are never pruned regardless of TTL config:

  1. permission_denials table: required for audit integrity under BFSI regulations, HIPAA, and the EU AI Act.
  2. run_events entries of kind permission_decision or approval_granted: kept even when row-level retention prunes other events.

Both are preserved however retention is configured, including with SWARM_RETENTION_ENABLED=false. The audit_trail_security_events feature flag is invariant-tier, so it cannot be disabled at runtime.

Example TTL overrides for different regimes. Each block is a separate alternative; apply only one.

BFSI, 7 years (RBI norm):

SWARM_RETENTION_AUDIT_PDF_DAYS=2555          # 7 years (RBI norm)
SWARM_RETENTION_RUN_EVENTS_DAYS=2555
SWARM_RETENTION_MODEL_DAYS=2555
SWARM_RETENTION_CONVERSATION_JSONL_DAYS=365

HIPAA, 6 years (§ 164.316):

SWARM_RETENTION_AUDIT_PDF_DAYS=2190          # 6 years (§ 164.316)
SWARM_RETENTION_RUN_EVENTS_DAYS=2190
SWARM_RETENTION_MODEL_DAYS=2190
SWARM_RETENTION_CONVERSATION_JSONL_DAYS=2190

EU AI Act, 10 years (Art. 12):

SWARM_RETENTION_AUDIT_PDF_DAYS=3650          # 10 years (Art. 12)
SWARM_RETENTION_RUN_EVENTS_DAYS=3650
SWARM_RETENTION_MODEL_DAYS=3650
SWARM_RETENTION_CONVERSATION_JSONL_DAYS=3650

Short retention (30 days or less):

SWARM_RETENTION_AUDIT_PDF_DAYS=30
SWARM_RETENTION_RUN_EVENTS_DAYS=30
SWARM_RETENTION_CONVERSATION_JSONL_DAYS=7

The retention daemon

A background process (part of the API lifespan) sweeps on a fixed interval, every 24 hours by default.

  • First sweep: 30 seconds after API startup, to give the DB time to come up
  • Subsequent sweeps: every SWARM_RETENTION_DAEMON_INTERVAL_HOURS (default 24)
  • Per-artefact: the TTL is compared to created_at (conversation JSONL) or file mtime (PDFs); anything older is pruned
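
The per-artefact check above can be sketched as follows. This is a minimal illustration, not the daemon's actual code: the helper names (ttl_days, is_expired, expired_files) are hypothetical, and only the env var names and defaults come from the table on this page.

```python
import os
import time
from pathlib import Path
from typing import Optional

SECONDS_PER_DAY = 86_400


def ttl_days(env_var: str, default_days: int) -> int:
    """Read a TTL (in days) from the environment, falling back to the default."""
    raw = os.environ.get(env_var, "").strip()
    return int(raw) if raw else default_days


def is_expired(timestamp: float, ttl: int, now: Optional[float] = None) -> bool:
    """Return True if `timestamp` (epoch seconds) is more than `ttl` days old."""
    current = time.time() if now is None else now
    return (current - timestamp) > ttl * SECONDS_PER_DAY


def expired_files(root: Path, pattern: str, ttl: int) -> list:
    """List files under `root` matching `pattern` whose mtime is past the TTL."""
    return sorted(
        p for p in root.glob(pattern)
        if p.is_file() and is_expired(p.stat().st_mtime, ttl)
    )
```

For audit PDFs, a call might look like expired_files(Path("pipeline_runs"), "*/audit/*.pdf", ttl_days("SWARM_RETENTION_AUDIT_PDF_DAYS", 2555)).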

What gets pruned

For files: the file, any accompanying .sig manifest, and the parent directory if it is left empty.
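
A rough sketch of that file-pruning order, for illustration only. The function name prune_file is hypothetical, and it assumes the manifest is named by appending ".sig" to the full file name (for example report.pdf.sig); the real naming scheme may differ.

```python
from pathlib import Path


def prune_file(path: Path) -> list:
    """Delete `path`, its .sig manifest if present, and the parent dir if left empty.

    Returns every path that was removed, in removal order.
    """
    removed = []
    # Assumption: manifest name is "<file name>.sig" in the same directory.
    sig = path.with_name(path.name + ".sig")
    for target in (path, sig):
        if target.is_file():
            target.unlink()
            removed.append(target)
    parent = path.parent
    # Remove the parent only when nothing else is left inside it.
    if parent.is_dir() and not any(parent.iterdir()):
        parent.rmdir()
        removed.append(parent)
    return removed
```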

For DB rows: soft-delete first (deleted_at is set), then hard-delete after a 7-day grace period. Soft-deleted rows remain queryable with ?include_deleted=1.
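
The two-phase row lifecycle can be pictured as a small decision function. This is a sketch under assumptions: next_action and its return labels are hypothetical, and it assumes the TTL for rows is measured from created_at.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

GRACE_PERIOD = timedelta(days=7)  # grace period between soft- and hard-delete


def next_action(
    created_at: datetime,
    deleted_at: Optional[datetime],
    ttl_days: int,
    now: datetime,
) -> str:
    """Decide what a sweep would do with one row."""
    if deleted_at is None:
        # Live row: soft-delete once it is older than its TTL.
        if now - created_at > timedelta(days=ttl_days):
            return "soft_delete"
        return "keep"
    # Soft-deleted row: hard-delete only after the grace period has elapsed.
    if now - deleted_at >= GRACE_PERIOD:
        return "hard_delete"
    return "grace"
```

Rows in the "grace" state are the ones that ?include_deleted=1 would still return.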

What gets emitted

{
  "event": "retention.sweep.complete",
  "duration_ms": 2318,
  "pruned": {
    "conversation_jsonl": 142,
    "run_events": 8491,
    "shadow_predictions": 12032,
    "audit_pdfs": 0,
    "models": 3,
    "batch_results": 17,
    "cron_output": 245
  },
  "errors": []
}

Prometheus metric: swarm_retention_pruned_total{artefact_kind}.
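
If you consume the sweep event from logs, a small parser along these lines can turn it into a summary. The function name summarize_sweep and the summary keys are illustrative; only the event fields shown above are taken from this page.

```python
import json


def summarize_sweep(raw: str) -> dict:
    """Summarize one retention.sweep.complete event given as a JSON string."""
    event = json.loads(raw)
    if event.get("event") != "retention.sweep.complete":
        raise ValueError("not a retention sweep event")
    pruned = event.get("pruned", {})
    return {
        "total_pruned": sum(pruned.values()),
        "audit_pdfs_pruned": pruned.get("audit_pdfs", 0),
        "had_errors": bool(event.get("errors")),
    }
```

A non-zero audit_pdfs_pruned or a true had_errors is the kind of signal worth forwarding to alerting.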

Opting out (individual artefact)

Sometimes you want to freeze a specific pipeline run — e.g., during a regulator investigation:

swarm audit freeze --run-id 7f8e9a2b --reason "Pending RBI inspection"

This sets a frozen flag on the run, and the retention daemon skips frozen runs. To unfreeze:

swarm audit unfreeze --run-id 7f8e9a2b

Both the freeze and the unfreeze are recorded in run_events for audit.

Disabling retention entirely

SWARM_RETENTION_ENABLED=false

The retention daemon doesn't start, so files and rows grow unbounded. Not recommended in production.

Even with this set to false, permission_denials and the run_events kinds covered by the audit_trail_security_events invariant (permission_decision, approval_granted) remain retained.

Archive to long-term storage

For artefacts past their operational TTL but retained for compliance, swarm supports archival to cold storage.

Object storage tier transitions

With storage.backend=s3:

# values.yaml
storage:
  lifecycle:
    - prefix: "audit/"
      transitions:
        - days: 90
          class: GLACIER_IR         # instant-retrieval Glacier
        - days: 365
          class: DEEP_ARCHIVE        # Glacier Deep Archive
      expiration_days: 2555          # 7 years
    - prefix: "conversations/"
      expiration_days: 365           # simple delete; no archival tier

Equivalent lifecycle rules can be configured for GCS and Azure Blob.
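
To reason about where an object sits at a given age, the lifecycle rules above can be modelled as plain data. This is an illustrative sketch only: storage_state is a hypothetical helper, and it assumes objects start in the STANDARD class and that a transition applies once the object's age reaches the rule's days value.

```python
# Mirrors the values.yaml lifecycle rules above as Python data.
AUDIT_RULE = {
    "prefix": "audit/",
    "transitions": [
        {"days": 90, "class": "GLACIER_IR"},
        {"days": 365, "class": "DEEP_ARCHIVE"},
    ],
    "expiration_days": 2555,
}
CONVERSATIONS_RULE = {"prefix": "conversations/", "expiration_days": 365}


def storage_state(age_days: int, rule: dict) -> str:
    """Return the storage class for an object of this age, or "EXPIRED"."""
    expiration = rule.get("expiration_days")
    if expiration is not None and age_days >= expiration:
        return "EXPIRED"
    current = "STANDARD"  # assumption: initial storage class
    for transition in sorted(rule.get("transitions", []), key=lambda t: t["days"]):
        if age_days >= transition["days"]:
            current = transition["class"]
    return current
```

An audit object in DEEP_ARCHIVE is the case where a restore can take hours.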

Retrieval from cold storage

# Request restore (may take hours for Deep Archive)
swarm audit restore --run-id 7f8e9a2b --priority standard
# → restore job queued. Check status:
swarm audit restore-status 7f8e9a2b

Backup-driven retention

If you rely on DB backups for long-term retention (instead of in-DB retention), configure:

postgres:
  backup:
    enabled: true
    schedule: "0 2 * * *"              # 02:00 nightly
    retention:
      daily: 7
      weekly: 4
      monthly: 12
      yearly: 10                         # 10 years of yearly backups
    destination: s3://acme-swarm-backups/postgres/

And set shorter in-DB retention:

SWARM_RETENTION_RUN_EVENTS_DAYS=90        # in-DB; rest is in backups

Trade-off: faster DB, slower regulator-inspection workflow (must restore from backup).

GDPR right-to-be-forgotten

Data subjects can request deletion. Retention settings in swarm cannot override such a request: the compliance obligation takes precedence.

swarm audit forget --principal-id <customer_id> --reason "DSR under GDPR Art. 17"

This:

  • Searches all run_events, conversation journals, and artefact metadata for the principal
  • Pseudonymizes records that must be preserved (e.g., audit trail)
  • Fully deletes records that can be safely removed (conversation JSONL)
  • Emits a compliance certificate to give to the requester

The deletion itself is logged in run_events. That log entry is preserved, so it is pseudonymized to avoid reintroducing the principal's identity.
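
One common way to pseudonymize an identifier while keeping records linkable is a keyed hash. The sketch below uses HMAC-SHA256 from the Python standard library; it is an assumption about the general technique, not a description of how swarm audit forget works, and the function names and the "pseud_" prefix are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(principal_id: str, secret_key: bytes) -> str:
    """Map a principal ID to a stable token that cannot be reversed without the key."""
    digest = hmac.new(secret_key, principal_id.encode("utf-8"), hashlib.sha256)
    return "pseud_" + digest.hexdigest()[:16]


def scrub_record(record: dict, principal_id: str, secret_key: bytes) -> dict:
    """Return a copy of `record` with exact matches of the principal ID replaced."""
    token = pseudonymize(principal_id, secret_key)
    return {k: (token if v == principal_id else v) for k, v in record.items()}
```

Because the same key always yields the same token, preserved audit rows for one principal stay correlated with each other after scrubbing. Note that scrub_record only replaces top-level values that equal the ID exactly; nested or free-text fields would need separate handling.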

Monitoring retention

Alerts to consider:

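  • swarm_retention_pruned_total{artefact_kind="audit_pdfs"} > 0: audit PDFs were pruned unexpectedly. Because a counter never decreases, alerting on its increase over the sweep interval lets the alert clear afterwards
  • Retention sweep failed: fires when the retention daemon reports an error
  • Storage approaching full: a cloud-provider-level alert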
