Skip to content

Changelog#

Full release history. Follows Keep a Changelog / Semantic Versioning where possible — swarm is pre-1.0, so minors may include breaking changes but each is flagged below.

All notable changes to swarm (the ML team agent platform).

Format follows Keep a Changelog. Version bumps are retro-fitted to commit history — no git tags exist yet (see "Next steps" at the bottom).

[Unreleased]#

Roadmap (ordered by expected sequence, subject to customer signal): - Pilot polish for BFSI: screencast demo, single "compliance architecture" doc for CISO review, scripted auditor-reply flow - Auto-retry / HITL escalation on low evaluator grades (W6 follow-up — to be framed as "quality monitor + circuit breaker", not self-learning) - Plugin compat Phase 2 (if marketplace adoption grows): prompt/agent/http hook command types; 10+ additional CC hook events; team_factory integration of plugin-contributed agents - Async orchestration core (only with a ≥4-concurrent-pipelines customer signal) - Postgres migration path for multi-node deployments


[0.11.0] — 2026-04-20 — Claude Code marketplace plugin compatibility#

Commits: dd44c5365438bc (5 commits — one per phase)

Added#

  • Phase A — Drop telemetry (dd44c53): PluginInstallDrops dataclass + install_drops_json SQLite column record every plugin surface seen on disk but not registered. Silent skips in hooks.load_from_plugin upgraded from DEBUG to WARNING. Drops payload surfaces in GET /api/v1/plugins/{name}.
  • Phase B — Shell-command hooks (34b2fde): ml_team/core/shell_hook_runner.py executes CC's {"type": "command", ...} hooks behind the new plugin_shell_hooks_enabled feature flag (EXPERIMENT tier, default OFF). Security model: invoke-time validation reusing run_bash allowlist, ${CLAUDE_PLUGIN_ROOT} substitution, scrubbed env, rlimits on Linux, hard timeout (10s default, 60s max), per-execution audit row in new plugin_shell_executions SQLite table. Exit 2 blocks; JSON stdout {"mutation": {...}} lifted into HookResult.
  • Phase C — commands/ directory (f405eab): ml_team/core/commands_registry.py scans commands/*.md, registers each with optional $ARGUMENTS substitution. REST at GET /plugins/commands + POST /plugins/commands/{qname}/invoke. Feature flag plugin_commands_enabled (FLAG tier, default ON).
  • Phase D — agents/ directory (65438bc): ml_team/core/agents_registry.py scans agents/*.md, forces plugin-{name}::{agent} namespacing so no plugin can shadow a built-in AGENT_DEFS. REST at GET /plugins/agents[?plugin=] + GET /plugins/agents/{qname}. Feature flag plugin_agents_enabled (FLAG tier, default ON).
  • Phase E — Smoke + docs: test_plugin_compat_smoke.py installs real superpowers v5.0.7 from the CC cache and asserts 100% surface registration (14 skills + 1 shell hook + 3 commands + 1 agent, zero silent drops). Automatically skipped in CI when the cache isn't present.

Changed#

  • hooks.load_from_plugin now parses CC's nested {matcher, hooks: [{type: command|python, ...}]} shape (Phase B).
  • scan_install_drops updated phase by phase: command no longer counts as an unsupported type (Phase B); commands/ + agents/ dirs no longer count as drops (Phases C + D). unsupported_hook_types is now strictly prompt / agent / http.
  • PluginInstallation dataclass + _row_to_installation + _save_installation all carry the drops payload.

Security#

  • All shell-hook execution is feature-flag gated, default OFF. Install-time whitelist still applies + runtime command validation + BFSI-grade audit trail.

Tests#

  • +59 new: test_plugin_install_drops.py (13), test_plugin_shell_hooks.py (14), test_plugin_commands.py (16), test_plugin_agents.py (14), test_plugin_compat_smoke.py (2).
  • Regression: 603 passing (was 545), 1 skipped, 0 failing.

Empirical result#

Installing superpowers v5.0.7 before this cycle: 14/14 skills + 0/1 hooks + 0/3 commands + 0/1 agents = ~25% surface retention. After: 14/14 + 1/1 + 3/3 + 1/1 = 100% surface retention, 0 silent drops.


[0.10.2] — 2026-04-20 — Documentation Phase 2#

Commit: 4ad7a9b

Added#

  • ml_team/tools/{IMPLEMENTATION,LEARNING}_README.md
  • ml_team/backends/{IMPLEMENTATION,LEARNING}_README.md
  • ml_team/config/{IMPLEMENTATION,LEARNING}_README.md
  • ml_team/dashboard/{IMPLEMENTATION,LEARNING}_README.md
  • ml_team/tests/IMPLEMENTATION_README.md (IMPL-only by design — tests are their own doc)

Changed#

  • .github/workflows/doc-drift.yml — advisory CI guard now covers 7 subsystems (Phase 1 + Phase 2)

Rationale#

Closes the two-layer doc rollout. hello-swarm deliberately stays on the Phase-1 plugin-README shape (plugins aren't subsystems).


[0.10.1] — 2026-04-20 — Documentation Phase 1#

Commit: f65c27c

Added#

  • MASTER_README.md at repo root — product + system source of truth
  • ml_team/core/{IMPLEMENTATION,LEARNING}_README.md
  • ml_team/api/{IMPLEMENTATION,LEARNING}_README.md
  • ml_team/api/routers/{IMPLEMENTATION,LEARNING}_README.md
  • .github/workflows/doc-drift.yml — advisory CI guard for documented subsystems

Rationale#

New-engineer ramp-up was measured in days. Two-layer model — IMPL (engineering contract) + LEARNING (conceptual) — cuts it to hours while giving regulators a stable doc surface to quote from.


[0.10.0] — 2026-04-20 — Week 7: Compliance + Ops pack#

Commits: 89562631a1a0c1 (7 commits, +113 tests; 432 → 545 green)

Added#

  • W7-1 Unified permission engine (8956263, 72b6c3c)
  • ml_team/core/permissions.py — ALLOW > DENY > ASK > default pipeline with glob tool matching, optional arg regex, priority tiebreak, lazy init
  • ml_team/core/permission_sources.py — 5 default sources: RBAC, agent allowlist, feature flag, HITL, YAML policy
  • ml_team/core/permission_audit.py — SQLite permission_denials persistence + ml_team_permission_decisions_total metric
  • ml_team/api/routers/permissions.pyGET /api/v1/permissions/denials?since=&tool=&agent=
  • ml_team/config/permission_policies.yaml — operator-authored rules (empty default)
  • W7-2 Hook lifecycle (e6e6268)
  • ml_team/core/hooks.py — 5 events: SESSION_START, PRE/POST_TOOL, PRE/POST_COMPACTION
  • AgentRunner integration; plugin-loader ingestion of hooks/hooks.json
  • Reference PII-mask handler in examples/plugins/hello-swarm/
  • W7-3 Cron scheduler (6d1f17e) — vendored from Hermes
  • ml_team/core/cron.py + cron_tasks.py — 4 task kinds (retrain / drift_check / audit_pdf / custom)
  • File-backed store at ~/.swarm/cron/jobs.json, 60s daemon tick
  • REST at /api/v1/cron/*, /cron dashboard page, swarm cron CLI subcommand
  • W7-4 Batch runner (aa4e87a) — vendored from Hermes
  • ml_team/core/batch.py + batch_processors.py — JSONL → inference / echo / custom processors
  • Checkpoints every 10 records, streams results.jsonl, resume-on-restart
  • REST at /api/v1/pipelines/{run_id}/batch

Changed#

  • ToolExecutor.execute + CompositeToolExecutor.execute + require_role + require_approval all route through the permission engine
  • feature_flags.py — added hooks_enabled, cron_scheduler, batch_runner
  • api/database.py::init_db() — adds permission_denials table

Fixed#

  • Cron first-run sentinel flake: interval schedules now fire immediately on boot (previously drifted past the 60s tick)
  • Cron output filename collisions under sub-second job runs (microsecond precision in filename)

Docs#

  • ADRs for all four W7 items in .project/decisions.md (3b0b18c)
  • /transparency dashboard refresh with denial panel

[0.9.0] — 2026-04-20 — Context compaction + evaluator separation#

Commits: fc6c188, d65e305

Added#

  • ml_team/core/context_compaction.py — summarise oldest middle-messages at 80% context window; mechanical fallback on summariser failure
  • ml_team/core/evaluator.py — clean-context grade on agent terminal response with 0–5 score + verdict-override

[0.8.0] — 2026-04-20 — Plugin ecosystem + MCP streamable-HTTP + CLI#

Commits: 54007f3, 1c1a42f, 3fa5aa5, ac7c931, 00db188, f3e018c, a6d1bd7

Added#

  • ml_team/cli.pyswarm CLI: auth, features, pipelines, deployments
  • ml_team/core/plugin_loader.py (Phase A) — Claude Code plugin install/uninstall/reload with SHA-256 manifest pinning; .mcp.json ingestion
  • Plugin skill ingestion (Phase B) — SKILL.md parsing, keyword match, system-prompt injection
  • Streamable-HTTP (SSE) MCP transport — spec 2025-11-25
  • /plugins dashboard page + expanded /transparency
  • Intra-agent parallel tool dispatch (2.9× speedup on multi-tool turns)

Fixed#

  • Critical: ApprovalRequired now propagates through ToolExecutor (was being swallowed by a broad except Exception). HITL gates now fire reliably.

[0.7.0] — 2026-04-20 — Perf + feature flags + retention + transparency#

Commits: b7c82a68f972bb, plus bd330bc, d5fb344, 335660a, 3da2a4a, 96a0fd1, 54007f3

Added#

  • ml_team/core/feature_flags.py — central registry with 3 tiers: INVARIANT / FLAG / EXPERIMENT; resolution order runtime → env → alias → default
  • ml_team/core/retention.py — daemon that prunes conversation JSONL, run_events rows, audit PDFs, shadow predictions past TTL
  • /settings admin controls + /transparency read-only flags + metrics catalogue
  • Micro-benchmark harness + frozen baseline + nightly bench workflow

Changed#

  • Week-1 perf: shared HTTP pool + shared schema cache + Anthropic prompt caching
  • Batched SQLite event writes (6.5× speedup) + per-agent JSONL buffers (3.9× speedup)

[0.6.0] — 2026-04-20 — Week 2: Dashboard UI + CI + docs browser#

Commits: 7142f248152239, fb263eb, 55eeec4, 84cac56, 9e5799f, de3709b, 32dc01a

Added#

  • Login page + deployments view + auth context + OIDC helpers in the dashboard
  • In-app docs browser + sidebar login/logout footer
  • PR workflow + nightly real-LLM golden path
  • System-design brief + extending guide wired into in-app browser

Fixed#

  • CI blockers — broken build-backend, ruff import-sort cleanup
  • Dashboard: missing highlight.js dep for docs page

[0.5.0] — 2026-04-20 — Week 1: BFSI compliance MVP#

Commits: 1e699ef0e92e03 (security, RBAC, tiers, SSO, audit)

Added#

  • Security: per-agent tool allowlists (1e699ef), tool_denied_total metric surfaced at /metrics
  • RBAC MVP: 3 roles (admin/operator/viewer) + JWT + backward-compat API key (817b1b8)
  • Deploy pipeline: real model packaging + Kubernetes manifest generation (0037d9c)
  • Tier-1: train_classifier tool + observability wiring (2e5925a)
  • Tier-2 — RBI FREE-AI compliance bundle: drift detector + fairness audit + SHAP explainability + model cards (712d243)
  • Tier-3 — Champion-challenger MVP: model registry + shadow-traffic log + promotion gate (307ea1c)
  • Audit: single-document PDF export for regulatory sign-off (2a7956e)
  • SSO: OIDC authorization-code flow for Okta / Azure AD / Google (9c6f23f)

Housekeeping#

  • CrewAI / plain-LangGraph prototypes archived under deprecated/ (cb05248)
  • Project scaffolding, architecture docs, AI tool configs (a9a44df)

[0.4.0] — 2026-04-13 — Phase 8: Production readiness#

Commits: 94624bd, 2b090cf, 16e0e94, 410f1c9, 1ab2524, 106c7bb, c44d4b8, 487dc3e, 389d561, 18c1d41, eb2f642, 735d2c7

Added#

  • P0 security: all P0 vulnerabilities fixed
  • P1 infrastructure: bounded thread pool, WebSocket bridge, rate limiting, structured logging
  • P2 reliability: OpenTelemetry tracing, Prometheus metrics, cost budgeting
  • P3 observability: error boundaries, guardrails, Makefile, pre-commit, loading states
  • Dashboard features: chat, HITL UI, controls, persistence, training logs, 3-dot menu, artifact downloads
  • Circuit breaker + 3 specialist agents (LLM, vision, repo_researcher) + dataset explorer + model playground + 8 algorithm repos
  • Comprehensive README (architecture, setup, API, dashboard, tools) + service start/stop/status/logs commands
  • Production readiness test suite: 115/115 pass

Fixed#

  • 6 dashboard bugs: feedback crash, graph status, quality page, polling, error recovery

[0.3.0] — 2026-04-13 — Phases 5–7.5: Hardening + HITL + MCP + StateGraph#

Commits: 532a7bb, ae3781c, 1ca811a, 97a702a

Added#

  • Phase 5 — Agent hardening: 37/37 operational rules, span-based observability, evaluation framework
  • Phase 6 — HITL + persistence: approval gates, project memory (SQLite), org memory (PostgreSQL)
  • Phase 7 — Integration: MCP client, RAG knowledge store, parallel team execution
  • Phase 7.5 — Polish: StateGraph execution, conversation transparency, demo presets

[0.2.0] — 2026-04-12–13 — Phases 0–4: Framework-agnostic core + REST + dashboard#

Commits: 5cba253, 04066e7, 75d8672, 1591d5a, 5a95c60

Added#

  • Phase 1: framework-agnostic core with native / LangGraph / CrewAI backends
  • Phase 2: agent memory, per-agent rules, post-run feedback loop
  • Phase 3: REST API (FastAPI), pipeline execution, agent inspection, WebSocket streaming
  • Phase 4: algorithm repos + customer dashboard

Fixed#

  • Phase 0: INVALID_CHAT_HISTORY crash + path-resolution bugs

[0.1.0] — 2026-04-12 — Initial swarm#

Commits: 8207424, 2c48f38

Added#

  • Multi-agent swarm with 3-model vLLM Docker deployment
  • ML Team Agent baseline: 32 agents, 7 teams, 23 tools

Cross-cutting state (current HEAD 4ad7a9b)#

Dimension Count
Agents 40 across 7 teams (algorithm 9, data 6, deployment 5, evaluation 5, management 4, quality 5, training 6)
Tools 38 callable primitives, 33 tool sets
Algorithm repos 18 (tabular, vision, NLP, fine-tuning)
REST routers 18+ (auth, pipelines, deployments, permissions, cron, batch, plugins, features, …)
Feature flags 20+ registered, 3 tiers
Tests 545 passing, 1 skipped (Docker), 0 failing
Commits 73 total (Apr 12 → Apr 20)
Documented subsystems 7 (two-layer READMEs) + MASTER_README + advisory CI

Next steps for versioning hygiene#

  1. Bump ml_team/pyproject.toml + ml_team/dashboard/package.json to 0.10.2 — both are still 0.1.0.
  2. Annotate git tags retroactivelygit tag -a v0.5.0 0e92e03 -m "Week 1: BFSI compliance MVP" through v0.10.2 4ad7a9b. Signed tags if you maintain a signing key.
  3. Adopt semver going forward. Customer-facing API changes bump minor; bugfixes patch. 1.0.0 when the first BFSI pilot signs off.
  4. Add a PR checklist item — "Did you update CHANGELOG.md under [Unreleased]?" — so this file stops being my job to reconstruct.