Changelog#
Full release history. Follows Keep a Changelog / Semantic Versioning where possible — swarm is pre-1.0, so minors may include breaking changes but each is flagged below.
All notable changes to swarm (the ML team agent platform).
Format follows Keep a Changelog. Version bumps are retro-fitted to commit history — no git tags exist yet (see "Next steps" at the bottom).
[Unreleased]#
Roadmap (ordered by expected sequence, subject to customer signal):
- Pilot polish for BFSI: screencast demo, single "compliance architecture" doc for CISO review, scripted auditor-reply flow
- Auto-retry / HITL escalation on low evaluator grades (W6 follow-up — to be framed as "quality monitor + circuit breaker", not self-learning)
- Plugin compat Phase 2 (if marketplace adoption grows): prompt/agent/http hook command types; 10+ additional CC hook events; team_factory integration of plugin-contributed agents
- Async orchestration core (only with a ≥4-concurrent-pipelines customer signal)
- Postgres migration path for multi-node deployments
[0.11.0] — 2026-04-20 — Claude Code marketplace plugin compatibility#
Commits: dd44c53 → 65438bc (5 commits — one per phase)
Added#
- Phase A — Drop telemetry (
dd44c53):PluginInstallDropsdataclass +install_drops_jsonSQLite column record every plugin surface seen on disk but not registered. Silent skips inhooks.load_from_pluginupgraded from DEBUG to WARNING. Drops payload surfaces inGET /api/v1/plugins/{name}. - Phase B — Shell-command hooks (
34b2fde):ml_team/core/shell_hook_runner.pyexecutes CC's{"type": "command", ...}hooks behind the newplugin_shell_hooks_enabledfeature flag (EXPERIMENT tier, default OFF). Security model: invoke-time validation reusingrun_bashallowlist,${CLAUDE_PLUGIN_ROOT}substitution, scrubbed env, rlimits on Linux, hard timeout (10s default, 60s max), per-execution audit row in newplugin_shell_executionsSQLite table. Exit 2 blocks; JSON stdout{"mutation": {...}}lifted into HookResult. - Phase C —
commands/directory (f405eab):ml_team/core/commands_registry.pyscanscommands/*.md, registers each with optional$ARGUMENTSsubstitution. REST atGET /plugins/commands+POST /plugins/commands/{qname}/invoke. Feature flagplugin_commands_enabled(FLAG tier, default ON). - Phase D —
agents/directory (65438bc):ml_team/core/agents_registry.pyscansagents/*.md, forcesplugin-{name}::{agent}namespacing so no plugin can shadow a built-in AGENT_DEFS. REST atGET /plugins/agents[?plugin=]+GET /plugins/agents/{qname}. Feature flagplugin_agents_enabled(FLAG tier, default ON). - Phase E — Smoke + docs:
test_plugin_compat_smoke.pyinstalls realsuperpowersv5.0.7 from the CC cache and asserts 100% surface registration (14 skills + 1 shell hook + 3 commands + 1 agent, zero silent drops). Automatically skipped in CI when the cache isn't present.
Changed#
hooks.load_from_pluginnow parses CC's nested{matcher, hooks: [{type: command|python, ...}]}shape (Phase B).scan_install_dropsupdated phase by phase:commandno longer counts as an unsupported type (Phase B);commands/+agents/dirs no longer count as drops (Phases C + D).unsupported_hook_typesis now strictlyprompt/agent/http.PluginInstallationdataclass +_row_to_installation+_save_installationall carry the drops payload.
Security#
- All shell-hook execution is feature-flag gated, default OFF. Install-time whitelist still applies + runtime command validation + BFSI-grade audit trail.
Tests#
- +59 new:
test_plugin_install_drops.py(13),test_plugin_shell_hooks.py(14),test_plugin_commands.py(16),test_plugin_agents.py(14),test_plugin_compat_smoke.py(2). - Regression: 603 passing (was 545), 1 skipped, 0 failing.
Empirical result#
Installing superpowers v5.0.7 before this cycle: 14/14 skills + 0/1 hooks + 0/3 commands + 0/1 agents = ~25% surface retention.
After: 14/14 + 1/1 + 3/3 + 1/1 = 100% surface retention, 0 silent drops.
[0.10.2] — 2026-04-20 — Documentation Phase 2#
Commit: 4ad7a9b
Added#
ml_team/tools/{IMPLEMENTATION,LEARNING}_README.mdml_team/backends/{IMPLEMENTATION,LEARNING}_README.mdml_team/config/{IMPLEMENTATION,LEARNING}_README.mdml_team/dashboard/{IMPLEMENTATION,LEARNING}_README.mdml_team/tests/IMPLEMENTATION_README.md(IMPL-only by design — tests are their own doc)
Changed#
.github/workflows/doc-drift.yml— advisory CI guard now covers 7 subsystems (Phase 1 + Phase 2)
Rationale#
Closes the two-layer doc rollout. hello-swarm deliberately stays on the Phase-1 plugin-README shape (plugins aren't subsystems).
[0.10.1] — 2026-04-20 — Documentation Phase 1#
Commit: f65c27c
Added#
MASTER_README.mdat repo root — product + system source of truthml_team/core/{IMPLEMENTATION,LEARNING}_README.mdml_team/api/{IMPLEMENTATION,LEARNING}_README.mdml_team/api/routers/{IMPLEMENTATION,LEARNING}_README.md.github/workflows/doc-drift.yml— advisory CI guard for documented subsystems
Rationale#
New-engineer ramp-up was measured in days. Two-layer model — IMPL (engineering contract) + LEARNING (conceptual) — cuts it to hours while giving regulators a stable doc surface to quote from.
[0.10.0] — 2026-04-20 — Week 7: Compliance + Ops pack#
Commits: 8956263 → 1a1a0c1 (7 commits, +113 tests; 432 → 545 green)
Added#
- W7-1 Unified permission engine (
8956263,72b6c3c) ml_team/core/permissions.py— ALLOW > DENY > ASK > default pipeline with glob tool matching, optional arg regex, priority tiebreak, lazy initml_team/core/permission_sources.py— 5 default sources: RBAC, agent allowlist, feature flag, HITL, YAML policyml_team/core/permission_audit.py— SQLitepermission_denialspersistence +ml_team_permission_decisions_totalmetricml_team/api/routers/permissions.py—GET /api/v1/permissions/denials?since=&tool=&agent=ml_team/config/permission_policies.yaml— operator-authored rules (empty default)- W7-2 Hook lifecycle (
e6e6268) ml_team/core/hooks.py— 5 events: SESSION_START, PRE/POST_TOOL, PRE/POST_COMPACTION- AgentRunner integration; plugin-loader ingestion of
hooks/hooks.json - Reference PII-mask handler in
examples/plugins/hello-swarm/ - W7-3 Cron scheduler (
6d1f17e) — vendored from Hermes ml_team/core/cron.py+cron_tasks.py— 4 task kinds (retrain / drift_check / audit_pdf / custom)- File-backed store at
~/.swarm/cron/jobs.json, 60s daemon tick - REST at
/api/v1/cron/*,/crondashboard page,swarm cronCLI subcommand - W7-4 Batch runner (
aa4e87a) — vendored from Hermes ml_team/core/batch.py+batch_processors.py— JSONL → inference / echo / custom processors- Checkpoints every 10 records, streams
results.jsonl, resume-on-restart - REST at
/api/v1/pipelines/{run_id}/batch
Changed#
ToolExecutor.execute+CompositeToolExecutor.execute+require_role+require_approvalall route through the permission enginefeature_flags.py— addedhooks_enabled,cron_scheduler,batch_runnerapi/database.py::init_db()— addspermission_denialstable
Fixed#
- Cron first-run sentinel flake: interval schedules now fire immediately on boot (previously drifted past the 60s tick)
- Cron output filename collisions under sub-second job runs (microsecond precision in filename)
Docs#
- ADRs for all four W7 items in
.project/decisions.md(3b0b18c) /transparencydashboard refresh with denial panel
[0.9.0] — 2026-04-20 — Context compaction + evaluator separation#
Commits: fc6c188, d65e305
Added#
ml_team/core/context_compaction.py— summarise oldest middle-messages at 80% context window; mechanical fallback on summariser failureml_team/core/evaluator.py— clean-context grade on agent terminal response with 0–5 score + verdict-override
[0.8.0] — 2026-04-20 — Plugin ecosystem + MCP streamable-HTTP + CLI#
Commits: 54007f3, 1c1a42f, 3fa5aa5, ac7c931, 00db188, f3e018c, a6d1bd7
Added#
ml_team/cli.py—swarmCLI: auth, features, pipelines, deploymentsml_team/core/plugin_loader.py(Phase A) — Claude Code plugin install/uninstall/reload with SHA-256 manifest pinning;.mcp.jsoningestion- Plugin skill ingestion (Phase B) —
SKILL.mdparsing, keyword match, system-prompt injection - Streamable-HTTP (SSE) MCP transport — spec 2025-11-25
/pluginsdashboard page + expanded/transparency- Intra-agent parallel tool dispatch (2.9× speedup on multi-tool turns)
Fixed#
- Critical:
ApprovalRequirednow propagates throughToolExecutor(was being swallowed by a broadexcept Exception). HITL gates now fire reliably.
[0.7.0] — 2026-04-20 — Perf + feature flags + retention + transparency#
Commits: b7c82a6 → 8f972bb, plus bd330bc, d5fb344, 335660a, 3da2a4a, 96a0fd1, 54007f3
Added#
ml_team/core/feature_flags.py— central registry with 3 tiers: INVARIANT / FLAG / EXPERIMENT; resolution order runtime → env → alias → defaultml_team/core/retention.py— daemon that prunes conversation JSONL, run_events rows, audit PDFs, shadow predictions past TTL/settingsadmin controls +/transparencyread-only flags + metrics catalogue- Micro-benchmark harness + frozen baseline + nightly bench workflow
Changed#
- Week-1 perf: shared HTTP pool + shared schema cache + Anthropic prompt caching
- Batched SQLite event writes (6.5× speedup) + per-agent JSONL buffers (3.9× speedup)
[0.6.0] — 2026-04-20 — Week 2: Dashboard UI + CI + docs browser#
Commits: 7142f24 → 8152239, fb263eb, 55eeec4, 84cac56, 9e5799f, de3709b, 32dc01a
Added#
- Login page + deployments view + auth context + OIDC helpers in the dashboard
- In-app docs browser + sidebar login/logout footer
- PR workflow + nightly real-LLM golden path
- System-design brief + extending guide wired into in-app browser
Fixed#
- CI blockers — broken build-backend, ruff import-sort cleanup
- Dashboard: missing
highlight.jsdep for docs page
[0.5.0] — 2026-04-20 — Week 1: BFSI compliance MVP#
Commits: 1e699ef → 0e92e03 (security, RBAC, tiers, SSO, audit)
Added#
- Security: per-agent tool allowlists (
1e699ef),tool_denied_totalmetric surfaced at/metrics - RBAC MVP: 3 roles (admin/operator/viewer) + JWT + backward-compat API key (
817b1b8) - Deploy pipeline: real model packaging + Kubernetes manifest generation (
0037d9c) - Tier-1:
train_classifiertool + observability wiring (2e5925a) - Tier-2 — RBI FREE-AI compliance bundle: drift detector + fairness audit + SHAP explainability + model cards (
712d243) - Tier-3 — Champion-challenger MVP: model registry + shadow-traffic log + promotion gate (
307ea1c) - Audit: single-document PDF export for regulatory sign-off (
2a7956e) - SSO: OIDC authorization-code flow for Okta / Azure AD / Google (
9c6f23f)
Housekeeping#
- CrewAI / plain-LangGraph prototypes archived under
deprecated/(cb05248) - Project scaffolding, architecture docs, AI tool configs (
a9a44df)
[0.4.0] — 2026-04-13 — Phase 8: Production readiness#
Commits: 94624bd, 2b090cf, 16e0e94, 410f1c9, 1ab2524, 106c7bb, c44d4b8, 487dc3e, 389d561, 18c1d41, eb2f642, 735d2c7
Added#
- P0 security: all P0 vulnerabilities fixed
- P1 infrastructure: bounded thread pool, WebSocket bridge, rate limiting, structured logging
- P2 reliability: OpenTelemetry tracing, Prometheus metrics, cost budgeting
- P3 observability: error boundaries, guardrails, Makefile, pre-commit, loading states
- Dashboard features: chat, HITL UI, controls, persistence, training logs, 3-dot menu, artifact downloads
- Circuit breaker + 3 specialist agents (LLM, vision, repo_researcher) + dataset explorer + model playground + 8 algorithm repos
- Comprehensive README (architecture, setup, API, dashboard, tools) + service start/stop/status/logs commands
- Production readiness test suite: 115/115 pass
Fixed#
- 6 dashboard bugs: feedback crash, graph status, quality page, polling, error recovery
[0.3.0] — 2026-04-13 — Phases 5–7.5: Hardening + HITL + MCP + StateGraph#
Commits: 532a7bb, ae3781c, 1ca811a, 97a702a
Added#
- Phase 5 — Agent hardening: 37/37 operational rules, span-based observability, evaluation framework
- Phase 6 — HITL + persistence: approval gates, project memory (SQLite), org memory (PostgreSQL)
- Phase 7 — Integration: MCP client, RAG knowledge store, parallel team execution
- Phase 7.5 — Polish: StateGraph execution, conversation transparency, demo presets
[0.2.0] — 2026-04-12–13 — Phases 0–4: Framework-agnostic core + REST + dashboard#
Commits: 5cba253, 04066e7, 75d8672, 1591d5a, 5a95c60
Added#
- Phase 1: framework-agnostic core with native / LangGraph / CrewAI backends
- Phase 2: agent memory, per-agent rules, post-run feedback loop
- Phase 3: REST API (FastAPI), pipeline execution, agent inspection, WebSocket streaming
- Phase 4: algorithm repos + customer dashboard
Fixed#
- Phase 0:
INVALID_CHAT_HISTORYcrash + path-resolution bugs
[0.1.0] — 2026-04-12 — Initial swarm#
Commits: 8207424, 2c48f38
Added#
- Multi-agent swarm with 3-model vLLM Docker deployment
- ML Team Agent baseline: 32 agents, 7 teams, 23 tools
Cross-cutting state (current HEAD 4ad7a9b)#
| Dimension | Count |
|---|---|
| Agents | 40 across 7 teams (algorithm 9, data 6, deployment 5, evaluation 5, management 4, quality 5, training 6) |
| Tools | 38 callable primitives, 33 tool sets |
| Algorithm repos | 18 (tabular, vision, NLP, fine-tuning) |
| REST routers | 18+ (auth, pipelines, deployments, permissions, cron, batch, plugins, features, …) |
| Feature flags | 20+ registered, 3 tiers |
| Tests | 545 passing, 1 skipped (Docker), 0 failing |
| Commits | 73 total (Apr 12 → Apr 20) |
| Documented subsystems | 7 (two-layer READMEs) + MASTER_README + advisory CI |
Next steps for versioning hygiene#
- Bump
ml_team/pyproject.toml+ml_team/dashboard/package.jsonto0.10.2— both are still0.1.0. - Annotate git tags retroactively —
git tag -a v0.5.0 0e92e03 -m "Week 1: BFSI compliance MVP"throughv0.10.2 4ad7a9b. Signed tags if you maintain a signing key. - Adopt semver going forward. Customer-facing API changes bump minor; bugfixes patch. 1.0.0 when the first BFSI pilot signs off.
- Add a PR checklist item — "Did you update CHANGELOG.md under [Unreleased]?" — so this file stops being my job to reconstruct.