Permissions & audit#

Every agent action, every tool call, every HTTP endpoint, every HITL approval — all resolve through one unified permission engine. Every denial is attributed to its source rule and lands in a SQL-queryable audit table.

This is the single most important concept for regulated-industry buyers. Read this page carefully.

The resolution pipeline#

                 incoming request / tool call
             ┌─────────────────────────────┐
             │   collect rules from 5      │
             │   default sources:          │
             │     · RBAC (user role)      │
             │     · agent allowlist       │
             │     · feature flags         │
             │     · HITL approval gates   │
             │     · YAML policy           │
             └──────────────┬──────────────┘
             ┌─────────────────────────────┐
             │   DENY wins over ASK        │
             │   ASK wins over ALLOW       │
             │   ALLOW wins over default   │
             │   (higher priority wins     │
             │    within a tier)           │
             └──────────────┬──────────────┘
                 ┌──────────┼──────────┐
                 ▼          ▼          ▼
              ALLOW       DENY        ASK
              (runs)   (blocked,  (HITL gate,
                        audited)   operator
                                   prompt)

Implemented in ml_team/core/permissions.py. reset_sources() clears the registered rule sources and mark_uninitialized() resets the lazy initialization; test fixtures need to call both.
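The resolution step in the diagram can be sketched in a few lines. This is a minimal illustration and not the contents of ml_team/core/permissions.py: the PermissionRule fields, the resolve() signature, the context keys, and the two example sources are all assumptions. The tier precedence and the fallback decision are passed in as data so the sketch does not hard-code them; the example uses deny, then ask, then allow.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Sequence


@dataclass(frozen=True)
class PermissionRule:
    decision: str      # "allow", "deny", or "ask"
    source: str        # attribution label, e.g. "rbac:admin"
    priority: int = 0  # higher priority wins within a tier


# A rule source is a function from a context to a list of rules.
Source = Callable[[dict], list[PermissionRule]]


def resolve(context: dict, sources: Iterable[Source],
            tier_order: Sequence[str], default: str) -> tuple[str, str]:
    """Return (decision, rule_source) for one request or tool call."""
    rules = [rule for source in sources for rule in source(context)]
    for decision in tier_order:  # the first tier that has any rule wins
        tier = [r for r in rules if r.decision == decision]
        if tier:
            winner = max(tier, key=lambda r: r.priority)
            return winner.decision, winner.source
    return default, "default"    # no source produced a rule


# Two hypothetical sources, for illustration only.
def allowlist_source(ctx: dict) -> list[PermissionRule]:
    decision = "allow" if ctx["tool"] in ctx["tools"] else "deny"
    return [PermissionRule(decision, "agent_allowlist")]


def hitl_source(ctx: dict) -> list[PermissionRule]:
    if ctx["tool"] == "deploy_serving":
        return [PermissionRule("ask", "hitl:deploy")]
    return []


TIERS = ("deny", "ask", "allow")
SOURCES = [allowlist_source, hitl_source]
ctx = {"tool": "deploy_serving", "tools": ["deploy_serving"]}
outcome = resolve(ctx, SOURCES, TIERS, default="deny")
```

In this example the allowlist permits deploy_serving, but the HITL rule sits in a higher tier, so the call resolves to ask and is attributed to hitl:deploy.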

The five rule sources#

Source           Example rule
RBAC             require_role(Role.ADMIN) translates into deny for role < admin on admin endpoints
Agent allowlist  The per-agent tools=[...] list translates into allow for listed tools, deny for all others
Feature flags    Flag auto_fairness_audit=off translates into deny on the fairness_audit tool path
HITL gates       Tool decorated with require_approval("deploy") translates into ask until operator approves
YAML policy      config/permission_policies.yaml holds operator-authored rules. Empty default; BFSI customers ship their own

Each source is a pure (context) -> list[PermissionRule] function — swappable, testable, plugin-addable.
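Because a source is a pure function, it can be exercised without any engine state. The sketch below shows what an agent-allowlist source might look like; the PermissionRule fields and the context keys (tool_name, agent_tools) are assumptions made for illustration, not the real types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PermissionRule:
    decision: str     # "allow", "deny", or "ask"
    source: str       # attribution label stored with any denial
    reason: str = ""  # human-readable explanation


def agent_allowlist_source(context: dict) -> list[PermissionRule]:
    """Allow tools on the agent's tools list, deny everything else."""
    tool = context["tool_name"]
    if tool in context.get("agent_tools", []):
        return [PermissionRule("allow", "agent_allowlist")]
    reason = f"{tool} is not in the agent's tools list"
    return [PermissionRule("deny", "agent_allowlist", reason)]


# Same input, same output, no side effects: easy to unit-test.
listed = agent_allowlist_source(
    {"tool_name": "clean_csv", "agent_tools": ["clean_csv"]})
unlisted = agent_allowlist_source(
    {"tool_name": "deploy_serving", "agent_tools": ["clean_csv"]})
```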

Every denial persists#

A new SQLite table, permission_denials, records every ASK-turned-deny and every outright DENY:

CREATE TABLE permission_denials (
    id INTEGER PRIMARY KEY,
    tool_call_id TEXT,
    tool_name TEXT NOT NULL,
    agent_name TEXT,
    arguments_json TEXT,
    rule_source TEXT NOT NULL,   -- 'rbac:admin', 'agent_allowlist', 'hitl:deploy', ...
    reason TEXT,
    user_role TEXT,
    http_method TEXT,
    http_path TEXT,
    timestamp REAL NOT NULL
);

Queryable via:

# All denials in the last 24h (86400 seconds); quote the URL so the shell does not interpret "?"
curl "http://localhost:8000/api/v1/permissions/denials?since=86400"

# All denials attributed to a specific agent
curl "http://localhost:8000/api/v1/permissions/denials?agent=data_cleaner"

# Denials attributed to a YAML policy rule
curl "http://localhost:8000/api/v1/permissions/denials?rule_source=policy"

The BFSI-auditor question#

The one that sold this design:

"What did agent X try to do that was blocked, and why?"

Before the unified engine (W7-1), answering required cross-referencing four systems: RBAC role guards, per-agent tool allowlists, ApprovalRequired exceptions, and ~22 feature flags. Fragile. Not defensible.

After: one SQL query, one table, one rule-source attribution per row. The shape regulators already understand.
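As a rough illustration of that single query, the sketch below builds the permission_denials schema in an in-memory SQLite database and answers the auditor question for one agent. The sample rows, the reason strings, and the blocked_actions() helper are made up for the example; only the table definition comes from this page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE permission_denials (
        id INTEGER PRIMARY KEY,
        tool_call_id TEXT,
        tool_name TEXT NOT NULL,
        agent_name TEXT,
        arguments_json TEXT,
        rule_source TEXT NOT NULL,
        reason TEXT,
        user_role TEXT,
        http_method TEXT,
        http_path TEXT,
        timestamp REAL NOT NULL
    )
""")

# Hypothetical sample data: (tool, agent, rule_source, reason, timestamp).
conn.executemany(
    "INSERT INTO permission_denials "
    "(tool_name, agent_name, rule_source, reason, timestamp) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("export_raw_data", "data_cleaner", "agent_allowlist",
         "tool not in allowlist", 1000.0),
        ("deploy_serving", "deployer", "hitl:deploy",
         "operator rejected", 1001.0),
        ("fairness_audit", "data_cleaner", "policy",
         "blocked by operator policy", 1002.0),
    ],
)


def blocked_actions(conn: sqlite3.Connection, agent: str) -> list[tuple]:
    """What did this agent try to do that was blocked, and why?"""
    cursor = conn.execute(
        "SELECT tool_name, rule_source, reason FROM permission_denials "
        "WHERE agent_name = ? ORDER BY timestamp",
        (agent,),
    )
    return cursor.fetchall()


report = blocked_actions(conn, "data_cleaner")
```

Each returned row carries the tool, the rule source that blocked it, and the recorded reason, oldest first.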

The invariant#

The audit_trail_security_events flag is an INVARIANT (cannot be disabled at runtime). Denials are always persisted, regardless of any other observability configuration. This is the contract with BFSI / healthcare / EU AI Act auditors.

HITL approval gates#

Some tools are sensitive enough to warrant an explicit human approval — deploy_serving, promote_challenger, export_raw_data, etc. Mark them with require_approval(gate_type):

@tool(schema=...)
@require_approval("deploy")
def deploy_serving(model_id: str, env: str) -> str:
    ...

The first call raises ApprovalRequired. The agent runtime serializes the pending call to the ApprovalStore. An operator approves via the REST API or the dashboard. On the next agent turn, the call re-runs and succeeds.
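The raise, approve, and re-run cycle can be mimicked with a small stand-in. Everything below is a hypothetical sketch: the in-memory APPROVED set takes the place of the ApprovalStore, the call_key format is invented, and this ApprovalRequired class and require_approval decorator are simplified imitations rather than the ml_team implementations.

```python
import functools


class ApprovalRequired(Exception):
    """Raised when a gated call has not been approved yet."""

    def __init__(self, gate_type: str, call_key: str):
        super().__init__(f"approval required for gate {gate_type!r}")
        self.gate_type = gate_type
        self.call_key = call_key


APPROVED: set[str] = set()  # stand-in for the ApprovalStore


def require_approval(gate_type: str):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Identify the pending call by gate, tool name, and arguments.
            call_key = (f"{gate_type}:{func.__name__}:"
                        f"{args!r}:{sorted(kwargs.items())!r}")
            if call_key not in APPROVED:
                raise ApprovalRequired(gate_type, call_key)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@require_approval("deploy")
def deploy_serving(model_id: str, env: str) -> str:
    return f"deployed {model_id} to {env}"


# Turn 1: the call is blocked and becomes a pending approval.
try:
    deploy_serving("m-1", "prod")
except ApprovalRequired as pending:
    APPROVED.add(pending.call_key)  # the operator approves this exact call

# Turn 2: the same call re-runs and succeeds.
result = deploy_serving("m-1", "prod")
```

Keying the approval on the arguments means that approving a deploy of m-1 to prod does not approve a deploy of any other model.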

Feature flag registry#

Feature flags fall into three tiers:

  • Invariant — cannot be disabled (e.g. audit_trail_security_events)
  • Flag — production-facing, stable (e.g. cron_scheduler, batch_runner)
  • Experiment — opt-in, can change or disappear (e.g. plugin_shell_hooks_enabled, hooks_enabled, evaluator_grading)

Resolution order: runtime override → env var → alias → declared default. Check at runtime:

from ml_team.core.feature_flags import is_enabled
if is_enabled("hooks_enabled"):
    ...
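The resolution order can be read as a chain of lookups. The sketch below is one possible interpretation, not the is_enabled implementation: the SWARM_FLAG_ environment prefix, the treatment of an alias as an alternate variable name for the same flag, and the resolve_flag() signature are all assumptions.

```python
from typing import Mapping, Sequence

ENV_PREFIX = "SWARM_FLAG_"  # hypothetical environment variable prefix


def _parse_bool(raw: str) -> bool:
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def resolve_flag(name: str, *,
                 overrides: Mapping[str, bool],
                 environ: Mapping[str, str],
                 aliases: Mapping[str, Sequence[str]],
                 defaults: Mapping[str, bool]) -> bool:
    # 1. A runtime override beats everything else.
    if name in overrides:
        return overrides[name]
    # 2. An environment variable under the flag's own name.
    env_key = ENV_PREFIX + name.upper()
    if env_key in environ:
        return _parse_bool(environ[env_key])
    # 3. An environment variable under one of the flag's aliases.
    for alias in aliases.get(name, ()):
        alias_key = ENV_PREFIX + alias.upper()
        if alias_key in environ:
            return _parse_bool(environ[alias_key])
    # 4. Fall back to the declared default.
    return defaults[name]


DEFAULTS = {"hooks_enabled": False}
ALIASES = {"hooks_enabled": ["enable_hooks"]}  # made-up alias

from_alias = resolve_flag(
    "hooks_enabled", overrides={},
    environ={"SWARM_FLAG_ENABLE_HOOKS": "on"},
    aliases=ALIASES, defaults=DEFAULTS)
```

Passing the environment in as a mapping keeps the function free of global state, so each step of the chain can be tested in isolation.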

For the full list with descriptions, run swarm features list or open the /transparency page.
