Train your first classifier#
Goal: train an iris-species classifier using the fast_prototype pipeline, inspect the model card, and understand each agent's contribution. Time: ~5 minutes after quickstart.
Prerequisites#
- Quickstart complete (stack running on localhost)
- You've seen `iris.csv` in the dataset list
- `ANTHROPIC_API_KEY` or `OPENAI_API_KEY` set in `.env`
1. Kick off the run#
curl -X POST http://localhost:8000/api/v1/pipelines \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"problem_statement": "Classify iris flowers into species using sepal and petal measurements",
"dataset_path": "iris.csv",
"template": "fast_prototype",
"name": "my-first-run"
}'
Or use the UI instead: Pipelines → New → fill in the form → Run.
You get back a `run_id` like `7f8e9a2b`.
2. Watch it run#
You'll see:
- `ml_director` reads the problem and plans the pipeline
- `data_profiler` loads `iris.csv`, reports column types and cardinality
- `algorithm_selector` reads the profile and picks logistic regression (simple, appropriate for a 3-class, 4-feature problem)
- `trainer` fits the model on an 80/20 split
- `model_evaluator` reports accuracy, per-class precision/recall, and a confusion matrix
Typical runtime: 90-180 seconds.
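If you prefer the terminal to the UI, you can follow the run by reading `run_events.jsonl` as it grows. Below is a minimal sketch; the `agent` and `event` field names are an assumption — inspect one line of your own log first, since the exact schema may differ. The demo runs against a synthetic two-line log so it works anywhere:

```python
import json

def handoffs(path):
    """Yield 'agent: event' summaries from a run_events.jsonl file.
    The 'agent' and 'event' field names are assumptions -- check one
    line of your own log before relying on them."""
    with open(path) as f:
        for line in f:
            entry = json.loads(line)
            yield f"{entry.get('agent', '?')}: {entry.get('event', '?')}"

# Demo against a synthetic log (the real file sits at the run root).
with open("run_events_sample.jsonl", "w") as f:
    f.write('{"agent": "ml_director", "event": "plan_created"}\n')
    f.write('{"agent": "data_profiler", "event": "profile_complete"}\n')

for summary in handoffs("run_events_sample.jsonl"):
    print(summary)
```

Point `handoffs` at the real `run_events.jsonl` while the pipeline is running to see agents hand off work in order.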
3. Inspect the outputs#
conversations/
├── ml_director.jsonl
├── data_profiler.jsonl
├── algorithm_selector.jsonl
├── trainer.jsonl
└── model_evaluator.jsonl
models/
└── model.joblib
reports/
├── model_card.md
├── evaluation.json
└── confusion_matrix.png
run_events.jsonl
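The reports are plain Markdown and JSON, so you can pull the headline numbers straight out of `reports/evaluation.json`. The key names below (`accuracy`, `macro_f1`) mirror the model card but are an assumption — open the real file to confirm them. The demo writes a small stand-in file so the snippet runs anywhere:

```python
import json

def headline_metrics(path):
    """Return (accuracy, macro_f1) from an evaluation report.
    Key names are assumptions -- confirm against your own file."""
    with open(path) as f:
        report = json.load(f)
    return report["accuracy"], report["macro_f1"]

# Stand-in for reports/evaluation.json so the snippet is self-contained.
with open("evaluation_sample.json", "w") as f:
    json.dump({"accuracy": 0.9667, "macro_f1": 0.9674}, f)

acc, f1 = headline_metrics("evaluation_sample.json")
print(f"accuracy={acc:.4f}  macro_f1={f1:.4f}")
```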
The model card#
`reports/model_card.md` reads something like this:
# Iris species classifier (v0)
## Model
- Algorithm: Logistic Regression (scikit-learn 1.8)
- Parameters: solver=lbfgs, C=1.0 (multinomial by default)
- Trained: 2026-04-15 14:22 IST
- Training size: 120 samples; test size: 30 samples
## Performance
- Accuracy: 0.9667 ± 0.0152
- Macro F1: 0.9674
- Confusion matrix: reports/confusion_matrix.png
## Data
- Source: iris.csv (seeded)
- Features: sepal_length, sepal_width, petal_length, petal_width
- Target: species (3 classes)
- Missing values: 0
- Class balance: 50/50/50
## Limitations
- 150 samples — very small. Production use requires more data.
- No fairness audit attached (protected attribute not relevant for botany).
## Version
- Pipeline: fast_prototype
- swarm: 0.11.0
- Run ID: 7f8e9a2b
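If you want to sanity-check the card's numbers outside the pipeline, the trainer's recipe is easy to reproduce with scikit-learn's bundled copy of the iris data. This is a sketch of the same setup (logistic regression, 80/20 split) — not the pipeline's actual code — and your accuracy will wobble a little with the split seed:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Same recipe as the model card: 80/20 stratified split, logistic regression.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
clf = LogisticRegression(max_iter=200).fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.4f}")
```

A result in the mid-0.9s is expected; anything near 0.33 means the labels and features got disconnected somewhere.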
Per-agent conversations#
Each agent's JSONL journal has a full record of every LLM call, tool call, and intermediate reasoning step. Useful for:
- Debugging ("why did algorithm_selector pick logistic regression?")
- Auditing ("show me every tool call made during this run")
- Feedback / fine-tuning data later
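For the auditing case, a few lines of Python will pull every tool call out of a journal. The entry schema assumed here (`type`, `tool`, `arguments` fields) is a guess — inspect one line of your own journal first. The demo runs against a two-line synthetic journal:

```python
import json

def tool_calls(journal_path):
    """Yield (tool_name, arguments) for each tool-call entry in a journal.
    Field names ('type', 'tool', 'arguments') are assumptions."""
    with open(journal_path) as f:
        for line in f:
            entry = json.loads(line)
            if entry.get("type") == "tool_call":
                yield entry["tool"], entry.get("arguments", {})

# Synthetic stand-in for conversations/algorithm_selector.jsonl.
with open("journal_sample.jsonl", "w") as f:
    f.write('{"type": "llm_call", "prompt": "pick an algorithm"}\n')
    f.write('{"type": "tool_call", "tool": "read_profile", '
            '"arguments": {"dataset": "iris.csv"}}\n')

for tool, args in tool_calls("journal_sample.jsonl"):
    print(tool, args)
```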
4. Try a different template#
swarm pipelines run \
--problem "Classify iris flowers into species" \
--dataset iris.csv \
--template default_ml_pipeline # 22 agents instead of 5
You'll see:
- hyperparam_tuner runs a grid search over multiple algorithms
- model_comparator picks the best
- fairness_auditor attaches a (trivially-passing) fairness report
- documentation_agent expands the model card
- reproducibility_agent pins seeds + versions
Takes longer (~15-30 min) but produces a production-quality run.
5. Next#
- Deploy a model — take this trained model to shadow traffic and promotion
- Generate an audit PDF — the regulator-format artefact
- Write a custom agent — add your own specialist
- End-to-end fraud classifier tutorial — a realistic BFSI flow
Troubleshooting#
Pipeline stalls at algorithm_selector
Check the logs: docker compose logs api | grep algorithm_selector. This usually means the LLM call timed out (missing API key or rate limiting). Verify .env and retry.
Training completes but model accuracy is 0.33
That's random-guess accuracy for 3 balanced classes, so something is wrong. Inspect data_profiler.jsonl; most likely the target column wasn't identified. Rerun with an explicit target: --target species.
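Before rerunning, you can confirm the target column looks sane using nothing but the standard library. A quick sketch (the demo writes a tiny stand-in CSV so it runs anywhere; point `class_counts` at your real `iris.csv` instead):

```python
import csv
from collections import Counter

def class_counts(csv_path, target):
    """Count label frequencies in the target column of a CSV file."""
    with open(csv_path, newline="") as f:
        return Counter(row[target] for row in csv.DictReader(f))

# Tiny stand-in for iris.csv so the snippet is self-contained.
with open("iris_sample.csv", "w") as f:
    f.write("sepal_length,species\n"
            "5.1,setosa\n6.0,versicolor\n5.9,virginica\n4.9,setosa\n")

print(class_counts("iris_sample.csv", "species"))
```

A healthy iris.csv shows three classes at 50 rows each; one giant class, or a column full of numeric measurements, means the wrong column was treated as the target.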