Ravi loses his best model
Experiment tracking, MLflow, and DVC — keeping track of what actually worked
Ravi had been training models for three weeks. He'd tried 47 different combinations — different algorithms, different parameters, different data preprocessing steps. One of them had achieved 91% accuracy. It was perfect.
He couldn't remember which one.
He looked at his folder: model_v1.pkl, model_v2.pkl, model_GOOD.pkl, model_FINAL.pkl, model_FINAL2.pkl, model_actually_final.pkl. No notes. No logs. Just files with meaningless names.
He had to start over. Three weeks of work were gone, all because nothing had been tracked.
This is one of the most common ML disasters, and it is exactly what MLflow was built to prevent.
What is experiment tracking?
Every time you train a model, you make choices:
- Which algorithm? (Random Forest, XGBoost, Neural Network)
- Which parameters? (100 trees or 500? Learning rate 0.01 or 0.001?)
- Which data? (All features or a subset? Scaled or not?)
Experiment tracking records all of these choices alongside the results, so you can always look back and say: "Experiment #31, XGBoost with 200 trees, learning rate 0.05, trained on scaled data: 91% accuracy."
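At its core, a tracker is a structured log of those choices and results. This plain-Python sketch is not MLflow, and the file name and helper names (`log_experiment`, `best_experiment`) are hypothetical. It appends each run as one JSON line and picks the best run by a metric:

```python
import json
from pathlib import Path

# Hypothetical log location; any writable path works
LOG_FILE = Path("experiments.jsonl")


def log_experiment(params: dict, metrics: dict, log_file: Path = LOG_FILE) -> None:
    """Append one run's choices (params) and results (metrics) as a JSON line."""
    record = {"params": params, "metrics": metrics}
    with log_file.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def best_experiment(metric: str, log_file: Path = LOG_FILE) -> dict:
    """Return the logged run with the highest value for `metric`."""
    with log_file.open(encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    return max(records, key=lambda r: r["metrics"][metric])


if __name__ == "__main__":
    # The XGBoost record mirrors the example above; the other run is made up
    log_experiment({"algorithm": "random_forest", "n_estimators": 100}, {"accuracy": 0.85})
    log_experiment(
        {"algorithm": "xgboost", "n_estimators": 200, "learning_rate": 0.05, "scaled": True},
        {"accuracy": 0.91},
    )
    print(best_experiment("accuracy"))
```

A hand-rolled log like this breaks down quickly: there is no UI, no model storage, and no search. Those gaps are what a dedicated tool fills.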
MLflow — the industry standard
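```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Example data so the snippet runs end to end (swap in your own dataset)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Start an experiment run
with mlflow.start_run(run_name="random-forest-v3"):
    # Log parameters (your choices)
    n_estimators = 200
    max_depth = 10
    mlflow.log_param("n_estimators", n_estimators)
    mlflow.log_param("max_depth", max_depth)

    # Train the model
    model = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
    )
    model.fit(X_train, y_train)

    # Log metrics (your results)
    preds = model.predict(X_test)
    accuracy = model.score(X_test, y_test)
    f1 = f1_score(y_test, preds)
    mlflow.log_metric("accuracy", accuracy)
    mlflow.log_metric("f1_score", f1)

    # Save the model itself
    mlflow.sklearn.log_model(model, "model")
```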
```bash
# Start the MLflow UI
mlflow ui
# Then open http://localhost:5000 for a visual dashboard of all experiments
```
The MLflow dashboard lists every run in a searchable, sortable table with its parameters and metrics, and each run's page holds its saved model file. Sorting by a metric column surfaces the best run in seconds.
DVC — versioning your data like code
Code changes over time. Data changes too — you add more rows, fix errors, add new features. DVC (Data Version Control) versions your datasets the same way Git versions your code.
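```bash
# Track a dataset (writes data/students.csv.dvc and adds the CSV to data/.gitignore)
dvc add data/students.csv
git add data/students.csv.dvc data/.gitignore
git commit -m "Track students.csv with DVC"
git tag v1.0

# Push data to remote storage (S3, GCS); set a default remote once with `dvc remote add -d <name> <url>`
dvc push

# Pull a specific version of data
git checkout v1.0
dvc pull
```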
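`dvc add` does not put the CSV itself into Git. It stores the data in DVC's cache and writes a small pointer file, `data/students.csv.dvc`, that records an MD5 hash of the file's contents; Git versions that pointer instead. The sketch below is a simplified illustration of that content-addressing idea, not DVC's own code (exact hashing details differ between DVC versions). The helper `file_md5` and the demo file are hypothetical:

```python
import hashlib
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file's bytes, read in chunks so large datasets need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    csv = Path("students_demo.csv")  # hypothetical demo file with made-up rows
    csv.write_bytes(b"name,score\nasha,88\n")
    before = file_md5(csv)
    csv.write_bytes(b"name,score\nasha,88\nravi,91\n")  # add one row
    after = file_md5(csv)
    print(before != after)  # True: any change to the data yields a new hash
```

Because the hash changes whenever the data does, each Git commit of the pointer file pins one exact version of the dataset.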
Now you can recreate any experiment: same code (via Git) + same data (via DVC) = reproducible results, as long as library versions and random seeds are pinned as well.
What Ravi does now
Every experiment goes into MLflow. Every dataset version goes into DVC. He names his runs descriptively. He writes a one-line note in the run description.
His folder now has one file: `best_model.pkl` — downloaded from the MLflow run with the highest F1 score, found in 30 seconds.
He hasn't lost a model since.
- Experiment tracking logs parameters, metrics, and model files for every training run
- MLflow provides a UI to compare experiments and find your best model
- DVC versions datasets like Git versions code, enabling reproducibility
- Always log run name, parameters, metrics, and a description note