
AI Ops Ch 2 / 9 Beginner


Ravi loses his best model

Experiment tracking, MLflow, and DVC — keeping track of what actually worked


In this chapter

Ravi

ML engineer at a Delhi fintech

The story

Ravi had been training models for three weeks. He'd tried 47 different combinations — different algorithms, different parameters, different data preprocessing steps. One of them had achieved 91% accuracy. It was perfect.

He couldn't remember which one.

He looked at his folder: model_v1.pkl, model_v2.pkl, model_GOOD.pkl, model_FINAL.pkl, model_FINAL2.pkl, model_actually_final.pkl. No notes. No logs. Just files with meaningless names.

He had to start over. Three weeks of work, gone for lack of tracking.

This is one of the most common ML disasters. MLflow was built to prevent it.

What is experiment tracking?

Every time you train a model, you make choices:

- Which algorithm? (Random Forest, XGBoost, Neural Network)

- Which parameters? (100 trees or 500? Learning rate 0.01 or 0.001?)

- Which data? (All features or a subset? Scaled or not?)

Experiment tracking records all of these choices alongside the results. So you can always look back and say: "Experiment #31, XGBoost with 200 trees, learning rate 0.05, trained on scaled data — 91% accuracy."
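To make the idea concrete before bringing in a tool, here is a minimal sketch of a hand-rolled tracker that appends one record per run to a JSON Lines file. The function names and the file name `experiments.jsonl` are hypothetical, and the values for `experiment-30` are placeholders added only so there is something to compare against; `experiment-31` mirrors the example above.

```python
import json
import tempfile
from pathlib import Path


def log_experiment(log_path, run_name, params, metrics):
    """Append one experiment record (choices + results) as a JSON line."""
    record = {"run_name": run_name, "params": params, "metrics": metrics}
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def load_experiments(log_path):
    """Read every logged experiment back as a list of dicts."""
    with open(log_path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical log location in a throwaway directory
log_path = Path(tempfile.mkdtemp()) / "experiments.jsonl"

# Placeholder run, values made up for illustration
log_experiment(
    log_path,
    "experiment-30",
    {"algorithm": "RandomForest", "n_estimators": 100, "scaled": False},
    {"accuracy": 0.87},
)
# The run described in the text above
log_experiment(
    log_path,
    "experiment-31",
    {"algorithm": "XGBoost", "n_estimators": 200, "learning_rate": 0.05, "scaled": True},
    {"accuracy": 0.91},
)

# Look back and find the run with the highest accuracy
best = max(load_experiments(log_path), key=lambda r: r["metrics"]["accuracy"])
print(best["run_name"], best["params"])
```

A file like this already answers "which run was the good one?". A dedicated tracking tool adds a UI, model storage, and search on top of the same kind of record.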

MLflow — the industry standard

```python
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

# X_train, X_test, y_train, y_test are assumed to come from your own
# train/test split (for example, sklearn's train_test_split).

# Start an experiment run
with mlflow.start_run(run_name="random-forest-v3"):
    # Log parameters (your choices)
    n_estimators = 200
    max_depth = 10
    mlflow.log_param("n_estimators", n_estimators)
    mlflow.log_param("max_depth", max_depth)

    # Train the model
    model = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
    )
    model.fit(X_train, y_train)

    # Log metrics (your results)
    accuracy = model.score(X_test, y_test)
    # f1_score assumes binary labels by default; pass average="macro" for multiclass
    f1 = f1_score(y_test, model.predict(X_test))
    mlflow.log_metric("accuracy", accuracy)
    mlflow.log_metric("f1_score", f1)

    # Save the model itself
    mlflow.sklearn.log_model(model, "model")
```

```bash
# Start the MLflow UI
mlflow ui

# Open http://localhost:5000 for a visual dashboard of all experiments
```

The MLflow dashboard lists every run, with its parameters, metrics, and logged model, in a searchable, sortable table. Sorting by a metric column surfaces the best run in seconds.

DVC — versioning your data like code

Code changes over time. Data changes too — you add more rows, fix errors, add new features. DVC (Data Version Control) versions your datasets the same way Git versions your code.

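```bash
# Track a dataset (creates the small pointer file data/students.csv.dvc)
dvc add data/students.csv

# Commit the pointer file so Git records this version of the data
git add data/students.csv.dvc data/.gitignore
git commit -m "Track students dataset with DVC"

# Push data to remote storage (S3, GCS); assumes a remote was set up with `dvc remote add`
dvc push

# Pull a specific version of data
git checkout v1.0
dvc pull
```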

Now you can recreate any experiment exactly — same code (via Git) + same data (via DVC) = reproducible results.
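The reason "same data" can be checked at all is that a file's contents can be reduced to a fingerprint: change one row and the fingerprint changes. DVC relies on a similar idea, storing a content hash in the small `.dvc` file that Git tracks. The sketch below illustrates the principle only and is not DVC's implementation; `file_fingerprint`, the use of SHA-256, and the sample rows are all assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def file_fingerprint(path, chunk_size=1 << 20):
    """Return a SHA-256 hex digest of the file's contents, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical dataset with placeholder rows
data_path = Path(tempfile.mkdtemp()) / "students.csv"
data_path.write_text("name,score\nasha,82\n", encoding="utf-8")
v1 = file_fingerprint(data_path)

# Adding a row produces a different fingerprint, i.e. a new data version
with open(data_path, "a", encoding="utf-8") as f:
    f.write("vikram,77\n")
v2 = file_fingerprint(data_path)

print("data changed:", v1 != v2)
```

Recording a fingerprint like this next to each experiment (for instance as a logged parameter) is one way to tell later exactly which data a model was trained on.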

What Ravi does now

Every experiment goes into MLflow. Every dataset version goes into DVC. He names his runs descriptively. He writes a one-line note in the run description.

His folder now has one file: `best_model.pkl` — downloaded from the MLflow run with the highest F1 score, found in 30 seconds.

He hasn't lost a model since.

Key takeaways

- Experiment tracking logs parameters, metrics, and model files for every training run

- MLflow provides a UI to compare experiments and find your best model

- DVC versions datasets the way Git versions code, which makes experiments reproducible

- Always log a run name, parameters, metrics, and a description note

Commands from this chapter

$ pip install mlflow dvc

Install MLflow and DVC

$ mlflow ui

Start MLflow tracking UI at localhost:5000

$ dvc init

Initialize DVC in your project

$ dvc add data/dataset.csv

Start tracking a data file