Learn › AI Ops › Building the full ML pipeline

AI Ops Ch 5 / 9 Advanced

🔄

Building the full ML pipeline

Kubeflow, Airflow, and the end-to-end MLOps system that runs itself

⏱ 15 min 4 commands 4 takeaways

🔄

In this chapter

Ravi & Meera

Building the full stack together

The story

Ravi was doing experiment tracking. Meera was doing model serving. Kavya was doing monitoring. Each person had their own scripts, their own tools, their own way of doing things.

When the company decided to launch five new ML models in Q3, the chaos became visible. Nobody could reproduce each other's work. Deploying a new model took two weeks of coordination.

"We need a proper MLOps platform," the head of engineering said.

What is an ML pipeline?

A pipeline connects all the steps of the ML lifecycle into an automated, reproducible flow:

```

Data ingestion → Data validation → Feature engineering →

Model training → Model evaluation → Model registration →

Model deployment → Monitoring → [Retrain if drift] → loop

```

Each step is a node. Data flows between them. The whole thing can be triggered automatically — on a schedule, on new data arrival, or on drift detection.

Apache Airflow — the scheduler

Airflow is a workflow orchestrator. You write pipelines as Python code called DAGs (Directed Acyclic Graphs). Airflow schedules and monitors them.

```python

from airflow import DAG

from airflow.operators.python import PythonOperator

from datetime import datetime, timedelta

def ingest_data():

# Pull new data from database
pass

def train_model():

# Run training with MLflow tracking
pass

def evaluate_and_promote():

# Compare new model vs production
# Promote if better
pass

with DAG(

'student_dropout_weekly_retrain',
schedule_interval='@weekly',
start_date=datetime(2025, 1, 1),

) as dag:

ingest = PythonOperator(task_id='ingest_data', python_callable=ingest_data)
train = PythonOperator(task_id='train_model', python_callable=train_model)
evaluate = PythonOperator(task_id='evaluate', python_callable=evaluate_and_promote)

# Define the order
ingest >> train >> evaluate

```

Every Sunday at midnight, this DAG runs automatically: ingests new data, retrains, evaluates, promotes if better.

Kubeflow — ML pipelines on Kubernetes

For larger scale, Kubeflow runs ML pipelines as Kubernetes jobs. Each step runs in its own container, with its own resources, tracked automatically.

```python

import kfp

from kfp import dsl

@dsl.component(base_image='python:3.11')

def train_step(data_path: str) -> str:

# Training logic here
return "model saved at gs://bucket/model.pkl"

@dsl.component(base_image='python:3.11')

def deploy_step(model_path: str):

# Deployment logic here
pass

@dsl.pipeline(name='loan-approval-pipeline')

def loan_pipeline(data_path: str):

train_task = train_step(data_path=data_path)
deploy_step(model_path=train_task.output)

Compile and submit

kfp.compiler.Compiler().compile(loan_pipeline, 'pipeline.yaml')

```

The full stack Ravi and Meera built

```

Weekly trigger (Airflow)

↓

Data ingestion from PostgreSQL

↓

Data validation (Great Expectations — checks data quality)

↓

Feature engineering (same code used in training + serving)

↓

Model training (logged to MLflow)

↓

Evaluation: new model vs current production model

↓

If new model wins → push to model registry (MLflow)

↓

Trigger deployment pipeline (GitHub Actions)

↓

Build new Docker image with model baked in

↓

Deploy to Kubernetes (blue-green deployment)

↓

Run smoke tests against new deployment

↓

If smoke tests pass → switch traffic to new model

↓

Monitor with Evidently → alert if drift

↓

Loop back to weekly trigger

```

This whole system runs itself. The team reviews MLflow dashboards once a week. Deployments happen automatically. Models stay fresh.

Five new models in Q3 were deployed in two days each instead of two weeks.

Key takeaways

ML pipelines automate the full lifecycle: ingest → train → evaluate → deploy → monitor

Airflow schedules and orchestrates pipeline steps with dependency management

Kubeflow runs ML pipelines as Kubernetes jobs for larger scale

The goal is automation: models should retrain and redeploy with minimal human intervention

Commands from this chapter

$ pip install apache-airflow

Install Airflow

$ airflow standalone

Start Airflow dev server

$ airflow dags list

List all DAGs

$ airflow dags trigger dag_id

Manually trigger a DAG