Building the full ML pipeline

Kubeflow, Airflow, and the end-to-end MLOps system that runs itself

The story

Ravi was doing experiment tracking. Meera was doing model serving. Kavya was doing monitoring. Each person had their own scripts, their own tools, their own way of doing things.

When the company decided to launch five new ML models in Q3, the chaos became visible. Nobody could reproduce each other's work. Deploying a new model took two weeks of coordination.

"We need a proper MLOps platform," the head of engineering said.

What is an ML pipeline?

A pipeline connects all the steps of the ML lifecycle into an automated, reproducible flow:

```
Data ingestion → Data validation → Feature engineering →
Model training → Model evaluation → Model registration →
Model deployment → Monitoring → [Retrain if drift] → loop
```

Each step is a node. Data flows between them. The whole thing can be triggered automatically — on a schedule, on new data arrival, or on drift detection.
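To make the node-and-flow idea concrete, here is a minimal sketch in plain Python. The step functions and the `run_pipeline` helper are hypothetical stand-ins, not part of Airflow or Kubeflow. A real orchestrator adds scheduling, retries, and logging on top of roughly this shape.

```python
from typing import Any, Callable, List, Optional

# Hypothetical steps: each takes the previous step's output and returns its own.
def ingest(_: Any) -> List[Optional[float]]:
    return [0.25, 0.5, None, 0.75]  # pretend rows pulled from a database

def validate(rows: List[Optional[float]]) -> List[float]:
    return [r for r in rows if r is not None]  # drop missing values

def train(rows: List[float]) -> dict:
    return {"threshold": sum(rows) / len(rows)}  # toy "model": the mean

Step = Callable[[Any], Any]

def run_pipeline(steps: List[Step], payload: Any = None) -> Any:
    """Run the steps in order, feeding each output into the next step."""
    for step in steps:
        payload = step(payload)
    return payload

model = run_pipeline([ingest, validate, train])
print(model)
```

Because every step only sees the previous step's output, any step can be swapped or rerun on its own, which is what makes the flow reproducible.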

Apache Airflow — the scheduler

Airflow is a workflow orchestrator. You define each pipeline in Python as a DAG (Directed Acyclic Graph): a set of tasks plus the dependencies between them. Airflow schedules the tasks, runs them in dependency order, and monitors the results.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def ingest_data():
    # Pull new data from the database
    pass


def train_model():
    # Run training with MLflow tracking
    pass


def evaluate_and_promote():
    # Compare the new model against production and promote it if better
    pass


with DAG(
    'student_dropout_weekly_retrain',
    schedule='@weekly',  # Airflow 2.4+; older releases use schedule_interval
    start_date=datetime(2025, 1, 1),
    catchup=False,  # do not backfill a run for every week since start_date
) as dag:
    ingest = PythonOperator(task_id='ingest_data', python_callable=ingest_data)
    train = PythonOperator(task_id='train_model', python_callable=train_model)
    evaluate = PythonOperator(task_id='evaluate', python_callable=evaluate_and_promote)

    # Define the order: each task runs only after the previous one succeeds
    ingest >> train >> evaluate
```

Every Sunday at midnight (UTC by default), Airflow runs this DAG automatically: it ingests new data, retrains the model, evaluates it, and promotes it if it beats the production model.
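The `evaluate_and_promote` task in the DAG is a stub, so the "promote if better" rule is left open. One way to express it is a small gate function, sketched here. The `should_promote` name, the choice of AUC as the metric, and the 0.01 margin are all assumptions for illustration, and the metric values are made up.

```python
def should_promote(new_auc: float, prod_auc: float, min_gain: float = 0.01) -> bool:
    """Promote only if the candidate beats production by at least min_gain."""
    return new_auc - prod_auc >= min_gain

# Made-up metric values for illustration.
print(should_promote(new_auc=0.87, prod_auc=0.84))   # clear improvement
print(should_promote(new_auc=0.845, prod_auc=0.84))  # gain below the margin
```

Requiring a margin rather than any improvement at all keeps the pipeline from swapping models on noise between two weekly runs.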

Kubeflow — ML pipelines on Kubernetes

For larger scale, Kubeflow runs ML pipelines as Kubernetes jobs. Each step runs in its own container with its own resource requests, and Kubeflow Pipelines records the inputs, outputs, and status of every run.

```python
from kfp import compiler, dsl


@dsl.component(base_image='python:3.11')
def train_step(data_path: str) -> str:
    # Training logic here
    # Return the model location so the next step can consume it
    return "gs://bucket/model.pkl"


@dsl.component(base_image='python:3.11')
def deploy_step(model_path: str):
    # Deployment logic here
    pass


@dsl.pipeline(name='loan-approval-pipeline')
def loan_pipeline(data_path: str):
    train_task = train_step(data_path=data_path)
    # train_task.output wires the training step's return value into deployment
    deploy_step(model_path=train_task.output)


# Compile the pipeline to a YAML package; submitting it to a cluster is a separate step
compiler.Compiler().compile(loan_pipeline, 'pipeline.yaml')
```
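Compiling only writes `pipeline.yaml`. Starting a run needs a client connected to a Kubeflow Pipelines endpoint. The sketch below assumes the KFP v2 SDK, whose `kfp.Client` provides `create_run_from_pipeline_package`. The `submit_pipeline` helper, the host URL, and the `gs://bucket/data.csv` input path are hypothetical. The client is passed in as an argument so the helper can be tried with a stub when no cluster is available.

```python
from typing import Any

def submit_pipeline(client: Any, package_path: str, data_path: str) -> Any:
    """Start one run of a compiled pipeline package.

    `client` is expected to be a kfp.Client (assumption: KFP v2 SDK).
    """
    return client.create_run_from_pipeline_package(
        pipeline_file=package_path,
        arguments={"data_path": data_path},  # matches loan_pipeline's parameter
    )

# Usage against a real cluster (the host URL is an assumption):
#   import kfp
#   client = kfp.Client(host="http://localhost:8080")
#   submit_pipeline(client, "pipeline.yaml", "gs://bucket/data.csv")
```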

The full stack Ravi and Meera built

```
 1. Weekly trigger (Airflow)
 2. Data ingestion from PostgreSQL
 3. Data validation (Great Expectations checks data quality)
 4. Feature engineering (same code used in training + serving)
 5. Model training (logged to MLflow)
 6. Evaluation: new model vs current production model
 7. If new model wins → push to model registry (MLflow)
 8. Trigger deployment pipeline (GitHub Actions)
 9. Build new Docker image with model baked in
10. Deploy to Kubernetes (blue-green deployment)
11. Run smoke tests against new deployment
12. If smoke tests pass → switch traffic to new model
13. Monitor with Evidently → alert if drift
14. Loop back to weekly trigger
```
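The blue-green part of the stack (deploy, smoke test, then switch traffic) comes down to one decision: traffic moves only if every smoke test passes against the new deployment. Here is a minimal sketch of that decision. The function names, the deployment labels, and the fake scores are hypothetical. Real smoke tests would call the new deployment's endpoints, and the switch itself would be a Kubernetes change.

```python
from typing import Callable, Dict, List

def blue_green_switch(
    live: str,
    candidate: str,
    smoke_tests: List[Callable[[str], bool]],
) -> str:
    """Return the deployment that should receive traffic.

    Traffic moves to `candidate` only if every smoke test passes against it;
    otherwise it stays on `live`, so a failed release never serves users.
    """
    if all(test(candidate) for test in smoke_tests):
        return candidate
    return live

# Hypothetical smoke tests standing in for real endpoint calls.
def health_check(deployment: str) -> bool:
    return deployment in {"blue", "green"}

def prediction_in_range(deployment: str) -> bool:
    fake_scores: Dict[str, float] = {"blue": 0.42, "green": 0.37}
    return 0.0 <= fake_scores.get(deployment, -1.0) <= 1.0

print(blue_green_switch("blue", "green", [health_check, prediction_in_range]))
```

Keeping the old deployment running until the switch means a rollback is just pointing traffic back at it.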

This whole system runs itself. The team reviews MLflow dashboards once a week. Deployments happen automatically. Models stay fresh.
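The "alert if drift" step needs a number to alert on. The team's stack uses Evidently for this. The sketch below does not use Evidently's API and instead computes a Population Stability Index (PSI) by hand as a stand-in, to show the kind of check involved. The bin edges, the sample values, and the 0.2 threshold (a common rule of thumb) are assumptions.

```python
import bisect
import math
from typing import List, Sequence

def bin_shares(values: Sequence[float], edges: List[float]) -> List[float]:
    """Fraction of values per bin; out-of-range values go to the edge bins."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = bisect.bisect_right(edges, v) - 1
        idx = min(max(idx, 0), len(counts) - 1)
        counts[idx] += 1
    # A small floor avoids log(0) when a bin is empty.
    return [max(c / len(values), 1e-6) for c in counts]

def psi(expected: Sequence[float], actual: Sequence[float], edges: List[float]) -> float:
    """Population Stability Index between a reference sample and a live sample."""
    e, a = bin_shares(expected, edges), bin_shares(actual, edges)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

def needs_retrain(score: float, threshold: float = 0.2) -> bool:
    return score > threshold

# Made-up feature values: training data vs. live traffic that has shifted upward.
training = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
live = [0.6, 0.7, 0.8, 0.9, 0.7, 0.8, 0.9, 0.95]
edges = [0.0, 0.5, 1.0]

print(needs_retrain(psi(training, training, edges)))  # same distribution
print(needs_retrain(psi(training, live, edges)))      # shifted distribution
```

In the full stack, a `True` result would raise the alert and trigger the retraining run ahead of the weekly schedule.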

Each of the five new Q3 models was deployed in two days instead of two weeks.

Key takeaways

ML pipelines automate the full lifecycle: ingest → train → evaluate → deploy → monitor

Airflow schedules and orchestrates pipeline steps with dependency management

Kubeflow runs ML pipelines as Kubernetes jobs for larger scale

The goal is automation: models should retrain and redeploy with minimal human intervention

Commands from this chapter
$ pip install apache-airflow
Install Airflow
$ airflow standalone
Start Airflow dev server
$ airflow dags list
List all DAGs
$ airflow dags trigger <dag_id>
Manually trigger a DAG by its ID (for example, student_dropout_weekly_retrain)