AI Ops Ch 3 / 9 Intermediate

Deploying Meera's chatbot

Model serving with FastAPI and Docker — from notebook to real API

⏱ 12 min 4 commands 4 takeaways
In this chapter
Meera
Now full ML engineer at the Chennai edtech
The story

Meera's dropout prediction model had been running in a Jupyter notebook for three months. Every Monday morning, she'd manually run the notebook, download a CSV, and email it to the teachers' team. 47 clicks, 20 minutes, every week.

"Can you make this automatic?" Kiran asked. "Like, teachers should see predictions in their dashboard in real-time."

This required turning a notebook into an API. This is where data science ends and ML engineering begins.

The notebook-to-production gap

A Jupyter notebook is great for exploration. It's terrible for production:

- You can't call a notebook from another service

- It runs cells in whatever order you last ran them

- It doesn't handle concurrent requests

- It crashes with no recovery

You need to wrap your model in a web API — a service that accepts requests and returns predictions.
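Before the model can sit behind an API, it has to leave the notebook as a file that another process can load. A minimal sketch of that export step, assuming a scikit-learn classifier trained on the four features used later in this chapter. The training rows, `train_toy_model`, and `export_model` are made up for illustration; only `joblib.dump` and `joblib.load` come from the chapter itself.

```python
from pathlib import Path

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Column order must match what the API builds later:
# grades, attendance, engagement_score, assignments_completed


def train_toy_model() -> LogisticRegression:
    """Fit a classifier on invented rows (illustration only, not real student data)."""
    X = np.array([
        [85.0, 0.95, 8.1, 20],
        [78.0, 0.90, 7.4, 18],
        [91.0, 0.98, 9.0, 22],
        [55.0, 0.50, 2.5, 6],
        [48.0, 0.40, 1.9, 4],
        [62.0, 0.60, 3.2, 8],
    ])
    y = np.array([0, 0, 0, 1, 1, 1])  # 1 = dropped out
    return LogisticRegression(max_iter=1000).fit(X, y)


def export_model(model, path: Path) -> Path:
    """Write the fitted model to disk so a separate process can load it."""
    joblib.dump(model, path)
    return path
```

In the notebook, the last cell would then be something like `export_model(model, Path("dropout_model.pkl"))`, and the serving code reads the same file back with `joblib.load`.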

FastAPI — the modern way to serve ML models

```python
# app.py
from fastapi import FastAPI
from pydantic import BaseModel
import joblib
import numpy as np

# Load the trained model once at startup
model = joblib.load("dropout_model.pkl")

app = FastAPI(title="Student Dropout Prediction API")


# Define the input shape
class StudentData(BaseModel):
    grades: float
    attendance: float
    engagement_score: float
    assignments_completed: int


# Define the prediction endpoint
@app.post("/predict")
def predict_dropout(student: StudentData):
    # Feature order must match the order used when the model was trained
    features = np.array([[
        student.grades,
        student.attendance,
        student.engagement_score,
        student.assignments_completed
    ]])
    # Column 1 of predict_proba is the probability of the positive (dropout) class
    probability = model.predict_proba(features)[0][1]
    prediction = "at_risk" if probability > 0.6 else "on_track"
    return {
        "student_status": prediction,
        "dropout_probability": round(float(probability), 3),
        "confidence": "high" if probability > 0.8 or probability < 0.2 else "medium"
    }


# Lightweight liveness check
@app.get("/health")
def health():
    return {"status": "healthy"}
```
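The `StudentData` model checks types, but not ranges: an attendance of `60` instead of `0.6` would pass straight through to the model. A hedged sketch of how range constraints could be added with Pydantic's `Field`. The bounds below are assumptions inferred from the sample request (grades out of 100, attendance as a fraction), not something the source specifies, and `validation_errors` is a hypothetical helper for showing the effect.

```python
from pydantic import BaseModel, Field, ValidationError


class StudentData(BaseModel):
    # All ranges are assumptions for illustration; adjust to the real data.
    grades: float = Field(ge=0, le=100)
    attendance: float = Field(ge=0, le=1)  # fraction of classes attended
    engagement_score: float = Field(ge=0)
    assignments_completed: int = Field(ge=0)


def validation_errors(payload: dict) -> list:
    """Return the names of the fields that fail validation (empty list if none)."""
    try:
        StudentData(**payload)
    except ValidationError as exc:
        return [str(err["loc"][0]) for err in exc.errors()]
    return []
```

When a model like this is used as the request body, FastAPI rejects out-of-range payloads with a 422 response before the endpoint function runs, so bad inputs never reach `predict_proba`.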

```bash
# Run locally
uvicorn app:app --reload

# Test it
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"grades": 62, "attendance": 0.6, "engagement_score": 3.2, "assignments_completed": 8}'
```

Response: `{"student_status": "at_risk", "dropout_probability": 0.74, "confidence": "medium"}`
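The status and confidence labels in the response come from fixed cut-offs on the predicted probability. A minimal sketch that pulls that logic into a plain function, so it can be unit-tested without a running server. The function and constant names are hypothetical; the threshold values are the ones used in `app.py`.

```python
AT_RISK_THRESHOLD = 0.6  # above this, the student is flagged "at_risk"
HIGH_CONF_UPPER = 0.8    # above this, the prediction counts as "high" confidence
HIGH_CONF_LOWER = 0.2    # below this, it also counts as "high" confidence


def label_prediction(probability: float) -> dict:
    """Turn a dropout probability into the response fields returned by /predict."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability must be in [0, 1], got {probability}")
    status = "at_risk" if probability > AT_RISK_THRESHOLD else "on_track"
    confident = probability > HIGH_CONF_UPPER or probability < HIGH_CONF_LOWER
    return {
        "student_status": status,
        "dropout_probability": round(float(probability), 3),
        "confidence": "high" if confident else "medium",
    }
```

Under these cut-offs, a probability of 0.85 is labelled `at_risk` with `high` confidence, while 0.45 is labelled `on_track` with `medium` confidence. Note that the comparison is strict, so exactly 0.6 still counts as `on_track`.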

The teachers' dashboard calls this API for every student. Real-time, automatic, no Monday morning ritual.
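A rough sketch of what the dashboard side of that call could look like, using only the Python standard library. The base URL, the function names, and the 2-second timeout are assumptions for illustration; the chapter does not describe the dashboard's actual code.

```python
import json
import urllib.request
from typing import Any, Dict

API_URL = "http://localhost:8000"  # assumption: local uvicorn; the real host will differ


def build_predict_request(student: Dict[str, Any], base_url: str = API_URL) -> urllib.request.Request:
    """Build the POST /predict request without sending it."""
    body = json.dumps(student).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_prediction(student: Dict[str, Any], base_url: str = API_URL, timeout_s: float = 2.0) -> Dict[str, Any]:
    """Send one student's features to the API and return the parsed JSON response."""
    request = build_predict_request(student, base_url)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting request construction from sending keeps the payload format testable on its own; `fetch_prediction` needs the API to be running.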

Packaging it with Docker

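```dockerfile
FROM python:3.11-slim

WORKDIR /app

# requirements.txt should list: fastapi, uvicorn, scikit-learn, joblib, numpy
# (use the same scikit-learn version the model was trained with)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY dropout_model.pkl .
COPY app.py .

EXPOSE 8000

CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```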

```bash
docker build -t dropout-predictor .
docker run -p 8000:8000 dropout-predictor
```

Now the model runs in a container. It can be deployed anywhere — a cloud server, Kubernetes, a serverless platform. The 47-click Monday ritual became a 200ms API call that happens automatically.
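Once the container is up, a small smoke test against `/health` can confirm the service is reachable before any traffic is sent to it. A minimal sketch, assuming the container from the `docker run` command is listening on localhost:8000. The helper names and retry settings are hypothetical.

```python
import time
import urllib.request
from typing import Callable


def http_health_probe(base_url: str = "http://localhost:8000", timeout_s: float = 2.0) -> bool:
    """Return True if GET /health answers with HTTP 200.

    Returns False if the request fails with an OSError
    (connection refused, timeout, or an HTTP error status).
    """
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout_s) as response:
            return response.status == 200
    except OSError:
        # URLError, refused connections and timeouts are all OSError subclasses
        return False


def wait_until_healthy(
    probe: Callable[[], bool] = http_health_probe,
    attempts: int = 10,
    delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe up to `attempts` times, pausing between tries; True on first success."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay_s)  # the container may still be starting
    return False
```

Load balancers and orchestrators do essentially the same thing with their own health checks; this script is only a convenience for local testing or a deploy pipeline.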

What about heavier models like LLMs?

For smaller models (scikit-learn, XGBoost), FastAPI + Docker is usually enough. For larger models (deep networks built in PyTorch or TensorFlow, LLMs), you'd typically use a dedicated serving framework: TorchServe, TF Serving, or Triton Inference Server. The concept is the same: wrap the model in an API and put it in a container.

Key takeaways

- Notebooks are for exploration; production needs a proper API

- FastAPI is a quick way to turn a model into an HTTP endpoint

- Always add a /health endpoint; load balancers and monitoring use it

- Dockerize the whole thing so it runs identically in dev and production

Commands from this chapter
$ pip install fastapi uvicorn scikit-learn joblib numpy
Install serving dependencies
$ uvicorn app:app --reload
Run FastAPI with auto-reload
>>> joblib.dump(model, "model.pkl")
Save a trained model to disk (Python, run in the notebook)
>>> joblib.load("model.pkl")
Load a saved model (Python, run in the serving code)