AI Ops Ch 3 / 9 Intermediate

Deploying Meera's chatbot

Model serving with FastAPI and Docker — from notebook to real API

In this chapter

Meera

Now a full-fledged ML engineer at the Chennai edtech

The story

Meera's dropout prediction model had been running in a Jupyter notebook for three months. Every Monday morning, she'd manually run the notebook, download a CSV, and email it to the teachers' team. 47 clicks, 20 minutes, every week.

"Can you make this automatic?" Kiran asked. "Like, teachers should see predictions in their dashboard in real time."

This required turning a notebook into an API. This is where data science ends and ML engineering begins.

The notebook-to-production gap

A Jupyter notebook is great for exploration. It's terrible for production:

- You can't call a notebook from another service

- It runs cells in whatever order you last ran them

- It doesn't handle concurrent requests

- It crashes with no recovery

You need to wrap your model in a web API — a service that accepts requests and returns predictions.

FastAPI — the modern way to serve ML models

```python
# app.py
from fastapi import FastAPI
from pydantic import BaseModel
import joblib
import numpy as np

# Load the trained model at startup
model = joblib.load("dropout_model.pkl")

app = FastAPI(title="Student Dropout Prediction API")


# Define the input shape
class StudentData(BaseModel):
    grades: float
    attendance: float
    engagement_score: float
    assignments_completed: int


# Define the prediction endpoint
@app.post("/predict")
def predict_dropout(student: StudentData):
    # Feature order must match the order used when the model was trained
    features = np.array([[
        student.grades,
        student.attendance,
        student.engagement_score,
        student.assignments_completed
    ]])

    # Probability of the positive ("dropout") class
    probability = model.predict_proba(features)[0][1]
    prediction = "at_risk" if probability > 0.6 else "on_track"

    return {
        "student_status": prediction,
        "dropout_probability": round(float(probability), 3),
        "confidence": "high" if probability > 0.8 or probability < 0.2 else "medium"
    }


@app.get("/health")
def health():
    return {"status": "healthy"}
```
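The status and confidence rules inside `predict_dropout` can be pulled into a plain function, which makes them easy to check without loading the model. The sketch below is one way to do that; the names `classify` and `AT_RISK_THRESHOLD` are hypothetical and do not appear in the original `app.py`, while the thresholds (0.6, 0.8, 0.2) are copied from the endpoint.

```python
# Hypothetical helper mirroring the thresholds used in predict_dropout.
AT_RISK_THRESHOLD = 0.6  # probability above this is reported as "at_risk"


def classify(probability: float) -> dict:
    """Map a dropout probability to the response fields the endpoint returns."""
    status = "at_risk" if probability > AT_RISK_THRESHOLD else "on_track"
    # "high" confidence only when the probability is far from the middle
    confidence = "high" if probability > 0.8 or probability < 0.2 else "medium"
    return {
        "student_status": status,
        "dropout_probability": round(float(probability), 3),
        "confidence": confidence,
    }


for p in (0.85, 0.65, 0.1):
    print(p, classify(p))
```

With this split, the endpoint body could shrink to computing `probability` and returning `classify(probability)`, and the rules can be unit-tested on their own.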

```bash
# Run locally
uvicorn app:app --reload

# Test it
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"grades": 62, "attendance": 0.6, "engagement_score": 3.2, "assignments_completed": 8}'
```

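Response: `{"student_status": "at_risk", "dropout_probability": 0.74, "confidence": "medium"}`

A probability of 0.74 is above the 0.6 at-risk threshold but below the 0.8 cutoff for "high" confidence, so the API reports "medium".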

The teachers' dashboard calls this API for every student. Real-time, automatic, no Monday morning ritual.
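How the dashboard makes that call is not shown in the chapter. As a rough sketch, a Python client could send the same request as the curl example using only the standard library. The helper names `build_request` and `fetch_prediction` are hypothetical, and the URL assumes the local uvicorn server from the previous step.

```python
import json
import urllib.request

# Assumed address: the local uvicorn server started with `uvicorn app:app --reload`
API_URL = "http://localhost:8000/predict"


def build_request(student: dict, url: str = API_URL) -> urllib.request.Request:
    """Build the same JSON POST request as the curl example."""
    body = json.dumps(student).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_prediction(student: dict, url: str = API_URL, timeout: float = 5.0) -> dict:
    """Send one student's features and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(student, url), timeout=timeout) as resp:
        return json.load(resp)


student = {"grades": 62, "attendance": 0.6, "engagement_score": 3.2, "assignments_completed": 8}
request = build_request(student)
print(request.get_method(), request.full_url)
# With the server running, fetch_prediction(student) returns the prediction dict.
```

Keeping request construction separate from the network call means the payload can be inspected or tested without a running server.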

Packaging it with Docker

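```dockerfile
FROM python:3.11-slim

WORKDIR /app

# requirements.txt is assumed to list: fastapi, uvicorn, scikit-learn, joblib, numpy
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY dropout_model.pkl .
COPY app.py .

EXPOSE 8000

# Bind to 0.0.0.0 so the server is reachable from outside the container
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```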

```bash
docker build -t dropout-predictor .
docker run -p 8000:8000 dropout-predictor
```

Now the model runs in a container. It can be deployed anywhere — a cloud server, Kubernetes, a serverless platform. The 47-click Monday ritual became a 200ms API call that happens automatically.
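After `docker run`, the server needs a moment before it accepts requests, and the `/health` endpoint is a natural thing to poll. The sketch below is one possible readiness check, not part of the original chapter: `probe_health` and `wait_until_healthy` are hypothetical names, and the URL assumes the `-p 8000:8000` port mapping used above.

```python
import json
import time
import urllib.request


def probe_health(url: str = "http://localhost:8000/health", timeout: float = 2.0) -> bool:
    """Return True if /health answers with {"status": "healthy"}."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # OSError covers connection failures and timeouts; ValueError covers bad JSON
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def wait_until_healthy(probe=probe_health, attempts: int = 10, delay: float = 0.5) -> bool:
    """Call probe() up to `attempts` times, sleeping `delay` seconds between failures."""
    for _ in range(attempts):
        if probe():
            return True
        time.sleep(delay)
    return False


# Example with a stand-in probe that succeeds on the third call
calls = []
ready = wait_until_healthy(probe=lambda: len(calls.append(1) or calls) >= 3, delay=0)
print(ready, len(calls))
```

Against a real container, calling `wait_until_healthy()` with the defaults polls the mapped port for about five seconds before giving up.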

What about heavier models like LLMs?

For smaller models (scikit-learn, XGBoost), FastAPI + Docker is usually all you need. For larger models (deep networks built in PyTorch or TensorFlow, LLMs), you'd typically use a dedicated serving framework: TorchServe, TF Serving, or Triton Inference Server. The concept is the same: wrap the model in an API, put it in a container.

Key takeaways

- Notebooks are for exploration; production needs a proper API

- FastAPI is one of the fastest ways to turn a model into an HTTP endpoint

- Always add a /health endpoint; load balancers and monitoring rely on it

- Dockerize the whole thing so it runs identically in dev and production

Commands from this chapter

$ pip install fastapi uvicorn scikit-learn joblib numpy

Install serving dependencies

$ uvicorn app:app --reload

Run FastAPI with auto-reload

>>> joblib.dump(model, "model.pkl")

Save a trained model to disk (a Python call, not a shell command)

>>> joblib.load("model.pkl")

Load a saved model (a Python call, not a shell command)