Meera trains her first model
What is machine learning really — and what happens after training?
Meera joined an edtech startup in Chennai after her MSc in statistics. On her first day, her manager Kiran handed her a CSV file with 10,000 student records — grades, attendance, engagement scores — and said: "Build a model that predicts which students will drop out."
Meera opened the laptop. She'd studied ML theory for two years. She'd never actually shipped anything.
This is where most ML education ends and real work begins.
What is machine learning, really?
A normal program follows explicit rules you write. "If score < 40, flag as at-risk." You define the logic.
Machine learning flips this. You give the program examples — students who dropped out, students who didn't — and it figures out the patterns itself. The rules emerge from the data.
The output is called a model — a mathematical function that takes inputs (grades, attendance) and produces a prediction (drop out: yes/no, probability: 73%).
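To make the contrast concrete, here is a minimal sketch in plain Python. The data is invented for illustration, and `learn_threshold` is a hypothetical helper, not a library function: it tries each candidate cut-off and keeps the one that misclassifies the fewest labelled examples.

```python
# Hand-written rule: the programmer chooses the cut-off.
def rule_based_flag(score):
    return score < 40


# Learned rule: the cut-off is chosen from labelled examples.
def learn_threshold(scores, dropped_out):
    """Return the cut-off t for which 'score < t' makes the fewest mistakes."""
    best_t, best_errors = None, len(scores) + 1
    for t in sorted(set(scores)):
        # An error is any student where the prediction (score < t)
        # disagrees with what actually happened.
        errors = sum((s < t) != d for s, d in zip(scores, dropped_out))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


# Toy data (assumed): each score and whether that student dropped out.
scores      = [22, 35, 41, 48, 52, 55, 63, 70, 78, 90]
dropped_out = [True, True, True, True, False, True, False, False, False, False]

threshold = learn_threshold(scores, dropped_out)
print(threshold)
```

On this toy data the learned cut-off lands above 40, so it flags students scoring 41 and 48 that the hand-written rule would miss. Nobody typed that number in; it came from the examples.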
The three phases Meera learned
Phase 1: Training. Show the model thousands of examples. It adjusts its internal parameters to reduce its prediction errors. This is computationally expensive and happens once, or periodically when the model is retrained.
Phase 2: Evaluation. Test the model on examples it hasn't seen. Does it actually predict correctly? This tells you whether it learned real patterns or just memorised the training data.
Phase 3: Inference. Use the trained model to make predictions on new data. A student logs in → model predicts dropout probability → teacher gets an alert. This happens millions of times, so it must be fast and cheap.
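The three phases can be sketched end to end in plain Python. This is an illustration under loud assumptions: the data is invented, the "model" is a simple nearest-centroid classifier chosen for brevity (not the model Meera builds), and `train`, `evaluate` and `predict` are hypothetical helper names.

```python
import json


def train(rows):
    """Phase 1: compute the average (grades, attendance) for each class."""
    centroids = {}
    for label in (0, 1):
        members = [features for features, y in rows if y == label]
        centroids[str(label)] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def predict(model, features):
    """Return the label whose centroid is closest to the given features."""
    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(features, model[label]))
    return int(min(model, key=sq_dist))


def evaluate(model, rows):
    """Phase 2: fraction of held-out examples predicted correctly."""
    return sum(predict(model, f) == y for f, y in rows) / len(rows)


# Invented data: ((grades, attendance), dropped_out)
train_rows = [((82, 95), 0), ((74, 88), 0), ((68, 91), 0),
              ((35, 40), 1), ((42, 55), 1), ((30, 62), 1)]
test_rows = [((79, 90), 0), ((38, 50), 1)]

model = train(train_rows)              # Phase 1
accuracy = evaluate(model, test_rows)  # Phase 2

# The trained model is just data, so it can be saved and shipped elsewhere.
artifact = json.dumps(model)

# Phase 3: a separate service loads the artifact and scores one new student.
loaded = json.loads(artifact)
alert = predict(loaded, (45, 58))
print(accuracy, alert)
```

The point of the `json.dumps` step is that training and inference do not need to live in the same process, or even on the same machine. Training produces an artifact; inference only needs to load it.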
The gap between Phase 1 and Phase 3 is where most ML projects die.
Meera's first model
```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Load data
df = pd.read_csv('students.csv')

# Features (inputs) and target (what we're predicting)
X = df[['grades', 'attendance', 'engagement_score', 'assignments_completed']]
y = df['dropped_out']  # 0 = stayed, 1 = dropped out

# Split: 80% for training, 20% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Train the model
model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)

# Evaluate on the held-out 20%
predictions = model.predict(X_test)
print(classification_report(y_test, predictions))
```
Output: 84% accuracy. Meera was excited. Kiran was cautious.
"Accuracy is not enough," Kiran said. "What's the false negative rate? If we miss a student who drops out, that's worse than a false alarm. Show me precision and recall."
Meera learned that metrics matter as much as the model.
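Kiran's point is easier to see with numbers. The sketch below uses ten invented students (not Meera's data) and computes the metrics by hand from the confusion counts; `confusion_counts` is a hypothetical helper written for this example. In scikit-learn, `classification_report` prints precision, recall and F1 per class.

```python
def confusion_counts(y_true, y_pred):
    """Count outcomes, treating 1 (dropped out) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # caught dropouts
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed dropouts
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correctly left alone
    return tp, fp, fn, tn


# Invented example: 2 of 10 students drop out, and the model catches only one.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)

# This toy case has no zero denominators; real code should guard against them.
accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)            # of the students flagged, how many dropped out
recall = tp / (tp + fn)               # of the students who dropped out, how many were flagged
false_negative_rate = fn / (fn + tp)  # share of dropouts the model missed
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, false_negative_rate, round(f1, 3))
```

Here the model is 90% accurate and never raises a false alarm, yet it misses half of the students who actually drop out. When dropouts are rare, accuracy can look healthy while recall is poor.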
Training vs Inference — the forgotten distinction
Training: typically runs on powerful hardware (often a GPU for large models), can take hours or days, happens infrequently.
Inference: runs on a CPU or small GPU, must return in milliseconds, happens constantly.
A model that trains in 4 hours on a ₹10,000/month GPU server needs to run inference in 50ms on a ₹500/month server. These are completely different engineering problems.
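One practical way to treat inference as its own problem is to measure it. The sketch below times single predictions against the 50ms budget from the example above. `predict_one` is a hypothetical stand-in for a real model's single-row prediction, and the numbers you get depend entirely on your hardware and model.

```python
import statistics
import time

LATENCY_BUDGET_MS = 50  # the inference target from the example above


def predict_one(features):
    """Hypothetical stand-in for a real model's single-row prediction."""
    grades, attendance = features
    return 1 if grades < 50 and attendance < 60 else 0


def measure_latency_ms(predict, sample, runs=1000):
    """Time repeated single predictions; report median and 99th percentile in ms."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(sample)
        timings.append((time.perf_counter() - start) * 1000)
    timings.sort()
    return {
        "median": statistics.median(timings),
        "p99": timings[int(0.99 * (len(timings) - 1))],
    }


stats = measure_latency_ms(predict_one, (45, 58))
print(stats, stats["p99"] <= LATENCY_BUDGET_MS)
```

Looking at the 99th percentile rather than only the median is a deliberate choice here: a teacher-facing alert is only as fast as its slow requests, and averages hide those.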
This is what AI Ops is about — the engineering that bridges them.
ML learns patterns from examples instead of following explicit rules
Three phases: Training (learn), Evaluation (test), Inference (use)
Accuracy alone is not enough — understand precision, recall, F1 for your use case
Training and inference have completely different infrastructure requirements