Learn 🧠 All Concepts (20) 🤖 What is an LLM? 📚 RAG Explained ⚡ AI Agents 💻 Run AI Locally 🇮🇳 AI in India 📖 Learn Tracks 🔧 DevOps Track ⚙️ AI Ops Track 🗺️ AI Engineer Roadmap
Tools 🔧 AI Tools Directory 🔓 Open Source AI ⭐ Top GitHub Repos ✦ Claude Skill Repos 🚀 Ready-to-Deploy Projects
Build 🏗️ Build Hub 🎯 Master Prompts 🧩 RAG Agents 🚀 App Megaprompts
Workflows ⚡ All Workflows (22) 🎥 Text to Video 🎞️ Image to Video 🔊 Text to Speech ♻️ Automation
Resources 🧪 Colab Notebooks ⚙️ n8n Workflows 📈 Algo Trading 💰 Passive Income
🗂️ Browse All Topics About AItheGuru
Learn MySQL Priya masters advanced query optimisation
MySQL Ch 13 / 18 Advanced
🏎️

Priya masters advanced query optimisation

Covering indexes, query rewriting, partitioning — the craft of making queries fast

⏱ 15 min 6 commands 5 takeaways
🏎️
In this chapter
Priya
DBA, optimising a 50M-row database
The story

Priya got a new database with 50 million rows. Queries that worked fine at 5 million were slow again. Simple indexes were not enough. She needed advanced query optimisation.

This chapter is the craft of making queries fast.

QUERY OPTIMISATION THINKING

Before optimising, understand what the query is doing:

EXPLAIN SELECT o.id, u.name, o.amount
FROM orders o
JOIN users u ON o.user_id = u.id
WHERE o.status = 'paid'
AND u.city = 'Mumbai'
ORDER BY o.ordered_at DESC
LIMIT 10;

Read EXPLAIN output from right to left (inner to outer):

1. Which table is accessed first (usually the smaller or more selective one)?

2. Is an index being used for each table access?

3. Are there filesort or temporary table warnings?

Key things to look for:

Extra: Using filesort         -- ORDER BY could not use an index (slow)
Extra: Using temporary        -- GROUP BY or DISTINCT caused a temp table (slow)
Extra: Using index            -- covers query from index alone (fast!)
rows: 4500000                 -- if rows is huge, the index is not selective enough

COVERING INDEXES -- THE FASTEST QUERIES

A covering index includes ALL columns the query needs. MySQL never touches the actual table rows.

-- This query needs: status, ordered_at, id, user_id, amount
SELECT o.id, o.user_id, o.amount
FROM orders o
WHERE o.status = 'paid'
ORDER BY o.ordered_at DESC
LIMIT 10;
-- A covering index for this query:
CREATE INDEX idx_orders_cover ON orders(status, ordered_at, id, user_id, amount);
EXPLAIN SELECT o.id, o.user_id, o.amount ...;
-- Extra: Using index      <-- MySQL never reads the table rows, only the index

QUERY REWRITING PATTERNS

Pattern 1: Avoid functions on indexed columns

-- SLOW: function on column prevents index use
SELECT * FROM users WHERE YEAR(created_at) = 2026;
-- FAST: range query uses index
SELECT * FROM users WHERE created_at >= '2026-01-01' AND created_at < '2027-01-01';

Pattern 2: Avoid SELECT * -- only fetch what you need

-- Fetches all 20 columns, transfers unnecessary data
SELECT * FROM orders WHERE status = 'paid';
-- Fetches only what the screen shows
SELECT id, user_id, amount, ordered_at FROM orders WHERE status = 'paid';

Pattern 3: Use EXISTS instead of IN for subqueries

-- Can be slow with large subqueries:
SELECT * FROM users WHERE id IN (SELECT user_id FROM orders WHERE status = 'paid');
-- Faster: stops at first match
SELECT * FROM users u WHERE EXISTS (
    SELECT 1 FROM orders o WHERE o.user_id = u.id AND o.status = 'paid'
);

Pattern 4: LIMIT early with subqueries

-- SLOW: joins all rows then limits
SELECT u.name, o.amount FROM users u JOIN orders o ON u.id = o.user_id ORDER BY o.amount DESC LIMIT 5;
-- FASTER: limit in a subquery first then join
SELECT u.name, top_orders.amount
FROM (
    SELECT user_id, amount FROM orders ORDER BY amount DESC LIMIT 5
) top_orders
JOIN users u ON top_orders.user_id = u.id;

PARTITIONING -- SPLITTING HUGE TABLES

When a table grows to hundreds of millions of rows, even indexes slow down. Partitioning splits the table into physical sections.

-- Partition orders by year:
CREATE TABLE orders_partitioned (
    id         INT AUTO_INCREMENT,
    user_id    INT,
    amount     DECIMAL(10,2),
    status     VARCHAR(20),
    ordered_at TIMESTAMP,
    PRIMARY KEY (id, ordered_at)   -- partition key must be in primary key
)
PARTITION BY RANGE (YEAR(ordered_at)) (
    PARTITION p2024 VALUES LESS THAN (2025),
    PARTITION p2025 VALUES LESS THAN (2026),
    PARTITION p2026 VALUES LESS THAN (2027),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);
-- Query automatically scans only the relevant partition:
SELECT * FROM orders_partitioned WHERE ordered_at >= '2026-01-01';
-- MySQL scans only p2026, ignoring the other 3 partitions
-- See partition stats:
SELECT partition_name, table_rows
FROM information_schema.PARTITIONS
WHERE table_name = 'orders_partitioned';

Priya reduced query time from 4 seconds to 180ms using covering indexes. The principle: give MySQL exactly what it needs to answer your query without touching the table rows themselves.

Key takeaways

A covering index includes all columns the query needs — MySQL answers from the index alone without reading table rows at all

Functions on indexed columns (YEAR(created_at) = 2026) prevent index use — rewrite as a range query instead

Extra: Using filesort in EXPLAIN means ORDER BY cannot use an index — design the index to match the ORDER BY columns

EXISTS stops at the first match, IN evaluates the full subquery — EXISTS is faster for large subqueries

Partitioning splits huge tables physically — queries that filter by partition key scan only the relevant partition

Commands from this chapter
$ EXPLAIN SELECT id, user_id, amount FROM orders WHERE status = 'paid' ORDER BY ordered_at DESC LIMIT 10;
Check if ORDER BY uses an index or causes filesort
$ CREATE INDEX idx_cover ON orders(status, ordered_at, id, user_id, amount);
Covering index for the query above
$ SELECT * FROM users WHERE created_at >= '2026-01-01' AND created_at < '2027-01-01';
Range query instead of YEAR() function to enable index use
$ SELECT * FROM information_schema.PARTITIONS WHERE table_name = 'orders_partitioned';
See partition row counts
$ OPTIMIZE TABLE orders;
Reclaim fragmented space after many deletes
$ SELECT DIGEST_TEXT, COUNT_STAR, ROUND(AVG_TIMER_WAIT/1e12,3) AS avg_sec FROM performance_schema.events_statements_summary_by_digest ORDER BY SUM_TIMER_WAIT DESC LIMIT 10;
Find slowest queries historically