Priya masters advanced query optimisation
Covering indexes, query rewriting, partitioning — the craft of making queries fast
Priya got a new database with 50 million rows. Queries that worked fine at 5 million were slow again. Simple indexes were not enough. She needed advanced query optimisation.
This chapter is the craft of making queries fast.
QUERY OPTIMISATION THINKING
Before optimising, understand what the query is doing:
EXPLAIN SELECT o.id, u.name, o.amount
FROM orders o
JOIN users u ON o.user_id = u.id
WHERE o.status = 'paid'
AND u.city = 'Mumbai'
ORDER BY o.ordered_at DESC
LIMIT 10;Read EXPLAIN output from right to left (inner to outer):
1. Which table is accessed first (usually the smaller or more selective one)?
2. Is an index being used for each table access?
3. Are there filesort or temporary table warnings?
Key things to look for:
Extra: Using filesort -- ORDER BY could not use an index (slow)
Extra: Using temporary -- GROUP BY or DISTINCT caused a temp table (slow)
Extra: Using index -- covers query from index alone (fast!)
rows: 4500000 -- if rows is huge, the index is not selective enoughCOVERING INDEXES -- THE FASTEST QUERIES
A covering index includes ALL columns the query needs. MySQL never touches the actual table rows.
-- This query needs: status, ordered_at, id, user_id, amount
SELECT o.id, o.user_id, o.amount
FROM orders o
WHERE o.status = 'paid'
ORDER BY o.ordered_at DESC
LIMIT 10;-- A covering index for this query:
CREATE INDEX idx_orders_cover ON orders(status, ordered_at, id, user_id, amount);EXPLAIN SELECT o.id, o.user_id, o.amount ...;
-- Extra: Using index <-- MySQL never reads the table rows, only the indexQUERY REWRITING PATTERNS
Pattern 1: Avoid functions on indexed columns
-- SLOW: function on column prevents index use
SELECT * FROM users WHERE YEAR(created_at) = 2026;-- FAST: range query uses index
SELECT * FROM users WHERE created_at >= '2026-01-01' AND created_at < '2027-01-01';Pattern 2: Avoid SELECT * -- only fetch what you need
-- Fetches all 20 columns, transfers unnecessary data
SELECT * FROM orders WHERE status = 'paid';-- Fetches only what the screen shows
SELECT id, user_id, amount, ordered_at FROM orders WHERE status = 'paid';Pattern 3: Use EXISTS instead of IN for subqueries
-- Can be slow with large subqueries:
SELECT * FROM users WHERE id IN (SELECT user_id FROM orders WHERE status = 'paid');-- Faster: stops at first match
SELECT * FROM users u WHERE EXISTS (
SELECT 1 FROM orders o WHERE o.user_id = u.id AND o.status = 'paid'
);Pattern 4: LIMIT early with subqueries
-- SLOW: joins all rows then limits
SELECT u.name, o.amount FROM users u JOIN orders o ON u.id = o.user_id ORDER BY o.amount DESC LIMIT 5;-- FASTER: limit in a subquery first then join
SELECT u.name, top_orders.amount
FROM (
SELECT user_id, amount FROM orders ORDER BY amount DESC LIMIT 5
) top_orders
JOIN users u ON top_orders.user_id = u.id;PARTITIONING -- SPLITTING HUGE TABLES
When a table grows to hundreds of millions of rows, even indexes slow down. Partitioning splits the table into physical sections.
-- Partition orders by year:
CREATE TABLE orders_partitioned (
id INT AUTO_INCREMENT,
user_id INT,
amount DECIMAL(10,2),
status VARCHAR(20),
ordered_at TIMESTAMP,
PRIMARY KEY (id, ordered_at) -- partition key must be in primary key
)
PARTITION BY RANGE (YEAR(ordered_at)) (
PARTITION p2024 VALUES LESS THAN (2025),
PARTITION p2025 VALUES LESS THAN (2026),
PARTITION p2026 VALUES LESS THAN (2027),
PARTITION p_future VALUES LESS THAN MAXVALUE
);-- Query automatically scans only the relevant partition:
SELECT * FROM orders_partitioned WHERE ordered_at >= '2026-01-01';
-- MySQL scans only p2026, ignoring the other 3 partitions-- See partition stats:
SELECT partition_name, table_rows
FROM information_schema.PARTITIONS
WHERE table_name = 'orders_partitioned';Priya reduced query time from 4 seconds to 180ms using covering indexes. The principle: give MySQL exactly what it needs to answer your query without touching the table rows themselves.
A covering index includes all columns the query needs — MySQL answers from the index alone without reading table rows at all
Functions on indexed columns (YEAR(created_at) = 2026) prevent index use — rewrite as a range query instead
Extra: Using filesort in EXPLAIN means ORDER BY cannot use an index — design the index to match the ORDER BY columns
EXISTS stops at the first match, IN evaluates the full subquery — EXISTS is faster for large subqueries
Partitioning splits huge tables physically — queries that filter by partition key scan only the relevant partition