Learn › MySQL › Priya survives five production MySQL crises

MySQL Ch 16 / 18 Expert

🚨

Priya survives five production MySQL crises

MySQL won't start, too many connections, accidental delete, broken replication, table corruption

⏱ 15 min 6 commands 5 takeaways

🚨

In this chapter

Priya

DBA, handling 5 production scenarios

The story

Priya's company runs a payment platform. One Friday at 6pm, three alarms fired at once. She had to handle all three simultaneously. This is the chapter she wishes she had read before that Friday.

WAR GAME 1: DATABASE WILL NOT START

sudo systemctl status mysql
# Shows: failed

Step 1: Read the error log:
sudo tail -50 /var/log/mysql/error.log

Common reasons it won't start:
- Disk full: check df -h
- Port already in use: ss -tlnp | grep 3306
- Corrupted InnoDB files: look for [ERROR] in the log
- Wrong permissions: ls -la /var/lib/mysql/

-- Disk full fix:
df -h
# If /var/lib/mysql is 100%:
du -sh /var/lib/mysql/* | sort -rh | head  -- find what is large
-- Often: binary logs or undo logs
PURGE BINARY LOGS BEFORE DATE_SUB(NOW(), INTERVAL 3 DAY);

-- Port in use fix:
sudo kill $(sudo lsof -t -i:3306)
sudo systemctl start mysql

WAR GAME 2: APP CANNOT CONNECT TO DATABASE

-- Error in app logs: "Too many connections" or "Can't connect to MySQL server"

-- Check current connections:
SHOW STATUS LIKE 'Threads_connected';
SHOW STATUS LIKE 'Max_used_connections';

-- Check the max_connections setting:
SHOW VARIABLES LIKE 'max_connections';

-- See who has connections open:
SELECT user, host, COUNT(*) AS connections
FROM information_schema.PROCESSLIST
GROUP BY user, host
ORDER BY connections DESC;

-- Kill idle connections (sleeping for over 300 seconds):
SELECT CONCAT('KILL ', id, ';')
FROM information_schema.PROCESSLIST
WHERE command = 'Sleep' AND time > 300;

-- Temporary emergency fix -- increase max_connections:
SET GLOBAL max_connections = 500;
-- Permanent: edit my.cnf and restart

WAR GAME 3: DATA ACCIDENTALLY DELETED

-- Someone ran: DELETE FROM orders WHERE user_id = 1;
-- But they forgot the WHERE should have been: DELETE FROM orders WHERE id = 1;

-- If you have binary logging enabled:
SHOW VARIABLES LIKE 'log_bin';        -- is it ON?
SHOW BINARY LOGS;                     -- list log files

-- Find the DELETE statement in the binary log:
mysqlbinlog --start-datetime="2026-03-16 17:00:00"                 --stop-datetime="2026-03-16 18:00:00"                 /var/log/mysql/mysql-bin.000123 | grep -A5 -B5 "DELETE"

-- To recover, restore from last backup and replay binary logs up to just before the DELETE:
mysql -u root -p myapp_db < backup_20260316.sql
mysqlbinlog --stop-datetime="2026-03-16 17:45:00"                 /var/log/mysql/mysql-bin.000123 | mysql -u root -p myapp_db

-- Prevention: enable binary logging, take daily backups, test restore monthly

WAR GAME 4: REPLICATION BROKEN

SHOW SLAVE STATUS\G
-- Slave_IO_Running: No   or   Slave_SQL_Running: No

-- Read the error:
-- Last_Error: Could not execute Write_rows event...

-- Quick fix: skip the problem event and continue:
STOP SLAVE;
SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
START SLAVE;
SHOW SLAVE STATUS\G

-- If replication lag is growing:
-- Seconds_Behind_Master: 3600   (1 hour behind!)
-- Check replica server load: top, iostat
-- Replica disk/CPU may be struggling with the write load

WAR GAME 5: TABLE CORRUPTION

-- App error: "Table './myapp_db/users' is marked as crashed"

-- For MyISAM tables (older):
REPAIR TABLE users;

-- For InnoDB tables (modern, default):
-- InnoDB rarely corrupts. If it does:
sudo systemctl stop mysql
-- Add to my.cnf: innodb_force_recovery = 1
sudo systemctl start mysql
mysqldump -u root -p myapp_db > emergency_backup.sql
-- Remove innodb_force_recovery
sudo systemctl restart mysql
mysql -u root -p myapp_db < emergency_backup.sql

-- Check for table issues:
CHECK TABLE users;
-- Shows status: OK (good) or errors (problem)

THE CHECKLIST PRIYA FOLLOWS FOR EVERY INCIDENT

1. Is MySQL running? sudo systemctl status mysql
2. Can I connect? sudo mysql -e "SELECT 1;"
3. What does SHOW PROCESSLIST show? Any blocking queries?
4. Is disk full? df -h /var/lib/mysql
5. What does the error log say? sudo tail -100 /var/log/mysql/error.log
6. Is replication healthy? SHOW SLAVE STATUS
7. When did the problem start? Compare with recent deployments and cron jobs.

Priya's rule: if you can answer those 7 questions in under 3 minutes, you can handle any MySQL incident.

Key takeaways

MySQL won't start: always check the error log first (sudo tail -50 /var/log/mysql/error.log) — the reason is always there

Too many connections: SET GLOBAL max_connections temporarily, then find and kill sleeping connections, then fix the connection pool

Accidental delete: binary logging is your undo button — restore from backup then replay binary logs up to just before the mistake

Broken replication: SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1 skips a single problem event — check SHOW SLAVE STATUS after

innodb_force_recovery = 1 lets MySQL start with a corrupted InnoDB — immediately dump and restore, then remove it

Commands from this chapter

$ sudo tail -100 /var/log/mysql/error.log

First step in every MySQL crisis — read the error log

$ PURGE BINARY LOGS BEFORE DATE_SUB(NOW(), INTERVAL 3 DAY);

Free disk space by removing old binary logs

$ SELECT CONCAT('KILL ', id, ';') FROM information_schema.PROCESSLIST WHERE command = 'Sleep' AND time > 300;

Generate KILL statements for idle connections

$ SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; START SLAVE;

Skip a broken replication event and resume

$ CHECK TABLE users;

Check a table for corruption

$ mysqlbinlog --start-datetime='2026-03-16 17:00' --stop-datetime='2026-03-16 17:45' /var/log/mysql/mysql-bin.000123 | mysql -u root -p myapp_db

Replay binary log for point-in-time recovery