Priya survives five production MySQL crises
MySQL won't start, too many connections, accidental delete, broken replication, table corruption
Priya's company runs a payment platform. One Friday at 6pm, three alarms fired at once. She had to handle all three simultaneously. This is the chapter she wishes she had read before that Friday.
WAR GAME 1: DATABASE WILL NOT START
sudo systemctl status mysql
# Shows: failedStep 1: Read the error log:
sudo tail -50 /var/log/mysql/error.logCommon reasons it won't start:
- Disk full: check df -h
- Port already in use: ss -tlnp | grep 3306
- Corrupted InnoDB files: look for [ERROR] in the log
- Wrong permissions: ls -la /var/lib/mysql/-- Disk full fix:
df -h
# If /var/lib/mysql is 100%:
du -sh /var/lib/mysql/* | sort -rh | head -- find what is large
-- Often: binary logs or undo logs
PURGE BINARY LOGS BEFORE DATE_SUB(NOW(), INTERVAL 3 DAY);-- Port in use fix:
sudo kill $(sudo lsof -t -i:3306)
sudo systemctl start mysqlWAR GAME 2: APP CANNOT CONNECT TO DATABASE
-- Error in app logs: "Too many connections" or "Can't connect to MySQL server"-- Check current connections:
SHOW STATUS LIKE 'Threads_connected';
SHOW STATUS LIKE 'Max_used_connections';-- Check the max_connections setting:
SHOW VARIABLES LIKE 'max_connections';-- See who has connections open:
SELECT user, host, COUNT(*) AS connections
FROM information_schema.PROCESSLIST
GROUP BY user, host
ORDER BY connections DESC;-- Kill idle connections (sleeping for over 300 seconds):
SELECT CONCAT('KILL ', id, ';')
FROM information_schema.PROCESSLIST
WHERE command = 'Sleep' AND time > 300;-- Temporary emergency fix -- increase max_connections:
SET GLOBAL max_connections = 500;
-- Permanent: edit my.cnf and restartWAR GAME 3: DATA ACCIDENTALLY DELETED
-- Someone ran: DELETE FROM orders WHERE user_id = 1;
-- But they forgot the WHERE should have been: DELETE FROM orders WHERE id = 1;-- If you have binary logging enabled:
SHOW VARIABLES LIKE 'log_bin'; -- is it ON?
SHOW BINARY LOGS; -- list log files-- Find the DELETE statement in the binary log:
mysqlbinlog --start-datetime="2026-03-16 17:00:00" --stop-datetime="2026-03-16 18:00:00" /var/log/mysql/mysql-bin.000123 | grep -A5 -B5 "DELETE"-- To recover, restore from last backup and replay binary logs up to just before the DELETE:
mysql -u root -p myapp_db < backup_20260316.sql
mysqlbinlog --stop-datetime="2026-03-16 17:45:00" /var/log/mysql/mysql-bin.000123 | mysql -u root -p myapp_db-- Prevention: enable binary logging, take daily backups, test restore monthlyWAR GAME 4: REPLICATION BROKEN
SHOW SLAVE STATUS\G
-- Slave_IO_Running: No or Slave_SQL_Running: No-- Read the error:
-- Last_Error: Could not execute Write_rows event...-- Quick fix: skip the problem event and continue:
STOP SLAVE;
SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
START SLAVE;
SHOW SLAVE STATUS\G-- If replication lag is growing:
-- Seconds_Behind_Master: 3600 (1 hour behind!)
-- Check replica server load: top, iostat
-- Replica disk/CPU may be struggling with the write loadWAR GAME 5: TABLE CORRUPTION
-- App error: "Table './myapp_db/users' is marked as crashed"-- For MyISAM tables (older):
REPAIR TABLE users;-- For InnoDB tables (modern, default):
-- InnoDB rarely corrupts. If it does:
sudo systemctl stop mysql
-- Add to my.cnf: innodb_force_recovery = 1
sudo systemctl start mysql
mysqldump -u root -p myapp_db > emergency_backup.sql
-- Remove innodb_force_recovery
sudo systemctl restart mysql
mysql -u root -p myapp_db < emergency_backup.sql-- Check for table issues:
CHECK TABLE users;
-- Shows status: OK (good) or errors (problem)THE CHECKLIST PRIYA FOLLOWS FOR EVERY INCIDENT
1. Is MySQL running? sudo systemctl status mysql
2. Can I connect? sudo mysql -e "SELECT 1;"
3. What does SHOW PROCESSLIST show? Any blocking queries?
4. Is disk full? df -h /var/lib/mysql
5. What does the error log say? sudo tail -100 /var/log/mysql/error.log
6. Is replication healthy? SHOW SLAVE STATUS
7. When did the problem start? Compare with recent deployments and cron jobs.Priya's rule: if you can answer those 7 questions in under 3 minutes, you can handle any MySQL incident.
MySQL won't start: always check the error log first (sudo tail -50 /var/log/mysql/error.log) — the reason is always there
Too many connections: SET GLOBAL max_connections temporarily, then find and kill sleeping connections, then fix the connection pool
Accidental delete: binary logging is your undo button — restore from backup then replay binary logs up to just before the mistake
Broken replication: SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1 skips a single problem event — check SHOW SLAVE STATUS after
innodb_force_recovery = 1 lets MySQL start with a corrupted InnoDB — immediately dump and restore, then remove it