Riya's first production incident
Navigate Linux, read logs, find the problem — the foundation of everything
Riya joined a Bengaluru fintech startup fresh out of college. On her first day in production support, at 3:47 pm, her senior Anand ran to her desk. "Payment service is throwing errors. Go investigate."
Riya stared at the black terminal screen. She had used Windows her whole life. Anand typed ssh prod-pay-01 and handed her the keyboard.
Before you touch anything, always do these three things first:
hostname # which server am I on?
whoami # who am I logged in as?
uptime # how long has it been running?
She was on prod-pay-01, logged in as riya, server running 12 days.
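The "which server am I on?" check can also be turned into a small guard. This is only a sketch: confirm_host is a made-up helper name, not a standard command, and it assumes the hostname you expect matches the output of hostname exactly (some servers report a longer, fully qualified name).

```shell
#!/bin/sh
# confirm_host is a hypothetical helper, not a standard command.
# It succeeds only when the current hostname matches the one you expect.
confirm_host() {
    expected="$1"
    # Fall back to uname -n on minimal systems that lack the hostname command.
    current=$(hostname 2>/dev/null || uname -n)
    if [ "$current" = "$expected" ]; then
        echo "OK: on $current as $(whoami)"
    else
        echo "WARNING: expected $expected but this is $current" >&2
        return 1
    fi
}

# Example: check before doing anything else on the payment server.
if confirm_host "prod-pay-01"; then
    echo "Safe to continue"
else
    echo "Stop: this is not the server you expected"
fi
```

The point of the guard is the same as Riya's habit: a command typed on the wrong server is one of the easiest ways to turn a small incident into a bigger one.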
The Linux filesystem is like a building. / is the lobby. /var/log is the security office where system logs live. /opt is the office floors where apps live. /etc is the filing cabinet for configs. /tmp is the bin, usually cleared on reboot.
pwd # where am I right now?
ls -lrt # list files, newest at bottom
cd /var/log # go to the logs folder
cd /opt/payment # go to the payment app
cd .. # go one level up
cd - # go back to where you just were
Riya navigated to /opt/payment/logs. Anand said: "Always use ls -lrt. The newest file is at the bottom."
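If you only want the name of the newest file rather than the whole listing, the same idea can be sketched as a one-line helper. newest_log is a hypothetical name chosen for this example; it uses ls -t (newest first) and keeps only the first line.

```shell
#!/bin/sh
# newest_log is a hypothetical helper: print the most recently modified
# file name in a directory (default: the current directory).
newest_log() {
    dir="${1:-.}"
    # ls -t sorts newest first, so the first line is the newest entry.
    # Parsing ls output is fine for a quick look, but it breaks on file
    # names that contain newlines.
    ls -t -- "$dir" | head -n 1
}
```

On the server in the story, newest_log /opt/payment/logs would print the file that ls -lrt shows on its bottom line.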
Reading logs - the three commands you need:
tail -f payment.log # watch live as new lines appear
tail -n 100 payment.log # last 100 lines
grep "ERROR" payment.log # find all error lines
grep -C 5 "ERROR" payment.log # error plus 5 lines of context
She ran tail -f and watched. Then she saw it:
2026-03-16 15:47:23 ERROR PaymentService - Database connection timeout after 30s
Database connection timeout. Not a code bug. The database was unreachable.
She ran: grep -C 5 "timeout" payment.log | tail -n 30. Every error had timeout in it. Answer found in 3 minutes.
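One way to double-check that every error really is a timeout is to count the errors, then count the errors that do not mention timeout. The sketch below builds a tiny sample log so it can run anywhere. Apart from the timeout line quoted above, the sample lines are invented for illustration and are not real output from the payment service.

```shell
#!/bin/sh
# Hypothetical sample log so the sketch runs anywhere. On the server you
# would point LOG at the real payment.log instead.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
2026-03-16 15:47:21 INFO PaymentService - Processing payment request
2026-03-16 15:47:23 ERROR PaymentService - Database connection timeout after 30s
2026-03-16 15:47:53 ERROR PaymentService - Database connection timeout after 30s
2026-03-16 15:47:54 INFO PaymentService - Retrying payment request
EOF

# grep -c counts matching lines instead of printing them.
total_errors=$(grep -c "ERROR" "$LOG")

# grep -v inverts the match: count ERROR lines that do NOT mention timeout.
# grep exits non-zero when it counts zero lines, hence the "|| true".
other_errors=$(grep "ERROR" "$LOG" | grep -vc "timeout" || true)

echo "errors: $total_errors, not timeouts: $other_errors"
# prints: errors: 2, not timeouts: 0

rm -f "$LOG"
```

If the second number is zero, every error in the file is a timeout, which is the conclusion Riya reached by reading the grep output.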
The first 60-second checklist:
hostname && whoami # orient yourself
df -h # is disk full? a full disk is a common cause of incidents
free -h # is memory low?
uptime # what is the load average?
Riya's first incident was resolved in 8 minutes. She saved the four-line checklist on a sticky note. She still uses it today.
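The checklist above can also be kept as a small script instead of a sticky note. This is a minimal sketch: first_look and disk_used_percent are made-up names, and the 90 percent warning threshold is an assumption for illustration, not a rule from the story.

```shell
#!/bin/sh
# first_look is a hypothetical wrapper around the checklist commands.
first_look() {
    echo "== host and user =="
    hostname && whoami
    echo "== disk =="
    df -h
    echo "== memory =="
    # free is Linux-specific; skip it quietly where it is missing.
    if command -v free >/dev/null 2>&1; then free -h; fi
    echo "== load =="
    uptime
}

# disk_used_percent prints how full the filesystem holding a path is,
# as a bare number. df -P gives stable POSIX columns; column 5 is the
# percentage used.
disk_used_percent() {
    df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Assumed threshold of 90 percent; pick whatever suits your servers.
used=$(disk_used_percent /)
if [ "${used:-0}" -ge 90 ]; then
    echo "Disk almost full: ${used}%"
fi
```

Calling first_look right after logging in prints all four checks in one go, and disk_used_percent gives a number that is easier to compare than the df -h table.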
Always run hostname && whoami first — confirm you are on the right server
ls -lrt puts the newest file at the bottom — always use this in log directories
tail -f watches a log live — your best tool during an active incident
grep -C 5 shows context around errors - use it instead of plain grep
Check df -h, free -h, uptime in the first 60 seconds of every incident