Linux for Production Support Ch 8 / 32 Advanced 🪟 Windows → Linux

Vijay handles his first Linux production incident

Full incident response — applying Windows instincts in a Linux world

⏱ 14 min 6 commands 5 takeaways
In this chapter
Vijay
Windows support engineer, month 3 on Linux
The story

Month 3. Vijay was on-call alone for the first time. PagerDuty fired: Payment service down. No Windows machine. Just a Linux terminal.

Same incident. Same logic. Different spellings.

STEP 1: ORIENT YOURSELF

On Windows: RDP in, check Computer Name, check who is logged in.

On Linux:

hostname                    # confirm which server
whoami                      # confirm who you are logged in as
uptime                      # how long running, recent reboot?

STEP 2: IS IT A RESOURCE PROBLEM?

On Windows: open Task Manager, check CPU and Memory tabs, check Performance tab.

On Linux:

uptime                      # load average: tasks running or waiting (CPU, and blocked on I/O)
free -h                     # memory available
df -h                       # disk space - CRITICAL, often missed on Windows
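
A load average only means something relative to the number of CPU cores. The sketch below divides one by the other. load_per_core is a hypothetical helper name, not a standard command, and the common reading that a sustained value above 1.00 per core suggests saturation is a rule of thumb, not a hard limit.

```shell
#!/usr/bin/env bash
# load_per_core LOAD CORES: print the load average per CPU core.
# Hypothetical helper for illustration, not a standard command.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Live reading on Linux (guarded so the sketch is harmless elsewhere):
# first field of /proc/loadavg is the 1-minute load average.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
fi
```

For example, a load of 8.00 on a 4-core server gives 2.00, meaning roughly twice as much work is queued as the CPUs can run at once.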

Vijay checked df -h first. Habit from training.

/dev/sda1   50G  50G  0  100%  /

Disk 100% full. This was the problem. On Windows he would have checked Task Manager first and missed it.
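
The same check can be scripted so nobody has to read the table by eye. This is a minimal sketch, assuming POSIX-format output from df -P and mount points without spaces. full_filesystems and the 90 percent threshold are illustrative choices, not part of the original incident.

```shell
#!/usr/bin/env bash
# full_filesystems THRESHOLD: read `df -P` output on stdin and print
# "mountpoint percent" for each filesystem at or above THRESHOLD percent.
# Hypothetical helper; assumes mount points contain no spaces.
full_filesystems() {
    awk -v limit="$1" 'NR > 1 {
        pct = $5              # Capacity column, e.g. "100%"
        sub(/%/, "", pct)     # strip the percent sign
        if (pct + 0 >= limit + 0) print $6, pct
    }'
}

# Typical use:
#   df -P | full_filesystems 90
```

Fed the df line from this incident, it would print the root filesystem with 100.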

STEP 3: FIND WHAT FILLED THE DISK

On Windows: open WinDirStat, wait 3 minutes, find visually.

On Linux (30 seconds total):

du -sh /var/log/* 2>/dev/null | sort -rh | head -10
# 47G    /var/log/myapp
# 1.2G   /var/log/nginx
du -sh /var/log/myapp/* | sort -rh | head -5
# 47G    /var/log/myapp/debug.log

47GB debug log. Developer turned on verbose logging and forgot. Same human error as Windows. Different OS.
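
The two du commands above follow one pattern: size everything one level down, sort, keep the top few, then repeat inside the biggest directory. A small sketch of that pattern, using only POSIX options so it also works where sort -h is missing. biggest_entries is a hypothetical name, and sizes are printed in KiB rather than human-readable units.

```shell
#!/usr/bin/env bash
# biggest_entries DIR [N]: list the N largest entries directly under DIR,
# sizes in KiB, largest first. Hypothetical helper for illustration.
biggest_entries() {
    local dir=$1
    local count=${2:-10}
    # -s: one total per argument, -k: report sizes in KiB
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}

# Drill down the way Vijay did:
#   biggest_entries /var/log 10
#   biggest_entries /var/log/myapp 5
```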

STEP 4: FIX IT SAFELY

On Windows: File Explorer, find the file, check if anything is using it (handle.exe), delete it.

On Linux:

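# Check if a process has the file open (use sudo to see other users' processes):
sudo lsof /var/log/myapp/debug.log
# Note: lsof +L1 is a different check. It lists deleted files that are still held open.
# If output shows a process: truncate instead of delete
# (deleting an open file frees no space until the process closes it)
# If no output: safe to delete
# Truncate (empties without deleting, process keeps its file handle):
> /var/log/myapp/debug.log
# Verify:
df -h /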
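
Why truncate rather than delete: removing a file that a process still holds open only removes its name. The kernel keeps the data blocks until the last handle closes, so df would still show 100%. The sketch below wraps the check and the truncate in one function. empty_log is a hypothetical helper, it assumes you have write permission on the file, and it skips the lsof report if lsof is not installed.

```shell
#!/usr/bin/env bash
# empty_log FILE: truncate FILE in place, keeping the same inode so any
# process writing to it keeps a valid handle. Hypothetical helper.
empty_log() {
    local file=$1
    [ -f "$file" ] || { echo "not a regular file: $file" >&2; return 1; }
    # lsof exits 0 when at least one process has the file open
    if command -v lsof >/dev/null 2>&1 && lsof -- "$file" >/dev/null 2>&1; then
        echo "in use: truncating in place, writers keep their handle" >&2
    fi
    : > "$file"   # same effect as "> file" in bash; the ":" form also works in zsh
}

# Usage:
#   empty_log /var/log/myapp/debug.log && df -h /
```

One caveat, stated as an assumption about the application: if it did not open the log in append mode, its next write lands at the old offset and the file can look large again, although the space freed by the truncate stays free.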

STEP 5: RESTART THE SERVICE

On Windows: Services.msc, right-click the service, Restart.

On Linux:

sudo systemctl restart myapp
sudo systemctl status myapp     # verify it restarted
journalctl -u myapp -f          # watch logs as it starts
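
A service can report active right after a restart and still crash a few seconds later, so it can help to poll instead of checking once. A minimal sketch of a generic retry helper. wait_for is a hypothetical name, and the try count and delay shown in the usage comment are arbitrary.

```shell
#!/usr/bin/env bash
# wait_for TRIES DELAY CMD...: run CMD until it succeeds, at most TRIES
# times, sleeping DELAY seconds between attempts.
# Returns 0 on the first success, 1 if every attempt failed.
wait_for() {
    local tries=$1 delay=$2 i
    shift 2
    for ((i = 1; i <= tries; i++)); do
        if "$@"; then
            return 0
        fi
        if [ "$i" -lt "$tries" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Intended use after the restart (myapp is the chapter's example unit):
#   sudo systemctl restart myapp
#   wait_for 10 2 systemctl is-active --quiet myapp || journalctl -u myapp -n 50
```

If the unit never becomes active, the last line falls through to the most recent 50 log lines, which is usually where the reason is.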

STEP 6: VERIFY THE FIX

On Linux:

df -h                               # confirm disk is now OK
systemctl status myapp              # confirm service is running
journalctl -u myapp -n 50          # any new errors?
curl -s -o /dev/null -w "%{http_code}" http://localhost:8080/health
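
The curl line prints a bare status code and leaves the judgement to you. A small sketch that turns it into pass or fail, assuming any 2xx code counts as healthy. check_health, is_healthy_code and the 5 second timeout are illustrative choices.

```shell
#!/usr/bin/env bash
# is_healthy_code CODE: succeed only for 2xx HTTP status codes.
is_healthy_code() {
    case "$1" in
        2[0-9][0-9]) return 0 ;;
        *)           return 1 ;;
    esac
}

# check_health URL: fetch URL with a timeout and judge the status code.
# curl prints 000 as the code when it cannot connect at all.
check_health() {
    local code
    code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$1") || true
    echo "HTTP $code"
    is_healthy_code "$code"
}

# Example with the chapter's endpoint:
#   check_health http://localhost:8080/health && echo "service OK"
```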

THE FULL INCIDENT COMPARISON

Phase              Windows                          Linux
Orient             RDP, Computer Name               hostname, whoami, uptime
Check resources    Task Manager                     top, free -h, df -h
Find big files     WinDirStat (3 min wait)          du -sh * | sort -rh (30 sec)
Check file usage   handle.exe from Sysinternals     lsof filename
Free space         Delete in Explorer               > filename (truncate)
Restart service    Services.msc right-click         systemctl restart service
Watch logs         Event Viewer refresh             journalctl -u service -f
Test app           Browser or Postman               curl http://localhost/path

Vijay resolved his first solo Linux incident in 14 minutes. After the incident he wrote in the ticket:

The biggest difference is that Linux tells you the truth faster. df -h is instant. du -sh is 30 seconds. grep finds exactly what you need. There are no loading screens.

Three months in, Vijay had developed genuinely new instincts. Not replacing his Windows knowledge. Adding to it.

Key takeaways

The incident logic is identical on Windows and Linux — only the tool names change

df -h takes about a second and catches disk-full issues. Check it first, before CPU and memory.

lsof filename replaces handle.exe from Sysinternals: it shows whether a process has a file open (lsof +L1 is a different check that lists deleted files still held open)

The > filename trick empties a log file safely while a process still has it open

journalctl -u service -f replaces refreshing Event Viewer: it streams new log lines for the service as they arrive

Commands from this chapter
$ hostname && whoami && uptime && df -h
The Linux 10-second orient: the equivalent of connecting over RDP and opening Task Manager
$ du -sh /var/log/* 2>/dev/null | sort -rh | head -10
WinDirStat equivalent — find what is eating disk in 30 seconds
$ lsof /path/to/file
handle.exe Sysinternals equivalent — check if process has file open
$ > /var/log/bigfile.log
Safely empty a file while a process still has it open (no easy Windows equivalent)
$ journalctl -u myapp -f
Event Viewer real-time refresh equivalent — live service log watching
$ curl -s -o /dev/null -w "%{http_code}" http://localhost:8080/health
Browser test equivalent — check HTTP endpoint from command line