Linux for Production Support Ch 10 / 32 Beginner

Dev hunts the CPU killer

Processes, top, ps, kill — finding and fixing runaway processes

⏱ 10 min 6 commands 5 takeaways
In this chapter
Dev
L2 support engineer, 6 months experience
The story

Dev's monitoring alert fired at 11am on Tuesday. CPU 98%. App response time 45 seconds. Users complaining on Twitter. His manager said: "You have 10 minutes."

Step 1: Confirm the problem.

uptime
# load average: 15.2, 12.1, 8.3

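A load average of 15.2 on a 4-core server means that, on average, about 15 tasks want CPU while only 4 can run at once, so roughly 11 are waiting. (On Linux the figure also counts tasks blocked in uninterruptible I/O, so confirm with top that CPU is really the bottleneck.) The server is drowning.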
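To make the comparison concrete, here is a minimal sketch that divides the 1-minute load average by the core count. The helper name load_per_core is made up for illustration, and the live part assumes a Linux host with /proc/loadavg and the nproc utility from coreutils.

```shell
#!/bin/sh
# load_per_core LOAD CORES: print LOAD divided by CORES with two decimals.
# (Hypothetical helper, not a standard command.)
load_per_core() {
    awk -v l="$1" -v c="$2" 'BEGIN { printf "%.2f\n", l / c }'
}

# Live values: the first field of /proc/loadavg is the 1-minute load average.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    read -r load1 rest < /proc/loadavg
    cores=$(nproc)
    echo "1-min load $load1 on $cores cores: $(load_per_core "$load1" "$cores") per core"
fi
```

With the numbers from the story, load_per_core 15.2 4 prints 3.80: nearly four tasks competing for every core. A value that stays above 1.00 is the signal to dig further.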

Step 2: Find the culprit.

ps aux --sort=-%cpu | head -10

Output (trimmed to the relevant columns):

USER    PID   %CPU %MEM  COMMAND
app    4521   97.3  2.1  java -jar analytics.jar
tomcat 1234    0.8 12.3  java -jar payment.jar

analytics.jar was eating 97.3% CPU. Someone triggered a large report that went into an infinite loop.
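The same check can be scripted, for example for an alert or a cron job. The sketch below is an illustration only: cpu_hogs is a hypothetical helper, and the 50% threshold is an arbitrary choice.

```shell
#!/bin/sh
# cpu_hogs THRESHOLD: read "PID %CPU COMMAND" lines on stdin (header first)
# and print every process at or above THRESHOLD percent CPU.
# (Hypothetical helper; the threshold you pick is a judgment call.)
cpu_hogs() {
    awk -v t="$1" 'NR > 1 && $2 + 0 >= t + 0 { print $1, $2, $3 }'
}
```

On a live box, feed it real data with: ps -eo pid,pcpu,comm | cpu_hogs 50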

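Every running program in Linux is a process. ps lists each one with a PID (unique ID), %CPU, %MEM, and COMMAND. Note that ps reports %CPU as an average over the process's lifetime, so use top to see what is busy right now.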

ps aux                  # all processes, all users
ps aux | grep '[j]ava'  # all java processes ([j] keeps grep from listing itself)
top                     # live dashboard (press P for CPU, M for memory, q to quit)

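Dev noted the PID, 4521. Before killing anything, he confirmed what it was running with: cat /proc/4521/cmdline (the arguments in that file are separated by NUL bytes, so they print run together).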
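For a more readable confirmation, the NUL separators can be turned into spaces. This is a small sketch assuming a Linux /proc filesystem; show_cmdline is a name invented here, not a standard tool.

```shell
#!/bin/sh
# show_cmdline PID: print the full command line of PID with spaces between
# arguments. (Hypothetical helper; relies on Linux's /proc/PID/cmdline.)
show_cmdline() {
    [ -r "/proc/$1/cmdline" ] || { echo "cannot read cmdline for PID $1" >&2; return 1; }
    tr '\0' ' ' < "/proc/$1/cmdline"
    echo
}
```

Running show_cmdline 4521 in Dev's situation would print the java command with its arguments separated by spaces, which makes it easy to check that the PID really belongs to analytics.jar before sending any signal.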

Then he killed it correctly:

kill 4521         # polite: sends SIGTERM so the process can clean up
sleep 10
ps -p 4521        # is it gone? (prints only the header line if so)
# if still running:
kill -9 4521      # force kill (SIGKILL): last resort only
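The polite-then-forceful sequence above can be wrapped in one function. Treat this as a sketch under stated assumptions: stop_gracefully is a hypothetical name, and the 10-second default simply mirrors the sleep 10 in the story.

```shell
#!/bin/sh
# stop_gracefully PID [TIMEOUT]: send SIGTERM, wait up to TIMEOUT seconds
# (default 10), and send SIGKILL only if the process is still there.
# (Hypothetical helper, not a standard command.)
stop_gracefully() {
    pid=$1
    timeout=${2:-10}
    kill "$pid" 2>/dev/null || return 1          # no such process, or no permission
    while [ "$timeout" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # signal 0 only tests for existence
        sleep 1
        timeout=$((timeout - 1))
    done
    kill -9 "$pid" 2>/dev/null                   # last resort
}
```

Usage would be stop_gracefully 4521, or stop_gracefully 4521 30 for a service that needs longer to shut down. Polling with kill -0 returns as soon as the process exits instead of always sleeping for the full timeout.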

CPU dropped from 97% to 8% in 3 seconds. Response time went from 45s to 200ms.

Never jump straight to kill -9. SIGKILL cannot be caught, so the process gets no chance to flush writes or release locks. A forced kill on a database can corrupt data.

Keyboard shortcuts inside top: P sorts by CPU, M sorts by memory, 1 shows each core individually, k kills a process by PID, q quits.

The OOM Killer: when the system runs out of memory, the kernel picks a process to kill to protect itself, usually the one using the most memory. Your app disappears with nothing in its own log, because the kill is recorded only in the kernel log.

grep -iE "kill(ed)? process" /var/log/kern.log | tail -5
# kernel: Kill process 1234 (java) score 890
# (kern.log is the Debian/Ubuntu path; elsewhere try /var/log/messages or journalctl -k)
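To pull just the victim's PID and name out of those log lines, a small filter helps. This is a sketch: oom_victims is a hypothetical helper, and its pattern is modelled on the sample line above. Kernel message wording varies between versions, so adjust the pattern to what your logs actually contain.

```shell
#!/bin/sh
# oom_victims: read kernel log lines on stdin and print "PID NAME" for each
# line of the form "... Kill process PID (NAME) ..." or "... Killed process ...".
# (Hypothetical helper; the pattern is an assumption based on the sample line.)
oom_victims() {
    sed -n -E 's/.*[Kk]ill(ed)? process ([0-9]+) \(([^)]*)\).*/\2 \3/p'
}
```

For the sample line, the filter prints 1234 java. On a Debian-style host it could be used as: oom_victims < /var/log/kern.log | tail -5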
Key takeaways

ps aux --sort=-%cpu | head -10 finds the CPU hog in 1 second

A load average that stays above the number of CPU cores means the server is overloaded

Always try kill PID before kill -9 — give the process a chance to clean up

top: P = sort by CPU, M = sort by memory, q = quit

Grepping kern.log for killed-process messages reveals OOM killer events

Commands from this chapter
$ ps aux --sort=-%cpu | head -10
The most CPU-hungry processes instantly (header line plus the top 9)
$ ps aux --sort=-%mem | head -10
The most memory-hungry processes (header line plus the top 9)
$ top -bn1 | head -20
One snapshot of top without staying in it
$ kill PID
Politely stop a process (try this first)
$ kill -9 PID
Force kill, only when plain kill has not worked after 10 to 15s
$ grep -iE "kill(ed)? process" /var/log/kern.log | tail -5
Find OOM killer events