Linux for Production Support Ch 15 / 32 Intermediate

Arjun becomes a log wizard

awk, sed, cut, sort, uniq — turn any log file into a data report in seconds

⏱ 12 min 5 commands 5 takeaways
In this chapter
Arjun
Support engineer, data-heavy fintech team
The story

Arjun's manager sent him a 2GB log file and asked: How many unique users had errors today? Which error type happened most? What was the peak error hour?

Two months ago he would have opened it in a text editor, searched manually, and spent 3 hours. Today he opened a terminal and had all three answers in 45 seconds.

The log file format:

2026-03-17 14:23:45 ERROR user_id=10234 PaymentService NullPointerException
2026-03-17 14:23:46 INFO  user_id=10234 PaymentService request completed
2026-03-17 14:24:01 ERROR user_id=98712 AuthService TokenExpiredException

Question 1: How many unique users had errors?

grep "ERROR" app.log | grep -oP "user_id=\K[0-9]+" | sort -u | wc -l

Breaking this down:

grep "ERROR" keeps only error lines
grep -oP extracts just the user ID number: -o prints only the matched text, -P enables Perl-compatible regex, and \K drops the user_id= prefix from the match
sort -u sorts and removes duplicates
wc -l counts remaining lines

Result: 847 unique users. 45 seconds.
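
Before pointing a pipeline like this at a 2GB file, it helps to try it on a few lines. The following is a minimal sketch, assuming the log format shown above; the sample lines and the temporary file are made up for illustration. It also swaps grep -oP for a sed expression, which may be useful on systems where grep has no -P option (BSD or macOS grep, for example).

```shell
# Build a tiny sample log (hypothetical data, same format as the chapter's log)
sample=$(mktemp)
cat > "$sample" <<'EOF'
2026-03-17 14:23:45 ERROR user_id=10234 PaymentService NullPointerException
2026-03-17 14:23:46 INFO  user_id=10234 PaymentService request completed
2026-03-17 14:24:01 ERROR user_id=98712 AuthService TokenExpiredException
2026-03-17 15:02:10 ERROR user_id=10234 PaymentService NullPointerException
EOF

# sed -n suppresses normal output; the trailing p prints only lines where
# the substitution matched, leaving just the digits after user_id=
unique_users=$(grep "ERROR" "$sample" \
  | sed -n 's/.*user_id=\([0-9][0-9]*\).*/\1/p' \
  | sort -u | wc -l | tr -d ' ')

echo "Unique users with errors: $unique_users"
rm -f "$sample"
```

User 10234 appears on two error lines but is counted once, so the sample reports 2 unique users.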

Question 2: Which error type happened most?

grep "ERROR" app.log | awk '{print $NF}' | sort | uniq -c | sort -rn | head -10

Breaking this down:

awk '{print $NF}' prints the last field (the exception class name)
sort groups identical values together
uniq -c counts consecutive duplicates
sort -rn sorts by count, highest first
head -10 shows top 10

Result:

1847 NullPointerException
 923 TokenExpiredException
 412 DatabaseTimeoutException
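
The sort before uniq -c is not optional, because uniq only merges lines that sit next to each other. A small sketch with three made-up values shows the difference:

```shell
# Three values, with the duplicate pair separated by another value
values() {
  printf '%s\n' NullPointerException TokenExpiredException NullPointerException
}

# Without sort, the two NullPointerException lines are not adjacent,
# so uniq -c reports them as separate groups
without_sort=$(values | uniq -c | wc -l | tr -d ' ')

# With sort, identical lines end up adjacent and are merged into one group
with_sort=$(values | sort | uniq -c | wc -l | tr -d ' ')

echo "groups without sort: $without_sort"   # 3
echo "groups with sort:    $with_sort"      # 2
```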

Question 3: What was the peak error hour?

grep "ERROR" app.log | cut -d' ' -f2 | cut -d: -f1 | sort | uniq -c | sort -rn

Breaking this down:

cut -d' ' -f2 gets the time field (14:23:45)
cut -d: -f1 gets just the hour (14)
sort | uniq -c | sort -rn counts errors per hour, busiest hour first
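
If only the single busiest hour is needed, the top line of that output can be picked off with head -1 and split with awk. This is a sketch on made-up sample lines; the variable names are illustrative, not part of the original pipeline.

```shell
# Hypothetical sample: two errors in hour 14, one in hour 15
sample=$(mktemp)
cat > "$sample" <<'EOF'
2026-03-17 14:23:45 ERROR user_id=10234 PaymentService NullPointerException
2026-03-17 14:24:01 ERROR user_id=98712 AuthService TokenExpiredException
2026-03-17 14:30:00 INFO  user_id=55555 AuthService login ok
2026-03-17 15:02:10 ERROR user_id=10234 PaymentService NullPointerException
EOF

# Same pipeline as above; head -1 keeps only the highest count
peak=$(grep "ERROR" "$sample" | cut -d' ' -f2 | cut -d: -f1 \
  | sort | uniq -c | sort -rn | head -1)

# uniq -c prints "count value", so field 1 is the count and field 2 the hour
peak_count=$(echo "$peak" | awk '{print $1}')
peak_hour=$(echo "$peak" | awk '{print $2}')

echo "Peak error hour: ${peak_hour}:00 with $peak_count errors"
rm -f "$sample"
```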

The awk command in depth. Every line is split into fields: $1, $2, and so on. $NF is the last field.

# Print columns 1 and 3
awk '{print $1, $3}' app.log
# Filter and print
awk '$3 == "ERROR" {print $0}' app.log
# Average response time from column 8 (the count check avoids dividing by zero on an empty file)
awk '{sum += $8; count++} END {if (count > 0) print "Average:", sum/count}' response.log
# Count errors per service
awk '$3 == "ERROR" {services[$4]++} END {for (s in services) print services[s], s}' app.log | sort -rn
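
The averaging one-liner can be tried end to end on sample data. The chapter does not show the layout of response.log, so the eight-column format below is an assumption, chosen only so that the response time lands in column 8.

```shell
# Hypothetical response log: column 8 holds the response time in milliseconds
sample=$(mktemp)
cat > "$sample" <<'EOF'
2026-03-17 14:23:45 INFO user_id=10234 PaymentService GET /pay 120
2026-03-17 14:23:46 INFO user_id=98712 AuthService POST /login 80
2026-03-17 14:23:47 INFO user_id=55555 PaymentService GET /pay 100
EOF

# Only numeric values in column 8 are counted; the END block runs once
# after the last line and guards against an empty or non-matching file
average=$(awk '$8 ~ /^[0-9]+$/ {sum += $8; n++}
               END {if (n > 0) printf "%.1f\n", sum / n; else print "n/a"}' "$sample")

echo "Average response time: $average ms"
rm -f "$sample"
```

Filtering on a numeric pattern means header lines or malformed rows do not skew the result.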

The sed command makes substitutions in text:

# Replace ERROR with CRITICAL
sed 's/ERROR/CRITICAL/g' app.log
# Delete DEBUG lines
sed '/DEBUG/d' app.log
# In-place edit (modifies the file directly):
sed -i 's/old_hostname/new_hostname/g' config.xml
# Remove blank lines
sed '/^$/d' app.log
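
Because sed -i overwrites the file, a cautious habit is to preview the change first and keep a backup. The sketch below assumes a throwaway config.xml in a temporary directory; the file content is made up for illustration.

```shell
# Work on a scratch copy so nothing real is modified
workdir=$(mktemp -d)
conf="$workdir/config.xml"
printf '<server host="old_hostname" />\n' > "$conf"

# Preview: without -i, sed prints the result and leaves the file untouched
sed 's/old_hostname/new_hostname/g' "$conf"

# -i.bak edits in place and saves the original as config.xml.bak
sed -i.bak 's/old_hostname/new_hostname/g' "$conf"

edited_line=$(cat "$conf")
backup_line=$(cat "$conf.bak")
echo "edited: $edited_line"
echo "backup: $backup_line"
rm -rf "$workdir"
```

If the replacement turns out to be wrong, the .bak file can simply be copied back.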

Building a daily report script:

#!/bin/bash
# Daily error report for today's entries in the application log
LOG="/opt/app/logs/app.log"
DATE=$(date +%Y-%m-%d)
echo "=== Daily Report: $DATE ==="
echo "Total log lines:"
# ^ anchors the date to the start of the line; -c counts matching lines
grep -c "^$DATE" "$LOG"
echo "Error breakdown:"
grep "^$DATE" "$LOG" | grep "ERROR" | awk '{print $NF}' | sort | uniq -c | sort -rn | head -10
echo "Unique users with errors:"
grep "^$DATE" "$LOG" | grep "ERROR" | grep -oP "user_id=\K[0-9]+" | sort -u | wc -l

Arjun runs this script every morning. The whole report generates in 3 seconds.
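
On very large logs, each grep in the script reads the whole file again. One possible variation is to collect the same numbers in a single awk pass. This is a sketch, not the script Arjun runs: the report function, its arguments, and the sample data are all hypothetical.

```shell
# report LOGFILE DATE: summarize one day's entries in a single pass
report() {
  awk -v d="$2" '
    $1 == d { total++ }                 # every line for the given date
    $1 == d && $3 == "ERROR" {
      errors[$NF]++                     # count per exception class
      users[$4] = 1                     # remember each user_id once
    }
    END {
      for (u in users) nusers++
      for (e in errors) if (errors[e] > max) { max = errors[e]; top = e }
      print "Total lines: " (total + 0)
      print "Unique users with errors: " (nusers + 0)
      print "Most frequent error: " top " (" (max + 0) ")"
    }
  ' "$1"
}

# Hypothetical sample covering two dates
sample=$(mktemp)
cat > "$sample" <<'EOF'
2026-03-16 23:59:59 ERROR user_id=11111 AuthService TokenExpiredException
2026-03-17 14:23:45 ERROR user_id=10234 PaymentService NullPointerException
2026-03-17 14:23:46 INFO  user_id=10234 PaymentService request completed
2026-03-17 14:24:01 ERROR user_id=98712 AuthService TokenExpiredException
2026-03-17 15:02:10 ERROR user_id=10234 PaymentService NullPointerException
EOF

out=$(report "$sample" 2026-03-17)
echo "$out"
rm -f "$sample"
```

The trade-off is readability: the grep pipelines are easier to build up step by step, while the awk version reads the file once.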

Key takeaways

awk processes columns: $1 is field 1, $NF is last field, the END block runs after all lines

sort | uniq -c | sort -rn is the most useful pipeline for counting anything in logs

grep -oP with Perl regex extracts specific patterns like IDs or values from log lines

sed 's/old/new/g' does text replacement — add -i flag to edit files directly in place

cut -d'delimiter' -f1 splits on a character and picks a column — simpler than awk for simple cases
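
One caveat when choosing between cut and awk: cut treats every single space as a delimiter, while awk collapses runs of whitespace. The INFO line in the sample log has two spaces after INFO, which is enough to show the difference. A small sketch:

```shell
# The INFO line from the sample log, with two spaces after INFO
line='2026-03-17 14:23:46 INFO  user_id=10234 PaymentService request completed'

# cut sees an empty field between the two spaces, so field 4 is empty
cut_field=$(printf '%s\n' "$line" | cut -d' ' -f4)

# awk splits on runs of whitespace, so field 4 is the user ID
awk_field=$(printf '%s\n' "$line" | awk '{print $4}')

echo "cut -f4: '$cut_field'"
echo "awk \$4:  '$awk_field'"
```

For the date and time columns, which are separated by single spaces, cut works as expected; from the log-level column onward, awk is the safer choice for this format.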

Commands from this chapter
$ grep 'ERROR' app.log | awk '{print $NF}' | sort | uniq -c | sort -rn | head -10
Count errors by type, most frequent first
$ grep 'ERROR' app.log | grep -oP 'user_id=\K[0-9]+' | sort -u | wc -l
Count unique users who had errors
$ awk '$3=="ERROR"{s[$4]++} END{for(k in s) print s[k],k}' app.log | sort -rn
Error count per service
$ sed -i 's/old_hostname/new_hostname/g' config.xml
In-place find and replace in config file
$ grep 'ERROR' app.log | cut -d' ' -f2 | cut -d: -f1 | sort | uniq -c | sort -rn
Error count per hour of day