Arjun becomes a log wizard
awk, sed, cut, sort, uniq — turn any log file into a data report in seconds
Arjun's manager sent him a 2GB log file and asked: How many unique users had errors today? Which error type happened most? What was the peak error hour?
Two months ago he would have opened it in a text editor, searched manually, and spent 3 hours. Today he opened a terminal and had all three answers in 45 seconds.
The log file format:
2026-03-17 14:23:45 ERROR user_id=10234 PaymentService NullPointerException
2026-03-17 14:23:46 INFO user_id=10234 PaymentService request completed
2026-03-17 14:24:01 ERROR user_id=98712 AuthService TokenExpiredException
Question 1: How many unique users had errors?
grep "ERROR" app.log | grep -oP "user_id=\K[0-9]+" | sort -u | wc -l
Breaking this down:
grep "ERROR" keeps only error lines
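grep -oP extracts just the user ID number: -o prints only the matched text, -P enables Perl-compatible regex (a GNU grep option), and \K drops the "user_id=" prefix from the match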
sort -u sorts and removes duplicates
wc -l counts remaining lines
Result: 847 unique users. 45 seconds.
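Before pointing a pipeline like this at a 2GB file, it can help to try it on a few lines where the answer is known. The sketch below does that under some assumptions: the temporary file and the fourth log line are invented for illustration, and it uses grep -oE rather than -oP, because -P is specific to GNU grep and the grep shipped with macOS/BSD does not support it. With -oE the "user_id=" prefix stays in the match, which does not change the count.

```shell
#!/bin/bash
# Hypothetical sample data: the three lines from the format example,
# plus one invented line so that user 10234 has two errors.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
2026-03-17 14:23:45 ERROR user_id=10234 PaymentService NullPointerException
2026-03-17 14:23:46 INFO user_id=10234 PaymentService request completed
2026-03-17 14:24:01 ERROR user_id=98712 AuthService TokenExpiredException
2026-03-17 15:02:10 ERROR user_id=10234 PaymentService NullPointerException
EOF

# -oE prints only the matched "user_id=NNN" text; sort -u collapses repeats.
# tr strips the padding that some wc implementations add.
unique_users=$(grep "ERROR" "$sample_log" \
  | grep -oE "user_id=[0-9]+" \
  | sort -u \
  | wc -l \
  | tr -d ' ')

echo "Unique users with errors: $unique_users"
rm -f "$sample_log"
```

Three of the four sample lines are errors, but they belong to only two distinct users, so the sketch reports 2.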
Question 2: Which error type happened most?
grep "ERROR" app.log | awk '{print $NF}' | sort | uniq -c | sort -rn | head -10
Breaking this down:
awk '{print $NF}' prints the last field (the exception class name)
sort groups identical values together
uniq -c counts consecutive duplicates
sort -rn sorts by count, highest first
head -10 shows top 10
Result:
1847 NullPointerException
923 TokenExpiredException
412 DatabaseTimeoutException
Question 3: What was the peak error hour?
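grep "ERROR" app.log | cut -d' ' -f2 | cut -d: -f1 | sort | uniq -c | sort -rn
cut -d' ' -f2 gets the time field (14:23:45)
cut -d: -f1 gets just the hour (14)
The rest of the pipeline counts errors per hour and sorts by count, so the first line of output is the peak hour.
The awk command in depth. Every line is split into fields (on whitespace by default): $1, $2, and so on. $NF is the last field.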
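To see the field numbering concretely, here is a small sketch that prints every field of one sample line next to its awk name. It assumes the log format shown earlier; the variable names are only illustrative.

```shell
#!/bin/bash
# One line from the log format shown earlier.
line='2026-03-17 14:23:45 ERROR user_id=10234 PaymentService NullPointerException'

# NF is the number of fields on the line; $i is the i-th field.
field_map=$(echo "$line" | awk '{for (i = 1; i <= NF; i++) print "$" i " = " $i}')
echo "$field_map"

# $NF is whatever field happens to be last on the line.
last_field=$(echo "$line" | awk '{print $NF}')
echo "last field: $last_field"
```

For this format the date is $1, the time is $2, the log level is $3, the user ID is $4, the service is $5, and the exception class is $6, which is also $NF.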
# Print columns 1 and 3
awk '{print $1, $3}' app.log
# Filter and print
awk '$3 == "ERROR" {print $0}' app.log
# Sum response times from column 8 and print the average
awk '{sum += $8; count++} END {if (count) print "Average:", sum/count}' response.log
# Count errors per service (the service name is field 5 in this log format)
awk '$3 == "ERROR" {services[$5]++} END {for (s in services) print services[s], s}' app.log | sort -rn
The sed command makes substitutions in text:
# Replace ERROR with CRITICAL
sed 's/ERROR/CRITICAL/g' app.log
# Delete DEBUG lines
sed '/DEBUG/d' app.log
# In-place edit (modifies the file directly; BSD/macOS sed needs -i '' instead of -i):
sed -i 's/old_hostname/new_hostname/g' config.xml
# Remove blank lines
sed '/^$/d' app.log
Building a daily report script:
#!/bin/bash
LOG="/opt/app/logs/app.log"
DATE=$(date +%Y-%m-%d)
echo "=== Daily Report: $DATE ==="
echo "Total requests:"
grep "$DATE" "$LOG" | wc -l
echo "Error breakdown:"
grep "$DATE" "$LOG" | grep "ERROR" | awk '{print $NF}' | sort | uniq -c | sort -rn | head -10
echo "Unique users with errors:"
grep "$DATE" "$LOG" | grep "ERROR" | grep -oP "user_id=\K[0-9]+" | sort -u | wc -l
Arjun runs this script every morning. The whole report generates in 3 seconds.
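The report covers two of the three questions from the manager. One possible way to add the third, the peak error hour, is sketched below. The function name peak_error_hour, its arguments, and the sample data are assumptions made for illustration and are not part of Arjun's script; the pipeline itself is the one from Question 3 with a date filter in front and head -1 at the end.

```shell
#!/bin/bash
# Hypothetical helper: prints "count hour" for the busiest error hour on a given day.
peak_error_hour() {
  local log="$1" day="$2"
  grep "$day" "$log" | grep "ERROR" \
    | cut -d' ' -f2 | cut -d: -f1 \
    | sort | uniq -c | sort -rn | head -1 \
    | awk '{print $1, $2}'   # normalize the padding uniq -c adds
}

# Invented sample data for a quick check; the last line is from a different day.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
2026-03-17 14:23:45 ERROR user_id=10234 PaymentService NullPointerException
2026-03-17 14:24:01 ERROR user_id=98712 AuthService TokenExpiredException
2026-03-17 14:59:59 INFO user_id=98712 AuthService request completed
2026-03-17 15:02:10 ERROR user_id=10234 PaymentService NullPointerException
2026-03-16 15:10:00 ERROR user_id=55555 AuthService TokenExpiredException
EOF

echo "Peak error hour (count hour):"
peak_result=$(peak_error_hour "$sample_log" "2026-03-17")
echo "$peak_result"
rm -f "$sample_log"
```

On the sample data, hour 14 has two errors and hour 15 has one, so the helper prints "2 14". In the real script the call would take "$LOG" and "$DATE" instead of the sample file and the fixed date.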
awk splits each line into fields: $1 is the first field, $NF is the last, and the END block runs after all lines have been read
sort | uniq -c | sort -rn is the most useful pipeline for counting anything in logs
grep -oP with Perl regex extracts specific patterns like IDs or values from log lines
sed 's/old/new/g' does text replacement; add the -i flag to edit the file in place
cut -d'delimiter' -f1 splits on a character and picks a column — simpler than awk for simple cases
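The cut versus awk trade-off in the last point can be seen by pulling the hour out of one log line both ways. This is a small sketch using the sample format from earlier; the padded line with two spaces is invented to show the one case where the tools disagree. cut treats every single delimiter as a field boundary, while awk collapses runs of whitespace.

```shell
#!/bin/bash
line='2026-03-17 14:23:45 ERROR user_id=10234 PaymentService NullPointerException'

# cut: second space-separated field, then first colon-separated field.
hour_cut=$(echo "$line" | cut -d' ' -f2 | cut -d: -f1)

# awk: split field 2 on ":" and print the first piece.
hour_awk=$(echo "$line" | awk '{split($2, t, ":"); print t[1]}')

echo "cut: $hour_cut  awk: $hour_awk"

# With two spaces between fields, cut sees an empty second field; awk does not.
padded='2026-03-17  14:23:45 ERROR'
padded_cut=$(echo "$padded" | cut -d' ' -f2)
padded_awk=$(echo "$padded" | awk '{print $2}')
echo "cut: '$padded_cut'  awk: '$padded_awk'"
```

For logs with strictly single-space separators, like the one in this article, both approaches give the same hour. If the log pads columns with extra spaces, awk is the safer choice.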