Rajan unlocks the kernel's hidden performance
sysctl, ulimits, file descriptors — tuning the OS for high-traffic production
Rajan was a Java developer turned infrastructure engineer. His apps worked fine on small servers. When the company's traffic grew, strange things happened: connections were refused at 1,000 concurrent users even though CPU usage sat at 20%, and database connections timed out even though the database was healthy. The OS was the bottleneck, not the app.
His senior SRE spent 30 minutes with him, changed 8 kernel parameters, and the server handled 10x the traffic without hardware changes.
WHAT IS THE KERNEL AND WHY DOES IT MATTER
The Linux kernel is the core of the OS. It manages memory, CPU scheduling, network connections, and file handles. Its default settings are tuned for general-purpose workloads, so production servers handling high traffic usually need some of them adjusted.
sysctl is the tool to read and change kernel parameters:
sysctl -a # show all kernel parameters (thousands of them)
sysctl net.core.somaxconn # read a specific parameter
sysctl -w net.core.somaxconn=1024 # change a parameter (temporary, resets on reboot)

# Permanent changes in /etc/sysctl.conf or /etc/sysctl.d/:
echo "net.core.somaxconn = 65535" | sudo tee -a /etc/sysctl.d/99-production.conf
sudo sysctl -p /etc/sysctl.d/99-production.conf # apply without reboot

NETWORK TUNING: HIGH TRAFFIC SERVERS
# /etc/sysctl.d/99-network.conf

# Maximum pending connections in the kernel accept queue (default 128 on kernels before 5.4, 4096 since; low for high traffic):
net.core.somaxconn = 65535
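net.ipv4.tcp_max_syn_backlog = 65535

# TIME_WAIT handling (sockets linger in TIME_WAIT for 60 seconds, which can exhaust ports).
# tcp_tw_reuse lets new outgoing connections reuse TIME_WAIT sockets:
net.ipv4.tcp_tw_reuse = 1
# tcp_fin_timeout shortens FIN_WAIT_2 for orphaned sockets; it does not change the TIME_WAIT length:
net.ipv4.tcp_fin_timeout = 15

# Increase local port range (default 32768-60999 = only about 28,000 ports):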
net.ipv4.ip_local_port_range = 10000 65535

# TCP buffer sizes for high-bandwidth connections:
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864

# Enable TCP Fast Open for both client and server (3 = 1 client + 2 server); reduces latency for repeated connections:
net.ipv4.tcp_fastopen = 3

MEMORY TUNING
# /etc/sysctl.d/99-memory.conf

# Reduce swap usage (default 60 is too high for servers with enough RAM):
vm.swappiness = 10

# How aggressively the kernel reclaims memory from the inode/dentry cache (default 100; lower values keep the cache longer):
vm.vfs_cache_pressure = 50

# Overcommit settings (for JVM applications). 1 = always allow allocations; the default 0 uses a heuristic:
vm.overcommit_memory = 1

FILE DESCRIPTORS: ULIMITS
Every open file and network connection uses a file descriptor. The default soft limit is typically 1024 per process. A busy server handling 10,000 connections needs far more.
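To see how much headroom a process has, the descriptors listed under /proc/<pid>/fd can be compared with the soft limit in /proc/<pid>/limits. The sketch below assumes Linux with /proc mounted; soft_nofile_from_limits and fd_headroom are hypothetical helper names, not standard tools.

```shell
# Sketch only: assumes Linux with /proc mounted. Helper names are hypothetical.

# Read limits-file text on stdin and print the soft "Max open files" value.
# Line format: "Max open files  <soft>  <hard>  files", so the soft limit is field 4.
soft_nofile_from_limits() {
    awk '/^Max open files/ { print $4 }'
}

# Print the open descriptor count and the soft limit for one PID.
fd_headroom() {
    fh_pid="$1"
    # Each entry in /proc/<pid>/fd is one open descriptor.
    fh_open=$(ls "/proc/${fh_pid}/fd" | wc -l)
    fh_limit=$(soft_nofile_from_limits < "/proc/${fh_pid}/limits")
    echo "pid=${fh_pid} open=${fh_open} soft_limit=${fh_limit}"
}
```

A call such as fd_headroom "$(pgrep -o java)" prints one line per process; reading another user's process may require sudo.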
ulimit -n # current max open files for your session
ulimit -n 65535 # increase for current session only (cannot exceed the hard limit)

# Check what a running process has open (pgrep -o picks the oldest PID if several match):
grep "Max open files" /proc/$(pgrep -o java)/limits
ls /proc/$(pgrep -o java)/fd | wc -l # how many are actually open? (may need sudo for another user's process)

# Permanent limits for login sessions (applied through PAM) in /etc/security/limits.conf:
sudo nano /etc/security/limits.conf

# Add these lines:
* soft nofile 65535
* hard nofile 65535
tomcat soft nofile 65535
tomcat hard nofile 65535

# systemd services generally do not pick up limits.conf; set the limit in the service file:
[Service]
LimitNOFILE=65535

CHECKING CURRENT KERNEL BOTTLENECKS
# Are we hitting max connections?
ss -s | grep "TCP:"
cat /proc/sys/net/core/somaxconn

# Are we running out of file descriptors system-wide?
cat /proc/sys/fs/file-nr
# Output: used unused max
# If used is close to max, you need to increase fs.file-max

# Are TIME_WAIT connections piling up?
ss -tan state time-wait | wc -l
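# Over 10000 is worth investigating, especially when most go to one destination (the count includes one header line)

# Are we dropping incoming connections?
netstat -s | grep -E "SYNs to LISTEN|listen queue"
# Counters that keep rising mean a listen queue is overflowing: raise somaxconn, tcp_max_syn_backlog, or the application's listen() backlog

CPU SCHEDULER TUNING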
# Check CPU scheduling policy:
chrt -p $(pgrep -o java) # what scheduling policy is this process using?

# For latency-sensitive processes (real-time scheduling). Use with care: a busy FIFO task can starve everything else on its CPU:
sudo chrt -f -p 50 $(pgrep -o java) # FIFO scheduling, priority 50

# NUMA topology (for multi-socket servers):
numactl --hardware # see NUMA nodes
numactl --cpunodebind=0 --membind=0 java -jar app.jar # pin to NUMA node 0

Rajan's server after tuning handled 12,000 concurrent connections on the same hardware that choked at 1,000. The CPU barely moved. The kernel was the bottleneck. Kernel settings are free performance.
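To compare a server before and after tuning, the bottleneck checks from earlier can be gathered into a single snapshot. This is a rough sketch, assuming Linux with /proc mounted and ss from iproute2 installed; port_range_size, percent_of, and kernel_snapshot are hypothetical helper names.

```shell
# Sketch only: assumes Linux with /proc mounted and ss (iproute2) installed.
# Helper names are hypothetical.

# Count the ports in an inclusive "low high" range string.
port_range_size() {
    set -- $1            # split "low high" into $1 and $2
    echo $(( $2 - $1 + 1 ))
}

# Integer percentage of part ($1) relative to whole ($2).
percent_of() {
    echo $(( $1 * 100 / $2 ))
}

kernel_snapshot() {
    # file-nr holds three numbers: allocated, unused, max.
    read -r ks_used ks_unused ks_max < /proc/sys/fs/file-nr
    echo "file handles: ${ks_used}/${ks_max} ($(percent_of "$ks_used" "$ks_max")%)"

    ks_ports=$(port_range_size "$(cat /proc/sys/net/ipv4/ip_local_port_range)")
    # tail skips the header line that ss prints.
    ks_tw=$(ss -tan state time-wait | tail -n +2 | wc -l)
    echo "TIME_WAIT sockets: ${ks_tw} (local port range holds ${ks_ports} ports)"

    echo "somaxconn: $(cat /proc/sys/net/core/somaxconn)"
}
```

Running kernel_snapshot once before a change and once after gives a quick side-by-side view. For the range arithmetic alone, port_range_size "32768 60999" yields 28232 and port_range_size "10000 65535" yields 55536.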
net.core.somaxconn limits pending connections in the kernel accept queue: the old default of 128 (4096 since kernel 5.4) is low for a busy production server
ulimit -n shows the max open files limit: a Java app handling 10k connections needs this at 65535, not 1024
vm.swappiness = 10 reduces swap usage on servers with enough RAM: the default of 60 can cause unnecessary swapping
net.ipv4.tcp_tw_reuse = 1 lets new outgoing connections reuse TIME_WAIT sockets: this helps prevent port exhaustion under high traffic
sysctl -p applies changes from a config file without rebooting: always verify with sysctl -a | grep setting
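The verification step can be scripted by comparing each "key = value" line in a config file with the live value under /proc/sys, where every sysctl key appears as a file (net.core.somaxconn is /proc/sys/net/core/somaxconn). A minimal sketch, assuming Linux with /proc mounted; the function names are hypothetical, and keys that contain dots inside an interface name are not handled.

```shell
# Sketch only: assumes Linux with /proc/sys available. Function names are hypothetical.

# Map a dotted sysctl key to its /proc/sys path.
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Collapse runs of spaces and tabs, then trim both ends,
# so "4096   87380" and "4096<TAB>87380" compare equal.
normalize_ws() {
    printf '%s' "$1" | tr -s ' \t' ' ' | sed 's/^ //; s/ $//'
}

# Compare every setting in the file ($1) with the running kernel.
# Returns 1 if any value differs, 0 otherwise.
verify_sysctl_file() {
    vs_rc=0
    while IFS= read -r vs_line || [ -n "$vs_line" ]; do
        vs_line=$(normalize_ws "$vs_line")
        # Skip blank lines and comments (# or ;).
        case "$vs_line" in
            ''|'#'*|';'*) continue ;;
        esac
        # Skip anything that is not a key = value pair.
        case "$vs_line" in
            *=*) ;;
            *) continue ;;
        esac
        vs_key=$(normalize_ws "${vs_line%%=*}")
        vs_want=$(normalize_ws "${vs_line#*=}")
        vs_path=$(sysctl_path "$vs_key")
        if [ ! -r "$vs_path" ]; then
            echo "SKIP  $vs_key (cannot read $vs_path)"
            continue
        fi
        vs_have=$(normalize_ws "$(cat "$vs_path")")
        if [ "$vs_have" = "$vs_want" ]; then
            echo "OK    $vs_key = $vs_have"
        else
            echo "DIFF  $vs_key: file wants '$vs_want', kernel has '$vs_have'"
            vs_rc=1
        fi
    done < "$1"
    return "$vs_rc"
}
```

For example, verify_sysctl_file /etc/sysctl.d/99-production.conf prints one line per setting and exits non-zero if the kernel disagrees with the file, which makes it easy to run after sysctl -p or after a reboot.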