Priya traces the broken connection
Ports, DNS, firewall: diagnosing "cannot connect" incidents
Monday 9am. Priya got a ticket: "Payment service cannot connect to database since Sunday night deployment." Both services were running. They just could not talk to each other.
Mental model: the network as a highway.
1. Does the GPS know the address? That is DNS.
2. Is the road open? That is routing and ping.
3. Is the specific gate open? That is the port and the firewall.
4. Is someone home to answer? That is the service running and listening.
Step 1: Can we resolve the hostname?
nslookup db-server-01
# Returns 10.0.0.5 - good
# NXDOMAIN means DNS cannot find this hostname
cat /etc/hosts # check for local overrides
DNS resolved. db-server-01 points to 10.0.0.5.
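nslookup sends its query straight to a DNS server, so its answer can differ from what the payment service actually resolves. `getent hosts` goes through /etc/nsswitch.conf (commonly "hosts: files dns", so /etc/hosts first, then DNS), which is the lookup path most Linux applications take. A quick cross-check, assuming a Linux host with glibc's getent:

```shell
# Resolve the way most applications do: sources and their order come from
# /etc/nsswitch.conf, so an /etc/hosts override shows up here.
getent hosts db-server-01
# Prints the address followed by the host name.
# Exit status 2 means no configured source knows the name.
```

If getent and nslookup return different addresses, a local override is the likely explanation.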
Step 2: Can we reach the server?
ping -c 3 db-server-01
# 3 packets transmitted, 3 received - server is reachable
# Zero replies does not prove the host is down: some networks drop ICMP
Step 3: Can we reach the specific port?
nc -zv db-server-01 5432
# Connection succeeded - port is open
# Connection refused - host reachable, but nothing is listening on the port
# Hangs with no output - a firewall is silently dropping packets (add -w 5 to give up after 5 seconds)
Priya's command hung for 10 seconds. Firewall.
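Where nc is not installed, bash can make the same TCP connection attempt through its /dev/tcp pseudo-device. The sketch below wraps that attempt in coreutils `timeout` and maps the result onto the refused vs. hang distinction from the nc comments: exit status 124 means the attempt was cut off, and a "Connection refused" message means the host answered with a reset. `probe_port` is a hypothetical helper name, and the sketch assumes bash and coreutils are available.

```shell
#!/usr/bin/env bash
# probe_port HOST PORT [SECONDS]: hypothetical helper, not a standard tool.
# Prints one of: open, refused, timeout, error: <message>
probe_port() {
  local host=$1 port=$2 secs=${3:-5}
  local err status
  # A redirection to /dev/tcp/HOST/PORT makes bash open a TCP connection.
  # timeout stops the attempt after $secs seconds and then exits with 124.
  # LC_ALL=C keeps the error text in English so the match below works.
  err=$(LC_ALL=C timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>&1)
  status=$?
  if [ "$status" -eq 0 ]; then
    echo "open"
  elif [ "$status" -eq 124 ]; then
    echo "timeout"   # no answer at all: typical of a firewall DROP rule
  else
    case $err in
      *"Connection refused"*) echo "refused" ;;  # host replied with a reset
      *) echo "error: $err" ;;                   # e.g. the name did not resolve
    esac
  fi
}

# Usage (a DROP rule like the one in this incident would make it print "timeout"):
#   probe_port db-server-01 5432
```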
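The two error messages that tell you which layer failed:
"Connection refused" = the host answered, but nothing is listening on that port (or a firewall REJECT rule answered on its behalf). Fix: start the service, then check with ss -tlnp that it listens on a reachable address, not only 127.0.0.1.
"Connection timed out" = packets are being silently dropped, usually by a firewall DROP rule. Fix: open the port in the firewall.
Step 4: Fix the firewall.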
sudo ufw status # Ubuntu
sudo firewall-cmd --list-all # CentOS/RHEL
sudo iptables -L -n # underlying iptables rules, numeric addresses
# Fix: allow the payment server (10.0.0.10) to reach PostgreSQL
sudo ufw allow from 10.0.0.10 to any port 5432 proto tcp
sudo ufw reload # optional: ufw applies new rules immediately
Connection established immediately. Root cause: the Sunday deployment added a firewall rule that blocked the payment server's IP from reaching port 5432. Incident resolved in 12 minutes.
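Two hedged notes on the fix, using the same addresses as above. First, ufw evaluates rules in order and stops at the first match, so if a deployment adds an explicit deny for 10.0.0.10, an allow appended after it never matches; inserting the allow at position 1 (or deleting the deny) avoids that. Second, on CentOS/RHEL hosts running firewalld, a permanent rich rule is one way to express the same allow:

```shell
# ufw: show rules with their positions, then place the allow ahead of any deny
sudo ufw status numbered
sudo ufw insert 1 allow from 10.0.0.10 to any port 5432 proto tcp

# firewalld: the same allow as a permanent rich rule, then load the permanent config
sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="10.0.0.10" port port="5432" protocol="tcp" accept'
sudo firewall-cmd --reload
```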
The networking toolkit:
ss -tlnp # all listening ports with process names
lsof -i :8080 # what process owns port 8080?
curl -v localhost:8080/health # test HTTP endpoint
traceroute db-server # show every hop to find where the path breaks
Common ports: 22 SSH, 80 HTTP, 443 HTTPS, 3306 MySQL, 5432 PostgreSQL, 6379 Redis, 8080 alternate HTTP (Tomcat default), 9200 Elasticsearch
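When remote clients get "Connection refused" even though the service is running, the bind address is worth checking. PostgreSQL, for example, listens only on localhost unless listen_addresses in postgresql.conf is changed. ss accepts a filter expression, so a sketch for a single port (5432 here, as an example) looks like this:

```shell
# Only TCP listeners whose local (source) port is 5432; sudo lets -p show
# process names owned by other users.
sudo ss -tlnp 'sport = :5432'
# Local Address 0.0.0.0:5432 or [::]:5432 : accepts connections on all interfaces
# Local Address 127.0.0.1:5432            : loopback only; remote clients are refused
```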
Connection refused usually means nothing is listening on the port. Connection timed out usually means a firewall is dropping packets.
nc -zv host port is a quick way to test whether a port is reachable.
Check DNS, routing, port, and service, in that order.
ss -tlnp shows all listening TCP ports with the owning process.
Firewall changes during deployments are a common cause of connectivity breaks, so check them first after a deploy.