Linux & Networking Fundamentals
Linux & Networking Troubleshooting Scenarios
Troubleshooting questions in interviews test how systematically you work. Let's practice with scenarios you are likely to face.
Scenario 1: High Load, Low CPU
Interviewer: "Load average is 100 but CPU utilization is only 10%. What's going on?"
Your approach:
# Step 1: Confirm the symptoms
uptime # Check load average
top -b -n1 | head -20 # CPU utilization
# Step 2: Check for I/O wait
vmstat 1 5 # Look at 'wa' column (I/O wait)
iostat -x 1 5 # Disk I/O details
# Step 3: Identify the culprit
iotop -o # Processes actually doing I/O (needs root)
lsof +D /mount/point # Open files under a suspect mount (recursive, can be slow)
# Step 4: Check disk health
dmesg | grep -i error # Disk errors in kernel log
smartctl -a /dev/sda # SMART disk health
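Root causes: a failing disk, hung or slow NFS mounts, database lock contention, or swap thrashing. In each case processes pile up waiting rather than computing, so load climbs while the CPU stays mostly idle.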
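On Linux, load average counts tasks in uninterruptible sleep (state `D`, usually blocked on I/O) as well as runnable ones, so a per-state process count is a quick way to confirm the diagnosis. The following is a minimal sketch; the function name `summarize_states` is made up for illustration, and the canned input stands in for real `ps` output.

```shell
#!/bin/sh
# Summarize process states from `ps -eo state=` style input (one state per line).
# Only the first character is used, so values like "Ss" are counted as "S".
summarize_states() {
    awk '{ count[substr($1, 1, 1)]++ }
         END { for (s in count) printf "%s %d\n", s, count[s] }' | sort
}

# Live usage on a Linux host (procps ps):
#   ps -eo state= | summarize_states
#   ps -eo state,pid,comm | awk 'NR == 1 || $1 ~ /^D/'   # list the blocked tasks

# Canned sample so the sketch runs anywhere: 2 blocked, 1 running, 3 sleeping.
printf 'D\nD\nR\nS\nS\nS\n' | summarize_states
```

A large `D` count next to a small `R` count points at storage or NFS rather than CPU.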
Scenario 2: SSH Connection Timeout
Interviewer: "You can't SSH to a server. Walk me through debugging."
Your approach:
# Step 1: Basic connectivity
ping server.example.com # ICMP reachable?
traceroute server.example.com # Where does it fail?
# Step 2: DNS resolution
dig server.example.com # Does DNS resolve?
dig @8.8.8.8 server.example.com # Try alternate DNS
# Step 3: Port connectivity
nc -zv server.example.com 22 # Is port 22 open?
telnet server.example.com 22 # Can we connect?
# Step 4: Check from another host
# If above fails, try from different network
# Rules out client-side issues
# Step 5: If you have console access
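systemctl status sshd # Is SSH running? (the unit is "ssh" on Debian/Ubuntu)
ss -tlnp | grep ':22 ' # Listening on port 22? (plain "grep 22" also matches PIDs and other ports)
journalctl -u sshd # SSH logs
iptables -L -n # Firewall blocking? (use "nft list ruleset" on nftables systems)
cat /etc/hosts.deny # TCP wrappers?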
Common causes: Firewall rules, SSH service down, network ACLs, security groups (cloud)
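The error text from the SSH client often tells you which of these causes to chase first. As a rough sketch, the hypothetical helper `classify_ssh_error` below maps common OpenSSH client messages to a likely failure class; the class names are invented for this example, and the mapping is a starting point rather than a complete diagnosis.

```shell
#!/bin/sh
# Map an SSH client error message to a likely failure class.
classify_ssh_error() {
    case "$1" in
        *"Could not resolve hostname"*)
            echo "dns: name does not resolve, check Step 2" ;;
        *"Connection refused"*)
            echo "refused: host reachable, nothing listening or a rejecting firewall" ;;
        *"Connection timed out"*|*"No route to host"*)
            echo "network: packets dropped or no route, check firewall, ACLs, security groups" ;;
        *"Permission denied"*)
            echo "auth: sshd is up, check keys, user, and sshd_config" ;;
        *)
            echo "unknown: rerun with ssh -vvv for detail" ;;
    esac
}

# Live usage (BatchMode avoids password prompts, ConnectTimeout bounds the wait):
#   classify_ssh_error "$(ssh -o BatchMode=yes -o ConnectTimeout=5 server.example.com true 2>&1)"

# Canned sample so the sketch runs without a network:
classify_ssh_error "ssh: connect to host server.example.com port 22: Connection timed out"
```

The distinction matters in practice: a refusal means the packet reached the host, while a timeout usually means something on the path silently dropped it.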
Scenario 3: Application Running Slow
Interviewer: "Users report the web app is slow. Where do you start?"
Your structured approach (USE Method):
| Resource | Utilization | Saturation | Errors |
|---|---|---|---|
| CPU | top, mpstat | Load average | dmesg |
| Memory | free -h | Swap usage | OOM in logs |
| Disk | iostat | I/O wait | smartctl |
| Network | sar -n DEV | Drops/errors | Interface errors |
# Quick health check
top -b -n1 | head -20
free -h
iostat -x 1 3
ss -s
# Application-specific
curl -w "@curl-format.txt" -o /dev/null -s http://localhost/
# Prints DNS, connect, TTFB, and total time (curl-format.txt is a -w template file you create)
# Check application logs
tail -f /var/log/app/error.log
journalctl -u myapp -f
# Database connection issues?
mysql -e "SHOW PROCESSLIST"
# or
psql -c "SELECT * FROM pg_stat_activity"
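The `curl -w "@curl-format.txt"` command above expects a template file that the lesson does not show. One possible version is sketched here; the labels and the choice of fields are assumptions, while the `%{...}` names are standard curl write-out variables.

```shell
#!/bin/sh
# Create a write-out template for curl's -w option.
# The quoted 'EOF' keeps the shell from touching %{...} and \n.
cat > curl-format.txt <<'EOF'
dns_lookup:  %{time_namelookup}s\n
tcp_connect: %{time_connect}s\n
tls_done:    %{time_appconnect}s\n
ttfb:        %{time_starttransfer}s\n
total:       %{time_total}s\n
EOF

# Show what was written; the timing run itself is the curl command shown earlier:
#   curl -w "@curl-format.txt" -o /dev/null -s http://localhost/
cat curl-format.txt
```

Each value is cumulative from the start of the request, so a large gap between `tcp_connect` and `ttfb` suggests the time is spent in the application or its database rather than on the network.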
Scenario 4: Disk Space Emergency
Interviewer: "Disk is at 99%. Production is down. What do you do?"
Your approach (fast!):
# Step 1: Find the full filesystem, then its largest directories
df -h
du -sh /* 2>/dev/null | sort -rh | head -10
# Step 2: Find large files
find /var -type f -size +100M -exec ls -lh {} + 2>/dev/null
# Step 3: Check for deleted files still held open (space is freed only when the process closes them)
lsof +L1 | head -20
# Step 4: Safe quick wins
# Truncate (not delete) large logs so the writing process keeps a valid file handle
: > /var/log/large-log-file.log
# Clear package cache
# Debian/Ubuntu
apt-get clean
# RHEL/CentOS
yum clean all
# Remove old kernels (careful: never remove the running one)
# Check the running kernel first
uname -r
# Step 5: Long-term
# Set up log rotation
# Add monitoring alerts at 80%
# Consider LVM for flexibility
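The 80% alert from Step 5 can be prototyped in a few lines before a real monitoring system is in place. This is a minimal sketch: `over_threshold` is a hypothetical helper name, it parses POSIX-format `df -P` output, and it assumes mount points contain no spaces.

```shell
#!/bin/sh
# Print "mountpoint usage%" for every filesystem at or above the threshold.
# Input is `df -P` output: the header line is skipped, column 5 is the
# capacity percentage and column 6 is the mount point.
over_threshold() {
    awk -v t="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= t) print $6, $5
    }'
}

# Live usage, for example from cron:
#   df -P | over_threshold 80

# Canned sample so the sketch runs anywhere:
printf '%s\n' \
    'Filesystem 1024-blocks Used Available Capacity Mounted on' \
    '/dev/sda1 100 99 1 99% /' \
    '/dev/sdb1 100 10 90 10% /data' | over_threshold 80
```

Empty output means nothing crossed the threshold, which makes the function easy to wire into a mail or chat notification.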
Scenario 5: Network Packet Loss
Interviewer: "Users complain of intermittent connectivity. How do you investigate?"
# Step 1: Measure packet loss
ping -c 100 target.com
# Look at packet loss percentage
# Step 2: Continuous monitoring
mtr target.com
# Shows loss at each hop
# Step 3: Check interface errors
ip -s link show eth0
# Look at RX/TX errors, drops
# Step 4: Check for speed/duplex mismatch
ethtool eth0
# Speed and Duplex should match the switch port configuration
# Step 5: Capture packets for analysis
tcpdump -i eth0 -w capture.pcap host target.com
# Analyze with Wireshark
# Step 6: Check for network saturation
sar -n DEV 1 10
# Look at rxkB/s, txkB/s vs interface capacity
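For intermittent problems it helps to record the loss figure from Step 1 repeatedly instead of reading it by eye. The sketch below pulls the percentage out of a `ping` summary line; `packet_loss` is a hypothetical helper name, and the sample line follows the summary format printed by the common Linux `ping`.

```shell
#!/bin/sh
# Extract the packet loss percentage from ping's summary line.
# Prints just the number (for example "3"), or nothing if no summary is found.
packet_loss() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Live usage, logging one timestamped sample per run:
#   echo "$(date '+%F %T') $(ping -c 20 target.com | packet_loss)%" >> loss.log

# Canned sample so the sketch runs without a network:
echo '100 packets transmitted, 97 received, 3% packet loss, time 99123ms' | packet_loss
```

Run on a schedule, a log like this shows whether the loss correlates with time of day or with the traffic peaks seen in Step 6.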
The Troubleshooting Framework
Always use a systematic approach:
1. GATHER INFORMATION
- What changed recently?
- When did it start?
- Who is affected?
2. FORM HYPOTHESIS
- Based on symptoms, what's most likely?
- Prioritize by probability
3. TEST HYPOTHESIS
- Run diagnostic commands
- Check logs
- Verify assumptions
4. IMPLEMENT FIX
- Start with reversible changes
- Document what you changed
5. VERIFY AND MONITOR
- Confirm issue resolved
- Set up monitoring to catch recurrence
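"Document what you changed" is easier when the shell does the note-taking. As one possible approach, the hypothetical wrapper `run_logged` below appends each command and its output to a timestamped incident log; the log file name is an assumption.

```shell
#!/bin/sh
# One log file per session unless LOG_FILE is already set by the caller.
LOG_FILE="${LOG_FILE:-./incident-$(date +%Y%m%d-%H%M%S).log}"

# Run a command, showing its output and appending both the command line
# and the output to the log. The exit status is tee's, not the command's.
run_logged() {
    printf '[%s] $ %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$LOG_FILE"
    "$@" 2>&1 | tee -a "$LOG_FILE"
}

# Example: record which kernel was running during the incident.
run_logged uname -r
```

The resulting file doubles as raw material for the post-incident write-up.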
Pro tip: Always verbalize your thought process in interviews. Interviewers want to see HOW you think, not just the final answer.
You've mastered Linux and networking fundamentals. Next module: CI/CD and Infrastructure as Code, the tools that define modern DevOps.