■ Issue/Symptom : High load on server, not accessible over ssh
■ OS Environment : RHEL 5.5
■ Background Information :
- Infra was running a test.
- The server was intermittently under high load.
- ssh logins were failing:
- [usera@user01lxv ~]$ ssh 10.57XXX
- Password:
- Connection closed by 10.57.XXX
- The console showed "lockd: rejected NSM callback from 7f000001:30001", and NFS was intermittently not working.
- iowait was very high and fluctuating.
- CPU time was dominated by waiting on I/O rather than by user or system work:
$ mpstat -P ALL 1
Linux 2.6.18-128.el5 (xxxxxxx)  11/19/2014

10:17:29 PM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
10:17:30 PM  all    0.00    0.00    0.00   75.00    0.00    0.00    0.00   25.00    182.18
10:17:30 PM    0    0.00    0.00    0.00  100.00    0.00    0.00    0.00    0.00    182.18
10:17:30 PM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
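To pick out the affected CPUs from mpstat output like the above, one option is to filter on the %iowait column. This was not part of the original session; it is a minimal sketch, and the function name high_iowait_cpus and its default threshold of 50 are assumptions made for illustration. It reads the column positions from the header line, so it should not depend on whether the timestamp includes an AM/PM field.

```shell
# high_iowait_cpus [THRESHOLD]
# Reads mpstat output on stdin and prints "CPU %iowait" for each CPU whose
# %iowait exceeds THRESHOLD (hypothetical default: 50).
high_iowait_cpus() {
    awk -v t="${1:-50}" '
        # Header line: record which fields hold the CPU id and %iowait.
        /%iowait/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "CPU") c = i
                if ($i == "%iowait") w = i
            }
            next
        }
        # Data line: skip the "all" summary row, report CPUs above the threshold.
        c && w && NF >= w && $c != "all" && ($w + 0) > (t + 0) { print $c, $w }
    '
}
```

Possible usage: mpstat -P ALL 1 5 | high_iowait_cpus 80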
- top showed a high load average, but no process was using much CPU:
top - 22:18:03 up 50 days, 22:20,  4 users,  load average: 25.19, 26.68, 30.74
Tasks: 235 total,   2 running, 231 sleeping,   0 stopped,   2 zombie
Cpu(s):  2.0%us,  0.8%sy,  0.0%ni,  0.0%id, 96.8%wa,  0.0%hi,  0.4%si,  0.0%st
Mem:   3866480k total,  2916884k used,   949596k free,    12440k buffers
Swap:  8385920k total,   498424k used,  7887496k free,   350200k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
20841 cw        21   0 5606m 2.1g 5040 S  4.0 56.8 291:05.56 /opt/cw/jre/bin/java -Duser.timezone=America/Mexico_City -Xms2560m -Xmx2560m -XX:MaxPermSize=128m
- Many processes were in the "D" (uninterruptible sleep) state, which was not the case on nso-102 or 101:
$ ps aux |awk '{print $1 " " $8 " " $NF }'|grep D
USER STAT COMMAND
root D< [kjournald]
root Ds 0
root Ds /var/run/vmware-guestd.pid
nobody DN /usr/bin/log2mysql-nso-tomcat-writer
nobody DN /usr/bin/log2mysql-nso-tomcat-spooler
root D
- In the output above, the kernel thread kjournald is also in D state, which is a bad sign from the kernel's perspective: journal commits on that filesystem had most likely stalled.
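Note that "grep D" matches any line containing a capital D (the header line matches only because of "COMMAND"), and $NF prints just the last word of each command line. A stricter filter tests the STAT column itself. The following is a minimal sketch; the function name filter_d_state is an assumption, and it expects "USER STAT COMMAND" style input with a header line.

```shell
# filter_d_state
# Reads "USER STAT COMMAND..." lines on stdin; keeps the header line and
# every row whose STAT column (field 2) starts with "D".
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}
```

Possible usage: ps -eo user,stat,args | filter_d_state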
■ Workaround Solution : Shut down the VM and power it on again. [Processes in D state do not respond to signals, including SIGKILL, so they cannot be killed; short of the pending I/O completing, only a reboot clears them.]
■ Permanent Solution : Shut down the VM and power it on again.
■ Root Cause Analysis :
- iowait was high mainly because a large number of processes were stuck in D state, blocked on I/O that was not completing.
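To narrow this down on a future occurrence, it may help to see which kernel wait channel (WCHAN) the D-state processes are blocked in, since that can hint at whether local disk or NFS is involved. This was not done in the original investigation; the sketch below is only an illustration, and the function name d_state_wait_channels is an assumption. It expects two columns, STAT and WCHAN, on stdin.

```shell
# d_state_wait_channels
# Reads "STAT WCHAN" lines on stdin and prints, for D-state processes only,
# a count per wait channel, highest count first.
d_state_wait_channels() {
    awk '$1 ~ /^D/ { n[$2]++ } END { for (k in n) print n[k], k }' | sort -rn
}
```

Possible usage: ps -eo stat,wchan | d_state_wait_channels (the WCHAN names may be truncated to the default column width).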