Problem definition:
We have been facing a server hang problem for the past 3 months. We have analyzed all the services we run and the server logs in /var/log/, but couldn't find the cause. We are manually rebooting the server to recover it from the hung state.
Action taken:
We have analyzed all the system logs and application logs on all our servers, but we haven't found any consistent pattern of messages in the system logs. We record memory usage with the top command every 15 minutes, and sufficient memory was still free before the server went into the hung state.
System configuration:
Red Hat Enterprise Linux ES release 3 (Taroon)
Kernel: 2.4.21-40.EL
Postgres: 7.3.8-2
Redhat Cluster Manager: 1.2.28
RAM: 2GB
Server: HP ML 370 G3, DL 760 G2
Please let me know in which scenarios a server can get into a hung state, and what we need to check to fix the server hang problem.
I think another good idea is to set up a crash script. Make it run every 10 seconds, or at whatever interval you think is appropriate, and have it report the full system status, e.g. df, top, netstat, open connections, ps, etc. Have the system send out the alerts via mail. This should help a bit more than just looking at the logs.
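A minimal sketch of such a status script follows. The log path /var/log/status-snapshot.log and the LOG_FILE and MAIL_TO variables are hypothetical choices, not something from the thread. Each tool is run only if it is installed, and mail is sent only when MAIL_TO is set and a mail command exists.

```shell
#!/bin/bash
# Hypothetical status-snapshot script (sketch, not from the thread).
# Appends one timestamped snapshot of system state to a log file and,
# if MAIL_TO is set and `mail` exists, mails the same snapshot.

LOG_FILE="${LOG_FILE:-/var/log/status-snapshot.log}"   # assumed path
MAIL_TO="${MAIL_TO:-}"                                  # empty = no mail

take_snapshot() {
    echo "=== snapshot $(date '+%Y-%m-%d %H:%M:%S') ==="
    # Run each tool only if it is installed, so one missing tool
    # does not stop the rest of the report.
    for cmd in "df -h" "free -m" "top -b -n 1" "ps aux" "netstat -an"; do
        tool=${cmd%% *}                  # first word = program name
        if command -v "$tool" >/dev/null 2>&1; then
            echo "--- $cmd ---"
            $cmd 2>&1                    # unquoted on purpose: split args
        fi
    done
    return 0
}

main() {
    local snapshot
    snapshot=$(take_snapshot)
    # Keep a local record; warn instead of aborting if it is not writable.
    if ! printf '%s\n' "$snapshot" >> "$LOG_FILE"; then
        echo "warning: cannot write $LOG_FILE" >&2
    fi
    if [ -n "$MAIL_TO" ] && command -v mail >/dev/null 2>&1; then
        printf '%s\n' "$snapshot" | mail -s "status snapshot from $(hostname)" "$MAIL_TO"
    fi
}

main "$@"
```

Cron cannot schedule anything more often than once a minute, so a 10-second interval needs a loop, for example running while true; do /usr/local/bin/status-snapshot.sh; sleep 10; done in the background (the script path is hypothetical). On a hard hang the last snapshot may never reach the disk or the mail server, so treat the final entries as a guide rather than a guaranteed record, and rotate the log, since it grows quickly at this rate.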
Install the sysstat package so that you collect performance data.
The default installation collects memory usage, CPU usage, disk I/O, swap usage, and a number of other statistics every 10 minutes. If needed, you can lower this to a 1-minute interval in /etc/cron.d/sysstat.
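One way to make that change is sketched below as a shell function. It assumes the sa1 entry in the cron file starts with a */10 minute field, e.g. "*/10 * * * * root /usr/lib/sa/sa1 1 1"; the exact line and the sa1 path can differ between releases and architectures, so check the file first. The function name set_sa1_every_minute is hypothetical. Keep a copy of the original file outside /etc/cron.d, since cron may treat extra files in that directory as additional schedules.

```shell
# Hypothetical helper: switch the sysstat sa1 collector from every
# 10 minutes to every minute by rewriting its minute field.
# Assumes a line such as: */10 * * * * root /usr/lib/sa/sa1 1 1
set_sa1_every_minute() {
    local cron_file="${1:-/etc/cron.d/sysstat}"
    # Only the line that starts with "*/10" and runs sa1 is changed;
    # "*/10 ... sa1 1 1" becomes "* ... sa1 1 1". Other lines are kept.
    sed -i 's|^\*/10\([[:space:]].*sa1\)|*\1|' "$cron_file"
}
```

Running set_sa1_every_minute with no argument edits /etc/cron.d/sysstat in place (GNU sed -i). A 1-minute interval gives a much finer picture of the minutes before a hang, at the cost of larger daily data files.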
After the server crashes, you can run:
sar -r # memory usage
sar # CPU utilization (like in top)
sar -q # load average and run queue size
sar -n DEV # network interface statistics
sar -b # I/O transfer rates
Those should give you a very good picture of what your server was doing when it hung, as well as any trend leading up to it.
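Run without options, sar reports the current day. To look at the hour before a hang on an earlier day, sar can read a specific daily data file with -f and limit the output with -s and -e. The sketch below assumes the usual sysstat layout of one file per day named /var/log/sa/saDD (DD = day of month) and GNU date; the function names sa_file_for and report_window are hypothetical.

```shell
# Assumed location of the daily sysstat data files.
SA_DIR="${SA_DIR:-/var/log/sa}"

# Hypothetical helper: print the data file for a date (YYYY-MM-DD),
# e.g. 2006-05-07 -> /var/log/sa/sa07.
sa_file_for() {
    echo "$SA_DIR/sa$(date -d "$1" +%d)"
}

# Hypothetical helper: memory, load/run queue, CPU and I/O statistics
# for the window [start, end] (hh:mm:ss) on the given day.
report_window() {
    local day="$1" start="$2" end="$3"
    local file
    file=$(sa_file_for "$day")
    for opt in -r -q -u -b; do
        sar "$opt" -f "$file" -s "$start" -e "$end"
    done
}
```

For example, report_window 2006-05-07 02:00:00 03:30:00 (a made-up date and window) would show the 90 minutes up to the last sample before the hang. Old daily files are cleaned up after some days, so copy the relevant ones away soon after each incident.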
Other than that, we've experienced many of the same problems on some of our machines. It turned out that the running kernel wasn't certified for the processors we were running on, and updating the kernel fixed our issues. Check the release notes for newer kernels to see whether they have added support for your server or processors.
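Before reading the release notes, it helps to know exactly which kernel is running and which CPU models it sees. The sketch below gathers that; the function name kernel_cpu_report is hypothetical, and the package names queried (kernel, kernel-smp) are an assumption, so add whichever kernel variant your systems actually use.

```shell
#!/bin/bash
# Hypothetical report: running kernel, installed kernel packages and
# CPU models, to compare against kernel release notes.
kernel_cpu_report() {
    echo "running kernel: $(uname -r)"
    if command -v rpm >/dev/null 2>&1; then
        echo "installed kernel packages:"
        # Prints "package ... is not installed" for missing variants.
        rpm -q kernel kernel-smp 2>/dev/null || true
    fi
    echo "cpu model(s):"
    # Count identical "model name" lines (x86 /proc/cpuinfo format).
    grep '^model name' /proc/cpuinfo | sort | uniq -c
}

kernel_cpu_report
```

Running this on each affected server (for example the ML370 G3 and the DL760 G2) makes it easy to see whether all of them share the same kernel and processor family.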