Tools for investigating a server crash (user CPU usage suddenly peaks at 100%)
Hi all,
Once in a while one of my Linux servers suddenly stops responding. The server still answers ICMP (ping) requests, but I can't log in via SSH or even on the console. In vCenter I can see a CPU alert for this server, but no further information. The only thing left to do is to reboot it.
I can't find any clues in /var/log/messages or dmesg about which process(es) caused this; the sar statistics only show that user CPU (%user) suddenly peaks.
Question: sar shows me that CPU utilization rises, but is there a tool I can use to see WHICH processes are causing the rise, or other logs I should look at? I could create a cron job to periodically dump process info, but I was wondering whether a complete tool exists for this purpose.
Background:
Redhat-release: Red Hat Enterprise Linux Server release 5.6 (Tikanga)
Mem: 8 GB
(v)CPU: 2
Kernel: 2.6.18-238.1.1.el5
Arch: x86_64
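For the cron approach mentioned in the question, a minimal sketch could look like the script below. It is a hypothetical helper, not part of any package: the file name ps-snapshot.sh, the log path, and the default of 15 processes are all assumptions. It relies on the procps `ps` shipped with RHEL (`-eo` and `--sort`) and simply appends a timestamped list of the top CPU consumers, with full command lines, each time it runs.

```shell
#!/bin/sh
# ps-snapshot.sh (hypothetical name): append the top CPU consumers, with a
# timestamp, to the log file given as the first argument.

snapshot_top_procs() {
    logfile="$1"
    count="${2:-15}"   # number of processes to keep per sample (assumed default)
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
        # %CPU from ps is the CPU time divided by the process's lifetime, not
        # an instantaneous value, so a process that only just started spinning
        # may rank lower than it would in top. "args" shows the command line.
        # head keeps the ps header line plus $count processes.
        ps -eo pid,user,pcpu,pmem,etime,args --sort=-pcpu | head -n "$((count + 1))"
    } >> "$logfile"
}

# Do nothing unless a log path is given, so the file can also be sourced.
if [ -n "${1:-}" ]; then
    snapshot_top_procs "$1" "${2:-}"
fi
```

A crontab entry such as `* * * * * /usr/local/bin/ps-snapshot.sh /var/log/ps-snapshot.log` would take one sample per minute, which is the finest granularity cron offers. After a hang, the last block before the reboot is the one to look at; remember to rotate the log, since it grows without bound.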
Don't the VMware logs show anything? Shouldn't you have the client send its logs to a remote syslog server? Atop can save process and memory state at intervals (choose the interval wisely, as the default of 5 minutes may be too long) and lets you replay and step through the samples. Dstat and collectl (and mentioning collectl infallibly summons its developer for further comments ;-p) can save state too, but AFAIK only with Atop will you be able to see which process ran, for how long, and with which command-line args.
Thanks unSpawn, I just installed the atop-1.23-1 RPM on a test server. I think this will give me useful info in the future.
We also use a central syslog server, but the issue didn't leave any clues there. I tried to download the server's vmware.log from the datastore, but that seems to fail because the file is still open. Anyway, thanks for the help!
Regards, Robbert
unSpawn - now that you've summoned me, I guess I have to respond.
Using collectl to find which processes were running at the time of the crash is trivial, assuming collectl was running at the time. All you need to do is:
collectl -p /var/log/logfilename --top
and you'll see the top 10 processes sorted by CPU load. You can select a time frame using --from, and even change the sort criteria or the number of top processes to display.
Just keep in mind that by default collectl only samples process data once a minute, to keep the overhead down. If that isn't granular enough, you can always change it.
Hi Mark (Seger), thanks for your reply. I took a look at collectl and I think it will also help me. For now I'm staying with atop, waiting for the next crash :-)