LinuxQuestions.org
05-02-2012, 10:21 AM   #1
rovrider
LQ Newbie
 
Registered: May 2012
Posts: 3

Rep: Reputation: Disabled
Tools for investigating server crash (% used CPU (by user) suddenly peaks to 100%)


Hi all,

Once in a while I have a Linux server that suddenly stops responding. The server still answers ICMP requests, but I can't log in via ssh or even the console. In vCenter I can see a CPU alert for this server, but no further info. The only thing left to do is reboot the server.

In /var/log/messages and dmesg I can't find any clues about which process(es) caused the situation; the sar statistics only show that the % used CPU (by user) suddenly peaks.

Question: With sar I can see that CPU utilization rises, but is there a tool I can use to see WHAT processes are causing the rise, or are there other logs to look at? I can create a cron job to periodically dump the process info, but I was wondering whether a complete tool exists for this purpose?
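
For reference, the cron-based fallback mentioned above could look roughly like this. It is a minimal sketch, not a tested production script: the function name snapshot_top_cpu, the script path and the log path are all hypothetical, and it assumes the procps version of ps that ships with RHEL.

```shell
#!/bin/sh
# Sketch only: snapshot_top_cpu and the paths in the crontab line below
# are hypothetical names chosen for illustration.

# Print a timestamp followed by the N processes using the most CPU
# (N defaults to 10). Relies on the --sort option of procps ps.
snapshot_top_cpu() {
    n="${1:-10}"
    date '+%Y-%m-%d %H:%M:%S'
    # One header line plus n process lines, busiest first.
    ps -eo pcpu,pmem,pid,user,args --sort=-pcpu | head -n "$((n + 1))"
}

# Example crontab entry (every minute), assuming this file is saved as a
# script that ends by calling: snapshot_top_cpu 10
# * * * * * /usr/local/bin/cpu-snapshot.sh >> /var/log/cpu-snapshot.log 2>&1
```

Keep in mind that the %CPU column from ps is averaged over the lifetime of each process, so a short spike in a long-running process can be understated. That is one reason a purpose-built recorder is usually the better choice.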

Background:
Redhat-release: Red Hat Enterprise Linux Server release 5.6 (Tikanga)
Mem: 8 GB
(v)CPU: 2
2.6.18-238.1.1.el5
x86_64

Regards,
Robbert
 
05-02-2012, 12:02 PM   #2
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3600
Don't the VMware logs show anything? Shouldn't you make the client save logs using a remote syslog server? Atop allows you to save process and memory state at intervals (choose wisely, as the default of 5 minutes may be too long) and to replay and step through samples. Dstat and collectl (and mentioning collectl infallibly summons its developer for further comments ;-p) can save state too, but AFAIK only with Atop will you be able to see which process, for how long, and with which command line args.
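
As a rough illustration of the record-and-replay workflow described above, the two wrappers below show the atop options involved. The function names, the example file path and the 30-second default are assumptions made for this sketch; only the -w, -r and -b options come from atop itself.

```shell
#!/bin/sh
# Sketch only: atop_record and atop_replay are hypothetical wrapper names.

# Record samples into raw file $1 every $2 seconds (30 if not given).
# Runs until interrupted, so start it in the background or from an init script.
atop_record() {
    atop -w "$1" "${2:-30}"
}

# Replay raw file $1, starting at time $2 (hh:mm).
# In the replay view, 't' steps to the next sample and 'T' to the previous one.
atop_replay() {
    atop -r "$1" -b "$2"
}

# Example (hypothetical path and time):
#   atop_record /var/log/atop/manual.raw 30 &
#   atop_replay /var/log/atop/manual.raw 10:15
```

A shorter interval gives a better chance of catching the last sample before the hang, at the cost of a larger raw file.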
 
1 members found this post helpful.
05-03-2012, 03:03 AM   #3
rovrider
LQ Newbie
 
Registered: May 2012
Posts: 3

Original Poster
Rep: Reputation: Disabled
Thanks unSpawn, I just installed the atop-1.23-1 rpm on a test server. I think this will give me useful info in the future.
We also use a central syslog server, but the issue didn't leave any clues there. I tried to download the vmware.log of the specific server from the datastore, but this seems to fail because the file is still open. Anyway thanks for the help!
Regards, Robbert
 
05-03-2012, 04:14 AM   #4
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3600
Quote:
Originally Posted by rovrider
I tried to download the vmware.log of the specific server from the datastore, but this seems to fail because the file is still open.
Maybe cp or cat the file to a separate one first and then download it?
 
05-04-2012, 09:51 AM   #5
markseger
Member
 
Registered: Jul 2003
Posts: 244

Rep: Reputation: 26
unSpawn - now that you summoned me I guess I have to respond

using collectl to find which processes were running at the time of the crash is trivial, assuming collectl was running at the time. all you need to do is:

collectl -p /var/log/logfilename --top

and you'll see the top 10 processes sorted by cpu load. you can select a timeframe using --from and even change the sort criteria or the number of top processes to display.

just keep in mind that by default collectl only looks at process data every minute to keep the overhead down. if this is not granular enough you can always change it.
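
A small sketch of the time-window playback mentioned above, for narrowing the replay to the minutes before the hang. The wrapper name collectl_top_between is hypothetical, and pairing --from with --thru and plain hh:mm times is an assumption based on the collectl playback options; check the man page for your version.

```shell
#!/bin/sh
# Sketch only: collectl_top_between is a hypothetical wrapper name.

# Play back recorded file $1 in --top format, limited to the window
# between time $2 and time $3 (hh:mm).
collectl_top_between() {
    collectl -p "$1" --top --from "$2" --thru "$3"
}

# Example (same placeholder file name as in the command above):
#   collectl_top_between /var/log/logfilename 14:00 14:20
```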

-mark
 
1 members found this post helpful.
05-07-2012, 02:32 PM   #6
rovrider
LQ Newbie
 
Registered: May 2012
Posts: 3

Original Poster
Rep: Reputation: Disabled
Hi Mark (Seger), thanks for your reply. I took a look at collectl and I think it will also help me. For now I'm staying with atop, waiting for the next crash :-)

Regards, Robbert
 
  

