LinuxQuestions.org
LinuxQuestions.org > Forums > Enterprise Linux Forums > Linux - Enterprise
04-27-2007, 08:11 PM   #1
pxsnet
LQ Newbie
 
Registered: Mar 2004
Posts: 16

Rep: Reputation: 0
RHEL 3 system hangs when invoking "ps"


Hi,
My RH AS 3 (Update 8 with all patches) server hung when I tried to run "ps".

The server runs more than 25,000 processes under medium load.

kernel: 2.4.21-47.0.1.ELhugemem SMP
mem: 16GB
cpu: AMD Opteron(tm) Processor 844 x 4

Any suggestions? Thank you very much!

Last edited by pxsnet; 04-27-2007 at 08:46 PM.
 
04-27-2007, 08:35 PM   #2
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,125

Rep: Reputation: 4120
Quote:
Originally Posted by pxsnet
Any suggestions?
Ummm - don't do it???
What does "hung up" mean? CPU maxed out, disks hammered, swap thrashing, all or none of the above?

What option(s) were you using on the command?
How about "top", does it do likewise?
What about "ps -r"?
 
04-27-2007, 08:50 PM   #3
pxsnet
LQ Newbie
 
Registered: Mar 2004
Posts: 16

Original Poster
Rep: Reputation: 0

Quote:
Originally Posted by syg00
Ummm - don't do it???
What does "hung up" mean? CPU maxed out, disks hammered, swap thrashing, all or none of the above?

What option(s) were you using on the command?
How about "top", does it do likewise?
What about "ps -r"?
Hi, thanks for your reply.

I ran "ps aux" and then the system stopped responding. I can't find out what happened because I cannot log in via either ssh or a tty.

I've no idea if top would do likewise.

Thanks!
 
04-27-2007, 09:52 PM   #4
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,125

Rep: Reputation: 4120
Mmmmm - not good.
With RH you should have sysstat, I would imagine. With the sar (and sadc) components you can get a history of what happens on your system.
Perhaps have a look at that.
 
04-29-2007, 03:53 AM   #5
pxsnet
LQ Newbie
 
Registered: Mar 2004
Posts: 16

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by syg00
Mmmmm - not good.
With RH you should have sysstat, I would imagine. With the sar (and sadc) components you can get a history of what happens on your system.
Perhaps have a look at that.
Hi, I just checked sar's report but found nothing abnormal.
Is it related to some kernel parameter, or to a kernel limit or defect when querying too many processes under /proc?

BTW, a quick question: what's the difference between /proc/pid and /proc/.pid?

thanks!

Last edited by pxsnet; 04-29-2007 at 03:55 AM.
 
04-30-2007, 09:31 PM   #6
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,125

Rep: Reputation: 4120
Quote:
Originally Posted by pxsnet
Is it related to some kernel parameter, or to a kernel limit or defect when querying too many processes under /proc?
The latter I would imagine.
As for the other query, I have no idea - I just see /proc/<pid>
 
05-02-2007, 01:34 PM   #7
rahulk
Member
 
Registered: Mar 2006
Posts: 110

Rep: Reputation: 16
Can you paste the output of the following command from your server?

/usr/bin/strace ps -ef

This will give you an idea of the point at which the ps command hangs. The problem is due to some defunct threads running on the server. Defunct threads are the ones that remain in the system even after their parent processes are killed. I believe it could be due to some kernel module not working correctly.

What is this server used for?
Your ps command hangs because it tries to read some processes from the /proc directory that are zombies.

Rahul Khare.
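
One way to check the zombie theory without ps is to read the state field from each /proc/<pid>/stat directly. The sketch below is only an illustration in modern Python 3 (it would need porting to the much older Python found on a box of that era), and the names parse_stat_line and count_states are made up for the example. A state of Z marks a zombie, meaning an exited process that its parent has not yet reaped, and D marks uninterruptible sleep. It opens the same files ps does, so on an affected box it could stall at the same point.

```python
import os
from collections import Counter


def parse_stat_line(line):
    """Split one /proc/<pid>/stat line into (pid, comm, state).

    The command name sits in parentheses and may itself contain spaces
    or ')', so split on the LAST ')' rather than on whitespace.
    """
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left].strip())
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state


def count_states(proc_root="/proc"):
    """Tally process states (R, S, D, Z, ...) for numeric entries under proc_root."""
    counts = Counter()
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, name, "stat")) as fh:
                _, _, state = parse_stat_line(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or the line was unparsable
        counts[state] += 1
    return counts
```

Calling count_states() with no argument scans the live /proc. An unusually large Z or D count would be worth a closer look, though it would not by itself prove that zombies are what ps trips over.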
 
05-02-2007, 06:09 PM   #8
pxsnet
LQ Newbie
 
Registered: Mar 2004
Posts: 16

Original Poster
Rep: Reputation: 0

Quote:
Originally Posted by rahulk
Can you paste the output of the following command from your server?

/usr/bin/strace ps -ef

This will give you an idea of the point at which the ps command hangs. The problem is due to some defunct threads running on the server. Defunct threads are the ones that remain in the system even after their parent processes are killed. I believe it could be due to some kernel module not working correctly.

What is this server used for?
Your ps command hangs because it tries to read some processes from the /proc directory that are zombies.

Rahul Khare.
Hi, thanks for your reply. This is really a good idea.
However, it's unpredictable when the box hangs. I'm so unlucky that it never hangs when I'm there...
In fact, I've tried to trace the syscalls ps uses.

The server has a critical business use and runs lots of processes (and lots of threads).

And I'd like to ask again about /proc/.pid: how does it differ from the usual /proc/pid?

Thank you all!
 
05-03-2007, 04:43 AM   #9
jharris
Senior Member
 
Registered: May 2001
Location: Bristol, UK
Distribution: Slackware, Fedora, RHES
Posts: 2,243

Rep: Reputation: 47
When you say the box hangs, does the OS actually crash and kernel panic, or does it just lock up or freeze? If you have another RH box on the same LAN, it would be worth setting up netdump to see if that will provide you with a useful memory and syslog dump of the failure. Have you set up netdump in the past? It's *really* easy but fantastic for this kind of problem. Also, if you need to raise a call with Red Hat, you can then provide them with all the details they need to sort the problem out for you, assuming you have support, that is.

Other thoughts - has anything changed recently, no matter how small or 'unrelated'? Updated applications, new network switches, etc.?

As for /proc/pid vs /proc/.pid - I've only got /proc/pid on my 2.4 and 2.6 boxes which is odd.

HTH

Jamie
 
05-03-2007, 06:04 AM   #10
pxsnet
LQ Newbie
 
Registered: Mar 2004
Posts: 16

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by jharris
When you say the box hangs, does the OS actually crash and kernel panic, or does it just lock up or freeze? If you have another RH box on the same LAN, it would be worth setting up netdump to see if that will provide you with a useful memory and syslog dump of the failure. Have you set up netdump in the past? It's *really* easy but fantastic for this kind of problem. Also, if you need to raise a call with Red Hat, you can then provide them with all the details they need to sort the problem out for you, assuming you have support, that is.

Other thoughts - has anything changed recently, no matter how small or 'unrelated'? Updated applications, new network switches, etc.?

As for /proc/pid vs /proc/.pid - I've only got /proc/pid on my 2.4 and 2.6 boxes which is odd.

HTH

Jamie
Hi, I've set up netdump. Hopefully it will help!

Thanks anyway.
 
05-10-2007, 01:23 AM   #11
pxsnet
LQ Newbie
 
Registered: Mar 2004
Posts: 16

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by pxsnet
Hi, I just checked sar's report but found nothing abnormal.
Is it related to some kernel parameter, or to a kernel limit or defect when querying too many processes under /proc?

BTW, a quick question: what's the difference between /proc/pid and /proc/.pid?

thanks!
I think I've got some idea about /proc/.pid.

They are "threads", each belonging to a specific process ID.
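
If the hidden entries really are threads, they can be grouped by the process that owns them. This is a rough sketch, assuming (not confirmed anywhere in this thread) that each /proc/.<tid> entry shows up in a directory listing and contains a status file with a "Tgid:" line, as /proc/<pid>/status does on kernels that report thread group IDs. The function names are hypothetical, and it is written in modern Python 3 for illustration only. On 2.6 and later kernels, threads are listed under /proc/<pid>/task/ instead.

```python
import os
from collections import defaultdict


def read_tgid(status_path):
    """Return the Tgid value from a status file, or None if absent or unreadable."""
    try:
        with open(status_path) as fh:
            for line in fh:
                if line.startswith("Tgid:"):
                    return int(line.split()[1])
    except (OSError, ValueError, IndexError):
        pass
    return None


def hidden_threads_by_process(proc_root="/proc"):
    """Map thread group ID -> sorted thread IDs found as '.<tid>' entries.

    ASSUMPTION: threads appear as dot-prefixed numeric directories and
    each one has a 'status' file carrying a 'Tgid:' line.
    """
    groups = defaultdict(list)
    for name in os.listdir(proc_root):
        if not (name.startswith(".") and name[1:].isdigit()):
            continue  # only look at entries shaped like ".1234"
        tgid = read_tgid(os.path.join(proc_root, name, "status"))
        if tgid is not None:
            groups[tgid].append(int(name[1:]))
    return {tgid: sorted(tids) for tgid, tids in groups.items()}
```

If the result maps each process ID to several thread IDs, that would support the "threads" reading. An empty result only means no such dot entries were found under the given root.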
 
06-02-2007, 08:33 PM   #12
pxsnet
LQ Newbie
 
Registered: Mar 2004
Posts: 16

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by pxsnet
Hi,
My RH AS 3 (Update 8 with all patches) server hung when I tried to run "ps".

The server runs more than 25,000 processes under medium load.

kernel: 2.4.21-47.0.1.ELhugemem SMP
mem: 16GB
cpu: AMD Opteron(tm) Processor 844 x 4

Any suggestions? Thank you very much!

It seems that "ps" stopped while reading one particular process's status file (/proc/pid/stat).

...
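
To narrow down which entry is the culprit, each stat file can be read in a throwaway child process with a timeout, so the scanning process itself never blocks on the read. This is a minimal sketch in modern Python 3, not a tested fix for this box. The names probe and find_stuck_stat_files and the 2-second timeout are illustrative choices, not anything from the thread. A child stuck inside the kernel may ignore the kill, so the parent deliberately does not wait for it afterwards.

```python
import os
import subprocess
import sys

# Tiny program run in the child: read the whole file named on the command line.
READER = "import sys; open(sys.argv[1], 'rb').read()"


def probe(path, timeout=2.0):
    """Read `path` in a child process; return 'ok', 'error', or 'timeout'."""
    child = subprocess.Popen(
        [sys.executable, "-c", READER, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        rc = child.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        child.kill()  # best effort; do not wait, the child may never exit
        return "timeout"
    return "ok" if rc == 0 else "error"


def find_stuck_stat_files(proc_root="/proc", timeout=2.0):
    """Return the stat paths under proc_root whose read did not finish in time."""
    stuck = []
    for name in sorted(os.listdir(proc_root)):
        if not name.isdigit():
            continue  # only numeric entries are processes
        path = os.path.join(proc_root, name, "stat")
        if probe(path, timeout) == "timeout":
            stuck.append(path)
    return stuck
```

Starting one interpreter per entry is slow, so with tens of thousands of processes it would make sense to probe a suspected range of PIDs rather than the whole of /proc.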
 
  

