07-04-2006, 05:07 AM
|
#1
|
Member
Registered: Jan 2006
Posts: 70
Rep:
|
Server hang up and need to find root cause
Hi Guys,
One of our RHEL 4 servers hung last week. My boss is asking me for a possible root cause of the hang. I tried checking /var/log/messages but could not find anything there. There is a gap in the log between the last time the server was still OK and the time it was restarted. Where could I get more information to help me find the reason? A snippet of the logs is below to show what I mean. TIA.
Jul 1 09:08:01 somemachine crond(pam_unix)[28797]: session closed for user someuser
Jul 1 09:08:01 testmachine crond(pam_unix)[28800]: session closed for user someuser
Jul 1 09:45:44 somemachine syslogd 1.4.1: restart.
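A gap like the one in the snippet above can be located mechanically, which helps when comparing several hangs. The following is only a minimal sketch, not part of the original post: it assumes classic syslog lines that start with "Mon DD HH:MM:SS", an English/C locale for month names, and Python 3. The function name `find_gaps`, the 10-minute threshold and the default year are illustrative assumptions.

```python
from datetime import datetime, timedelta


def find_gaps(lines, min_gap=timedelta(minutes=10), year=2006):
    """Return (previous_time, next_time) pairs for consecutive syslog
    lines whose timestamps are at least min_gap apart."""
    gaps = []
    prev = None
    for line in lines:
        # Classic syslog timestamps carry no year, so one is supplied.
        parts = line.split(None, 3)
        if len(parts) < 3:
            continue
        text = "%d %s %s %s" % (year, parts[0], parts[1], parts[2])
        try:
            stamp = datetime.strptime(text, "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a timestamped line, skip it
        if prev is not None and stamp - prev >= min_gap:
            gaps.append((prev, stamp))
        prev = stamp
    return gaps


# The three lines quoted in the post, used here as sample input.
sample = [
    "Jul 1 09:08:01 somemachine crond(pam_unix)[28797]: session closed for user someuser",
    "Jul 1 09:08:01 testmachine crond(pam_unix)[28800]: session closed for user someuser",
    "Jul 1 09:45:44 somemachine syslogd 1.4.1: restart.",
]

for before, after in find_gaps(sample):
    print("no log entries between", before, "and", after)
```

To scan a real log, pass an open file instead of the sample list, for example `find_gaps(open("/var/log/messages"))`. The gap only tells you when logging stopped, not why.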
|
|
|
07-04-2006, 06:34 AM
|
#2
|
Moderator
Registered: May 2001
Posts: 29,415
|
Where is this box located on the network?
Are there any devices in front of it that log info?
Was /etc/syslog.conf changed or does it contain default values?
Are all logins "last" reports accounted for?
Does this box run any SAR?
What services does this box provide and who has access to them?
Have any daemons logged data in the period?
Have these hangs or log blackouts been happening before or not?
Is SW regularly updated?
Does "rpm -Va --noscripts" look OK?
Do you people keep a log of (admin) change reports for the box?
Any other things out of the ordinary you should mention?
|
|
|
07-04-2006, 06:58 AM
|
#3
|
Member
Registered: Jan 2006
Posts: 70
Original Poster
Rep:
|
Where is this box located on the network?
yes, this box is located on a network. it has a public ip and a private ip.
Are there any devices in front of it that log info?
none.
Was /etc/syslog.conf changed or does it contain default values?
it contains the default value.
Are all logins "last" reports accounted for?
yes, all logins are accounted for. there is also an entry where it shows that a user was connected from a certain time to a point where it crashed. e.g.
someuser pts/0 someip Sat Jul 1 09:46 - 10:11 (00:24)
reboot system boot 2.6.9-22.0.1.ELs Sat Jul 1 09:45 (1+21:32)
someuser pts/2 someip Sat Jul 1 06:17 - crash (03:28)
Does this box run any SAR?
no
What services does this box provide and who has access to them?
it only houses db processes; the dba has access to them.
Have any daemons logged data in the period?
none also.
Have these hangs or log blackouts been happening before or not?
it happened 3x already. all have the same scenario where logs are not present.
Is SW regularly updated?
what does SW stand for?
Does "rpm -Va --noscripts" look OK?
how do I know if the result is OK?
Do you people keep a log of (admin) change reports for the box?
no. we do not keep them.
Any other things out of the ordinary you should mention?
none so far.
|
|
|
07-04-2006, 07:35 AM
|
#4
|
Moderator
Registered: May 2001
Posts: 29,415
|
Where is this box located on the network?
yes, this box is located on a network. it has a public ip and a private ip.
Are any services accessible on the public interface? Is the box firewalled on all interfaces?
Was /etc/syslog.conf changed or does it contain default values?
it contains the default value.
* If nothing else helps, it may prove valuable to have processes log more verbosely and catch that with a "*.*" entry in syslog.conf. The downside is that you will need a lot of free disk space and may have to schedule extra logrotate runs for those logs to combat extreme log growth.
Are all logins "last" reports accounted for?
yes, all logins are accounted for. there is also an entry where it shows that a user was connected from a certain time to a point where it crashed. e.g.
someuser pts/0 someip Sat Jul 1 09:46 - 10:11 (00:24)
reboot system boot 2.6.9-22.0.1.ELs Sat Jul 1 09:45 (1+21:32)
someuser pts/2 someip Sat Jul 1 06:17 - crash (03:28)
Is "someuser" human? Is it fine for this user to be there on a Saturday at 06 AM? And is "someip" allowed to access the box? And is "someuser" on pts/0 the same as the earlier entry? * You can have every command issued by a user logged if you wrap their default shell with "rootsh".
Does this box run any SAR?
no
* Then maybe you should. Ideally you should first save default values right after boot to diff against. Look for "Atsar" (maybe DAG has an EL4 .rpm) or "Dstat" (needs Python, its CSV output is easy to chart in OOo) or maybe remote through any SNMP monitoring SW like say Nagios (OK, you need snmpd for that).
What services does this box provide and who has access to them?
it only houses db processes; the dba has access to them.
I imagine the db is being used by applications (on adjacent boxen)? Maybe give more details what db you're running, what it's used by, if it's a recent SW (software) version, any problems encountered in the past with any of it etc, etc.
Have any daemons logged data in the period?
none also.
Can they log verbose?
Have these hangs or log blackouts been happening before or not?
it happened 3x already. all have the same scenario where logs are not present.
Could be anything from deliberate resets to memory leaks to overheating.
At this point there's not enough info to even try to speculate.
The more you log the more chance you have narrowing it down.
Is SW regularly updated?
what does SW stand for?
Software. HW is hardware and "wetware" are "lusers" or human users. Some speak of "meatware" because them admins tend to grind them for lunch but I think that's pushing it too far. Dinner is OK I think ;-p
Does "rpm -Va --noscripts" look OK?
how do I know if the result is OK?
You know because you've opened up "man rpm" and looked for what it reports (S, M, 5, U, G, etc, etc) under "VERIFY OPTIONS"?
Do you people keep a log of (admin) change reports for the box?
no. we do not keep them.
Running servers in a professional environment is all about stability. Anything that "threatens" stability should be investigated, mended and logged. This provides a history of stuff encountered and fixed and is also efficient for sharing information.
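On the "rpm -Va --noscripts" question above: the flag letters listed under "VERIFY OPTIONS" in "man rpm" can be decoded with a few lines of code. This is a rough sketch only, assuming Python 3 and the usual verify line layout (a flag string, an optional one-letter attribute marker such as "c" for config files, then the path). The function name `decode_verify_line` and the sample lines are hypothetical and not output from the poster's box.

```python
# Meaning of each flag character in "rpm -V" output, per the rpm man page.
FLAGS = {
    "S": "file size differs",
    "M": "mode (permissions or file type) differs",
    "5": "digest (MD5 sum) differs",
    "D": "device major/minor number mismatch",
    "L": "readlink path mismatch",
    "U": "user ownership differs",
    "G": "group ownership differs",
    "T": "mtime differs",
}


def decode_verify_line(line):
    """Split one verify line into path, config marker and a list of changes."""
    head, _, rest = line.strip().partition(" ")
    rest = rest.strip()
    marker = ""
    # Optional attribute marker: c=config, d=doc, g=ghost, l=license, r=readme.
    if len(rest) > 2 and rest[1] == " " and rest[0] in "cdglr":
        marker, rest = rest[0], rest[2:].strip()
    if head == "missing":
        changes = ["file is missing"]
    else:
        # "." means the test passed, "?" means it could not be performed.
        changes = [FLAGS[ch] for ch in head if ch in FLAGS]
    return {"path": rest, "config": marker == "c", "changes": changes}


# Hypothetical sample lines, for illustration only.
for sample_line in ["S.5....T c /etc/syslog.conf", "..5.....   /bin/su"]:
    print(decode_verify_line(sample_line))
```

As a rule of thumb, changes to files marked "c" are often just local configuration edits, while a differing digest on a binary deserves a closer look.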
|
|
|
07-04-2006, 08:57 PM
|
#5
|
Member
Registered: Jan 2006
Posts: 70
Original Poster
Rep:
|
@unSpawn,
it is actually an oracle account. the night shift guys are the ones logged on during that time. they have some scripts that they need to run to check availability of server.
|
|
|
07-04-2006, 10:32 PM
|
#6
|
Senior Member
Registered: Mar 2003
Distribution: Fedora
Posts: 3,658
Rep:
|
Moved: This thread is more suitable in the Linux Security forum and has been moved accordingly to help your thread/question get the exposure it deserves.
|
|
|
07-07-2006, 01:39 AM
|
#7
|
Member
Registered: Jan 2006
Posts: 70
Original Poster
Rep:
|
I got logs from our Nagios server showing that it detected a CPU and memory resource shortage around the time syslogd stopped logging. I assume a process used up all the resources. How can I trap which process is doing this? TIA.
|
|
|
07-09-2006, 05:13 PM
|
#8
|
Moderator
Registered: May 2001
Posts: 29,415
|
Check out Atop.
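If atop cannot be installed, a cruder fallback is to record the biggest memory consumers at regular intervals and read the last entries after the next hang. The sketch below is an assumption-laden illustration, not a replacement for atop: it assumes Python 3.7 or later (a stock RHEL 4 install ships a much older Python, so it may need adapting), a procps "ps" that accepts "-eo pid,pcpu,pmem,rss,comm", and the helper names `top_consumers` and `snapshot` are made up for this example.

```python
import subprocess
import time

# Columns: process id, CPU %, memory %, resident set size in kB, command name.
PS_CMD = ["ps", "-eo", "pid,pcpu,pmem,rss,comm"]


def top_consumers(ps_output, count=5):
    """Parse ps output and return the count processes with the largest RSS."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 4)
        if len(fields) != 5:
            continue
        try:
            rows.append({
                "pid": int(fields[0]),
                "cpu": float(fields[1]),
                "mem": float(fields[2]),
                "rss_kb": int(fields[3]),
                "comm": fields[4],
            })
        except ValueError:
            continue  # malformed line, ignore it
    rows.sort(key=lambda row: row["rss_kb"], reverse=True)
    return rows[:count]


def snapshot(logfile, count=5):
    """Append one timestamped entry per top consumer to logfile."""
    result = subprocess.run(PS_CMD, capture_output=True, text=True, check=True)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(logfile, "a") as handle:
        for row in top_consumers(result.stdout, count):
            handle.write("%s pid=%d cpu=%.1f mem=%.1f rss_kb=%d %s\n" % (
                stamp, row["pid"], row["cpu"], row["mem"],
                row["rss_kb"], row["comm"]))
```

Calling `snapshot()` from cron every minute with a log path of your choosing gives a trail to inspect after a hang. Entries written just before the hang may never reach the disk, so treat the last lines as a hint rather than proof.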
Last edited by unSpawn; 07-09-2006 at 05:17 PM.
|
|
|
07-09-2006, 11:03 PM
|
#9
|
Member
Registered: Jan 2006
Posts: 70
Original Poster
Rep:
|
hi unSpawn,
I have already installed atop on our system. I would just like to ask how to configure the /etc/atop/atop.24hours script. Should I put it in the crontab, or should I just modify the "postrotate" parameter in /etc/logrotate.d/pacct? I read the atop man page and got confused by the part about the script files. TIA
|
|
|
07-10-2006, 03:15 AM
|
#10
|
Moderator
Registered: May 2001
Posts: 29,415
|
If installed as RPM you get a crontab file /etc/cron.d/atop.
Change it, or remove it and use /etc/crontab instead.
|
|
|
07-10-2006, 05:15 AM
|
#11
|
Member
Registered: Jan 2006
Posts: 70
Original Poster
Rep:
|
ok..i already found the entry..thanks for your help...
|
|
|
All times are GMT -5. The time now is 02:11 AM.