|
|
05-14-2006, 02:59 PM
|
#1
|
Member
Registered: Feb 2002
Posts: 322
Rep:
|
Server hangs, nothing adds up
I have a SUSE Linux server, usually rock-solid, that suddenly started locking up every day, and I'm having a hard time determining what the problem is.
The first thing that happens is that services are no longer available... network communication "times out". So, I go to the server console to find that the display is black and won't respond to input. I can't even switch to another Bash console. However, the machine still responds to ping packets (with good ping response times).
The Caps-Lock and Scroll Lock appear to work (LEDs on the keyboard toggle state when the keys are pressed).
So, I have to hard-power the machine off, and then power it back on. Everything starts appropriately. But within 24 hours or so, this happens again.
So I started checking my logs, particularly /var/log/messages. I noticed that syslog keeps logging throughout the entire "down" period, while the server is unresponsive.
I suppose it could be from something I've done or changed, but I can't think of anything that would cause this. I don't "tinker" with my servers... I typically set them up (install) and then leave them alone.
Here is a typical excerpt from the logs (/var/log/messages):
May 14 12:26:55 fs2 kernel: lowmem_reserve[]: 0 0 0
May 14 12:26:55 fs2 kernel: Node 0 DMA: 0*4kB 1*8kB 1*16kB 1*32kB 0*64kB 1*128kB 1*256kB 1*512kB
May 14 12:26:55 fs2 kernel: Node 0 Normal: 9*4kB 24*8kB 12*16kB 2*32kB 0*64kB 1*128kB 1*256kB 0*5
May 14 12:26:55 fs2 kernel: Node 0 HighMem: empty
May 14 12:26:55 fs2 kernel: Swap cache: add 1109627, delete 1109624, find 243647/321774, race 1+2
May 14 12:26:55 fs2 kernel: Free swap: 0kB
May 14 12:26:55 fs2 kernel: 259724 pages of RAM
May 14 12:26:55 fs2 kernel: 6632 reserved pages
May 14 12:26:55 fs2 kernel: 14525 pages shared
May 14 12:26:55 fs2 kernel: 3 pages swap cached
May 14 12:26:55 fs2 kernel: Out of Memory: Killed process 12545 (httpd2-prefork).
May 14 12:26:55 fs2 kernel: iptables in DROP IN=eth1 OUT= MAC=00:0a:5e:3d:13:f9:00:12:80:32:a1:8
May 14 12:26:55 fs2 kernel: Mem-info:
May 14 12:26:55 fs2 kernel: Node 0 DMA per-cpu:
May 14 12:26:55 fs2 kernel: cpu 0 hot: low 2, high 6, batch 1
May 14 12:26:55 fs2 kernel: cpu 0 cold: low 0, high 2, batch 1
May 14 12:26:55 fs2 kernel: cpu 1 hot: low 2, high 6, batch 1
May 14 12:26:55 fs2 kernel: cpu 1 cold: low 0, high 2, batch 1
May 14 12:26:55 fs2 kernel: Node 0 Normal per-cpu:
May 14 12:26:55 fs2 kernel: cpu 0 hot: low 62, high 186, batch 31
May 14 12:26:55 fs2 kernel: cpu 0 cold: low 0, high 62, batch 31
May 14 12:26:55 fs2 kernel: cpu 1 hot: low 62, high 186, batch 31
May 14 12:26:55 fs2 kernel: cpu 1 cold: low 0, high 62, batch 31
May 14 12:26:55 fs2 kernel: Node 0 HighMem per-cpu: empty
May 14 12:26:55 fs2 kernel:
May 14 12:26:55 fs2 kernel: Free pages: 6016kB (0kB HighMem)
This is a Dell PowerEdge server, with an Intel EM64T 3.4 GHz processor and 1 GB of RAM. It has a SATA 150 RAID 1 configuration (2x160GB drives via a 3Ware/AMCC Escalade RAID card). The Linux kernel version is 2.6.5-7.252-smp. It serves web (apache2), ftp (proftpd), email (cyrus/postfix), file (samba), etc. No, I no longer have a service contract with Dell (we couldn't afford to renew it, besides the fact they don't support SUSE Linux on this PowerEdge server anyway).
I ran clamscan, and it reports the machine is clean. I have plenty of spare disk space, and the machine (while it is responsive) averages 30-60MB free RAM.
What could be doing this? Are there other logs I should be checking?
Any help would be most appreciated, as this is a production server!
|
|
|
05-14-2006, 05:53 PM
|
#2
|
LQ Guru
Registered: Jan 2002
Posts: 6,042
Rep:
|
Probably the power supply is going bad. Also use Knoppix or any other Linux live distribution to scan your installation for rootkits. The 2.6.5 kernel is very old and vulnerable to network and other attacks. I suggest upgrading to 2.6.12 or higher. Search the NSA web site for vulnerabilities in the services that you are running.
|
|
|
05-14-2006, 06:02 PM
|
#3
|
Member
Registered: Feb 2002
Posts: 322
Original Poster
Rep:
|
Why would you say the power supply is going bad?
|
|
|
05-15-2006, 12:47 AM
|
#4
|
LQ Guru
Registered: Jan 2002
Posts: 6,042
Rep:
|
If you have done a rootkit scan, updated the kernel, and run memtest86, and you still have problems, suspect the power supply: it is the number one cause of computer-related problems. An Intel 3.4 GHz processor uses a lot more electricity than AMD's top-of-the-line processors. Power supplies get worse over time. Unfortunately, Dell systems use non-standard parts, so you may want to reconsider paying for their service to replace the hardware.
|
|
|
05-15-2006, 01:48 AM
|
#5
|
Member
Registered: Feb 2002
Posts: 322
Original Poster
Rep:
|
After a great deal of digging, I have come to the conclusion that the machine is locking up when the swap fills to 100%. There is no available ram, no available swap... and the computer freezes.
Well, like I said earlier, some processes continue (i.e., syslog), and the keyboard responds, but the computer won't let you log in (the prompt hangs after you've typed your username and hit enter), and none of the network services are available.
As long as swap doesn't run out, nothing goes bad. I have a 2GB swap partition.
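One way to watch for this condition before the machine freezes is to read the swap counters from /proc/meminfo. The following is only a rough sketch: the function names are made up for illustration, and it assumes the usual "SwapTotal:" and "SwapFree:" fields reported in kB.

```python
def parse_meminfo(text):
    """Return /proc/meminfo fields as a dict of integers (values in kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def swap_used_percent(fields):
    """Percentage of swap in use; 0.0 when no swap is configured."""
    total = fields.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return 100.0 * (total - fields.get("SwapFree", 0)) / total


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
    except OSError as exc:
        print("cannot read /proc/meminfo:", exc)
    else:
        print("swap used: %.1f%%" % swap_used_percent(info))
```

Run from cron every few minutes and appended to a file, this would show how quickly swap fills up and at what time of day it starts.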
Based on normal load (my baseline), the server rarely ever touches swap. I usually have 30 to 50 MB of free RAM. The fact that something is sucking up so much swap tells me that either (1) I have one or more processes that are bringing the server to its knees, or (2) I have a virus.
What would you suggest for rootkit scanning? All I have is clam antivirus, which takes forever.
Again, help is appreciated...
|
|
|
05-15-2006, 02:19 AM
|
#6
|
LQ Guru
Registered: Jan 2002
Posts: 6,042
Rep:
|
Reconfigure Apache so that it does not open a lot of threads, to minimize its memory usage. Also use sysctl to tune memory usage. Set up a cron script that creates additional swap when needed.
I should have noticed the error message "May 14 12:26:55 fs2 kernel: Out of Memory: Killed process 12545 (httpd2-prefork)."
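Messages like the one quoted above can be pulled out of a long log with a short script. This is a sketch that assumes the exact wording shown in this thread ("Out of Memory: Killed process PID (NAME)"); other kernel versions word the message differently, so the pattern would need adjusting. The function names are illustrative.

```python
import re
import sys
from collections import Counter

# Pattern for the message format quoted in this thread (an assumption for other kernels).
OOM_RE = re.compile(r"Out of Memory: Killed process (\d+) \(([^)]+)\)")


def oom_kills(lines):
    """Yield (pid, process_name) for every OOM-killer line found."""
    for line in lines:
        match = OOM_RE.search(line)
        if match:
            yield int(match.group(1)), match.group(2)


def count_by_name(lines):
    """Count how many times each process name was killed."""
    return Counter(name for _pid, name in oom_kills(lines))


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    try:
        with open(path, errors="replace") as log:
            for name, kills in count_by_name(log).most_common():
                print(kills, name)
    except OSError as exc:
        print("cannot read", path, "-", exc)
```

Keep in mind that the process that gets killed is not necessarily the one using the memory, so the counts only show the victims.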
|
|
|
05-15-2006, 07:47 AM
|
#7
|
LQ Newbie
Registered: Jul 2003
Posts: 28
Rep:
|
Seems like a memory leak somewhere....
I had a problem with a server a while ago similar to yours.
It turned out that it was saslauthd that had a major leak and it would eat up all the memory from the machine until it was OOM killed.
I would let the machine run for a while, then use top and sort by memory usage. That should pinpoint the culprit straight away.
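In top, pressing Shift+M sorts the list by memory usage. If you would rather log the result over time, the same information can be read from /proc. The sketch below assumes each /proc/PID/status file has "Name:" and "VmRSS:" lines (kernel threads have no VmRSS and are reported as 0); the function names are made up for illustration.

```python
import os


def parse_status(text):
    """Extract (name, rss_kb) from the text of a /proc/<pid>/status file."""
    name, rss_kb = "?", 0
    for line in text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("VmRSS:"):
            rss_kb = int(line.split()[1])
    return name, rss_kb


def top_by_rss(limit=10, proc="/proc"):
    """Return up to `limit` (rss_kb, pid, name) tuples, largest resident set first."""
    rows = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc, entry, "status")) as f:
                name, rss_kb = parse_status(f.read())
        except OSError:
            continue  # the process exited while we were reading
        rows.append((rss_kb, int(entry), name))
    rows.sort(reverse=True)
    return rows[:limit]


if __name__ == "__main__":
    if os.path.isdir("/proc"):
        for rss_kb, pid, name in top_by_rss():
            print("%8d kB  %6d  %s" % (rss_kb, pid, name))
```

Resident memory is only part of the picture when swap is filling up, so treat the output as a pointer to the likely culprit rather than proof.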
|
|
|
05-15-2006, 08:32 AM
|
#8
|
Member
Registered: Sep 2004
Location: NJ
Distribution: Gentoo
Posts: 104
Rep:
|
RKHUNTER http://www.rootkit.nl/ is a good rootkit scanner if you want to look into the possibility of an exploit.
|
|
|
05-15-2006, 10:47 AM
|
#9
|
Member
Registered: Feb 2002
Posts: 322
Original Poster
Rep:
|
First of all, thanks for the tips!
Second, I have done further research, and it is quite evident that clamav (aka clamscan) is the one hogging all the memory/swap. It is scanning archives, and I think that is what is killing it.
I think apache, because of its thread/memory usage, is the one getting killed as a result.
I've never used sysctl, so I don't know what it is all about.
I also don't know why clamav suddenly started doing this. I've had it scanning archives on a weekly basis for some time now (months). I'm unsure as to why it is giving me grief now.
|
|
|
05-15-2006, 11:11 AM
|
#10
|
Member
Registered: Sep 2004
Location: NJ
Distribution: Gentoo
Posts: 104
Rep:
|
Maybe there is an update that fixes a bug.
|
|
|
05-15-2006, 01:05 PM
|
#11
|
Member
Registered: Feb 2002
Posts: 322
Original Poster
Rep:
|
Stupid me... after doing even further research, I found that I had zip, rar and arj archive scanning enabled. I also had a 3 GB tar (backup of a home directory on a linux workstation) on the server, and I think it was trying to process that, chewed up swap and killed the machine.
I have turned off archive scanning for the time being, to see if the server locks up again. We will see if it hangs within the next 24 hours. If it hangs again, then I guess I have another problem...
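To find out whether other oversized archives are sitting on the server, something like the following could list them. It is a sketch only: the suffix list and the 1 GB default threshold are arbitrary choices for illustration, not anything ClamAV itself uses.

```python
import os
import sys

# Illustrative list of archive-like suffixes; adjust to taste.
ARCHIVE_SUFFIXES = (".tar", ".tar.gz", ".tgz", ".zip", ".rar", ".arj")


def large_archives(root, min_bytes=1024 ** 3):
    """Return (size, path) for archive-like files of at least min_bytes, largest first."""
    found = []
    for dirpath, _dirs, files in os.walk(root):
        for fname in files:
            if not fname.lower().endswith(ARCHIVE_SUFFIXES):
                continue
            path = os.path.join(dirpath, fname)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable
            if size >= min_bytes:
                found.append((size, path))
    found.sort(reverse=True)
    return found


if __name__ == "__main__":
    if len(sys.argv) > 1:
        for size, path in large_archives(sys.argv[1]):
            print("%6.1f GB  %s" % (size / 1024 ** 3, path))
    else:
        print("usage: pass the directory to search as the first argument")
```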
Thanks thus far for everyone's help... it is most appreciated!
|
|
|
05-15-2006, 05:32 PM
|
#12
|
LQ Newbie
Registered: Apr 2005
Location: Barbados
Distribution: Mandrake/Mandriva
Posts: 15
Rep:
|
Is there any way that you can exempt the compressed file or directory from the scan?
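For what it's worth, the clamscan man page documents --exclude=REGEX and --exclude-dir=REGEX options for skipping files and directories by name; check man clamscan on your installed version before relying on them. The sketch below only builds and prints such a command. The paths and patterns in it are hypothetical examples, not taken from the server in this thread.

```python
import shlex


def clamscan_command(target, skip_dirs=(), skip_files=()):
    """Build a clamscan argument list; skip_dirs and skip_files are regexes passed through as-is."""
    cmd = ["clamscan", "--recursive", "--infected"]
    for pattern in skip_dirs:
        cmd.append("--exclude-dir=" + pattern)
    for pattern in skip_files:
        cmd.append("--exclude=" + pattern)
    cmd.append(target)
    return cmd


if __name__ == "__main__":
    # Hypothetical example: scan /home but skip a backups directory and any .tar files.
    cmd = clamscan_command("/home", skip_dirs=["^/home/backups"], skip_files=[r"\.tar$"])
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Printing the command first makes it easy to check the quoting by eye before putting it into a cron job.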
|
|
|
All times are GMT -5.