LinuxQuestions.org
Old 03-29-2011, 07:16 AM   #1
ethanole
LQ Newbie
 
Registered: Mar 2011
Distribution: Gentoo, Fedora, Ubuntu, Debian, Redhat, Mandrake
Posts: 8

Rep: Reputation: Disabled
Something eats up my whole RAM causing kernel panic!


Hello!

I'm a newbie here, so please excuse any misunderstandings on my part.

This issue is NOT related to "something eats my RAM, how can I disable system caching" or "ooh man, how can I lower my RAM usage in Linux" threads discussed often here...

I've posted this question to the Newbie forum because it is a general linux question, and it is for linux-newbies...


So the current status:
We have an (almost) newly built Gentoo file server, with kernel version 2.6.34-gentoo-r2, some RAID10 mdadm arrays, using samba 3.5.4, postfix 2.6.7, postgresql-server 8.4.4-r1, apache(2) 2.2.15 with php 5.2.13 and mysql 5.1, and finally openldap 2.4.21 with phpldapadmin 1.2.0.4-r1.

It ran for about half a year without any severe issues (only occasional filesystem checks were needed, as usual). Now it has started throwing kernel panics about twice a month. We checked the logs and found that something eats the RAM, and then the server begins to swap. Once it starts swapping we cannot control it, because it no longer responds to ssh or console requests (so we can't log in). In the logs it always cries for memory, but it has 8G of RAM with 2G of swap space (ok, ok, I know what the first rule is, but if a file server begins to swap 8G of active processes to disk, that IS a severe issue - this is not the question)...
During the swapping the box was still able to do an emergency sync (that was one of the first things we tried). Of course, after some time the swap fills up as well and then comes the kernel panic... The swapping happens pretty fast: it takes about 10-20 minutes before the kernel panic.

We have tested all the hardware many times (every RAM module, every CPU, every disk and the PSU, each in separate computers). All tests showed good results, but the kernel panics still occur about twice a month, so I suspect it is software related.
(We have tested the hardware in place, too.)

Both Postgres and MySQL hold only small tables, which consume about 500M in total. Samba has about 100 users; the main LDAP is on another server and there is only a cache here; Apache serves our intranet, which has about 100 users and a site of about 100M...


Now my question is:

Which tools are capable of finding the offending process? We have used top, htop, sar, iostat, dstat and lsof to look for any "bad" processes, but we haven't found any (or we are missing something).

How can we determine the "wrong" process? I have a suspect, but I don't want to name it yet.



eth

ps: it should be a 24/7 server.
 
Old 03-29-2011, 08:50 AM   #2
bigrigdriver
LQ Addict
 
Registered: Jul 2002
Location: East Central Illinois, USA
Distribution: Debian stable
Posts: 5,889

Rep: Reputation: 351
When you say the kernel panics occur about twice a month, does that mean the panics are cyclic in nature? Do they occur at approximately two week intervals?

If so, I suggest the possibility that temp files are not being deleted and slowly consuming RAM, or, you are using a tmpfs without size limits, which will grow to consume RAM.
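To check the tmpfs idea, it helps to first see which tmpfs file systems are mounted at all. The following is a minimal sketch, assuming a Linux system with /proc/mounts; the function name tmpfs_mounts is made up for this example.

```shell
#!/bin/sh
# List every tmpfs mount point together with its mount options.
# Reads /proc/mounts by default; another file in the same format
# can be passed as $1 (handy for trying it on a saved copy).
tmpfs_mounts() {
    # /proc/mounts fields: device, mount point, fs type, options, dump, pass
    awk '$3 == "tmpfs" { print $2, $4 }' "${1:-/proc/mounts}"
}

# Example: show size and current usage of each tmpfs mount.
# tmpfs_mounts | while read -r mp opts; do df -h "$mp" | tail -n 1; done
```

With GNU df, `df -h -t tmpfs` gives a similar overview in one step. A tmpfs whose "Used" column keeps growing between checks would be a candidate.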
 
Old 03-29-2011, 09:14 AM   #3
ethanole
LQ Newbie
 
Registered: Mar 2011
Distribution: Gentoo, Fedora, Ubuntu, Debian, Redhat, Mandrake
Posts: 8

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by bigrigdriver
When you say the kernel panics occur about twice a month, does that mean the panics are cyclic in nature? Do they occur at approximately two week intervals?
No, it's irregular. It comes as a surprise each time.

eth
 
Old 03-29-2011, 10:00 AM   #4
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 15,725

Rep: Reputation: 2121
Did you do as suggested, and check for tmpfs? By default a tmpfs will have a maximum allocation of 50% of your RAM. If memory gets short, tmpfs is designed to swap. It might well be a suspect - and its contents will be gone when you reboot, so no evidence.

Otherwise you should be able to track process-level numbers with pidstat, depending on the version of sysstat you have installed.
It's also possible that it's not a userspace problem at all - keep an eye on slabinfo.
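For the kernel side, one rough way to "keep an eye on slabinfo" is to rank the slab caches by approximate size. This is only a sketch, assuming the version 2.x layout of /proc/slabinfo (name, active_objs, num_objs, objsize, ...); the function name slab_top is made up for this example, and reading /proc/slabinfo normally needs root.

```shell
#!/bin/sh
# Print the N largest slab caches (default 10) as "approx_KiB name".
# $1: file in /proc/slabinfo format (default /proc/slabinfo)
# $2: number of caches to show
# The estimate num_objs * objsize ignores per-slab overhead.
slab_top() {
    awk '!/^slabinfo/ && !/^#/ { printf "%d %s\n", $3 * $4 / 1024, $1 }' \
        "${1:-/proc/slabinfo}" |
        sort -rn | head -n "${2:-10}"
}

# Example: log the five biggest caches with a timestamp.
# { date; slab_top /proc/slabinfo 5; } >> slab.log
```

If one cache keeps growing across samples while no process does, the leak is more likely in kernel space than in userspace. For the per-process side, `pidstat -r 300` (where the installed sysstat provides pidstat) reports memory statistics every 300 seconds.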
 
Old 04-01-2011, 03:57 AM   #5
ethanole
LQ Newbie
 
Registered: Mar 2011
Distribution: Gentoo, Fedora, Ubuntu, Debian, Redhat, Mandrake
Posts: 8

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by syg00
Did you do as suggested, and check for tmpfs? By default a tmpfs will have a maximum allocation of 50% of your RAM. If memory gets short, tmpfs is designed to swap. It might well be a suspect - and its contents will be gone when you reboot, so no evidence.
Partly. You were right, it has a capacity of 50% of RAM, but it is using 0 bytes (here is the corresponding df -h output):
Code:
shm                   4,0G     0  4,0G   0% /dev/shm
If only I knew how to get the system to do what it does before the kernel panic... (of course, then we would know what is causing this misbehaviour)

Thank you for the tool suggestions; now I have to find out how to use them just before the server begins swapping... (I think I'll write a small script that can be started in case of emergency...)


Hm... I see, heheh: shm is 4G but the swap space is only 2G, so shm has to be limited to 2G at most (even better if it is limited to 1.5G). Anyway, another general question: which processes are using shm? (lsof | grep shm shows nothing at the moment - I think it could be any process, and I can only identify them while they are actually using shm...)
eth

Last edited by ethanole; 04-01-2011 at 04:11 AM.
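One possible way to answer "which processes are using shm" is to look for files under /dev/shm in each process's memory map. The sketch below assumes the usual /proc/PID/maps layout (path name in the sixth field); the function name shm_users is made up for this example. Run it as root to see every process.

```shell
#!/bin/sh
# Print "PID mapped-file" for every process that has a file under
# /dev/shm mapped into its address space.
# $1: proc root (default /proc); a copy can be passed for experiments.
shm_users() {
    root="${1:-/proc}"
    for maps in "$root"/[0-9]*/maps; do
        [ -r "$maps" ] || continue          # skip unreadable or vanished entries
        pid="${maps#"$root"/}"
        pid="${pid%/maps}"
        awk -v pid="$pid" '$6 ~ /^\/dev\/shm\// { print pid, $6 }' "$maps" 2>/dev/null |
            sort -u
    done
}
```

Note that System V shared memory segments do not show up as files in /dev/shm; `ipcs -m` lists those. To cap the tmpfs itself, something like `mount -o remount,size=2G /dev/shm` (plus a matching size= option in /etc/fstab) should work, but treat the 2G figure as the value from the post above, not a recommendation.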
 
Old 04-01-2011, 04:10 AM   #6
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 15,725

Rep: Reputation: 2121
Trying to pick "just before" is like trying to pick the peak of the stock market - a mug's game. Go get collectl and run it in daemon mode - it'll save all the process-level data you'll ever need.
Or else write a small script such as the following:
Code:
while true ; do date >> saveit.txt ; ps aux --sort -rss  | head >> saveit.txt ; sleep 300 ; done
A quick bit of awk/perl on that file should get you your suspect(s) - if userspace.
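The "quick bit of awk" could look roughly like this. It is a sketch, assuming the log was written by the loop above (a date line followed by `ps aux` output, where field 2 is the PID, field 6 the RSS in KiB and field 11 the command); the function name rss_growth is made up for this example.

```shell
#!/bin/sh
# Summarize the log: for each PID/command pair print
#   growth_KiB min_RSS_KiB max_RSS_KiB PID command
# sorted so that the process whose RSS grew the most comes first.
# $1: log file (default saveit.txt)
rss_growth() {
    awk '
        # Only ps data lines have a numeric PID and RSS; date and header lines are skipped.
        NF >= 11 && $2 ~ /^[0-9]+$/ && $6 ~ /^[0-9]+$/ {
            key = $2 " " $11
            rss = $6 + 0
            if (!(key in min) || rss < min[key]) min[key] = rss
            if (!(key in max) || rss > max[key]) max[key] = rss
        }
        END {
            for (key in max) print max[key] - min[key], min[key], max[key], key
        }
    ' "${1:-saveit.txt}" | sort -rn
}

# Example: show the five fastest-growing processes.
# rss_growth saveit.txt | head -n 5
```

Because `head` keeps only the top entries of each sample, a process appears in the log only once it is already among the largest, so the reported growth is a lower bound.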
 
Old 04-01-2011, 04:36 AM   #7
ethanole
LQ Newbie
 
Registered: Mar 2011
Distribution: Gentoo, Fedora, Ubuntu, Debian, Redhat, Mandrake
Posts: 8

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by syg00
Trying to pick "just before" is like trying to pick the peak of the stock market - a mug's game. Go get collectl and run it in daemon mode - it'll save all the process-level data you'll ever need.
Else write a small script such as the following
Code:
while true ; do date >> saveit.txt ; ps aux --sort -rss  | head >> saveit.txt ; sleep 300 ; done
A quick bit of awk/perl on that file should get you your suspect(s) - if userspace.
Thanks, I'll try it. dstat can do the same, but it is not good enough since its minimum update interval is 1 sec.

eth
 
Old 04-26-2011, 04:00 AM   #8
ethanole
LQ Newbie
 
Registered: Mar 2011
Distribution: Gentoo, Fedora, Ubuntu, Debian, Redhat, Mandrake
Posts: 8

Original Poster
Rep: Reputation: Disabled
Weird... it has not happened since I started this thread. We're still watching that server every moment...

eth
 
Old 09-19-2011, 09:40 AM   #9
ethanole
LQ Newbie
 
Registered: Mar 2011
Distribution: Gentoo, Fedora, Ubuntu, Debian, Redhat, Mandrake
Posts: 8

Original Poster
Rep: Reputation: Disabled
It's gone. The lockups were caused by a misconfigured filesystem.

Thanks to all.

---------- Post added 2011-09-19 at 16:41 ----------

just for rating
 
Old 09-19-2011, 07:48 PM   #10
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.9, Centos 7.3
Posts: 17,374

Rep: Reputation: 2383
Please provide more detail on the solution for others to learn from. We may have the same problem one day...
 
  


All times are GMT -5. The time now is 11:08 PM.
