LinuxQuestions.org

habl 02-01-2007 01:29 PM

Memory leak?
 
Hiya,

I have a problem with my Linux server (Slackware 10.2.0, kernel 2.4.31). I think it has a memory leak. When I run the program free, I get this result:

Code:

root@home:/var/log# free
             total       used       free     shared    buffers     cached
Mem:        904496     836748      67748          0     235344     333820
-/+ buffers/cache:     267584     636912
Swap:      2000084      10404    1989680
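
A minimal sketch of how the figures in this free output relate to each other, assuming (as is usual on Linux) that buffers and page cache are memory the kernel can hand back when applications need it. The function name and dictionary keys are made up for illustration; the numbers are copied from the "Mem:" row above.

```python
def summarize_free(total_kb, used_kb, buffers_kb, cached_kb):
    """Split the 'used' figure into application memory and buffers/cache."""
    reclaimable_kb = buffers_kb + cached_kb
    # This is the "used" column of the "-/+ buffers/cache" row.
    app_used_kb = used_kb - reclaimable_kb
    return {
        "raw_used_pct": 100.0 * used_kb / total_kb,
        "app_used_kb": app_used_kb,
        "app_used_pct": 100.0 * app_used_kb / total_kb,
    }

# Values in KB from the "Mem:" row of the output above.
summary = summarize_free(total_kb=904496, used_kb=836748,
                         buffers_kb=235344, cached_kb=333820)
print(summary["app_used_kb"])              # 267584, same as the -/+ buffers/cache row
print(round(summary["raw_used_pct"], 1))   # 92.5
print(round(summary["app_used_pct"], 1))   # 29.6
```

Read this way, about 30% of RAM was in use outside buffers and cache when this snapshot was taken. The snapshot says nothing about what memory looked like at the moment the kernel started killing processes.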

So it's using around 90% of my memory (most days it's 98% to 99%). Yesterday a lot of processes were killed by the kernel, I guess because memory was full. I checked some log files and found this:

syslog:
Code:

Jan 31 18:03:33 hablserv sshd[3040]: error: Could not get shadow information for NOUSER
Jan 31 18:03:35 hablserv sshd[3042]: error: Could not get shadow information for NOUSER
Jan 31 18:03:47 hablserv kernel: VM: killing process sshd
Jan 31 18:03:50 hablserv kernel: VM: killing process proftpd
Jan 31 18:04:00 hablserv kernel: VM: killing process eggdrop
Jan 31 18:04:12 hablserv kernel: VM: killing process mysqld
Jan 31 18:04:48 hablserv kernel: VM: killing process httpd
Jan 31 18:05:39 hablserv kernel: VM: killing process smbd
Jan 31 18:06:10 hablserv kernel: VM: killing process local
Jan 31 18:06:20 hablserv kernel: VM: killing process irssi
Jan 31 18:07:01 hablserv kernel: VM: killing process eggdrop
Jan 31 18:15:09 hablserv kernel: VM: killing process sshd
Jan 31 18:15:39 hablserv kernel: VM: killing process local
Jan 31 18:16:43 hablserv kernel: VM: killing process nmbd
Jan 31 18:17:27 hablserv kernel: VM: killing process eggdrop
Jan 31 18:17:27 hablserv kernel: VM: killing process httpd
Jan 31 18:17:27 hablserv kernel: VM: killing process psybnc
Jan 31 18:17:49 hablserv kernel: VM: killing process httpd
Jan 31 18:18:11 hablserv kernel: VM: killing process mysqld
Jan 31 18:18:16 hablserv kernel: VM: killing process ircd
Jan 31 18:18:31 hablserv kernel: VM: killing process local
Jan 31 18:18:48 hablserv kernel: VM: killing process pipe
Jan 31 18:18:48 hablserv kernel: VM: killing process stats
Jan 31 18:19:05 hablserv kernel: VM: killing process flush
Jan 31 18:19:07 hablserv kernel: VM: killing process eggdrop
Jan 31 18:19:10 hablserv kernel: VM: killing process trivial-rewrite
Jan 31 18:19:13 hablserv kernel: VM: killing process bot.pl
Jan 31 18:19:22 hablserv kernel: VM: killing process master
Jan 31 18:19:26 hablserv kernel: VM: killing process eggdrop
Jan 31 18:19:44 hablserv kernel: VM: killing process eggdrop
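
A long run of kill messages like the one above is easier to read as a tally. The sketch below counts "VM: killing process" lines per process name; the function name is hypothetical and the sample is just a few lines copied from the excerpt. Note that the process named in such a line is not necessarily the one responsible for the memory pressure.

```python
import re
from collections import Counter

def count_killed(log_text):
    """Count 'VM: killing process NAME' lines per process name."""
    pattern = re.compile(r"kernel: VM: killing process (\S+)")
    return Counter(match.group(1) for match in pattern.finditer(log_text))

# A few lines copied from the syslog excerpt; pass the whole file in practice.
sample = """\
Jan 31 18:03:47 hablserv kernel: VM: killing process sshd
Jan 31 18:03:50 hablserv kernel: VM: killing process proftpd
Jan 31 18:04:00 hablserv kernel: VM: killing process eggdrop
Jan 31 18:06:20 hablserv kernel: VM: killing process irssi
Jan 31 18:07:01 hablserv kernel: VM: killing process eggdrop
Jan 31 18:15:09 hablserv kernel: VM: killing process sshd
"""

for name, kills in count_killed(sample).most_common():
    print(name, kills)
```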

And here is the messages log:
Code:

Jan 31 18:03:35 hablserv sshd[3042]: Invalid user windows from 65.160.227.136
Jan 31 18:03:36 hablserv sshd[3042]: Failed password for invalid user windows from 65.160.227.136 port 42661 ssh2
Jan 31 18:03:46 hablserv kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Jan 31 18:04:35 hablserv last message repeated 8 times
Jan 31 18:04:35 hablserv kernel: __alloc_pages: 0-order allocation failed (gfp=0xf0/0)
Jan 31 18:04:37 hablserv kernel: __alloc_pages: 0-order allocation failed (gfp=0xf0/0)
Jan 31 18:04:47 hablserv kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Jan 31 18:05:35 hablserv kernel: __alloc_pages: 0-order allocation failed (gfp=0xf0/0)
Jan 31 18:05:39 hablserv kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Jan 31 18:06:13 hablserv last message repeated 2 times
Jan 31 18:15:08 hablserv last message repeated 8 times
Jan 31 18:16:28 hablserv last message repeated 2 times
Jan 31 18:17:09 hablserv last message repeated 5 times
Jan 31 18:17:03 hablserv sshd[3064]: Did not receive identification string from 217.120.219.224
Jan 31 18:17:11 hablserv kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Jan 31 18:17:37 hablserv last message repeated 7 times
Jan 31 18:17:39 hablserv kernel: __alloc_pages: 0-order allocation failed (gfp=0xf0/0)
Jan 31 18:17:39 hablserv kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Jan 31 18:18:11 hablserv last message repeated 4 times
Jan 31 18:19:13 hablserv last message repeated 18 times
Jan 31 18:19:19 hablserv kernel: __alloc_pages: 0-order allocation failed (gfp=0xf0/0)
Jan 31 18:19:20 hablserv kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Jan 31 18:19:22 hablserv last message repeated 2 times
Jan 31 18:19:25 hablserv kernel: __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Jan 31 18:19:26 hablserv kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Jan 31 18:19:44 hablserv last message repeated 2 times
Jan 31 18:19:44 hablserv kernel: __alloc_pages: 0-order allocation failed (gfp=0xf0/0)
Jan 31 18:19:44 hablserv kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Jan 31 18:19:44 hablserv kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)

The last thing that happened was some kid trying to log in to my server, without any success. I didn't paste all the lines; there are a lot of them. I'm not sure if this has anything to do with it, because people hammer my sshd a lot. It has never caused a problem and they have never gotten in.

I think it's just a process with a memory leak or something, and of course I would like to know which one it is. Unfortunately I can't find it. After the processes were killed, I started them one by one and checked my memory after each one, but everything looked normal. After approximately 1 hour, my memory was full again.

I tried tools like ps, top and slabtop (output below), but I still haven't found the process which causes this problem.

Does anybody have suggestions for what else I could try?


Here are the outputs of slabtop and ps. I shortened the outputs, by the way; if necessary I can post the complete results.

slabtop output:
Code:

 Active / Total Objects (% used)    : 909093 / 913799 (99.5%)
 Active / Total Slabs (% used)      : 54653 / 54748 (99.8%)
 Active / Total Caches (% used)     : 46 / 65 (70.8%)
 Active / Total Size (% used)       : 200557.89K / 201070.15K (99.7%)
 Minimum / Average / Maximum Object : 0.01K / 0.22K / 128.00K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
319504 319448  99%    0.45K  39938        8    159752K inode_cache
325010 324962  99%    0.11K   9286       36     37144K dentry_cache
142320 142185  99%    0.09K   3558       42     14232K buffer_head
 88931  88832  99%    0.03K    787      128      3148K size-32
 19411  19391  99%    0.06K    329       64      1316K size-64
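
To see at a glance which kernel caches take the most space, the slabtop rows above can be ranked by their CACHE SIZE column. This is only a sketch: the parser assumes the eight-column layout shown here, and the function name is invented for the example.

```python
def parse_slab_rows(text):
    """Return (name, cache_size_kb) pairs, largest cache first."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 8 fields, start with a number, and the
        # CACHE SIZE field (index 6) ends in "K".
        if len(fields) == 8 and fields[0].isdigit() and fields[6].endswith("K"):
            rows.append((fields[7], int(fields[6][:-1])))
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Rows copied from the slabtop output above.
sample = """\
  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
319504 319448  99%    0.45K  39938        8    159752K inode_cache
325010 324962  99%    0.11K   9286       36     37144K dentry_cache
142320 142185  99%    0.09K   3558       42     14232K buffer_head
 88931  88832  99%    0.03K    787      128      3148K size-32
 19411  19391  99%    0.06K    329       64      1316K size-64
"""

rows = parse_slab_rows(sample)
for name, size_kb in rows:
    print(f"{name:14s} {size_kb / 1024:8.1f} MiB")
print(sum(size_kb for _, size_kb in rows))  # 215592 (KB, the five caches listed)
```

In this shortened listing, inode_cache and dentry_cache together account for 196896 KB of the 215592 KB shown.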

ps output:
Code:

root    20324  6928 /usr/sbin/smbd -D            0.3  2736
root    20321  6928 /usr/sbin/smbd -D            0.3  2760
dennis  20698  4160 ./eggdrop                    0.3  2836
dennis  20708  6172 ./stats                      0.3  3016
hans    20459  6892 irssi                        0.3  3528
dennis  20702  5220 ./eggdrop                    0.4  3908
dennis  20689  5316 ./eggdrop                    0.4  3948
root    20106  79624 /usr/sbin/httpd              0.5  5376
nobody  20357  79784 /usr/sbin/httpd              0.6  5528
nobody  20107  79784 /usr/sbin/httpd              0.6  5540
nobody  20109  79784 /usr/sbin/httpd              0.6  5540
nobody  20110  79784 /usr/sbin/httpd              0.6  5568
hans    20131  8932 /usr/bin/perl -w? ./bot.pl  0.6  5812
nobody  20108  82344 /usr/sbin/httpd              0.9  8712
nobody  20111  82344 /usr/sbin/httpd              0.9  8716
nobody  20358  82344 /usr/sbin/httpd              0.9  8716
nobody  20359  82344 /usr/sbin/httpd              0.9  8720
mysql    20094  51656 /usr/libexec/mysqld --based  1.7 15948
mysql    20095  51656 /usr/libexec/mysqld --based  1.7 15948
mysql    20096  51656 /usr/libexec/mysqld --based  1.7 15948
mysql    20097  51656 /usr/libexec/mysqld --based  1.7 15948
mysql    20098  51656 /usr/libexec/mysqld --based  1.7 15948
mysql    20099  51656 /usr/libexec/mysqld --based  1.7 15948
mysql    20100  51656 /usr/libexec/mysqld --based  1.7 15948
mysql    20101  51656 /usr/libexec/mysqld --based  1.7 15948
mysql    20102  51656 /usr/libexec/mysqld --based  1.7 15948
mysql    20103  51656 /usr/libexec/mysqld --based  1.7 15948
mysql    20709  51656 /usr/libexec/mysqld --based  1.7 15948
mysql    20726  51656 /usr/libexec/mysqld --based  1.7 15948
mysql    20731  51656 /usr/libexec/mysqld --based  1.7 15948
mysql    20733  51656 /usr/libexec/mysqld --based  1.7 15948
mysql    20734  51656 /usr/libexec/mysqld --based  1.7 15948
mysql    20736  51656 /usr/libexec/mysqld --based  1.7 15948
mysql    20739  51656 /usr/libexec/mysqld --based  1.7 15948
mysql    20741  51656 /usr/libexec/mysqld --based  1.7 15948
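
The ps listing above has no header, so the sketch below assumes the columns are user, PID, virtual size (KB), command, %MEM and resident size (KB). It groups rows by command and reports the row count, the largest single RSS and the naive RSS sum. The true figure for a command lies somewhere between the last two, because rows that are threads of one process, or processes sharing pages, are counted more than once in the naive sum. The function name is hypothetical.

```python
from collections import defaultdict

def rss_bounds_by_command(ps_text):
    """Map command -> (rows, largest single RSS, naive RSS sum), sizes in KB."""
    rss_lists = defaultdict(list)
    for line in ps_text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # not a data row in the assumed layout
        command, rss_kb = fields[3], int(fields[-1])
        rss_lists[command].append(rss_kb)
    return {cmd: (len(v), max(v), sum(v)) for cmd, v in rss_lists.items()}

# A few rows copied from the ps output above.
sample = """\
dennis  20698  4160 ./eggdrop                    0.3  2836
dennis  20702  5220 ./eggdrop                    0.4  3908
mysql    20094  51656 /usr/libexec/mysqld --based  1.7 15948
mysql    20095  51656 /usr/libexec/mysqld --based  1.7 15948
mysql    20096  51656 /usr/libexec/mysqld --based  1.7 15948
"""

for command, (rows, largest, naive_sum) in rss_bounds_by_command(sample).items():
    print(f"{command}: {rows} rows, between {largest} and {naive_sum} KB")
```

In the full listing, the 18 mysqld rows would naively add up to 287064 KB, which is more than the 267584 KB that free reported as used outside buffers and cache. That suggests those rows share the same memory; on 2.4 kernels, threads of one process typically show up as separate rows in ps.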


H_TeXMeX_H 02-01-2007 03:47 PM

Well, it sounds like a memory leak... now how would you see what is leaking? That's somewhat more difficult.

If it fills up in 1 hour, I would call that a memory torrent, not a memory leak :)

Anyway, if it is a memory leak, it is a huge one, so you could try running only a minimal set of things at startup and waiting 1 hour. If memory hasn't filled up, enable another process, and so on. Anyone got a better idea?
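
A related approach, sketched here under the assumption that the culprit is a user process whose resident size keeps growing, is to leave everything running and compare two per-process snapshots taken some time apart. The snapshot dictionaries (PID mapped to RSS in KB) and the numbers in them are made up for illustration.

```python
def rss_growth(before, after):
    """Return (pid, delta_kb) pairs for PIDs in both snapshots, biggest growth first."""
    deltas = [(pid, after[pid] - before[pid]) for pid in after if pid in before]
    return sorted(deltas, key=lambda item: item[1], reverse=True)

# Hypothetical snapshots taken an hour apart; the values are invented.
before = {20131: 5812, 20094: 15948, 20698: 2836}
after = {20131: 5900, 20094: 15948, 20698: 91000}

for pid, delta_kb in rss_growth(before, after):
    print(pid, delta_kb)
```

If no process grows noticeably while the "used" figure from free still climbs, the growth is probably not in any single process's resident memory.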

habl 02-01-2007 04:10 PM

Thanks for your reply. Starting with minimal processes would be an option, but I hope there is one that takes less time :p But I'll keep it in mind ;)

dive 02-01-2007 05:56 PM

I normally use top and press <shift>M to arrange by memory use.
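
The same ordering can be produced without top by reading the VmRSS line from each /proc/PID/status file. This is a rough sketch for Linux only; the function name and the `proc_root` parameter are assumptions made so the code can be pointed at a test directory.

```python
import os

def top_by_rss(proc_root="/proc", limit=10):
    """Return up to `limit` (rss_kb, pid, name) tuples, largest RSS first."""
    results = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        name, rss_kb = "?", None
        status_path = os.path.join(proc_root, entry, "status")
        try:
            with open(status_path, errors="replace") as status:
                for line in status:
                    parts = line.split()
                    if line.startswith("Name:") and len(parts) > 1:
                        name = parts[1]
                    elif line.startswith("VmRSS:") and len(parts) > 1:
                        rss_kb = int(parts[1])
        except OSError:
            continue  # process exited or is not readable
        if rss_kb is not None:  # kernel threads have no VmRSS line
            results.append((rss_kb, int(entry), name))
    return sorted(results, reverse=True)[:limit]

if os.path.isdir("/proc"):
    for rss_kb, pid, name in top_by_rss():
        print(f"{rss_kb:>10} KB  {pid:>7}  {name}")
```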

H_TeXMeX_H 02-01-2007 06:02 PM

Quote:

Originally Posted by dive
I normally use top and press <shift>M to arrange by memory use.

cool, I didn't know that, probably cuz I don't use top a lot (if ever).

habl 02-02-2007 05:02 AM

Quote:

Originally Posted by H_TeXMeX_H
cool, I didn't know that, probably cuz I don't use top a lot (if ever).

But I did know. I just didn't sort by MEM but by RES, because I want to see how much it is in KB, not as a percentage :p

habl 02-02-2007 05:46 AM

Here you can see the current status of the memory:

http://home.habl.nl/sysinfo/

Gethyn 02-02-2007 05:48 AM

Incidentally, if you're getting a lot of brute force ssh attempts, you could install DenyHosts.

syg00 02-02-2007 06:08 AM

I'd be wanting a *lot* more evidence to suggest a memory problem.
Try doing a fsck on that reiser partition.

tronayne 02-02-2007 06:11 AM

Looking at your processes (and taking into account previous comments) I'm kind of wondering how and why you have so many instances of mysqld? Combined with the ssh attacks you're getting I wonder if you have the anonymous MySQL user disabled. Have you taken a hard look at your /etc/my.cnf file and made sure that your configuration is reasonable?

Also, have you taken a look at your shared memory, semaphores and message queues (as root with ipcs)? If you've got a process that keeps asking for more resources...

I heartily second installing DenyHosts as suggested by gethyn -- that thing really keeps the kiddies out of your pants.

And, just because it's just too, too obvious, what is bot.pl that was killed by the kernel?

habl 02-02-2007 09:23 AM

Quote:

Originally Posted by tronayne
Looking at your processes (and taking into account previous comments) I'm kind of wondering how and why you have so many instances of mysqld? Combined with the ssh attacks you're getting I wonder if you have the anonymous MySQL user disabled. Have you taken a hard look at your /etc/my.cnf file and made sure that your configuration is reasonable?

I don't know why there are that many; there always have been that many, and I thought it was normal :p Anyway, I actually didn't know it was possible to log in anonymously, but indeed, it was. I have disabled it now. Nothing has changed so far.

Quote:

Also, have you taken a look at your shared memory, semaphores and message queues (as root with ipcs)? If you've got a process that keeps asking for more resources...
I have now, but I don't understand what it means :S

Code:

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch    status
0x00000000 1048576    root      600        33554432  11        dest
0x00000000 1081345    root      600        33554432  11        dest
0x00000000 1114114    root      600        46084      11        dest

------ Semaphore Arrays --------
key        semid      owner      perms      nsems

------ Message Queues --------
key        msqid      owner      perms      used-bytes  messages
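
One way to read the ipcs output above is to add up the "bytes" column of the shared memory section and compare it with total RAM. The sketch below does that for the three segments listed; the function name is hypothetical and the parser assumes the column layout shown here.

```python
def shm_total_bytes(ipcs_text):
    """Sum the 'bytes' column of the shared memory section of ipcs output."""
    total = 0
    in_shm = False
    for line in ipcs_text.splitlines():
        if line.startswith("------"):
            # Section banner: track whether we are inside the shm section.
            in_shm = "Shared Memory" in line
            continue
        fields = line.split()
        # Data rows start with a hex key; the fifth column is the size in bytes.
        if in_shm and len(fields) >= 5 and fields[0].startswith("0x"):
            total += int(fields[4])
    return total

# Shared memory section copied from the output above.
sample = """\
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch    status
0x00000000 1048576    root      600        33554432  11        dest
0x00000000 1081345    root      600        33554432  11        dest
0x00000000 1114114    root      600        46084      11        dest

------ Semaphore Arrays --------
key        semid      owner      perms      nsems
"""

total = shm_total_bytes(sample)
print(total)                           # 67154948
print(round(total / 2**20, 1), "MiB")  # 64.0 MiB
```

That is about 64 MiB of shared memory on a machine with roughly 883 MiB of RAM, and there are no semaphore arrays or message queues at all.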

Quote:

I heartily second installing DenyHosts as suggested by gethyn -- that thing really keeps the kiddies out of your pants.
Yeah indeed, I need to do something about it. I have set up a very strict sshd config, but it isn't enough. So I will install it; thanks to you and Gethyn for the advice :)

Quote:

And, just because it's just too, too obvious, what is bot.pl that was killed by the kernel?
bot.pl is an IRC bot I wrote myself. It is one of the processes that uses MySQL a lot. I have stopped it for a while to check whether it causes the problem, but I don't think it does.

tronayne 02-02-2007 03:44 PM

Well, your IPCs don't look too bad. Have a look at http://en.wikipedia.org/wiki/Interprocess_communication for an overview of what they are and what they're for.

Might be worth it to take a look at this Wikipedia article http://en.wikipedia.org/wiki/Memory_leak for some hints about what to look for. It's actually pretty unusual that Slackware craps out as you describe, I think, and you've more than likely got a runaway process that is not a "distribution" problem but something added on. If you haven't, I'd go get all the patches and make sure to "upgradepkg" all of them -- I have a 10.2 box that's been running for about a year without reboot and have not seen any of the sort of problem you're experiencing (it's a Bugzilla server for a couple of hundred users and that means lots of MySQL use).

Wish I could be of more help.


All times are GMT -5.