[SOLVED] My server is running out of memory

vannathlab · 09-04-2013, 06:58 AM

Dear everybody,

My server has almost run out of memory. Here is the output of the free command:

# free -h
             total       used       free     shared    buffers     cached
Mem:          7.8G       7.6G       214M         0B       798M       6.4G
-/+ buffers/cache:       467M       7.4G
Swap:          11G       384K        11G

My server is running BackupPC.

And here is the output of the top command, sorted by memory (M):

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21117 backuppc 20 0 65184 13m 1868 S 0 0.2 0:01.15 BackupPC
2478 root 20 0 52244 9664 1224 S 0 0.1 0:11.81 munin-node
29301 root 0 -20 19472 7196 2712 S 0 0.1 0:04.12 atop
21124 backuppc 20 0 45596 7016 2276 S 0 0.1 0:00.06 BackupPC_trashC
8214 www-data 20 0 364m 4872 1896 S 0 0.1 0:00.03 apache2
8193 root 20 0 83528 4728 2432 S 0 0.1 0:00.21 apache2
8213 www-data 20 0 428m 4444 1704 S 0 0.1 0:00.03 apache2
9167 root 20 0 79716 3824 2984 S 0 0.0 0:00.00 sshd
20947 root 20 0 79724 3812 2968 S 0 0.0 0:00.02 sshd
8198 www-data 20 0 82756 2924 680 S 0 0.0 0:00.00 apache2

I cannot find any process that is taking up my server's memory. Can anyone give me some idea about this?

Thanks,
Vannath

pan64 · 09-04-2013, 07:02 AM

Have you checked this: http://www.linuxatemyram.com/ ?

acid_kewpie · 09-04-2013, 07:04 AM

Coo, I've not seen that link for years. A decade ago we were posting it daily.

vannathlab · 09-04-2013, 07:12 AM

Quote:

Originally Posted by pan64

Have you checked this: http://www.linuxatemyram.com/ ?

Thank you so much for this useful link. But why is my other server fine, while on this one I always get a critical memory alarm? Is there any way to prevent this?

Thanks,
Vannath

pan64 · 09-04-2013, 07:30 AM

Actually, it looks like your swap is not in use, so I think it is OK. Probably the configuration of the alarm is wrong, or it differs from the one on the other servers...

vannathlab · 09-04-2013, 07:38 AM

Quote:

Originally Posted by pan64

Actually, it looks like your swap is not in use, so I think it is OK. Probably the configuration of the alarm is wrong, or it differs from the one on the other servers...

Here is my configuration for monitoring the server's memory. The check goes critical when memory usage reaches 90% of the total.

command[check_memory]=/usr/local/nagios/libexec/check_memory.sh -w 80 -c 90

And the server's memory is always around 90% used:

Memory: Critical Total: 7997 MB - Used: 6986 MB - 92% used!

Thanks
Vannath
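
The "92% used" figure above counts buffers and page cache as used memory, like the first line of free does. The following is a minimal sketch, not the actual check_memory.sh (its contents are not shown in this thread), of a figure that leaves buffers and cache out. The function name mem_used_pct and its optional file argument are assumptions made for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: percentage of RAM in use, NOT counting buffers and
# page cache, computed from a meminfo-style file (default: /proc/meminfo).
mem_used_pct() {
    awk '
        $1 == "MemTotal:" { total   = $2 }
        $1 == "MemFree:"  { free    = $2 }
        $1 == "Buffers:"  { buffers = $2 }
        $1 == "Cached:"   { cached  = $2 }
        END {
            if (total == 0) exit 1          # not a meminfo file
            used = total - free - buffers - cached
            printf "%d\n", used * 100 / total
        }
    ' "${1:-/proc/meminfo}"
}
```

With the /proc/meminfo values posted later in this thread, this gives about 5%, against roughly 97% when buffers and cache are counted as used. A monitoring wrapper would compare that figure with its -w and -c thresholds.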

akiuni · 09-04-2013, 10:40 AM

Maybe you are facing a memory leak in kernel space. You won't see anything with "free" in that case. Can you check these files:
# cat /proc/meminfo
# cat /proc/slabinfo

Also, here are a few useful commands which can help you (on Debian):

User-space usage, largest resident set first:
# ps -eo rsz,vsz,pid,command --sort=-rsz

Memory map of every process:
# pmap -x $(ps -e -o pid=)

Largest virtual memory sizes (VmSize is the current size; use VmPeak for the peak):
# grep ^VmSize /proc/*/status | sort -n -k2 | tail
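
To see how much memory user-space processes hold in total, the per-process figures can be summed. This is a minimal sketch assuming the usual Linux /proc layout; the function name total_rss_kb and its optional directory argument are made up for this example. Shared pages are counted once per process, so the sum overstates real usage.

```shell
#!/bin/sh
# Hypothetical helper: sum the VmRSS lines of every <dir>/<pid>/status file.
# Kernel threads have no VmRSS line and therefore contribute nothing.
total_rss_kb() {
    proc_dir="${1:-/proc}"
    # Processes may exit while we read, so "No such file" errors are ignored.
    cat "$proc_dir"/[0-9]*/status 2>/dev/null |
        awk '$1 == "VmRSS:" { sum += $2 } END { printf "%d\n", sum }'
}
```

If the total is far below the "used" column of free, the remainder is cache or kernel memory rather than process memory.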

vannathlab · 09-04-2013, 09:04 PM

Here is the output of the command "cat /proc/meminfo":

Quote:

MemTotal: 8189396 kB
MemFree: 221636 kB
Buffers: 579372 kB
Cached: 6917192 kB
SwapCached: 244 kB
Active: 2389616 kB
Inactive: 5158432 kB
Active(anon): 10036 kB
Inactive(anon): 47268 kB
Active(file): 2379580 kB
Inactive(file): 5111164 kB
Unevictable: 5440 kB
Mlocked: 5440 kB
SwapTotal: 11718652 kB
SwapFree: 11718268 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 56792 kB
Mapped: 10776 kB
Shmem: 3104 kB
Slab: 348004 kB
SReclaimable: 315860 kB
SUnreclaim: 32144 kB
KernelStack: 2392 kB
PageTables: 3072 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 15813348 kB
Committed_AS: 568288 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 299896 kB
VmallocChunk: 34359433540 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 136704 kB
DirectMap2M: 4048896 kB
DirectMap1G: 4194304 kB

Here is the output of the command "pmap -x `ps ax | cut -d" " -f1`":

Quote:

root@backup2:~# pmap -x `ps ax | cut -d" " -f1`
11606: [kworker/0:2]
Address Kbytes RSS Dirty Mode Mapping
---------------- ------ ------ ------
total kB 0 0 0
11841: /usr/bin/atop -a -w /var/log/atop/atop_20130905 600
Address Kbytes RSS Dirty Mode Mapping
0000000000400000 152 152 0 r-x-- atop
0000000000625000 4 4 4 r---- atop
0000000000626000 16 16 16 rw--- atop
000000000062a000 124 124 124 rw--- [ anon ]
00000000023b6000 2192 2192 2192 rw--- [ anon ]
00007eff048fa000 8 8 0 r-x-- libdl-2.13.so
00007eff048fc000 2048 0 0 ----- libdl-2.13.so
00007eff04afc000 4 4 4 r---- libdl-2.13.so
00007eff04afd000 4 4 4 rw--- libdl-2.13.so
00007eff04afe000 1536 1536 0 r-x-- libc-2.13.so
00007eff04c7e000 2048 0 0 ----- libc-2.13.so
00007eff04e7e000 16 16 16 r---- libc-2.13.so
00007eff04e82000 4 4 4 rw--- libc-2.13.so
00007eff04e83000 20 20 20 rw--- [ anon ]
00007eff04e88000 88 88 0 r-x-- libz.so.1.2.7
00007eff04e9e000 2044 0 0 ----- libz.so.1.2.7
00007eff0509d000 4 4 4 r---- libz.so.1.2.7
00007eff0509e000 4 4 4 rw--- libz.so.1.2.7
00007eff0509f000 516 516 0 r-x-- libm-2.13.so
00007eff05120000 2044 0 0 ----- libm-2.13.so
00007eff0531f000 4 4 4 r---- libm-2.13.so
00007eff05320000 4 4 4 rw--- libm-2.13.so
00007eff05321000 148 148 0 r-x-- libtinfo.so.5.9
00007eff05346000 2044 0 0 ----- libtinfo.so.5.9
00007eff05545000 16 16 16 r---- libtinfo.so.5.9
00007eff05549000 4 4 4 rw--- libtinfo.so.5.9
00007eff0554a000 132 132 0 r-x-- libncurses.so.5.9
00007eff0556b000 2044 0 0 ----- libncurses.so.5.9
00007eff0576a000 4 4 4 r---- libncurses.so.5.9
00007eff0576b000 4 4 4 rw--- libncurses.so.5.9
00007eff0576c000 128 128 0 r-x-- ld-2.13.so
00007eff0595d000 152 152 152 rw--- [ anon ]
00007eff05989000 8 8 8 rw--- [ anon ]
00007eff0598b000 4 4 4 r---- ld-2.13.so
00007eff0598c000 4 4 4 rw--- ld-2.13.so
00007eff0598d000 4 4 4 rw--- [ anon ]
00007fffc525a000 132 132 132 rw--- [ stack ]
00007fffc53ff000 4 4 0 r-x-- [ anon ]
ffffffffff600000 4 0 0 r-x-- [ anon ]
---------------- ------ ------ ------
total kB 17720 5444 2732

[........]

Here is the output of the command "grep ^VmSize /proc/*/status | sort -n -k+2 | tail":

Quote:

grep ^VmSize /proc/*/status | sort -n -k+2 | tail
/proc/2344/status:VmSize: 53168 kB
/proc/21124/status:VmSize: 60424 kB
/proc/21117/status:VmSize: 65568 kB
/proc/17646/status:VmSize: 79724 kB
/proc/17651/status:VmSize: 79724 kB
/proc/9407/status:VmSize: 83032 kB
/proc/8193/status:VmSize: 83528 kB
/proc/30233/status:VmSize: 127564 kB
/proc/9449/status:VmSize: 307280 kB
/proc/9450/status:VmSize: 307280 kB

I do not understand any of the output above.

Here is the output of the top command, sorted by memory usage (Shift+M):

Quote:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21124 backuppc 20 0 60424 21m 2220 S 0 0.3 1:26.59 BackupPC_trashC
21117 backuppc 20 0 65568 13m 1956 S 0 0.2 0:03.45 BackupPC
2478 root 20 0 52244 9664 1224 S 0 0.1 0:13.52 munin-node
8193 root 20 0 83528 5768 3328 S 0 0.1 0:01.13 apache2
11841 root 0 -20 17720 5444 2712 S 0 0.1 0:01.74 atop
9450 www-data 20 0 300m 4948 2148 S 0 0.1 0:00.00 apache2
9449 www-data 20 0 300m 4940 2140 S 0 0.1 0:00.01 apache2
17646 root 20 0 79724 3812 2968 S 0 0.0 0:00.02 sshd
9407 www-data 20 0 83032 3052 660 S 0 0.0 0:00.00 apache2
30233 root 20 0 124m 2644 1572 S 0 0.0 0:00.14 console-kit-dae
17660 root 20 0 19476 2232 1656 S 0 0.0 0:00.02 bash
17657 root 20 0 44320 2048 1544 S 0 0.0 0:00.02 sudo
17652 vannath 20 0 19400 1992 1552 S 0 0.0 0:00.00 bash
19777 root 20 0 23308 1700 1176 R 0 0.0 0:00.13 top
17651 vannath 20 0 79724 1640 796 S 0 0.0 0:00.04 sshd
2344 root 20 0 53168 1548 596 S 0 0.0 0:01.75 rsyslogd
31812 root 20 0 49848 1176 568 S 0 0.0 0:00.12 sshd
2880 messageb 20 0 29800 1028 704 S 0 0.0 0:00.03 dbus-daemon
732 root 20 0 21584 1000 360 S 0 0.0 0:00.00 udevd

And here is the output of the free command:

Quote:

# free -m
             total       used       free     shared    buffers     cached
Mem:          7997       7781        215          0        565       6755
-/+ buffers/cache:        460       7536
Swap:        11443          0      11443
root@backup2:~#

I really cannot find out which process is taking up my memory.

How can I fix the memory leak? And with disk caching, how can I make my memory work as usual?

Thanks with best regards,
Vannath

vannathlab · 09-04-2013, 10:27 PM

I have fixed it now. Thank you so much for all your help.

root@backup2:~# free -m
             total       used       free     shared    buffers     cached
Mem:          7997       7765        231          0        567       6738
-/+ buffers/cache:        460       7537
Swap:        11443          0      11443
root@backup2:~# echo 3 | sudo tee /proc/sys/vm/drop_caches
3
root@backup2:~#
root@backup2:~#
root@backup2:~# free -m
             total       used       free     shared    buffers     cached
Mem:          7997        174       7823          0          1         14
-/+ buffers/cache:        158       7838
Swap:        11443          0      11443
root@backup2:~#

astrogeek · 09-04-2013, 10:38 PM

Good! Could you share with the rest of us what you did to fix it?

vannathlab · 09-04-2013, 10:41 PM

I used this command:

# echo 3 | sudo tee /proc/sys/vm/drop_caches
3

astrogeek · 09-04-2013, 10:45 PM

Quote:

Originally Posted by vannathlab

I used this command:

# echo 3 | sudo tee /proc/sys/vm/drop_caches
3

Oops! My bad! I missed that in your post!

Thanks!

akiuni · 09-06-2013, 05:09 AM

That sounds good, but you only dropped the caches... I fear that your problem may occur again in the future.
/proc/meminfo shows a large amount of cache, but that is not an issue. I suggest that you monitor the /proc/slabinfo file if the problem occurs again.
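
For reference, here is a hedged sketch of what the drop_caches command used earlier does. As far as the kernel documentation describes it, the write only discards clean, reclaimable caches, so the "used" figure is expected to climb again as files are read. The wrapper name drop_caches and its second argument are assumptions for illustration, not an existing tool.

```shell
#!/bin/sh
# Hypothetical wrapper around /proc/sys/vm/drop_caches.
#   1 = free page cache, 2 = free reclaimable slab (dentries, inodes), 3 = both
# The second argument (target file) exists only so the function can be tried
# without root; it defaults to the real kernel interface.
drop_caches() {
    mode="${1:-3}"
    target="${2:-/proc/sys/vm/drop_caches}"
    case "$mode" in
        1|2|3) ;;
        *) echo "usage: drop_caches 1|2|3 [file]" >&2; return 2 ;;
    esac
    sync                        # write dirty pages out so they become droppable
    echo "$mode" > "$target"    # needs root for the real file
}
```

Run as root, "drop_caches 3" is equivalent to the echo 3 command in this thread, with a sync in front.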

vannathlab · 09-06-2013, 05:26 AM

Yes, it actually happened again 9 to 12 hours later. I tried the command "cat /proc/slabinfo", but I really cannot understand its output. What information can I get from this command? And could you tell me of any solution that fixes this problem permanently?

Thanks,
Vannath

akiuni · 09-06-2013, 07:07 AM

Well, I think that you should monitor how /proc/meminfo evolves (and drop the caches when the system becomes unsafe). This can be done with a simple loop, for example:

# mkdir -p /root/tmp
# while true; do cp /proc/meminfo /root/tmp/meminfo.$(date +%Y%m%d%H%M); sleep 3600; done

After a while, you can try to find out which part of the memory is growing. For example, for the slab cache:
# grep Slab /root/tmp/meminfo.*

You can "export" the result to Excel or something similar to check how it evolves; that can give you a first clue.
If the growing part is the slab, then you should have a look at the individual slab caches, because they can help you find out what is leaking.
You can monitor the amount of memory allocated inside each cache (number of objects times object size, in bytes) like this:
# awk '{printf "%s : %10.0f\n", $1, ($3*$4) }' /proc/slabinfo

Use this to drop the caches which are not using any memory:
# awk '{printf "%s : %10.0f\n", $1, ($3*$4) }' /proc/slabinfo | grep -ve " 0$"

After that, you can create the same kind of loop to monitor the slab caches:
# while true; do awk '{printf "%s : %10.0f\n", $1, ($3*$4) }' /proc/slabinfo | grep -ve " 0$" > /root/tmp/slab.$(date +%Y%m%d%H%M); sleep 3600; done

And check the evolution of a given cache, cred_jar for example:
# grep cred_jar /root/tmp/slab.*

Based on what you find, you may be able to google for a memory leak involving that cache...

Good luck!
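
Once a few slab snapshots exist, comparing two of them by hand gets tedious. The following is a minimal sketch of one way to do it, assuming snapshot files in the "name : bytes" format written by the slab loop above; the function name slab_growth is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: compare two slab snapshots (lines "name : bytes")
# and list the ten caches that grew the most, largest growth first.
slab_growth() {
    old="$1"; new="$2"
    awk '
        FILENAME == ARGV[1] { before[$1] = $3; next }  # first file: old sizes
        {
            delta = $3 - before[$1]    # caches absent from the old file start at 0
            if (delta > 0) printf "%s %.0f\n", $1, delta
        }
    ' "$old" "$new" | sort -k2,2nr | head -n 10
}
```

It would be called with an older and a newer snapshot, for example "slab_growth /root/tmp/slab.201309060800 /root/tmp/slab.201309062000" (both file names are hypothetical). A cache that keeps growing across many snapshots is the one worth searching for.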