Memory reported as used, but can't account for what's using it.
My memory usage, as reported by free, is creeping up over time before eventually plateauing. free reported 2,495MB used and 1,337MB free. The summary in top seemed to concur with these values.
However, if I add up the memory used by the various processes, I cannot account for anywhere near that. Using ps_mem.py (http://www.pixelbeat.org/scripts/ps_mem.py), I can only account for 763MB of RAM usage.

I am taking the free memory value from the -/+ buffers/cache line, so this doesn't seem to be the widely encountered issue where people mistakenly read it from the first line, which includes the buffers and cache: http://www.linuxatemyram.com/

To see whether the RAM reported as used by free was actually available for use, I used the "C munch program" at http://www.linuxatemyram.com/play.html to attempt to allocate 3000MB of memory - far more than free suggested was available, but what I thought should be available based on the memory accounted for by ps_mem.py. This displaced some stuff into swap. But after the memory allocation program exited, free reported much lower memory usage, far closer to what I would expect based on the ps_mem.py output:

Mem + swap before munch: 2589
Mem + swap after munch:  1184

This is a headless server with a minimal install of Red Hat Enterprise Linux 6.2; the only software of any significance we have running is a source build of Apache httpd 2.4 and three instances of Tomcat. If anyone can shed any light on what's going on with the memory usage I would be very grateful. I have included the output of free, top (sorted by memory usage) and ps_mem.py below.
Thanks, Paul

[code]
free -m
             total       used       free     shared    buffers     cached
Mem:          3833       2896        937          0        282        118
-/+ buffers/cache:       2495       1337
Swap:         6142         94       6048
[/code]

[code]
top - 09:02:54 up 7 days, 17:22, 1 user, load average: 0.07, 0.02, 0.00
Tasks: 125 total, 1 running, 124 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.0%us, 0.3%sy, 0.0%ni, 98.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem:   3925316k total,  2966540k used,   958776k free,   289028k buffers
Swap:  6290424k total,    97100k used,  6193324k free,   121432k cached

  PID USER     PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
26523 sits_tom 20  0 1372m 209m 6244 S  0.0  5.5  2:00.14 java
26485 sits_tom 20  0 1372m 209m 6352 S  0.0  5.5  2:02.43 java
26568 sits_tom 20  0 1372m 201m 6248 S  0.0  5.2  1:58.63 java
 1465 daemon   20  0 85092  41m 3460 S  0.0  1.1 19:15.98 .vasd
 1600 daemon   20  0 73428  30m 5612 S  0.0  0.8  0:02.31 .vasd
 1478 daemon   20  0 71312  28m 4164 S  0.0  0.7 41:37.49 .vasd
 1479 daemon   20  0 88908  18m 5684 S  0.0  0.5  8:07.72 .vasd
 5889 web_apac 20  0 1658m  17m 3864 S  1.0  0.5 12:50.66 httpd
 1455 root     20  0 53060 9640 4892 S  0.0  0.2  1:02.75 .vasd
15010 root     20  0  806m 9484 3008 S  0.0  0.2  1:40.46 scxcimprovagt
14977 root     20  0  706m 5788 2948 S  0.0  0.1  1:28.11 scxcimserver
14845 root     20  0  110m 5784 4728 S  0.0  0.1  0:00.01 sshd
14878 root     20  0  184m 4492 3716 S  0.0  0.1  0:00.00 sudo
19308 postfix  20  0 87980 3736 2844 S  0.0  0.1  0:00.02 qmgr
14652 postfix  20  0 87800 3596 2748 S  0.0  0.1  0:00.00 pickup
15044 scom_msa 20  0  341m 3528 2956 S  0.0  0.1  0:06.22 scxcimprovagt
19293 root     20  0 78684 3300 2448 S  0.0  0.1  0:00.10 master
 1319 root     20  0  249m 2668  800 S  0.0  0.1  0:01.44 rsyslogd
 5879 root     20  0  122m 2536 2144 S  0.0  0.1  0:03.13 httpd
14850 s167     20  0  110m 2396 1320 S  0.0  0.1  0:00.01 sshd
14851 s167     20  0  121m 2276 1856 S  0.0  0.1  0:00.00 bash
14882 root     20  0  105m 2020 1548 S  0.0  0.1  0:00.01 bash
[/code]

[code]
ps_mem.py
 Private  +   Shared  =  RAM used       Program
  4.0 KiB +  39.5 KiB =  43.5 KiB       rpc.idmapd
  4.0 KiB +  40.5 KiB =  44.5 KiB       acpid
  4.0 KiB +  54.0 KiB =  58.0 KiB       rpc.statd
  0.0 KiB +  71.5 KiB =  71.5 KiB       udevd (3)
 60.0 KiB +  46.5 KiB = 106.5 KiB       rpcbind
 80.0 KiB +  60.0 KiB = 140.0 KiB       rhsmcertd (2)
 24.0 KiB + 123.0 KiB = 147.0 KiB       mingetty (6)
184.0 KiB +  37.5 KiB = 221.5 KiB       auditd
256.0 KiB +  81.5 KiB = 337.5 KiB       init
296.0 KiB +  91.0 KiB = 387.0 KiB       ntpd
368.0 KiB +  39.5 KiB = 407.5 KiB       crond
296.0 KiB + 114.5 KiB = 410.5 KiB       cronolog (5)
532.0 KiB + 211.5 KiB = 743.5 KiB       catalina.sh (3)
876.0 KiB +  87.5 KiB = 963.5 KiB       vmtoolsd
976.0 KiB + 421.5 KiB =   1.4 MiB       master
  1.0 MiB + 443.5 KiB =   1.4 MiB       pickup
  1.1 MiB + 446.5 KiB =   1.5 MiB       qmgr
924.0 KiB + 685.5 KiB =   1.6 MiB       bash (2)
  1.3 MiB + 633.0 KiB =   1.9 MiB       sudo
  2.0 MiB +  59.5 KiB =   2.1 MiB       rsyslogd
  1.3 MiB +   2.1 MiB =   3.4 MiB       sshd (3)
  3.7 MiB + 494.0 KiB =   4.2 MiB       scxcimserver
  7.6 MiB +   1.4 MiB =   9.1 MiB       scxcimprovagt (2)
 14.2 MiB +   2.1 MiB =  16.4 MiB       httpd (2)
105.2 MiB +   4.7 MiB = 109.9 MiB       .vasd (5)
601.8 MiB +   5.0 MiB = 606.8 MiB       java (3)
---------------------------------
                        763.6 MiB
=================================
[/code]

### DURING MUNCH - 3000MB allocated

[code]
free -m
             total       used       free     shared    buffers     cached
Mem:          3833       3714        119          0          4         16
-/+ buffers/cache:       3694        139
Swap:         6142        508       5634
[/code]

### AFTER MUNCH

[code]
free -m
             total       used       free     shared    buffers     cached
Mem:          3833        724       3108          0          5         16
-/+ buffers/cache:        703       3130
Swap:         6142        481       5661
[/code]

[code]
top - 09:05:49 up 7 days, 17:25, 2 users, load average: 0.19, 0.08, 0.03
Tasks: 129 total, 1 running, 128 sleeping, 0 stopped, 0 zombie
Cpu(s): 4.8%us, 0.0%sy, 0.0%ni, 95.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem:   3925316k total,   769064k used,  3156252k free,     9128k buffers
Swap:  6290424k total,   490536k used,  5799888k free,    36352k cached

  PID USER     PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
26485 sits_tom 20  0 1372m 144m 4380 S  0.0  3.8  2:02.48 java
26568 sits_tom 20  0 1372m  61m 4328 S  0.0  1.6  1:58.69 java
26523 sits_tom 20  0 1372m  59m 4328 S  0.0  1.6  2:00.20 java
 1465 daemon   20  0 85092  40m 2908 S  0.0  1.1 19:16.01 .vasd
 1478 daemon   20  0 71312  25m 3104 S  0.0  0.7 41:37.55 .vasd
 1600 daemon   20  0 73428  20m 3840 S  0.0  0.5  0:02.31 .vasd
 1479 daemon   20  0 88908  12m 3832 S  0.0  0.3  8:07.79 .vasd
 5889 web_apac 20  0 1658m  11m 3528 S  3.2  0.3 12:52.25 httpd
 1455 root     20  0 53060 7696 3952 S  0.0  0.2  1:02.76 .vasd
15010 root     20  0  806m 7188 1696 S  0.0  0.2  1:40.49 scxcimprovagt
15108 root     20  0  110m 4616 4588 S  0.0  0.1  0:00.01 sshd
14845 root     20  0  110m 4592 4588 S  0.0  0.1  0:00.01 sshd
14878 root     20  0  184m 3664 3660 S  0.0  0.1  0:00.00 sudo
15141 root     20  0  184m 3660 3656 S  0.0  0.1  0:00.01 sudo
19308 postfix  20  0 87980 2796 2792 S  0.0  0.1  0:00.02 qmgr
14652 postfix  20  0 87800 2744 2684 S  0.0  0.1  0:00.00 pickup
19293 root     20  0 78684 2512 2432 S  0.0  0.1  0:00.10 master
 5879 root     20  0  122m 2368 2084 S  0.0  0.1  0:03.14 httpd
 1319 root     20  0  249m 2252  792 S  0.0  0.1  0:01.44 rsyslogd
[/code]

[code]
ps_mem.py
 Private  +   Shared  =  RAM used       Program
  4.0 KiB +  37.5 KiB =  41.5 KiB       rpc.idmapd
  4.0 KiB +  39.5 KiB =  43.5 KiB       acpid
  4.0 KiB +  40.0 KiB =  44.0 KiB       rhsmcertd (2)
  4.0 KiB +  51.0 KiB =  55.0 KiB       rpc.statd
  4.0 KiB +  68.5 KiB =  72.5 KiB       udevd (3)
 52.0 KiB +  41.5 KiB =  93.5 KiB       rpcbind
 24.0 KiB + 117.0 KiB = 141.0 KiB       mingetty (6)
112.0 KiB +  38.5 KiB = 150.5 KiB       crond
 92.0 KiB +  81.0 KiB = 173.0 KiB       ntpd
 12.0 KiB + 163.5 KiB = 175.5 KiB       catalina.sh (3)
176.0 KiB +  35.5 KiB = 211.5 KiB       auditd
128.0 KiB +  94.5 KiB = 222.5 KiB       cronolog (5)
160.0 KiB +  71.5 KiB = 231.5 KiB       init
160.0 KiB + 389.5 KiB = 549.5 KiB       pickup
180.0 KiB + 392.5 KiB = 572.5 KiB       qmgr
204.0 KiB + 369.5 KiB = 573.5 KiB       master
400.0 KiB + 233.0 KiB = 633.0 KiB       scxcimserver
896.0 KiB +  74.5 KiB = 970.5 KiB       vmtoolsd
508.0 KiB + 813.0 KiB =   1.3 MiB       bash (4)
  8.0 KiB +   1.4 MiB =   1.4 MiB       sudo (2)
  1.6 MiB +  56.5 KiB =   1.7 MiB       rsyslogd
304.0 KiB +   2.5 MiB =   2.8 MiB       sshd (5)
  5.8 MiB + 579.0 KiB =   6.4 MiB       scxcimprovagt (2)
  8.3 MiB +   1.8 MiB =  10.1 MiB       httpd (2)
 88.6 MiB +   2.9 MiB =  91.5 MiB       .vasd (5)
253.7 MiB +   3.2 MiB = 256.9 MiB       java (3)
---------------------------------
                        376.8 MiB
=================================
[/code]
|
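For anyone wanting to repeat this test without compiling the C munch program, a rough shell equivalent is the head/tail trick (mentioned on linuxatemyram.com): piping from /dev/zero into tail forces tail to buffer its entire input in anonymous memory, much like the munch program's malloc. This is a sketch of the idea, not the exact program Paul used:

```shell
# Snapshot memory usage first (values in MB)
free -m

# Force a ~3000MB anonymous allocation: /dev/zero contains no newlines,
# so tail must buffer all the input in RAM before it can emit anything.
# This will push caches and idle pages out, exactly like the munch test.
head -c 3000M /dev/zero | tail > /dev/null

# Compare after the pipeline has exited and its memory is released
free -m
```

Reduce the 3000M figure on machines with less RAM, or the allocation will push heavily into swap.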
[removed]
|
Quote:
I am taking the free memory value from the -/+ buffers/cache line, so this doesn't seem the widely encountered issue where people sometimes mistakenly read it from the first line, which includes the buffering+caching: linuxatemyram |
That was a totally pointless post - you are doing fine trying to chase this.
Memory allocation is a can of worms. Reading that script is a good start, but there is no good way to completely account for memory. Even the kernel devs have argued about this for years. "pss" was the best they could come up with for shared pages, and even that is (extremely) "rubbery". Then there are the caches (did you try drop_caches as suggested on the linuxatemyram site?). Not to mention the buddy and slab allocators and their requirements (see /proc/meminfo and /proc/slabinfo). This could drive you nuts .... :p |
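The checks syg00 mentions can be run along these lines (a sketch; writing to drop_caches and reading /proc/slabinfo both require root):

```shell
# Kernel-side accounting before dropping anything: Slab is the total
# slab usage, split into SReclaimable (freeable on demand) and SUnreclaim.
grep -E '^(Buffers|Cached|Slab|SReclaimable|SUnreclaim):' /proc/meminfo

# Flush dirty pages first, then drop page cache, dentries and inodes.
# Safe, but caches are cold afterwards, so expect a temporary I/O hit.
sync
echo 3 > /proc/sys/vm/drop_caches

# Compare: a large fall in SReclaimable points at slab caches (dentries,
# inodes) rather than process memory as the missing RAM.
grep -E '^(Buffers|Cached|Slab|SReclaimable|SUnreclaim):' /proc/meminfo
```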
Thanks syg00, your post has helped me a great deal, and got me moving in the right direction (I think).
I have to confess I know virtually nothing about how memory allocation works, and Googling my issue has been problematic, as all I managed to find was the typical LinuxAteMyRAM issue of misreading free's output.

I've included below the results of drop_caches, plus /proc/meminfo before and after the drop_caches. I have no experience interpreting these, but comparing the before and after values, the memory discrepancy was probably accounted for by the SReclaimable line in /proc/meminfo. Some Googling tells me this is "a cache of in-kernel data structures" (http://stackoverflow.com/questions/5...ng-discrepancy).

The slabtop command (sorted by cache size) shows:

[code]
  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
366540 366540 100%    0.19K  18327       20     73308K dentry
   171    171 100%   32.12K    171        1     10944K kmem_cache
  4128   4128 100%    0.58K    688        6      2752K inode_cache
  3143   1958  62%    0.55K    449        7      1796K radix_tree_node
  9612   9596  99%    0.14K    356       27      1424K sysfs_dir_cache
[/code]

Having wiped it out pretty recently, the discrepancy is only about 5% of my memory so far, but I can already see that dentry appears to account for most of the slabs, and is growing quickly. My hunch is that it's Apache httpd (as this seems to be the common factor where I've seen this major reporting discrepancy), but I'm not sure how I can tie these slab entries to particular applications. I'm also still unclear whether this is normal and I should be unconcerned, as the OS seems to be able to free the memory if it's needed, or whether it suggests something is wrong with my build of Apache (assuming that is the culprit). Any other thoughts / advice on the subject would be greatly appreciated.
Thanks, Paul

[code]
free -m
             total       used       free     shared    buffers     cached
Mem:          2006       1934         72          0        167        488
-/+ buffers/cache:       1278        728
Swap:         6142          0       6142
[/code]

[code]
/usr/local/bin/psmem.py
 Private  +   Shared  =  RAM used       Program
 96.0 KiB +  82.0 KiB = 178.0 KiB       rhsmcertd (2)
172.0 KiB +  24.5 KiB = 196.5 KiB       acpid
176.0 KiB +  65.0 KiB = 241.0 KiB       cronolog (2)
284.0 KiB +  44.5 KiB = 328.5 KiB       rpc.idmapd
260.0 KiB +  79.5 KiB = 339.5 KiB       rpcbind
364.0 KiB +  42.5 KiB = 406.5 KiB       auditd
328.0 KiB + 102.0 KiB = 430.0 KiB       rpc.statd
492.0 KiB + 147.0 KiB = 639.0 KiB       mingetty (6)
644.0 KiB +  53.5 KiB = 697.5 KiB       crond
260.0 KiB + 497.5 KiB = 757.5 KiB       udevd (3)
712.0 KiB + 124.0 KiB = 836.0 KiB       ntpd
752.0 KiB + 110.5 KiB = 862.5 KiB       init
664.0 KiB + 238.5 KiB = 902.5 KiB       sudo
872.0 KiB +  77.5 KiB = 949.5 KiB       rsyslogd
964.0 KiB + 415.5 KiB =   1.3 MiB       master
  1.0 MiB + 439.5 KiB =   1.4 MiB       pickup
  1.1 MiB + 443.5 KiB =   1.5 MiB       qmgr
924.0 KiB + 881.5 KiB =   1.8 MiB       bash (2)
  1.8 MiB + 108.5 KiB =   1.9 MiB       vmtoolsd
  2.1 MiB +   2.0 MiB =   4.1 MiB       sshd (3)
 15.4 MiB + 960.0 KiB =  16.3 MiB       shibd
 11.8 MiB +   6.1 MiB =  17.9 MiB       httpd (2)
113.5 MiB +   5.3 MiB = 118.8 MiB       .vasd (5)
170.4 MiB +  65.0 KiB = 170.5 MiB       clamd
---------------------------------
                        343.1 MiB
=================================
[/code]

[code]
cat /proc/meminfo
MemTotal:        2054980 kB
MemFree:           73984 kB
Buffers:          171724 kB
Cached:           500424 kB
SwapCached:           88 kB
Active:           556756 kB
Inactive:         441328 kB
Active(anon):     218956 kB
Inactive(anon):   107796 kB
Active(file):     337800 kB
Inactive(file):   333532 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       6290424 kB
SwapFree:        6290040 kB
Dirty:               112 kB
Writeback:             0 kB
AnonPages:        325868 kB
Mapped:            24392 kB
Shmem:               816 kB
Slab:             939860 kB
SReclaimable:     917436 kB
SUnreclaim:        22424 kB
KernelStack:        1752 kB
PageTables:         5172 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     7317912 kB
Committed_AS:    1510524 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      274872 kB
VmallocChunk:   34359459448 kB
HardwareCorrupted:     0 kB
AnonHugePages:    253952 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       10240 kB
DirectMap2M:     2086912 kB
[/code]

[code]
free -m
             total       used       free     shared    buffers     cached
Mem:          2006       1932         74          0        170        482
-/+ buffers/cache:       1279        726
Swap:         6142          0       6142
[/code]

[code]
echo 3 | tee /proc/sys/vm/drop_caches
3
[/code]

[code]
free -m
             total       used       free     shared    buffers     cached
Mem:          2006        414       1592          0          0         24
-/+ buffers/cache:        389       1617
Swap:         6142          0       6142
[/code]

[code]
/usr/local/bin/psmem.py
 Private  +   Shared  =  RAM used       Program
 96.0 KiB +  82.0 KiB = 178.0 KiB       rhsmcertd (2)
172.0 KiB +  24.5 KiB = 196.5 KiB       acpid
176.0 KiB +  65.0 KiB = 241.0 KiB       cronolog (2)
284.0 KiB +  44.5 KiB = 328.5 KiB       rpc.idmapd
260.0 KiB +  79.5 KiB = 339.5 KiB       rpcbind
364.0 KiB +  42.5 KiB = 406.5 KiB       auditd
328.0 KiB + 102.0 KiB = 430.0 KiB       rpc.statd
492.0 KiB + 147.0 KiB = 639.0 KiB       mingetty (6)
644.0 KiB +  53.5 KiB = 697.5 KiB       crond
260.0 KiB + 497.5 KiB = 757.5 KiB       udevd (3)
712.0 KiB + 124.0 KiB = 836.0 KiB       ntpd
752.0 KiB + 110.5 KiB = 862.5 KiB       init
664.0 KiB + 238.5 KiB = 902.5 KiB       sudo
872.0 KiB +  77.5 KiB = 949.5 KiB       rsyslogd
964.0 KiB + 415.5 KiB =   1.3 MiB       master
  1.0 MiB + 439.5 KiB =   1.4 MiB       pickup
  1.1 MiB + 443.5 KiB =   1.5 MiB       qmgr
924.0 KiB + 881.5 KiB =   1.8 MiB       bash (2)
  1.8 MiB + 108.5 KiB =   1.9 MiB       vmtoolsd
  2.1 MiB +   2.0 MiB =   4.1 MiB       sshd (3)
 15.4 MiB + 960.0 KiB =  16.3 MiB       shibd
 11.8 MiB +   6.1 MiB =  17.9 MiB       httpd (2)
113.5 MiB +   5.3 MiB = 118.8 MiB       .vasd (5)
170.4 MiB +  65.0 KiB = 170.5 MiB       clamd
---------------------------------
                        343.1 MiB
=================================
[/code]

[code]
cat /proc/meminfo
MemTotal:        2054980 kB
MemFree:         1610268 kB
Buffers:            5704 kB
Cached:            38348 kB
SwapCached:           88 kB
Active:           239512 kB
Inactive:         130452 kB
Active(anon):     218952 kB
Inactive(anon):   107796 kB
Active(file):      20560 kB
Inactive(file):    22656 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       6290424 kB
SwapFree:        6290040 kB
Dirty:                72 kB
Writeback:             0 kB
AnonPages:        325868 kB
Mapped:            24392 kB
Shmem:               816 kB
Slab:              31752 kB
SReclaimable:       9360 kB
SUnreclaim:        22392 kB
KernelStack:        1752 kB
PageTables:         5168 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     7317912 kB
Committed_AS:    1510524 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      274872 kB
VmallocChunk:   34359459448 kB
HardwareCorrupted:     0 kB
AnonHugePages:    253952 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       10240 kB
DirectMap2M:     2086912 kB
[/code]
|
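One way to confirm whether the reclaimable slab (mostly dentries here) really does creep back up after a drop_caches is to sample it over time. A minimal sketch, assuming /proc/meminfo is available (the 60-second interval and log destination are arbitrary choices, not from the thread; per-cache detail via /proc/slabinfo or slabtop needs root):

```shell
# Log a timestamped SReclaimable reading once a minute.
# Plotting the resulting file shows whether slab growth tracks
# periods of heavy filesystem activity (e.g. httpd serving files).
while true; do
    printf '%s ' "$(date '+%F %T')"
    awk '/^SReclaimable:/ {print $2, $3}' /proc/meminfo
    sleep 60
done >> /tmp/sreclaimable.log
```

Correlating jumps in that log with application activity is about as close as you can get to tying slab growth to a process, since dentry cache entries are not owned by any particular PID.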
I think these are your 2 key quotes
Quote:
It has a very efficient scheduler that swaps out or otherwise recycles RAM depending on 'most urgent' ie current requirement. As syg00 points out, you're welcome to investigate this, but it could send you insane ... ;) Unless you've got a definite performance issue, I'd regard it as an academic exercise for when you're bored :) |
Hmmmm - where to start. First off, use [code] tags when posting output - it keeps the alignment. If it's easier to read, people are more likely to try to help.
You don't want to go there ... There have been some issues (essentially memory leaks) with the slab allocator, but that was a few years back, and certainly won't be applicable to RHEL 6.2. So I would consider this "working as designed" and just accept the situation if it isn't actually causing a problem. Poorly written/designed apps can drive memory fragmentation, which causes more slab cache entries to be allocated (which eats RAM). I've no idea whether Apache would be guilty of this, but it certainly has a history of disregarding overall system health - much like every database system out there. But I digress ... |
Thanks to everyone for their comments, and sorry for the delay in replying (I have been on holiday).
Unfortunately this isn't purely an academic exercise for me: I'm "losing" up to 2.5GB of RAM on some servers. While it does seem to be possible to free the memory (as I described in previous posts), doing so leaves that memory unavailable for disk caching, which results in quite significant performance degradation on our web servers.

As the memory usage plateaus, I could just throw more memory at the problem, and have already done so on one server. But doing this across 30-50 servers has cost implications - which I don't mind if that much memory is genuinely needed, but I would like to better understand what's going on and rule out a software bug that might be fixable, rather than upgrading a lot of servers.

Thanks, Paul |
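Since the practical pain here is the page cache being squeezed out by reclaimable slab, one knob worth experimenting with (a suggestion, not something raised in the thread) is vm.vfs_cache_pressure. Raising it above the default of 100 biases the kernel's reclaim towards dentry and inode caches, leaving more room for page cache, without the blunt hammer of drop_caches:

```shell
# Show the current setting (kernel default is 100)
cat /proc/sys/vm/vfs_cache_pressure

# Reclaim dentries/inodes more aggressively (root required).
# The value 200 is an illustrative starting point - tune and measure.
sysctl -w vm.vfs_cache_pressure=200

# To persist across reboots, add this line to /etc/sysctl.conf:
#   vm.vfs_cache_pressure = 200
```

This doesn't explain why the dentry cache grows so fast, but it can restore the balance between slab and page cache while the root cause is investigated.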