Linux - Server: This forum is for the discussion of Linux software used in a server-related context.
09-05-2011, 10:15 PM | #1 | nwrk, LQ Newbie | Registered: Sep 2011 | Location: New Caledonia | Posts: 8
Very high SLAB usage, hard to understand
Hi there,
I'm not used to asking, but this time I have followed every piece of information I could find and never found a solution, so I'm trying here, as there seem to be a lot of good people.
On a virtual server hosted under VMware, I have some LXC containers hosting a dozen PHP websites. It's not a very high load for a VM with 2 CPUs at 2.5 GHz and 3 GB of memory, but it is swapping... My analysis is the following:
Code:
www2 ~ # uname -a
Linux www2 2.6.38-gentoo-r6 #9 SMP Fri Jun 24 14:28:08 NCT 2011 x86_64 Intel(R) Xeon(R) CPU E5420 @ 2.50GHz GenuineIntel GNU/Linux
www2 ~ # free -m
total used free shared buffers cached
Mem: 3018 2915 103 0 11 38
-/+ buffers/cache: 2864 153
Swap: 4095 916 3179
As you can see, a lot of swap space is used while there's no real reason for it. In fact, if I sum up all the cgroups' used memory, it comes to this:
Code:
www2 ~ # sum=0; for f in /cgroup/*/memory.usage_in_bytes; do sum=$[$sum+$(<$f)]; done; echo $[sum / 1024 / 1024]
259
Meaning that I have 259 MB of memory used by the containers. After investigating, I found that the missing memory is going into the SLAB allocator, which should be fine too. atop gives the following stats:
Code:
MEM | tot 2.9G | free 121.2M | cache 33.5M | buff 9.2M | slab 2.4G |
So the SLAB is eating a lot of memory, which wouldn't be a problem if its space were actually reclaimed, as the kernel advertises ~2 GB of reclaimable SLAB space:
Code:
www2 / # grep SReclaimable /proc/meminfo
SReclaimable: 2177344 kB
Furthermore, slabtop gives me the following information:
Code:
Active / Total Objects (% used) : 6541983 / 6578453 (99.4%)
Active / Total Slabs (% used) : 615834 / 615841 (100.0%)
Active / Total Caches (% used) : 131 / 227 (57.7%)
Active / Total Size (% used) : 2324507.76K / 2329858.41K (99.8%)
Minimum / Average / Maximum Object : 0.02K / 0.35K / 4096.00K
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
2507580 2504636 99% 0.19K 125379 20 501516K dentry
2491500 2491494 99% 0.61K 415250 6 1661000K inode_cache
And this is where I'm stuck: I can't find a way to identify what is holding 99% of the inode_cache and dentry slabs. I'm sorry if I missed a link somewhere before asking, but I really think I've exhausted whatever information Google could get me about this.
I'd be grateful if anyone could point me to a tool that gives more information about SLAB objects, or a way to make the kernel reclaim the space it claims is reclaimable.
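For what it's worth, the only generic knob I've found so far is the vm.drop_caches sysctl, which in theory throws away clean, unused dentries and inodes. A sketch (I haven't confirmed it actually frees anything here):
Code:
www2 ~ # sync                                        # flush dirty data so more objects become clean
www2 ~ # echo 2 > /proc/sys/vm/drop_caches           # 2 = drop reclaimable dentries and inodes
www2 ~ # grep -E 'Slab|SReclaimable' /proc/meminfo   # check whether the slab actually shrank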
Thanks a lot for (at least) reading!
Last edited by nwrk; 09-05-2011 at 10:39 PM.
09-06-2011, 04:37 PM | #2 | mulyadi.santosa, Member | Registered: Sep 2011 | Posts: 96
Hi nwrk
Reading your post, and especially your slabtop output (I assume you use the default sort, which is by number of objects), I'd say that your Linux server is doing quite a lot (maybe a massive amount!) of file reads and writes.
In short, for every file read, its inode is cached; this cache is called the inode_cache (icache). Furthermore, since a file can only be reached after finding the directory it resides in, the directory entry is also read and cached; this cache is named dentry (directory entry).
Adding the fact that your "cached" amount (shown by free) is low, I'd deduce that the I/O might be so frequent that your slab allocator (that is what kernel folks call it; it's like a cache manager for data structures) decides not to release those objects. It could be fine, but it could also be a leak.
I suggest, if possible, upgrading the kernel to the latest longterm stable release. It might also be useful to switch to another slab allocator implementation (the alternatives are SLUB and SLOB, afaik) and see if it helps. For this slab change, you need to recompile the kernel yourself.
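To double-check which allocator your current kernel was built with, something like this should do (assuming CONFIG_IKCONFIG_PROC is enabled; otherwise grep the .config in your kernel source tree):
Code:
www2 ~ # zgrep -E 'CONFIG_SL[AUO]B=' /proc/config.gz        # needs CONFIG_IKCONFIG_PROC
www2 ~ # grep -E 'CONFIG_SL[AUO]B=' /usr/src/linux/.config  # fallback: the source tree's .config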
Hope it helps....
09-06-2011, 09:19 PM | #3 | syg00, LQ Veteran | Registered: Aug 2003 | Location: Australia | Distribution: Lots ... | Posts: 21,249
If you're running Gentoo, you have a kernel source tree. Have a look at .../Documentation/vm/slub.txt - I haven't tried this debug support, but it certainly looks promising. Might add it to my "to-do" list sometime ...
Unless you've done something very specific/silly, you *will* be using the slub allocator.
There have been examples of problems with the allocator (I haven't noticed one reported for a while) - if you feel you have such, that's a reportable kernel bug.
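From a quick skim, the sort of thing that doc describes (again, untested by me, and only relevant once you are actually on SLUB with CONFIG_SLUB_DEBUG enabled) is booting with user tracking turned on for a suspect cache and then reading the per-cache sysfs entries; roughly:
Code:
# kernel command line: enable allocation/free tracking for one suspect cache (repeat for inode_cache)
slub_debug=U,dentry
# after booting, the call sites holding objects show up under sysfs
cat /sys/kernel/slab/dentry/alloc_calls
cat /sys/kernel/slab/dentry/free_calls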
09-07-2011, 12:54 AM | #4 | nwrk, LQ Newbie (Original Poster) | Registered: Sep 2011 | Location: New Caledonia | Posts: 8
Hi syg00 and mulyadi.santosa, thanks for your useful answers.
I didn't know the SLUB allocator was the new default; I hadn't changed that setting and it was on "SLAB". I must admit I was wondering about SLUB, as its description states "SLUB can use memory efficiently and has enhanced diagnostics", which looks really interesting in my case.
Upgrading to the latest mainline kernel was also my next move; sadly it requires a reboot, but it's necessary given your answers. Anyway, when the memory gets full of caches, the load goes to 60 or more with no solution other than a reboot.
As for what looks like a bug: slabtop reports 99% usage of the caches, with around 2.5M objects in both inode_cache and dentry, while lsof -n | wc -l tells me there are at most 19k open files. If you think I've hit a bug because of that, then I will report it on the LKML; but those are people I only disturb as a last resort, because they are dangerous
For now I will follow your good advice (which also confirms my intuitions), build a vanilla 3.0.4 (instead of the 2.6.38-gentoo) with SLUB, and reboot. I'll tell you about the results, but expect a delay of around 14 days before I can confirm whether the supposedly reclaimable caches are again not being reclaimed.
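For the record, the plan in .config terms looks roughly like this (option names quoted from memory, so treat them as approximate):
Code:
# in the new kernel's .config: use SLUB instead of SLAB ("Choose SLAB allocator" under General setup)
# CONFIG_SLAB is not set
CONFIG_SLUB=y
CONFIG_SLUB_DEBUG=y     # keep the /sys/kernel/slab/ statistics available
www2 ~ # cd /usr/src/linux-3.0.4 && make oldconfig && make -j3
www2 ~ # make modules_install && cp arch/x86/boot/bzImage /boot/kernel-3.0.4-slub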
Thanks again and see you!
Last edited by nwrk; 09-07-2011 at 04:56 AM.
09-07-2011, 02:51 AM | #5 | syg00, LQ Veteran | Registered: Aug 2003 | Location: Australia | Distribution: Lots ... | Posts: 21,249
You really need to be on the SLUB allocator - a lot of effort (over years now) has been put into improving its efficiency and slab consolidation.
I don't think I'd want to be disturbing the residents on LKML with SLAB issues unless you can prove the problem(s) also affect SLUB. As you say, they can be a bit touchy ...
Will wait with interest.
09-07-2011, 02:58 AM | #6 | mulyadi.santosa, Member | Registered: Sep 2011 | Posts: 96
Hi nwrk
Well, sometimes reporting that to LKML is a good move. But since you mention the -gentoo kernel, I think you should report it to the Gentoo kernel dev team first. I am not saying you have to; it's just that if you do, and it is indeed a corner-case bug, then you indirectly help probably hundreds, maybe thousands, of people who might have the same workload as you.
Regarding the slab allocator choice: under normal conditions, SLAB, SLUB or SLOB should all be fine. What I see here looks like a leak, and that could happen with any of them. Since you mention LXC, I suspect it could come from LXC rather than from the bare Linux kernel.
Ehm, about the kernel version, I still suggest you pick the latest longterm stable. The reason is that its stability is usually better than just "stable"; after all, fixes are backported from the latest stable releases into them. As we speak, you could choose between 2.6.35.14, 2.6.34.10, 2.6.33.19, or 2.6.32.46. Also pay attention to the "Kernel hacking" section during kernel configuration; most items there chew up extra memory, so pick carefully.
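To give a rough idea of what I mean, these are the kinds of debug options I would leave off on a memory-constrained production VM (an illustrative .config fragment, not your actual config):
Code:
# "Kernel hacking" options that cost extra RAM and/or CPU -- leave off in production
# CONFIG_DEBUG_PAGEALLOC is not set
# CONFIG_DEBUG_KMEMLEAK is not set
# CONFIG_DEBUG_OBJECTS is not set
# CONFIG_PROVE_LOCKING is not set
# CONFIG_SLUB_DEBUG_ON is not set   # keep CONFIG_SLUB_DEBUG=y (the capability) but not always-on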
09-07-2011, 04:48 AM | #7 | nwrk, LQ Newbie (Original Poster) | Registered: Sep 2011 | Location: New Caledonia | Posts: 8
Okay, then I'll try to just change the SLAB implementation to SLUB and see. I'll stick with 2.6.38-gentoo-r6, which is the kernel + Gentoo patchset that the Gentoo people considered stable when I built the server (it is 2.6.39-r3 now). FYI, these kernels include bugfix patches too.
If the problem persists, I'll try to contact the Gentoo kernel maintainers as you suggested. That will be a good step before the LKML.
(Off topic: there seems to be a problem resolving kernel.org right now; I tried from New Caledonia and France... lucky me, it seems...)
Last edited by nwrk; 09-08-2011 at 10:53 PM.
09-07-2011, 05:14 AM | #8 | syg00, LQ Veteran | Registered: Aug 2003 | Location: Australia | Distribution: Lots ... | Posts: 21,249
Same here - maybe they took it down for a rebuild after they were compromised.
09-07-2011, 09:38 PM | #9 | mulyadi.santosa, Member | Registered: Sep 2011 | Posts: 96
Hi
Quote:
Originally Posted by nwrk
Okay, then I'll try to just change the SLAB implementation to SLUB and see. I'll stick with 2.6.38-gentoo-r6, which is the kernel + Gentoo patchset that the Gentoo people considered stable when I built the server (it is 2.6.39-r3 now). FYI, these kernels include bugfix patches too.
Good choice, I think. Picking one which is supported by your distro of choice will make life easier. After all, 3.x is still new...
Quote:
Originally Posted by nwrk
(off subject: there seems to be a problem resolving kernel.org right now; tried from New Caledonia and France... I'm still lucky it seems...)
I got a slowdown too, and I agree it might have something to do with the recent compromise.
09-18-2011, 03:02 AM | #10 | nwrk, LQ Newbie (Original Poster) | Registered: Sep 2011 | Location: New Caledonia | Posts: 8
Hi again,
It doesn't look better:
Code:
www2 ~ # free -m
total used free shared buffers cached
Mem: 3018 2822 195 0 44 101
-/+ buffers/cache: 2677 341
Swap: 4095 351 3744
www2 ~ # atop |grep MEM
MEM | tot 2.9G | free 177.9M | cache 104.9M | buff 47.3M | slab 1.8G |
I'm upgrading to the latest stable gentoo kernel (2.6.39-gentoo-r3) to see.
09-18-2011, 09:34 AM | #11 | mulyadi.santosa, Member | Registered: Sep 2011 | Posts: 96
Hi....
Quote:
Originally Posted by nwrk
Code:
www2 ~ # atop |grep MEM
MEM | tot 2.9G | free 177.9M | cache 104.9M | buff 47.3M | slab 1.8G |
I'm upgrading to the latest stable gentoo kernel (2.6.39-gentoo-r3) to see.
Sheesh, 1.8 GiB for slab... that's a lot. Could you run:
slabtop -s c
and let us know the five biggest cache names on your machine?
PS: So far I still sense a leak somewhere, but confirming it might need deeper Linux kernel memory tracing, which can be an unpleasant and quite complicated task.
NB: AFAIK, the only tool I can recall is "kmemleak", but that would slow your whole machine down by several orders of magnitude, so I am not sure it is feasible.
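If you ever do decide to try it, the rough procedure is something like this (it needs a kernel built with CONFIG_DEBUG_KMEMLEAK and a reboot, so treat it as a sketch):
Code:
www2 ~ # mount -t debugfs nodev /sys/kernel/debug    # if debugfs is not already mounted
www2 ~ # echo scan > /sys/kernel/debug/kmemleak      # trigger an immediate scan
www2 ~ # cat /sys/kernel/debug/kmemleak              # list suspected leaks with stack traces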
09-18-2011, 02:59 PM | #12 | nwrk, LQ Newbie (Original Poster) | Registered: Sep 2011 | Location: New Caledonia | Posts: 8
Hi,
Thanks again for the answer. I forgot to take a "shot" of slabtop, but it's still the same: ratios like 65% inode_cache, 33% dentry and maybe 1% for everything else, with 99% to 100% usage. I suppose that this usage percentage is calculated through something like a refcount, and since LXC is still quite new, maybe there's a problem releasing the references (unshared refcount??). The setup is quite simple and LXC-like: the filesystems are mounted by the main system and then bind-mounted into the containers' namespaces. I also have one btrfs volume. And since btrfs and LXC have been improved in Linux 3.0, I think I'll upgrade if the problem isn't solved, even if that kernel isn't in Gentoo's stable branch -- I have an evil nerd side
FWIW:
Code:
www2 ~ # mount |sed -e 's/vg-h_[^ ]*/vg-h_***/' -e 's,/lxc/[^/]*/,/lxc/***/,'
rootfs on / type rootfs (rw)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
udev on /dev type devtmpfs (rw,nosuid,relatime,size=10240k,nr_inodes=386055,mode=755)
devpts on /dev/pts type devpts (rw,relatime,mode=600,ptmxmode=000)
/dev/sda2 on / type ext2 (rw,noatime,user_xattr,acl,barrier=1,data=ordered)
rc-svcdir on /lib64/rc/init.d type tmpfs (rw,nosuid,nodev,noexec,relatime,size=1024k,mode=755)
shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime)
none on /cgroup type cgroup (rw)
/dev/mapper/vg-usr on /usr type ext4 (rw,noatime)
/dev/mapper/vg-portage on /usr/portage type ext4 (rw,noatime)
/dev/mapper/vg-distfiles on /usr/portage/distfiles type ext4 (rw,noatime)
/dev/mapper/vg-home on /home type ext4 (rw,noatime)
/dev/mapper/vg-opt on /opt type ext4 (rw,noatime)
/dev/mapper/vg-tmp on /tmp type ext4 (rw,noatime)
/dev/mapper/vg-var on /var type ext4 (rw,noatime)
/dev/mapper/vg-vartmp on /var/tmp type ext4 (rw,noatime)
/dev/mapper/vg-hosting_base on /home/hosting/system type btrfs (ro,noatime,compress=lzo)
/dev/mapper/vg-hosting_user--data on /home/hosting/user-data type ext4 (ro,noatime)
/home/hosting/system on /home/hosting/template/system type none (ro,bind)
/dev/mapper/vg-h_*** on /home/hosting/lxc/***/user-data type ext4 (rw,relatime,barrier=1)
/dev/mapper/vg-h_*** on /home/hosting/lxc/***/user-data type ext4 (rw,relatime,barrier=1)
/dev/mapper/vg-h_*** on /home/hosting/lxc/***/user-data type ext4 (rw,relatime,barrier=1)
/dev/mapper/vg-h_*** on /home/hosting/lxc/***/user-data type ext4 (rw,relatime,barrier=1)
/dev/mapper/vg-h_*** on /home/hosting/lxc/***/user-data type ext4 (rw,relatime,barrier=1)
/dev/mapper/vg-h_*** on /home/hosting/lxc/***/user-data type ext4 (rw,relatime,barrier=1)
/dev/mapper/vg-h_*** on /home/hosting/lxc/***/user-data type ext4 (rw,relatime,barrier=1)
/dev/mapper/vg-h_*** on /home/hosting/lxc/***/user-data type ext4 (rw,relatime,barrier=1)
/dev/mapper/vg-h_*** on /home/hosting/lxc/***/user-data type ext4 (rw,relatime,barrier=1)
/dev/mapper/vg-h_*** on /home/hosting/lxc/***/user-data type ext4 (rw,relatime,barrier=1)
/dev/mapper/vg-h_*** on /home/hosting/lxc/***/user-data type ext4 (rw,relatime,barrier=1)
/dev/mapper/vg-h_*** on /home/hosting/lxc/***/user-data type ext4 (rw,relatime,barrier=1)
/dev/mapper/vg-h_*** on /home/hosting/lxc/***/user-data type ext4 (rw,relatime,barrier=1)
usbfs on /proc/bus/usb type usbfs (rw,noexec,nosuid,devmode=0664,devgid=85)
rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
nfsd on /proc/fs/nfsd type nfsd (rw,noexec,nosuid,nodev)
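As an aside, with LXC this kind of bind mount can also be declared in a container's fstab file (referenced via lxc.mount in the container config); an illustrative entry, not my real one, would be:
Code:
# /etc/lxc/<container>/fstab  -- illustrative path; the target is relative to the container rootfs
/home/hosting/system   home/hosting/system   none   ro,bind   0 0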
09-19-2011, 10:13 AM | #13 | mulyadi.santosa, Member | Registered: Sep 2011 | Posts: 96
Hi again...
Quote:
Originally Posted by nwrk
Hi,
Thanks again for the answer. I forgot to take a "shot" of slabtop, but it's still the same: ratios like 65% inode_cache, 33% dentry and maybe 1% for everything else, with 99% to 100% usage.
Alright, now I am quite sure it's a memory leak... in... uhm, btrfs? ext4 is quite stable, so I would point my suspicion toward btrfs. Maybe you enabled some feature that somehow holds on to things for a long time; snapshots (like ZFS), maybe?
Regarding LXC, that could be an amplifying factor, or another factor acting together with it. Are you suffering from the same thing on your other machine, which uses btrfs but not LXC?
Quote:
Originally Posted by nwrk
FWIW:
Code:
/dev/mapper/vg-hosting_base on /home/hosting/system type btrfs (ro,noatime,compress=lzo)
Wait, wait, wait: "compress"? Hmmm, could that be the problem?
PS: care to award me a reputation point?
09-19-2011, 03:18 PM | #14 | nwrk, LQ Newbie (Original Poster) | Registered: Sep 2011 | Location: New Caledonia | Posts: 8
Hello,
Yeah, I thought there could be something with the quite-new btrfs; my approach was "well, it's read-only and I could benefit quite a lot from a fast compression like LZO". I think you're right to point at it, because even mounted read-only it may hold references, thus causing memory leaks. Maybe it's the compression, but I'm not sure.
My other btrfs filesystem is used as a "buffer" (because I don't trust btrfs for now): it's mounted with the compress option (which defaults to zlib) to get faster checks of the Oracle datafiles. The MD5 and SHA1 sums of the files are always good, and the block-level checks are good too (Oracle maintains checksums at the block level). The difference is that this filesystem is unmounted after use, to keep another safe copy of the backup until the next backup. On that host, slabtop gives me this:
Code:
22780 8028 35% 0.19K 1139 20 4556K dentry
180 142 78% 0.61K 30 6 120K inode_cache
So, following your good advice, I'll let the memory usage build up for some time and then try an unmount to see what happens. That will be some night work, so please give me a little time
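The check I have in mind is roughly this (the host prompt and mount point are just illustrative):
Code:
backup ~ # grep -E 'Slab|SReclaimable' /proc/meminfo   # slab usage before the unmount
backup ~ # slabtop -o -s c | head -n 15                # biggest caches before the unmount
backup ~ # umount /mnt/oracle-backup                   # hypothetical btrfs mount point
backup ~ # grep -E 'Slab|SReclaimable' /proc/meminfo   # did the dentry/inode slabs shrink?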
Quote:
Originally Posted by mulyadi.santosa
PS: care to award me reputation point ?
You mean the post rating, right?
09-19-2011, 08:14 PM | #15 | mulyadi.santosa, Member | Registered: Sep 2011 | Posts: 96
Quote:
Originally Posted by nwrk
Hello,
Yeah, I thought there could be something with the quite-new btrfs; my approach was "well, it's read-only and I could benefit quite a lot from a fast compression like LZO". I think you're right to point at it, because even mounted read-only it may hold references, thus causing memory leaks. Maybe it's the compression, but I'm not sure.
I don't have a very strong conviction about it, but I do suspect it's something related to btrfs. Mind you, even Fedora 16 plans to delay using btrfs as the default filesystem; that might have something to do with this kind of stability issue, perhaps...
Quote:
Originally Posted by nwrk
The difference is that this filesystem is unmounted after use to keep another safe copy of the backup, until the next backup. On this host, slabtop gives me that :
22780 8028 35% 0.19K 1139 20 4556K dentry
180 142 78% 0.61K 30 6 120K inode_cache
OK, I assume the field to the left of the cache name is the cache size... looks sane to me. And that percentage is the usage percentage (active objects vs. total objects in that cache), I guess. Again, looks sane...
So, to summarize so far:
Actually, a high percentage of slab usage, especially in dentry and inode_cache, is not rare. The thing is, as I noticed in your earlier posts, it is forcing swapping. So probably these slabs (or maybe others) are being kept (or locked) in RAM, and the suspect here is your btrfs. LXC could be an amplifying factor too.
Quote:
Originally Posted by nwrk
You mean post rating, right ?
Yes, please (if I have helped you somehow)