Query regarding High Inactive(file) usage
Hi kernel experts
I'm looking for help with what seems to be a problem in the MM or FS (ext4/jbd2) subsystem, though it could very well be something in userland that I'm unable to catch.

I'm debugging an issue where my virtual machine running 3.10.19 (the host has the same kernel version) on QEMU (1.5.3) gets low on memory. The VM is launched with 2.4G of RAM and the usual process usage is around 1.4G. When low memory is detected, I notice that the Inactive(file) usage reported in /proc/meminfo is very high while the Cached usage is fairly low. For example:

MemTotal:        2459784 kB
MemFree:           89544 kB
Buffers:            3316 kB
Cached:           108872 kB
SwapCached:            0 kB
Active:          1119204 kB
Inactive:         926884 kB
Active(anon):    1104440 kB
Inactive(anon):     1896 kB
Active(file):      14764 kB
Inactive(file):   924988 kB

At this point the resident usage of the processes, checked via /proc/pid/smaps, hasn't gone up. Dropping caches doesn't help at all, and I don't see anything unusual in the kernel logs. I've looked at /dev/shm and other tmpfs usage; they all look normal. Disk usage for the other partitions looks normal too. By normal I mean it is pretty much what I see even when the problem isn't present.

So I did the next logical thing, took a VM core and decoded it in crash. A few things I noticed:

1. Walking the inode lists via the superblocks, the inodes don't have that many pages mapped, which explains why dropping caches doesn't help.
2. From the kmem output, I see a huge number of pages on the inactive LRU that don't have any mapping associated with them. This confirms the first point.
3. Most of these pages do, however, have an associated private entry, and these private/FS-related entries are buffer_head objects.

crash> kmem -i
                 PAGES        TOTAL      PERCENTAGE
    TOTAL MEM   614948       2.3 GB         ----
         FREE    44698     174.6 MB    7% of TOTAL MEM
         USED   570250       2.2 GB   92% of TOTAL MEM
       SHARED    24930      97.4 MB    4% of TOTAL MEM
      BUFFERS      415       1.6 MB    0% of TOTAL MEM
       CACHED    24965      97.5 MB    4% of TOTAL MEM
         SLAB    59052     230.7 MB    9% of TOTAL MEM

   TOTAL HUGE        0            0         ----
    HUGE FREE        0            0    0% of TOTAL HUGE

   TOTAL SWAP        0            0         ----
    SWAP USED        0            0    0% of TOTAL SWAP
    SWAP FREE        0            0    0% of TOTAL SWAP

 COMMIT LIMIT   307474       1.2 GB         ----
    COMMITTED  1389447       5.3 GB  451% of TOTAL LIMIT

crash> kmem -V
VM_STAT:
        NR_FREE_PAGES: 44698
     NR_INACTIVE_ANON: 3554
       NR_ACTIVE_ANON: 294558
     NR_INACTIVE_FILE: 186610
       NR_ACTIVE_FILE: 5002
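As far as I understand the accounting (and why I think these two numbers shouldn't diverge like this): "Cached" is derived from NR_FILE_PAGES, i.e. pages actually present in some address_space, while "Inactive(file)" comes straight from the per-LRU counter NR_INACTIVE_FILE. Roughly, paraphrasing the 3.10-era fs/proc/meminfo.c from memory (a sketch, not the literal source):

	/* "Cached:" -- page-cache pages, minus swap cache and the
	 * block-device buffer pages reported by sysinfo (struct sysinfo i) */
	cached = global_page_state(NR_FILE_PAGES)
			- total_swapcache_pages() - i.bufferram;

	/* "Inactive(file):" -- whatever currently sits on the inactive
	 * file LRU, regardless of whether it is still in any mapping */
	inactive_file = global_page_state(NR_INACTIVE_FILE);

Normally a page on the file LRU is also in the page cache, so Cached should be at least roughly Active(file) + Inactive(file). Here it's the other way around: NR_INACTIVE_FILE is ~186K pages (~730 MB) while CACHED is only ~25K pages, which matches what crash shows below -- pages on the LRU with no mapping, only a private buffer_head.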
For example, here is one of the pages I looked at:

crash> kmem ffffea000004cf20
      PAGE        PHYSICAL      MAPPING     INDEX  CNT  FLAGS
ffffea000004cf20   15fc000             0       698    1  1ffc000000082c referenced,uptodate,lru,private

crash> struct page.lru -x ffffea000004cf20
  lru = {
    next = 0xffffea0000322dc8,
    prev = 0xffffea000004cf78
  }

crash> list 0xffffea0000322dc8 | wc -l
186611

crash> struct page.private -x ffffea000004cf20
  private = 0xffff8800282b2540

crash> kmem 0xffff8800282b2540
CACHE            OBJSIZE  ALLOCATED    TOTAL  SLABS  SSIZE  NAME
ffff880095c16200     104     170225   172383   4659     4k  buffer_head
  SLAB              MEMORY            TOTAL  ALLOCATED  FREE
  ffff8800282b2000  ffff8800282b20c8     37         37     0
  FREE / [ALLOCATED]
  [ffff8800282b2540]

      PAGE        PHYSICAL      MAPPING     INDEX  CNT  FLAGS
ffffea00008c96f0   282b2000             0         0    1  1ffc0000000080 slab

crash> buffer_head -x ffff8800282b2540
struct buffer_head {
  b_state = 0x100001,
  b_this_page = 0xffff8800282b2540,
  b_page = 0xffffea000004cf20,
  b_blocknr = 0x11698,
  b_size = 0x1000,
  b_data = 0xffff8800015fc000 "",
  b_bdev = 0x0,
  b_end_io = 0x0,
  b_private = 0x0,
  b_assoc_buffers = {
    next = 0xffff8800282b2588,
    prev = 0xffff8800282b2588
  },
  b_assoc_map = 0x0,
  b_count = {
    counter = 0x0
  }
}

The number of in-use buffer_head cache objects is close to the total number of in-use pages (~170K buffer_heads vs ~186K pages). The pages have a non-zero refcount, while the buffer_head itself has a zero refcount. These buffer_heads are used by ext4 and jbd2 (of the modules we use in this VM), neither of which I know in much detail. From the looks of it, the user of the buffer_head seems to have cleared the bh fields and done a put, but hasn't freed the cache object (see the release-path sketch at the end of this mail for why that surprises me).

I've been combing through the upstream commits to see if anything remotely resembles this symptom, but I haven't found anything yet. I can't recreate the issue, so debugging has gotten harder; the last time I saw it was a month back and it hasn't happened since.

Has anyone seen or debugged an issue like this? What would be a good approach to nail it down? Is it possible that this is somehow tied to some funky user-space trickery?

TIA

PS: Everything in the VM runs as root. I know it's not safe, but it wasn't my decision.
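PS2: For context on why the zero b_count surprises me, here is my (possibly wrong) understanding of the release path, paraphrased from memory from the 3.10-era fs/buffer.c, so treat it as a sketch rather than the literal source. When reclaim or invalidation ends up in try_to_free_buffers() on a page like the one above, the buffers are only kept if one of them is "busy":

	/* paraphrased from fs/buffer.c -- a buffer is busy if it is still
	 * referenced, dirty, or locked */
	static inline int buffer_busy(struct buffer_head *bh)
	{
		return atomic_read(&bh->b_count) |
			(bh->b_state & ((1 << BH_Dirty) | (1 << BH_Lock)));
	}

	/* drop_buffers() walks the page's b_this_page ring and bails out
	 * if any buffer is busy; otherwise the buffer_heads are freed and
	 * the page can finally be released */

The buffer_head above has b_count == 0 and, as far as I can tell from b_state, is neither dirty nor locked, so I would have expected try_to_free_buffers() to succeed on it. While the page still belonged to an ext4 inode, my guess is the relevant hook would have been ext4_releasepage() / jbd2_journal_try_to_free_buffers(), but I don't know that code well enough to say whether jbd2 could legitimately hold these back.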