process memory usage

mrfenwayfool · 02-02-2013, 01:05 PM

Hello,

I am working with an embedded system running Linux. I have a single process that has numerous threads doing the work... so there is really only a single process to worry about for all practical purposes. What I am trying to do is have this process offer IP based services as long as it has enough memory.

So for example, if my system has 8G, I want to keep allowing new connections to the services as long as there is at least 1G of free memory... but if there is less than that I want to reject new services... but continue allowing the existing connections.

Here is my problem... my process currently figures out what the current memory utilization is by looking at RSS. This works... the calculation is close enough for my purposes (yes, I know, shared memory can mess with this)... still, the calculation is fine. The BIG issue is... the RSS value seems to be a "high water mark". That is, it never shrinks since there are no other processes requesting memory... so my system stops accepting new connections even as old ones quit.

What are my options here? Is my only option to mess with malloc hooks or is there a better way to deal with this? Can I somehow "flush" my process so it gives back all unused RAM?

MrFenwayFool

syg00 · 02-02-2013, 03:55 PM

Couple of simple solutions:
- do the sums yourself from /proc/<pid>/smaps
- use drop_caches

Normally I rail against people recommending the latter because it messes with disk caching. Probably not an issue with you.

Code:

echo 3 > /proc/sys/vm/drop_caches
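For the first suggestion, "doing the sums yourself" could look roughly like the sketch below. It adds up every line of one field (for example Rss or Pss) in an smaps-format file. The function name sum_smaps_kb is made up for illustration, and the sketch assumes the usual "Field:   N kB" line layout. Keep in mind that Rss taken from smaps still counts heap pages that have been freed but not returned to the kernel.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: sum every "<field>: <n> kB" line in an
 * smaps-format file such as /proc/self/smaps.
 * Returns the total in kB, or -1 if the file cannot be opened. */
long sum_smaps_kb(const char *path, const char *field)
{
    assert(path != NULL && field != NULL);

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    size_t flen = strlen(field);
    char line[512];
    long total = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        /* Require the colon right after the name so that "Rss"
         * does not also match a longer field name. */
        if (strncmp(line, field, flen) == 0 && line[flen] == ':') {
            long kb = 0;
            if (sscanf(line + flen + 1, "%ld", &kb) == 1)
                total += kb;
        }
    }
    fclose(fp);
    return total;
}
```

Calling sum_smaps_kb("/proc/self/smaps", "Pss") from inside the process gives a figure that splits shared pages proportionally between the processes mapping them, which sidesteps the shared-memory caveat of plain RSS.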

mrfenwayfool · 02-03-2013, 04:33 PM

Thanks for the advice.

I tried using drop_caches but I don't think that does anything to help me. The issue is that the process heap keeps expanding... sbrk() is never called to shrink the heap under normal circumstances. The result of this is that RSS shows as a "high water mark". For example, even though my process heap grows to 8G in size... that does NOT mean that my process is using ALL of that 8G at any given instant.

What I have found is that calling malloc_trim(0) every now and then from the application does indeed shrink the heap. I am playing with this, but calling it frequently really hurts performance (there's a good reason the heap normally doesn't shrink!). Instead, I'm looking at mallinfo(), which reports not only how large the heap is but also how much of the heap is actively allocated... which is what I am really interested in.
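A rough sketch of the mallinfo() idea, assuming glibc. The struct heap_usage, read_heap_usage and may_accept names are invented for illustration. One caveat worth knowing with heaps this large: the fields of struct mallinfo are plain int, so they can wrap once the heap passes 2 GiB; glibc 2.33 and later offer mallinfo2() with size_t fields, which the sketch picks when available. As far as I understand glibc, uordblks does not include mmap()ed blocks, so the check below adds hblkhd; treat that as an assumption to verify on your version.

```c
#include <assert.h>
#include <malloc.h>   /* glibc: mallinfo(), mallinfo2() */
#include <stddef.h>

#if defined(__GLIBC__)
# if __GLIBC_PREREQ(2, 33)
#  define USE_MALLINFO2 1
# endif
#endif

struct heap_usage {
    size_t heap_bytes;    /* arena:    non-mmapped space obtained from the kernel */
    size_t mmap_bytes;    /* hblkhd:   bytes in mmap()ed blocks */
    size_t in_use_bytes;  /* uordblks: bytes in live allocations */
    size_t free_bytes;    /* fordblks: bytes malloc holds but has free */
};

/* Snapshot the allocator's own view of the heap. */
struct heap_usage read_heap_usage(void)
{
#ifdef USE_MALLINFO2
    struct mallinfo2 mi = mallinfo2();
#else
    /* int fields: values are unreliable once the heap exceeds 2 GiB */
    struct mallinfo mi = mallinfo();
#endif
    struct heap_usage u;
    u.heap_bytes   = (size_t)mi.arena;
    u.mmap_bytes   = (size_t)mi.hblkhd;
    u.in_use_bytes = (size_t)mi.uordblks;
    u.free_bytes   = (size_t)mi.fordblks;
    return u;
}

/* Hypothetical admission check: accept new connections only while
 * actively allocated memory stays under a budget. */
int may_accept(const struct heap_usage *u, size_t budget_bytes)
{
    assert(u != NULL);
    return u->in_use_bytes + u->mmap_bytes < budget_bytes;
}
```

The point of the design is that the decision uses what is actually allocated (uordblks) rather than how large the heap has grown (arena), so it should drop again when connections close even if the heap itself never shrinks.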

Again, thanks for the advice!

MrFenwayFool

sundialsvcs · 02-05-2013, 09:14 AM

The approach that I would take is simply to determine how many connections you think the system should accept, and build a simple but adjustable throttle to that effect.

You can sometimes do simple things like "wrapping" the memory allocation/release calls that your application uses with code that increments or decrements a global running total ... often faster than asking the memory-allocation subsystem for that figure.
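As a minimal sketch of that wrapping idea in C (the counted_malloc, counted_free and live_bytes names are hypothetical, and malloc_usable_size() is a glibc extension): the wrappers keep one atomic running total, so reading the figure is a single load rather than a walk through the allocator's structures.

```c
#include <assert.h>
#include <malloc.h>      /* malloc_usable_size() (glibc extension) */
#include <stdatomic.h>
#include <stdlib.h>

/* Running total of bytes currently handed out through the wrappers.
 * Static storage, so it starts at zero. */
static atomic_size_t g_live_bytes;

void *counted_malloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        atomic_fetch_add_explicit(&g_live_bytes, malloc_usable_size(p),
                                  memory_order_relaxed);
    return p;
}

void counted_free(void *p)
{
    if (p == NULL)
        return;
    /* Ask the allocator for the block size before releasing it, so the
     * same amount that was added on allocation is subtracted here. */
    size_t sz = malloc_usable_size(p);
    size_t old = atomic_fetch_sub_explicit(&g_live_bytes, sz,
                                           memory_order_relaxed);
    assert(old >= sz);   /* would fire if a foreign pointer were freed */
    free(p);
}

size_t live_bytes(void)
{
    return atomic_load_explicit(&g_live_bytes, memory_order_relaxed);
}
```

In a C++ application the same counters could presumably be driven from replacement operator new and operator delete, but that wiring is not shown here. The total only covers allocations that go through the wrappers.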

You could design a simple servo-mechanism: a thread that wakes up every (configurable... n) seconds and decides whether the throttle should be raised or lowered, within (configurable) boundaries and by some (configurable) increment. This approach would make the system become responsive to conditions as they actually are observed to take place, and reduces the need to "guess correctly."
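To make the servo idea concrete, here is a sketch of just the decision step; the periodic thread that calls it every n seconds is left out. All names (struct throttle, throttle_adjust, throttle_admits) and the two water marks are assumptions for illustration. The gap between the low and high water marks gives some hysteresis so the limit does not flap.

```c
#include <assert.h>
#include <stddef.h>

/* Configurable connection throttle. */
struct throttle {
    int limit;      /* current maximum number of connections */
    int min_limit;  /* lower boundary for limit */
    int max_limit;  /* upper boundary for limit */
    int step;       /* increment applied per adjustment */
};

/* One control step: lower the limit when memory in use is above the
 * high-water mark, raise it when below the low-water mark, and leave
 * it alone in between. Returns the new limit. */
int throttle_adjust(struct throttle *t, size_t in_use,
                    size_t low_water, size_t high_water)
{
    assert(t != NULL && t->step > 0 && t->min_limit <= t->max_limit);
    assert(low_water <= high_water);

    if (in_use > high_water)
        t->limit -= t->step;
    else if (in_use < low_water)
        t->limit += t->step;

    /* Keep the limit inside the configured boundaries. */
    if (t->limit < t->min_limit)
        t->limit = t->min_limit;
    if (t->limit > t->max_limit)
        t->limit = t->max_limit;
    return t->limit;
}

/* Nonzero if one more connection may be accepted. */
int throttle_admits(const struct throttle *t, int active_connections)
{
    assert(t != NULL);
    return active_connections < t->limit;
}
```

Existing connections are never dropped by this; a lower limit only means new ones are refused until enough old ones have gone away.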

Yes, there's a reason why the segment doesn't shrink: it forces everything to grind to a stop while a heck of a lot of page-faults take place in quick succession. Instead, the system as designed just lets the virtual memory manager do its thing. Pages that are "part of the heap," but not actively being used, get paged out. Some memory-allocators use a special "zero-fill this block of memory" system call which gives the OS the opportunity to zero the page by removing it from the VM page-tables, but I don't recall if Linux does this or not.
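On that last question: Linux does have a call along those lines, madvise() with MADV_DONTNEED. For private anonymous mappings the kernel may drop the pages, and a later access sees zero-filled pages. The sketch below shows only the call on a directly mmap()ed region; the helper names map_anon and discard_pages are invented, and whether a given allocator uses this internally is not something the sketch demonstrates.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS and MADV_DONTNEED in strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map len bytes of private anonymous memory; returns NULL on failure. */
void *map_anon(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Tell the kernel the contents of [addr, addr + len) are no longer
 * needed. addr must be page-aligned. The mapping stays valid; on Linux
 * the next access to a private anonymous page reads back as zeros.
 * Returns 0 on success, -1 on error. */
int discard_pages(void *addr, size_t len)
{
    assert(addr != NULL && len > 0);
    return madvise(addr, len, MADV_DONTNEED);
}
```

After discard_pages() the pages no longer count toward RSS until they are touched again, which is the effect the original question was after, at the cost of fresh page faults on reuse.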

mrfenwayfool · 02-09-2013, 02:05 PM

Hmmm... this has turned into a long and winding road.

Looking at malloc_stats output has enlightened me a bit on the memory usage. That is, there are multiple arenas of memory since libc is trying to prevent blocking due to multiple threads running (my particular application has many, many threads due to its long history). Anyway, I thought perhaps I could limit the number of arenas using MALLOC_ARENA_MAX... and while that did limit the arenas... I'm still seeing "in use" memory for each arena far below the "system space"... meaning the arenas are growing even though there *should* be plenty of memory available. I'm wondering if this is fragmentation or fastbins using up memory... then being freed... and never used again. My application is a massive C++ application. I'm going to look into this more closely. Seems like my application is not using as much memory as I thought if you can believe the statistics from malloc_stats. I'm also wondering if I should give tcmalloc a try and see if it has similar issues. Regardless... something is not right here.
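For reference, the arena cap can also be set from inside the program rather than through the MALLOC_ARENA_MAX environment variable, using glibc's mallopt() with M_ARENA_MAX. This is only a sketch: limit_arenas and dump_arena_stats are made-up names, and the assumption is that the call happens early, before worker threads have caused extra arenas to be created.

```c
#include <assert.h>
#include <malloc.h>   /* glibc: mallopt(), M_ARENA_MAX, malloc_stats() */

/* Cap the number of malloc arenas glibc may create.
 * mallopt() returns 1 on success and 0 on error. */
int limit_arenas(int max_arenas)
{
    assert(max_arenas > 0);
    return mallopt(M_ARENA_MAX, max_arenas);
}

/* Print the allocator's statistics to stderr, the same report
 * discussed above ("system bytes" versus "in use bytes"). */
void dump_arena_stats(void)
{
    malloc_stats();
}
```

Capping arenas does not by itself address fragmentation inside an arena, so a large gap between the system and in-use figures may well remain.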

sundialsvcs · 02-10-2013, 02:18 PM

Be very alert as to exactly which metric you are observing. When space is released and available for recycling, many metrics do not go down. Also, processes in a virtual memory environment can't see what the operating system's VM manager is doing (although the software is designed to be friendly to it).

Easily the best way to get meaningful statistics is to arrange for the process itself to create log datasets (or externally visible in-memory "trace tables") which describe the workflow characteristics in its own application-specific, and therefore business-specific, terms. Get out of the abstract and into the concrete. If the processing system tracks units-of-work, track those units. Track easily-gathered memory manager statistics and include them in the same or in a separate stream (taking care that none of the stats are expensive to obtain, since the cost of gathering them would skew the results). Now, pull these data out to a statistics system: look for statistically significant correlations.

Build "tweakable" controls and knobs, not so much in operating-system terms but rather in "what this software system does for a living" terms. Gather data about the system with various settings and, once again, use a stats package.

I've done this sort of thing many times, and one of the most interesting things that comes from it is: "real surprises."