Linux high memory usage used by no process
1 Attachment(s)
Hi,
Had I seen this thread title posted by somebody else, I would have easily assumed it was the cache. However, I do not think that is the issue here. Following is the output of free -m: Code:
total used free shared buffers cached
Attached is the top output (sorted by memory usage) of the server. The only process that claims any notable memory is the java process, which uses only 2.8% of the memory. Following are the java memory options: Code:
-Xms1G -Xmx8G -XX:MaxPermSize=1024m
The server is Red Hat 6.5, 128G RAM, 6 x 2.7GHz CPUs. The used memory grows very rapidly over time. You can see from top's output that the server has been up for only a day and used memory has already shot up to 42G, despite only 3.5G being used by the java process. The java process is an apache-tomcat-7.0.54 container. There may be memory leaks in the deployed application; however, I do not see a drop in used memory even after stopping the application (the java/tomcat process). Please let me know if I am missing anything or if you need more information in this regard. Thanks |
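A quick first check for a symptom like this is whether the "used" memory belongs to processes at all: sum the resident set size (RSS) of every process and compare it with free's "used" column. A minimal sketch using standard tools (nothing here is specific to the OP's box; the gap it exposes, if large, points at kernel/slab memory rather than a leaking process):

```shell
# Sum the RSS (in KiB) of every process and report it in GiB.
# If this total is far below the "used" column of free, the missing
# memory is not owned by any process.
ps -eo rss= | awk '{sum += $1} END {printf "total process RSS: %.1f GiB\n", sum/1048576}'
free -m
```

On the OP's server this would show roughly 3.5G of process RSS against 42G "used", confirming the memory is unaccounted for at the process level.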
Quote:
http://www.linuxatemyram.com/ There are a few things you say that indicate you haven't done much troubleshooting so far. First, you say 'There may be memory leaks in the deployed application'...so have you CHECKED that? Second, you say you're using Red Hat Enterprise 6.5. While it's a bit old, it's still fully supported. Are you PAYING for RHEL? Have you applied the kernel patches/updates/bugfixes that Red Hat makes available to you when you purchase Red Hat Enterprise? And since paying for RHEL means paying for support, have you contacted Red Hat and had them work with you? Submitted a trace/dump of things so they can assist? No one here is going to be able to diagnose your Java application, since we don't have the code and can't run it to even BEGIN to duplicate the error. So before anything else, you and your team are going to have to diagnose your own application to rule it out, BEFORE looking at the Linux side of things. |
Well, first of all, your Java process is using 12.3g of total virtual memory. That's about 10% of the total RAM.
Also, since you're consuming buffers and cache, your CPU shows little activity, and you aren't swapping, what's the problem? If you leave the server up, does it ever run into performance issues? Or does the consumed RAM % climb until it's in the 90+% range and just stay there? I ask because, from your description and the info you put in your post, it almost sounds like normal operation to me. |
If I were diagnosing this, I would look at slab allocations next.
That includes some other kinds of caching that are not reported as buffers or as cache. So it is possible that you have a perfectly normal behavior: excess ram correctly used for some kind of caching because there is nothing else trying to use it. It is also possible that something is seriously wrong and should be diagnosed. Quote:
Probably this symptom will never become serious. There is a good chance there is no underlying malfunction and strange as the symptom looks, it is still a manifestation of normal behavior. But as long as we don't understand the symptom, the OP may have very good reason to choose to investigate now instead of waiting for the symptom to produce harm. Quote:
A "resource leak" as opposed to "memory leak" typically shows up as memory use outside the process with the leak. Common resource leaks in Linux show up looking like memory leaks in other processes, most often X. So those are equally ruled out. In Windows, a resource leak in an application typically looks like a memory leak in the kernel (as opposed to in some other process). That is less likely in Linux, but not impossible. So the next step of investigating this should be looking for something that looks like a memory leak in the kernel. Assuming you find that, you would then want to know whether that apparent kernel memory leak is really a kernel memory leak, or is an application resource leak, or is a normal caching behavior. |
Hi TBOne,
My Redhat is a PAS instance; I do not have control over it. I have not checked for a memory leak. However, as I mentioned, stopping the process does not give the memory back to the OS. The java application is a third-party one (so no code for me either), and I am finding it difficult to track down any memory leak in it. Because stopping the process does not give the memory back, and because it is a third-party application, I am only trying to find other ways of investigating the problem here, if there are any. I am neither asking somebody to solve my problem nor wanting anybody to diagnose my java application. Thanks for the link; I didn't raise this without doing any work myself. Just using Linux for seven years does not make me an expert; I am still learning. Thanks for your time.
Hi Gawdly,
My java memory settings are configured to use a maximum of only 8G. As I have mentioned, the server has been up for only a day, memory usage has gone up to 40+G, and the java process is using only 3.5G. Given that there is no other notable consumption by any other process, and that stopping the process does not give the memory back, I can't find any reason why this is normal. Thanks |
As per @GaWdLy - is it affecting your ability to service your users?
If not, go find a real problem to worry about. Linux uses "lazy" (de-)allocation of RAM. It costs too much (especially on large-RAM machines) to constantly run the allocation queues moving pages from allocated to non-allocated when no one wants those page frames. You have loads of spare RAM - some of which was probably used at one point by a process that has since finished (probably java, but that's just a guess). Those used pages have not been moved off the allocated queue because of that truckload of spare memory, so they appear still allocated even though the owning process has ended. |
Quote:
I haven't read the relevant portions of the kernel source code, so I can't say for sure you are wrong. But what you say does not fit my experience, and you are giving a generally untestable hypothesis as a reason to avoid trying to understand a symptom. Your bottom line is likely correct: the symptom probably won't expand into something serious. Understanding it would probably have only comfort value and knowledge value, and no practical value for managing the server in question. But the idea of blaming any and all hidden ram use on "lazy deallocation" is unsound. I have run giant simulations on Linux systems so massively over-configured in ram that there was free (not just cache, but simply free) ram through the entire simulation. When such a simulation ends, with nothing else of significance running on the system, that should be the perfect example of the lazy deallocation you were talking about. But there has been no missing ram. The ram the giant process had used was back in the free pool faster than I could type the free command to check on it. |
And likewise there are too many variables involved for you to state that your experience is universally applicable - have a look in /proc/sys/vm.
|
Quote:
I would guess the OP hasn't messed with any of those policies. I know no one messed with any of those policies on the systems I tested (as described above). So looking at default policy settings is a very long-shot place to start diagnosing a surprising symptom.
I would have started with /proc/slabinfo. If there is a kernel memory leak (unlikely), it would probably show up pretty clearly in the slabinfo. If there were an application resource leak manifesting as excessive kernel memory use, that should show up there as well. In past similar threads, where the issue turned out to be cache-like system behavior not included in "buffers" or "cache", that was also quite clear in slabinfo.
/proc/meminfo is also a useful place to look when the basic info you get from comparing free to the RES column of top seems to be missing something important. I'm not sure whether there are other good places in /proc to look, and even for /proc/meminfo, I haven't found good enough documentation to translate the values you might find there into a real understanding of how and why memory is used. I think (but am not certain) the SReclaimable field in /proc/meminfo represents the total of those cache-like memory uses that are not included in buffers or cache.
Best guess: the OP's symptom is some normal behavior, not a malfunction. Within the guess of normal behavior, my best sub-guess would be cache-like use of slab memory. I would want to understand the details within slabinfo if it were my system and that sub-guess were correct. But it would be a good start to just see whether /proc/meminfo identifies the apparently missing ram use in any kind of semi-understandable way. |
Quote:
I totally agree with syg00. OP: is there an actual problem? Or do you just not like the way the server is handling memory? If, as John says, you are right to be alarmed, then your task is to stress the box to see if it will experience trouble under pressure. |
Quote:
I think I was clear that worrying about this would represent either an extreme of caution or a high level of curiosity. I was careful not to jump to any conclusion about whether such caution and/or curiosity is a "right" use of the OP's time. I just tried to allow for that possibility and help the investigation if the OP wanted to investigate. Quote:
In case the primary motivation is curiosity, memory stress artificially added to the system might be a very effective investigational tool. But going directly to a stress test intended to cause "trouble" would still be too crude. If I wanted to stress test memory usage, I would first add a lot of swap space, both for safety and for diagnostic value. Then run some program (easy to code if you can't find one online) that consumes enough ram to try to take all the cache, plus all the free memory, plus all the memory whose use is hidden. Then look at the response. If the hidden use falls away as easily as the cache does under memory pressure, then you have nearly proven it was always something innocent, such as the lazy deallocation syg00 suggested (which I disbelieve) or the SReclaimable that I guessed (which would be easier to look for directly). If instead significant swap gets used, that would tend to indicate (though far short of prove) a more serious situation. Quote:
I don't disagree that it is likely some form of memory that would be given back if needed. But I don't think "time" will be what it takes to cause that memory to be given back. If all is really OK, it would still take memory pressure. I certainly don't guess there is a memory leak or resource leak. But I understand the preference to understand something that superficially looks like a resource leak, rather than sweep it under the rug of generalizations like "lazy deallocation". |
Quote:
"As per @GaWdLy - is it affecting your ability to service your users? If not, go find a real problem to worry about."
So, IS IT affecting your users/services? If not, then where is the problem? Unless you're having to reboot the server routinely to get things going again, or you're having process issues, program crashes, etc., then your server is working. Report your suspicions to the application vendor FIRST, and to RHEL Support SECOND, if you really want to pursue things. |
Thanks for your time and comments and apologies for the delay.
As I tried to clarify (though I may not have succeeded), my worry was the increasing use of system memory. I neither understand nor buy the suggestions not to worry unless it causes issues; my job is to stop problems before they happen. In this case I was only trying to find the reason for the high memory usage. With regard to the ownership of the java application, I really just wanted to find out where the system's memory had gone. In my first comment, I even mentioned that the memory usage did not drop even after stopping the java application. Had it been a memory leak, the leaked memory should have been taken back by the OS once the process stopped (or am I wrong?). Assume that, as a system admin, I was handed a server after the java application had been stopped and asked to find out where the memory is being used - what are my options? Am I able to find out where all the memory has gone?
However, syg00's (#6) and johnsfine's (#7) comments shed some light on the issue. What I found out is that the application deals with a huge amount of disk data (about 500G) and builds up a large disk cache. This cache was not dropped by Linux even after the application had stopped; perhaps it would have been if I had waited long enough (maybe around 12 hrs, though I did not wait that long). If I stop the application and do a manual cache drop by issuing Code:
sync; echo 3 > /proc/sys/vm/drop_caches
the used memory drops back. Please let me know if you need more information in this regard. Thanks for your help and time. |
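For reference, the manual drop can be broken into its parts (all of this needs root; drop_caches only releases clean, reclaimable memory, which is why sync comes first):

```shell
sync                                  # flush dirty pages so they become reclaimable
echo 1 > /proc/sys/vm/drop_caches     # drop page cache only
echo 2 > /proc/sys/vm/drop_caches     # drop reclaimable slab (dentries, inodes)
free -m                               # "used" should now fall back
```

Writing 3 does both steps at once. Doing them separately shows whether the hidden memory was page cache or slab, which is exactly the distinction discussed earlier in the thread.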
So that says to me unallocated memory - i.e. nobody (currently) owns it. And there hasn't been a demand for it - IMHO you don't have a problem, you merely have a symptom.
When this arises again, avoid using "3" on drop_caches. Do the following (some of it requires root) and post the output, as it will give more granularity to the data: Code:
cat /proc/meminfo > problem.txt |
kernel 2.6.32 likes to bloat the dentry cache. On one of my systems, slabtop showed dentry at 70 GB with only 10 percent usage. free shows it as application memory - not cache. After setting vm.vfs_cache_pressure to 1000, it improved. Kernel 3.0 seems to be better here.
BTW, writing a 2 to drop_caches took several minutes! So much for "it is available to applications instantly". |
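The tuning mentioned above is the vm.vfs_cache_pressure sysctl (default 100; higher values make the kernel reclaim dentry/inode caches more eagerly). A sketch using the value 1000 from the post; setting it requires root:

```shell
cat /proc/sys/vm/vfs_cache_pressure        # current value, default is 100
sysctl -w vm.vfs_cache_pressure=1000       # reclaim dentries/inodes far more eagerly
slabtop -o -s c | head -n 15               # watch whether the dentry line shrinks
```

To make the setting survive a reboot, it would also go into /etc/sysctl.conf. Note that very high values trade dentry cache hit rates for lower apparent memory use.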