LinuxQuestions.org

Linux - Server: Linux high memory usage used by no process (https://www.linuxquestions.org/questions/linux-server-73/linux-high-memory-usage-used-by-no-process-4175536748/)

guna_pmk 03-14-2015 05:46 PM

Linux high memory usage used by no process
 
1 Attachment(s)
Hi,

Had I seen this thread title posted by somebody else, I would have easily assumed that the cause was the cache. However, I do not think that is the issue here.

Following is the output of free -m:

Code:

            total      used      free    shared    buffers    cached
Mem:        129058      52408      76649          0      9290      1095
-/+ buffers/cache:      42023      87035
Swap:        3967          0      3967

The -/+ buffers/cache line shows that the used memory is 42G out of 128G.

Attached is the top output (sorted by memory usage) of the server. The only process that claims any notable memory is the java process, which uses only 2.8% of the memory. Following are the Java memory options:

Code:

-Xms1G -Xmx8G -XX:MaxPermSize=1024m
According to top, the java process uses only about 3.5G of memory, and no other process uses any significant memory at all. What holds the rest, roughly 40G?

The server is Red Hat 6.5 with 128G RAM and 6 x 2.7GHz CPUs.

This used memory grows very rapidly over time. You can see from the top output that the server has been up for only a day, yet the used memory has already shot up to 42G despite only 3.5G of usage by the java process.

This java process is an apache-tomcat-7.0.54 container.

There may be memory leaks in the deployed application. However, I do not see a drop in the used memory even after stopping the application (the java/tomcat process).

Please let me know if I am missing anything or need more information in this regard.

Thanks

TB0ne 03-14-2015 06:16 PM

Quote:

Originally Posted by guna_pmk (Post 5332235)
Had I seen this thread title posted by somebody else, I would have easily assumed that the cause was the cache. However, I do not think that is the issue here. [...] There may be memory leaks in the deployed application. However, I do not see a drop in the used memory even after stopping the application (the java/tomcat process). Please let me know if I am missing anything or need more information in this regard.

After using Linux for seven years now, you should be familiar with how to correctly check memory, and how to perform basic diagnostics on a server.
http://www.linuxatemyram.com/

There are a few things you say that indicate you haven't done much troubleshooting thus far. First, you say 'There may be memory leaks in the deployed application'...so have you CHECKED THAT? Second, you say you're using Red Hat Enterprise 6.5...while it's a bit old, it's still fully supported. Are you PAYING for RHEL? Have you applied the kernel patches/updates/bugfixes that Red Hat makes available when you purchase Red Hat Enterprise? And since you do pay for RHEL, you pay for support...have you contacted Red Hat and had them work with you? Submitted a trace/dump of things so they can assist?

No one here is going to be able to diagnose your Java application, since we don't have the code, and can't run it to even BEGIN to duplicate the error, so before anything else, you and your team are going to have to diagnose your own application to rule it out, BEFORE looking at the Linux side of things.

GaWdLy 03-14-2015 06:19 PM

Well, first of all, your Java process is using 12.3g of total virtual memory. That's about 10% of the total RAM.

Also, since you're consuming buffers and cache, your CPU shows little activity, and you aren't swapping, what's the problem? If you leave the server up, does it ever run into performance issues? Or does the consumed RAM % climb until it's in the 90+% range and just stay there?

I ask, because from your description and info you put in your post, it almost sounds like normal operation to me.

johnsfine 03-15-2015 05:30 AM

If I were diagnosing this, I would look at slab allocations next.

That includes some other kinds of caching that are not reported as buffers or as cache. So it is possible that you have a perfectly normal behavior: excess ram correctly used for some kind of caching because there is nothing else trying to use it. It is also possible that something is seriously wrong and should be diagnosed.
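
For a first look, the slab totals in /proc/meminfo show how much memory the kernel is holding outside of what free reports as buffers or cache. A quick check might look like the following (a sketch; reading /proc/slabinfo may require root on some kernels):

Code:

# Kernel slab memory in total, split into reclaimable and unreclaimable
grep -E '^(Slab|SReclaimable|SUnreclaim)' /proc/meminfo

# Per-cache detail (first few lines; may need root)
head -n 20 /proc/slabinfo

If the Slab/SReclaimable figures are in the tens of gigabytes, that alone would account for most of the "missing" 40G.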

Quote:

Originally Posted by guna_pmk (Post 5332235)
I would have easily assumed that the cause was the cache. However, I do not think that is the issue here.

I agree. You posted the information needed to distinguish your situation from the more common "problem" answered by the "LinuxAteMyRam" page. So you should have gotten better replies from experts.

Quote:

Originally Posted by TB0ne (Post 5332242)
After using Linux for seven years now, you should be familiar with how to correctly check memory, and how to perform basic diagnostics on a server.
http://www.linuxatemyram.com/

So how long have you been using Linux? I would think long enough to realize what a useless distraction that link is for the cases where the issue is clearly something else entirely.

Quote:

Originally Posted by GaWdLy (Post 5332244)
Well, first of all, your Java process is using 12.3g of total virtual memory. That's about 10% of the total RAM.

That is a completely wrong point of view (enough so that I am upset by whoever clicked the link that your post was helpful. In this forum, we should be able to demonstrate a higher level of competence in giving and identifying good advice for moderately hard problems). Virtual memory of a process as a fraction of physical ram is usually a meaningless ratio. In a 64-bit system, virtual memory as a measure of any kind of resource usage is typically meaningless. We already know that the system wide swap usage is insignificant, so we know for certain that this is NOT one of the rare cases in which that virtual memory number has useful meaning. We know for sure that the Res column in top is meaningful and the VIRT column in top is useless.

Quote:

Also, since you're consuming buffers and cache, your CPU shows little activity, and you aren't swapping, what's the problem? If you leave the server up, does it ever run into performance issues? Or does the consumed RAM % climb until it's in the 90+% range and just stay there?

I ask, because from your description and info you put in your post, it almost sounds like normal operation to me.
If the server is important enough, or if the OP is trying to learn, then ignoring a symptom until it becomes serious is not the best course.

Probably this symptom will never become serious. There is a good chance there is no underlying malfunction and strange as the symptom looks, it is still a manifestation of normal behavior. But as long as we don't understand the symptom, the OP may have very good reason to choose to investigate now instead of waiting for the symptom to produce harm.

Quote:

Originally Posted by guna_pmk (Post 5332235)
There may be memory leaks in the deployed application. However, I do not see a drop in the used memory even after stopping the application (the java/tomcat process).

If it is a memory leak in an application and that application were visible in top, then (because we know swap usage is insignificant) the leak would show up in the Res column in top. So absent some theory for how applications might be hidden from top, you seem to have ruled that out.

A "resource leak" as opposed to "memory leak" typically shows up as memory use outside the process with the leak. Common resource leaks in Linux show up looking like memory leaks in other processes, most often X. So those are equally ruled out. In Windows, a resource leak in an application typically looks like a memory leak in the kernel (as opposed to in some other process). That is less likely in Linux, but not impossible. So the next step of investigating this should be looking for something that looks like a memory leak in the kernel. Assuming you find that, you would then want to know whether that apparent kernel memory leak is really a kernel memory leak, or is an application resource leak, or is a normal caching behavior.

guna_pmk 03-15-2015 05:43 AM

Hi TBOne,

My Red Hat is a PAS instance; I do not have control over it. I have not checked for a memory leak. However, as I mentioned, stopping the process does not give the memory back to the OS. The java application is a third-party one (so no code for me either), and I am finding it difficult to track down any memory leak in it. Because stopping the process does not give the memory back and it is a third-party application, I am only trying to find other ways of investigating the problem here, if there are any. I am neither asking somebody to solve my problem nor wanting anybody to diagnose my java application. Thanks for the link; I didn't raise this without doing any work myself. Just using Linux for seven years does not make me an expert; I am still learning. Thanks for your time.

Hi Gawdly,

My Java memory settings are configured to use a maximum of only 8G. As I have mentioned, the server has been up for only a day, the memory usage has gone up to 40+G, and the java process is using only 3.5G. Given that there is no other notable consumption by any other process and that stopping the process does not give the memory back, I can't find any reason why this would be normal.

Thanks

syg00 03-15-2015 06:01 AM

As per @GaWdLy - is it affecting your ability to service your users?
If not, go find a real problem to worry about.

Linux uses "lazy" (de-)allocation of RAM. It costs too much (especially on large RAM machines) to constantly run the allocation queues to move pages from allocated to non-allocated if no-one wants those page frames.
You have loads of unwanted RAM - some of which probably was used at one point, and the process using it finished (probably java, but that's just a guess). Those used pages have not been moved from the allocated q because of that truckload of spare memory, so they appear still allocated even though the owning process has ended.

johnsfine 03-15-2015 06:37 AM

Quote:

Originally Posted by syg00 (Post 5332400)
Linux uses "lazy" (de-)allocation of RAM. It costs too much (especially on large RAM machines) to constantly run the allocation queues to move pages from allocated to non-allocated if no-one wants those page frames.
You have loads of unwanted RAM - some of which probably was used at one point, and the process using it finished (probably java, but that's just a guess). Those used pages have not been moved from the allocated q because of that truckload of spare memory, so they appear still allocated even though the owning process has ended.

I think you are wrong, especially about what is likely in the case being discussed, but also generally about the significance of "lazy de-allocation".

I haven't read the relevant portions of the kernel source code, so I can't say for sure you are wrong. But what you say does not fit my experience and you are giving a generally un-testable hypothesis as a reason to avoid trying to understand a symptom. Your bottom line is likely correct: the symptom probably won't expand into something serious.
Understanding it would probably have only comfort value and knowledge value and no practical value for managing the server in question. But the idea of blaming any and all hidden ram use on "lazy deallocation" is unsound.

I have run giant simulations on Linux systems so massively over-configured with ram that there was free (not just cache but simply free) ram through the entire simulation. When such a simulation ends, with nothing else of significance running on the system, that should be the perfect example of the lazy deallocation you were talking about. But there was no missing ram. The ram the giant process had used was back in the free pool faster than I could type the free command to check on it.

syg00 03-15-2015 07:08 AM

And likewise there are too many variables involved for you to state that your experience is universally applicable - have a look in /proc/sys/vm.

johnsfine 03-15-2015 07:35 AM

Quote:

Originally Posted by syg00 (Post 5332433)
have a look in /proc/sys/vm.

IIUC, those are all vm policy settings, rather than any information about the current status of VM.

I would guess the OP hasn't messed with any of those policies. I know no one messed with any of those policies on the systems I tested (as described above). So looking at default policy settings is a very long shot place to start diagnosing a surprising symptom.

I would have started with /proc/slabinfo.

If there is a kernel memory leak (unlikely) it would probably show up pretty clearly in the slabinfo. If there were an application resource leak manifesting as excessive kernel memory use, that should show up there as well. In past similar threads where the issue turned out to be cache-like system behavior not included in "buffers" or "cache" that was also quite clear in slabinfo.

/proc/meminfo is also a useful place to look when the basic info you get from comparing free to the Res column of top seems to be missing something important. I'm not sure whether there are other good places in /proc to look, and even for /proc/meminfo, I haven't found good enough documentation to translate the values you might find there into a real understanding of how and why memory is used.

I think (but am not certain) that the SReclaimable field in /proc/meminfo represents the total of those cache-like memory uses that are not included in buffers or cache. Best guess, the OP's symptom is some normal behavior, not a malfunction. Within the guess of normal behavior, my best sub-guess would be cache-like use of slab memory. I would want to understand the details within slabinfo if it were my system and that sub-guess were correct. But it would be a good start to just see whether /proc/meminfo identifies the apparently missing ram use in any kind of semi-understandable way.
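
Concretely, that check might look something like this (slabtop ships with the procps package on most distributions; the exact cache names will vary):

Code:

# Does reclaimable slab account for the missing ~40G?
grep -E '^(MemTotal|MemFree|Buffers|Cached|Slab|SReclaimable|SUnreclaim)' /proc/meminfo

# Largest slab caches by total size (dentry and *_inode_cache are the usual suspects)
slabtop -s c -o | head -n 25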

GaWdLy 03-15-2015 09:14 AM

Quote:

Originally Posted by johnsfine (Post 5332389)
If I were diagnosing this, I would look at slab allocations next. [...] So the next step of investigating this should be looking for something that looks like a memory leak in the kernel. Assuming you find that, you would then want to know whether that apparent kernel memory leak is really a kernel memory leak, or is an application resource leak, or is a normal caching behavior.

John, normally I would do a good job of defending my position, but I agree that my comment about his virt mem was wrong. Don't let it upset you too much, because I stand by my second point, which is that Java will take as much memory as you allow it, and Linux will take its time giving you back your memory.

I totally agree with syg00.

OP: is there an actual problem? Or do you just not like the way the server is handling memory? If, as John says, you are right to be alarmed, then your task is to stress the box to see if it will experience trouble under pressure.

johnsfine 03-15-2015 10:36 AM

Quote:

Originally Posted by GaWdLy (Post 5332468)
If as John says, you are right to be alarmed,

I really don't think I said any such thing.

I think I was clear that worrying about this would represent either an extreme of caution or a high level of curiosity. I was careful not to jump to any conclusion about whether such caution and/or curiosity is a "right" use of the OP's time. I just tried to allow for that possibility and help the investigation if the OP wanted to investigate.

Quote:

then your task is to stress the box to see if it will experience trouble under pressure.
If the server is so important that such a level of caution would be called for, stressing it that way would be unwise. Duplicating the symptom in a test system and stressing that might be wise.

In case the primary motivation is curiosity, memory stress artificially added to the system might be a very effective investigational tool. But going directly to a stress test intended to cause "trouble" would still be too crude.

If I wanted to stress test memory usage, I would first add a lot of swap space, both for safety and for diagnostic value. Then run some program (easy to code if you can't find it online) that consumes enough ram to try to take all the cache, plus all the free memory, plus all the memory whose use is hidden. Then look at the response. If the hidden use falls away as easily as the cache does under memory pressure, then you have nearly proven that it was always something innocent, such as the lazy deallocation syg00 suggested (which I disbelieve) or the SReclaimable that I guessed (which would be easier to look for directly). If instead significant swap gets used, that would tend to indicate (far short of prove) a more serious situation.
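
A sketch of that kind of controlled pressure test, to be run on a test box that reproduces the symptom rather than on the production server; it assumes stress-ng (or any similar memory-hog tool) can be installed, and the sizes are hypothetical and should be tuned to exceed free + cached + the hidden usage:

Code:

# Add temporary swap for safety and diagnostic value (16G file; adjust to taste)
dd if=/dev/zero of=/tmp/extraswap bs=1M count=16384
chmod 600 /tmp/extraswap
mkswap /tmp/extraswap
swapon /tmp/extraswap

# Apply memory pressure: one worker holding ~110G for five minutes
stress-ng --vm 1 --vm-bytes 110g --vm-keep --timeout 300s

# In another terminal, watch what gets reclaimed and whether swap is touched
vmstat 5

If the hidden 40G melts away without the swap columns in vmstat moving, it was reclaimable all along; if swap-ins and swap-outs climb instead, something really is pinning that memory.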

Quote:

Originally Posted by GaWdLy (Post 5332468)
I stand by my second point and that is that Java will take as much memory as you allow it, and Linux will take its time giving you back your memory.

And I stand by my estimate that such an explanation for the symptoms posted by the OP is almost certainly wrong. I believe it wasn't Java that used the memory. I think it wasn't anything that would have been visible earlier in top (that is, not ordinary use by some process that is gone now and that Linux has been slow to clean up after).

I don't disagree that it is likely some form of memory that would be given back if needed. But I don't think "time" will be what it takes to cause that memory to be given back. If all is really OK it would still take memory pressure.

I certainly don't guess there is a memory leak or resource leak. But I understand the preference to understand something that superficially looks like a resource leak, rather than sweep it under the rug of generalizations like "lazy deallocation".

TB0ne 03-15-2015 10:44 AM

Quote:

Originally Posted by guna_pmk (Post 5332394)
Hi TBOne,
My Red Hat is a PAS instance; I do not have control over it. [...] I am neither asking somebody to solve my problem nor wanting anybody to diagnose my java application. Thanks for the link; I didn't raise this without doing any work myself.

So if you have no control over the Linux system and can't access the source code for the third-party application, what can you do? Even IF the problem is identified, you say you can't solve it, since you can't modify the Linux server or the application. Again, if you're using RHEL, you can try contacting Red Hat support; they can analyze a system dump/trace and see if they can give you any clues, but that only MIGHT work. Again, go back to the application developer with your concerns, and check that FIRST, since that's what you suspect. If you purchased the application, you purchased support, just as you do with RHEL.
Quote:

My Java memory settings are configured to use a maximum of only 8G. As I have mentioned, the server has been up for only a day, the memory usage has gone up to 40+G, and the java process is using only 3.5G. Given that there is no other notable consumption by any other process and that stopping the process does not give the memory back, I can't find any reason why this would be normal.
I'd see syg00's post #6, and go back to "is it affecting your ability to service your users? If not, go find a real problem to worry about." So, IS IT affecting your users/services? If not, then where is the problem?

Unless you're having to reboot the server routinely to get things going again, or you're having process issues, program crashes, etc., your server is working. Report your suspicions to the application vendor FIRST, and to RHEL Support SECOND, if you really want to pursue things.

guna_pmk 03-20-2015 06:24 AM

Thanks for your time and comments and apologies for the delay.

As I tried to clarify (though I may not have succeeded), my worry was the increasing use of system memory. I neither understand nor buy the suggestion not to worry until it causes issues; my job is to stop problems before they happen. In this case I was only trying to find the reason for the high memory usage.

With regards to the ownership of the java application, all I really cared about was finding out what was holding the system's memory. In my first post I even mentioned that even after stopping the java application, the memory usage did not drop. Had it been a memory leak, the leaked memory should have been taken back by the OS once the process was stopped (or am I wrong?). Suppose that, as a system admin, I was handed this server after the java application had been stopped and asked to find out where the memory is being used: what are my options? Would I be able to find out where all the memory has gone?

However, syg00's (#6) and johnsfine's (#7) comments shed some light on the issue. What I found out is that the application works through a huge amount of disk data (about 500G) and builds up a large disk cache. This cache was not dropped by Linux even after the application had stopped; perhaps it would have been if I had waited long enough, maybe around 12 hrs (though I did not wait that long). Stopping the application and doing a manual cache drop by issuing

Code:

sync
echo 3 > /proc/sys/vm/drop_caches

brought the memory back to its normal level, which means the memory will be returned when applications need it. But why is this not indicated in the free command as cached? I don't know. Anyway, I can now say that it was a disk operation that ballooned the memory, and it appears to be harmless.
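
(Presumably that is because free's "cached" column counts only the page cache, while dentry and inode caches live in kernel slab and are reported under Slab/SReclaimable in /proc/meminfo instead. A before-and-after check along these lines would confirm it; this is a sketch, not output captured from this box, and the drop_caches write needs root:)

Code:

# Before: how much of "used" is reclaimable slab (dentries/inodes)?
grep -E '^(Cached|Slab|SReclaimable)' /proc/meminfo

sync
echo 2 > /proc/sys/vm/drop_caches    # 2 = free dentries and inodes only

# After: SReclaimable should have collapsed, and free's "used" with it
grep -E '^(Cached|Slab|SReclaimable)' /proc/meminfo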

Please let me know if you need more information in this regard.

Thanks for your help and time.

syg00 03-20-2015 07:59 AM

So that says to me unallocated memory - i.e. nobody (currently) owns it, and there hasn't been a demand for it. IMHO you don't have a problem, you merely have a symptom.
When this arises again, avoid using "3" on drop_caches. Do the following (some of it requires root) and post the output, as it will give more granularity to the data:
Code:

cat /proc/meminfo > problem.txt
echo -e "\n\n" >> problem.txt
slabtop -s c -o | head -n 25 >> problem.txt
echo -e "\n------- echo 1 to drop_caches -----\n" >> problem.txt
echo 1 > /proc/sys/vm/drop_caches
cat /proc/meminfo >> problem.txt
echo -e "\n\n" >> problem.txt
slabtop -s c -o | head -n 25 >> problem.txt
echo -e "\n------- echo 2 to drop_caches -----\n" >> problem.txt
echo 2 > /proc/sys/vm/drop_caches
cat /proc/meminfo >> problem.txt
echo -e "\n\n" >> problem.txt
slabtop -s c -o | head -n 25 >> problem.txt


MadeInGermany 03-21-2015 03:31 PM

Kernel 2.6.32 likes to bloat the dentry cache. On one of my systems slabtop showed the dentry cache at 70 GB, with only 10 percent usage. free shows it as application memory - not cache. After setting vm.vfs_cache_pressure to 1000 it improved. Kernel 3.0 seems to be better here.
BTW, writing a 2 to drop_caches took several minutes! So much for "it is available to applications instantly".
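
For anyone hitting the same thing, checking the dentry slab and raising the reclaim pressure looks roughly like this (a sketch; vfs_cache_pressure defaults to 100, and the persistence step assumes a stock /etc/sysctl.conf):

Code:

# How big is the dentry cache right now?
slabtop -s c -o | grep -E 'OBJS|dentry'

# Reclaim dentries/inodes more aggressively (default is 100)
sysctl -w vm.vfs_cache_pressure=1000

# Make the setting persistent across reboots
echo 'vm.vfs_cache_pressure = 1000' >> /etc/sysctl.conf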

