LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - Newbie (https://www.linuxquestions.org/questions/linux-newbie-8/)
-   -   Obtaining a more detailed breakdown of RAM usage? (https://www.linuxquestions.org/questions/linux-newbie-8/obtaining-a-more-detailed-breakdown-of-ram-usage-774403/)

SirTristan 12-09-2009 12:48 AM

Obtaining a more detailed breakdown of RAM usage?
 
My server keeps running out of RAM, leading to swap usage and often a need for a reboot.

In 'top' I see a bunch of httpd processes using up about 1% of memory each, but no further breakdown of what account, script, path etc. is responsible.

How could one obtain more thorough details about what specifically is causing the shortage?

mail4vijay 12-09-2009 12:55 AM

Quote:

Originally Posted by SirTristan (Post 3785008)
My server keeps running out of RAM, leading to swap usage and often a need for a reboot.

In 'top' I see a bunch of httpd processes using up about 1% of memory each, but no further breakdown of what account, script, path etc. is responsible.

How could one obtain more thorough details about what specifically is causing the shortage?

Hi, can you take a look at this link on Linux monitoring?
http://www.cyberciti.biz/tips/top-li....html#comments

johnsfine 12-09-2009 07:02 AM

Quote:

Originally Posted by SirTristan (Post 3785008)
My server keeps running out of RAM, leading to swap usage

Are you assuming that swap usage is automatically bad? A little swap usage is normal and healthy for a Linux system and doesn't indicate "running out of RAM".

Quote:

and often a need for a reboot.
Explain. What makes you think you need to reboot?

Quote:

In 'top' I see a bunch of httpd processes using up about 1% of memory each,
Unless a "bunch" is more than I would expect, why do you care about 1% of memory each?

Quote:

but no further breakdown of what account, script, path etc. is responsible.
I think you can get that info from top. Maybe you can get more detail from ps. Read the man or info pages for top and/or ps.

Quote:

How could one obtain more thorough details about what specifically is causing the shortage?
At a time when you think there is a shortage, do
Code:

cat /proc/meminfo
and post the results.

Probably someone will then explain why there isn't actually any shortage. But if there is a shortage, we will then know enough to tell you more about diagnosing or correcting it.
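If the shortage tends to appear when you're not watching, a background loop can take the snapshots for you. A minimal sketch (the interval and log path are arbitrary choices, nothing standard):

```shell
# Append a timestamped /proc/meminfo snapshot every 60 seconds, so the
# state at the moment of trouble is captured even if the box later hangs.
while true; do
    { date; cat /proc/meminfo; echo "-----"; } >> /var/tmp/meminfo.log
    sleep 60
done &
```

Kill the loop with "kill %1" (or the PID it prints) once you have captured an episode.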

SirTristan 12-09-2009 10:44 AM

Quote:

Originally Posted by mail4vijay (Post 3785013)
Hi, can you take a look at this link on Linux monitoring?
http://www.cyberciti.biz/tips/top-li....html#comments

Thank you. There's a lot of stuff there - and unfortunately I don't know nearly enough about Linux to understand the output of most of those commands, or to know which commands might give the info I need.

This is a web server (although all content on the server is my own), and the only info I can find is that apache/httpd and perhaps mysql are to blame. That's not enough info for me to debug with though. Clearly I have a RAM leak in some script, but I haven't been able to locate it at all yet. Ideally what I'd like to know is what specific URL(s) are responsible for the most RAM use. Is that possible?
Quote:

Originally Posted by johnsfine (Post 3785391)
Are you assuming that swap usage is automatically bad? A little swap usage is normal and healthy for a Linux system and doesn't indicate "running out of RAM".

The server crashes and becomes nonresponsive. The swap use leads to server load in the hundreds.
Quote:

Explain. What makes you think you need to reboot?
The server crashes completely and I cannot log in via SSH. A reboot is definitely required.
Quote:

Unless a "bunch" is more than I would expect, why do you care about 1% of memory each?
I had 4GB of RAM and it would be used up totally. I upgraded to 6GB and it still gets maxed out from time to time, although there's been no crash yet since the upgrade.

At the moment there isn't a shortage, top says there's about 1.2 gigs free. But here's the meminfo results:
Code:

MemTotal:      6097676 kB
MemFree:      1223860 kB
Buffers:        178772 kB
Cached:        2607836 kB
SwapCached:          0 kB
Active:        3446984 kB
Inactive:      1196752 kB
HighTotal:          0 kB
HighFree:            0 kB
LowTotal:      6097676 kB
LowFree:      1223860 kB
SwapTotal:    4104596 kB
SwapFree:      4104428 kB
Dirty:            4664 kB
Writeback:          0 kB
AnonPages:    1857160 kB
Mapped:          24860 kB
Slab:          162440 kB
PageTables:      33424 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:  7153432 kB
Committed_AS:  3293660 kB
VmallocTotal: 34359738367 kB
VmallocUsed:    266324 kB
VmallocChunk: 34359470875 kB
HugePages_Total:    0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:    2048 kB


zQUEz 12-09-2009 11:43 AM

In your meminfo listing, your free RAM is actually closer to 4GB: the "Cached" value represents file cache, which the kernel will drop if RAM becomes exhausted. Also, the 1% of memory that top reports for each httpd process may not be entirely accurate, because (if memory serves) shared memory can be double counted. When the server gets close to the point of needing a reboot, try the following and paste the output:
Code:

ps ax -o pid,comm,%mem,rss
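Since part of the question was which account is responsible, the same ps output can also be aggregated per user. A sketch, assuming the procps ps found on most Linux distributions:

```shell
# Sum resident set size (RSS, in KiB) per user, largest consumers first.
ps ax -o user=,rss= \
  | awk '{ sum[$1] += $2 } END { for (u in sum) printf "%s %d\n", u, sum[u] }' \
  | sort -k2,2 -rn
```

With Apache running scripts under per-user accounts (e.g. suexec) this points at the account; with a single apache user it at least separates Apache's total from MySQL's.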

lazlow 12-09-2009 12:34 PM

Are you running 32bit with PAE or 64bit?

johnsfine 12-09-2009 12:40 PM

Quote:

Originally Posted by SirTristan (Post 3785634)
The server crashes and becomes nonresponsive.

That will need some diagnosis. The meminfo during a heavier load would probably help.

But you should add more swap space (another swap partition or a swap file or increase the current swap partition size). That will let the system slow down more gracefully as it becomes overloaded, so you have more opportunity to diagnose what went wrong (and so your work is less at risk to badly timed reboots).
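Adding a swap file is usually the least disruptive route, since it needs no repartitioning. A sketch, assuming root access (the path and size are just examples):

```shell
# Create, format and enable a 4 GiB swap file.
dd if=/dev/zero of=/swapfile2 bs=1M count=4096
chmod 600 /swapfile2          # swap files must not be world-readable
mkswap /swapfile2
swapon /swapfile2
swapon -s                     # confirm the new area is active
# To survive reboots, add a line like this to /etc/fstab:
#   /swapfile2  none  swap  sw  0  0
```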

Quote:

The swap use leads to server load in the hundreds.
I don't know what you mean, but I expect the swap use is more symptom than cause.

You may be hitting the commit limit, meaning your system would run OK with more swap space or maybe just with different commit parameters.

Quote:

I had 4GB of RAM and it would be used up totally. I upgraded to 6GB and it still gets maxed out from time to time, although there's been no crash yet since the upgrade.
Since you didn't understand that most of the Cached and Buffers memory is actually free, we can't trust your report that even the 4GB of RAM was used up. Likely the 6GB wasn't.

You can hit a commit limit without actually using that much memory.

Quote:

At the moment there isn't a shortage, top says there's about 1.2 gigs free. But here's the meminfo results:
That info under heavier load would be more informative.

Code:

MemFree:      1223860 kB
Buffers:        178772 kB
Cached:        2607836 kB

The effective free memory is all of MemFree plus some of Buffers, plus most of Cached.
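That sum is easy to script. A rough sketch (only an approximation, since not all of Cached is reclaimable):

```shell
# Approximate "effective free" memory: MemFree + Buffers + Cached.
awk '/^(MemFree|Buffers|Cached):/ { kb += $2 }
     END { printf "approx. free: %d kB\n", kb }' /proc/meminfo
```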

Code:

AnonPages:    1857160 kB
That is moderately high. I expect it gets much higher when you have problems. So you do want to find out which processes have that memory and why and whether it represents an error that should be corrected.

Code:

CommitLimit:  7153432 kB
Committed_AS:  3293660 kB

Those may represent the start of the problem. I don't recall enough general info about that nor know enough specific info about your system to say for sure.

Quote:

Originally Posted by lazlow (Post 3785719)
Are you running 32bit with PAE or 64bit?

Code:

VmallocChunk: 34359470875 kB
What do you think? If that value is correct, it must be x86_64. If that value is incorrect, it must be x86_64 (because the bug in generating that value was specific to x86_64). I'm pretty sure it is correct (the bug would display an even bigger value and IIUC the patch for that bug was available over 4 years ago).

SirTristan 12-09-2009 12:48 PM

Quote:

Originally Posted by lazlow (Post 3785719)
Are you running 32bit with PAE or 64bit?

64bit.
Quote:

Originally Posted by johnsfine (Post 3785724)
Since you didn't understand that most of the Cached and Buffers memory is actually free, we can't trust your report that even the 4GB of RAM was used up. Likely the 6GB wasn't.

Just trust me - that is what support said as well. My RAM was used up leading to mass swapping and an ever-increasing CPU load spiral of death. Since the 6GB upgrade I haven't surpassed the limits yet. Hopefully this issue doesn't recur now.
Quote:

The effective free memory is all of MemFree plus some of Buffers, plus most of Cached.
Thanks. What do "AnonPages", "CommitLimit", and "Committed_AS" mean?

johnsfine 12-09-2009 01:12 PM

Quote:

Originally Posted by SirTristan (Post 3785732)
My RAM was used up leading to mass swapping and an ever-increasing CPU load spiral of death.

Normally too much swapping leads to very low CPU utilization. When you say "increasing CPU load", what exactly are you measuring? Does that measure distinguish user mode from kernel mode?

In Windows, I have seen a behavior in which nearly all available time is spent soft faulting pages between a process's working set and the page cache. So the kernel CPU time is nearly 100% and no significant work gets done. I have no idea how Windows decides the working set size for each process. This sick behavior occurs when the working set size needs to be large, but isn't. The process could run just fine with available physical ram if its working set size were larger.

I have never seen any similar behavior from Linux. But I can't say for sure that it never happens. But if a shortage of memory causes high CPU use, it is hard to imagine an underlying mechanism other than excess soft faults.

I'm not sure in Linux how you measure the rate of soft faults.
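One place the counters do show up is ps: the procps version exposes cumulative per-process fault counts. A sketch (the counts are totals since process start, so sample twice and compare to get a rate):

```shell
# Minor (soft) and major (hard) page fault totals, biggest first.
ps ax -o min_flt=,maj_flt=,pid=,comm= --sort=-min_flt | head -n 15
```

System-wide paging rates are also reported by "sar -B" from the sysstat package, if it is installed.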

Are you sure the high CPU load was not normal under the workload? Maybe the swapping was normal and doing no harm and the CPU was fully used because the system was given that much work to do.

Quote:

What do "AnonPages", "CommitLimit", and "Committed_AS" mean?
The Anonymous pages are what most people think of as ordinary memory use. They are the part of each process's memory that would be written to the swap area if paged out. In typical modern Linux systems, a large fraction of virtual memory use is either directly mapped to files (the executable and .so files and some data files) or not really used at all (demand zero, copy on write, etc.). Those things are not anonymous pages. So having as large an anonymous fraction as you do is less common, and may indicate that some service you are running has a memory leak.

Because of the large amount of virtual memory (demand zero, etc.) that isn't any form of real memory or mapping, the OS must work with an estimate of the total ram+swap that it needs. It may reject memory allocation requests even when there is plenty of memory, if it overestimates the fraction of demand-zero pages, etc. that will become real memory use.

In Windows that problem is more common and often requires that you have significantly more swap space than you will ever actually use. So far as I know there is no workaround other than allocating excess swap space.

In Linux, the problem is less common and excess swap space is not the only workaround. But excess swap space is still the simplest workaround (it requires a less detailed understanding of the local characteristics of the problem than tweaking the commit parameters does).
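For what it's worth, the posted numbers are consistent with the usual default overcommit settings, where the limit is computed as CommitLimit = SwapTotal + MemTotal * overcommit_ratio / 100. A quick check (50 is the common default for vm.overcommit_ratio):

```shell
# Predict CommitLimit from the figures posted above:
# 4104596 + 6097676 * 50 / 100 = 7153434 kB, matching the posted
# CommitLimit of 7153432 kB apart from rounding.
awk 'BEGIN { printf "%d kB\n", 4104596 + 6097676 * 50 / 100 }'
# The live settings on any box:
cat /proc/sys/vm/overcommit_memory /proc/sys/vm/overcommit_ratio
```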

chrism01 12-09-2009 05:08 PM

Good stuff from johnsfine.

@OP: if the new mem is holding up, that's good. You could also look at tweaking the number of processes that Apache uses.
See http://httpd.apache.org/docs/2.2/mod/directives.html and check out the directives mentioning servers/child/threads/requests.
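As a hypothetical illustration of those directives for the prefork MPM (the numbers here are made up and must be sized against the real per-child RSS):

```apache
# If each httpd child settles around 30 MB resident, capping at 150
# children keeps Apache under roughly 4.5 GB on this 6 GB box.
<IfModule prefork.c>
    StartServers          10
    MinSpareServers       10
    MaxSpareServers       20
    MaxClients           150
    MaxRequestsPerChild 1000
</IfModule>
```

A nonzero MaxRequestsPerChild also recycles children periodically, which puts a ceiling on how much a leaky script can inflate any one process.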

syg00 12-09-2009 07:15 PM

Quote:

My RAM was used up leading to mass swapping and an ever-increasing CPU load spiral of death.
and
Quote:

The swap use leads to server load in the hundreds.
Sounds like disk contention to me. Is this a hosted server - virtualized, maybe? How many disks/controllers? If virtualized, how many *real* disks/controllers - and how many guests share them?

Regardless, that is all symptom, not (strictly) the underlying problem. As you said, you need to fix the leak. To track process memory consumption over time, look at pidstat in the sysstat package - it needs a reasonably recent kernel/sysstat environment. Collectl offers something similar, but is not widely installed by default.
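For pidstat specifically, the memory report is behind the -r switch. A sketch (the interval and log name are arbitrary):

```shell
# Per-process memory statistics (minflt/s, majflt/s, VSZ, RSS, %MEM)
# every 300 seconds, appended to a log for later comparison.
pidstat -r 300 >> pidstat-mem.log &
```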
Else try the following, based on the link above: "ps -aux | sort -nr -k 4 | head -n 15". If you like the result, maybe kick off a background loop to record it:
Code:

while true ; do  ps -aux | sort -nr -k 4 | head -n 15 >> memhogs.txt ; echo -e "\n-----------------\n" >> memhogs.txt ; sleep 300 ; done &
That will write the data out every 5 minutes - the file could get big, so adjust the interval as necessary. When it starts it will print something like [1] 9999 - to kill it, use the number in the square brackets: "kill %1". If you have to reboot, the output file will still be usable.

SirTristan 12-09-2009 09:57 PM

Thanks a lot for the help guys :) If the issue recurs I'll post more info. I'm also going to be increasing swap space from 4GB to 8GB.
Quote:

Originally Posted by johnsfine (Post 3785758)
Normally too much swapping leads to very low CPU utilization. When you say "increasing CPU load", what exactly are you measuring?

I'm referring to what's shown by the 'uptime' command. The server has 8 CPUs, so I believe a load of 8 means all CPUs are fully utilized - I could be mistaken on that, but loads higher than about 10 lead to noticeable lag. The load spiral would produce loads that never stopped increasing and ended up in the hundreds, crashing the machine.
Quote:

Are you sure the high CPU load was not normal under the workload? Maybe the swapping was normal and doing no harm and the CPU was fully used because the system was given that much work to do.
The workload may have been somewhat high, but nothing all that abnormal - the absurd load levels seem to have been caused by some vicious cycle where processes got jammed somehow.
Quote:

Originally Posted by syg00 (Post 3786027)
Sounds like disk contention to me. Is this a hosted server - virtualized maybe ?. How many disks/controllers ?. If virtualized how many *real* disks/controllers - and how many guests share them ?.

It's a dedicated hosting server, but it only hosts my own content. The server has one 4-port and one 2-port SATA controller.
Quote:

Else try the following based on the link above "ps -aux | sort -nr -k 4 | head -n 15"
The thing with ps is that it shows httpd as the process name, but I already know the apache processes are the cause - there are many different web pages on the server that could be to blame, and I don't know which one to target.
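One low-tech way to bridge that gap, assuming Apache's standard mod_log_config is loaded: log the child PID with each request, then match it against the PIDs that top/ps flag as bloated. The format name here is made up:

```apache
# %P logs the PID of the child that served the request; a child whose
# RSS balloons can then be traced back to the URLs it handled.
LogFormat "%h %l %u %t \"%r\" %>s %b pid=%P" combined_pid
CustomLog logs/access_log combined_pid
```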

ramram29 12-09-2009 11:21 PM

That doesn't sound right. You can run a very fast web server even with 128MB of memory and no swap, so it may not be memory at all. You may have network congestion. Have you tried running netstat to see if you are maxing out on established connections? Ideally you can get up to 65,536 simultaneous established connections, but in practice the limit is much lower than that - you may be getting 30,000. In that case you can add 1TB of memory and your bottleneck will still be your established network connections. When that occurs, it's time to divide the load across two or more load-balancing servers - divide and conquer!
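Checking the connection count is quick. A sketch using the classic net-tools netstat:

```shell
# Count established TCP connections, then break the total down by state
# (the first two lines of netstat output are headers, hence NR > 2).
netstat -ant | awk '$NF == "ESTABLISHED"' | wc -l
netstat -ant | awk 'NR > 2 { n[$NF]++ } END { for (s in n) print s, n[s] }'
```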

