Linux - Newbie
This Linux forum is for members that are new to Linux.
Just starting out and have a question?
If it is not in the man pages or the how-to's this is the place!
Hello,
I have a process that will run just fine for weeks at a time, but after about two weeks of uptime the process will just be gone. I suspect it is exhausting all the memory on the machine, and since there is no swap set up, the kernel just kills the process. A long time ago I found a log that showed processes that get reaped because of memory consumption; however, I cannot for the life of me find this log now. Does anyone know where such a log would exist?
My Linux version : Linux 2.6.35.14-97.44.amzn1.i686 i686
(Amazon AMI)
Typically, if a program has a memory leak, or is otherwise consuming all the system memory, it will just use it all up until the kernel seizes. If the program itself has memory management built in, then perhaps it is reaping itself, in which case perhaps it has its own logging facility. Or perhaps there is another generic process on the system that manages memory?
The syslog, typically /var/log/messages, is the first place I would look, but it sounds like you've already checked there.
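For example, a quick search of the usual places might look like this (log paths assumed for a typical RHEL-style layout such as the Amazon AMI; adjust for your distro):

    # search the kernel ring buffer for OOM killer activity
    dmesg | grep -i 'killed process'

    # search syslog, including rotated copies
    grep -i -e 'out of memory' -e 'oom-killer' /var/log/messages*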
I do not think the previous post is correct. During memory shortages, processes are killed by the Linux OOM (Out Of Memory) Killer, as many as necessary for the kernel to continue operating.
I'm not an expert on this, but I believe it is all done by the kernel. There doesn't seem to be a separate log for it; I think OOM information just gets dumped to the regular kernel log, which can be read with dmesg, but you should be able to filter these messages out to a separate file with syslogd if you wish.
According to the above references, you can mark a process so that it cannot be killed by the OOM Killer.
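A minimal sketch of that, assuming a 2.6.x kernel like the OP's where the older oom_adj interface applies (kernels 2.6.36 and later use oom_score_adj instead):

    # $PID is assumed to hold the PID of the process to protect; requires root
    echo -17 > /proc/$PID/oom_adj        # -17 = OOM_DISABLE on older kernels
    # on >= 2.6.36 the equivalent would be:
    # echo -1000 > /proc/$PID/oom_score_adj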
OOM stuff should appear in dmesg and /var/log/messages, though, shouldn't it? Something like:
"Out of Memory: Killed process 18254 (ntop)."
That's why I suspected something else was killing the process; the kernel is usually pretty clear when it is doing such things.
So it looks like you are right, the OOM killer is what would log such an action. However, following the ideas in this link http://stackoverflow.com/questions/6...nux-oom-killer, I am not seeing that my process was reaped, so most likely it is not consuming too much memory. Let me give you some more information: the process that is running is a GlassFish 3.1.2.2 server instance, and the logging that GlassFish provides is not helping. One second it is serving requests like normal, then the next the entire process is gone. I am not sure where else to look for a solution.
Maybe it would be better to run a small system activity recorder (sar, Atop, Dstat, Collectl, whatever else you fancy) and actually collect system statistics first? That, together with reviewing any bug tickets with respect to Java and GlassFish and reviewing your GlassFish server settings, might be a more efficient approach, because IMHO looking for log entries here is reactive, an after-the-fact operation, and that by itself won't change or improve anything.
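For instance, a rough sketch with sysstat's sar (assuming the sysstat package is installed; the interval, count, and output path are just examples):

    # record memory statistics every 60 seconds, 60 samples, to a binary file
    sar -r 60 60 -o /tmp/memstats.sar

    # play the file back later to see memory usage leading up to a crash
    sar -r -f /tmp/memstats.sar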
Are you already starting it in verbose mode? e.g.:
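Something like this, assuming the stock asadmin tool and the default domain name:

    # start the domain in the foreground, logging everything to the console
    asadmin start-domain --verbose domain1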
Haven't seen much activity here on this over the last few days. Any progress?
One thing about OOM I've discovered with collectl: normally, collectl runs every monitoring interval at exactly the same time, with no drift at all (well, maybe an occasional msec), and it never misses a sample. Whenever OOM runs, at just about the highest priority it can, a side effect is that collectl stalls and misses sampling intervals. In fact, on some systems I've seen collectl stall for over a couple of minutes! When this happens, you can often find a kernel daemon running at 100% in the process log either just before collectl stalls or when it comes back; I forget which, since I don't see it very often.
In other words, if you see long stalls in collectl logs, there's a good chance OOM was running and if there are no stalls it probably wasn't.
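For instance, to look for such gaps after the fact, a playback along these lines should work (the raw file path is just an example; it depends on how collectl was started):

    # replay recorded memory samples with timestamps; a jump in the
    # time column marks a stall
    collectl -p /var/log/collectl/myhost-20121015-000000.raw.gz -sm -oT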
-mark