LinuxQuestions.org


Yalla-One 12-27-2007 11:44 AM

Tracing memory leak ?
 
Hi,

Running top, I notice that memory used increases by about 124k every minute. After a couple of hours it has consumed all available (2GB) memory and begins to swap.

What is the best way to track down the culprit? I'm on kernel 2.6.23.12 (custom, to get built-in support for the RAID controller), and have disabled most services such as ntpd, nfs, rpc and so forth...

I've run top to try to see which application or module increases its memory consumption by roughly 124k every minute, but it's a near-impossible task.

I noticed through a Google search that there was a new memory leak detection function in the kernel (2.6.19), but this seems to have changed, as Documentation/kmemleak.txt doesn't exist in any of my recent kernels.

Any hints would be greatly appreciated!

-y1
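
One way to make the "watch top by eye" approach less painful is to take two snapshots of per-process resident memory and sort by growth. The following is only a rough sketch, assuming a Linux /proc filesystem whose /proc/<pid>/status files contain Name and VmRSS lines; the function names are made up for illustration and are not part of any existing tool.

```python
import os
import re


def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def snapshot_rss(proc_root="/proc"):
    """Map pid -> (name, rss_kb) for every process that reports VmRSS."""
    snapshot = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "status")) as handle:
                text = handle.read()
        except OSError:
            continue  # process exited between listdir and open
        rss = parse_vmrss_kb(text)
        if rss is None:
            continue  # kernel threads have no VmRSS line
        name = re.search(r"^Name:\s+(.*)$", text, re.MULTILINE)
        snapshot[int(entry)] = (name.group(1) if name else "?", rss)
    return snapshot


def rss_growth(before, after):
    """Return [(delta_kb, pid, name)] for pids in both snapshots, largest growth first."""
    rows = []
    for pid, (name, rss_after) in after.items():
        if pid in before:
            rows.append((rss_after - before[pid][1], pid, name))
    rows.sort(reverse=True)
    return rows
```

Calling snapshot_rss() twice a few minutes apart and passing both results to rss_growth() puts any steadily growing process at the top of the list. If nothing in userspace grows while free memory keeps shrinking, that hints at the kernel or a driver rather than an application.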

Alien_Hominid 12-27-2007 12:11 PM

http://gentoo-wiki.com/FAQ_Linux_Memory_Management

Uncle_Theodore 12-27-2007 12:23 PM

I don't think OP's problem is as simple as just misinterpreting the output of the top command. Since the system starts to swap, it might have a real memory leak. Unfortunately, finding one is a very difficult problem. There are commercial tools for finding leaks on a working system (they can be easily found on Google), but overall, your best chance is to turn off services (start with the X server) and see if the leakage stops...

Yalla-One 12-27-2007 12:32 PM

Thanks Uncle_Theodore,

The problem is that even without any services running, the system steadily eats 124k of memory every minute, and thus after some hours of operation has to start swapping because all memory is consumed. This is clearly due to a leak somewhere and, as you say, has nothing to do with misinterpreting top or how Linux paging works.

Since this is a hobby server, I'm not able to rush into getting a commercial memory-leak troubleshooter. It's a brand new system though, one that's been operational for less than half a day, so I'll try a different, vanilla kernel and see if the problem is reproduced there, and then go the slow and painful route of adding kernel modules one at a time to see where it all goes wrong...

If anyone has some recommendations on where to start or how to best perform this, I'm all ears!

-y1
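
When memory disappears with no services running, comparing two copies of /proc/meminfo can show which kernel counter is moving. This is a minimal sketch, assuming the usual "Field:   value kB" layout of /proc/meminfo; the helper names are hypothetical, and a leak in a driver that allocates pages directly may show up only as falling MemFree rather than in a field such as Slab.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: value}; values are kB where the line carries a kB unit."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def meminfo_delta(before, after):
    """Return [(field, delta)] for fields that changed, biggest absolute change first."""
    deltas = [
        (field, after[field] - before[field])
        for field in after
        if field in before and after[field] != before[field]
    ]
    deltas.sort(key=lambda item: abs(item[1]), reverse=True)
    return deltas
```

Reading /proc/meminfo into parse_meminfo() once, waiting a few minutes, reading it again and feeding both results to meminfo_delta() lists the counters that moved. Doing this before and after loading each module narrows things down faster than waiting for the box to swap.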

David1357 12-27-2007 01:15 PM

Quote:

Originally Posted by Yalla-One (Post 3003079)
If anyone has some recommendations on where to start or how to best perform this, I'm all ears!

Well, you should be able to get the source for your kernel and "make oldconfig" to set up the original options. Then "make menuconfig" and select "Kernel Debugging" as a built-in as well as the other memory-related options (Debug slab, kobject debugging, etc.).

Unfortunately, this method has the chance of turning your problem into a Heisenbug.
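
After running menuconfig it is easy to miss one of the debug switches, so a quick check of the resulting .config can help. The sketch below assumes the menu entries mentioned above correspond to the symbols CONFIG_DEBUG_KERNEL, CONFIG_DEBUG_SLAB and CONFIG_DEBUG_KOBJECT; that mapping is an assumption and should be checked against the kernel tree being built, and the function names are made up for illustration.

```python
# Assumed symbol names for "Kernel Debugging", "Debug slab" and "kobject debugging".
WANTED = ["CONFIG_DEBUG_KERNEL", "CONFIG_DEBUG_SLAB", "CONFIG_DEBUG_KOBJECT"]


def enabled_options(config_text):
    """Return the set of CONFIG_* symbols set to y or m in kernel .config text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skips "# CONFIG_FOO is not set" and blank lines
        name, _, value = line.partition("=")
        if value in ("y", "m"):
            enabled.add(name)
    return enabled


def missing_debug_options(config_text, wanted=WANTED):
    """Return the wanted symbols that are not enabled, in the order given."""
    enabled = enabled_options(config_text)
    return [option for option in wanted if option not in enabled]
```

Passing the contents of the .config file to missing_debug_options() returns an empty list when all three options are on.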

Yalla-One 12-28-2007 05:35 AM

Solved!
 
Found the problem!
It was the RAID controller's kernel driver. When running without it, the system stays stable overnight without dying from memory exhaustion. I will either switch to software RAID or just replace the controller with a better solution.

-y1

H_TeXMeX_H 12-28-2007 07:26 AM

If you found the driver that has the leak, maybe you should report it (if it isn't already reported). It's usually a rather easy fix once it is found.

Yalla-One 12-28-2007 07:44 AM

Yeah, the problem is that it turns out the driver isn't open source at all. It's a HighPoint RocketRaid, which has "Open Source" all over the box, but all they make available is the object files, so no modifications are possible. It's borderline cheating the customer, since .o files hardly qualify as "open source", so I returned the RocketRaid card and got an Areca controller, which has in-kernel support and works like a charm.

So to everyone reading this thread who is considering a RAID controller from HighPoint, my advice would be to think again carefully: the driver leaks memory and, since it is not open source, is impossible to fix.

H_TeXMeX_H 12-28-2007 08:04 AM

Hah, well they should get sued for that. If the driver were open source, it might have been fixed by now.


All times are GMT -5.