LinuxQuestions.org
Linux - Server This forum is for the discussion of Linux Software used in a server related context.

07-22-2008, 04:17 AM   #1
browny_amiga
Member
 
Registered: Dec 2001
Location: /mnt/UNV/Mlkway/Earth/USA/California/Silicon Valley
Distribution: Kubuntu, Debian Buster Stable, Windoze 7
Posts: 684

Rep: Reputation: 56
Linux Kernel flaw? DoS when out of memory...


Hi

I have a Debian Etch server that I use with NXserver as a terminal server.
Over the last 3 years I have noticed an issue that makes me wonder whether the Linux kernel has a flaw, and how it could be prevented:

When I use Firefox on the server over the remote NX connection with many, many tabs open, the server's memory fills up.
From that point on everything freezes: the server runs the HD like crazy and becomes unresponsive (due to extreme slowness).

I know that Firefox probably has a memory leak that fills up the RAM, which in turn causes massive swap usage and slows the server down to a crawl.
That much is normal on any server/desktop system.
But then I removed the swap, because 800 MB of RAM is enough, and if there is a leak the server should run out of memory as soon as possible: the kernel usually has an emergency mechanism that kills off a rampant process asking for more and more RAM and overloading the system.
With swap, the server would probably do that eventually, but I have let it run in this state for over 10 hours and, apart from extreme and nonstop HD activity, I could not even log in locally.

Deactivating the swap should have stopped the HD activity, since the server cannot page out memory anymore. But, and this is the kicker of this whole post, IT STILL DOES!!!
I have no idea what it is so frantically accessing on the HD, but the result is the same: it basically crashes. The problem for me is that it is not a clean crash where the system returns to a usable state; it stays locked and unavailable until I shut it down with the hardware reset switch, which is probably not very good for the software RAID I have on that bucket (4 SATA HDs, 250 GB each).

The HW is an Asus board with an AMD 3000+ 64-bit CPU,
running 32-bit Debian Etch.

The question is: What is the system reading (or writing) so frantically on the HD(s), even though swap is disabled? Why does it go on for hours and hours?

Markus
 
07-22-2008, 05:06 AM   #2
salasi
Senior Member
 
Registered: Jul 2007
Location: Directly above centre of the earth, UK
Distribution: SuSE, plus some hopping
Posts: 4,070

Rep: Reputation: 897
Quote:
Originally Posted by browny_amiga
Using firefox on the server over the remote NX connection, when having many many tabs open, the memory of the server runs full.
From that point on, everything freezes, the server is running the HD like crazy and becomes unresponsive (due to extreme slowness).
That's actually slightly reassuring, because I see something similar with (K)Ubuntu. In my case every browser (Opera, Firefox, Konqueror) and Akregator does it; Opera does it more quickly, but that's probably because Opera's tab management is better and I tend to leave more tabs open at once.

Note that Akregator is also a browser, so all the browsers I've tried leak memory to some extent; that's either a big coincidence or there is, e.g., some underlying library that is 'bad'. Or maybe it's the GUI (I tried KDE and GNOME; KDE is worse than GNOME, but I don't really like GNOME...)

On the one hand, it sounds as if going 'up' to debian with its legendary stability probably won't help me; on the other hand, someone else has the same problem.
Quote:
I know that firefox probably has a memory leak and fills up the RAM, which in turn causes massive swap space usage, which slows the server down to a crawl.
This is normal on any server/desktop system.
Well, the memory leak shouldn't be normal, but its consequences are. My observation is that the system becomes unusable once more than half the swap is used; this in itself is a bit odd, as you would expect the rate at which pages go out to swap to be the critical factor. I'm guessing that there is some shift of strategy within the kernel once swap starts getting heavily used that increases HD activity, but I don't know that.
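The "more than half the swap is used" threshold can be tracked directly instead of by feel. A minimal Python sketch, assuming a Linux `/proc/meminfo` with the standard `SwapTotal` and `SwapFree` fields; the function names are illustrative, not part of any existing tool:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most values are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def swap_used_fraction(info):
    """Fraction of configured swap currently in use (0.0 when there is no swap)."""
    total = info.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return (total - info.get("SwapFree", 0)) / total


def current_swap_used_fraction():
    """Read the live value from the running kernel."""
    with open("/proc/meminfo") as f:
        return swap_used_fraction(parse_meminfo(f.read()))
```

Printing `f"{current_swap_used_fraction():.0%}"` every few seconds while the system degrades would show whether the slowdown really lines up with the 50% mark.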
Quote:
But then I removed the swap, because 800 MB in RAM is enough and if there is a leak, the server should run out of memory ASAP, because the kernel usually has a emergency mechanism, killing off a rapant process that asks for more and more ram, overloading the system.
Well, I can see your logic, but at the point that all memory is gone, things are a bit problematic; I'm guessing that eventually things would sort themselves out but maybe on a timescale that is geological, due to the difficulty of doing the sorting out work once all the memory is gone.
Quote:
Deactivating the SWAP should have stopped HD activity, since the server cannot page out memory anymore. But, and this is the kicker of this whole post, IT STILL DOES!!!
Are you sure it is still swapping or is it just trying (ineffectually) to tidy up while closing down processes?

Quote:
But the problem for me is, that it is not a clean crash where the system returns to an usable state, but it stays locked and unavailable till I shut it down by the Hardware reset switch, which is probably not very good for my Software RAID that I have on that bucket (with 4 SATA HDs (250 Gbyte each)
I got tired of it corrupting the system every so often (say one in twenty crashes), so, being pragmatic, I bought more RAM! (Which was a good thing anyway, as it was old hardware that didn't really have enough.) That isn't an ideal solution in any way, but now it will comfortably stay up for a month without problems, and I can clean up the browser and Akregator by killing and restarting just those applications.
 
07-22-2008, 06:05 AM   #3
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,128

Rep: Reputation: 4121
Quote:
Originally Posted by browny_amiga
I know that firefox probably has a memory leak and fills up the RAM, which in turn causes massive swap space usage, which slows the server down to a crawl.
?????
Show us the evidence - pure conjecture at this point.
Quote:
Deactivating the SWAP should have stopped HD activity, since the server cannot page out memory anymore.
More conjecture - the swapoff would not have completed if there was insufficient memory to contain all the swapped storage. Swap is probably contributing, but obviously isn't the total story.
Quote:
The question is: What is the system reading (or writing) so frantically on the HD(s), even though swap is disabled? Why does it go on for hours and hours?
Shared libraries being continually discarded and refetched - because of memory pressure. Same thing with disk cache ... (Maybe) just thrashing back and fro. Guesswork really.
Do you have sysstat installed and running? Do you have taskstats enabled in the kernel? You need hard data.
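One way to get that hard data, and to test the shared-library theory, is to sample the kernel's cumulative counters in `/proc/vmstat`. A rough Python sketch under these assumptions: `pswpin`/`pswpout` count pages moved from and to swap, while `pgmajfault` counts major faults, i.e. pages that had to be read from disk, whether from swap or from a file such as an evicted shared library. If `pgmajfault` keeps climbing while `pswpin` stays flat, the disk reads are most likely file-backed pages being refetched rather than swapping. Function names are illustrative.

```python
import time

# Cumulative counters from /proc/vmstat (since boot):
#   pswpin / pswpout : pages read from / written to swap
#   pgmajfault       : page faults that needed disk I/O (swap or a mapped file)
COUNTERS = ("pswpin", "pswpout", "pgmajfault")


def parse_vmstat(text):
    """Extract the counters of interest from /proc/vmstat text."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in COUNTERS:
            stats[parts[0]] = int(parts[1])
    return stats


def delta(before, after):
    """Per-counter increase between two samples."""
    return {k: after[k] - before[k] for k in before if k in after}


def read_vmstat():
    with open("/proc/vmstat") as f:
        return parse_vmstat(f.read())


def watch(interval_s=5, rounds=12):
    """Print how much each counter grew during each interval."""
    prev = read_vmstat()
    for _ in range(rounds):
        time.sleep(interval_s)
        cur = read_vmstat()
        print(time.strftime("%H:%M:%S"), delta(prev, cur), flush=True)
        prev = cur
```

Reading `/proc/vmstat` needs no special privileges. sysstat's `sar -B` (paging, including major faults per second) and `sar -W` (swap pages in/out per second) report comparable figures and keep history.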
 
07-22-2008, 06:18 AM   #4
browny_amiga
Member
 
Registered: Dec 2001
Location: /mnt/UNV/Mlkway/Earth/USA/California/Silicon Valley
Distribution: Kubuntu, Debian Buster Stable, Windoze 7
Posts: 684

Original Poster
Rep: Reputation: 56

>That's actually slightly reassuring because I see
>something similar with (k)Ubuntu. In my case any browser (Opera,
>Firefox, Konqueror) and akregator do it; Opera does it more quickly
>that's probably because opera's tab management is better and I tend to
>leave more tabs open at once.

I never had this with Opera; only Firefox does that.

>Note that akregator is also a browser; so all the browsers that I've
>tried leak memory to some extent; that's either a big coincidence or
>there is, eg, some underlying library that is 'bad'. Or maybe its the
>GUI (tried kde and gnome, and kde is worse than gnome, but I don't
>really like gnome...)

It is the browser, no doubt. They have a scalability problem (use it at scale and it will crash). I usually experience the problem when I have a massive number of tabs open, especially with embedded videos (MySpace is notorious).

>On the one hand, it sounds as if going 'up' to debian with its
>legendary stability probably won't help me; on the other hand, someone
>else has the same problem.

Nope, that won't help. It seems to be the kernel and the basic system components.

>Well the memory leak shouldn't be normal, but the consequences of one
>are. My observation is that the system becomes unusable when more than
>half the swap is used;

Well, the system just goes on using more and more swap. It is built so that it does not realize how slow a HD is: the layers are transparent, so one layer does not know what the layer below it is, and it treats swap as if it were fast RAM.

>Well, I can see your logic, but at the point that all memory is gone,
>things are a bit problematic; I'm guessing that eventually things
>would sort themselves out but maybe on a timescale that is geological,

Nope: when swap is off (or none exists), it is all RAM. Once that is used up, some process is going to crash, and at that point memory is freed up. I have seen the kernel do magic sometimes, killing off processes (shown on the server console). It looked like it was trying to maintain sanity and restore stability.

>Are you sure it is still swapping or is it just trying (ineffectually)
>to tidy up while closing down processes?

No, it cannot be swapping, because I reinstalled the system when enlarging my RAID array and there is no swap in the system anymore, not even a partition for it, so it cannot be that.
But tidying up for more than 10 hours? 10 hours of nonstop HD activity, I mean the disks rattling like crazy and the LED flashing nonstop? I just don't get it...
I don't have that many programs running on the server.


>I got tired with it corrupting the system every so often - say one in
>twenty crashes; so, being pragmatic, I bought more RAM! (Which was a
>good thing anyway, as it was old hardware that didn't really have
>enough.) But that isn't an ideal solution, in any way. But now it will

Well, I guess that delays it somewhat, but a leak usually keeps growing: more RAM buys you a few more hours, but the machine will crash nevertheless.

I just wonder what the system is doing with so much I/O on the HDs at that moment.
It should not be doing anything to the HD at all.

Last edited by browny_amiga; 07-22-2008 at 06:20 AM.
 
07-22-2008, 06:52 AM   #5
browny_amiga
Member
 
Registered: Dec 2001
Location: /mnt/UNV/Mlkway/Earth/USA/California/Silicon Valley
Distribution: Kubuntu, Debian Buster Stable, Windoze 7
Posts: 684

Original Poster
Rep: Reputation: 56

>I know that firefox probably has a memory leak and fills up the RAM,
>which in turn causes massive swap space usage, which slows the server
>down to a crawl.


>Show us the evidence - pure conjecture at this point.

OK, then let me be more specific. It is Firefox, and not possibly: it is certain! How can I tell? I have a memory meter showing the memory usage.
Normally I use about 300 MB of the 800 MB I have. Now I have Firefox open with about 20 tabs and the memory is almost full. If I close Firefox, usage drops to about 250 MB, so it is Firefox. If I notice the memory leak escalating, I can still close Firefox to avert a crash; if not, bang!
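A memory meter shows the total; to turn it into the evidence syg00 asks for, the browser's own resident memory can be logged over time, where a leak shows up as steady growth that never falls back. A minimal sketch, assuming Linux `/proc/<pid>/status` files (their `Name:` and `VmRSS:` fields). The process name to pass in is an assumption: check it with `ps -e` first, since Iceweasel/Firefox may run under a different binary name, and the kernel truncates `Name:` to 15 characters.

```python
import os
import time


def parse_status(text):
    """Return (name, rss_kb) from /proc/<pid>/status text; kernel threads have no VmRSS."""
    name, rss_kb = None, 0
    for line in text.splitlines():
        if line.startswith("Name:"):
            name = line.split(None, 1)[1].strip()
        elif line.startswith("VmRSS:"):
            rss_kb = int(line.split()[1])
    return name, rss_kb


def rss_by_name(name):
    """Sum the resident set size (kB) of every process whose Name: matches."""
    total = 0
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            with open(f"/proc/{pid}/status") as f:
                pname, rss_kb = parse_status(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if pname == name:
            total += rss_kb
    return total


def log_rss(name, interval_s=60, rounds=60):
    """Print one timestamped RSS sample per interval."""
    for _ in range(rounds):
        print(time.strftime("%H:%M:%S"), name, rss_by_name(name), "kB", flush=True)
        time.sleep(interval_s)
```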

>Deactivating the SWAP should have stopped HD activity,
>since the server cannot page out memory anymore.

>More conjecture - the swapoff would not have completed if there was
>insufficient memory to contain all the swapped storage. Swap is
>probably contributing, but obviously isn't the total story.

Well, that is conjecture on your side. You should believe me when I say "swap is off". Of course there is always the possibility that I cannot turn it off because there is not enough RAM to take the swapped-out data, but then I would see that it does not get turned off. No, in this case THERE IS NO SWAP. No partition, nothing, nada. I removed it on purpose, because it is pointless and a waste of time: I don't have time to wait 4 days of swapping before the system is usable again. Forcing the situation by having only real RAM, so that a program gets the ugly "OUT OF MEMORY" error and crashes, is preferable to endless swapping. After all, if a problem gets solved but it takes 9000 years, we don't consider it solved at all, because we will all be dead and "in good time" does not apply.
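Whether swap is really off can be confirmed from `/proc/swaps`, the same table `swapon -s` prints: a header line plus one line per active swap area. A small sketch with illustrative helper names:

```python
def active_swaps(text):
    """Return the swap devices/files listed in /proc/swaps text (line 1 is a header)."""
    return [line.split()[0] for line in text.splitlines()[1:] if line.strip()]


def swap_is_off(path="/proc/swaps"):
    """True when the kernel reports no active swap area."""
    with open(path) as f:
        return not active_swaps(f.read())
```

An empty table only rules out swap I/O, though; file-backed pages such as program code and shared libraries can still be dropped and reread from disk under memory pressure.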


Quote:
The question is: What is the system reading (or writing) so frantically on the HD(s), even though swap is disabled? Why does it go on for hours and hours?
>Shared libraries being continually discarded and refetched -
>because of memory pressure. Same thing with disk cache ...
>(Maybe) just thrashing back and fro. Guesswork really.

Ok, that is a theory that I could build on. It could be an endless loop (it surely looks like one since I have never seen a system in that state recover)

>Do you have sysstat installed and running ???. Do you have
>taskstats on in the kernel ???. You need hard data.

I will see if I can turn that on. The question is: will it still log when it crashes, so I can read the log afterwards?

Markus
 
07-22-2008, 07:01 AM   #6
pinniped
Senior Member
 
Registered: May 2008
Location: planet earth
Distribution: Debian
Posts: 1,732

Rep: Reputation: 50
If you use up the memory and start swapping, the disk thrashing will really slow things down. It is normal to have the computer misbehave when memory is too low - files can't be opened, processes can't be started, shells can't be spawned to do maintenance. A small amount of memory and disk space is reserved for 'root' - this is an attempt to give the system a chance to function well enough for an admin to fix things. I think the default is about 4% of total - so if you have 1GB RAM you've got an enormous 40MB to play with.
 
07-22-2008, 07:03 AM   #7
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,128

Rep: Reputation: 4121
I wasn't disputing the fact you swapoff'd successfully. I was disputing your contention all hard-disk activity would stop as a result.
Sysstat will log successfully (if more slowly) up until failure - you can do historical analysis with sar on that data.
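sysstat is the proper tool here. For comparison, a minimal hand-rolled logger in the same spirit is sketched below; it assumes Linux `/proc/meminfo`, and the log path and interval are placeholders. The important detail is `os.fsync()` after every line, which forces each sample onto disk before the next one is taken, so samples written before a hang should survive a hard reset.

```python
import os
import time

KEYS = ("MemFree", "Buffers", "Cached", "SwapFree")


def snapshot_line(meminfo_text, keys=KEYS):
    """Format selected /proc/meminfo fields as one timestamped log line."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in keys and rest.split():
            values[key] = rest.split()[0]
    fields = " ".join(f"{k}={values.get(k, '?')}" for k in keys)
    return time.strftime("%Y-%m-%d %H:%M:%S ") + fields


def log_memory(path="/var/log/memwatch.log", interval_s=10):
    """Append a snapshot every interval_s seconds until interrupted.

    The default path is a hypothetical placeholder.
    """
    with open(path, "a") as log:
        while True:
            with open("/proc/meminfo") as f:
                log.write(snapshot_line(f.read()) + "\n")
            log.flush()
            os.fsync(log.fileno())  # force the line onto disk before sleeping
            time.sleep(interval_s)
```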
 
03-30-2009, 10:44 AM   #8
DaveQB
Member
 
Registered: Oct 2003
Location: Sydney, Australia.
Distribution: Debian, Ubuntu
Posts: 400

Rep: Reputation: 39
Yeah, I get this exact same thing, and have since I started using Linux-based OSes about 6 years ago. It's a frustrating issue. It used to happen a lot more for me back in the Mandrake days; then it was kswapd that seemed to top the list in top.

I like the idea that it's Firefox or any browser, but I had it happen on my server (no GUI) while it was creating a 4GB zero-filled file with dd. It had plenty of HDD space, but it just went unresponsive, with OOM errors printing to the screen every 5 seconds or thereabouts.

Bloody frustrating stuff indeed!
 
03-30-2009, 02:12 PM   #9
johnsfine
LQ Guru
 
Registered: Dec 2007
Distribution: Centos
Posts: 5,286

Rep: Reputation: 1197
Quote:
Originally Posted by DaveQB
yeah I get this exact same thing. Have been since I started using Linux based Os about 6 years ago.
1) Did you notice the age of the previous post before writing your reply?

2) I use both Windows and Linux and I see problems like this more on Windows than on Linux, and when I do see such problems on Linux, there is better documentation of the internal behaviors and better tools to diagnose and/or correct the problem. You seem to be describing this as a Linux specific problem, and it really isn't.
 
03-30-2009, 04:02 PM   #10
DaveQB
Member
 
Registered: Oct 2003
Location: Sydney, Australia.
Distribution: Debian, Ubuntu
Posts: 400

Rep: Reputation: 39
Quote:
Originally Posted by johnsfine
1) Did you notice the age of the previous post before writing your reply?

2) I use both Windows and Linux and I see problems like this more on Windows than on Linux, and when I do see such problems on Linux, there is better documentation of the internal behaviors and better tools to diagnose and/or correct the problem. You seem to be describing this as a Linux specific problem, and it really isn't.
1) Yes I did.

2) I don't know, I don't use Windows so I can't say either way. I was clearly not comparing this issue with any other OS; I don't particularly care about other OS's to be frank.

What is a good way to troubleshoot this? My efforts over the years have proven fruitless.
 
03-30-2009, 04:16 PM   #11
browny_amiga
Member
 
Registered: Dec 2001
Location: /mnt/UNV/Mlkway/Earth/USA/California/Silicon Valley
Distribution: Kubuntu, Debian Buster Stable, Windoze 7
Posts: 684

Original Poster
Rep: Reputation: 56
Quote:
Originally Posted by johnsfine
1) Did you notice the age of the previous post before writing your reply?

2) I use both Windows and Linux and I see problems like this more on Windows than on Linux, and when I do see such problems on Linux, there is better documentation of the internal behaviors and better tools to diagnose and/or correct the problem. You seem to be describing this as a Linux specific problem, and it really isn't.
To 1:
Problem solving NEVER gets old. The issue still exists in my case, but I think it has something to do with the Linux software RAID I use, because I don't get this on other machines. It seems that the software RAID really misbehaves when the system runs out of memory.

To 2: Well, I don't doubt that it is worse in Windows. Heck, it surely is; that OS is just so badly put together. That is why I now use XP on only one machine, for gaming, and all my other professional machines and laptops run Linux, and much better too.
But we were talking about Linux, and yes, I guess there are many more tools to troubleshoot the issue. My pragmatism leads me to a solution in Python: watch memory usage, and when it goes over a certain threshold, find out which program is using too much memory and tell it to close nicely; then, if the program continues to gobble up memory (as Firefox is fond of doing, with the 4527 tabs I have open), it gets a SIGKILL (a shot in the head). The user is then notified by a pop-up that the program was terminated to maintain system stability, and a log entry is written.
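The watchdog idea above can be sketched in Python. This is a minimal, hypothetical version, assuming Linux `/proc` and root privileges: the thresholds, the `exclude` list, and every function name are placeholders; the "available" figure is only an estimate (MemFree + Buffers + Cached); and it logs through the standard `logging` module instead of showing the pop-up.

```python
import logging
import os
import signal
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s memguard: %(message)s")


def read_meminfo():
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, _, rest = line.partition(":")
            if rest.split():
                info[key] = int(rest.split()[0])
    return info


def available_kb(info):
    # Rough estimate: free RAM plus buffers and page cache, which the kernel can
    # usually reclaim. (Newer kernels report a better MemAvailable field.)
    return info.get("MemFree", 0) + info.get("Buffers", 0) + info.get("Cached", 0)


def processes():
    """Yield (pid, name, rss_kb) for every process readable in /proc."""
    for pid in filter(str.isdigit, os.listdir("/proc")):
        name, rss_kb = None, 0
        try:
            with open(f"/proc/{pid}/status") as f:
                for line in f:
                    if line.startswith("Name:"):
                        name = line.split(None, 1)[1].strip()
                    elif line.startswith("VmRSS:"):
                        rss_kb = int(line.split()[1])
        except OSError:
            continue  # exited while scanning
        yield int(pid), name, rss_kb


def pick_victim(procs, exclude=()):
    """Largest-RSS process whose name is not in `exclude`, or None."""
    candidates = [p for p in procs if p[1] not in exclude]
    return max(candidates, key=lambda p: p[2], default=None)


def send(pid, sig):
    """Send a signal; return False if the process is already gone."""
    try:
        os.kill(pid, sig)
        return True
    except ProcessLookupError:
        return False


def guard(min_available_kb=50_000, grace_s=15, interval_s=5, exclude=()):
    """Hypothetical watchdog loop: SIGTERM the biggest process, SIGKILL if memory stays low."""
    while True:
        if available_kb(read_meminfo()) < min_available_kb:
            victim = pick_victim(processes(), exclude=exclude)
            if victim and victim[0] != os.getpid():
                pid, name, rss_kb = victim
                logging.warning("low memory: SIGTERM %s (pid %d, %d kB)", name, pid, rss_kb)
                if send(pid, signal.SIGTERM):
                    time.sleep(grace_s)  # give it time to exit cleanly
                    if available_kb(read_meminfo()) < min_available_kb and send(pid, signal.SIGKILL):
                        logging.warning("still low: SIGKILL pid %d", pid)
        time.sleep(interval_s)
```

Something like `guard(exclude=("Xorg", "nxagent"))` would keep the X server and the NX agent off the list; those names are examples, so check `ps` on the actual machine. Choosing by RSS alone is crude compared with the kernel's own scoring, and a PID can in principle be reused during the grace period, so treat this as a starting point only.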

Now if only I knew how to make a pop-up appear on the screen of anybody logged in to X. I have another thread about it, but it seems to be impossible to do. One truly annoying thing about Linux (and nobody has any explanation for it) is that X sessions owned by a user cannot be used by root. Root is all-powerful and must be able to control everything, otherwise it defeats the purpose of root. But that is another story.

What I must add is that, while software RAID on Linux might cause this problem, otherwise it is totally amazing and wonderful. It works more reliably than any software RAID card you can buy for lots of money. With the right CPU and fast HW it flies like a rocket. It is hardware-independent, so migrating it to other systems is totally easy: just plug it in. No evil BIOSes, no RAID HW card failure with no replacement available, no incompatible RAID disk format that makes migrating to another manufacturer impossible.

Dual- and quad-core CPUs nowadays have power to spare for RAID. Peanuts for these number-crunching monsters.


Cheers

Markus
 
03-30-2009, 04:35 PM   #12
johnsfine
LQ Guru
 
Registered: Dec 2007
Distribution: Centos
Posts: 5,286

Rep: Reputation: 1197
Quote:
Originally Posted by syg00
Shared libraries being continually discarded and refetched - because of memory pressure.
I don't know if either of you (Markus or DaveQB) got the significance of the above statement.

When you restrict the swapping (through swap area size and/or the swappiness parameter), you may get much more of the disk accesses that a naive user would think of as "swapping".

The swap area is used only for memory contents that do not have file mapping. When you prevent those pages from being swapped out, you may make the system thrash rereading pages that don't need the swap area.
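The split described here is visible in `/proc/meminfo`: `AnonPages` is memory with no file behind it, which can only go to swap, while `Cached` and `Mapped` are file-backed pages (`Mapped` includes program code and shared libraries) that the kernel can drop and reread from disk. A minimal sketch, assuming a kernel recent enough to report `AnonPages`; function names are illustrative:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if rest.split():
            info[key.strip()] = int(rest.split()[0])
    return info


def memory_breakdown(info):
    """Group RAM use by where it can be evicted to."""
    return {
        # Heap/stack data with no backing file: the swap area is its only way out.
        "anonymous_kB": info.get("AnonPages", 0),
        # Page cache: mostly file contents that can be dropped and reread
        # (tmpfs/shared memory is also counted here but is swap-backed).
        "page_cache_kB": info.get("Cached", 0),
        # Part of the page cache mapped into processes: code, shared libraries.
        "mapped_kB": info.get("Mapped", 0),
    }


def read_swappiness(path="/proc/sys/vm/swappiness"):
    """The swappiness knob: higher values make the kernel more willing to swap anonymous pages."""
    with open(path) as f:
        return int(f.read())
```

With no swap at all, only the page-cache side can shrink, so as anonymous memory grows the cache (including `Mapped` library code) is squeezed and reread over and over.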

If your system fundamentally doesn't have enough ram for the tasks you want to accomplish, buy a system with enough ram.

If your programs have memory leaks, either debug the memory leaks, or abort and restart the programs often enough to control the memory leaks.

Either way, you are better off with a big swap area so Linux can slow down smoothly as memory becomes overused, often giving you (and/or your daemon program) time to kill a well chosen task.

Quote:
Originally Posted by DaveQB
I don't particularly care about other OS's
I'm sorry I inferred, from your earlier statement, an OS comparison that you apparently didn't intend.

Last edited by johnsfine; 03-30-2009 at 04:39 PM.
 
03-30-2009, 06:23 PM   #13
browny_amiga
Member
 
Registered: Dec 2001
Location: /mnt/UNV/Mlkway/Earth/USA/California/Silicon Valley
Distribution: Kubuntu, Debian Buster Stable, Windoze 7
Posts: 684

Original Poster
Rep: Reputation: 56
Quote:
Originally Posted by johnsfine
I don't know if either of you (Markus or DaveQB) got the significance of the above statement.
Oh, I do. I think it explains why the HD runs nonstop and the system becomes totally unresponsive until you hold down the power button.
I just know that the kernel usually has some magic for this and, at least on systems without software RAID, tends to shoot an offending task in the head at some point when it gets out of control. You see it on the console, saying something like "killed xzy".
It takes a while, because I guess the kernel acts conservatively, but it does.

Of course I agree with your statements: more RAM, debug the app. But 1 GB of RAM should be enough, and unfortunately I cannot debug Iceweasel (Firefox), so there is no solution except restarting it regularly... or having my tool that makes sure no program can crash the machine. Software RAID works like a charm, but does not like it when you pull the plug on the system, which is what you have to do when the system crashes. Having lots of swap, little swap, or no swap at all (swapoff -a) makes no difference at all to the behaviour.

What I am curious about: does anybody know how the kernel does shut down "rogue" programs that grab more and more ram and threaten to crash the system? Is the functionality documented somewhere? I really like that one, it makes a lot of sense.

Markus
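The mechanism asked about here is the kernel's OOM ("out of memory") killer, implemented in `mm/oom_kill.c`. When memory cannot be reclaimed for an allocation, it computes a "badness" score for each process and kills the highest-scoring one. The current score is exposed as `/proc/<pid>/oom_score`, and kernels of this era let an admin bias it through `/proc/<pid>/oom_adj` (newer kernels use `oom_score_adj`); both are described in the kernel's `Documentation/filesystems/proc.txt`. Note that the killer only runs when reclaim fails outright, so a system that can still slowly evict and reread file pages may thrash for a long time first. A minimal sketch listing the current front-runners (names illustrative):

```python
import os


def top_n(entries, n):
    """Highest-scoring (oom_score, pid, name) tuples first."""
    return sorted(entries, reverse=True)[:n]


def oom_candidates(limit=5):
    """List the processes the OOM killer currently rates as the best victims."""
    entries = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/oom_score") as f:
                score = int(f.read())
            with open(f"/proc/{pid}/status") as f:
                name = f.readline().split(None, 1)[1].strip()  # line 1 is "Name:\t<name>"
        except (OSError, IndexError, ValueError):
            continue  # process exited or file unreadable
        entries.append((score, int(pid), name))
    return top_n(entries, limit)
```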
 
03-30-2009, 06:31 PM   #14
DaveQB
Member
 
Registered: Oct 2003
Location: Sydney, Australia.
Distribution: Debian, Ubuntu
Posts: 400

Rep: Reputation: 39
Interesting discussions.

I have no swap on my current desktop, but did on previous ones where I had this issue; it seems really independent of swap or no swap, kind of hardware, distro, etc.

I never thought I would say it, but maybe 4GB of RAM is not enough. I might create an 8GB Swap partition and see if that has any effect at all.

Wouldn't one think that, in this day and age, an application running out of memory (is this not common?) would be handled by the OS/kernel gracefully and without affecting the system's stability? I mean, sure, kill the app, whatever you have to do, but don't bring the system to its knees just because an app didn't know how much RAM it had and ran out.

I guess things aren't as easy to remedy as they appear on the surface.
 
03-30-2009, 06:50 PM   #15
browny_amiga
Member
 
Registered: Dec 2001
Location: /mnt/UNV/Mlkway/Earth/USA/California/Silicon Valley
Distribution: Kubuntu, Debian Buster Stable, Windoze 7
Posts: 684

Original Poster
Rep: Reputation: 56
>I never thought I would say it, but maybe 4GB of RAM
>is not enough. I might create an 8GB Swap partition
>and see if that has any effect at all.

Well, no amount of RAM will ever be enough. If you have a leak (like in Firefox), it is just a matter of time until memory fills up and the machine crashes. 4 GB is a lot, so it will take longer, but I will not stuff the system with RAM just to delay the crash instead of fixing the root problem. Kind of an expensive and unintelligent solution ;-)

>Wouldn't one think, that in this day an age, an
>application running out of memory [would this not
>be common?] would be handled by the OS/kernel
>gracefully and without issue the systems
>stability? I mean, sure kill the app, whatever you
>have to do, but don't bring the system to its
>knee's just cause an app didn't know how much RAM
>it had and ran out.

It really is not that easy, because you cannot just kill the app. There are also high-performance apps that legitimately use a lot of RAM, and fast. You have to make some good guesses about whether the situation is abnormal.

As I said before, I am pretty sure this is a RAID issue: I never had it on one server (and I also used FF on it over a terminal session), then I upgraded that server to software RAID (policy: all systems need RAID 5; crappy HDD quality drove me to it) and BANG...

I guess the RAID cannot work properly without some libraries, so it tries to load them, finds no memory, tries again, and so on endlessly.

I will try to write that "system-guard" program in Python when I get around to it; of course I have 213245 other projects in Python, AND Python is only a hobby for me (but a cool one).

;-)

Markus
 
  


All times are GMT -5.
