LinuxQuestions.org


sreeharsha.t 07-30-2009 02:22 AM

Strange Memory Usage
 
Recently I was investigating why one of our servers, which has 6 GB of RAM, was showing only 60 MB of free memory. I was shocked to find that some processes appear to be occupying gigabytes of memory (I would rather say terabytes).

Check the attached screen-shot:

Attachment 1108

This is not physically possible, as our server doesn't have that much space, either as RAM or as swap.

Any idea why this happens?

PS: The server is running a Xen Dom0 kernel and hosts two Xen DomU guests.

salasi 07-30-2009 06:30 AM

Well, superficially, it looks as if you have stumbled across a bug in the Gnome monitor tool when reporting large numbers. It may have something to do with Xen and the way that it is set up, but I don't know anything about that.

It would be sensible to look at the other tools available, to see whether they report similar numbers (top, htop, ksysguard and probably loads of others).
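One way to cross-check a single process without any GUI tool is to read its resident set size straight from /proc. The following is only a sketch: the helper names `parse_vm_rss_kib` and `read_vm_rss_kib` and the sample text are made up for illustration, and it assumes a Linux `/proc/<pid>/status` file that carries a `VmRSS:` line in kB.

```python
def parse_vm_rss_kib(status_text):
    """Return the VmRSS value (in kB) from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:      2048 kB"; field 1 is the number.
            return int(line.split()[1])
    # Kernel threads have no VmRSS line at all.
    return None


def read_vm_rss_kib(pid):
    """Read and parse the status file of a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vm_rss_kib(f.read())


# Hypothetical sample text, not taken from the OP's server.
sample = "Name:\tbash\nVmSize:\t   10000 kB\nVmRSS:\t    2048 kB\n"
print(parse_vm_rss_kib(sample), "kB resident")
```

If the figure obtained this way for a suspicious process is plausible, the huge number is more likely a display bug in the monitor than real memory use.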

This is a server and you have gnome installed and are using it? Ho, hum.

johnsfine 07-30-2009 07:45 AM

The incorrect value is obviously a 64 bit value. Is this a 64 bit system where the memory size should be 64 bit? Or is it a 32 bit system?
Either way, the displayed value is a bug, but the nature of the bug would depend on the system bit size.

If you convert the wrong value to hex, the high 32 bits are apparently text. They have the value "in b".
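The conversion can be reproduced in a few lines. This is a sketch under two assumptions that are not stated in the thread: that the figure the OP saw (6585170340.5) was displayed in units of 2^30 bytes, and that the machine is little-endian, so the four text bytes are read back in reverse order when treated as an integer.

```python
import struct

GIB = 1 << 30                    # assumed display unit: 2**30 bytes
displayed = 6585170340.5         # the value the OP saw

raw = int(displayed * GIB)       # approximate underlying 64-bit byte count
high = raw >> 32                 # upper 32 bits
low = raw & 0xFFFFFFFF           # lower 32 bits

# Reinterpret the upper half as four little-endian bytes and decode as ASCII.
text = struct.pack("<I", high).decode("ascii")

print(hex(raw))
print(repr(text))
print(low // (1 << 20), "MiB in the low half")
```

Because the displayed value is rounded to one decimal place, the low half recovered here is only approximate; the high half is what carries the text.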

So I expect

1) The system is 64 bit.
2) The program (Gnome monitor tool) has some unintended assumptions that are only valid in 32 bit.
3) It accidentally combined a 32 bit value for memory size with whatever was left over in the next 32 bits of memory from some previous operation (that "in b") to produce a garbage value.
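The mechanism in the three points above can be simulated without touching the tool's source. This is a sketch under assumptions, not the actual gnome-system-monitor code: it assumes a little-endian machine, an 8-byte slot whose second half still holds the leftover text "in b", and a real 32-bit size of 512 MiB (a guess chosen so that the result matches the figure the OP saw, if that figure is in units of 2^30 bytes).

```python
import struct

# An 8-byte slot that still contains leftovers from an earlier operation.
# The first four bytes are placeholders; the last four are the stale text.
slot = bytearray(b"????in b")

# Code that believes the size is 32 bits wide writes only the first 4 bytes.
real_size = 512 * 1024 * 1024            # hypothetical true size: 512 MiB
struct.pack_into("<I", slot, 0, real_size)

# Code that believes the size is 64 bits wide reads all 8 bytes back.
(garbage,) = struct.unpack_from("<Q", slot, 0)

print(hex(garbage))
print(garbage / 2**30)                   # the same value in units of 2**30 bytes
```

On a little-endian layout the four bytes that follow the 32-bit value become the high half of the 64-bit read, which is why stale text ends up multiplying the result by billions.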

It wouldn't have anything to do with the actual value that is correct (so not "when reporting large numbers" as salasi guessed). It is much more likely to be an error in a stack variable so it depends on some unrelated result in an immediately preceding operation (preceding operation inside the tool as it gathers info about processes, not preceding from a user point of view).

From a programmer's viewpoint, it is probably the use of an int or unsigned int in some place that should have used size_t. But that alone shouldn't be able to cause this symptom; there must also be some questionable cast operation. Personally, I use a lot of questionable cast operations. I just always think through all the potential consequences.

Edit: I just ran gnome-system-monitor on a 64 bit Centos 5.3 system and I see the same bug, but with different values (not the 6585170340.5 that the OP saw). Significantly, I see different values on different processes. I also see that the problem goes away fairly quickly as the program continues to run. It comes back randomly as other people start and stop programs, and it comes back as I resize the System Monitor window. But it doesn't come back spontaneously when nothing is changing. This is version 2.16.0 of gnome-system-monitor. I might look at the source code later to see if the bug is obvious. But first I ought to check whether it is already fixed in some later version. I don't know which, if any, later versions could be installed or run in Centos 5.3 without generating a dependency mess.

johnsfine 07-30-2009 03:06 PM

I was on a Mepis computer at lunch time just long enough to get Synaptic to install whatever version of gnome-system-monitor is current in 64 bit Mepis 8 and to test that.

I didn't see any failure similar to the ones I saw with version 2.16.0 in Centos 5.3.

That gnome-system-monitor (in Mepis 8.0) was a much newer version (at the moment I forget the number) and clearly much enhanced compared with the version in Centos 5.3. So I suspect the bug has been fixed (rather than merely not being triggered by the set of processes on my Mepis system).

If the OP is still interested, you might want to download the source for a newer version of gnome-system-monitor and build it, or download binaries for it from Fedora.

I generally like the stability of Centos, but sometimes you want to use a program where they are too far behind important developments. I don't know how practical it is to get one program from Fedora out ahead of the rest of your RHEL or Centos system, but I suspect it isn't practical. We usually rebuild locally from source when the Centos copy of an individual program is too far out of date.

sreeharsha.t 07-31-2009 02:58 AM

Quote:

Originally Posted by salasi (Post 3625293)
It may have something to do with Xen and the way that it is set up, but I don't know anything about that.

From what johnsfine said, it doesn't appear to have anything to do with Xen.

Quote:

Originally Posted by salasi (Post 3625293)
It would be sensible to look at the other tools available, to see whether they report similar numbers (top, htop, ksysguard and probably loads of others).

The other tools reported the correct memory values.

Quote:

Originally Posted by salasi (Post 3625293)
This is a server and you have gnome installed and are using it? Ho, hum.

This server is in a dev environment, so we use gnome to speed things up.




Quote:

Originally Posted by johnsfine (Post 3625344)
So I expect

1) The system is 64 bit.
2) The program (Gnome monitor tool) has some unintended assumptions that are only valid in 32 bit.
3) It accidentally combined a 32 bit value for memory size with whatever was left over in the next 32 bits of memory from some previous operation (that "in b") to produce a garbage value.

This is a 64 bit system and I agree with your inference on this program's behaviour.

Quote:

Originally Posted by johnsfine (Post 3625344)
If the OP is still interested, you might want to download the source for a newer version of gnome-system-monitor and build it, or download binaries for it from Fedora.

I shall try to do that. In the meantime, I checked gnome-system-monitor in Fedora 10 on another 64 bit machine, and the bug seems to be fixed there. I am not at that computer at present, but I can post its version number soon. The one that has this bug is version 2.16.0.


All times are GMT -5.