LinuxQuestions.org


rtr_87 01-25-2010 02:18 AM

All memory being consumed by only a few processes
 
This question may be hardware or software related (or both), I'm not sure. I recently built a new server, running 64 bit Slackware 13.0 with the following specs:

MSI 785GTM-E45
AMD Phenom II X2 550
2GB DDR2
Onboard video from AMD 785G chipset
2x 80GB IDE system drives using software RAID with 2GB swap partition

I only include these because I'm not convinced my problem is not hardware related at some level. Basically, when I first start up the system, the memory usage is anywhere from 60 to 200MB. Then it starts to gradually climb until there is only 12-15MB free. This can take anywhere from a few hours to a few days.

The only thing I really use this for is to serve Samba shares and the occasional SSH login. I've set up Samba shares to be accessed by my Windows machines as well as a Mac. Initially I just explored the network if I wanted to see the share from these machines, but that was too slow and unreliable (the server would not always show up in Windows), so now I automatically mount the share as a network drive at startup (from Windows). So I don't know if this would have anything to do with the steadily increasing memory usage. These systems are not on/connected all the time, but the memory usage seems to rise anyway.

When I run top, it reports that nearly all of the physical memory has been consumed (after a while of uptime), but none of the swap space has even been touched. This is a typical output of the first several lines, sorted by swap size:

Code:

top - 07:45:48 up 2 days, 12:33,  2 users,  load average: 0.00, 0.00, 0.00
Tasks: 105 total,  1 running, 104 sleeping,  0 stopped,  0 zombie
Cpu(s):  0.0%us,  0.2%sy,  0.0%ni, 99.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  1792956k total,  1778440k used,    14516k free,  160576k buffers
Swap:  2104440k total,        0k used,  2104440k free,  1529088k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  SWAP COMMAND
16310 root      20  0 68452 1384  480 S    0  0.1  0:00.00  65m smbd
16312 root      20  0 70720 3868 2916 S    0  0.2  0:00.00  65m smbd
16306 root      20  0 68452 3300 2396 S    0  0.2  0:00.00  63m smbd
16309 root      20  0 42564  940  280 S    0  0.1  0:00.00  40m nmbd
 3448 root      20  0 49724 8176 1632 S    0  0.5  0:00.74  40m ddclient
16308 root      20  0 42824 2000 1300 S    0  0.1  0:00.00  39m nmbd
16363 rtr      20  0 31836 1524 1024 S    0  0.1  0:00.00  29m sshd
16361 root      20  0 31700 2364 1888 S    0  0.1  0:00.00  28m sshd
 3257 haldaemo  20  0 28332 1836  748 S    0  0.1  0:00.18  25m hald
 3234 root      20  0 27068 1032  636 S    0  0.1  0:00.00  25m sshd
 3347 haldaemo  20  0 20876  728  564 S    0  0.0  0:00.00  19m hald-addon-acpi
 3286 root      20  0 19904  632  468 S    0  0.0  0:00.00  18m hald-addon-inpu
 3450 root      20  0 19980 1792 1068 S    0  0.1  0:01.21  17m bash
16364 rtr      20  0 20464 2280 1324 S    0  0.1  0:00.00  17m bash
16378 rtr      20  0 19128 1296  972 R    0  0.1  0:00.01  17m top
16355 root      20  0 19012 1288  972 S    0  0.1  0:03.07  17m top
 3252 messageb  20  0 14820  456  264 S    0  0.0  0:00.00  14m dbus-daemon
 3258 root      20  0 13664  628  464 S    0  0.0  0:00.00  12m hald-runner
 3393 daemon    20  0 12344  404  272 S    0  0.0  0:00.00  11m atd
 1527 root      16  -4 12832  912  372 S    0  0.1  0:00.10  11m udevd
 3391 root      20  0 12344  672  544 S    0  0.0  0:00.00  11m crond
 3147 root      20  0  8204  548  416 S    0  0.0  0:00.00 7656 dhcpcd
 3221 root      20  0  6020  528  436 S    0  0.0  0:00.00 5492 inetd
 3093 root      20  0  6032  648  520 S    0  0.0  0:00.00 5384 syslogd
 3097 root      20  0  3928  412  324 S    0  0.0  0:00.00 3516 klogd
 3452 root      20  0  3924  472  388 S    0  0.0  0:00.00 3452 agetty
 3455 root      20  0  3924  472  388 S    0  0.0  0:00.00 3452 agetty
 3451 root      20  0  3924  476  388 S    0  0.0  0:00.00 3448 agetty
 3453 root      20  0  3924  476  388 S    0  0.0  0:00.00 3448 agetty
 3454 root      20  0  3924  476  388 S    0  0.0  0:00.00 3448 agetty
 3241 root      20  0  3920  504  396 S    0  0.0  0:00.00 3416 acpid
    1 root      20  0  3928  536  444 S    0  0.0  0:00.74 3392 init

Everything else below this section of top is pretty much zeros as far as resources are concerned. If I stop the samba daemon, the processes disappear, but I don't regain any of the memory they were using. Is there something I'm missing here? It says 0k of swap is being used at the top, but the table clearly says there's at least around 600MB being used.

After a while, maybe a few hours, maybe a few days, the server will crash, spewing a bunch of module and register information to the console. As far as I can tell the messages have all been slightly different, but I can't be sure because there's no way to capture them. I did type out one of the errors by hand:

Code:

------------[ cut here ]------------
WARNING: at kernel/smp.c:226 smp_call_function_single+0x94/0x120()
Hardware name: MS-7549
Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss
ipv6 lp ppdev parport_pc parport fuse snd_hda_codec_atihdmi snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer
snd ses soundcore i2c_piix4 snd_page_alloc enclosure thermal psmouse rtc_cmos processor rtc_core wmi shpchp r8169
serio_raw thermal_sys sg evdev rtc_lib button hwmon mii
Pid: 0, comm: swapper Tainted: G        D W  2.6.29.6 #2
Call Trace:
 <IRQ>  [<ffffffff8023c50a>] warn_slowpath+0xea/0x160
 [<ffffffff808f9310>] dump_stack+0x69/0x6f
 [<ffffffff8023c52a>] warn_slowpath+0x10a/0x160
 [<ffffffff80256626>] up+0x16/0x50
 [<ffffffff8023cd65>] release_console_sem+0x1a5/0x1f0
 [<ffffffff80260b44>] smp_call_function_single+0x94/0x120
 [<ffffffff8023cd65>] release_console_sem+0x1a5/0x1f0
 [<ffffffff80260d91>] smp_call_function_many+0x1c1/0x260
 [<ffffffff80256626>] up+0x16/0x50
 [<ffffffff80260e50>] smp_call_function+0x20/0x30
 [<ffffffff8021d980>] native_smp_send_stop+0x20/0x30
 [<ffffffff808f93a4>] panic+0x8e/0x145
 [<ffffffff8020e999>] show_registers+0x99/0x2b0
 [<ffffffff80256626>] up+0x16/0x50
 [<ffffffff8023cd65>] release_console_sem+0x1a5/0x1f0
 [<ffffffff8020fbe5>] oops_end+0x95/0xa0
 [<ffffffff808fc00f>] general_protection+0x1f/0x30
 [<ffffffff802d5dfc>] pio_put+0x2c/0x40
 [<ffffffff804e2f35>] __end_that_request_first+0x105/0x2e0
 [<ffffffff804e3139>] end_that_request_data+0x29/0x70
 [<ffffffff804e3c0a>] blk_end_io+0x2a/0xa0
 [<ffffffff805acbf4>] __ide_end_request+0x54/0x100
 [<ffffffff805b6f32>] ide_dma_intr+0x62/0xd0
 [<ffffffff8024690e>] run_timer_softirq+0x4e/0x200
 [<ffffffff805b6ed0>] ide_dma_intr+0x0/0xd0
 [<ffffffff805ac8b2>] ide_intr+0x182/0x220
 [<ffffffff8026ee04>] handle_IRQ_event+0x34/0x70
 [<ffffffff80270628>] handle_edge_irq+0xb8/0x150
 [<ffffffff8020e6ee>] do_IRQ+0x7e/0x110
 [<ffffffff8020c313>] ret_from_intr+0x0/0xa
 <EOI>  [<ffffffff805acd50>] do_ide_request+0x0/0x710
 [<ffffffff80213472>] default_idle+0x42/0x50
 [<ffffffff80213668>] c1e_idle+0xa8/0x100
 [<ffffffff8020a9a0>] cpu_idle+0x60/0xa0
---[ end trace 089c62bcb69a8c7f ]---

I can't be sure, but I believe the message just repeated itself before this part. MS-7549 is one of the components of the mainboard (MSI), maybe the northbridge or something (it seems to be associated with more than one model), so I thought for sure this meant a hardware problem, maybe with the storage controller or memory. But Memtest86 passes for more than a day without any errors, and I loaded Windows on one of the disks and ran it as hard as I could with prime95 and nothing so much as flinched. So I've got to think this is some kind of issue with Linux and/or how it is handling my hardware configuration.

I'm out of ideas as to what's causing all this. Am I even looking in the right places? I was going to try installing another distro or maybe just reinstalling Slackware (this time without the RAID) to see if that cleared anything up, especially considering Windows will run without issue. My hardware isn't anything outrageous, but does anyone see any possible compatibility issues with Linux?

Thanks

AleLinuxBSD 01-25-2010 02:39 AM

RAM.
It is normal for all your RAM to be in use after a while, even with only a few processes running.

Server crash.
Perhaps you can install cpufrequtils on your PC to see if it is an overheating problem.
Or, if you have several RAM modules, you can try removing one of them.

knudfl 01-25-2010 02:57 AM

Why have "extra" RAM if it is not used?
The ideal situation is that all RAM is used all the time.
Probably the only OSes that can't do it are the ones
made by MSFT.

Just Google "unix memory management".

Example : http://www.dataexpedition.com/~sbnoble/Tips/memory.html
.....
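
Applied to the Mem: and Swap: header lines of the top output in the first post, the arithmetic can be sketched as follows. This is only a rough illustration in Python: the function names are made up for this example, and treating all buffers and cache as reclaimable is a simplification (some of it may not be freeable at a given moment).

```python
# Figures in kB, copied from the Mem:/Swap: header lines of the top output
# in the first post.
MEM_TOTAL_KB = 1792956
MEM_USED_KB = 1778440
MEM_FREE_KB = 14516
BUFFERS_KB = 160576
CACHED_KB = 1529088


def used_by_applications(used_kb: int, buffers_kb: int, cached_kb: int) -> int:
    """'Used' memory once buffers and page cache are excluded (hypothetical helper)."""
    return used_kb - buffers_kb - cached_kb


def free_plus_cache(free_kb: int, buffers_kb: int, cached_kb: int) -> int:
    """Free memory plus buffers and cache, which the kernel can largely hand back."""
    return free_kb + buffers_kb + cached_kb


apps_kb = used_by_applications(MEM_USED_KB, BUFFERS_KB, CACHED_KB)
avail_kb = free_plus_cache(MEM_FREE_KB, BUFFERS_KB, CACHED_KB)

print(f"used by applications:   {apps_kb} kB")
print(f"free + buffers + cache: {avail_kb} kB")
print(f"adds up to total:       {apps_kb + avail_kb == MEM_TOTAL_KB}")
```

With these numbers, well under 100MB is actually held by processes; the rest of the "used" figure is disk cache and buffers, which is why stopping Samba does not make the free figure jump.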

AlucardZero 01-25-2010 06:22 AM

www.linuxatemyram.com

johnsfine 01-25-2010 09:03 AM

I have no clue about your crash.
Others already explained why it is perfectly normal and correct for almost all your memory to be "used".

I'll just explain one more detail:

Quote:

Originally Posted by rtr_87 (Post 3839468)
It says 0k of swap is being used at the top, but the table clearly says there's at least around 600MB being used.

The 0k swap used is correct.

The SWAP column of top is useless information that has almost no relationship to the amount of swap space used. Whoever named that column SWAP has caused massive confusion.

The SWAP column just gives the difference between virtual size and resident size.

In some of the earliest and crudest virtual memory OS's, the difference between virtual memory size and resident memory was the amount of swap space used. But those designs were obsolete before Linux was invented. People who still describe virtual memory that way are just confused.

Most of virtual memory is neither resident in physical ram nor in swap space. The difference (virtual minus resident) is a generally meaningless quantity.
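
The relationship can be checked against a few rows of the top output in the first post. The sketch below is in Python; the helper names are invented for this example, and rounding the "m" values down to whole megabytes is an assumption that happens to agree with these rows.

```python
# (command, VIRT in kB, RES in kB, SWAP column exactly as top displayed it),
# copied from the top output in the first post.
ROWS = [
    ("smbd", 68452, 1384, "65m"),
    ("nmbd", 42564, 940, "40m"),
    ("ddclient", 49724, 8176, "40m"),
    ("dhcpcd", 8204, 548, "7656"),
    ("inetd", 6020, 528, "5492"),
    ("syslogd", 6032, 648, "5384"),
]


def virt_minus_res(virt_kb: int, res_kb: int) -> int:
    """Virtual size minus resident size, in kB."""
    return virt_kb - res_kb


def as_shown(kb: int, shown: str) -> str:
    """Render kB in the same unit as the displayed value.

    Assumption: values with an 'm' suffix are kB // 1024, truncated.
    """
    if shown.endswith("m"):
        return f"{kb // 1024}m"
    return str(kb)


for command, virt, res, shown in ROWS:
    diff = virt_minus_res(virt, res)
    match = as_shown(diff, shown) == shown
    print(f"{command:10} VIRT-RES = {diff:6} kB   SWAP column = {shown:>5}   match = {match}")
```

Every row matches, so adding up the SWAP column just adds up virtual-minus-resident sizes, not space actually occupied in the swap partition.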

rtr_87 01-25-2010 09:36 PM

Well, I had written a lengthy response to all your replies, thank you by the way, but the tab timed out when I tried to submit it and it didn't save what I'd written. Basically, I think it must have been a loose DIMM or something because memtest86 seems to run fine now but it did flag a few things a while back. I may consider replacing the DIMMs anyway, but definitely will if any more problems occur.

And in reply to AleLinuxBSD, I think the core temperature is not a problem. Under Windows it held steady at 36C during a torture test. I'm inclined to think the problem was the RAM all along.

And thanks for the info on Linux memory usage. It makes perfect sense, I'd just never thought of it like that.


All times are GMT -5.