This question may be hardware related, software related, or both; I'm not sure. I recently built a new server running 64-bit Slackware 13.0 with the following specs:
MSI 785GTM-E45
AMD Phenom II X2 550
2GB DDR2
Onboard video from AMD 785G chipset
2x 80GB IDE system drives using software RAID with 2GB swap partition
I only include these because I'm not convinced my problem is not hardware related at some level. Basically, when I first start up the system, the memory usage is anywhere from 60 to 200MB. Then it starts to gradually climb until there is only 12-15MB free. This can take anywhere from a few hours to a few days.
The only things I really use this server for are serving Samba shares and the occasional SSH login. The Samba shares are accessed by my Windows machines as well as a Mac. Initially I just browsed the network when I wanted to reach a share from these machines, but that was too slow and unreliable (the server would not always show up in Windows), so now the Windows machines automatically mount the share as a network drive at startup. I don't know whether this has anything to do with the steadily increasing memory usage. These clients are not on or connected all the time, but the memory usage seems to rise anyway.
When I run top after the server has been up for a while, it reports that nearly all of the physical memory has been consumed, but none of the swap space has been touched. This is typical output of the first several lines, sorted by the SWAP column:
Code:
top - 07:45:48 up 2 days, 12:33, 2 users, load average: 0.00, 0.00, 0.00
Tasks: 105 total, 1 running, 104 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.2%sy, 0.0%ni, 99.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1792956k total, 1778440k used, 14516k free, 160576k buffers
Swap: 2104440k total, 0k used, 2104440k free, 1529088k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ SWAP COMMAND
16310 root 20 0 68452 1384 480 S 0 0.1 0:00.00 65m smbd
16312 root 20 0 70720 3868 2916 S 0 0.2 0:00.00 65m smbd
16306 root 20 0 68452 3300 2396 S 0 0.2 0:00.00 63m smbd
16309 root 20 0 42564 940 280 S 0 0.1 0:00.00 40m nmbd
3448 root 20 0 49724 8176 1632 S 0 0.5 0:00.74 40m ddclient
16308 root 20 0 42824 2000 1300 S 0 0.1 0:00.00 39m nmbd
16363 rtr 20 0 31836 1524 1024 S 0 0.1 0:00.00 29m sshd
16361 root 20 0 31700 2364 1888 S 0 0.1 0:00.00 28m sshd
3257 haldaemo 20 0 28332 1836 748 S 0 0.1 0:00.18 25m hald
3234 root 20 0 27068 1032 636 S 0 0.1 0:00.00 25m sshd
3347 haldaemo 20 0 20876 728 564 S 0 0.0 0:00.00 19m hald-addon-acpi
3286 root 20 0 19904 632 468 S 0 0.0 0:00.00 18m hald-addon-inpu
3450 root 20 0 19980 1792 1068 S 0 0.1 0:01.21 17m bash
16364 rtr 20 0 20464 2280 1324 S 0 0.1 0:00.00 17m bash
16378 rtr 20 0 19128 1296 972 R 0 0.1 0:00.01 17m top
16355 root 20 0 19012 1288 972 S 0 0.1 0:03.07 17m top
3252 messageb 20 0 14820 456 264 S 0 0.0 0:00.00 14m dbus-daemon
3258 root 20 0 13664 628 464 S 0 0.0 0:00.00 12m hald-runner
3393 daemon 20 0 12344 404 272 S 0 0.0 0:00.00 11m atd
1527 root 16 -4 12832 912 372 S 0 0.1 0:00.10 11m udevd
3391 root 20 0 12344 672 544 S 0 0.0 0:00.00 11m crond
3147 root 20 0 8204 548 416 S 0 0.0 0:00.00 7656 dhcpcd
3221 root 20 0 6020 528 436 S 0 0.0 0:00.00 5492 inetd
3093 root 20 0 6032 648 520 S 0 0.0 0:00.00 5384 syslogd
3097 root 20 0 3928 412 324 S 0 0.0 0:00.00 3516 klogd
3452 root 20 0 3924 472 388 S 0 0.0 0:00.00 3452 agetty
3455 root 20 0 3924 472 388 S 0 0.0 0:00.00 3452 agetty
3451 root 20 0 3924 476 388 S 0 0.0 0:00.00 3448 agetty
3453 root 20 0 3924 476 388 S 0 0.0 0:00.00 3448 agetty
3454 root 20 0 3924 476 388 S 0 0.0 0:00.00 3448 agetty
3241 root 20 0 3920 504 396 S 0 0.0 0:00.00 3416 acpid
1 root 20 0 3928 536 444 S 0 0.0 0:00.74 3392 init
Everything below this section of top is pretty much zeros as far as resources are concerned. If I stop the Samba daemons, their processes disappear, but I don't regain any of the memory they were using. Is there something I'm missing here? The summary at the top says 0k of swap is in use, but the SWAP column in the table adds up to roughly 600MB.
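To make the arithmetic behind these numbers explicit, here is a minimal sketch in Python. It assumes two things that the top output above is consistent with but that I have not confirmed for this exact version of top: that buffers and cached memory are largely reclaimable by the kernel on demand, and that the per-process SWAP column is simply VIRT minus RES rather than real swap usage. The function names are made up for illustration.

```python
# Sketch only: values are copied from the top output above (all in kB).

def reclaimable_free_kb(free_kb, buffers_kb, cached_kb):
    """Rough upper bound on memory the kernel could hand back to programs.

    Assumption: buffers and page cache can mostly be dropped on demand.
    """
    return free_kb + buffers_kb + cached_kb


def top_swap_column_kb(virt_kb, res_kb):
    """Assumption: this top derives SWAP as VIRT - RES, not from real swap use."""
    return virt_kb - res_kb


if __name__ == "__main__":
    # Mem: 14516k free, 160576k buffers; Swap line: 1529088k cached
    print(reclaimable_free_kb(14516, 160576, 1529088))  # 1704180

    # dhcpcd row: VIRT 8204, RES 548, SWAP column shows 7656
    print(top_swap_column_kb(8204, 548))  # 7656

    # inetd row: VIRT 6020, RES 528, SWAP column shows 5492
    print(top_swap_column_kb(6020, 528))  # 5492
```

If both assumptions hold, about 1.6GB of the "used" memory is cache and buffers, and the SWAP column does not contradict the 0k of swap reported in the summary.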
After a while, maybe a few hours, maybe a few days, the server will crash, spewing a bunch of module and register information to the console. As far as I can tell, the messages have all been slightly different, but I can't be sure because I have no way to capture them. I did type out one of the errors by hand:
Code:
------------[ cut here ]------------
WARNING: at kernel/smp.c:226 smp_call_function_single+0x94/0x120()
Hardware name: MS-7549
Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss
ipv6 lp ppdev parport_pc parport fuse snd_hda_codec_atihdmi snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer
snd ses soundcore i2c_piix4 snd_page_alloc enclosure thermal psmouse rtc_cmos processor rtc_core wmi shpchp r8169
serio_raw thermal_sys sg evdev rtc_lib button hwmon mii
Pid: 0, comm: swapper Tainted: G D W 2.6.29.6 #2
Call Trace:
<IRQ> [<ffffffff8023c50a>] warn_slowpath+0xea/0x160
[<ffffffff808f9310>] dump_stack+0x69/0x6f
[<ffffffff8023c52a>] warn_slowpath+0x10a/0x160
[<ffffffff80256626>] up+0x16/0x50
[<ffffffff8023cd65>] release_console_sem+0x1a5/0x1f0
[<ffffffff80260b44>] smp_call_function_single+0x94/0x120
[<ffffffff8023cd65>] release_console_sem+0x1a5/0x1f0
[<ffffffff80260d91>] smp_call_function_many+0x1c1/0x260
[<ffffffff80256626>] up+0x16/0x50
[<ffffffff80260e50>] smp_call_function+0x20/0x30
[<ffffffff8021d980>] native_smp_send_stop+0x20/0x30
[<ffffffff808f93a4>] panic+0x8e/0x145
[<ffffffff8020e999>] show_registers+0x99/0x2b0
[<ffffffff80256626>] up+0x16/0x50
[<ffffffff8023cd65>] release_console_sem+0x1a5/0x1f0
[<ffffffff8020fbe5>] oops_end+0x95/0xa0
[<ffffffff808fc00f>] general_protection+0x1f/0x30
[<ffffffff802d5dfc>] pio_put+0x2c/0x40
[<ffffffff804e2f35>] __end_that_request_first+0x105/0x2e0
[<ffffffff804e3139>] end_that_request_data+0x29/0x70
[<ffffffff804e3c0a>] blk_end_io+0x2a/0xa0
[<ffffffff805acbf4>] __ide_end_request+0x54/0x100
[<ffffffff805b6f32>] ide_dma_intr+0x62/0xd0
[<ffffffff8024690e>] run_timer_softirq+0x4e/0x200
[<ffffffff805b6ed0>] ide_dma_intr+0x0/0xd0
[<ffffffff805ac8b2>] ide_intr+0x182/0x220
[<ffffffff8026ee04>] handle_IRQ_event+0x34/0x70
[<ffffffff80270628>] handle_edge_irq+0xb8/0x150
[<ffffffff8020e6ee>] do_IRQ+0x7e/0x110
[<ffffffff8020c313>] ret_from_intr+0x0/0xa
<EOI> [<ffffffff805acd50>] do_ide_request+0x0/0x710
[<ffffffff80213472>] default_idle+0x42/0x50
[<ffffffff80213668>] c1e_idle+0xa8/0x100
[<ffffffff8020a9a0>] cpu_idle+0x60/0xa0
---[ end trace 089c62bcb69a8c7f ]---
I can't be sure, but I believe the message just repeated itself before this part. MS-7549 appears to be MSI's identifier for the mainboard or one of its components, maybe the northbridge (it seems to be associated with more than one model), so I thought for sure this meant a hardware problem, maybe with the storage controller or memory. But Memtest86 ran for more than a day without any errors, and I loaded Windows on one of the disks and ran it as hard as I could with prime95 without so much as a hiccup. So I have to think this is some kind of issue with Linux and/or how it is handling my hardware configuration.
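One way the crash output might be captured instead of typed out by hand is to have the kernel send its console messages over the network and log them on another machine. The sketch below is only the receiving side, in Python, and rests on assumptions: that the kernel's netconsole module is available on this 2.6.29.6 kernel and has been configured to send UDP datagrams to the listening machine, and that the target port is 6666. The log file name crash.log and the function names are made up for illustration.

```python
import socket


def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket; port 6666 is an assumed netconsole target port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    return sock


def receive_messages(sock, max_datagrams, log_path=None):
    """Receive up to max_datagrams datagrams and return them as text.

    If log_path is given, each message is also appended to that file, so
    the text survives even if the sending machine locks up afterwards.
    """
    received = []
    for _ in range(max_datagrams):
        data, _sender = sock.recvfrom(65535)
        text = data.decode("utf-8", errors="replace")
        received.append(text)
        if log_path is not None:
            with open(log_path, "a") as log:
                log.write(text)
    return received


if __name__ == "__main__":
    listener = open_listener()
    while True:
        # crash.log is a hypothetical file name
        for message in receive_messages(listener, 1, "crash.log"):
            print(message, end="")
```

This would run on a second machine on the same network as the server. Whether a panic inside the IDE interrupt path still gets its messages out over the network is something I can't predict.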
I'm out of ideas as to what's causing all this. Am I even looking in the right places? I was going to try installing another distro or maybe just reinstalling Slackware (this time without the RAID) to see if that cleared anything up, especially considering Windows will run without issue. My hardware isn't anything outrageous, but does anyone see any possible compatibility issues with Linux?
Thanks