Local partitions are being spontaneously unmounted.
Situation and symptoms:
I'm running a Samba server for a small business. Three partitions are supposed to be mounted in a hierarchy:
hdd1: share root
Currently the partitions are mounted manually and not listed in the fstab. All are on two plain vanilla ATA/IDE drives, plugged directly into a motherboard with no RAID. All are primary partitions. With no apparent significant changes to the system, these partitions have begun spontaneously unmounting themselves once or twice a day. On all occasions, the only users logged into the system at the time have been normal CIFS clients. The root and boot partitions on hda do not get unmounted and the system including Samba continues to run happily; the share just suddenly appears empty, then appears full again when the partitions are remounted.
For a year or so, we had been running the server on an Ubuntu box. Our load recently outgrew that machine and we moved the server to a better box: an old Gateway with nice, generic components. At the same time, I migrated it to my current preferred distro, Arch. I am also running Arch on a webserver, my home machine and my work dual-boot laptop; obviously none of them display the current issue. Arch is installed very slimly on the Samba box - no X, no big overhead daemons - and the system reports about 10% processor and memory load at peak.
After initial setup, we experienced an issue with a few XP clients frequently disconnecting from Samba and getting locked out. This was eventually traced to the use of Arch's updated Samba 3.0.23c, which has documented problems with these behaviors per the 3.0.23d changelog. The machine ran properly in all other respects for about ten days - in particular, it displayed no unmount problems with the same hierarchy and fstab.
Here's where things turn into bad science. Last Friday we experienced a power outage. The Samba box had been set up fast and I hadn't yet been able to arrange downtime to put it on the UPS, so the system went down over the weekend. While putting it back up on Monday, I took the opportunity to upgrade Samba to 3.0.23d (via pacman -Sy samba). Fortunately this solved the Samba issue. Unfortunately, it leaves two possible causes for the current issue which are simultaneous from the box's point of view: the rude shutdown and the update to Samba. I did not, repeat not, run a pacman -Syu or install or alter any other packages. I don't see how either of these could possibly have affected mount functionality, but they are the only two changes made between OK operation and the appearance of the problem.
Since then, we've had three disconnects - two yesterday and one so far today. As stated above, I haven't been logged in at all when any of the incidents happened, let alone messing around as root or anything. The drives are listed by mount when mounted and not listed after they disappear. They remount happily afterwards with no errors or fscks, leading me to suspect that they are actually being politely unmounted (as opposed to failing outside of software).
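For anyone chasing a similar problem, it helps to pin down exactly when a partition vanishes so the time can be matched against the system logs. Here is a minimal sketch, assuming a Linux box where /proc/mounts is readable. The mount points in the usage note are hypothetical placeholders, not the real share layout, and the function names are made up for illustration.

```python
import time


def mounted_points(mounts_text):
    """Return the set of mount points listed in /proc/mounts-style text."""
    points = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        # Each line reads: device mountpoint fstype options dump pass
        if len(fields) >= 2:
            points.add(fields[1])
    return points


def missing_mounts(mounts_text, expected):
    """Return the expected mount points that are not mounted, in the order given."""
    present = mounted_points(mounts_text)
    return [p for p in expected if p not in present]


def check_now(expected, mounts_path="/proc/mounts"):
    """Read the live mount table and return a timestamped report line, or None."""
    with open(mounts_path) as f:
        gone = missing_mounts(f.read(), expected)
    if not gone:
        return None
    return time.strftime("%Y-%m-%d %H:%M:%S") + " missing: " + ", ".join(gone)
```

Calling something like check_now(["/srv/share", "/srv/share/archive"]) from a cron job every minute and appending any non-None result to a file would narrow each drop down to a one-minute window. Note that /proc/mounts escapes spaces in mount points as \040, which this sketch does not decode.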
My Googling for the issue brings up one dude with something similar happening at some point on Fedora 5, and a gaggle of Apple users encountering Mac OS's wonky FireWire and mounting schemes. The only lead I've found is one comment suggesting that something may be unmounting all partitions without open files. This could match the unmount pattern, though I don't know a good way to verify that and it could be a red herring.
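One rough way to test the "no open files" theory is to record which processes hold files open under each share partition and see whether the drops only happen when that list is empty. The sketch below assumes Linux, where each /proc/<pid>/fd entry is a symlink to the open file. The function names and the /srv/share path are hypothetical, and it needs to run as root to see other users' processes.

```python
import os


def is_under(path, mount_point):
    """True if path is the mount point itself or lies somewhere beneath it."""
    base = mount_point.rstrip("/")
    if base == "":
        # The mount point is "/", so every absolute path is under it.
        return path.startswith("/")
    return path == base or path.startswith(base + "/")


def open_files_under(mount_point, proc_root="/proc"):
    """Return (pid, path) pairs for open files located under mount_point."""
    found = []
    for pid in sorted(os.listdir(proc_root)):
        if not pid.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            fds = sorted(os.listdir(fd_dir))
        except OSError:
            continue  # process exited, or we lack permission to look
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor was closed while we were scanning
            if is_under(target, mount_point):
                found.append((int(pid), target))
    return found
```

An empty result from open_files_under("/srv/share") just before a drop would be consistent with the theory, though it would not prove it. This only looks at open file descriptors; a process whose working directory sits on the partition also keeps it busy and would not show up here.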
After today's disconnect I did run a pacman -Syu and reboot. Among other things I see it's upgraded the kernel and glibc. I hope naively that this will solve the problem, but in the meantime I figured I'd throw it out into the hivemind too.
I know my description is a bit vague in terms of error messages, for the simple reason that I don't know where I should look. In the absence of complete fixes, I'd love to hear suggestions for troubleshooting commands or potentially relevant logfiles. Thanks for reading.
A bit early to say, but there have been no drops since the system upgrade. My guess is a kernel issue.
Dropped again. Gah! Vague bug report ahoy.
Error messages or no, the legacy software at the center of the issue runs much faster and more stably on a faster box. This problem appears to have resulted from disk hammering.