Originally Posted by jailbait
These problems sound like the problems that you could get if you reboot without going through a normal shutdown. Did you reboot immediately without issuing a shutdown command?
Thanks for the reply, Steve. It's much appreciated. I think you're right. I reached the same conclusion yesterday myself. I now believe what you described is almost exactly what happened. However, there is no "ctrl-alt-del" button long enough to reach my server, which is 1,500 miles away, and the shutdown script we use makes a point of doing an orderly shutdown of everything on the server before it restarts the system. So, no, I don't THINK this is our fault.
However, from what I can tell, it appears the primary hd's main file system was glitched in an undocumented server "event" last weekend. My guess is the server center lost power sometime Sunday night and their Ops just restarted their servers without doing any sort of fsck recovery or making any announcement to their client admins (moi) about what had happened. The server's response had been sluggish this week, but I attributed it to heavy net loads. However, when I tried to install a new user app Friday, the proverbial defecation hit the perennial ventilation and I was faced with a server with a glitched primary hd.
That's when I went looking for the manpages for fsck...
I wasn't too concerned about this at first because I knew we had a primary hd full-drive-clone backup on the secondary and several interim backups made this past week stored on the primary hd too.
However, once the journaling file system had "recovered" yesterday, we'd lost all our intermediate backups -- which had been stored as "tarballs" out in "no-man's-land" on the primary 500 GB hd.
The "journaled" recovery basically rolled us clear back to last Sunday night, shortly before the system crash occurred and about 24 hours after the drive backup to the secondary was made. (sigh...) Unfortunately, this is a brand new server. So, although I had backups working, I didn't yet have the normal overnight FTPs of intermediate backups to a remote B/U drive here in our data-center operating. All I can say in self-defense is that no one expects to have this sort of a hit on a server that's barely 60 days old.
Still, we "recovered" in a manner of speaking, if one doesn't consider a week's work lost (and 1,500 new grey hairs for me) to be a big deal. But we're still getting intermittent segfaults from apache on that server, which is in a datacenter 1,500 miles away!
Thanks a lot for your insights and thoughts, Steve. It's helpful to have someone else independently confirm my own conclusions. This is one of "those situations" where there's no one else around here with the tech savvy to diagnose and repair a problem of this nature or for me to discuss this with except my wife and the cat. I love them both but I must say neither of them is terribly helpful in a situation like this.
Wish me luck. This battle isn't over yet...