Why did linux boot with main file system read only after kernel install?
A few days ago, I tried migrating Mailman 2.1.11 to my Debian etch (4.0r3) server. I backed up my old Mailman installation and its data from my old Red Hat server (where it had been running under Python 2.4), moved the archive to the new Debian server, and extracted it into /usr/local/mailman (where it would also be running under Python 2.4!).
That approach was unsuccessful, and I'm not sure why; it seemed to me that what I did should have worked fine, but it did not. So today I started over.

I began by renaming the /usr/local/mailman directory (to protect its archives (data) directory contents from accidental deletion). Next, I checked that the Python installed on my Debian server was still okay by running python -V from the shell prompt to verify that Python ran and to confirm its version. It reported v2.4. So far so good...

My third step was to download a fresh copy of Mailman 2.1.11 from the GNU server and unzip and untar it into a fresh mailman directory where it would soon be installed. As soon as that was finished, I started gradually working through the setup process in the administrator's installation guide on GNU's Mailman site. However, when I got to the step that told me to run ./configure, the configure script immediately complained that there was something wrong with the Python installation and insisted Python be repaired before continuing. The interesting part is that Python 2.4 had been installed with aptitude specifically for the needs of this site and the Mailman application, and had not been used since. So I'm not sure how Python got "damaged".

Okay... now I was suddenly on a whole new troubleshooting path. What I did next was fire up aptitude and uninstall Python 2.4 completely, with the intent of reinstalling it immediately. When I uninstalled Python, aptitude (the way it works) automatically removed a list of other packages that were no longer needed at the same time. When the uninstall completed, I turned around and reinstalled Python and its docs, along with psyco, a Python runtime speedup tool, all at once. Halfway through that install, aptitude informed me it was "now installing a new version of the linux kernel". (Oops! I hadn't ASKED for or authorized a kernel upgrade. So where the devil did it come from? I dunno!)

In that informational notice, aptitude recommended that I reboot the server immediately after the install finished so the kernel upgrade and configuration process could be completed. Naturally, I followed those instructions to the letter, but when I logged back on to the system after the reboot I discovered that:

a) the root file system is now "read only";
b) my "newly installed" mailman directory seems to have completely vanished, and in its place is the old original mailman directory from two weeks ago that I had renamed to mailman.save earlier today;
c) not ONE of the websites on my server can be accessed now, which I'm sure is because the primary file system is "read only".

Is there anyone here who has a clue what I've done wrong and how to fix it? I'm completely bewildered and confused at this point. Thanks!
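As a first diagnostic for symptom (a), the read-only state can be confirmed from /proc/mounts, and the root filesystem can be remounted read-write once the disk is known to be healthy. This is only a minimal sketch; the remount command is left commented out because it should not be run before the filesystem has been checked:

```shell
# Read the mount options of the root filesystem from /proc/mounts;
# the options field (column 4) begins with "rw" or "ro".
root_opts=$(awk '$2 == "/" { print $4; exit }' /proc/mounts)
echo "root filesystem options: $root_opts"

# If the underlying disk checks out as healthy, root can be remounted
# read-write (requires root; run fsck first if errors are suspected):
# mount -o remount,rw /
```

If the options start with "ro", the kernel (or an fstab error policy) has dropped the root filesystem to read-only, which would explain why none of the sites can write anything.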
Quote:
Steve Stites
Quote:
However, from what I can tell, it appears the primary HD's main file system was glitched in an undocumented server "event" last weekend. My guess is the server center lost power sometime Sunday night and their ops just restarted their servers without doing any sort of fsck recovery, and without any announcement to their client admins (moi) about what had happened. The server's response had been sluggish this week, but I attributed it to heavy net loads. However, when I tried to install a new user app Friday, the proverbial defecation hit the perennial ventilation and I was faced with a server with a glitched primary HD. That's when I went looking for the manpages for fsck...

I wasn't too concerned about this at first, because I knew we had a full-drive clone of the primary HD on the secondary, plus several interim backups made this past week stored on the primary HD too. However, once the journaling file system had "recovered" yesterday, we'd lost all our intermediate backups, which had been stored as tarballs out in "no-man's-land" on the primary 500 GB HD. The "journaled" recovery basically rolled us clear back to last Sunday night, shortly before the system crash occurred and about 24 hours after the drive backup to the secondary was made. (sigh...)

Unfortunately, this is a brand new server, so although I had backups working, I didn't yet have the normal overnight FTPs of intermediate backups to a remote backup drive here in our data center running yet. All I can say in self-defense is that no one expects this sort of a hit on a server that's barely 60 days old. Still, we "recovered", in a manner of speaking, if one doesn't consider a week's work lost (and 1,500 new grey hairs for me) to be a big deal :(. But we're still getting intermittent segfaults from Apache on that server, which is in a datacenter 1,500 miles away!

Thanks a lot for your insights and thoughts, Steve. It's helpful to have someone else independently confirm my own conclusions.
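For what it's worth, the lesson about tarballs living on the same disk they protect can be automated. Below is a minimal sketch of a dated-tarball backup that is then pushed off the primary disk; the directory names and the remote host are illustrative assumptions, not details from this thread, and the sketch builds stand-in data so it is self-contained:

```shell
# Stand-in data so the sketch is self-contained; in real use this
# would be the live site/data directories.
workdir=$(mktemp -d)
mkdir -p "$workdir/site"
echo "sample content" > "$workdir/site/index.html"

# Create a dated tarball of the data tree.
stamp=$(date +%Y%m%d)
backup="$workdir/site-backup-$stamp.tar.gz"
tar -czf "$backup" -C "$workdir" site

# ...then push it to a machine that does not share the primary disk,
# so a filesystem rollback cannot take the backup with it
# (host and destination path are assumptions):
# scp "$backup" backup@remote.example.com:/backups/
echo "wrote $backup"
```

Run nightly from cron, a copy like this would have survived the journal rollback because it no longer lives on the disk that rolled back.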
This is one of "those situations" where there's no one else around here with the tech savvy to diagnose and repair a problem of this nature, or for me to discuss it with, except my wife and the cat. I love them both, but I must say neither of them is terribly helpful in a situation like this. Wish me luck. This battle isn't over yet...
You'll want to become friends with the uptime command. Just typing it into a terminal will tell you how long your machine has been running, and from that a simple bit of addition/subtraction-level math tells you when it was last booted.

If you haven't rebooted since you did the kernel upgrade, uptime should show the amount of time that has passed since your reboot; if it shows less, then it is possible the data center lost power. I don't think that is very likely, however. Any data center that loses power and then doesn't fess up would not keep my business. Many people run software that monitors their servers, and when they start calling to ask why that software sent out alert emails during the power failure, the data center would need some very crafty answers. Peace, JimBass
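The arithmetic above can also be scripted: the first field of /proc/uptime is the number of seconds since boot, so the boot time can be computed directly (a sketch; date -d "@epoch" is GNU date syntax, as found on Debian):

```shell
# /proc/uptime's first field is seconds since boot (with a fractional
# part, which we strip).
up_secs=$(cut -d' ' -f1 /proc/uptime | cut -d. -f1)

# Boot time = current time minus uptime.
boot_epoch=$(( $(date +%s) - up_secs ))
boot_time=$(date -d "@$boot_epoch" '+%Y-%m-%d %H:%M:%S')
echo "up for $up_secs seconds; booted around $boot_time"
```

Comparing that boot time against when the kernel upgrade was run shows immediately whether any unexplained reboot happened in between.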
Any f/s error could potentially cause a r/o remount - have a look at fstab.
Quote:
Thanks for the tip. I'll remember uptime. I don't know what I'd do without my helpful friends on the net! Best, GregPlatt - a.k.a. WebSissy
Quote:
There are no errors in our fstab. And of course, no one at the server center would ever admit they restarted my system improperly. I'll probably never know exactly what caused this problem; I just know it happened. In fact, I'm still cleaning fresh manure out of my hair and off the walls, ceilings and floors!
I wasn't trying to suggest you had a problem in fstab; do you have "errors=remount-ro" as an option? That would explain what you have seen.
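To make the suggestion concrete: errors=remount-ro is the mount option (and the Debian default for root) that tells the kernel to drop a filesystem to read-only when it detects errors, which matches the symptoms exactly. A quick way to check for it, sketched below (the fstab line shown in the comment is illustrative):

```shell
# A typical Debian root entry carrying the option looks like:
#   /dev/sda1  /  ext3  defaults,errors=remount-ro  0  1

# Look for the option on the root entry, both as configured (fstab)
# and as the kernel currently sees it (/proc/mounts):
result=$(grep -E '[[:space:]]/[[:space:]].*errors=remount-ro' /etc/fstab /proc/mounts 2>/dev/null \
    || echo "errors=remount-ro not set on /")
echo "$result"
```

If the option is set, the read-only root after the reboot would mean the kernel found filesystem errors at mount time, pointing back at running fsck on the device rather than at anything aptitude did.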