LinuxQuestions.org


websissy 10-10-2008 09:27 PM

Why did linux boot with main file system read only after kernel install?
 
A few days ago, I tried migrating mailman 2.1.11 to my Debian etch 4.0r3 server. I backed up my old install of the mailman app and its data from my old Red Hat server (where it had been running under python 2.4), moved the archive to my new Debian server, and extracted it into /usr/local/mailman (where it would also be running under python 2.4!)

That approach was unsuccessful. I'm not sure why. It seemed to me that what I did should have worked fine. But for some reason it did not.

So today I started over. I began by renaming the /usr/local/mailman directory (to protect its archives (data) directory contents from accidental deletion).

Next, I confirmed that the python version installed on my Debian server was still okay by running python -v from the shell prompt, both to verify that python ran and to see what version it was. Python came right up saying it was v2.4.

So far so good...

My third step was to download a fresh copy of mailman 2.1.1 from the gnu server and unzip and untar it into a fresh mailman directory where it would soon be installed. As soon as that was finished, I started gradually working through the setup process to install Mailman using the admin installation guide on gnu's mailman site. However, when I got to the step that told me to run ./configure, I did that and the configure immediately bitched that there was something wrong with the python installation and insisted python should be repaired before continuing. The interesting part is python 2.4 had been installed with aptitude but had not yet been used since then because I hadn't needed it yet. It was installed specifically for the needs of this site and for the mailman application. So, I'm not sure how python got "damaged".

Okay... now I was suddenly on a whole new troubleshooting path. What I did next was fire up aptitude and uninstall python 2.4 completely, with the intent of reinstalling it immediately. I uninstalled python and, of course, the way aptitude works, it automatically removed a list of other apps that were no longer needed at the same time. When the uninstall completed, I turned around and reinstalled python and its docs along with a python runtime speedup tool named psyco, all at once.

Halfway through that install, aptitude informed me it was "now installing a new version of the linux kernel" (Oops! I hadn't ASKED for or authorized a kernel upgrade. So where the devil did it come from? I dunno!). In that informational notice, aptitude recommended that I should reboot the server immediately after the install was finished so the kernel upgrade and configuration process could be completed.

Naturally, I followed those instructions to the letter, but when I went back and logged on to the system after the reboot I discovered that:

a) The root file system is now "read only"
b) my "newly installed" mailman directory seems to have completely vanished. In its place is the old original mailman directory from two weeks ago that I had renamed to mailman.save earlier today.
c) not ONE of the websites on my server can be accessed now. I'm sure that's because the primary file system is "read only"

Is there anyone here who has a clue what I've done wrong and how to fix it? I'm completely bewildered and confused at this point.

Thanks!

jailbait 10-11-2008 02:15 PM

Quote:

Originally Posted by websissy (Post 3306585)

aptitude recommended that I should reboot the server immediately after the install was finished so the kernel upgrade and configuration process could be completed.

Naturally, I followed those instructions to the letter, but when I went back and logged on to the system after the reboot I discovered that:

a) The root file system is now "read only"
b) my "newly installed" mailman directory seems to have completely vanished. In its place is the old original mailman directory from two weeks ago that I had renamed to mailman.save earlier today.
c) not ONE of the websites on my server can be accessed now. I'm sure that's because the primary file system is "read only"

These problems sound like the problems that you could get if you reboot without going through a normal shutdown. Did you reboot immediately without issuing a shutdown command?

--------------------
Steve Stites

websissy 10-12-2008 09:43 AM

Quote:

Originally Posted by jailbait (Post 3307103)
These problems sound like the problems that you could get if you reboot without going through a normal shutdown. Did you reboot immediately without issuing a shutdown command?

--------------------
Steve Stites

Thanks for the reply, Steve. It's much appreciated. I think you're right. I reached the same conclusion yesterday myself. I now believe what you described is almost exactly what happened. However, there is no "ctrl-alt-del" button long enough to reach my server, which is 1,500 miles away, and the shutdown script we use does make it a point to do an orderly shutdown of everything on the server before it restarts the system. So, no, I don't THINK this is our fault.

However, from what I can tell, it appears the primary hd's main file system was glitched in an undocumented server "event" last weekend. My guess is the server center lost power sometime Sunday night and their Ops just restarted their servers without doing any sort of fsck recovery or making any announcement to their client admins (moi) about what had happened. The server's response had been sluggish this week, but I attributed it to heavy net loads. However, when I tried to install a new user app Friday, the proverbial defecation hit the perennial ventilation and I was faced with a server with a glitched primary hd.

That's when I went looking for the manpages for fsck...

I wasn't too concerned about this at first because I knew we had a primary hd full-drive-clone backup on the secondary and several interim backups made this past week stored on the primary hd too.

However, once the journaling file system had "recovered" yesterday, we'd lost all our intermediate backups -- which had been stored as "tarballs" out in "no-man's-land" on the primary 500 GB hd.

The "journaled" recovery basically rolled us clear back to last Sunday night, shortly before the system crash occurred and about 24 hours after the drive backup to the secondary was made. (sigh...) Unfortunately, this is a brand new server. So, although I had backups working, I didn't yet have the normal overnight FTPs of intermediate backups to a remote B/U drive here in our data-center operating yet. All I can say in self-defense is no one expects to have this sort of a hit on a server that's barely 60 days old.

Still, we "recovered" in a manner of speaking if one doesn't consider a week's work lost (and 1,500 new grey hairs for me) to be a big deal :(. But we're still getting intermittent segfaults from apache on that server which is in a datacenter 1500 miles away!

Thanks a lot for your insights and thoughts, Steve. It's helpful to have someone else independently confirm my own conclusions. This is one of "those situations" where there's no one else around here with the tech savvy to diagnose and repair a problem of this nature or for me to discuss this with except my wife and the cat. I love them both but I must say neither of them is terribly helpful in a situation like this.

Wish me luck. This battle isn't over yet...

JimBass 10-12-2008 05:19 PM

You'll want to become friends with the uptime command. Just typing it into a terminal will tell you how long your machine has been on. You can then do a simple bit of addition/subtraction level math to determine how long it has been since it was turned on.

If you haven't rebooted since you did the kernel upgrade, "uptime" should be the amount of time that has passed since you rebooted. If it is less, then it is possible the data center lost power. I don't think that is very likely, however. Any data center that loses power and then doesn't fess up would not keep my business. Many people have software that monitors servers, and when they start calling to discern why their software sent out alert emails during the power failure, they'd need some very crafty answers.

Peace,
JimBass
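
The arithmetic described above can be sketched in a few lines of Python. This is only an illustration, not anything posted in the thread: it assumes a Linux system where the first field of /proc/uptime is the number of seconds since boot, and the function names and the ten-minute slack are made up for the example.

```python
from datetime import datetime, timedelta


def boot_time(proc_uptime_text, now):
    """Return the boot time implied by /proc/uptime-style text.

    The first whitespace-separated field is seconds since boot.
    """
    uptime_seconds = float(proc_uptime_text.split()[0])
    return now - timedelta(seconds=uptime_seconds)


def unexpected_reboot(proc_uptime_text, now, last_known_reboot,
                      slack=timedelta(minutes=10)):
    """True if the machine booted noticeably later than the last
    reboot you started yourself (hypothetical helper; the slack
    value is an arbitrary allowance for the reboot to finish)."""
    return boot_time(proc_uptime_text, now) > last_known_reboot + slack


# Sample text in the /proc/uptime layout: "<uptime> <idle>"
sample = "3600.00 7000.00\n"
now = datetime(2008, 10, 12, 17, 0, 0)
print(boot_time(sample, now))
```

On a real server you would pass in the contents of /proc/uptime and datetime.now() instead of the sample values. Note that this only tells you about the most recent boot; a crash that happened before your own reboot will not show up here.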

syg00 10-12-2008 07:02 PM

Any f/s error could potentially cause a r/o remount - have a look at fstab.
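
Before digging into fstab, it can help to confirm what the kernel thinks right now. The following is a rough sketch, assuming the usual /proc/mounts layout (device, mount point, type, comma-separated options, two numeric fields); the function names and the sample text are invented for illustration.

```python
def mount_options(proc_mounts_text, mount_point="/"):
    """Return the option list for mount_point, or None if not found.

    If the mount point appears more than once (for example a
    'rootfs' line followed by the real root device), the last
    entry wins, since later mounts sit on top of earlier ones.
    """
    options = None
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            options = fields[3].split(",")
    return options


def is_read_only(proc_mounts_text, mount_point="/"):
    """True if mount_point is currently mounted with the 'ro' option."""
    opts = mount_options(proc_mounts_text, mount_point)
    return opts is not None and "ro" in opts


# Made-up sample resembling a root file system that was remounted r/o
sample = (
    "rootfs / rootfs rw 0 0\n"
    "/dev/sda1 / ext3 ro,errors=remount-ro,data=ordered 0 0\n"
    "proc /proc proc rw 0 0\n"
)
print(is_read_only(sample))
```

On a live system you would read the text from /proc/mounts rather than from a sample string.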

websissy 10-13-2008 12:29 AM

Quote:

Originally Posted by JimBass (Post 3308048)
You'll want to become friends with the uptime command. Just typing it into a terminal will tell you how long your machine has been on. You can then do a simple bit of addition/subtraction level math to determine how long it has been since it was turned on.

If you haven't rebooted since you did the kernel upgrade, "uptime" should be the amount of time that has passed since you rebooted. If it is less, then it is possible the data center lost power. I don't think that is very likely, however. Any data center that loses power and then doesn't fess up would not keep my business. Many people have software that monitors servers, and when they start calling to discern why their software sent out alert emails during the power failure, they'd need some very crafty answers.

Peace,
JimBass

You know, Jim, I've been a hands-on tech pro in the IT biz for over 4 decades. I started back in the days of punched cards, ALC and COBOL. Over the years I've installed, configured and/or served as lead admin on many systems. So, when I decided to lease a server and become its sole tech and admin, I had some clue what I was in for. Still, it had been years since I last worked as a *nix admin, and even then I had consultants and advisors to lean on. Needless to say, much has changed since '94. The web as we know it was barely on the scopes then. Thus even after 40 years and at age 58, I'm still learning, and I've come to appreciate guys like you and the others here even more now than I did way back when both I and the IT world were young and innocent. ;)

Thanks for the tip. I'll remember uptime. I don't know what I'd do without my helpful friends on the net!

Best,
GregPlatt - a.k.a. WebSissy

websissy 10-13-2008 12:41 AM

Quote:

Originally Posted by syg00 (Post 3308120)
Any f/s error could potentially cause a r/o remount - have a look at fstab.

Yeah, backup... You know I've heard that word somewhere before. Must find time to look it up and figure out what they're talking about! :D

There are no errors in our fstab. And of course, no one at the server center would ever admit they restarted my system improperly. I'll probably never know exactly what caused this problem. I just know it happened. In fact, I'm still cleaning fresh manure out of my hair and off the walls, ceilings and floors!

syg00 10-13-2008 02:49 AM

I wasn't trying to suggest you had a problem in fstab; do you have "errors=remount-ro" as an option? That might give you what you have seen.
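
One way to answer that question without eyeballing the file is a small parser like the one below. Treat it as a sketch: it assumes the standard whitespace-separated fstab layout (device, mount point, type, options, dump, pass), the function names are hypothetical, and the sample entries are invented rather than taken from the server in this thread.

```python
def fstab_entries(fstab_text):
    """Yield (device, mount_point, fstype, options) for each real entry,
    skipping blank lines, comments, and lines with too few fields."""
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        yield fields[0], fields[1], fields[2], fields[3].split(",")


def remounts_ro_on_error(fstab_text, mount_point="/"):
    """True if the fstab entry for mount_point carries errors=remount-ro."""
    for _device, mnt, _fstype, opts in fstab_entries(fstab_text):
        if mnt == mount_point:
            return "errors=remount-ro" in opts
    return False


# Invented sample in the usual fstab layout
sample = (
    "# <file system> <mount point> <type> <options> <dump> <pass>\n"
    "proc /proc proc defaults 0 0\n"
    "/dev/sda1 / ext3 defaults,errors=remount-ro 0 1\n"
    "/dev/sda5 none swap sw 0 0\n"
)
print(remounts_ro_on_error(sample))
```

If the check comes back true for the root entry, then a file system error detected by the kernel would be enough on its own to leave / mounted read-only, with no mistake in fstab required.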


All times are GMT -5.