LinuxQuestions.org
Linux - Server This forum is for the discussion of Linux Software used in a server related context.

Old 10-15-2009, 03:33 PM   #1
zman2245
LQ Newbie
 
Registered: Mar 2009
Location: San Francisco, CA
Posts: 26

Rep: Reputation: 15
Serial console always stuck after EXT3-FS error


Hello experts:
This has happened to me twice in the last month now. The first time I ended up reinstalling Linux. Basically, I saw some strange behavior followed by an EXT3-FS error. Now, even after multiple power cycles, the serial console does not function. The box boots up fine; just the serial console is stuck. I did some googling on this error, and everyone pretty much says that rebooting should "fix" the EXT3-FS error; however, I haven't been able to find anything related to serial console problems.

Logs are below...

I am running RHEL 5.3 64-bit (kernel version 2.6.18)

Thanks,
Zack

The "strange behavior" was:
[root@Ares fast_path_harness]# make
make -C /lib/modules/2.6.18-120.el5/build M=/root/fast_path_harness modules
make[1]: Entering directory `/usr/src/kernels/2.6.18-120.el5-x86_64'
/root/fast_path_harness/.fp_cleanup.o.cmd:1: warning: NUL character seen; rest of line ignored
/root/fast_path_harness/.fp_cleanup.o.cmd:1: *** missing separator. Stop.
make[1]: *** [_module_/root/fast_path_harness] Error 2

Then after some time this got output to console:
ext3_abort called
EXT3-fs error (device dm-0): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
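
For anyone comparing against their own logs, here is a minimal sketch of pulling the affected device out of a saved console or kernel log. The names `find_ext3_errors` and `remounted_read_only` are illustrative, and the pattern is modeled only on the lines quoted above; other kernel versions may word these messages differently.

```python
import re

# Pattern modeled on the message quoted in this post (an assumption about
# the general format, not an exhaustive description of kernel output).
EXT3_ERROR = re.compile(
    r"EXT3-fs error \(device (?P<dev>[^)]+)\): (?P<func>\w+): (?P<msg>.*)"
)


def find_ext3_errors(log_text):
    """Return (device, function, message) for each EXT3-fs error line."""
    hits = []
    for line in log_text.splitlines():
        match = EXT3_ERROR.search(line)
        if match:
            hits.append(
                (match.group("dev"), match.group("func"), match.group("msg").strip())
            )
    return hits


def remounted_read_only(log_text):
    """True if the log shows the kernel remounting a filesystem read-only."""
    return "Remounting filesystem read-only" in log_text


# Sample text copied from the console output above.
sample = """ext3_abort called
EXT3-fs error (device dm-0): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
"""

if __name__ == "__main__":
    for dev, func, msg in find_ext3_errors(sample):
        print(dev, func, msg)
    print("read-only remount seen:", remounted_read_only(sample))
```

Knowing the device (here `dm-0`, a device-mapper volume) tells you which filesystem to check from a rescue environment.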


After power-cycle serial console is stuck here:
Bios Version: S5500.86B.01.00.0038.060120091503
Platform ID: S5520UR
8192 MB system memory found
Current Memory Speed: 1067 MHz
Intel(R) Xeon(R) CPU E5520 @ 2.27GHz
Intel(R) Xeon(R) CPU E5520 @ 2.27GHz
USB keyboard detected
USB mouse detected
 
Old 10-19-2009, 04:22 AM   #2
rylan76
Senior Member
 
Registered: Apr 2004
Location: Potchefstroom, South Africa
Distribution: Fedora 17 - 3.3.4-5.fc17.x86_64
Posts: 1,552

Rep: Reputation: 103
Hmm are you sure you don't

A. have a HDD that is slowly going bad
B. have a hardware fault
C. bad RAM

?

Your core problem is that your filesystem seems to be getting corrupted. With ext3 this is usually due to hardware failure, as ext3 itself is very stable and quite hard to break with an unexpected power outage or other error.

It can also be caused by a bad DIMM or SIMM. I once had a vaguely similar filesystem problem on Fedora 6 (that also forced me to reinstall), and it turned out to be a bad RAM chip. Apparently some cached data got corrupted in the bad RAM chip and was then written to disk, which upset ext3.
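
The `make` warning in the first post (`NUL character seen` in `.fp_cleanup.o.cmd`) means a generated text file contained NUL bytes, which is one possible sign of this kind of corruption. A minimal sketch for finding other affected files follows. The function name `files_with_nul` and the default `*.cmd` pattern are illustrative choices, not part of kbuild. The pattern is limited to text files because binaries such as `.o` and `.ko` legitimately contain NUL bytes.

```python
import fnmatch
import os


def files_with_nul(root, pattern="*.cmd"):
    """Return sorted paths under root that match pattern and contain a NUL byte."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # fnmatch does not treat leading dots specially, so hidden
        # files such as .fp_cleanup.o.cmd are matched by "*.cmd".
        for name in fnmatch.filter(filenames, pattern):
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    data = handle.read()
            except OSError:
                continue  # unreadable file: skip it rather than abort the scan
            if b"\x00" in data:
                bad.append(path)
    return sorted(bad)


# Example call, using the build directory from the first post:
#   print(files_with_nul("/root/fast_path_harness"))
```

If only build artifacts turn up, deleting and regenerating them is harmless, but the underlying filesystem or memory problem still needs to be tracked down.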
 
Old 10-19-2009, 04:11 PM   #3
zman2245
LQ Newbie
 
Registered: Mar 2009
Location: San Francisco, CA
Posts: 26

Original Poster
Rep: Reputation: 15
rylan, thanks. It is a brand new board from Intel I am using at work for development, so it definitely shouldn't have parts going bad. But of course it's possible that a DIMM is corrupted.

I wonder, would crashing the kernel many times cause something like this? I am doing quite a bit of kernel dev...

Also, one other update to my original post - I was mistaken in saying the board boots fine. It actually does not boot after hitting this error. The boot gets stuck at the screen:

______________________________________________________________________
root (hd0, 0)
<other grub stuff>...
i8042.c: No controller found (not an issue from what I've read)





kernel alive
kernel direct mapping tables up to 270000000 @ 8000-13000
______________________________________________________________________

Thanks,
Zack
 
Old 10-20-2009, 02:42 AM   #4
rylan76
Senior Member
 
Registered: Apr 2004
Location: Potchefstroom, South Africa
Distribution: Fedora 17 - 3.3.4-5.fc17.x86_64
Posts: 1,552

Rep: Reputation: 103
Quote:
Originally Posted by zman2245 View Post
rylan, thanks. It is a brand new board from Intel I am using at work for development, so it definitely shouldn't have parts going bad. But of course it's possible that a DIMM is corrupted.
You're right of course, but maybe you just have a new but bad board? Have you tried exchanging it under warranty? I've had this a few times - brand new HW that is bad right out of the box - and simply exchanged it under warranty for other parts of the same model. I thought about this a bit and yeah, it could be a bad DIMM, but I think it is a bit of a long shot for a bad RAM chip to completely corrupt the filesystem and prevent a boot. So it might still be something else.

Quote:
I wonder, would crashing the kernel many times cause something like this? I am doing quite a bit of kernel dev...
Not sure... it might! Did you have a lot of kernel crashes / panics on an untweaked / virgin kernel on that system? This can also be an indicator of bad RAM or hardware. Modern Linux kernels are absurdly stable to the point of being completely ridiculous - especially a "stock" distro kernel (a kernel you compile and tweak yourself won't necessarily be so stable, of course). I've been playing with Linux on and off for about six years now, and I think so far I've seen three (count 'em - THREE) kernel panics, all down to bad hardware. But I doubt that most kinds of kernel panic can corrupt the filesystem. As I understand it, that's the point of a kernel panic - it stops the kernel before it starts doing arbitrary stuff like executing data (instead of code) and / or wiping files and corrupting disks.

Quote:
Also, one other update to my original post - I was mistaken in saying the board boots fine. It actually does not boot after hitting this error. The boot gets stuck at the screen:

______________________________________________________________________
root (hd0, 0)
<other grub stuff>...
i8042.c: No controller found (not an issue from what I've read)
Thanks,
Zack
Does the above occur with a stock kernel, or one of the kernels you have worked on?

As far as I know RHEL is supposed to be for really stable hosting and server work... it might be better to play with kernels on something less stable and less intended for serious work, like Fedora (the "community" version of RHEL)?

I think at this point you can try a few more things:

1. Run memtest

Most distros' live DVDs or CDs have an option to run memtest86 on the system, without booting anything, direct off the disc. I discovered a bad DIMM once by leaving memtest86 running all night - only then did one of the chips turn up bad. Leaving it for an hour or so during the workday did not find the problem. Maybe you can try it overnight too?

2. Try and boot off a rescue DVD / CD

Since your system won't boot off the installed disc, how about this? Especially if you've done some kernel dev (and maybe your work has tainted the kernel so badly it can't run) this will point it out. If it boots off the rescue disc, you're the culprit!

3. Kernel parameter all-generic-ide

Long shot, I've had this parameter make a non-booting install suddenly start working off the fixed HDD drive which was on a SATA controller (not stock IDE).

4. Another distro with a newer kernel?

If none of the above work, try getting another distro completely with a possibly newer kernel version? If this works, it means that your hardware is fine.

5. Windows

Sacrilege! But if all else fails and you can run Windows on that system, try it... if it works, it obviously means Linux is the culprit. This will also establish whether you are experiencing hardware or software troubles.

Last edited by rylan76; 10-20-2009 at 02:44 AM.
 
  


All times are GMT -5. The time now is 06:46 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration