LinuxQuestions.org
Old 08-18-2006, 10:40 AM   #1
Prostetnic_Jeltz
Member
 
Registered: Feb 2006
Posts: 66

Rep: Reputation: 16
Filesystem Corruption Hell


Hi all -

slack install had been working fine, with upgrades, for over a year. occasional fs corruption problems (always on the / partition, ext3), but nothing fsck didn't fix.

yesterday in response to a simple command (no unusual activity or installs recently), the terminal spit out an error with something like "libblkid.so.1 not found" - I checked /lib and sure enough several libraries had borked permissions (like Br-?r-S?-x) and really large filesize numbers or commas in the filesize column.

ok, tried shutdown -F, but fsck wouldn't run, saying there were missing libraries. ok, boot from install disk and try fsck on the / partition, but fsck came back saying the filesystem was clean! tried everything I could think of with fsck, no luck.

I then tried directly copying over the corrupted .so's from the install disk, but I can't remove the corrupted ones ("operation not permitted"). I tried recreating a new /lib, but the system didn't like that, init couldn't run, and eventually I got kernel panics over the version of libc and whatever it was trying to read (the installed system has been upgraded beyond the install disks, which are 10.1)

I'm lost now -- any suggestions as to how to fix this, get fsck to work, or how to recreate a /lib on a non-bootable system!?

thanks for any help at all

Last edited by Prostetnic_Jeltz; 08-18-2006 at 10:41 AM.
 
Old 08-18-2006, 11:15 AM   #2
Franklin
Senior Member
 
Registered: Oct 2002
Distribution: Slackware
Posts: 1,348

Rep: Reputation: 217
Quote:
Originally Posted by Prostetnic_Jeltz
Hi all -

slack install had been working fine, with upgrades, for over a year. occasional fs corruption problems (always on the / partition, ext3), but nothing fsck didn't fix.
That does not seem like a good situation - a warning of things to come maybe?

Quote:
Originally Posted by Prostetnic_Jeltz
yesterday in response to a simple command (no unusual activity or installs recently), the terminal spit out an error with something like "libblkid.so.1 not found" - I checked /lib and sure enough several libraries had borked permissions (like Br-?r-S?-x) and really large filesize numbers or commas in the filesize column.

ok, tried shutdown -F, but fsck wouldn't run, saying there were missing libraries. ok, boot from install disk and try fsck on the / partition, but fsck came back saying the filesystem was clean! tried everything I could think of with fsck, no luck.

[snipped the rest]

I'm lost now -- any suggestions as to how to fix this, get fsck to work, or how to recreate a /lib on a non-bootable system!?

thanks for any help at all
Well, I'm no expert, but I would suspect some long-developing hardware problem. If you used a sane partition scheme (i.e. /home, /usr/local, and anything else with important data on separate partitions), I would reinstall from scratch, without formatting the partitions you need to save. Then, when you have a bootable system, save your data and then try to ID the problem - HD, Mem, motherboard etc.

If everything is installed to one, large / partition, I don't have a good suggestion other than booting a live CD (slackware disk 2 should work) and mounting the old root - again with the aim to save your data rather than save the install.

Perhaps others can suggest something else.
 
Old 08-18-2006, 12:56 PM   #3
Prostetnic_Jeltz
Member
 
Registered: Feb 2006
Posts: 66

Original Poster
Rep: Reputation: 16
thanks for your reply, Franklin. I agree with your suggestion on hardware - but on the other hand, I have several partitions on 2 drives, and it only ever occurred on one partition (the only one which is ext3, btw), so I'm holding out hope it could be non-hardware related. in any event, there's some underlying problem there for sure.

on the data, I have important stuff backed up, and I have a second drive, so I could transfer anything important using a live cd (I think - haven't tried it, but it should work). but it would take a lot of time to get everything all set up again -

and of course, doing a reinstall instead of solving it (if there's a solution) irks me as a geek, on principle
 
Old 08-18-2006, 01:38 PM   #4
Franklin
Senior Member
 
Registered: Oct 2002
Distribution: Slackware
Posts: 1,348

Rep: Reputation: 217
Quote:
Originally Posted by Prostetnic_Jeltz
thanks for your reply, Franklin. I agree with your suggestion on hardware - but on the other hand, I have several partitions on 2 drives, and it only ever occurred on one partition (the only one which is ext3, btw), so I'm holding out hope it could be non-hardware related. in any event, there's some underlying problem there for sure.
I didn't mention this before because I can't verify what actually was the cause, but I had a similar issue with a drive that was on my server. The server ran slackware 10.2 with 2.4.32. There was one drive partitioned as swap, /, /home, and /data. /home and / were formatted with reiser. /data was formatted ext3. One day, I could not access one of my directories on the /data partition (ext3) due to corruption and an interesting error that I unfortunately can't remember now. Long story short, I was able to recover my data but I had to rebuild the journal to do it and it was messy - lost all file names but saved everything.

Anyway, I tested the drive over and over with several utilities and couldn't find anything wrong with it. I switched to all reiserfs and things have been fine since. Never had an issue with ext3 before and I don't know what caused this.

I still have the drive running, but I don't really trust it 100%. I did find it interesting that after pulling everything out of the server I could not start it again - dead power supply.
 
Old 08-19-2006, 04:05 AM   #5
gnashley
Amigo developer
 
Registered: Dec 2003
Location: Germany
Distribution: Slackware
Posts: 4,928

Rep: Reputation: 612
reiserfs is probably a better choice.
 
Old 08-19-2006, 08:27 AM   #6
davidsrsb
Member
 
Registered: Oct 2003
Location: Kuala Lumpur, Malaysia
Distribution: Slackware 13.37 current
Posts: 770

Rep: Reputation: 33
I have only ever had file system corruption on machines with bad motherboard/ram.
I have had both ext3 and reiserfs3 fail irreversibly when this has happened.
Bad hardware is fatal to a Linux box.
 
Old 08-19-2006, 10:32 AM   #7
salmaklak
LQ Newbie
 
Registered: Jun 2005
Distribution: Slackware, Ubuntu on my PS3
Posts: 22

Rep: Reputation: 15
1. smartmontools(dot)sourceforge(dot)net.
2. 18 inches maximum for IDE leads.
Good luck.
 
Old 08-19-2006, 12:22 PM   #8
ledow
Member
 
Registered: Apr 2005
Location: UK
Distribution: Slackware 13.0
Posts: 241

Rep: Reputation: 34
I have to agree with davidsrsb here - there isn't a filesystem in existence that can adequately compensate for faulty bits being written to disk. It's just not supposed to happen, in the same way that if you get a bit-error in a vital part of RAM, your computer will not handle it (unless you use ECC RAM and even then it's not necessarily guaranteed - it does a better job at noticing corruption but it can't always correct it).
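As a rough illustration of how little it takes, here is a minimal Python sketch (purely illustrative, not a diagnosis of what happened on that disk) showing how a single flipped bit in a file's mode field changes what a directory listing shows. The starting mode and the choice of which bits to flip are assumptions made up for the example; `stat.filemode` is the standard library helper that renders a mode the way `ls -l` does.

```python
import stat

# A healthy regular file with permissions rw-r--r-- (assumed starting point).
healthy = 0o100644

def show(label: str, mode: int) -> str:
    """Render a raw mode value the way `ls -l` would and print it."""
    text = stat.filemode(mode)
    print(f"{label:<22} {oct(mode):>10}  {text}")
    return text

show("healthy", healthy)

# Flip only the setgid bit: group execute is not set, so the listing
# shows a capital S in the group position.
show("setgid bit flipped", healthy ^ stat.S_ISGID)

# Flip one bit in the file-type field: the regular file now looks like a socket.
show("type bit flipped", healthy ^ 0o040000)
```

One wrong bit is enough to turn an ordinary library into something that looks like a different kind of file with odd permissions, which is consistent with the sort of listing described in the first post.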

In terms of filesystems, reiser and ext3 are just as susceptible to faulty bits as any other - the only advantage they have is their journalling, which means that an interrupted write can be checked and then completed or discarded if a crash occurs in the middle of it. That does not mean that they will recover from bits that "change" on the disk afterwards, e.g. bad sectors, faulty RAM etc.

Personally, I've never seen either reiser or ext3 "recover" better than the other when it comes to random filesystem corruption. It all depends on the luck of the draw as to where the changed bits are, what part of the filesystem that hits, how easy it is to detect that a bit has gone wrong (internal checksums, copies of the FAT etc.), how easy it is to "guess" what was meant by the corrupted part (e.g. recreating from checksums, using a second copy of the index, etc.). In practical terms, reiser and ext3 and most other common filesystems have next-to-no checks that anything that's not in a journal is "intact". If they do, they very, very rarely have any information which would aid recovery of that filesystem by an automated system (though a human could probably have a good stab).
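To make the detect-versus-recover distinction concrete, here is a minimal Python sketch. It is not how ext3 or reiserfs work internally; the record layout and the helper names (`store`, `flip_bit`, `is_intact`, `recover`) are hypothetical and exist only for this example. It keeps a CRC32 and a second copy of a block, then flips bits to show that a checksum alone can only say "something is wrong", while a surviving second copy is what actually allows recovery.

```python
import zlib

def store(block: bytes) -> dict:
    """Keep the block, a duplicate copy, and a CRC32 of the original contents."""
    return {
        "data": bytearray(block),
        "copy": bytearray(block),
        "crc": zlib.crc32(block),
    }

def flip_bit(buf: bytearray, byte_index: int, bit: int) -> None:
    """Simulate corruption by inverting one bit in place."""
    buf[byte_index] ^= 1 << bit

def is_intact(buf: bytearray, crc: int) -> bool:
    """A checksum can tell us the data changed, but not what it used to be."""
    return zlib.crc32(bytes(buf)) == crc

def recover(record: dict):
    """Return whichever copy still matches the checksum, or None if neither does."""
    for name in ("data", "copy"):
        if is_intact(record[name], record["crc"]):
            return bytes(record[name])
    return None

record = store(b"some important block")

flip_bit(record["data"], 3, 0)                  # corrupt the primary copy
print(is_intact(record["data"], record["crc"]))  # corruption is detected
print(recover(record))                           # the second copy saves us

flip_bit(record["copy"], 7, 5)                  # now corrupt the second copy too
print(recover(record))                           # nothing left to recover from
```

With only the checksum, the best an automated tool can do is refuse to trust the block; it is the redundant copy that makes repair possible, and once both copies are damaged the original contents are gone.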

Last edited by ledow; 08-19-2006 at 12:23 PM.
 
Old 08-19-2006, 03:23 PM   #9
Prostetnic_Jeltz
Member
 
Registered: Feb 2006
Posts: 66

Original Poster
Rep: Reputation: 16
many thanks for the replies, everybody.

once I get an installation up and running again, I'll start investigating for hardware problems.... a friend mentioned that he would suspect the power supply, and I'll try smartmontools. also I found this app which seems to be perfect for testing memory: http://www.memtest86.com/ I didn't know there was a cable length limit for ide - I think they're ok, but I'll check that too, as soon as I find a tape measure
 
  


All times are GMT -5.