Slackware
This Forum is for the discussion of Slackware Linux.
slack install had been working fine, with upgrades, for over a year. occasional fs corruption problems (always on the / partition, ext3), but nothing fsck didn't fix.
yesterday in response to a simple command (no unusual activity or installs recently), the terminal spit out an error with something like "libblkid.so.1 not found" - I checked /lib and sure enough several libraries had borked permissions (like Br-?r-S?-x) and really large filesize numbers or commas in the filesize column.
ok, tried shutdown -F, but fsck wouldn't run, saying there were missing libraries. ok, boot from install disk and try fsck on the / partition, but fsck came back saying the filesystem was clean! tried everything I could think of with fsck, no luck.
I then tried directly copying over the corrupted .so's from the install disk, but I can't remove the corrupted ones ("operation not permitted"). I tried recreating a new /lib, but the system didn't like that: init couldn't run, and eventually I got kernel panics over the version of libc and whatever it is trying to read (the installed system has been upgraded past the install disks, which are 10.1).
I'm lost now -- any suggestions as to how to fix this, get fsck to work, or how to recreate a /lib on a non-bootable system!?
thanks for any help at all
Last edited by Prostetnic_Jeltz; 08-18-2006 at 10:41 AM.
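One way to list the damaged entries described above, once the old root is mounted from a rescue disk, is sketched below. It rests on an assumption: a comma in the size column of `ls -l` is how device nodes are displayed (major, minor), so those entries probably have inodes whose type bits no longer say "regular file". The function name `find_suspicious_entries` and the 1 GiB size threshold are made up for illustration.

```python
import os
import stat

def find_suspicious_entries(directory, max_size=1 << 30):
    """Return (name, reason) pairs for entries that look wrong in a library directory.

    max_size is an arbitrary illustrative threshold (1 GiB by default).
    """
    suspicious = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        try:
            info = os.lstat(path)  # lstat: inspect the entry itself, not a symlink target
        except OSError as exc:
            suspicious.append((name, f"lstat failed: {exc.strerror}"))
            continue
        mode = info.st_mode
        if stat.S_ISREG(mode):
            # Regular file: only the size can look implausible
            if info.st_size > max_size:
                suspicious.append((name, f"implausible size {info.st_size}"))
        elif not (stat.S_ISDIR(mode) or stat.S_ISLNK(mode)):
            # Device nodes, FIFOs and sockets do not belong among shared libraries
            suspicious.append((name, f"unexpected type {stat.filemode(mode)}"))
    return suspicious
```

With the damaged partition mounted at a hypothetical `/mnt/oldroot`, calling `find_suspicious_entries("/mnt/oldroot/lib")` would give a list of names to compare against a known-good copy. It only reads metadata, so it should not make the damage worse.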
Quote:
Originally Posted by Prostetnic_Jeltz
slack install had been working fine, with upgrades, for over a year. occasional fs corruption problems (always on the / partition, ext3), but nothing fsck didn't fix.
That does not seem like a good situation - a warning of things to come maybe?
Quote:
Originally Posted by Prostetnic_Jeltz
yesterday in response to a simple command (no unusual activity or installs recently), the terminal spit out an error with something like "libblkid.so.1 not found" - I checked /lib and sure enough several libraries had borked permissions (like Br-?r-S?-x) and really large filesize numbers or commas in the filesize column.
ok, tried shutdown -F, but fsck wouldn't run, saying there were missing libraries. ok, boot from install disk and try fsck on the / partition, but fsck came back saying the filesystem was clean! tried everything I could think of with fsck, no luck.
[snipped the rest]
I'm lost now -- any suggestions as to how to fix this, get fsck to work, or how to recreate a /lib on a non-bootable system!?
thanks for any help at all
Well, I'm no expert, but I would suspect some long-developing hardware problem. If you used a sane partition scheme (i.e. /home, /usr/local, and other important data on separate partitions), I would reinstall from scratch, without formatting the partitions you need to save. Then, when you have a bootable system, save your data and then try to ID the problem - HD, memory, motherboard etc.
If everything is installed to one, large / partition, I don't have a good suggestion other than booting a live CD (slackware disk 2 should work) and mounting the old root - again with the aim to save your data rather than save the install.
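On the "fsck says the filesystem is clean" problem: when booted from a CD as suggested above, one likely explanation is that `e2fsck` skips the full check if the superblock is marked clean, unless it is given `-f` to force it. A minimal sketch of building such a command follows. The device name `/dev/hda1` and the helper name `forced_fsck_command` are hypothetical; `-f` (force) and `-n` (open read-only, answer "no" to every question) are standard `e2fsck` options.

```python
import shlex

def forced_fsck_command(device, read_only=True):
    """Build an e2fsck command line that checks even a 'clean' filesystem."""
    cmd = ["e2fsck", "-f"]      # -f: force a check even if marked clean
    if read_only:
        cmd.append("-n")        # -n: change nothing, just report problems
    cmd.append(device)
    return cmd

# Print the command instead of running it; /dev/hda1 is a placeholder device.
print(shlex.join(forced_fsck_command("/dev/hda1")))
```

The list can be handed to `subprocess.run` once the partition is confirmed unmounted. Starting with the read-only form seems the safer choice, since it shows what would be repaired before anything is written.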
thanks for your reply, Franklin. I agree with your suggestion on hardware - but on the other hand, I have several partitions on 2 drives, and it only ever occurred on one partition (the only one which is ext3, btw), so I'm holding out hope it could be non-hardware related. in any event, there's some underlying problem there for sure.
on the data, I have important stuff backed up, and I have a second drive, so I could transfer anything important using a live cd (I think - haven't tried it, but it should work). but it would take a lot of time to get everything all set up again -
and of course, doing a reinstall instead of solving it (if there's a solution) irks me as a geek, on principle
Quote:
thanks for your reply, Franklin. I agree with your suggestion on hardware - but on the other hand, I have several partitions on 2 drives, and it only ever occurred on one partition (the only one which is ext3, btw), so I'm holding out hope it could be non-hardware related. in any event, there's some underlying problem there for sure.
I didn't mention this before because I can't verify what actually was the cause, but I had a similar issue with a drive that was on my server. The server ran slackware 10.2 with 2.4.32. There was one drive partitioned as swap, /, /home, and /data. /home and / were formatted with reiser. /data was formatted ext3. One day, I could not access one of my directories on the /data partition (ext3) due to corruption and an interesting error that I unfortunately can't remember now. Long story short, I was able to recover my data but I had to rebuild the journal to do it and it was messy - lost all file names but saved everything.
Anyway, I tested the drive over and over with several utilities and I can't find anything wrong with it. I switched to all reiserfs and things have been fine since. Never had an issue with ext3 before and I don't know what caused this.
I still have the drive running, but I don't really trust it 100%. I did find it interesting that after pulling everything out of the server I could not start it again - dead power supply.
I have only ever had file system corruption on machines with a bad motherboard or RAM.
I have had both ext3 and reiserfs3 fail irreversibly when this has happened.
Bad hardware is fatal to a Linux box.
I have to agree with davidsrsb here - there isn't a filesystem in existence that can adequately compensate for faulty bits being written to disk. It's just not supposed to happen, in the same way that if you get a bit-error in a vital part of RAM, your computer will not handle it (unless you use ECC RAM and even then it's not necessarily guaranteed - it does a better job at noticing corruption but it can't always correct it).
In terms of filesystems, reiser and ext3 are just as susceptible to faulty bits as any other - the only advantage they have is their journalling, which means that a write interrupted by a crash can be replayed or discarded cleanly on the next mount. That does not mean that they will recover from bits that "change" on the disk afterwards, e.g. because of bad sectors, faulty RAM etc.
Personally, I've never seen either reiser or ext3 "recover" better than the other when it comes to random filesystem corruption. It all depends on the luck of the draw as to where the changed bits are, what part of the filesystem that hits, how easy it is to detect that a bit has gone wrong (internal checksums, copies of the FAT etc.), how easy it is to "guess" what was meant by the corrupted part (e.g. recreating from checksums, using a second copy of the index, etc.). In practical terms, reiser and ext3 and most other common filesystems have next-to-no checks that anything that's not in a journal is "intact". If they do, they very, very rarely have any information which would aid recovery of that filesystem by an automated system (though a human could probably have a good stab).
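Since the filesystem itself keeps almost no integrity checks on file contents, one workaround is to keep your own checksums and compare them later. The following is a rough sketch of that idea, not something either filesystem does internally; the function names are invented for illustration, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib
import os

def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root):
    """Map each regular file under root (relative path) to its digest."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a plain file
            if os.path.isfile(path) and not os.path.islink(path):
                manifest[os.path.relpath(path, root)] = file_digest(path)
    return manifest

def changed_files(root, manifest):
    """Return paths from the manifest that are now missing or have a different digest."""
    current = build_manifest(root)
    return sorted(p for p in manifest if current.get(p) != manifest[p])
```

A manifest built while the system is healthy and stored on another drive can be compared later with `changed_files`. It cannot tell corruption from a legitimate upgrade of the same file, so it is mainly useful for directories that are not supposed to change.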
once I get an installation up and running again, I'll start investigating for hardware problems.... a friend mentioned that he would suspect the power supply, and I'll try smartmontools. also I found this app which seems to be perfect for testing memory: http://www.memtest86.com/
I didn't know there was a cable length limit for IDE - I think the cables are ok, but I'll check that too, as soon as I find a tape measure
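For the smartmontools step, a small sketch of pulling the usual warning signs out of `smartctl -A` output is given below. It assumes the common ATA attribute table layout, with ten whitespace-separated columns and the raw value last. The `SAMPLE` text is invented purely to show the format and is not output from any real drive; the function names are hypothetical.

```python
# Attributes commonly watched as signs of a failing disk surface
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

def parse_smart_attributes(text):
    """Parse `smartctl -A` style rows into {attribute_name: raw_value}."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns
        if len(fields) >= 10 and fields[0].isdigit():
            raw = fields[9]
            if raw.isdigit():  # ignore raw values that are not plain integers
                attrs[fields[1]] = int(raw)
    return attrs

def worrying_attributes(attrs):
    """Return the watched attributes whose raw value is above zero."""
    return {name: attrs[name] for name in WATCHED if attrs.get(name, 0) > 0}

# Invented sample in the usual table layout, for illustration only
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       12
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""
```

Feeding it real text captured from `smartctl -A` on the suspect drive would work the same way. A clean result here does not clear the RAM, power supply or cabling, so the memory test is still worth running.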