I did something stupid. I did a CTRL-Alt-Del while the system was starting up which forced the system to shut down. I think this screwed up my system. The filesystem is partially shot. Check out the following partial sample of my / directory with ls -la:
You can see my /root and /usr folders are no longer folders but have turned into very large corrupted files. I've shown the /sbin directory as a comparison.
What on earth is going on? How do I fix this? I tried a linux rescue and then ran fsck against the partition, which gave me the message "Journal restored". But nothing changed.
Thanks, Tim. I've passed a fsck -f /dev/hda2 (also with the yes option - can't remember the flag) and there are literally thousands of pages of corrupted inodes being processed. I'll post the result as soon as it stops.
The drive is a 4 week old 160GIG WD drive.
Update: Still going over an hour. I'll stop it and run it all day tomorrow. Crazy amount of errors.
Last edited by sausagejohnson; 06-26-2004 at 08:15 AM.
I'm not prepared to say that the drive is kaput just yet, although fsck has now been running for several hours with hundreds of thousands of inodes being repaired, probably around 600,000 so far.
The fact is that I have been working in the Windows partition at the end of the drive for several days now on a DV editing project. As you might know with DV editing, software crashes occasionally, and so once or twice the software locked up and a hard reboot was needed.
Of course, I would go to the GRUB screen and select Windows 98. Then, the DOS SCANDISK screen came up and I let it do its thing. Now I am beginning to wonder if this program somehow went cross-partition and caused the damage. This seems unthinkable, except that it is unusual for Windows to live at the end of a drive as it does in my setup.
Anyhow, there is no data corruption in the 30 GIG windows partition, but seemingly massive corruption in the 120 GIG linux partition. I'll continue to run fsck tonight (it picks up from where it left off) and see if the damage repair completes itself.
How many inodes are there in 120GIG of space? Does anyone know how long this will take?
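A rough way to put a number on the inode question: ext2/ext3 fix the inode count when the filesystem is created, at roughly one inode per so many bytes of space. The sketch below only does that division. The bytes-per-inode ratios are hypothetical examples, not values read from this system, and "120 GIG" is taken as decimal gigabytes, which is also an assumption.

```python
def estimated_inode_count(partition_bytes: int, bytes_per_inode: int) -> int:
    """Estimate how many inodes mkfs would have created.

    ext2/ext3 allocate roughly one inode per `bytes_per_inode` bytes
    of partition space when the filesystem is made.
    """
    return partition_bytes // bytes_per_inode


# "120 GIG" from the post, assumed to mean 120 * 10^9 bytes.
PARTITION_BYTES = 120 * 10**9

# Hypothetical ratios; the real one depends on how the filesystem was created.
for ratio in (4096, 8192, 16384):
    count = estimated_inode_count(PARTITION_BYTES, ratio)
    print(f"bytes-per-inode {ratio:>5}: about {count:,} inodes")
```

Whatever the real ratio was, the result lands in the millions, so several hundred thousand repairs does not by itself say how far through the pass fsck is.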
You mentioned that this is a 4-week-old 160GB drive. That is above the roughly 137GB limit of plain 28-bit LBA, so the drive needs LBA48 addressing; there is also a possibility that you've just been bitten by coincidence.
What version of the Linux kernel are you running, and how is your drive connected to your machine?
I'm running Red Hat 9. It's always been a faithful distribution. I'm connecting via IDE using one of those newfangled IDE cables. The board is a Gigabyte 7N400 Pro. Not using a SATA drive.
Cool. I looked up your board and it's based on the nForce chipset. That did ring a bell on something that I came across today trying to debug similar problems on my system. This was mentioned on a posting I saw:
Quote:
The ATA100 support for nforce2 boards is mature in kernel versions 2.4.24 and 2.6.3. Just be sure to enable the kernel's nforce2 IDE driver.
Thanks for the document reference. I will go through that. I guess tonight, I'll finish up the fsck and see what's left of the system. My /work directory was untouched and I backed it up so no worries about loss.
I was thinking, too, that this might be Windows 98 not handling the large drive correctly. Did you notice the corrupt files before or after windows did its scandisk? I just can't help but go back to the fact that you have an LBA48 drive which may or may not be handled correctly by both of these operating systems.
I can't remember if it was before or after. Sorry, I know that doesn't help. The thing that makes me doubt that the OSes couldn't handle the drive is the fact that everything was perfect for about a month. I have installed games to Windows, no problems, and have been working mostly in Linux for over a month on various bits and pieces.
For the last week, I have been capturing large DV files and doing huge edits in Windows. I would imagine that if Windows wasn't able to handle the drive, my DV files would have become corrupted. Instead, I created perfect edits and exported them back to DV tape without any glitches.
Anyhow, I stopped fsck after many hours last night and checked the file structure with the boot CD. linux rescue can no longer see any linux files at all (at the beginning of this thread I had a file structure and only /usr and /root were damaged), so basically all those fsck inode repairs were simply destroying what little filestructure was left.
Tonight I will reinstall linux, and then go into windows and deliberately reboot without shutting down, and perform a DOS scandisk to see once and for all if that caused the problem.
Good point on the Windows bit. A couple of things I thought of last night that I'd like to see:
1) How big does your BIOS say your drive is (in sectors)?
2) What does hdparm -I say about your drive?
3) What does fdisk -lu say about your drive?
Once again, going back to the LBA48 support, your BIOS should say something around 312,000,000 sectors if you have a 160GB drive. If it's showing less, your BIOS doesn't support LBA48 and you'll want to upgrade. This may or may not be the source of the problem, but is certainly something to look at.
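To show where the "around 312,000,000 sectors" figure comes from, here is a minimal sketch of the arithmetic, assuming 512-byte sectors and marketing gigabytes of 10^9 bytes. The function names are made up for illustration, and the 28-bit limit is the "LBA user addressable sectors" value that hdparm prints.

```python
SECTOR_BYTES = 512

# Largest sector count reachable with plain 28-bit LBA, as reported by hdparm -I.
LBA28_MAX_SECTORS = 268_435_455


def expected_sectors(marketing_gb: int) -> int:
    """Sector count for a drive sold as `marketing_gb` decimal gigabytes."""
    return marketing_gb * 10**9 // SECTOR_BYTES


def needs_lba48(sectors: int) -> bool:
    """True if the drive is too large to address fully with 28-bit LBA."""
    return sectors > LBA28_MAX_SECTORS


sectors_160 = expected_sectors(160)
print(f"{sectors_160:,}")        # 312,500,000
print(needs_lba48(sectors_160))  # True
```

A BIOS that reports a sector count at or below the 28-bit figure for a drive this size would be the warning sign described above.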
The question about hdparm is based on what I've seen with my Maxtor 200GB drive. Here's what it says:
Code:
[root@webserver /]# hdparm -I /dev/hdb
/dev/hdb:
ATA device, with non-removable media
Model Number: Maxtor 6Y200P0
Serial Number: Y617WD4E
Firmware Revision: YAR41BW0
Standards:
Supported: 7 6 5 4
Likely used: 7
Configuration:
Logical max current
cylinders 16383 65535
heads 16 1
sectors/track 63 63
--
CHS current addressable sectors: 4128705
LBA user addressable sectors: 268435455
LBA48 user addressable sectors: 398297088
device size with M = 1024*1024: 194481 MBytes
device size with M = 1000*1000: 203928 MBytes (203 GB)
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 1
Standby timer values: spec'd by Standard, no device specific minimum
R/W multiple sector transfer: Max = 16 Current = 16
Advanced power management level: unknown setting (0x0000)
Recommended acoustic management value: 192, current value: 254
DMA: mdma0 mdma1 mdma2 udma0 udma1 *udma2 udma3 udma4 udma5 udma6
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* NOP cmd
* READ BUFFER cmd
* WRITE BUFFER cmd
* Host Protected Area feature set
* Look-ahead
* Write cache
* Power Management feature set
Security Mode feature set
* SMART feature set
* FLUSH CACHE EXT command
* Mandatory FLUSH CACHE command
* Device Configuration Overlay feature set
* 48-bit Address feature set
* Automatic Acoustic Management feature set
SET MAX security extension
Advanced Power Management feature set
* DOWNLOAD MICROCODE cmd
* SMART self-test
* SMART error logging
Security:
Master password revision code = 65534
supported
not enabled
not locked
not frozen
not expired: security count
not supported: enhanced erase
HW reset results:
CBLID- above Vih
Device num = 1 determined by CSEL
Checksum: correct
Take a look at the Configuration section. Note that under straight LBA, I could only address about 137GB (in hard drive marketing gigs of 1,000,000,000 bytes); however, in LBA48 mode the entire drive is addressable.
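The size lines in that output can be reproduced from the two sector counts. This is just a sketch of the conversion, assuming 512-byte sectors; the helper name is made up for illustration.

```python
SECTOR_BYTES = 512


def size_mbytes(sectors: int, mega: int) -> int:
    """Drive size in MBytes for a given definition of M (10**6 or 2**20)."""
    return sectors * SECTOR_BYTES // mega


LBA_SECTORS = 268_435_455    # "LBA user addressable sectors" from the output
LBA48_SECTORS = 398_297_088  # "LBA48 user addressable sectors" from the output

# Reachable with plain LBA: about 137 GB in marketing units.
print(size_mbytes(LBA_SECTORS, 10**6))    # 137438

# Reachable with LBA48: matches the two "device size" lines.
print(size_mbytes(LBA48_SECTORS, 2**20))  # 194481
print(size_mbytes(LBA48_SECTORS, 10**6))  # 203928
```

The gap between the first figure and the last one is the part of the drive that is only reachable when LBA48 is in use.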
Finally, fdisk -lu gives you a dump of the partition table. Here's what my 200GB drive looks like:
Code:
[root@webserver /]# fdisk -lu /dev/hdb
Disk /dev/hdb: 255 heads, 63 sectors, 24792 cylinders
Units = sectors of 1 * 512 bytes
Device Boot Start End Blocks Id System
/dev/hdb1 * 63 398283479 199141708+ 83 Linux
Notice that my Linux partition starts on sector 63 and extends to sector 398,283,479, which, if you look at the LBA48 user addressable sectors, you'll see is slightly less than the maximum of 398,297,088.
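The same check can be written down as a small sketch. It recomputes the Blocks column from the Start and End sectors (fdisk counts 1K blocks and appends "+" when a partition has an odd number of 512-byte sectors) and confirms the partition ends inside the addressable range. The function names are made up for illustration.

```python
def blocks_column(start: int, end: int) -> str:
    """Reproduce fdisk's Blocks column from inclusive start/end sectors."""
    sectors = end - start + 1
    whole_blocks, odd_sector = divmod(sectors, 2)  # two 512-byte sectors per 1K block
    return f"{whole_blocks}+" if odd_sector else str(whole_blocks)


def fits_on_drive(end: int, addressable_sectors: int) -> bool:
    """Sectors are numbered from 0, so the last valid one is addressable - 1."""
    return end < addressable_sectors


def unused_tail(end: int, addressable_sectors: int) -> int:
    """Sectors left over between the partition end and the end of the drive."""
    return addressable_sectors - 1 - end


print(blocks_column(63, 398_283_479))           # 199141708+
print(fits_on_drive(398_283_479, 398_297_088))  # True
print(unused_tail(398_283_479, 398_297_088))    # 13608
```

If the End value ever came out at or beyond the addressable count, the partition table would be describing space the drive cannot reach.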
I think that, if you check these things, it might reveal a clue to what's going on. Aside from that, and checking the drivers, I can't think of anything else that might contribute to the issue.
Let me know how your rebuild goes and please post the results of those commands.
Thanks, bwyer. I'll run these tests and post the results after the rebuild. I don't remember ever seeing an LBA48 entry in my BIOS, but I would imagine that, being a new nforce2 board, it would have that. Still, we'll see. Thanks for your advice. I'll get the results up tomorrow.
Hi bwyer,
I was very busy last night and I did all of the following:
1) Started the RH9 install and preserved my /boot on /dev/hda1
2) Selected format for /dev/hda2 and then did a bad blocks check
3) Bad blocks were located and RH9 recommended that I did NOT use this drive. Hmmm... not good.
4) I downloaded the Western Digital diagnostic tool and did the fast test and it came back saying the drive was clean. I did not run the extended test because it recommended that I backup the drive and I didn't want it to destroy the windows side.
5) Decided to push ahead with the install, and it did so without any problems. Linux runs beautifully again, and so does my existing Windows partition at the end of the drive.
6) Checked BIOS, and it says the following about my drive:
/dev/hda:
ATA device, with non-removable media
Model Number: WDC WD1600JB-00FUA0
Serial Number: WD-WMAER1147969
Firmware Revision: 15.05R15
Standards:
Supported: 6 5 4 3
Likely used: 6
Configuration:
Logical max current
cylinders 16383 65535
heads 16 1
sectors/track 63 63
--
CHS current addressable sectors: 4128705
LBA user addressable sectors: 268435455
LBA48 user addressable sectors: 312579695
device size with M = 1024*1024: 152626 MBytes
device size with M = 1000*1000: 160040 MBytes (160 GB)
Capabilities:
LBA, IORDY(can be disabled)
bytes avail on r/w long: 74 Queue depth: 1
Standby timer values: spec'd by Standard, with device specific minimum
R/W multiple sector transfer: Max = 16 Current = 16
Recommended acoustic management value: 128, current value: 254
DMA: mdma0 mdma1 mdma2 udma0 udma1 *udma2 udma3 udma4 udma5
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* READ BUFFER cmd
* WRITE BUFFER cmd
* Host Protected Area feature set
* Look-ahead
* Write cache
* Power Management feature set
Security Mode feature set
SMART feature set
* FLUSH CACHE EXT command
* Mandatory FLUSH CACHE command
* Device Configuration Overlay feature set
* 48-bit Address feature set
Automatic Acoustic Management feature set
SET MAX security extension
* DOWNLOAD MICROCODE cmd
* SMART self-test
* SMART error logging
Security:
supported
not enabled
not locked
not frozen
not expired: security count
not supported: enhanced erase
HW reset results:
CBLID- above Vih
Device num = 0 determined by CSEL
Checksum: correct
8) I ran fdisk -lu /dev/hda and got:
Code:
Disk /dev/hda: 160.0 GB, 160040803840 bytes
255 heads, 63 sectors/track, 19457 cylinders, total 312579695 sectors
Units = sectors of 1 * 512 = 512 bytes
Device Boot Start End Blocks Id System
/dev/hda1 63 160649 80293+ 83 Linux
/dev/hda2 160650 249039629 124439490 83 Linux
/dev/hda3 * 249039630 310472189 30716280 c Win95 FAT32 (LBA)
/dev/hda4 310472190 312576704 1052257+ 82 Linux swap
9) Then I went to Windows (DOS mode) and did a scandisk c: /CHECKONLY. It came up with no errors, and it didn't affect Linux. So I can rule that out.
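As a cross-check on the table from step 8, here is a minimal sketch that looks for overlapping partitions or partitions running past the end of the disk, using the Start and End values from that output. It only shows that the table itself is consistent; it cannot show what SCANDISK or a driver actually wrote. The function name is made up for illustration.

```python
# (name, start sector, end sector) copied from the fdisk -lu output above.
HDA_PARTITIONS = [
    ("hda1", 63, 160_649),
    ("hda2", 160_650, 249_039_629),
    ("hda3", 249_039_630, 310_472_189),
    ("hda4", 310_472_190, 312_576_704),
]
HDA_TOTAL_SECTORS = 312_579_695


def layout_problems(partitions, total_sectors):
    """Return a list of human-readable problems found in a partition layout."""
    problems = []
    ordered = sorted(partitions, key=lambda p: p[1])
    for name, start, end in ordered:
        if end < start:
            problems.append(f"{name}: end before start")
        if end >= total_sectors:
            problems.append(f"{name}: extends past end of disk")
    # Compare each partition with the next one in on-disk order.
    for (name_a, _, end_a), (name_b, start_b, _) in zip(ordered, ordered[1:]):
        if start_b <= end_a:
            problems.append(f"{name_a} overlaps {name_b}")
    return problems


print(layout_problems(HDA_PARTITIONS, HDA_TOTAL_SECTORS))  # []
```

An empty list means each partition starts after the previous one ends and everything fits on the drive, which agrees with the FAT32 partition having no defined claim on the Linux one.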
So all looks pretty good and close to what you said. I should run the extended test anyway I guess, but I think this was just a freak thing. I guess I will continue to back up on a regular basis. A shame I can't bring any closure to this for others, but Red Hat's indication that there may be bad blocks on the drive is still a concern. Perhaps I'll unmount my /dev/hda2 and run a read-only deep check using fsck.
bwyer, thank you for all your assistance with this. I appreciate the time you took.
Last edited by sausagejohnson; 06-29-2004 at 06:30 PM.
Looks like you've got all of your bases covered. The main concerns I had don't look like they were an issue, considering the fact that your BIOS did recognize your drive correctly, as does Linux.
So, in summary, you've determined the following (lemme see if I have the facts straight):
Your BIOS recognizes your drive correctly, so it has to support LBA48
Windows 98 works fine with your system (it probably wouldn't have if the above weren't true) and SCANDISK didn't corrupt anything
Western Digital's diagnostics did not identify any issues with your drive
Given these facts, it appears that there can't be anything wrong with your drive, at least mechanically and from a DOS perspective. Now, taking the Linux side of things:
The RH9 installer received errors on a Bad Block Scan (which obviously weren't really bad blocks because the drive passed diags)
Linux reports the correct geometry that happens to match the BIOS (good)
We know that there is a custom IDE driver for Linux for your particular chipset
I'd say that you need to double-check the driver. The fact that a custom driver was written for this chipset implies that there's some deficiency in the base IDE driver that makes it incompatible with this chipset. It is possible that the deficiency is causing corruption.
I did just come across the changelist for the 2.4.21 kernel in searching Google that mentions that the first direct support for the Nforce2 IDE controller was 2.4.21-pre4. I also found that the Nforce2 IDE is on the HCL. I'm guessing that the custom drivers were required prior to 2.4.21, or there may have been some bugfixes later.
I also found some fixes for some issues with the Nforce2 chipset in 2.4.26.
The bottom line: Probably your best bet would be to make sure you're on the latest kernel you can get for RH9. I think Red Hat Network still works enough to get you to the latest release. Either that, or build yourself a custom kernel.
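For comparing a running kernel against the versions mentioned above, here is a minimal sketch. It assumes the release string looks like what uname -r prints, and the sample strings below are only examples, not output from this machine. It deliberately ignores suffixes such as "-pre4" or a distribution build number, so a pre-release is treated the same as the final release.

```python
import re


def kernel_tuple(release: str) -> tuple:
    """Extract (major, minor, patch) from a kernel release string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def at_least(release: str, minimum: str) -> bool:
    """True if `release` is the same as or newer than `minimum`.

    Only the numeric x.y.z part is compared; suffixes are ignored.
    """
    return kernel_tuple(release) >= kernel_tuple(minimum)


# Example release strings (hypothetical input).
print(at_least("2.4.20-8", "2.4.21"))  # False
print(at_least("2.4.26", "2.4.21"))    # True
```

Comparing tuples of integers avoids the trap of comparing version strings as text, where "2.4.9" would sort after "2.4.21".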
In any case, good luck and sorry I couldn't find you a quick fix.