LinuxQuestions.org - Slackware 10.2 HPT372 3 Drive RAID Storage failing

I have a problem... I have been a Linux user for almost 10 years, and I have run across things of this nature before, but now it is happening to me:

I have a good little system. It's kind of a dual boot, but not logically: I have a Windows drive on a tray and a Linux drive on a tray, and I shut down and swap them. I also have internal storage, which is now in the form of a 3-drive RAID in JBOD mode (Just a Bunch Of Disks, presented as one big disk). I have used this controller (which is built onto the MB) for single drives before, and now I have expanded because I need the extra storage to swap between Windows and Linux (it is only 80GB in total, but I don't have the $$ to buy a single 80GB drive).

As stated earlier, when there was only 1 drive, all was well and good, but now that there are 3, there is an issue. The drives are logically one, and it is formatted as FAT32 (so I can read and write from either OS... I had a bad experience with Captive NTFS, because my system is not Windows XP SP1, it's SP2).

I can mount my "RAID" (/dev/hde1) for a short period of time from the command line, list the contents, then unmount it. If I do any more than that, the drives get into a loop and I have to restart the machine. I am using KDE, and I know kded can do that sometimes, but I have tried shutting it down, to no avail.

Within my fstab, I have tried a few different configurations, and they all did the same thing; I left the old ones in, commented out, just to show.
My fstab is as follows:

/dev/hda3 swap swap defaults 0 0
/dev/hda2 / ext3 defaults 1 1
/dev/hda1 /boot ext3 defaults 1 2
/dev/hda4 /home reiserfs defaults 1 2

#################### RAID Devices ####################
#/dev/hde1 /mnt/raid-1 vfat noauto,users,exec,rw 1 0
#/dev/hde1 /mnt/raid-1 vfat defaults 1 0
/dev/hde1 /mnt/raid-1 vfat noauto,users,exec 0 0
#####################################################

#################### Removable Storage ###############
/dev/cdrom /mnt/cdrom auto noauto,owner,ro 0 0
/dev/fd0 /mnt/floppy auto noauto,owner 0 0
######################################################
devpts /dev/pts devpts gid=5,mode=620 0 0
proc /proc proc defaults 0 0
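
For anyone comparing the three /dev/hde1 variants above: each active fstab line has six whitespace-separated fields (device, mount point, filesystem type, options, dump flag, fsck pass). A minimal Python sketch that splits a line into those fields; the helper name parse_fstab_line and the FstabEntry type are made up for illustration and are not part of any tool mentioned in this thread:

```python
from typing import List, NamedTuple, Optional


class FstabEntry(NamedTuple):
    device: str
    mount_point: str
    fs_type: str
    options: List[str]
    dump: int
    fsck_pass: int


def parse_fstab_line(line: str) -> Optional[FstabEntry]:
    """Split one fstab line into its six fields.

    Returns None for blank lines and for comment lines starting with '#'.
    """
    text = line.strip()
    if not text or text.startswith("#"):
        return None
    fields = text.split()
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 fields, got {len(fields)}: {line!r}")
    device, mount_point, fs_type, options = fields[:4]
    # The last two fields are optional in fstab and default to 0.
    dump = int(fields[4]) if len(fields) > 4 else 0
    fsck_pass = int(fields[5]) if len(fields) > 5 else 0
    return FstabEntry(device, mount_point, fs_type, options.split(","), dump, fsck_pass)


if __name__ == "__main__":
    # The one uncommented RAID entry from the fstab above.
    print(parse_fstab_line("/dev/hde1 /mnt/raid-1 vfat noauto,users,exec 0 0"))
```

In the active entry, noauto means the volume is not mounted at boot and users lets any user mount or unmount it; the two commented-out variants differ only in the options and in the dump flag.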

This is just after a clean install (minus the home folder), so I have not added my DVD-ROM drive into the picture yet, but other than that, that should be it.

Thanks
Matt-

Quote:

Originally Posted by ERRDivideByZero

I can mount my "RAID" (/dev/hde1) for a short period of time, from the command line, and list the contents, then unmount it. if I do any more than that... the drives get into a loop and I have to re-start the machine.

Would love to help, but I would need more details. When you say "loop", what do you mean? Do you mean that the filesystem loops round to root again? Or that the computer goes into an infinite loop? I don't understand.

What do the logs say? Why not boot into single-user console mode and try it out and see what happens? Have you tried a scandisk (or equivalent) of the disks to check their integrity?

My bet is that your logs are screaming with all sorts of drive errors. I'd boot single-user mode and see what happens, while watching the system logs.

Additionally, you do understand that JBOD is not really RAID at all? There's certainly no redundancy, so if there is a problem on any one of those disks, chances are you've lost most of your data.

Additionally, a motherboard controller is not ideal as I bet it probably doesn't even allow you to access SMART information for the hard drives that are on it, which would mean that you wouldn't have had any warning at all of them being about to fail.

By loop, I mean that the drive just errors as it is running, and the only way that I can see them is if I shut the machine down (init 6 or 0). I don't know which log to look at, but I am sure you are right... they are probably about to explode. I did look at dmesg, though, and it has some interesting things in it:

ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
HPT372: IDE controller at PCI slot 00:0f.0
HPT372: chipset revision 5
HPT372: not 100% native mode: will probe irqs later
HPT37X: using 33MHz PCI clock
ide2: BM-DMA at 0xdc00-0xdc07, BIOS settings: hde: DMA, hdf: DMA
ide3: BM-DMA at 0xdc08-0xdc0f, BIOS settings: hdg: DMA, hdh: pio

## So it detects all of the drives above##

Partition check:
hda: hda1 hda2 hda3 hda4
hde: [PTBL] [3649/255/63] hde1: D
hdf: unknown partition table
hdg: unknown partition table

## But then here it has an issue with the partition table... because it is JBOD ##
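
A side calculation on the partition check above: the bracketed triple on the hde line is a cylinders/heads/sectors geometry, and multiplying it out gives the capacity the kernel is working with for hde. A small Python sketch, assuming the conventional 512-byte sector (the function name is made up for illustration):

```python
SECTOR_BYTES = 512  # assumption: conventional sector size for IDE disks of this era


def chs_capacity_bytes(cylinders: int, heads: int, sectors: int,
                       sector_bytes: int = SECTOR_BYTES) -> int:
    """Capacity implied by a cylinders/heads/sectors geometry."""
    return cylinders * heads * sectors * sector_bytes


if __name__ == "__main__":
    # Geometry copied from the dmesg line "hde: [PTBL] [3649/255/63]".
    capacity = chs_capacity_bytes(3649, 255, 63)
    print(f"{capacity} bytes = {capacity / 1e9:.1f} GB")
    # prints: 30014046720 bytes = 30.0 GB
```

That is about 30GB, well short of the 80GB total mentioned for the JBOD set. It would be consistent with Linux addressing hde as a single physical drive rather than as the whole array, but that is only a reading of the dmesg output, not a confirmed diagnosis.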

yes, I do realize there is no redundancy with the JBOD mode, but I need space right now... not redundancy, and I don't have the money to drop on a new drive... simple as that. I can see however if a drive is going to fail... and not to mention, I got a pretty good ear for it :-)

I am going to try single user mode shortly, I'll let you know how it works out.

Quote:

Originally Posted by ERRDivideByZero

By loop, I mean that the drive just errors as it is running and the only way that I can see them...

By them I mean the errors...

Single user mode did not change anything. I got the following error,
This happened after I tried to delete a small file on the hard disk (rm):

Code:

Jan 17 13:24:15 RITSUKO kernel: Filesystem panic (dev 21:01).
Jan 17 13:24:15 RITSUKO kernel:  fat_free: deleting beyond EOF
Jan 17 13:24:15 RITSUKO kernel:  File system has been set read-only

and if anyone could tell me WTF this is, that would be great:

Code:

Jan 17 13:25:20 RITSUKO insmod: /lib/modules/2.4.31/kernel/drivers/hotplug/pciehp.o.gz: insmod pciehp failed
Jan 17 13:25:20 RITSUKO kernel: shpchp: shpc_init : shpc_cap_offset == 0
Jan 17 13:25:20 RITSUKO insmod: /lib/modules/2.4.31/kernel/drivers/hotplug/shpchp.o.gz: init_module: No such device
Jan 17 13:25:20 RITSUKO insmod: /lib/modules/2.4.31/kernel/drivers/hotplug/shpchp.o.gz:
Hint: insmod errors can be caused by incorrect module parameters, including invalid IO or IRQ parameters.
You may find more information in syslog or the output from dmesg

Well... that's probably going to be it for me for now.
Thanks again
Matt-

Quote:

Originally Posted by ERRDivideByZero

Single user mode did not change anything. I got the following error,
This happened after I tried to delete a small file on the hard disk (rm):

Code:

Jan 17 13:24:15 RITSUKO kernel: Filesystem panic (dev 21:01).
Jan 17 13:24:15 RITSUKO kernel:  fat_free: deleting beyond EOF
Jan 17 13:24:15 RITSUKO kernel:  File system has been set read-only

This means that you really need to scan that filesystem (Windows ScanDisk or some other suitable utility). Either the filesystem is corrupt (likely some of your files will disappear or be corrupted) or Linux is not seeing all of the disk correctly.
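
To tie the panic message to a device: 2.4-series kernels print the device as major:minor in hexadecimal, so "dev 21:01" can be decoded by hand. A short Python sketch of that decoding; the table covers only the four classic IDE interfaces, and the helper name decode_kdev is hypothetical:

```python
from typing import Optional, Tuple

# Block major numbers for the first four IDE interfaces and the
# (master, slave) drive names on each.
IDE_MAJORS = {
    3: ("hda", "hdb"),
    22: ("hdc", "hdd"),
    33: ("hde", "hdf"),
    34: ("hdg", "hdh"),
}


def decode_kdev(text: str) -> Tuple[int, int, Optional[str]]:
    """Decode a 'MM:mm' hex pair into (major, minor, device name or None)."""
    major_hex, minor_hex = text.split(":")
    major = int(major_hex, 16)
    minor = int(minor_hex, 16)
    drives = IDE_MAJORS.get(major)
    if drives is None or minor >= 128:
        return major, minor, None
    # On IDE, minors 0-63 belong to the master and 64-127 to the slave;
    # the remainder is the partition number (0 means the whole disk).
    drive = drives[minor // 64]
    partition = minor % 64
    name = f"/dev/{drive}{partition}" if partition else f"/dev/{drive}"
    return major, minor, name


if __name__ == "__main__":
    print(decode_kdev("21:01"))  # prints: (33, 1, '/dev/hde1')
```

So the panic refers to /dev/hde1, the same device as the vfat entry in the fstab, rather than to anything on hda.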

The other errors you posted are only relevant if you are using PCI-Express cards in your machine (most probably not).

You could also try commands like:

smartctl -a /dev/hde

which should return lots of info on the health of that drive if the controller supports it.

Personally, I keep an eye on the SMART values for all my disks, plus /var/log/syslog and /var/log/messages. So far, in five years, my drives have put up one single uncorrectable error (a single corrupt byte on the disk), but I know from my work in schools that disks can start dying without you noticing if you don't check the info regularly (one disk had reallocated every single corrupt sector it could, so the next corrupt sector would have meant permanent data loss).

The file system is fine... as stated, it works fine in Windows: I can read and write without any issues. And smartctl only looks at the first drive on the chain anyway, so I would not be able to see if the other drives were going if that happened... at least until my system did a POST.

At any rate, as far as the other errors go... I DO NOT have PCI-Express cards or even slots on my MB. I don't know why that shit comes up at startup; it is not something I compiled (fresh install), but whatever.

I was thinking about this earlier, maybe I should re-compile the HPT372 Drivers.
lemme know what you think.
Matt-

Quote:

Originally Posted by ERRDivideByZero

SMARTCTL only looks at the first drive on the chain anyway, so I would not be able to see if the other drives were going if that happened

True, but looking at the raw devices I can see them all.... and they all Pass the SMART test.
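
Checking each member drive separately, as described here, could be scripted along these lines. This is a rough Python sketch, not what was actually run: it assumes smartctl (from smartmontools) is installed, that the three drives are hde, hdf and hdg as in the dmesg output earlier, and that the controller passes SMART commands through. The function names are hypothetical.

```python
import shutil
import subprocess
from typing import List

# Assumption: the three JBOD member drives as listed in the dmesg output.
DRIVES = ["/dev/hde", "/dev/hdf", "/dev/hdg"]


def smartctl_commands(drives: List[str]) -> List[List[str]]:
    """Build one 'smartctl -a <device>' command per drive."""
    return [["smartctl", "-a", drive] for drive in drives]


def run_all(drives: List[str]) -> None:
    """Run smartctl against every drive and print its report."""
    if shutil.which("smartctl") is None:
        print("smartctl not found; is smartmontools installed?")
        return
    for cmd in smartctl_commands(drives):
        print("==>", " ".join(cmd))
        # smartctl uses non-zero exit codes to flag warnings, so the
        # return code is reported instead of being treated as a failure.
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(result.stdout)
        print("exit status:", result.returncode)


if __name__ == "__main__":
    run_all(DRIVES)
```

It will usually need to be run as root, since smartctl talks to the raw devices.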

I am going to see if I can get this system to work with Fedora Core 3... that is the last time it worked properly with multiple drives in FAT32 JBOD mode. I'll let you know (maybe the 2.6 kernel will play nice with me after I get the updates).

Yeah, well, it does not detect my drives properly in the setup, so I think I am going to go to sleep first, then figure this out.

after I get up I will probably install and update, then make sure the HPT drivers are properly installed and go from there.

After all the updates and installing the HighPoint software (whose daemon only stays running for about 2 seconds), I still cannot get this to work right... I may just go back to Slackware. I don't know yet, but I still have Fedora working at least half right.

... time for a new, larger hard drive... come on tax returns!
-Matt