Slackware 10.2 HPT372 3 Drive RAID Storage failing
I have a problem... I have been a Linux user for almost 10 years, and I have run across things of this nature before... but now it is happening to me:
I have a good little system. It's kind of a dual boot, but not logically: I have a Windows drive on a tray and a Linux drive on a tray, and I shut down and swap them. I then have internal storage, which is now a 3-drive RAID in JBOD mode (Just a Bunch Of Disks, presented as one big disk). I have used this controller (which is built onto the MB) with single drives before, and now I have expanded because I need the extra storage to swap data between Windows and Linux (it is only 80GB total, but I don't have the $$ to buy a single 80GB drive). As stated, when there was only 1 drive all was well and good, but now that there are 3, there is an issue. The drives are logically one, formatted as FAT32 so I can read and write from either OS (I had a bad experience with Captive NTFS, because my system is not Windows XP SP1, it's SP2). From the command line I can mount my "RAID" (/dev/hde1) for a short period of time, list the contents, then unmount it. If I do any more than that, the drives get into a loop and I have to restart the machine. I am using KDE, and I know kded can do that sometimes, but I have tried shutting it down, to no avail. In my fstab I have tried a few different configurations, and they have all done the same thing, but I left them in, commented out, just to show.
my fstab is as follows:
/dev/hda3 swap swap defaults 0 0
/dev/hda2 / ext3 defaults 1 1
/dev/hda1 /boot ext3 defaults 1 2
/dev/hda4 /home reiserfs defaults 1 2
#################### RAID Devices ####################
#/dev/hde1 /mnt/raid-1 vfat noauto,users,exec,rw 1 0
#/dev/hde1 /mnt/raid-1 vfat defaults 1 0
/dev/hde1 /mnt/raid-1 vfat noauto,users,exec 0 0
#####################################################
#################### Removable Storage ###############
/dev/cdrom /mnt/cdrom auto noauto,owner,ro 0 0
/dev/fd0 /mnt/floppy auto noauto,owner 0 0
######################################################
devpts /dev/pts devpts gid=5,mode=620 0 0
proc /proc proc defaults 0 0
This is just after a clean install (minus the home folder), so I have not added my DVD-ROM drive into the picture yet, but other than that, that should be it.
Thanks
Matt- |
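Each fstab entry above has six whitespace-separated fields: device, mount point, filesystem type, options, dump flag, and fsck pass. As a minimal sketch, assuming one entry per line, the Python below splits entries into those fields and skips the commented-out attempts. `FstabEntry`, `parse_fstab`, and `SAMPLE` are hypothetical names used only for illustration; they are not part of any tool mentioned in this thread.

```python
from typing import List, NamedTuple


class FstabEntry(NamedTuple):
    device: str
    mount_point: str
    fs_type: str
    options: List[str]
    dump: int
    fsck_pass: int


def parse_fstab(text: str) -> List[FstabEntry]:
    """Parse fstab text, ignoring blank lines and '#' comment lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # malformed line: skip it rather than guess
        # The fifth and sixth fields are optional and default to 0.
        dump = int(fields[4]) if len(fields) > 4 else 0
        fsck_pass = int(fields[5]) if len(fields) > 5 else 0
        entries.append(
            FstabEntry(fields[0], fields[1], fields[2],
                       fields[3].split(","), dump, fsck_pass)
        )
    return entries


# Three lines copied from the fstab in the post (one of them commented out).
SAMPLE = """\
/dev/hda2 / ext3 defaults 1 1
#/dev/hde1 /mnt/raid-1 vfat defaults 1 0
/dev/hde1 /mnt/raid-1 vfat noauto,users,exec 0 0
"""

for entry in parse_fstab(SAMPLE):
    print(entry.device, entry.mount_point, entry.fs_type,
          entry.options, entry.fsck_pass)
```

For the active /dev/hde1 line, `noauto` means the volume is not mounted at boot, and the final 0 means fsck does not check it at startup.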
What do the logs say? Why not boot into single-user console mode, try it out, and see what happens? Have you tried a scandisk (or equivalent) of the disks to check their integrity? My bet is that your logs are screaming with all sorts of drive errors. I'd boot single-user mode and see what happens, while watching the system logs. Additionally, you do understand that a JBOD is not really a RAID at all? There's certainly no data integrity, so if there is a problem on any one of those disks, chances are you've lost most of your data. Also, a motherboard controller is not ideal, as I bet it doesn't even allow you to access SMART information for the hard drives that are on it, which would mean that you wouldn't have had any warning at all of them being about to fail. |
By loop, I mean that the drive just errors as it is running, and the only way that I can see them again is if I shut the machine down or reboot (init 0 or 6). I don't know which log to look at, but I am sure you are right... they are probably about to explode... though I looked at dmesg and it has some interesting things in it:
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
HPT372: IDE controller at PCI slot 00:0f.0
HPT372: chipset revision 5
HPT372: not 100% native mode: will probe irqs later
HPT37X: using 33MHz PCI clock
ide2: BM-DMA at 0xdc00-0xdc07, BIOS settings: hde: DMA, hdf: DMA
ide3: BM-DMA at 0xdc08-0xdc0f, BIOS settings: hdg: DMA, hdh: pio
## So it detects all of the drives above ##
Partition check:
hda: hda1 hda2 hda3 hda4
hde: [PTBL] [3649/255/63] hde1: D
hdf: unknown partition table
hdg: unknown partition table
## But then here it has an issue with the partition table... because it is JBOD ##
Yes, I do realize there is no redundancy with the JBOD mode, but I need space right now... not redundancy, and I don't have the money to drop on a new drive... simple as that. I can see however if a drive is going to fail... and not to mention, I got a pretty good ear for it :-) I am going to try single user mode shortly, I'll let you know how it works out. |
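The bracketed triple in the hde line of the dmesg output can be read as cylinders/heads/sectors. As a rough worked calculation, assuming 512-byte sectors (the log does not state the sector size), the sketch below converts that geometry into a capacity. `chs_capacity_bytes` is a hypothetical helper name, not a kernel or library function.

```python
SECTOR_BYTES = 512  # assumption: classic IDE sector size, not stated in the log


def chs_capacity_bytes(cylinders: int, heads: int, sectors_per_track: int,
                       sector_bytes: int = SECTOR_BYTES) -> int:
    """Capacity implied by a cylinders/heads/sectors geometry."""
    return cylinders * heads * sectors_per_track * sector_bytes


# Geometry copied from the "hde: [PTBL] [3649/255/63]" line above.
size = chs_capacity_bytes(3649, 255, 63)
print(f"{size} bytes, about {size / 1e9:.1f} GB")
```

That works out to roughly 30 GB, well short of the 80GB total mentioned in the first post. This may suggest, though it does not prove, that the kernel is addressing hde as a single disk rather than as the spanned JBOD volume.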
Single user mode did not change anything. I got the following error. This happened after I tried to delete a small file on the hard disk (rm):
Code:
Jan 17 13:24:15 RITSUKO kernel: Filesystem panic (dev 21:01).
Code:
Jan 17 13:25:20 RITSUKO insmod: /lib/modules/2.4.31/kernel/drivers/hotplug/pciehp.o.gz: insmod pciehp failed
Thanks again
Matt- |
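The "dev 21:01" in the filesystem panic can be tied back to a device node. As a small sketch, assuming the 2.4 kernel prints the pair as hexadecimal major:minor and that the IDE majors follow the kernel's standard device list (3, 22, 33, 34 for the first four interfaces, with 64 minors per disk), the Python below decodes it. `decode_kdev`, `ide_device_name`, and `IDE_DISKS` are hypothetical names for illustration only.

```python
# Assumed mapping of IDE major numbers to (master, slave) disk names.
IDE_DISKS = {
    3: ("hda", "hdb"),
    22: ("hdc", "hdd"),
    33: ("hde", "hdf"),
    34: ("hdg", "hdh"),
}


def decode_kdev(text: str):
    """Split a 'MM:mm' hex pair into integer (major, minor)."""
    major_hex, minor_hex = text.strip().split(":")
    return int(major_hex, 16), int(minor_hex, 16)


def ide_device_name(major: int, minor: int) -> str:
    """Map an IDE major/minor pair to a /dev name (64 minors per disk)."""
    if major not in IDE_DISKS:
        raise ValueError(f"major {major} is not in the assumed IDE table")
    master, slave = IDE_DISKS[major]
    disk = master if minor < 64 else slave
    partition = minor % 64
    return f"/dev/{disk}{partition}" if partition else f"/dev/{disk}"


major, minor = decode_kdev("21:01")
print(major, minor, ide_device_name(major, minor))
```

Under those assumptions the panic points at /dev/hde1, the same partition that is mounted on /mnt/raid-1 in the fstab.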
The other errors you posted are only relevant if you are using PCI-Express cards in your machine (most probably not). You could also try commands like: smartctl -a /dev/hde which should return lots of info on the health of that drive if the controller supports it. Personally, I keep an eye on the SMART values for all my disks, plus /var/log/syslog and /var/log/messages. So far, in five years my drives have put up one single uncorrectable error (a single corrupt byte on the disk), but I know from my work in schools that disks can start dying without you noticing if you don't check the info regularly (one disk had reallocated every single corrupt sector it could, so the next corrupt sector would have meant permanent data loss). |
The file system is fine... as stated, it works fine in Windows, I can read and write without any issues, and smartctl only looks at the first drive on the chain anyway, so I would not be able to see if the other drives were going... at least until my system did a POST.
At any rate... as far as the other errors go... I DO NOT have PCI-Express cards or even slots on my MB. I don't know why that shit comes up at startup, but it is not something I compiled... fresh install, but whatever. I was thinking about this earlier, maybe I should re-compile the HPT372 drivers. Lemme know what you think. Matt- |
I am going to see if I can get this system to work with Fedora Core 3... that is the last time it worked properly with multiple drives in FAT32 JBOD mode. I'll let you know (maybe the 2.6 kernel will play nice with me after I get the updates).
|
Yeah, well, it does not detect my drives properly in the setup. I think I am going to go to sleep first, then figure this out.
After I get up I will probably install and update, then make sure the HPT drivers are properly installed and go from there. |
After all the updates and installing the HighPoint software (the daemon only stays running for about 2 seconds), I still cannot get this to work right... I may just go back to Slackware... I don't know yet, but I still have Fedora working at least half right.
... time for a new, larger hard drive... come on tax returns! -Matt |