a story of RAID, SuSe, and ReiserFS
hardware: Supermicro 6013P-8, Adaptec 2015S RAID card, 3 Seagate 34GB SCSI HDDs
A server I was using for a very popular forum was hacked, my fault for not updating SuSe 9.2 and the several popular PHP packages that made the break-in possible. Since this was more than just a defaced forum, I needed to do a flat reinstall regardless. The server had been very slow over the last couple of weeks, mainly from being compromised, I believe. I was using the hardware RAID controller and the drives were set up as a RAID5 formatted as ReiserFS.
Now, when I took the box down at the NOC it was still serving, although compromised. When I tried to get it to boot, it would only come up in safe mode, and issuing any command just gave segmentation faults. (A failing HDD may have taken the box down rather than the blackhat.)
So there I sat with the box and my brand spanking new 10.1 CDs from Novell. I set it up under the hardware RAID5 using ReiserFS, just like with 9.2. I got all the way through the install and pressed "finished", and then the server locked up. The last line displayed mentioned copying swap. The HDD cylinder light was blinking, along with the individual drive lights in a half-amber state. So I tried the whole process over. I couldn't get through a full install, and the server locked up at different points in the process.
Also, in order to get through the partitioning page, I'd have to go into the Adaptec utility on startup, remove the RAID5 array, then re-create it.
My sysadmin friend tells me to try downloading the ISOs from ftp.suse, since she had experienced some CD drives not reading the new disks due to the coating. So I tried that and finally did get the installation to complete. But it would still lock up on me at different times while I was trying to admin it. The box didn't even have a load on it and it would lock up just sitting there.
So my sysadmin friend (she lives 3000 miles away) tells me she's not keen on the RAID controller card and has seen more problems with it than it's worth. She also suggested it could be a hardware issue, like a problem drive.
So I removed the 2015S card, then started up the install again. On the partitioning page, it said it didn't understand the drive partitioning, but it did show the middle drive as available to use. Since this was RAID5, all 3 drives should have been giving the same problem. So here I started feeling I had a bad drive. At this point I'm still using ReiserFS formatting.
So I go back in, install the 2015S RAID controller, delete the array, power down, then remove the card again and try going forward with a software RAID approach, still ReiserFS. Why ReiserFS? It's the default for the install, so you'd think it would be the best to use. Now I'm going to try to use one drive for / and the other 2 drives in a software RAID1 for /srv (the content I need redundancy for).
I get the same result again, but now it's the RAID1 on these 2 drives that fails when the box locks up. I couldn't tell which drive this time, but I'm gonna go with the first hint above, so I remove the middle drive and do the install on only the first drive, just to get the box up. WOW! Just like the fairy tale should go. Easy install, the box doesn't lock.
Now, the MySQL backups I had on the box were MySQL 4.0 but made using the mysqlhotbackup method, so I had to take the third drive and install 9.2 on it just so I could come up and then do a proper mysqldump to import into MySQL 5.0 under SuSe 10.1. The install went perfectly again, and so did all the mysqldumping etc. So the 3rd drive is fine, it seems. INTERESTINGLY ENOUGH, to get the files off the initial RAID5 I actually booted the machine off the LIVE 9.3 CD, mounted the array on /mnt, and copied what I needed to a USB drive for safekeeping. The box didn't lock up during any of this, thankfully.
My sysadmin friend now tells me to forget ReiserFS and even redo the first drive under EXT3, or she won't help me when that ReiserFS drive crashes. She's telling me about all the problems she's having getting data off other ReiserFS drives that have failed.
So when I get that middle drive RMA'd (advanced replacement) through Seagate, I'm gonna do a RAID1 using EXT3 on those 2 drives for /srv as planned. Note: I will also have MySQL data written to /srv, so this RAID config will see a lot of work.
Now, what I'd like to know from your experience dealing with these variables: is it the 2015S RAID controller, the ReiserFS format, or SuSe 10.1 at fault? Could it have been the ReiserFS format that made the one drive fall out of the array? It did the same thing whether I used hardware or software RAID. Is it better to just use EXT3 in a RAID array regardless? Is it better to use the 2015S card than SuSe software RAID, assuming EXT3?
So what exactly are those GREEN/RED lights on the HDDs for? When will the red light come on? If this drive really is faulty, why no red light? How exactly are you supposed to diagnose a problem in a RAID5 if the light doesn't turn red?
And how do you monitor drive status in a RAID1, to determine which drive has failed?
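For the software RAID case at least, here's a rough sketch of the kind of check I have in mind. It assumes Linux md software RAID, where the kernel reports array state in /proc/mdstat: a member marked (F) has been flagged faulty, and an underscore in the [UU] status field means a slot is missing. The function name degraded_arrays is just something made up for illustration, and this says nothing about arrays behind the 2015S card, which would need the controller's own tools.

```python
import os
import re


def degraded_arrays(mdstat_text):
    """Return {array_name: [faulty_members]} for md arrays that look unhealthy.

    An array is reported if any member is flagged (F) or if its status
    field (e.g. [U_]) contains an underscore, meaning a slot is missing.
    """
    arrays = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:\s*(.*)$", line)
        if header:
            current = header.group(1)
            # Members look like sdb1[0]; a trailing (F) marks a faulty one.
            failed = re.findall(r"(\w+)\[\d+\]\(F\)", header.group(2))
            arrays[current] = {"failed": failed, "degraded": False}
            continue
        # Status line, e.g. "35840896 blocks [2/1] [U_]"
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current is not None:
            if "_" in status.group(3):
                arrays[current]["degraded"] = True
    return {
        name: info["failed"]
        for name, info in arrays.items()
        if info["degraded"] or info["failed"]
    }


if __name__ == "__main__":
    path = "/proc/mdstat"
    if os.path.exists(path):
        with open(path) as f:
            problems = degraded_arrays(f.read())
        for name, failed in problems.items():
            print(name, "is degraded; faulty members:", failed or "unknown")
```

Something like this could be run from cron and set to send mail whenever it prints anything, so a dropped drive gets noticed without anyone staring at the lights.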
Last edited by Boss Hoss; 07-09-2006 at 11:39 PM.