
Barefootpanda 10-23-2008 01:54 PM

Drive issues or LVM issues?
Hey guys,
I'm not sure where to start here. All the SMART data is fine and reports normal all the time, but about once a week my machine will go haywire.

My setup:
2x 250 GB in VolGroup00 (no problems)
1x 1 TB in VolGroup01 (I can't remember if there are problems, but I want to say it's fine)
1x 1 TB in VolGroup02 (gets I/O issues about once a week, but doesn't usually have to go through the fsck)
2x 1 TB in VolGroup03 (once a week this one seems to need the fsck forced on reboot)

VolGroup00 << On Board SATA
VolGroup01 & 02 << PCIe SATA Card
VolGroup03 & 04 << PCIe SATA Card
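Before testing drives one at a time, it helps to know which physical disk sits under which volume group. A minimal Python sketch, assuming you save the output of `pvs --noheadings --separator , -o pv_name,vg_name` (run as root) and feed it in; the device names in the sample are hypothetical.

```python
from collections import defaultdict


def pvs_to_groups(pvs_output: str) -> dict[str, list[str]]:
    """Return {volume group: [physical volumes]} from comma-separated pvs output."""
    groups: dict[str, list[str]] = defaultdict(list)
    for line in pvs_output.splitlines():
        line = line.strip()
        if not line or "," not in line:
            continue  # skip blank or unexpected lines
        pv_name, vg_name = (field.strip() for field in line.split(",", 1))
        groups[vg_name].append(pv_name)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical output; real device names depend on how the cards enumerate.
    sample = """
      /dev/sda2,VolGroup00
      /dev/sdb1,VolGroup00
      /dev/sdc1,VolGroup01
      /dev/sdd1,VolGroup02
      /dev/sde1,VolGroup03
      /dev/sdf1,VolGroup03
    """
    for vg, pvs in sorted(pvs_to_groups(sample).items()):
        print(vg, "->", ", ".join(pvs))
```

With that map in hand, an error reported against a single `/dev/sdX` can be tied back to VolGroup02 or VolGroup03.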

The part I find odd is that I can browse VolGroup03 just fine, and 02 is the one that has an I/O issue. When I reboot, the system usually clears the issue and gets back to business, but today the automatic fsck didn't finish correctly and forced me to run fsck from a shell. It has been running all day and shows no sign of stopping soon; 2 TB of data takes a while.
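Since the symptoms show up at the volume-group level, the kernel log is a reasonable place to see which disk is actually throwing errors. A small Python sketch, assuming the text comes from `dmesg` or `/var/log/messages` and that the error lines contain `I/O error, dev sdX, sector N` (the prefix, e.g. `end_request:`, varies between kernel versions); the sample log lines are hypothetical.

```python
import re
from collections import Counter

# Matches the common part of block-layer I/O error lines across kernel versions.
IO_ERROR = re.compile(r"I/O error, dev (\w+), sector (\d+)")


def io_errors_by_device(log_text: str) -> Counter:
    """Return a Counter of {device name: number of I/O error lines}."""
    counts: Counter = Counter()
    for match in IO_ERROR.finditer(log_text):
        counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical log excerpt for illustration only.
    sample = (
        "end_request: I/O error, dev sdd, sector 123456\n"
        "end_request: I/O error, dev sdd, sector 123464\n"
        "end_request: I/O error, dev sde, sector 987\n"
    )
    for device, n in io_errors_by_device(sample).most_common():
        print(device, n)
```

If one device dominates the count, that drive (or its port) is the first suspect; errors spread evenly over several disks on the same card point elsewhere.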

I haven't had any odd noises or warnings. Are there any utilities I should be running to check status or verify stuff? Is there a way to check individual drives w/o destroying the LVM?
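On the "check individual drives without destroying the LVM" question: checks that only read from a member disk leave the LVM metadata alone. A hedged Python sketch, assuming smartmontools and `badblocks` (from e2fsprogs) are installed: `smartctl -H` reads the health verdict, `smartctl -t long` starts the drive's own extended self-test, and `badblocks -sv` without `-w` or `-n` does a read-only scan, which takes many hours on a 1 TB disk. The parser only understands the ATA-style verdict line, and some add-in SATA cards may need an extra `-d` device-type option for smartctl.

```python
import subprocess
from typing import Optional


def check_commands(device: str) -> list[list[str]]:
    """Read-only checks for one member disk (none of these write to it)."""
    return [
        ["smartctl", "-H", device],          # overall health verdict
        ["smartctl", "-t", "long", device],  # starts the drive's extended self-test
        ["badblocks", "-sv", device],        # read-only surface scan (no -w / -n)
    ]


def smart_health_passed(smartctl_output: str) -> Optional[bool]:
    """True/False from ATA 'smartctl -H' output, or None if no verdict line is found."""
    for line in smartctl_output.splitlines():
        if "overall-health self-assessment test result:" in line:
            return line.rsplit(":", 1)[1].strip() == "PASSED"
    return None


def run_health_check(device: str) -> Optional[bool]:
    # Needs root and real hardware; not called in the example below.
    result = subprocess.run(["smartctl", "-H", device],
                            capture_output=True, text=True, check=False)
    return smart_health_passed(result.stdout)


if __name__ == "__main__":
    for cmd in check_commands("/dev/sdd"):  # hypothetical device name
        print(" ".join(cmd))
```

Running the checks per disk, rather than per volume group, is what lets you narrow VolGroup03's two drives down to one.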

This is a backup server, and I have some clients that are running larger RAIDs and need more than 1 TB of storage, so for now I'm putting them on the LVM group instead of doing a hardware RAID.

farslayer 10-23-2008 02:27 PM

Typically drive manufacturers have non-destructive diagnostic tools to test a drive.

Search for SeaTools; it's a bootable DOS ISO that can test Seagate and Maxtor drives.
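If booting a vendor ISO isn't convenient, the drive's own extended self-test can be started from Linux (`smartctl -t long /dev/sdX`) and its result read back later with `smartctl -l selftest /dev/sdX`. A rough Python sketch that parses that log, assuming the ATA-style table layout shown in the sample (columns separated by two or more spaces, newest entry first); the sample rows are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# "# 1  Extended offline    Completed without error       00%  ..."
ROW = re.compile(r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s{2,}(\d+)%")


class SelfTest(NamedTuple):
    number: int
    description: str
    status: str
    remaining_pct: int


def parse_selftest_log(text: str) -> list[SelfTest]:
    """Extract the numbered rows of a 'smartctl -l selftest' table."""
    tests = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            tests.append(SelfTest(int(m.group(1)), m.group(2),
                                  m.group(3), int(m.group(4))))
    return tests


def latest_long_test_ok(text: str) -> Optional[bool]:
    """True if the newest extended test completed cleanly, None if none is logged."""
    for test in parse_selftest_log(text):  # assumes newest entry is listed first
        if test.description.startswith("Extended"):
            return test.status == "Completed without error"
    return None


if __name__ == "__main__":
    sample = """Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      1234         12345678
# 2  Short offline       Completed without error       00%      1200         -
"""
    print(latest_long_test_ok(sample))
```

A read failure with an LBA in the last column is the kind of result a vendor tool would also flag.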

Then I would look into hardware issues with your controller, just to be on the safe side. I have a couple of 8-port SATA RAID controllers here that had a manufacturing defect that caused drives to drop offline for no reason; once two drives dropped offline at the same time, the array was destroyed. It took two years for the issue to show up on my system, and I lost everything (thank goodness I had backups). The manufacturer had recalled the defective boards about 6 months after I bought mine and replaced them for people who called in. The manufacturer didn't notify anyone directly (so tell me why I should bother registering my products again?), so I never saw the recall notice. Grrrr.

I still have the two new replacement controllers sitting here in my office. I had a spare controller on the shelf in case of failure, but the spare was the same hardware version as the original. After losing a 2 TB array I've been a bit leery of using them again, even though the issue has been addressed.

The point of this little tale is that there could be a hardware defect in the controller you have, and unless you do some searching to find out, no one will ever notify you. Why should I care? I didn't need that SQL Server anyway... it only contained the database for our ERP system.
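One way to tell a flaky controller from a flaky disk is to look at where link problems cluster. A hedged Python sketch that counts libata link events per port, assuming kernel log lines such as `ata5: hard resetting link`, `ata5: SATA link down (...)` or `ata5.00: exception Emask ...` (wording differs between kernel versions); mapping an `ataN` port to a particular PCIe card is left out here, and the sample lines are hypothetical.

```python
import re
from collections import Counter

# "ata5: hard resetting link", "ata5.00: exception Emask 0x10 ...", etc.
LINK_EVENT = re.compile(
    r"\b(ata\d+)(?:\.\d+)?: (hard resetting link|SATA link down|exception Emask)"
)


def link_events_by_port(log_text: str) -> Counter:
    """Return a Counter of {ataN port: number of link/exception events}."""
    counts: Counter = Counter()
    for m in LINK_EVENT.finditer(log_text):
        counts[m.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "ata5.00: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen\n"
        "ata5: hard resetting link\n"
        "ata6: hard resetting link\n"
        "ata6: SATA link down (SStatus 0 SControl 300)\n"
    )
    print(link_events_by_port(sample))
```

Resets hitting several ports behind the same card, each with a different drive attached, point more toward the controller (or its cabling and power) than toward any single disk.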

On the up side, it finally got management to pry open their wallet, and allow me to BUY decent servers instead of forcing me to piece together whitebox franken-servers out of off the shelf parts..

Best of luck tracking down your phantom issue.

All times are GMT -5.