LinuxQuestions.org
Old 09-04-2006, 12:34 PM   #1
patniemeyer
LQ Newbie
 
Registered: Sep 2006
Posts: 15

Rep: Reputation: 0
RAID 5 Disk Loss every 6-8 months normal?


I have been running a redhat 8 box with a 3ware 7506-4 card and 4x 250GB WD drives as a RAID 5 for almost three years now. I find that I lose a disk on average every 6-8 months. If it were once a year maybe I would think this is normal... but this is really making me lose confidence in my RAID.

I have tried several things:

- I mounted a giant fan on the side of the box... the drives run at almost *room temperature*.
- I upgraded the power supply to make sure it was adequate (and it's on a UPS).

The motherboard is an ASUS A7V with the KT-133 chipset.

Are RAIDs just this unstable?

I run validation checks on the disks about 3 hours a week. Aside from that they are pretty much idle.

Should I configure my BIOS to let the drives sleep when idle? I wasn't sure if the 3ware card would like that or not and I'm afraid to try it.

Very frustrated, any suggestions appreciated.


thanks,
Pat
 
Old 09-04-2006, 02:14 PM   #2
Electro
LQ Guru
 
Registered: Jan 2002
Posts: 6,042

Rep: Reputation: Disabled
It is not usual for hard drives to fail. Today's hard drives are mechanical. Mechanical drives do best when they are either fully on or fully off, so do not put them in any power-saving mode or they will fail sooner than you think. If you do not want a hard drive to wear out, turn it off.

RAID-5, RAID-1, RAID-6, RAID-10, RAID-15, and RAID-16 only reduce the chance of losing data. They do not prevent it.

What do you mean by upgrading the power supply? Did you just increase the wattage, or did you get a better power supply? I recommend power supplies from Zalman, PC Power & Cooling, Enermax, and Seasonic. I have a Seasonic S12-430 (430 watts). It is noiseless, has active power factor correction, accepts universal voltage, and regulates voltage to within 1%. I do not recommend power supplies from Antec because they are crap for short and long periods of time.

There are several kinds of UPS. I prefer that people get an in-line UPS instead of a stand-by UPS, which is what the majority of UPSes are. Stand-by UPSes are OK, but not for servers. An in-line UPS always feeds the load from the battery instead of switching over to the battery the way a stand-by UPS does.
 
Old 09-04-2006, 03:24 PM   #3
lazlow
Senior Member
 
Registered: Jan 2006
Posts: 4,363

Rep: Reputation: 172Reputation: 172
Pat

I assume you are checking your drive temps with hddtemp?

You can check your power supply voltages with lm_sensors. It is kind of a bear to get dialed in (getting true temps and voltages, 35°C vs 37°C, etc.), but even without dialing it in, it will tell you if your voltages are jumping around.

Quality power supplies can make a huge difference on the number of unexplained problems that occur. Opinions vary on what power supplies are the best. Check a review site that you trust and read the reviews. Some people will tell you that review sites are biased. They are, but if a site is biased too much it will not be around for very long.

I seem to remember that WD makes multiple versions of a lot of drives. Some are specifically designed for raid/server use, while others are meant for desktops. Maybe you would have better luck with the other type of drive.

Good Luck

Lazlow
 
Old 09-05-2006, 01:05 AM   #4
patniemeyer
LQ Newbie
 
Registered: Sep 2006
Posts: 15

Original Poster
Rep: Reputation: 0
Thanks for the comments.

After mounting the giant fan (I literally cut an 8" hole in the side of the case and mounted a Radio Shack 12v fan there) the drives run at literally about room temperature. I clean the case every few months to get rid of dust, etc.

By upgrading the power supply I mean I just went to CompUSA and bought one of a higher wattage... according to the 3ware estimate of power requirements under startup / load etc.

The UPS is an APC 650 something.

I guess I'll leave the drive sleep option off, as it always has been.

I am thinking of building a new raid at some point to upgrade to 500GB SATA drives and the latest/greatest 3ware card. Does anyone have a recommendation on an ideal setup for this? I'll have to check for supported motherboards, etc.

What I would like is a good case and power supply to support 4 disks with plenty of cooling, etc.


thanks,
Pat
 
Old 09-05-2006, 01:38 AM   #5
lazlow
Senior Member
 
Registered: Jan 2006
Posts: 4,363

Rep: Reputation: 172Reputation: 172
Pat

What distro are you running? If it is FC, lm_sensors and hddtemp are both available through yum. These two apps are available for most distros. Before you invest a lot more money, I would suggest you try to figure out where you are "short" with this setup. If you are having a motherboard cooling problem, lm_sensors will tell you very quickly. Same for voltage variation. Hddtemp will tell you a lot about your drives. If your drives are running hard for short bursts, you may not be able to detect how hot they are actually getting by touch. If you find a problem this way, you will be able to fix it quickly or know where you need to allocate more resources on the next machine.


You have to be careful with CompUSA (or any large dealer). A lot of times stuff will get returned and they will just re-shrinkwrap it and sell it to someone else. Their house-brand power supplies are usually not very good (my best guess at your drive problem). While I am not a huge Seasonic fan, they do make a very good power supply (S12 series). If modular plugs are a requirement, they have a new model coming out (M12 series) on Sept 20 (I think) in the US.

When you add fans to a case you need to balance the in vs out air flow. Adding a single 120mm fan sucking in or blowing out will only help a little. Try to always add fans in pairs: one sucking in at the bottom end of the case on one side (say the front) and one blowing out at the top end of the case on the other side (say the back). This keeps a flow of air going through as much of the case as possible, dragging heat out with it. Remember, heat rises. I always use 120mm fans when I physically can. The noise to air flow ratios are much better that way. I have picked up a lot of cases for $10 that people had heat problems with. Add two 120mm fans as described and absolutely no more heat problem. Yes, I still pick up PIIIs, rework them for $75, and resell them for $300. While I am only selling a couple per month now, it is still fun (and the excess cash buys more toys).

Lazlow
 
Old 09-05-2006, 03:10 AM   #6
J.W.
LQ Veteran
 
Registered: Mar 2003
Location: Boise, ID
Distribution: Mint
Posts: 6,642

Rep: Reputation: 87
Hard drive failures after only 6-8 months are definitely not normal. If you've got decent cooling, a UPS, surge protector, and a suitable PSU, then the other question I'd ask is whether or not you are using a power conditioner. Dips in the power level can be just as harmful as spikes.
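
To put a number on "not normal", here is a minimal back-of-the-envelope sketch. It uses only the figures from the first post (4 drives, one lost every 6-8 months); the function name `afr_per_drive` is made up for illustration. As a rough point of comparison, and this is an assumption from general experience rather than anything stated in the thread, annual failure rates for healthy drives are usually quoted in the low single digits of percent.

```python
# Annualized failure rate implied by "one disk lost every 6-8 months"
# in a 4-drive array. Both inputs come from the original post.
DRIVES = 4


def afr_per_drive(months_between_failures, drives=DRIVES):
    """Fraction of the drives in the array that fail per year."""
    failures_per_year = 12.0 / months_between_failures
    return failures_per_year / drives


for months in (6, 8):
    rate = afr_per_drive(months)
    print(f"one failure every {months} months -> {rate:.1%} per drive per year")
```

A rate that high across several drives points at something the drives share (power, controller, cabling, vibration) rather than at bad luck with individual disks.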
 
Old 09-05-2006, 09:28 AM   #7
patniemeyer
LQ Newbie
 
Registered: Sep 2006
Posts: 15

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by lazlow
What distro are you running? If it is FC, lm_sensors and hddtemp are both available through yum. These two apps are available for most distros. Before you invest a lot more money, I would suggest you try to figure out where you are "short" with this setup. If you are having a motherboard cooling problem, lm_sensors will tell you very quickly. Same for voltage variation. Hddtemp will tell you a lot about your drives. If your drives are running hard for short bursts, you may not be able to detect how hot they are actually getting by touch. If you find a problem this way, you will be able to fix it quickly or know where you need to allocate more resources on the next machine.
Thanks again for the help. I am running a vanilla redhat 8.0 install on an ASUS motherboard. I will try to find and install these apps (I don't see them as is).

A few comments:

Although the cooling is not terribly professional, I did ensure that air exits the box at several points, and the fan is so massive that I really don't think temp is the problem.

I take it from the comments that power supply, despite being new, adequate wattage and on a UPS may *still* be a factor... I do get minor hits occasionally where the UPS comes on... Would you recommend a better power supply (again) or a better UPS first?


thanks
Pat
 
Old 09-05-2006, 10:22 AM   #8
patniemeyer
LQ Newbie
 
Registered: Sep 2006
Posts: 15

Original Poster
Rep: Reputation: 0
I tried installing hddtemp, but I don't think it will work with my raid card. It wants a drive device as an argument.

My 3ware software reports that the status on each drive is "ok". I assume that's the SMART status. But it doesn't report temps.
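
One possible route around the hddtemp limitation is smartmontools, whose `smartctl` accepts `-d 3ware,N` to address the drive on port N behind a 3ware controller. The sketch below is untested on this hardware and rests on several assumptions: that smartmontools is installed, that the installed 3ware driver is recent enough to expose the controller as `/dev/twe0` (older setups may need the array's `/dev/sdX` node instead), and that the drives report SMART attribute 194 (temperature) at all. The helper names are made up for illustration.

```python
import re
import subprocess


def smartctl_command(port, device="/dev/twe0"):
    """Build the smartctl call for one drive behind a 3ware controller.

    The device path is an assumption; adjust it to match the system.
    """
    return ["smartctl", "-A", "-d", f"3ware,{port}", device]


def parse_temperature(smartctl_output):
    """Return the raw value of SMART attribute 194, or None if absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns; the raw value is the last one.
        if len(fields) >= 10 and fields[0] == "194":
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None


def drive_temperatures(ports=range(4), device="/dev/twe0"):
    """Query each port in turn (needs root) and collect temperatures."""
    temps = {}
    for port in ports:
        result = subprocess.run(
            smartctl_command(port, device), capture_output=True, text=True
        )
        temps[port] = parse_temperature(result.stdout)
    return temps
```

Calling `drive_temperatures()` as root would return a dict keyed by port number, with `None` for any drive that does not expose the attribute.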


Pat
 
Old 09-05-2006, 12:50 PM   #9
lazlow
Senior Member
 
Registered: Jan 2006
Posts: 4,363

Rep: Reputation: 172Reputation: 172
Pat

I am not sure about hddtemp with RAID 5; it does work with RAID 0.

archive.download.redhat.com/pub/redhat/linux/8.0/en/os/i386/RedHat/RPMS/lm_sensors-2.6.3-2.i386.rpm

RH8 is getting a little long in the tooth. If you can get lm_sensors working, I would put that as #1. If you cannot, I would buy a well-reviewed power supply next. Yes, brand new, high wattage, low quality power supplies can be very "bad".

That massive fan is not so close to anything that you are getting interference with stuff?

Lazlow
 
Old 09-05-2006, 10:48 PM   #10
patniemeyer
LQ Newbie
 
Registered: Sep 2006
Posts: 15

Original Poster
Rep: Reputation: 0
Thanks. I tried installing lm_sensors but it reports "no sensors". I'm sure it's due to an outdated i2c package or something like that. I'm having deja vu about trying that.

I will try harder to get it working. What exactly am I looking for with the output? Voltage variations?

I realize that redhat is on the fringe now. I am debating about going through an upgrade or just hanging on a while longer and building a whole new raid box. I just have no confidence in this hardware any more.


thanks again for the advice,
Pat
 
Old 09-05-2006, 11:36 PM   #11
lazlow
Senior Member
 
Registered: Jan 2006
Posts: 4,363

Rep: Reputation: 172Reputation: 172
Pat
Try sensors-detect. That is the setup routine for lm_sensors.

Usually you check (print out) the voltages while the machine is at idle and then put the entire machine under stress (like encoding videos). After the machine has been under stress for a few minutes, recheck the voltages. Compare the two sets of readings. Any voltage changes of significant size are a bad sign. Occasionally the voltages will be out of line all the time, but usually it is the shift under load that causes problems.
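
The idle-versus-load comparison can be scripted rather than eyeballed. This is a minimal sketch, assuming the two readings were saved as text (for example `sensors > idle.txt`, then `sensors > load.txt` once the machine has been under stress for a few minutes) and that voltage lines follow the usual `label: value V (...)` layout that `sensors` prints. The function names are hypothetical.

```python
import re

# Matches voltage lines such as "+5V: +4.91 V (min = +4.50 V, ...)".
# Fan (RPM) and temperature lines do not match and are skipped.
VOLT_LINE = re.compile(r"^\s*([^:]+):\s*([+-]?\d+(?:\.\d+)?)\s*V\b")


def parse_voltages(sensors_output):
    """Map each voltage label to its reading in volts."""
    readings = {}
    for line in sensors_output.splitlines():
        match = VOLT_LINE.match(line)
        if match:
            readings[match.group(1).strip()] = float(match.group(2))
    return readings


def voltage_shift(idle_output, load_output):
    """Return load minus idle for every rail present in both snapshots."""
    idle = parse_voltages(idle_output)
    load = parse_voltages(load_output)
    return {
        name: round(load[name] - idle[name], 3)
        for name in idle
        if name in load
    }
```

Usage would be along the lines of `voltage_shift(open("idle.txt").read(), open("load.txt").read())`. What counts as a shift of "significant size" is a judgment call; the script only reports the differences.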

The absolute accuracy of lm_sensors can usually be checked against the BIOS. Warm the computer up to normal operating temperature, reboot the machine, and check the BIOS temperature/voltage readings. Compare those numbers to the lm_sensors numbers. If they differ significantly, there are instructions on how to calibrate lm_sensors in the man pages.

Good Luck
Lazlow
 
Old 09-06-2006, 12:23 AM   #12
patniemeyer
LQ Newbie
 
Registered: Sep 2006
Posts: 15

Original Poster
Rep: Reputation: 0
I don't seem to have sensors-detect.
I just installed the RPM. Perhaps RH 8.0 is too old.


thanks,
Pat
 
Old 09-06-2006, 12:40 AM   #13
lazlow
Senior Member
 
Registered: Jan 2006
Posts: 4,363

Rep: Reputation: 172Reputation: 172
Pat

You have to run it as root (su - ). If you have lm_sensors you have sensors-detect(it is part of the package).

lazlow
 
Old 09-06-2006, 10:32 AM   #14
patniemeyer
LQ Newbie
 
Registered: Sep 2006
Posts: 15

Original Poster
Rep: Reputation: 0
Thanks. I missed it in /usr/sbin.

I had some errors on the detect, but just manually issuing the modprobe commands and trying it out I get:

Adapter: SMBus Via Pro adapter at e800
Algorithm: Non-I2C SMBus adapter
VCore 1: +1.80 V (min = +1.66 V, max = +2.03 V)
VCore 2: +0.11 V (min = +1.66 V, max = +2.03 V) ALARM
+3.3V: +3.34 V (min = +2.97 V, max = +3.63 V)
+5V: +4.91 V (min = +4.50 V, max = +5.48 V)
+12V: +12.39 V (min = +10.79 V, max = +13.11 V)
-12V: -12.32 V (min = -15.06 V, max = -12.32 V) ALARM
-5V: -5.03 V (min = -5.48 V, max = -4.50 V)
fan1: 7941 RPM (min = 3000 RPM, div = 2)
fan2: 4440 RPM (min = 3000 RPM, div = 2)
fan3: 0 RPM (min = 3000 RPM, div = 2) ALARM
#################################################################
#################################################################
#################################################################
vid: +1.85 V


I assume VCore 2 is ok because I don't have a dual core.
I tried putting it under some load but all that happened was that +5V dropped from 4.91 to 4.89.

I will try to keep an eye on it next time it does its verify/scan of the raid. That should put it under some stress.
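
For context on the readings above, this small sketch expresses each rail as a percentage off its nominal value. The readings are copied from the output in this post. The tolerance bands (5% on the positive rails, 10% on the negative ones) are an assumption based on typical ATX-style limits, not something lm_sensors reports, and the readings themselves are uncalibrated.

```python
# Rail readings copied from the sensors output in this post.
READINGS = {
    "+3.3V": 3.34,
    "+5V": 4.91,
    "+12V": 12.39,
    "-12V": -12.32,
    "-5V": -5.03,
}


def nominal_from_label(label):
    """Turn a label such as '+12V' into its nominal voltage, 12.0."""
    return float(label.rstrip("V"))


def deviation_percent(label, measured):
    """Signed deviation from nominal, as a percentage of nominal."""
    nominal = nominal_from_label(label)
    return (measured - nominal) / abs(nominal) * 100.0


def within_tolerance(label, measured):
    """Assumed limits: 5% on positive rails, 10% on negative rails."""
    limit = 5.0 if nominal_from_label(label) > 0 else 10.0
    return abs(deviation_percent(label, measured)) <= limit


for rail, volts in READINGS.items():
    status = "ok" if within_tolerance(rail, volts) else "OUT OF RANGE"
    print(f"{rail:>6}: {deviation_percent(rail, volts):+.2f}% {status}")
```

Under those assumed limits every rail is in range. The ALARM on -12V appears to come from the reading sitting exactly on the configured max of -12.32 V, which is a limit set in the sensors configuration.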


Pat
 
Old 09-06-2006, 10:56 AM   #15
stress_junkie
Senior Member
 
Registered: Dec 2005
Location: Massachusetts, USA
Distribution: Ubuntu 10.04 and CentOS 5.5
Posts: 3,873

Rep: Reputation: 335Reputation: 335Reputation: 335Reputation: 335
One suggestion not mentioned so far: I would avoid Western Digital disk drives. I know that some people have had good experience with them. My experience with them is very, very bad. I like Maxtor and Seagate. Seagate recently purchased Maxtor. Anyway, I have had Seagate drives at work that lasted for years with the power on and the drives spinning 24/7. I have never seen one fail. I have had the same experience with Maxtor drives at home. Yes, I keep some computers at home running 24/7 for months at a time.

Last edited by stress_junkie; 09-06-2006 at 10:58 AM.
 
  

