LinuxQuestions.org


The Belgain 05-12-2011 03:00 PM

Very slow software RAID5/LVM array - which drive is dying?
 
Hi,

I have an Ubuntu 10.04 system which is used as a file-server, primarily for storing video. The setup combines two RAID5 arrays joined in an LVM (details below).

It's served me very well in the past, with decent enough performance for my use (around 150-200MB/s sequential reads for example). All I really ask of it is to be able to stream HD video, which shouldn't be too onerous.

As I say, the setup used to work absolutely fine, but it now sometimes grinds to a halt (i.e. <1MB/s reads, 100ms+ seek times, ...) - even with no load on the system. I strongly suspect one of the hard drives is on its way out, but can't tell which one. I've looked at the drives in system monitor and they all look healthy - SMART reports each of them as having at most one or two bad sectors. I've run transfer tests on each individual drive and they're perfectly fast. The problem is that the issue is intermittent - when I run a test over a particular drive it's fine more often than not.
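Since the slowdown is intermittent, one option is to sample the kernel's per-device counters over time rather than rely on one-off transfer tests. The sketch below is only an illustration: it assumes the standard Linux /proc/diskstats field layout, and the function names (parse_diskstats, average_wait_ms, watch) are made up for this example. A drive whose average wait is far above its siblings during a stall would be the first one to look at.

```python
import time

def parse_diskstats(text):
    """Return {device: (reads, read_ms, writes, write_ms)} from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 11:
            continue  # skip malformed or truncated lines
        name = fields[2]
        reads, read_ms = int(fields[3]), int(fields[6])
        writes, write_ms = int(fields[7]), int(fields[10])
        stats[name] = (reads, read_ms, writes, write_ms)
    return stats

def average_wait_ms(before, after):
    """Mean milliseconds per completed I/O for each device between two samples."""
    waits = {}
    for name, (r1, rms1, w1, wms1) in after.items():
        if name not in before:
            continue
        r0, rms0, w0, wms0 = before[name]
        ios = (r1 - r0) + (w1 - w0)
        if ios > 0:
            waits[name] = ((rms1 - rms0) + (wms1 - wms0)) / ios
    return waits

def watch(interval_s=10):
    """Take two samples interval_s apart and print devices, slowest first."""
    with open("/proc/diskstats") as f:
        before = parse_diskstats(f.read())
    time.sleep(interval_s)
    with open("/proc/diskstats") as f:
        after = parse_diskstats(f.read())
    waits = average_wait_ms(before, after)
    for name, wait in sorted(waits.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name}: {wait:.1f} ms per I/O")
```

Calling watch() in a loop while streaming from the array would show whether one member disk stands out when the stall happens.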

Any suggestions for how to pin down this problem?

Physical drives (all SATA):
-- 2x 320GB drives (partitions: 320GB)
-- 3x 750GB drives (partitions: 320GB, 430GB)
-- 1x 1.5TB drive (partitions: 320GB, 430GB)

RAID5 arrays:
-- 1x 6 drive RAID5 array, comprising the 320GB partitions.
-- 1x 4 drive RAID5 array, comprising the 430GB partitions.

LVM:
-- One VG comprising the two RAID5 arrays.

(It's a slightly odd setup, the aim is to be able to grow the array by adding larger drives in future.)
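As a quick sanity check on the layout above, the usable size of the volume group can be worked out from the two arrays. This is a rough sketch only: it uses the nominal partition sizes from the list, ignores md and LVM metadata overhead, and the helper name raid5_usable_gb is invented for the example.

```python
def raid5_usable_gb(members, member_size_gb):
    """RAID5 keeps one member's worth of parity, so usable space is (n - 1) * size."""
    if members < 3:
        raise ValueError("RAID5 needs at least 3 members")
    return (members - 1) * member_size_gb

# (number of members, size of each member partition in GB)
arrays = [(6, 320), (4, 430)]

total_gb = sum(raid5_usable_gb(n, size) for n, size in arrays)
print(total_gb)  # 2890, i.e. roughly 2.9TB in the volume group
```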

jefro 05-12-2011 04:24 PM

The way to find out may be to swap the drives out for new ones one at a time, rebuild, and see if the problem goes away.

I'd look at all the SMART data, but it may end up being the controller, the cables or some other issue.

The Belgain 05-13-2011 01:39 AM

Thanks for the reply. I'm hoping there's a better way though; swapping out a drive and rebuilding the array will take a very long time (especially with the array running slow). Rebuilds take several hours to complete and I'd need to do that 6 times.

I might try moving the drives between controllers and swapping the cables out though - that would be quicker...

H_TeXMeX_H 05-13-2011 01:44 AM

Run a SMART long test on each, and then post the attributes.

The Belgain 05-14-2011 06:26 PM

1 Attachment(s)
I've run a long SMART test on each of the 7 drives (that's the 6 drives in the array, plus the OS boot drive) - run as "sudo smartctl --test=long <device>". All tests have passed - I've attached the full SMART data for all of them below - that's the output of "sudo smartctl --all <device>".

Any ideas? I'm wondering whether something else might be causing the array to slow down, but can't think what that might be. In terms of the drives, what's a sensible upper limit for the temperature they should run at? Some of them are just over 50 Celsius, and I don't know whether that's reasonable.

H_TeXMeX_H 05-15-2011 03:28 AM

Here are interesting bits:

Code:

Model Family:    Seagate Barracuda 7200.10 family
#1
190 Airflow_Temperature_Cel 0x0022  048  032  045    Old_age  Always  In_the_past 52 (255 255 59 28)
#2
190 Airflow_Temperature_Cel 0x0022  047  031  045    Old_age  Always  In_the_past 53 (255 255 61 28)

These two did overheat. However, all the other attributes are normal, and all the SMART long tests passed.

This means that no drives are failing, but I would add some more fans.
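For anyone wanting to check their own output the same way, here is a small sketch that picks apart the attribute lines quoted above. It assumes the usual smartctl column order (ID, name, flag, value, worst, threshold, type, updated, when failed, raw). The "100 minus normalized value" temperature conversion is an assumption too: it matches the raw values on these particular lines, but it is not guaranteed for every drive. The function names are made up for the example.

```python
def parse_attribute(line):
    """Split one smartctl attribute line into its named columns."""
    parts = line.split(None, 9)  # the raw value may itself contain spaces
    return {
        "id": int(parts[0]),
        "name": parts[1],
        "value": int(parts[3]),
        "worst": int(parts[4]),
        "thresh": int(parts[5]),
        "when_failed": parts[8],
        "raw": parts[9],
    }

def breached_in_past(attr):
    """True if the worst recorded value has been at or below the threshold."""
    return attr["worst"] <= attr["thresh"]

def approx_temp_c(normalized):
    """ASSUMPTION: for attribute 190 on these drives, normalized = 100 - degrees C."""
    return 100 - normalized

line = "190 Airflow_Temperature_Cel 0x0022  048  032  045    Old_age  Always  In_the_past 52 (255 255 59 28)"
attr = parse_attribute(line)
print(breached_in_past(attr))        # True: worst 32 is below threshold 45
print(approx_temp_c(attr["value"]))  # 52, same as the raw value on this line
print(approx_temp_c(attr["worst"]))  # 68, the hottest recorded under the same assumption
```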

Did you update recently or change anything on this system? Maybe it was a bad update, or something else changed to cause this...
Maybe check the logs for anything suspicious: /var/log/messages and /var/log/syslog.
Also check the cables.

The Belgain 05-15-2011 11:14 AM

Thanks for the help - I'll look at adding more fans to the case.

I haven't changed anything recently (other than installing standard security updates for Ubuntu). It's possible that the case has gathered some dust over time - I'll clean the filters for the case fans and see if I can blow some dust out of the case. What's a safe temperature for drives to operate at?

As and when I get new drives I'll try to get some 5400rpm ones rather than 7200rpm, as they should run a little cooler.

H_TeXMeX_H 05-15-2011 12:05 PM

I have the same drive and it runs at:

Code:

Model Family:    Seagate Barracuda 7200.10 family
...
190 Airflow_Temperature_Cel 0x0022  063  059  045    Old_age  Always      -      37 (Min/Max 27/38)

So that's 37C, yours are running at 52 and 53C ... quite a bit more. I suspect poor airflow. Certainly if there is dust, clean it out, maybe add more fans if necessary.

EDIT:
According to the manual:
http://www.seagate.com/docs/pdf/data...da_7200_10.pdf
The max operating temp is 60C, and yours have gone over in the past.

The Belgain 05-15-2011 02:51 PM

Having given it a good clean and rerouted some cables for better airflow, the drive temperature is now a little lower (46C for those two drives). However that doesn't seem to have helped - I'm still seeing poor performance with occasional long waits for access.

One thing which has now occurred to me though is file fragmentation. Many of the files here have been downloaded by BitTorrent, and when upgrading from Ubuntu 8.04 to 10.04 recently, I changed BitTorrent client from Vuze to Transmission (the default Ubuntu client). One change is that Vuze allocates the entire file on disk prior to starting the download, whereas Transmission allocates it incrementally while downloading.
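To illustrate the difference, this is roughly what "allocate the whole file first" looks like from a program's point of view. It is a minimal sketch using Python's os.posix_fallocate, not how Vuze or Transmission actually do it, and the preallocate helper is a made-up name. How contiguous the result ends up is still up to the filesystem, and on filesystems without native preallocation support the C library may fall back to writing out the blocks instead.

```python
import os

def preallocate(path, size_bytes):
    """Reserve size_bytes of disk space for path before any data is written."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Ask for the full range [0, size_bytes) to be allocated in one request,
        # instead of letting the file grow piece by piece as data arrives.
        os.posix_fallocate(fd, 0, size_bytes)
    finally:
        os.close(fd)
```

A downloader would call this once with the final file size and then write each piece at its offset, rather than extending the file a little with every piece that arrives.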

I suspect that it's resulting in some very fragmented files, which is making access very slow. Picking a recently-downloaded 50MB file at random, filefrag reports it has 1399 extents which seems very poor to me (the array isn't very full: 600GB free out of 2.9TB). Picking an old 350MB file shows a more reasonable 39 extents.
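To put those two numbers side by side, it helps to turn the extent count into an average extent size. The sketch below assumes filefrag's one-line summary format ("name: N extents found") and treats 1MB as 1024KB. The file name is hypothetical and the helper names are invented for the example.

```python
import re

def parse_filefrag(line):
    """Extract (filename, extent count) from a filefrag summary line."""
    m = re.match(r"^(.*): (\d+) extents? found", line)
    if not m:
        raise ValueError(f"unrecognised filefrag output: {line!r}")
    return m.group(1), int(m.group(2))

def avg_extent_kb(size_mb, extents):
    """Average extent size in KB for a file of size_mb megabytes."""
    return size_mb * 1024 / extents

# Hypothetical file name; the extent count is the one reported above.
name, extents = parse_filefrag("recent-download.avi: 1399 extents found")

print(round(avg_extent_kb(50, extents), 1))  # 36.6 KB per extent for the new file
print(round(avg_extent_kb(350, 39)))         # 9190 KB (about 9MB) per extent for the old one
```

So the recently downloaded file is split into pieces a couple of hundred times smaller on average than the old one, which fits with reads needing far more seeks.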

A quick Google shows this bug report/discussion relating to precisely this issue: https://trac.transmissionbt.com/ticket/849. I'll set the option in Transmission to preallocate files, and see if that helps. I suspect the RAID/LVM setup I have exacerbates the problem (having two bits of the same drive in the same logical volume). This is an EXT3 filesystem - I wonder whether EXT4 would have helped at all...

Anyway, thanks for the help - hopefully I have what I need now.

H_TeXMeX_H 05-16-2011 04:17 AM

I didn't know about filefrag, and was looking for such a program. I probably didn't find it because it can only be run as root.

Certainly 1399 extents is very fragmented. Try copying the file and using the copy instead (you can use cp or dd to copy it).


All times are GMT -5.