Very slow software RAID5/LVM array - which drive is dying?
Linux - SoftwareThis forum is for Software issues.
Having a problem installing a new program? Want to know which application is best for the job? Post your question in this forum.
Notices
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
Very slow software RAID5/LVM array - which drive is dying?
Hi,
I have an Ubuntu 10.04 system which is used as a file-server, primarily for storing video. The setup combines two RAID5 arrays joined in an LVM (details below).
It's served me very well in the past, with decent enough performance for my use (around 150-200MB/s sequential reads for example). All I really ask of it is to be able to stream HD video, which shouldn't be too onerous.
As I say, the setup used to work absolutely fine but it now grinds to a halt (i.e. <1MB/s reads, 100ms+ seek times, ...) sometimes - even with no load on the system. I strongly suspect one of the hard drives is on its way out, but can't tell which one. I've looked at the drives in system monitor and they all look healthy - SMART reports them all as having either no bad sectors or 1/2 bad sectors. I've run transfer tests on each individual drive and they're perfectly fast. The problem is that the issue is intermittent - when I run a test over a particular drive it's fine more often than not.
Thanks for the reply. I'm hoping there's a better way though; swapping out a drive and rebuilding the array will take a very long time (especially with the array running slow). Rebuilds take several hours to complete and I'd need to do that 6 times.
I might try moving the drives between controllers, and swapping cables out thought - that would be quicker...
I've run a long smart test on each of the 7 drives (that's the 6 drives in the array, plus the OS boot drive) - run as "sudo smartctl --test=long <device>". All tests have passed - I've attached the full SMART data for all of them below - that's the output of "sudo smartctl --al <device>".
Any ideas? I'm wondering whether something else might be causing the array to be slowing down, but can't think what that might be. In terms of the drives, what's a sensible upper limit for the temperature they should run at? Some of them are just over 50 celsius, I don't know whether that's reasonable.
These two did overheat. However, all the other attributes are normal, and all the smart long tests passed.
This means that no drives are failing, but I would add some more fans.
Did you update recently or change anything on this system ? Maybe it was a bad update, or something changed to cause this ...
Maybe check the logs for anything suspicious, /var/log/ messages syslog.
Also check the cables.
Thanks for the help - I'll look at adding more fans to the case.
I haven't changed anything recently (other than installing standard security updates for Ubuntu). It's possible that the case has gathered some dust over time - I'll clean the filters for the case fans and see if I can blow some dust out of the case. What's a safe temperature for drives to operate at?
As an when I get new drives I'll try to get some 5400rpm ones rather than 7200rpm as they should run a little cooler.
Model Family: Seagate Barracuda 7200.10 family
...
190 Airflow_Temperature_Cel 0x0022 063 059 045 Old_age Always - 37 (Min/Max 27/38)
So that's 37C, yours are running at 52 and 53C ... quite a bit more. I suspect poor airflow. Certainly if there is dust, clean it out, maybe add more fans if necessary.
Having given it a good clean and rerouted some cables for better airflow, the drive temperature is now a little lower (46C for those two drives). However that doesn't seem to have helped - I'm still seeing poor performance with occasional long waits for access.
One thing which has now occured to me though is file fragmentation. Many of the files here have been downloaded by Bittorrent, and when upgrading from Ubuntu 8.04 to 10.04 recently, I changed Bittorrent client from Vuze to Transmission (the default Ubuntu client). One change is that Vuze allocates the entire file on disk prior to starting the download, whereas Transmission allocates it incrementally while downloading.
I suspect that it's resulting in some very fragmented files, which is making access very slow. Picking a recently-downloaded 50MB file at random, filefrag reports it has 1399 extents which seems very poor to me (the array isn't very full: 600GB free out of 2.9TB). Picking an old 350MB file shows a more reasonable 39 extents.
A quick Google shows this bug report/discussion relating to precisely this issue: https://trac.transmissionbt.com/ticket/849. I'll set the option in Transmission to preallocate files, and see if that helps. I suspect the RAID/LVM setup I have exacerbates the problem (having two bits of the same drive in the same logical volume). This is an EXT3 filesystem - I wonder whether EXT4 would have helped at all...
Anyway, thanks for the help - hopefully I have what I need now.
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.