LinuxQuestions.org
Linux - Server This forum is for the discussion of Linux Software used in a server related context.

Old 02-26-2017, 06:08 AM   #1
Manager
LQ Newbie
 
Registered: Dec 2011
Location: Bangkok
Distribution: CentOS
Posts: 8

Rep: Reputation: Disabled
RAID seek_error_rate on both disks suddenly growing very fast by smartctl


My mail server started slowing down a lot today, sometimes timing out, so I SSH-ed in and checked. What I found is:

The RAID processes were taking up about 10-15% of CPU time. (My server isn't very busy, just mail for a dozen users and some low-traffic websites. It is typically 98-99% idle, but that had dipped into the 80s and stayed there when this started.)

From smartctl:

seek_error_rate has started growing amazingly fast on both disks:

sda increases by about 3000 per MINUTE on average
sdb increases by about 2000 per MINUTE on average

I watched this for 30 minutes, and the growth was fairly steady, totaling ~90,000 (sda) and ~60,000 (sdb) by the end of the viewing period. That is approximately 50 and 33 seek errors per SECOND, respectively, on average. The total seek_error_rate values were 133,457,113 and 559,913,401 for the two drives.
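As a rough check of that arithmetic, here is a minimal Python sketch. The per-minute figures are the averages given above; the function name `growth_summary` is made up for illustration.

```python
def growth_summary(per_minute, minutes):
    """Return (increments per second, total increments) for a steady rate."""
    per_second = per_minute / 60
    total = per_minute * minutes
    return per_second, total

# Average growth of the raw seek_error_rate value per minute, from the post.
growth_per_minute = {"sda": 3000, "sdb": 2000}
observation_minutes = 30

for disk, per_minute in growth_per_minute.items():
    per_second, total = growth_summary(per_minute, observation_minutes)
    print(f"{disk}: ~{per_second:.0f} per second, ~{total:,} in {observation_minutes} minutes")
```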

Yet there were no changes in Raw Read Error Rate during this time. It is zero for sdb. On sda it is stuck at 60,121,870, which is remarkable, but again it did not grow during the 30 minutes in which I was observing the constant seek errors.

No changes in Reallocated Sector Count.
No spin retries.
Temperature around 40C (104F) for both, so not getting hot.

I'm hesitant to try rebooting without better understanding what might be the problem, and perhaps doing more diagnostics while it's assuredly still alive (plus doing a fire drill to double check backups while it's still accessible).

It's running CentOS 5.11, and since that is about to reach End of Life, I plan to install a new OS starting about a month from now. So this needs a proper analysis, and whatever needs to be fixed should be fixed before the upgrade.

After all these years, it seems very odd that this would start to happen with both drives at the same time. It's at a big server rack facility / data center.

Anybody have experience with something like this, or any insights or ideas?

Last edited by Manager; 02-26-2017 at 06:31 AM.
 
Old 02-26-2017, 09:41 PM   #2
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,659
Blog Entries: 4

Rep: Reputation: 3941
Maybe the RAID controller is failing. Or maybe, it's just really bad luck.

Power-supply problems? Disk drives use direct-current motors which can't tolerate the slightest dip in line voltage.
 
Old 02-27-2017, 12:16 PM   #3
Manager
LQ Newbie
 
Registered: Dec 2011
Location: Bangkok
Distribution: CentOS
Posts: 8

Original Poster
Rep: Reputation: Disabled
solved

It looks like it actually may not be a worry. It's a Seagate Barracuda 7200.10 and Seagate reports things as follows, according to the author below:

"... the author explains that all the values are actually 48 bits, and due to the way they are encoded it follows that those values are large. More specifically, raw value of the Seek error rate attribute should be converted to hexadecimal and then upper 16 bits are number of errors, while lower 32 bits are total number of seeks.

"In this concrete case the raw value for Seek error rate is 17262017054, or 0x000404E57A1E. The first 16 bits is 0x0004 and the last 32 bits are 0x04E57A1E. What this means is that there were 4 seek errors (meaning the head wasn't positioned correctly after being moved to some track) but there were 82147870 seeks in total. So, this is very very small fraction of errors."

http://sgros.blogspot.com/2013/01/se...rt-values.html

In my original post here on LinuxQuestions.org, both of my values fit within 8 hex digits (the lower 32 bits), so I guess there have been zero seek errors. The attribute is mainly just counting the number of seeks, not the number of errors, which is why it has been growing at such a fast rate.
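A minimal Python sketch of that decoding, assuming the layout described in the quoted article (upper 16 bits are errors, lower 32 bits are total seeks) really applies to these drives. The function name `decode_seek_error_rate` is made up for illustration; it is not part of smartctl.

```python
def decode_seek_error_rate(raw) -> tuple:
    """Split a 48-bit raw value into (seek_errors, total_seeks).

    Assumption: the layout from the quoted article, i.e. the upper 16 bits
    hold the error count and the lower 32 bits hold the number of seeks.
    """
    if not 0 <= raw < (1 << 48):
        raise ValueError("raw value does not fit in 48 bits")
    seek_errors = raw >> 32           # upper 16 bits
    total_seeks = raw & 0xFFFFFFFF    # lower 32 bits
    return seek_errors, total_seeks

# The article's example value, then the two totals from my first post.
for raw in (17262017054, 133457113, 559913401):
    errors, seeks = decode_seek_error_rate(raw)
    print(f"raw={raw} (0x{raw:012X}): {errors} seek errors, {seeks} seeks")
```

For the article's value this gives 4 errors in 82147870 seeks. For both of my drives the upper 16 bits are zero, so the whole raw value is the seek count.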

(One of my drives is much older than the other, due to the fairly recent replacement of a failed drive. That would explain the difference in total numbers: comparing the two drives, the numbers are nearly proportional to the number of hours in service.)
 
  


All times are GMT -5. The time now is 04:41 PM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration