LinuxQuestions.org
Linux - Hardware This forum is for Hardware issues.
Old 05-19-2011, 05:51 PM   #1
wimafrank
LQ Newbie
 
Registered: Aug 2007
Posts: 4

Rep: Reputation: 0
Disks suddenly very slow


I have a Debian Squeeze 64-bit server for file storage. Suddenly all of its SATA 3 Gb/s disks became slow. Here is the output of hdparm:

#hdparm -tT /dev/sd?

/dev/sda:
Timing cached reads: 58 MB in 2.03 seconds = 28.52 MB/sec
Timing buffered disk reads: 12 MB in 3.18 seconds = 3.78 MB/sec

/dev/sdb:
Timing cached reads: 66 MB in 2.00 seconds = 32.94 MB/sec
Timing buffered disk reads: 20 MB in 3.10 seconds = 6.45 MB/sec

/dev/sdc:
Timing cached reads: 68 MB in 2.01 seconds = 33.89 MB/sec
Timing buffered disk reads: 22 MB in 3.25 seconds = 6.76 MB/sec

/dev/sdd:
Timing cached reads: 68 MB in 2.02 seconds = 33.61 MB/sec
Timing buffered disk reads: 20 MB in 3.10 seconds = 6.45 MB/sec

Disks on another server get around 6 GB/sec for cached reads and 100 MB/sec for buffered reads, so something must have happened. Please help me find the cause!
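For comparing several disks at a glance, the hdparm output above can be reduced to one line per device. The following is only a minimal sketch: the function name flag_slow_disks and the 50 MB/sec default threshold are illustrative assumptions, not part of hdparm. According to the hdparm man page, the "cached reads" figure is read from the Linux buffer cache without touching the disk, so it mostly reflects CPU, cache and memory throughput; the sketch therefore looks only at the "buffered disk reads" line.

```shell
# Read `hdparm -tT` output on stdin and print one line per device with the
# buffered (from-disk) read rate, marking rates below a minimum in MB/sec.
# The default minimum of 50 MB/sec is an arbitrary illustration.
flag_slow_disks() {
    awk -v min="${1:-50}" '
        # Remember the device name from lines such as "/dev/sda:".
        /^\/dev\// { dev = $1; sub(/:$/, "", dev) }
        # The rate is the number just before the trailing "MB/sec".
        /Timing buffered disk reads/ {
            rate = $(NF - 1)
            printf "%s %s MB/sec%s\n", dev, rate, (rate + 0 < min + 0 ? " SLOW" : "")
        }
    '
}

# Typical use (needs root, reads from the real disks):
#   hdparm -tT /dev/sd? | flag_slow_disks 50
```

Fed with the output posted above, every disk would be marked, since all buffered rates are below 7 MB/sec. The sketch assumes the rate is reported in MB/sec, as it is in this thread.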

Thanks a lot in advance,

Frank
 
Old 05-19-2011, 06:21 PM   #2
wimafrank
LQ Newbie
 
Registered: Aug 2007
Posts: 4

Original Poster
Rep: Reputation: 0
Some additional info

1) Disk temperatures seem OK:

# hddtemp /dev/sd[a-d]
/dev/sda: ST32000542AS: 37°C
/dev/sdb: ST32000542AS: 37°C
/dev/sdc: ST32000542AS: 37°C
/dev/sdd: ST32000542AS: 37°C

2) CPU temperatures seem to be a little high, but another server of mine frequently reached 90°C and still runs well:

# sensors
w83627ehf-isa-fff8
Adapter: ISA adapter
VCore: +2.04 V (min = +2.04 V, max = +2.04 V) ALARM
in1: +13.46 V (min = +13.46 V, max = +13.46 V) ALARM
AVCC: +4.08 V (min = +4.08 V, max = +4.08 V) ALARM
3VCC: +4.08 V (min = +4.08 V, max = +4.08 V) ALARM
in4: +2.04 V (min = +2.04 V, max = +2.04 V) ALARM
in5: +2.04 V (min = +2.04 V, max = +2.04 V) ALARM
in6: +6.53 V (min = +6.53 V, max = +6.53 V) ALARM
VSB: +4.08 V (min = +4.08 V, max = +4.08 V) ALARM
VBAT: +4.08 V (min = +4.08 V, max = +4.08 V) ALARM
in9: +2.04 V (min = +2.04 V, max = +2.04 V) ALARM
Case Fan: 0 RPM (min = 0 RPM, div = 128) ALARM
CPU Fan: 0 RPM (min = 0 RPM, div = 128) ALARM
Aux Fan: 0 RPM (min = 0 RPM, div = 128) ALARM
Sys Temp: -1.0°C (high = -1.0°C, hyst = -1.0°C) ALARM sensor = diode
CPU Temp: +0.0°C (high = +0.0°C, hyst = +0.0°C) ALARM sensor = diode
AUX Temp: +0.0°C (high = +0.0°C, hyst = +0.0°C) ALARM sensor = diode
cpu0_vid: +0.000 V

coretemp-isa-0000
Adapter: ISA adapter
Core 0: +59.0°C (high = +80.0°C, crit = +100.0°C)

coretemp-isa-0001
Adapter: ISA adapter
Core 1: +56.0°C (high = +80.0°C, crit = +100.0°C)

coretemp-isa-0002
Adapter: ISA adapter
Core 2: +56.0°C (high = +80.0°C, crit = +100.0°C)

coretemp-isa-0003
Adapter: ISA adapter
Core 3: +52.0°C (high = +80.0°C, crit = +100.0°C)

coretemp-isa-0004
Adapter: ISA adapter
Core 4: +59.0°C (high = +80.0°C, crit = +100.0°C)

coretemp-isa-0005
Adapter: ISA adapter
Core 5: +56.0°C (high = +80.0°C, crit = +100.0°C)

coretemp-isa-0006
Adapter: ISA adapter
Core 6: +55.0°C (high = +80.0°C, crit = +100.0°C)

coretemp-isa-0007
Adapter: ISA adapter
Core 7: +52.0°C (high = +80.0°C, crit = +100.0°C)

3) I experienced a sudden reboot and another sudden failure of the server yesterday. I was backing up files from one disk to another one using tar. dmesg seems OK to me. Maybe I didn't look at the right messages.
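Since it is easy to miss the relevant lines in a long dmesg listing, a filter can help. This is a hedged sketch: the function name ata_trouble is made up for illustration, and the pattern list covers only a few libata and block-layer messages that commonly accompany link, cable or media trouble. It is not exhaustive, and an empty result does not prove the hardware is healthy.

```shell
# Filter kernel log text on stdin for messages that often accompany SATA
# link or disk trouble. The pattern list is illustrative, not exhaustive.
ata_trouble() {
    grep -E 'ata[0-9]+(\.[0-9]+)?: (exception Emask|hard resetting link|limiting SATA link speed|failed command)|I/O error'
}

# Typical use on a live system:
#   dmesg | ata_trouble
```

A line such as "limiting SATA link speed to 1.5 Gbps" would indicate that the kernel has lowered the link speed after errors, which is one possible cause of slow transfers.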

I cannot reboot the machine at the moment since it is in use. I tried to unmount a disk, but got a "device is busy" message. When I ran

umount -l /dev/sda1

the disk was unmounted. But when I then ran

fsck /dev/sda1

fsck.ext2: Device or resource busy while trying to open /dev/sda1
Filesystem mounted or opened exclusively by another program?
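A note on why this happens: per the umount man page, -l (lazy) detaches the filesystem from the directory tree immediately but only cleans up once it is no longer busy, so processes with open files keep the device in use and fsck cannot open it exclusively. The sketch below lists such processes and is meant to be run before resorting to a lazy unmount. The function name pids_using_path is a made-up helper, it is Linux-only because it reads /proc, and `fuser -vm <mountpoint>` from psmisc is the usual ready-made tool for the same job.

```shell
# Print the PIDs of processes whose working directory or open files live at
# or below the given directory (for example a mount point).
# Linux-only: relies on the /proc/<pid>/cwd and /proc/<pid>/fd symlinks.
# Run as root to see processes owned by other users.
pids_using_path() {
    # Resolve symlinks so the comparison matches what /proc reports.
    target=$(cd "$1" 2>/dev/null && pwd -P) || return 1
    # For "/" use an empty prefix so the pattern below becomes "/*".
    [ "$target" = "/" ] && target=""
    for proc in /proc/[0-9]*; do
        for link in "$proc"/cwd "$proc"/fd/*; do
            # Skip links that are unreadable or vanished in the meantime.
            path=$(readlink "$link" 2>/dev/null) || continue
            case "$path/" in
                "$target"/*) echo "${proc#/proc/}"; break ;;
            esac
        done
    done
    return 0
}

# Typical use, with /mnt/data standing in for the real mount point:
#   pids_using_path /mnt/data
```

Once the listed processes have been stopped or moved elsewhere, a plain umount should succeed and fsck should be able to open the device.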

Any idea?

Thanks a lot!

Frank

Last edited by wimafrank; 05-19-2011 at 06:40 PM.
 
Old 05-19-2011, 06:54 PM   #3
onebuck
Moderator
 
Registered: Jan 2005
Location: Central Florida 20 minutes from Disney World
Distribution: Slackware®
Posts: 13,925
Blog Entries: 44

Rep: Reputation: 3159
Hi,

I would first run the manufacturer's diagnostics for each drive. You could use 'smartctl', but I prefer disk diagnostics that will exercise the HDD.

Quote:
excerpt from 'man smartctl';

smartctl - Control and Monitor Utility for SMART Disks

SYNOPSIS
smartctl [options] device

FULL PATH
/usr/sbin/smartctl

PACKAGE VERSION
smartmontools-5.39.1 released 2010-01-28 at 20:48:28

DESCRIPTION
smartctl controls the Self-Monitoring, Analysis and Reporting Technology (SMART) system built into many ATA-3 and later ATA, IDE
and SCSI-3 hard drives. The purpose of SMART is to monitor the reliability of the hard drive and predict drive failures, and to
carry out different types of drive self-tests. This version of smartctl is compatible with ATA/ATAPI-7 and earlier standards (see
REFERENCES below)

smartctl is a command line utility designed to perform SMART tasks such as printing the SMART self-test and error logs, enabling
and disabling SMART automatic testing, and initiating device self-tests. Note: if the user issues a SMART command that is
(apparently) not implemented by the device, smartctl will print a warning message but issue the command anyway (see the -T, --tolerance
option below). This should not cause problems: on most devices, unimplemented SMART commands issued to a drive are ignored and/or
return an error.

smartctl also provides support for polling TapeAlert messages from SCSI tape drives and changers.
The Tools, Recovery, Diagnostic, Emergency section has several LiveCDs that can hopefully be used to diagnose your system.

UBCD Ultimate Boot CD allows users to run floppy-based diagnostic tools from most CDROM drives on Intel-compatible machines, no operating system required. The CD includes many diagnostic utilities.

 
Old 05-19-2011, 07:14 PM   #4
wimafrank
LQ Newbie
 
Registered: Aug 2007
Posts: 4

Original Poster
Rep: Reputation: 0
smartctl output (so far)

Hi Gary,

Thank you so much for your help and advice. I ran smartctl and so far got the following output. The long test will take about 255 minutes and is still running. So far it seems OK.

Since this problem affects all disks, I suspect the cause lies somewhere else. But smartctl may give some hints. Do you see anything interesting in the output so far?

I'll try the other tools you suggested when I have a chance to shut down the server. I may mount the disks one by one to see what the speed looks like. But for now, I cannot shut it down ...

Thanks again,

Frank

#smartctl -a /dev/sdg
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model: ST32000542AS
Serial Number: 6XW254RX
Firmware Version: CC34
User Capacity: 2,000,398,934,016 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Thu May 19 19:04:02 2011 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 249) Self-test routine in progress...
                                        90% of test remaining.
Total time to complete Offline
data collection:                 ( 643) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x103f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   111   099   006    Pre-fail  Always       -       38089347
  3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       36
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   065   060   030    Pre-fail  Always       -       3275714
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1692
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       42
183 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
184 Unknown_Attribute       0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   065   059   045    Old_age   Always       -       35 (Lifetime Min/Max 32/40)
194 Temperature_Celsius     0x0022   035   041   000    Old_age   Always       -       35 (0 5 0 0)
195 Hardware_ECC_Recovered  0x001a   037   028   000    Old_age   Always       -       38089347
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       22252225561406
241 Unknown_Attribute       0x0000   100   253   000    Old_age   Offline      -       940217992
242 Unknown_Attribute       0x0000   100   253   000    Old_age   Offline      -       75451099

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                         Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Self-test routine in progress  90%        1692             -
# 2  Short offline       Aborted by host                20%        1692             -
# 3  Extended offline    Interrupted (host reset)       00%        0                -
# 4  Extended offline    Aborted by host                90%        0                -
# 5  Extended offline    Aborted by host                90%        0                -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
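When scanning a report like the one above, a handful of attributes are commonly checked first: reallocated, pending and uncorrectable sectors (IDs 5, 187, 197, 198) and interface CRC errors (ID 199), the latter often being associated with cabling. The sketch below pulls out just those rows. The function name smart_key_attrs, the choice of IDs and the CHECK marker are illustrative assumptions rather than anything smartctl itself provides, and a clean result does not guarantee a healthy drive.

```shell
# Read `smartctl -A` (or `smartctl -a`) output on stdin and print the raw
# values of a few attributes that are commonly checked first.
# A non-zero raw value is marked CHECK; the ID list is an illustrative choice.
smart_key_attrs() {
    awk '
        # Attribute rows have at least 10 fields and a hex FLAG in column 3;
        # this keeps rows of the selective self-test table from matching.
        NF >= 10 && $3 ~ /^0x/ && $1 ~ /^(5|187|197|198|199)$/ {
            printf "%s %s%s\n", $2, $10, ($10 + 0 != 0 ? " CHECK" : "")
        }
    '
}

# Typical use (needs root):
#   smartctl -A /dev/sdg | smart_key_attrs
```

For the drive shown above, all five of these raw values are 0. The very large raw values for Raw_Read_Error_Rate, Seek_Error_Rate and Hardware_ECC_Recovered are deliberately left out here, because on many drives they are vendor-encoded and may not be plain error counts.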
 
Old 05-19-2011, 08:18 PM   #5
onebuck
Moderator
 
Registered: Jan 2005
Location: Central Florida 20 minutes from Disney World
Distribution: Slackware®
Posts: 13,925
Blog Entries: 44

Rep: Reputation: 3159
Hi,

I really think that you will get more from running the manufacturer's diagnostic set on each drive individually.

'smartctl' is fine for general/overall testing if something is really staring you in the face. Sometimes the diagnostic set will do a better job of identifying problems. You may need to isolate each drive for testing. Do not rule out cabling.
 
Old 05-22-2011, 10:18 PM   #6
wimafrank
LQ Newbie
 
Registered: Aug 2007
Posts: 4

Original Poster
Rep: Reputation: 0
Resolved (partially)

I finally rebooted my machine and everything is back to normal. All disks have normal speed. I then ran fsck on the disks and fixed a few errors. So far the machine runs well.

So it looks to me as if a few disk errors can dramatically slow down not only the disks with errors but all other disks as well. I'm not sure why or how this happened. Perhaps the CPU was constantly rechecking the bad sectors, so everything slowed down.

Anyway, this is not an ideal solution, but may give other people some hints if they face the same issue.

Thanks again for Gary's help!

Frank
 
  

