LinuxQuestions.org
Old 03-01-2018, 05:51 AM   #1
jsbjsb001
Senior Member
 
Registered: Mar 2009
Location: Earth? I would say I hope so but I'm not so sure about that... I could just be a figment of your imagination too.
Distribution: CentOS at the time of this writing, but some others over the years too...
Posts: 1,767

Rep: Reputation: 848
When was the last time you had hardware failure?


I've had a hard drive for about 7 or 8 years now, and it has failed on me. I was expecting it to fail sometime this week, not just because of its age, but also because I used it for digital TV recording. It was a 2TB drive that I bought for that reason: to record off the TV card and then the USB TV tuner.

The below output is from just last night:

Code:
[root@localhost ~]# smartctl -a /dev/sdc
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.12.2-1.el7.elrepo.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST2000DM001-9YN164
Serial Number:    S1E06LZF
LU WWN Device Id: 5 000c50 04af29a74
Firmware Version: CC4H
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Thu Mar  1 03:45:41 2018 ACDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (  584) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 226) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x3085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   091   089   006    Pre-fail  Always       -       148733137
  3 Spin_Up_Time            0x0003   094   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   098   098   020    Old_age   Always       -       2325
  5 Reallocated_Sector_Ct   0x0033   067   051   036    Pre-fail  Always       -       43784
  7 Seek_Error_Rate         0x000f   057   056   030    Pre-fail  Always       -       124575004860
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       10097
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   098   098   020    Old_age   Always       -       2312
183 Runtime_Bad_Block       0x0032   089   089   000    Old_age   Always       -       11
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       201
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       1 1 1
189 High_Fly_Writes         0x003a   099   099   000    Old_age   Always       -       1
190 Airflow_Temperature_Cel 0x0022   056   051   045    Old_age   Always       -       44 (Min/Max 25/44)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       63
193 Load_Cycle_Count        0x0032   085   085   000    Old_age   Always       -       30689
194 Temperature_Celsius     0x0022   044   049   000    Old_age   Always       -       44 (0 5 0 0 0)
197 Current_Pending_Sector  0x0012   100   001   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   001   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       7618h+25m+07.646s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       181960776074463
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       103270281311952

SMART Error Log Version: 1
ATA Error Count: 1
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 occurred at disk power-on lifetime: 10078 hours (419 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00      00:00:40.382  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      00:00:40.348  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      00:00:40.348  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      00:00:40.348  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      00:00:40.348  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      7243         -
# 2  Extended offline    Completed without error       00%      7164         -
# 3  Extended offline    Completed without error       00%      7093         -
# 4  Extended offline    Completed without error       00%      7016         -
# 5  Extended offline    Completed without error       00%      6936         -
# 6  Extended offline    Completed without error       00%      6886         -
# 7  Extended offline    Completed without error       00%      6812         -
# 8  Extended offline    Completed without error       00%      6748         -
# 9  Extended offline    Completed without error       00%      6686         -
#10  Extended offline    Completed without error       00%      6593         -
#11  Extended offline    Completed without error       00%      6488         -
#12  Extended offline    Completed without error       00%      6391         -
#13  Extended offline    Completed without error       00%      6299         -
#14  Extended offline    Completed without error       00%      6210         -
#15  Extended offline    Completed without error       00%      6128         -
#16  Extended offline    Completed without error       00%      6052         -
#17  Extended offline    Completed without error       00%      5972         -
#18  Extended offline    Completed without error       00%      5881         -
#19  Extended offline    Completed without error       00%      5785         -
#20  Extended offline    Completed without error       00%      5695         -
#21  Extended offline    Completed without error       00%      5603         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
The bad/reallocated sectors you see started showing up a little more than a week ago.
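For anyone who wants to pull the worrying numbers out of a report like the one above without reading the whole table, here is a minimal Python sketch. It assumes the attribute rows look like the `smartctl -a` output in this post; the `WATCHED` list and the function names `parse_attributes` and `suspicious` are illustrative choices of mine, not part of smartmontools.

```python
import re

# Attributes that often point at media problems when their raw value is non-zero.
# This watch list is an illustrative assumption, not an official smartmontools list.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Reported_Uncorrect",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}

# One attribute row: ID, name, flag, value, worst, thresh, type, updated, when_failed, raw
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(.+?)\s*$"
)


def parse_attributes(text):
    """Return {attribute_name: fields} for each attribute row found in the text."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header lines, self-test log rows, etc. are skipped
        attrs[m.group(2)] = {
            "id": int(m.group(1)),
            "value": int(m.group(3)),
            "worst": int(m.group(4)),
            "thresh": int(m.group(5)),
            "raw": m.group(9),
        }
    return attrs


def suspicious(attrs):
    """List (name, raw) for watched attributes whose raw value starts with a non-zero integer."""
    hits = []
    for name in sorted(WATCHED.intersection(attrs)):
        first = attrs[name]["raw"].split()[0]
        if first.isdigit() and int(first) > 0:
            hits.append((name, int(first)))
    return hits


# Three rows copied from the report above.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   067   051   036    Pre-fail  Always       -       43784
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       201
197 Current_Pending_Sector  0x0012   100   001   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    for name, raw in suspicious(parse_attributes(SAMPLE)):
        print(f"{name}: raw={raw}")
```

On the three sample rows this flags Reallocated_Sector_Ct (43784) and Reported_Uncorrect (201). Note that the overall-health line in the same report still says PASSED, so the raw counters are worth checking on their own.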

The output below is what I got tonight - and what inspired this thread.

Code:
[root@localhost ~]# smartctl -a /dev/sdc
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.12.2-1.el7.elrepo.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

^C^C^C^C^C^C^C^C^C^C
[root@localhost ~]# smartctl -a /dev/sdc
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.12.2-1.el7.elrepo.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               ��5�Ǒ�6
Product:              *,�2T
                           =B�
Revision:             �x��
User Capacity:        6,275,328,826,604,553,078 bytes [6275 PB]
Logical block size:   2385397821 bytes
scsiModePageOffset: raw_curr too small, offset=80 resp_len=94 bd_len=76
scsiModePageOffset: raw_curr too small, offset=80 resp_len=94 bd_len=76
>> Terminate command early due to bad response to IEC mode page
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
[root@localhost ~]# smartctl -a /dev/sdc
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.12.2-1.el7.elrepo.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

Short INQUIRY response, skip product id
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
[root@localhost ~]#
smartctl froze on me the first time I ran it - as you saw.

Funny error message in the middle there...

And yes, I have a BACKUP!!

Well, that's true for at least 90% of what was on there - I just lost my porn though...

But I knew when I backed up the rest of it that I would lose it if the drive failed...
 
Old 03-01-2018, 06:16 AM   #2
Trihexagonal
Member
 
Registered: Jul 2017
Location: Land of 1000 Nights
Distribution: FreeBSD, OpenBSD and Solaris
Posts: 165

Rep: Reputation: 113
I had a 1TB HDD I had intended to use as a backup drive fail within the last 6-8 months. I keep Flash Drives as backup for my docs, images, etc, and populate all my laptops off the same drives, so I didn't lose anything important.

The very worst was when I had my favorite Thinkpad T61 docked and compiling ports. It was going to be busy a while, so I pulled the USB mouse from the dock; it froze and went to heaven before my eyes. It looked and ran like it just came out of the box, too. Now all it's good for is parts, but that's the silver lining to the cloud.

I don't dock them anymore.
 
Old 03-01-2018, 08:54 AM   #3
rokytnji
LQ Veteran
 
Registered: Mar 2008
Location: Waaaaay out West Texas
Distribution: AntiX 17
Posts: 5,561
Blog Entries: 20

Rep: Reputation: 2636
The last one was when I did a CPU upgrade and tried to put the Pentium M P4 CPU in upside down, in an IBM laptop. It was a freebie from city hall and I should have left well enough alone.
I also dropped a 1 TB external drive from the table to the floor. It sounded like breaking glass.

I washed a 2 gig USB drive. I air dried it. Gparted it. It still holds files. I would not trust it to boot an ISO though.

If you notice the trend here, my hardware failures are due to my greasy grubby fingers.

Tried to brick my chromebook last night. I was not successful. Sometimes I am the windshield. Sometimes the bug.
 
Old 03-01-2018, 10:56 AM   #4
Myk267
Member
 
Registered: Apr 2012
Location: California
Posts: 386
Blog Entries: 15

Rep: Reputation: Disabled
There's a lot of moisture here, as I live not far from the West coast, so electronics don't always last as long as they should. Hermetically sealed things like HDDs are fine, but "open air" things like PSUs and motherboards seem to give up much sooner.

I also have a developing theory that placing computer cases near outer walls might be a contributing factor.

Knock on wood.
 
Old 03-01-2018, 03:37 PM   #5
enorbet
Senior Member
 
Registered: Jun 2003
Location: Virginia
Distribution: Slackware = Main OpSys for decades while testing others to keep up
Posts: 1,923

Rep: Reputation: 1819
The most recent hardware failure I've experienced was almost 10 years ago, and that was on one (of 2) IBM DTLA IDE 7200rpm drives that were billed as the fastest IDE drives available at the time but resulted in a class action suit. Shortly after that IBM sold their HDD business to Hitachi, iirc. The remaining unit still runs fine, though obviously it is no longer in daily use, since its interface is now found only on a rarely used secondary box.

I attribute my low failure rates to being obsessive about thermals. I prefer that all my boxes run substantially less than 40C. The only exception is laptops which I do modify to be cooler but still tend to run closer to 45-50C.

Last edited by enorbet; 03-02-2018 at 08:37 AM.
 
Old 03-01-2018, 04:15 PM   #6
Trihexagonal
Member
 
Registered: Jul 2017
Location: Land of 1000 Nights
Distribution: FreeBSD, OpenBSD and Solaris
Posts: 165

Rep: Reputation: 113
Quote:
Originally Posted by enorbet
The most recent hardware failure I've experienced was almost 10 years ago and that was on one (of 2) IBM DTLA IDE drives that were billed as the fastest IDE drives available at the time but resulted in a class action suit. Shortly after that IBM sold their HDD business to Hitachi, iirc. The remaining unit still runs fine, though obviously it is no longer in daily use, since its interface is now found only on a rarely used secondary box.
I still have the IBM 80GB HDD that came with my Gateway Windows98 tower and used it in my pfSense box till I retired it a couple years ago.
 
Old 03-02-2018, 06:48 AM   #7
//////
Member
 
Registered: Nov 2005
Location: Land of Linux :: Finland
Distribution: win 10 | OpenBSD 6.3 bridge | Fedora 28 | Fedora 27 Server
Posts: 341

Rep: Reputation: 126
I have been hit by hardware failure twice in my lifetime.

The first one was a TV tuner card (I don't remember who made it), and the second one was my ADSL modem a couple of years ago.
 
Old 03-02-2018, 07:35 AM   #8
Michael Uplawski
Member
 
Registered: Dec 2015
Location: Normandy, France
Distribution: Debian buster/sid
Posts: 691
Blog Entries: 22

Rep: Reputation: 422
My external hard drive has not been bootable for a week now.

As the notebook I originally extracted it from died a while ago, I am now considering the purchase of a new machine. This HP Pavilion dv6 that I am using now has survived them all, against all odds. Windows, two shutdowns a day due to overheating, and now the freezing cold have not killed it or any of its internal components... We face all kinds of damage elsewhere, but all I had to replace was the power cable.

Makes me think...
 
Old 03-03-2018, 02:50 AM   #9
jsbjsb001
Senior Member
 
Registered: Mar 2009
Location: Earth? I would say I hope so but I'm not so sure about that... I could just be a figment of your imagination too.
Distribution: CentOS at the time of this writing, but some others over the years too...
Posts: 1,767

Original Poster
Rep: Reputation: 848
I did some digging the other day and found the following output in my kernel log. I figured that in case someone else has a drive fail on them in the future, the following might be useful and help them diagnose their issue.

Code:
[ 1148.876293] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 1148.876303] ata4.00: failed command: IDENTIFY DEVICE
[ 1148.876311] ata4.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 21 pio 512 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1148.876316] ata4.00: status: { DRDY }
[ 1148.876322] ata4: hard resetting link
[ 1154.227111] ata4: link is slow to respond, please be patient (ready=0)
[ 1158.912941] ata4: COMRESET failed (errno=-16)
[ 1158.912955] ata4: hard resetting link
[ 1164.272777] ata4: link is slow to respond, please be patient (ready=0)
[ 1168.952626] ata4: COMRESET failed (errno=-16)
[ 1168.952640] ata4: hard resetting link
[ 1174.308529] ata4: link is slow to respond, please be patient (ready=0)
[ 1203.959707] ata4: COMRESET failed (errno=-16)
[ 1203.959728] ata4: limiting SATA link speed to 1.5 Gbps
[ 1203.959734] ata4: hard resetting link
[ 1209.001579] ata4: COMRESET failed (errno=-16)
[ 1209.001598] ata4: reset failed, giving up
[ 1209.001602] ata4.00: disabled
[ 1209.001630] ata4: EH complete
The following is what I'm getting now. Oddly enough, the device node for the drive is still there, but the partition is gone (it only had the one partition on it, which took up 100% of the drive). And smartctl thinks it's a USB device now...

Code:
[root@localhost ~]# smartctl -a /dev/sdc
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.12.2-1.el7.elrepo.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

/dev/sdc: Unknown USB bridge [0x058f:0x6366 (0x100)]
Please specify device type with the -d option.

Use smartctl -h to get a usage summary
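If you want to check your own kernel log for the same pattern as in the libata output above (a failed command, repeated hard resets, COMRESET failures, then the port being disabled), a rough Python sketch follows. It assumes the log text has the `[ seconds] ataN: message` shape shown in this post, for example as saved from `dmesg`; the function name `summarize` and the counter labels are my own illustrative choices.

```python
import re
from collections import Counter

# Matches lines such as "[ 1158.912941] ata4: COMRESET failed (errno=-16)".
# The optional ".00" device suffix is folded into the port name.
LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(ata\d+)(?:\.\d+)?:\s+(.*)$")


def summarize(log_text):
    """Count reset-related messages per ATA port and collect ports the kernel gave up on."""
    counts = {}
    disabled = set()
    for line in log_text.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # continuation lines and unrelated messages are ignored
        port, msg = m.group(2), m.group(3)
        c = counts.setdefault(port, Counter())
        if "COMRESET failed" in msg:
            c["comreset_failed"] += 1
        elif "hard resetting link" in msg:
            c["hard_reset"] += 1
        elif "failed command" in msg:
            c["failed_command"] += 1
        if msg == "disabled" or "giving up" in msg:
            disabled.add(port)
    return counts, disabled


# A few lines copied from the kernel log above.
SAMPLE = """\
[ 1148.876303] ata4.00: failed command: IDENTIFY DEVICE
[ 1148.876322] ata4: hard resetting link
[ 1158.912941] ata4: COMRESET failed (errno=-16)
[ 1209.001598] ata4: reset failed, giving up
[ 1209.001602] ata4.00: disabled
"""

if __name__ == "__main__":
    counts, disabled = summarize(SAMPLE)
    for port, c in counts.items():
        state = "DISABLED" if port in disabled else "ok"
        print(port, dict(c), state)
```

For what it's worth, errno 16 is EBUSY on Linux. Also keep in mind that names like /dev/sdc are not guaranteed to stay attached to the same device across reboots or hotplug events, so once a port has been disabled the node may end up pointing at something else.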
 
  

