LinuxQuestions.org

dimm0k 04-05-2020 11:38 AM

help determining RAID fail event
 
I recently received an email stating that a "Fail event has been detected on md device /dev/md0", after which it looks like the array went back to resyncing everything from the good drive. However, smartctl does not seem to indicate any issues with the supposedly bad drive, so I'm wondering whether this needs to be investigated further or whether it was a false alarm. If it was a false alarm, how can I get rid of the flag in mdadm that states the drive is faulty?

Here's some relevant information from when this event was triggered:
Code:

A Fail event has been detected on md device /dev/md0.

The device /dev/sdc1 may be involved.

Contents of /proc/mdstat:
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath]
md0 : active raid1 sdb1[0] sdc1[1](F)
      2930162552 blocks super 1.2 [2/1] [U_]
      [=====>...............]  resync = 28.3% (830933760/2930162552) finish=7399.8min speed=4728K/sec

unused devices: <none>

Contents of mdadm --detail
/dev/md0:
        Version : 1.2
  Creation Time : Tue Aug  2 10:36:53 2011
    Raid Level : raid1
    Array Size : 2930162552 (2794.42 GiB 3000.49 GB)
  Used Dev Size : 2930162552 (2794.42 GiB 3000.49 GB)
  Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Fri Apr  3 00:17:37 2020
          State : active, degraded, resyncing
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0

  Resync Status : 99% complete

          Name : defiant:0  (local to host defiant)
          UUID : a043a371:530d4c99:daed879a:904c0e11
        Events : 1382

    Number  Major  Minor  RaidDevice State
      0      8      17        0      active sync  /dev/sdb1
      1      8      33        1      faulty  /dev/sdc1

Contents of dmesg:
[ 6034.544287] ata2.01: configured for UDMA/133
[ 6036.687034] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[ 6038.086733] ata1.00: configured for UDMA/133
[ 6046.837423] PM: resume of devices complete after 13097.208 msecs
[ 6046.861968] Restarting tasks ... done.
[ 6046.905404] md: checkpointing resync of md0.
[ 6046.970037] RAID1 conf printout:
[ 6046.970045]  --- wd:1 rd:2
[ 6046.970053]  disk 0, wo:0, o:1, dev:sdb1
[ 6046.970062]  disk 1, wo:1, o:0, dev:sdc1

also, here's what smartctl has to say for the drive in question
Code:

smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.4.208] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:    Hitachi Deskstar 5K3000
Device Model:    Hitachi HDS5C3030ALA630
Serial Number:    MJ1321YNG17PEA
LU WWN Device Id: 5 000cca 228c0913e
Firmware Version: MEAOA580
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    5700 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:  ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sun Apr  5 12:30:53 2020 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84)        Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (  0)        The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (38166) seconds.
Offline data collection
capabilities:                          (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003)        Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01)        Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:          (  1) minutes.
Extended self-test routine
recommended polling time:          ( 636) minutes.
SCT capabilities:                (0x003d)        SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate    0x000b  100  100  016    Pre-fail  Always      -      0
  2 Throughput_Performance  0x0005  134  134  054    Pre-fail  Offline      -      109
  3 Spin_Up_Time            0x0007  220  220  024    Pre-fail  Always      -      273 (Average 362)
  4 Start_Stop_Count        0x0012  100  100  000    Old_age  Always      -      86
  5 Reallocated_Sector_Ct  0x0033  100  100  005    Pre-fail  Always      -      0
  7 Seek_Error_Rate        0x000b  100  100  067    Pre-fail  Always      -      0
  8 Seek_Time_Performance  0x0005  132  132  020    Pre-fail  Offline      -      32
  9 Power_On_Hours          0x0012  092  092  000    Old_age  Always      -      58611
 10 Spin_Retry_Count        0x0013  100  100  060    Pre-fail  Always      -      0
 12 Power_Cycle_Count      0x0032  100  100  000    Old_age  Always      -      48
192 Power-Off_Retract_Count 0x0032  099  099  000    Old_age  Always      -      1817
193 Load_Cycle_Count        0x0012  099  099  000    Old_age  Always      -      1817
194 Temperature_Celsius    0x0002  193  193  000    Old_age  Always      -      31 (Min/Max 19/48)
196 Reallocated_Event_Count 0x0032  100  100  000    Old_age  Always      -      0
197 Current_Pending_Sector  0x0022  100  100  000    Old_age  Always      -      0
198 Offline_Uncorrectable  0x0008  100  100  000    Old_age  Offline      -      0
199 UDMA_CRC_Error_Count    0x000a  200  200  000    Old_age  Always      -      0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
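
For anyone who wants to check for this condition without waiting for the email: in /proc/mdstat a member followed by "(F)" has been flagged faulty by md, and "[2/1] [U_]" means only one of the two mirror halves is up. Below is a rough sketch that pulls the faulty members out of mdstat-formatted text. The function name failed_members is made up for illustration, and the sample input is copied from the email above.

```shell
#!/bin/sh
# List array members that md has flagged faulty, i.e. entries such as
# "sdc1[1](F)" in /proc/mdstat. Reads mdstat-formatted text on stdin.
failed_members() {
    awk '$2 == ":" {
        for (i = 3; i <= NF; i++)
            if ($i ~ /\(F\)$/) {
                dev = $i
                sub(/\[.*/, "", dev)   # strip "[1](F)" to leave the device name
                print $1, dev
            }
    }'
}

# Sample taken from the email above; on a live system use:
#   failed_members < /proc/mdstat
failed_members <<'EOF'
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath]
md0 : active raid1 sdb1[0] sdc1[1](F)
      2930162552 blocks super 1.2 [2/1] [U_]
EOF
```

With the sample input this prints "md0 sdc1"; a healthy array prints nothing.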


Richard Cranium 04-05-2020 05:14 PM

From what I read, you don't have smartd configured to run self-tests on your devices. You aren't the only one; mine was misconfigured not that long ago (less than a year).

That's not good; device self-tests are how you are given early notice that something's about to break or just broke. Take a look at the comments in /etc/smartd.conf to see what's possible.

I've got (pretty much) the following in my /etc/smartd.conf:
Code:

DEVICESCAN -a -o on -S on -n standby,q -s (S/../.././02|L/../../6/03)

When I look at my devices, I'll see that tests have been run...

Code:

# smartctl -a /dev/sda
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.4.217] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:    Seagate Constellation ES (SATA 6Gb/s)
Device Model:    ST1000NM0011
Serial Number:    Z1N4CMG8
LU WWN Device Id: 5 000c50 0640164d4
Firmware Version: SN03
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7202 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:  ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Apr  5 17:12:48 2020 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82)        Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (  0)        The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  600) seconds.
Offline data collection
capabilities:                          (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003)        Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01)        Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:          (  1) minutes.
Extended self-test routine
recommended polling time:          ( 151) minutes.
Conveyance self-test routine
recommended polling time:          (  2) minutes.
SCT capabilities:                (0x10bd)        SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate    0x000f  083  063  044    Pre-fail  Always      -      203399523
  3 Spin_Up_Time            0x0003  095  095  000    Pre-fail  Always      -      0
  4 Start_Stop_Count        0x0032  100  100  020    Old_age  Always      -      296
  5 Reallocated_Sector_Ct  0x0033  099  099  036    Pre-fail  Always      -      43
  7 Seek_Error_Rate        0x000f  087  060  030    Pre-fail  Always      -      512552029
  9 Power_On_Hours          0x0032  037  037  000    Old_age  Always      -      55417
 10 Spin_Retry_Count        0x0013  100  100  097    Pre-fail  Always      -      0
 12 Power_Cycle_Count      0x0032  100  100  020    Old_age  Always      -      296
184 End-to-End_Error        0x0032  100  100  099    Old_age  Always      -      0
187 Reported_Uncorrect      0x0032  100  100  000    Old_age  Always      -      0
188 Command_Timeout        0x0032  100  099  000    Old_age  Always      -      3
189 High_Fly_Writes        0x003a  100  100  000    Old_age  Always      -      0
190 Airflow_Temperature_Cel 0x0022  059  045  045    Old_age  Always  In_the_past 41 (Min/Max 37/42)
191 G-Sense_Error_Rate      0x0032  100  100  000    Old_age  Always      -      0
192 Power-Off_Retract_Count 0x0032  100  100  000    Old_age  Always      -      122
193 Load_Cycle_Count        0x0032  097  097  000    Old_age  Always      -      6257
194 Temperature_Celsius    0x0022  041  055  000    Old_age  Always      -      41 (0 20 0 0 0)
195 Hardware_ECC_Recovered  0x001a  119  099  000    Old_age  Always      -      203399523
197 Current_Pending_Sector  0x0012  100  100  000    Old_age  Always      -      0
198 Offline_Uncorrectable  0x0010  100  100  000    Old_age  Offline      -      0
199 UDMA_CRC_Error_Count    0x003e  200  200  000    Old_age  Always      -      0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline      Completed without error      00%    55401        -
# 2  Extended offline    Completed without error      00%    55382        -
# 3  Short offline      Completed without error      00%    55377        -
# 4  Short offline      Completed without error      00%    55353        -
# 5  Short offline      Completed without error      00%    55329        -
# 6  Short offline      Completed without error      00%    55305        -
# 7  Short offline      Completed without error      00%    55281        -
# 8  Short offline      Completed without error      00%    55257        -
# 9  Short offline      Completed without error      00%    55233        -
#10  Extended offline    Completed without error      00%    55213        -
#11  Short offline      Completed without error      00%    55209        -
#12  Short offline      Completed without error      00%    55185        -
#13  Short offline      Completed without error      00%    55161        -
#14  Short offline      Completed without error      00%    55138        -
#15  Short offline      Completed without error      00%    55114        -
#16  Short offline      Completed without error      00%    55090        -
#17  Short offline      Completed without error      00%    55066        -
#18  Extended offline    Completed without error      00%    55046        -
#19  Short offline      Completed without error      00%    55042        -
#20  Short offline      Completed without error      00%    55018        -
#21  Short offline      Completed without error      00%    54994        -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Notice the "SMART Self-test log structure revision number 1" portion of the report!

EDIT: You can trigger such tests by hand as well; it's best to have the daemon do that work for you.
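
For reference, triggering the tests by hand looks roughly like the sketch below. The smartctl options (-t short, -t long, -l selftest) are standard; the run helper and the DISK default are only illustrative, and the script just prints the commands unless RUN=1 is set, since the real thing needs root and the right device name.

```shell
#!/bin/sh
# Print each command by default; set RUN=1 to actually execute (needs root).
DISK=${DISK:-/dev/sdc}      # assumed device name, taken from the first post

run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "would run: $*"; fi
}

run smartctl -t short "$DISK"      # start a short self-test (polling time: 1 minute per the first report)
run smartctl -l selftest "$DISK"   # afterwards, show the self-test log
run smartctl -t long "$DISK"       # extended test; 636 minutes per the first report
```

The tests run inside the drive in the background, so the results only show up in the self-test log once they finish.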

upnort 04-05-2020 06:14 PM

My experience at work with hardware RAID controllers is the failure tag won't disappear until the array is fully rebuilt.

That does not address what triggered the original alert. The date stamp on the email should help show where to start looking in the system logs.

I agree with Richard Cranium to configure cron with automated smartd self tests. I do this with servers at work. I scheduled daily short tests and weekly long tests. Another weekly cron job grabs the smartctl output and sends an email.

I do likewise with some basic weekly RAID emails.

Disclaimer: I am not a RAID guru and don't play one on TV.

Richard Cranium 04-05-2020 06:22 PM

Quote:

Originally Posted by upnort (Post 6108232)
I agree with Richard Cranium to configure cron with automated smartd self tests. I do this with servers at work. I scheduled daily short tests and weekly long tests. Another weekly cron job grabs the smartctl output and sends an email.

Actually, smartd will do its own scheduling; it may or may not use cron internally (I honestly haven't bothered to look) and you most certainly can configure smartd to email you on its own. (I left that bit out of the DEVICESCAN string that I provided.)

bassmadrigal 04-05-2020 06:51 PM

Quote:

Originally Posted by Richard Cranium (Post 6108218)
I've got (pretty much) the following in my /etc/smartd.conf:
Code:

DEVICESCAN -a -o on -S on -n standby,q -s (S/../.././02|L/../../6/03)

I'll admit that my brain is a little fuzzed out right now and I'm struggling to make sense of the conf file and your line. Would you mind breaking down what your DEVICESCAN line is doing? If not, I can dig through things a bit more once my mind is in a better place.
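
Going by the smartd.conf man page, the pieces of that line are roughly as follows. DEVICESCAN applies the directives to every disk smartd finds. -a turns on the default set of checks (health status, attribute changes, error and self-test logs, pending and uncorrectable sector counts). -o on enables automatic offline testing and -S on enables attribute autosave. -n standby,q skips a check rather than spinning up a disk that is in standby, without logging a message about it. -s takes a regular expression that smartd matches against a string of the form T/MM/DD/d/HH, where T is the test type and d is the weekday (1=Monday through 7=Sunday). The sketch below mimics that matching with grep to show when the schedule fires; due_tests is a made-up helper, not part of smartmontools.

```shell
#!/bin/sh
# smartd tests each schedule regex against a string of the form
# T/MM/DD/d/HH (test type, month, day, weekday 1=Mon..7=Sun, hour).
SCHEDULE='(S/../.././02|L/../../6/03)'

# Print the test types (S=short, L=long) the schedule would start
# for a given MM/DD/d/HH stamp, or "none".
due_tests() {
    due=""
    for type in S L; do
        if printf '%s\n' "$type/$1" | grep -Eqx "$SCHEDULE"; then
            due="$due$type"
        fi
    done
    echo "${due:-none}"
}

due_tests "04/05/7/02"   # Sunday 5 April, 02:00
due_tests "04/04/6/03"   # Saturday 4 April, 03:00
due_tests "04/04/6/12"   # Saturday 4 April, noon
# For the current hour: due_tests "$(date +%m/%d/%u/%H)"
```

So the line asks for a short test every day at 02:00 and a long test every Saturday at 03:00.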

dimm0k 04-05-2020 08:55 PM

@Richard Cranium, thank you for mentioning smartd.conf as I did not have that configured properly at all to monitor my devices!

@upnort, good idea on matching the email's timestamp against the system logs. I've attached a snippet of the log in case it helps. I've been experimenting with rtcwake to put the system into the "freeze" state until 23:59:59, at which point the system comes out of sleep and, after about 10 minutes, begins an rsnapshot backup. I'm wondering if sdc didn't wake up quickly enough for the RAID, so md decided to resync. That said, according to mdadm --detail the RAID has been rebuilt, yet the drive is still marked faulty...

Code:

Apr  4 00:00:11 defiant kernel: [ 5853.713271] Freezing user space processes ... (elapsed 0.001 seconds) done.
Apr  4 00:00:11 defiant kernel: [ 5853.714589] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
Apr  4 00:00:11 defiant kernel: [ 5853.717291] sd 1:0:1:0: [sdc] Synchronizing SCSI cache
Apr  4 00:00:11 defiant kernel: [ 5853.717463] parport_pc 00:04: disabled
Apr  4 00:00:11 defiant kernel: [ 5853.717842] serial 00:03: disabled
Apr  4 00:00:11 defiant kernel: [ 5853.718304] serial 00:02: disabled
Apr  4 00:00:11 defiant kernel: [ 5853.718705] sd 1:0:0:0: [sdb] Synchronizing SCSI cache
Apr  4 00:00:11 defiant kernel: [ 5853.718853] sd 0:0:0:0: [sda] Synchronizing SCSI cache
Apr  4 00:00:11 defiant kernel: [ 5853.718995] sd 0:0:0:0: [sda] Stopping disk
Apr  4 00:00:11 defiant kernel: [ 5853.719313] e1000e: EEE TX LPI TIMER: 00000000
Apr  4 00:00:11 defiant kernel: [ 5853.719339] e1000e: EEE TX LPI TIMER: 00000000
Apr  4 00:00:11 defiant kernel: [ 5853.729423] sd 1:0:1:0: [sdc] Stopping disk
Apr  4 00:00:11 defiant kernel: [ 5853.729546] sd 1:0:0:0: [sdb] Stopping disk
Apr  4 00:00:11 defiant kernel: [ 5858.063093] sd 1:0:1:0: [sdc] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Apr  4 00:00:11 defiant kernel: [ 5858.063100] sd 1:0:1:0: [sdc] tag#0 Sense Key : 0xb [current] [descriptor]
Apr  4 00:00:11 defiant kernel: [ 5858.063105] sd 1:0:1:0: [sdc] tag#0 ASC=0x47 ASCQ=0x0
Apr  4 00:00:11 defiant kernel: [ 5858.063112] sd 1:0:1:0: [sdc] tag#0 CDB: opcode=0x8a 8a 00 00 00 00 00 63 0e 33 80 00 00 05 80 00 00
Apr  4 00:00:11 defiant kernel: [ 5858.063223] md: md0: resync interrupted.
Apr  4 00:00:11 defiant kernel: [ 5858.074053] PM: suspend of devices complete after 4357.854 msecs
Apr  4 00:00:11 defiant kernel: [ 5858.085051] PM: late suspend of devices complete after 10.986 msecs
Apr  4 00:00:11 defiant kernel: [ 5858.086220] pcieport 0000:00:1c.4: System wakeup enabled by ACPI
Apr  4 00:00:11 defiant kernel: [ 5858.086307] pcieport 0000:00:1c.2: System wakeup enabled by ACPI
Apr  4 00:00:11 defiant kernel: [ 5858.086479] uhci_hcd 0000:00:1d.2: System wakeup enabled by ACPI
Apr  4 00:00:11 defiant kernel: [ 5858.086485] ehci-pci 0000:00:1d.7: System wakeup enabled by ACPI
Apr  4 00:00:11 defiant kernel: [ 5858.086605] uhci_hcd 0000:00:1d.1: System wakeup enabled by ACPI
Apr  4 00:00:11 defiant kernel: [ 5858.086635] uhci_hcd 0000:00:1d.0: System wakeup enabled by ACPI
Apr  4 00:00:11 defiant kernel: [ 5858.086787] uhci_hcd 0000:00:1a.2: System wakeup enabled by ACPI
Apr  4 00:00:11 defiant kernel: [ 5858.086794] ehci-pci 0000:00:1a.7: System wakeup enabled by ACPI
Apr  4 00:00:11 defiant kernel: [ 5858.086862] uhci_hcd 0000:00:1a.1: System wakeup enabled by ACPI
Apr  4 00:00:11 defiant kernel: [ 5858.086906] uhci_hcd 0000:00:1a.0: System wakeup enabled by ACPI
Apr  4 00:00:11 defiant kernel: [ 5858.097453] PM: noirq suspend of devices complete after 12.365 msecs
Apr  4 00:00:11 defiant kernel: [ 5920.977004] Task dump for CPU 1:
Apr  4 00:00:11 defiant kernel: [ 5920.977004] swapper/1      R  running task        0    0      1 0x00200000
Apr  4 00:00:11 defiant kernel: [ 5920.977004] Task dump for CPU 2:
Apr  4 00:00:11 defiant kernel: [ 5920.977004] swapper/2      R  running task        0    0      1 0x00200000
Apr  4 00:00:11 defiant kernel: [ 5980.979004] Task dump for CPU 1:
Apr  4 00:00:11 defiant kernel: [ 5980.979004] swapper/1      R  running task        0    0      1 0x00200000
Apr  4 00:00:11 defiant kernel: [ 5980.979004] Task dump for CPU 2:
Apr  4 00:00:11 defiant kernel: [ 5980.979004] swapper/2      R  running task        0    0      1 0x00200000
Apr  4 00:00:11 defiant kernel: [ 6033.716048] sd 1:0:0:0: [sdb] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
Apr  4 00:00:11 defiant kernel: [ 6033.716055] sd 1:0:0:0: [sdb] tag#0 CDB: opcode=0x88 88 00 00 00 00 00 63 0e 3e 00 00 00 04 00 00 00
Apr  4 00:00:11 defiant kernel: [ 6033.722081] uhci_hcd 0000:00:1a.0: System wakeup disabled by ACPI
Apr  4 00:00:11 defiant kernel: [ 6033.722162] uhci_hcd 0000:00:1a.1: System wakeup disabled by ACPI
Apr  4 00:00:11 defiant kernel: [ 6033.722233] uhci_hcd 0000:00:1a.2: System wakeup disabled by ACPI
Apr  4 00:00:11 defiant kernel: [ 6033.722575] uhci_hcd 0000:00:1d.0: System wakeup disabled by ACPI
Apr  4 00:00:11 defiant kernel: [ 6033.722664] uhci_hcd 0000:00:1d.1: System wakeup disabled by ACPI
Apr  4 00:00:11 defiant kernel: [ 6033.722761] uhci_hcd 0000:00:1d.2: System wakeup disabled by ACPI
Apr  4 00:00:11 defiant kernel: [ 6033.733456] pcieport 0000:00:1c.4: System wakeup disabled by ACPI
Apr  4 00:00:11 defiant kernel: [ 6033.733522] ehci-pci 0000:00:1d.7: System wakeup disabled by ACPI
Apr  4 00:00:11 defiant kernel: [ 6033.733625] ehci-pci 0000:00:1a.7: System wakeup disabled by ACPI
Apr  4 00:00:11 defiant kernel: [ 6033.733898] PM: noirq resume of devices complete after 12.017 msecs
Apr  4 00:00:11 defiant kernel: [ 6033.740204] PM: early resume of devices complete after 6.220 msecs
Apr  4 00:00:11 defiant kernel: [ 6033.740634] rtc_cmos 00:01: System wakeup disabled by ACPI
Apr  4 00:00:11 defiant kernel: [ 6033.740650] pcieport 0000:00:1c.2: System wakeup disabled by ACPI
Apr  4 00:00:11 defiant kernel: [ 6033.743172] serial 00:02: activated
Apr  4 00:00:11 defiant kernel: [ 6033.745655] serial 00:03: activated
Apr  4 00:00:11 defiant kernel: [ 6033.753640] parport_pc 00:04: activated
Apr  4 00:00:11 defiant kernel: [ 6033.817883] sd 0:0:0:0: [sda] Starting disk
Apr  4 00:00:11 defiant kernel: [ 6033.817885] sd 1:0:0:0: [sdb] Starting disk
Apr  4 00:00:11 defiant kernel: [ 6033.817918] sd 1:0:1:0: [sdc] Starting disk
Apr  4 00:00:11 defiant kernel: [ 6034.068735] ata3: SATA link down (SStatus 0 SControl 300)
Apr  4 00:00:11 defiant kernel: [ 6034.079464] ata4: SATA link down (SStatus 0 SControl 300)
Apr  4 00:00:11 defiant kernel: [ 6034.520073] ata2.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr  4 00:00:11 defiant kernel: [ 6034.520086] ata2.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr  4 00:00:11 defiant kernel: [ 6034.523151] ata2.01: ACPI cmd ef/03:45:00:00:00:b0 (SET FEATURES) filtered out
Apr  4 00:00:11 defiant kernel: [ 6034.523156] ata2.01: ACPI cmd ef/03:0c:00:00:00:b0 (SET FEATURES) filtered out
Apr  4 00:00:11 defiant kernel: [ 6034.523319] ata2.01: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Apr  4 00:00:11 defiant kernel: [ 6034.524070] ata1.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr  4 00:00:11 defiant kernel: [ 6034.524082] ata1.01: SATA link down (SStatus 0 SControl 300)
Apr  4 00:00:11 defiant kernel: [ 6034.527159] ata1.00: ACPI cmd ef/03:45:00:00:00:a0 (SET FEATURES) filtered out
Apr  4 00:00:11 defiant kernel: [ 6034.527164] ata1.00: ACPI cmd ef/03:0c:00:00:00:a0 (SET FEATURES) filtered out
Apr  4 00:00:11 defiant kernel: [ 6034.529150] ata2.00: ACPI cmd ef/03:45:00:00:00:a0 (SET FEATURES) filtered out
Apr  4 00:00:11 defiant kernel: [ 6034.529155] ata2.00: ACPI cmd ef/03:0c:00:00:00:a0 (SET FEATURES) filtered out
Apr  4 00:00:11 defiant kernel: [ 6034.529310] ata2.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Apr  4 00:00:11 defiant kernel: [ 6034.529381] ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Apr  4 00:00:11 defiant kernel: [ 6034.538288] ata2.00: configured for UDMA/133
Apr  4 00:00:11 defiant kernel: [ 6034.544287] ata2.01: configured for UDMA/133
Apr  4 00:00:11 defiant kernel: [ 6036.687034] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Apr  4 00:00:11 defiant kernel: [ 6038.086733] ata1.00: configured for UDMA/133
Apr  4 00:00:11 defiant kernel: [ 6046.837423] PM: resume of devices complete after 13097.208 msecs
Apr  4 00:00:11 defiant kernel: [ 6046.861968] Restarting tasks ... done.
Apr  4 00:00:11 defiant kernel: [ 6046.905404] md: checkpointing resync of md0.
Apr  4 00:00:11 defiant kernel: [ 6046.975363] md: resync of RAID array md0
Apr  4 00:00:11 defiant kernel: [ 6046.975370] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
Apr  4 00:00:11 defiant kernel: [ 6046.975374] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
Apr  4 00:00:11 defiant kernel: [ 6046.975381] md: using 128k window, over a total of 2930162552k.
Apr  4 00:00:11 defiant kernel: [ 6046.975385] md: resuming resync of md0 from checkpoint.
Apr  4 00:00:11 defiant kernel: [ 6046.975719] md: md0: resync done.
Apr  5 00:00:25 defiant kernel: [ 6912.543143] PM: Syncing filesystems ... done.
Apr  5 00:00:25 defiant kernel: [ 6913.061333] Freezing user space processes ... (elapsed 0.001 seconds) done.
Apr  5 00:00:25 defiant kernel: [ 6913.062642] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
Apr  5 00:00:25 defiant kernel: [ 6913.065267] parport_pc 00:04: disabled
Apr  5 00:00:25 defiant kernel: [ 6913.065421] sd 1:0:1:0: [sdc] Synchronizing SCSI cache
Apr  5 00:00:25 defiant kernel: [ 6913.065585] sd 1:0:0:0: [sdb] Synchronizing SCSI cache
Apr  5 00:00:25 defiant kernel: [ 6913.065616] sd 1:0:1:0: [sdc] Stopping disk
Apr  5 00:00:25 defiant kernel: [ 6913.065795] sd 0:0:0:0: [sda] Synchronizing SCSI cache
Apr  5 00:00:25 defiant kernel: [ 6913.065807] sd 1:0:0:0: [sdb] Stopping disk
Apr  5 00:00:25 defiant kernel: [ 6913.065825] serial 00:03: disabled
Apr  5 00:00:25 defiant kernel: [ 6913.065964] sd 0:0:0:0: [sda] Stopping disk
Apr  5 00:00:25 defiant kernel: [ 6913.066412] serial 00:02: disabled
Apr  5 00:00:25 defiant kernel: [ 6913.066499] e1000e: EEE TX LPI TIMER: 00000000
Apr  5 00:00:25 defiant kernel: [ 6913.066522] e1000e: EEE TX LPI TIMER: 00000000
Apr  5 00:00:25 defiant kernel: [ 6914.455057] PM: suspend of devices complete after 1390.900 msecs
Apr  5 00:00:25 defiant kernel: [ 6914.466057] PM: late suspend of devices complete after 10.988 msecs
Apr  5 00:00:25 defiant kernel: [ 6914.466957] pcieport 0000:00:1c.4: System wakeup enabled by ACPI
Apr  5 00:00:25 defiant kernel: [ 6914.467380] uhci_hcd 0000:00:1d.2: System wakeup enabled by ACPI
Apr  5 00:00:25 defiant kernel: [ 6914.467382] ehci-pci 0000:00:1d.7: System wakeup enabled by ACPI
Apr  5 00:00:25 defiant kernel: [ 6914.467459] uhci_hcd 0000:00:1d.1: System wakeup enabled by ACPI
Apr  5 00:00:25 defiant kernel: [ 6914.467512] uhci_hcd 0000:00:1d.0: System wakeup enabled by ACPI
Apr  5 00:00:25 defiant kernel: [ 6914.467514] pcieport 0000:00:1c.2: System wakeup enabled by ACPI
Apr  5 00:00:25 defiant kernel: [ 6914.467634] ehci-pci 0000:00:1a.7: System wakeup enabled by ACPI
Apr  5 00:00:25 defiant kernel: [ 6914.467689] uhci_hcd 0000:00:1a.2: System wakeup enabled by ACPI
Apr  5 00:00:25 defiant kernel: [ 6914.467745] uhci_hcd 0000:00:1a.1: System wakeup enabled by ACPI
Apr  5 00:00:25 defiant kernel: [ 6914.467787] uhci_hcd 0000:00:1a.0: System wakeup enabled by ACPI
Apr  5 00:00:25 defiant kernel: [ 6914.478256] PM: noirq suspend of devices complete after 12.166 msecs
Apr  5 00:00:25 defiant kernel: [ 6914.479261] uhci_hcd 0000:00:1a.0: System wakeup disabled by ACPI
Apr  5 00:00:25 defiant kernel: [ 6914.479261] uhci_hcd 0000:00:1a.1: System wakeup disabled by ACPI
Apr  5 00:00:25 defiant kernel: [ 6914.479261] uhci_hcd 0000:00:1a.2: System wakeup disabled by ACPI
Apr  5 00:00:25 defiant kernel: [ 6914.479261] uhci_hcd 0000:00:1d.0: System wakeup disabled by ACPI
Apr  5 00:00:25 defiant kernel: [ 6914.479261] uhci_hcd 0000:00:1d.1: System wakeup disabled by ACPI
Apr  5 00:00:25 defiant kernel: [ 6914.479261] uhci_hcd 0000:00:1d.2: System wakeup disabled by ACPI
Apr  5 00:00:25 defiant kernel: [ 6914.489155] ehci-pci 0000:00:1d.7: System wakeup disabled by ACPI
Apr  5 00:00:25 defiant kernel: [ 6914.489159] ehci-pci 0000:00:1a.7: System wakeup disabled by ACPI
Apr  5 00:00:25 defiant kernel: [ 6914.490424] pcieport 0000:00:1c.4: System wakeup disabled by ACPI
Apr  5 00:00:25 defiant kernel: [ 6914.490655] PM: noirq resume of devices complete after 12.303 msecs
Apr  5 00:00:25 defiant kernel: [ 6914.491450] PM: early resume of devices complete after 0.685 msecs
Apr  5 00:00:25 defiant kernel: [ 6914.492071] pcieport 0000:00:1c.2: System wakeup disabled by ACPI
Apr  5 00:00:25 defiant kernel: [ 6914.497160] rtc_cmos 00:01: System wakeup disabled by ACPI
Apr  5 00:00:25 defiant kernel: [ 6914.503622] serial 00:02: activated
Apr  5 00:00:25 defiant kernel: [ 6914.510152] serial 00:03: activated
Apr  5 00:00:25 defiant kernel: [ 6914.512565] parport_pc 00:04: activated
Apr  5 00:00:25 defiant kernel: [ 6914.565401] sd 0:0:0:0: [sda] Starting disk
Apr  5 00:00:25 defiant kernel: [ 6914.565403] sd 1:0:0:0: [sdb] Starting disk
Apr  5 00:00:25 defiant kernel: [ 6914.565440] sd 1:0:1:0: [sdc] Starting disk
Apr  5 00:00:25 defiant kernel: [ 6914.816723] ata4: SATA link down (SStatus 0 SControl 300)
Apr  5 00:00:25 defiant kernel: [ 6914.827459] ata3: SATA link down (SStatus 0 SControl 300)
Apr  5 00:00:25 defiant kernel: [ 6915.271076] ata1.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr  5 00:00:25 defiant kernel: [ 6915.271086] ata1.01: SATA link down (SStatus 0 SControl 300)
Apr  5 00:00:25 defiant kernel: [ 6915.273074] ata2.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr  5 00:00:25 defiant kernel: [ 6915.273087] ata2.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr  5 00:00:25 defiant kernel: [ 6915.274152] ata1.00: ACPI cmd ef/03:45:00:00:00:a0 (SET FEATURES) filtered out
Apr  5 00:00:25 defiant kernel: [ 6915.274157] ata1.00: ACPI cmd ef/03:0c:00:00:00:a0 (SET FEATURES) filtered out
Apr  5 00:00:25 defiant kernel: [ 6915.276145] ata2.01: ACPI cmd ef/03:45:00:00:00:b0 (SET FEATURES) filtered out
Apr  5 00:00:25 defiant kernel: [ 6915.276150] ata2.01: ACPI cmd ef/03:0c:00:00:00:b0 (SET FEATURES) filtered out
Apr  5 00:00:25 defiant kernel: [ 6915.276323] ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Apr  5 00:00:25 defiant kernel: [ 6915.276399] ata2.01: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Apr  5 00:00:25 defiant kernel: [ 6915.282146] ata2.00: ACPI cmd ef/03:45:00:00:00:a0 (SET FEATURES) filtered out
Apr  5 00:00:25 defiant kernel: [ 6915.282151] ata2.00: ACPI cmd ef/03:0c:00:00:00:a0 (SET FEATURES) filtered out
Apr  5 00:00:25 defiant kernel: [ 6915.282310] ata2.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Apr  5 00:00:25 defiant kernel: [ 6915.291309] ata2.00: configured for UDMA/133
Apr  5 00:00:25 defiant kernel: [ 6915.297302] ata2.01: configured for UDMA/133
Apr  5 00:00:25 defiant kernel: [ 6917.413041] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Apr  5 00:00:25 defiant kernel: [ 6918.709566] ata1.00: configured for UDMA/133
Apr  5 00:00:25 defiant kernel: [ 6940.057352] PM: resume of devices complete after 25565.892 msecs
Apr  5 00:00:25 defiant kernel: [ 6940.069055] Restarting tasks ... done.

mdadm --detail
Code:

{~}# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Tue Aug  2 10:36:53 2011
    Raid Level : raid1
    Array Size : 2930162552 (2794.42 GiB 3000.49 GB)
  Used Dev Size : 2930162552 (2794.42 GiB 3000.49 GB)
  Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Sun Apr  5 21:11:27 2020
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0

          Name : defiant:0  (local to host defiant)
          UUID : a043a371:530d4c99:daed879a:904c0e11
        Events : 2181

    Number  Major  Minor  RaidDevice State
      0      8      17        0      active sync  /dev/sdb1
      2      0        0        2      removed

      1      8      33        -      faulty  /dev/sdc1


Richard Cranium 04-06-2020 02:49 AM

Quote:

Originally Posted by bassmadrigal (Post 6108244)
I'll admit that my brain is a little fuzzed out right now and I'm struggling making sense of the conf file and your line. Would you mind breaking down what your DEVICESCAN line is doing? If not, I can dig through things a bit more once my mind is in a better place.

One of the comments in /etc/smartd.conf is ...

Code:

# First ATA/SATA or SCSI/SAS disk.  Monitor all attributes, enable
# automatic online data collection, automatic Attribute autosave, and
# start a short self-test every day between 2-3am, and a long self test
# Saturdays between 3-4am.
#/dev/sda -a -o on -S on -s (S/../.././02|L/../../6/03)

Further down, there's...

Code:

# HERE IS A LIST OF DIRECTIVES FOR THIS CONFIGURATION FILE.
# PLEASE SEE THE smartd.conf MAN PAGE FOR DETAILS
#
#  -d TYPE Set the device type: ata, scsi, marvell, removable, 3ware,N, hpt,L/M/N
#  -T TYPE set the tolerance to one of: normal, permissive
#  -o VAL  Enable/disable automatic offline tests (on/off)
#  -S VAL  Enable/disable attribute autosave (on/off)
#  -n MODE No check. MODE is one of: never, sleep, standby, idle
#  -H      Monitor SMART Health Status, report if failed
#  -l TYPE Monitor SMART log.  Type is one of: error, selftest
#  -f      Monitor for failure of any 'Usage' Attributes
#  -m ADD  Send warning email to ADD for -H, -l error, -l selftest, and -f
#  -M TYPE Modify email warning behavior (see man page)
#  -s REGE Start self-test when type/date matches regular expression (see man page)
#  -p      Report changes in 'Prefailure' Normalized Attributes
#  -u      Report changes in 'Usage' Normalized Attributes
#  -t      Equivalent to -p and -u Directives
#  -r ID  Also report Raw values of Attribute ID with -p, -u or -t
#  -R ID  Track changes in Attribute ID Raw value with -p, -u or -t
#  -i ID  Ignore Attribute ID for -f Directive
#  -I ID  Ignore Attribute ID for -p, -u or -t Directive
#  -C ID  Report if Current Pending Sector count non-zero
#  -U ID  Report if Offline Uncorrectable count non-zero
#  -W D,I,C Monitor Temperature D)ifference, I)nformal limit, C)ritical limit
#  -v N,ST Modifies labeling of Attribute N (see man page)
#  -a      Default: equivalent to -H -f -t -l error -l selftest -C 197 -U 198
#  -F TYPE Use firmware bug workaround. Type is one of: none, samsung
#  -P TYPE Drive-specific presets: use, ignore, show, showall
#    #      Comment: text after a hash sign is ignored
#    \      Line continuation character

So, all put together, this...
Code:

DEVICESCAN -a -o on -S on -n standby,q -s (S/../.././02|L/../../6/03)
...means

For every SMART-capable device in the system:
  • Monitor SMART Health Status, report if failed
  • Monitor for failure of any 'Usage' Attributes
  • Report changes in 'Prefailure' Normalized Attributes
  • Report changes in 'Usage' Normalized Attributes
  • Monitor the error SMART log
  • Monitor the selftest SMART log
  • Report if Current Pending Sector count non-zero
  • Report if Offline Uncorrectable count non-zero
(all of that is what -a breaks down to)
PLUS..
  • Enable automatic offline tests (i.e., -o on)
  • Enable attribute autosave (i.e., -S on)
  • Skip the check when the disk is in standby, and don't bother to tell me that you skipped a check because of this (i.e., -n standby,q)
  • Run a short self-test between 2-3am every day and a long self-test every Saturday between 3-4am (i.e., -s (S/../.././02|L/../../6/03))

EDIT: I forgot to mention that smartd will also log to syslog if something bad happens in a test. If you already have a log-scraping tool that looks for things to alarm on, you can use that to send a warning email. smartd can also be configured to email warnings itself.
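
To see how the -s schedule in that DEVICESCAN line is read, here is a minimal sketch. smartd compares the regular expression against a string of the form T/MM/DD/d/HH, where T is the test type, d is the day of the week (1 = Monday through 7 = Sunday) and HH is the hour. The helper names schedule_matches and schedule_string_now are made up for illustration, and anchoring the pattern at both ends is an assumption of this sketch.

```shell
# Hypothetical helper: succeed if a smartd-style "T/MM/DD/d/HH" string
# matches the -s regular expression (anchored at both ends here by assumption).
schedule_matches() {
    printf '%s\n' "$1" | grep -Eq "^($2)\$"
}

# Hypothetical helper: build the string for the current hour.
# date's %u gives the day of week as 1 (Monday) through 7 (Sunday).
schedule_string_now() {
    date +"$1/%m/%d/%u/%H"
}

regex='(S/../.././02|L/../../6/03)'

# Short test: any month, any day, any weekday, hour 02.
schedule_matches 'S/04/06/1/02' "$regex" && echo "short test would start"
# Long test: weekday 6 (Saturday), hour 03.
schedule_matches 'L/04/11/6/03' "$regex" && echo "long test would start"
# Weekday 1 (Monday) does not satisfy the long-test half of the pattern.
schedule_matches 'L/04/06/1/03' "$regex" || echo "no long test on a Monday"
```

Running schedule_matches "$(schedule_string_now S)" "$regex" shows whether a short test would be due in the current hour.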

bassmadrigal 04-06-2020 10:34 AM

Quote:

Originally Posted by Richard Cranium (Post 6108324)

Awesome! That was really in depth and much easier to read than the conf file when I was looking at it yesterday. Thanks! I'll likely get this implemented when I get home tonight.

Richard Cranium 04-06-2020 10:43 AM

Keep in mind there is a simple
Code:

DEVICESCAN
line near the top of the config file by default. Update that one or comment it out.
Otherwise, when you put your new DEVICESCAN line at the bottom of the file, smartd never looks at it: once it finds a DEVICESCAN line it ignores the rest of the config file. Zero guesses on how I know that tidbit.
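
A quick way to check for this is to list every uncommented DEVICESCAN line along with its line number. This is only a sketch: the function name active_devicescan_lines is made up, and /etc/smartd.conf is the path mentioned earlier in the thread.

```shell
# Print "LINE:TEXT" for every active (uncommented) DEVICESCAN directive
# in the given smartd config file. Commented-out lines are not listed.
active_devicescan_lines() {
    grep -nE '^[[:space:]]*DEVICESCAN([[:space:]]|$)' "$1"
}
```

Call it as active_devicescan_lines /etc/smartd.conf. If more than one line comes back, only the first one is in effect, so that is the one to edit or comment out.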

upnort 04-06-2020 12:27 PM

Quote:

I'm wondering if the sdc didn't wake up quick enough for the raid that it decided to resync.
Good point. I don't know. While Linux software RAID has been well tested for a couple of decades, suspend might be outside the scope of the design. RAID primarily targets systems running 24/7 -- business continuity. Something that suspends, like a laptop or home server that is powered down nightly, might not be an expected use case for software RAID. Might want to poke around the web.

Quote:

however the drive is still in fault mode
The state is listed as clean, degraded. Look into how to remove the degraded state.
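
One common way to clear a member flagged faulty is to remove it from the array and add it back, which starts a rebuild onto that drive. The sketch below only prints the mdadm commands for members that /proc/mdstat marks with (F); it does not run them. The function names are made up for illustration, the thread does not say which commands were actually used, and this assumes the drive has already been checked and is believed healthy.

```shell
# Print "ARRAY MEMBER" for each member that mdstat flags with "(F)".
# Reads mdstat-formatted text on stdin.
faulty_members() {
    awk '/^md[^ ]* :/ {
        for (i = 3; i <= NF; i++)
            if ($i ~ /\(F\)$/) {
                dev = $i
                sub(/\[.*$/, "", dev)   # "sdc1[1](F)" -> "sdc1"
                print $1, dev
            }
    }'
}

# Dry run: only prints the mdadm commands, never executes them.
print_readd_commands() {
    faulty_members | while read -r md dev; do
        echo "mdadm /dev/$md --remove /dev/$dev"
        echo "mdadm /dev/$md --add /dev/$dev"
    done
}
```

Use it as print_readd_commands < /proc/mdstat, review the output, and run the printed commands as root only if they name the device you expect.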

dimm0k 04-06-2020 01:57 PM

Quote:

Originally Posted by upnort (Post 6108446)
Good point. I don't know. While Linux software RAID has been well tested for a couple of decades, suspend might be outside the scope of the design. RAID primarily targets systems running 24/7 -- business continuity. Something that suspends, like a laptop or home server that is powered down nightly, might not be an expected use case for software RAID. Might want to poke around the web.

what you said definitely makes sense! thank you, I'll poke some more to see if anyone has any info on this!

The state is listed as clean, degraded. Look into how to remove the degraded state.

working on that now!

Richard Cranium 04-06-2020 07:56 PM

One more thing, I've put this into my /etc/rc.d/rc.local (which is more RAID related than smartd related)...
Code:

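# Increase timeouts for all non-ERC drives.
# see https://raid.wiki.kernel.org/index.php/Timeout_Mismatch
for i in /dev/sd? ; do
    # Try to set SCT ERC to 7.0 seconds (units are 100 ms) for reads and writes.
    if smartctl -l scterc,70,70 "${i}" > /dev/null ; then
        printf '%s  is good ' "${i}"
    else
        # No usable SCT ERC: raise the kernel's command timeout to 180 seconds instead.
        echo 180 > "/sys/block/${i##*/}/device/timeout"
        printf '%s  is  bad ' "${i}"
    fi
    # Print the model so the status can be matched to a physical drive.
    smartctl -i "${i}" | grep -E "(Device Model|Product:)"
    # Set read-ahead to 1024 sectors of 512 bytes.
    blockdev --setra 1024 "${i}"
done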

The link in the code block explains the issue.
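
To confirm what the kernel is currently using, the per-disk command timeout can be read back from sysfs. This is a minimal read-only sketch: the function name report_timeouts and its optional root argument are made up for illustration (the argument exists only so the function can be pointed at a test directory instead of /sys).

```shell
# Print "DISK SECONDS" for every sd? disk, read from
# <root>/block/<disk>/device/timeout. <root> defaults to /sys.
report_timeouts() {
    root="${1:-/sys}"
    for f in "$root"/block/sd?/device/timeout; do
        # Skip the unexpanded pattern when no disk matches.
        [ -r "$f" ] || continue
        disk=${f#"$root"/block/}
        disk=${disk%%/*}
        printf '%s %s\n' "$disk" "$(cat "$f")"
    done
}
```

Run report_timeouts with no arguments after the rc.local loop has executed. Drives that fell into the "bad" branch should report 180, while the others keep whatever value the kernel already had.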

dimm0k 04-07-2020 09:12 AM

Quote:

Originally Posted by Richard Cranium (Post 6108554)

thanks for this, definitely was not aware of this! while it definitely can be used on my desktop, it unfortunately does not work for the older drives I have on my backup server! this does shed more light on what happened with my drive on this system.

Richard Cranium 04-08-2020 01:01 AM

To be honest, I'm fairly certain that someone else on the forum mentioned the timeout mismatch; I don't remember who did so or when they did.

While it's possible that I ran across this while reading the RAID wiki, the mere fact that this is the first time I've posted anything about it tells me that someone else beat me to the punch. Hopefully they'll show up and tell us when I saw said mention.


All times are GMT -5.