LinuxQuestions.org
Old 10-24-2016, 08:26 PM   #1
xonogenic
Member
 
Registered: Feb 2006
Posts: 30

Rep: Reputation: 2
BTRFS encrypted RAID10 hangs system


This issue has been frustrating me since I built what was supposed to be a NAS/Home Theatre box about a month ago. I have a single drive that runs Arch Linux, plus four data drives, all WD Red 3TB. All four data drives are encrypted with LUKS, and btrfs runs on top of them in RAID10.

Whenever I do something that involves a significant amount of disk activity, the whole system just hangs. Nothing of significance shows up in any log that I can find. I've seen some transient issues show up in the logs, but nothing consistent enough to explain this. I also have SMART running, which has found no issues. I've been scouring BTRFS forums for anything that sounds like this and have found nothing. If there is information missing, let me know.

I'm hoping someone can give me an idea of where to look.

BTRFS configuration:
Code:
[root@media_pc ~]# btrfs dev usage /mnt
/dev/mapper/data1, ID: 1
   Device size:             2.73TiB
   Device slack:              0.00B
   Data,RAID10:           981.00GiB
   Metadata,RAID10:         1.50GiB
   Unallocated:             1.77TiB

/dev/mapper/data2, ID: 2
   Device size:             2.73TiB
   Device slack:              0.00B
   Data,RAID10:           981.00GiB
   Metadata,RAID10:         1.50GiB
   Unallocated:             1.77TiB

/dev/mapper/data3, ID: 3
   Device size:             2.73TiB
   Device slack:              0.00B
   Data,RAID10:           981.00GiB
   Metadata,RAID10:         1.50GiB
   System,RAID1:           32.00MiB
   Unallocated:             1.77TiB

/dev/mapper/data4, ID: 4
   Device size:             2.73TiB
   Device slack:              0.00B
   Data,RAID10:           981.00GiB
   Metadata,RAID10:         1.50GiB
   System,RAID1:           32.00MiB
   Unallocated:             1.77TiB

[root@media_pc ~]# btrfs dev stats /mnt
[/dev/mapper/data1].write_io_errs   0
[/dev/mapper/data1].read_io_errs    0
[/dev/mapper/data1].flush_io_errs   0
[/dev/mapper/data1].corruption_errs 0
[/dev/mapper/data1].generation_errs 0
[/dev/mapper/data2].write_io_errs   0
[/dev/mapper/data2].read_io_errs    0
[/dev/mapper/data2].flush_io_errs   0
[/dev/mapper/data2].corruption_errs 0
[/dev/mapper/data2].generation_errs 0
[/dev/mapper/data3].write_io_errs   0
[/dev/mapper/data3].read_io_errs    0
[/dev/mapper/data3].flush_io_errs   0
[/dev/mapper/data3].corruption_errs 0
[/dev/mapper/data3].generation_errs 0
[/dev/mapper/data4].write_io_errs   0
[/dev/mapper/data4].read_io_errs    0
[/dev/mapper/data4].flush_io_errs   0
[/dev/mapper/data4].corruption_errs 0
[/dev/mapper/data4].generation_errs 0
System Config:

Code:
[root@media_pc ~]# uname -a
Linux media_pc.localhost 4.7.4-1-ARCH #1 SMP PREEMPT Thu Sep 15 15:24:29 CEST 2016 x86_64 GNU/Linux

[root@media_pc ~]# lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD/ATI] RX780/RX790 Host Bridge
00:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RX780/RD790 PCI to PCI bridge (external gfx0 port A)
00:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD790 PCI to PCI bridge (PCI express gpp port A)
00:09.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD790 PCI to PCI bridge (PCI express gpp port E)
00:0a.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD790 PCI to PCI bridge (PCI express gpp port F)
00:11.0 SATA controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 SATA Controller [IDE mode] (rev 40)
00:12.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:12.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:13.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:13.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 SMBus Controller (rev 41)
00:14.1 IDE interface: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 IDE Controller (rev 40)
00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 Azalia (Intel HDA) (rev 40)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 LPC host controller (rev 40)
00:14.4 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 PCI to PCI Bridge (rev 40)
00:14.5 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI2 Controller
00:15.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SB700/SB800/SB900 PCI to PCI bridge (PCIE port 0)
00:16.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:16.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor Address Map
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor Link Control
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
02:08.0 FireWire (IEEE 1394): VIA Technologies, Inc. VT6306/7/8 [Fire II(M)] IEEE 1394 OHCI Controller (rev c0)
03:00.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 03)
04:00.0 SATA controller: JMicron Technology Corp. JMB361 AHCI/IDE (rev 02)
04:00.1 IDE interface: JMicron Technology Corp. JMB361 AHCI/IDE (rev 02)
05:00.0 RAID bus controller: Marvell Technology Group Ltd. 88SE9485 SAS/SATA 6Gb/s controller (rev c3)
06:00.0 VGA compatible controller: NVIDIA Corporation GF106 [GeForce GTS 450] (rev a1)
06:00.1 Audio device: NVIDIA Corporation GF106 High Definition Audio Controller (rev a1)
Previous Boot:

Code:
Oct 22 00:00:15 media_pc.southpark.com mandb[2839]: 0 old database entries were purged.
Oct 22 00:00:15 media_pc.southpark.com systemd[1]: Started Update man-db cache.
Oct 22 00:01:01 media_pc.southpark.com CROND[2887]: (root) CMD (run-parts /etc/cron.hourly)
Oct 22 00:01:01 media_pc.southpark.com anacron[2892]: Anacron started on 2016-10-22
Oct 22 00:01:01 media_pc.southpark.com anacron[2892]: Normal exit (0 jobs run)
Oct 22 00:22:45 media_pc.southpark.com kernel: sd 11:0:0:0: [sdb] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x08
Oct 22 00:22:45 media_pc.southpark.com kernel: sd 11:0:0:0: [sdb] tag#0 Sense Key : 0x4 [current] [descriptor] 
Oct 22 00:22:45 media_pc.southpark.com kernel: sd 11:0:0:0: [sdb] tag#0 ASC=0x0 ASCQ=0x0 
Oct 22 00:22:45 media_pc.southpark.com kernel: sd 11:0:0:0: [sdb] tag#0 CDB: opcode=0x85 85 06 2c 00 da 00 00 00 00 00 4f 00 c2 00 b0 00
Oct 22 00:22:48 media_pc.southpark.com smartd[931]: Device: /dev/sdg [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 117 to 116
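A note for anyone reading the kernel lines above: as far as I can tell, hostbyte=0x07 corresponds to DID_ERROR and driverbyte=0x08 to DRIVER_SENSE in the kernel's SCSI headers, and sense key 0x4 means "hardware error". The CDB itself can be picked apart by hand. The sketch below is only an illustration: `decode_ata16_cdb` is a made-up helper name, and it recognises just the three byte values that appear in this log.

```shell
# Decode a few fields of an ATA PASS-THROUGH(16) CDB given as 16 hex bytes.
# Byte positions (counting from 1): 1 = SCSI opcode, 5 = ATA features (7:0),
# 15 = ATA command. Only the values seen in the log above are named.
decode_ata16_cdb() {
    # Unquoted on purpose: split the hex string into one argument per byte.
    set -- $1
    if [ "$#" -ne 16 ]; then
        echo "expected 16 hex bytes, got $#" >&2
        return 1
    fi
    case "$1" in
        85) op="ATA PASS-THROUGH(16)" ;;
        *)  op="unknown" ;;
    esac
    case "${15}" in
        b0) cmd="SMART" ;;
        *)  cmd="unknown" ;;
    esac
    # Feature 0xda only has this meaning together with the SMART command.
    case "$5" in
        da) feat="SMART RETURN STATUS" ;;
        *)  feat="unknown" ;;
    esac
    echo "opcode=0x$1 ($op) command=0x${15} ($cmd) features=0x$5 ($feat)"
}
```

Running `decode_ata16_cdb '85 06 2c 00 da 00 00 00 00 00 4f 00 c2 00 b0 00'` on the CDB from the log suggests the failed command was a SMART health query on sdb (the kind smartd issues periodically) rather than ordinary filesystem I/O.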
Smart Output

Code:
[root@media_pc ~]# smartctl -a /dev/sdc
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.7.4-1-ARCH] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68AX9N0
Serial Number:    WD-WCC1T0451060
LU WWN Device Id: 5 0014ee 2081b04c5
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Oct 24 21:22:09 2016 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

+++++++++++++++++++++
Removed some informational output
+++++++++++++++++++++

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   181   179   021    Pre-fail  Always       -       5950
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1052
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   061   061   000    Old_age   Always       -       28981
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       76
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       66
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       985
194 Temperature_Celsius     0x0022   118   107   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     28570         -
# 2  Extended offline    Completed without error       00%     28069         -
# 3  Short offline       Completed without error       00%     27990         -
# 4  Short offline       Completed without error       00%     27822         -
# 5  Short offline       Completed without error       00%     27654         -
# 6  Short offline       Completed without error       00%     27487         -
# 7  Short offline       Completed without error       00%     27319         -
# 8  Extended offline    Aborted by host               90%     27319         -
# 9  Short offline       Completed without error       00%     27151         -
#10  Short offline       Completed without error       00%     26983         -
#11  Short offline       Completed without error       00%     26815         -
#12  Short offline       Completed without error       00%     26647         -
#13  Extended offline    Completed without error       00%     26583         -
#14  Short offline       Completed without error       00%     26480         -
#15  Short offline       Completed without error       00%     26312         -
#16  Short offline       Completed without error       00%     26144         -
#17  Short offline       Completed without error       00%     25976         -
#18  Extended offline    Completed without error       00%     25864         -
#19  Short offline       Completed without error       00%     25808         -
#20  Short offline       Completed without error       00%     25640         -
#21  Short offline       Completed without error       00%     25473         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

[root@media_pc ~]# smartctl -a /dev/sdd
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.7.4-1-ARCH] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68AX9N0
Serial Number:    WD-WCC1T1473654
LU WWN Device Id: 5 0014ee 208db6f09
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Oct 24 21:22:12 2016 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

+++++++++++++++++++++
Removed some informational output
+++++++++++++++++++++

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   172   171   021    Pre-fail  Always       -       6358
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       916
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   064   064   000    Old_age   Always       -       26890
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       81
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       70
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       845
194 Temperature_Celsius     0x0022   117   106   000    Old_age   Always       -       33
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     26453         -
# 2  Extended offline    Completed without error       00%     26007         -
# 3  Short offline       Completed without error       00%     25928         -
# 4  Short offline       Completed without error       00%     25760         -
# 5  Short offline       Completed without error       00%     25592         -
# 6  Short offline       Completed without error       00%     25425         -
# 7  Extended offline    Completed without error       00%     25264         -
# 8  Short offline       Aborted by host               90%     25257         -
# 9  Short offline       Completed without error       00%     25089         -
#10  Short offline       Completed without error       00%     24921         -
#11  Short offline       Completed without error       00%     24753         -
#12  Short offline       Completed without error       00%     24585         -
#13  Extended offline    Completed without error       00%     24520         -
#14  Short offline       Completed without error       00%     24418         -
#15  Short offline       Completed without error       00%     24250         -
#16  Short offline       Completed without error       00%     24082         -
#17  Short offline       Completed without error       00%     23914         -
#18  Extended offline    Completed without error       00%     23801         -
#19  Short offline       Completed without error       00%     23746         -
#20  Short offline       Completed without error       00%     23578         -
#21  Short offline       Completed without error       00%     23411         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

[root@media_pc ~]# smartctl -a /dev/sde
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.7.4-1-ARCH] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68AX9N0
Serial Number:    WD-WCC1T0392935
LU WWN Device Id: 5 0014ee 25d72f463
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Oct 24 21:22:15 2016 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

+++++++++++++++++++++
Removed some informational output
+++++++++++++++++++++

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       30
  3 Spin_Up_Time            0x0027   176   175   021    Pre-fail  Always       -       6158
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1043
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   061   061   000    Old_age   Always       -       28946
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       77
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       67
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       975
194 Temperature_Celsius     0x0022   118   109   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       1

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     28506         -
# 2  Extended offline    Completed without error       00%     28070         -
# 3  Short offline       Completed without error       00%     27990         -
# 4  Short offline       Completed without error       00%     27822         -
# 5  Short offline       Completed without error       00%     27655         -
# 6  Short offline       Completed without error       00%     27487         -
# 7  Short offline       Completed without error       00%     27319         -
# 8  Extended offline    Aborted by host               90%     27319         -
# 9  Short offline       Completed without error       00%     27151         -
#10  Short offline       Completed without error       00%     26983         -
#11  Short offline       Completed without error       00%     26815         -
#12  Short offline       Completed without error       00%     26648         -
#13  Extended offline    Completed without error       00%     26583         -
#14  Short offline       Completed without error       00%     26480         -
#15  Short offline       Completed without error       00%     26312         -
#16  Short offline       Completed without error       00%     26144         -
#17  Short offline       Completed without error       00%     25976         -
#18  Extended offline    Completed without error       00%     25864         -
#19  Short offline       Completed without error       00%     25808         -
#20  Short offline       Completed without error       00%     25641         -
#21  Short offline       Completed without error       00%     25473         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

[root@media_pc ~]# smartctl -a /dev/sdf
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.7.4-1-ARCH] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68AX9N0
Serial Number:    WD-WCC1T1473647
LU WWN Device Id: 5 0014ee 2b386474a
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Oct 24 21:22:17 2016 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

+++++++++++++++++++++
Removed some informational output
+++++++++++++++++++++

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   175   174   021    Pre-fail  Always       -       6208
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       923
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   067   067   000    Old_age   Always       -       24196
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       80
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       70
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       852
194 Temperature_Celsius     0x0022   119   107   000    Old_age   Always       -       31
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     23785         -
# 2  Extended offline    Completed without error       00%     23281         -
# 3  Short offline       Completed without error       00%     23201         -
# 4  Short offline       Completed without error       00%     23033         -
# 5  Short offline       Completed without error       00%     22865         -
# 6  Short offline       Completed without error       00%     22698         -
# 7  Extended offline    Completed without error       00%     22537         -
# 8  Short offline       Aborted by host               90%     22530         -
# 9  Short offline       Completed without error       00%     22362         -
#10  Short offline       Completed without error       00%     22194         -
#11  Short offline       Completed without error       00%     22026         -
#12  Short offline       Completed without error       00%     21858         -
#13  Extended offline    Completed without error       00%     21794         -
#14  Short offline       Completed without error       00%     21691         -
#15  Short offline       Completed without error       00%     21523         -
#16  Short offline       Completed without error       00%     21355         -
#17  Short offline       Completed without error       00%     21187         -
#18  Extended offline    Completed without error       00%     21075         -
#19  Short offline       Completed without error       00%     21019         -
#20  Short offline       Completed without error       00%     20851         -
#21  Short offline       Completed without error       00%     20684         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Last edited by xonogenic; 10-25-2016 at 08:09 AM. Reason: Grammar
 
Old 10-24-2016, 09:08 PM   #2
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,126

Rep: Reputation: 4120
You've been a member here long enough to know you should be using [code] tags to maintain output layout, especially if posting truckloads of it. If it's unreadable, why should anyone bother?
Quote:
Originally Posted by btrfs FAQ
Also keep in mind that if you use partition level encryption and btrfs RAID on top of multiple encrypted partitions, the partition encryption will have to individually encrypt each copy. This may result in somewhat reduced performance compared to a traditional RAID setup where the encryption might be done on top of RAID. Whether the encryption has a significant impact depends on the workload, and note that many newer CPUs have hardware encryption support.
Edit: (called off for lunch)
I'm not a big fan of inserting yet more block-level drivers unless there is a real need (encrypting a media server?). I tested btrfs RAID10 years ago when it first shipped and never noticed any discernible performance issues, but I only did a few quick tests rather than benchmarks. All local disk, no encryption.
Recovery was flaky, but performed OK for me. These days I use RAID5, but again with no load and no encryption.

Last edited by syg00; 10-24-2016 at 09:34 PM.
 
Old 10-24-2016, 09:51 PM   #3
wpeckham
LQ Guru
 
Registered: Apr 2010
Location: Continental USA
Distribution: Debian, Ubuntu, RedHat, DSL, Puppy, CentOS, Knoppix, Mint-DE, Sparky, VSIDO, tinycore, Q4OS,Manjaro
Posts: 5,623

Rep: Reputation: 2695
I set up something like this and did not like the performance. I scrubbed it all and made an MD (mdadm) RAID-5 array with one spare, using ext4, and was happy with the performance. BTRFS is getting better, but it is not production ready yet. I am looking forward with GREAT anticipation to running BTRFS in RAID-6 mode when that gets reliable. Perhaps that will come next year.
 
Old 10-25-2016, 08:12 AM   #4
xonogenic
Member
 
Registered: Feb 2006
Posts: 30

Original Poster
Rep: Reputation: 2
Quote:
Originally Posted by syg00
You've been a member here long enough to know you should be using [code] tags to maintain output layout, especially if posting truckloads of it. If it's unreadable, why should anyone bother?
My bad, I've fixed that.

I understand what you are saying about stacking block level drivers, but it seems to me that the worst that should happen would be crappy performance.
 
Old 10-25-2016, 05:14 PM   #5
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,126

Rep: Reputation: 4120
No, that would assume everything is independent of everything else. Cache/slab/sysctls (generally) are all global. Have a read of this for some enlightenment.

If I were you, I'd invert the stack, as the FAQ suggests: create the RAID10 on the base devices and create the LUKS container on top of that. It makes key management easier, and you don't have to worry about stride/stripe mismatch affecting I/O performance.
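A rough sketch of what the inverted stack might look like. Note the assumptions: btrfs's own RAID cannot sit underneath LUKS, so this uses an md RAID10 array (mdadm) as the bottom layer, and the array name /dev/md0, the mapper name "data" and the helper name `print_inverted_stack_plan` are all placeholders. The function only prints the commands. They are destructive, so review them and have a backup before running anything.

```shell
# Print (do not run) the commands for a RAID10 -> LUKS -> btrfs stack.
# Arguments: the four member devices. /dev/md0 and "data" are placeholder names.
print_inverted_stack_plan() {
    if [ "$#" -ne 4 ]; then
        echo "usage: print_inverted_stack_plan DEV1 DEV2 DEV3 DEV4" >&2
        return 1
    fi
    # 1. One md RAID10 array across the raw disks.
    echo "mdadm --create /dev/md0 --level=10 --raid-devices=4 $*"
    # 2. A single LUKS container on the array, so there is one key to manage.
    echo "cryptsetup luksFormat /dev/md0"
    echo "cryptsetup open /dev/md0 data"
    # 3. A single-device btrfs filesystem inside the container.
    echo "mkfs.btrfs /dev/mapper/data"
    echo "mount /dev/mapper/data /mnt"
}
```

For example, `print_inverted_stack_plan /dev/sdc /dev/sdd /dev/sde /dev/sdf` prints the five steps in order. One trade-off to keep in mind: with a single device underneath, btrfs can still detect corruption through checksums but has no second copy of the data to repair from.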
 
Old 10-27-2016, 09:52 PM   #6
xonogenic
Member
 
Registered: Feb 2006
Posts: 30

Original Poster
Rep: Reputation: 2
That article was good reading, and I have gotten some promising results from tweaking dirty_bytes and dirty_background_bytes, but the system still hangs after sustained writes. At the moment, I'm trying to get my data rsynced to a backup drive. Once I can get that done, I'm going to give FreeBSD a try and see how that goes.

Thank you for your input; it definitely pointed me in what is likely the right direction.
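For reference, the kind of tweak described here can be sketched as below. The 64 MiB and 16 MiB figures are arbitrary illustrations, not the values actually used in this thread, and `dirty_sysctl_lines` is a made-up helper name.

```shell
# Emit sysctl assignments that cap dirty page cache at fixed byte counts.
# Arguments: hard limit and background limit, both in MiB.
# Writing vm.dirty_bytes makes the kernel ignore vm.dirty_ratio (it reads
# back as 0), and likewise for the background pair.
dirty_sysctl_lines() {
    echo "vm.dirty_bytes=$(( $1 * 1024 * 1024 ))"
    echo "vm.dirty_background_bytes=$(( $2 * 1024 * 1024 ))"
}

# Apply until the next reboot (as root), for example:
#   sysctl -w $(dirty_sysctl_lines 64 16)
```

Smaller limits make writers block sooner and flush in smaller bursts, which is why this sort of setting is often tried when heavy writes stall a machine.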
 
Old 10-30-2016, 10:19 AM   #7
xonogenic
Member
 
Registered: Feb 2006
Posts: 30

Original Poster
Rep: Reputation: 2
By way of an update, I mucked about with dirty_bytes and dirty_background_bytes with limited success. The hanging still happened, but sometimes the system would run for a little longer first. When I was thinking about it, it occurred to me that I had not configured any swap space. I didn't think that this would cause an issue like this without showing an out-of-memory error or something like that. Since I was out of options anyway, I added a swap device. In spite of only 6 to 10 MB being written out to swap, this appears to have solved the issue.

On further reading, setting /proc/sys/vm/swappiness to 1 likely would have solved this issue as well. For now, it seems like I/O is no longer causing hangs, but more testing is needed.
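A minimal sketch for checking the same condition on another machine: it reads /proc/meminfo-style text and reports whether any swap is configured. `swap_status` is a hypothetical helper name; SwapTotal is the field as it appears in /proc/meminfo.

```shell
# Report whether swap is configured, given /proc/meminfo-style text on stdin.
swap_status() {
    awk '$1 == "SwapTotal:" { kb = $2 }
         END { if (kb > 0) print "swap: " kb " kB"; else print "swap: none" }'
}

# Usage on a live system:
#   swap_status < /proc/meminfo
#   cat /proc/sys/vm/swappiness      # current swappiness value
#   sysctl -w vm.swappiness=1        # as root; lasts until the next reboot
```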

Excellent reference here -> https://git.kernel.org/cgit/linux/ke.../sysctl/vm.txt

Wiki on Swappiness here -> https://en.wikipedia.org/wiki/Swappiness

Thanks for the input! I hope this helps someone else.

Edit: Last Update - btrfs balance still hangs the entire system regardless of ionice setting, nice setting or anything that makes sense in the swap/writeback subsystem. I think this is an issue with btrfs itself not being able to properly throttle writes through LUKS, but that is just speculation. Maybe it will be mysteriously fixed in a future update.
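One thing that might be worth trying in a situation like this is limiting how much work each balance run does by using usage filters, rather than relying on ionice and nice alone. This is only a sketch under assumptions: the thresholds are arbitrary, /mnt is the mount point from the first post, and `balance_steps` is a made-up helper that just prints the commands.

```shell
# Print incremental balance commands. Each pass only relocates data chunks
# that are at most the given percent full, so a single run is a bounded
# amount of work instead of a rewrite of the whole filesystem.
# Arguments: mount point, then one or more usage percentages.
balance_steps() {
    mnt=$1
    shift
    for pct in "$@"; do
        echo "btrfs balance start -dusage=$pct $mnt"
    done
}

# Example: balance_steps /mnt 10 25 50
```

Running the passes one at a time, lowest threshold first, also makes it easier to see at which point (if any) the system starts to stall.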

Last edited by xonogenic; 11-01-2016 at 10:17 PM.
 
Old 01-04-2017, 02:58 PM   #8
xonogenic
Member
 
Registered: Feb 2006
Posts: 30

Original Poster
Rep: Reputation: 2
Final update:

This whole saga appears to have been the result of a subtly broken motherboard. I replaced it, and everything is working as I expect.
 
  

