01-19-2024, 07:49 PM
|
#1
|
Member
Registered: Nov 2015
Posts: 262
|
Does this mean the boot SSD is about to fail?
I was helping a friend with a weird Linux problem; he's using Debian 12.
Firefox would load and then immediately crash if you tried to open a web page. I installed Chrome and Brave and they refused to load. Both gave a generic 'bus error'.
I checked in dmesg and saw TONS of entries like this:
Code:
[ 21.460501] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[ 21.460512] ata1.00: irq_stat 0x40000001
[ 21.460515] ata1.00: failed command: READ DMA
[ 21.460516] ata1.00: cmd c8/00:20:90:18:c4/00:00:00:00:00/e7 tag 1 dma 16384 in
res 51/40:20:90:18:c4/00:00:00:00:00/e7 Emask 0x9 (media error)
[ 21.460522] ata1.00: status: { DRDY ERR }
[ 21.460523] ata1.00: error: { UNC }
[ 21.497642] ata1.00: configured for UDMA/133
[ 21.497661] sd 0:0:0:0: [sda] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[ 21.497665] sd 0:0:0:0: [sda] tag#1 Sense Key : Medium Error [current]
[ 21.497666] sd 0:0:0:0: [sda] tag#1 Add. Sense: Unrecovered read error - auto reallocate failed
[ 21.497669] sd 0:0:0:0: [sda] tag#1 CDB: Read(10) 28 00 07 c4 18 90 00 00 20 00
[ 21.497674] I/O error, dev sda, sector 130291856 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
[ 21.497693] ata1: EH complete
[ 21.564458] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[ 21.564464] ata1.00: irq_stat 0x40000001
[ 21.564466] ata1.00: failed command: READ DMA
[ 21.564466] ata1.00: cmd c8/00:20:90:18:c4/00:00:00:00:00/e7 tag 17 dma 16384 in
res 51/40:20:90:18:c4/00:00:00:00:00/e7 Emask 0x9 (media error)
[ 21.564471] ata1.00: status: { DRDY ERR }
[ 21.564472] ata1.00: error: { UNC }
[ 21.591996] ata1.00: configured for UDMA/133
[ 21.592006] sd 0:0:0:0: [sda] tag#17 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[ 21.592009] sd 0:0:0:0: [sda] tag#17 Sense Key : Medium Error [current]
[ 21.592010] sd 0:0:0:0: [sda] tag#17 Add. Sense: Unrecovered read error - auto reallocate failed
[ 21.592011] sd 0:0:0:0: [sda] tag#17 CDB: Read(10) 28 00 07 c4 18 90 00 00 20 00
[ 21.592012] I/O error, dev sda, sector 130291856 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[ 21.592021] ata1: EH complete
[ 21.676370] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[ 21.676375] ata1.00: irq_stat 0x40000001
[ 21.676377] ata1.00: failed command: READ DMA
[ 21.676377] ata1.00: cmd c8/00:e0:b0:0d:c4/00:00:00:00:00/e7 tag 4 dma 114688 in
res 51/40:e0:b0:0d:c4/00:00:00:00:00/e7 Emask 0x9 (media error)
[ 21.676381] ata1.00: status: { DRDY ERR }
[ 21.676382] ata1.00: error: { UNC }
[ 21.703646] ata1.00: configured for UDMA/133
[ 21.703661] sd 0:0:0:0: [sda] tag#4 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[ 21.703665] sd 0:0:0:0: [sda] tag#4 Sense Key : Medium Error [current]
[ 21.703666] sd 0:0:0:0: [sda] tag#4 Add. Sense: Unrecovered read error - auto reallocate failed
[ 21.703668] sd 0:0:0:0: [sda] tag#4 CDB: Read(10) 28 00 07 c4 0d b0 00 00 e0 00
[ 21.703669] I/O error, dev sda, sector 130289072 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 2
[ 21.703680] ata1: EH complete
[ 21.788436] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[ 21.788446] ata1.00: irq_stat 0x40000001
[ 21.788450] ata1.00: failed command: READ DMA
[ 21.788453] ata1.00: cmd c8/00:08:08:0e:c4/00:00:00:00:00/e7 tag 13 dma 4096 in
res 51/40:08:08:0e:c4/00:00:00:00:00/e7 Emask 0x9 (media error)
[ 21.788463] ata1.00: status: { DRDY ERR }
[ 21.788465] ata1.00: error: { UNC }
[ 21.815857] ata1.00: configured for UDMA/133
[ 21.815877] sd 0:0:0:0: [sda] tag#13 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[ 21.815883] sd 0:0:0:0: [sda] tag#13 Sense Key : Medium Error [current]
[ 21.815886] sd 0:0:0:0: [sda] tag#13 Add. Sense: Unrecovered read error - auto reallocate failed
[ 21.815889] sd 0:0:0:0: [sda] tag#13 CDB: Read(10) 28 00 07 c4 0e 08 00 00 08 00
[ 21.815891] I/O error, dev sda, sector 130289160 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[ 21.815911] ata1: EH complete
[ 21.876511] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[ 21.876520] ata1.00: irq_stat 0x40000001
[ 21.876526] ata1.00: failed command: READ DMA
[ 21.876528] ata1.00: cmd c8/00:08:08:0e:c4/00:00:00:00:00/e7 tag 27 dma 4096 in
res 51/40:08:08:0e:c4/00:00:00:00:00/e7 Emask 0x9 (media error)
I then installed smartmontools and ran smartctl on the drive:
Code:
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-17-amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: WD Blue SA510 2.5 250GB
Serial Number: 22464U800081
LU WWN Device Id: 5 001b44 8bd7281be
Firmware Version: 52020100
User Capacity: 250,059,350,016 bytes [250 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
TRIM Command: Available, deterministic
Device is: Not in smartctl database 7.3/5319
ATA Version is: ACS-4, ACS-2 T13/2015-D revision 3
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Fri Jan 19 19:52:12 2024 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 48) A fatal error or unknown test error
occurred while the device was executing
its self-test routine and the device
was unable to complete the self-test
routine.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x71) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 10) minutes.
Conveyance self-test routine
recommended polling time: ( 1) minutes.
SMART Attributes Data Structure revision number: 0
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   010    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       316
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       14
165 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       9994578627391
166 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       107
167 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       15
168 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       153
169 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       50
170 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
171 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
173 Unknown_Attribute       0x0032   100   100   005    Old_age   Always       -       130
174 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   097    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       418
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   100   100   014    Old_age   Always       -       29 (Min/Max 18/45)
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
230 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       3655155712851
232 Available_Reservd_Space 0x0033   100   100   004    Pre-fail  Always       -       95
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       31008
234 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       17794
241 Total_LBAs_Written      0x0030   253   253   000    Old_age   Offline      -       16729
242 Total_LBAs_Read         0x0030   253   253   000    Old_age   Offline      -       8823
244 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
SMART Error Log Version: 0
Warning: ATA error count 418 inconsistent with error log pointer 5
ATA Error Count: 418 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 418 occurred at disk power-on lifetime: 316 hours (13 days + 4 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 e8 26 db ee Error: UNC at LBA = 0x0edb26e8 = 249243368
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 e8 26 db ee 08 23:00:54.750 READ DMA
47 00 01 00 00 00 a0 08 23:00:54.750 READ LOG DMA EXT
47 00 01 30 08 00 a0 08 23:00:54.750 READ LOG DMA EXT
47 00 01 30 00 00 a0 08 23:00:54.750 READ LOG DMA EXT
47 00 01 00 00 00 a0 08 23:00:54.750 READ LOG DMA EXT
Error 417 occurred at disk power-on lifetime: 316 hours (13 days + 4 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 e8 26 db ee Error: UNC at LBA = 0x0edb26e8 = 249243368
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 e8 26 db ee 08 23:00:54.630 READ DMA
c8 00 08 e0 26 db ee 08 23:00:54.630 READ DMA
c8 00 08 d8 26 db ee 08 23:00:54.630 READ DMA
c8 00 08 d0 26 db ee 08 23:00:54.630 READ DMA
c8 00 08 c8 26 db ee 08 23:00:54.630 READ DMA
Error 416 occurred at disk power-on lifetime: 316 hours (13 days + 4 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 f8 26 db ee Error: UNC at LBA = 0x0edb26f8 = 249243384
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 18 f8 26 db ee 08 23:00:54.525 READ DMA
47 00 01 00 00 00 a0 08 23:00:54.525 READ LOG DMA EXT
47 00 01 30 08 00 a0 08 23:00:54.525 READ LOG DMA EXT
47 00 01 30 00 00 a0 08 23:00:54.525 READ LOG DMA EXT
47 00 01 00 00 00 a0 08 23:00:54.525 READ LOG DMA EXT
Error 415 occurred at disk power-on lifetime: 316 hours (13 days + 4 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 d8 e8 26 db ee Error: UNC 216 sectors at LBA = 0x0edb26e8 = 249243368
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 e8 10 26 db ee 08 23:00:54.415 READ DMA
c8 00 00 78 1b db ee 08 23:00:54.415 READ DMA
c8 00 00 90 19 db ee 08 23:00:54.415 READ DMA
c8 00 00 20 14 db ee 08 23:00:54.415 READ DMA
c8 00 a8 78 13 db ee 08 23:00:54.415 READ DMA
Error 414 occurred at disk power-on lifetime: 316 hours (13 days + 4 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 a8 2a db ee Error: UNC at LBA = 0x0edb2aa8 = 249244328
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 a8 2a db ee 08 23:00:42.675 READ DMA
ca 00 08 d8 0a e3 e0 08 23:00:42.675 WRITE DMA
47 00 01 00 00 00 a0 08 23:00:42.675 READ LOG DMA EXT
47 00 01 30 08 00 a0 08 23:00:42.675 READ LOG DMA EXT
47 00 01 30 00 00 a0 08 23:00:42.675 READ LOG DMA EXT
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 30% 316 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
He's only had this WD SSD for a couple of months. Do all these signs point to a drive that's about to die?
|
|
|
01-19-2024, 08:21 PM
|
#2
|
LQ Guru
Registered: Jan 2006
Location: Virginia, USA
Distribution: Slackware, Ubuntu MATE, Mageia, and whatever VMs I happen to be playing with
Posts: 19,885
|
I'd recommend booting to a Live CD/USB of something and running fsck on that drive.
You might also want to take a look at smartmontools.
|
|
|
01-19-2024, 08:43 PM
|
#3
|
Moderator
Registered: Aug 2002
Posts: 26,732
|
SMART and dmesg show lots of DMA errors, so I think the drive is bad, but I don't know if it is going to die. If the attributes are correct, you have used 5% of the disk life in a few months, which seems like a lot.
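To put a number on "lots", it may help to tally which sectors the kernel actually reports as failing. The following is only a sketch: `failing_sectors` is a made-up helper name, and it assumes the `I/O error, dev sda, sector N` wording seen in the dmesg excerpt in the first post, which can differ between kernel versions.

```shell
# Hypothetical helper: reads kernel log text on stdin and prints
# "<sector> <count>" for every sector named in an "I/O error" line.
# Assumes the message wording shown in the first post of this thread.
failing_sectors() {
    awk '/I\/O error, dev / {
        for (i = 1; i < NF; i++) {
            if ($i == "sector") {
                s = $(i + 1)
                sub(/,$/, "", s)   # tolerate a trailing comma, if any
                n[s]++
            }
        }
    } END { for (s in n) print s, n[s] }' | sort -n
}

# Possible usage (reading the kernel log may need root on Debian):
#   sudo dmesg | failing_sectors
```

A short list of sectors that repeat many times would suggest the same few spots being retried, while a long list of different sectors would suggest a wider problem.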
|
|
|
01-19-2024, 08:57 PM
|
#4
|
LQ Guru
Registered: Jan 2006
Location: Virginia, USA
Distribution: Slackware, Ubuntu MATE, Mageia, and whatever VMs I happen to be playing with
Posts: 19,885
|
It sounds as if frequent backups of crucial data would be a good action plan going forward.
|
|
|
01-20-2024, 01:11 AM
|
#5
|
Member
Registered: Jun 2020
Posts: 614
|
Just a thought (because I've seen this before in real machines): it could be the cable connecting the drive to the computer's motherboard internally. They're cheap. They also can fail. Agree with the suggestion to start backing things up if you still can, but I'd try swapping the cable out and see if it clears things up. May also try new cable + different port (if this is a SATA or PATA drive and you can do so).
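A rough way to separate link trouble from media trouble is the reason libata prints in parentheses after `Emask` on the `res` line. As far as I know, cable or connector problems tend to show up as `ATA bus error` along with a rising UDMA_CRC_Error_Count (attribute 199), whereas `media error` is the drive itself saying it could not read the data. The sketch below just counts those reasons; `emask_tally` is a hypothetical name, not an existing tool.

```shell
# Hypothetical helper: reads kernel log text on stdin and prints
# "<count> <reason>" for each libata Emask reason, most frequent first.
# Lines such as "exception Emask 0x0 SAct 0x0 ..." carry no reason in
# parentheses and are skipped.
emask_tally() {
    awk 'match($0, /Emask 0x[0-9a-f]+ \([^)]*\)/) {
        s = substr($0, RSTART, RLENGTH)
        sub(/^[^(]*\(/, "", s)   # drop everything up to the opening paren
        sub(/\)$/, "", s)        # drop the closing paren
        n[s]++
    } END { for (s in n) print n[s], s }' | sort -rn
}

# Possible usage:
#   sudo dmesg | emask_tally
```

In the excerpt posted in this thread every reason is `media error` and attribute 199 reads 0, so re-seating or swapping the cable is still a cheap test, but it may not be the whole story.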
|
|
|
01-20-2024, 03:02 AM
|
#6
|
Senior Member
Registered: Jul 2020
Posts: 1,513
|
Key points are
Code:
[ 21.460516] ata1.00: cmd c8/00:20:90:18:c4/00:00:00:00:00/e7 tag 1 dma 16384 in
res 51/40:20:90:18:c4/00:00:00:00:00/e7 Emask 0x9 (media error)
in syslog and
Code:
Error 417 occurred at disk power-on lifetime: 316 hours (13 days + 4 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 e8 26 db ee Error: UNC at LBA = 0x0edb26e8 = 249243368
in SMART log. Not good.
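For anyone wanting to check that the kernel and SMART are describing the same kind of failure, the register bytes can be turned back into a sector number. This is a minimal sketch that assumes the usual 28-bit layout for READ DMA (c8): in the `res` line the bytes after status/error are count:lbal:lbam:lbah, and the low nibble of the final device byte holds LBA bits 27..24. In the SMART log the same bytes appear as SN, CL, CH and DH. `lba28` is a hypothetical helper name.

```shell
# Hypothetical helper: rebuild a 28-bit LBA from ATA register bytes.
# Arguments are hex bytes without the 0x prefix, in the order:
#   device (DH), lbah (CH), lbam (CL), lbal (SN)
lba28() {
    echo $(( ((0x$1 & 0x0f) << 24) | (0x$2 << 16) | (0x$3 << 8) | 0x$4 ))
}

# dmesg:  res 51/40:20:90:18:c4/00:00:00:00:00/e7
lba28 e7 c4 18 90    # prints 130291856, the sector in the I/O error line

# SMART:  ER ST SC SN CL CH DH = 40 51 00 e8 26 db ee
lba28 ee db 26 e8    # prints 249243368, the LBA smartctl reports
```

Both results match the numbers the kernel and smartctl printed themselves, so the two logs agree that these are uncorrectable reads at specific addresses rather than garbled commands.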
|
|
|
01-20-2024, 08:21 AM
|
#7
|
Senior Member
Registered: Jan 2007
Location: Wild West Wales, UK
Distribution: Linux Mint 22 MATE, Peppermint OS-Devuan, EndeavourOS, antiX
Posts: 4,355
|
road hazard,
Long test (probably less than 2 hours):
Code:
sudo smartctl -t long /dev/sda
To view the results:
Code:
sudo smartctl -a /dev/sda
Read the output: if the raw value of attribute #5 (Reallocated_Sector_Ct) or #197 (Current_Pending_Sector) is not zero, you could have a problem with that drive.
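Once the test has finished, the attribute table can be narrowed down to the few rows of interest. This is a sketch only: `smart_raw` is a made-up helper name, and it assumes the ten-column attribute layout that smartctl printed in the first post, with RAW_VALUE starting in the tenth column.

```shell
# Hypothetical helper: reads "smartctl -A" output on stdin and prints
# "<id> <name> <raw value>" for the attribute IDs given as arguments.
# IDs the drive does not report are simply absent from the output.
smart_raw() {
    ids=" $* "
    awk -v ids="$ids" '$1 ~ /^[0-9]+$/ && index(ids, " " $1 " ") {
        print $1, $2, $10
    }'
}

# Possible usage:
#   sudo smartctl -A /dev/sda | smart_raw 5 187 197 199
#   sudo smartctl -l selftest /dev/sda   # self-test log only
```

Note that the SA510 output posted in this thread does not list attribute 197 at all, so on this drive attribute 187 (Reported_Uncorrect, raw value 418) may be the more telling number.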
|
|
|
01-21-2024, 12:34 AM
|
#8
|
LQ Guru
Registered: Aug 2016
Location: SE USA
Distribution: openSUSE 24/7; Debian, Knoppix, Mageia, Fedora, OS/2, others
Posts: 6,496
|
Blue is WD's bargain basement product line, 2 year warranty instead of 3 or 5 or longer. Everything else from WD in equivalent form factor is better. I concur with beachboy2. Run long test, then plan accordingly.
|
|
|
01-21-2024, 05:47 AM
|
#9
|
Member
Registered: Apr 2019
Location: Esbjerg
Distribution: Windows 7...
Posts: 773
|
Quote:
Originally Posted by frankbell
It sounds as if frequent backups of crucial data would be a good action plan going forward.
|
Seeing that a 250 GB SSD costs next to nothing these days, I would say the good action is a replacement...
And get a couple or three while you're at it... 
|
|
|
01-30-2024, 09:16 AM
|
#10
|
Member
Registered: Nov 2015
Posts: 262
Original Poster
|
Quote:
Originally Posted by frankbell
I'd recommend booting to a Live CD/USB of something and running fsck on that drive.
You might also want to take a look at smartmontools.
|
Thanks for the reply! I think we figured out the problem. My friend sent me a picture of the drive, and the end that goes into the NUC's motherboard is a ribbon cable that's SLIGHTLY pulled up on one side. He's going to re-seat it soon and I'll try fsck. It's got to be either that or a defective drive, because we reinstalled Debian onto the NVMe drive and it FLEW. When we originally installed Debian on the SATA drive, it took -FOREVER-. Like a 50rpm platter drive forever.
|
|
|
01-30-2024, 09:21 AM
|
#11
|
Member
Registered: Nov 2015
Posts: 262
Original Poster
|
Quote:
Originally Posted by michaelk
SMART and dmesg show lots of DMA errors, so I think the drive is bad, but I don't know if it is going to die. If the attributes are correct, you have used 5% of the disk life in a few months, which seems like a lot.
|
See my post below. It's looking like the drive wasn't connected properly..... or it's just plain defective.
Quote:
Originally Posted by obobskivich
Just a thought (because I've seen this before in real machines): it could be the cable connecting the drive to the computer's motherboard internally. They're cheap. They also can fail. Agree with the suggestion to start backing things up if you still can, but I'd try swapping the cable out and see if it clears things up. May also try new cable + different port (if this is a SATA or PATA drive and you can do so).
|
This is what it's looking like. He sent me a pic and the ribbon cable that goes to the motherboard for the SATA connection is slightly pulled up on one end. He's going to re-seat it and we'll check the drive again soon.
Also, thanks to everyone else that chimed in!
|
|
|
All times are GMT -5. The time now is 05:23 PM.