LinuxQuestions.org
Old 09-22-2016, 07:38 AM   #1
adrianmariano
Member
 
Registered: Dec 2004
Distribution: Ubuntu Yakkety
Posts: 193

Rep: Reputation: 15
tons of disk errors on samsung ssd after power cycle


My computer started emitting an alert tone, and when I worked out what was going on and looked inside, I saw that the CPU fan had stopped. Presumably it was the CPU temperature alarm. I powered the machine down and let it cool for a few hours.

When I powered it back up, the fan appeared to be working normally (temp reported in the BIOS was 28), but my drive, a Samsung 850 Pro, had errors. Ran fsck. Rebooted. Ran a few system updates. Got errors. Then I rebooted again and fsck reported so many errors I couldn't get through it.

Code:
Ata3.00: status: { DRDY ERR }
Ata3.00: error: { ICRC ABRT }
Ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
Ata3.00: BMDMA stat 0x26
Ata3.00: cmd ca/00:08:e8:16:40/00:00:00:00:00/e7 tag 0 dma 4096 out
                Res 51/84:00:ef:16:40/00:00:00:00:00/e7 Emask 0x30 (host bus error)
Ata3.00: status: { DRDY ERR }
Ata3.00: error: { ICRC ABRT }
Blk_update_request: I/O error, dev sda, sector 121640680
Buffer I/O error on dev sda1, logical block 15204829, lost async page write
Ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
BMDMA stat 0x26
Failed command: WRITE DMA
Above is an example of some errors that were appearing during normal operation when trying to shut down (not while fsck was running).

So what does this mean? Is the disk dead?
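As a first triage step, the flags inside the braces of such messages can be tallied from a saved copy of the kernel log. The following is only a minimal sketch: the function name `tally_ata_flags` is made up for illustration, and the sample text is taken from the excerpt above. In libata messages, ICRC marks an interface CRC error on the link between drive and controller and ABRT an aborted command, so a log dominated by ICRC is at least a hint to look at the cable and controller as well as the drive.

```python
import re
from collections import Counter

# Matches lines such as "ata3.00: error: { ICRC ABRT }".
# Case-insensitive so hand-typed copies with different capitalization still match.
FLAG_LINE = re.compile(
    r"ata(\d+)\.(\d+): (status|error): \{ ([A-Z ]+) \}", re.IGNORECASE
)


def tally_ata_flags(log_text):
    """Count each status/error flag per ATA port found in log_text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FLAG_LINE.search(line)
        if not match:
            continue
        port, _device, kind, flags = match.groups()
        for flag in flags.split():
            counts[(f"ata{port}", kind.lower(), flag.upper())] += 1
    return counts


SAMPLE = """\
Ata3.00: status: { DRDY ERR }
Ata3.00: error: { ICRC ABRT }
Ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
Ata3.00: status: { DRDY ERR }
Ata3.00: error: { ICRC ABRT }
"""

if __name__ == "__main__":
    for (port, kind, flag), count in sorted(tally_ata_flags(SAMPLE).items()):
        print(port, kind, flag, count)
```

The tally only summarizes what the kernel logged; it does not decide which component is at fault.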
 
Old 09-22-2016, 02:32 PM   #2
jefro
Moderator
 
Registered: Mar 2008
Posts: 21,980

Rep: Reputation: 3625
Not sure.

It could be a bad CPU, bad RAM, a bad drive controller, a bad drive, or almost any other part.

To start, remove the SSD from this system, put it in a known-good machine, and see if you can run the OEM diagnostics or SMART tools on it.

You can also run memtest on the suspect system.
 
Old 09-29-2016, 04:53 AM   #3
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 16,292

Rep: Reputation: 2322
In that sort of situation, the most reliable sign of fatal trouble is constant change. I would expect the CPU to show issues before the SSD. But as jefro said, it could be many things.
 
Old 09-29-2016, 06:46 AM   #4
adrianmariano
Member
 
Registered: Dec 2004
Distribution: Ubuntu Yakkety
Posts: 193

Original Poster
Rep: Reputation: 15
The SSD is by far the newest component on this box. I bought it in March. The CPU is from 2007, the motherboard is from 2011.

So I went ahead and got a new motherboard and CPU. I booted the new setup off an old hard drive (the one I used before I got the SSD) and ran fsck on the SSD. Hundreds of errors were fixed. I then tried to boot off the SSD, but it hit a bunch of errors and got stuck. (I made the mistake of running updates during this whole situation, which might explain why the OS would be corrupted, if fsck wasn't able to truly fix the problems but only made the drive consistent.)

So the question is: how do I know if the disk is actually bad vs. just corrupted, but still usable once repaired? I've read that SMART may not be useful for SSDs. Is that true?

Is there a way to have Ubuntu fix a corrupted installation?
 
Old 09-29-2016, 07:00 AM   #5
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,842

Rep: Reputation: 7308
Quote:
how do I know if the disk is actually bad vs. just corrupted, but still usable once repaired
Try to save its contents (with dd or similar), then run some disk checks.

Quote:
Is there a way to have ubuntu fix a corrupted installation?
I would prefer a full reinstall, but first save your personal/important data.
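The "dd or similar" step amounts to copying the device block by block into an image file. Purely as an illustration of that idea, here is a minimal sketch; `image_with_skips` is a hypothetical helper, not an existing tool, and zero-filling unreadable blocks is an assumption about what is wanted. For a real rescue, dd itself or a dedicated tool such as ddrescue is the better choice.

```python
import os


def image_with_skips(src_path, dst_path, block_size=1024 * 1024):
    """Copy src to dst block by block, zero-filling blocks that fail to read.

    Returns the byte offsets of the blocks that raised a read error.
    """
    bad_offsets = []
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        total = src.seek(0, os.SEEK_END)  # size of the file or block device
        offset = 0
        while offset < total:
            want = min(block_size, total - offset)
            try:
                src.seek(offset)
                chunk = src.read(want)
            except OSError:
                # Unreadable block: remember where it was and keep going.
                chunk = b""
                bad_offsets.append(offset)
            # Pad so that offsets in the image line up with the source.
            dst.write(chunk.ljust(want, b"\0"))
            offset += want
    return bad_offsets
```

Reading a device such as /dev/sda this way needs root, and the image should be written to a different disk than the one being saved.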
 
Old 09-29-2016, 07:17 AM   #6
adrianmariano
Member
 
Registered: Dec 2004
Distribution: Ubuntu Yakkety
Posts: 193

Original Poster
Rep: Reputation: 15
Of course, the first thing I did once fsck was done was copy my user files and my system configuration (/etc) to another disk. I used cp, not dd, to do this, and it completed without any errors...but I have no way of knowing whether the data is corrupted or not.

What specific disk checks are meaningful for a Samsung SSD? Is SMART meaningful?
 
Old 09-30-2016, 01:36 AM   #7
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 16,292

Rep: Reputation: 2322
Something that hasn't been mentioned yet is your choice of mount options. Please post them. Copy a huge file to free space and use sha1sum or some such to check how valid the copy was. Get everything valuable backed up. Strive to keep the disk cool and lightly loaded.
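The copy-and-checksum test can be scripted. The sketch below does what copying a file and running sha1sum on both copies would do; the function names `sha1_of` and `copy_and_verify` are invented for this example.

```python
import hashlib
import shutil


def sha1_of(path, chunk_size=1024 * 1024):
    """Return the SHA-1 hex digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src_path, dst_path):
    """Copy src to dst and report whether both SHA-1 digests agree."""
    shutil.copyfile(src_path, dst_path)
    return sha1_of(src_path) == sha1_of(dst_path)
```

One caveat: a file read back right after it was written may be served from the page cache rather than from the SSD, so a match is more convincing after the cache has been dropped or the machine rebooted.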
 
Old 09-30-2016, 05:37 AM   #8
adrianmariano
Member
 
Registered: Dec 2004
Distribution: Ubuntu Yakkety
Posts: 193

Original Poster
Rep: Reputation: 15
The mount options are the defaults picked by the Ubuntu installer, I believe:

Code:
# / was on /dev/sda1 during installation
UUID=14a8e190-ceb4-44a0-9de4-5d65cf0fd009 /               ext4    errors=remount-ro 0       1
The SMART diagnostics show no errors:
Code:
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      3736         -
# 2  Short offline       Completed without error       00%      3735         -
I tried copying a 1.7GB file and testing with sha1sum. The sums match.

If I'm supposed to "strive to keep the disk cool and lightly loaded" then that implies that there's something wrong with the disk, in which case I should send it for warranty service.
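The self-test log quoted above can also be checked mechanically, which helps when the same check is repeated over several days. This is a rough sketch that assumes the column layout shown in this thread (fields separated by two or more spaces); `parse_selftest_log` and `failed_tests` are hypothetical names, and other drives or tool versions may format the table differently.

```python
import re

# One row of the self-test table, e.g.
# "# 1  Extended offline    Completed without error       00%      3736         -"
ROW = re.compile(
    r"^#\s*(?P<num>\d+)\s+(?P<description>.+?)\s{2,}(?P<status>.+?)\s{2,}"
    r"(?P<remaining>\d+)%\s+(?P<hours>\d+)\s+(?P<lba>\S+)\s*$"
)


def parse_selftest_log(text):
    """Return one dict per self-test row found in text."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            row = match.groupdict()
            for key in ("num", "remaining", "hours"):
                row[key] = int(row[key])
            rows.append(row)
    return rows


def failed_tests(rows):
    """Return the rows whose status is anything but a clean completion."""
    return [row for row in rows if row["status"] != "Completed without error"]


SAMPLE = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      3736         -
# 2  Short offline       Completed without error       00%      3735         -
"""

if __name__ == "__main__":
    print(failed_tests(parse_selftest_log(SAMPLE)))
```

A clean self-test log says the drive's own checks passed; it does not by itself exercise the cable or the controller.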
 
Old 10-01-2016, 01:46 AM   #9
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 16,292

Rep: Reputation: 2322
The "strive to keep the disk cool" was in the context of getting a backup. The mount options of interest are in /etc/fstab. Some mount options (e.g., atime) cause excessive wear on SSDs, and people who know this stuff will make recommendations.
 
Old 10-01-2016, 06:09 AM   #10
adrianmariano
Member
 
Registered: Dec 2004
Distribution: Ubuntu Yakkety
Posts: 193

Original Poster
Rep: Reputation: 15
The mount options I quoted above are from the fstab file. I understand the default is now reltime, which is supposed to be OK for SSDs. There is also the question of swap utilization, which is controlled elsewhere.
 
Old 10-02-2016, 01:44 AM   #11
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 16,292

Rep: Reputation: 2322
Sorry, that was off screen on my tablet and I missed it.


I think it is
Code:
relatime
and not reltime. You can also use noatime as a mount option. With relatime, a file's access time is written only when it is no newer than the file's modification or change time, or more than a day old, instead of on every read.
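To see which access-time option an fstab entry actually requests, the options field can be inspected directly. A small sketch, with `fstab_options` and `atime_mode` as made-up helper names; when none of the atime options is listed, the kernel default applies, which has been relatime since Linux 2.6.30.

```python
def fstab_options(line):
    """Return the mount options of one fstab line, or None for comments and blanks."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    fields = stripped.split()
    if len(fields) < 4:
        return None
    return fields[3].split(",")


def atime_mode(options):
    """Name the explicit atime option in options, or fall back to the kernel default."""
    for name in ("noatime", "relatime", "strictatime"):
        if name in options:
            return name
    return "kernel default"


# The root entry quoted earlier in this thread.
ENTRY = (
    "UUID=14a8e190-ceb4-44a0-9de4-5d65cf0fd009 /               "
    "ext4    errors=remount-ro 0       1"
)

if __name__ == "__main__":
    print(atime_mode(fstab_options(ENTRY)))
```

For the entry from this thread the result is the kernel default, since errors=remount-ro is the only option listed.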


How stands the disk now?
 
Old 10-02-2016, 06:12 AM   #12
adrianmariano
Member
 
Registered: Dec 2004
Distribution: Ubuntu Yakkety
Posts: 193

Original Poster
Rep: Reputation: 15
Yeah, relatime is what it is.

I reinstalled Ubuntu and everything seems to be fine. I just checked syslog to see if there was anything in there about the disk, and I didn't see any disk errors. So far so good.
 
  


All times are GMT -5.
