LinuxQuestions.org
07-02-2021, 04:26 PM   #1
EvanRC
LQ Newbie
 
Registered: Jul 2021
Location: Middle of Northern Colorado, USA
Distribution: Ubuntu
Posts: 9

Rep: Reputation: Disabled
How to recover data from MDADM RAID HDDs experiencing Buffer I/O errors and Target errors


Not a Linux newbie, but new to these forums.

From March until July 2nd, the 4x 3.0 TB RAID5 system I had set up for my workplace was functioning fine. It ran on mdadm with four Seagate Barracuda drives (sdb, sdc, sdd and sde), attached to Ubuntu Server 20.04 Focal Fossa, kept well updated.

Recently, someone accidentally unplugged the power that it and the controlling server were hooked up to, even though the UPS is there specifically to prevent this (small company). I got the server up and running fine, but the mdadm RAID didn't fare well: it started with /dev/md0 simply not appearing.

When I ran mdadm --assemble --scan I got:

sdd and sdc returned four errors each (the sector numbers are very close, so the differing last three digits are separated with slashes):
Code:
blk_update_request: critical target error, dev sd*, sector 25879390758(400/608/629/628) op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
sdb returned that error for just sectors 258790758(400/608).

I had assumed, "oh, maybe we just have to reboot and check them with smartctl!" However, the same errors came back after the reboot. smartctl gave very limited information for /dev/sdb, which appeared partially broken compared with all the others. Notice that /dev/sde had no sector errors earlier? It came up with errors later (all of them did; see the bottom of this post), but I digress. For the other drives, smartctl gave full SMART Attribute/Test/Event tables as well as more Feature and Device information than it did for /dev/sdb. I have the dumps available if needed, but under ATA Security sdb was 'Disabled, frozen' while the others were 'Disabled, NOT FROZEN'.

I continued trying to figure out the problem by sifting through dmesg. Each time I ran the assemble scan, every sector error was paired with the following:
Code:
sd 4:0:0:3: [sd*] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 4:0:0:3: [sd*] tag#0 Sense Key : Illegal Request [current]
sd 4:0:0:3: [sd*] tag#0 Add. Sense: Logical block address out of range
sd 4:0:0:3: [sd*] tag#0 CDB: Read(16) 88 00 00 00 E6 E6 E6 E6 00 00 00 00 0* 00 00
I'm not versed in this well enough to decipher that, but it looks bad.

Anyway, when I ran sudo debugfs /dev/sd* I got very similar messages, which were basically identical across the drives.
Code:
debugfs: Bad magic number in super-block while trying to open /dev/sd*
blk_update_request: critical target error, dev sd*, sector 2589390758400 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
blk_update_request: critical target error, dev sd*, sector 2589390758400 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Buffer I/O error on dev sd*, logical block 126939695379200, async page read
blk_update_request: critical target error, dev sd*, sector 2589390758402 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Buffer I/O error on dev sd*, logical block 126939695379201, async page read
Buffer I/O error on dev sd*, logical block 126939695379202, async page read
Buffer I/O error on dev sd*, logical block 126939695379203, async page read
I looked around on the forum here about the I/O errors and on Ubuntu Forums about the superblock error. The first said to look in the smartctl output for 'Commands leading to the command that caused the error,' but smartctl coughed up nothing of that sort. The latter first suggested running sudo fdisk -l /dev/sd*, but oddly the output gave no indication of the disk model. Since the whole disks were in use for the RAID, the only "partition" was virtual, which may explain some of the debugfs output. I couldn't use mdadm -E /dev/md0 since the virtual device no longer existed, and using it on any of the member disks gave the four sector errors previously described, this time for all of them, including sde and sdb.
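
A quick sanity check on those sector numbers, as a minimal sketch. It assumes the kernel's usual 512-byte sector units and the nominal 3.0 TB (3 x 10^12 bytes) capacity rather than a size read from the drive; the function name sector_in_range is made up for illustration. On a live system, blockdev --getsz /dev/sdX reports the real size in 512-byte sectors.

```shell
#!/usr/bin/env bash
# Is a sector number from the kernel log inside the drive at all?
# Assumptions: 512-byte sectors, nominal 3.0 TB = 3 * 10^12 bytes per drive.

sector_in_range() {
    local sector=$1 disk_bytes=$2 sector_size=${3:-512}
    local total_sectors=$(( disk_bytes / sector_size ))
    if (( sector < total_sectors )); then
        echo "in range"
    else
        echo "out of range"
    fi
}

# Sector reported by blk_update_request in the debugfs output above
sector_in_range 2589390758400 3000000000000
```

Under those assumptions a 3.0 TB drive has 5,859,375,000 sectors, so the reported sector lies far past the end of the disk. That would at least be consistent with the "Logical block address out of range" sense data, though it says nothing about why the reads were aimed there.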


From a few other forums, I tried a few other miscellaneous things, with similar results. So, this leads me to the question(s) all of this has been building up to: can I recover anything from the RAID? Are the drives shot from the power loss, or do they just need some special repair tool? Or is it time to cut my losses, wipe them completely (the RAID had some complex but unused code, as well as old backups) and start anew?

I'm already close to being in over my head here, despite having some confidence in my CLI and Disk Management abilities. Any help would be greatly appreciated.
 
07-03-2021, 04:36 AM   #2
lvm_
Senior Member
 
Registered: Jul 2020
Posts: 1,521

Rep: Reputation: 519
Ok, first, using debugfs or other filesystem tools on raid members is definitely pointless and possibly dangerous. Second, if you are interested in determining what really happened to the disks, you have to make smartctl work and possibly scan the disks with badblocks as well. But the most important thing is recovering the data, and the best way of doing it is to copy the data from the suspect disks to new media ASAP using ddrescue or a similar tool, run mdadm -E on the recovered raid members and find the three with the closest event counts, force-assemble the array out of them, and scan the filesystem and recently changed files for errors.
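
The steps above could be sketched roughly as follows. This is only an outline under assumptions, not a tested recovery procedure: /mnt/rescue is a hypothetical destination with enough free space, the function names are invented for illustration, and the awk pattern assumes mdadm -E prints a line of the form "Events : N". Nothing runs until a function is called, and everything touching disks needs root.

```shell
#!/usr/bin/env bash
# Outline of: image the disks, compare event counts, force-assemble three copies.

# 1. Copy a suspect disk to an image file, keeping a map file so the copy
#    can be resumed. -d = direct disc access, -r3 = three retry passes.
image_member() {
    local dev=$1 out
    out=/mnt/rescue/$(basename "$dev")      # hypothetical destination
    ddrescue -d -r3 "$dev" "$out.img" "$out.map"
}

# 2. Print the event count mdadm recorded in a member's superblock.
event_count() {
    mdadm --examine "$1" | awk '/^ *Events :/ { print $NF }'
}

# 3. Read "device events" lines on stdin and print the three devices whose
#    event counts lie closest together (ties go to the higher counts).
closest_three() {
    sort -k2,2n | awk '
        { dev[NR] = $1; ev[NR] = $2 }
        END {
            if (NR < 3) exit 1
            best = 1
            for (i = 1; i + 2 <= NR; i++)
                if (ev[i+2] - ev[i] <= ev[best+2] - ev[best]) best = i
            print dev[best], dev[best+1], dev[best+2]
        }'
}

# Usage sketch (commented out on purpose):
#   for d in /dev/sdb /dev/sdc /dev/sdd /dev/sde; do image_member "$d"; done
#   loop=$(losetup --find --show /mnt/rescue/sdb.img)    # repeat per image
#   for l in /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3; do
#       echo "$l $(event_count "$l")"
#   done | closest_three
#   mdadm --assemble --force /dev/md0 <the three devices printed above>
```

Working on the copies rather than the original drives means a failed force-assemble costs nothing but time.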
 
07-03-2021, 11:47 PM   #3
computersavvy
Senior Member
 
Registered: Aug 2016
Posts: 3,345

Rep: Reputation: 1486
Barracuda drives are not meant for raid, and most of them use the SMR recording tech which drastically slows writes. Losing power may easily have corrupted the file system.

First, try assembling the raid with 3 drives (try omitting sdb), and if you can get it running in a degraded state then immediately back up the data before doing anything else. Do not allow the filesystem to be put into use before the backup is complete. Run fsck on the filesystem as soon as it is operational and before the backup is done, to try to fix any file system errors so they do not impact the backup.
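
As a rough sketch of that three-drive attempt: the helper below only prints the mdadm command for review instead of running it. The member list comes from the first post, the function name assemble_without is invented for illustration, and --run is the mdadm option that starts an array even when a member is missing.

```shell
#!/usr/bin/env bash
# Build (but do not execute) a degraded-assembly command that leaves one member out.

members=(/dev/sdb /dev/sdc /dev/sdd /dev/sde)   # drives named in the first post

assemble_without() {
    local omit=$1 keep=() m
    for m in "${members[@]}"; do
        [[ $m == "$omit" ]] || keep+=("$m")
    done
    # echo only: copy the printed line and run it as root once it looks right
    echo mdadm --assemble --run /dev/md0 "${keep[@]}"
}

assemble_without /dev/sdb
```

If the array does come up, cat /proc/mdstat should list md0 with one member missing, and the fsck and backup steps can follow from there.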

On a raid array, file system errors on one device may be repeated on the others since the data is striped and it tries to remain consistent across all devices. You do not know where the actual error is from the file system side, although smartctl should be able to give the hardware errors for each device.
 
07-06-2021, 10:02 AM   #4
EvanRC
LQ Newbie
 
Registered: Jul 2021
Location: Middle of Northern Colorado, USA
Distribution: Ubuntu
Posts: 9

Original Poster
Rep: Reputation: Disabled
The drives are enclosed in a semi-hardware-based RAID enclosure. Completely unplugging the system and waiting for some time ended up completely restoring the RAID.

I appreciate the help, however.
 
  


All times are GMT -5.
