PLEASE HELP - SMART, dd, fsck, millions of unattached inodes

deriklogov1983 · 10-11-2014, 05:52 PM

Need help, short story:
* smartctl started showing errors on a 4TB drive
* I decided to clone the drive with dd
* dd ran for 28 hours; at the end it showed a "Not Enough Space" error but reported that 4.0T had been copied
* After rebooting from the clone drive, it asked me to run fsck
* I ran fsck with the -y option to automatically fix all problems
* fsck has now been running for more than 40 hours, with the following output:

Unattached inode xxxxxxxxx
Connect to /lost+found? yes

- It looks like it's going through millions of those inodes.
- I don't know how long it will take to complete, as I ran fsck without the -C (progress) option.

The hard drive had about 700G of data, with millions of small files (the total size of the drive is 4TB).

Please advise: is it worth waiting for fsck to complete, or does it look more like the new clone drive is corrupted, so it's better to cancel fsck, install the OS on the new drive and copy the files over directly? I was really hoping to clone the drive so I don't have to reinstall
many small programs and redo the necessary configuration.

Thank you very much for any feedback or suggestions.

jailbait · 10-11-2014, 06:49 PM

I suggest that you format the partition and copy across the files that you want to put on the drive. The file system is completely buggered, which is fairly common when you use dd. When fsck ends you will have a jumble of directories in lost+found which would take you weeks to sort out by hand.

------------------
Steve Stites

unSpawn · 10-11-2014, 07:03 PM

Quote:

Originally Posted by deriklogov1983

smartctl started showing errors on a 4TB drive

Exactly what kind of errors?

Quote:

Originally Posted by deriklogov1983

I decided to clone the drive with dd

If SMART data shows the disk is failing, then you'd better use ddrescue or dd_rescue (see their respective manual pages for more info) or, as jailbait suggested, just copy over the files you can salvage.

Quote:

Originally Posted by deriklogov1983

dd ran for 28 hours; at the end it showed a "Not Enough Space" error but reported that 4.0T had been copied

Depending on the file system used it may, or may not, "just" affect the tail of the partition. As with the "showing errors" part, the more verbose you are and the more exact the info you share, the better.
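A rough illustration of why only the tail is at risk: a raw dd copy onto a smaller disk stops at the destination's last byte, so only partitions that extend past that point lose data. The sketch below assumes a hypothetical partition layout and hypothetical disk sizes (4,000,787,030,016 bytes is a common capacity for "4TB" drives, used here only as an example); on a real system you would take the sizes from `blockdev --getsize64` and the partition table.

```python
from dataclasses import dataclass

MiB = 1024 ** 2

@dataclass
class Partition:
    name: str
    start: int  # byte offset of the first byte
    end: int    # byte offset just past the last byte

def truncated_partitions(partitions, dest_size):
    """Return (name, missing_bytes) for each partition that a raw copy onto
    a destination of dest_size bytes cannot hold completely."""
    result = []
    for p in partitions:
        if p.end > dest_size:
            # Only the part beyond the destination's end is lost; if the
            # partition starts past the end, all of it is lost.
            missing = p.end - max(p.start, dest_size)
            result.append((p.name, missing))
    return result

# Hypothetical layout and sizes, for illustration only.
src_size = 4_000_787_030_016   # source disk
dst_size = 4_000_000_000_000   # a slightly smaller destination disk
layout = [
    Partition("sda1", 1 * MiB, 513 * MiB),
    Partition("sda2", 513 * MiB, 16_897 * MiB),
    Partition("sda3", 16_897 * MiB, src_size - 1 * MiB),
]

for name, missing in truncated_partitions(layout, dst_size):
    print(f"{name}: last {missing / MiB:.0f} MiB not copied")  # → sda3: last 750 MiB not copied
```

This only models a destination that is too small; it says nothing about read errors on the source, which are a separate problem.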

Quote:

Originally Posted by deriklogov1983

I was really hoping to clone the drive so I don't have to reinstall many small programs and redo the necessary configuration.

If the data was unique and valuable then I'd go through the motions of salvaging whatever possible, but if it's only installations and configuration then, with all due respect, I wouldn't. Anyway, now you've found one reason why people make backups.

deriklogov1983 · 10-11-2014, 07:53 PM

Here is the error from smartctl:

Code:

S.M.A.R.T Errors on /dev/sda
From Command: /usr/sbin/smartctl -q errorsonly -H -l selftest -l error /dev/sda
ATA Error Count: 36 (device log contains only the most recent five errors)
Error 36 occurred at disk power-on lifetime: 11544 hours (481 days + 0 hours)
Error 35 occurred at disk power-on lifetime: 11544 hours (481 days + 0 hours)
Error 34 occurred at disk power-on lifetime: 11544 hours (481 days + 0 hours)
Error 33 occurred at disk power-on lifetime: 11544 hours (481 days + 0 hours)
Error 32 occurred at disk power-on lifetime: 11544 hours (481 days + 0 hours)

At first the count was 1, then it kept increasing every couple of days.
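The pasted log does say a little: the error count is 36, only the five most recent entries are kept, and all five were logged at the same power-on hour, so at least those five came as a burst. A small sketch that extracts those fields, assuming exactly the line format shown above (the regexes are written against these lines only; other smartctl versions may word them differently). The details that would actually help diagnose the disk, such as the failing command and LBA, only appear in the full output, e.g. `smartctl -a /dev/sda`.

```python
import re
from collections import Counter

# Line formats copied from the notification pasted above.
COUNT_LINE = re.compile(r"^ATA Error Count: (\d+)")
ERROR_LINE = re.compile(r"^Error (\d+) occurred at disk power-on lifetime: (\d+) hours")

def parse_error_log(text):
    """Return (total_error_count, [(error_no, power_on_hours), ...])."""
    total = None
    entries = []
    for line in text.splitlines():
        line = line.strip()
        m = COUNT_LINE.match(line)
        if m:
            total = int(m.group(1))
            continue
        m = ERROR_LINE.match(line)
        if m:
            entries.append((int(m.group(1)), int(m.group(2))))
    return total, entries

sample = """ATA Error Count: 36 (device log contains only the most recent five errors)
Error 36 occurred at disk power-on lifetime: 11544 hours (481 days + 0 hours)
Error 35 occurred at disk power-on lifetime: 11544 hours (481 days + 0 hours)
Error 34 occurred at disk power-on lifetime: 11544 hours (481 days + 0 hours)
Error 33 occurred at disk power-on lifetime: 11544 hours (481 days + 0 hours)
Error 32 occurred at disk power-on lifetime: 11544 hours (481 days + 0 hours)"""

total, entries = parse_error_log(sample)
hours = Counter(h for _, h in entries)
print(f"{total} errors logged; retained entries by power-on hour: {dict(hours)}")
# → 36 errors logged; retained entries by power-on hour: {11544: 5}
```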

So what do you think is happening with fsck? Why did it find so many inodes on a brand-new drive? Also, the errors seem to go through the inodes one by one:
Inode 7216654 ref count is 2, should be 1.
and next
Inode 7216655 ref count is 2, should be 1.
So basically it's every inode.
Why is that, and what is happening?
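Both kinds of message come from the same bookkeeping: each inode stores a link count, and fsck compares it with the number of directory entries it actually finds pointing at that inode. "Unattached inode" means an in-use inode that no directory entry references; "ref count is 2, should be 1" means the stored count is higher than the references found. A toy model of that comparison, assuming a hypothetical in-memory dict instead of the real ext2/3/4 on-disk format (real e2fsck also accounts for directories' "." and ".." links, which this ignores):

```python
from collections import Counter

ROOT_INO = 2  # root directory inode on ext2/3/4; skipped by this toy check

def check_link_counts(inodes, directories):
    """Toy link-count check.

    inodes:      {inode_no: stored_link_count} for every in-use inode
    directories: {dir_inode_no: {entry_name: inode_no}}
    Returns (unattached, mismatched).
    """
    # How many directory entries actually point at each inode.
    found = Counter(ino for entries in directories.values()
                    for ino in entries.values())
    unattached, mismatched = [], []
    for ino, stored in sorted(inodes.items()):
        if ino == ROOT_INO:
            continue
        refs = found[ino]  # Counter returns 0 for inodes never referenced
        if refs == 0:
            unattached.append(ino)                  # offered for /lost+found
        elif refs != stored:
            mismatched.append((ino, stored, refs))  # stored count is wrong

    return unattached, mismatched

# Hypothetical state: the second link to inodes 7216654/7216655 lived in a
# directory that no longer exists, and inode 500 lost its only entry.
inodes = {2: 3, 12: 1, 500: 1, 7216654: 2, 7216655: 2}
directories = {2: {"c.txt": 12, "a.txt": 7216654, "b.txt": 7216655}}

unattached, mismatched = check_link_counts(inodes, directories)
for ino in unattached:
    print(f"Unattached inode {ino}")
for ino, stored, refs in mismatched:
    print(f"Inode {ino} ref count is {stored}, should be {refs}.")
```

In this model, losing one directory that held extra hard links produces a run of "ref count is 2, should be 1" messages for consecutive inodes, which is one possible (not confirmed) explanation for the pattern above.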

unSpawn · 10-11-2014, 08:43 PM

Quote:

Originally Posted by deriklogov1983

Code:

/usr/sbin/smartctl -q errorsonly -H -l selftest -l error /dev/sda

Using "-q errorsonly" effectively ensures no information is shown that could help us help you. With all due respect, if you don't know what switches do, please read the manual first or don't use them.

Quote:

Originally Posted by deriklogov1983

So what do you think is happening with fsck? Why did it find so many inodes on a brand-new drive? Also, the errors seem to go through the inodes one by one: Inode 7216654 ref count is 2, should be 1. and next Inode 7216655 ref count is 2, should be 1. So basically it's every inode. Why is that, and what is happening?

What happens during a file system check roughly depends on what you cloned (whole disk or partition) and what parts went missing, when you cloned it (live system with open files in use, or powered down), the type of file system (journaling or not) and its state ("dirty" flag set or not), and whether it can check its integrity using its (backup) metadata.

The simplest way to proceed would be to power the rig down, run fsck on the source disk just to make sure, get a disk of the same (brand, type and) size or larger, boot a Live CD and, if the source disk is OK(-ish) according to fsck, a bad blocks scan and SMART, try cloning it then. Afterwards compare the images using md5deep's piecewise mode in, say, 100m or 1g blocks.
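md5deep's piecewise mode hashes a file in fixed-size pieces, so two images can be compared piece by piece and a difference pinned to an offset instead of just "the hashes differ". The same idea as a short Python sketch (the function names are hypothetical; for real disks you would point it at the image files or device nodes, opened read-only as here):

```python
import hashlib
import os
import tempfile
from itertools import zip_longest

def piecewise_md5(path, chunk_size):
    """Yield (offset, md5 hexdigest) for consecutive chunk_size-byte pieces."""
    with open(path, "rb") as f:
        offset = 0
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield offset, hashlib.md5(chunk).hexdigest()
            offset += len(chunk)

def differing_chunks(path_a, path_b, chunk_size):
    """Offsets of pieces whose hashes differ or that exist in only one file."""
    diffs = []
    for a, b in zip_longest(piecewise_md5(path_a, chunk_size),
                            piecewise_md5(path_b, chunk_size)):
        if a is None or b is None or a[1] != b[1]:
            diffs.append((a or b)[0])
    return diffs

# Demo: two 4 KiB temp files that differ in a single byte at offset 2500.
with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "source.img")
    b = os.path.join(tmp, "clone.img")
    data = bytes(range(256)) * 16
    with open(a, "wb") as f:
        f.write(data)
    damaged = bytearray(data)
    damaged[2500] ^= 0xFF
    with open(b, "wb") as f:
        f.write(damaged)
    print(differing_chunks(a, b, 1024))  # → [2048]
```

A truncated clone shows up the same way: every piece from the point where the copy stopped onwards is reported.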

deriklogov1983 · 10-11-2014, 09:06 PM

That smartctl output is copy-pasted from a cPanel email notification.

The hard drive was cloned 1:1, so I cloned the whole drive, not just a partition.
I cloned the drive from a Live CD, so no drives were in use during cloning.
All partitions (sda1, sda2, sda3) were present after cloning.

So the question is still the same: why are so many inodes unattached?

unSpawn · 10-11-2014, 09:21 PM

Quote:

Originally Posted by deriklogov1983

why are so many inodes unattached?

Simply put, common file systems store metadata centrally. Once the mapping between a file and its (backup) metadata is gone, the file may still exist but the file system can't "place" it properly. Think of a file system like a tree: cut off one of the lower branches and everything attached to it will go as well. In that case running 'fsck' is not like gluing the structure back together but rather like trying to pin all the leaves in the same location.
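The branch analogy in code: fsck walks the directory tree from the root, and anything in use that the walk cannot reach can only be hung under /lost+found, named by its inode number. A toy sketch of that walk, assuming a hypothetical dict-of-dicts model of directories rather than the real on-disk format. Losing one directory's entries near the top makes everything below it unreachable at once.

```python
from collections import deque

ROOT = 2  # the root directory is inode 2 on ext2/3/4

def reachable(directories, root=ROOT):
    """Breadth-first walk; directories maps dir inode -> {name: child inode}."""
    seen = {root}
    queue = deque([root])
    while queue:
        d = queue.popleft()
        for child in directories.get(d, {}).values():
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

def orphans(in_use, directories, root=ROOT):
    """In-use inodes the walk cannot reach: candidates for /lost+found."""
    return sorted(set(in_use) - reachable(directories, root))

# Hypothetical tree: / -> home -> user -> {a, b}, and / -> etc -> passwd
tree = {
    2: {"home": 20, "etc": 30},
    20: {"user": 21},
    21: {"a": 100, "b": 101},
    30: {"passwd": 300},
}
in_use = [2, 20, 21, 30, 100, 101, 300]
print(orphans(in_use, tree))     # → []

# "Cut the branch": the directory data for /home (inode 20) is lost.
damaged = {k: v for k, v in tree.items() if k != 20}
print(orphans(in_use, damaged))  # → [21, 100, 101]
```

Real e2fsck is smarter than this: an orphaned directory such as 21 is reconnected as a whole, so the files inside it come along. Millions of individual "Unattached inode" prompts suggest that the directory data those files belonged to did not survive, although that is an inference, not something the thread confirms.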

unSpawn · 10-11-2014, 09:23 PM

Quote:

Originally Posted by deriklogov1983

That smartctl output is copy-pasted from a cPanel email notification. (..) I cloned the drive from a Live CD, so no drives were in use during cloning.

Your words, but I very much doubt you would be able to access a remote host that is running from a Live CD AND providing a web-based management panel...

deriklogov1983 · 10-11-2014, 09:37 PM

So since so many inodes are unattached, does that mean the structure is corrupted and all the files will be corrupted as well? Or "it found some old tree with old leaves"? Is that a normal number of unattached inodes after cloning a drive?
I want to understand whether fsck running for so many hours with so many unattached inodes is normal, or whether I should stop wasting time and start over from another direction.

I access the server through IPMI using iKVM; when I said the drives were not in use, I meant they were not mounted.

syg00 · 10-12-2014, 03:11 AM

Why the hell so much concern over fsck on a truncated (i.e. potentially invalid) backup?
What does a fsck on the original return?

unSpawn · 10-12-2014, 04:19 AM

Quote:

Originally Posted by deriklogov1983

So since so many inodes are unattached, does that mean the structure is corrupted and all the files will be corrupted as well? Or "it found some old tree with old leaves"? Is that a normal number of unattached inodes after cloning a drive?

I see I shouldn't use analogies ;-p And no, it is not normal.

Quote:

Originally Posted by deriklogov1983

I want to understand whether fsck running for so many hours with so many unattached inodes is normal, or whether I should stop wasting time and start over from another direction.

Yes, like I said, stop the fsck and start over again.

rknichols · 10-12-2014, 09:19 AM

Quote:

Originally Posted by deriklogov1983

* smartctl started showing errors on a 4TB drive
* I decided to clone the drive with dd

Exactly what options did you use with dd? If there are I/O errors from the drive, dd will give up after the first one. That might tempt you to use the "conv=noerror" (continue after read errors) option, but that is the wrong thing to do and will result in a massively corrupted filesystem image at the destination.

For copying from a drive that has I/O errors, ddrescue is the proper tool. It will deal intelligently with those errors.
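Why conv=noerror on its own wrecks the image: when a read fails, GNU dd moves on to the next block but writes nothing for the failed one, so every later byte lands at the wrong offset on the destination. Adding sync (conv=noerror,sync) zero-fills the failed block and keeps offsets aligned; ddrescue goes further by recording bad areas in a map file and retrying them later. A simulation of the first two behaviours, assuming a hypothetical in-memory "disk" with a set of unreadable block numbers (not a real device):

```python
BLOCK = 4  # tiny block size so the effect is easy to see

def read_block(disk, n, bad):
    """Simulated read: raise OSError for unreadable block numbers."""
    if n in bad:
        raise OSError(f"I/O error reading block {n}")
    return disk[n * BLOCK:(n + 1) * BLOCK]

def copy_noerror(disk, bad, pad):
    """Mimic dd conv=noerror (pad=False) or conv=noerror,sync (pad=True)."""
    out = bytearray()
    for n in range(len(disk) // BLOCK):
        try:
            out += read_block(disk, n, bad)
        except OSError:
            if pad:
                out += bytes(BLOCK)  # sync: zero-fill keeps later data in place
            # without sync nothing is written, so later data shifts left
    return bytes(out)

disk = b"AAAABBBBCCCCDDDD"
bad = {1}
print(copy_noerror(disk, bad, pad=False))  # → b'AAAACCCCDDDD'
print(copy_noerror(disk, bad, pad=True))   # → b'AAAA\x00\x00\x00\x00CCCCDDDD'
```

On a real file system, a shift like the first result means every inode table and directory block after the first bad sector is read from the wrong place, which fits the kind of fsck output described in this thread.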