[SOLVED] UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.

JoseKreif · 06-27-2016, 10:41 AM

I'm not skilled enough in Linux yet to fully understand what's needed to resolve this.

User tells me the server crashed and powered off due to a power outage.

It makes sense to me that the inconsistency is due to the server powering off unexpectedly.

I am only able to tell the user what to do over the phone. They are located too far away from me. So I'm left with two options.

Fix this over phone/email,
or they need to overnight the hard drive to me.

I have attached an image the user got for me.

If user enters the root password, will that bring them to a shell term for doing various commands, or can they do it from this screen?

What is needed to resolve this, step by step?

Emerson · 06-27-2016, 10:44 AM

I'd boot the server with live Linux, such as SystemRescueCD (works from USB stick) and run fsck on all volumes. SystemRescueCD does not require root password.
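As a rough sketch of what that could look like from the live system's root shell, assuming the server's volumes are LVM logical volumes holding ext2/3/4 filesystems. `plan_checks` is a hypothetical helper name, not part of any tool; it only prints read-only `fsck -n` commands and runs nothing itself.

```shell
#!/bin/sh
# Hypothetical helper: read logical-volume paths (one per line, as printed by
# "lvs --noheadings -o lv_path") and print a read-only check command for each.
# Nothing is executed; review the list and run the lines by hand.
plan_checks() {
    while read -r lv_path; do
        # Skip blank lines; "read" already strips the leading spaces lvs prints.
        [ -n "$lv_path" ] || continue
        printf 'fsck -n %s\n' "$lv_path"
    done
}

# Typical use from the live system's root shell (not run here):
#   vgchange -ay                                # activate all volume groups
#   lvs --noheadings -o lv_path | plan_checks   # list the checks to run
```

A swap volume will show up in the list too; it has no filesystem to check, so leave that one out.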

JoseKreif · 06-27-2016, 10:47 AM

Quote:

Originally Posted by Emerson

I'd boot the server with live Linux, such as SystemRescueCD (works from USB stick) and run fsck on all volumes. SystemRescueCD does not require root password.

I doubt they have a SystemRescueCD, so they will likely have to send me their HDD then, right?

Emerson · 06-27-2016, 10:49 AM

It can be any live Linux CD or USB, the only requirement is user must be able to get the root shell.

JoseKreif · 06-27-2016, 11:07 AM

Quote:

Originally Posted by Emerson

It can be any live Linux CD or USB, the only requirement is user must be able to get the root shell.

Okay. I'll discuss this with my manager. It sounds like the best option would be to have them overnight their drive along with their latest backup tape, and I will try to get things working again and overnight it back. I just hate to think that they will be without their server for a couple of days.

rknichols · 06-27-2016, 11:18 AM

Good grief, what the user is being asked to do is enter the root password, then run "fsck /dev/VolGroup00/LogVol00". If severe filesystem corruption is suspected, it is advisable to make an image backup of the affected LV first since there is a possibility that fsck will do things that make data recovery more difficult (Its job is to make the filesystem consistent -- sometimes that's at the expense of user data.), but a simple power interruption seldom causes that kind of problem for an ext2/3/4 filesystem.
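For the image-backup step, a minimal sketch might look like the following. `image_cmd` is a hypothetical helper and the destination path in the example is an assumption; the image has to land on a different, writable disk with enough free space. The helper only prints the `dd` command so it can be read back over the phone before anyone runs it.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the dd command that would copy a
# logical volume to an image file on some other disk.
# $1 = source device, $2 = destination image path (both supplied by the caller).
image_cmd() {
    src=$1
    dest=$2
    # Refuse the obvious mistake of copying the device onto itself.
    if [ "$src" = "$dest" ]; then
        echo "source and destination must differ" >&2
        return 1
    fi
    printf 'dd if=%s of=%s bs=1M\n' "$src" "$dest"
}

# Example (the /mnt/usb path is an assumption, not something from this thread):
#   image_cmd /dev/VolGroup00/LogVol00 /mnt/usb/LogVol00.img
```

Take the image before running fsck, not after, since the point is to preserve the filesystem as it was before any repairs.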

JoseKreif · 06-27-2016, 11:39 AM

Quote:

Originally Posted by rknichols

Good grief, what the user is being asked to do is enter the root password, then run "fsck /dev/VolGroup00/LogVol00". If severe filesystem corruption is suspected, it is advisable to make an image backup of the affected LV first since there is a possibility that fsck will do things that make data recovery more difficult (Its job is to make the filesystem consistent -- sometimes that's at the expense of user data.), but a simple power interruption seldom causes that kind of problem for an ext2/3/4 filesystem.

Haha, I will instruct the user to enter that. They back up the system to tapes every night, so recovery is an option if worse comes to worst.

Are there any options to use for fsck, like -y? I think otherwise they will spend some time entering y.

Edit: fsck -y /dev/VolGroup00/LogVol00

rknichols · 06-27-2016, 12:54 PM

You can use "-y". I always hesitate to recommend that because of the possibility of fsck doing something crazy. Frequently, I do a preliminary run with "-n" just to get an idea of the scope of the problem, but expecting someone not familiar with fsck to make a judgement call based on the result, or for that matter knowing when not to respond "y" when fsck wants to correct something, just isn't realistic.
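One way to make the "-n" run easier to relay over the phone is to look at fsck's exit status, which is a sum of the bit values listed in the fsck(8) man page. The sketch below decodes it; `describe_fsck_status` is a hypothetical helper name, and the device path in the usage comment is the one from this thread.

```shell
#!/bin/sh
# Hypothetical helper: turn an fsck exit status into readable text.
# The status is a sum of the bit values documented in fsck(8).
describe_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    [ $((status & 1)) -ne 0 ]   && echo "filesystem errors corrected"
    [ $((status & 2)) -ne 0 ]   && echo "system should be rebooted"
    [ $((status & 4)) -ne 0 ]   && echo "filesystem errors left uncorrected"
    [ $((status & 8)) -ne 0 ]   && echo "operational error"
    [ $((status & 16)) -ne 0 ]  && echo "usage or syntax error"
    [ $((status & 32)) -ne 0 ]  && echo "checking canceled by user request"
    [ $((status & 128)) -ne 0 ] && echo "shared-library error"
    return 0
}

# Usage after a read-only pass (not run here):
#   fsck -n /dev/VolGroup00/LogVol00
#   describe_fsck_status $?
```

With "-n" nothing is repaired, so a damaged filesystem would normally report "errors left uncorrected" here. That tells you whether there is a problem, not how big it is; the on-screen messages are still needed for that.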

With a decent backup available, go for it!

JoseKreif · 06-27-2016, 02:17 PM

Quote:

Originally Posted by rknichols

You can use "-y". I always hesitate to recommend that because of the possibility of fsck doing something crazy. Frequently, I do a preliminary run with "-n" just to get an idea of the scope of the problem, but expecting someone not familiar with fsck to make a judgement call based on the result, or for that matter knowing when not to respond "y" when fsck wants to correct something, just isn't realistic.

With a decent backup available, go for it!

That's what I assumed. It can be really hard to assist over the phone and not be able to see the screen for myself. Hoping all works out. I sent the user some instructions on what to do. We still have the backup from the day before should things go south.

JoseKreif · 06-28-2016, 07:28 AM

I had user enter root password and run fsck -y /dev/VolGroup00/LogVol00.

I had user send a screenshot of the output. It all looked successful, so I had user run shutdown -r now. After that, user is excited and claims things are working again.

sundialsvcs · 06-28-2016, 10:48 AM

You can bring up the server (maybe ...) in "single-user mode."

And then, go buy an Uninterruptible Power Supply!

JoseKreif · 06-28-2016, 10:59 AM

Quote:

Originally Posted by sundialsvcs

And then, go buy an Uninterruptible Power Supply!

I agree. This was something I brought up to management yesterday. So we have ordered a UPS for them. It's important they have it to prevent this from happening again.
Most of our locations have a UPS on their servers. This one is different because its circumstances make it a bit tricky.

rknichols · 06-28-2016, 11:12 AM

The server should be fine. "Orphan inodes" are inodes for deleted files that are held open by a running process. It is not expected that they should survive a reboot. The only problem that fsck detected was that the list of orphan inodes had become corrupt, so there was nothing lost in the filesystem. Transactions that were in progress when processes were abruptly terminated would of course have been lost.

jpollard · 06-28-2016, 11:22 AM

Quote:

Originally Posted by rknichols

The server should be fine. "Orphan inodes" are inodes for deleted files that are held open by a running process. It is not expected that they should survive a reboot. The only problem that fsck detected was that the list of orphan inodes had become corrupt, so there was nothing lost in the filesystem. Transactions that were in progress when processes were abruptly terminated would of course have been lost.

Well, usually.

Orphan inodes are inodes that are not in a directory that is within the existing filesystem tree. The problem is that you can get orphan inodes if a directory gets corrupted - the files are perfectly valid, it is just that the only directory they were in has been corrupted, and possibly deallocated.

Whether the files should be deleted or not is not a decision that fsck can make automatically. Sometimes an orphan directory occurs (usually only on older filesystems, ext2 for instance), in which case the directory may be put in the lost+found directory, and all the files contained within are no longer orphaned. This is why the "-y" option sometimes doesn't work: there are two valid recovery options, but no way to decide between them.

If this happens, just run the fsck again without the "-y" option, and decide on a file-by-file basis whether to keep or delete.

rknichols · 06-28-2016, 01:05 PM

Inodes that appear in the orphan inode list are those that were deliberately disconnected from the directory tree while the inode was still in use. Inodes that get lost because the directory entry that should point to them is missing are a different matter. Those get put in lost+found by fsck. A problem with the orphan inode list is a much simpler matter, and one recommendation for fixing that is simply to mount the filesystem, since the orphan inode list is automatically cleared when the filesystem is mounted.
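To see whether the superblock still records an orphan list, something along these lines may help. It is a sketch that assumes `dumpe2fs -h` prints a "First orphan inode:" line when the list is non-empty and omits it otherwise; `first_orphan` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read "dumpe2fs -h" output on stdin and print the inode
# number from the "First orphan inode:" line, or nothing if the line is absent.
first_orphan() {
    sed -n 's/^First orphan inode:[[:space:]]*//p'
}

# Usage (not run here; device path is the one from this thread):
#   dumpe2fs -h /dev/VolGroup00/LogVol00 2>/dev/null | first_orphan
```

Empty output would suggest there is no orphan list left to process; a number means the list has not been cleared yet.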