LinuxQuestions.org


davidstvz 11-11-2008 10:19 AM

*urgent* hard drive going bad
 
I have tape backups, and I may still be able to get the data off the failing drive (if the retrieval doesn't cause it to fail completely). In any case, can someone walk me through the best process here? I'm freaking out a little (this would be my first major disaster since taking up this position).

The failing drive is a drive with user files, not an OS drive, so the OS is working fine. I don't have an extra drive bay, so I need to unmount one of the other user file drives (I have 3) first.

Then I need to copy files to a new drive, remove the old drive, mount the new copy as if it were the same directory and then I should be good to go. If the copy fails, then I restore the drive from tape.

My immediate problem is that umount is failing with "device busy". Should I use -f, or is there a gentler way to stop the device?

davidstvz 11-11-2008 10:25 AM

I went ahead and unmounted the least important drive with -f. I'll be attempting the mounting of a replacement for the bad one and copying the files (cp -pr cp -pr cp -pr or is that cp -rp??)

pwc101 11-11-2008 10:32 AM

You should be able to find out what's preventing you from umounting the disk with fuser:
Code:

prompt$ fuser -vm '/dev/your_disk'
If you get a result from that, then try ending whichever processes are locking the devices. Bear in mind that cd'ing into a directory on the disk and not cd'ing out of it will prevent umount from working, so you can't umount a drive from its mounted directory (/mnt/whatever or /media/whatever).

I'd try these before forcing the umount.
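The point about cd'ing can be checked from the shell itself before reaching for fuser. The following is only a minimal sketch: cwd_is_under is a made-up helper name and /mnt/whatever is a placeholder mount point. It covers the current shell only, not other processes that may be holding the disk.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if this shell's working directory is the
# given mount point or lies somewhere beneath it.
cwd_is_under() {
    mountpoint=${1%/}          # drop a trailing slash, if any
    case "$(pwd -P)/" in
        "$mountpoint"/*) return 0 ;;
        *)               return 1 ;;
    esac
}

# /mnt/whatever is a placeholder; substitute the real mount point.
if cwd_is_under /mnt/whatever; then
    echo "this shell is inside /mnt/whatever; cd out before umount"
else
    echo "this shell is not holding /mnt/whatever"
fi
```

If the first message appears, a plain cd / in that shell is enough to release its hold on the mount point.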

Other than that, I'm afraid I can't help.

pixellany 11-11-2008 10:32 AM

1. "urgent" does not make us move faster.

2. Don't panic. As with many other things, the first instinct should be to do nothing. The exception might be if you hear ugly noises: kill the power.

3. You can use the "lazy umount", but first try closing any window (including terminals) which is displaying the contents of the drive. I discovered just recently that I could get rid of the "device busy" message by simply CDing out of the directory.

4. If you are dealing with a hard crash (i.e. a mechanical failure), then think twice about attempting recovery yourself. The more you run the drive, the lower the odds of recovery.

5. Depending on the circumstances, my first instinct might be to clone the drive. The theory is to get all the data off, even if some of the filesystem is corrupted.
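Point 5 does not name a tool for cloning. One common choice is dd, and the sketch below assumes it. clone_image is a made-up helper name, and ordinary files stand in for the disks so the example is safe to run. On a real system the source would be the raw device of the failing drive, and the destination must sit on a different, healthy disk.

```shell
#!/bin/sh
# Hypothetical helper: copy SRC to DST block by block, continuing past read
# errors. conv=noerror keeps dd going after a failed read; conv=sync pads a
# short or failed block with zeros so later data stays at the same offset.
clone_image() {
    dd if="$1" of="$2" bs=64k conv=noerror,sync 2>/dev/null
}

# Demonstration on ordinary files standing in for disks.
work=$(mktemp -d)
head -c 262144 /dev/urandom > "$work/source.img"   # 4 blocks of 64k
clone_image "$work/source.img" "$work/clone.img"
cmp "$work/source.img" "$work/clone.img" && echo "clone matches source"
```

Because conv=sync pads to the block size, a source whose length is not a multiple of bs yields an image slightly longer than the original. A larger bs also means more data is replaced with zeros around each unreadable spot.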

pwc101 11-11-2008 10:34 AM

Quote:

Originally Posted by davidstvz (Post 3338386)
I'll be attempting the mounting of a replacement for the bad one and copying the files (cp -pr cp -pr cp -pr or is that cp -rp??)

The order of the option letters with GNU cp is irrelevant, as far as I'm aware: -pr and -rp mean the same thing.
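This is easy to confirm on a throwaway tree rather than on the failing disk. A small sketch, with every path created under a temporary directory:

```shell
#!/bin/sh
# Build a small throwaway tree, copy it twice with the flags in both orders,
# and compare the copies.
work=$(mktemp -d)
mkdir -p "$work/src/sub"
echo "hello" > "$work/src/a.txt"
echo "world" > "$work/src/sub/b.txt"

cp -pr "$work/src" "$work/copy_pr"
cp -rp "$work/src" "$work/copy_rp"

# diff -r exits 0 only when both trees hold the same files and contents.
if diff -r "$work/copy_pr" "$work/copy_rp" >/dev/null; then
    same=yes
else
    same=no
fi
echo "copies identical: $same"
```

Here -r recurses into subdirectories and -p preserves modification times, ownership and permissions where the user is allowed to.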

davidstvz 11-11-2008 10:43 AM

Ok, I appreciate any help even if *urgent* doesn't get anyone to answer more quickly. I just thought I would distinguish this thread from people just tinkering around :(

I don't think a drive recovery is an option for us as long as the tape backups work fine (can't imagine why they wouldn't). We can't justify the cost. So I'm first going to try to copy the drive, and failing that I will go to the tape backup.

Now, what I'm trying to say here is that I'm completely green. I don't even know how to go about mounting a new hard drive (I can get it done eventually, but not fast enough for this situation).

I guess I need to start with formatting the replacement drive. But I don't know the command.

davidstvz 11-11-2008 10:47 AM

By the way, this is actually BSD, though I guess what I'm doing is low level enough that it should make little difference. I need to format the new drive with the UFS file system though.

Probably fdisk -something ufs

pwc101 11-11-2008 11:01 AM

If I were you (note: I'm not a sysadmin), I'd try to get the new drive formatted correctly in another machine, in case you screw up and nuke the wrong disk with the new filesystem; this is easily done. Then you need to stick it in the machine you want to copy the data from, and mount it. The mount command for a UFS disk seems to be pretty complex (I've never used UFS, and it seems it's actually an umbrella term for a number of different filesystems; see man mount for more info). I'd start by looking at /etc/fstab for the appropriate options to use with mount, given that the disks you're replacing are presumably also UFS.
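Pulling the existing UFS entries out of /etc/fstab could look something like the sketch below. list_ufs_entries is a made-up helper, and the sample file with its device names is invented for the demonstration; the real entries on the server will differ.

```shell
#!/bin/sh
# Hypothetical helper: print device, mount point and options for every
# uncommented ufs entry in an fstab-format file (default: /etc/fstab).
list_ufs_entries() {
    awk '$1 !~ /^#/ && $3 == "ufs" { print $1, $2, $4 }' "${1:-/etc/fstab}"
}

# Demonstration on a made-up sample; the device names are placeholders.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# Device      Mountpoint  FStype  Options  Dump  Pass#
/dev/da0s1b   none        swap    sw       0     0
/dev/da0s1a   /           ufs     rw       1     1
/dev/da1s1e   /home1      ufs     rw       2     2
#/dev/da2s1e  /home2      ufs     rw       2     2
EOF
list_ufs_entries "$sample"
```

Run without an argument, the helper reads the real /etc/fstab. The options column it prints is what would go after -o on a manual mount command.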

I think you need a BSD guru to come along now...

davidstvz 11-11-2008 11:04 AM

Yeah, I'm starting to realize that. Well, looks like it's Google time!

davidstvz 11-11-2008 11:16 AM

Well, I figured out how to mount the drive using "sysinstall" which made life a lot easier. Time to start copying those files...

davidstvz 11-11-2008 11:30 AM

Ok... the copy operation is starting. I think the drive is mostly good, just certain parts of it aren't doing so well. Every now and then this bright white (the console text is generally gray) message relating to the bad disk pops up. It says something about some trouble reading or writing but that it recovered. The vast majority of files seem to be getting copied with no warnings at all.

I'm keeping my fingers crossed that once this copy is done I can have the drive mounted in place of the old drive and things will be 99% well.

davidstvz 11-11-2008 01:34 PM

[edit: see bottom]

The server doesn't want to start up properly now. That is, it's getting stuck in the startup process (beginning at "Starting mountd"). If I press Ctrl+C, it moves along, and after doing that at a few more points the thing finishes booting and seems to be working fine at a glance.

What could it be about mounting a new drive that is causing it to hang here?

I've commented out the lines for the two drive bays I'm using for the swap and copy in the fstab file, so it ought to just start right up and ignore those drives, right?

EDIT: oh, it was lack of Ethernet connection causing the hang. It finally unhung complaining about inability to do something over the network.
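For reference, commenting out an fstab entry by mount point can be sketched as below. comment_out_mount is a made-up helper and the sample entries are invented. It writes the edited copy to stdout and leaves the input file untouched, so the result can be reviewed before anything replaces the real /etc/fstab.

```shell
#!/bin/sh
# Hypothetical helper: print an fstab-format file with the active entry for
# one mount point commented out. The input file itself is not modified.
comment_out_mount() {
    awk -v mp="$2" '
        $1 !~ /^#/ && $2 == mp { print "#" $0; next }
        { print }
    ' "$1"
}

# Demonstration on a made-up sample; the device names are placeholders.
sample=$(mktemp)
cat > "$sample" <<'EOF'
/dev/da0s1a  /       ufs  rw  1  1
/dev/da1s1e  /home1  ufs  rw  2  2
/dev/da2s1e  /home2  ufs  rw  2  2
EOF
comment_out_mount "$sample" /home2
```

A commented-out entry is skipped when filesystems are mounted at boot, which matches the expectation in the post that those drives are ignored.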

davidstvz 11-11-2008 02:41 PM

*whew*

Done for now. The data copied successfully (the vast majority of it) and the system is back up and running again.

The only thing I need now is to figure out how to mount a UFS file system in OpenSuse (unless I'm going to install another copy of BSD, which I'm looking to avoid). I can probably Google this.

Thanks, if nothing else, for a place to vent and for your patience.
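On the UFS-in-OpenSuse question, the Linux ufs driver takes a ufstype mount option that has to match the on-disk format. The sketch below only prints a candidate command rather than running it, since mounting needs root and a real disk. ufs_mount_command is a made-up helper, and /dev/sdb1 and /mnt/bsd are placeholders. Which ufstype applies to this particular server is not stated in the thread.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a read-only Linux mount command for a
# UFS partition. The ufstype value must match the on-disk format: ufs2 is
# typical for FreeBSD 5 and later, 44bsd for older BSD UFS1 filesystems.
ufs_mount_command() {
    device=$1
    mountpoint=$2
    ufstype=${3:-ufs2}
    echo "mount -t ufs -o ro,ufstype=$ufstype $device $mountpoint"
}

ufs_mount_command /dev/sdb1 /mnt/bsd
ufs_mount_command /dev/sdb1 /mnt/bsd 44bsd
```

Mounting read-only is the cautious choice here, since the goal is just to read the copied data.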

pwc101 11-11-2008 03:19 PM

Glad it's all sorted :)

At least you had the tapes to fall back on. Keywords: backup, backup, backup! ;)

davidstvz 11-11-2008 04:31 PM

I know, I know, I know! I wish it were easier.

The only option is a tape that is too small (3 whole tapes are needed to back up everything), and the tape driver is apparently causing some weird behavior (I know it's not the drive, as I have two drives that behave weirdly on this machine and fine on another machine). I've tested it with small amounts of data and it works, but the weird behavior is worrisome.

As best I can tell, the driver is writing some dummy information after each tar archive on a tape. When I try to extract each tar file in sequence, every other tar command (the even commands) simply exits with no error message (it is apparently skipping past the junk data). The reason I think there is junk data is that when I use zip mode, it reports some kind of junk data at the end of the tar (and apparently manages to skip past it, as the odd/even behavior goes away when using zipped tars). I'm keeping my fingers crossed that this thing holds together until I get this damned machine migrated to new hardware at the end of the year.

...which is going to be fun in its own right since the new hardware uses some new serial scsi interface and the old uses some older scsi interface (many more pins). So I'll have to migrate over 200 GB of data over the network at 1.5 MB a second. Damn... that's going to take something like 48 hours. Unless I can find (and justify the purchase of) some hardware to aid in the transfer. What a mess!
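The transfer-time estimate can be checked with a little arithmetic. This sketch assumes 1 GB = 1000 MB and a steady rate with no protocol overhead; transfer_hours is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: hours needed to move SIZE_GB at RATE_MB_PER_S,
# printed with one decimal place.
transfer_hours() {
    LC_ALL=C awk -v gb="$1" -v rate="$2" 'BEGIN {
        seconds = gb * 1000 / rate
        printf "%.1f\n", seconds / 3600
    }'
}

transfer_hours 200 1.5    # prints 37.0
transfer_hours 260 1.5    # prints 48.1
```

So exactly 200 GB at 1.5 MB a second comes to about 37 hours, and the 48-hour figure corresponds to roughly 260 GB, which fits "over 200 GB".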

By the way, thanks for caring. I can't wait to replace this BSD system with an OpenSuse system. :)


All times are GMT -5.