LinuxQuestions.org

jwhittum 10-11-2007 04:51 PM

The Hulk is Dead!
 
I am in a big jam. I am a windozer who has a Red Hat Linux 9 file server called "The Hulk". It is a Proliant 1600 server w/ 2 RAID Arrays. It was running SAMBA Server for my Windows network.

This system has run flawlessly since I set it up. It is a real beast. Windows hangs, and flops...Not Red Hat 9.

Two weeks ago it failed to boot. I got a GRUB prompt on the screen and that was that. We had a power failure sometime before that, and I noticed the LCD display had lost its settings. Soon after that the rear fan failed. When I finally got it back up and running, it said one of the drives had failed. It booted once but I did not let it finish. I was going to get the bad drive out and fixed.

I backed it up a while ago before it choked but I really want to get the files back off of it because I didn't get the most recent copy and there were some I missed. I don't care about saving the OS. I will repair the drive and reload it later. Just need the files.

I have tried all sorts of methods to get inside this beast with no luck.

Is there a way to FTP into it while running Red Hat 9 linux rescue? I thought I could FileZilla into it but I can't get it to work.

PS - I am a NEWBIE. I set this thing up 2 years ago and got it all working the way I wanted..Then forgot about it...DUOH! Idiot!

Any Help would be appreciated...John

GrapefruiTgirl 10-11-2007 05:09 PM

Provided you simply need to copy stuff out of the drive, so that you can replace it, why not

Either:
A) plug the bad drive into another computer,
or
B) Boot a LiveCD of something,

And then..

Copy away, whatever you like. OR, you could chroot into it if need be, but just copying the stuff from one drive (the bad one) to another should be easy enough.

EDIT -- Sorry, I was working on the assumption you had physical access to the machine.. Hummm.. This likely won't work then.

EDIT2 - Hey wait! Do you have access to it?

jschiwal 10-11-2007 05:25 PM

Does it use raid 5? If only one drive in each array is bad, you could replace that drive, or simply reseat it and see if it rebuilds.

You may have had the raid controller go out. With the lcd display not working, perhaps even something on the motherboard blew. Another maybe more probable possibility is the power supply. This may be easier to replace.

On the other hand, if this was just a power outage and not a large power spike, since this server has been running for so long, maybe the battery for the bios went bad. They have a limited lifespan. It could be that the bios settings are screwed up as a result.

Also, try to determine if you have an actual drive defect or if the file system is corrupted. If it is the latter, then raid redundancy won't help you out because raid will dutifully rebuild the file system as it is.

----

Running your rescue system, are you able to mount the partitions that contained the files you want to rescue? If so, your rescue disk should have an ftp client, and you could ftp the files from the server to another computer. If it has an sftp client (which uses ssh), then you don't even need to set up an ftp service on the other computer; it only needs to have ssh set up.

Another option is to attach an external usb drive. Mount it and then copy the files over.
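As a rough sketch of that copy step: the Python below walks a mounted source tree, copies each regular file to the destination, and compares checksums afterwards, so a flaky drive produces a list of suspect files instead of aborting halfway. It assumes a Python 3 interpreter is available, which likely means booting a newer live CD rather than the RH 9 rescue disk. The names `sha256_of` and `rescue_copy` are made up for illustration.

```python
import hashlib
import os
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def rescue_copy(source, destination):
    """Copy every regular file under source into destination.

    Returns a sorted list of relative paths that could not be read or whose
    copy did not match the original, so one bad file does not stop the rest.
    Symlinks, device nodes and empty directories are skipped.
    """
    failed = []
    for root, _dirs, files in os.walk(source):
        for name in files:
            src = Path(root) / name
            rel = src.relative_to(source)
            try:
                if src.is_symlink() or not src.is_file():
                    continue
                dst = Path(destination) / rel
                dst.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(src, dst)  # copy2 also preserves timestamps
                if sha256_of(src) != sha256_of(dst):
                    failed.append(str(rel))
            except OSError:
                failed.append(str(rel))
    return sorted(failed)
```

Something like `rescue_copy("/mnt/sysimage/home", "/mnt/usb/hulk-backup")` would be the call, with both paths being placeholders for wherever the server partition and the USB drive actually end up mounted. An empty list back means every file copied and verified.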

Good Luck.

jschiwal 10-11-2007 05:50 PM

Quote:

Originally Posted by GrapefruiTgirl (Post 2921284)
Provided you simply need to copy stuff out of the drive, so that you can replace it, why not

Either:
A) plug the bad drive into another computer,
or
B) Boot a LiveCD of something,

And then..

Copy away, whatever you like. OR, you could chroot into it if need be, but just copying the stuff from one drive (the bad one) to another should be easy enough.

EDIT -- Sorry, I was working on the assumption you had physical access to the machine.. Hummm.. This likely won't work then.

EDIT2 - Hey wait! Do you have access to it?

I believe that he is dealing with a couple raid 5 arrays in a subsystem (mounted in a cage). If he has a hardware problem other than a drive, maybe he could set a desktop computer next to it and pop the controller card with attached scsi cables into the desktop ( the cover off both ). But it isn't a simple matter of popping one drive out and into another computer. All 3 or 4 of the drives would need to be moved for each array without getting the order mixed up. There may be too high a probability of damaging the integrity of the raid 5 array. It can rebuild one drive, but if you damage a second, the array is lost.
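To see why one lost drive is survivable and two are not, here is a toy Python sketch of the XOR parity idea behind raid 5. It models a single stripe on a hypothetical 4-drive array with made-up 4-byte blocks. A real controller rotates parity across the drives and uses its own on-disk layout, so this only illustrates the arithmetic, not how the Proliant's controller actually stores data.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks plus one parity block (illustrative values).
data = [b"HULK", b"SMB!", b"RH9."]
parity = xor_blocks(data)
stripe = data + [parity]

# Lose any single drive: XOR of the three survivors gives the block back.
lost = 1
survivors = [block for i, block in enumerate(stripe) if i != lost]
rebuilt = xor_blocks(survivors)
print(rebuilt == stripe[lost])  # True

# Lose two drives: the survivors only reveal the XOR of the two missing
# blocks, which is not enough to recover either one individually.
missing = [stripe[0], stripe[1]]
remaining = [stripe[2], stripe[3]]
print(xor_blocks(remaining) == xor_blocks(missing))  # True
```

That is the whole margin of safety: one equation per stripe, so one unknown block can be solved for, but not two.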

GrapefruiTgirl 10-11-2007 06:05 PM

Quote:

Originally Posted by jschiwal (Post 2921309)
I believe that he is dealing with a couple raid 5 arrays in a subsystem (mounted in a cage). If he has a hardware problem other than a drive, maybe he could set a desktop computer next to it and pop the controller card with attached scsi cables into the desktop ( the cover off both ). But it isn't a simple matter of popping one drive out and into another computer. All 3 or 4 of the drives would need to be moved for each array without getting the order mixed up. There may be too high a probability of damaging the integrity of the raid 5 array. It can rebuild one drive, but if you damage a second, the array is lost.

Thanks for pointing this out to me, jschiwal ;-) I definitely know little about RAID setups. Does sound like more than a simple drive failure though huh? Specially along with the other electrical occurrences.

jschiwal 10-11-2007 06:26 PM

If the motherboard isn't fried, I think the battery is the first thing to check. It sounds like the server has been running for years without rebooting. I think that the batteries have a lifespan of about 3 years and are very easy to overlook. This could possibly cause the lcd problem, if the bios has random info saved. But with a power failure, there could have been a power spike or brownout associated with it, either before or after. I don't know if they were using a UPS or surge protector. I've had equipment damaged even with a UPS. Nothing like opening a damaged machine and seeing a burnt out hole where controller chips should be. Look at all of the cables that you have hooked up on the back of the machine. A lightning strike for example can travel through the ground wire of any one of them. The fans going out sounds like a bad sign. I don't know if the fans themselves were damaged, or if the power supply is flaky.

If he is able to access the partitions from a rescue disk, then he is in good shape. It means that the raid arrays are working, even if one of the drives making up the array is bad. He may have a filesystem error in the /boot partition, or a damaged file such as the kernel or initrd file that is preventing booting up normally. If he can access the directories that are being shared, then the plan is to back up those directories, reinstall the system (with hopefully a newer version than RH 9), and finally restore the files. I should remind him to back up some /etc/ files like smb.conf so that the share definitions are saved.
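For the config files, a small archive is probably easier to carry around than loose copies. The Python sketch below bundles a list of files from under a mounted root into one gzip'd tar file, keeping their paths relative to the root. It assumes Python 3 is available wherever the partitions are mounted; `archive_configs` is a made-up helper name, and the file list in the note after it is only a guess at what is worth saving.

```python
import tarfile
from pathlib import Path


def archive_configs(root, relative_paths, archive):
    """Store the listed files from under root in a gzip'd tar archive.

    Returns the relative paths that were actually found and archived;
    anything missing is silently skipped.
    """
    saved = []
    with tarfile.open(str(archive), "w:gz") as tar:
        for rel in relative_paths:
            src = Path(root) / rel
            if src.is_file():
                # arcname keeps the path relative to the old root filesystem
                tar.add(str(src), arcname=rel)
                saved.append(rel)
    return saved
```

A call might look like `archive_configs("/mnt/sysimage", ["etc/samba/smb.conf", "etc/passwd", "etc/group", "etc/fstab"], "/mnt/usb/hulk-etc.tar.gz")`, where the mount points and the exact location of smb.conf are assumptions to check on the real system.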

Backing them up by copying the shared directories to an external drive may have the advantage of making the contents accessible on the network by plugging the external drive into another computer. This could be a temporary working solution allowing work to go on while the server is repaired or replaced.

sliding 10-14-2007 05:54 PM

Does the Hulk have some burned toes?
 
If your battery is still OK, you might have a look at the capacitors on the mobo.
Capacitors sometimes start leaking, particularly on cheaper mobos, though it can also happen on more expensive ones. Just look for any electrolytic caps where the ends are no longer dead flat, or that may be showing some brownish dried stuff.
I have seen several PCs with disk problems at boot where the cause of the failure was the mobo's caps.


All times are GMT -5.