LinuxQuestions.org
Old 05-26-2016, 08:45 PM   #1
Red Squirrel
Senior Member
 
Registered: Dec 2003
Distribution: Mint 20.1 on workstation, Debian 11 on servers
Posts: 1,336

Rep: Reputation: 54
Main file server died, took out entire network, do I have any chance of saving it?


For some reason my UPS did not kick in properly, and I just lost my file server during a routine power outage (I had to turn off the main breaker to check something). Now it just sits at a blinking cursor and won't boot. I really, really don't want to have to completely reinstall and reconfigure everything. I have backups, but it's still a huge royal pain if I have to go through everything, as there are so many config files spread all over that I won't even remember everything offhand. It's not like I can just hit a button and restore everything. I'm hoping the actual RAID arrays aren't going to be corrupt, but that's my fear.

OS is CentOS 6.4

Do I have any chance of saving this server? Anything special I can do with a boot CD or something?

Edit: have access to file system now, so it's a good start, but hoping to figure out how to get it to boot now.

Last edited by Red Squirrel; 05-27-2016 at 01:08 PM.
 
Old 05-26-2016, 08:49 PM   #2
Emerson
LQ Sage
 
Registered: Nov 2004
Location: Saint Amant, Acadiana
Distribution: Gentoo ~amd64
Posts: 7,661

Rep: Reputation: Disabled
Boot SystemRescueCD (CD or USB stick) and start checking the hard drives. If it boots. You say it won't POST?
 
Old 05-26-2016, 08:53 PM   #3
Red Squirrel
Senior Member
 
Registered: Dec 2003
Distribution: Mint 20.1 on workstation, Debian 11 on servers
Posts: 1,336

Original Poster
Rep: Reputation: 54
It POSTs. I see the regular BIOS text and stuff, but then it just goes to a screen with a blinking cursor. A long time ago I accidentally screwed up a mv /folder command and did mv / instead. I caught it in time, but a bunch of system stuff got moved. I was able to move it back, but a lot of that stuff needs to be at a specific physical location on the hard drive in order to boot, so I have a feeling it has to do with that. Is there some kind of repair I can run or something?

As a start I'm going to see if I can mount the raid arrays; if I can at least confirm the data is ok it will make me feel much better. The problem will be trying to assemble 3 raid arrays and knowing which drive is which. I have a spreadsheet... on the file server.
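For anyone in the same spot without their drive spreadsheet: mdadm keeps the array identity in each member's superblock, so the mapping can usually be rebuilt from the disks themselves. A minimal sketch, assuming it is run as root from the rescue environment. The helper names `array_uuid` and `list_members` and the `/dev/sd*` glob are illustrative, not something from this thread:

```shell
# Print the array UUID found in `mdadm --examine` output read from stdin.
# Matches "Array UUID :" (1.x metadata) and a bare "UUID :" (0.90 metadata),
# but not the per-disk "Device UUID :" line.
array_uuid() {
    awk -F' : ' '/^ *(Array )?UUID :/ { split($2, w, " "); print w[1]; exit }'
}

# List every block device that carries an md superblock, prefixed by the
# UUID of the array it claims to belong to. Needs root and mdadm.
list_members() {
    for dev in /dev/sd*; do
        uuid=$(mdadm --examine "$dev" 2>/dev/null | array_uuid)
        if [ -n "$uuid" ]; then
            echo "$uuid $dev"
        fi
    done | sort
}

# Usage:
#   list_members
```

Members that print the same UUID belong to the same array. `mdadm --assemble --scan` then tries to assemble whatever it can identify, and `cat /proc/mdstat` shows the result.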

I'm going to end up having to dig through backups either way I think.
 
Old 05-26-2016, 09:41 PM   #4
jefro
Moderator
 
Registered: Mar 2008
Posts: 22,001

Rep: Reputation: 3629
Almost any current "live" media could be used to see what state the drive is in. Your problem may be something other than the drive. It could be any number of failures.

For a number of years, distros have made their installation media live. In fact they don't even bother to mention it now. See if you can get a live CentOS image: https://wiki.centos.org/Download
 
Old 05-26-2016, 09:48 PM   #5
Red Squirrel
Senior Member
 
Registered: Dec 2003
Distribution: Mint 20.1 on workstation, Debian 11 on servers
Posts: 1,336

Original Poster
Rep: Reputation: 54
Some good news. With a CentOS rescue CD (well a virtual ISO actually) I was able to mount the raid arrays so at least it's one thing less to worry about. Though that does not really mean nothing is corrupted, just that the arrays themselves are fine. The VMs may very well be corrupted given the VM server was also running and the data stores were basically pulled from under the rug so to speak. One of the arrays is resyncing though. I'm going to just leave it alone till that's done.
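While waiting on a resync like this, the progress is in /proc/mdstat. A small sketch that boils that file down to one line per array that is currently resyncing or rebuilding. The function name `mdstat_progress` is made up for illustration, and the parsing assumes the usual mdstat layout:

```shell
# Read /proc/mdstat-style text on stdin and print, for each array with a
# resync/recovery/reshape/check in progress: array name, percent done, ETA.
mdstat_progress() {
    awk '
        /^md[0-9]+ :/ { array = $1 }
        /(resync|recovery|reshape|check) *=/ {
            pct = ""; eta = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /%$/)       pct = $i
                if ($i ~ /^finish=/) eta = $i
            }
            print array, pct, eta
        }
    '
}

# Usage:
#   mdstat_progress < /proc/mdstat
```

Plain `watch cat /proc/mdstat` gives the same information with less ceremony; the function is only handy when there are many arrays to scroll through.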

I've chrooted to the mount point created by the rescue CD and have network access. SSH beats crouching in the hot aisle trying to use a keyboard in that tight space. I really need to get a KVM console, but even a 1-port one is like over a grand.

I'm kinda wondering if I can just start all the services from here... Though technically it's probably running on the old kernel that's on the CD, I imagine that could be problematic. Either way I have to wait for that array to resync before I do anything. Any tips on what I may be able to do from here to get the OS to boot?

I have to say mdadm raid is super resilient though, it's never let me down.
 
Old 05-26-2016, 09:50 PM   #6
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,141

Rep: Reputation: 4123
The blinking cursor is grub telling you it can't find the next stage. Like the MBR is there, but no-one else is home. Probably not good - is your boot partition (if you have one) also RAID'd?
Are we talking mdadm (software) RAID here? If you are lucky (smart) and they were created with current metadata, they should assemble correctly automagically. Are you feeling lucky?

Ahhh, gotta learn to type faster.

Last edited by syg00; 05-26-2016 at 09:52 PM.
 
Old 05-26-2016, 10:01 PM   #7
Red Squirrel
Senior Member
 
Registered: Dec 2003
Distribution: Mint 20.1 on workstation, Debian 11 on servers
Posts: 1,336

Original Poster
Rep: Reputation: 54
The OS is not raided. I always felt trying to raid the OS adds too much complexity, as it's a chicken and egg scenario: the raid has to be available for the OS to be seen, but the OS has to be running for the raid to run... I heard it can be done by having two boot partitions and some kind of mdadm preloader; I just never looked too deeply into it. So the OS is just on a single SSD, as I figure it's unlikely to fail randomly like a single spindle drive could (it's not an OCZ :P ). The /boot is on a separate partition on that drive.

I have access to the server through SSH now, so I can do anything that may need to be done to get it to boot.

Oh, and I can confirm OS is 6.4 and not 6.5.
 
Old 05-26-2016, 11:16 PM   #8
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,141

Rep: Reputation: 4123
Get out of the chroot and fsck everything in sight. Starting with the /boot. Force the fsck, but don't auto-reply; I always want to know what is broken, and how many. Even if I don't understand it. Gives me some gut feel as to whether I should really be restoring the whole filesystem rather than trust it again.
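A rough sketch of what that can look like for an ext3/ext4 filesystem, assuming it is run as root from the rescue environment, outside the chroot. `e2fsck -f` forces a check even when the filesystem is marked clean, and leaving off `-y` and `-p` means it stops and asks about each problem it finds. The helper names and the example device are hypothetical:

```shell
# Succeed if DEVICE appears in the mount table (default: /proc/mounts).
is_mounted() {  # is_mounted DEVICE [MOUNTS_FILE]
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Force an interactive check of one filesystem, refusing if it is mounted.
check_fs() {  # check_fs DEVICE
    if is_mounted "$1"; then
        echo "refusing: $1 is mounted, unmount it first" >&2
        return 1
    fi
    e2fsck -f "$1"   # -f: check even if marked clean; no -y/-p, so it prompts
}

# Usage (device name is only an example, /boot first as suggested above):
#   check_fs /dev/sdv1
```

The rescue CD normally has the installed system mounted under its own mount point, so those filesystems need unmounting before `check_fs` will touch them.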
 
Old 05-26-2016, 11:27 PM   #9
Red Squirrel
Senior Member
 
Registered: Dec 2003
Distribution: Mint 20.1 on workstation, Debian 11 on servers
Posts: 1,336

Original Poster
Rep: Reputation: 54
I can try that, but I don't suspect this is a file system issue, though it won't hurt to check anyway as a fsck has not been run in like 1-2 years. Same with the raid; I should probably run one on all the file systems.

I'm thinking the issue has to do with my mv / mishap I did several years back. The files were moved back where they go, but I'm wondering if there are some attributes that are wrong, or something along those lines. Is there some kind of tool I can run that will fix all that? Like /boot, the MBR etc.
 
Old 05-27-2016, 12:18 AM   #10
Red Squirrel
Senior Member
 
Registered: Dec 2003
Distribution: Mint 20.1 on workstation, Debian 11 on servers
Posts: 1,336

Original Poster
Rep: Reputation: 54
Oh and it might help to add, I get the blinking cursor before I get to the grub menu. It's set to automatically go to first option after 5 seconds, but that never comes up. So the issue is with grub, most likely.

Actually another thing, the grub boot file refers to (hd0,0) but the first hard drive actually ends up being one of the storage drives, they show up first, and the internal SSD is at the end (sdv). Could this be an issue? I don't really want to map to sdv as I'll be in the same boat if I add more drives to the system. Is there a way to make it use the GUID?

Last edited by Red Squirrel; 05-27-2016 at 12:55 AM.
 
Old 05-27-2016, 03:54 AM   #11
Emerson
LQ Sage
 
Registered: Nov 2004
Location: Saint Amant, Acadiana
Distribution: Gentoo ~amd64
Posts: 7,661

Rep: Reputation: Disabled
Linux kernel can mount by PARTUUID.
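On the question of not depending on the drive letter: I am not certain the 2.6.32 kernel on CentOS 6 understands root=PARTUUID=, so this sketch sticks to filesystem UUIDs as reported by `blkid`, which /etc/fstab accepts as `UUID=...`. Note that GRUB's (hd0,0) is a BIOS drive number rather than a Linux sdX letter, so this only covers the Linux side. The function names and the sample UUID are made up for illustration:

```shell
# Pull the filesystem UUID out of one line of `blkid` output on stdin,
# e.g.  /dev/sdv1: UUID="..." TYPE="ext4"
uuid_from_blkid() {
    sed -n 's/.* UUID="\([^"]*\)".*/\1/p'
}

# Build an fstab line that refers to the filesystem by UUID instead of sdX.
fstab_entry() {  # fstab_entry UUID MOUNTPOINT FSTYPE
    printf 'UUID=%s %s %s defaults 1 2\n' "$1" "$2" "$3"
}

# Usage (as root; /dev/sdv1 is whatever /boot is called right now):
#   fstab_entry "$(blkid /dev/sdv1 | uuid_from_blkid)" /boot ext4
```

`blkid -s UUID -o value /dev/sdv1` prints the bare UUID directly, which makes the sed step unnecessary when a single device is queried.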
 
Old 05-27-2016, 05:51 AM   #12
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,912

Rep: Reputation: 1513
Quote:
Originally Posted by Red Squirrel
Oh and it might help to add, I get the blinking cursor before I get to the grub menu. It's set to automatically go to first option after 5 seconds, but that never comes up. So the issue is with grub, most likely.

Actually another thing, the grub boot file refers to (hd0,0) but the first hard drive actually ends up being one of the storage drives, they show up first, and the internal SSD is at the end (sdv). Could this be an issue? I don't really want to map to sdv as I'll be in the same boat if I add more drives to the system. Is there a way to make it use the GUID?
That sounds more like the BIOS lost track of which disk to boot from.
 
Old 05-27-2016, 12:53 PM   #13
Red Squirrel
Senior Member
 
Registered: Dec 2003
Distribution: Mint 20.1 on workstation, Debian 11 on servers
Posts: 1,336

Original Poster
Rep: Reputation: 54
It's always been that way, I have 3 HBAs, for whatever reason they are always first, then the onboard is last. The OS drive is onboard. The BIOS sees it.

I'm pretty sure the reason it won't boot is because of a screw up I did once involving the mv command many years ago. I knew if I was to reboot that machine I'd be in trouble, which is why I spent over a grand in UPS batteries... but it failed me. I need to look at dual conversion but that's a couple more grand that I don't have to spend. I just need to know how do I go about repairing the MBR, but also how to modify the grub file so it refers to the GUID and not the actual letter, as that will change at times. This is what the grub.conf looks like:

Code:
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/mapper/vg_isengard-lv_root
#          initrd /initrd-[generic-]version.img
#boot=/dev/sda
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title CentOS (2.6.32-573.22.1.el6.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-573.22.1.el6.x86_64 ro root=/dev/mapper/vg_isengard-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD rd_LVM_LV=vg_isengard/lv_swap SYSFONT=latarcyrheb-sun16 crashkernel=128M rd_LVM_LV=vg_isengard/lv_root  KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb quiet selinux=0
        initrd /initramfs-2.6.32-573.22.1.el6.x86_64.img
title CentOS (2.6.32-573.3.1.el6.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-573.3.1.el6.x86_64 ro root=/dev/mapper/vg_isengard-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD rd_LVM_LV=vg_isengard/lv_swap SYSFONT=latarcyrheb-sun16 crashkernel=128M rd_LVM_LV=vg_isengard/lv_root  KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb quiet selinux=0
        initrd /initramfs-2.6.32-573.3.1.el6.x86_64.img
title CentOS (2.6.32-504.30.3.el6.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-504.30.3.el6.x86_64 ro root=/dev/mapper/vg_isengard-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD rd_LVM_LV=vg_isengard/lv_swap SYSFONT=latarcyrheb-sun16 crashkernel=128M rd_LVM_LV=vg_isengard/lv_root  KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb quiet selinux=0
        initrd /initramfs-2.6.32-504.30.3.el6.x86_64.img
title CentOS (2.6.32-504.23.4.el6.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-504.23.4.el6.x86_64 ro root=/dev/mapper/vg_isengard-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD rd_LVM_LV=vg_isengard/lv_swap SYSFONT=latarcyrheb-sun16 crashkernel=128M rd_LVM_LV=vg_isengard/lv_root  KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb quiet selinux=0
        initrd /initramfs-2.6.32-504.23.4.el6.x86_64.img
title CentOS (2.6.32-358.el6.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-358.el6.x86_64 ro root=/dev/mapper/vg_isengard-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD rd_LVM_LV=vg_isengard/lv_swap SYSFONT=latarcyrheb-sun16 crashkernel=128M rd_LVM_LV=vg_isengard/lv_root  KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb quiet selinux=0
        initrd /initramfs-2.6.32-358.el6.x86_64.img


I've also read about grub-install, but it fails when I try it, it says:

[root@localhost grub]# grub-install /dev/sdv
/dev/sdv does not have any corresponding BIOS drive.


I also found a path called /dev/mapper/vg_isengard-lv_root which I think may be a static way to refer to /dev/sdv, so I tried that too and got the same error.

Is there something in Linux equivalent to fdisk /mbr?
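With GRUB legacy (the version CentOS 6 ships), the closest thing to fdisk /mbr is the `setup` command in the grub shell, and the "does not have any corresponding BIOS drive" error can be sidestepped by telling the shell which disk to treat as (hd0). A sketch under those assumptions: the function name is made up, /dev/sdv is the SSD as named in this thread, and /boot is assumed to be its first partition, matching the `root (hd0,0)` lines in the grub.conf above:

```shell
# Print the grub-shell commands that map DISK to (hd0) and write the
# GRUB legacy boot code to that disk's MBR.
grub_mbr_script() {  # grub_mbr_script DISK
    printf 'device (hd0) %s\nroot (hd0,0)\nsetup (hd0)\nquit\n' "$1"
}

# Usage (as root, inside the chroot so /boot/grub is the real one):
#   grub_mbr_script /dev/sdv              # review the commands first
#   grub_mbr_script /dev/sdv | grub --batch --no-floppy
```

This only rewrites the boot code. It still relies on the BIOS presenting the SSD as the boot drive, so the boot order in the BIOS setup is worth checking as well.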

Last edited by Red Squirrel; 05-27-2016 at 12:54 PM.
 
Old 05-27-2016, 02:12 PM   #14
Red Squirrel
Senior Member
 
Registered: Dec 2003
Distribution: Mint 20.1 on workstation, Debian 11 on servers
Posts: 1,336

Original Poster
Rep: Reputation: 54
Ok so I managed to run grub-install. I had to pull out all the other drives so that it's only the internal drive, and it was now /dev/sda. I ran it, and now I get to the boot loader, and I see the progress bar showing that it's loaded. For some reason it says CentOS 6.7 when /etc/issue says 6.4, so not sure what that is about or if it matters. But now what happens is the progress bar goes to the end, then it just sits there forever. I can actually ping the machine, but can't SSH to it. So it's not exactly frozen, but still not booting fully. It does this with and without all the drives.
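On the 6.7 versus 6.4 mismatch: the kernel build number gives a hint, since each EL6 minor release shipped its own 2.6.32 kernel series. The 573 kernels in the grub.conf above came with 6.7 and the 358 kernel with 6.4, so one possible explanation is a 6.4 install that has since been updated while /etc/issue stayed stale. A small sketch of that lookup; the function name is hypothetical and the table only covers 6.4 through 6.8:

```shell
# Map an EL6 kernel version string to the minor release it shipped with.
el6_release_for_kernel() {  # e.g. 2.6.32-573.22.1.el6.x86_64
    build=${1#2.6.32-}      # drop the upstream version prefix
    build=${build%%.*}      # keep only the leading build number
    case $build in
        358) echo 6.4 ;;
        431) echo 6.5 ;;
        504) echo 6.6 ;;
        573) echo 6.7 ;;
        642) echo 6.8 ;;
        *)   echo unknown ;;
    esac
}

# Usage:
#   el6_release_for_kernel "$(uname -r)"
```

`cat /etc/redhat-release` and `rpm -q centos-release` are the more direct checks of what the installed system thinks it is.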

Edit: Ok so I found out I can hit esc to see what's happening. Getting a whole bunch of exportfs errors that it can't resolve hostnames... why? Why is this holding up the system from booting? The DNS server is in a VM, the VM's data store is on THAT server! Is there anything I can do to bypass this?
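Those exportfs errors usually mean /etc/exports names NFS clients by hostname, so exporting needs DNS at boot. A hedged sketch for finding which entries depend on name resolution, assuming simple one-line exports entries without line continuations. The function names and sample hosts are made up:

```shell
# Read exports(5)-style text on stdin and print each distinct client spec.
exports_clients() {
    awk '
        /^[ \t]*(#|$)/ { next }                  # skip comments and blank lines
        {
            for (i = 2; i <= NF; i++) {
                client = $i
                if (client ~ /^-/) continue      # default-options field
                sub(/\(.*/, "", client)          # drop "(rw,sync,...)"
                if (client != "") print client
            }
        }
    ' | LC_ALL=C sort -u
}

# Print the plain hostnames from stdin's exports text that do not resolve.
unresolvable_clients() {
    exports_clients | while read -r c; do
        case $c in
            *[*?/]*|@*|[0-9]*.[0-9]*.[0-9]*.[0-9]*) continue ;;  # wildcard, CIDR, netgroup, crude IPv4 test
        esac
        getent hosts "$c" >/dev/null || echo "$c"
    done
}

# Usage:
#   unresolvable_clients < /etc/exports
```

Listing those clients in /etc/hosts, or exporting to IP addresses instead, would let the file server come up without the DNS VM that lives on its own data store.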

Last edited by Red Squirrel; 05-27-2016 at 02:15 PM.
 
Old 05-27-2016, 02:49 PM   #15
Red Squirrel
Senior Member
 
Registered: Dec 2003
Distribution: Mint 20.1 on workstation, Debian 11 on servers
Posts: 1,336

Original Poster
Rep: Reputation: 54
I may have gotten lucky, I totally forgot my old DNS server is still running, just the service was off. I started switching stuff back to that DNS server and was able to get into the file server.

Now to see what the damage is on my VMs, it's not looking too good as everything is locked right up, but now that I fixed DNS it is seeing the data stores at least...
 
  


All times are GMT -5.