LinuxQuestions.org

CentOS7 fails to boot: "Failed to mount /sysroot" (https://www.linuxquestions.org/questions/linux-general-1/centos7-fails-to-boot-failed-to-mount-sysroot-4175520910/)

thealmightyos 10-03-2014 12:20 AM

CentOS7 fails to boot: "Failed to mount /sysroot"
 
This was a fully functioning server until this afternoon. While in an SSH session I got a strange error about something tying up the processor, and I could not get any commands through. I had my GF shut it down (power off) as there was nothing I could do remotely.

When I turned it back on it looked like it was booting normally, but then I received the error message "Failed to mount /sysroot" and got dropped into an emergency terminal. It created a log file. I tried to cat it, but it is way too big; I could only see the trailing end, where it starts failing on /sysroot. I can't retrieve the log to show here yet because I left my rescue USB at work (not my brightest moment). I will have the logs to post tomorrow evening.

I am going to work on it more in depth tomorrow. In the meantime, if you guys could give me some ideas of what could be going on and where to look, I would be thankful. I have never seen an error like that before.

Kustom42 10-03-2014 06:13 PM

This could potentially be a problem with the RAM.

This sounds like it is using the initrd to boot the system. The initrd runs from RAM, attempts to mount the real root filesystem on /sysroot, and then looks for the fstab, among other steps. I am not an expert on systemd, nor do I want to be, but it sounds like that is where the hang-up is happening.

If you can't get any other diagnostic info, it might be worth booting with just a single stick of known-good RAM.



And my personal rant:

RHEL 6 FOR LIFE! BOYCOTT SYSTEMD! :D

thealmightyos 10-03-2014 06:34 PM

Quote:

Originally Posted by Kustom42 (Post 5248737)
This could potentially be a problem with the RAM.

This sounds like it is using the initrd to boot the system. The initrd runs from RAM, attempts to mount the real root filesystem on /sysroot, and then looks for the fstab, among other steps. I am not an expert on systemd, nor do I want to be, but it sounds like that is where the hang-up is happening.

If you can't get any other diagnostic info, it might be worth booting with just a single stick of known-good RAM.



And my personal rant:

RHEL 6 FOR LIFE! BOYCOTT SYSTEMD! :D

Thanks for the advice. I attempted to boot with a stick from another PC: same issue.

I am getting the logs now. Might take me a bit. Haven't figured out how I am going to do this yet. Tempted to take the whole drive out and boot it in a VM to rule out any hardware issues. Anyway, rescue live being loaded onto a usb stick now...

EDIT: I have ruled out a hardware fault in the RAM. SystemRescueCD (or USB in this case) is running completely from RAM with no issues.
EDIT: Having an issue mounting the BTRFS partition from inside SRCD. The exact error I am getting is: "couldn't mount because of unsupported optional features (40)"
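
A note on that error: the number in parentheses appears to be a hexadecimal bitmask of on-disk feature flags that the running kernel does not support. If that reading is right, it points at the rescue image's kernel being too old for this filesystem rather than at the damage itself. The sketch below decodes such a mask; the bit names are recalled from the kernel's btrfs headers and should be treated as an assumption, and any bit not in the table is reported as unknown rather than guessed.

```python
# Decode the hex bitmask printed in the btrfs mount error
# "couldn't mount because of unsupported optional features (NN)".
# ASSUMPTION: bit positions and names are recalled from the kernel's btrfs
# headers; verify against your kernel source before relying on them.
INCOMPAT_BITS = {
    0: "MIXED_BACKREF",
    1: "DEFAULT_SUBVOL",
    2: "MIXED_GROUPS",
    3: "COMPRESS_LZO",
    5: "BIG_METADATA",
    6: "EXTENDED_IREF",
    7: "RAID56",
    8: "SKINNY_METADATA",
}


def decode_incompat(hex_mask):
    """Return the names of the feature bits set in a hex mask string."""
    mask = int(hex_mask, 16)
    names = []
    bit = 0
    while mask >> bit:
        if mask & (1 << bit):
            names.append(INCOMPAT_BITS.get(bit, f"unknown bit {bit}"))
        bit += 1
    return names


if __name__ == "__main__":
    # "40" is the value from the error message quoted above.
    print(decode_incompat("40"))
```

Under these assumptions, 40 would correspond to extended inode references, so a rescue image with a newer kernel and btrfs-progs might get past this particular message.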

Ztcoracat 10-03-2014 06:53 PM

If you're suspicious that the RAM is bad, run Memtest overnight and see if it passes.
http://www.memtest.org/

thealmightyos 10-03-2014 08:12 PM

1 Attachment(s)
RAM seems OK, but I do plan on running memtest tonight. Thank you for reminding me I don't have to download the whole UBCD for memtest lol.

I was finally able to grab that log file. I am posting it here as I am reading it. Have not gone over it all yet.

EDIT: It almost seems like it is having issues mounting that BTRFS partition. Are there any utilities that can check that type of filesystem for errors? (asking this pre-google-fu)
EDIT: Post-google-foo: https://btrfs.wiki.kernel.org/index....ge/btrfs-check

EDIT: Excerpt from the log:
Code:

[    2.544512] Apollo kernel: ------------[ cut here ]------------
[    2.544553] Apollo kernel: WARNING: at fs/btrfs/inode.c:847 cow_file_range+0x438/0x450 [btrfs]()
[    2.544555] Apollo kernel: Modules linked in: btrfs xor zlib_deflate raid6_pq libcrc32c sd_mod sr_mod crc_t10dif cdrom crct10dif_common ata_generic pata_acpi radeon ata_piix i2c_algo_bit drm_kms_helper ttm libata drm r8169 mii i2c_core
[    2.544582] Apollo kernel: CPU: 1 PID: 271 Comm: mount Not tainted 3.10.0-123.8.1.el7.x86_64 #1
[    2.544583] Apollo kernel: Hardware name: MSI MS-7236/MS-7236, BIOS V12.3 07/17/2008
[    2.544585] Apollo kernel:  0000000000000000 00000000a01e708f ffff880036b71190 ffffffff815e237b
[    2.544588] Apollo kernel:  ffff880036b711c8 ffffffff8105dee1 ffff880036b71510 ffff880036b714fc
[    2.544590] Apollo kernel:  0000000000000000 000000000003ffff ffff88003cbc01e0 ffff880036b711d8
[    2.544593] Apollo kernel: Call Trace:
[    2.544600] Apollo kernel:  [<ffffffff815e237b>] dump_stack+0x19/0x1b
[    2.544604] Apollo kernel:  [<ffffffff8105dee1>] warn_slowpath_common+0x61/0x80
[    2.544607] Apollo kernel:  [<ffffffff8105e00a>] warn_slowpath_null+0x1a/0x20
[    2.544618] Apollo kernel:  [<ffffffffa02cf3d8>] cow_file_range+0x438/0x450 [btrfs]
[    2.544630] Apollo kernel:  [<ffffffffa02e2f59>] ? release_extent_buffer+0xa9/0xd0 [btrfs]
[    2.544643] Apollo kernel:  [<ffffffffa02e8e2f>] ? free_extent_buffer+0x4f/0xa0 [btrfs]
[    2.544654] Apollo kernel:  [<ffffffffa02cf74c>] run_delalloc_nocow+0x35c/0xa30 [btrfs]
[    2.544665] Apollo kernel:  [<ffffffffa02d0160>] run_delalloc_range+0x340/0x3a0 [btrfs]
[    2.544677] Apollo kernel:  [<ffffffffa02e5749>] ? find_lock_delalloc_range.constprop.43+0x1c9/0x1f0 [btrfs]
[    2.544688] Apollo kernel:  [<ffffffffa02defb8>] ? btrfs_get_token_64+0x68/0x100 [btrfs]
[    2.544700] Apollo kernel:  [<ffffffffa02e69b4>] __extent_writepage+0x324/0x780 [btrfs]
[    2.544704] Apollo kernel:  [<ffffffff81141e81>] ? find_get_pages_tag+0xe1/0x1a0
[    2.544716] Apollo kernel:  [<ffffffffa02e70a2>] extent_write_cache_pages.isra.30.constprop.48+0x292/0x410 [btrfs]
[    2.544728] Apollo kernel:  [<ffffffffa02e84fc>] extent_writepages+0x5c/0x90 [btrfs]
[    2.544739] Apollo kernel:  [<ffffffffa02ccff0>] ? btrfs_submit_direct+0x6c0/0x6c0 [btrfs]
[    2.544750] Apollo kernel:  [<ffffffffa02ca588>] btrfs_writepages+0x28/0x30 [btrfs]
[    2.544753] Apollo kernel:  [<ffffffff8114d65e>] do_writepages+0x1e/0x40
[    2.544756] Apollo kernel:  [<ffffffff81142ba5>] __filemap_fdatawrite_range+0x65/0x80
[    2.544758] Apollo kernel:  [<ffffffff81142c83>] filemap_fdatawrite_range+0x13/0x20
[    2.544769] Apollo kernel:  [<ffffffffa02e1cd9>] btrfs_wait_ordered_range+0x49/0x140 [btrfs]
[    2.544781] Apollo kernel:  [<ffffffffa0308b52>] __btrfs_write_out_cache+0x6a2/0x8c0 [btrfs]
[    2.544794] Apollo kernel:  [<ffffffffa03090bd>] btrfs_write_out_cache+0x8d/0xe0 [btrfs]
[    2.544803] Apollo kernel:  [<ffffffffa02b7eeb>] btrfs_write_dirty_block_groups+0x55b/0x650 [btrfs]
[    2.544814] Apollo kernel:  [<ffffffffa03351a7>] commit_cowonly_roots+0x15a/0x230 [btrfs]
[    2.544824] Apollo kernel:  [<ffffffffa02c824e>] btrfs_commit_transaction+0x44e/0x9d0 [btrfs]
[    2.544836] Apollo kernel:  [<ffffffffa02e8e2f>] ? free_extent_buffer+0x4f/0xa0 [btrfs]
[    2.544848] Apollo kernel:  [<ffffffffa030692d>] btrfs_recover_log_trees+0x3ed/0x4c0 [btrfs]
[    2.544860] Apollo kernel:  [<ffffffffa0304260>] ? replay_one_extent+0x6d0/0x6d0 [btrfs]
[    2.544870] Apollo kernel:  [<ffffffffa02c58dd>] open_ctree+0x17ed/0x1fb0 [btrfs]
[    2.544878] Apollo kernel:  [<ffffffffa029b9de>] btrfs_mount+0x63e/0x800 [btrfs]
[    2.544881] Apollo kernel:  [<ffffffff81149ec3>] ? free_pages+0x13/0x20
[    2.544885] Apollo kernel:  [<ffffffff8125860d>] ? selinux_sb_copy_data+0x14d/0x220
[    2.544888] Apollo kernel:  [<ffffffff811b3a59>] mount_fs+0x39/0x1b0
[    2.544891] Apollo kernel:  [<ffffffff811ce88f>] vfs_kern_mount+0x5f/0xf0
[    2.544899] Apollo kernel:  [<ffffffffa029b529>] btrfs_mount+0x189/0x800 [btrfs]
[    2.544901] Apollo kernel:  [<ffffffff81149ec3>] ? free_pages+0x13/0x20
[    2.544904] Apollo kernel:  [<ffffffff8125860d>] ? selinux_sb_copy_data+0x14d/0x220
[    2.544906] Apollo kernel:  [<ffffffff811b3a59>] mount_fs+0x39/0x1b0
[    2.544908] Apollo kernel:  [<ffffffff811ce88f>] vfs_kern_mount+0x5f/0xf0
[    2.544911] Apollo kernel:  [<ffffffff811d0c9e>] do_mount+0x24e/0xa30
[    2.544913] Apollo kernel:  [<ffffffff8114626e>] ? __get_free_pages+0xe/0x50
[    2.544915] Apollo kernel:  [<ffffffff811d1516>] SyS_mount+0x96/0xf0
[    2.544919] Apollo kernel:  [<ffffffff815f2a59>] system_call_fastpath+0x16/0x1b
[    2.544920] Apollo kernel: ---[ end trace d3cbc53aeb04b87c ]---
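
The trace above lists the innermost call first, and frames marked with "?" are ones the unwinder is unsure about. As a rough aid for reading it, the sketch below (not part of the original log or any official tool) drops the uncertain frames and the generic warning helpers so the remaining call chain is easier to follow. The sample lines are copied from the excerpt; the regular expression is an assumption that fits this log's layout.

```python
import re

# Matches frames such as:
#   [    2.544618] Apollo kernel:  [<ffffffffa02cf3d8>] cow_file_range+0x438/0x450 [btrfs]
# A "?" before the symbol marks a frame the unwinder is not sure about.
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unsure>\?\s+)?(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)

# Generic warning plumbing: these print the warning, they do not cause it.
GENERIC = {"dump_stack", "warn_slowpath_common", "warn_slowpath_null"}


def reliable_frames(log_text):
    """Return (symbol, module) pairs, innermost first, skipping '?' and generic frames."""
    frames = []
    for line in log_text.splitlines():
        m = FRAME.search(line)
        if not m or m.group("unsure") or m.group("symbol") in GENERIC:
            continue
        frames.append((m.group("symbol"), m.group("module") or "kernel"))
    return frames


SAMPLE = """\
[    2.544600] Apollo kernel:  [<ffffffff815e237b>] dump_stack+0x19/0x1b
[    2.544604] Apollo kernel:  [<ffffffff8105dee1>] warn_slowpath_common+0x61/0x80
[    2.544618] Apollo kernel:  [<ffffffffa02cf3d8>] cow_file_range+0x438/0x450 [btrfs]
[    2.544630] Apollo kernel:  [<ffffffffa02e2f59>] ? release_extent_buffer+0xa9/0xd0 [btrfs]
[    2.544848] Apollo kernel:  [<ffffffffa030692d>] btrfs_recover_log_trees+0x3ed/0x4c0 [btrfs]
[    2.544870] Apollo kernel:  [<ffffffffa02c58dd>] open_ctree+0x17ed/0x1fb0 [btrfs]
[    2.544915] Apollo kernel:  [<ffffffff811d1516>] SyS_mount+0x96/0xf0
"""

if __name__ == "__main__":
    for symbol, module in reliable_frames(SAMPLE):
        print(f"{module:>6}  {symbol}")
```

Read bottom-up, the filtered chain runs from SyS_mount through open_ctree and btrfs_recover_log_trees down to cow_file_range, which suggests the warning fired while btrfs was replaying its log tree during the mount attempt.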


Ztcoracat 10-04-2014 12:29 AM

You're welcome.

This was the only link remotely close to the Warning you have in the output of that log.
http://www.spinics.net/lists/linux-btrfs/msg30844.html

I'm not familiar with what BTRFS is or what the fault tolerance means, sorry.

Maybe this wiki will be of some use.
https://btrfs.wiki.kernel.org/index.php/Main_Page

Were you right in the middle of transferring a file from host to guest when you got that strange error about something tying up the processor during the SSH session?

Here's information on a BTRFS repair tool:
http://www.phoronix.com/scan.php?pag...tem&px=MTA2MDI

I have some experience with Boot Information Script.
Its primary use is for troubleshooting booting problems.
This might help us to figure out what's going on.
When the script is done it will put a .txt file on your system.

http://sourceforge.net/projects/bootinfoscript/

thealmightyos 10-04-2014 01:42 PM

RAM Confirmed OK: 22 passes: 0 Errors

Working the BTRFS repair angle as I am unable to mount that partition by any means. Will also use the script. Will that script run fine from a rescue USB?

There was some activity between the database and a program my co-worker is currently writing when this all started. I am allowing him limited access to the database while he learns Java. He has been working on it for well over a month without incident so I did not think it was a possible cause. Could some communication between his program and MariaDB cause this?

Ztcoracat 10-04-2014 07:22 PM

Quote:

Originally Posted by thealmightyos (Post 5249027)
RAM Confirmed OK: 22 passes: 0 Errors

Working the BTRFS repair angle as I am unable to mount that partition by any means. Will also use the script. Will that script run fine from a rescue USB?

There was some activity between the database and a program my co-worker is currently writing when this all started. I am allowing him limited access to the database while he learns Java. He has been working on it for well over a month without incident so I did not think it was a possible cause. Could some communication between his program and MariaDB cause this?

I'm not sure if that script will run from a rescue USB.
When I ran that script it placed the .txt file in my Downloads directory.

I have never come across a program interfering with a database, but anything is possible. What program was your co-worker using?

How old is the HDD?

I hope that btrfs check with the repair option works for you.

Was the partition changed or modified in any way? Resized?

Ztcoracat 10-04-2014 07:41 PM

I looked up the 2 warnings in the output and found this report.
Code:

[    2.544604] Apollo kernel:  [<ffffffff8105dee1>] warn_slowpath_common+0x61/0x80
[    2.544607] Apollo kernel:  [<ffffffff8105e00a>] warn_slowpath_null+0x1a/0x20

http://bugs.centos.org/view.php?id=5347

Sorry, I have never seen those warnings before.

thealmightyos 10-04-2014 10:16 PM

Quote:

Originally Posted by Ztcoracat (Post 5249122)
I'm not sure if that script will run from a rescue USB.
When I ran that script it placed the .txt file in my Downloads directory.

I have never come across a program interfering with a database, but anything is possible. What program was your co-worker using?

How old is the HDD?

I hope that btrfs check with the repair option works for you.

Was the partition changed or modified in any way? Resized?

Hard drive is old, but did pass SMART tests. How old, I can not say without pulling it out and looking for a mfg date. The rig was not mine originally.

The program is of his own design. He is learning Java and wanted to learn how to access and write to a database. Very basic.

I really hope it works too. Have not had a chance to work on it today. I can only hope I get some time tomorrow.

Partition was not modified. I leave that sort of thing alone

Ztcoracat 10-05-2014 06:22 PM

Quote:

Originally Posted by thealmightyos (Post 5249156)
Hard drive is old, but did pass SMART tests. How old, I can not say without pulling it out and looking for a mfg date. The rig was not mine originally.

The program is of his own design. He is learning Java and wanted to learn how to access and write to a database. Very basic.

I really hope it works too. Have not had a chance to work on it today. I can only hope I get some time tomorrow.

Partition was not modified. I leave that sort of thing alone

Since the partition wasn't changed, that rules out one of my suspicions about why you're not able to mount the filesystem. But that's a good thing.

-:-When you have the time to go through the BTRFS repair let me know how it goes.-:-

thealmightyos 10-05-2014 08:26 PM

Quote:

Originally Posted by Ztcoracat (Post 5249401)
Since the partition wasn't changed, that rules out one of my suspicions about why you're not able to mount the filesystem. But that's a good thing.

-:-When you have the time to go through the BTRFS repair let me know how it goes.-:-

It is the filesystem. When I attempted "btrfs check --repair /dev/sda3" I got many "Checksum Verify Failed ######## wanted ####### got #######" and then "csum failed".
Leaving it alone to see if it does anything else.

I might just have to reinstall. And if that is the case I need to plan ahead better and figure out some redundancies. I have never seen a FS just crap out like that and then be impossible to repair. In fact, this is the first time I have heard of btrfs.

EDIT: Yep, it did eventually finish, but it didn't "repair" anything. It just printed a long list of blocks that did not match what it expected. I will try other options such as scrub from here: https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs
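
For anyone landing here with the same symptoms: since --repair writes to the device, a more cautious order is usually to try the read-only options first. The sketch below only builds and prints such a sequence; it does not run anything. The mount point and rescue directory are hypothetical paths, and the "recovery" mount option is an assumption tied to older kernels such as the 3.10 one in this thread (newer kernels name it usebackuproot).

```python
import shlex


def recovery_plan(device, mountpoint, rescue_dir):
    """Return commands ordered from least to most invasive (none are executed)."""
    return [
        # 1. Read-only check: without --repair, btrfs check does not write to the device.
        ["btrfs", "check", device],
        # 2. Read-only mount; "recovery" asks older kernels to try backup tree roots.
        ["mount", "-t", "btrfs", "-o", "ro,recovery", device, mountpoint],
        # 3. Copy files off the unmounted device into another disk's directory.
        ["btrfs", "restore", "-v", device, rescue_dir],
    ]


if __name__ == "__main__":
    # /mnt/broken and /mnt/rescue are hypothetical paths on the rescue system.
    for cmd in recovery_plan("/dev/sda3", "/mnt/broken", "/mnt/rescue"):
        print(shlex.join(cmd))
```

Scrub, by contrast, works on a mounted filesystem, so it is unlikely to help while the partition cannot be mounted at all.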

thealmightyos 10-05-2014 11:12 PM

Starting a new post because I am going in a new direction.

Data does not appear to be recoverable. All that was lost was some hard work I put into the configuration, so it was not a major deal. In fact, this just might be a blessing in disguise. I originally installed and configured this server just to see what the new CentOS7 could do, and kept using it afterward as I really did like the OS (sorry Kustom42, looks like that puts me in the systemd camp). What I should have done was install it with specific options and configure it with specific goals in mind. Specifically, I should have told my friend to use a development environment for his Java experimenting. I am not pointing fingers, but I am fairly certain he was messing with that tool of his when this whole thing started. Also, I wanted to tighten down security so I can only access the server via ssh and ftp with a cert and key.

What I am getting at is that this gives me the excuse to start over and plan my approach rather than going ad hoc and reacting to situations. This might be a single home server, but I know I can do better.

If I cannot figure out how to recover the filesystem by this time tomorrow, I will close this thread as "Solved: FS Corruption" and move on to reinstallation.

Ztcoracat 10-05-2014 11:23 PM

Quote:

Originally Posted by thealmightyos (Post 5249450)
It is the filesystem. When I attempted "btrfs check --repair /dev/sda3" I got many "Checksum Verify Failed ######## wanted ####### got #######" and then "csum failed".
Leaving it alone to see if it does anything else.

I might just have to reinstall. And if that is the case I need to plan ahead better and figure out some redundancies. I have never seen a FS just crap out like that and then be impossible to repair. In fact, this is the first time I have heard of btrfs.

EDIT: Yep, it did eventually finish, but it didn't "repair" anything. It just printed a long list of blocks that did not match what it expected. I will try other options such as scrub from here: https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs

If I had to guess, some files were open when you lost remote access and had your GF shut the system down.

I lost 30 days worth of files once because I had to partake in a not so graceful shutdown.
The system locked up on me and there was nothing I could do. The freeze damaged my files and I couldn't reboot-
Unfortunately, I had to reinstall the OS.

Any chance you have a CentOS Live CD?
With it you can Rescue Installed System-
http://isoredirect.centos.org/centos/6/isos/x86_64/
http://www.standalone-sysadmin.com/b...S-bootmenu.png

I asked because I wasn't sure if the Live USB has that option or not.

Sorry you're going through this. I've been there and I know it's not fun at all.
I downloaded the CentOS Bible PDF; if I find anything useful for repairing your FS I'll post it.

Hope the instructions on the btrfs manpage for 'scrub' work for you.

Ztcoracat 10-05-2014 11:36 PM

Quote:

Originally Posted by thealmightyos (Post 5249480)
Starting a new post because I am going in a new direction.

Data does not appear to be recoverable. All that was lost was some hard work I put into the configuration, so it was not a major deal. In fact, this just might be a blessing in disguise. I originally installed and configured this server just to see what the new CentOS7 could do, and kept using it afterward as I really did like the OS (sorry Ztcoracat, looks like that puts me in the systemd camp). What I should have done was install it with specific options and configure it with specific goals in mind. Specifically, I should have told my friend to use a development environment for his Java experimenting. I am not pointing fingers, but I am fairly certain he was messing with that tool of his when this whole thing started. Also, I wanted to tighten down security so I can only access the server via ssh and ftp with a cert and key.

What I am getting at is that this gives me the excuse to start over and plan my approach rather than going ad hoc and reacting to situations. This might be a single home server, but I know I can do better.

If I cannot figure out how to recover the filesystem by this time tomorrow, I will close this thread as "Solved: FS Corruption" and move on to reinstallation.

I totally get it-:)

-::-Blessings in disguise can be a good teacher and in some cases a lesson well learned.-::-

Running a program in the improper environment most likely creates undesired effects.
Tightening security is always a plus when you're running an enterprise.


All times are GMT -5.