LinuxQuestions.org
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Software > Linux - Kernel
Old 04-12-2014, 12:58 AM   #1
BensonBear
LQ Newbie
 
Registered: Feb 2005
Posts: 25

Rep: Reputation: 1
3.13 kernels regularly crash in the very early boot stage.


I posted a few weeks ago about this, but still have the same problem.

Any Fedora 19 or 20 3.13.x kernel crashes in early boot about 95% of the time. The rest of the time, the kernel boots and everything seems to work okay. All other kernels used since 2010 have worked fine.

Hardware: Intel i3 540 CPU, Asus P7H55-M PRO motherboard, 4x4 GB Corsair XMS3 Classic 1333 RAM, NVIDIA 9800GT video card, Kingston SV300 SSD, Seagate Barracuda 7200.12 hard drive, Corsair CX430 power supply.

I compiled a 3.13.6 kernel from the Fedora sources, enabled early printk messages (otherwise there are no messages at all), and by observing these printks, tracing them up through the source, putting in more printks, etc., eventually generated my own backtrace of the crash, which seems to happen in essentially the same place each time. Here is the call sequence:

free_all_bootmem, called by
mem_init, called by
mm_init, called by
start_kernel.

As far as I can tell, the kernel at early boot sets up an ad hoc memory management system (the bootmem allocator), which at this point it is done with and tries to free, in preparation for the more general memory management system that will be used once the system is booted. (Linux seems to have an unnecessarily large number of bootstrapping components, what with the very large temporary init filesystem too, but I guess I just don't understand the need here.)
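For anyone wanting to see the same early messages, they are enabled with the documented earlyprintk= boot parameter; on a Fedora-style GRUB 2 setup the steps look roughly like this (the console values are examples, see Documentation/kernel-parameters.txt):

```shell
# 1. Add earlyprintk to the kernel command line: in /etc/default/grub,
#    extend GRUB_CMDLINE_LINUX, e.g.
#      GRUB_CMDLINE_LINUX="... earlyprintk=vga,keep"
#    (or earlyprintk=serial,ttyS0,115200 if you have a serial console;
#    ",keep" retains the early console after the real one comes up)

# 2. Regenerate the GRUB configuration:
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
```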

I have posted about this at redhat bugzilla (https://bugzilla.redhat.com/show_bug.cgi?id=1082207)
but have not been getting much response (a suggestion that I run memtest86+, which I have, at great length, only to discover nothing).

It was also suggested there could be some race-condition timing issue here, which seems a little strange to me, since at this early stage I don't know why there would even have to be more than one processor doing anything, or even more than one thread. However, there are spinlocks in the free_all_bootmem code where the memory is being freed, so perhaps that is possible. But really, why? I don't see what it would get you at this point.

Not that it matters. I am not all that interested in learning the details of how the kernel operates; I am mostly just a user. But any ideas anyone can offer that increase my understanding would be good, even if they don't lead to a solution (and perhaps at some point they might), since then I will get *something* out of this affair!

It seems that few people are experiencing this problem. But I am pretty sure my own specific hardware is not defective in any way that would cause this problem 95% of the time in the same place, yet run fine once things are booted. So I suspect it is some hardware idiosyncrasy that most people don't have, and that is not dealt with by the software. I recall that my last machine's (2003) motherboard implemented some DMA stuff incorrectly, and I had to talk to Alan Cox to find the software workaround. *He* called it defective hardware, but it was not defective in the sense of faulty memory; it was a failure to correctly implement some specification, something that could be (and was) worked around in software. I suspect something like that here.

This is pretty serious, since I won't be able to continue with Linux if I cannot even reliably boot a kernel. I have never seen anything like this in many years (the DMA problem was serious, but not like this, and it was immediately understood and worked around).

Last edited by BensonBear; 04-12-2014 at 01:03 AM.
 
Old 04-12-2014, 01:07 AM   #2
Emerson
LQ Sage
 
Registered: Nov 2004
Location: Saint Amant, Acadiana
Distribution: Gentoo ~amd64
Posts: 7,661

Rep: Reputation: Disabled
Boot something else, like SystemRescueCd, and see if the problem persists.
 
Old 04-12-2014, 01:14 AM   #3
BensonBear
LQ Newbie
 
Registered: Feb 2005
Posts: 25

Original Poster
Rep: Reputation: 1
Quote:
Originally Posted by Emerson
Boot something else, like SystemRescueCd, and see if the problem persists.
I am not sure what you mean. The problem is only with 3.13 kernels (and later). I have only one machine; I have been using Fedora kernels on it for many years. Currently all 3.12.x kernels boot fine every time, and I am using one right now. Fedora debug 3.13 kernels also boot every time with no apparent problem (but are too slow to use). The problem does not exist with any of these. But of course on attempting to reboot into a 3.13, it has not gone away; hence, it "persists".
 
Old 04-12-2014, 02:20 AM   #4
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,128

Rep: Reputation: 4121
Quote:
Originally Posted by BensonBear
I am not sure what you mean. The problem is only with 3.13 kernels (and later).
No, so far you've only shown that Fedora 3.13 booting has a problem. Something different (and slower) may offer some evidence. Especially as the debug kernels work.
From the evidence you've shown, I'd say a race is right up there as a possibility.
Remember too that Fedora uses an initrd, so that may have to be considered. And of course a "Fedora kernel" has been patched, so the issue may not be evident in the upstream source tree. Have you tried compiling one of Linus' kernels?
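For reference, a typical upstream build starting from the running distro's config looks roughly like this (the version, URL, and paths are examples only; adjust as needed):

```shell
# Fetch and unpack a stock kernel from kernel.org (version is an example)
wget https://www.kernel.org/pub/linux/kernel/v3.x/linux-3.13.7.tar.xz
tar xf linux-3.13.7.tar.xz && cd linux-3.13.7

# Start from the distro config of the currently running kernel
cp /boot/config-$(uname -r) .config
make olddefconfig       # accept defaults for any new config options

make -j$(nproc)
sudo make modules_install install   # installs the kernel and adds a boot entry
```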
 
Old 04-12-2014, 02:36 AM   #5
BensonBear
LQ Newbie
 
Registered: Feb 2005
Posts: 25

Original Poster
Rep: Reputation: 1
Quote:
Originally Posted by syg00
No, so far you've only shown that Fedora 3.13 booting has a problem. Something different (and slower) may offer some evidence. Especially as the debug kernels work.
Right; my point in reply to the other poster was that since many, many other kernels work just fine, including kernels closely related to the problematic ones, I don't see the point of booting yet another, less related system like SystemRescueCd to "see if the problem persists", because I already know it does not "persist" in many cases.

Quote:
From the evidence you shown, I'd say a race is right up there as a possibility.
Even if this is right at the beginning of the kernel boot? It is in the function start_kernel, the first function coded in C that is ever called, I believe. Do you know where I can read something about early hardware booting on multi-core, multi-threaded architectures that describes what the different cores and threads are doing? Nothing I have found so far describes this (the books "Understanding the Linux Kernel" and "Understanding the Linux Virtual Memory Manager").

In particular, why does there have to be a spinlock in the page-freeing code? What other code might be executing at that time that could be using the data structures being altered at that point?

Quote:
Remember too that Fedora use an initrd, so that may have to be considered
Do you mean an initramfs? Yes, don't most systems use one (although why it needs to contain so much is something I do not understand)? In any case, I believe this crash happens before the initramfs is decompressed and mounted, doesn't it (in mm_init of start_kernel)?

Quote:
And of course a "Fedora kernel" has been patched, so the issue may not be evident in the upstream source tree. Have you tried compiling one of Linus' kernels ?.
Yes, but I have been focused on the Fedora one, since it is the only distribution I have used for many years. I compiled a stock 3.13.7 from kernel.org, as reported April 2 in the Red Hat Bugzilla post mentioned above:

Quote:
Originally Posted by me at redhat bugzilla
I also compiled a stock 3.13.7 kernel from kernel.org, and put in it some printk's after each function call listed in my message above. It got through all of them okay, and then booted fine, ten times in a row. So it does not appear to be a problem with a combination of my hardware and the basic kernel. Something specific to the redhat kernel running on my hardware (whether defective or not).
However, I don't really remember the details of this, so I didn't report it again this time. But I must have done it!

I think it could still be a problem with any kernel, one that does not show up because it depends on the specific positioning of the code when loaded, which could vary from compile to compile. In fact my insertion of printks could change it as well.

Last edited by BensonBear; 04-12-2014 at 03:03 AM.
 
  

