LinuxQuestions.org


dbuehler 07-08-2010 08:31 AM

Troubleshooting grub2 boot problem
 
I am trying to troubleshoot an intermittent boot problem. In order to see what was happening, I first set the defaults to allow me to see all boot messages. When it failed again I was able to determine that the problem occurred right after the grub menu selections went away but before the OS was loaded. It just stopped there with nothing but a flashing cursor.

Does anyone on the list have a suggestion for how to troubleshoot this kind of problem? Is there a way to find out what problem grub encountered? I am pretty much a newbie here and will appreciate any help.

I am using Ubuntu 10.04 with grub2. No other OSes are installed at this time.

Dan

saikee 07-08-2010 08:57 AM

You can boot a Linux manually line by line in Grub2.

At the boot screen, don't choose a system; just press the "c" key to get a Grub prompt.

If your Linux is in the 3rd partition (counting from 1) of the first disk (counting from 0), it will be known to Grub2 as (hd0,3) and to Linux as device /dev/sda3. You can therefore ask Grub to display the boot configuration file, always called grub.cfg in the /boot/grub directory, with the command
Code:

cat (hd0,3)/boot/grub/grub.cfg
You can then type each line of the system you wish to boot and finish last with a "boot" command.

In any case, one can boot Ubuntu 10.04 manually (assuming it has been installed in (hd0,3) and is known to Linux as sda3) with the commands
Code:

set root=(hd0,3)
linux /vmlinuz ro root=/dev/sda3
initrd /initrd.img
boot

If Grub does not like any of the statements, it will tell you immediately.

If Grub executes all the statements but your Linux doesn't boot, then the problem lies with the kernel and not the boot loader.
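
The naming rule above (disks counted from 0, partitions counted from 1) can be sketched as a small shell helper. This is only an illustration: `grub_name` is a made-up function, and it assumes /dev/sdXN style partition names and that the BIOS sees the disks in the same order as Linux does, which is not guaranteed.

```shell
# Hypothetical helper: translate /dev/sdXN into Grub2's (hdD,P) notation.
# Assumptions: sdX naming with a single-letter disk suffix, a partition
# number is present, and BIOS disk order matches sda, sdb, ...
grub_name() {
    dev=${1#/dev/sd}          # e.g. "a3"
    letter=${dev%%[0-9]*}     # disk letter, e.g. "a"
    part=${dev#"$letter"}     # partition number, e.g. "3"
    # Disk index = position of the letter in the alphabet, counted from 0.
    alphabet=abcdefghijklmnopqrstuvwxyz
    prefix=${alphabet%%"$letter"*}
    echo "(hd${#prefix},${part})"
}

grub_name /dev/sda3   # prints (hd0,3)
grub_name /dev/sdb1   # prints (hd1,1)
```

If the result does not match what Grub shows, typing "ls" at the Grub prompt lists the disks and partitions that Grub itself can see.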

dbuehler 07-08-2010 10:59 AM

Thanks Saikee, I will try that and see what happens.

The problem is intermittent so it may be a while before I can get it to repeat. Last time, it took about a week.

Dan

saikee 07-08-2010 11:41 AM

If the problem is intermittent, you can reinstall Grub2 to see if that cures it. From the booted Ubuntu system, issue this command in a terminal (it needs root privileges, e.g. via sudo)
Code:

grub-install /dev/sda

dbuehler 07-08-2010 01:23 PM

I will try that also.

Thanks again.

aus9 07-08-2010 08:56 PM

hi

Commands from a menuentry may not be the only cause of the OP issue.

Please post your grub.cfg in a code box.....I am thinking you have set things like wallpaper, locales or other thingees at the top of your grub.cfg

Finding the culprit among these is much harder but IMHO can still be done.

Also, does your menu need any non-Linux or USB devices to be present?

Hopefully your grub.cfg will provide an answer.

Leaping ahead....try hashing out....most or all of the top entries with a # and see if that makes a difference?

(for me there is always a delay if I have wallpaper set in grub2)
(for me there is always a "detectable" delay if I have lots of scripts at the top of grub.cfg)

EDIT

and if the culprit is the scripts there may be ways of using "set" instead of IF - THEN scripts to reduce that delay.
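
On Ubuntu, grub.cfg is normally regenerated by grub-mkconfig (via update-grub), so hand edits there tend to get overwritten. A possibly more durable way to test without the graphical terminal is to set GRUB_TERMINAL=console in /etc/default/grub. The sketch below is hedged: it assumes the stock file carries a commented "#GRUB_TERMINAL=console" line, and `enable_text_console` is a hypothetical helper that only prints the changed text and does not edit anything.

```shell
# Hypothetical helper: print a copy of the given file with the
# "#GRUB_TERMINAL=console" line uncommented. The file itself is untouched.
# Assumption: the stock /etc/default/grub contains that commented line.
enable_text_console() {
    sed 's/^#[[:space:]]*GRUB_TERMINAL=console/GRUB_TERMINAL=console/' "$1"
}

# Demo on a scratch file instead of the real /etc/default/grub.
sample=$(mktemp)
printf '%s\n' 'GRUB_TIMEOUT=10' '#GRUB_TERMINAL=console' > "$sample"
enable_text_console "$sample"
rm -f "$sample"
```

If the output looks right, the same substitution could be applied to the real file with "sudo sed -i", followed by "sudo update-grub" to regenerate grub.cfg.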

dbuehler 07-09-2010 09:26 AM

Thanks aus9.

My grub.cfg should be the standard ubuntu 10.04 issue as I haven't changed anything I am aware of. But here it is anyway.

Code:

#
# DO NOT EDIT THIS FILE
#
# It is automatically generated by /usr/sbin/grub-mkconfig using templates
# from /etc/grub.d and settings from /etc/default/grub
#

### BEGIN /etc/grub.d/00_header ###
if [ -s $prefix/grubenv ]; then
  load_env
fi
set default="0"
if [ ${prev_saved_entry} ]; then
  set saved_entry=${prev_saved_entry}
  save_env saved_entry
  set prev_saved_entry=
  save_env prev_saved_entry
  set boot_once=true
fi

function savedefault {
  if [ -z ${boot_once} ]; then
    saved_entry=${chosen}
    save_env saved_entry
  fi
}

function recordfail {
  set recordfail=1
  if [ -n ${have_grubenv} ]; then if [ -z ${boot_once} ]; then save_env recordfail; fi; fi
}
insmod ext2
set root='(hd0,1)'
search --no-floppy --fs-uuid --set be8d5289-0cfd-41e1-9da1-90c1e83fd463
if loadfont /usr/share/grub/unicode.pf2 ; then
  set gfxmode=640x480
  insmod gfxterm
  insmod vbe
  if terminal_output gfxterm ; then true ; else
    # For backward compatibility with versions of terminal.mod that don't
    # understand terminal_output
    terminal gfxterm
  fi
fi
insmod ext2
set root='(hd0,1)'
search --no-floppy --fs-uuid --set be8d5289-0cfd-41e1-9da1-90c1e83fd463
set locale_dir=($root)/boot/grub/locale
set lang=en
insmod gettext
if [ ${recordfail} = 1 ]; then
  set timeout=-1
else
  set timeout=10
fi
### END /etc/grub.d/00_header ###

### BEGIN /etc/grub.d/05_debian_theme ###
set menu_color_normal=white/black
set menu_color_highlight=black/light-gray
### END /etc/grub.d/05_debian_theme ###

### BEGIN /etc/grub.d/10_linux ###
menuentry 'Ubuntu, with Linux 2.6.32-23-generic' --class ubuntu --class gnu-linux --class gnu --class os {
        recordfail
        insmod ext2
        set root='(hd0,1)'
        search --no-floppy --fs-uuid --set be8d5289-0cfd-41e1-9da1-90c1e83fd463
        linux        /boot/vmlinuz-2.6.32-23-generic root=UUID=be8d5289-0cfd-41e1-9da1-90c1e83fd463 ro  quiet splash
        initrd        /boot/initrd.img-2.6.32-23-generic
}
menuentry 'Ubuntu, with Linux 2.6.32-23-generic (recovery mode)' --class ubuntu --class gnu-linux --class gnu --class os {
        recordfail
        insmod ext2
        set root='(hd0,1)'
        search --no-floppy --fs-uuid --set be8d5289-0cfd-41e1-9da1-90c1e83fd463
        echo        'Loading Linux 2.6.32-23-generic ...'
        linux        /boot/vmlinuz-2.6.32-23-generic root=UUID=be8d5289-0cfd-41e1-9da1-90c1e83fd463 ro single
        echo        'Loading initial ramdisk ...'
        initrd        /boot/initrd.img-2.6.32-23-generic
}
menuentry 'Ubuntu, with Linux 2.6.32-22-generic' --class ubuntu --class gnu-linux --class gnu --class os {
        recordfail
        insmod ext2
        set root='(hd0,1)'
        search --no-floppy --fs-uuid --set be8d5289-0cfd-41e1-9da1-90c1e83fd463
        linux        /boot/vmlinuz-2.6.32-22-generic root=UUID=be8d5289-0cfd-41e1-9da1-90c1e83fd463 ro  quiet splash
        initrd        /boot/initrd.img-2.6.32-22-generic
}
menuentry 'Ubuntu, with Linux 2.6.32-22-generic (recovery mode)' --class ubuntu --class gnu-linux --class gnu --class os {
        recordfail
        insmod ext2
        set root='(hd0,1)'
        search --no-floppy --fs-uuid --set be8d5289-0cfd-41e1-9da1-90c1e83fd463
        echo        'Loading Linux 2.6.32-22-generic ...'
        linux        /boot/vmlinuz-2.6.32-22-generic root=UUID=be8d5289-0cfd-41e1-9da1-90c1e83fd463 ro single
        echo        'Loading initial ramdisk ...'
        initrd        /boot/initrd.img-2.6.32-22-generic
}
menuentry 'Ubuntu, with Linux 2.6.32-21-generic' --class ubuntu --class gnu-linux --class gnu --class os {
        recordfail
        insmod ext2
        set root='(hd0,1)'
        search --no-floppy --fs-uuid --set be8d5289-0cfd-41e1-9da1-90c1e83fd463
        linux        /boot/vmlinuz-2.6.32-21-generic root=UUID=be8d5289-0cfd-41e1-9da1-90c1e83fd463 ro  quiet splash
        initrd        /boot/initrd.img-2.6.32-21-generic
}
menuentry 'Ubuntu, with Linux 2.6.32-21-generic (recovery mode)' --class ubuntu --class gnu-linux --class gnu --class os {
        recordfail
        insmod ext2
        set root='(hd0,1)'
        search --no-floppy --fs-uuid --set be8d5289-0cfd-41e1-9da1-90c1e83fd463
        echo        'Loading Linux 2.6.32-21-generic ...'
        linux        /boot/vmlinuz-2.6.32-21-generic root=UUID=be8d5289-0cfd-41e1-9da1-90c1e83fd463 ro single
        echo        'Loading initial ramdisk ...'
        initrd        /boot/initrd.img-2.6.32-21-generic
}
### END /etc/grub.d/10_linux ###

### BEGIN /etc/grub.d/20_memtest86+ ###
menuentry "Memory test (memtest86+)" {
        insmod ext2
        set root='(hd0,1)'
        search --no-floppy --fs-uuid --set be8d5289-0cfd-41e1-9da1-90c1e83fd463
        linux16        /boot/memtest86+.bin
}
menuentry "Memory test (memtest86+, serial console 115200)" {
        insmod ext2
        set root='(hd0,1)'
        search --no-floppy --fs-uuid --set be8d5289-0cfd-41e1-9da1-90c1e83fd463
        linux16        /boot/memtest86+.bin console=ttyS0,115200n8
}
### END /etc/grub.d/20_memtest86+ ###

### BEGIN /etc/grub.d/30_os-prober ###
### END /etc/grub.d/30_os-prober ###

### BEGIN /etc/grub.d/40_custom ###
# This file provides an easy way to add custom menu entries.  Simply type the
# menu entries you want to add after this comment.  Be careful not to change
# the 'exec tail' line above.
### END /etc/grub.d/40_custom ###

I have been dropping to the grub command line and manually issuing the commands as saikee suggested but, so far, the problem has not recurred.

Thanks for all the help. I have been "windows free" for about a year and a half now but still feel like a newbie. I really like linux but there is so much to learn.

Dan

dbuehler 07-10-2010 06:15 PM

I have been booting manually as suggested by saikee and just had a recurrence of the problem. As before, it had a flashing cursor but, this time, I was able to see the last page of messages before it crashed. The last 2 messages listed were:

[ 4.034270] 456 pages shared
[ 4.034333] 214868 pages non shared
_

What would be the proper way to find out what caused this problem? Would I need to download the source code to see what it was doing when it hung, or is there a better way?

thanks.
Dan

syg00 07-10-2010 06:54 PM

I don't have a Lucid system (here) - does it still offer a recovery mode option on the boot menu?
Use that - it'll get rid of the lame background, and let you see all the boot messages (those are kernel messages).
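
Another way to see the kernel messages for a normal (non-recovery) entry is to drop the "quiet" and "splash" options from the kernel line, for instance by pressing "e" on the menu entry, editing the line that begins with "linux", and booting with Ctrl-x. As a rough illustration of what gets removed, here is a small shell sketch; `strip_quiet_splash` is a hypothetical helper and the sample argument is made up.

```shell
# Hypothetical helper: print a kernel command line with the standalone
# words "quiet" and "splash" removed and the spacing normalised.
strip_quiet_splash() {
    out=
    for word in $1; do                      # split on whitespace
        case $word in
            quiet|splash) ;;                # drop these two options
            *) out="${out:+$out }$word" ;;  # keep everything else
        esac
    done
    printf '%s\n' "$out"
}

strip_quiet_splash 'root=/dev/sda1 ro  quiet splash'   # prints: root=/dev/sda1 ro
```

The edit made at the Grub menu only lasts for that one boot, so it is a low-risk way to try it.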

dbuehler 07-10-2010 09:50 PM

Yes, lucid does have a recovery mode. It took several tries but when it died I had a very similar message to before.

[ 3.834270] 444 pages shared
[ 3.839790] 214718 pages non shared
_

Dan

syg00 07-10-2010 10:21 PM

Yeah sorry, my bad - that is effectively the same as saikee's post. Don't know how I missed that.
Intermittent problems are always hardware - you know grub(2) and the kernel are (normally) o.k.

You'd probably need to get earlier messages - going to be difficult to diagnose.

dbuehler 07-11-2010 09:15 AM

Thanks again syg00.

Yes, I am aware that the problem is most likely hardware but I was just hoping to get an idea what is failing. Since the last message displayed appeared to be related to allocating memory, I let memtest run over night. No problems were reported. Guess I will keep plugging away and hope to eventually find it.

I appreciate all the help from everyone!

Dan
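
To get a feel for the numbers in those messages, the page counts can be converted to megabytes. A minimal sketch, assuming the figures are counts of 4 KiB pages (the usual page size on x86; this is an assumption, not something the messages state), with `pages_to_mib` a made-up helper name:

```shell
# Hypothetical helper: convert a page count to whole MiB.
# Assumption: 4 KiB pages.
pages_to_mib() {
    echo $(( $1 * 4 / 1024 ))
}

pages_to_mib 214868   # prints 839
pages_to_mib 214718   # prints 838
```

On that assumption, roughly 840 MiB of pages were counted as "non shared" a few seconds into boot, which may be worth comparing with the amount of RAM installed.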

aus9 07-11-2010 07:57 PM

hi

just to state the obvious, those messages are kernel messages. Nothing to do with grub2. You may like to edit your subject line when you get a chance?

2) Do you get the same glitch booting a different kernel?

3) Can you hold Ctrl and Alt and press F1, then hold Shift and press Page Up to scroll back through your messages?

4) I gather it's not in your /var/log/dmesg?

EDIT try getting more info?
http://www.debian.org/releases/stabl...h05s02.html.en BOOT_DEBUG

signing off as not grub2

dbuehler 07-11-2010 11:02 PM

Troubleshooting boot problem
 
Right now, ubuntu is the only OS I have installed. I might try installing another when I get a chance but I will be out of town so it may be a week or so.

I did look at several of the files in /var/log before I posted but wasn't really sure how to interpret them. In dmesg, I did a search on the last messages I saw before it hung and found this:
Code:

...
[    3.658279] 620 pages shared
[    3.658342] 212995 pages non-shared
[    3.658407] Out of memory: kill process 395 (plymouthd) score 38 or a child
[    3.658478] Killed process 395 (plymouthd)
[    3.658818] ureadahead invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0
[    3.658820] ureadahead cpuset=/ mems_allowed=0
[    3.658822] Pid: 394, comm: ureadahead Not tainted 2.6.32-23-generic #37-Ubuntu
[    3.658823] Call Trace:
[    3.658827]  [<c01cd1f4>] oom_kill_process+0xa4/0x2b0
[    3.658829]  [<c01cd869>] ? select_bad_process+0xa9/0xe0
...

I think this may be the area where it hangs.

Dan
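
For picking the out-of-memory lines out of a long log, a simple grep filter may help. This is only a sketch: `oom_summary` is a made-up name, and the patterns are taken from the wording in the excerpt above, so other kernel versions may word these messages differently.

```shell
# Hypothetical helper: show only OOM-killer related lines from the given
# log files, or from standard input when no file is named.
oom_summary() {
    grep -E 'invoked oom-killer|Out of memory|Killed process' "$@"
}

# Demo on a few lines copied from the excerpt; prints the last three.
oom_summary <<'EOF'
[    3.658342] 212995 pages non-shared
[    3.658407] Out of memory: kill process 395 (plymouthd) score 38 or a child
[    3.658478] Killed process 395 (plymouthd)
[    3.658818] ureadahead invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0
EOF
```

Against the real log this would be "oom_summary /var/log/dmesg".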

syg00 07-12-2010 12:07 AM

Yep, that looks like a pretty good candidate. Plymouth .... what a surprise.
I may even retract my hardware accusation - crappy software forced on the user community qualifies as well ... :(
From a good boot, have a look at /var/log - you'll see several dmesg.* files. Probably one of the .gz ones will have your failure messages. From a terminal, try this to save a copy.
Code:

for i in /var/log/dme*.gz ; do if zgrep oom_kill "$i" >/dev/null 2>&1 ; then cp "$i" ~/"$(basename "$i").save" ; fi ; done
That'll get you some saved files in your home - use zless to look through one (q to quit out of zless).


All times are GMT -5.