LinuxQuestions.org
LinuxQuestions.org > Forums > Linux Forums > Linux - Distributions > Slackware
10-06-2017, 10:06 AM   #1
EdGr
Member
Registered: Dec 2010
Location: California, USA
Distribution: I run my own OS
Posts: 998
Reputation: 470
random boot failures with 32-bit custom kernel


I am seeing random boot failures on an old Dell Inspiron E1505 laptop with 32-bit Slackware-current. The laptop occasionally crashes after printing "Decompressing Linux" and before printing "Parsing ELF". The crash will repeat until I power-cycle the laptop. I am running a custom 4.9.52 kernel, which is essentially the same as Slackware's generic kernel except that it doesn't require an initrd.

I tried adding the "nokaslr" kernel parameter. This didn't cure the crash.

This problem seems specific to 32-bit Linux and the E1505 laptop. 64-bit Slackware-current runs fine on all my 64-bit machines.
Ed
 
10-06-2017, 10:25 AM   #2
business_kid
LQ Guru
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 16,297
Reputation: 2322
Presuming you've no initrd, does it boot fine with the slackware-huge kernel?
If not, it looks like a memory problem. Get memtest86 and run it.
If so, we're probably looking at a kernel problem. I'm running kernel version 4.9.45-dec5, because 4.9.45-dec1 to -dec4 were bummers.
 
10-06-2017, 11:05 AM   #3
EdGr
Member
Registered: Dec 2010
Location: California, USA
Distribution: I run my own OS
Posts: 998
Original Poster
Reputation: 470
Memtest86 reported no errors.

I didn't notice any problems with the Slackware huge kernel, but I booted it only during installation. The boot failure on the custom kernel occurs maybe once every five boots. It occurs so early that there aren't any clues left for debugging.
Ed
 
10-06-2017, 01:09 PM   #4
jostber
Member
Registered: Jul 2001
Location: Skien, Norway
Distribution: Slackware Current 64-bit
Posts: 543
Reputation: 178
Have you tried disabling power management at boot?
 
10-06-2017, 01:44 PM   #5
EdGr
Member
Registered: Dec 2010
Location: California, USA
Distribution: I run my own OS
Posts: 998
Original Poster
Reputation: 470
I reset all the BIOS settings to defaults. It still crashes.
Ed
 
10-07-2017, 01:46 PM   #6
EdGr
Member
Registered: Dec 2010
Location: California, USA
Distribution: I run my own OS
Posts: 998
Original Poster
Reputation: 470
I note that success or failure of booting is determined at power-on. This is looking like an initialization problem. The x86 boot code in /usr/src/linux/arch/x86/boot/compressed has changed recently. I bet there hasn't been much testing on 32-bit x86 hardware, which by now is over a decade old.
Ed
 
10-07-2017, 07:46 PM   #7
allend
LQ 5k Club
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware64-15.0
Posts: 6,371
Reputation: 2750
What is the GPU in your E1505? I am asking because that model may use an nVidia GeForce Go 7300. I have had some random boot failures after switching server layouts in X with an nVidia GeForce 7300 LE using the nouveau driver. I did not see this when I was using the proprietary nVidia driver. A second reboot seems to be the fix for me.
 
10-07-2017, 08:00 PM   #8
EdGr
Member
Registered: Dec 2010
Location: California, USA
Distribution: I run my own OS
Posts: 998
Original Poster
Reputation: 470
The GPU is an ATI Radeon X1300. I use the open-source radeon driver. Coincidentally, the Radeon X1300 also has an initialization problem: every few boots, the display shows pixel artifacts after I start X. This GPU has always had that problem. I don't believe it is related to the boot failure, which occurs in the kernel's early boot code, before X or any GPU driver is loaded.
Ed
 
10-09-2017, 01:11 PM   #9
EdGr
Member
Registered: Dec 2010
Location: California, USA
Distribution: I run my own OS
Posts: 998
Original Poster
Reputation: 470
I tried two more custom kernels, neither of which eliminated the crash.

* The first changed CONFIG_KERNEL_LZMA to CONFIG_KERNEL_GZIP.
* The second changed CONFIG_CC_STACKPROTECTOR_REGULAR to CONFIG_CC_STACKPROTECTOR_NONE.

Because the 32-bit kernel is built without CONFIG_RELOCATABLE, a lot of potential problems (KASLR among them) are ruled out.

Sometimes the failure will occur slightly after "Decompressing Linux". The last message printed is "Booting the kernel". After a failure, powering the laptop off and then on (within two seconds) will result in a successful boot.
Ed

Last edited by EdGr; 10-09-2017 at 01:12 PM.
 
10-09-2017, 02:07 PM   #10
business_kid
LQ Guru
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 16,297
Reputation: 2322
I asked about the slackware-huge kernel back in post #2 and I'd still like to know the results from that.

If I were you, I'd open two consoles: run 'make menuconfig' on your own config in one, and view the slackware-huge config in a pager in the other. In the sections from "General setup" through "Processor type and features", inclusive, I would make them as alike as possible. Obviously, you don't need support for hardware you don't have, but make the rest alike and try it. Your problem is most likely in there. If it boots, reapply your 'improvements' a few at a time until it pukes on you. The real weight in the huge kernel is the compiled-in drivers for everything conceivable, not the sections I outlined.
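A minimal sketch of that comparison in Python, offered as an illustration rather than anything from the thread. It assumes the usual .config line formats, `CONFIG_FOO=value` and `# CONFIG_FOO is not set`, and it compares whole files rather than only the General setup through Processor type and features sections. The script name `cmpconfig.py` and the file arguments are hypothetical. (The kernel source tree also ships `scripts/diffconfig`, which does a similar job.)

```python
import re
import sys

# Assumed .config line formats: "CONFIG_FOO=value" and "# CONFIG_FOO is not set".
SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map each CONFIG_ symbol to its value; explicitly unset symbols map to 'n'."""
    symbols = {}
    for line in text.splitlines():
        line = line.strip()
        m = SET_RE.match(line)
        if m:
            symbols[m.group(1)] = m.group(2)
            continue
        m = UNSET_RE.match(line)
        if m:
            symbols[m.group(1)] = "n"
    return symbols


def config_differences(old_text, new_text):
    """Return sorted (symbol, old_value, new_value) tuples for symbols that differ.

    A symbol absent from one file gets the value None on that side.
    """
    old, new = parse_config(old_text), parse_config(new_text)
    return [
        (sym, old.get(sym), new.get(sym))
        for sym in sorted(old.keys() | new.keys())
        if old.get(sym) != new.get(sym)
    ]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python3 cmpconfig.py config-huge config-custom
    with open(sys.argv[1]) as f_old, open(sys.argv[2]) as f_new:
        for sym, a, b in config_differences(f_old.read(), f_new.read()):
            print(f"{sym}: {a} -> {b}")
```

Working symbol by symbol rather than line by line means the report is unaffected by options appearing in a different order in the two files.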
 
10-09-2017, 03:18 PM   #11
EdGr
Member
Registered: Dec 2010
Location: California, USA
Distribution: I run my own OS
Posts: 998
Original Poster
Reputation: 470
Okay, I tried booting the huge-smp kernel. It booted successfully a dozen times.

My custom kernel's configuration is generated from the generic kernel's configuration by a script. The only changes are:

Code:
5649,5650c5649,5650
< CONFIG_USB_XHCI_HCD=m
< CONFIG_USB_XHCI_PCI=m
---
> CONFIG_USB_XHCI_HCD=y
> CONFIG_USB_XHCI_PCI=y
5652c5652
< CONFIG_USB_EHCI_HCD=m
---
> CONFIG_USB_EHCI_HCD=y
5655c5655
< CONFIG_USB_EHCI_PCI=m
---
> CONFIG_USB_EHCI_PCI=y
5661c5661
< CONFIG_USB_OHCI_HCD=m
---
> CONFIG_USB_OHCI_HCD=y
5663d5662
< CONFIG_USB_OHCI_HCD_SSB=y
6839c6838
< CONFIG_EXT4_FS=m
---
> CONFIG_EXT4_FS=y
6845c6844
< CONFIG_JBD2=m
---
> CONFIG_JBD2=y
6847c6846
< CONFIG_FS_MBCACHE=m
---
> CONFIG_FS_MBCACHE=y
6929c6928
< CONFIG_ISO9660_FS=m
---
> CONFIG_ISO9660_FS=y
7099c7098
< CONFIG_NLS_CODEPAGE_437=m
---
> CONFIG_NLS_CODEPAGE_437=y
7123c7122
< CONFIG_NLS_ISO8859_1=m
---
> CONFIG_NLS_ISO8859_1=y
7147c7146
< CONFIG_NLS_UTF8=m
---
> CONFIG_NLS_UTF8=y
The custom kernels have worked on all versions of 64-bit Slackware, and on all but the most recent versions of 32-bit Slackware. I believe I am encountering a bug, possibly specific to the Dell Inspiron E1505.
Ed
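For illustration, here is a hypothetical sketch of the kind of script described in this post. The actual script isn't shown, so the function name and approach are assumptions. It takes the generic config and switches the symbols listed in the diff from =m to =y. The disappearance of CONFIG_USB_OHCI_HCD_SSB in the diff suggests that the real process also re-runs the kernel's dependency resolution afterwards (for example with `make olddefconfig`); this sketch leaves that step to the user.

```python
import sys

# Symbols this hypothetical script switches from module (=m) to built-in (=y),
# taken from the diff in this post.
BUILTIN = {
    "CONFIG_USB_XHCI_HCD", "CONFIG_USB_XHCI_PCI",
    "CONFIG_USB_EHCI_HCD", "CONFIG_USB_EHCI_PCI",
    "CONFIG_USB_OHCI_HCD",
    "CONFIG_EXT4_FS", "CONFIG_JBD2", "CONFIG_FS_MBCACHE",
    "CONFIG_ISO9660_FS",
    "CONFIG_NLS_CODEPAGE_437", "CONFIG_NLS_ISO8859_1", "CONFIG_NLS_UTF8",
}


def make_custom_config(generic_text, builtin=BUILTIN):
    """Return a copy of generic_text with each listed '=m' symbol set to '=y'.

    All other lines, including '# CONFIG_FOO is not set' comments, are kept as-is.
    """
    out = []
    for line in generic_text.splitlines():
        name, sep, value = line.partition("=")
        if sep and value == "m" and name in builtin:
            line = f"{name}=y"
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python3 mkcustom.py config-generic .config
    # then run "make olddefconfig" in the kernel tree to resolve dependencies.
    with open(sys.argv[1]) as src:
        text = make_custom_config(src.read())
    with open(sys.argv[2], "w") as dst:
        dst.write(text)
```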
 
10-09-2017, 09:35 PM   #12
EdGr
Member
Registered: Dec 2010
Location: California, USA
Distribution: I run my own OS
Posts: 998
Original Poster
Reputation: 470
A different memory test is triggering a kernel panic. The hardware may be dying.

Code:
[   87.763326] memory4cl: Corrupted page table at address 9d9ab020
[   87.764013] *pdpt = 0000000033047001 *pde = 0000000033385067 
[   87.764013] *pte = 00208a8000000000 

[   87.764013] Bad pagetable: 000e [#1] SMP
[   87.764013] Modules linked in: xt_tcpudp iptable_filter ip_tables x_tables ipv6 fuse joydev i2c_dev gpio_ich dell_smm_hwmon dell_laptop dell_smbios dcdbas snd_hda_codec_hdmi radeon b44 psmouse iwl3945 snd_hda_codec_idt snd_hda_codec_generic snd_hda_intel iwlegacy coretemp evdev snd_hda_codec hwmon mac80211 ttm drm_kms_helper sdhci_pci ssb drm sdhci kvm i2c_algo_bit fb_sys_fops syscopyarea sysfillrect irqbypass sysimgblt serio_raw mmc_core snd_hda_core i2c_i801 i2c_smbus r852 i2c_core sm_common nand nand_ecc nand_bch nand_ids bch cfg80211 lpc_ich snd_hwdep mtd snd_pcm rfkill libphy mii pcmcia pcmcia_core firewire_ohci firewire_core r592 memstick video snd_timer shpchp thermal ac button snd soundcore intel_agp intel_gtt uhci_hcd agpgart acpi_cpufreq tpm_tis tpm_tis_core tpm loop
[   87.764013] CPU: 0 PID: 961 Comm: memory4cl Not tainted 4.9.52-smp #1
[   87.764013] Hardware name: Dell Inc. MM061                           /0XD720, BIOS A03 03/09/2006
[   87.764013] task: f5422940 task.stack: f4e66000
[   87.764013] EIP: 0073:[<0804b1bd>] EFLAGS: 00210212 CPU: 0
[   87.764013] EIP is at 0x804b1bd
[   87.764013] EAX: bf8721fb EBX: 0156b004 ECX: 0156b01a EDX: a10ccdf5
[   87.764013] ESI: 1e7a5406 EDI: 983ff010 EBP: 0000000b ESP: b57ff1b0
[   87.764013]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
[   87.764013] EIP: [<0804b1bd>] 
[   87.764013] 0x804b1bd
[   87.764013]  SS:ESP 007b:b57ff1b0
[   87.764013] ---[ end trace d1457291eb57aa43 ]---
Ed
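The number after "Bad pagetable" is the x86 page-fault error code, printed in hex. On x86, Linux takes this "Corrupted page table" path when the RSVD bit of that code is set, meaning the CPU found a reserved bit set in a paging-structure entry. The small Python sketch below decodes the low five bits as documented in the Intel SDM for the page-fault exception. Later CPUs define further bits, which it ignores.

```python
# Low five bits of the x86 page-fault error code (Intel SDM Vol. 3A, page-fault exception).
PF_FLAGS = [
    (0x01, "P", "page-level protection violation (clear: non-present page)"),
    (0x02, "W/R", "write access"),
    (0x04, "U/S", "access from user mode"),
    (0x08, "RSVD", "reserved bit set in a paging-structure entry"),
    (0x10, "I/D", "instruction fetch"),
]


def decode_pf_error(code):
    """Return (flag, meaning) pairs for each of the low five bits set in code."""
    return [(name, meaning) for mask, name, meaning in PF_FLAGS if code & mask]


if __name__ == "__main__":
    # "Bad pagetable: 000e" from the log above; the kernel prints the code in hex.
    for name, meaning in decode_pf_error(int("000e", 16)):
        print(f"{name}: {meaning}")
```

For 000e, this reports a user-mode write that hit a reserved-bit violation. That is consistent with corrupted page-table memory, but by itself it doesn't identify which component is at fault.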
 
10-10-2017, 03:36 AM   #13
business_kid
LQ Guru
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 16,297
Reputation: 2322
The kernel does a lot of memory testing at the stage you're dying at, so you are probably right.

Your diff is confusing. I'm guessing you're compiling in drivers that the generic kernel has as modules. The best way to show a diff is to use diff -u and show the command line. That way, you get lines from file1 marked with '-' and lines from file2 marked with '+'. If your root filesystem is ext4, you will need EXT4_FS compiled in to get going, since there is no initrd to load it from. But those changes shouldn't otherwise affect matters.
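The suggested invocation would be something like `diff -u config-generic config-custom` (placeholder file names). As a sketch of what that format looks like, Python's standard `difflib.unified_diff` produces the same basic layout: `---`/`+++` file headers, `@@` hunk markers, `-` for lines only in the first file, and `+` for lines only in the second. Unlike GNU diff, it omits timestamps from the headers unless you pass them.

```python
import difflib
import sys


def unified_config_diff(old_text, new_text, old_name, new_name):
    """Return a unified diff of two config files, similar to `diff -u old new`."""
    lines = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
    )
    return "".join(lines)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python3 udiff.py config-generic config-custom
    with open(sys.argv[1]) as a, open(sys.argv[2]) as b:
        sys.stdout.write(unified_config_diff(a.read(), b.read(), sys.argv[1], sys.argv[2]))
```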

The drivers you need built in are the ones required to boot: filesystem, motherboard chipset, video, and other disk drivers (e.g. SATA, or RAID if applicable). The kernel can then load the USB modules itself.
 
10-10-2017, 12:31 PM   #14
EdGr
Member
Registered: Dec 2010
Location: California, USA
Distribution: I run my own OS
Posts: 998
Original Poster
Reputation: 470
Yes, the important configuration change is to build in the ext4 filesystem.

Depopulating either of the two DIMMs did not eliminate the boot crash. However, I didn't see the kernel panic with only one DIMM installed.

At this point, I no longer trust the E1505 hardware. This machine is at the last stage of my PC waterfall, being used only as a music and video player. The next stage would be the recycler.
Ed
 
10-11-2017, 03:32 AM   #15
business_kid
LQ Guru
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 16,297
Reputation: 2322
That points firmly at hardware.

1. The extra DIMM is the straw that broke the camel's back: it tips something in your box from "just working" to "not working" or "borderline."
2. That something could conceivably be the power supply (dying or overloaded).

If 1 is the case, you might well get around it by slowing the bus speed down. Try this if you want to hang onto this piece of kit until the bitter and inevitable end. I gather you don't, but you can at least rule out:
a. The DIMMs (probably).
b. The kernel.
That means we can stop posting here, and you can mark this solved.
 
  


All times are GMT -5.