LinuxQuestions.org


sduquette 08-01-2011 08:48 AM

2.6.39.3 occasional boot hang
 
Hi,
I am trying to get kernel 2.6.39.3 to run on a PC104 board, an Emcore i613 with a Celeron processor. Roughly 1 out of every 7 boots hangs while the kernel initializes the serial port. I've traced the hang to the 20 millisecond msleep() call in the probe_irq_on() function in autoprobe.c.
I'm not sure where to go from here, as I've never been down in the guts of the kernel before.

I get the same results on 3 different boards of the same model, so I do not believe a single faulty board is the cause.

It does seem like some timing is right on the edge, however: the hang is more likely to occur when the board is cold, having been powered down for a while. If I keep trying, I may get 25 to 30 good boots in a row with no failure.
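
As a rough sanity check on that timing hunch, here is a small Python sketch (an illustration only, not a measurement). It assumes every boot hangs independently with a fixed probability of 1 in 7, which is a simplification, and estimates how likely a long run of clean boots would be by pure chance. The function name is made up for this example.

```python
def prob_clean_streak(hang_rate: float, boots: int) -> float:
    """Probability that `boots` consecutive boots all succeed.

    Assumption: boots are independent and each one hangs with the same
    fixed probability `hang_rate`.
    """
    if not 0.0 <= hang_rate <= 1.0:
        raise ValueError("hang_rate must be between 0 and 1")
    if boots < 0:
        raise ValueError("boots must be non-negative")
    return (1.0 - hang_rate) ** boots


if __name__ == "__main__":
    rate = 1.0 / 7.0  # "roughly 1 out of every 7 boots" from the post
    for n in (7, 25, 30):
        p = prob_clean_streak(rate, n)
        print(f"{n:2d} clean boots in a row: {p:.2%}")
```

Under that assumption a run of 25 clean boots would show up only about 2% of the time, so long clean streaks suggest the hang rate is not constant, which fits the cold-board observation.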

FYI, this kernel runs fine on a newer, Geode-based PC104 board that we currently use, but we must be able to support our existing products already out in the field.

Thanks for any hints or advice.

Steve

onebuck 08-01-2011 09:11 AM

Hi,

Welcome to LQ!

Moved: This thread is more suitable in <Linux - Embedded> and has been moved accordingly to help your thread/question get the exposure it deserves.

Since you can reproduce this on other boards, the problem is inherent rather than a fault in one unit. Did you have any problems with the earlier kernel on the PC104? Were they repeatable? What is the reasoning behind moving to a new kernel?

sduquette 08-01-2011 09:41 AM

Thanks for your quick response, onebuck, and for moving my post to the more appropriate forum!

We have been running a 2.6.11 kernel for about 5 years now and (to my shame) only just found out that the kernel was the source of a long-standing 'intermittent checksum error' that I had always blamed on the hardware we communicate with via RS232. The problem turned out to be that the kernel would occasionally take tens of milliseconds to respond to an interrupt from the UARTs, which caused the UART to overrun. The undetected loss of data then surfaced as a 'checksum error' further up the chain.
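
To put numbers on why a delay of that size is fatal, here is a rough Python sketch of how long a UART can wait after raising its receive interrupt before it overruns. The figures are assumptions for illustration, not values from our system: a 16550-style UART with a 16-byte receive FIFO, 8N1 framing (10 bits per character), and example baud rates and trigger levels. The function name is made up for this example.

```python
def overrun_slack_ms(baud: int, bits_per_frame: int = 10,
                     fifo_depth: int = 16, trigger_level: int = 14) -> float:
    """Approximate time (ms) between the receive interrupt and an overrun.

    Assumptions: data arrives back to back, the interrupt fires when the
    FIFO holds `trigger_level` characters, and the overrun happens once
    the remaining FIFO slots are full and one more character has been
    completely shifted in.
    """
    if baud <= 0 or bits_per_frame <= 0:
        raise ValueError("baud and bits_per_frame must be positive")
    if not 1 <= trigger_level <= fifo_depth:
        raise ValueError("trigger_level must be within 1..fifo_depth")
    frame_time_ms = 1000.0 * bits_per_frame / baud
    frames_until_overrun = fifo_depth - trigger_level + 1
    return frames_until_overrun * frame_time_ms


if __name__ == "__main__":
    # Assumed example settings: (baud rate, FIFO trigger level)
    for baud, trigger in ((9600, 1), (9600, 14), (115200, 14)):
        slack = overrun_slack_ms(baud, trigger_level=trigger)
        print(f"{baud:6d} baud, trigger {trigger:2d}: {slack:.2f} ms of slack")
```

With these assumed settings the slack is well under 20 ms even at 9600 baud, so an interrupt response time in the tens of milliseconds is enough to lose characters.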

When I tried a newer kernel, the random slow response to interrupts was gone. But as I prepared the new kernel for deployment to our systems in the field, I noticed this random boot hang on the older (i613) boards.

The 2.6.11 kernel never failed to boot.

Thanks Again,
Steve

