Embedded kernel+app running on 256MB but random crashes on 128MB
Linux - Embedded & Single-board computer (LinuxQuestions.org)
Distribution: PCLinuxOS; CentOS; Distributions developed with Wind River Linux 4.1
Posts: 19
Rep:
Hi,
I am not sure if anybody can help, but here is my problem:
I developed a Linux platform using Wind River Linux for an embedded device with a VIA Nehemiah processor. Only one application runs on it, but it needs to run 24/7.
My first system has 256MB of RAM and the application runs with no problem, but on my second system, which has only 128MB, I am getting totally random crashes: kernel oopses or application crashes. These crashes happen after anywhere between 2 minutes and 2 days of running.
Would anybody have an idea where to start looking to solve this issue? Any advice would be great, as I have been trying to solve this problem for weeks now.
More Info:
Kernel 2.6.34.8
The whole OS is loaded into RAM and everything runs from RAM. There is no swap partition.
Board: HS2604
If there is no swap and the kernel needs memory for some critical operation, its out-of-memory (OOM) killer may kill a process without asking. You should be able to see that in the kernel log (dmesg). This may explain the application crashes.
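One way to tell an OOM kill apart from a genuine crash in the application is to launch the app from a small supervisor and check how it terminated: the OOM killer ends its victim with SIGKILL, whereas a memory-corruption bug in the app typically shows up as SIGSEGV or SIGABRT. The sketch below is a minimal, hypothetical supervisor (`describe_exit` and `run_and_report` are illustrative names, not part of any library). SIGKILL can also come from other sources, so an OOM kill should still be confirmed in dmesg.

```cpp
#include <csignal>
#include <cstdio>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Turns a waitpid() status into a short description.
// A SIGKILL is consistent with the OOM killer but does not prove it;
// check dmesg for the kernel's out-of-memory report.
const char* describe_exit(int status) {
    if (WIFEXITED(status)) return "exited normally";
    if (WIFSIGNALED(status)) {
        switch (WTERMSIG(status)) {
        case SIGKILL: return "killed by SIGKILL (possible OOM kill, check dmesg)";
        case SIGSEGV: return "crashed with SIGSEGV";
        case SIGABRT: return "aborted with SIGABRT";
        default:      return "terminated by another signal";
        }
    }
    return "stopped or unknown state";
}

// Hypothetical supervisor: runs child_main() in a forked process, waits for it,
// prints how it ended and returns the raw wait status (-1 on error).
int run_and_report(void (*child_main)()) {
    pid_t pid = fork();
    if (pid < 0) {
        std::perror("fork");  // fork() sets errno on failure
        return -1;
    }
    if (pid == 0) {
        child_main();
        _exit(0);  // skip atexit handlers and stdio flushing in the child
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        std::perror("waitpid");
        return -1;
    }
    std::printf("child %d %s\n", static_cast<int>(pid), describe_exit(status));
    return status;
}
```

In the real system, `child_main` would start the application; running it in a loop under such a wrapper records, for each failure, whether the process died from its own fault signal or was killed from outside.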
Original Poster
Hi, thank you for your replies.
I have a screen connected to the device.
top shows that on the 128MB system almost 100% of the memory is in use; however, this memory is not really used, only allocated (number of threads * 8MB of stack). I reduced the thread stack size to 2MB, but the behaviour is the same, even though the app now uses only 30% of the memory.
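For reference, a per-thread stack size like the 2MB mentioned here is normally set through a pthread attribute object before the thread is created. The sketch below assumes the app creates its threads with `pthread_create`; `start_thread_with_stack` and `report_stack_size` are hypothetical helper names, and `pthread_getattr_np` is a glibc (GNU) extension used only to read back the size the thread actually received. On glibc, threads created without such an attribute get a default stack size derived from the stack rlimit (`ulimit -s`), commonly 8MB.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <pthread.h>

// Stack size taken from the post (2MB); adjust to the app's real needs.
const std::size_t kThreadStackBytes = 2 * 1024 * 1024;

// Hypothetical helper: creates a thread with an explicit stack size.
// Returns 0 on success or the pthread error number on failure.
int start_thread_with_stack(pthread_t* tid, void* (*fn)(void*), void* arg,
                            std::size_t stack_bytes) {
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);  // attribute objects must be initialised first
    if (rc != 0) return rc;
    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0) rc = pthread_create(tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);  // safe once pthread_create has returned
    if (rc != 0) std::fprintf(stderr, "thread start failed: %s\n", std::strerror(rc));
    return rc;
}

// Example thread body: stores the stack size this thread was actually given.
// pthread_getattr_np() is a GNU extension, not POSIX.
void* report_stack_size(void* out) {
    pthread_attr_t attr;
    std::size_t size = 0;
    if (pthread_getattr_np(pthread_self(), &attr) == 0) {
        pthread_attr_getstacksize(&attr, &size);
        pthread_attr_destroy(&attr);
    }
    *static_cast<std::size_t*>(out) = size;
    return nullptr;
}
```

Thread stacks are mapped with mmap and only the pages a thread actually touches are backed by physical RAM, so a smaller stack size mostly shrinks reserved address space rather than memory in active use.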
Below are some examples of the oops messages I am getting; they never point to the same part of the code. I have many more of them.
I don't think it is a memory allocation problem, because when the app crashes I don't get the dmesg output I would expect from an out-of-memory condition. I checked which messages appear by writing a test app with a memory leak and letting the system run out of memory.
I would ask Wind River, but they are very protective of the help they give if you don't pay for dedicated support. Still, I could ask whether they are aware of any issues with systems running on 128MB.
Do you always get it there? If so, does it help if you remove that kernel module? I know it may be needed for your purpose but this is just to identify the problem.
Another hypothesis: a hardware problem. Could your RAM be faulty? If so, it could cause all kinds of random crashes, in both the app and the kernel. The VIA Nehemiah is x86-compatible, right? Then www.memtest86.com is worth a try.
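memtest86 boots on its own and tests nearly all of the RAM, so it is the better tool. As a rough first check that can run on the live system, a userspace program can write known patterns into a large buffer and read them back. The sketch below is only an illustration (`check_pattern` and `run_ram_check` are hypothetical names): it can only exercise the memory the kernel gives this process, so a clean result does not rule out faulty RAM.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Writes pattern ^ index into every word, reads it back and counts mismatches.
// XOR-ing with the index makes neighbouring words hold different values.
// volatile keeps the compiler from optimising the read-back away.
std::size_t check_pattern(volatile std::uint32_t* buf, std::size_t words,
                          std::uint32_t pattern) {
    for (std::size_t i = 0; i < words; ++i)
        buf[i] = pattern ^ static_cast<std::uint32_t>(i);
    std::size_t errors = 0;
    for (std::size_t i = 0; i < words; ++i) {
        std::uint32_t expected = pattern ^ static_cast<std::uint32_t>(i);
        std::uint32_t got = buf[i];
        if (got != expected) {
            if (errors < 10)  // print only the first few mismatches
                std::fprintf(stderr, "mismatch at word %zu: got %08x, expected %08x\n",
                             i, static_cast<unsigned>(got), static_cast<unsigned>(expected));
            ++errors;
        }
    }
    return errors;
}

// Runs all-zeros, all-ones, alternating-bit and walking-one patterns over
// `megabytes` of heap memory; returns the total number of mismatches.
std::size_t run_ram_check(std::size_t megabytes) {
    const std::size_t words = megabytes * 1024 * 1024 / sizeof(std::uint32_t);
    std::vector<std::uint32_t> mem(words);
    volatile std::uint32_t* buf = mem.data();
    std::vector<std::uint32_t> patterns = {0x00000000u, 0xFFFFFFFFu, 0xAAAAAAAAu, 0x55555555u};
    for (int bit = 0; bit < 32; ++bit) patterns.push_back(1u << bit);
    std::size_t total = 0;
    for (std::uint32_t p : patterns) total += check_pattern(buf, words, p);
    return total;
}
```

For example, `run_ram_check(64)` would test 64MB; on the 128MB board the size has to leave room for the kernel and the rest of the system, or the checker itself may be OOM-killed.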
Original Poster
Hi irey,
Thanks for your reply again.
I think the "Modules linked in" list shows all the loaded modules, not specifically the ones causing the oops. I unloaded the comsync module, as it is not part of the original kernel, but the crash still occurs (oops message at the end of this post).
I tried the same kernel and same app on two (or three?) different machines and the crash is still happening.
I ran the app under Valgrind, which, among other errors, reported the following:
Conditional jump or move depends on uninitialised value(s)
at ....: pthread_mutex_init
The mutex code is the following:
Code:
void CriticalSection::initialise()
{
    pthread_mutexattr_t attr;
    // Note: attr is never initialised with pthread_mutexattr_init() before use
    if (pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE) != 0)
    {
        perror("CriticalSection::initialise: pthread_mutexattr_settype: ");
        exit(1);
    }
    if (pthread_mutex_init(&m_mutex, &attr) != 0)
    {
        perror("CriticalSection::initialise: pthread_mutex_init: ");
        exit(1);
    }
    // destroy the mutex attribute after use (not the mutex itself)
    pthread_mutexattr_destroy(&attr);
}
And I changed it to:
Code:
void CriticalSection::initialise()
{
    pthread_mutexattr_t attr;
    if (pthread_mutexattr_init(&attr) != 0)
    {
        perror("CriticalSection::initialise: pthread_mutexattr_init: ");
        exit(1);
    }
    if (pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE) != 0)
    {
        perror("CriticalSection::initialise: pthread_mutexattr_settype: ");
        exit(1);
    }
    if (pthread_mutex_init(&m_mutex, &attr) != 0)
    {
        perror("CriticalSection::initialise: pthread_mutex_init: ");
        exit(1);
    }
    // destroy the mutex attribute after use (not the mutex itself)
    pthread_mutexattr_destroy(&attr);
}
I don't know whether this could be causing so many issues, but it does stop Valgrind from complaining.
I think Valgrind was right. Even if the man page for pthread_mutexattr_init() doesn't say it's mandatory, I'd assume it is, since I would never use an uninitialized object.
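The fix can also be wrapped so that the attribute initialisation cannot be forgotten and the mutex is destroyed when it is no longer needed. The sketch below uses hypothetical class names (`RecursiveMutex`, `Lock`) as stand-ins for the poster's `CriticalSection`. One further detail worth knowing: pthread functions return an error number instead of setting errno, so the `perror()` calls in the code posted above may print an unrelated message; the sketch reports the return value with `strerror()` instead.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <pthread.h>

// Recursive mutex whose attribute object is always initialised before use.
class RecursiveMutex {
public:
    RecursiveMutex() {
        pthread_mutexattr_t attr;
        check(pthread_mutexattr_init(&attr), "pthread_mutexattr_init");  // must come first
        check(pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE),
              "pthread_mutexattr_settype");
        check(pthread_mutex_init(&m_mutex, &attr), "pthread_mutex_init");
        pthread_mutexattr_destroy(&attr);  // the attribute object is no longer needed
    }
    ~RecursiveMutex() { pthread_mutex_destroy(&m_mutex); }
    RecursiveMutex(const RecursiveMutex&) = delete;
    RecursiveMutex& operator=(const RecursiveMutex&) = delete;

    void lock() { check(pthread_mutex_lock(&m_mutex), "pthread_mutex_lock"); }
    void unlock() { check(pthread_mutex_unlock(&m_mutex), "pthread_mutex_unlock"); }

private:
    // pthread_* calls return 0 or an error number; errno is not used.
    static void check(int rc, const char* what) {
        if (rc != 0) {
            std::fprintf(stderr, "RecursiveMutex: %s: %s\n", what, std::strerror(rc));
            std::exit(1);
        }
    }
    pthread_mutex_t m_mutex;
};

// Scoped guard: unlocks automatically when it goes out of scope.
class Lock {
public:
    explicit Lock(RecursiveMutex& m) : m_(m) { m_.lock(); }
    ~Lock() { m_.unlock(); }
    Lock(const Lock&) = delete;
    Lock& operator=(const Lock&) = delete;

private:
    RecursiveMutex& m_;
};
```

Because the mutex is recursive, the same thread may take a nested `Lock` on it without deadlocking; each `Lock` releases one level when it is destroyed.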
Maybe you're right: "Modules linked in" doesn't necessarily mean those modules were responsible for the problem, and your stack traces show completely different system calls.
Have you tried running that OS image in a VM (such as VirtualBox) to experiment with different RAM sizes? That might confirm whether it really is just the amount of memory.