LinuxQuestions.org
09-10-2014, 07:24 AM   #1
DevStf

Linux Embedded SD-Card memory corruption


Hi guys,

I have a nasty problem with my embedded Linux kernel: "Linux 2.6.26.2-at91-tm #1 Wed Aug 4 11:33:17 MSD 2010 armv4tl unknown", on a Debian-based embedded distro.

We have developed a standard C program which does heavy read/write operations on a micro SD card.

GNU/Linux is on an internal flash memory, not the SD card; unfortunately there is not enough space there for storing lots of files!

After some time we got memory corruption in the filesystem (FAT32) of the SD cards. The first symptom is that the SD card becomes accessible read-only and processes go into the defunct state. Even if I run a shell command over ssh, the process hangs and becomes defunct. The only solution is to hard reset the device.

After bootup the SD card can't be mounted from fstab. dmesg reveals "mmc0 -110 error can't initialize sd-card".
  1. Can I fire up the device /dev/mmcblk0p1 somehow to get the formatting done with dd?
  1. Any idea why the kernel can't handle processes any more (defunct processes)?
  1. Any ideas what can cause this memory corruption, and how to prevent it?

Unfortunately I can't easily travel on site to the devices and replace the SD cards, so this needs to be done remotely.

Wear of the SD cards is of course an issue, but after 2-3 months?

thx for ideas

lg stf

Last edited by DevStf; 09-10-2014 at 07:26 AM.
 
09-10-2014, 08:31 AM   #2
rokytnji
Back when I ran Linux on an ext2 file system on external SD cards on my eeepc, I fixed it using the Check option in gparted to run a file system check/repair and repair the overwriting on the drive.

I don't know how you will do that via ssh, and the drive being repaired must be unmounted before doing a check/repair.

Edit: on mine, the corruption came from an improper power shutdown while the SD card was powered on and mounted.

Last edited by rokytnji; 09-10-2014 at 08:32 AM.
 
09-10-2014, 08:48 AM   #3
business_kid
Just a few reactions, as one can't really do much more.

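I hope you have some FAT repair program loaded. If not, download one. Through ssh you can do
Code:
mount -o remount,ro  /dev/mmcblk0p1    # make the card read-only while it is checked
fsck.vfat -a  /dev/mmcblk0p1           # -a: repair automatically (use -n instead to only report)
# ...and later
mount -o remount,rw  /dev/mmcblk0p1    # back to read-write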
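A minimal sketch of how that sequence could be scripted, in case it helps. It assumes the card is mounted at /mnt/sd (a made-up mount point, adjust to yours) and that fsck.vfat from dosfstools is installed; mount_mode and repair_card are names invented for this example. Since a check is safer on an unmounted filesystem, this variant unmounts the card instead of remounting it read-only.

```shell
#!/bin/sh
# Sketch only: the device and mount point passed in are placeholders.

# Print "ro" or "rw" for a mount point, reading a table in /proc/mounts
# format (device mountpoint fstype options dump pass). The optional second
# argument lets the function be tried on a sample file.
mount_mode() {
    awk -v mnt="$1" '
        $2 == mnt {
            n = split($4, opts, ",")
            mode = "rw"
            for (i = 1; i <= n; i++) if (opts[i] == "ro") mode = "ro"
        }
        END { if (mode == "") exit 1; print mode }
    ' "${2:-/proc/mounts}"
}

# Unmount, check and mount the card again. Meant to be run as root.
repair_card() {
    dev=$1
    mnt=$2
    umount "$mnt" || return 1    # refuse to check a mounted filesystem
    fsck.vfat -a "$dev"          # -a: repair automatically
    status=$?
    mount "$dev" "$mnt"
    return "$status"             # pass on the result of the check
}
```

For example, `mount_mode /mnt/sd` prints ro if the kernel has already flipped the card to read-only, and `repair_card /dev/mmcblk0p1 /mnt/sd` runs the check and mounts the card again.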
As for why, the possibilities are:
1. Overloads in the Linux system (too many files open, stack, too many processes for vfat, something crazy).
2. Poor write procedures in your firmware - not enough waits or something.
3. Parallel write processes - crazy but theoretically possible.

vfat automatically defaults to FAT32, but SD cards are rarely formatted that way. Specifying msdos instead of vfat may help (or not). It _used_ to. Why not use yaffs or some other more suitable filesystem?

I pass on to you Flon's Law:
Quote:
There is not now, and never will be, a language in which it is the least bit difficult to write bad programs.
 
09-10-2014, 10:29 AM   #4
DevStf (Original Poster)
Unfortunately it is a minimal system: fsck and gparted are missing. Actually fsck was the first thing that came to my mind, but I couldn't find a binary for the architecture. Guess I have to search a little deeper, unless someone has a link ready.

I'll check the system and my code; maybe I'll find something odd.

I had no control over the SD cards, since the warehouse installed them and the devices lack tools like gparted. I did not think I would run into such severe problems with the SD cards anyway. Thank you for the tip about the filesystems, I'll give it a shot.

@Flon's Law: true, so true!

Much appreciated
 
09-10-2014, 01:37 PM   #5
business_kid
What is the architecture? There is Linux for nearly every architecture, and although it hasn't been said clearly, I am presuming Linux/Unix.

OK, what to do there is get access to a development system for that platform, and try to compile dosfsck on it, using the installed libs as far as possible. The dependencies are few, as ldd shows on my x86-64 box:

Code:
bash-4.2$ ldd /sbin/dosfsck
	linux-vdso.so.1 (0x00007ffff9a85000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fe49145c000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fe491853000)
bash-4.2$
As for checking the code: if you spot stuff that way, you are very good. Go back to the places where you sweated blood; that's where the mistakes are likely to be. Can you do anything to decrease the number of open files? Combine data files and hold back on writing so often, perhaps?
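To see whether open files really are piling up, something like the following could be left running next to the program. It is only a sketch: it relies on the Linux /proc filesystem being mounted, and count_fds and watch_fds are names made up here.

```shell
#!/bin/sh
# Count the file descriptors a process currently holds open (Linux /proc).
# Prints 0 if the process does not exist or cannot be inspected.
count_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l | tr -d ' '
}

# Print one "epoch-seconds count" sample per interval for as long as the
# process lives. A count that only ever grows suggests descriptors are
# being opened without being closed.
watch_fds() {
    pid=$1
    interval=${2:-60}
    while [ -d "/proc/$pid" ]; do
        echo "$(date +%s) $(count_fds "$pid")"
        sleep "$interval"
    done
}
```

With the PID of the program in hand, `watch_fds 1234 300` would print a sample every five minutes (1234 is a placeholder). The output is best kept in RAM or sent over ssh rather than written to the card.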
 
09-10-2014, 05:00 PM   #6
DevStf (Original Poster)
Yes, GNU/Linux, a Debian-based distro for embedded devices. The architecture is armv4tl, which I believe is ARMv4. I still have the setup (compiler etc.) and will compile it that way. I just had some problems before with library dependencies and was not so keen on compiling it myself. Guess I have to get the work done and stop being lazy.

As for code checking, I thought of the usual stuff: code review, looking for memory leaks, checking for unclosed socket/file descriptors, and of course reducing read/write operations.

thx
 
09-16-2014, 07:48 AM   #7
rtmistler (Moderator)
A minor editorial comment first: when you write a list, use the following to get auto-numbering or auto-lettering:

[list=1][*]First list entry[*]Second list entry[*]Third list entry[/list] and that will appear like this:
  1. First list entry
  2. Second list entry
  3. Third list entry
If you specify "list=a" or "list=A", the entries are lettered instead of numbered. So just a minor note on what's incorrect in the list in your first post.

As far as corruption on flash media goes, what you're doing is very bad, especially if you have a 2-3 month life cycle for these cards. There are plenty of file systems which will effectively sprinkle the write locations around, so that you don't over-exercise one particular portion of the flash storage compared to the rest of it and the flash as a whole degrades more uniformly. One problem there: if you have very large files which take up a large percentage of the disk, the file system cannot easily relocate files to different segments of the drive.

All fixed media have a lifetime, and this especially includes MMC, which probably has a shorter lifetime than SD, microSD, or compact flash.

Granted, over the years these devices have greatly raised their write cycle limits; however, they are not limitless. And with computers, you can race through the write cycle limit pretty fast if you allow, or design, that to happen.
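To get a feel for how fast that can go, here is a back-of-the-envelope estimate. It assumes ideal wear levelling, so that every block is worn equally, and all figures used with it below are invented for illustration, not taken from your cards. card_life_days is a made-up helper name.

```shell
#!/bin/sh
# Rough card lifetime in whole days, assuming ideal wear levelling:
#   days = capacity * erase_cycles / (writes_per_day * write_amplification)
# All four inputs are assumptions to be replaced with real figures.
card_life_days() {
    capacity_mb=$1     # usable card capacity in MB
    erase_cycles=$2    # rated program/erase cycles per block
    mb_per_day=$3      # data the application writes per day, in MB
    write_amp=$4       # write amplification factor, 1 or more
    echo $(( capacity_mb * erase_cycles / (mb_per_day * write_amp) ))
}

card_life_days 1024 3000 2000 20    # prints 76
card_life_days 1024 3000 200 20     # prints 768
```

So a hypothetical 1 GB card rated for 3000 cycles, taking 2000 MB of small writes per day with a write amplification of 20, would last about 76 days. Cutting the write volume by a factor of ten stretches that to roughly two years.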

Sounds like you have an embedded device which is doing "something." If that something is recording a large amount of event data, then the question becomes whether every single event is noteworthy, or whether you can process and summarize, or only gather detailed information for the time frames in which the device is actually being used or engaged by someone. If it has to gather detailed data logs 100% of the time, then it does.

Consider that it may not need to do this. Or, if it is convenient to do this, maybe the perpetual logs can be sent to a RAM drive and rotated, so that while the system is actively working you can still view those details. Then, when a particular use case occurs, you can choose to write data to your disk. If you're constantly writing data and then offloading it one or more times per day or week, consider instead a strategy where you send your data off system to a network-mounted drive.

Constant, repeated writing to a fixed drive such as a compact flash, SD, microSD, or MMC will eventually make it fail. Therefore find ways to minimize that. Add a real disk drive if you can, or a much, much larger drive of which you use only a very low percentage, so that the file system can spread the write locations around the disk. But if you multiply your storage space, don't then multiply your recording volume too; that just brings you right back to where you are.
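The RAM drive idea could look roughly like this. It is a sketch, not a drop-in: /dev/shm/applog and /mnt/sd/logs are placeholder paths (the first must sit on a RAM-backed filesystem such as tmpfs), and log_line, flush_logs and MAX_LINES are names invented for the example.

```shell
#!/bin/sh
# Sketch: keep the running log in RAM and copy it to the card only on demand.
RAM_DIR=${RAM_DIR:-/dev/shm/applog}    # placeholder, must be RAM-backed
CARD_DIR=${CARD_DIR:-/mnt/sd/logs}     # placeholder, SD card mount
MAX_LINES=${MAX_LINES:-1000}           # rotate after this many lines

# Append one line. Once the current file holds MAX_LINES lines, rotate it
# to current.log.1, overwriting the previous rotation.
log_line() {
    mkdir -p "$RAM_DIR" || return 1
    echo "$(date '+%Y-%m-%d %H:%M:%S') $*" >> "$RAM_DIR/current.log"
    lines=$(wc -l < "$RAM_DIR/current.log" | tr -d ' ')
    if [ "$lines" -ge "$MAX_LINES" ]; then
        mv "$RAM_DIR/current.log" "$RAM_DIR/current.log.1"
    fi
}

# Copy what is in RAM to the card in one go (oldest lines first), then ask
# the kernel to write it out.
flush_logs() {
    mkdir -p "$CARD_DIR" || return 1
    stamp=$(date +%Y%m%d-%H%M%S)
    for f in "$RAM_DIR/current.log.1" "$RAM_DIR/current.log"; do
        if [ -f "$f" ]; then cat "$f"; fi
    done > "$CARD_DIR/events-$stamp.log"
    sync
}
```

The program, or a wrapper around it, would call log_line for every event and flush_logs only when something noteworthy happens, so the card sees one sequential write instead of a constant trickle. Whatever is still in RAM is lost on a power cut, which is the price of this approach.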

What do you mean by defunct processes? Are these zombies? Are they hung processes? How are they administered, started, or monitored?
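One way to tell the two apart over ssh is the state letter in /proc/<pid>/stat: Z is a zombie (what ps labels <defunct>), while D is a process stuck in uninterruptible sleep, typically waiting on I/O. A small sketch, with state_from_stat and describe_pid as made-up names:

```shell
#!/bin/sh
# Extract the state letter from one line in /proc/<pid>/stat format.
# The command name is wrapped in parentheses and may itself contain spaces
# or ")", so everything up to the last ") " is dropped first.
state_from_stat() {
    rest=${1##*") "}
    echo "${rest%% *}"
}

# Describe the state of a live process given its PID.
describe_pid() {
    [ -r "/proc/$1/stat" ] || { echo "no such process"; return 1; }
    case $(state_from_stat "$(cat "/proc/$1/stat")") in
        Z) echo "zombie: exited, parent has not called wait()" ;;
        D) echo "uninterruptible sleep: blocked in the kernel, usually on I/O" ;;
        T) echo "stopped" ;;
        *) echo "running or sleeping normally" ;;
    esac
}
```

If `describe_pid` reports zombies, the parent is not reaping its children and the fix is in the code. If it reports uninterruptible sleep, the processes are more likely waiting on the card itself, and no signal will clear them.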

Here are some blog entries on daemons and how they can be used to control processes, on pipes and how they can be used to communicate large amounts of data, on how to clean up zombie processes, and on how to use logrotate. All have C examples. I'm due to look at them and see what updates they might need, but they have some suggestions on how to set up the management of your architecture so that you can monitor processes and restart them:

Creating a daemon to launch and monitor your processes

Using PIPES for Interprocess Communications

How to kill those Zombies

Logrotate - Not Just for the System
 
  

