LinuxQuestions.org

st2000 10-22-2011 12:31 PM

Problems w/ Embedded Linux - SDCard corruption... (power cycling?)
 
Hi...

This thread may morph once or twice before conclusions can be made. I'll try to edit the title and / or start new threads to keep things on track.

What I would like to know is what are people doing to avoid corrupting SDCards in an embedded Linux box?

I know, don't power down the box. Well, I've no control over that. So far, I've had it suggested to me to mount the SDCard using the "sync" switch (I think) to inhibit disk caching. Hum, I'll have to try that. Do any of you have other ideas I might try?

But what of the root directory? That is, on most embedded Linux boxes I have seen, the root directory is installed in a flash chip on the PCB. What experiences have others had with that? It occurs to me that if the SDCard can be corrupted, that the root directory can be just as easily corrupted.

You know, I had thought the embedded Linux root directory was a tar ball extracted from FLASH and written to RAM upon every boot up. That would almost eliminate all possible permanent corruption problems. If the RAM image were corrupted, just reboot and all would be fine. But on the platform we are OEM'ing, I have seen files stick around in the root directory tree. To me that means the root file system is just sitting in FLASH and if corrupted it will always be corrupted.

Just what is the Standard Operating Procedure when it comes to booting an embedded Linux box??

-thanks

zootboy 10-22-2011 06:42 PM

I'm curious, what are your concerns here? Power failure? The card being pulled out? Poorly written software? I've booted computers off thumb drives many, many times and I've never had one corrupt on me. Is this a problem you're currently experiencing?

But for some general info, a lot of thumbdrive linux distros use squashfs to store a filesystem in a single file, then unpack it into RAM. This can help with your "restore to working state on reboot" issue. But please, tell us more about what you're trying to do. The more details we know, the better we can help.

st2000 10-23-2011 02:40 AM

Quote:

Originally Posted by zootboy (Post 4505499)
I'm curious, what are your concerns here? Power failure? The card being pulled out? Poorly written software? I've booted computers off thumb drives many, many times and I've never had one corrupt on me. Is this a problem you're currently experiencing?

Power cord being pulled out, SDCard being pulled out and/or the power being cut off at the end of the day. Mostly the last one. On top of that, Qt is being used and it writes to the SDCard. FYI, you can probably think of Qt as a set of C++ library calls that abstract the hardware from GUI-oriented C++ programs. The abstraction includes writing to the SDCard.

Quote:

Originally Posted by zootboy (Post 4505499)
But for some general info, a lot of thumbdrive linux distros use squashfs to store a filesystem in a single file, then unpack it into RAM. This can help with your "restore to working state on reboot" issue. But please, tell us more about what you're trying to do. The more details we know, the better we can help.

I suspect the OEM of the Linux board is not using something like squashfs. I will ask specifically about this. But I believe I have seen changes to the root file system persist between power ups, which indicates we are running out of FLASH, not RAM.

In a perfect system we would unmount the SDCard before powering down the Linux box. However this is probably not going to happen. Even if we make provisions for controlling mounting in the Qt program, it is more likely the power will simply be removed.

To prevent corruption I am considering mounting the SDCard without disk caching enabled. I believe the mount command switch is simply "sync". But I don't have any embedded Linux experience with this approach. It may slow down the Linux box drastically or, worse, cause a premature failure of the SDCard by increasing write events.

I am also considering issuing the sync command several times in the script that kicks off the Qt application. But doubt that will help much.
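A per-file alternative to sprinkling sync through the startup script is to flush each important file at the moment it is written. The following is only a sketch of that pattern, written in Python for brevity rather than in the Qt/C++ the application actually uses. The helper name `durable_write` and the `.tmp` suffix are made up for illustration. It assumes a Linux system where a directory can be opened and fsync'ed, and whether the final rename survives a power cut cleanly depends on the filesystem on the card.

```python
import os


def durable_write(path, data):
    """Write bytes to `path` via a temporary file, flushing to the device.

    Hypothetical helper: the name and the ".tmp" suffix are illustrative.
    """
    tmp_path = path + ".tmp"

    # 1. Write the new contents to a temporary file next to the target.
    with open(tmp_path, "wb") as tmp:
        tmp.write(data)
        tmp.flush()               # push Python's buffer to the kernel
        os.fsync(tmp.fileno())    # ask the kernel to push it to the card

    # 2. Swap the temporary file into place in a single rename.
    os.replace(tmp_path, path)

    # 3. Flush the directory entry so the rename itself is written out.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Compared with mounting the whole card with "sync", this forces writes out only for the files that matter, which should mean fewer extra write events on the card.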

So I am here looking for other ways to mitigate SDCard corruption due to powering off the device without unmounting the mass storage device.

I thought of this forum because I thought you all might have had to wrestle with these problems already. I would think embedded Linux boxes, not being regarded as computers but rather appliances, would likely be power cycled all the time.

-thanks

zootboy 10-23-2011 11:03 AM

OK, you seem to have left out some pertinent details in your first post.

1. You are using some specific, pre-made device that you appear to have little or no control over.
2. This system is running some sort of purpose-built software (using Qt) that you have little or no control over.
3. The system already runs on an SD card in some manner, and you're trying to augment/modify the system to prevent corruption.

Am I right in these assumptions? And I'll ask again, are you actually experiencing corruption? This is important, because the type of corruption can be very telling.

st2000 10-24-2011 08:48 AM

Quote:

Originally Posted by zootboy (Post 4505867)
OK, you seem to have left out some pertinent details in your first post.

Apologies, I don't mean to drag this out. It's rather humorous as I usually have to restrain myself from writing posts that are too long.

Quote:

Originally Posted by zootboy (Post 4505867)
1. You are using some specific, pre-made device that you appear to have little or no control over.

That's just about right. I can ask for features. But I have to convince two groups (ours and theirs (remember this is OEM'ed)) it's the right thing to do. I would much rather solve this locally.

Quote:

Originally Posted by zootboy (Post 4505867)
2. This system is running some sort of purpose-built software (using Qt) that you have little or no control over.

Actually, we wrote the Qt application. What we do not have control over is the Qt environment we are running in. Much like the embedded Linux, the Qt environment is "pre-packaged" on the device.

Quote:

Originally Posted by zootboy (Post 4505867)
3. The system already runs on an SD card in some manner, and you're trying to augment/modify the system to prevent corruption.

Our application, database and Qt *.ini files are stored on the SDCard. The Linux root file system is stored in the PCB's flash. I have questions out to the OEM asking if it is being run out of RAM or FLASH.

Quote:

Originally Posted by zootboy (Post 4505867)
Am I right in these assumptions? And I'll ask again, are you actually experiencing corruption? This is important, because the type of corruption can be very telling.

To be honest, it is debatable where our SDCards are being corrupted. Did we pull them from the PC w/o ejecting them (we mostly use Win7 machines), or was it during a power cycle of the embedded Linux box? The situation is very difficult to re-create. Right now I can say that SDCards made in a card duplicator, using an image (not file) copying process, have ended up in a state where, after a power cycle, the embedded Linux box did not boot up again. So it does sound like cards can be corrupted in the embedded Linux box.

Going on, it gets a bit more difficult to explain. I have seen SDCards that the embedded Linux box will not mount. But the same cards can mount w/o issue on Win7 and WinXP boxes. One of them even mounted on my Ubuntu laptop. To make matters worse, I reformatted the SDCard I could not mount on the embedded Linux system. As expected it mounted (as before) on the Win7 box. But it STILL did not mount on the embedded Linux box. Not until after an image copy from a good SDCard in the SDCard duplicator did the bad SDCard mount on the embedded Linux box.

I suspect the MBR because, I believe, reformatting only affects the partition and not the MBR. But why such varied behavior? Why does the embedded Linux box have such a hard time mounting the SDCard? Why are Win7, WinXP and Linux so tolerant, if the MBR is causing the problem?

-thanks

zootboy 10-24-2011 09:33 AM

Jeez. You have the worst kind of problem to troubleshoot; intermittent and hard to reproduce. I can see how frustrating your issue is.

Unfortunately, it seems that your embedded system may not have full filesystem/mounting software installed. When you use a full OS (Win/Lin), they have more tools at their disposal to deal with potential issues. Your embedded box seems to be lacking, so if it encounters an error, it has no choice but to die (all speculation on my part, seeing as I don't really know your system).

Is there any way for you to extract log files from your embedded device? Any sort of error message would be very helpful.

Also, have you tried doing a bit comparison of a "corrupt" card with a "good" card? That might be able to tell you what's getting corrupted, though you may have to climb down into the sub-filesystem muck to get any usable info.
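One way such a comparison could be done, sketched in Python: it assumes both cards have already been dumped to image files on a PC (for example with dd), and the function name `differing_sectors` and the 512-byte sector size are choices made for this illustration, not anything from the device.

```python
SECTOR_SIZE = 512  # assumed sector size for the comparison


def differing_sectors(good_path, bad_path, limit=20):
    """Return up to `limit` sector numbers at which two image files differ."""
    diffs = []
    with open(good_path, "rb") as good, open(bad_path, "rb") as bad:
        index = 0
        while len(diffs) < limit:
            a = good.read(SECTOR_SIZE)
            b = bad.read(SECTOR_SIZE)
            if not a and not b:
                break              # both images fully read
            if a != b:
                diffs.append(index)
            index += 1
    return diffs
```

If sector 0 shows up in the result, the difference is in the MBR or boot sector. Differences only at higher sector numbers point at the filesystem structures or the file data instead.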

st2000 10-25-2011 09:24 AM

Quote:

Originally Posted by zootboy (Post 4506574)
Is there any way for you to extract log files from your embedded device? Any sort of error message would be very helpful.

I do have a console connection. I can log it and find the bits that pertain to the SDCard. Perhaps this is a good time to ask: What is the process called which connects the SDCard to the device file? I always assumed the "activation" of a device file was the fallout of a happy driver. That is, a driver that found everything in order with the device (the SDCard in this case) it is trying to support. It seems to me this is not occurring, because when I try to mount the device, the mount command returns that there is no device. I have never tried to manually activate the SDCard device driver. I suspect it autonomously runs based on the small SDCard socket "card present" switch. I suppose I could attempt to mechanically intervene to see if I can try several times to activate the driver.

Quote:

Originally Posted by zootboy (Post 4506574)
Also, have you tried doing a bit comparison of a "corrupt" card with a "good" card? That might be able to tell you what's getting corrupted, though you may have to climb down into the sub-filesystem muck to get any usable info.

No, this will probably be the next step. Unfortunately it requires a byte by byte copy of multiple SDCards before one of them gets corrupted. That is, the cards had to originally be exactly the same. During testing, re-programming and more testing of the SDCard, there is no doubt that they are different. However, the MBR (the first 512 bytes?) should be the same no matter what. Well - then again - that's not really true - is it? The MBR does contain (I think - obsolete) x86 code to instruct an old IBM-PC how to boot up the OS stored somewhere else on that mass storage device. From what I have read - there's no telling what that part of the MBR looks like these days.
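For reference, the classic MBR layout is 446 bytes of boot code, four 16-byte partition entries, and the two signature bytes 0x55 0xAA at offsets 510 and 511. A rough Python sketch for pulling those fields out of a dumped first sector follows. The function name `parse_mbr` and the dictionary keys are invented for this illustration.

```python
import struct


def parse_mbr(sector):
    """Decode the partition table from the first 512 bytes of a card image."""
    if len(sector) < 512:
        raise ValueError("need at least 512 bytes")

    info = {
        # Boot signature: 0x55 at offset 510, 0xAA at offset 511.
        "signature_ok": sector[510:512] == b"\x55\xaa",
        "partitions": [],
    }

    for i in range(4):
        entry = sector[446 + 16 * i: 446 + 16 * (i + 1)]
        # status (1 byte), CHS start (3, skipped), type (1),
        # CHS end (3, skipped), LBA start (4), sector count (4);
        # multi-byte fields are little-endian.
        status, ptype, lba_start, count = struct.unpack("<B3xB3xII", entry)
        if ptype != 0:  # type 0 marks an unused entry
            info["partitions"].append({
                "index": i,
                "bootable": status == 0x80,
                "type": ptype,
                "lba_start": lba_start,
                "sectors": count,
            })
    return info
```

Running this on sector 0 of a good image and of a bad one would show whether the partition entries differ, separately from the boot code in the first 446 bytes. One caveat: a card formatted with no partition table at all carries a filesystem boot sector in sector 0, which also ends in 0x55 0xAA, so the decoded "partitions" would be meaningless in that case.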

zootboy 10-26-2011 08:49 PM

That's interesting. What you're saying is that the device file (e.g. /dev/sdb1) is not being created when a "corrupt" SD card is being put in. Is that right?

If that's the case, I would almost wonder if it's the embedded device that's the problem. Have you tried several different units with the same "corrupt" card? I know that wouldn't explain why Ubuntu won't mount it, but it would be nice to try and rule that possibility out.

And yes, AFAIK, when a mass storage driver loads properly, the kernel will generate the proper device file. A good way to check what's happening is to insert the card and then run "dmesg | tail". This will display the most recent kernel log messages, which should indicate a new storage device has been added. You may want to try that and see what you get.
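Another place to look is /proc/partitions, which lists the block devices the kernel currently knows about. Below is a small Python sketch of a before/after check around inserting the card. The function names are made up for this example, and it assumes a Python interpreter is available, which may not be the case on a small embedded image (reading the file with cat before and after gives the same information).

```python
def block_devices(proc_partitions_text):
    """Return the set of device names listed in /proc/partitions text."""
    names = set()
    for line in proc_partitions_text.splitlines():
        fields = line.split()
        # Data rows look like: major minor #blocks name
        if len(fields) == 4 and fields[0].isdigit():
            names.add(fields[3])
    return names


def read_block_devices(path="/proc/partitions"):
    """Read and parse the kernel's current partition list."""
    with open(path) as f:
        return block_devices(f.read())
```

Calling `read_block_devices()` once before and once after inserting the card and taking the set difference shows which device names, if any, appeared. An empty difference would match the "no device" result from mount.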

You are correct on how the MBR works. There is some good info in the Wikipedia article on FAT. Basically, the boot section is just a jump vector to the first OS boot instruction sector. There is a bunch of other stuff there too, but none of it should be modified in normal read/write ops (AFAIK). It's certainly not obsolete, though. That's how bootable thumbdrives work.

st2000 11-01-2011 09:24 AM

Thanks for all the help so far. I still have not tried the dmesg trick. It is an embedded system though and I don't know if that command is available.

The Wikipedia page is great. I'll have to take the time to read it thoroughly.

Don't know if I mentioned these other problem constraints that we have found so far: That the dd image from a bad SDCard can be dd'ed to a good SDCard and the good card acts like the bad one. That a good dd SDCard image put on a bad SDCard will make the bad SDCard good again. And, of course, formatting the bad SDCard on a Win7 box makes no difference at all.

Now, after looking briefly at the Wikipedia page, I am wondering what part the x86 code found in the MBR plays? I understand that this code is used to tell the computer how to boot up the OS on the mass storage device the MBR is on. We are not booting from the SDCard. But, we are also not running on an x86 target. This is an ARM based target. I wonder. All the large computers that can work w/the bad SDCard are x86 boxes. Sure there is Win & Linux this and that. But they are all running on x86 boxes. Could that be the reason the bad SDCards are not working on the target?

zootboy 11-01-2011 02:49 PM

It's entirely possible that something either within ARM or the ARM drivers is causing your issue, but unfortunately this is where my experience and knowledge ends. I've never done work this low-level before, and have only ever casually used ARM systems.

Your dd trick, along with the info that formatting (at least on Windows) seems to have no effect, seems to reinforce the idea that it's something in the file table / MBR that's causing your issue. I'm sorry to say that that's all the help I can offer for now.

The dmesg command should be in most linux systems. I would be surprised if it wasn't, but then again, this whole issue is a bit surprising.

One suggestion I would make is to try to find some other ARM device (phone, PDA, etc.) and stick a "corrupted" SD card in it and see if it recognizes.

st2000 11-02-2011 02:36 AM

Quote:

One suggestion I would make is to try to find some other ARM device (phone, PDA, etc.) and stick a "corrupted" SD card in it and see if it recognizes.

That's a great idea! I've an old PalmOS product. Not sure if it's a Motorola DragonBall or something else. But it is sure not to be x86 based. Unfortunately it is also not a Linux based product. I'll try that tomorrow.

-thanks

zootboy 11-02-2011 08:05 PM

If it's a relatively newer device, it's likely to be running on ARM. But either way, it would be good to test the corrupt cards in a range of embedded/"limited" devices (i.e. not PCs).


All times are GMT -5.