Saving file data using Python in an embedded system in a safe and fast way
Hi, I am developing a program on a system where Linux does not run the sync command automatically. So I have to run it from my application whenever I save data to disk, which in my case is a 2 GB SD card.
It is true that I could make the operating system take care of the synchronization by using a suitable mount option, but in that case the program's performance drops drastically. In particular, I use the shelve module from Python to save data that arrives over a TCP socket, and I have to deal with the risk of the system being turned off suddenly. Initially I wrote something like this to save data using shelve: Code:
def saveData(vo)
Note that I call the OS sync every time I close a file, to prevent data corruption in case the "computer" is turned off while data is still in the buffer. To improve performance I changed it to something like this: Code:
def saveListData( list )
However, I would like to know whether adding many objects before closing the file increases the risk of data corruption. I know that turning off the system after fd.close() and before os.sync() may cause problems. But what about turning the system off right after Code:
fd = shelve.open('file_name', 'c')
Thanks for any suggestion. |
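Only the function headers survive in the post above, so here is a minimal sketch of the two approaches being compared. The bodies, the key argument, and the 'file_name' path are assumptions, not the poster's actual code; os.sync() is the Python 3 call, whereas on the Python 2 systems of the era os.system('sync') was the usual equivalent.

```python
import os
import shelve

def saveData(key, vo, filename='file_name'):
    # One open/close/sync per object: safe but slow.
    fd = shelve.open(filename, 'c')   # create the file if it does not exist
    try:
        fd[key] = vo                  # shelve pickles the value under the key
    finally:
        fd.close()                    # flush shelve's own buffers
    os.sync()                         # push dirty pages out to the SD card

def saveListData(items, filename='file_name'):
    # Batched variant: one open/close/sync amortised over many objects.
    fd = shelve.open(filename, 'c')
    try:
        for key, vo in items:         # items is assumed to be (key, value) pairs
            fd[key] = vo
    finally:
        fd.close()
    os.sync()                         # a single sync for the whole batch
```

The batched version answers the performance concern; the open question in the thread is whether the longer window between open and close widens the corruption risk.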
Just a few thoughts - do you really need the overhead of shelve? Why not just use cPickle? Also keeping the files open for as short a time as possible will cut down on the risks of data loss. I've made these and a few other changes to try and address some of your concerns. Not knowing much about your system it's difficult to offer any more suggestions. Anyway it may be useful in some ways so here it is:
Code:
import cPickle |
Thanks bgeddy for your reply.
My problem is that sometimes I have to save thousands of objects, and this process turns out to be very slow, mainly because of the time spent opening and closing the file followed by the synchronization. About pickle: since I must save many objects and their keys in a file for later use, I am afraid pickle is not well suited, because as I understand it pickle saves just one object per file, so I would not be able to look objects up by key. But of course, if it is possible to use keys with pickle and it is faster, I would prefer that. |
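bgeddy's full listing is not preserved above, but one common way to keep key lookups while using plain pickle is to pickle a single dict: the keys survive because the dict does. A sketch, using Python 3's pickle module (which already includes the C implementation that cPickle provided on Python 2); the function names are illustrative, not from the thread:

```python
import os
import pickle  # Python 3; the thread's cPickle was the Python 2 equivalent

def save_dict(d, filename):
    # Serialise one dict object; all keys are preserved inside it.
    with open(filename, 'wb') as f:
        pickle.dump(d, f, pickle.HIGHEST_PROTOCOL)
    os.sync()  # flush kernel buffers, as in the shelve version

def load_dict(filename):
    # Read the whole dict back; lookups by key then work in memory.
    with open(filename, 'rb') as f:
        return pickle.load(f)
```

The trade-off is that the whole dict is rewritten on every save, so for a file that grows to several megabytes this may not beat shelve, which updates entries in place.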
You might also investigate the filesystem being used. Some filesystems may be faster for what you're trying to do, if changing is an option.
If not, you might see if you can open the file with the O_DIRECT flag, which bypasses the page cache so writes go straight to the device. |
I have noticed that having to run the sync command constantly is not the main problem. I measured the time taken to open and close the file and also the time taken by the sync. When the shelve file is around 5 MB, it takes 30 seconds to add 50 objects at once, then another 4 seconds for the sync command. So the main difficulty is that performance degrades as the file size increases. |
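Measurements like the ones above can be reproduced with a rough harness such as this (a sketch, not the poster's actual instrumentation), which separates the open/write/close time from the sync time:

```python
import os
import shelve
import time

def timed_batch_save(items, filename):
    # Returns (seconds for open+write+close, seconds for the sync).
    t0 = time.time()
    fd = shelve.open(filename, 'c')
    for key, obj in items:
        fd[key] = obj
    fd.close()
    t1 = time.time()
    os.sync()
    t2 = time.time()
    return t1 - t0, t2 - t1
```

Running it with batches of 50 objects against files of different sizes would show whether the slowdown comes from the shelve/dbm writes or from the sync itself.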
Apparently there is a way to get O_DIRECT file access in Python by memory-aligning the buffers (using mmap) and using buffers that are a multiple of the logical block size of the filesystem. This adds a fair bit of complication and only works for raw system file descriptors - not Python file objects. There is an interesting post here on the subject. Changing the filesystem, as suggested by orgcandman, could help - a good suggestion. Hopefully someone with directly relevant experience of systems such as yours, which obviously I don't have, will chip in. |
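The approach described can be sketched as follows. This is a Linux-only illustration, and the 4096-byte block size is an assumption that should be replaced with the filesystem's actual logical block size. O_DIRECT requires the buffer, offset, and length to be block-aligned, which is why an anonymous mmap (page-aligned) padded to a block multiple is used:

```python
import mmap
import os

BLOCK = 4096  # assumed logical block size; check your filesystem

def write_direct(path, data):
    # Round the buffer up to a whole number of blocks; O_DIRECT
    # rejects unaligned or oddly sized writes with EINVAL.
    size = ((len(data) + BLOCK - 1) // BLOCK) * BLOCK
    buf = mmap.mmap(-1, size)          # anonymous mmap is page-aligned
    buf.write(data)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DIRECT)
    try:
        os.write(fd, buf)              # bypasses the page cache entirely
    finally:
        os.close(fd)
        buf.close()
```

Note that the file ends up zero-padded to the block boundary, so real code would need to record the payload length separately; and some filesystems (tmpfs, for example) do not support O_DIRECT at all.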
When the file is small, like 500 KB, the time to save data is small too - about 1 second to save 50 objects. As for a better filesystem to use, I would welcome any suggestion. Here is my processor: Code:
cat /proc/cpuinfo
Thanks again. |