I am starting to work on serializing structures in C. Does anybody know of any related resources on the net?
I know for a fact that a C structure can be written to a file and then re-read successfully at some later stage (excluding the pointers issue for now).
However, I am thinking about developing a generic framework for this purpose, which requires consideration of a number of things, for example the varied behaviour of writing structures to disk on different platforms.
Who knows, this thing may go on to become a small open source project.
Anybody interested or capable of helping, please comment/participate.
When you write a structure in binary to disk, you are 'serializing' the whole structure to disk. You can read the structure back from the disk at any time. This is fairly basic. Or am I missing something in what you are saying?
Quote:
When you write a structure in binary to disk, you are 'serializing' the whole structure to disk. You can read the structure back from the disk at any time. This is fairly basic. Or am I missing something in what you are saying?
A generic approach is slightly more involved than just writing the structure to disk. Problems that will need to be overcome include:
The Endianness of the machine
Expanding pointers
Avoiding cyclic loops whilst following pointers
This means that knowledge of the structure is important, which is where XML could step in to provide that knowledge in a portable format.
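For the endianness item in the list above, one common way around it is to pick a byte order for the file and encode each integer byte by byte, instead of dumping the in-memory representation. The following is only a minimal sketch of that idea; the names put_u32_be and get_u32_be are made up for illustration and do not come from any library.

```c
#include <stdint.h>

/* Hypothetical helper: store a 32-bit value as four bytes,
 * most significant byte first, whatever the host byte order is. */
void put_u32_be(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)((v >> 24) & 0xFF);
    out[1] = (unsigned char)((v >> 16) & 0xFF);
    out[2] = (unsigned char)((v >> 8) & 0xFF);
    out[3] = (unsigned char)(v & 0xFF);
}

/* Hypothetical helper: rebuild the value from the same four bytes. */
uint32_t get_u32_be(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

Because the shifts work on values rather than on memory, the same four bytes come out on big-endian and little-endian machines. Using a fixed-width type such as uint32_t also sidesteps the question of how wide an int is on a given platform.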
I would suggest you read Chapter 5 of Eric S Raymond's "The Art of Unix Programming".
If possible, the output should be textual. Remember that different processors save and read binary data differently. Some are big endian, others are little endian. Some use 32-bit integers and others use 64-bit. The same program compiled on different machines might produce non-portable files.
The format that works best depends on the type of information being saved. If you can use a simple Windows 3.1 INI format, that may be fine for your application. For structured records, consider how the /etc/passwd file is structured. XML may be the choice for complicated structures, but it is hard to work with using the standard textual tools such as grep, sed and awk. Also, using it could make a program unnecessarily bulky. Open Office uses it, but it is already bulky.
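As a rough illustration of the /etc/passwd style of record, here is a sketch that writes a structure as one colon-separated line and parses it back. The struct user_rec and both function names are invented for this example, and it assumes the name field never contains a colon or a newline.

```c
#include <stdio.h>

/* Hypothetical record type, used only for this example. */
struct user_rec {
    char name[32];
    int  uid;
    int  gid;
};

/* Format the record as "name:uid:gid\n".
 * Returns 0 on success, -1 if the buffer is too small. */
int rec_to_text(const struct user_rec *r, char *buf, size_t size)
{
    int n = snprintf(buf, size, "%s:%d:%d\n", r->name, r->uid, r->gid);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Parse one line back into a record. Returns 0 on success, -1 on error. */
int rec_from_text(const char *line, struct user_rec *r)
{
    /* %31[^:] reads at most 31 non-colon characters, leaving
     * room for the terminating NUL in the 32-byte name field. */
    int n = sscanf(line, "%31[^:]:%d:%d", r->name, &r->uid, &r->gid);
    return n == 3 ? 0 : -1;
}
```

The resulting file can be inspected with grep, sed and awk, and it does not depend on the byte order or integer width of the machine that wrote it.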
I like Hko's idea of making the serialization architecture independent. This can be a very cool medium/long term target. However, initially we have to deal with the problems highlighted by graemef.
Personally I would like to deal with the problem of "Expanding Pointers". This also includes multiple sub-problems, like handling several levels of indirection (e.g. what to do with an int*********?).
Serialization of PODs is not an issue at all. As we all know, a structure containing POD data types can simply be written to the file and then read at some later stage. But when a structure contains a pointer data type, the data written to the file would be the "address" which that pointer contains, which is of no use to us.
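To make the pointer problem concrete, here is a minimal sketch that writes the data a pointer refers to (a length followed by the bytes) instead of the address itself. The struct person and the two functions are hypothetical, and the integers are written in the host's native binary form, so this only addresses the pointer issue on a single platform, not portability.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

/* Hypothetical structure with one pointer member. */
struct person {
    int   age;
    char *name;   /* NUL-terminated string owned by the structure */
};

/* Write age, then the string length, then the string bytes.
 * The pointer value itself is never written. */
int person_write(const struct person *p, FILE *fp)
{
    uint32_t len = (uint32_t)strlen(p->name);
    if (fwrite(&p->age, sizeof p->age, 1, fp) != 1) return -1;
    if (fwrite(&len, sizeof len, 1, fp) != 1) return -1;
    if (len > 0 && fwrite(p->name, 1, len, fp) != len) return -1;
    return 0;
}

/* Read the record back, allocating fresh storage for the string. */
int person_read(struct person *p, FILE *fp)
{
    uint32_t len;
    if (fread(&p->age, sizeof p->age, 1, fp) != 1) return -1;
    if (fread(&len, sizeof len, 1, fp) != 1) return -1;
    p->name = malloc((size_t)len + 1);
    if (p->name == NULL) return -1;
    if (len > 0 && fread(p->name, 1, len, fp) != len) {
        free(p->name);
        p->name = NULL;
        return -1;
    }
    p->name[len] = '\0';
    return 0;
}
```

The reader ends up with a new allocation at a different address, which is the point: only the pointed-to contents survive the round trip. Deeper indirection (pointers to pointers) would need the same treatment applied recursively at each level.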
I am working on this issue alongside my busy schedule at work, hence progress will be a little slow. However, I'll post on this forum as soon as I find something new, and I request the same from you guys.
If someone can find relevant reading material on the net, please post the link here.
Quote:
Serialization of PODs is not an issue at all. As we all know, a structure containing POD data types can simply be written to the file and then read at some later stage.
I'm not sure we all know this.
endianness
length
newlines
signedness
alignment
padding
char is the only type I can think of that doesn't have cross-platform problems when writing a structure to disk, and actually it kind of has problems too.
Or are ints and chars not what you meant by "POD"? I don't see the acronym a lot but guess it means "plain old datatype"
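The alignment and padding items can be seen directly with sizeof and offsetof. This is just a small sketch; struct sample is a made-up type, and the exact numbers printed depend on the compiler, its options and the target.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical structure mixing small and larger members. */
struct sample {
    char tag;
    int  value;
    char flag;
};

/* Bytes taken by the members themselves, ignoring any padding. */
size_t sample_member_bytes(void)
{
    return sizeof(char) + sizeof(int) + sizeof(char);
}

/* Print how the compiler actually laid the structure out. */
void sample_print_layout(void)
{
    printf("sizeof(struct sample) = %zu\n", sizeof(struct sample));
    printf("sum of member sizes   = %zu\n", sample_member_bytes());
    printf("offsetof(value)       = %zu\n", offsetof(struct sample, value));
    printf("offsetof(flag)        = %zu\n", offsetof(struct sample, flag));
}
```

On many common targets the structure is larger than the sum of its members, because the compiler inserts padding before value and after flag. A raw fwrite of the structure writes those padding bytes too, and a reader built with different packing would interpret the same bytes differently.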
I think that the idea is that an int would be saved in an XML-style format (hence a character string) as follows:
<int>46</int>
whilst a double might be saved as:
<double>3.1415</double>
This addresses many of the internal problems of how data is stored. But it does mean that it is important to be able to find out what the datatype is, and that is not trivial. Just consider the difference between char *, char [] and char[23]. How do you tell the three apart?
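A sketch of how the tagged text form might be produced and read back in C follows. The four function names are invented for illustration, this is not a real XML parser, and it assumes the caller already knows which type to expect, which is exactly the hard part mentioned above.

```c
#include <stdio.h>

/* Hypothetical writers: return 0 on success, -1 if buf is too small. */
int write_int_tag(char *buf, size_t size, int v)
{
    int n = snprintf(buf, size, "<int>%d</int>", v);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

int write_double_tag(char *buf, size_t size, double v)
{
    /* 17 significant digits are enough to round-trip an IEEE 754 double. */
    int n = snprintf(buf, size, "<double>%.17g</double>", v);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Hypothetical readers: %n is only reached if the closing tag matched,
 * so end stays 0 when the input is malformed. */
int read_int_tag(const char *s, int *v)
{
    int end = 0;
    if (sscanf(s, "<int>%d</int>%n", v, &end) != 1 || end == 0) return -1;
    return 0;
}

int read_double_tag(const char *s, double *v)
{
    int end = 0;
    if (sscanf(s, "<double>%lf</double>%n", v, &end) != 1 || end == 0) return -1;
    return 0;
}
```

Since the value travels as decimal text, byte order and integer width stop mattering, at the cost of a conversion on every read and write.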
Quote:
char is the only type I can think of that doesn't have cross-platform problems when writing a structure to disk, and actually it kind of has problems too
Or are ints and chars not what you meant by "POD"? I don't see the acronym a lot but guess it means "plain old datatype"
Rephrasing my earlier statement:
Serialization of PODs is not an issue at all. Limiting the implementation to a single platform for now, we all know that a structure containing POD data types can simply be written to the file and then read at some later stage.
Quote:
Serialization of PODs is not an issue at all. Limiting the implementation to a single platform for now, we all know that a structure containing POD data types can simply be written to the file and then read at some later stage.
Of course it's an issue, even on a single system. Programs compiled with different compiler options/optimizations might use different alignment etc.
I suggest re-reading jschiwal's post several times. All output should be textual unless there is a very good reason to do otherwise.
A generic serialization library sounds pretty inefficient to me. An alternative would be to have some sort of program/script which would create the I/O functions for each structure at compile time. Have a look at the src for Freeciv, they have a Python script which does exactly this.
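A related way to get per-structure I/O functions at compile time without an external script is the X-macro technique, where a single field list generates both the structure and its read/write functions. This is a different technique from the Freeciv script, shown only as a sketch; POINT_FIELDS, struct point and the function names are all made up for the example.

```c
#include <stdio.h>

/* Hypothetical field list: X(type, name, printf/scanf format). */
#define POINT_FIELDS(X)        \
    X(int,    x,      "%d")    \
    X(int,    y,      "%d")    \
    X(double, weight, "%lf")

/* Expand the list into structure members. */
struct point {
#define X(type, name, fmt) type name;
    POINT_FIELDS(X)
#undef X
};

/* Expand the list into one "name=value" line per field. */
int point_write(const struct point *p, FILE *fp)
{
#define X(type, name, fmt) \
    if (fprintf(fp, #name "=" fmt "\n", p->name) < 0) return -1;
    POINT_FIELDS(X)
#undef X
    return 0;
}

/* Expand the list into one fscanf call per field, in the same order. */
int point_read(struct point *p, FILE *fp)
{
#define X(type, name, fmt) \
    if (fscanf(fp, " " #name "=" fmt, &p->name) != 1) return -1;
    POINT_FIELDS(X)
#undef X
    return 0;
}
```

Adding a field means adding one line to POINT_FIELDS, and the structure, writer and reader stay in sync. The default "%lf" output keeps only six decimal places, so a real implementation would want a more precise format for doubles.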
By single platform I meant a single compiler, OS, hardware and so on.
Textual output is one good alternative, but it may prove expensive compared to binary output.
Data written in the form of text *has* to be converted back to its binary form during reading, whereas data written as binary does not incur this overhead.
Quote:
A generic serialization library sounds pretty inefficient to me
Again, with text vs. binary, the text version would be inefficient.
Quote:
An alternative would be to have some sort of program/script which would create the i/o functions for each structure at compile time. Have a look at the src for Freeciv, they have a python script which does exactly this.
Thanks, I'll check it out.
Currently I am not for or against any approach, just considering the merits and demerits of all possible approaches.
Quote:
Textual output is one good alternative, but it may prove expensive compared to binary output.
Data written in the form of text *has* to be converted back to its binary form during reading, whereas data written as binary does not incur this overhead.
Efficiency isn't the point with text files. Data files should preferably be human readable whenever possible (so that a person doesn't need special tools to read them), though of course, text files are not appropriate for everything.
What I meant was, it would be inefficient compared to a custom coded function that writes binary output (I should have been more specific).
Quote:
Currently I am not for or against any approach, just considering the merits and demerits of all possible approaches.
Indeed. My objections were just off the top of my head, I haven't investigated the idea in any great depth.