Serializing C structures

sibtay · 05-03-2006, 07:59 AM

Hi

I am starting to work on serializing structures in C. Does anybody know of any related resources on the net?

I know for a fact that a C structure can be written to a file and then
re-read successfully at some later stage (excluding the pointers issue for now).

However, I am thinking about developing a generic framework for this purpose, which requires consideration of a number of things, for example the varied behaviour of writing structures to disk on different platforms.

Who knows, this thing may go on to become a small open source project.

Anybody interested or capable of helping please comment/participate.

Thanks

Hko · 05-04-2006, 03:59 AM

I would not be surprised if such a library already exists somewhere. But I do like the idea.

How about storing it architecture-independent in XML?

ppanyam · 05-04-2006, 06:38 AM

When you write a structure in binary to disk, you are 'serializing' the whole structure to disk. You can read the structure from the disk at any time. This is fairly basic. Or am I missing something in what you are saying?

graemef · 05-04-2006, 07:23 AM

Quote:

Originally Posted by ppanyam

When you write a structure in binary to disk, you are 'serializing' the whole structure to disk. You can read the structure from the disk at any time. This is fairly basic. Or am I missing something in what you are saying?

A generic approach is slightly more involved than just writing the structure to disk. Problems that will need to be overcome include:

The Endianness of the machine
Expanding pointers
Avoiding cyclic loops whilst following pointers

This means that knowledge of the structure is important, which is where XML could step in to provide that in a portable format.
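To make the endianness point concrete, here is a minimal sketch of writing a 32-bit value in a fixed byte order so that the file looks the same regardless of the machine that produced it. The function names put_u32_be and get_u32_be are made up for illustration, and big-endian is an arbitrary choice; it also assumes 8-bit bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v into out[0..3], most significant byte first (big-endian).
 * The shifts operate on the value, not on memory, so the result does
 * not depend on the byte order of the host CPU. */
void put_u32_be(unsigned char *out, uint32_t v)
{
    assert(out != NULL);
    out[0] = (unsigned char)((v >> 24) & 0xFFu);
    out[1] = (unsigned char)((v >> 16) & 0xFFu);
    out[2] = (unsigned char)((v >> 8) & 0xFFu);
    out[3] = (unsigned char)(v & 0xFFu);
}

/* Rebuild the value from four bytes written by put_u32_be(). */
uint32_t get_u32_be(const unsigned char *in)
{
    assert(in != NULL);
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

Calling put_u32_be(buf, 0x11223344u) leaves the bytes 11 22 33 44 in buf on any host, and get_u32_be(buf) returns the original value.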

jschiwal · 05-04-2006, 07:24 AM

I would suggest you read Chapter 5 of Eric S Raymond's "The Art of Unix Programming".

If possible, the output should be textual. Remember that different processors will save and read binary data differently. Some use big endian, others use little endian. Some use 32-bit integers and others use 64-bit. The same program compiled on different machines might produce non-portable files.

The format that works best depends on the type of information being saved. If you can use a simple Windows 3.1 INI format, that may be fine for your application. For structured records, consider how the /etc/passwd file is structured. XML may be the choice for complicated structures, but it is hard to work with using the standard textual tools such as grep, sed and awk. Also, using it could make a program unnecessarily bulky. Open Office uses it, but it is already bulky.
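As a rough illustration of the colon-separated record style, the sketch below writes a structure as one line of text and parses it back. The struct account and its three fields are hypothetical and much simpler than a real /etc/passwd entry, and the format assumes no field contains a colon or a newline.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record, only for illustration. */
struct account {
    char name[32];
    int  uid;
    char shell[64];
};

/* Format the record as "name:uid:shell\n".
 * Returns the number of characters written, or -1 if buf is too small. */
int account_format(const struct account *a, char *buf, size_t cap)
{
    int n;
    assert(a != NULL && buf != NULL);
    n = snprintf(buf, cap, "%s:%d:%s\n", a->name, a->uid, a->shell);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}

/* Parse one line produced by account_format().
 * %31[^:] reads at most 31 characters up to the first colon and
 * %63[^\n] reads the rest of the line, so neither array can overflow.
 * Returns 0 on success, -1 if the line does not have three fields. */
int account_parse(const char *line, struct account *a)
{
    assert(line != NULL && a != NULL);
    return sscanf(line, "%31[^:]:%d:%63[^\n]",
                  a->name, &a->uid, a->shell) == 3 ? 0 : -1;
}
```

A file of such lines can be searched with grep or split with awk -F: without any knowledge of the program that wrote it.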

sibtay · 05-04-2006, 08:55 AM

great input guys, thanks

I like Hko's idea of making the serialization architecture independent. This can be a very cool medium/long term target. However, initially we have to deal with the problems highlighted by graemef.

Personally I would like to deal with the problem of "Expanding Pointers". This also includes multiple sub-problems, like handling indirections (e.g. what to do with an int*********?).

Serialization of PODs is not an issue at all. As we all know, a structure containing POD data types can simply be written to the file and then read at some later stage. But when a structure contains a pointer data type, the data written to the file would be the "address" which that pointer contains, which is of no use to us.
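One common way around the address problem is to write what the pointer refers to, prefixed by its length, instead of the pointer itself. The following is only a sketch for a single level of indirection: struct note, note_pack and note_unpack are invented names, and the integers are stored in host byte order, so the output is meant to be read back on the same platform only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure with one pointer member. */
struct note {
    int32_t id;
    char   *text;   /* NUL-terminated string owned by the structure */
};

/* Layout: id (4 bytes), text length (4 bytes), text bytes without NUL.
 * Assumes the text is shorter than 4 GiB.
 * Returns the number of bytes written, or 0 if buf is too small. */
size_t note_pack(const struct note *n, unsigned char *buf, size_t cap)
{
    uint32_t len;
    size_t need;
    assert(n != NULL && n->text != NULL && buf != NULL);
    len = (uint32_t)strlen(n->text);
    need = sizeof n->id + sizeof len + len;
    if (need > cap)
        return 0;
    memcpy(buf, &n->id, sizeof n->id);
    memcpy(buf + sizeof n->id, &len, sizeof len);
    memcpy(buf + sizeof n->id + sizeof len, n->text, len);
    return need;
}

/* Rebuild a note from bytes written by note_pack(). Fresh memory is
 * allocated for the text; the caller must free(n->text).
 * Returns 0 on success, -1 on truncated input or allocation failure. */
int note_unpack(struct note *n, const unsigned char *buf, size_t size)
{
    uint32_t len;
    size_t header = sizeof n->id + sizeof len;
    assert(n != NULL && buf != NULL);
    if (size < header)
        return -1;
    memcpy(&n->id, buf, sizeof n->id);
    memcpy(&len, buf + sizeof n->id, sizeof len);
    if (size - header < len)
        return -1;
    n->text = malloc((size_t)len + 1);
    if (n->text == NULL)
        return -1;
    memcpy(n->text, buf + header, len);
    n->text[len] = '\0';
    return 0;
}
```

After unpacking, the new structure holds a different address than the original but the same string contents. Deeper indirection and shared or cyclic pointers need more bookkeeping than this shows.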

I am working on this issue besides my busy schedule at work. Hence the progress would be a little slow. However i'll post on this forum as soon as i find something new and request the same from you guys.

If someone can find relevant reading material on the net, plz post the link here.

Thanks,
Sibtay

aluser · 05-04-2006, 03:25 PM

Quote:

Originally Posted by sibtay

Serialization of PODs is not an issue at all. As we all know a structure containing POD data types can simply be written to the file and then read at some later stage.

I'm not sure we all know this.

endianness
length
newlines
signedness
alignment
padding

char is the only type I can think of that doesn't have cross-platform problems when writing a structure to disk, and actually it kind of has problems too

Or are ints and chars not what you meant by "POD"? I don't see the acronym a lot but guess it means "plain old datatype"
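The padding item in that list is easy to observe. The sketch below uses a hypothetical struct sample and two made-up helper functions to compare the bytes taken by the members with what sizeof reports for the whole structure. The amount of padding is chosen by the compiler and ABI, so no particular number is assumed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure: a 4-byte member between two 1-byte members
 * usually forces the compiler to insert padding. */
struct sample {
    char     tag;
    uint32_t value;
    char     flag;
};

/* Bytes occupied by the members themselves. */
size_t sample_member_bytes(void)
{
    return sizeof(char) + sizeof(uint32_t) + sizeof(char);
}

/* Extra bytes the compiler added for alignment. Their contents are
 * unspecified, yet fwrite(&s, sizeof s, 1, fp) would write them too. */
size_t sample_padding_bytes(void)
{
    assert(sizeof(struct sample) >= sample_member_bytes());
    return sizeof(struct sample) - sample_member_bytes();
}
```

On many common platforms sizeof(struct sample) comes out larger than the 6 bytes of member data, and offsetof(struct sample, value) shows where the gap sits.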

graemef · 05-04-2006, 04:31 PM

I think that the idea is that an int would be saved in an XML-style format (hence a character string) as follows:
<int>46</int>
whilst a double might be saved as:
<double>3.1415</double>

This addresses many of the internal problems of how data is stored. But it does mean that it is important to be able to find out what the datatype is, and that is not trivial: just consider the difference between char *, char [] and char[23]. How do you tell the three apart?
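A small sketch of that tagged text idea, using the <int> and <double> tags from the examples above. The four function names are invented, and this is plain formatted I/O rather than a real XML parser; the caller still has to know which type to expect, which is exactly the open problem.

```c
#include <assert.h>
#include <stdio.h>

/* Write v as "<int>v</int>".
 * Returns the number of characters written, or -1 if buf is too small. */
int write_int_tag(char *buf, size_t cap, int v)
{
    int n;
    assert(buf != NULL);
    n = snprintf(buf, cap, "<int>%d</int>", v);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}

/* Write v as "<double>v</double>". 17 significant digits are enough
 * to reproduce an IEEE 754 double exactly when it is read back. */
int write_double_tag(char *buf, size_t cap, double v)
{
    int n;
    assert(buf != NULL);
    n = snprintf(buf, cap, "<double>%.17g</double>", v);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}

/* Read the value back; returns 0 on success, -1 if the opening tag
 * or the number is missing. The closing tag is not checked. */
int read_int_tag(const char *s, int *out)
{
    assert(s != NULL && out != NULL);
    return sscanf(s, "<int>%d", out) == 1 ? 0 : -1;
}

int read_double_tag(const char *s, double *out)
{
    assert(s != NULL && out != NULL);
    return sscanf(s, "<double>%lf", out) == 1 ? 0 : -1;
}
```

Because the file holds decimal text, byte order and integer width on the writing machine no longer matter, as long as the value fits the type used by the reader.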

sibtay · 05-05-2006, 03:40 AM

Quote:

Originally Posted by aluser

I'm not sure we all know this.

endianness
length
newlines
signedness
alignment
padding

char is the only type I can think of that doesn't have cross-platform problems when writing a structure to disk, and actually it kind of has problems too

Or are ints and chars not what you meant by "POD"? I don't see the acronym a lot but guess it means "plain old datatype"

Rephrasing my earlier statement:

Serialization of PODs is not an issue at all. Limiting the implementation to a single platform for now, we all know that a structure containing POD data types can simply be written to the file and then read at some later stage.

ioerror · 05-05-2006, 07:32 AM

Quote:

Serialization of PODs is not an issue at all. Limiting the implementation to a single platform for now, we all know that a structure containing POD data types can simply be written to the file and then read at some later stage.

Of course it's an issue, even on a single system. Programs compiled with different compiler options/optimizations might use different alignment etc.

I suggest re-reading jschiwal's post several times. All output should be textual without a very good reason to do otherwise.

A generic serialization library sounds pretty inefficient to me. An alternative would be to have some sort of program/script which would create the i/o functions for each structure at compile time. Have a look at the src for Freeciv, they have a python script which does exactly this.
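For what it is worth, a similar effect can be had without an external script by letting the C preprocessor generate the per-structure code. The sketch below uses the "X-macro" trick; it is not how Freeciv does it, and POINT_FIELDS, struct point and point_to_text are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Single description of the structure: type, member name, printf format. */
#define POINT_FIELDS(X)        \
    X(int,    x,      "%d")    \
    X(int,    y,      "%d")    \
    X(double, weight, "%.17g")

/* Expand the list once to declare the members... */
struct point {
#define DECLARE_FIELD(type, name, fmt) type name;
    POINT_FIELDS(DECLARE_FIELD)
#undef DECLARE_FIELD
};

/* ...and once more to generate a "name=value" line per member.
 * Returns the number of characters written, or -1 if buf is too small. */
int point_to_text(const struct point *p, char *buf, size_t cap)
{
    size_t used = 0;
    assert(p != NULL && buf != NULL);
#define WRITE_FIELD(type, name, fmt)                               \
    do {                                                           \
        int n = snprintf(buf + used, cap - used,                   \
                         #name "=" fmt "\n", p->name);             \
        if (n < 0 || (size_t)n >= cap - used)                      \
            return -1;                                             \
        used += (size_t)n;                                         \
    } while (0);
    POINT_FIELDS(WRITE_FIELD)
#undef WRITE_FIELD
    return (int)used;
}
```

Adding a member to POINT_FIELDS updates both the structure and its writer, so the two cannot drift apart. A matching reader could be generated from the same list.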

sibtay · 05-05-2006, 07:51 AM

By single platform I meant a single compiler, OS, hardware, etc.

Textual output is one good alternative but it may prove expensive as compared to binary output.

Data written in the form of text *has* to be converted back to its binary form during reading. Whereas if data is written as binary you don't have to face this overhead.
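The difference between the two read paths can be sketched as follows. The function names are hypothetical; the binary path assumes the bytes were produced on the same platform, and the text path shows the parsing and checking work that reading text involves.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Binary path: the stored bytes already are the in-memory
 * representation, so reading is a plain copy. */
int int_from_bytes(const unsigned char *bytes)
{
    int v;
    assert(bytes != NULL);
    memcpy(&v, bytes, sizeof v);
    return v;
}

/* Text path: the decimal digits have to be parsed and range-checked.
 * Returns 0 on success, -1 if s holds no number or it does not fit. */
int int_from_text(const char *s, int *out)
{
    char *end;
    long v;
    assert(s != NULL && out != NULL);
    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s || errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return -1;
    *out = (int)v;
    return 0;
}
```

Whether the extra parsing matters in practice depends on how much data is read and how often; the sketch only shows where the cost comes from.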

Quote:

A generic serialization library sounds pretty inefficient to me

Again with text vs binary .... the text version would be inefficient.

Quote:

An alternative would be to have some sort of program/script which would create the i/o functions for each structure at compile time. Have a look at the src for Freeciv, they have a python script which does exactly this.

Thanks, i'll check it out.

Currently i am not for/against any approach. Just considering the merits/demerits of all possible approaches.

regards,
Sibtay

ioerror · 05-05-2006, 08:46 AM

Quote:

Textual output is one good alternative but it may prove expensive as compared to binary output.

Data written in the form of text *has* to be converted back to its binary form during reading. Whereas if data is written as binary you don't have to face this overhead.

Efficiency isn't the point with text files. Data files should preferably be human readable whenever possible (so that a person doesn't need special tools to read them), though of course, text files are not appropriate for everything.

What I meant was, it would be inefficient compared to a custom coded function that writes binary output (I should have been more specific).

Quote:

Currently i am not for/against any approach. Just considering the merits/demerits of all possible approaches.

Indeed. My objections were just off the top of my head, I haven't investigated the idea in any great depth.