Maybe you have answered this a thousand times, but please help anyway.
I am developing an application involving several programs on Linux that share write access to a local file, i.e. multiple writers and perhaps some readers. I have two very basic questions about the file read-write operations:
1) When the writers and readers are operating at high speed, how do I make sure a reader either reads a full record or reads nothing? Each record has a header that gives the number of bytes in the record. Is the following OK?
typedef struct rec_tag {
    int iLen; // entire rec length including this 'int'
    char buffer[65536]; //max 64KB, arbitrary
} rec_t;
rec_t rec;
/*writer.cc------------------------------------------*/
int fd = open( "mylist", O_RDWR|O_CREAT|O_APPEND|O_SYNC, 00664 );
...
while (1) {
    ...
    write( fd, &rec, sizeof(rec.iLen)+rec.iLen ); // write it in one-go.
}
/*reader.cc------------------------------------------*/
int fd = open( "mylist", O_RDONLY );
...
while (1) {
    int xx = read( fd, &rec.iLen, sizeof(rec.iLen) );
    if (xx == sizeof(rec.iLen)) {
        int yy = read( fd, rec.buffer, rec.iLen );
        ...
    }
}
Let's just assume sizeof(rec.iLen) is 4 for simplicity.
Would xx always be 0 (nothing read), 4 (good, okay), or -1 (error), and never something between 0 and 4?
Could yy ever be 0?
/////////////////////////
2) If the above question is clarified, then to guarantee only 1 writer can write to the file at a time, is the following the right thing to do?
/*multiwriter.cc---------------------*/
int fd = open( "alerts", O_RDWR|O_CREAT|O_APPEND|O_SYNC, 00664 );
...
if (flock( fd, LOCK_EX ) == 0) {
    // do the write() thing as in the first question.
    flock( fd, LOCK_UN );
}
else {
    // look for another chance to write
}
/*multireader.cc---------------------*/
int fd = open( "alerts", O_RDONLY );
...
// just read() as in the first question.
Some error checking steps are skipped for illustration purposes. I need to know if I'm doing it right. Please tell me. Thanks very much.
Last edited by Jerry Mcguire; 11-04-2009 at 02:58 AM.
Quote:
Originally Posted by Jerry Mcguire
1) when the writers and readers are working on high speed operations, how to make sure the reader either read a full record, or read nothing? Ok, each record has a header that tells the number of bytes in the record. Is the following ok?
typedef struct rec_tag {
int iLen; // entire rec length including this 'int'
char buffer[65536]; //max 64KB, arbitrary
} rec_t;
rec_t rec;
That is easier than you seem to think. Reading or writing an arbitrary block with the read() and write() syscalls is already an atomic operation.
So no locks are needed, not even for writing a block, as long as you:
Read() an entire struct at once.
Use O_APPEND on the open() call for adding new structs at the end of the file.
Don't use O_APPEND on the open() call if you want to overwrite structs that already exist in the file.
Then reading or writing the block (struct) either succeeds entirely, or fails entirely, returning an error (-1).
Note that it is possible for read() to "half-succeed" and return less than the block size requested. But in that case there was not a complete block at the end of the file, so the file was corrupt, which in your case is an error too.
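A minimal sketch of the reader side of this fixed-size approach, under stated assumptions: the names fixed_rec_t, read_fixed_record and REC_PAYLOAD are hypothetical, and the payload is shrunk to 512 bytes for illustration (the thread uses 65536). It only shows how the three outcomes described above (whole record, nothing, short read) can be told apart; it does not by itself prove anything about atomicity.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

#define REC_PAYLOAD 512              /* hypothetical size; the thread uses 65536 */

typedef struct {
    int  iLen;                       /* number of meaningful bytes in buffer */
    char buffer[REC_PAYLOAD];
} fixed_rec_t;

/* Reads one whole fixed-size record with a single read() call.
 * Returns  1 if a complete record was read,
 *          0 at a clean end of file (nothing read),
 *         -1 on error; a short read (partial trailing record) is
 *            reported as -1 with errno set to EIO. */
int read_fixed_record(int fd, fixed_rec_t *rec)
{
    assert(rec != NULL);

    ssize_t n;
    do {
        n = read(fd, rec, sizeof *rec);
    } while (n < 0 && errno == EINTR);   /* retry if interrupted by a signal */

    if (n == (ssize_t)sizeof *rec)
        return 1;
    if (n == 0)
        return 0;
    if (n > 0)
        errno = EIO;                     /* incomplete record at end of file */
    return -1;
}
```

A reader loop would call read_fixed_record() until it returns 0, and treat -1 as a corrupt file or I/O error. Note that after a short read the file offset has already moved past the partial bytes.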
I am not sure what you want to use iLen for... You said it is the struct length. But the struct length is always the same: char buffer[65536] is always 65536 bytes, and int iLen is always 4 bytes.
So I wonder why store iLen? If it is to indicate up to where the bytes in buffer[] contain valid data, then it makes sense to me, but in that case why have it include the 4 bytes of the int iLen itself?
If it is meant to be read first, to know how many bytes to read next, that is wrong: read()-ing a struct is then no longer atomic (two read operations on one struct are by definition not atomic). It is also not needed, since the structs are always the same size.
Thanks Hko. Indeed, iLen stores the length of the valid content that follows. So with variable-length records it is not possible to pass a fixed length to read(). What should be done to overcome this?
I suggest (and assumed when I wrote my previous post) reading and writing entire records/structs. Use iLen to indicate how many bytes of buffer are meaningful.
So even if you have, say, two meaningful bytes in buffer, read and write all 65536, but set iLen = 2.
This may cause some wasted space on the filesystem, but it is the easiest approach. Otherwise you should use locking, or it will become a complex issue to make sure reads/writes are atomic.
The amount of wasted disk space can be mitigated by lseek()-ing forward over the unused bytes in the buffers, so the filesystem may be able to make the file(s) sparse (i.e. not really wasting all the space).
Also, you should try to make buffer as small as possible, e.g. if you are never going to write more than 2000 bytes to buffer, just don't make it 65536 bytes big.
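The lseek() idea above could be sketched roughly as follows. This is only an illustration under several assumptions: write_slot, SLOT_PAYLOAD and SLOT_SIZE are hypothetical names, the file is opened without O_APPEND (with O_APPEND every write goes to the end of the file regardless of lseek()), and there is a single writer or a lock is held, because a record now takes more than one syscall and the one-write()-per-record property is given up. ftruncate() is swapped in to extend the file to the end of the last slot, since seeking past the end does not by itself grow the file.

```c
#include <assert.h>
#include <fcntl.h>      /* open() and O_* flags, used by callers */
#include <unistd.h>

#define SLOT_PAYLOAD 65536
#define SLOT_SIZE    ((off_t)(sizeof(int) + SLOT_PAYLOAD))

/* Writes record number idx into its fixed-size slot, but only transfers
 * the bytes that are actually used. The untouched tail of the slot is
 * never written, so a filesystem that supports holes may leave it
 * unallocated. Returns 0 on success, -1 on failure. */
int write_slot(int fd, off_t idx, const void *data, int len)
{
    assert(idx >= 0 && len >= 0 && len <= SLOT_PAYLOAD);

    off_t base = idx * SLOT_SIZE;
    if (lseek(fd, base, SEEK_SET) < 0)
        return -1;
    if (write(fd, &len, sizeof len) != (ssize_t)sizeof len)
        return -1;
    if (len > 0 && write(fd, data, (size_t)len) != len)
        return -1;

    /* Make sure the file extends to the end of this slot without
     * writing the unused bytes. */
    off_t end = lseek(fd, 0, SEEK_END);
    if (end < 0)
        return -1;
    if (end < base + SLOT_SIZE && ftruncate(fd, base + SLOT_SIZE) != 0)
        return -1;
    return 0;
}
```

Readers can still fetch a whole slot with one read() of SLOT_SIZE bytes; the unused bytes read back as zeros. Whether any disk space is actually saved depends on the filesystem.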
Hmm... Please comment on whether my logic makes sense:
If the file layout contains fixed length records, simple read() and write() operations with the record should suffice, because write() is atomic.
If the file layout contains variable length content, notated as
{ length of content following, actual content }
in my case, then the reader is left with only 2 choices to process the file:
A) as mentioned earlier, do 2 read()'s: one for the length, one for the actual content.
B) do read() with a maximum possible length and process whatever is read until an incomplete content is hanging and repeat this process.
I think A) is better in many ways because it is by all means simpler, and because it can pick up where it left off by saving the lseek(fd,0,SEEK_CUR) offset after each processing of the record. (can't afford to repeat or skip data if anything dies except me).
Quote:
Originally Posted by Jerry Mcguire
[...]
in my case, then the reader is left with only 2 choices to process the file:
A) as mentioned earlier, do 2 read()'s: one for the length, one for the actual content.
...and hold a lock while doing two read()s...
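A rough sketch of option A with a lock held across both read()s. The names read_full, read_var_record and MAX_PAYLOAD are hypothetical, and iLen is assumed to hold only the length of the content that follows it. The shared flock() only helps if the writers take an exclusive flock() on the same file, since these locks are advisory. The sketch also rewinds to the start of the record when it finds it incomplete, so the reader can retry from the same offset later.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* open() and O_* flags, used by callers */
#include <sys/file.h>   /* flock() */
#include <unistd.h>

#define MAX_PAYLOAD 65536

/* Keeps calling read() until n bytes arrive, end of file, or an error.
 * Returns the number of bytes actually read, or -1 on error. */
static ssize_t read_full(int fd, void *buf, size_t n)
{
    size_t got = 0;
    while (got < n) {
        ssize_t r = read(fd, (char *)buf + got, n - got);
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (r == 0)
            break;                       /* end of file */
        got += (size_t)r;
    }
    return (ssize_t)got;
}

/* Reads one { length, content } record while holding a shared lock.
 * Returns the content length (>= 0) on success, -1 on error, or -2 if
 * no complete record is available yet; in the -2 case the file offset
 * is put back where it was before the call. */
int read_var_record(int fd, char *payload, size_t cap)
{
    assert(payload != NULL);

    off_t start = lseek(fd, 0, SEEK_CUR);    /* remember where we were */
    if (start < 0)
        return -1;
    if (flock(fd, LOCK_SH) != 0)
        return -1;

    int len = 0;
    int result = -2;
    ssize_t n = read_full(fd, &len, sizeof len);
    if (n < 0) {
        result = -1;
    } else if (n == (ssize_t)sizeof len) {
        if (len < 0 || (size_t)len > cap) {
            errno = EINVAL;                  /* implausible length header */
            result = -1;
        } else {
            n = read_full(fd, payload, (size_t)len);
            if (n < 0)
                result = -1;
            else if (n == len)
                result = len;
        }
    }
    if (result == -2)
        lseek(fd, start, SEEK_SET);          /* retry from here next time */

    flock(fd, LOCK_UN);
    return result;
}
```

After a successful call, lseek(fd, 0, SEEK_CUR) gives the offset of the next record, which is the value the original poster wants to save for picking up where the reader left off.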
Quote:
Originally Posted by Jerry Mcguire
B) do read() with a maximum possible length and process whatever is read until an incomplete content is hanging and repeat this process.
Depending on what your data in buffer is actually representing (i.e. is the meaningful chunk of data never more than the buffer size?), IMHO the best option is the first I mentioned: forget about iLen, and write/read blocks of 65536 at once (or whatever buffer size you choose, as long as it is the same all the time).
Sorry to keep bothering you; please don't get mad at my nagging questions.
I don't get it, if the writers can only append and never modify any written record, isn't it quite safe for the readers to traverse the file without locks?
If write() is atomic, then a record is always complete in the file as long as write() writes the length and content in one call.
So, if the reader is able to read the length portion, it is guaranteed to read the content portion next, would it not?
Just a side note: you should malloc the record struct, because 64KB is quite a bit to put on the stack. Also, if you're concerned with efficiency, stat/fstat will tell you the best size increment to read the file in (the st_blksize field), which is based on the block size of the underlying file system.
Kevin Barry
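Both points in this side note can be sketched in a few lines. The helper names alloc_record and preferred_io_size are hypothetical; the struct is the one from the first post, and the size hint comes from the st_blksize field that stat() fills in.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>

typedef struct rec_tag {
    int  iLen;
    char buffer[65536];
} rec_t;

/* Allocates one zero-filled record on the heap instead of the stack.
 * Returns NULL if the allocation fails; the caller must free() it. */
rec_t *alloc_record(void)
{
    return calloc(1, sizeof(rec_t));
}

/* Returns the preferred I/O block size for the file at path, as
 * reported by stat(), or -1 if stat() fails. */
long preferred_io_size(const char *path)
{
    assert(path != NULL);

    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_blksize;
}
```

With an already open descriptor, fstat(fd, &st) fills in the same field.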
Quote:
Originally Posted by Jerry Mcguire
I don't get it, if the writers can only append and never modify any written record, isn't it quite safe for the readers to traverse the file without locks?
Yes. It is even safe if the writers do change a record that already exists, if writing happens atomically.
Quote:
Originally Posted by Jerry Mcguire
If write() is atomic, then a record is always complete in the file as long as write() writes the length and content in one call.
Yes.
Quote:
Originally Posted by Jerry Mcguire
So, if the reader is able to read the length portion, it is guaranteed to read the content portion next, would it not?
Yes, provided that (again) writing occurs atomically.
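For the writer side, a minimal sketch of "length and content in one write() call", combined with the flock() from the second question. The names append_record and MAX_PAYLOAD are hypothetical, the descriptor is assumed to have been opened with O_APPEND as in the original post, and the header is assumed to hold only the length of the content that follows. A short write is reported as an error because it would leave a partial record at the end of the file.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* open() and O_* flags, used by callers */
#include <stdlib.h>
#include <string.h>
#include <sys/file.h>   /* flock() */
#include <unistd.h>

#define MAX_PAYLOAD 65536

/* Appends one { length, content } record with a single write() call
 * while holding an exclusive lock. fd is assumed to be opened with
 * O_APPEND. Returns 0 on success, -1 on failure with errno set. */
int append_record(int fd, const void *data, size_t len)
{
    assert(data != NULL || len == 0);
    if (len > MAX_PAYLOAD) {
        errno = EMSGSIZE;
        return -1;
    }

    /* Build header and content in one contiguous heap buffer. */
    int hdr = (int)len;
    size_t total = sizeof hdr + len;
    char *rec = malloc(total);
    if (rec == NULL)
        return -1;
    memcpy(rec, &hdr, sizeof hdr);
    if (len > 0)
        memcpy(rec + sizeof hdr, data, len);

    int rc = -1;
    int saved = 0;
    if (flock(fd, LOCK_EX) == 0) {       /* blocks until the lock is granted */
        ssize_t n;
        do {
            n = write(fd, rec, total);
        } while (n < 0 && errno == EINTR);

        if (n == (ssize_t)total)
            rc = 0;
        else
            saved = (n < 0) ? errno : EIO;   /* short write: partial record */
        flock(fd, LOCK_UN);
    } else {
        saved = errno;
    }

    free(rec);
    if (rc != 0)
        errno = saved;
    return rc;
}
```

Note that flock(fd, LOCK_EX) without LOCK_NB waits for the lock, so the "look for another chance to write" branch in the original code is only reached on an error. To skip the write when the file is busy, LOCK_EX | LOCK_NB can be used instead, in which case flock() fails with EWOULDBLOCK.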