LinuxQuestions.org
Old 11-04-2009, 01:54 AM   #1
Jerry Mcguire
Member
 
Registered: Jul 2009
Location: Hong Kong SAR
Distribution: RedHat, Fedora
Posts: 201

Rep: Reputation: 31
shared write on local file


Hi all,

You may have answered this a thousand times, but please help anyway.

I am developing an application involving several programs on Linux which all write to a shared local file, i.e. multiple writers and perhaps some readers. I have two very basic questions about the file read/write operations:

1) When the writers and readers are operating at high speed, how can I make sure a reader either reads a full record or reads nothing? Each record has a header that gives the number of bytes in the record. Is the following OK?

typedef struct rec_tag {
int iLen; // entire rec length including this 'int'
char buffer[65536]; //max 64KB, arbitrary
} rec_t;

rec_t rec;

/*writer.cc------------------------------------------*/
int fd = open( "mylist", O_RDWR|O_CREAT|O_APPEND|O_SYNC, 00664 );
...
while (1) {
...
write( fd, &rec, sizeof(rec.iLen)+rec.iLen ); // write it in one-go.


/*reader.cc------------------------------------------*/
int fd = open( "mylist", O_RDONLY );
...
while (1) {
int xx = read( fd, &rec.iLen, sizeof(rec.iLen) );
if (xx == sizeof(rec.iLen)) {
int yy = read( fd, rec.buffer, rec.iLen );
...

Let's just assume sizeof(rec.iLen) is 4 for simplicity.
Would xx always be either 0 (nothing read), 4 (good, okay), or -1 (error), and never something between 0 and 4?
Could yy ever be 0?

/////////////////////////
2) If the above question is clarified, then to guarantee only 1 writer can write to the file at a time, is the following the right thing to do?

/*multiwriter.cc---------------------*/
int fd = open( "alerts", O_RDWR|O_CREAT|O_APPEND|O_SYNC, 00664 );
...
if (flock( fd, LOCK_EX ) == 0) {
// do the write() thing as in the first question.
flock( fd, LOCK_UN );
}
else {
// look for another chance to write
}

/*multireader.cc---------------------*/
int fd = open( "alerts", O_RDONLY );
...
// just read() as in the first question.


Some error-checking steps are skipped for illustration purposes. I need to know if I'm doing it right. Please tell me. Thanks very much.

Last edited by Jerry Mcguire; 11-04-2009 at 02:58 AM.
 
Old 11-04-2009, 02:58 AM   #2
Hko
Senior Member
 
Registered: Aug 2002
Location: Groningen, The Netherlands
Distribution: Debian
Posts: 2,536

Rep: Reputation: 111
Quote:
Originally Posted by Jerry Mcguire View Post
1) When the writers and readers are operating at high speed, how can I make sure a reader either reads a full record or reads nothing? Each record has a header that gives the number of bytes in the record. Is the following OK?

typedef struct rec_tag {
int iLen; // entire rec length including this 'int'
char buffer[65536]; //max 64KB, arbitrary
} rec_t;

rec_t rec;
That is easier than you seem to think. Reading and writing an arbitrary block using the read() and write() syscalls is already an atomic operation.

So no locks are needed, not even for writing a block, as long as you:
  • read() an entire struct at once.
  • Use O_APPEND on the open() call when adding new structs at the end of the file.
  • Don't use O_APPEND on the open() call if you want to overwrite structs that already exist in the file.

Then reading or writing the block (struct) either succeeds entirely, or fails entirely, returning an error (-1).
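The fixed-size approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names append_record, fixed_rec_t and REC_DATA_MAX are hypothetical, the 2000-byte cap is arbitrary, and the sketch relies, as the post does, on one write() per record not being interleaved with other writers' appends. It still checks for a short write, because write() is allowed to return fewer bytes than requested.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define REC_DATA_MAX 2000   /* hypothetical payload cap, not from the thread */

typedef struct {
    int  iLen;                  /* number of meaningful bytes in buffer */
    char buffer[REC_DATA_MAX];  /* always written in full */
} fixed_rec_t;

/* Append one whole fixed-size record to `path`.
 * Returns 0 on success, -1 on error or short write. */
int append_record(const char *path, const void *data, int len)
{
    if (len < 0 || len > REC_DATA_MAX)
        return -1;

    fixed_rec_t rec;
    memset(&rec, 0, sizeof rec);            /* zero the unused tail */
    rec.iLen = len;
    memcpy(rec.buffer, data, (size_t)len);

    /* O_APPEND positions every write at the current end of file. */
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0664);
    if (fd < 0)
        return -1;

    ssize_t n = write(fd, &rec, sizeof rec);    /* one call per record */
    int rc = (n == (ssize_t)sizeof rec) ? 0 : -1;

    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

A caller would use it as append_record("mylist", msg, msg_len). Opening and closing the file on every call keeps the sketch short; a long-running writer would normally keep the descriptor open.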

Note that it is possible that read() "half-succeeds", returning less than the block size requested. But in that case there was not a complete block at the end of the file. So in that case the file was corrupt, and thus in your case it is an error too.
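The outcomes of a record read described above (whole block, nothing, partial block, error) can be told apart with a small helper. This is a sketch, not the poster's code: read_record and the read_status values are hypothetical names. One design choice differs from a single bare read(): the helper retries after a short count or EINTR, and only reports a partial record once end of file is actually reached.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

enum read_status { REC_OK, REC_EOF, REC_PARTIAL, REC_ERROR };

/* Try to read exactly `size` bytes into `rec`.
 * REC_OK      : a whole record was read
 * REC_EOF     : nothing left to read
 * REC_PARTIAL : end of file hit in the middle of a record
 * REC_ERROR   : read() failed (errno is set) */
enum read_status read_record(int fd, void *rec, size_t size)
{
    size_t got = 0;

    while (got < size) {
        ssize_t n = read(fd, (char *)rec + got, size - got);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal, retry */
            return REC_ERROR;
        }
        if (n == 0)
            break;                  /* end of file */
        got += (size_t)n;
    }

    if (got == size)
        return REC_OK;
    return got == 0 ? REC_EOF : REC_PARTIAL;
}
```

With fixed-size records, REC_PARTIAL corresponds to the "corrupt file" case mentioned above, while REC_EOF simply means the reader has caught up with the writers.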

I am not sure what you want to use iLen for... You said it is the struct length. But the struct length is always the same: char buffer[65536] is always 65536 bytes, and int iLen is always 4 bytes.

So I wonder why store iLen? If it is to indicate up to where the bytes in buffer[] contain valid data, then it makes sense to me, but in that case why have it include the 4 bytes of the int iLen itself?

If it is meant for first reading iLen to know how many bytes to read next, it is wrong. Then read()-ing a struct is not atomic anymore (two read operations on one struct are by definition not atomic). Also it is not needed, since the structs are always the same size.
 
Old 11-04-2009, 03:05 AM   #3
Jerry Mcguire
Member
 
Registered: Jul 2009
Location: Hong Kong SAR
Distribution: RedHat, Fedora
Posts: 201

Original Poster
Rep: Reputation: 31
Quote:
Originally Posted by Hko View Post
So I wonder why store iLen? If it is to indicate up to where the bytes in buffer[] contain valid data, then it makes sense to me, but in that case why have it include the 4 bytes of the int iLen itself?

If it is meant for first reading iLen to know how many bytes to read next, it is wrong. Then read()-ing a struct is not atomic anymore (two read operations on one struct are by definition not atomic). Also it is not needed, since the structs are always the same size.
Thanks Hko. Indeed iLen stores the length of the valid content that follows. So with a variable-length record it is not possible to pass a fixed length to read(). What should be done to overcome this?
 
Old 11-04-2009, 03:44 AM   #4
Hko
Senior Member
 
Registered: Aug 2002
Location: Groningen, The Netherlands
Distribution: Debian
Posts: 2,536

Rep: Reputation: 111
Quote:
Originally Posted by Jerry Mcguire View Post
Thanks Hko. Indeed iLen stores the length of the valid content that follows. So with a variable-length record it is not possible to pass a fixed length to read(). What should be done to overcome this?
I suggest (and assumed when I wrote my previous post) reading and writing entire records/structs. Use iLen to indicate how many bytes of buffer are meaningful.

So even if you have, say, two meaningful bytes in buffer, read and write all 65536, but set iLen = 2.

This may cause some wasted space on the filesystem, but it is the easiest approach. Otherwise you should use locking, or it will become a complex issue to make sure reads/writes are atomic.

The amount of wasted disk space can be mitigated by lseek()-ing forward over the unused bytes in the buffers, so the filesystem may be able to make the file(s) sparse (i.e. not really wasting all the space).

Also you should try to make buffer as small as possible, e.g. if you are never going to write more than 2000 bytes into buffer, just don't make it 65536 bytes big.
 
Old 11-04-2009, 04:28 AM   #5
Jerry Mcguire
Member
 
Registered: Jul 2009
Location: Hong Kong SAR
Distribution: RedHat, Fedora
Posts: 201

Original Poster
Rep: Reputation: 31
Hmm... please comment on whether my logic makes sense:

If the file layout contains fixed length records, simple read() and write() operations with the record should suffice, because write() is atomic.

If the file layout contains variable length content, notated as
{ length of content following, actual content }
in my case, then the reader is left with only 2 choices to process the file:

A) as mentioned earlier, do 2 read()'s: one for the length, one for the actual content.

B) do one read() with the maximum possible length, process whatever was read until only an incomplete record is left hanging at the end, and then repeat this process.


I think A) is better in many ways because it is by all means simpler, and because it can pick up where it left off by saving the lseek(fd,0,SEEK_CUR) offset after each processing of the record. (can't afford to repeat or skip data if anything dies except me).

??
 
Old 11-04-2009, 06:02 AM   #6
Hko
Senior Member
 
Registered: Aug 2002
Location: Groningen, The Netherlands
Distribution: Debian
Posts: 2,536

Rep: Reputation: 111
Quote:
Originally Posted by Jerry Mcguire View Post
[...]
in my case, then the reader is left with only 2 choices to process the file:

A) as mentioned earlier, do 2 read()'s: one for the length, one for the actual content.
...and hold a lock while doing two read()s...
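Holding a lock across the two read()s could be sketched with flock(), which the first post already uses for the writers. The assumptions here: every writer and reader cooperates by taking the lock (flock() locks are advisory), writers pass the length header and payload in one buffer, and the names locked_append and locked_read are hypothetical. Readers take a shared lock so they do not block each other.

```c
#include <sys/file.h>
#include <sys/types.h>
#include <unistd.h>

/* Writer: hold an exclusive lock while one record is written.
 * `rec` holds the int length header followed by the payload.
 * Returns 0 on success, -1 on failure. */
int locked_append(int fd, const void *rec, size_t size)
{
    if (flock(fd, LOCK_EX) != 0)
        return -1;

    ssize_t n = write(fd, rec, size);
    int rc = (n == (ssize_t)size) ? 0 : -1;

    if (flock(fd, LOCK_UN) != 0)
        rc = -1;
    return rc;
}

/* Reader: hold a shared lock across the header read and payload read.
 * Returns the payload length, -1 if a whole record that fits in `cap`
 * bytes could not be read (file offset restored), or -2 if locking or
 * seeking failed. */
long locked_read(int fd, char *buf, size_t cap)
{
    if (flock(fd, LOCK_SH) != 0)
        return -2;

    long result = -1;
    int len = 0;
    off_t start = lseek(fd, 0, SEEK_CUR);

    if (start == (off_t)-1) {
        result = -2;
    } else if (read(fd, &len, sizeof len) == (ssize_t)sizeof len &&
               len >= 0 && (size_t)len <= cap &&
               read(fd, buf, (size_t)len) == (ssize_t)len) {
        result = len;
    } else if (lseek(fd, start, SEEK_SET) == (off_t)-1) {
        result = -2;                /* could not rewind */
    }

    if (flock(fd, LOCK_UN) != 0)
        result = -2;
    return result;
}
```

flock(fd, LOCK_EX) blocks until the lock is granted; adding LOCK_NB makes it return immediately instead, which would match the "look for another chance to write" branch in the first post.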

Quote:
Originally Posted by Jerry Mcguire View Post
B) do one read() with the maximum possible length, process whatever was read until only an incomplete record is left hanging at the end, and then repeat this process.
Depending on what the data in buffer actually represents (i.e. is the meaningful chunk of data never more than the buffer size?), IMHO the best option is the first one I mentioned: forget about iLen, and write/read blocks of 65536 at once (or whatever buffer size you choose, as long as it is the same all the time).
 
Old 11-04-2009, 07:41 PM   #7
Jerry Mcguire
Member
 
Registered: Jul 2009
Location: Hong Kong SAR
Distribution: RedHat, Fedora
Posts: 201

Original Poster
Rep: Reputation: 31
Sorry to keep bothering you; please don't get mad at my nagging question marks.
I don't get it, if the writers can only append and never modify any written record, isn't it quite safe for the readers to traverse the file without locks?

If write() is atomic, then a record is always complete in the file as long as write() writes the length and content in one call.

So, if the reader is able to read the length portion, it is guaranteed to read the content portion next, would it not?
 
Old 11-04-2009, 08:15 PM   #8
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Rep: Reputation: Disabled
Quote:
Originally Posted by Jerry Mcguire View Post
char buffer[65536]; //max 64KB, arbitrary
Just a side note: you should malloc this because it's quite a bit to put on the stack. Also, if you use stat/fstat, it will tell you the best size increment to read the file in, which is based on the block size of the underlying file system, if you're concerned with efficiency.
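Both suggestions can be combined in a few lines: fstat() reports the preferred I/O block size in st_blksize, and the buffer is taken from the heap rather than declared as a large array. This is a sketch only; alloc_io_buffer is a hypothetical helper, and the 4096-byte fallback is an assumption for the case where st_blksize is not usable.

```c
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Allocate a heap buffer sized to the preferred I/O block size of the
 * file behind `fd`. On success returns the buffer and stores its size
 * in *size_out; the caller must free() it. Returns NULL on failure. */
char *alloc_io_buffer(int fd, size_t *size_out)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return NULL;

    /* Fallback value is an assumption, not something from the thread. */
    size_t size = st.st_blksize > 0 ? (size_t)st.st_blksize : 4096;

    char *buf = malloc(size);
    if (buf != NULL)
        *size_out = size;
    return buf;
}
```

Note that st_blksize is a hint for efficient I/O, not a record size; a record larger than one block still needs a buffer big enough to hold it.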
Kevin Barry
 
Old 11-14-2009, 06:59 AM   #9
Hko
Senior Member
 
Registered: Aug 2002
Location: Groningen, The Netherlands
Distribution: Debian
Posts: 2,536

Rep: Reputation: 111
Quote:
Originally Posted by Jerry Mcguire View Post
I don't get it, if the writers can only append and never modify any written record, isn't it quite safe for the readers to traverse the file without locks?
Yes. It is even safe if the writers do change a record that already exists, if writing happens atomically.

Quote:
Originally Posted by Jerry Mcguire View Post
If write() is atomic, then a record is always complete in the file as long as write() writes the length and content in one call.
Yes.

Quote:
Originally Posted by Jerry Mcguire View Post
So, if the reader is able to read the length portion, it is guaranteed to read the content portion next, would it not?
Yes, provided that (again) writing occurs atomically.
 
  

