Synchronization - How to synchronize reading of segmented file by multiple children processes
Cheers. I am given a text file, say of 10 lines, which is divided into segments. For example, we have 5 segments, each containing 2 lines (segment 1 holds lines 1 and 2, segment 2 holds lines 3 and 4, and so on).
I want to spawn child processes which randomly select a segment and retrieve it in a SYNCHRONIZED way. To make that connection between the parent and the child processes I must use shared memory. So I fork() the children; each selects a segment, which is then requested from the parent process, which acts like a server: it receives requests and places the requested segment into shared memory, to be read by the children. My goal is to synchronize the scenario described below.
Quote:
However, when a segment has been loaded into the shared memory then it can serve any incoming requests asking for that segment without mutual exclusion. When there is no longer need for that segment and a request has been made for another, then we switch the segment in the shared memory. Requests that cannot be executed since the requested segment is NOT currently in the shared memory, are blocked until the segment desired is placed into memory. They are then recalled using FIFO strategy.
For the purposes of the exercise I am asked to use semaphores to achieve this FIFO order of wakeups. So when a process is blocked, it is saved in a QUEUE inside the semaphore, and when the semaphore is then "signaled", it unblocks the first blocked process.
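To make the discussion concrete, here is roughly what I imagine the per-segment semaphore layout in shared memory could look like (the constants and field names here are only my guesses, not something given in the assignment):
Code:
#include <semaphore.h>

#define NUM_SEGMENTS      5     /* assumed: 5 segments                */
#define LINES_PER_SEGMENT 2     /* assumed: 2 lines per segment       */
#define MAX_LINE_LEN      256   /* assumed maximum length of one line */

struct shared {
    int   currentSegment;                        /* segment currently loaded, -1 if none */
    int   counter;                               /* readers currently using that segment */
    sem_t array_of_semaphores[NUM_SEGMENTS];     /* one per segment: a request for a segment
                                                    that is not loaded would sem_wait() on its
                                                    entry and be sem_post()ed once the server
                                                    loads that segment; each must be created
                                                    with sem_init(&sem, 1, 0) so that it is
                                                    shared between processes */
    char  data[LINES_PER_SEGMENT][MAX_LINE_LEN]; /* the lines of the currently loaded segment */
};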
I am aware that I am looking at a variation of the one writer / multiple readers problem. However, I find it very difficult to continue and cannot see how to place the semaphores so as to achieve the desired result. I also know that I must use an array of semaphores, one for each segment. I have written some C code, but I am missing the main synchronization scheme needed to achieve the above.
Here is some C-style pseudocode of what I have so far.
Code:
void childProcess(int childId, int requestsPerChild, struct shared *shared, struct childRequest *requestBuffer)
{
    for (int i = 0; i < requestsPerChild; i++) {
        int segment = rand() % NUM_SEGMENTS;   // randomly pick a segment to request

        // Somewhere here I THINK a sem_wait() must be placed for all requests
        // that want a segment that is not in memory
        // ????
        // if (currentSegment != segment)
        //     sem_wait(&shared->array_of_semaphores[segment]);
        // ????, and this semaphore must be "posted" when counter == 0

        // Place request
        requestBuffer->segment = segment;
        requestBuffer->childID = childId;

        sem_post(place_request);
        sem_wait(wait_response);

        // If we reach here, the requested segment is in memory
        // We can READ it now...
    }
}

int main()
{
    FILE *fp = fopen("derised_text_file.txt", "r");
    int segments = 5;
    int lines_per_segment = total_number_of_lines(fp) / segments;

    // Assume we have attached and gotten the shared memory
    shm_ptr = get_shared_memory(sizeof(shared));
    request_buffer = get_shared_memory(sizeof(childRequest));

    for (int i = 0; i < segments; i++)
        sem_init(&shm_ptr->array_of_semaphores[i], 1, ??);   // process-shared; initial value still unclear

    int childrenProcesses = 15;
    int requestsPerChild = 5;

    for (int i = 0; i < childrenProcesses; i++) {
        if (fork() == 0)
            childProcess(i, requestsPerChild, shm_ptr, request_buffer);
    }

    // Serve the requested segment into memory
    for (int i = 0; i < childrenProcesses * requestsPerChild; i++) {
        // Wait for a request to be placed
        sem_wait(place_request);

        char **data;   // TO PLACE LINES INTO

        sem_post(wait_response);
    }
}
I don't know how I am going to keep track of which segment is the next one to be unblocked. E.g. if I am serving segment 1 and requests for 3, 4, 4 come in, how do I switch to 3 when I am done with 1 (since it came first), and then, when I am done with that, unblock both requests needing 4? Also, since we know which blocked request for a segment is the first and which is the last, I think those should be the ones responsible for the communication with the parent/server.
Any help/hint would be greatly appreciated, as I feel like I have hit a wall.
And I feel you have forgotten your previous question.
Hi! Well no, because now I am informed that I have to do the FIFO management of the processes via semaphores, so I will not be explicitly using a FIFO queue created by me, but one created indirectly via the semaphores.
You could quote the actual text of your homework instead of your interpretation.
Hi again. This is what I have done. I have translated and quoted the actual text as I was given it, in the 3rd paragraph. I just have not placed it in quote tags.
Quote:
And I feel you have forgotten your previous question.
@gzorp: I think the point being made is that you never responded to those who provided helpful replies in your other thread but simply abandoned it, and them. Please be respectful of the time that others put into reading your questions and sharing their knowledge - sharing is an interactive process!
And, per the LQ Rules, please do not post homework assignments verbatim. We're happy to assist if you have specific questions or have hit a stumbling point, however. Let us know what you've already tried and what references you have used (including class notes, books, and Google searches) and we'll do our best to help. Also, keep in mind that your instructor might also be an LQ member.
Quote:
Hi! Well no, because now I am informed that I have to do the FIFO management of the processes via semaphores, so I will not be explicitly using a FIFO queue created by me, but one created indirectly via the semaphores.
Why do you think this? Is this from part of the assignment that you haven't posted here?
Quote:
@gzorp: I think the point being made is that you never responded to those who provided helpful replies in your other thread but simply abandoned it, and them. Please be respectful of the time that others put into reading your questions and sharing their knowledge - sharing is an interactive process!
And, per the LQ Rules, please do not post homework assignments verbatim. We're happy to assist if you have specific questions or have hit a stumbling point, however. Let us know what you've already tried and what references you have used (including class notes, books, and Google searches) and we'll do our best to help. Also, keep in mind that your instructor might also be an LQ member.
Thanks for your answer. I did not abandon the thread. I had a question for which some useful ideas were provided, but they do not apply to what I want to achieve now, which is why this is a separate question. Since my approach has changed, I cannot report anything new in the old thread, as I will no longer be working on that approach. Also, what constitutes "homework"? I had a question about a given task, which I did my due diligence for; I know my theory, but I have hit a stumbling point, as you say. One thing I have learned to do as a junior computer scientist is to ask questions when I cannot overcome a problem. I never asked for an answer handed to me on a silver platter; I just wanted a hint or an idea/some pseudocode on how to approach it and get unstuck. So I think that even if my professor is also an LQ member, I have not made any mistake I should be condemned for. I will keep your suggestions in mind.
It's been quoted in the 4th paragraph of my question. It's an instruction given by my professor.
Okay, the reason I asked about it is that if you read, e.g., https://manned.org/sem_post.3p you'll see that semaphores don't always guarantee FIFO ordering. So I wonder if you could have misunderstood your professor's instructions.
Are the fields in struct shared also given in the instructions? It seems like you could just use the counter of the semaphore itself (sem_getvalue()), so most of the fields except for array_of_semaphores and data wouldn't be needed.
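For reference, a tiny self-contained example of reading a semaphore's count with sem_getvalue() (link with -pthread on Linux):
Code:
#include <semaphore.h>
#include <stdio.h>

int main(void)
{
    sem_t s;
    sem_init(&s, 0, 2);           /* create a semaphore with an initial count of 2 */

    int value;
    sem_getvalue(&s, &value);     /* stores the current count (here 2) in 'value' */
    printf("count = %d\n", value);

    sem_destroy(&s);
    return 0;
}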
I would suggest adding an exit() at the end of childProcess(), otherwise you will get a lot of unexpected grandchild processes. Also, I don't think it makes any sense to have requestBuffer be in shared memory (because that would prevent you from handling more than one request at a time).
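A minimal sketch of what I mean (the argument list is trimmed down here; the real function would take the shared pointers as in your pseudocode):
Code:
#include <stdlib.h>
#include <unistd.h>

void childProcess(void /* ... real arguments omitted ... */)
{
    /* ... place requests and read segments ... */
    exit(0);   /* terminate the child here; without this it falls back into
                  main()'s loop below and starts fork()ing children of its own */
}

int main(void)
{
    for (int i = 0; i < 15; i++) {
        if (fork() == 0)
            childProcess();
    }
    /* ... parent/server loop ... */
    return 0;
}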
Quote:
Okay, the reason I asked about it is that if you read, e.g., https://manned.org/sem_post.3p you'll see that semaphores don't always guarantee FIFO ordering. So I wonder if you could have misunderstood your professor's instructions.
Yeah I think this is a key point. You need to actually implement a FIFO queue. And then, and only then, you use the semaphores to prevent race conditions.
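Something along these lines, perhaps. This is only a rough sketch: the names and queue layout are made up, error handling is omitted, and it shows only the FIFO mechanism itself (not the part where several waiters of the already-loaded segment are served together). Blocked requests go into an explicit ring buffer in shared memory, each entry carrying its own semaphore; the server wakes entries strictly in the order they were enqueued, which is what actually gives FIFO order, while the semaphores only prevent races and put processes to sleep.
Code:
#include <semaphore.h>

#define QUEUE_SIZE 64            /* assumed capacity; must cover all possible waiters */

/* One pending request: the wanted segment plus a private semaphore that the
   requesting child sleeps on until the server wakes it. */
struct pending {
    int   segment;
    sem_t wakeup;                /* sem_init(&wakeup, 1, 0) once, at setup time */
};

/* A FIFO ring buffer of pending requests, protected by a binary semaphore so
   that enqueue and dequeue do not race. The whole struct lives in shared memory. */
struct fifo {
    struct pending entries[QUEUE_SIZE];
    int    head, tail;           /* dequeue from head, enqueue at tail */
    sem_t  lock;                 /* sem_init(&lock, 1, 1) */
};

/* Child side: enqueue a request, then block on this request's own semaphore. */
void request_segment(struct fifo *q, int segment)
{
    sem_wait(&q->lock);
    struct pending *p = &q->entries[q->tail];
    q->tail = (q->tail + 1) % QUEUE_SIZE;
    p->segment = segment;
    sem_post(&q->lock);

    sem_wait(&p->wakeup);        /* sleep until the server has loaded the segment */
    /* ... read the segment's lines from shared memory here ... */
}

/* Server side: take the OLDEST pending request, load its segment, and wake
   exactly that child. The explicit head/tail order is what provides FIFO. */
void serve_one(struct fifo *q)
{
    sem_wait(&q->lock);
    struct pending *p = &q->entries[q->head];
    q->head = (q->head + 1) % QUEUE_SIZE;
    sem_post(&q->lock);

    /* ... load p->segment's lines into the shared data buffer here ... */

    sem_post(&p->wakeup);        /* wake the child that has been waiting longest */
}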
... and if you can use C++, these things have already been built and perfected for(!) you. You can grab a "FIFO queue" right off the shelf. Most other things, too. The implementations are just as efficient but the human-labor is considerably less. "Do not do a thing already done."
Also always bear in mind the underlying hardware, which is getting and servicing all of the requests coming from everywhere. If the device is "SSD" then this is much less of a problem because "it's really memory," but if the device is rotating there are now mechanical considerations. (Also, [virtual] memory considerations.) If you exceed the physical ability of the hardware, you will probably create a "sophisticated" process that is actually much slower(!) than a simpler one: "dozens of threads, all stuck in traffic on a two-lane road behind a drawbridge."
"Asynchronous message-passing" often turns out to be more efficient (and, more scalable) than "shared memory." Only one process receives requests for information, and sends the information back in a reply. It uses a (non-shared) memory segment as a buffer, or simply relies upon the filesystem to do so. It issues asynchronous requests for data to be retrieved from backing storage, and is notified when each of these requests complete. Only this one process knows – or, needs to know – the full picture.
And, guess what: there are already libraries out there which do all of this, in various languages. Check places like "github" and "sourceforge" and search them very carefully before you set out to write anything "new." (Because, it undoubtedly isn't.)