LinuxQuestions.org
Forums > Programming
02-17-2013, 08:57 AM   #1
johnsfine (Guru; Registered: Dec 2007; Distribution: CentOS; Posts: 5,107)
Perl process synchronization question


I want to modify a large Perl program someone else wrote. I don't know any Perl.

The program reads and processes a bunch of one line commands from a single input file.

The aspect of the program I want to modify is:

It currently reads all the lines and makes an inaccurate guess at the difficulty of each line and distributes the lines into N pools of roughly equal total difficulty. Then it forks N child processes, each of which handles one pool. The parent waits for all the child processes to finish.

I would prefer for each child to read a line, process that line, then read the next available line, not getting any lines other child processes have already taken (and finish when no lines are left). So no estimates of difficulty are needed. When a child process hits a very hard command, it just keeps working on that command while child processes that got easy commands repeatedly complete those and take the next line.

But I have no clue how to coordinate reading the commands. Is there a way to keep the file position in a locked shared variable, so a child could lock the position variable, read a line, update and unlock the shared variable, and then process the line it read?
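The locked-position idea could be sketched with the byte offset kept in a small sidecar "position file" guarded by flock(). This is only an untested illustration, not code from the program: next_shared_line() and the position-file layout are invented names, and the behavior of flock() on Windows should be verified before relying on it.

```perl
use strict;
use warnings;
use Fcntl qw(:flock :seek);

# Each child calls this to claim the next unread line. The flock() on the
# position file is the "locked shared variable": whoever holds the lock
# reads the current offset, pulls one line, and writes the new offset back.
sub next_shared_line {
    my ($datafile, $posfile) = @_;

    open my $pos, '+<', $posfile or die "open $posfile: $!";
    flock($pos, LOCK_EX) or die "flock: $!";   # blocks until we own the lock
    my $offset = <$pos>;
    $offset = 0 unless defined $offset && $offset =~ /^\d+$/;

    open my $in, '<', $datafile or die "open $datafile: $!";
    seek($in, $offset, SEEK_SET);
    my $line = <$in>;                          # undef at EOF -> no work left
    my $newoff = tell($in);
    close $in;

    seek($pos, 0, SEEK_SET);                   # rewrite the stored offset
    truncate($pos, 0);
    print $pos $newoff;
    close $pos;                                # closing releases the lock
    return $line;
}
```

Each child would loop calling next_shared_line() until it returns undef, then exit; the lock is held across the read, so only one child at a time advances the shared offset.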

Do I need to create a pipe for each child and have the parent somehow detect when each child wants the next line and write to that child's pipe?

Or how expensive is fork and wait? Should the parent fork a child for each line? It would need to read a line, fork a child, giving that child the line and the child's index from 1 to N. It would do that N times without waiting, but after the first N, before each fork, it would need to wait for some child to end, then figure out what index (1 to N) that child had been, then refork that index with the next line.

The work per line varies from a tenth of a second up to a few minutes, with an average of about 2 seconds.

Edit: I should have mentioned this runs on Windows, not on Linux. I just read the Perl fork() documentation, including the notes on how it differs on Windows.

Edit2: I coded the following, borrowing scattered lines from elsewhere in the program and from online examples. I tested with a do-nothing version of single_line() and the result is slower than I expected. It is still fast enough to be better than making no change at all to the original program, but I'd appreciate advice on a better way.
Code:
sub multi_line {
  local (*IN);
  open(IN, "< $testfile")
      or error("unable to open $testfile for reading");
  my %pidhash;   # child pid -> worker index (1 .. $parallel_factor)
  while (my $line = <IN>) {
    next if $line =~ /^\s*\#/;   # skip comment lines
    next if $line =~ /^\s*$/;    # skip blank lines
    my $proc = 1 + scalar keys %pidhash;
    my $pid;
    if ( $proc > $parallel_factor ) {
      # All workers busy: wait for one to exit and reuse its index.
      $pid  = wait;
      $proc = $pidhash{$pid};
      delete $pidhash{$pid};
    }
    defined( $pid = fork ) or error("fork failed: $!");
    if ( $pid ) {
      $pidhash{$pid} = $proc;        # parent: record which index the child holds
    }
    else {
      single_line( $proc, $line );   # child: process one line, then exit
      exit(0);
    }
  }
  # Reap any children still running.
  while ( 0 < scalar keys %pidhash ) {
    my $pid = wait;
    delete $pidhash{$pid};
  }
}

Last edited by johnsfine; 02-17-2013 at 10:30 AM.
 
02-17-2013, 10:13 AM   #2
millgates (Member; Registered: Feb 2009; Location: 192.168.x.x; Distribution: Slackware; Posts: 651)
I have to admit I have never done anything like this, so this may be a stupid idea, but how about feeding the lines to a named pipe and having each child flock() the pipe when reading?
 
02-17-2013, 12:43 PM   #3
theNbomr (LQ 5k Club; Registered: Aug 2005; Distribution: OpenSuse, Fedora, Redhat, Debian; Posts: 5,395)
Quote:
Originally Posted by millgates
I have to admit I have never done anything like this, so this may be a stupid idea, but how about feeding the lines to a named pipe and having each child flock() the pipe when reading?
Yes, that was my initial thought. The parent process dispenses new lines to each child on a FIFO basis. It could be done by the parent simply filling a message queue with all of the lines as individual messages, and when the queue is exhausted, the job is done.
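That fill-a-queue-and-drain-it pattern maps naturally onto the core Thread::Queue module. The following is only a hedged sketch, not the poster's code: it assumes a threads-enabled perl build, and single_line(), @lines, and $parallel_factor are stand-ins for the real program's worker, command list, and worker count.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $parallel_factor = 4;                  # worker count, as in the original
my @lines = map { "cmd $_\n" } 1 .. 20;   # stand-in for the command file

sub single_line {                         # stand-in for the real per-line worker
    my ($line) = @_;
    # ... process one command ...
}

my $q = Thread::Queue->new();
$q->enqueue(@lines);    # parent loads every command as an individual message
$q->end();              # once drained, dequeue() returns undef instead of blocking

my @workers = map {
    threads->create(sub {
        while (defined(my $line = $q->dequeue())) {
            single_line($line);   # whichever worker finishes first takes the next line
        }
    });
} 1 .. $parallel_factor;

$_->join() for @workers;
```

No difficulty estimates are needed: a worker that draws a hard command just stays on it while the others keep dequeueing, and the job is done when the queue is exhausted.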

--- rod.
 
02-17-2013, 01:43 PM   #5
johnsfine (Original Poster)
Quote:
Originally Posted by theNbomr
It could be done by the parent simply filling a message queue with all of the lines as individual messages, and when the queue is exhausted, the job is done.
That sounds like the best idea. I didn't fully understand the message queue documentation (assuming I even looked at the right documentation):
http://search.cpan.org/~ironcamel/PO...essageQueue.pm
So it will be a while before I find the time/courage to recode using that. The version I showed in the first post seems to work, and it takes less total time than the original (which distributed work too unevenly), even though the constant re-exec'ing (on Windows) of perl.exe takes a large fraction of the total time.

Quote:
Originally Posted by Sergei Steshenko
For better or for worse, Perl supports threads:
The documentation I found says it doesn't use native threads to implement threads; it uses native processes to pretend to be threads. For my purposes, the syntax and semantics of fork/wait are very simple (see the code in post #1). Only the performance of fork/wait is poor.

If I understood the Perl threads documentation, that would be a lot of work to switch to a less convenient (for my needs) syntax/semantics, and would then have even worse performance. Am I missing something?

Last edited by johnsfine; 02-17-2013 at 01:47 PM.
 
02-21-2013, 12:30 AM   #6
chrism01 (Guru; Registered: Aug 2004; Location: Sydney; Distribution: Centos 6.5, Centos 5.10; Posts: 16,287)
I wrote some threaded Perl a while ago and I think you may(?) be wrong. What I found was that under Linux kernel 2.4.x, threads were implemented as processes (possibly in any language?), but as of kernel 2.6, if you ran a multi-threaded Perl program and looked at e.g. top or ps, you'd see only one process for the whole thing.
It certainly ran pretty quickly, which was what I needed.
If you want a definitive answer, ask over at perlmonks.org; it's where the Perl gurus hang out.
Somebody may even have already asked/answered the question.

I would add that the general rule for choosing between threads and processes is how much communication you need between the parent and child 'processes'.
If it's fire-and-forget, fork() is fine, but if you need bi-directional comms and/or the parent to keep an eye on the children, then threads are indicated, although you can get the same effect with fork() and IPC using shared memory blocks.

In your case, if you want to maximise efficient use of a small number of worker threads, using threads would give you a simple way to let each worker set a value the parent can read to see when the worker has completed its current task and is ready for another. The worker thread does not need to be killed and replaced by another.
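A minimal sketch of that ready-flag idea using the core threads::shared module (all names here are invented for illustration; a real program would hand a worker new work when its slot reads 'ready' rather than just joining):

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $nworkers = 3;
my @status :shared = ('idle') x $nworkers;   # workers write, parent can read

my @workers = map {
    my $i = $_;
    threads->create(sub {
        { lock(@status); $status[$i] = 'busy'; }
        # ... do one task ...
        { lock(@status); $status[$i] = 'ready'; }   # signal: give me more work
    });
} 0 .. $nworkers - 1;

# A real parent would poll @status in a loop and dispatch new tasks;
# here we just wait for everyone, after which every slot reads 'ready'.
$_->join() for @workers;
```

The worker threads persist for the whole run, so there is no per-task fork/exec cost, which was the expensive part of the fork-per-line version on Windows.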

Last edited by chrism01; 02-21-2013 at 12:55 AM.
 
  

