Programming: This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
I have a Perl script which takes a file as input, reads it, and processes it.
Can I give two files as input and process them at the same time rather than one after another? That is, when the Perl script starts, can it fork itself so that two children process the two files simultaneously?
[schneidz@hyper ~]$ touch hello world
[schneidz@hyper ~]$ ll hello & ll world
[1] 22604
-rw-rw-r--. 1 schneidz schneidz 0 Mar 18 12:09 world
-rw-rw-r--. 1 schneidz schneidz 0 Mar 18 12:09 hello
You can put one instance of your Perl program in the background, running against one input file, and at the same time execute another instance with the other input file.
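In shell terms, that suggestion looks roughly like this. A minimal sketch only: `process` is a stand-in for the real `perl myscript.pl` invocation, and the file names are invented for illustration.

```shell
#!/bin/sh
# Two instances of the same job, run concurrently via "&".
# "process" stands in for the real Perl script; file names are made up.
printf 'a\nb\n' > /tmp/storage.txt
printf 'c\n'    > /tmp/server.txt

process() { wc -l < "$1"; }

process /tmp/storage.txt > /tmp/storage.out &   # first instance, backgrounded
process /tmp/server.txt  > /tmp/server.out  &   # second instance, backgrounded
wait                                            # block until both finish
cat /tmp/storage.out /tmp/server.out
```

Both instances run at the same time; `wait` makes the parent shell block until the two background jobs have exited.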
What do you really want to do? Merge two or more files?
Hi NevemTeve, thanks for the reply. I don't want to merge them. By two files I mean two different categories of file:
e.g. one file is storage-related, the other is server-related, and each file has thousands of entries in it.
The Perl script reads each file line by line and processes it.
If I merged the contents, the processing would still be serial, so reaching the entries of the second category would take a huge amount of time.
That is why I split them into two files and want to process them in parallel: as I said earlier, somehow fork the script into two children, with each child processing a separate file in parallel.
And can this be done from inside the script, or only from outside it?
Hi schneidz, thanks for the reply. The problem is that I am not allowed to run two instances of the script for the two different files. I have to handle both files with a single instance of the script.
^ My previous suggestion was just to run it twice at the same time (calling different input files).
Edit: ^ That's a weird requirement... the limitation might be in place so that one person doesn't spam the CPU. Without seeing the source of your script it is hard to say how to modify it, but maybe you can have a function that is called twice with two different inputs.
A dedicated CPU is there to process the Perl script, and the processing is massive, so spamming is not the concern for now. (It may be in the future, since the file sizes are growing, which is also why I am considering this threading approach.)
So, per your idea, the whole file-processing code in the script would go into one function, and that function would be called once for each file, right?
@OP: So your question is: how to fork in a Perl script? Have you tried perldoc -f fork yet?
Edit: I have just found an old example code of mine. (Be careful, it's quite dusty.)
NevemTeve, I have one simple query: what is the role of sleep and wait here, together with fork?
The script behaves differently depending on whether they are present or absent.
Sometimes both children start at the same time and end at the same time; sometimes one child starts and exits, and only then does the other start and exit.
What determines this behaviour?
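The same scheduling effect can be reproduced at the shell level, where `&` and `wait` map closely onto Perl's fork and wait. A hedged sketch, not the Perl script itself:

```shell
#!/bin/sh
# Two backgrounded children; "wait" makes the parent block until both exit.
# Without "wait" the parent would reach "parent done" (and possibly exit)
# before the children finish. The relative order in which the two children
# run is decided by the scheduler, so it can differ from run to run.
child() { echo "child $1 started"; sleep 1; echo "child $1 exiting"; }

child A > /tmp/childA.log &
child B > /tmp/childB.log &
wait                           # reap both children before continuing
echo "parent done"
```

The sleep only stretches out each child's lifetime so the overlap is visible; it does not impose any ordering, which is why runs can look different.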
I modified the children to read files from the arguments, as below:
Code:
#!/usr/bin/perl
use warnings;
use strict;

my $REQPARAM = 4;

unless (@ARGV == $REQPARAM) {
    printf "$0 requires %d arguments\n", $REQPARAM;
    printf "Usage: $0 -F1 <File 1> -F2 <File 2>\n";
    exit 100;
}
else {
    main();
}

#sub child_process {
#    printf STDERR "child %s started param=%s\n", $$, $_[0];
#    sleep (1);
#    printf STDERR "child %s exiting\n", $$;
#}

sub main {
    my $pid;

    $pid = fork;                  # first child reads the first file
    if (!$pid) {
        readfile($ARGV[1]);
        exit 0;
    }

    $pid = fork;                  # second child reads the second file
    if (!$pid) {
        readfile($ARGV[3]);
        exit 0;
    }

    wait;                         # parent blocks until a child exits
}

sub readfile {
    printf STDERR "child %s started param=%s\n", $$, $_[0];
    foreach my $arg (@_) {
        if (-e $_[0]) {           # only open the file if it exists
            open FILE, '<' . $_[0] or die $!;
            while (<FILE>) {
                print "$_";
            }
        }
    }
    sleep 2;
    printf STDERR "child %s exiting\n", $$;
}
But what is happening here is: when readfile is called with wrong arguments, it does not throw a "file not found" error; the children simply start and exit.
The shell prompt returns immediately once the two programs are launched, as what are called "jobs," in the background of your terminal session. (So they are not true "batch jobs.") Then use the jobs command to watch the parallel completion of the two commands that you launched as independent children of the shell by means of the "&" suffix. Use fg and bg to reconnect to either one. Also see nohup.
If you have many files to process, check out the -P maxprocs argument of the xargs command.
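For example, along those lines; a sketch where `wc -l` stands in for the real Perl script and the file names are invented:

```shell
#!/bin/sh
# GNU xargs -P runs up to N invocations in parallel, here one file per call.
# "wc -l" stands in for "perl myscript.pl"; file names are made up.
printf 'x\n'    > /tmp/f1.txt
printf 'y\nz\n' > /tmp/f2.txt

printf '%s\n' /tmp/f1.txt /tmp/f2.txt \
  | xargs -n 1 -P 2 wc -l > /tmp/xargs.out
cat /tmp/xargs.out
```

With -n 1 each file gets its own invocation, and -P 2 lets two of them run at once; the output lines may appear in either order.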
A general notion of Unix-ish systems is that commands ought to be simple, and fairly self-centered. Then, you get extra mileage out of them by simple shell features like these, and by "piping" multiple commands together so that the output of one becomes the input to another.
It's a "disarmingly simple" idea, but it greatly reduces the complexity. Yes, "Perl can do anything you want." But maybe you can remove the complexity of "parallelism" from the program (regardless of language used), and move it up to the shell.
Last edited by sundialsvcs; 03-20-2014 at 07:36 AM.