LinuxQuestions.org


divyashree 03-18-2014 11:00 AM

Reading 2 files and process both at same time
 
I have a Perl script which takes a file as input, reads it, and processes it.

Can I give two files as input and process them at the same time, rather than one after the other? That is, when the Perl script starts, it forks itself and processes the two files in different child processes simultaneously.

schneidz 03-18-2014 11:11 AM

does this demonstration help you:
Code:

[schneidz@hyper ~]$ touch hello world
[schneidz@hyper ~]$ ll hello & ll world
[1] 22604
-rw-rw-r--. 1 schneidz schneidz 0 Mar 18 12:09 world
-rw-rw-r--. 1 schneidz schneidz 0 Mar 18 12:09 hello

you can put one instance of your perl program in the background running against one input file, and at the same time execute another instance against the other input file.
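
for example, a minimal sketch (divya.pl and the file names are placeholders; this assumes your script takes its input file as an argument):
Code:

perl divya.pl storage.txt &
perl divya.pl server.txt &
wait    # block until both background jobs have finished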

NevemTeve 03-18-2014 11:44 AM

What do you really want to do? Merge two or more files?

divyashree 03-18-2014 01:27 PM

Quote:

Originally Posted by NevemTeve (Post 5136838)
What do you really want to do? Merge two or more files?

Hi NevemTeve, thanks for the reply. I don't want to merge them. When I said two files, I meant two different categories of files.
e.g. one file is storage related, the other is server related, and each file has thousands of entries in it.

The Perl script reads each file line by line and processes it.
The problem is that if I merge the contents into one file, it takes a huge amount of time to reach the second category, since the processing is serial.
So I keep them categorized in two different files and want to process them in parallel: as I said earlier, fork the script into two children, with each child processing a separate file in parallel.

And can this be done from inside the script, or does it have to be done outside the script?

divyashree 03-18-2014 01:29 PM

Quote:

Originally Posted by schneidz (Post 5136826)
does this demonstration help you:
Code:

[schneidz@hyper ~]$ touch hello world
[schneidz@hyper ~]$ ll hello & ll world
[1] 22604
-rw-rw-r--. 1 schneidz schneidz 0 Mar 18 12:09 world
-rw-rw-r--. 1 schneidz schneidz 0 Mar 18 12:09 hello

you can put one instance of your perl program in the background running against one input file, and at the same time execute another instance against the other input file.

Hi schneidz, thanks for the reply. The problem is I am not allowed to run two instances of the script for the two different files. I have to handle both files with a single instance of the script.

schneidz 03-18-2014 01:29 PM

^ my previous suggestion was to just run it twice at the same time (with different input files).


edit: ^ that's a weird requirement... that limitation might be in place so that one person doesn't spam the cpu. not knowing the source of your script, it would be hard to say how to edit it, but maybe you can have a function that is called twice with two different inputs.

divyashree 03-18-2014 01:43 PM

Quote:

Originally Posted by schneidz (Post 5136899)
^ my previous suggestion was to just run it twice at the same time (with different input files).


edit: ^ that's a weird requirement... that limitation might be in place so that one person doesn't spam the cpu. not knowing the source of your script, it would be hard to say how to edit it, but maybe you can have a function that is called twice with two different inputs.

A dedicated CPU is there to process the Perl script. The processing is massive, so spamming is not the concern for now; it may be in the future, as the file sizes are growing, and I am considering this parallel approach for that.

So as per your idea, the whole file-processing logic of the script will be in one function, and that function will be called once for each file... right?

schneidz 03-18-2014 01:45 PM

i got another idea... if your script can take input from a pipe, maybe you can feed your script like so (note the two cats write into the same pipe at the same time, so the order of the lines is unpredictable; this only works if each line can be processed independently):
Code:

(cat file.1 & cat file.2) | divya.pl

NevemTeve 03-19-2014 12:38 AM

@OP: So your question is: how to fork in a Perl script? Have you tried perldoc -f fork yet?

Edit: I have just found an old example of mine. (Be careful, it's quite dusty.)
Code:

#!/usr/local/bin/perl -w

use strict;

sub child_process {
    printf STDERR "child %s started param=%s\n", $$, $_[0];
    sleep (2);
    printf STDERR "child %s exiting\n", $$;
}

sub main {
    my $pid;

    # fork returns 0 in the child and the child's PID in the parent
    $pid = fork;
    if (!$pid) {
        child_process ("ChildParam #1");
        exit (0); # the child must exit here, or it would fall through and fork again
    }

    $pid = fork;
    if (!$pid) {
        child_process ("ChildParam #2");
        exit (0);
    }
    wait; # block the parent until a child terminates
}

main;


divyashree 03-19-2014 12:08 PM

NevemTeve, thanks for the nice example. I will try that in my Perl script and post the result.

divyashree 03-19-2014 01:02 PM

Quote:

Originally Posted by NevemTeve (Post 5137158)
@OP: So your question is: how to fork in a Perl script? Have you tried perldoc -f fork yet?

Edit: I have just found an old example of mine. (Be careful, it's quite dusty.)

NevemTeve, I have one simple query: what is the role of sleep and wait here, together with fork?

The script behaves differently depending on whether they are present or absent.

Sometimes both children start at the same time and end at the same time; sometimes one child starts and exits, and then the other starts and exits.
What determines this behaviour?

NevemTeve 03-19-2014 01:25 PM

The children run as they want (or can); the scheduling is up to the OS, which is why you sometimes see them overlap and sometimes not. The only thing you can do for them in the parent is wait for them to terminate before exiting. The sleep has no special role; it just keeps each child alive long enough for you to observe the overlap.

Let's not forget that the child processes (or the child and the parent) are supposed to do independent jobs, otherwise there is no point in using them.

PS: I should have used two waits for the two children.
Code:

    $pid= wait;
    printf STDERR "wait#1 returned %d\n", $pid;

    $pid= wait;
    printf STDERR "wait#2 returned %d\n", $pid;


divyashree 03-19-2014 11:53 PM

I modified the children to read the files given as arguments, as below:

Code:

#!/usr/bin/perl

use warnings;
use strict;

my $REQPARAM = 4;
$#ARGV += 1;
unless ($#ARGV == 4) {
        printf "$0 requires minimum 4  arguments \n";
        printf "Usage: $0 -F1 <File 1> -F2 <File 2>\n";
        exit 100;
        }
else {
        main();

        }

#sub child_process {
#    printf STDERR "child %s started param=%s\n", $$, $_[0];
#    sleep (1);
#    printf STDERR "child %s exiting\n", $$;
#}

sub main {
    my $pid;

    $pid = fork;
    if (!$pid) {
        readfile ("$ARGV[1]");
        exit (0);
    }

    $pid = fork;
    if (!$pid) {
        readfile ("$ARGV[3]");
        exit (0);
    }
    wait;
}



sub readfile {

printf STDERR "child %s started param=%s\n", $$, $_[0];
foreach my $arg (@_){
        if (-e $_[0]){
        open FILE , '<'.$_[0]  or die $!;
        while (<FILE>){
                print "$_";
                }
            }
        }
sleep(2);
printf STDERR "child %s exiting\n", $$;
}

But what is happening here is that when readfile is called with a wrong argument, it doesn't throw a file-not-found error; the children simply start and exit.

NevemTeve 03-20-2014 03:49 AM

That's what debugging is good for...
Code:

#!/usr/bin/perl

use warnings;
use strict;

my $REQPARAM = 4;
$#ARGV += 1;
unless ($#ARGV == 4) {
        printf "$0 requires minimum 4  arguments \n";
        printf "Usage: $0 -F1 <File 1> -F2 <File 2>\n";
        exit 100;
        }
else {
        main();

        }

sub main {
    my ($pid, $childret);

    $pid = fork;
    if (!$pid) {
        readfile ("$ARGV[1]");
        exit (0);
    }

    $pid = fork;
    if (!$pid) {
        readfile ("$ARGV[3]");
        exit (0);
    }
    $pid= wait; $childret= $?;
    printf STDERR "wait returned %d (exit-status=%d)\n", $pid, $childret;

    $pid= wait; $childret= $?;
    printf STDERR "wait returned %d (exit-status=%d)\n", $pid, $childret;
}

sub readfile {
    printf STDERR "child %s started param=%s\n", $$, $_[0];
    foreach my $arg (@_) {
        # die on a missing file instead of silently skipping it,
        # so a wrong argument is reported immediately
        open FILE, '<'.$arg or die "*** Open error in $arg";
        while (<FILE>) {
            print "$_";
        }
        close FILE;
    }
    printf STDERR "child %s exiting\n", $$;
    exit (0);
}
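
For reference, a hypothetical invocation matching the argument layout above ($ARGV[1] and $ARGV[3] hold the file names, so they follow the -F1 and -F2 switches; the script and file names are only illustrative):
Code:

perl divya.pl -F1 storage.txt -F2 server.txt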


sundialsvcs 03-20-2014 07:32 AM

In any shell:

Code:

perl script-one.pl &
perl script-two.pl &

The shell prompt returns immediately as the two programs are launched as what are called "jobs" in the background of your terminal session. (So they are not true "batch jobs.") Then use the jobs command to watch the parallel completion of the two commands that you launched as independent children of the shell by means of the "&" suffix. Use fg and bg to move either one between the foreground and the background. Also see nohup.
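
For example (job numbers are illustrative):
Code:

jobs      # list the background jobs and their status
fg %1     # bring job 1 into the foreground
# press Ctrl-Z to suspend it again, then:
bg %1     # let it continue in the background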

If you have many files to process, check out the -P max-procs argument of the xargs command.
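
For example, a sketch assuming GNU xargs, with divya.pl again standing in for your script:
Code:

printf '%s\n' storage.txt server.txt | xargs -n 1 -P 2 perl divya.pl

This runs up to two perl processes at a time, each handed one file name.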

A general notion of Unix-ish systems is that commands ought to be simple, and fairly self-centered. Then, you get extra mileage out of them by simple shell features like these, and by "piping" multiple commands together so that the output of one becomes the input to another.

It's a "disarmingly simple" idea, but it greatly reduces the complexity. Yes, "Perl can do anything you want." But maybe you can remove the complexity of "parallelism" from the program (regardless of language used), and move it up to the shell.

