[SOLVED] replicate a block of data throughout a file

tabbygirl1990 · 05-08-2014, 11:22 AM

hi guys,

i have a very large text file (largefile.txt) that has data organized by blocks (in the file they’re called Tables), and i need to replace one of those blocks (the one called Table 10) with a different Table 10 block of data that i have in another file called table10.dat. i need to do the Table 10 switch-out throughout the whole file. my table10.dat file has the same format as Table 10 in largefile.txt, so it's really a "cut and paste", just a gazillion times.

all of the blocks of data have 12 lines (including a blank/return line at the end of each block), and Table 10 first shows up on line 2271, then 2684, then 3097, repeating itself every 413 lines til eof

so i think i could use a counter to paste in my table10.dat or i could string replace every time "Table 10," shows up.

is one way better than the other?

i started working on this by modifying a command i already know (colucix helped me with it)

Code:

awk 'NR<=2270{print $0}{for (i = 2; i <= NF; i = i+413) print $i}' largefile.txt table10.dat > out.txt

this of course doesn't work cause nowhere in the awk script does it actually use table10.dat. i don’t know how to "feed" in the table10.dat

thanks so much for whatever help you can provide!!!

tabby

jpollard · 05-08-2014, 12:21 PM

Looks more like a job for perl... Something like:

Code:

#!/usr/bin/perl
use strict;
use warnings;

my $file1 = shift(@ARGV); # file to be mangled
my $file2 = shift(@ARGV); # file to be included
my $size  = shift(@ARGV); # line to start on
my $count = shift(@ARGV); # number of lines to remove from file1

open(INP, "<", $file1) or die "can't open source file";
open(INP2, "<", $file2) or die "can't open file to include";

my $not_done = 1;
my $i = 0;
while (<INP>) {
   $i++;
   if ($i < $size && $not_done) {
       print;
   } elsif ($not_done) {
       # the line just read is the first one to remove,
       # so discard count-1 more records after it
       for (my $j = 1; $j < $count; $j++) {
           my $discard = <INP>;
       }
       while (<INP2>) { # copy the new input
          print;
       }
       $not_done = 0;  # finished copying the replacement (done only once)
   } else {
       print;
   }
}

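The output is sent to stdout. So usage would be "name input1 replacement startingline skip >newinput1", where name is the script, input1 is the file to be mangled, replacement is the file to be included, startingline is the line to start on, and skip is the number of lines to remove.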

Note, this has not been debugged. But the perl code will run faster than awk or any combination of awk and shell... I think only python would go faster.

tabbygirl1990 · 05-08-2014, 01:01 PM

oooops guys, i probably should have shown what a block or table looks like

Code:

         Table 10,   offset = 56491
         1 1.1111e-01
         6          6          1          5          3          
1.5000e+01 5.0000e+00 2.6410e+00 0.0000e+00 0.0000e+00 0.0000e+00
         6          6          3          1          
1.5032e+01 1.4620e+00 3.7710e+00 0.0000e+00 0.0000e+00 0.0000e+00
3.9200e-01 5.0000e+02 9.6901e-02 0.0000e+00 0.0000e+00 0.0000e+00
7.7030e+01 5.1111e-01 1.1470e-01 0.0000e+00 0.0000e+00 0.0000e+00
0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00
0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00
0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 
1.0050e+02

tabbygirl1990 · 05-08-2014, 01:21 PM

hi j,

i've almost never used perl before so i know even less of it than awk, and my awk is not good

i don't think i understand the command line elements, are they

"name" = js_perl_script.pl
"input1" = my largefile.txt
"input2" = my table10.dat
"replacement startingline" = my 2271
"skip" = my 413

so putting it all together it would look like this?

Code:

perl home/path/js_perl_script.pl largefile.txt table10.dat 2271 413 > new_largefile.txt

grail · 05-08-2014, 01:53 PM

I believe the skip, ie the last argument, should be 12. This is the number of lines from the original file that are dropped and then replaced by the new data.

I would agree that Perl may well be faster, but thought I would put up an awk for you to see how it may be done:

Code:

awk 'FNR==NR{new=$0;next}/Table 10/{$0 = new}1' RS="\n\n" table10.dat largefile.txt > new_largefile.txt

Of course I cannot test, but this is based on the information you have provided.
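
For anyone who wants to try the idea before running it on a big file, here is a minimal sketch on toy data. The file contents and the temporary directory are made up for illustration, and it assumes an awk that treats a multi-character RS as a regular expression (gawk and mawk do). It also makes two optional changes that are not part of the command above: the pattern includes the comma so that a hypothetical "Table 100" could not match, and ORS="\n\n" is set because, with the default ORS, the blank line between blocks would not be written back out.

```shell
# Toy stand-ins for largefile.txt and table10.dat (hypothetical contents);
# every block ends with a blank line, as described in the first post.
tmp=$(mktemp -d)
printf 'Table 9, offset = 1\na a\n\nTable 10, offset = 2\nold old\n\nTable 11, offset = 3\nb b\n\nTable 10, offset = 4\nold old\n\n' > "$tmp/largefile.txt"
printf 'Table 10, offset = 99\nnew new\n\n' > "$tmp/table10.dat"

# Same approach as the command above, with two small changes:
#   /Table 10,/  includes the comma, so "Table 100" could not match
#   ORS="\n\n"   writes the blank separator line back after each block
awk 'FNR==NR{new=$0;next} /Table 10,/{$0=new} 1' RS="\n\n" ORS="\n\n" \
    "$tmp/table10.dat" "$tmp/largefile.txt" > "$tmp/new_largefile.txt"

cat "$tmp/new_largefile.txt"
```

Comparing the line count of the input and output (for example with wc -l) is a quick way to check whether the blank lines survived on a real file.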

tabbygirl1990 · 05-08-2014, 02:14 PM

yea grail, that looks more like what i know

i don't care how long it takes, i'll run down to *$

so grail, is yours looking for the string "Table 10"?

is the RS="\n\n" what tells awk to look for the table10.dat file? (in my first code post i did know that with awk to put table10.dat before largefile.txt)

jay, i won't give up on yours either, but perl is a STEEP learning curve and i'm only now after a loooong time catching on to some of the stuff in awk

tabbygirl1990 · 05-08-2014, 02:28 PM

double yeas for grail

it ran perfectly! and very fast through a 64M file, i didn't even have time to put on my heels

thanks soooo much!

grail, if you can, could you still explain to me how that awk command runs, so i know for the future...

tabby

grail · 05-08-2014, 02:36 PM

Looking again at the perl you may have been right with 413 ... just noticed how your original idea was set up

And yes, happy to explain:

Code:

RS="\n\n" - I put this first as it is actually interpreted before the first file is read, and it basically says that each record is delimited by two newlines, ie the one ending the last line of the block and the one on the empty line

FNR==NR{new=$0;next} - Expression will only be true for the first file, and since we are using the same record separator it will effectively read the entire file (a single block) into the variable new

/Table 10/{$0 = new} - Whenever the string is found within a record, replace the current record with our saved one

1 - always true, so every record is printed, whether it was replaced or not
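
The FNR==NR part is the piece that tends to look like magic, so here is a small sketch of it in isolation. The two file names and their contents are hypothetical; the only point is that FNR restarts at 1 for each input file while NR keeps counting, so the two are equal only while the first file is being read.

```shell
tmp=$(mktemp -d)
printf 'x1\nx2\n' > "$tmp/first.txt"
printf 'y1\ny2\ny3\n' > "$tmp/second.txt"

# Print each line with its per-file number (FNR), its overall number (NR),
# and whether FNR==NR holds for it.
result=$(awk '{ print $0, FNR, NR, (FNR==NR ? "first file" : "later file") }' \
    "$tmp/first.txt" "$tmp/second.txt")

printf '%s\n' "$result"
# x1 1 1 first file
# x2 2 2 first file
# y1 1 3 later file
# y2 2 4 later file
# y3 3 5 later file
```

One caveat with this idiom: if the first file is empty, FNR==NR is also true for the second file, so it is worth making sure the replacement file actually has content.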

tabbygirl1990 · 05-09-2014, 09:25 AM

good morning Grail,

i'm embarrassed, i knew what the RS="\n\n" string does.

the part i didn't know was the

Code:

 /Table 10/{$0 = new}

part, but now that makes sense with

Code:

FNR==NR{new=$0;next}

thanks so much!!!

tabby
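
For comparison, the counter idea from the first post (replace by line number instead of by searching for "Table 10,") can be sketched like this. The variable names start, period and len are hypothetical, and the toy file uses 3-line blocks with Table 10 starting on line 4 and repeating every 6 lines; for the real file the values from the first post would be 2271, 413 and 12. This only works if the spacing is perfectly regular all the way to the end of the file, which is why matching on the string is usually the more forgiving choice.

```shell
tmp=$(mktemp -d)
# Toy layout: 3-line blocks (header, data, blank line), with Table 10
# starting on line 4 and repeating every 6 lines.
printf 'Table 9, offset = 1\na a\n\nTable 10, offset = 2\nold old\n\nTable 9, offset = 3\nb b\n\nTable 10, offset = 4\nold old\n\n' > "$tmp/largefile.txt"
printf 'Table 10, offset = 99\nnew new\n\n' > "$tmp/table10.dat"

awk -v start=4 -v period=6 -v len=3 '
    FNR == NR { new = new $0 "\n"; next }               # first file: collect the replacement block
    { pos = FNR - start }                               # distance from the first Table 10 line
    pos >= 0 && pos % period == 0 { printf "%s", new }  # first line of an old block: write the new one
    pos >= 0 && pos % period < len { next }             # drop every line of the old block
    { print }                                           # everything else passes through
' "$tmp/table10.dat" "$tmp/largefile.txt" > "$tmp/out.txt"

cat "$tmp/out.txt"
```

Because this version uses the default line-by-line records, it does not depend on how a particular awk handles a multi-character RS.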