LinuxQuestions.org
Old 05-08-2014, 12:22 PM   #1
tabbygirl1990
Member
 
Registered: Jul 2013
Location: a warm beach, cool ocean breeze, nice waves, and a Margaritta
Distribution: RHEL 5.5 Tikanga
Posts: 63

Rep: Reputation: 1
replicate a block of data throughout a file


hi guys,

i have a very large text file (largefile.txt) that has data organized in blocks (in the file they're called Tables), and i need to replace one of those blocks (the one called Table 10) with a different Table 10 block of data that i have in another file called table10.dat. i need to do the Table 10 switch-out throughout the whole file. my table10.dat file has the same format as Table 10 in largefile.txt, so it's really a "cut and paste", just a gazillion times.

all of the blocks of data have 12 lines (including a blank/return line at the end of each block), and Table 10 first shows up on line 2271, then 2684, then 3097, repeating every 413 lines til eof
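Those offsets can be sanity-checked before replacing anything. Here is a throwaway sketch; the demo file and the numbers in it are invented, just mirroring the fixed spacing between headers:

```shell
# Build a stand-in file with two "Table 10," headers a known distance apart,
# then list the line numbers where the header occurs.
awk 'BEGIN {
  for (i = 1; i <= 900; i++) {
    if (i == 271 || i == 684) print "         Table 10,   offset = 999"
    else print "line " i
  }
}' > /tmp/largefile_demo.txt
grep -n "Table 10," /tmp/largefile_demo.txt
```

If the real file is as regular as described, the same `grep -n "Table 10," largefile.txt` should print 2271, 2684, 3097, and so on, 413 apart.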

so i think i could either use a counter to paste in my table10.dat, or do a string replace every time "Table 10," shows up.

is one way better than the other?

i started working on this by modding a command i already know (colucix helped me with it)

Code:
awk 'NR<=2270{print $0}{for (i = 2; i <= NF; i = i+413) print $i}' largefile.txt table10.dat > out.txt
this of course doesn't work, cause nowhere in the command does it call for table10.dat. i don't know how to "feed" in the table10.dat

thanks so much for whatever help you can provide!!!

tabby
 
Old 05-08-2014, 01:21 PM   #2
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,604

Rep: Reputation: 1241
Looks more like a job for perl... Something like:

Code:
#!/usr/bin/perl

$file1=shift(@ARGV); # file to be mangled
$file2=shift(@ARGV); # file to be included
$size=shift(@ARGV);  # line to start on
$count=shift(@ARGV); # number of lines to remove from file1

open(INP,"<".$file1) or die "can't open source file";
open(INP2,"<".$file2) or die "can't open file to include";

$not_done = 1;
$i = 0;
while (<INP>) {
   $i++;
   if ($i < $size && $not_done) {
       print;
   } elsif ($not_done) {
       for ($j = 0; $j < $count; $j++) { # discard the count number of records
           $discard = <INP>;
       }
       while (<INP2>) { # copy the new input
          print;
       }
       $not_done = 0;  # finished copying the replacement
   } else {
       print;
   }
}
The output is sent to stdout. So usage would be "name input1 replacement startingline skip >newinput1"

Note, this has not been debugged. But the perl code will run faster than awk or any combination of awk and shell... I think only python would go faster.
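For anyone wanting to check the logic on toy data first, here is a self-contained dry run: a condensed version of the same script (with the stray sigils and the `while` brace tidied up) written to a temp file and run on two tiny stand-in inputs. Every file name below is invented for the demo.

```shell
# Condensed version of the replace-a-block-of-lines script.
cat > /tmp/replace_block.pl <<'EOF'
#!/usr/bin/perl
$file1=shift(@ARGV); $file2=shift(@ARGV);
$size=shift(@ARGV);  $count=shift(@ARGV);
open(INP,"<".$file1) or die "can't open source file";
open(INP2,"<".$file2) or die "can't open file to include";
$not_done = 1; $i = 0;
while (<INP>) {
   $i++;
   if ($i < $size && $not_done) { print; }
   elsif ($not_done) {
       for ($j = 0; $j < $count; $j++) { $discard = <INP>; }
       while (<INP2>) { print; }
       $not_done = 0;
   } else { print; }
}
EOF
# A 5-line "largefile" whose 2-line block starts at line 3,
# and a 2-line replacement for it.
printf 'l1\nl2\nOLD-A\nOLD-B\nl5\n' > /tmp/mangle_demo.txt
printf 'NEW-A\nNEW-B\n' > /tmp/include_demo.txt
# The line at position startingline is itself consumed when the branch
# triggers, so skip is one LESS than the block length: 1 for a 2-line block.
perl /tmp/replace_block.pl /tmp/mangle_demo.txt /tmp/include_demo.txt 3 1 > /tmp/mangled_out.txt
cat /tmp/mangled_out.txt
```

One thing the dry run makes visible: because the first line of the old block is read (and dropped) before the discard loop starts, the skip argument counts the lines *after* it, which matters when translating "12-line blocks" into an argument value.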
 
Old 05-08-2014, 02:01 PM   #3
tabbygirl1990
Member
 
Registered: Jul 2013
Location: a warm beach, cool ocean breeze, nice waves, and a Margaritta
Distribution: RHEL 5.5 Tikanga
Posts: 63

Original Poster
Rep: Reputation: 1
oooops guys, i probably should have shown what a block or table looks like

Code:
         Table 10,   offset = 56491
         1 1.1111e-01
         6          6          1          5          3          
1.5000e+01 5.0000e+00 2.6410e+00 0.0000e+00 0.0000e+00 0.0000e+00
         6          6          3          1          
1.5032e+01 1.4620e+00 3.7710e+00 0.0000e+00 0.0000e+00 0.0000e+00
3.9200e-01 5.0000e+02 9.6901e-02 0.0000e+00 0.0000e+00 0.0000e+00
7.7030e+01 5.1111e-01 1.1470e-01 0.0000e+00 0.0000e+00 0.0000e+00
0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00
0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00
0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 
1.0050e+02
 
Old 05-08-2014, 02:21 PM   #4
tabbygirl1990
Member
 
Registered: Jul 2013
Location: a warm beach, cool ocean breeze, nice waves, and a Margaritta
Distribution: RHEL 5.5 Tikanga
Posts: 63

Original Poster
Rep: Reputation: 1
Quote:
Originally Posted by jpollard View Post
Looks more like a job for perl... Something like:

Code:
#!/usr/bin/perl

$file1=shift(@ARGV); # file to be mangled
$file2=shift(@ARGV); # file to be included
$size=shift(@ARGV);  # line to start on
$count=shift(@ARGV); # number of lines to remove from file1

open(INP,"<".$file1) or die "can't open source file";
open(INP2,"<".$file2) or die "can't open file to include";

$not_done = 1;
$i = 0;
while (<INP>) {
   $i++;
   if ($i < $size && $not_done) {
       print;
   } elsif ($not_done) {
       for ($j = 0; $j < $count; $j++) { # discard the count number of records
           $discard = <INP>;
       }
       while (<INP2>) { # copy the new input
          print;
       }
       $not_done = 0;  # finished copying the replacement
   } else {
       print;
   }
}
The output is sent to stdout. So usage would be "name input1 replacement startingline skip >newinput1"

Note, this has not been debugged. But the perl code will run faster than awk or any combination of awk and shell... I think only python would go faster.
hi j,

i've almost never used perl before so i know even less of it than awk, and my awk is not good

i don't think i understand the command line elements, are they
Code:
 "name"           my js_perl_script.pl
 "input1"         my largefile.txt
 "replacement"    my table10.dat
 "startingline"   my 2271
 "skip"           my 413
so putting it all together it would look like this?
Code:
perl home/path/js_perl_script.pl largefile.txt table10.dat 2271 413 > new_largefile.txt

Last edited by tabbygirl1990; 05-09-2014 at 10:25 AM. Reason: corrected the number of lines in the skip command
 
Old 05-08-2014, 02:53 PM   #5
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,252

Rep: Reputation: 2685
I believe the skip, ie the last argument, should be 12. This is the number of lines from the original file not to include, which are then replaced by the new data.

I would agree that Perl may well be faster, but thought I would put up an awk for you to see how it may be done:
Code:
awk 'FNR==NR{new=$0;next}/Table 10/{$0 = new}1' RS="\n\n" table10.dat largefile.txt > new_largefile.txt
Of course I cannot test, but this is based on the information you have provided.
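To see the one-liner's mechanics without a 64M file, here is a toy run (GNU awk behaviour assumed, where a multi-character RS is treated as a regex; all demo file names are made up):

```shell
# Two blank-line-separated blocks; only the "Table 10" one should change.
printf 'Table 9, offset = 1\nold nine data\n\nTable 10, offset = 2\nold ten data\n' > /tmp/big_demo.txt
printf 'Table 10, offset = 2\nNEW ten data\n' > /tmp/t10_demo.dat

# The first file (the replacement) is stored whole in "new"; any record of
# the second file containing "Table 10" is swapped for it before printing.
awk 'FNR==NR{new=$0;next}/Table 10/{$0 = new}1' RS="\n\n" /tmp/t10_demo.dat /tmp/big_demo.txt > /tmp/big_out.txt
grep 'ten data' /tmp/big_out.txt
```

One caveat worth knowing: the output record separator (ORS) stays a single newline, so the blank line between blocks can come out differently than in the input; whether that matters depends on what reads the file afterwards.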
 
1 members found this post helpful.
Old 05-08-2014, 03:14 PM   #6
tabbygirl1990
Member
 
Registered: Jul 2013
Location: a warm beach, cool ocean breeze, nice waves, and a Margaritta
Distribution: RHEL 5.5 Tikanga
Posts: 63

Original Poster
Rep: Reputation: 1
yea grail, that looks more like what i know. i don't care how long it takes, i'll run down to *$

so grail, is yours looking for the string "Table 10"?

is the RS="\n\n" what tells awk to look for the table10.dat file? (in my first code post i didn't know that with awk you put table10.dat before largefile.txt)

jay, i won't give up on yours either, but perl is a STEEP learning curve and i'm only now after a loooong time catching on to some of the stuff in awk
 
Old 05-08-2014, 03:28 PM   #7
tabbygirl1990
Member
 
Registered: Jul 2013
Location: a warm beach, cool ocean breeze, nice waves, and a Margaritta
Distribution: RHEL 5.5 Tikanga
Posts: 63

Original Poster
Rep: Reputation: 1
double yeas for grail, it ran perfectly! and very fast through a 64M file, i didn't even have time to put on my heels

thanks soooo much!

grail, if you can, can you still explain to me how that awk command runs, so i know for the future...

tabby
 
Old 05-08-2014, 03:36 PM   #8
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,252

Rep: Reputation: 2685
Looking again at the perl, you may have been right with 413 ... just noticed how your original idea was set up

And yes, happy to explain:
Code:
RS="\n\n" - I put this first as it is actually interpreted prior to the script running and basically says that each record is delimited by 2 new lines, ie one after the record and the one on the empty line

FNR==NR{new=$0;next} - Expression will only be true for the first file and since we are using the same record separator it will effectively read the entire file into the variable new

/Table 10/{$0 = new} - Once the string has been found within a record, substitute the current value for our saved one

1 - an always-true pattern with awk's default action, so every record (replaced or not) gets printed
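The FNR==NR part is worth seeing in isolation; a minimal sketch with made-up file names:

```shell
printf 'NEW\n' > /tmp/idiom_first.txt
printf 'a\nb\n' > /tmp/idiom_second.txt
# FNR restarts at 1 for each input file while NR keeps counting overall,
# so FNR==NR holds only while the first file is being read. Its line is
# captured into "new" and skipped via next; only the second file's lines
# reach the print.
awk 'FNR==NR{new=$0;next}{print new "-" $0}' /tmp/idiom_first.txt /tmp/idiom_second.txt > /tmp/idiom_out.txt
cat /tmp/idiom_out.txt
```

With RS="\n\n" on top of this idiom, "line" becomes "blank-line-separated record", which is the whole trick of the one-liner.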
 
1 members found this post helpful.
Old 05-09-2014, 10:25 AM   #9
tabbygirl1990
Member
 
Registered: Jul 2013
Location: a warm beach, cool ocean breeze, nice waves, and a Margaritta
Distribution: RHEL 5.5 Tikanga
Posts: 63

Original Poster
Rep: Reputation: 1
good morning Grail,

i'm embarrassed, i knew what the RS="\n\n" string does.

the part i didn't know was the

Code:
 /Table 10/{$0 = new}
part, but now that makes sense with

Code:
 FNR==NR{new=$0;next}

thanks so much!!!

tabby
 