resume search in file with grep

t2000 · 11-20-2006, 12:58 PM

Hi
Suppose I have a text file containing some lines like the following:

Code:

text
123 456 789
234 345 456
text
234 456 567
234 456 567

Now I want to find all lines containing "text" and copy the following lines containing numbers into a new file, with "text" as header until the next "text" and so on until the end of the file.

I tried grep but could not manage to extract the interesting lines into different files.

What would be the right tool to use?

thanks

tom

matthewg42 · 11-20-2006, 01:50 PM

With that input data you can actually do it all with [GNU] grep:

Code:

grep -A 2 text yourfile | grep -v -e text -e --

But this probably won't work with more general cases.

t2000 · 11-21-2006, 01:38 AM

Thanks, but how can I write all the lines containing the numbers between consecutive "text" lines into separate files?

The end product should look like:

Code:

fileA:
text
123 456 789
234 345 456

fileB:
text
234 456 567
234 456 567

matthewg42 · 11-21-2006, 04:09 AM

oh. umm, not with just grep.

Since I spend a lot of time farting about with perl, I'd use a small perl program. Maybe there's a better way to do it, but here would be my approach:

Code:

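#!/usr/bin/perl -w

my $fno = 65;    # ASCII code of "A", so the files are named fileA, fileB, ...
my $filename;
open(OUT, ">/dev/null");    # discard anything before the first "text" line

while (<>) {
    if ( /text/ ) {
        # a "text" line starts a new output file
        $filename = "file" . chr($fno++);
        open(OUT, ">$filename") || die "cannot open $filename for writing: $!\n";
        print "at line $. of input : writing to new file, $filename\n";
    }
    print OUT;    # printed after the switch, so "text" heads the new file
}
close(OUT);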

Of course, this isn't a very robust implementation - if you have more than 26 files, you'll start to get strange filenames. I'm sure you can write a nice little function to produce better filenames.
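
For the "nice little function" mentioned above, here is a minimal sketch in Python. It is not from the original posts: the name output_name and the prefix parameter are made up for illustration. It continues the fileA ... fileZ scheme with fileAA, fileAB and so on, the way spreadsheet columns are labelled.

```python
import string


def output_name(index, prefix="file"):
    """Return prefix plus spreadsheet-style letters for a 0-based index.

    0 gives fileA, 25 gives fileZ, 26 gives fileAA (hypothetical helper).
    """
    letters = ""
    n = index
    while True:
        n, rem = divmod(n, 26)
        letters = string.ascii_uppercase[rem] + letters
        if n == 0:
            break
        n -= 1  # bijective base 26: there is no "zero" letter
    return prefix + letters


# print the names around the point where chr(65 + n) runs out of letters
for n in (0, 25, 26, 27):
    print(n, output_name(n))
```

If the files need to sort in creation order, a zero-padded number such as "file%03d" % n is probably the simpler choice, since fileAA sorts before fileB.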

t2000 · 11-21-2006, 07:01 AM

Hey, this works, thanks a lot. I'm not very familiar with perl. Could you explain what the "open(OUT, ">/dev/null");" statement does?

tom

matthewg42 · 11-21-2006, 07:36 AM

It opens the filehandle OUT on the special device node /dev/null, which, on unix-like OSes, is a file you can write to; anything written to it disappears into the void.

The purpose is to not output anything until a match for the pattern /text/ has been found. It's just one approach, and not a very portable one. You could also use a flag to skip the print statement until a match has been found.
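
The flag idea can be sketched like this in Python (an illustration only, not code from the thread; split_on_marker and its parameters are hypothetical names). Here "current is None" plays the role of the flag: nothing is kept until the first line containing the marker has been seen, so no /dev/null is needed.

```python
def split_on_marker(lines, marker="text"):
    """Group lines into sections, each starting at a line containing marker.

    Lines that come before the first marker line are dropped.
    """
    sections = []
    current = None  # None means "no marker seen yet"
    for line in lines:
        if marker in line:
            current = []  # start a new section at every marker line
            sections.append(current)
        if current is not None:
            current.append(line)
    return sections


sample = ["junk\n", "text\n", "123 456 789\n", "text\n", "234 456 567\n"]
for number, section in enumerate(split_on_marker(sample), start=1):
    print("section", number, "has", len(section), "lines")
```

Each section could then be written to its own file, with the marker line as the first line.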

ghostdog74 · 11-21-2006, 08:03 AM

How about a Python one

Code:

data = open("file").read().split("text") #read in the whole file, split on "text"
count = 1 #for incrementing file number
for i in filter(None,data):
  outfile = "file-" + str(count)
  f = open(outfile, "w") #"w" so that running it again does not append to old output
  f.write("text\n") #put the header back, split() removed it
  f.write(i.strip() + "\n")
  f.close()
  count = count + 1

output:

Code:

sun:/home/ # cat file-1
text
123 456 789
234 345 456

sun:/home/ # cat file-2
text
234 456 567
234 456 567

t2000 · 11-21-2006, 08:21 AM

Cool! Just one small thing is still missing:
Suppose "text" is longer than one line. Now I want to extract the text lines, write them as a header into one file, each preceded by a "%", and then write the following numbers below this header, like this:
input file:

Code:

text
bla
bla
123 456 789
234 345 456
text
234 456 567
234 456 567

output file1:

Code:

%text
%bla
%bla
123 456 789
234 345 456

output file2:

Code:

%text
%bla
%bla
34 456 567
234 456 567

matthewg42 · 11-21-2006, 09:15 AM

Code:

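#!/usr/bin/perl -w

my $fno = 65;    # ASCII code of "A", so the files are named fileA, fileB, ...
my $filename;
open(OUT, ">/dev/null");    # discard anything before the first "text" line

while (<>) {
    if ( /text/ ) {
        # a "text" line starts a new output file and is its first header line
        $filename = "file" . chr($fno++);
        open(OUT, ">$filename") || die "cannot open $filename for writing: $!\n";
        print "at line $. of input : writing to new file, $filename\n";
        print OUT "%" . $_;
    }
    elsif ( /^[\d\s]+$/ ) {
        # only digits and whitespace: a data line, copied unchanged
        print OUT;
    }
    else {
        # any other line belongs to the header
        print OUT "%" . $_;
    }
}
close(OUT);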

t2000 · 11-21-2006, 09:20 AM

That's great!!

matthewg42 · 11-21-2006, 09:55 AM

I'd like to thank my agent, my mother and J. R. Bob Dobbs. No, please - no more autographs.

ghostdog74 · 11-21-2006, 05:58 PM

Code:

data = open("file").read().split("text") #read in the whole file, split on "text"
count = 1 #for incrementing file number
for i in filter(None,data):
  i = i.strip()
  outfile = "file-" + str(count) 
  f = open(outfile, "a")  
  print >> f , "%text"
  if not i.strip().isdigit():
    print >> f, "%" + i
  else:
    print >> f, i
  f.close()
  count = count + 1

t2000 · 11-24-2006, 07:09 AM

The python solution also produces a '%' in front of the first line after 'text'. Why's that?

ghostdog74 · 11-25-2006, 09:51 PM

Quote:

Originally Posted by t2000

The python solution also produces a '%' in front of the first line after 'text'. Why's that?

Code:

data = open("file").read().split("text") #read in the whole file, split on "text"
count = 1 #for incrementing file number
for i in filter(None,data):
  outfile = "file-" + str(count)
  f = open(outfile, "w") #"w" so that running it again does not append to old output
  f.write("%text\n")
  for line in i.splitlines(): #check every line, not just the first one
    if not line.strip(): #skip empty lines
      continue
    if not line.split()[0].isdigit(): #assume only check the first column for text or number....
      f.write("%" + line + "\n")
    else:
      f.write(line + "\n")
  f.close()
  count = count + 1

makyo · 11-26-2006, 11:01 AM

Hi.

In post #8, perhaps there is a typo. I don't see how the second part:

Code:

%text
%bla
%bla
34 456 567
234 456 567

is supposed to be obtained from the input file:

Code:

text
bla
bla
123 456 789
234 345 456
text
234 456 567
234 456 567

Assuming that it is a mistake, I would use the csplit utility, adding a sed command to insert the "%" on non-numeric lines for each of the resulting files:

Code:

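#!/bin/sh

# @(#) s2       Demonstrate csplit.

F=${1-data2}

# split at every line matching /text/; -z drops empty pieces, -s is silent
csplit -k -s -z "$F" /text/ '{*}'

echo
for file in xx*
do
        echo
        echo "File: $file"
        # prefix "%" to every line that does not start with a digit
        sed 's/^\([^0-9]\)/%\1/' "$file" |
        cat -n
done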

which, when run, produces:

Code:

% ./s2


File: xx00
     1  %text
     2  %bla
     3  %bla
     4  123 456 789
     5  234 345 456

File: xx01
     1  %text
     2  234 456 567
     3  234 456 567

from the second data file. See man csplit for details (and a pox on inscrutable calling sequences and man pages) ... cheers, makyo

( edit 1: clarify )
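
For readers who do not use sed much, the substitution in the script above can be mirrored in Python roughly as follows. This is only a sketch to show what the expression does; comment_non_numeric is a made-up name, and it expects lines without their trailing newline, which is how sed sees them.

```python
import re


def comment_non_numeric(line):
    """Prefix "%" to a line whose first character is not a digit.

    Empty lines and lines starting with a digit are returned unchanged.
    """
    return re.sub(r"^([^0-9])", r"%\1", line)


for line in ["text", "bla", "123 456 789", ""]:
    print(repr(comment_non_numeric(line)))
```

Note that this test looks only at the first character, so a number line with leading whitespace would also get a "%". The Perl pattern /^[\d\s]+$/ from earlier in the thread checks the whole line instead.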