LinuxQuestions.org
Programming: This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Old 11-20-2006, 12:58 PM   #1
t2000
LQ Newbie
 
Registered: Jan 2005
Distribution: Mandrake 10.1 community, Gentoo, Fedora
Posts: 17

Rep: Reputation: 0
resume search in file with grep


Hi
Suppose I have a text file containing some lines like the following:
Code:
text
123 456 789
234 345 456
text
234 456 567
234 456 567
Now I want to find every line containing "text" and copy the number lines that follow it into a new file, with "text" as the header, up to the next "text" line, and so on until the end of the file.

I tried grep but could not manage to extract the interesting lines into different files.

What would be the right tool to use?

thanks

tom
 
Old 11-20-2006, 01:50 PM   #2
matthewg42
Senior Member
 
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530

Rep: Reputation: 65
With that input data you can actually do it all with [GNU] grep:
Code:
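grep -A 2 text yourfile | grep -v -e text -e --
Here -A 2 prints the two lines after each match, and the second grep removes the "text" lines themselves plus the "--" separator lines that GNU grep puts between non-adjacent groups. But this probably won't work with more general cases, e.g. when the number of lines after each "text" varies.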
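For the more general case, the underlying idea is to walk the input once and start a new group each time a header line appears. A minimal Python sketch, assuming the header is any line containing the substring "text" (as in the examples in this thread) and that lines before the first header can be dropped; `split_sections` is just a name chosen for this illustration:

```python
def split_sections(lines, marker="text"):
    """Group lines into sections; each section starts at a line containing marker.

    Lines that appear before the first marker line are discarded.
    """
    sections = []
    for line in lines:
        if marker in line:
            sections.append([line])        # a header line starts a new section
        elif sections:
            sections[-1].append(line)      # body line: add it to the current section
    return sections


sample = """text
123 456 789
234 345 456
text
234 456 567
234 456 567""".splitlines()

for section in split_sections(sample):
    print(section)
```

Each inner list could then be written to its own file; unlike `grep -A`, this does not depend on how many number lines follow each header.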
 
Old 11-21-2006, 01:38 AM   #3
t2000
LQ Newbie
 
Registered: Jan 2005
Distribution: Mandrake 10.1 community, Gentoo, Fedora
Posts: 17

Original Poster
Rep: Reputation: 0
Thanks, but how can I write all the lines containing the numbers between consecutive "text" lines into separate files?

The end product should look like:
Code:
fileA:
text
123 456 789
234 345 456

fileB:
text
234 456 567
234 456 567
 
Old 11-21-2006, 04:09 AM   #4
matthewg42
Senior Member
 
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530

Rep: Reputation: 65
Oh. Umm, not with grep alone.

Since I spend a lot of time farting about with Perl, I'd use a small Perl program. Maybe there's a better way to do it, but here's my approach:
Code:
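#!/usr/bin/perl -w
use strict;

my $fno = 65;            # ASCII 'A': the first output file is fileA
my $filename;

# Discard anything that appears before the first "text" line.
open(OUT, ">/dev/null") || die "cannot open /dev/null: $!\n";

while (<>) {
    if ( /text/ ) {
        # Every "text" line starts a new output file.
        $filename = "file" . chr($fno++);
        open(OUT, ">$filename") || die "cannot open $filename for writing: $!\n";
        print "at line $. of input : writing to new file, $filename\n";
    }
    # Copy the current line (including the "text" header) to the current file.
    print OUT;
}
close(OUT);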
Of course, this isn't a very robust implementation: if you have more than 26 files, you'll start to get strange filenames, because chr() carries on past "Z" to characters such as "[" and "\". I'm sure you can write a nice little function to produce better filenames.
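One possible "nice little function" for the filenames, sketched in Python rather than Perl: spreadsheet-style labels that continue after Z with AA, AB, and so on. The name `file_label` and the 0-based argument are assumptions made for this example:

```python
def file_label(n):
    """Return a spreadsheet-style label for n >= 0: 0 -> 'A', 25 -> 'Z', 26 -> 'AA'."""
    label = ""
    n += 1                              # switch to 1-based numbering
    while n > 0:
        n, rem = divmod(n - 1, 26)      # peel off the last "digit" (A..Z)
        label = chr(ord("A") + rem) + label
    return label


print(["file" + file_label(i) for i in (0, 25, 26, 27)])
# ['fileA', 'fileZ', 'fileAA', 'fileAB']
```

A zero-padded counter such as `"file%03d" % n` would be a simpler alternative that also sorts correctly in a directory listing.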
 
Old 11-21-2006, 07:01 AM   #5
t2000
LQ Newbie
 
Registered: Jan 2005
Distribution: Mandrake 10.1 community, Gentoo, Fedora
Posts: 17

Original Poster
Rep: Reputation: 0
Hey, this works, thanks a lot. I'm not very familiar with Perl; could you explain what the "open(OUT, ">/dev/null");" statement does?

tom
 
Old 11-21-2006, 07:36 AM   #6
matthewg42
Senior Member
 
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530

Rep: Reputation: 65
It opens the filehandle OUT on the special device node /dev/null, which, on Unix-like OSes, is a file you can write to; anything written to it disappears into the void.

The purpose is to avoid writing anything until a match for the pattern /text/ has been found. It's just one approach, and not a very portable one. You could also use a flag to skip the print statement until a match has been found.
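The flag variant can look like this (a Python sketch, not the Perl above): the output handle stays `None` until the first "text" line, so nothing is written before a match and /dev/null is not needed. The function name `split_file`, its parameters, and the fileA, fileB, ... naming (like the Perl, only sensible up to 26 files) are assumptions for illustration:

```python
import os


def split_file(in_path, marker="text", prefix="file", out_dir="."):
    """Copy each section that starts at a marker line into its own file.

    Returns the list of output paths that were created.
    """
    out = None                   # the flag: stays None until the first marker line
    fno = ord("A")               # fileA, fileB, ... (only sensible up to fileZ)
    created = []
    with open(in_path) as infile:
        for line in infile:
            if marker in line:
                if out is not None:
                    out.close()  # finish the previous section
                path = os.path.join(out_dir, prefix + chr(fno))
                fno += 1
                out = open(path, "w")
                created.append(path)
            if out is not None:  # lines before the first marker are skipped
                out.write(line)
    if out is not None:
        out.close()
    return created
```

For example, `split_file("input.txt")` would write fileA and fileB into the current directory from the sample data in the first post and return their paths.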
 
Old 11-21-2006, 08:03 AM   #7
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
How about a Python one?

Code:
# Read the whole file and split it wherever "text" occurs.
with open("file") as infile:
    data = infile.read().split("text")

count = 1  # for numbering the output files
for i in filter(None, data):  # drop empty pieces, e.g. before the first "text"
    outfile = "file-" + str(count)
    with open(outfile, "w") as f:
        print("text", file=f)
        print(i.strip(), file=f)
    count = count + 1
output:
Code:
sun:/home/ # cat file-1
text
123 456 789
234 345 456

sun:/home/ # cat file-2
text
234 456 567
234 456 567
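Why the `filter(None, ...)` is there: because the file begins with "text", `str.split("text")` returns an empty string as its first piece, and `filter(None, ...)` drops empty (falsy) pieces. A small illustration with the sample data from the first post; note that the split happens wherever the substring occurs, so a word such as "context" would also be cut:

```python
sample = "text\n123 456 789\n234 345 456\ntext\n234 456 567\n234 456 567\n"

pieces = sample.split("text")
print(pieces)   # the first piece is '' because the data starts with the separator

chunks = [c.strip() for c in filter(None, pieces)]
print(chunks)   # one chunk of number lines per "text" header

print("context 1 2".split("text"))   # ['con', ' 1 2']: substring match, not whole lines
```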
 
Old 11-21-2006, 08:21 AM   #8
t2000
LQ Newbie
 
Registered: Jan 2005
Distribution: Mandrake 10.1 community, Gentoo, Fedora
Posts: 17

Original Poster
Rep: Reputation: 0
Cool! Just one small thing still missing:
suppose the "text" header is longer than one line. Now I want to extract the header lines, write them as the header of each file, each preceded by a "%", and then write the following numbers below this header, like this:
input file:
Code:
text
bla
bla
123 456 789
234 345 456
text
234 456 567
234 456 567
output file1:
Code:
%text
%bla
%bla
123 456 789
234 345 456
output file2:
Code:
%text
%bla
%bla
34 456 567
234 456 567
 
Old 11-21-2006, 09:15 AM   #9
matthewg42
Senior Member
 
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530

Rep: Reputation: 65
Code:
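#!/usr/bin/perl -w
use strict;

my $fno = 65;                # ASCII 'A': the first output file is fileA
my $filename;
open(OUT, ">/dev/null");     # discard anything before the first "text" line

while (<>) {
    if ( /text/ ) {
        # A "text" line starts a new output file and becomes a "%" header line.
        $filename = "file" . chr($fno++);
        open(OUT, ">$filename") || die "cannot open $filename for writing: $!\n";
        print "at line $. of input : writing to new file, $filename\n";
        print OUT "%" . $_;
    }
    elsif ( /^[\d\s]+$/ ) {
        # Digits and whitespace only: a data line, copied unchanged.
        print OUT;
    }
    else {
        # Any other line belongs to the multi-line header.
        print OUT "%" . $_;
    }
}
close(OUT);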
 
Old 11-21-2006, 09:20 AM   #10
t2000
LQ Newbie
 
Registered: Jan 2005
Distribution: Mandrake 10.1 community, Gentoo, Fedora
Posts: 17

Original Poster
Rep: Reputation: 0
That's great!!
 
Old 11-21-2006, 09:55 AM   #11
matthewg42
Senior Member
 
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530

Rep: Reputation: 65
I'd like to thank my agent, my mother and J. R. Bob Dobbs. No, please - no more autographs.
 
Old 11-21-2006, 05:58 PM   #12
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Code:
# Read the whole file and split it wherever "text" occurs.
with open("file") as infile:
    data = infile.read().split("text")

count = 1  # for numbering the output files
for i in filter(None, data):
    i = i.strip()
    outfile = "file-" + str(count)
    with open(outfile, "w") as f:
        print("%text", file=f)
        if not i.strip().isdigit():
            print("%" + i, file=f)
        else:
            print(i, file=f)
    count = count + 1
 
Old 11-24-2006, 07:09 AM   #13
t2000
LQ Newbie
 
Registered: Jan 2005
Distribution: Mandrake 10.1 community, Gentoo, Fedora
Posts: 17

Original Poster
Rep: Reputation: 0
The python solution also produces a '%' in front of the first line after 'text'. Why's that?
 
Old 11-25-2006, 09:51 PM   #14
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Quote:
Originally Posted by t2000
The python solution also produces a '%' in front of the first line after 'text'. Why's that?
Code:
# Read the whole file and split it wherever "text" occurs.
with open("file") as infile:
    data = infile.read().split("text")

count = 1  # for numbering the output files
for i in filter(None, data):
    i = i.strip()
    outfile = "file-" + str(count)
    with open(outfile, "w") as f:
        print("%text", file=f)
        # Assume only the first column decides between text and numbers.
        if i and not i.split()[0].isdigit():
            print("%" + i, file=f)
        else:
            print(i, file=f)
    count = count + 1
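The reason for the stray '%' in post #12: after `strip()` the chunk is something like "123 456 789\n234 345 456", which contains spaces and a newline, and `str.isdigit()` is only true when every character is a digit, so that test failed for every chunk. Looking at the first column fixes the all-numbers case. It still treats the chunk as a whole, though, so with the multi-line header from post #8 only the first "bla" line would get a "%". A per-line variant, sketched under the same assumption (a line is data when its first column is a number); `is_data_line` and `format_chunk` are names made up for this example:

```python
def is_data_line(line):
    """True if the first column of the line is a number."""
    fields = line.split()
    return bool(fields) and fields[0].isdigit()


def format_chunk(chunk):
    """Prefix each header line of a chunk with '%', leaving data lines alone."""
    return "\n".join(
        line if is_data_line(line) else "%" + line
        for line in chunk.strip().splitlines()
    )


print("123 456 789\n234 345 456".isdigit())   # False: spaces and a newline
print(format_chunk("\nbla\nbla\n123 456 789\n234 345 456\n"))
```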
 
Old 11-26-2006, 11:01 AM   #15
makyo
Member
 
Registered: Aug 2006
Location: Saint Paul, MN, USA
Distribution: {Free,Open}BSD, CentOS, Debian, Fedora, Solaris, SuSE
Posts: 735

Rep: Reputation: 76
Hi.

In post #8, perhaps there is a typo. I don't see how the second part:
Code:
%text
%bla
%bla
34 456 567
234 456 567
is supposed to be obtained from the input file:
Code:
text
bla
bla
123 456 789
234 345 456
text
234 456 567
234 456 567
Assuming that it is a mistake, I would use the csplit utility, then run sed on each of the resulting files to insert a "%" at the start of every non-numeric line:
Code:
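#!/bin/sh

# @(#) s2       Demonstrate csplit.

# Input file: the first argument, or "data2" by default.
F=${1-data2}

# Split before every line matching /text/, repeated as often as possible ({*});
# keep the output files on error (-k), stay quiet (-s), drop empty pieces (-z).
csplit -k -s -z "$F" '/text/' '{*}'

echo
for file in xx*
do
        echo
        echo "File: $file"
        # Put "%" in front of every line whose first character is not a digit.
        sed 's/^\([^0-9]\)/%\1/' "$file" |
        cat -n
done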
which, when run, produces:
Code:
% ./s2


File: xx00
     1  %text
     2  %bla
     3  %bla
     4  123 456 789
     5  234 345 456

File: xx01
     1  %text
     2  234 456 567
     3  234 456 567
from the second data file. See man csplit for details (and a pox on inscrutable calling sequences and man pages)... cheers, makyo
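For anyone without csplit, or curious about the sed step: the substitution puts "%" in front of every line whose first character is not a digit, and leaves empty lines alone. A rough Python equivalent of just that sed expression (not of csplit itself); the names `NON_DIGIT_START` and `mark_non_numeric` are made up for this sketch:

```python
import re

# Same rule as sed 's/^\([^0-9]\)/%\1/': if a line's first character is not a
# digit, put '%' in front of it. re.MULTILINE makes ^ match at every line start;
# '\n' is excluded so that empty lines stay untouched, as they do in sed.
NON_DIGIT_START = re.compile(r"^([^0-9\n])", re.MULTILINE)


def mark_non_numeric(text):
    return NON_DIGIT_START.sub(r"%\1", text)


piece = "text\nbla\nbla\n123 456 789\n234 345 456\n"
print(mark_non_numeric(piece), end="")
```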

( edit 1: clarify )

Last edited by makyo; 11-26-2006 at 11:10 AM.
 
  



All times are GMT -5.