Quote:
Originally Posted by visitnag
I have a print file of 1.5 GB. Each page has a header line that starts like "Servicing Branch : XXX1", followed by 37 lines. We have 23 different branch names (XXX1, XXX2, and so on). The problem is that the data for each branch is not placed in sequence: one page contains XXX1 branch data, the next page contains XXX2, so everything is jumbled.
Now I want to get the data of a particular branch on continuous pages, then another branch's data, and so on. I tried a grep command like grep -A37 "Servicing Branch : XXX1" > xy.
Is there any method to sort this data into branch order?
It depends on what constitutes a page. If pages are separated by a
form-feed character ("014 12 0C FF '\f' (form feed)"), you may
be able to use it as the record separator in awk.
Something like the following should work (I created a little sample
and ran this against it; the ^L below is the visual representation of the
form feed). The script rips the big file into named chunks, one per
branch:
Code:
awk 'BEGIN{RS="\f";FS="\n"} {file=gensub(/Servicing Branch *: +([^ \t]+)/, "\\1", 1, $1); print $0 >> file}' large_file
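Note that gensub() is a GNU awk extension. If you're stuck with a POSIX awk (mawk, busybox awk), the same split can be done with two sub() calls instead; here is a minimal sketch against a three-page toy file (AAA/BBB are made-up branch names, not from the original data):

```shell
# Toy input: three pages separated by form feeds
printf 'Servicing Branch : AAA\nYadda\n\f Servicing Branch : BBB\nYadda\n\f Servicing Branch : AAA\nMore\n' > sample_file

awk 'BEGIN { RS = "\f"; FS = "\n" }
{
  hdr = $1
  sub(/.*Servicing Branch *: */, "", hdr)  # drop everything up to the branch name
  sub(/[ \t].*/, "", hdr)                  # keep only the first token
  print $0 >> hdr
  close(hdr)                               # stay under the per-process open-file limit
}' sample_file
```

The close() call also matters for the real 23-branch file: without it, some awks keep every output file open for the whole run.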
Code:
$ ls -l
total 4
-rw-r--r-- 1 tink tink 395 2008-06-21 07:48 large_file
$ cat large_file
Servicing Branch : XXX1
Yadda
Yadda
More yadda ...
^L Servicing Branch : XXX2
Yadda
Yadda
More yadda ...
^L Servicing Branch : XXX1
Yadda
Yadda
More yadda ...
^L Servicing Branch : XXX3
Yadda
Yadda
More yadda ...
^L Servicing Branch : XXX7
Yadda
Yadda
More yadda ...
^L Servicing Branch : XXX1
Yadda
Yadda
More yadda ...
$ awk 'BEGIN{RS="\f";FS="\n"} {file=gensub(/Servicing Branch *: +([^ \t]+)/, "\\1", 1, $1); print $0 >> file}' large_file
$ ls -l
-rw-r--r-- 1 tink tink 198 2008-06-21 07:53 XXX1
-rw-r--r-- 1 tink tink 66 2008-06-21 07:53 XXX2
-rw-r--r-- 1 tink tink 66 2008-06-21 07:53 XXX3
-rw-r--r-- 1 tink tink 66 2008-06-21 07:53 XXX7
-rw-r--r-- 1 tink tink 395 2008-06-21 07:48 large_file
You can then print those chunks individually, or in any branch order you like.
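If you'd rather end up with a single file with the branches in sorted order, a shell loop over the generated chunks will do, since a glob expands in collation order. A sketch (the XXX1/XXX2 contents and the output name sorted_by_branch are stand-ins, not from the original run):

```shell
# Stand-in chunk files, as the awk script above would produce them
printf 'branch one data\n' > XXX1
printf 'branch two data\n' > XXX2

rm -f sorted_by_branch
for f in XXX*; do              # glob expands sorted: XXX1, XXX2, ...
  cat "$f" >> sorted_by_branch
done
```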
Cheers,
Tink