Quote:
Originally Posted by visitnag
I have a print file of 1.5 GB. Each page has a header line that starts like "Servicing Branch : XXX1", followed by 37 lines. We have 23 different branch names (XXX1, XXX2, and so on). The problem is that the data for each branch is not placed in sequence: one page contains XXX1 branch data, the next page contains XXX2, so everything is jumbled.
Now I want to get the data of a particular branch on continuous pages, then another branch's data, and so on. I tried a grep command like grep -A37 "Servicing Branch : XXX1" > xy.
Is there any method to sort this data into branch order?
It depends on what constitutes a page. If pages are separated by a
form-feed character ("014 12 0C FF '\f' (form feed)"), you may
be able to use it as the record separator in awk.
Something like the following should work (I created a little sample
and ran this against it; the ^L below is the visual representation of the
form feed). The script rips the big file into named chunks, one per
branch:
Code:
awk 'BEGIN{RS="\f";FS="\n"} {file=gensub(/Servicing Branch *: +([^ \t]+)/, "\\1", 1, $1); print $0 >> file}' large_file
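Note that gensub() is a GNU awk extension. If you're stuck with a POSIX awk (mawk, busybox awk), the same split can be done with two sub() calls instead; here is a minimal sketch against a three-page toy file (AAA/BBB are made-up branch names, not from the original data):

```shell
# Toy input: three pages separated by form feeds
printf 'Servicing Branch : AAA\nYadda\n\f Servicing Branch : BBB\nYadda\n\f Servicing Branch : AAA\nMore\n' > sample_file

awk 'BEGIN { RS = "\f"; FS = "\n" }
{
  hdr = $1
  sub(/.*Servicing Branch *: */, "", hdr)  # drop everything up to the branch name
  sub(/[ \t].*/, "", hdr)                  # keep only the first token
  print $0 >> hdr
  close(hdr)                               # stay under the per-process open-file limit
}' sample_file
```

The close() call also matters for the real 23-branch file: without it, some awks keep every output file open for the whole run.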
Code:
$ ls -l
total 4
-rw-r--r-- 1 tink tink 395 2008-06-21 07:48 large_file
$ cat large_file
Servicing Branch : XXX1
Yadda
Yadda
More yadda ...
^L Servicing Branch : XXX2
Yadda
Yadda
More yadda ...
^L Servicing Branch : XXX1
Yadda
Yadda
More yadda ...
^L Servicing Branch : XXX3
Yadda
Yadda
More yadda ...
^L Servicing Branch : XXX7
Yadda
Yadda
More yadda ...
^L Servicing Branch : XXX1
Yadda
Yadda
More yadda ...
$ awk 'BEGIN{RS="\f";FS="\n"} {file=gensub(/Servicing Branch *: +([^ \t]+)/, "\\1", 1, $1); print $0 >> file}' large_file
$ ls -l
-rw-r--r-- 1 tink tink 198 2008-06-21 07:53 XXX1
-rw-r--r-- 1 tink tink 66 2008-06-21 07:53 XXX2
-rw-r--r-- 1 tink tink 66 2008-06-21 07:53 XXX3
-rw-r--r-- 1 tink tink 66 2008-06-21 07:53 XXX7
-rw-r--r-- 1 tink tink 395 2008-06-21 07:48 large_file
You can then print those chunks individually, or in any branch order you like.
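If you'd rather end up with a single file with the branches in sorted order, a shell loop over the generated chunks will do, since a glob expands in collation order. A sketch (the XXX1/XXX2 contents and the output name sorted_by_branch are stand-ins, not from the original run):

```shell
# Stand-in chunk files, as the awk script above would produce them
printf 'branch one data\n' > XXX1
printf 'branch two data\n' > XXX2

rm -f sorted_by_branch
for f in XXX*; do              # glob expands sorted: XXX1, XXX2, ...
  cat "$f" >> sorted_by_branch
done
```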
Cheers,
Tink