LinuxQuestions.org

visitnag 06-20-2008 10:52 AM

grepping lines...
 
I have a 1.5 GB print file. Each page has a header and a line that starts like "Servicing Branch : XXX1", followed by 37 lines. We have 23 different branch names (XXX1, XXX2, and so on). The problem is that the data for each branch is not in sequence: one page may contain XXX1 data and the next page XXX2, so everything is jumbled.

Now I want all the pages for one branch to appear consecutively, followed by the pages for the next branch. I tried the grep command, like this: grep -A37 "Servicing Branch : XXX1" > xy.

Is there any way to sort this data into branch order?

crazedsanity 06-20-2008 12:54 PM

Use sort
 
Just run "sort" on the file, or pipe the data through sort, like this:

Code:

sort filename.txt | grep "whatever"
Or, piping to sort:

Code:

<command_to_get_data> | sort | grep "whatever"

Tinkster 06-20-2008 02:57 PM

Depends on what constitutes a page. If it's a form-feed
character (ASCII octal 014, decimal 12, hex 0C, written '\f'),
you may be able to use that as the record separator in awk.
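A quick way to test that assumption is to count the form-feed bytes in the file. This is only a sketch: the sample file below is a made-up stand-in, and on the real data tr would read the print file instead.

```shell
# Hypothetical two-page sample; on real data, read the print file instead.
sample=$(mktemp)
printf 'page one\n\fpage two\n\f' > "$sample"

# tr deletes every byte except the form-feed, wc counts what is left.
ff_count=$(tr -cd '\f' < "$sample" | wc -c)
echo "form-feeds found: $ff_count"
```

A count of zero would mean the pages are separated some other way, and a form-feed record separator would not split anything.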

Something like the following should work. I created a small sample
and ran the script against it (the ^L is the visual representation
of the form-feed). The script splits the big file into named chunks,
one per branch:
Code:

awk 'BEGIN{RS="\f";FS="\n"} {file=gensub(/Servicing Branch *: +([^ \t]+)/, "\\1", 1, $1); print $0 >> file}' large_file

Code:

$ ls -l
total 4
-rw-r--r-- 1 tink  tink  395 2008-06-21 07:48 large_file
$ view large_file
 Servicing Branch    :  XXX1

Yadda

Yadda

More yadda ...


^L Servicing Branch    :  XXX2

Yadda

Yadda

More yadda ...


^L Servicing Branch    :  XXX1

Yadda

Yadda

More yadda ...


^L Servicing Branch    :  XXX3

Yadda

Yadda

More yadda ...


^L Servicing Branch    :  XXX7

Yadda

Yadda

More yadda ...


^L Servicing Branch    :  XXX1

Yadda

Yadda

More yadda ...

$ awk 'BEGIN{RS="\f";FS="\n"} {file=gensub(/Servicing Branch *: +([^ \t]+)/, "\\1", 1, $1); print $0 >> file}' large_file
$ ls -l
-rw-r--r-- 1 tink  tink  198 2008-06-21 07:53 XXX1
-rw-r--r-- 1 tink  tink    66 2008-06-21 07:53 XXX2
-rw-r--r-- 1 tink  tink    66 2008-06-21 07:53 XXX3
-rw-r--r-- 1 tink  tink    66 2008-06-21 07:53 XXX7
-rw-r--r-- 1 tink  tink  395 2008-06-21 07:48 large_file

You can then print those individually.
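If one branch-ordered file is wanted instead of separate chunks, the chunks can be concatenated afterwards. The sketch below makes a few assumptions: the sample data and the output name by_branch are invented, the branch line is the first line of each page as in the sample above, and the file name is derived with plain sub() calls so that it does not depend on gawk's gensub. It also sets ORS to a form-feed, because print otherwise ends each record with a newline and the page breaks would be lost.

```shell
# Work in a scratch directory so the chunk files do not clutter anything.
cd "$(mktemp -d)" || exit 1

# Hypothetical four-page sample, branch line first on each page.
printf ' Servicing Branch    :  XXX2\nYadda\n\f'       > large_file
printf ' Servicing Branch    :  XXX1\nYadda\n\f'      >> large_file
printf ' Servicing Branch    :  XXX2\nMore yadda\n\f' >> large_file
printf ' Servicing Branch    :  XXX1\nMore yadda\n'   >> large_file

# Split into one file per branch; ORS puts the form-feed back after each page.
awk 'BEGIN { RS = "\f"; ORS = "\f"; FS = "\n" }
{
    file = $1
    sub(/^.*Servicing Branch *: */, "", file)  # strip everything before the branch code
    sub(/[ \t\r]+$/, "", file)                 # strip trailing blanks
    if (file != "") print > file               # skip records with an empty first line
}' large_file

# Glue the chunks together in name order: all XXX1 pages, then all XXX2 pages.
cat XXX* > by_branch
```

The shell expands XXX* in lexical order, so a branch named XXX10 would come before XXX2; listing the files explicitly avoids that if the order matters.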


Cheers,
Tink

visitnag 06-26-2008 11:05 AM

Thank you, Tinkster! That is exactly the solution I wanted.
I wanted to grep the lines from "Servicing Branch XXX" down to the line with the ^L character. But when I run your code, it gives an error like "redirected file has null string" and produces no result. I am using RHEL 4.0 ES. Can you correct the error? One more thing: the "Servicing Branch..." line has a 4-line header above it. Can I grep that too? For example:

header 1
header 2
========
location
========
Servicing Branch : XXX1
line
line
line
.
.
.
.
.
line ^M^L

Tinkster 06-26-2008 01:22 PM

You may have to play with the regex in the gensub statement,
I have no idea whether there are any other special characters
embedded in your file, or whether XXX1 actually IS the kind of
string we expect.

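As for the "grepping more" lines: that's not necessary. The awk
script doesn't operate on lines but on records, which are
delimited by form-feeds, so the header lines are already part
of each record and get written out with it. The one thing to
watch is that the script reads the branch name from $1, the
first line of the record; with header lines above "Servicing
Branch", the pattern has to be matched against the whole
record instead.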
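A sketch of a variant that looks for the branch line anywhere in the record, assuming the page layout from the previous post (header lines first, CRLF line ends, ^M^L closing the page). The page helper and the sample data are invented for illustration. The "null string" error is most likely awk refusing an empty file name: when a form-feed sits at the end of a line, the next record starts with an empty first line, so $1 is empty. Here a record is only printed when the pattern matches, so that case cannot arise.

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical helper: write one page in the layout from the post above.
page() {
    printf 'header 1\r\nheader 2\r\n========\r\nlocation\r\n========\r\n'
    printf 'Servicing Branch : %s\r\nline\r\nline\r\f\n' "$1"
}
{ page XXX1; page XXX2; page XXX1; } > large_file

awk 'BEGIN { RS = "\f"; ORS = "\f" }
# Search the whole record for the branch line, not only its first line.
match($0, /Servicing Branch *: *[^ \t\r\n]+/) {
    name = substr($0, RSTART, RLENGTH)
    sub(/^.*: */, "", name)    # keep only the branch code, e.g. XXX1
    print > name               # whole page, header lines included
}
# Records with no branch line (such as the stray newline after the last
# form-feed) never reach the print, so the file name is never empty.
' large_file
```

The carriage return is excluded from the branch code so that the file is called XXX1 rather than XXX1 followed by an invisible ^M; the page contents themselves are written out unchanged.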


Cheers,
Tink

