LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

06-20-2008, 10:52 AM   #1
visitnag
Member
 
Registered: Mar 2008
Posts: 147

Rep: Reputation: 15
grepping lines...


I have a print file of 1.5 GB. Each page has a header and a line that starts like "Servicing Branch : XXX1", followed by 37 lines. We have 23 different branch names like XXX1, XXX2 and so on. The problem is that the data of every branch is not placed in sequence. Suppose one page contains XXX1 branch data and the next page contains XXX2... so everything is jumbled.

Now I want to get the data of a particular branch in continuous pages, then the next branch's data, and so on. I tried the grep command, like grep -A37 "Servicing Branch : XXX1" > xy.

Is there any method to sort this data in branch order?
 
06-20-2008, 12:54 PM   #2
crazedsanity
CS-Project Lead Developer
 
Registered: May 2008
Location: Bismarck, ND
Distribution: OpenSuSE 10.3
Posts: 14

Rep: Reputation: 1
Use sort

Just run "sort" on the file, or pipe the data through sort, like this:

Code:
sort filename.txt | grep "whatever"
Or, piping to sort:

Code:
<command_to_get_data> | sort | grep "whatever"
 
06-20-2008, 02:57 PM   #3
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
Quote:
Originally Posted by visitnag
I have a print file of 1.5 GB. Each page has a header and a line that starts like "Servicing Branch : XXX1", followed by 37 lines. We have 23 different branch names like XXX1, XXX2 and so on. The problem is that the data of every branch is not placed in sequence. Suppose one page contains XXX1 branch data and the next page contains XXX2... so everything is jumbled.

Now I want to get the data of a particular branch in continuous pages, then the next branch's data, and so on. I tried the grep command, like grep -A37 "Servicing Branch : XXX1" > xy.

Is there any method to sort this data in branch order?
Depends on what constitutes a page. If it's a form-feed
character (octal 014, decimal 12, hex 0C, '\f' in the ASCII table) you may
be able to use that as a separator in awk.

Something like the following would work (I've created a little sample
and run this against it; the ^L is the visual representation of the
form-feed). The script rips the big file into named chunks, one per
branch:
Code:
awk 'BEGIN{RS="\f";FS="\n"} {file=gensub(/Servicing Branch *: +([^ \t]+)/, "\\1", 1, $1); print $0 >> file}' large_file

Code:
$ ls -l
total 4
-rw-r--r-- 1 tink   tink   395 2008-06-21 07:48 large_file
$ view large_file
 Servicing Branch     :   XXX1

Yadda

Yadda

More yadda ...


^L Servicing Branch     :   XXX2

Yadda

Yadda

More yadda ...


^L Servicing Branch     :   XXX1

Yadda

Yadda

More yadda ...


^L Servicing Branch     :   XXX3

Yadda

Yadda

More yadda ...


^L Servicing Branch     :   XXX7

Yadda

Yadda

More yadda ...


^L Servicing Branch     :   XXX1

Yadda

Yadda

More yadda ...

$ awk 'BEGIN{RS="\f";FS="\n"} {file=gensub(/Servicing Branch *: +([^ \t]+)/, "\\1", 1, $1); print $0 >> file}' large_file
$ ls -l
-rw-r--r-- 1 tink   tink   198 2008-06-21 07:53 XXX1
-rw-r--r-- 1 tink   tink    66 2008-06-21 07:53 XXX2
-rw-r--r-- 1 tink   tink    66 2008-06-21 07:53 XXX3
-rw-r--r-- 1 tink   tink    66 2008-06-21 07:53 XXX7
-rw-r--r-- 1 tink   tink   395 2008-06-21 07:48 large_file
You can then print those individually.
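
If a single file in branch order is wanted rather than separate printouts, the chunks can be concatenated again. A minimal sketch, assuming the chunk files are named after the branches as in the listing above; the function name `merge_branches` and the output file name are hypothetical. Passing the names explicitly avoids relying on shell glob order, which sorts as text and would put XXX10 before XXX2:

```shell
# merge_branches OUT BRANCH...
# Concatenates the named chunk files into OUT, in the order given.
merge_branches() {
    out=$1
    shift
    : > "$out"                      # start with an empty output file
    for branch in "$@"; do
        if [ -f "$branch" ]; then
            cat "$branch" >> "$out"
        else
            # Report missing chunks instead of silently skipping them.
            echo "no chunk for branch $branch" >&2
        fi
    done
}
```

Usage would be something like `merge_branches sorted_file XXX1 XXX2 XXX3 XXX7`.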


Cheers,
Tink
 
06-26-2008, 11:05 AM   #4
visitnag
Member
 
Registered: Mar 2008
Posts: 147

Original Poster
Rep: Reputation: 15
Thank you, Tinkster! That is exactly the solution I wanted.
I wanted to grep the lines from "Servicing Branch XXX" to the line with the ^L character. But when I run your code it gives an error like "redirected file has null string" and produces no result. I am using RHEL 4.0 ES. Can you correct the error? One more thing: the "Servicing Branch..." line has a 4-line header above it. Can I grep that too? For example:

header 1
header 2
========
location
========
Servicing Branch : XXX1
line
line
line
.
.
.
.
.
line ^M^L
 
06-26-2008, 01:22 PM   #5
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
You may have to play with the regex in the gensub statement,
I have no idea whether there are any other special characters
embedded in your file, or whether XXX1 actually IS the kind of
string we expect.
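
One way to check the regex before writing any files is a dry run that only counts pages per branch; with the data described in this thread it should list 23 branches. This is a sketch, assuming pages are separated by form-feeds; `count_pages_per_branch` is a hypothetical helper name, and it uses only POSIX awk functions (match, substr, sub) rather than gensub:

```shell
# count_pages_per_branch FILE
# Prints "<branch> <pages>" for every branch label found in FILE.
count_pages_per_branch() {
    awk '
    BEGIN { RS = "\f" }                 # one record per page
    {
        # \r is excluded so a trailing ^M never ends up in the name.
        if (match($0, /Servicing Branch[ \t]*:[ \t]*[^ \t\r\n]+/)) {
            name = substr($0, RSTART, RLENGTH)
            sub(/^.*:[ \t]*/, "", name) # keep only the branch name
            pages[name]++
        } else if ($0 ~ /[^ \t\r\n]/) {
            unmatched++                 # non-empty page without a label
        }
    }
    END {
        for (name in pages) print name, pages[name]
        if (unmatched) print "(no label)", unmatched
    }' "$1" | sort
}
```

If the list shows odd names or a "(no label)" line, the regex needs adjusting before any splitting is attempted.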

As for "grepping more" lines: that's not necessary. The awk
script doesn't operate on lines but on records,
which are delimited by form-feeds. So unless there's a form-feed
between the header and the line with "Servicing Branch",
there's no need for special treatment.


Cheers,
Tink
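
For readers hitting the same error, a variant that takes the file name from anywhere in the page rather than from its first line may be more forgiving when header lines precede the label. This is only a sketch under assumptions the thread does not confirm: pages end in ^M^L, branch names contain no whitespace or slashes, and every carriage return can be dropped. The function name `split_by_branch` and the ".txt" suffix are hypothetical. Pages with no label are skipped; an empty file name is one plausible cause of the null-string redirection error reported above, though that is a guess.

```shell
# split_by_branch FILE
# Writes each form-feed-delimited page of FILE to <branch>.txt in the
# current directory.
split_by_branch() {
    awk '
    BEGIN { RS = "\f" }                 # one record per page
    {
        gsub(/\r/, "")                  # drop the ^M before each ^L
        # Search the whole page, so header lines above the label are fine.
        if (match($0, /Servicing Branch[ \t]*:[ \t]*[^ \t\n]+/)) {
            name = substr($0, RSTART, RLENGTH)
            sub(/^.*:[ \t]*/, "", name) # keep only the branch name
            # ">" truncates each file once per run, then keeps appending.
            # The form-feed is written back so page breaks survive.
            printf "%s\f", $0 > (name ".txt")
        }
        # Pages without the label are skipped, so the redirection
        # never receives an empty file name.
    }' "$1"
}
```

Run as `split_by_branch large_file` in an empty directory; each resulting file then holds all pages of one branch, still separated by form-feeds.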
 
  

