grepping lines...
I have a print file of 1.5 GB. Each page has a header and a line that starts with "Servicing Branch : XXX1", followed by 37 lines. We have 23 different branch names (XXX1, XXX2, and so on). The problem is that the data for each branch is not in sequence: one page contains XXX1 branch data, the next page contains XXX2, so everything is jumbled.
Now I want to get all the data for one branch on consecutive pages, followed by the next branch's data. I tried grep, like grep -A37 "Servicing Branch : XXX1" > xy. Is there any method to sort this data into branch order?
Use sort
Just run "sort" on the file, or pipe the data through sort, like this:
Code:
sort filename.txt | grep "whatever"
Code:
<command_to_get_data> | sort | grep "whatever"
Quote:
character "014 12 0C FF '\f' (form feed)" you may be able to use that as a separator in awk. Something like the following would work (I created a little sample and ran this against it; the ^L is the visual representation of the form feed). The script rips the big file into named chunks, one per branch:
Code:
awk 'BEGIN{RS="\f";FS="\n"} {file=gensub(/Servicing Branch *: +([^ \t]+)/, "\\1", 1, $1); print $0 >> file}' large_file
Code:
$ ls -l
Cheers, Tink
Thank you Tinkster! You gave me exactly the solution I wanted.
I wanted to grep the lines from "Servicing Branch XXX" to the line with the ^L character. But when I run your code it gives an error like "redirected file has null string" and produces no result. I am using RHEL 4.0ES. Can you correct the error? One more thing: the line with "Servicing Branch..." has a 4-line header above it. Can I grep that too? For example:
header 1 header 2 ======== location ======== Servicing Branch : XXX1 line line line . . . . . line ^M^L
You may have to play with the regex in the gensub statement,
I have no idea whether there are any other special characters embedded in your file, or whether XXX1 actually IS the kind of string we expect. As for "grepping more" lines: that's not necessary. The awk script doesn't operate on lines but on records, which are delimited by form feeds. So unless there's a form feed between the header and the line with "Servicing Branch", there's no need for special treatment. Cheers, Tink
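A possible way around the "null string" error, offered as a sketch rather than a tested fix for the real file: with FS="\n", $1 is only the first line of each page, which in the layout above is a header (or may be empty), not the "Servicing Branch" line, so the file name can come out empty. The variant below searches the whole record instead and uses only match(), substr() and sub(), so it does not depend on gawk's gensub. The file names sample.prn, branch_*.txt and sorted_by_branch.prn, and the exact page layout of the sample, are made up for illustration.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d) && cd "$workdir"

# Hypothetical sample: three pages, each with header lines above the
# "Servicing Branch" line and a CR before the form feed (^M^L).
page() {
    printf 'header 1\nheader 2\n======== location ========\n'
    printf 'Servicing Branch : %s\n%s\r\f' "$1" "$2"
}
{ page XXX1 'line a'; page XXX2 'line b'; page XXX1 'line c'; } > sample.prn

# One record per page (RS is the form feed). The branch name is looked up
# anywhere in the page, so header lines above it do not matter.
awk 'BEGIN { RS = "\f" }
$0 ~ /^[ \t\r\n]*$/ { next }          # skip blank pages
{
    if (match($0, /Servicing Branch *: *[^ \t\r\n]+/)) {
        name = substr($0, RSTART, RLENGTH)
        sub(/^Servicing Branch *: */, "", name)   # keep only the branch name
    } else {
        name = "unmatched"            # never redirect to an empty file name
    }
    out = "branch_" name ".txt"
    printf("%s\f", $0) >> out         # put the form feed back after each page
}' sample.prn

# Join the chunks in branch order. The shell glob sorts names as text, so
# XXX10 would come before XXX2; list the files explicitly if that matters.
cat branch_XXX*.txt > sorted_by_branch.prn
```

Because the script appends with >>, old branch_*.txt files should be removed before a second run. Pages that contain no "Servicing Branch" line end up in branch_unmatched.txt, which is a quick way to see whether the regex needs adjusting for the real data.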