LinuxQuestions.org - need your help with GREP!


hello!

Please help me with this. I'm trying to use grep to find tags in a file called mike.txt.

Within the file I need the information between these two tags

<mike>
</mike>

AND

the data between <mike2> </mike2>

I also need to output the results to a file.

In other words, my file named mike.txt has a bunch of tags, and there ARE tags in between the </mike> and the <mike2> tags. I don't want any of the data in between those, just the data I noted above.

Help!!

While grep can look into files and find a regular expression (read: search term), it can't do the kind of selection you want. You would probably have to use sed or awk to do the selection. So, use grep to find the beginning tag, awk/sed to select what's between the beginning and ending tags, and either print to screen (stdout) or pipe the output through tee to also write it to a file.
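
As a rough sketch of that idea, sed can do the range selection by itself and tee can save a copy while still printing to the screen. The sample contents of mike.txt and the output name results.txt are made up for illustration, and it assumes each tag sits on a line of its own:

```shell
# Hypothetical sample input standing in for the real mike.txt.
cat > mike.txt <<'EOF'
<header>
ignore me
</header>
<mike>
first block
</mike>
<other>
skip this
</other>
<mike2>
second block
</mike2>
EOF

# -n suppresses normal output; each -e prints one tag-to-tag range.
# The tag lines themselves are included in the output.
# tee shows the result on stdout and writes the same text to results.txt.
sed -n -e '/<mike>/,/<\/mike>/p' -e '/<mike2>/,/<\/mike2>/p' mike.txt | tee results.txt
```

Anything between </mike> and <mike2> is skipped because it falls outside both ranges.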

grepping like grampa

if you have a file that has only the tags <mike> and </mike>,
each on a line of its own, you could get away with 'grep -v mike'.
The -v switch tells it to print only the lines that don't contain mike.
Of course if your file has other tags too you may want to approach
it differently.
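
A quick way to see what that does, using a throwaway file (onlymike.txt and mikes_only.txt are made-up names for this sketch):

```shell
# Hypothetical input: nothing but the two tag lines and the data between them.
printf '%s\n' '<mike>' 'line one' 'line two' '</mike>' > onlymike.txt

# Keep every line that does NOT contain the string "mike".
grep -v 'mike' onlymike.txt > mikes_only.txt

cat mikes_only.txt
```

Bear in mind this would also drop any data line that happens to contain the word mike.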

For example, if you know that <mike> is always going to appear
on line 25 and the data will only ever be ten lines in length then
you can specify explicitly the lines you want to extract with a tool
like sed;

sed -n '26,35p' rawtaggedfile > mikes.file

mikes.file being the alternative location to stdout.
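
If the position of the tags is not fixed, the line numbers can be looked up first with grep -n and handed to the same kind of sed command. This is only a sketch, assuming each tag sits alone on its own line; the sample file built with seq and the names start, end and mikes_range.txt are made up for illustration:

```shell
# Hypothetical input: 40 numbered lines, with <mike> on line 25
# and </mike> on line 36, so the data is lines 26 to 35.
seq 1 40 | sed -e '25s/.*/<mike>/' -e '36s/.*/<\/mike>/' > rawtaggedfile

# grep -n prefixes each match with "LINENUMBER:"; cut keeps just the number.
start=$(grep -n '^<mike>$' rawtaggedfile | head -n 1 | cut -d: -f1)
end=$(grep -n '^</mike>$' rawtaggedfile | head -n 1 | cut -d: -f1)

# Print only the lines strictly between the two tag lines.
sed -n "$((start + 1)),$((end - 1))p" rawtaggedfile > mikes_range.txt
```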

Statically located data has its uses and this would be a fine example.

If the tags are at the beginning and the end of a single line then you can
use sed again to delete the first 6 characters and the last 7 characters
like this;

sed 's/......//' rawtaggedfile > tmp1
sed 's/.......$//' tmp1 > mikes.file

or something similar. Piping the output of the first sed straight into the
second should normally work too, but I've had trouble with that in the past,
so I use a tmp file then delete it.
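
For what it's worth, both substitutions can also go into a single sed call with two -e expressions, which avoids the tmp file altogether. A small sketch, matching the tags literally instead of counting characters (oneline.txt and stripped.txt are made-up names):

```shell
# Hypothetical input: one line with the tags at its beginning and end.
printf '%s\n' '<mike>some data</mike>' > oneline.txt

# First expression strips the opening tag at the start of the line,
# second strips the closing tag at the end of the line.
sed -e 's/^<mike>//' -e 's/<\/mike>$//' oneline.txt > stripped.txt

cat stripped.txt
```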

G

actually this is quite easy but you will want to use cgrep and not grep.
cgrep has this very ability built in via delimiters.

In your case I believe you will use:

Code:

cgrep -w"<mike>" +w"<\/mike>" mike.txt > filename

The -w is the top delimiter and the +w is the bottom delimiter. The output will be everything in between (incl. the delimiters themselves). One note: you may need to put a \ in front of each < and > to indicate that these are just regular characters.
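
If cgrep is not installed, something roughly similar can be put together with plain awk. This is not cgrep, just a sketch of the same delimiter idea; unlike the command above it leaves the delimiter lines out, and it assumes each tag is on its own line. The sample mike.txt contents, the variable name inside and the output name between.txt are made up for illustration:

```shell
# Hypothetical sample input standing in for the real mike.txt.
cat > mike.txt <<'EOF'
<mike>
first block
</mike>
<other>
skip this
</other>
<mike2>
second block
</mike2>
EOF

# Rule order matters: the closing tag switches printing off before the
# print rule runs, and the opening tag switches it on only afterwards,
# so neither tag line is printed.
awk '
  /<\/mike>/ || /<\/mike2>/ { inside = 0 }
  inside                    { print }
  /<mike>/ || /<mike2>/     { inside = 1 }
' mike.txt > between.txt

cat between.txt
```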
cheers:)

looks like I just found myself a new tool to learn all about - cheers nilesso

G

I looked up cgrep.
---quote---
cgrep is a context-grep Perl script for showing the given string with several lines of surrounding text.
---end---
There is also a cgrep.sed.
---quote---
cgrep.sed is a context-grep sed script for showing the given string with several lines of surrounding text. It can also match a pattern that's spread across several lines.
---end---
Anyone have any experience with the latter?

G

cgrep isn't a Perl script. It's a C binary developed by Bell Labs (Lucent)... I think what you have come across are Perl scripts that use cgrep internally.
cgrep's basic functionality already allows matching "a pattern that's spread across several lines". It is excellent for parsing log files in which error messages span multiple lines.

check out this link for cgrep info
and the following to get the source

Hope you find it useful :D

Just goes to show... don't believe everything you google!! lol

Thanks Nilleso

Ginetta

Here is the current man page (Description Section) for more info:

DESCRIPTION
cgrep provides all the features of grep, egrep, and fgrep, with greatly
enhanced performance (see the section on PERFORMANCE) along with many
additional features, one of which is the ability to output the context
(surrounding lines) of the matching lines. The use of cgrep is
upward-compatible with that of grep, egrep (using the -E option), or
fgrep (using the -F option). cgrep searches files for lines matching
patterns and normally sends to standard output matching lines, possibly
with a user-specified context window. The window may be specified as a
constant number of lines before and after the matching lines (the
number of context lines before the matching lines and the number after
the matching lines may differ); or by specifying beginning and ending
delimiters (these may differ); or as any combination thereof. The
windows need not be delimited at the nearest occurrence of delimiters,
as any number of matches to the beginning and ending delimiters may be
independently specified. The lines delimiting the beginning or end of
the window may independently be either included in or excluded from the
window.

By default, the patterns and delimiters are taken to be limited regular
expressions as in grep; however, full regular expressions as in egrep,
or fixed strings as in fgrep, may also be used.

More than one pattern or delimiter can be specified by enclosing the
entire set of patterns or delimiters within quotes and separating
individual patterns or delimiters with newlines. More than one pattern
or delimiter can also be specified by using egrep mode (the -E option)
and separating individual patterns or delimiters with `|'. More than
one pattern can also be specified by using the -f option and listing
the patterns, one per line, in a file. Patterns can also be specified
dynamically from the input itself by use of the -t or +t option. cgrep
also provides two special-purpose options, -R and -T, for scanning
5ESS(R) ROP output.

If no files are specified, standard input is assumed.

cgrep allows restricting matches to whole words or phrases. The
section, WORD MATCHING, explains this in more detail.

cgrep supports approximate matching (matching with mismatches allowed).
The description of the -A option explains approximate matching in more
detail.

Unlike grep, egrep, or fgrep, cgrep allows the matching of patterns,
delimiters, or trail_patterns that may span multiple lines of text
through the use of literal newline characters. The section, MULTI-LINE
MATCHING, explains this in more detail.

cgrep also supports viewpathing. The section, VIEWPATHING, explains
this in more detail.

Yeah, I re-researched, just a little more thoroughly and have a good grasp of it now.

Thanks for sharing -- nice to see people still like to follow through with as much as possible when they take on the role of tutor :-)

G is for Grasshopper