Linux - Newbie: This Linux forum is for members that are new to Linux.
Just starting out and have a question?
If it is not in the man pages or the how-to's this is the place!
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
Please help me with this. I'm trying to use grep to find tags in a file called mike.txt.
Within the file I need the information between these two tags:
<mike>
</mike>
AND
the data between <mike2> and </mike2>.
I also need to write the results to a file.
In other words, my file mike.txt has a bunch of tags, and there ARE other tags between the </mike> and <mike2> tags. I don't want any of the data in between those, just the data I noted above.
While grep can look into files and find a regular expression (read: search term), it can't do the kind of range selection you want. You would probably have to use sed or awk for that. So, use grep to find the beginning tag, use awk or sed to select what's between the beginning and ending tags, and either print to the screen (stdout) or pipe the output through tee to also write it to a file.
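As a minimal sketch of the sed approach, assuming each tag sits on its own line: the sample file below is made-up data standing in for mike.txt, and mikes.out is a hypothetical output name. The -n flag stops sed from printing every line, and each /start/,/end/p range prints from a line matching the opening tag through the next line matching the closing tag, so the unrelated tags in between are skipped.

```shell
# Hypothetical sample input standing in for the real mike.txt.
cat > mike.txt <<'EOF'
<mike>
alpha
</mike>
<other>
skip me
</other>
<mike2>
beta
</mike2>
EOF

# -n: print nothing unless asked.
# First range: from a <mike> line through the next </mike> line.
# Second range: from a <mike2> line through the next </mike2> line.
# '<mike>' does not match '<mike2>' because the '>' must follow 'mike' directly.
# tee shows the result on stdout and also writes it to mikes.out.
sed -n -e '/<mike>/,/<\/mike>/p' -e '/<mike2>/,/<\/mike2>/p' mike.txt | tee mikes.out
```

The tag lines themselves are kept in the output. Also note that sed only starts looking for the end of a range on the line after the one that opened it, so if <mike> and </mike> share a single line this prints too much; that case needs a different approach.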
If you have a file that has only the tags <mike> and </mike>,
you could get away with 'grep -v mike'.
The -v switch tells grep to print only the lines that don't contain mike.
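A slightly safer variant, sketched under the assumption that the tag lines contain nothing but the tag (no leading spaces, no trailing carriage returns): anchoring the patterns means data lines that merely mention "mike" are kept. The sample input and the mikes.file name are made up for illustration.

```shell
# Hypothetical sample: one data line happens to contain the word "mike".
printf '<mike>\nalpha\nmike was here\n</mike>\n' > mike.txt

# Plain 'grep -v mike' would also drop "mike was here".
# The anchored patterns (^...$) drop only lines that are exactly a tag.
grep -v -e '^<mike>$' -e '^</mike>$' mike.txt > mikes.file
cat mikes.file
```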
Of course, if your file has other tags too, you may want to approach
it differently.
For example, if you know that <mike> is always going to appear
on line 25 and the data will only ever be ten lines long, then
you can explicitly specify the lines you want to extract with a tool
like sed:
sed -n '26,35p' rawtaggedfile > mikes.file
Here mikes.file is the alternative destination to stdout.
Statically located data has its uses, and this would be a fine example.
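Before hard-coding line numbers it is worth confirming where the tags actually are. A minimal sketch, using a generated 40-line stand-in for rawtaggedfile; the layout (<mike> on line 25, ten data lines, </mike> on line 36) is an assumption chosen to match the example.

```shell
# Build a hypothetical 40-line test file: filler lines everywhere except
# <mike> on line 25, data1..data10 on lines 26-35, and </mike> on line 36.
awk 'BEGIN {
  for (i = 1; i <= 40; i++) {
    if (i == 25) print "<mike>"
    else if (i == 36) print "</mike>"
    else if (i >= 26 && i <= 35) print "data" (i - 25)
    else print "filler" i
  }
}' > rawtaggedfile

# grep -n prefixes each match with its line number:
#   25:<mike>
#   36:</mike>
grep -n -e '<mike>' -e '</mike>' rawtaggedfile

# -n turns off sed's automatic printing; '26,35p' prints only lines 26-35.
sed -n '26,35p' rawtaggedfile > mikes.file
```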
If the tags are at the beginning and end of a single line, then you can
use sed again to delete the first 6 characters (<mike>) and the last 7
characters (</mike>), like this:
sed 's/^......//' rawtaggedfile > tmp1
sed 's/.......$//' tmp1 > mikes.file
or something similar. Passing the output of the first sed through a pipe may
not work, because in my experience sed can be a little flaky with memory.
So I use a tmp file and then delete it.
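An alternative to counting characters, sketched here rather than taken from the post above: match the tags themselves and keep only what is between them, in a single sed pass with no temporary file. It assumes each wanted line looks exactly like <mike>...</mike> or <mike2>...</mike2>; the sample data is made up.

```shell
# Hypothetical sample input with single-line tagged records.
printf '<mike>alpha</mike>\n<other>skip me</other>\n<mike2>beta</mike2>\n' > rawtaggedfile

# BRE breakdown:
#   ^<mike2\{0,1\}>      opening tag <mike> or <mike2> at the start of the line
#   \(.*\)               capture everything up to the closing tag
#   <\/mike2\{0,1\}>$    closing tag </mike> or </mike2> at the end of the line
# -n plus the trailing p prints only lines where the substitution succeeded,
# so <other> lines are dropped.
sed -n 's/^<mike2\{0,1\}>\(.*\)<\/mike2\{0,1\}>$/\1/p' rawtaggedfile > mikes.file
cat mikes.file
```

This pattern does not check that the opening and closing tags agree (it would also accept <mike>...</mike2>), which is usually acceptable for well-formed input.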
Distribution: ubuntu, RHAS, and other unmentionables
Posts: 372
Rep:
Actually, this is quite easy, but you will want to use cgrep, not grep.
cgrep has this very ability built in via delimiters.
In your case I believe you would use:
Code:
cgrep -w"<mike>" +w"<\/mike>" mike.txt > filename
The -w is the top delimiter and the +w is the bottom delimiter. The output will be everything in between (including the delimiters themselves). One note: you may need to put a \ in front of each < and > to indicate that these are just regular characters.
cheers
I looked up cgrep.
---quote---
cgrep is a context-grep Perl script for showing the given string with several lines of surrounding text.
---end---
There is also a cgrep.sed.
---quote---
cgrep.sed is a context-grep sed script for showing the given string with several lines of surrounding text. It can also match a pattern that's spread across several lines.
---end---
Does anyone have any experience with the latter?
cgrep isn't a Perl script. It's a C binary developed by Bell Labs (Lucent)... I think what you have come across are Perl scripts that use cgrep internally.
cgrep's basic functionality already allows matching "a pattern that's spread across several lines". It is excellent for parsing log files in which error messages span multiple lines.
Here is the current man page (Description Section) for more info:
DESCRIPTION
cgrep provides all the features of grep, egrep, and fgrep, with greatly enhanced performance (see the section on PERFORMANCE) along with many additional features, one of which is the ability to output the context (surrounding lines) of the matching lines. The use of cgrep is upward-compatible with that of grep, egrep (using the -E option), or fgrep (using the -F option). cgrep searches files for lines matching patterns and normally sends to standard output matching lines, possibly with a user-specified context window. The window may be specified as a constant number of lines before and after the matching lines (the number of context lines before the matching lines and the number after the matching lines may differ); or by specifying beginning and ending delimiters (these may differ); or as any combination thereof. The windows need not be delimited at the nearest occurrence of delimiters, as any number of matches to the beginning and ending delimiters may be independently specified. The lines delimiting the beginning or end of the window may independently be either included in or excluded from the window.
By default, the patterns and delimiters are taken to be limited regular expressions as in grep; however, full regular expressions as in egrep, or fixed strings as in fgrep, may also be used.
More than one pattern or delimiter can be specified by enclosing the entire set of patterns or delimiters within quotes and separating individual patterns or delimiters with newlines. More than one pattern or delimiter can also be specified by using egrep mode (the -E option) and separating individual patterns or delimiters with `|' . More than one pattern can also be specified by using the -f option and listing the patterns, one per line in a file. patterns can also be specified dynamically from the input itself by use of the -t or +t option. cgrep also provides two special-purpose options, -R and -T, for scanning 5ESS(R) ROP output.
If no files are specified, standard input is assumed.
cgrep allows restricting matches to whole words or phrases. The section, WORD MATCHING, explains this in more detail.
cgrep supports approximate matching (matching with mismatches allowed). The description of the -A option explains approximate matching in more detail.
Unlike grep, egrep, or fgrep, cgrep allows the matching of patterns, delimiters, or trail_patterns that may span multiple lines of text through the use of literal newline characters. The section, MULTI-LINE MATCHING, explains this in more detail.
cgrep also supports viewpathing. The section, VIEWPATHING, explains this in more detail.