need your help with GREP!
hello!
Please help me with this. I'm trying to use grep to find tags in a file called mike.txt. Within the file I need the information between the two tags <mike> </mike> AND the data between <mike2> </mike2>. I also need to output the results to a file. In other words, my file named mike.txt has a bunch of tags, and there ARE other tags in between the </mike> and the <mike2> tags. I don't want any of the data in between these, just the data I note above. Help!! |
While grep can look into files and find a regular expression (read: search term), it can't do the kind of selection you want on its own, because it prints whole matching lines rather than the text between two markers. You would probably have to use sed or awk to do the selection. So, use grep to find the beginning tag, awk/sed to select what's between the beginning and ending tags, and either print to screen (stdout) or pipe the output through tee to write it to a file as well.
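A minimal sketch of the awk route, assuming every opening and closing tag sits on a line of its own. The sample contents of mike.txt and the output name mikes.file are made up for illustration:

```shell
# Build a small sample mike.txt (hypothetical contents, for illustration only).
printf '%s\n' '<mike>' 'alpha' 'beta' '</mike>' \
    '<other>' 'skip me' '</other>' \
    '<mike2>' 'gamma' '</mike2>' > mike.txt

# keep is switched on by an opening <mike> or <mike2> line and switched off
# by the matching closing line; "next" drops the tag lines themselves.
# Any line seen while keep is on gets printed.
awk '/^<mike2?>$/   { keep = 1; next }
     /^<\/mike2?>$/ { keep = 0; next }
     keep' mike.txt | tee mikes.file
# prints:
# alpha
# beta
# gamma
```

tee shows the result on screen and writes it to mikes.file in one go; a plain `> mikes.file` would do if the on-screen copy isn't needed. If the tags share a line with the data, this sketch will not match them and the patterns would have to be loosened.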
|
grepping like grampa
If you have a file that has only the tags <mike> and </mike>, each on a line of its own,
you could get away with 'grep -v mike'. The -v switch tells it to print only the lines that don't contain mike, so this also assumes the data itself never contains the word. Of course if your file has other tags too you may want to approach it differently. For example, if you know that <mike> is always going to appear on line 25 and the data will only ever be ten lines in length, then you can specify explicitly the lines you want to extract with a tool like sed: sed -n '26,35p' rawtaggedfile > mikes.file (mikes.file being the alternative location to stdout). Statically located data has its uses and this would be a fine example. If the tags are at the beginning and the end of a single line then you can use sed again to delete the first 6 characters (<mike>) and the last 7 characters (</mike>) like this: sed 's/^......//' rawtaggedfile > tmp1 followed by sed 's/.......$//' tmp1 > mikes.file, or something similar. Piping the output of the first sed straight into the second has been a little flaky in my experience, so I use a tmp file and then delete it, although a pipe or a single sed with two -e expressions should normally do the same job. G |
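For the single-line case above, here is a hedged sketch that does both substitutions in one sed call, so no temporary file is involved. The sample line is invented for illustration, and the file names just follow the ones used in the post:

```shell
# Hypothetical one-line sample where the tags wrap the data on a single line.
printf '%s\n' '<mike>some data</mike>' > rawtaggedfile

# Remove the leading <mike> and the trailing </mike> in a single pass.
# Matching the tags literally (rather than counting 6 and 7 characters)
# leaves lines without the tags untouched.
sed -e 's/^<mike>//' -e 's/<\/mike>$//' rawtaggedfile > mikes.file

cat mikes.file
# prints: some data
```

Anchoring with ^ and $ is a deliberate choice here: it only strips a tag that really is at the start or end of the line, which is the situation the post describes.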
Actually this is quite easy, but you will want to use cgrep and not grep.
cgrep has this very ability built in via delimiters. In your case I believe you will use: Code:
cgrep -w"<mike>" +w"<\/mike>" mike.txt > filename cheers:) |
cgrep
looks like I just found myself a new tool to learn all about - cheers nilesso
G |
I looked up cgrep.
---quote--- cgrep is a context-grep Perl script for showing the given string with several lines of surrounding text. ---end--- There is also a cgrep.sed. ---quote--- cgrep.sed is a context-grep sed script for showing the given string with several lines of surrounding text. It can also match a pattern that's spread across several lines. ---end--- Anyone have any experience with the latter? G |
cgrep isn't a Perl script. It's a C binary developed by Bell Labs (Lucent)... I think what you have come across are Perl scripts that use cgrep internally.
cgrep's basic functionality already allows matching "a pattern that's spread across several lines". It is excellent for parsing log files in which error messages span multiple lines. Check out this link for cgrep info and the following to get the source. Hope you find it useful :D |
Just goes to show... don't believe everything you google!! lol
Thanks Nilleso Ginetta |
Here is the current man page (Description Section) for more info:
DESCRIPTION
cgrep provides all the features of grep, egrep, and fgrep, with greatly enhanced performance (see the section on PERFORMANCE) along with many additional features, one of which is the ability to output the context (surrounding lines) of the matching lines. The use of cgrep is upward-compatible with that of grep, egrep (using the -E option), or fgrep (using the -F option).
cgrep searches files for lines matching patterns and normally sends to standard output matching lines, possibly with a user-specified context window. The window may be specified as a constant number of lines before and after the matching lines (the number of context lines before the matching lines and the number after the matching lines may differ); or by specifying beginning and ending delimiters (these may differ); or as any combination thereof. The windows need not be delimited at the nearest occurrence of delimiters, as any number of matches to the beginning and ending delimiters may be independently specified. The lines delimiting the beginning or end of the window may independently be either included in or excluded from the window.
By default, the patterns and delimiters are taken to be limited regular expressions as in grep; however, full regular expressions as in egrep, or fixed strings as in fgrep, may also be used. More than one pattern or delimiter can be specified by enclosing the entire set of patterns or delimiters within quotes and separating individual patterns or delimiters with newlines. More than one pattern or delimiter can also be specified by using egrep mode (the -E option) and separating individual patterns or delimiters with `|'. More than one pattern can also be specified by using the -f option and listing the patterns, one per line in a file. Patterns can also be specified dynamically from the input itself by use of the -t or +t option. cgrep also provides two special-purpose options, -R and -T, for scanning 5ESS(R) ROP output.
If no files are specified, standard input is assumed.
cgrep allows restricting matches to whole words or phrases. The section, WORD MATCHING, explains this in more detail.
cgrep supports approximate matching (matching with mismatches allowed). The description of the -A option explains approximate matching in more detail.
Unlike grep, egrep, or fgrep, cgrep allows the matching of patterns, delimiters, or trail_patterns that may span multiple lines of text through the use of literal newline characters. The section, MULTI-LINE MATCHING, explains this in more detail.
cgrep also supports viewpathing. The section, VIEWPATHING, explains this in more detail. |
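For anyone without cgrep installed, the "constant number of lines before and after" kind of window has a rough analogue in GNU and BSD grep through the -B and -A options. This is only a sketch of that analogue, not cgrep itself, and it does not cover delimiter-based windows. The log file name and its contents are made up:

```shell
# Hypothetical log file, for illustration only.
printf '%s\n' 'one' 'two' 'ERROR here' 'four' 'five' > app.log

# -B 1 prints one line of context before each match, -A 1 one line after.
grep -B 1 -A 1 'ERROR' app.log > context.out

cat context.out
# prints:
# two
# ERROR here
# four
```

The before and after counts are independent, which lines up with the man page's note that the two sides of the window may differ.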
Yeah, I re-researched, just a little more thoroughly and have a good grasp of it now.
Thanks for sharing -- nice to see people still like to follow through with as much as possible when they take on the role of tutor :-) G is for Grasshopper |