LinuxQuestions.org


Alvin88 03-14-2013 09:24 AM

Extracting and printing only part of the line with awk or sed
 
Hi all,

I have an issue with a regular expression. Here is the deal:

The log file on my server looks like this (let's call it LOGFILE.log):
2013-03-14 02:14:18 GMT comment server INFO 200 - ModuleMediaCasterStreamMonitorAdvanced.onValidateMediaCaster[rtpinput1/_definst_] Stream not healthy [stream startup timeout]: flv:STREAM--NAME.stream - - - 55485.773 - - - - - - - -- - - - - - - - - - - - - - - - -
2013-03-14 02:14:18 GMT comment server INFO 200 - ModuleMediaCasterStreamMonitorAdvanced.onValidateMediaCaster[rtpinput1/_definst_] Stream not healthy [stream startup timeout]: flv:STREAM--NAME.stream - - - 55485.773 - - - - - - -- - - - - - - - - - - - - - - - - -
2013-03-14 02:14:18 GMT comment server INFO 200 - ModuleMediaCasterStreamMonitorAdvanced.onValidateMediaCaster[rtpinput1/_definst_] Stream not healthy [stream startup timeout]: flv:STREAM--NAME.stream - - - 55485.774 - - - - - - - -- - - - - - - - - - - - - - - - -

I would like to extract the STREAM--NAME part. How can I do this?

I tried something like flv:.*\.stream, but it doesn't help much.

My solution so far:
egrep "Stream not healthy " LOGFILE.log | cut -d ':' -f5 | awk -F"." '{print $1}'
I need to do this only for the lines which contain the string "Stream not healthy ", hence the egrep as the first filter.

The issue is that the STREAM--NAME pattern is not always in the same place in the log line. The command above works 99% of the time, but I would like to match the pattern wherever it appears in a line of LOGFILE.log.

Question no. 1: How can I write a regular expression to print only the STREAM--NAME part from any place in the log line? :scratch:

Question no. 2: How can I also extract the date and time in the same one-liner? The output would look like:
2013-03-14 02:14:18 STREAM--NAME
For this one I have completely no idea how to shorten the line from LOGFILE.log. The date and time will always be the first characters in the line, as above, but the STREAM--NAME pattern might be in a different place (though always after the date and time), hence the need for a command which will extract and print it. :scratch:

Question no. 3: Could you recommend any good pages, books or magazines where I can read about this? I do not have much experience with regular expressions, especially in connection with awk or sed. That way I can learn and help others from time to time. :cool:

Can I ask you chaps for help on this one, or some hints how to achieve my goal?

danielbmartin 03-14-2013 11:58 AM

This code ...
Code:

sed 's/GMT/~/;s/flv:/~/' $InFile  \
|awk -F~ '{print $1 $3}'          \
> $OutFile

... produced this output ...
Code:

2013-03-14 02:14:18 STREAM--NAME.stream - - - 55485.773 - - - - - - - -- - - - - - - - - - - - - - - - -
2013-03-14 02:14:18 STREAM--NAME.stream - - - 55485.773 - - - - - - -- - - - - - - - - - - - - - - - - -
2013-03-14 02:14:18 STREAM--NAME.stream - - - 55485.774 - - - - - - - -- - - - - - - - - - - - - - - - -

Daniel B. Martin

Alvin88 03-14-2013 02:45 PM

Many thanks, works like a charm.

I found something like:
grep -E -o 'v:.*\.s' | awk -F"[:.]" '{print $2}'

but I really didn't know how to do the trick with date, time, and STREAMNAME.

Your solution is much better.

After a moment, and a cup of tea I understood the trick here:
sed 's/GMT/~/;s/flv:/~/;s/\.stream/~/' $InFile | awk -F~ '{print $1 $3}'
gave me:
2013-03-14 18:23:06 STREAMNAME
and this one can be marked as solved.

Mr Daniel B. Martin, can I buy you a beer, if you live in London or somewhere near by?

Once again - many thanks.
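
A minimal sketch of the marker trick above, combined with the "Stream not healthy" filter from the first post. The helper name date_and_stream and the shortened sample line are made up for illustration, and the sketch assumes "~" never appears in the log itself. It replaces " GMT" (with the leading space) and prints the two fields with a comma, so exactly one space separates the time from the stream name.

```shell
# Sample line in the format from the first post (trailing dashes shortened).
line='2013-03-14 02:14:18 GMT comment server INFO 200 - ModuleMediaCasterStreamMonitorAdvanced.onValidateMediaCaster[rtpinput1/_definst_] Stream not healthy [stream startup timeout]: flv:STREAM--NAME.stream - - - 55485.773 - - -'

# date_and_stream is a hypothetical helper name, not from the thread.
# Assumption: "~" never occurs in the log lines themselves.
date_and_stream() {
    # 1. delete lines that do not contain "Stream not healthy"
    # 2. replace " GMT", "flv:" and ".stream" with a "~" marker (first match each)
    # 3. split on "~" in awk: field 1 is date and time, field 3 is the stream name
    sed '/Stream not healthy/!d; s/ GMT/~/; s/flv:/~/; s/\.stream/~/' |
        awk -F'~' '{print $1, $3}'
}

printf '%s\n' "$line" | date_and_stream
# prints: 2013-03-14 02:14:18 STREAM--NAME
```

Because each s command replaces only the first match on the line, a second "flv:" or ".stream" later in the line is left alone.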

danielbmartin 03-14-2013 03:35 PM

Quote:

Originally Posted by Alvin88 (Post 4911677)
After a moment, and a cup of tea I understood the trick ...

Solutions are usually offered with the assumption the Original Poster will understand. However, there is no shame in making a follow-up post to ask for an explanation.

Quote:

Mr Daniel B. Martin, can I buy you a beer, if you live in London or somewhere near by?

Apex NC USA is some distance from London, so I thank you for the offer but respectfully decline.

Daniel B. Martin

grail 03-15-2013 01:25 AM

Another alternative:
Code:

grep -Po '^.*(?= GMT)|(?<=flv:)[^.]*' file | awk 'ORS=NR%2?" ":"\n"'
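
A short sketch of how this one-liner fits together, assuming GNU grep built with -P (PCRE) support. With -o, grep prints each match on its own line, so every log line produces two output lines: the text before " GMT", then the text after "flv:" up to the first dot. The awk part sets ORS to a space on odd-numbered records and to a newline on even-numbered ones, which glues each pair back into one line. The helper name pair_matches and the shortened sample line are made up for illustration.

```shell
# Sample line in the format from the first post (trailing dashes shortened).
line='2013-03-14 02:14:18 GMT comment server INFO 200 - ModuleMediaCasterStreamMonitorAdvanced.onValidateMediaCaster[rtpinput1/_definst_] Stream not healthy [stream startup timeout]: flv:STREAM--NAME.stream - - - 55485.773 - - -'

# pair_matches is a hypothetical helper name, not from the thread.
# Assumption: GNU grep with -P (PCRE) support is installed.
pair_matches() {
    # grep -o emits two matches per log line, one per output line;
    # awk then ends odd records with a space and even records with a newline.
    grep -Po '^.*(?= GMT)|(?<=flv:)[^.]*' |
        awk 'ORS=NR%2?" ":"\n"'
}

printf '%s\n' "$line" | pair_matches
# prints: 2013-03-14 02:14:18 STREAM--NAME
```

The pairing relies on every input line producing exactly two matches. A line with " GMT" but no "flv:" would shift all later pairs, so filtering for "Stream not healthy" lines first is probably wise.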

chrism01 03-15-2013 02:38 AM

Re qn 3: http://www.grymoire.com/Unix/
Enjoy :)

David the H. 03-15-2013 10:39 AM

Please use [code][/code] tags around your code and data, to preserve the original formatting and to improve readability. Do not use quote tags, bolding, colors, "start/end" lines, or other creative techniques.

Here's my addition to the list of options:
Code:

sed -rn '/not healthy/ s/^([0-9-]+[ ][0-9:]+).*flv:(.+)[.]stream.*/\1 \2/p' logfile

This first matches lines containing "not healthy". The substitution then captures the date string to the first backreference, and the part between "flv:" and ".stream" to the second backreference. The rest of the line is discarded.

Naturally it assumes there will be only one instance of each pattern on the line.
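
For comparison, the same idea can be sketched in awk alone instead of sed: keep only the "not healthy" lines, find "flv:...stream" wherever it sits with match(), and print it after the first two whitespace-separated fields (date and time). The helper name date_and_stream_awk and the shortened sample line are made up for illustration, and the sketch assumes the stream name contains no spaces.

```shell
# Sample line in the format from the first post (trailing dashes shortened).
line='2013-03-14 02:14:18 GMT comment server INFO 200 - ModuleMediaCasterStreamMonitorAdvanced.onValidateMediaCaster[rtpinput1/_definst_] Stream not healthy [stream startup timeout]: flv:STREAM--NAME.stream - - - 55485.773 - - -'

# date_and_stream_awk is a hypothetical helper name, not from the thread.
# Assumption: the stream name itself contains no spaces.
date_and_stream_awk() {
    awk '/Stream not healthy/ && match($0, /flv:[^ ]*\.stream/) {
        # RSTART/RLENGTH cover "flv:NAME.stream"; strip the 4-character
        # "flv:" prefix and the 7-character ".stream" suffix.
        name = substr($0, RSTART + 4, RLENGTH - 11)
        print $1, $2, name
    }'
}

printf '%s\n' "$line" | date_and_stream_awk
# prints: 2013-03-14 02:14:18 STREAM--NAME
```

As with the sed version, only the first "flv:...stream" occurrence on a line is used.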


All times are GMT -5.