LinuxQuestions.org

protocol 10-24-2011 11:40 AM

Awk - How to print match instead of whole line
 
I wish to use awk to extract email addresses from a log file. I wrote a regular expression that matches them, but awk prints the whole line, while I want it to print only the match (in my case, the email address). Can it be done?

Here is my code:

tail -f /var/log/mail.log | awk '/\ to=([^@]+@[^,:]+)/ { print $7 }'

... where $7 is useless for my goal, it only serves as something to print during testing.

Please assist.

Panos

crts 10-24-2011 11:58 AM

Hi,

please provide some sample data. It is hard to guess what is wrong without seeing the input and output.

protocol 10-24-2011 12:25 PM

Of course.

A standard line from my mail server's log looks like this:

Code:

Oct 24 19:20:27 server postfix/pipe[31099]: 236041EA4AC0: to=<g.pavlakis@domain.com>, relay=dovecot, delay=0.81, delays=0.8/0/0/0.01, dsn=2.0.0, status=sent (delivered via dovecot service)
I wish to extract the address:

Code:

g.pavlakis@domain.com
However, the awk line I posted prints the seventh field of the line. Sometimes that field does contain the mail address, but sometimes it does not: for example, postfix might add a couple more fields to the line, so the address is no longer in the seventh field. So what I really need is for awk to print what it actually matches, not the nth field or the complete line that contains the match.

crts 10-24-2011 01:03 PM

Well, you can surely do this with awk. However, how about a sed solution? It is a bit easier in this case, since sed is regex-oriented while awk is field-oriented. This worked with your sample:
Code:

sed -r 's/.*to=<([^>]*)>.*/\1/'
This assumes that the address will be inside to=<user@domain>. Also, I notice that you are posting from a Mac in your profile column on the left. If you want to do this on a Mac then we will have to change the sed a bit since the above solution requires GNU sed and as far as I know it is not available by default on a Mac.
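
For completeness, here is a minimal sketch of how the same extraction could look in awk, as asked in the original question. It relies on the standard match() function, which sets RSTART and RLENGTH to the position and length of the matched text. The function name extract_to_address is only an illustrative choice, and the sketch makes the same assumption as the sed version: the address sits inside to=<...>.

```shell
#!/bin/sh
# Print only the address found inside to=<...>, one per matching line.
# match() sets RSTART (start position) and RLENGTH (length) of the match.
# "to=<" is 4 characters and the closing ">" is 1, hence +4 and -5.
extract_to_address() {
    awk 'match($0, /to=<[^>]*>/) { print substr($0, RSTART + 4, RLENGTH - 5) }'
}

# The sample log line posted earlier in this thread.
sample='Oct 24 19:20:27 server postfix/pipe[31099]: 236041EA4AC0: to=<g.pavlakis@domain.com>, relay=dovecot, delay=0.81, delays=0.8/0/0/0.01, dsn=2.0.0, status=sent (delivered via dovecot service)'

printf '%s\n' "$sample" | extract_to_address
```

Lines that do not contain to=<...> produce no output, because the print action only runs when match() succeeds.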

protocol 10-24-2011 01:07 PM

Thanks for noticing. This is for a Linux machine I have at the office (I am connected to it via SSH), so it is not for the Mac.

Will your sed solution allow me to assign the matched email address to a variable and call an external script for further processing? I mean, after I find the address, I was thinking that perhaps I could "post" it to some other script, e.g. for inserting it into a database. Is that possible?

crts 10-24-2011 01:14 PM

Sure. Just call it like
Code:

variable=$(sed -r 's/.*to=<([^>]*)>.*/\1/' filename)
Note that there must be no spaces around the = sign between the variable name and the command substitution $().
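
As a small illustration of the assignment and of handing the value on, here is a sketch that feeds the sample line from this thread through the sed command and passes the result to a handler. The store_address function is hypothetical: it only stands in for whatever external script would do the real work.

```shell
#!/bin/sh
# Hypothetical stand-in for the external script that should receive the address.
store_address() {
    printf 'storing: %s\n' "$1"
}

# The sample log line posted earlier in this thread.
sample='Oct 24 19:20:27 server postfix/pipe[31099]: 236041EA4AC0: to=<g.pavlakis@domain.com>, relay=dovecot, delay=0.81, delays=0.8/0/0/0.01, dsn=2.0.0, status=sent (delivered via dovecot service)'

# No spaces around "=": writing "address = $(...)" would instead try to
# run a command named "address".
address=$(printf '%s\n' "$sample" | sed -r 's/.*to=<([^>]*)>.*/\1/')

# Quote the expansion so the value is passed as a single argument.
store_address "$address"
```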

protocol 10-24-2011 01:20 PM

Lovely.

One thing though: I ran your sed code together with my tail -f (from the original one-liner I posted) and it behaves as follows:

a) The first time it matches, it prints the email address alone.
b) All subsequent times it matches, it prints the complete log line that contains the match, exactly as my faulty awk was doing.

Did I miss something? The whole point is that it keeps extracting email addresses as they appear in the log file, in real time.

Panos

crts 10-24-2011 01:26 PM

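Nope, you did not miss anything. By default, sed prints every input line, whether or not the substitution matched, so lines without a to=<...> field come through unchanged. A small modification will correct this: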
Code:

variable=$(sed -nr 's/.*to=<([^>]*)>.*/\1/p' filename)
The -n option tells sed to not print anything unless specifically instructed to. The 'p' flag at the end of 's///' does exactly that. It only prints when the 's///' command makes a substitution.
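
One caveat for the real-time case: a command substitution only returns once the command inside it has finished, which never happens with tail -f. A possible alternative, sketched below, is to pipe the sed output into a while read loop that calls a handler once per address. The names handle_address and extract_and_handle are hypothetical, and the -u option (unbuffered output) assumes GNU sed, as does -r.

```shell
#!/bin/sh
# Hypothetical handler: stands in for whatever external script should
# receive each address (e.g. one that inserts it into a database).
handle_address() {
    printf 'got: %s\n' "$1"
}

# Read log lines on stdin, extract each to=<...> address, and call the
# handler once per address. -u (GNU sed) flushes output after every line,
# so addresses are not held back in a buffer when sed writes to a pipe.
extract_and_handle() {
    sed -u -n -r 's/.*to=<([^>]*)>.*/\1/p' |
    while IFS= read -r address; do
        handle_address "$address"
    done
}

# Demonstration: the sample line from this thread plus a placeholder line
# that has no to=<...> field and is therefore skipped.
printf '%s\n' \
    'Oct 24 19:20:27 server postfix/pipe[31099]: 236041EA4AC0: to=<g.pavlakis@domain.com>, relay=dovecot, delay=0.81, delays=0.8/0/0/0.01, dsn=2.0.0, status=sent (delivered via dovecot service)' \
    'a placeholder line without a recipient' |
    extract_and_handle
```

With a live log, the same function would be fed as: tail -f /var/log/mail.log | extract_and_handle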

protocol 10-24-2011 01:30 PM

Sincere thanks.
Problem SOLVED.


All times are GMT -5.