LinuxQuestions.org

socalheel 10-31-2013 02:11 PM

text manipulating ... the more efficient way to grab a string
 
what's the most efficient way to grab B39391FF67D from this line when i grep a file?

Code:

Oct 31 14:06:04 mailserver02 postfix/smtp[6737]: B39391FF67D: to=<spamyamy@yahoo.com>, relay=hostname.filter[123.123.123.123]:25, delay=1.9, delays=0.06/0/0.45/1.4, dsn=2.0.0, status=sent (250 Thanks)

how i normally do it is to chain awk, cut, and sed commands into a one-liner, but i know there is a more efficient way to do this.

my method is
Code:

grep -i spam /var/log/maillog | grep "Oct 31 14" | awk '{print $6}' | sed -e 's/://'

szboardstretcher 10-31-2013 02:43 PM

I don't see anything wrong with what you are doing, but if you want a smaller command:

Code:

grep -i spam /var/log/maillog | grep "Oct 31 14" | cut -d ":" -f4

socalheel 10-31-2013 02:45 PM

ah maybe that was a bad example, but there are instances where i have all three awk/cut/sed in the same line and i'm not sure if there's a better way to extract what i need.

let me gather up a better example.

socalheel 10-31-2013 02:49 PM

oh i just noticed with your cut -f 4 -d ":" command, that gives me a space in front of my number and i still have to use sed to remove it ...

is that correct?
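
One way to sidestep the leading space, sketched here against the sample line from the first post rather than a real log, is to split on a colon followed by a space instead of a bare colon. The variable names are only for illustration.

```shell
# Sample line copied from the first post (stands in for /var/log/maillog).
line='Oct 31 14:06:04 mailserver02 postfix/smtp[6737]: B39391FF67D: to=<spamyamy@yahoo.com>, relay=hostname.filter[123.123.123.123]:25, delay=1.9, delays=0.06/0/0.45/1.4, dsn=2.0.0, status=sent (250 Thanks)'

# Splitting on a bare ":" keeps the space that follows "smtp[6737]:".
with_cut=$(printf '%s\n' "$line" | cut -d ":" -f4)

# Splitting on ": " (colon plus space) leaves the timestamp in one piece,
# so the queue ID becomes field 2 with nothing left to trim.
with_awk=$(printf '%s\n' "$line" | awk -F': ' '{print $2}')

# Brackets make the leading space visible.
printf '[%s]\n[%s]\n' "$with_cut" "$with_awk"
```

The brackets in the printf are only there so the leading space from cut shows up on screen.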

szboardstretcher 10-31-2013 02:59 PM

I would use your original command. It's nice.

Do you have any reason for looking to tune this? Usually we only do that if we have to search billions of records and such.

druuna 10-31-2013 03:02 PM

It seems you want to have 2 search criteria: spam and Oct 31 14. If both are found you want the B39391FF67D string.

Assuming that the layout of such a line is always the same (i.e. $6 is always the wanted field), have a look at this:
Code:

awk '/Oct 31 14/ && /spam/ { gsub(/:/,"") ; print $6 }' /var/log/maillog
B39391FF67D

If you need a case insensitive search (GNU Awk only...):
Code:

awk 'BEGIN{IGNORECASE=1}/oCt 31 14/ && /SpAm/ { gsub(/:/,"") ; print $6 }' /var/log/maillog
B39391FF67D
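
Where gawk is not available, a portable sketch is to lowercase a copy of the line with tolower(), which is part of POSIX awk, and match against that. The two sample lines are shortened from the ones in this thread, and the mixed-case address in the second is made up to exercise the match.

```shell
# Shortened sample lines; the mixed-case address is invented for illustration.
log='Oct 31 14:06:04 mailserver02 postfix/smtp[6737]: B39391FF67D: to=<spamyamy@yahoo.com>, status=sent (250 Thanks)
Oct 31 14:34:17 mailserver02 postfix/smtp[7009]: 3C9341FF9D8: to=<SpamYamy@yahoo.com>, status=sent (250 Thanks)'

# tolower($0) returns a lowercased copy and leaves $0 untouched,
# so the printed field keeps its original case.
ids=$(printf '%s\n' "$log" |
    awk '/Oct 31 14/ && tolower($0) ~ /spam/ { sub(/:$/, "", $6); print $6 }')

printf '%s\n' "$ids"
```

Only the search string is made case-insensitive here; the date pattern is still matched as written.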


socalheel 10-31-2013 03:03 PM

well i have a script that run every hour to grep our maillog for a certain entry, and if that entry is present, do a few other things then email out an alert.

i know it's not too resource intense, but i like to minimize every little thing i can so all these little "resource grabbers" don't grow into something that would cause a headache later.
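
For an hourly job like that, one possible shape is a single awk pass that collects the IDs, with the follow-up work running only when something matched. This is a rough sketch: the find_ids helper, the temp file, and the echo standing in for the alert are all hypothetical.

```shell
# Hypothetical helper: print queue IDs from lines that start with a date/hour
# prefix and contain a given string.
# $1 = log file, $2 = prefix such as "Oct 31 14", $3 = string to look for.
find_ids() {
    awk -v prefix="$2" -v needle="$3" '
        index($0, prefix) == 1 && index($0, needle) { sub(/:$/, "", $6); print $6 }
    ' "$1"
}

# Stand-in for /var/log/maillog so the sketch runs anywhere.
sample=$(mktemp)
printf '%s\n' 'Oct 31 14:06:04 mailserver02 postfix/smtp[6737]: B39391FF67D: to=<spamyamy@yahoo.com>, relay=hostname.filter[123.123.123.123]:25, delay=1.9, delays=0.06/0/0.45/1.4, dsn=2.0.0, status=sent (250 Thanks)' > "$sample"

ids=$(find_ids "$sample" "Oct 31 14" "spam")
if [ -n "$ids" ]; then
    # Placeholder: the real follow-up steps and the email alert would go here.
    echo "alert: $ids"
fi
rm -f "$sample"
```

index() does plain substring matching, so nothing in the prefix or search string is treated as a regex.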

danielbmartin 10-31-2013 03:04 PM

Consider:
Code:

awk -F":" '{print $4}'
Daniel B. Martin

socalheel 10-31-2013 03:06 PM

for example, i want this line

Oct 31 14:34:17 mailserver02 postfix/smtp[7009]: 3C9341FF9D8: to=<spamyamy@yahoo.com>, relay=outbounds8.obsmtp.com[64.18.7.12]:25, delay=4.5, delays=0.08/0/0.46/3.9, dsn=2.0.0, status=sent (250 Thanks)

to only come back with
3C9341FF9D8 to=spamyamy@yahoo.com

and how i get that stripped down is rather ugly, and i'm not sure it's necessary. here is how i get it:

Code:

grep $MAILID /var/log/maillog | egrep "from=|to=" | egrep -v "osj" | awk '{print $6,$7}' | sed -e 's/,//g' | sed -e 's/://g' | sed -e 's/>//g' | sed -e 's/<//g';done

Firerat 10-31-2013 03:11 PM

Quote:

Originally Posted by szboardstretcher (Post 5056059)
I don't see anything wrong with what you are doing, but if you want a smaller command:

Code:

grep -i spam /var/log/maillog | grep "Oct 31 14" | cut -d ":" -f4

you end up with a 'leading' space

Code:

awk -F\: '/^Oct 31 14.*spam.*/{gsub(/ /,"",$4);print $4}' /var/log/maillog
or
Code:

grep "Oct 31 14.*spam.*" /var/log/maillog | cut -d\: -f4 | sed 's/ //'

to 'feed' awk,
Code:

Date="Oct 31"
Hour="14"
String="spam"
awk -F\: '/'"${Date} ${Hour}.*${String}.*"'/{gsub(/ /,"",$4);print $4}' /var/log/maillog

alternate , as a function
Code:

#!/bin/bash
GetSpamID () {
Date="$1 $2"
Hour="$3"
String="$4"
awk -F\: '/'"${Date} ${Hour}.*${String}.*"'/{gsub(/ /,"",$4);print $4}' /var/log/maillog
}

#
GetSpamID Oct 31 14 spam

probably makes sense to further break it down to month day, hour

or, this form

Code:

GetSpamID () {
Prefix="$1"
String="$2"
awk -F\: '/'"${Prefix}.*${String}.*"'/{gsub(/ /,"",$4);print $4}' /var/log/maillog
}

#
GetSpamID "Oct 31 14" spam
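
A variation on the same function, offered as a sketch: passing the pattern with awk's -v option avoids splicing shell variables into the program text, so the awk script can stay inside one pair of single quotes. The optional third argument (a log file path) is an addition made here so the function can be tried on a sample file; it falls back to /var/log/maillog.

```shell
GetSpamID () {
    # $1 = prefix such as "Oct 31 14", $2 = string to find,
    # $3 = log file (optional, defaults to the real maillog)
    awk -F: -v pat="$1.*$2" '$0 ~ pat { gsub(/ /, "", $4); print $4 }' "${3:-/var/log/maillog}"
}

# Try it on a one-line sample taken from the first post.
sample=$(mktemp)
printf '%s\n' 'Oct 31 14:06:04 mailserver02 postfix/smtp[6737]: B39391FF67D: to=<spamyamy@yahoo.com>, relay=hostname.filter[123.123.123.123]:25, delay=1.9, delays=0.06/0/0.45/1.4, dsn=2.0.0, status=sent (250 Thanks)' > "$sample"
id=$(GetSpamID "Oct 31 14" spam "$sample")
rm -f "$sample"
echo "$id"
```

One caveat: awk processes backslash escapes in -v values, so a search string containing backslashes would need extra care.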


druuna 10-31-2013 03:13 PM

Quote:

Originally Posted by socalheel (Post 5056082)
for example, i want this line

Oct 31 14:34:17 mailserver02 postfix/smtp[7009]: 3C9341FF9D8: to=<spamyamy@yahoo.com>, relay=outbounds8.obsmtp.com[64.18.7.12]:25, delay=4.5, delays=0.08/0/0.46/3.9, dsn=2.0.0, status=sent (250 Thanks)

to only come back with
3C9341FF9D8 to=spamyamy@yahoo.com

and how i get that stripped down is rather ugly, and i'm not sure it's necessary. here is how i get it:

Code:

grep $MAILID /var/log/maillog | egrep "from=|to=" | egrep -v "osj" | awk '{print $6,$7}' | sed -e 's/,//g' | sed -e 's/://g' | sed -e 's/>//g' | sed -e 's/<//g';done

Using a modified version of my previously posted command:
Code:

awk '/Oct 31 14/ && /spam/ { gsub(/[:,<>]/,"") ; print $6, $7 }' /var/log/maillog
B39391FF67D to=spamyamy@yahoo.com
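
The same idea can fold in the other filters from the original pipeline (the queue ID in $MAILID, the from=/to= match, and the "osj" exclusion). This is only a sketch: the first sample line is the thread's example, and the other two lines are invented to exercise the filters.

```shell
MAILID=3C9341FF9D8

# Sample data: the thread's example line plus two invented lines.
log='Oct 31 14:34:17 mailserver02 postfix/smtp[7009]: 3C9341FF9D8: to=<spamyamy@yahoo.com>, relay=outbounds8.obsmtp.com[64.18.7.12]:25, delay=4.5, delays=0.08/0/0.46/3.9, dsn=2.0.0, status=sent (250 Thanks)
Oct 31 14:34:18 mailserver02 postfix/smtp[7009]: 3C9341FF9D8: to=<someone@osj.example>, relay=none, status=deferred
Oct 31 14:34:19 mailserver02 postfix/qmgr[7010]: 3C9341FF9D8: removed'

# index() is a plain substring match on the ID; the two regexes stand in
# for the egrep and egrep -v stages of the original pipeline.
out=$(printf '%s\n' "$log" | awk -v id="$MAILID" '
    index($0, id) && /from=|to=/ && !/osj/ { gsub(/[:,<>]/, ""); print $6, $7 }')

printf '%s\n' "$out"
```

Everything happens in one awk process, so the chain of grep, egrep, and sed calls is no longer needed.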


Habitual 10-31-2013 03:24 PM

What a really Great Question!

my insanity is apparent when I grep this | grep -v that | cut -d | sed
what a mess.

szboardstretcher 10-31-2013 03:40 PM

Alrighty, well, no one has mentioned Python yet...

Code:

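import re

# Iterate over the file object so lines are read one at a time.
with open('maillog', 'r') as f:
    for line in f:
        # Skip "osj" lines; keep only lines containing from= or to=.
        if not re.search('osj', line):
            if re.search('from=|to=', line):
                # Drop the punctuation, then split on whitespace.
                clean = re.sub('[:<>,]', '', line)
                fields = clean.split()
                print(fields[5], fields[6])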

Iterating over the file object reads one line at a time, so it should work fine on big files.

danielbmartin 10-31-2013 04:44 PM

With this InFile ...
Code:

Oct 30 14:34:17 mailserver02 postfix/smtp[7009]: 3C9341FF9D8: to=<bogus@yahoo.com>, relay=outbounds8.obsmtp.com[64.18.7.12]:25, delay=4.5, delays=0.08/0/0.46/3.9, dsn=2.0.0, status=sent (250 Thanks)
Oct 31 14:34:17 mailserver02 postfix/smtp[7009]: 3C9341FF9D8: to=<spamyamy@yahoo.com>, relay=outbounds8.obsmtp.com[64.18.7.12]:25, delay=4.5, delays=0.08/0/0.46/3.9, dsn=2.0.0, status=sent (250 Thanks)
Nov 01 14:34:17 mailserver02 postfix/smtp[7009]: 3C9341FF9D8: to=<dontwant@yahoo.com>, relay=outbounds8.obsmtp.com[64.18.7.12]:25, delay=4.5, delays=0.08/0/0.46/3.9, dsn=2.0.0, status=sent (250 Thanks)

... this awk ...
Code:

awk 'BEGIN{FS=":|,"} /^Oct 31 14/ {print $4$5}' $InFile >$OutFile
... produced this OutFile ...
Code:

3C9341FF9D8 to=<spamyamy@yahoo.com>
Daniel B. Martin
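
If the angle brackets are unwanted too, a small variation (a sketch run only against the sample line, not a real maillog) is to make the separator "colon or comma, followed by a space" and strip the brackets from the one field that has them.

```shell
# One line from the InFile above.
line='Oct 31 14:34:17 mailserver02 postfix/smtp[7009]: 3C9341FF9D8: to=<spamyamy@yahoo.com>, relay=outbounds8.obsmtp.com[64.18.7.12]:25, delay=4.5, delays=0.08/0/0.46/3.9, dsn=2.0.0, status=sent (250 Thanks)'

# FS is the regex "[:,] ". The colons inside the timestamp are not followed
# by a space, so they do not split the line.
out=$(printf '%s\n' "$line" |
    awk -F'[:,] ' '/^Oct 31 14/ { gsub(/[<>]/, "", $3); print $2, $3 }')

echo "$out"
```

With this separator the queue ID lands in $2 and the recipient in $3, with no leading spaces to trim.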

socalheel 10-31-2013 07:04 PM

man you guys are absolutely amazing ... all these different ways to get the same result, and they teach me something as well.

you rock ... thank you.

