LinuxQuestions.org
Old 04-20-2011, 03:50 PM   #1
Jekikii
LQ Newbie
 
Registered: Apr 2011
Posts: 3

Rep: Reputation: 0
using grep, awk or sed to find and cut data


Hi,
I am trying to analyze problems associated with my "maillog" file
and so am using Bash to try to get details out of this 320 MB file!

This line, and many similar ones, seems to indicate a problem with incoming mail.

Quote:
Apr 19 23:15:08 heavyhoster postfix/smtpd[1895]: NOQUEUE: reject: RCPT from unknown[194.169.226.178]: 550 5.1.1 <harry@intaccs.com>: Recipient address rejected: User unknown in virtual alias table; from=<aflcmol@yahoo.co.uk> to=<harry@intaccs.com> proto=SMTP helo=<ip-194.169.226.178.um.krosno.pl>

The address harry@intaccs.com is not known by postfix
( the domain "intaccs.com" is mine - but not sure where that username comes from !! )

I need to extract that sender email address.

How would I search for "alias table; from=<" and cut from there until the ">"
so that I end up with the email address ?


This must be the sort of thing that's quite common in Bash.

But after spending hours looking at grep, awk and sed videos and
tutorials, I still cannot find one that shows how to find and cut data out of the stream.

Of course, I am still searching and reading.

However, ANY help or pointers would be appreciated

Thanks.
 
Old 04-20-2011, 04:28 PM   #2
konstan
LQ Newbie
 
Registered: Feb 2011
Posts: 4

Rep: Reputation: 0
I'm not an expert in these tools, but here is my first solution. I just dumped your example line into file.log a number of times, interleaved with some other garbage. Here is the result. I'm not sure how well it will perform on a 320 MB file, though.

Code:
$ awk '/alias table; from=</' file.log | sed -e 's/^.*alias table; from=<//' -e 's/>.*$//' | sort | uniq -c
  11 aflcmol@yahoo.co.uk
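
The awk filter and the two sed expressions above can probably be folded into a single sed process, which may matter on a large log. This is only a sketch under assumptions: the variable names `sample` and `senders` are made up for illustration, the first sample line is the one from post #1, and the second is invented filler to show that non-matching lines are skipped.

```shell
#!/bin/sh
# Two-line stand-in for the real maillog (the second line is invented filler).
sample=$(cat <<'EOF'
Apr 19 23:15:08 heavyhoster postfix/smtpd[1895]: NOQUEUE: reject: RCPT from unknown[194.169.226.178]: 550 5.1.1 <harry@intaccs.com>: Recipient address rejected: User unknown in virtual alias table; from=<aflcmol@yahoo.co.uk> to=<harry@intaccs.com> proto=SMTP helo=<ip-194.169.226.178.um.krosno.pl>
Apr 19 23:15:09 heavyhoster some unrelated log line
EOF
)

# -n turns off sed's default printing; the trailing "p" prints a line only
# when the substitution matched, so no separate awk or grep filter is needed.
# \([^>]*\) captures everything between "from=<" and the next ">".
senders=$(printf '%s\n' "$sample" |
    sed -n 's/^.*alias table; from=<\([^>]*\)>.*$/\1/p')

# Count how often each sender appears, as in the pipeline above.
printf '%s\n' "$senders" | sort | uniq -c
```

On the real file, the same sed command would read the log directly, for example `sed -n '...' maillog | sort | uniq -c`.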
 
Old 04-21-2011, 01:39 AM   #3
Jekikii
LQ Newbie
 
Registered: Apr 2011
Posts: 3

Original Poster
Rep: Reputation: 0
What do you mean "I'm not an expert" !!

Looks pretty cool to me

I must admit I only found out about BASH and sed and awk yesterday but
after hours of reading, I didn't see anything about alias table;

Anyway I just Google'd it and it gave me "Advanced Bash-Scripting Guide"
by Mendel Cooper.

Looks like a great online tutorial.

I guess you are using the alias table; because you want to perform the sort
on the data ?

But what if I needed to take out two other elements from the lines - the date and the to: address

So I end up with a table :

Date/time,to address,from address
Apr 19 23:15:08,harry@intaccs.com,aflcmol@yahoo.co.uk
Apr 19 23:15:08,harry@intaccs.com,someone@example.com
Apr 19 23:15:08,jack@intaccs.com,someone@example.com

Would your solution with the alias table; still work or would everything get
jumbled up ?
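
One possible way to build the three-column table is to let awk split each line on the angle brackets, so that the date, the to address and the from address are all taken from the same line in one pass and cannot get mixed up between lines. This is a sketch, not a tested solution for the whole maillog: it assumes every rejected line has the same layout as the sample in post #1, and the variable names `line` and `csv` are made up for illustration.

```shell
#!/bin/sh
# Sample line from post #1, held in a variable so the sketch is self-contained.
line='Apr 19 23:15:08 heavyhoster postfix/smtpd[1895]: NOQUEUE: reject: RCPT from unknown[194.169.226.178]: 550 5.1.1 <harry@intaccs.com>: Recipient address rejected: User unknown in virtual alias table; from=<aflcmol@yahoo.co.uk> to=<harry@intaccs.com> proto=SMTP helo=<ip-194.169.226.178.um.krosno.pl>'

# -F'[<>]' splits each line on "<" or ">". For this line layout:
#   $2 = rejected recipient, $4 = sender (from=), $6 = recipient (to=)
# substr($0, 1, 15) is the leading syslog timestamp, e.g. "Apr 19 23:15:08".
csv=$(printf '%s\n' "$line" |
    awk -F'[<>]' '/alias table; from=</ { print substr($0, 1, 15) "," $6 "," $4 }')

printf '%s\n' "$csv"
```

Because every output row is one complete CSV line, piping the result through `sort` would reorder whole rows but would not separate an address from its date.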
 
Old 04-21-2011, 02:21 AM   #4
bash-o-logist
LQ Newbie
 
Registered: Apr 2011
Posts: 2

Rep: Reputation: 0
Code:
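#!/bin/bash
# Print the sender and recipient addresses from each rejected-mail line.
# a, b and c take the first three words (month, day, time); d gets the rest.
while read -r a b c d
do
    # Skip lines that do not contain the marker at all.
    [[ $d == *"alias table; from=<"* ]] || continue
    # Keep only the text after the marker, e.g. "x@y> to=<u@v> proto=SMTP ..."
    rest="${d##*alias table; from=<}"
    # Deliberately unquoted so the remainder is split into words.
    for i in $rest
    do
        case "$i" in
            *"@"* )
                addr="${i//[><]/}"    # drop the angle brackets
                addr="${addr/to=/}"   # drop the "to=" prefix
                echo "addr: $addr"
                ;;
        esac
    done
done < file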
 
Old 04-21-2011, 02:44 AM   #5
Jekikii
LQ Newbie
 
Registered: Apr 2011
Posts: 3

Original Poster
Rep: Reputation: 0
Hello,
I have been playing with awk and sed


Quote:
awk '/Apr 19/ && /550 5.1.1/' maillog |sed 's/^.\+ from=//g' |sed 's/proto.*$//g' >listmail.txt
My listmail.txt file looks like:

Quote:
<suppo88131@yahoo.com> to=<greg@intaccs.com>
<vylvsuk@yahoo.co.uk> to=<hes@intaccs.com>
<fdqkirq@yahoo.co.uk> to=<pete@intaccs.com>
<aflcmol@yahoo.co.uk> to=<harry@intaccs.com>
This happens naturally because the two addresses are next to each other in the original maillog.

This isn't too bad because I can take off the "<" and ">" and the "to="
once I get the file into PHP.

Ideally it would look like:

Quote:
suppo88131@yahoo.com,greg@intaccs.com
vylvsuk@yahoo.co.uk,hes@intaccs.com
fdqkirq@yahoo.co.uk,pete@intaccs.com
aflcmol@yahoo.co.uk,harry@intaccs.com
i.e. a comma separated file.


Next, I just want to grab the first 6 characters, as this
will give me the date e.g. Feb 28, Mar 1, Apr 19 etc.

So from lines that start with :
Apr 19 23:15:08 heavyhoster

I can get the date: "Apr 19"

When I put this all together I can get a file like this:

Quote:
Apr 19,suppo88131@yahoo.com,greg@intaccs.com
Apr 19,vylvsuk@yahoo.co.uk,hes@intaccs.com
Apr 19,fdqkirq@yahoo.co.uk,pete@intaccs.com
Apr 19,aflcmol@yahoo.co.uk,harry@intaccs.com
Then when I take out the /Apr 19/ && from my awk command,
I should get a list of senders of undelivered emails to me
with the dates.

Any more hints for getting a bit closer to this ?

Thanks again.

Last edited by Jekikii; 04-21-2011 at 02:45 AM.
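
The "date,sender,recipient" file described above could perhaps be produced by one sed command with three capture groups, instead of cleaning up the brackets later. This is a sketch under assumptions: it relies on every "550 5.1.1" line having the same layout as the sample from post #1, and the variable names `line` and `row` are made up for illustration.

```shell
#!/bin/sh
# Sample line from post #1, held in a variable so the sketch is self-contained.
line='Apr 19 23:15:08 heavyhoster postfix/smtpd[1895]: NOQUEUE: reject: RCPT from unknown[194.169.226.178]: 550 5.1.1 <harry@intaccs.com>: Recipient address rejected: User unknown in virtual alias table; from=<aflcmol@yahoo.co.uk> to=<harry@intaccs.com> proto=SMTP helo=<ip-194.169.226.178.um.krosno.pl>'

# /550 5\.1\.1/ restricts the substitution to the rejected-recipient lines.
# Group 1: the first 6 characters (the date, e.g. "Apr 19").
# Group 2: the address inside from=<...>.
# Group 3: the address inside to=<...>.
# -n plus the trailing "p" prints only the lines that were rewritten.
row=$(printf '%s\n' "$line" |
    sed -n '/550 5\.1\.1/ s/^\(.\{6\}\).* from=<\([^>]*\)> to=<\([^>]*\)>.*$/\1,\2,\3/p')

printf '%s\n' "$row"
```

Run against the real log, the command would take `maillog` as its input file and redirect to `listmail.txt`, as in the pipeline quoted above.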
 
Old 04-21-2011, 03:16 AM   #6
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,629

Rep: Reputation: 1947
How about something like:
Code:
awk 'BEGIN{RS="[<> ]"}/^from=$/{getline; print}' file
I haven't had a chance to read your new requirements ... will look a little later
 
Old 04-21-2011, 04:21 AM   #7
dugan
Senior Member
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 4,880

Rep: Reputation: 1524
Assuming the line is in a shell variable called LINE:

Code:
echo "$LINE" | cut -d "<" -f 3 | cut -d ">" -f 1
will give you what you asked for in the first post.

Last edited by dugan; 04-21-2011 at 11:19 AM.
 
Old 04-21-2011, 09:35 AM   #8
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443
Blog Entries: 3

Rep: Reputation: 714
Code:
sed -n 's/^.*alias table; from=<\([-%_.A-Za-z0-9]\+@[-a-zA-Z0-9.]\+\.[A-Za-z]\{2,4\}\)>.*$/\1/p'
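
Since the thread title also mentions grep, here is a hedged grep variant of the same extraction. It assumes a grep that supports the `-o` option (GNU and BSD grep do, but it is not required by POSIX), and the variable names `line` and `sender` are made up for illustration.

```shell
#!/bin/sh
# Sample line from post #1, held in a variable so the sketch is self-contained.
line='Apr 19 23:15:08 heavyhoster postfix/smtpd[1895]: NOQUEUE: reject: RCPT from unknown[194.169.226.178]: 550 5.1.1 <harry@intaccs.com>: Recipient address rejected: User unknown in virtual alias table; from=<aflcmol@yahoo.co.uk> to=<harry@intaccs.com> proto=SMTP helo=<ip-194.169.226.178.um.krosno.pl>'

# grep -o prints only the matched text: the marker plus everything up to,
# but not including, the closing ">".
# cut then keeps what follows the single "<" in that match.
sender=$(printf '%s\n' "$line" |
    grep -o 'alias table; from=<[^>]*' |
    cut -d '<' -f 2)

printf '%s\n' "$sender"
```

Unlike the stricter character classes in the sed expression above, `[^>]*` accepts anything between the brackets, so it does not check that the result looks like an email address.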
 
Old 04-21-2011, 10:48 AM   #9
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,629

Rep: Reputation: 1947
Would it be possible to see a few more lines of the file in post #1? I may be able to alter mine further to get the details but I need to see more of the pattern.
Currently I can give you both to and from easily.
 
  


All times are GMT -5. The time now is 12:31 AM.
