LinuxQuestions.org
Old 12-31-2009, 12:41 PM   #1
bridrod
Member
 
Registered: Aug 2009
Distribution: SLES, openSUSE
Posts: 39

Rep: Reputation: 15
Clean Up Log - Search for Pattern in Log file and Output result


Wow, I got a good challenge at work and I'm not sure how to semi-automate the process.

We were attacked by a new trojan. By the time the fix was out hundreds of machines were infected. We resolved the bulk of the issue but we still have a handful of machines popping up "trying" to use our SMTP servers to spread the bad.

Well, in any given day I have more than a million lines of logs to go thru and help security locate the bad apples out there. Right now the trojan is not spreading because I locked down connection to our SMTP servers.

So, here is a sample from the LOG:

09:37:31 058 DMN: MSG 14421742 Accepted connection: [192.268.1.100] (dns_for_the_machine)
09:37:31 058 DMN: MSG 14421742 Refused sender: email@domain.com (4)
09:37:31 058 DMN: MSG 14421742 SMTP session ended: [192.268.1.100] (dns_for_the_machine)

Among other good connections in the log, the infected machines will show up as the sample above. It shows the bad apple being rejected.

What varies: timestamp, IP address, DNS name of the machine [this one might show up empty as "()"] and email address
What's static: "Accepted connection", "Refused sender" and "SMTP session ended"

So the idea is to "filter" all the bad apples and output them to a file. I thought about searching for the phrase "Refused sender". If found, read the "MSG xxxxxxxx" ID on that line. Then use that ID to search the log for matching lines, since the IDs are unique per connection. Grab all those lines and output them to a file. Keep doing that, appending to the file.

Well, I somewhat have the concept but no clue on how to make it happen.
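
(A minimal sketch of that concept, assuming the log layout in the sample above: field 4 is the literal "MSG" and field 5 is the connection ID. The function name refused_sessions is made up for illustration. awk reads the log twice: the first pass remembers the ID of every "Refused sender" line, the second pass prints every line carrying one of those IDs.)

```shell
# Print every log line that belongs to a connection with a "Refused sender" entry.
# Usage: refused_sessions LOGFILE > OUTPUTFILE
refused_sessions() {
    # The same file is passed twice so that awk makes two passes over it.
    awk '
        # Pass 1 (NR == FNR only while reading the first copy): remember bad IDs.
        NR == FNR { if (/Refused sender/) bad[$5]; next }
        # Pass 2: print MSG lines whose ID was remembered in pass 1.
        $4 == "MSG" && ($5 in bad)
    ' "$1" "$1"
}
```

Because the IDs are compared as whole fields, a shorter ID cannot accidentally match a longer one that starts with the same digits.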

You can figure out I am doing a LOT of manual cleanup trying to identify the bad machines...

Thanks for even reading this!

-Rod

Last edited by bridrod; 12-31-2009 at 12:44 PM.
 
Old 12-31-2009, 01:12 PM   #2
Dave_Devnull
Member
 
Registered: May 2009
Posts: 142

Rep: Reputation: 24
Perhaps I'm missing something but what about something along these lines:

zgrep -e 'Refused sender' /var/log/mail.info
zgrep -e 'Refused sender' /var/log/mail.info*
zgrep -e 'Refused sender' /var/log/mail.info > your.output.file
zgrep -e 'Refused sender' /var/log/mail.info* > your.output.file
 
Old 12-31-2009, 01:32 PM   #3
bridrod
Member
 
Registered: Aug 2009
Distribution: SLES, openSUSE
Posts: 39

Original Poster
Rep: Reputation: 15
Quote:
Originally Posted by Dave_Devnull
Perhaps I'm missing something but what about something along these lines:

zgrep -e 'Refused sender' /var/log/mail.info
zgrep -e 'Refused sender' /var/log/mail.info*
zgrep -e 'Refused sender' /var/log/mail.info > your.output.file
zgrep -e 'Refused sender' /var/log/mail.info* > your.output.file
Cool. Thx. That's a start. In your example it's only going to grab "Refused sender". Definitely missing the other lines.

I really think "conditional pattern search" is very hard to accomplish.

-Rod
 
Old 12-31-2009, 02:38 PM   #4
bridrod
Member
 
Registered: Aug 2009
Distribution: SLES, openSUSE
Posts: 39

Original Poster
Rep: Reputation: 15
Well, this is how far I was able to get...

Step#1 - Searching for pattern "Refused sender":

zgrep -e 'Refused sender' smtptest.txt
11:52:28 688 DMN: MSG 110724 Refused sender: email1@domain.com (4)
11:52:28 688 DMN: MSG 110725 Refused sender: email2@domain.com (4)

OR

awk '/Refused/ {print}' smtptest.txt
11:52:28 688 DMN: MSG 110724 Refused sender: email1@domain.com (4)
11:52:28 688 DMN: MSG 110725 Refused sender: email2@domain.com (4)

Step#2 - Extracting the fourth word ($4) and fifth word ($5) from the lines that match the "Refused" pattern:

awk '/Refused/ {print $4,$5}' smtptest.txt
MSG 110724
MSG 110725
.
.
.

The next step is to figure out how to use the strings found (e.g. "MSG 110724") as my new pattern for AWK to look for in smtptest.txt, and output the matches to another file (e.g. smtptest2.txt). I suppose I need a script with a loop, so that each string found in Step#2 is used for searching and the result is appended to smtptest2.txt.
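
(One way to take that step without a loop, sketched under the assumption that every line of interest has the form shown above: save the Step#2 output as a list of fixed strings and hand the whole list to grep with -f. The function name and the temporary pattern file are made up for this sketch. The trailing space added after the ID keeps a shorter ID from also matching a longer one that starts with the same digits.)

```shell
# Usage: grep_by_msg_id smtptest.txt > smtptest2.txt
grep_by_msg_id() {
    patterns=$(mktemp) || return 1
    # Step#2 again, with a trailing space so that IDs only match in full.
    awk '/Refused sender/ { print $4 " " $5 " " }' "$1" | sort -u > "$patterns"
    # -F: treat patterns as fixed strings, -f: read them from a file, one per line.
    # Only search when at least one refused connection was found.
    if [ -s "$patterns" ]; then
        grep -F -f "$patterns" "$1"
    fi
    rm -f "$patterns"
}
```

The output keeps the original line order of the log, since grep makes a single pass over it.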

-Rod
 
Old 12-31-2009, 06:47 PM   #5
bridrod
Member
 
Registered: Aug 2009
Distribution: SLES, openSUSE
Posts: 39

Original Poster
Rep: Reputation: 15
Wow, I might be very close. So I need to read each line ($0) from file1 and use as search string in file2 and output to file3.

Anyone?
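
(What is described here is a plain shell loop. A rough sketch, assuming file1 holds one search string per line, for example "MSG 110724"; the function name is made up, and file1, file2 and file3 stand for whatever names you pick.)

```shell
# Usage: search_each_line file1 file2 > file3
search_each_line() {
    # IFS= and -r make read return each line exactly as written.
    while IFS= read -r pattern; do
        # Skip empty lines: an empty pattern would match every line.
        [ -n "$pattern" ] || continue
        # -F: search for the literal string rather than a regular expression.
        grep -F -- "$pattern" "$2"
    done < "$1"
}
```

Redirecting once, outside the loop, means the output file is opened a single time rather than once per search string. The result is grouped by search string, not by the original order of the log.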
 
Old 01-01-2010, 02:48 AM   #6
Dave_Devnull
Member
 
Registered: May 2009
Posts: 142

Rep: Reputation: 24
For a match of one of those three phrases, a simple regex with one of the grep family will do the job. Here zgrep with a wildcard will go through all the mail.info logs, including the rotated .x.gz history files.

zgrep -E "(Accepted connection|Refused sender|SMTP session ended)" /var/log/mail.info*
zgrep -E "(Accepted connection|Refused sender|SMTP session ended)" /var/log/mail.info* > your.output.file

I'm not entirely clear on what you are trying to do, so I'm limited in the advice I can offer. To do anything useful I'd need to be clear on the exact patterns you are seeking, any variables/circumstances in which to find them, the files you are seeking them in and what you want to do with the output.

Personally I prefer to use Perl for log churning, but you may be able to knock up what you want with (family)GREP, AWK or SED or a shell script.

Familiarise yourself with Regular Expressions - and you'll find your holy grail.
http://www.gsp.com/cgi-bin/man.cgi?s...=1&topic=zgrep

Good luck.
 
Old 01-01-2010, 12:13 PM   #7
bridrod
Member
 
Registered: Aug 2009
Distribution: SLES, openSUSE
Posts: 39

Original Poster
Rep: Reputation: 15
Quote:
Originally Posted by Dave_Devnull
For a match of one of those three phrases, a simple regex with one of the grep family will do the job. Here zgrep with a wildcard will go through all the mail.info logs, including the rotated .x.gz history files.

zgrep -E "(Accepted connection|Refused sender|SMTP session ended)" /var/log/mail.info*
zgrep -E "(Accepted connection|Refused sender|SMTP session ended)" /var/log/mail.info* > your.output.file

I'm not entirely clear on what you are trying to do, so I'm limited in the advice I can offer. To do anything useful I'd need to be clear on the exact patterns you are seeking, any variables/circumstances in which to find them, the files you are seeking them in and what you want to do with the output.

Personally I prefer to use Perl for log churning, but you may be able to knock up what you want with (family)GREP, AWK or SED or a shell script.

Familiarise yourself with Regular Expressions - and you'll find your holy grail.
http://www.gsp.com/cgi-bin/man.cgi?s...=1&topic=zgrep

Good luck.
Believe me. I've tried. For some people things digest so easily. Definitely not my case.

The reason I am trying a two-step approach (extract the lines that contain "Refused sender", then extract the MSG ID from those lines so I can search for every line that contains that MSG ID) is that the original LOG (I merged the logs so I only have to look in one single file) contains both good and bad (trojan) connections, and I want to extract all the SMTP traffic related to the BAD connections. The only common thing identifying the connection is the MSG ID found in the "Refused sender" line. So my previous example:

09:37:31 058 DMN: MSG 14421742 Accepted connection: [192.268.1.100] (dns_for_the_machine)
09:37:31 058 DMN: MSG 14421742 Refused sender: email@domain.com (4)
09:37:31 058 DMN: MSG 14421742 SMTP session ended: [192.268.1.100] (dns_for_the_machine)

These are the lines identifying the BAD connections. First the connection is accepted, then it's refused and then SMTP is dropped.

So, for each new connection there is a new MSG ID (i.e.: MSG 14421742).

I am able to extract the "Refused sender" line, which is only found on BAD connections. Now, to extract the rest of the information for that connection, I need to search for the MSG ID from the "Refused sender" line. So I would somehow search for "MSG 14421742" and the three lines shown above would show up. Output that to another file.

Loop it so it will digest the whole merged LOG and extract all the info for BAD connections.

Hope I made it clearer now and thanks for your suggestion.

Happy new Year!

-Rod
 
Old 01-04-2010, 09:55 PM   #8
bridrod
Member
 
Registered: Aug 2009
Distribution: SLES, openSUSE
Posts: 39

Original Poster
Rep: Reputation: 15
I am still lost. I have been reading a lot and still haven't figured it out. "It seems" the solution is to use an array (that alone is a can of worms!!!). I went thru many different samples and tutorials and I just can't make it work. Here is what my files look like...

file1.txt:

110724
110725

file2.txt:

11:52:10 688 DMN: MSG 110722 Accepted connection: [172.31.35.203] (dc-1044a.central.nychhc.org)
11:52:10 688 DMN: MSG 110722 SMTP session ended: [172.31.35.203] (dc-1044a.central.domain.com)
11:52:13 688 DMN: MSG 110723 Accepted connection: [172.31.35.203] (dc-1044a.central.domain.com)
11:52:13 688 DMN: MSG 110723 SMTP session ended: [172.31.35.203] (dc-1044a.central.domain.com)
11:52:21 488 IMAP4 session ended: 172.23.75.28
11:52:28 688 DMN: MSG 110724 Accepted connection: [172.21.109.179] (160-9fl-fl.central.domain.com)
11:52:28 688 DMN: MSG 110724 Refused sender: email1@domain1.com (4)
11:52:28 688 DMN: MSG 110724 SMTP session ended: [172.21.109.179] (160-9fl-fl.central.domain.com)
11:52:28 688 DMN: MSG 110725 Accepted connection: [172.17.228.250] (kchc-sob265-pc6.kchdomain.local)
11:52:28 688 DMN: MSG 110725 Refused sender: email2@domain2.com (4)
11:52:28 688 DMN: MSG 110725 SMTP session ended: [172.17.228.250] (kchc-sob265-pc6.kchdomain.local)

Now, here is how I want my output file (file3.txt) to look:

11:52:28 688 DMN: MSG 110724 Accepted connection: [172.21.109.179] (160-9fl-fl.central.domain.com)
11:52:28 688 DMN: MSG 110724 Refused sender: email1@domain1.com (4)
11:52:28 688 DMN: MSG 110724 SMTP session ended: [172.21.109.179] (160-9fl-fl.central.domain.com)
11:52:28 688 DMN: MSG 110725 Accepted connection: [172.17.228.250] (kchc-sob265-pc6.kchdomain.local)
11:52:28 688 DMN: MSG 110725 Refused sender: email2@domain2.com (4)
11:52:28 688 DMN: MSG 110725 SMTP session ended: [172.17.228.250] (kchc-sob265-pc6.kchdomain.local)

Now, I tried awk with while getline, if i++, and variants and can't get it to work. I even tried perl.

Can anyone shed some light, please?
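
(For what it's worth, the array approach in awk is shorter than it looks. A sketch, assuming file1.txt holds one bare ID per line and the ID is the fifth field of each MSG line in file2.txt, as in the listings above. The function name is invented for this example.)

```shell
# Usage: filter_by_id file1.txt file2.txt > file3.txt
filter_by_id() {
    awk '
        # First file: store each ID as an array key, then go to the next line.
        NR == FNR { bad[$1]; next }
        # Second file: print MSG lines whose fifth field is a stored ID.
        $4 == "MSG" && ($5 in bad)
    ' "$1" "$2"
}
```

NR counts lines across all input files while FNR restarts at 1 for each file, so NR == FNR is only true while awk is still reading the first file. No getline or explicit loop is needed; the array lookup is done once per log line.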

Last edited by bridrod; 01-04-2010 at 09:56 PM. Reason: spellchecking
 
Old 01-04-2010, 10:09 PM   #9
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,269

Rep: Reputation: 2028
Code:
for id in $(cat file1.txt)
do
    grep $id file2.txt >file3.txt
done
assuming file1 = unique list of bad ids, file2 = orig big_bad_log and file3 = reqd output, IIUC.
Personally I'd have done it in one pass in Perl, much quicker processing, but for the moment, this sounds like a one-off request.

HTH
 
Old 01-05-2010, 07:55 AM   #10
bridrod
Member
 
Registered: Aug 2009
Distribution: SLES, openSUSE
Posts: 39

Original Poster
Rep: Reputation: 15
Quote:
Originally Posted by chrism01
Code:
for id in $(cat file1.txt)
do
    grep $id file2.txt >file3.txt
done
assuming file1 = unique list of bad ids, file2 = orig big_bad_log and file3 = reqd output, IIUC.
Personally I'd have done it in one pass in Perl, much quicker processing, but for the moment, this sounds like a one-off request.

HTH
Almost there! Thanks! But it seems it's overwriting the results. Your assumption above is all correct, but I end up with only the last part of file3.txt:

11:52:28 688 DMN: MSG 110725 Accepted connection: [172.17.228.250] (kchc-sob265-pc6.kchdomain.local)
11:52:28 688 DMN: MSG 110725 Refused sender: email2@domain2.com (4)
11:52:28 688 DMN: MSG 110725 SMTP session ended: [172.17.228.250] (kchc-sob265-pc6.kchdomain.local)
 
Old 01-05-2010, 09:49 AM   #11
bridrod
Member
 
Registered: Aug 2009
Distribution: SLES, openSUSE
Posts: 39

Original Poster
Rep: Reputation: 15
Thumbs up

Quote:
Originally Posted by bridrod
Almost there! Thanks! But it seems it's overwriting the results. Your assumption above is all correct, but I end up with only the last part of file3.txt:

11:52:28 688 DMN: MSG 110725 Accepted connection: [172.17.228.250] (kchc-sob265-pc6.kchdomain.local)
11:52:28 688 DMN: MSG 110725 Refused sender: email2@domain2.com (4)
11:52:28 688 DMN: MSG 110725 SMTP session ended: [172.17.228.250] (kchc-sob265-pc6.kchdomain.local)
Ok. Got it working. Slightly changing your code:

Code:
for id in $(cat file1.txt)
do
    grep $id file2.txt >>file3.txt
done
So an extra ">" made it append instead of overwrite! Thanks a bunch!

I am still curious to make it work with AWK. Seems like a wonderful tool!
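
(On the AWK question, a single-pass sketch that works straight from the merged log, with no separate ID file. It assumes the format shown earlier in the thread and that the last line of every connection is "SMTP session ended". The function name bad_sessions and the file name merged.log are hypothetical.)

```shell
# Usage: bad_sessions merged.log > file3.txt
bad_sessions() {
    awk '
        # Ignore lines that are not SMTP MSG lines (IMAP4 entries, for example).
        $4 != "MSG" { next }
        # Collect the lines of each connection under its ID.
        { id = $5; buf[id] = buf[id] $0 "\n" }
        # Mark the connection as bad when its sender is refused.
        /Refused sender/ { refused[id] = 1 }
        # At session end, print the collected lines if the connection was bad,
        # then drop the stored data to keep memory use low.
        /SMTP session ended/ {
            if (id in refused) printf "%s", buf[id]
            delete buf[id]
            delete refused[id]
        }
    ' "$@"
}
```

Only connections still open are held in memory, which should matter with a million lines a day. A connection that has not logged "SMTP session ended" by the end of the file is not printed.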

Last edited by bridrod; 01-05-2010 at 09:50 AM. Reason: Additional question
 
  



Tags
conditional, file, output, pattern, result, search

