LinuxQuestions.org


bridrod 12-31-2009 12:41 PM

Clean Up Log - Search for Pattern in Log file and Output result
 
Wow, I got a good challenge at work and I'm not sure how to semi-automate the process.

We were attacked by a new trojan. By the time the fix was out hundreds of machines were infected. We resolved the bulk of the issue but we still have a handful of machines popping up "trying" to use our SMTP servers to spread the bad.

Well, in any given day I have more than a million lines of logs to go thru and help security locate the bad apples out there. Right now the trojan is not spreading because I locked down connection to our SMTP servers.

So, here is a sample from the LOG:

09:37:31 058 DMN: MSG 14421742 Accepted connection: [192.268.1.100] (dns_for_the_machine)
09:37:31 058 DMN: MSG 14421742 Refused sender: email@domain.com (4)
09:37:31 058 DMN: MSG 14421742 SMTP session ended: [192.268.1.100] (dns_for_the_machine)

Among other good connections in the log, the infected machines will show up as the sample above. It shows the bad apple being rejected.

What varies: the timestamp, the MSG ID (unique per connection), the IP address, the DNS name of the machine (this one might show up empty, as "()") and the email address.
What's static: "Accepted connection", "Refused sender" and "SMTP session ended".

So the idea is to "filter" and output to a file all the bad apples. I thought about maybe a way I could search for the phrase "Refused sender". If found, read the "MSG xxxxxxxx" line. Now use that line to search within the log for matching lines, since they are unique per connection. Now grab all those lines and output to a file. Keep doing that appending to file.

Well, I somewhat have the concept but no clue on how to make it happen.

You can figure out I am doing a LOT of manual cleanup trying to identify the bad machines... :cry:

Thanks for even reading this!

-Rod

Dave_Devnull 12-31-2009 01:12 PM

Perhaps I'm missing something but what about something along these lines:

zgrep -e 'Refused sender' /var/log/mail.info
zgrep -e 'Refused sender' /var/log/mail.info*
zgrep -e 'Refused sender' /var/log/mail.info > your.output.file
zgrep -e 'Refused sender' /var/log/mail.info* > your.output.file

bridrod 12-31-2009 01:32 PM

Cool, thanks. That's a start, but your example only grabs the "Refused sender" lines. It misses the other lines that belong to the same connection.

I really think "conditional pattern search" is very hard to accomplish.

-Rod

bridrod 12-31-2009 02:38 PM

Well, this is how far I was able to get...

Step#1 - Searching for pattern "Refused sender":

zgrep -e 'Refused sender' smtptest.txt
11:52:28 688 DMN: MSG 110724 Refused sender: email1@domain.com (4)
11:52:28 688 DMN: MSG 110725 Refused sender: email2@domain.com (4)

OR

awk '/Refused/ {print}' smtptest.txt
11:52:28 688 DMN: MSG 110724 Refused sender: email1@domain.com (4)
11:52:28 688 DMN: MSG 110725 Refused sender: email2@domain.com (4)

Step#2 - Extracting the fourth word ($4) and fifth word ($5) from the lines that match the "Refused" pattern:

awk '/Refused/ {print $4,$5}' smtptest.txt
MSG 110724
MSG 110725
.
.
.

The next step is to figure out how to use each string found (e.g. "MSG 110724") as a new AWK pattern, search smtptest.txt with it and write the matches to another file (e.g. smtptest2.txt). I suppose I need a loop script so that each string found in Step#2 is used for a search, with the results appended to smtptest2.txt.

-Rod
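
A minimal sketch of that missing step, assuming a grep that supports -F, -w and -f (GNU grep does). The file names smtptest.txt and smtptest2.txt come from the post; bad_ids.txt is a hypothetical scratch file, and the sample lines are trimmed from the log excerpts in this thread. Instead of looping, grep can take the whole list of extracted IDs as patterns at once:

```shell
#!/bin/sh
# Sketch only: bad_ids.txt is a made-up scratch file name.
work=$(mktemp -d)
cat > "$work/smtptest.txt" <<'EOF'
11:52:13 688 DMN: MSG 110723 Accepted connection: [172.31.35.203] (dc-1044a.central.domain.com)
11:52:13 688 DMN: MSG 110723 SMTP session ended: [172.31.35.203] (dc-1044a.central.domain.com)
11:52:28 688 DMN: MSG 110724 Accepted connection: [172.21.109.179] (160-9fl-fl.central.domain.com)
11:52:28 688 DMN: MSG 110724 Refused sender: email1@domain1.com (4)
11:52:28 688 DMN: MSG 110724 SMTP session ended: [172.21.109.179] (160-9fl-fl.central.domain.com)
EOF

# Steps 1 and 2: list each "MSG <id>" that had a refused sender, once.
awk '/Refused sender/ {print $4, $5}' "$work/smtptest.txt" | sort -u > "$work/bad_ids.txt"

# Step 3: -f reads the patterns from a file, -F treats them as fixed strings,
# and -w stops "MSG 110724" from also matching a longer ID such as "MSG 1107245".
grep -F -w -f "$work/bad_ids.txt" "$work/smtptest.txt" > "$work/smtptest2.txt"

cat "$work/smtptest2.txt"
```

This reads the log twice (once for awk, once for grep) but avoids running one search per ID, which matters with a million lines a day.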

bridrod 12-31-2009 06:47 PM

Wow, I might be very close. So I need to read each line ($0) from file1, use it as a search string in file2 and output the matches to file3.

Anyone? :)

Dave_Devnull 01-01-2010 02:48 AM

To match any one of those three phrases, a simple regex with one of the grep family will do the job. Here zgrep with a wildcard will go through all the mail.info logs, including the rotated .x.gz history.

zgrep -E "(Accepted connection|Refused sender|SMTP session ended)" /var/log/mail.info*
zgrep -E "(Accepted connection|Refused sender|SMTP session ended)" /var/log/mail.info* > your.output.file

I'm not entirely clear on what you are trying to do, so I'm limited in the advice I can offer. To do anything useful I'd need to be clear on the exact patterns you are seeking, any variables/circumstances in which to find them, the files you are seeking them in and what you want to do with the output.

Personally I prefer to use Perl for log churning, but you may be able to knock up what you want with (family)GREP, AWK or SED or a shell script.

Familiarise yourself with Regular Expressions - and you'll find your holy grail.
http://www.gsp.com/cgi-bin/man.cgi?s...=1&topic=zgrep

Good luck.

bridrod 01-01-2010 12:13 PM

Believe me. I've tried. For some people things digest so easily. Definitely not my case.

The reason I am trying a two-step approach (extract the lines containing "Refused sender", then pull the MSG ID out of those lines and search for every line containing that MSG ID) is that the original LOG (I merged the logs so I only have to look in a single file) contains both good and bad (trojan) connections, and I want to extract all the SMTP traffic related to the BAD connections. The only common thing identifying a connection is the MSG ID found in the "Refused sender" line. So, from my previous example:

09:37:31 058 DMN: MSG 14421742 Accepted connection: [192.268.1.100] (dns_for_the_machine)
09:37:31 058 DMN: MSG 14421742 Refused sender: email@domain.com (4)
09:37:31 058 DMN: MSG 14421742 SMTP session ended: [192.268.1.100] (dns_for_the_machine)

These are the lines identifying the BAD connections. First the connection is accepted, then it's refused and then SMTP is dropped.

So, for each new connection there is a new MSG ID (i.e.: MSG 14421742).

I am able to extract the "Refused sender" line, which is only found on BAD connections. Now, to extract the rest of the information for that connection, I need to search for the MSG ID from the "Refused sender" line. So I would somehow search for "MSG 14421742", and the three lines shown above would show up. Output those to another file.

Loop it so it will digest the whole merged LOG and extract all the info for BAD connections.

Hope I made it clearer now and thanks for your suggestion.

Happy new Year!

-Rod

bridrod 01-04-2010 09:55 PM

I am still lost. I have been reading a lot and still haven't figured it out. It seems the solution is to use an array (that alone is a can of worms!). I went through many different samples and tutorials and I just can't make it work. Here is what my files look like...

file1.txt:

110724
110725

file2.txt:

11:52:10 688 DMN: MSG 110722 Accepted connection: [172.31.35.203] (dc-1044a.central.nychhc.org)
11:52:10 688 DMN: MSG 110722 SMTP session ended: [172.31.35.203] (dc-1044a.central.domain.com)
11:52:13 688 DMN: MSG 110723 Accepted connection: [172.31.35.203] (dc-1044a.central.domain.com)
11:52:13 688 DMN: MSG 110723 SMTP session ended: [172.31.35.203] (dc-1044a.central.domain.com)
11:52:21 488 IMAP4 session ended: 172.23.75.28
11:52:28 688 DMN: MSG 110724 Accepted connection: [172.21.109.179] (160-9fl-fl.central.domain.com)
11:52:28 688 DMN: MSG 110724 Refused sender: email1@domain1.com (4)
11:52:28 688 DMN: MSG 110724 SMTP session ended: [172.21.109.179] (160-9fl-fl.central.domain.com)
11:52:28 688 DMN: MSG 110725 Accepted connection: [172.17.228.250] (kchc-sob265-pc6.kchdomain.local)
11:52:28 688 DMN: MSG 110725 Refused sender: email2@domain2.com (4)
11:52:28 688 DMN: MSG 110725 SMTP session ended: [172.17.228.250] (kchc-sob265-pc6.kchdomain.local)

Now, here is how I want my output file (file3.txt) to look:

11:52:28 688 DMN: MSG 110724 Accepted connection: [172.21.109.179] (160-9fl-fl.central.domain.com)
11:52:28 688 DMN: MSG 110724 Refused sender: email1@domain1.com (4)
11:52:28 688 DMN: MSG 110724 SMTP session ended: [172.21.109.179] (160-9fl-fl.central.domain.com)
11:52:28 688 DMN: MSG 110725 Accepted connection: [172.17.228.250] (kchc-sob265-pc6.kchdomain.local)
11:52:28 688 DMN: MSG 110725 Refused sender: email2@domain2.com (4)
11:52:28 688 DMN: MSG 110725 SMTP session ended: [172.17.228.250] (kchc-sob265-pc6.kchdomain.local)

I tried awk with while getline, if i++ and variants, and can't get it to work. I even tried perl.

Can anyone shed some light, please?

chrism01 01-04-2010 10:09 PM

Code:

for id in $(cat file1.txt)
do
    grep $id file2.txt >file3.txt
done

assuming file1 = unique list of bad ids, file2 = orig big_bad_log and file3 = reqd output, IIUC.
Personally I'd have done it in one pass in Perl, much quicker processing, but for the moment, this sounds like a one-off request.

HTH
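
The one-pass idea mentioned above can also be sketched in awk rather than Perl. This is only an illustration, assuming every connection closes with an "SMTP session ended" line and that the MSG ID is always the fifth field after a literal "MSG" in the fourth; the sample is trimmed from file2.txt as posted in this thread. Each connection is buffered until it ends, then printed only if a "Refused sender" line was seen for it:

```shell
#!/bin/sh
# Sketch only: single pass over the log, no separate ID list needed.
work=$(mktemp -d)
cat > "$work/file2.txt" <<'EOF'
11:52:13 688 DMN: MSG 110723 Accepted connection: [172.31.35.203] (dc-1044a.central.domain.com)
11:52:13 688 DMN: MSG 110723 SMTP session ended: [172.31.35.203] (dc-1044a.central.domain.com)
11:52:21 488 IMAP4 session ended: 172.23.75.28
11:52:28 688 DMN: MSG 110724 Accepted connection: [172.21.109.179] (160-9fl-fl.central.domain.com)
11:52:28 688 DMN: MSG 110724 Refused sender: email1@domain1.com (4)
11:52:28 688 DMN: MSG 110724 SMTP session ended: [172.21.109.179] (160-9fl-fl.central.domain.com)
11:52:28 688 DMN: MSG 110725 Accepted connection: [172.17.228.250] (kchc-sob265-pc6.kchdomain.local)
11:52:28 688 DMN: MSG 110725 Refused sender: email2@domain2.com (4)
11:52:28 688 DMN: MSG 110725 SMTP session ended: [172.17.228.250] (kchc-sob265-pc6.kchdomain.local)
EOF

awk '
$4 == "MSG" {
    id = $5
    buf[id] = buf[id] $0 "\n"            # collect the lines of this connection
    if (/Refused sender/) bad[id] = 1    # flag the connection as bad
    if (/SMTP session ended/) {          # connection finished: decide, then forget it
        if (id in bad) printf "%s", buf[id]
        delete buf[id]
        delete bad[id]
    }
}' "$work/file2.txt" > "$work/file3.txt"

cat "$work/file3.txt"
```

Because finished connections are deleted from the arrays, memory use depends on how many connections are open at once rather than on the size of the log. A connection that never logs "SMTP session ended" would stay buffered and never be printed.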

bridrod 01-05-2010 07:55 AM

Almost there! Thanks! But it seems it's overwriting the results. Your assumptions above are all correct, but I end up with only the last part in file3.txt:

11:52:28 688 DMN: MSG 110725 Accepted connection: [172.17.228.250] (kchc-sob265-pc6.kchdomain.local)
11:52:28 688 DMN: MSG 110725 Refused sender: email2@domain2.com (4)
11:52:28 688 DMN: MSG 110725 SMTP session ended: [172.17.228.250] (kchc-sob265-pc6.kchdomain.local)

bridrod 01-05-2010 09:49 AM

Ok. Got it working. Slightly changing your code:

Code:

for id in $(cat file1.txt)
do
    grep $id file2.txt >>file3.txt
done

So an extra ">" made it append instead of overwrite! Thanks a bunch!

I am still curious to make it work with AWK. Seems like a wonderful tool!
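
For the AWK version, one possible sketch uses the common two-file idiom: while awk reads the first file, NR equals FNR, so that block loads the IDs into an array, and every later line is printed only if its fifth field is one of them. The names file1.txt, file2.txt and file3.txt are the ones used in this thread; the assumption that the ID is always field 5 after a literal "MSG" in field 4 is taken from the sample log lines.

```shell
#!/bin/sh
# Sketch only: sample data trimmed from the files posted in this thread.
work=$(mktemp -d)
printf '110724\n110725\n' > "$work/file1.txt"
cat > "$work/file2.txt" <<'EOF'
11:52:13 688 DMN: MSG 110723 Accepted connection: [172.31.35.203] (dc-1044a.central.domain.com)
11:52:13 688 DMN: MSG 110723 SMTP session ended: [172.31.35.203] (dc-1044a.central.domain.com)
11:52:21 488 IMAP4 session ended: 172.23.75.28
11:52:28 688 DMN: MSG 110724 Accepted connection: [172.21.109.179] (160-9fl-fl.central.domain.com)
11:52:28 688 DMN: MSG 110724 Refused sender: email1@domain1.com (4)
11:52:28 688 DMN: MSG 110724 SMTP session ended: [172.21.109.179] (160-9fl-fl.central.domain.com)
11:52:28 688 DMN: MSG 110725 Accepted connection: [172.17.228.250] (kchc-sob265-pc6.kchdomain.local)
11:52:28 688 DMN: MSG 110725 Refused sender: email2@domain2.com (4)
11:52:28 688 DMN: MSG 110725 SMTP session ended: [172.17.228.250] (kchc-sob265-pc6.kchdomain.local)
EOF

awk '
NR == FNR { ids[$1]; next }      # first file: remember each bad ID
$4 == "MSG" && ($5 in ids)       # second file: print lines whose ID is in the list
' "$work/file1.txt" "$work/file2.txt" > "$work/file3.txt"

cat "$work/file3.txt"
```

This reads each file once instead of running grep once per ID, and it compares the whole ID field, so 110724 cannot accidentally match a longer number. One caveat: if file1.txt is empty, NR == FNR stays true for file2.txt as well and nothing is printed.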

