Filtering a CSV file from a web log with a shell script?
I have been working on this project on and off for work for a few months and it's driving me nuts!
I basically have a semi-consistent CSV file from a Microsoft IIS web log. Here is a small snippet of what the log file looks like:
Code:
2007-01-01,23:30:00,/main/page-1.html,192.168.1.10,,
Code:
2007-01-01,23:30:00,/main/page-1.html,192.168.1.10,john@doe.com
Is this possible to fully automate in a shell script? |
Quote:
http://www.gnu.org/manual/gawk/html_...ield-Splitting |
I have actually been using sed and awk, but my understanding of awk is very slim. I can post my shell script later when I am at work so you guys can see what I have and have not been able to do.
|
To recap, here's what my CSV file looks like:
Code:
2007-01-01,23:30:00,/main/page-1.html,192.168.1.10,,
My approach:
step 1) extract the IP address of the CSV file which has an email address AND "page-" in the URL since those are the two main things I am looking for
step 2) run this iplist against the CSV to further filter the list and somehow stick the email address at the end
step 3) somehow take the first and last line per IP address and VOILA - all done.
Code:
#!/bin/sh
Code:
2007-01-01,23:30:00,/main/page-1.html,192.168.1.10,john@doe.com |
not sure if this will work universally for your needs:
Code:
awk -F , '{print $1","$2","$3","$NF}' ip.lst | grep @ |
Unfortunately that won't work because the email field isn't always in the last line in the original CSV file. :(
I somehow need a way to hold the email in a temporary variable/file, and then stick it at the end of the lines for the corresponding IP address. Any ideas? |
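One way to "hold" the email without a temporary file might be an awk associative array keyed by IP address. The sketch below is only an illustration under assumptions: the function name attach_email is made up, the IP is assumed to be field 4, and any field from 5 onward containing "@" is treated as the email. It reads the same file twice: the first pass records the email for each IP (a later email overwrites an earlier one), and the second pass prints only lines whose IP has an email on record.

```shell
#!/bin/sh
# Hypothetical helper, not from the thread: append the recorded email
# to every line whose IP address has one somewhere in the file.
# Assumed layout: date,time,url,ip,email (email may be empty or missing).
attach_email() {
    # The file is passed twice so awk can make two passes over it.
    awk -F, -v OFS=, '
        # Pass 1 (NR == FNR only while reading the first copy):
        # remember the email for each IP; the last one seen wins.
        NR == FNR {
            for (i = 5; i <= NF; i++) if ($i ~ /@/) email[$4] = $i
            next
        }
        # Pass 2: print lines for IPs with a known email, email appended.
        $4 in email { print $1, $2, $3, $4, email[$4] }
    ' "$1" "$1"
}
```

It could be called as `attach_email weblog.csv > with-email.csv`, where both file names are placeholders. Because the whole file is scanned before anything is printed, lines that appear before the first email for an IP still get that email attached.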
this is quick-and-dirty but i think it'll work:
Code:
grep @ ip.lst | awk -F , '{print $1","$2","$3","}' > grep-awk.out |
I give up with this project. :( I posted an ad on Craigslist hoping that someone can finish this project for me since my company will pay for it. I probably spent over 20-30 hours on this and I haven't gotten anywhere! So frustrating.
Not sure if this is okay to ask, but if anyone is interested PM me. |
So, re-reading your OP, you want to add an email to a line if you've previously seen a line with the same IP and an email.
You also change the associated email for all subsequent lines if a new email appears (see john & jack both using ...10). If you never see an email for a given IP, you don't want to see that line at all in the output. IOW, only lines that (after checking) have an email are reported. Is this correct? Personally I'd use Perl. Does it have to be shell? |
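If that reading of the requirements is right, it can also be sketched in awk rather than Perl. The following is a minimal one-pass illustration, not a tested solution for the real log: the function name carry_email is hypothetical, the IP is assumed to be field 4, and any field from 5 onward containing "@" is assumed to be the email. Each line updates the remembered email for its IP when it carries one, and a line is printed only once an email is known for that IP.

```shell
#!/bin/sh
# Hypothetical filter, not from the thread: reads the log on stdin and
# writes lines with the most recently seen email for that IP appended.
# Assumed layout: date,time,url,ip,email (email may be empty or missing).
carry_email() {
    awk -F, -v OFS=, '
        {
            # If this line has an email, it becomes the current one for the IP.
            for (i = 5; i <= NF; i++) if ($i ~ /@/) email[$4] = $i
        }
        # Print only when an email is already known for this IP.
        $4 in email { print $1, $2, $3, $4, email[$4] }
    '
}
```

Usage would be something like `carry_email < weblog.csv`, with the file name as a placeholder. Note that in a single pass, lines that occur before the first email for an IP are dropped; if those should be kept too, the file would need to be read twice.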