parsing out squid access log with awk and grep
I'm trying to recreate a simple script I wrote to parse the Squid access.log and get a rough idea of which websites users are visiting on our corporate network. I want to pull out only the lines from access.log where the URL ends in .com/, .org/, .net/ or similar, so that I log just what the user entered into the address bar and drop the pictures, JavaScript files and everything else.
What I do is: awk '{print $8} | grep -e '[cong]|[ore]|[mgtv][/]'$ and nothing happens. I know there is an easier way to do this with awk alone... anyone? Thx
Code:
ruby -ne 'print if /\.(com|net|org)$/' access.log
Quote:
Here's the line that is being spit out from access.log after I awk out the 7th field:
Code:
ruby -ne 'print if /\.(com|net|org)\/$/' access.log
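The problem is in the grep command: [cong] is a bracket expression that matches one character from the set, not the string "com".
Code:
grep -e '[cong]|[ore]|[mgtv][/]'
With awk alone:
Code:
awk '$8 ~ /\.com|\.net|\.org/{print $8}'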
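To make the difference concrete, here is a minimal sketch comparing the original bracket pattern with a real alternation under grep -E. The URLs are hypothetical placeholders, not taken from anyone's log.

```shell
# Hypothetical sample URLs standing in for the URL field of access.log.
urls='http://example.com/
http://example.org/logo.png
http://example.net/
http://example.test/go'

# Each [...] matches ONE character, and without -E the "|" is a literal pipe,
# so this pattern needs literal "|" characters in the text and matches nothing.
# "|| true" only absorbs grep's non-zero exit status when there is no match.
broken=$(printf '%s\n' "$urls" | grep -c -e '[cong]|[ore]|[mgtv][/]$' || true)

# With -E, (com|org|net) is a real alternation; "\." and "/$" anchor it
# to a dot before the suffix and a slash at the very end of the line.
fixed=$(printf '%s\n' "$urls" | grep -E '\.(com|org|net)/$')

echo "bracket pattern matched $broken line(s)"
echo "$fixed"
```

Only the two URLs that end in .com/ and .net/ survive the second pattern; the image request and the .test URL are dropped.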
So I am confused on two fronts here (easily done sometimes):
1. You start by referring to field 8, but then in post #3 you talk about the 7th field?
2. You state the following: Quote:
Maybe you could show some of the log so we can ascertain exactly which field you are referring to, and then which part of that field you are interested in?
Guys, I appreciate all of the help! I'm sorry this has been a bit of a flustercluck from the beginning. I have solved it, and with your help! The access.log from squid looks like this:
Code:
1303632736.387 121 192.168.4.12 TCP_MISS/200 537 GET http://packages.linuxmint.com/dists/julia/Release.gpg - DIRECT/80.86.
I appreciate the help guys! I've been avoiding regex, awk, and sed for a while now, only using them minimally, and unfortunately, I get confused. Thanks again!
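For anyone else unsure whether the URL is field 7 or field 8, a quick way to check is to have awk number the fields of the sample line. This sketch uses the line exactly as posted, including its truncated last field.

```shell
# The sample line from the post, copied as-is (its last field is truncated there).
line='1303632736.387 121 192.168.4.12 TCP_MISS/200 537 GET http://packages.linuxmint.com/dists/julia/Release.gpg - DIRECT/80.86.'

# Print every whitespace-separated field next to its awk field number.
printf '%s\n' "$line" | awk '{ for (i = 1; i <= NF; i++) printf "$%d = %s\n", i, $i }'

# Capture fields 7 and 8 separately to see which one holds the URL.
field7=$(printf '%s\n' "$line" | awk '{ print $7 }')
field8=$(printf '%s\n' "$line" | awk '{ print $8 }')
echo "field 7: $field7"
echo "field 8: $field8"
```

With this layout the URL is field 7, and field 8 is just the "-" placeholder, which would explain why filtering on $8 returns nothing.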
Well, there is no need to use grep if you are also using awk, as awk has most of grep's functionality built in.
colucix's example is the one I would go for if using awk, although it will give you the entire field, and your explanation still has me asking whether you want it all or just the part up to .com, .net, etc., i.e. up to the first slash after http://. If the latter is desired, you could easily use split:
Code:
awk '$7 ~ /\.(com|net|org)/{split($7,f,"/");print f[3]}' file
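Building on the split idea, the following is one possible sketch of the original goal: keep only URLs that end in .com/, .net/ or .org/ and count requests per host. The log lines, hosts and addresses are hypothetical; only the field layout is borrowed from the sample line posted earlier in the thread.

```shell
# Hypothetical log lines in the same layout as the posted sample (URL in field 7).
log='1303632736.387 121 192.168.4.12 TCP_MISS/200 537 GET http://example.com/ - DIRECT/192.0.2.1
1303632737.102 98 192.168.4.12 TCP_MISS/200 812 GET http://example.com/img/logo.png - DIRECT/192.0.2.1
1303632738.555 110 192.168.4.15 TCP_MISS/200 640 GET http://example.org/ - DIRECT/192.0.2.2
1303632739.001 75 192.168.4.15 TCP_MISS/200 300 GET http://example.com/ - DIRECT/192.0.2.1'

# Keep only URLs that END in .com/, .net/ or .org/, take the host part
# (the third piece when splitting on "/"), and count requests per host.
counts=$(printf '%s\n' "$log" |
  awk '$7 ~ /\.(com|net|org)\/$/ { split($7, f, "/"); hits[f[3]]++ }
       END { for (h in hits) print hits[h], h }' |
  sort -rn)

echo "$counts"
```

The trailing `\/$` is what drops the image request; removing it would count every request to a matching host rather than only the top-level page hits.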