LinuxQuestions.org


LinuxGold 11-10-2009 02:23 PM

Retrieving FQDN from squid access.log
 
I am trying to figure out a way to retrieve the FQDN from access.log. For example, for http(s)://www.cnn.com/whatever/complex/line/this/might/be I would like the output to be:

www.cnn.com
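For a single URL, the host can be extracted with plain Bash parameter expansion, without calling `cut` or `sed`. A minimal sketch, assuming URLs of the form `scheme://[user@]host[:port][/path]`. The function name `url_host` is hypothetical, and IPv6 literals such as `[::1]` are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the thread): print the host part of a URL.
# Assumes scheme://[user[:password]@]host[:port][/path][?query][#fragment];
# IPv6 literals like [::1] are not handled.
url_host() {
    local url=$1
    url=${url#*://}     # drop the scheme, e.g. "https://"
    url=${url%%/*}      # drop the path: everything from the first "/"
    url=${url%%[?]*}    # drop a query string that follows the host directly
    url=${url%%#*}      # drop a fragment that follows the host directly
    url=${url##*@}      # drop a "user:password@" prefix
    url=${url%%:*}      # drop a ":port" suffix
    printf '%s\n' "$url"
}

url_host 'https://www.cnn.com/whatever/complex/line/this/might/be'   # www.cnn.com
url_host 'http://64.12.161.103/aim/fetchEvents'                      # 64.12.161.103
```

The order matters: the `user:password@` prefix is removed before the port, because the password part also contains a colon.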

Here is what my access.log contains:

Code:

root@cachepilot:/var/log/squid# tail /var/log/squid/access.log
1257884141.586    119 10.182.16.205 TCP_MISS/200 2423 GET http://t0.gstatic.com/images?q=tbn:B3386mi19hBkMM:http://www.artistryofiron.com/Images/Cutouts/lizzard.jpg - DEFAULT_PARENT/wwwproxy.k12.de.us image/jpeg
1257884141.618    176 10.182.16.205 TCP_MISS/200 5745 GET http://t1.gstatic.com/images?q=tbn:DLRVN_v-bM3PsM:http://lounginlizzard.com/store/images/llhammockchair.jpg - DEFAULT_PARENT/wwwproxy.k12.de.us image/jpeg
1257884141.736    117 10.182.16.205 TCP_MISS/200 4045 GET http://t2.gstatic.com/images?q=tbn:UHSmX_XQMcMzOM:http://1.bp.blogspot.com/_ks1ZRojRD8k/SJJTVhimFVI/AAAAAAAAAiE/bhpsac_DHpY/s400/alien_lizzard.jpg - DEFAULT_PARENT/wwwproxy.k12.de.us image/jpeg
1257884141.736      0 10.182.16.205 TCP_NEGATIVE_HIT/204 370 GET http://clients1.google.com/generate_204 - NONE/- text/html
1257884141.756    169 10.182.16.205 TCP_MISS/200 3743 GET http://t1.gstatic.com/images?q=tbn:pMfVy6iLu7pjPM:http://www.buchinger.or.at/pic/animals/20050101-Lizzard-Nakkuru.jpg - DEFAULT_PARENT/wwwproxy.k12.de.us image/jpeg
1257884141.765    192 10.182.16.205 TCP_MISS/200 4775 GET http://t1.gstatic.com/images?q=tbn:zXJFTlypl8JCVM:http://bp1.blogger.com/_RloTrSWsm7A/RipCKwIS3_I/AAAAAAAAAGc/pXo0bLfKV-Q/s400/caiman_lizzard_pantanal-FBAL.jpg - DEFAULT_PARENT/wwwproxy.k12.de.us image/jpeg
1257884141.826    269 10.182.16.205 TCP_MISS/200 6589 GET http://t3.gstatic.com/images?q=tbn:87-PS67EHxfMJM:http://nmsouthernskies.com/PhotoHiRes/Lizzard.JPG - DEFAULT_PARENT/wwwproxy.k12.de.us image/jpeg
1257884141.978    152 10.182.16.205 TCP_MISS/204 452 GET http://images.google.com/csi?v=3&s=images&action=&e=17259,21329,21517,21766,22107,22712&ei=Acr5St7BNIKV8Abl9pTSDA&rt=prt.422,xjs.516,ol.1703 - DEFAULT_PARENT/wwwproxy.k12.de.us text/html
1257884144.617    64 10.182.16.156 TCP_MISS/304 452 GET http://www.etymonline.com/style.css - DEFAULT_PARENT/wwwproxy.k12.de.us text/css
1257884144.618    59 10.182.16.156 TCP_MISS/304 457 GET http://www.etymonline.com/graphics/header.jpg - DEFAULT_PARENT/wwwproxy.k12.de.us image/jpeg
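
The sample lines use Squid's native log format, where fields are separated by runs of spaces: `$1` is the Unix timestamp, `$2` the elapsed time in milliseconds, `$3` the client address, `$6` the request method and `$7` the URL. Splitting `$7` on `/` leaves the host in the third piece, since `http:` comes first and the empty string between the two slashes second. The awk sketch below relies on that. It assumes the default native format (not a custom `logformat`), handles `CONNECT` requests separately (that is how Squid normally logs HTTPS tunnels, with a `host:port` URL), and uses the hypothetical function name `squid_hosts`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Squid native-format log lines on stdin and
# print "Time: <timestamp> <client> => <host>" for each request.
squid_hosts() {
    awk '{
        url = $7
        if ($6 == "CONNECT") {
            host = url                  # HTTPS tunnels are logged as host:port
        } else {
            split(url, part, "/")       # "http:", "", "host[:port]", path...
            host = part[3]
        }
        sub(/:[0-9]+$/, "", host)       # drop a trailing :port
        print "Time: " $1 " " $3 " => " host
    }'
}

# Usage: squid_hosts < /var/log/squid/access.log
# Demo with two lines from the sample above:
squid_hosts <<'EOF'
1257884141.736      0 10.182.16.205 TCP_NEGATIVE_HIT/204 370 GET http://clients1.google.com/generate_204 - NONE/- text/html
1257884144.617    64 10.182.16.156 TCP_MISS/304 452 GET http://www.etymonline.com/style.css - DEFAULT_PARENT/wwwproxy.k12.de.us text/css
EOF
```

Because it reads field 7 rather than searching the whole line, this reports `t0.gstatic.com` for the image-thumbnail requests even though their query strings contain a second URL.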

Here is the script I have been working on so far; it has been a hair-pulling experience:
Code:

tail /var/log/squid/access.log |
awk '{print $1,$3,$7}' |
while read line; do
echo $line
time=$(echo "$line" | cut -d ' ' -f 1)
ip=$(echo "$line" | cut -d ' ' -f 2)
url=$(echo "$line" | cut -d ' ' -f 3)
fqdn=$(echo "$url" | cut -d '/' -f 2)
echo "Time: $time $ip => $fqdn"
done

The output is as follows:

Code:

root@cachepilot:/var/log/squid# tail /var/log/squid/access.log |
> awk '{print $1,$3,$7}' |
> while read line; do
> echo $line
> time=$(echo "$line" | cut -d ' ' -f 1)
> ip=$(echo "$line" | cut -d ' ' -f 2)
> url=$(echo "$line" | cut -d ' ' -f 3)
> fqdn=$(echo "$url" | cut -d '/' -f 2)
> echo "Time: $time $ip => $fqdn"
> done
1257884402.710 10.182.16.156 http://www.etymonline.com/style.css
Time: 1257884402.710 10.182.16.156 =>
1257884402.750 10.182.16.153 http://www.schoolnotes.com/cgi-bin/notesupdate-new.pl
Time: 1257884402.750 10.182.16.153 =>
1257884403.135 10.182.16.153 http://clk.atdmt.com/go/175199563/direct;vt.1;wi.160;hi.600;ai.131775080;ct.d;ea.364/01/
Time: 1257884403.135 10.182.16.153 =>
1257884404.009 10.182.16.97 http://64.12.161.103/aim/fetchEvents?aimsid=088.1218851095.1557230312:cchowe&seqNum=10&rnd=1257884404.436330&timeout=20000&r=84&k=ke1KS-K6BdPDaRnC&f=json&a=%252FwQAAAAAAABBSgVh6p%252BakWe8%252FHCb%252F3YAGJT5VgU26pjC3sle9NGn0XL11zN9gHdCK32tHIf1OMCIJL%252B7B6cabYQnU0BHtC2HFXsNP5h2%252BsbgdAcT0lc2q5o%252FoVloSsXKXsJD1S5YUOHl5KamubP1a6QUEdbGRfrxr6nrkw%253D%253D&dojo.preventCache=1257884379785&c=dojo.io.script.jsonp_dojoIoScript85._jsonpCallback
Time: 1257884404.009 10.182.16.97 =>

I am trying to output the host (the domain name, or the IP address when the URL uses one) after "=>"; for example, the last line should be:
Code:

Time: 1257884404.009 10.182.16.97 => 64.12.161.103

Any suggestions?
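The "=>" column is empty because `cut -d '/' -f 2` selects the empty string between the two slashes of `http://`: splitting `http://www.etymonline.com/style.css` on `/` gives `http:`, an empty field, `www.etymonline.com` and `style.css`, so the host is field 3. A minimal corrected sketch of the same loop follows; it also lets `read -r` split the three awk columns instead of running `cut` three times per line. The function name `print_hosts` and its optional log-path argument are hypothetical, and a `:port` in the URL would still be printed as part of the host.

```shell
#!/usr/bin/env bash
# Corrected sketch of the loop from the question. Splitting "http://host/path"
# on "/" yields "http:", "" (between the two slashes), "host", ..., so the
# host is field 3, not field 2.
print_hosts() {
    local log=${1:-/var/log/squid/access.log}   # hypothetical optional argument
    tail "$log" |
    awk '{print $1, $3, $7}' |
    while read -r time ip url; do
        fqdn=$(printf '%s\n' "$url" | cut -d '/' -f 3)
        echo "Time: $time $ip => $fqdn"
    done
}

# Usage: print_hosts                      (reads /var/log/squid/access.log)
#        print_hosts /path/to/access.log
```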

kbp 11-10-2009 03:32 PM

Hey,

This is a bit hacky; it relies on the greediness of the first '.*', but it may help:

Code:

sed 's/.*http[s]*:\/\/\([^/]*\).*/\1/' /path/to/file
cheers
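One caveat with relying on the greedy '.*': it runs up to the last `http://` or `https://` on the line. For the Google image-thumbnail requests in the sample, whose query strings embed a second URL, the command therefore prints the embedded host (`www.artistryofiron.com` for the first sample line) instead of `t0.gstatic.com`. A minimal alternative sketch skips the first six fields and reads the host from the URL field itself, assuming the native Squid format shown in the question. It uses `sed -E`, which GNU and BSD sed both accept, and the function name `url_field_host` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the host of the requested URL in each Squid
# native-format log line. ^([^ ]+ +){6} skips time, elapsed, client,
# code/status, bytes and method; the capture stops at "/", ":", "?" or a
# space. With -n and the p flag, lines without "scheme://" in field 7
# (for example CONNECT host:port entries) are not printed.
url_field_host() {
    sed -nE 's#^([^ ]+ +){6}[A-Za-z]+://([^/:? ]+).*#\2#p' "$@"
}

# Usage: url_field_host /path/to/file
#    or: tail /var/log/squid/access.log | url_field_host
```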


All times are GMT -5.