LinuxQuestions.org
07-31-2013, 10:01 AM   #1
szboardstretcher
Senior Member
 
Registered: Aug 2006
Location: Detroit, MI
Distribution: GNU/Linux systemd
Posts: 4,278

Rep: Reputation: 1694
Parse through Apache log and output lines based on destination domain


We are using the combined log format, and each line has a destination and a referrer URL as per usual.

I need to go through the file and put all the requests for each domain into a file named after that domain.

So, if there are 2,000 requests to www.mysite.com, I want to pull all of those out and put them into www.mysite.com.log. If there are 4,000 requests to widgets.other.site.com, I want to pull all of those out and put them into widgets.other.site.com.log.

I've currently got this pseudo-plan, to apply to each line of the MASSIVE log:

* grep for the first domain listed in each line
* write the domain into a file (if it is not already in the file)
* massage the file for any weirdness

When that's all done parsing, I should have a file with a list of domains that I can put into a grep "for each" loop against the original MASSIVE file, writing the output to a file named after the domain being searched for.

Sort that by time stamp and done.

Any suggestions on this? Pointers?
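
A minimal sketch of the two-stage plan above, assuming the destination domain is a bare hostname sitting in one whitespace-separated field. The function name `split_by_domain_two_pass` and its `FIELD` argument are hypothetical, and the sketch swaps the plain grep for an exact awk comparison on that field so a domain that only shows up in the referrer is not matched.

```shell
# split_by_domain_two_pass LOGFILE FIELD OUTDIR   (hypothetical helper)
# FIELD is the column ASSUMED to hold the destination host; adjust it
# to whatever your LogFormat actually writes.
split_by_domain_two_pass() {
    log=$1 field=$2 outdir=$3
    mkdir -p "$outdir"

    # Pass 1: collect each distinct domain once.
    awk -v f="$field" '{ print $f }' "$log" | sort -u > "$outdir/domains.txt"

    # Pass 2: rescan the whole log once per domain, matching the field exactly.
    while IFS= read -r domain; do
        awk -v f="$field" -v d="$domain" '$f == d' "$log" > "$outdir/$domain.log"
    done < "$outdir/domains.txt"
}
```

Called as `split_by_domain_two_pass access.log 4 out`, this reads the log once per distinct domain, so on a very large file with many domains the second stage is the expensive part.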
 
07-31-2013, 10:42 AM   #2
joe_2000
Senior Member
 
Registered: Jul 2012
Location: Aachen, Germany
Distribution: Void, Debian
Posts: 1,016

Rep: Reputation: 308
Is that something you need to do only once, or do you foresee a recurring need for this? Because in the latter case I would refrain from this dual-stage process that involves manual editing of the intermediate file.

What is the background of your need? On my sites I use self-written logging scripts (basically a PHP script that I include on every page). That gives me much more useful output because I can write out what I need, in the format I need, and cut everything I don't need, which reduces log file sizes significantly.
 
07-31-2013, 10:53 AM   #3
szboardstretcher
Senior Member
 
Registered: Aug 2006
Location: Detroit, MI
Distribution: GNU/Linux systemd
Posts: 4,278

Original Poster
Rep: Reputation: 1694
Herr Joe_2000!

It's a one-time thing. 20G of data. I've already got it half-way there.

I'll post my history in case anyone ever has to do something like this again. I'd certainly love to see it done in php/python if anyone has the time!
 
07-31-2013, 11:01 AM   #4
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Rep: Reputation: 783
use awk,

assuming field 4 to be www_somesite_com

Code:
awk '{print $0 >> ($4 ".log")}' infile

you can get awk to do all the "massage file for any weirdness" before the print statement

We would need sample input and desired output to be of any more help to you.
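
As a rough illustration of doing the "massage" step inside awk before the print, here is a hedged sketch. The function name `split_by_domain`, the `FIELD` and `OUTDIR` arguments, and the particular clean-ups (lowercasing, dropping a `:port`, sending odd values to `unknown.log`) are all assumptions, not something taken from real sample input.

```shell
# split_by_domain LOGFILE FIELD OUTDIR   (hypothetical helper)
# FIELD is the column ASSUMED to hold the destination host.
split_by_domain() {
    mkdir -p "$3"
    awk -v f="$2" -v dir="$3" '
    {
        host = tolower($f)               # hostnames are case-insensitive
        sub(/:[0-9]+$/, "", host)        # drop a trailing :port
        if (host !~ /^[a-z0-9.-]+$/)     # keep odd values out of file names
            host = "unknown"
        out = dir "/" host ".log"
        if (out != prev) {               # keep only one output file open
            if (prev != "") close(prev)
            prev = out
        }
        print >> out                     # append, so reopening a file is safe
    }' "$1"
}
```

Closing the previous file whenever the target changes avoids the open-file limit that some awk implementations hit with many domains. Lines are appended in input order, so each per-domain file keeps the ordering of the source log; because the output uses `>>`, rerunning it appends to files left over from an earlier run.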

Last edited by Firerat; 07-31-2013 at 11:02 AM. Reason: typo, ".log" not ."log"
 
07-31-2013, 11:27 AM   #5
joe_2000
Senior Member
 
Registered: Jul 2012
Location: Aachen, Germany
Distribution: Void, Debian
Posts: 1,016

Rep: Reputation: 308
Quote:
Originally Posted by Firerat View Post
use awk,

assuming field 4 to be www_somesite_com

Code:
awk '{print $0 >> ($4 ".log")}' infile
Yes, agreed, that's exactly what you should do. Depending on how you want to split the data (one file per domain vs. one file per page) it might become a bit more complex, though. Basically it comes down to recognizing the domain in the current line and writing that line to the corresponding log file in append mode.
 
07-31-2013, 11:54 AM   #6
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Rep: Reputation: 783
Yeah, well, let's say $4 == "www.foo.bar/foo/page/files/file.html"

Code:
awk '{LogFile = $4; sub(/\/.*/, "", LogFile); print $0 >> (LogFile ".log")}' infile
Edit: probably don't need $0, just a habit I have
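
A slightly more defensive sketch of the same idea, in case the field is sometimes a full URL, sometimes a host with a path, and sometimes just a bare host. The function name `split_by_url_host` and its `FIELD` and `OUTDIR` arguments are hypothetical, and it assumes the host part contains nothing that is unsafe in a file name.

```shell
# split_by_url_host LOGFILE FIELD OUTDIR   (hypothetical helper)
# FIELD is the column ASSUMED to hold "host", "host/path" or "scheme://host/path".
split_by_url_host() {
    mkdir -p "$3"
    awk -v f="$2" -v dir="$3" '
    {
        host = $f
        sub(/^[a-zA-Z]+:\/\//, "", host)   # drop a leading scheme such as http://
        sub(/\/.*/, "", host)              # drop the path, if there is one
        if (host == "") next               # skip lines that lack the field
        print >> (dir "/" host ".log")     # ".log" is added even without a path
    }' "$1"
}
```

For example, `split_by_url_host infile 4 out` would put both `www.foo.bar/foo/page/files/file.html` and a bare `www.foo.bar` into `out/www.foo.bar.log`. It leaves every output file open until awk exits, which is fine for a modest number of domains but may hit a limit in some awk implementations.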

Last edited by Firerat; 07-31-2013 at 11:56 AM.
 
  

