We are using the combined log format, and each line has a destination and a referrer URL as per usual.
I need to go through the file and put all the requests for each domain into a separate file named after that domain.
So, if there are 2,000 requests to www.mysite.com, I want to pull all of those out and put them into www.mysite.com.log. If there are 4,000 requests to widgets.other.site.com, I want to pull all of those out and put them into widgets.other.site.com.log.
I've currently got this pseudo-plan, to apply to each line of the MASSIVE log:
* grep for the first domain listed in each line
* write the domain into a file (if it is not already in the file)
* massage file for any weirdness
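A minimal sketch of the "first domain in each line" step, assuming the destination shows up as the first absolute URL on the line (that placement is an assumption about this particular log, so the pattern may need adjusting). The name `first_domain` and the sample line are made up for illustration; lowercasing and trimming stray dots stand in for the "massage" step.

```python
import re

# Assumption: the destination is the first absolute URL on the line.
# The host stops at the first character that is not a letter, digit, dot
# or hyphen, so any port or path is left out.
FIRST_HOST = re.compile(r'https?://([A-Za-z0-9.-]+)', re.IGNORECASE)


def first_domain(line):
    """Return the lowercased host of the first URL in the line, or None."""
    match = FIRST_HOST.search(line)
    if match is None:
        return None
    # Normalise case and drop stray leading/trailing dots.
    domain = match.group(1).lower().strip('.')
    return domain or None


# Hypothetical sample line, not taken from a real log.
sample = ('203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] '
          '"GET http://www.mysite.com/index.html HTTP/1.1" 200 2326 '
          '"http://widgets.other.site.com/start" "Mozilla/5.0"')
print(first_domain(sample))  # prints: www.mysite.com
```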
When that's all done parsing, I should have a file with a list of domains that I can feed into a grep "for each" loop against the original MASSIVE file, writing the output to a file named after the domain being searched for.
Sort each of those by timestamp and done.
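For comparison, here is a rough single-pass sketch that skips the intermediate domain list and the repeated greps: it reads the log once and writes each line straight to the file for its domain. It rests on the same assumption as the plan (the destination is the first absolute URL on the line), and it keeps one open file per domain, so it also assumes the number of distinct domains stays under the OS open-file limit. The name `split_by_domain` is hypothetical.

```python
import re
from pathlib import Path

# Assumption: the destination is the first absolute URL on the line.
# Bytes pattern so lines are copied through unchanged, whatever their encoding.
FIRST_HOST = re.compile(rb'https?://([A-Za-z0-9.-]+)', re.IGNORECASE)


def split_by_domain(log_path, out_dir):
    """Copy each line of log_path into out_dir/<domain>.log.

    Returns a dict mapping each domain to the number of lines written.
    Lines without a recognisable URL are skipped.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    handles = {}  # domain -> open output file
    counts = {}   # domain -> lines written
    try:
        with open(log_path, 'rb') as log:
            for line in log:
                match = FIRST_HOST.search(line)
                if match is None:
                    continue
                # The pattern only admits ASCII, so this decode cannot fail.
                domain = match.group(1).decode('ascii').lower().strip('.')
                if not domain:
                    continue
                if domain not in handles:
                    handles[domain] = open(out_dir / f'{domain}.log', 'wb')
                    counts[domain] = 0
                handles[domain].write(line)
                counts[domain] += 1
    finally:
        for handle in handles.values():
            handle.close()
    return counts
```

Calling something like `split_by_domain('access.log', 'by_domain')` (both paths are placeholders) would produce by_domain/www.mysite.com.log and so on. Lines are written in the order they are read, so if the original log is already in chronological order, each per-domain file comes out in that order too and the final sort may not be needed.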
Any suggestions on this? Pointers?