SQL - Get the correct data
I admit it; my SQL is way too rusty for these queries...
The situation is simple, and actually something I recently thought up and started to implement.
We have a bunch of (apache) proxy servers, and from those I load the logs into my database with a simple perl script that's gonna replace me :-)
I took the configuration from the conf.d/*.conf files and extracted the <Location /bla/bla/bla> and stored those in my table 'configured_locations' (configured_locations.location is varchar(255) )
I also have a table logs that's actually a bit more complex in setup it references a bunch of other tables and to make that "human readable" I designed this view:
Now I want to query how much each location as defined in the configured_locations.location appears "like" in the logs in a specified period (using showlog.date)
I thought this query would tell me that, but the outcome must be wrong as I have about 8M records in my table and the sum of the results was 420M :-)
Probably not really necessary to tell, but we're talking a PostgreSQL 8.4 database.
I would put the showlog matching in a sub-select thus:
Thanks for the quick reply; Had to alter it a bit, as the query wanted a group-by... But this query does not give what I expected; Maybe I ought to clarify a bit better:
Given these configured_locations.location:
And these showlog.locations (where envir is not that relevant)
I'd like to do the query that will give these results:
So how often (within a given period) each configured_location "appears" as a substring of the location string in showlog
The query you gave does some sort of counting of how often a location appears in configured_locations where they appear in the log; but the count is on the amount of hits in the configured_locations table.
I think I may have the correct query; not sure if this is "it" though:
Got close; have yet to determine a way to fix a few things
The query I came up with is now as follows:
would both match the like 'site1/service1%', besides: having N proxies in the given environment, multiplies the outcome with factor N; at least: that's what the data looks like right now.
Gonna think about how to overcome these things ;-)
|All times are GMT -5. The time now is 10:21 AM.|