I admit it; my SQL is way too rusty for these queries...
The situation is simple, and actually something I recently thought up and started to implement.
We have a bunch of (apache) proxy servers, and from those I load the logs into my database with a simple perl script that's gonna replace me :-)
I took the configuration from the conf.d/*.conf files and extracted the <Location /bla/bla/bla> and stored those in my table 'configured_locations' (configured_locations.location is varchar(255) )
I also have a table logs that's actually a bit more complex in setup it references a bunch of other tables and to make that "human readable" I designed this view:
Code:
View "public.showlog"
Column | Type | Modifiers
-----------+------------------------+-----------
date | date |
hour | smallint |
minute | smallint |
second | smallint |
address | inet |
username | character varying(128) |
location | character varying(512) |
response | integer |
duration | integer |
method | character varying(20) |
proxyname | character varying(20) |
envir | character varying(5) |
View definition:
SELECT logs.date, logs.hour, logs.minute, logs.second, source.address, users.username, locations.location, logs.response, logs.duration, methods.method, proxies.pr
oxyname, environments.environment_short_name AS envir
FROM logs
JOIN source ON logs.source = source.id
JOIN users ON logs.userid = users.id
JOIN locations ON logs.location = locations.id
JOIN methods ON logs.method = methods.id
JOIN proxies ON logs.proxy = proxies.id
JOIN environmental_proxies ON proxies.id = environmental_proxies.proxyid
JOIN environments ON environmental_proxies.environment_id = environments.id
ORDER BY logs.date, logs.hour, logs.minute, logs.second;
locations in the view is a reference to a different "location" table, and ought to be more aptly named 'uri' as they appear in the logfile.
Now I want to query how much each location as defined in the configured_locations.location appears "like" in the logs in a specified period (using showlog.date)
I thought this query would tell me that, but the outcome must be wrong as I have about 8M records in my table and the sum of the results was 420M :-)
Code:
select configured_locations.location,count(0)
from configured_locations,showlog
where configured_locations.location like showlog.location || '%'
and showlog.date between '2013-02-01' and '2013-02-28'
and showlog.envir = 'prd'
group by configured_locations.location;
I was hoping this query would help me give the usage of each "location" for the environment "production" over the month februari... It doesn't... it does something else, but I'm not exactly sure what :-)
Probably not really necessary to tell, but we're talking a PostgreSQL 8.4 database.