LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Programming (https://www.linuxquestions.org/questions/programming-9/)
-   -   SQL - Get the correct data (https://www.linuxquestions.org/questions/programming-9/sql-get-the-correct-data-4175452185/)

Ramurd 03-01-2013 12:54 AM

SQL - Get the correct data
 
I admit it; my SQL is way too rusty for these queries...

The situation is simple, and actually something I recently thought up and started to implement.
We have a bunch of (apache) proxy servers, and from those I load the logs into my database with a simple perl script that's gonna replace me :-)

I took the configuration from the conf.d/*.conf files and extracted the <Location /bla/bla/bla> and stored those in my table 'configured_locations' (configured_locations.location is varchar(255) )

I also have a table logs that's actually a bit more complex in setup it references a bunch of other tables and to make that "human readable" I designed this view:
Code:

            View "public.showlog"
  Column  |          Type          | Modifiers
-----------+------------------------+-----------
 date      | date                  |
 hour      | smallint              |
 minute    | smallint              |
 second    | smallint              |
 address  | inet                  |
 username  | character varying(128) |
 location  | character varying(512) |
 response  | integer                |
 duration  | integer                |
 method    | character varying(20)  |
 proxyname | character varying(20)  |
 envir    | character varying(5)  |
View definition:
 SELECT logs.date, logs.hour, logs.minute, logs.second, source.address, users.username, locations.location, logs.response, logs.duration, methods.method, proxies.pr
oxyname, environments.environment_short_name AS envir
  FROM logs
  JOIN source ON logs.source = source.id
  JOIN users ON logs.userid = users.id
  JOIN locations ON logs.location = locations.id
  JOIN methods ON logs.method = methods.id
  JOIN proxies ON logs.proxy = proxies.id
  JOIN environmental_proxies ON proxies.id = environmental_proxies.proxyid
  JOIN environments ON environmental_proxies.environment_id = environments.id
  ORDER BY logs.date, logs.hour, logs.minute, logs.second;

locations in the view is a reference to a different "location" table, and ought to be more aptly named 'uri' as they appear in the logfile.

Now I want to query how much each location as defined in the configured_locations.location appears "like" in the logs in a specified period (using showlog.date)

I thought this query would tell me that, but the outcome must be wrong as I have about 8M records in my table and the sum of the results was 420M :-)
Code:

select configured_locations.location,count(0)
from configured_locations,showlog
where configured_locations.location like showlog.location || '%'
and showlog.date between '2013-02-01' and '2013-02-28'
and showlog.envir = 'prd'
group by configured_locations.location;

I was hoping this query would help me give the usage of each "location" for the environment "production" over the month februari... It doesn't... it does something else, but I'm not exactly sure what :-)

Probably not really necessary to tell, but we're talking a PostgreSQL 8.4 database.

AnanthaP 03-01-2013 02:13 AM

I would put the showlog matching in a sub-select thus:
Quote:

Select location,count(0)
from configured_locations
where location in
(select location from showlog where date between '2013-02-01' and '2013-02-28'
and envir = 'prd'
)
order by location;
OK

Ramurd 03-01-2013 02:43 AM

Thanks for the quick reply; Had to alter it a bit, as the query wanted a group-by... But this query does not give what I expected; Maybe I ought to clarify a bit better:

Given these configured_locations.location:
/site1/service1
/site2/service1
/site3/service2

And these showlog.locations (where envir is not that relevant)
/site1/service1?status
/site1/service1/page1.html
/site2/service1/service
/site3/service2/service4
/site3/service2/service
/site3/service2?status

I'd like to do the query that will give these results:

/site1/service1 2
/site2/service1 1
/site3/service2 3


So how often (within a given period) each configured_location "appears" as a substring of the location string in showlog

The query you gave does some sort of counting of how often a location appears in configured_locations where they appear in the log; but the count is on the amount of hits in the configured_locations table.

Ramurd 03-01-2013 03:49 AM

I think I may have the correct query; not sure if this is "it" though:

Code:

Select configured_locations.location,count(0)
from showlog, configured_locations
where showlog.location like configured_locations.location || '%'
and showlog.date between '2013-02-01' and '2013-02-28'
and showlog.envir = 'prd'
group by configured_locations.location
order by location;


Ramurd 03-01-2013 04:13 AM

Got close; have yet to determine a way to fix a few things

The query I came up with is now as follows:
Code:

Select configured_locations.location,count(0)
from showlog, configured_locations
where showlog.location like configured_locations.location || '%'
and showlog.date between '2013-02-01' and '2013-02-28'
and showlog.envir = 'prd'
and configured_locations.proxyid in
(
        select proxyid
        from environmental_proxies
        where environment_id in
        (
                select id from environments where environment_short_name='prd'
        )
)
group by configured_locations.location
order by location;

However, it logically is not correct (yet) as
/site1/service1
/site1/service12

would both match the like 'site1/service1%', besides: having N proxies in the given environment, multiplies the outcome with factor N; at least: that's what the data looks like right now.
Gonna think about how to overcome these things ;-)


All times are GMT -5. The time now is 01:52 PM.