LinuxQuestions.org
Share your knowledge at the LQ Wiki.
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices



Reply
 
Search this Thread
Old 03-01-2013, 01:54 AM   #1
Ramurd
Member
 
Registered: Mar 2009
Location: Rotterdam, the Netherlands
Distribution: Slackwarelinux
Posts: 555

Rep: Reputation: 75
SQL - Get the correct data


I admit it; my SQL is way too rusty for these queries...

The situation is simple, and actually something I recently thought up and started to implement.
We have a bunch of (apache) proxy servers, and from those I load the logs into my database with a simple perl script that's gonna replace me :-)

I took the configuration from the conf.d/*.conf files and extracted the <Location /bla/bla/bla> and stored those in my table 'configured_locations' (configured_locations.location is varchar(255) )

I also have a table logs that's actually a bit more complex in setup it references a bunch of other tables and to make that "human readable" I designed this view:
Code:
             View "public.showlog"
  Column   |          Type          | Modifiers 
-----------+------------------------+-----------
 date      | date                   | 
 hour      | smallint               | 
 minute    | smallint               | 
 second    | smallint               | 
 address   | inet                   | 
 username  | character varying(128) | 
 location  | character varying(512) | 
 response  | integer                | 
 duration  | integer                | 
 method    | character varying(20)  | 
 proxyname | character varying(20)  | 
 envir     | character varying(5)   | 
View definition:
 SELECT logs.date, logs.hour, logs.minute, logs.second, source.address, users.username, locations.location, logs.response, logs.duration, methods.method, proxies.pr
oxyname, environments.environment_short_name AS envir
   FROM logs
   JOIN source ON logs.source = source.id
   JOIN users ON logs.userid = users.id
   JOIN locations ON logs.location = locations.id
   JOIN methods ON logs.method = methods.id
   JOIN proxies ON logs.proxy = proxies.id
   JOIN environmental_proxies ON proxies.id = environmental_proxies.proxyid
   JOIN environments ON environmental_proxies.environment_id = environments.id
  ORDER BY logs.date, logs.hour, logs.minute, logs.second;
locations in the view is a reference to a different "location" table, and ought to be more aptly named 'uri' as they appear in the logfile.

Now I want to query how much each location as defined in the configured_locations.location appears "like" in the logs in a specified period (using showlog.date)

I thought this query would tell me that, but the outcome must be wrong as I have about 8M records in my table and the sum of the results was 420M :-)
Code:
select configured_locations.location,count(0)
from configured_locations,showlog
where configured_locations.location like showlog.location || '%'
and showlog.date between '2013-02-01' and '2013-02-28'
and showlog.envir = 'prd'
group by configured_locations.location;
I was hoping this query would help me give the usage of each "location" for the environment "production" over the month februari... It doesn't... it does something else, but I'm not exactly sure what :-)

Probably not really necessary to tell, but we're talking a PostgreSQL 8.4 database.
 
Old 03-01-2013, 03:13 AM   #2
AnanthaP
Member
 
Registered: Jul 2004
Location: Chennai, India
Distribution: UBUNTU 5.10 since Jul-18,2006 on Intel 820 DC
Posts: 659

Rep: Reputation: 160Reputation: 160
I would put the showlog matching in a sub-select thus:
Quote:
Select location,count(0)
from configured_locations
where location in
(select location from showlog where date between '2013-02-01' and '2013-02-28'
and envir = 'prd'
)
order by location;
OK
 
Old 03-01-2013, 03:43 AM   #3
Ramurd
Member
 
Registered: Mar 2009
Location: Rotterdam, the Netherlands
Distribution: Slackwarelinux
Posts: 555

Original Poster
Rep: Reputation: 75
Thanks for the quick reply; Had to alter it a bit, as the query wanted a group-by... But this query does not give what I expected; Maybe I ought to clarify a bit better:

Given these configured_locations.location:
/site1/service1
/site2/service1
/site3/service2

And these showlog.locations (where envir is not that relevant)
/site1/service1?status
/site1/service1/page1.html
/site2/service1/service
/site3/service2/service4
/site3/service2/service
/site3/service2?status

I'd like to do the query that will give these results:

/site1/service1 2
/site2/service1 1
/site3/service2 3


So how often (within a given period) each configured_location "appears" as a substring of the location string in showlog

The query you gave does some sort of counting of how often a location appears in configured_locations where they appear in the log; but the count is on the amount of hits in the configured_locations table.

Last edited by Ramurd; 03-01-2013 at 03:47 AM.
 
Old 03-01-2013, 04:49 AM   #4
Ramurd
Member
 
Registered: Mar 2009
Location: Rotterdam, the Netherlands
Distribution: Slackwarelinux
Posts: 555

Original Poster
Rep: Reputation: 75
I think I may have the correct query; not sure if this is "it" though:

Code:
Select configured_locations.location,count(0)
from showlog, configured_locations
where showlog.location like configured_locations.location || '%'
and showlog.date between '2013-02-01' and '2013-02-28'
and showlog.envir = 'prd'
group by configured_locations.location
order by location;
 
Old 03-01-2013, 05:13 AM   #5
Ramurd
Member
 
Registered: Mar 2009
Location: Rotterdam, the Netherlands
Distribution: Slackwarelinux
Posts: 555

Original Poster
Rep: Reputation: 75
Got close; have yet to determine a way to fix a few things

The query I came up with is now as follows:
Code:
Select configured_locations.location,count(0)
from showlog, configured_locations
where showlog.location like configured_locations.location || '%'
and showlog.date between '2013-02-01' and '2013-02-28'
and showlog.envir = 'prd'
and configured_locations.proxyid in 
(
	select proxyid 
	from environmental_proxies 
	where environment_id in 
	(
		select id from environments where environment_short_name='prd'
	)
)
group by configured_locations.location
order by location;
However, it logically is not correct (yet) as
/site1/service1
/site1/service12

would both match the like 'site1/service1%', besides: having N proxies in the given environment, multiplies the outcome with factor N; at least: that's what the data looks like right now.
Gonna think about how to overcome these things ;-)
 
  


Reply

Tags
sql


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
[SOLVED] SQL - get data filtered by conditions in that same table Ramurd Programming 4 01-08-2013 06:20 AM
LXer: ITTIA DB SQL Data Distribution Bridges QNX(R) with SQL Server LXer Syndicated Linux News 0 03-14-2012 01:21 AM
Recovering SQL data clamtwa Linux - Newbie 1 01-18-2010 02:19 AM
how to dump an sql table only data nephish Linux - Server 2 12-27-2007 02:55 PM
put_user doesn't pass the correct data rojak Linux - General 0 08-15-2006 05:03 AM


All times are GMT -5. The time now is 09:07 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration