LinuxQuestions.org
Review your favorite Linux distribution.
Home Forums Tutorials Articles Register
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices


Reply
  Search this Thread
Old 05-29-2007, 11:14 AM   #1
hedpe
Member
 
Registered: Jan 2005
Location: Boston, MA
Distribution: Debian
Posts: 380

Rep: Reputation: 30
help optimizing this SQL statement


Hi all,

I am looking for suggestions on optimizing this SQL statement:
Code:
    SELECT ip,sum(packets)
    FROM(
      (SELECT dst_ip AS ip,sum(dst_packets) AS packets 
              FROM flows
              WHERE interval='2005-02-01 00:00:00'
              GROUP BY dst_ip) 
      UNION ALL
      (SELECT src_ip AS ip,sum(src_packets) AS packets 
              FROM flows
              WHERE interval='2005-02-01 00:00:00'
              GROUP BY src_ip) )
    AS stats
    GROUP BY stats.ip
    HAVING sum(packets)>0
    ORDER BY sum(packets) DESC
There must be some way to combine the two subqueries in to a single query.

Here is the basic gist of what is happening...

I have flow level records, where I am interested in finding the number of packets sent by IP addresses, and the format is:
Code:
src_ip   dst_ip   src_packets   dst_packets
500        600        1                      2
700        500        1                      2
Such that src_ip sent src_packets and dst_ip sent dst_packets. Therefore, the result of my current query is:
Code:
500 3
600 2
700 1
The first subquery adds adds up the packet counts for the destination IPs, and the second subquery adds up the packet counts for the source IPs. Then the top level query sums the union.

Theres got to be a way to optimize this to perform it in one pass through the data, since it is hundreds of thousands of rows it does make a difference.

I'd greatly appreciate any suggestions

Thanks!
George

Last edited by hedpe; 05-29-2007 at 11:15 AM.
 
Old 05-30-2007, 07:06 AM   #2
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,359

Rep: Reputation: 2751Reputation: 2751Reputation: 2751Reputation: 2751Reputation: 2751Reputation: 2751Reputation: 2751Reputation: 2751Reputation: 2751Reputation: 2751Reputation: 2751
I'm not entirely clear what you want (esp as you need a table name after 'FROM'),
but it sound's like you want sums of pkts grouped & ordered by IP addresses (assuming those are supposed to be ip addresses...)

SELECT src_ip, dst_ip, sum(src_kts), sum(dst_pkts)
FROM flows
GROUP BY src_ip, dst_ip
ORDER BY src_ip, dst_ip

IME, by the time the SQL engine has done GROUP BY, it ends up ordered the same way unless you specify otherwise, so ORDER BY is redundant here...

HTH
 
  


Reply



Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
SQL statement to get the last row in a table oulevon Programming 11 04-03-2009 08:30 PM
SQL - getting the primary key autogenerated of an INSERT statement vharishankar Programming 3 12-05-2005 02:47 AM
help with simple sql statement sekelsenmat Programming 6 08-06-2005 12:01 PM
mysql sql statement help mrtwice Programming 4 12-02-2003 10:04 AM
SQL statement glj Programming 1 10-12-2001 09:29 AM

LinuxQuestions.org > Forums > Non-*NIX Forums > Programming

All times are GMT -5. The time now is 11:50 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration