LinuxQuestions.org
Visit the LQ Articles and Editorials section
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices

Reply
 
Search this Thread
Old 12-31-2012, 10:20 AM   #1
Ramurd
Member
 
Registered: Mar 2009
Location: Rotterdam, the Netherlands
Distribution: Slackwarelinux
Posts: 555

Rep: Reputation: 75
Question SQL - get data filtered by conditions in that same table


Hi folks,

I hope you'll be able to help me solve this issue, that I'm trying to solve:
We have a MySQL (don't ask) database with a bunch of tables. One of these tables is growing a bit large, and I wish to put all records older than 3 months into another table (I'm going to use select into backuptable once I'm sure I have the correct data here)

The table that is growing too large is permtable, which holds quite some fields for each record; Among which are start_time (which holds the date/time when the record was inserted), event (which is 'C' for each record in a "good" situation, and another value in any other situation),group_id which combines the records that belong together and should be seen as a whole and repair_id which is set when an attempt was made to repair the non-good situation.

I wish to extract from this table the dataset that covers these constraints:
- all records older than 3 months
- where all records with the same group_id are C, or in the case they're not C have repair_id set.

The query I thought of was this one:
Code:
select * from permtable p1 -- get all
where p1.start_time < DATE(NOW() - INTERVAL 3 MONTH) -- start time older than 3 months
and not exists (select * from permtable p2 where p2.event <> 'C' and p2.repair_id is null and p2.group_id = p1.group_id) -- and are not C AND have no repair_id
but it does not work nice with that group_id; I do keep getting records where there is one record with a non-C event and a null-value on repair_id. Where am I going wrong?

I also had another query, but that one fails due to deadlocking:
Code:
select * from permtable
where start_time < DATE(NOW() - INTERVAL 3 MONTH)
and group_id not in (select group_id from permtable where event <> 'C' and repair_id is null)
Also, even if it would not dead-lock, the full tablescans make this query far from optimal, but it does paint the logic I was trying to implement.

Anyone who can help out pointing out what I'm doing wrong?
 
Old 12-31-2012, 11:33 AM   #2
Pearlseattle
Member
 
Registered: Aug 2007
Location: Switzerland
Distribution: Gentoo
Posts: 716

Rep: Reputation: 81
Hi
Sure that #2...
Code:
select * from permtable
where start_time < DATE(NOW() - INTERVAL 3 MONTH)
and group_id not in (select group_id from permtable where event <> 'C' and repair_id is null)
... is really deadlocking? Don't have much experience with MySQL (usually use Oracle) but it looks perfectly normal to me.

As an alternative, did you try to use an inline view (hopefully supported in MySQL) for #2 like...
Code:
select * from
permtable p1,
(select group_id from permtable where event <> 'C' and repair_id is null) p2
where p1.start_time < DATE(NOW() - INTERVAL 3 MONTH)
and p1.group_id <> p2.group_id
Btw., should the date range be valid as well for the subquery/inline view?
 
Old 01-03-2013, 04:33 AM   #3
Ramurd
Member
 
Registered: Mar 2009
Location: Rotterdam, the Netherlands
Distribution: Slackwarelinux
Posts: 555

Original Poster
Rep: Reputation: 75
Hi, thanks for the quick response! (And a happy 2013! :-))

The query might not be deadlocking, after all; it's just too slow and doing too many full table scans to be manageable. Haven't tried inline views (yet), it is actually one of the options I am looking into (not sure whether MySQL supports them or not, and what the implications might be... who knows)

I'm currently trying out your second query. For figures: the table holds about 10M records, the subset < DATE(NOW() - INTERVAL 3 MONTH) yields 800K records; Just an indication about how tablescans will affect performance. I think the first query would cause 10M * 10M = lots of tablescans... it might be a tad too much, even 800K * 10M = still too many tablescans (sorry, I'm losing my head on the numbers, I don't even know how they'd be called... :-o

Anyway, I realize I forgot to mention this, while it is a relevant part of the question.

Also I was thinking if this would help me solve the riddle:
Code:
create temporary table badgroup
select group_id from permtable where event <> 'C' and repair_id is null;

select * from permtable
where start_time < DATE(NOW() - INTERVAL 3 MONTH)
and group_id not in (select group_id from badgroup);

drop temporary table badgroup; -- should not be necessary, but oh well.
badgroup table holds, with this query 180k records, but should reduce the weight and amount of full tablescans...
 
Old 01-04-2013, 05:55 AM   #4
Ramurd
Member
 
Registered: Mar 2009
Location: Rotterdam, the Netherlands
Distribution: Slackwarelinux
Posts: 555

Original Poster
Rep: Reputation: 75
I ran this query:
Code:
select * from
permtable p1,
(select group_id from permtable where event <> 'C' and repair_id is null) p2
where p1.start_time < DATE(NOW() - INTERVAL 3 MONTH)
and p1.group_id <> p2.group_id
Unfortunately it does not do what I expected and hoped for; It returned 178 billion records (when doing a select count(*)) and ran for 22 hours.
Since the table holds plus or minus 10M records the answer cannot be correct. This seems to do a full outer join, rather than an inline view.

Thinking whether or not a function would solve the problem as well (rather than the temp table):
something along these lines:

Code:
create function is_completed(f_group_id varchar(36))
returns boolean deterministic
begin
  declare retval boolean false;
  select case when count(0) = 0 then true else false end into retval
  from permtable
  where group_id = f_group_id
  and event <> 'C'
  and repair_id is null;

  return(retval);
end
and then
Code:
select * from permtable
where is_completed(group_id) = true
having min(start_time) < date(now() - interval 3 month);

Last edited by Ramurd; 01-04-2013 at 05:56 AM.
 
Old 01-08-2013, 06:20 AM   #5
Ramurd
Member
 
Registered: Mar 2009
Location: Rotterdam, the Netherlands
Distribution: Slackwarelinux
Posts: 555

Original Poster
Rep: Reputation: 75
The function, along with adding an index made the query run fast enough and yield the correct results.
So marking this thread as solved.
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
How to make an SQL table reference another table? Dornith Programming 4 10-29-2012 02:37 PM
LXer: ITTIA DB SQL Data Distribution Bridges QNX(R) with SQL Server LXer Syndicated Linux News 0 03-14-2012 01:21 AM
how to dump an sql table only data nephish Linux - Server 2 12-27-2007 02:55 PM
mySQL selecting from one table with conditions from another Tischbein Programming 5 10-23-2007 08:19 PM
SQL SELECT Group by 2 conditions MicahCarrick Programming 1 01-30-2007 11:02 PM


All times are GMT -5. The time now is 12:41 AM.

Main Menu
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration