LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Programming (https://www.linuxquestions.org/questions/programming-9/)
-   -   SQL - get data filtered by conditions in that same table (https://www.linuxquestions.org/questions/programming-9/sql-get-data-filtered-by-conditions-in-that-same-table-4175443582/)

Ramurd 12-31-2012 09:20 AM

SQL - get data filtered by conditions in that same table
 
Hi folks,

I hope you'll be able to help me solve this issue, that I'm trying to solve:
We have a MySQL (don't ask) database with a bunch of tables. One of these tables is growing a bit large, and I wish to put all records older than 3 months into another table (I'm going to use select into backuptable once I'm sure I have the correct data here)

The table that is growing too large is permtable, which holds quite some fields for each record; Among which are start_time (which holds the date/time when the record was inserted), event (which is 'C' for each record in a "good" situation, and another value in any other situation),group_id which combines the records that belong together and should be seen as a whole and repair_id which is set when an attempt was made to repair the non-good situation.

I wish to extract from this table the dataset that covers these constraints:
- all records older than 3 months
- where all records with the same group_id are C, or in the case they're not C have repair_id set.

The query I thought of was this one:
Code:

select * from permtable p1 -- get all
where p1.start_time < DATE(NOW() - INTERVAL 3 MONTH) -- start time older than 3 months
and not exists (select * from permtable p2 where p2.event <> 'C' and p2.repair_id is null and p2.group_id = p1.group_id) -- and are not C AND have no repair_id

but it does not work nice with that group_id; I do keep getting records where there is one record with a non-C event and a null-value on repair_id. Where am I going wrong?

I also had another query, but that one fails due to deadlocking:
Code:

select * from permtable
where start_time < DATE(NOW() - INTERVAL 3 MONTH)
and group_id not in (select group_id from permtable where event <> 'C' and repair_id is null)

Also, even if it would not dead-lock, the full tablescans make this query far from optimal, but it does paint the logic I was trying to implement.

Anyone who can help out pointing out what I'm doing wrong?

Pearlseattle 12-31-2012 10:33 AM

Hi
Sure that #2...
Code:

select * from permtable
where start_time < DATE(NOW() - INTERVAL 3 MONTH)
and group_id not in (select group_id from permtable where event <> 'C' and repair_id is null)

... is really deadlocking? Don't have much experience with MySQL (usually use Oracle) but it looks perfectly normal to me.

As an alternative, did you try to use an inline view (hopefully supported in MySQL) for #2 like...
Code:

select * from
permtable p1,
(select group_id from permtable where event <> 'C' and repair_id is null) p2
where p1.start_time < DATE(NOW() - INTERVAL 3 MONTH)
and p1.group_id <> p2.group_id

Btw., should the date range be valid as well for the subquery/inline view?

Ramurd 01-03-2013 03:33 AM

Hi, thanks for the quick response! (And a happy 2013! :-))

The query might not be deadlocking, after all; it's just too slow and doing too many full table scans to be manageable. Haven't tried inline views (yet), it is actually one of the options I am looking into (not sure whether MySQL supports them or not, and what the implications might be... who knows)

I'm currently trying out your second query. For figures: the table holds about 10M records, the subset < DATE(NOW() - INTERVAL 3 MONTH) yields 800K records; Just an indication about how tablescans will affect performance. I think the first query would cause 10M * 10M = lots of tablescans... it might be a tad too much, even 800K * 10M = still too many tablescans (sorry, I'm losing my head on the numbers, I don't even know how they'd be called... :-o

Anyway, I realize I forgot to mention this, while it is a relevant part of the question.

Also I was thinking if this would help me solve the riddle:
Code:

create temporary table badgroup
select group_id from permtable where event <> 'C' and repair_id is null;

select * from permtable
where start_time < DATE(NOW() - INTERVAL 3 MONTH)
and group_id not in (select group_id from badgroup);

drop temporary table badgroup; -- should not be necessary, but oh well.

badgroup table holds, with this query 180k records, but should reduce the weight and amount of full tablescans...

Ramurd 01-04-2013 04:55 AM

I ran this query:
Code:

select * from
permtable p1,
(select group_id from permtable where event <> 'C' and repair_id is null) p2
where p1.start_time < DATE(NOW() - INTERVAL 3 MONTH)
and p1.group_id <> p2.group_id

Unfortunately it does not do what I expected and hoped for; It returned 178 billion records (when doing a select count(*)) and ran for 22 hours.
Since the table holds plus or minus 10M records the answer cannot be correct. This seems to do a full outer join, rather than an inline view.

Thinking whether or not a function would solve the problem as well (rather than the temp table):
something along these lines:

Code:

create function is_completed(f_group_id varchar(36))
returns boolean deterministic
begin
  declare retval boolean false;
  select case when count(0) = 0 then true else false end into retval
  from permtable
  where group_id = f_group_id
  and event <> 'C'
  and repair_id is null;

  return(retval);
end

and then
Code:

select * from permtable
where is_completed(group_id) = true
having min(start_time) < date(now() - interval 3 month);


Ramurd 01-08-2013 05:20 AM

The function, along with adding an index made the query run fast enough and yield the correct results.
So marking this thread as solved.


All times are GMT -5. The time now is 01:06 PM.