MYSQL: SQL query takes 10 minutes to execute. Need help optimizing.
Using MySQL version 5.0.67 running on a 2.40GHz dual core Intel CPU.
I have the following table: Code:
CREATE TABLE `files` ( md5 is the md5sum of the files. Now, some of the files are identical. Code:
SELECT count(*), md5 Say I want to retrieve the name of all files which has more than 3 duplicate md5 Code:
SELECT name, date, md5 This is obviously a bad query. Reading the entire table takes 0.10 sec. In Sybase this is rather easy because I can avoid the subquery, but as far as I know MySQL would not work this way. The same applies to MSSQL. Any suggestion on how to improve this query? |
Quote:
as far as my knowlegde is concerned, at teh lower level the query gets optimized and what i can say in your case is that this optimization is the default optimization available, so it might help a bit if you can provide your own procedures thereby asking the underlying "query manager" to do a customised optimization. do think in this direction and let us know your views/issues etc. thank you. lets see what other ppl/experts in this field have to say. |
Hello, this is odd but it would make sense that it is executing that same sub-query 28,573 times. Here is my reasoning:
0.02s * 28,573 = 571.46s 570.46s * (1m/60s) = 9.52m Note: 's' is seconds and 'm' is minutes This is odd because normally databases will see that a sub-query is not dependent on any value in the outer query and will internally optimize the query so that it only runs the sub-query once. The only thing I can think of is that MySQL is seeing the same table and column names in the outer and inner queries and thinks it must run the inner query ever time because it may depend on the outer one. To test this, give specific names to the inner and outer query tables like so: Code:
SELECT outer.name, outer.date, outer.md5 Otherwise, we will have to play around with indexes and/or possibly find a better way to write the query. What version of MySQL are you using? |
Quote:
Code:
mysql> SELECT outer.name, outer.date, outer.md5 Guess I have to check the manual. :) Quote:
|
Quote:
Code:
SELECT t1.name, t1.date, t1.md5 |
Quote:
That is, the syntax is ok, but still it takes long time to execute. I have waited a few minutes now. Still no results. Oh, and the mysqld process is in 100% CPU utilization. It was a nice try and a plausible explanation. I still think you're explanation is close to true, but insinuating a MySQL bug is probably not wise... :scratch: Is there an alternative query to get the same results? I was pondering about a JOIN but couldn't figure out how to do it. Edit: Here is the result: 334 rows in set (10 min 4.24 sec) |
Quote:
|
Problem solved!
Got it! Thanks to all for guiding me to the answer. I've never come up with this alone.
Code:
SELECT files.name, files.date, files.md5 FROM files It's really cooking with gas! (It's all electrical of course, but then it's just a saying right? http://en.wiktionary.org/wiki/cooking_with_gas) :D The clue was using a JOIN in the query. |
All times are GMT -5. The time now is 12:00 AM. |