LinuxQuestions.org
Programming forum: This forum is for all programming questions. The question does not have to be directly related to Linux and any language is fair game.

Old 10-10-2006, 10:35 PM   #1
vargadanis
Member
 
Registered: Sep 2006
Posts: 248

Rep: Reputation: 30
Search in vast MySQL database - how?


Hi,

This is my third thread today. You guys are very helpful, better than paid support. I am working on a huge website now: a civilian portal, but it is going to get over 5 million new entries in a year, and I have to create a search engine for it. I was going to build a full-indexing service (I mean the keywords are recorded to a table and the search looks for data in that table). I thought it would be better to ask you first what techniques you would use to optimize the performance of the search engine. I think this solution can work, but it requires quite a lot of system resources. (They gave me a server with two Opteron 244 processors and 8GB of RAM, and I cannot use anything other than MySQL, since it is performance optimized. I hope it is going to be enough, because my employers expect millions of requests a day.)
And one more thing: I plan to use ADODB, as it supports caching. Do you think that can improve performance too?
 
Old 10-11-2006, 10:07 PM   #2
graemef
Senior Member
 
Registered: Nov 2005
Location: Hanoi
Distribution: Fedora 13, Ubuntu 10.04
Posts: 2,379

Rep: Reputation: 148
Well, my initial thought is that since ADODB is an abstraction layer, it will add some overhead. If pure speed is important to you and you will not be changing your database, then there is probably no need for it.
 
Old 10-11-2006, 10:12 PM   #3
graemef
Senior Member
 
Registered: Nov 2005
Location: Hanoi
Distribution: Fedora 13, Ubuntu 10.04
Posts: 2,379

Rep: Reputation: 148
How are you doing your indexing? If you have a couple of tables, one of keywords and a second acting as a resolver between the keywords and the documents, then I'd say that is the right way to do it. If, on the other hand, you are indexing the documents and searching them with SQL's LIKE '%...%' pattern matching, then no, that will very quickly grind the server to a halt.
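As a rough sketch of the keyword/resolver design described above, here is a minimal version using Python's built-in sqlite3 in place of MySQL (all table and column names are hypothetical, chosen for illustration only):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE keywords  (word_id INTEGER PRIMARY KEY, word TEXT UNIQUE);
    -- resolver table: which keyword occurs in which document
    CREATE TABLE doc_keywords (word_id INTEGER, doc_id INTEGER,
                               PRIMARY KEY (word_id, doc_id));
""")

def index_document(doc_id: int, title: str, words: list) -> None:
    """Record a document and link it to each of its keywords."""
    con.execute("INSERT INTO documents VALUES (?, ?)", (doc_id, title))
    for w in words:
        con.execute("INSERT OR IGNORE INTO keywords (word) VALUES (?)", (w,))
        (word_id,) = con.execute(
            "SELECT word_id FROM keywords WHERE word = ?", (w,)).fetchone()
        con.execute("INSERT OR IGNORE INTO doc_keywords VALUES (?, ?)",
                    (word_id, doc_id))

def search(word: str) -> list:
    # one indexed lookup plus joins, instead of LIKE '%word%' over every row
    rows = con.execute("""
        SELECT d.title FROM keywords k
        JOIN doc_keywords dk ON dk.word_id = k.word_id
        JOIN documents d     ON d.doc_id   = dk.doc_id
        WHERE k.word = ?""", (word,)).fetchall()
    return [title for (title,) in rows]

index_document(1, "Intro to Linux", ["linux", "intro"])
index_document(2, "MySQL tuning", ["mysql", "tuning", "linux"])
print(search("linux"))  # titles of both documents that mention "linux"
```

The point of the resolver (join) table is that the search touches only the keyword index, never the document text itself.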
 
Old 10-12-2006, 05:06 AM   #4
Guttorm
Senior Member
 
Registered: Dec 2003
Location: Trondheim, Norway
Distribution: Debian and Ubuntu
Posts: 1,158

Rep: Reputation: 247
Hi

Since you said it's a website, another option would be to index the website itself rather than the MySQL data. I've tried using htdig and found it's fast, even on quite a lot of data. It needs some configuration, but it's a lot easier than writing your own search engine. Indexing with htdig requires quite a lot of disk space for the indexes, but disks are cheap.

If that is not an option, the basic approach would be to make a table of all words, then a join table with IDs of the words and IDs of where they occur. Remember there aren't that many words in a vocabulary, and you can also "blacklist" a lot of words like "a", "it", "are" and so on.
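The blacklist idea amounts to filtering tokens before they ever reach the word table. A minimal sketch in Python (the stopword set and tokenizer rules here are illustrative, not a real stopword list):

```python
import re

# a tiny illustrative blacklist; real stopword lists run to a few hundred words
STOPWORDS = {"a", "an", "the", "it", "are", "is", "and", "of", "to", "in"}

def extract_keywords(text: str) -> set:
    """Lowercase, split on non-letters, and drop blacklisted or very short words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS and len(t) > 2}

print(extract_keywords("The server is grinding to a halt"))
# {'server', 'grinding', 'halt'} (set order may vary)
```

Filtering at index time keeps the keyword table small and stops the join table from filling up with rows for words that appear in nearly every document.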
 
Old 10-12-2006, 10:48 AM   #5
95se
Member
 
Registered: Apr 2002
Location: Windsor, ON, CA
Distribution: Ubuntu
Posts: 740

Rep: Reputation: 32
No point in reinventing the wheel: why not just use MySQL's built-in full-text capabilities? Full-text searches in MySQL are VERY fast. Unless I am missing something, I see this as a good solution.

Graemef, often DB abstraction layers have support for caching searches.
 
Old 10-12-2006, 10:03 PM   #6
graemef
Senior Member
 
Registered: Nov 2005
Location: Hanoi
Distribution: Fedora 13, Ubuntu 10.04
Posts: 2,379

Rep: Reputation: 148
Quote:
Originally Posted by 95se
No point in reinventing the wheel, why not just use MySQL's built-in full text capabilities? Fulltext searches in MySQL are VERY fast. Unless I am missing something, I see this as a good solution.

Graemef, often DB abstraction layers have support for caching searches.
Just some thoughts...

The fulltext search has some limitations (I forget exactly what), but I believe it will only work on one of the db engines (MyISAM).

A full-text search can be efficient if you are searching one field; if you need to search multiple fields, then a bespoke system (like the one I outlined) would be more efficient.

I would imagine that a cache is only useful if the database is on a different server, hence saving a network round trip. If it is on the same server, then both will require disk access.
 
Old 10-12-2006, 10:51 PM   #7
95se
Member
 
Registered: Apr 2002
Location: Windsor, ON, CA
Distribution: Ubuntu
Posts: 740

Rep: Reputation: 32
There are some limitations to using the full-text search, but they're not show stoppers. You can search multiple full-text indexed fields at once, but they must all be part of the same full-text index. If they are, then it's still fast.
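For illustration, SQLite's FTS5 extension (available in most Python builds) behaves much like a MySQL FULLTEXT index spanning several columns. This is an analogue, not MySQL itself; the comments show the equivalent MySQL syntax:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# FTS5 stands in here for MySQL's FULLTEXT index; in MySQL you would write:
#   CREATE FULLTEXT INDEX idx ON articles (title, body);
#   SELECT title FROM articles WHERE MATCH(title, body) AGAINST ('linux');
con.execute("CREATE VIRTUAL TABLE articles USING fts5(title, body)")
con.executemany("INSERT INTO articles VALUES (?, ?)", [
    ("MySQL tuning", "Indexes make lookups fast."),
    ("Linux on Opterons", "Dual CPUs and lots of RAM."),
])

# a single query searches both indexed columns at once
rows = con.execute(
    "SELECT title FROM articles WHERE articles MATCH ?", ("linux",)).fetchall()
print([t for (t,) in rows])
```

The key constraint in both engines is the same: the set of columns you MATCH against must correspond to one full-text index covering exactly those columns.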
 
Old 10-13-2006, 12:33 AM   #8
vargadanis
Member
 
Registered: Sep 2006
Posts: 248

Original Poster
Rep: Reputation: 30
Wow... OK... I am not that good at MySQL, so I need some further help. What is all this normalization stuff? I guess I need to learn about it before I design the DB.
I have used full-text search on smaller DBs, but I do not think it was quick enough. This DB is going to hold GBs of data after a while. Do you still think it can be fast enough?
Quote:
If you have a couple of tables one of keywords, a second acting as a resolver between the keywords and the documents then I'd say that is the right way to do it.
Graemef, can you give me some details or an example of your solution?
 
Old 10-13-2006, 01:30 PM   #9
95se
Member
 
Registered: Apr 2002
Location: Windsor, ON, CA
Distribution: Ubuntu
Posts: 740

Rep: Reputation: 32
OK, think of it like this: what you want to do is create separate table(s) for indexing. MySQL does the same thing w/ its fulltext indexing, only it will do it a lot faster than you can with some complicated set of SELECT statements. If MySQL's full-text search isn't fast enough w/ the amount of data you have, your solution won't be any better. Check out the user comments on the MySQL 5.0 Fulltext Documentation, http://dev.mysql.com/doc/refman/5.0/...xt-search.html . Some people are happy w/ its performance at the sizes you're talking about, some are not, and many have tips on how to get the most out of it. I would consider doing two things. 1) Test it out with a LOT of test data (real-world if you have it), and see if its performance meets your needs. 2) Have a backup plan handy should MySQL suddenly not suit your needs anymore. This could mean many things, like culling older rows, throwing more hardware at it, or switching to a new DB. From what I see on the net, 2x Opteron 244 procs & 8GB of RAM should handle your data, but there is no way I could know that!
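Suggestion 1 can be prototyped cheaply. A sketch using sqlite3 as a stand-in, generating synthetic rows and timing an indexed lookup (the row count and schema are made up for illustration; scale them toward your real volumes and use real-world data where possible):

```python
import random
import sqlite3
import string
import time

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, keyword TEXT)")
con.execute("CREATE INDEX idx_keyword ON entries (keyword)")

# synthetic stand-in for real-world data: 100,000 random 6-letter keywords
random.seed(1)
words = ["".join(random.choices(string.ascii_lowercase, k=6))
         for _ in range(100_000)]
con.executemany("INSERT INTO entries (keyword) VALUES (?)",
                ((w,) for w in words))

# time one indexed lookup against the populated table
target = words[12345]
t0 = time.perf_counter()
hits = con.execute("SELECT id FROM entries WHERE keyword = ?",
                   (target,)).fetchall()
elapsed = time.perf_counter() - t0
print(f"{len(hits)} hit(s) in {elapsed * 1000:.2f} ms")
```

The same harness, pointed at a real MySQL instance and grown to millions of rows, gives a rough answer to "is it fast enough?" before any production code is written.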

Last edited by 95se; 10-13-2006 at 01:33 PM.
 
  

