LinuxQuestions.org
Why Is Anyone Still Using Google search?

Posted 04-01-2006 at 09:32 PM by MOCKBA

Opinion: How many times must Google be shown to be totally irrelevant, inaccurate, and ad-driven when there is MSN search?

Nothing could be simpler than writing a search engine. Do you think so? You may be right. Indexing Internet content, although it requires time and CPU resources, is quite a straightforward task. So where is the problem?

Keep the index up to date
Keep the index comprehensive
Build categorization from content
Categorize the search query
Sort and filter results
Give access to the entire index
Incremental and activity indexing
What is behind each problem, and how does it show up in search results? OK, let's look:

Keep the index up to date
If the index is not updated periodically, search results can contain broken links. Looking in the search cache can help, but the link may be broken precisely because the previous version of the page had incomplete or wrong information. The cache also does not let you drill down for more information when a result looks relevant. Another kind of obsolete link is one that is still active but whose content no longer matches the search query.
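
The two failure cases above (a broken link, and a live link whose content no longer matches the query) can be told apart with a small check. This is only a sketch: `classify_entry` and the in-memory `pages` dictionary are made up for illustration, and a real check would fetch pages over HTTP instead.

```python
from typing import Callable, Optional

def classify_entry(query: str, fetch: Callable[[str], Optional[str]], url: str) -> str:
    """Return 'broken', 'stale', or 'ok' for one search result URL."""
    text = fetch(url)
    if text is None:
        return "broken"  # the link no longer resolves
    page_words = set(text.lower().split())
    if not all(word in page_words for word in query.lower().split()):
        return "stale"   # the link is alive, but the content no longer matches
    return "ok"

# Hypothetical in-memory "web" standing in for real HTTP fetches.
pages = {
    "http://example.com/a": "linux kernel release notes",
    "http://example.com/b": "cooking recipes",
}
results = ["http://example.com/a", "http://example.com/b", "http://example.com/gone"]
labels = {url: classify_entry("linux kernel", pages.get, url) for url in results}
print(labels)
```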

Keep the index comprehensive
Although web crawling is a well-known task, many search engines use very simplistic crawlers that cannot handle links computed in JavaScript or links reachable only through navigation trees. Since many links in a navigation tree just change the tree state without leading to new content, many crawlers give up after a certain number of attempts. Not to mention reaching protected content.
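
One way a crawler could avoid giving up on navigation trees is to fingerprint page content, so that a tree-state change is not counted as a new page but its links are still followed. The sketch below assumes a hypothetical `get_page(url)` that returns the page content and its links; the toy `site` dictionary stands in for a real site.

```python
import hashlib
from collections import deque

def crawl(start, get_page, max_pages=100):
    """Breadth-first crawl that indexes each distinct content only once."""
    seen_urls = {start}
    seen_content = set()
    indexed = []
    queue = deque([start])
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        fetched += 1
        content, links = get_page(url)
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest not in seen_content:
            seen_content.add(digest)
            indexed.append(url)
        # Follow links even from duplicate pages: an expanded tree node
        # may expose links that the collapsed state did not show.
        for link in links:
            if link not in seen_urls:
                seen_urls.add(link)
                queue.append(link)
    return indexed

# Toy site: the two "/tree" URLs only change the tree state.
site = {
    "/": ("home", ["/tree?open=1", "/tree?open=2"]),
    "/tree?open=1": ("home", ["/docs/install"]),
    "/tree?open=2": ("home", ["/docs/faq"]),
    "/docs/install": ("install guide", []),
    "/docs/faq": ("faq", ["/"]),
}
print(crawl("/", lambda url: site[url]))
```

The `max_pages` limit is still there, but it caps the total work rather than the number of attempts on one tree.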

Build categorization from content
Many search engines do not assign any categorization information to indexed web content, so all pages look equal. This makes search results mostly irrelevant to the search query.

Categorize the search query
Categorizing a search query makes sense only if the index is categorized too. Matching query categories to categories in the index can provide much more relevant results.
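
A minimal sketch of the matching idea, assuming a keyword-based categorizer. The `CATEGORY_KEYWORDS` table and the sample index are invented for illustration; a real engine would assign categories at indexing time and with something better than keyword lookup.

```python
# Hypothetical category table: category name -> trigger words.
CATEGORY_KEYWORDS = {
    "software": {"install", "kernel", "driver", "download"},
    "cooking": {"recipe", "oven", "bake"},
}

def categorize(text):
    """Return the set of categories whose keywords appear in the text."""
    words = set(text.lower().split())
    return {cat for cat, keywords in CATEGORY_KEYWORDS.items() if words & keywords}

def search(query, index):
    """index maps url -> page text. Return URLs that share a category
    with the query, largest category overlap first."""
    query_cats = categorize(query)
    scored = []
    for url, text in index.items():
        overlap = len(query_cats & categorize(text))
        if overlap:
            scored.append((overlap, url))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored]

index = {
    "a.example/kernel": "how to install a kernel driver",
    "b.example/bread": "bake bread in the oven",
    "c.example/apple": "apple pie recipe and apple driver download",
}
print(search("kernel driver", index))
```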

Sort and filter results
Search results should be sorted by some criterion, such as last update time or relevance under the categorization used. Filtering can help eliminate duplicate or similar results coming from the same source.
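
As a sketch, sorting by last update and keeping only a limited number of entries per host could look like this. The result format (a dictionary with `url` and `updated`) and the `per_host` parameter are assumptions made for the example.

```python
from datetime import date
from urllib.parse import urlparse

def sort_and_filter(results, per_host=1):
    """Newest first, keeping at most per_host entries from each host."""
    ordered = sorted(results, key=lambda r: r["updated"], reverse=True)
    kept = []
    counts = {}
    for result in ordered:
        host = urlparse(result["url"]).netloc
        if counts.get(host, 0) < per_host:
            counts[host] = counts.get(host, 0) + 1
            kept.append(result)
    return kept

results = [
    {"url": "http://a.example/1", "updated": date(2006, 3, 1)},
    {"url": "http://a.example/2", "updated": date(2006, 3, 20)},
    {"url": "http://b.example/1", "updated": date(2006, 3, 10)},
]
print([r["url"] for r in sort_and_filter(results)])
```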

Give access to the entire index
Although the actual number of entries matching a search query can be large, most search engines restrict the temporarily built result set to 1000 entries or fewer. This makes it very likely that the final result will not include the search goal, especially if the problems above exist in the search engine.

Incremental and activity indexing
Indexing can be incremental, for example recording only the changes to updated pages. An activity index is also helpful for finding recently changed information relevant to a search query.
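
A rough sketch of both ideas together: re-index a page only when its content hash changes, and log each change so recent activity can be queried. The `IncrementalIndex` class and its method names are hypothetical, and timestamps are plain numbers to keep the example short.

```python
import hashlib

class IncrementalIndex:
    def __init__(self):
        self.digests = {}   # url -> hash of the last indexed content
        self.activity = []  # (timestamp, url) log of changes: the activity index

    def update(self, url, content, timestamp):
        """Index the page only if it changed. Return True if it was (re)indexed."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if self.digests.get(url) == digest:
            return False  # unchanged, nothing to do
        self.digests[url] = digest
        self.activity.append((timestamp, url))
        return True

    def changed_since(self, timestamp):
        """URLs that were added or changed at or after the given time."""
        return sorted({url for ts, url in self.activity if ts >= timestamp})

idx = IncrementalIndex()
idx.update("/news", "version 1", 1)
idx.update("/about", "about us", 1)
idx.update("/news", "version 1", 2)   # same content, skipped
idx.update("/news", "version 2", 3)   # changed, re-indexed
print(idx.changed_since(2))
```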

How good are Google and other search engines at resolving the problems above?
To answer this question, I wrote a small tool that helps analyze the results provided by a search engine in depth. The following scoring system for search engines was introduced:

Relevant, 0 to 5: one or more search targets are present in the result. 0 - no search target; 1 - in the last 20% of entries; 2 - in the last 50% of entries; 3 - in the first 20% of entries; 4 - on the first two pages; 5 - more than one entry on the first page.
Relevant after tuning: one or more search targets are present after one round of query tuning. Scored the same way as Relevant, counting as 5 if tuning is not required.
Actuality, 0 to 3: 0 - more than 20% broken links; 1 - 10% broken links; 2 - less than 5% broken links; 3 - no broken links.
Relevant actuality, 0 to 2: 0 - 50% or more of search targets broken; 1 - 20% to 50% of search targets broken; 2 - less than 20% of search targets broken.
Actuality, 0 to 2: 0 - more than 50% of result entries do not include all query words; 1 - between 20% and 50%; 2 - less than 20%.
All of the above tests are performed for 10 different queries in the following categories:
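
The last two scores use the same 20% and 50% bands, so they can be sketched with one helper. This is not the code of the tool itself, only an illustration: the function names are made up, the inputs are assumed to be non-empty, and exactly 50% is assumed to score 0 in both cases.

```python
def band_score(fraction_bad):
    """0 for 50% or more, 1 for 20% up to 50%, 2 for less than 20%."""
    if fraction_bad >= 0.5:
        return 0
    if fraction_bad >= 0.2:
        return 1
    return 2

def relevant_actuality(target_is_broken):
    """target_is_broken: one bool per search target, True if its link is broken."""
    return band_score(sum(target_is_broken) / len(target_is_broken))

def query_word_score(query, entries):
    """entries: text of each result entry. Scores how many entries miss query words."""
    words = query.lower().split()
    missing = sum(
        1 for text in entries
        if not all(word in text.lower().split() for word in words)
    )
    return band_score(missing / len(entries))

entries = [
    "linux kernel news", "kernel panic", "linux kernel docs",
    "linux kernel faq", "the linux kernel", "linux kernel list",
]
print(relevant_actuality([False, False, True, False]))  # 1 of 4 targets broken
print(query_word_score("linux kernel", entries))        # 1 of 6 entries incomplete
```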

Single-word product name
Single-word product name and 3-word product description
3-word product description
5-word problem description
A well-known person's name
A not well-known person's name
20 words on a page accessible through tree navigation
20 words on a page accessible through a JavaScript-constructed link
10 words on a page updated 10 days ago
10 words on a page updated 30 days ago
Download finesearch and test for yourself to see how bad your favorite search engine, such as Google, is. Compare it to MSN search and you will be amazed by the result. Do not cry about time wasted with Google; switch to MSN now and be happy.

Posted in Uncategorized