LinuxQuestions.org
General This forum is for non-technical general discussion which can include both Linux and non-Linux topics. Have fun!

03-28-2009, 04:34 PM   #1
entz
Member
 
Registered: Mar 2007
Location: Milky Way , Planet Earth!
Distribution: Opensuse
Posts: 453
Blog Entries: 3

Rep: Reputation: 40
Community Driven Search Engine - NiteCo


Hello All,

Here comes a grand surprise from NiteCo.

You've probably seen the chat facility or the Average Age of Internet Users survey.

But here comes something on a totally new level of sophistication.

As the title suggests, it's a community-driven search engine
that was released on the web just today.

Principle of operation:
You type your search keywords into the field, click search,
and get the most relevant results. Now here comes the interesting part:
each URL can be picked, which increases its search ranking
so that it has a higher probability of being shown at the top.
Simple, isn't it?

However, if you don't find what you want, or you have a URL to share,
you can enlist the URL.

Enlisting a URL doesn't just mean typing in a URL and that's it.
You also write a title and a short description and, most importantly, set the keywords that you think are relevant.
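
To make the enlist-and-pick principle concrete, here is a minimal in-memory sketch in Python. It is only an illustration, not NiteCo's actual code: the `Listing` fields, the `CommunityIndex` class, the example URLs and the rule that each pick adds one point to the ranking are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Listing:
    """One enlisted URL (hypothetical structure, not NiteCo's real schema)."""
    url: str
    title: str
    description: str
    keywords: set
    picks: int = 0  # assumed ranking signal: how often users picked this URL


@dataclass
class CommunityIndex:
    listings: dict = field(default_factory=dict)

    def enlist(self, url, title, description, keywords):
        """Add a URL together with its title, description and keywords."""
        self.listings[url] = Listing(url, title, description,
                                     {k.lower() for k in keywords})

    def pick(self, url):
        """Record that a user picked this URL, which raises its ranking."""
        self.listings[url].picks += 1

    def search(self, query):
        """Return URLs tagged with any query term, most-picked first."""
        terms = set(query.lower().split())
        hits = [item for item in self.listings.values() if item.keywords & terms]
        hits.sort(key=lambda item: item.picks, reverse=True)
        return [item.url for item in hits]


index = CommunityIndex()
index.enlist("http://www.linuxquestions.org", "LinuxQuestions.org",
             "Linux help forum", ["linux", "forum"])
index.enlist("http://xkcd.com", "xkcd", "Webcomic", ["comics", "linux"])
index.pick("http://xkcd.com")
print(index.search("linux"))  # the picked URL is listed first
```

In this sketch a newly enlisted URL starts with zero picks, so it has to earn its position from the bottom up.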

So enough talk for now... Go check it out, submit URLs or pick those from others.

http://www.niteco.com


regards

Last edited by entz; 03-28-2009 at 04:35 PM.
 
03-28-2009, 09:33 PM   #2
XavierP
Moderator
 
Registered: Nov 2002
Location: Kent, England
Distribution: Lubuntu
Posts: 19,176
Blog Entries: 4

Rep: Reputation: 430
So it's a backwards search engine? If the engine can't find what I want, I have to tell it where to look? Why wouldn't I, as a user and searcher, just hit an established engine which will give me results but won't ask me to provide them? I mean, it's a nice idea, but if I knew where to look, I wouldn't need a search engine. I suspect that you will get hit by spammers and the like who will just want to up their rankings for viagra and porn.
 
03-28-2009, 09:35 PM   #3
MS3FGX
Guru
 
Registered: Jan 2004
Location: NJ, USA
Distribution: Slackware, Debian
Posts: 5,852

Rep: Reputation: 351
Quote:
You've probably seen the chat facility or the Average Age of Internet Users survey.
No, can't say that I have...

I get the idea of what you are trying to do here, but 9 out of 10 terms I searched for had zero results. It seems that it would make this a little more usable if you seeded it with some initial results from Google or one of the other search engines. Still let people submit their own links, and vote up/down the results, but at least give a baseline to start with.

Even if you just had the results from the other engines on the bottom of the page with an "Add to NiteCo" link next to each one, that would at least give people a place to start.
 
03-29-2009, 10:34 AM   #4
entz
Member
 
Registered: Mar 2007
Location: Milky Way , Planet Earth!
Distribution: Opensuse
Posts: 453
Blog Entries: 3

Original Poster
Rep: Reputation: 40
Hello Xavier, how are you? I'm fine, btw.

Quote:
So it's a backwards search engine? If the engine can't find what I want, I have to tell it where to look? Why wouldn't I, as a user and searcher, just hit an established engine which will give me results but won't ask me to provide them?
I'm not quite sure what you mean by a 'backwards search engine'.
However, you made an interesting point and here is the answer to it:

Well, the deal is that when you enlist a URL you're not giving the engine
any answers as to where to look; instead you're sharing a useful site
that is relevant to a certain topic or keyword/group of keywords with other users.

Moreover, the engine works primarily by balancing between the enlisters
and the pickers.
The enlisters suggest URLs and give them titles, descriptions, etc.,

and the pickers search for content, then select what they find interesting
or relevant.

The thing is that anybody potentially has at least one other URL to enlist.
For example, you know about LQ, so you could suggest it (btw, I've already enlisted LQ under linux). I, on the other hand, can pick it and perhaps suggest something else, for example XKCD.com, that you might pick, and so on...

Regarding the dreaded fear of spam: well, many people have asked me similar
questions, for example what spam filters are in place, etc.

The antidote to the spam problem lies straightforwardly in the principle and philosophy behind the NiteCo search engine.

First of all,

Each URL has to go from the bottom up as it earns respect.

Clearly spam will not make it to the top under such conditions.

Second,

Concentrate on the elite chosen few that are 99% relevant to the subject,
not on 1 million URLs with 50% relevance or lower.


Consider this question: what's the chance that a googler will look beyond the first search page?
Quite low, isn't it? And the chance of visiting the 3rd page is even lower, and so on.
So from a 'linux' search on Google, which gives you over 500 million results, only a tiny fraction of less than 1% will actually be seen by 'linux' searchers; the rest goes down the toilet.

Quote:
I get the idea of what you are trying to do here, but 9 out of 10 terms I searched for had zero results. It seems that it would make this a little more usable if you seeded it with some initial results from Google or one of the other search engines
1 match in 10 attempts isn't bad for a start, don't you think?
May I remind you that we just started yesterday, so it may take some time, but I'm sure the coverage will increase exponentially.

What I think would work much better than seeding results from Google or another classical engine is if Jeremy would put an 'Enlist on NiteCo' button on each thread here.

I would be more than happy to code a full-blown API on my part.

Cheers
 
03-30-2009, 09:30 PM   #5
MS3FGX
Guru
 
Registered: Jan 2004
Location: NJ, USA
Distribution: Slackware, Debian
Posts: 5,852

Rep: Reputation: 351
Quote:
clearly spam will not make it to the top under such conditions
What protections do you have in place to stop links being added/voted from scripts? What would stop somebody from writing a script to add millions of spam links to your engine with popular keywords, and then systematically vote all of them up a few hundred times a minute?
 
03-31-2009, 05:33 PM   #6
entz
Member
 
Registered: Mar 2007
Location: Milky Way , Planet Earth!
Distribution: Opensuse
Posts: 453
Blog Entries: 3

Original Poster
Rep: Reputation: 40
Quote:
Originally Posted by MS3FGX
What protections do you have in place to stop links being added/voted from scripts?
Well, that seems to be a pretty popular question these days,
considering those so-called vote-bot attacks on YouTube, etc.

Anyways,

As for possible attacks of a similar nature on NiteCo, I can assure you that I've prepared very effective measures against them.
Unfortunately, I can't disclose any detailed information for security reasons.
However, what is already known and pretty much obvious to the casual observer is, among other things, the fact that carrying out such an attack would require registering a substantially large number of accounts in the first place.

And in that lies the first line of defense. One word: captcha.

And even for the worst-case scenario, I do have plans.

Unlike many other online systems both big and small, the NiteCo index
is completely revertible; that is, if the security layers were to be circumvented (which is unlikely, btw), all offensive and/or disruptive actions can be filtered and canceled out as if they had never taken place.

So there is a slight chance of bypassing the measures, but it's impossible
to go that far without being detected.

At that point they will get busted and penalized with an IP ban.
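
Without disclosing the real mechanism, the general idea of a revertible index can be sketched like this. This is only one way such a rollback could work and not a description of NiteCo's implementation: every pick is kept in an append-only log, and the rankings are recounted while skipping the accounts that were flagged. The names `Action` and `rebuild_rankings`, the account IDs and the URLs are hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class Action(NamedTuple):
    account: str  # who performed the pick (hypothetical account ID)
    url: str      # which URL was picked


def rebuild_rankings(log, banned=frozenset()):
    """Recount picks from the log, ignoring every action by a banned account."""
    return Counter(action.url for action in log if action.account not in banned)


log = [
    Action("alice", "http://www.linuxquestions.org"),
    Action("bot42", "http://spam.example"),
    Action("bot42", "http://spam.example"),
    Action("bob", "http://www.linuxquestions.org"),
]

print(rebuild_rankings(log))                    # still counts the spam picks
print(rebuild_rankings(log, banned={"bot42"}))  # as if bot42 had never acted
```

Because the log itself is never modified, the cleanup can be repeated whenever another offending account is identified.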

Hope that clarifies something.


Regards
 
03-31-2009, 06:06 PM   #7
easuter
Member
 
Registered: Dec 2005
Location: Portugal
Distribution: Slackware64 13.0, Slackware64 13.1
Posts: 538

Rep: Reputation: 62
Why have people inputting the information when spiders already collect information efficiently? And that's the whole point of having computers: they're supposed to do all the boring tedious work for us.

Quote:
Originally Posted by XavierP
I suspect that you will get hit by spammers and the like who will just want to up their rankings for viagra and porn.
Yeah, I'd imagine it would quickly degenerate into a spammer crapfest. It's bad enough that my email account is already plagued by a never-ending flow of cheap Rolexes and Megadik pill emails.

Last edited by easuter; 03-31-2009 at 06:09 PM. Reason: typo
 
04-01-2009, 12:35 PM   #8
entz
Member
 
Registered: Mar 2007
Location: Milky Way , Planet Earth!
Distribution: Opensuse
Posts: 453
Blog Entries: 3

Original Poster
Rep: Reputation: 40
Quote:
Originally Posted by easuter
Why have people inputting the information when spiders already collect information efficiently? And that's the whole point of having computers: they're supposed to do all the boring tedious work for us.
Well, I'm gonna break this down into parts.

First of all, computers might do things efficiently, but they don't do them
right unless you tell them to.

Seriously, anybody who believes that computers, which can only understand 1 and 0, are capable of understanding the needs of humans on their own without guidance is a person with extremely limited cognitive functions.

Computers are nowhere near grasping even the slightest clue
of what people want or even how to deal with them.

Computational intelligence is inferior to human thinking; there is no need for two to argue about it. So much for the answer to why we can't rely solely on computers to gather data.

Quote:
Yeah, I'd imagine it would quickly degenerate into a spammer crapfest. It's bad enough that my email account is already plagued by a never-ending flow of cheap Rolexes and Megadik pill emails.

Now that's utter prejudice, because you probably haven't seen the site yet, and in either case you're not in a position to draw such a generalized conclusion, as you have no idea about the project.

Regarding your email provider: well, my email inbox only sees a spam mail every couple of months, and that's very good.
So if you're being literally buried in spam, that's your issue because you chose the wrong service; go get a better one. My advice.

Cheers
 
04-01-2009, 02:19 PM   #9
easuter
Member
 
Registered: Dec 2005
Location: Portugal
Distribution: Slackware64 13.0, Slackware64 13.1
Posts: 538

Rep: Reputation: 62
Quote:
Originally Posted by entz
Well, I'm gonna break this down into parts.

First of all, computers might do things efficiently, but they don't do them
right unless you tell them to.

Seriously, anybody who believes that computers, which can only understand 1 and 0, are capable of understanding the needs of humans on their own without guidance is a person with extremely limited cognitive functions.

Computers are nowhere near grasping even the slightest clue
of what people want or even how to deal with them.

Computational intelligence is inferior to human thinking; there is no need for two to argue about it. So much for the answer to why we can't rely solely on computers to gather data.
Jeez take it easy, no need to throw your toys out of the pram.
I was simply pointing out that your site, however well-intentioned, is just not an effective way to index the web.

Employing "clever" and efficient algorithms allows computers to process large amounts of information at a rate that our brains are simply incapable of.

And yes, the best way to index massive amounts of data spread over millions of interconnected nodes is to automate the process as much as possible, not to make it even more dependent on human intervention.


Quote:
Now that's utter prejudice, because you probably haven't seen the site yet, and in either case you're not in a position to draw such a generalized conclusion, as you have no idea about the project.

Regarding your email provider: well, my email inbox only sees a spam mail every couple of months, and that's very good.
So if you're being literally buried in spam, that's your issue because you chose the wrong service; go get a better one. My advice.

Cheers
I followed the link you gave and tried it out; most of the searches I performed were misses.
And as for the abuse by spammers: it doesn't take a genius to figure out that if you create a tool that can be abused by spammers, then it will be abused for sure.
One of the reasons I won't contribute to that site is because my hard "work" and contributions can be very quickly undone by a spammer with a script.

The idea for this site is interesting, but it's just not compelling.
A more interesting (and fun) project would be to create algorithms that would unseat Google as the search engine king. Though unlikely to happen, that would still be something educational for you and maybe even end up really benefiting the rest of the web.

Last edited by easuter; 04-01-2009 at 02:27 PM.
 
04-02-2009, 06:49 AM   #10
entz
Member
 
Registered: Mar 2007
Location: Milky Way , Planet Earth!
Distribution: Opensuse
Posts: 453
Blog Entries: 3

Original Poster
Rep: Reputation: 40
Quote:
Originally Posted by easuter
Employing "clever" and efficient algorithms allows computers to process large amounts of information at a rate that our brains are simply incapable of.

And yes, the best way to index massive amounts of data spread over millions of interconnected nodes is to automate the process as much as possible, not to make it even more dependent on human intervention.
Well, you probably presented a good point, but you ruined it by
generalizing it to every situation.

Question: Have you heard of organic eggs, for instance?
I assume you have, and by the same token you should understand the principle of NiteCo Search.

Or in other words, it's quality over quantity.
On one hand you have the industrialized factory that produces massive yet mediocre results.
On the other, you have a community with significantly lower capacity but with superior handcrafted results.

However, the word handcrafted does not necessarily mean that everything
has to be done manually.
No, bots can be utilized where it's beneficial without impacting the quality.
Quote:
Originally Posted by easuter
And as for the abuse by spammers: it doesn't take a genius to figure out that if you create a tool that can be abused by spammers, then it will be abused for sure.
One of the reasons I won't contribute to that site is because my hard "work" and contributions can be very quickly undone by a spammer with a script.
OK, now let us analyze this statement logically, shall we?
First you say 'a tool that can be abused'.
I'm wondering how you can determine that without having any insight
into the very system you're talking about?
Second, anything can be abused: Google has been abused, so have Wikipedia, Yahoo, Hotmail, and even the kitchen knife has been abused for murder.

So here we have a general assumption without any concrete evidence to back it up.

In order to do a risk assessment in general, you either need to acquire insider knowledge OR (more commonly) you have to refer to previous incidents where this has been the case.
And as for the latter, you have to show that those incidents can be
repeated because no action has been taken on the part of the operator.

Anyways, it's clear that none of the above qualifies as evidence for your assumption.

Now you happen to mention later on that you will not contribute any information to the project because you firmly believe that the system can be abused by spammers and that it's not worth it.

OK, now that compares to saying something like this:
'I don't want to go to school because I'm afraid that the kids will bully me.'
or 'I don't wanna work because my boss or colleagues might mob me.'
or 'I don't wanna plant flowers in my garden because I'm afraid that
birds might poo over them.'

Seriously, don't you think that sounds pretty cynical and ridiculous?

I mean, in life you should be prepared to face problems at any time,
and when they arise you have to fight them.

As for this project, I've made a pledge that this is gonna be something that outputs quality, and I'll not tolerate behavior that attempts to disrupt the service in any way.

You can count on that!


Quote:
Originally Posted by easuter
The idea for this site is interesting, but it's just not compelling.
A more interesting (and fun) project would be to create algorithms that would unseat Google as the search engine king. Though unlikely to happen, that would still be something educational for you and maybe even end up really benefiting the rest of the web.
Oh thanks for considering that.

My opinion is that no algorithm or automated process can accomplish
what its programmer was not able to accomplish!

This is like the paradox of the creation superseding the creator.

Maybe in the distant future such scenarios might show up, although I doubt that this will be the case.

Examining the field of AI shows absolutely nothing worth bothering with that can be practically employed for realistic and complex operations like this.

That said, I do believe that bots are useful in performing specific operations, but the primary role should be given to the user.
 
04-02-2009, 01:11 PM   #11
easuter
Member
 
Registered: Dec 2005
Location: Portugal
Distribution: Slackware64 13.0, Slackware64 13.1
Posts: 538

Rep: Reputation: 62
Quote:
Well, you probably presented a good point, but you ruined it by
generalizing it to every situation.

Question: Have you heard of organic eggs, for instance?
I assume you have, and by the same token you should understand the principle of NiteCo Search.

Or in other words, it's quality over quantity.
On one hand you have the industrialized factory that produces massive yet mediocre results.
On the other, you have a community with significantly lower capacity but with superior handcrafted results.

However, the word handcrafted does not necessarily mean that everything
has to be done manually.
No, bots can be utilized where it's beneficial without impacting the quality.
Maybe you are not aware of the fact that Google does allow you to narrow down your searches using certain flags and keywords. So technically, you can "hand craft" your searches and get quality results.

Quote:
I'm wondering how you can determine that without having any insight
into the very system you're talking about?
Perhaps if you weren't so vague about what security measures you have in place, I could judge the situation better.

IP banning offenders, as you suggested, is not a solution. There are many people who share the same IP address that would be unjustly affected by such a "security" measure, and probably wouldn't stop the abuse because spammers like using zombie machines.
So as soon as one zombie is no longer able to access the site, they can just switch to another.

And CAPTCHA can be infuriating to use on a regular basis.

Quote:
Second, anything can be abused: Google has been abused, so have Wikipedia, Yahoo, Hotmail, and even the kitchen knife has been abused for murder.

So here we have a general assumption without any concrete evidence to back it up.
Am I missing something? You yourself just pointed out how many other services get abused and then say there is no evidence for me to think that your site will also get abused?
Sorry, but THAT DOES NOT COMPUTE!

Quote:
Now you happen to mention later on that you will not contribute any information to the project because you firmly believe that the system can be abused by spammers and that it's not worth it.

OK, now that compares to saying something like this:
'I don't want to go to school because I'm afraid that the kids will bully me.'
or 'I don't wanna work because my boss or colleagues might mob me.'
or 'I don't wanna plant flowers in my garden because I'm afraid that
birds might poo over them.'
Now that's just a stupid comparison.

Even if I had a crappy time at school, I still benefited from it because I got a free education.
And even if I one day get a shitty job, at least I'm earning a wage, so I'm also getting something positive out of it.
Lastly, I don't do gardening because it's boring.

If I contribute to your site and that contribution gets trampled on by spammers, I get nothing, and neither do any other people using the site. So in other words, all my contributions would have been a pointless exercise and a massive waste of time.

Quote:
My opinion is that no algorithm or automated process can accomplish
what its programmer was not able to accomplish!
Now that doesn't make any sense.

Programmers "move" algorithms from the theoretical realm into the real world so that they can run on computers, simply because they expect to harness a computer's power to perform a task that our brains can't do (efficiently).

How about comparing the time it would take my 400 MHz laptop to perform a merge sort on 1 million random integers to the time it would take you to perform the same task?
The time it would take the computer is negligible, but I bet you will be busy for a very long time.
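
For anyone who wants to try the comparison, here is a minimal sketch in Python. The `merge_sort` function is a plain textbook top-down implementation written for this example, and the timing depends entirely on the machine and interpreter, so no figure is claimed here.

```python
import random
import time


def merge_sort(values):
    """Return a new sorted list using top-down merge sort."""
    if len(values) <= 1:
        return list(values)
    mid = len(values) // 2
    left, right = merge_sort(values[:mid]), merge_sort(values[mid:])
    merged, i, j = [], 0, 0
    # Merge the two sorted halves, always taking the smaller head element.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One half is exhausted; append whatever remains of the other.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


data = [random.randint(0, 10**9) for _ in range(1_000_000)]
start = time.perf_counter()
result = merge_sort(data)
elapsed = time.perf_counter() - start
print(f"sorted {len(result):,} integers in {elapsed:.2f} s")
```

Sorting the same million integers by hand is left as an exercise.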

In the same way, a spider can tirelessly index the web and store the information in a database. This information can later be retrieved by a user and the search can be fine-tuned to weed out information that may not be of any interest.

Quote:
examining the field of AI shows absolutely nothing worth bothering with that can be practically employed for realistic and complex operations like this.
We don't need an artificial intelligence to index data, current methods are pretty damn good, thank you.

Perhaps you felt the need to start this project because you weren't getting the desired results out of Google. But maybe all you need is to learn how to use the tools that you already have available. Maybe improve your knowledge of Google search syntax. There are books covering this topic.

Anyway, I don't have anything else to say and it doesn't seem like there is anything constructive to add to previous posts either, so this is my last post in this thread.

Good luck

Last edited by easuter; 04-02-2009 at 01:16 PM. Reason: typo
 
04-02-2009, 04:19 PM   #12
entz
Member
 
Registered: Mar 2007
Location: Milky Way , Planet Earth!
Distribution: Opensuse
Posts: 453
Blog Entries: 3

Original Poster
Rep: Reputation: 40
Quote:
Maybe you are not aware of the fact that Google does allow you to narrow down your searches using certain flags and keywords. So technically, you can "hand craft" your searches and get quality results.
You guessed wrong; I'm aware of those advanced search modes, namely:
search for any
search for all
search for exact phrase

...
Those are standard techniques that are obvious to any SQL admin.
However, what I'm not 'aware' of is how that is supposed to create handcrafted results.
Well, let me explain to you what handcrafted is and what it is not.
When you do a text search, regardless of the mode (any, all, exact, etc.),
you get automated output that matches the literal words. For instance,
if you search 'car dealer' you get all the resources that contain 'car dealer', but that doesn't mean that all returned results are relevant to car dealers just because the phrase 'car dealer' happened to appear in a particular resource.
What is handcrafted, on the other hand, are resources indexed by people, which are far more likely to be relevant, as humans are not as dumb as computers and don't assume that every page that contains 'car dealer' is actually about car dealers.
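
The difference can be shown with a toy example in Python. This is only a sketch of the argument: the two pages, their text and the human-assigned keywords are made up, and the `literal_search` function is a bare substring match, not a model of how any real search engine ranks its results.

```python
# Page text as a crawler might store it (made-up example data).
pages = {
    "http://dealer.example": "Visit our car dealer showroom for new and used cars.",
    "http://novel.example": "In chapter three the hero argues with a car dealer.",
}

# Keywords a human enlister might assign (hypothetical data).
human_keywords = {
    "http://dealer.example": {"car dealer", "used cars"},
    "http://novel.example": {"fiction", "novel"},
}


def literal_search(phrase):
    """Return every page whose text contains the phrase verbatim."""
    return sorted(url for url, text in pages.items() if phrase in text.lower())


def curated_search(phrase):
    """Return only the pages that a person tagged with the phrase."""
    return sorted(url for url, tags in human_keywords.items() if phrase in tags)


print(literal_search("car dealer"))  # both pages contain the literal phrase
print(curated_search("car dealer"))  # only the page tagged as a car dealer
```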

Semantics are complex psychological constructs that can't be oversimplified to simple text searches.

Quote:
Perhaps if you weren't so vague about what security measures you have in place, I could judge the situation better.
Go consider this: if you were to ask Google what measures they have to
prevent spamdexing, Google bombing, XSS attacks or whatever...

Would you honestly expect them to tell you any details??
I don't think so. Fact is that I'd be a buffoon if I were to reveal
any sensitive security-related information to the public.

Quote:
Am I missing something? You yourself just pointed out how many other services get abused and then say there is no evidence for me to think that your site will also get abused?
OK, I'll outline what you missed or failed to compute (in your own words).
It's the basic difference between probability and certainty.
You are saying that NiteCo will get abused, and thus making a definitive pre-judgement not to use the service whatsoever,
as opposed to weighing the likelihood or chance of it getting abused (which is a factor present even in the most secure system ever).

So basically, you're making an affirmative statement (which you seem to be so sure about) that abuse will occur no matter what.
And when you make such an argument you're supposed to present evidence;
otherwise you're just being cynical and counterproductive.

Quote:
Now that's just a stupid comparison.
If I contribute to your site and that contribution gets trampled on by spammers, I get nothing, and neither do any other people using the site. So in other words, all my contributions would have been a pointless exercise and a massive waste of time.
The comparison is very accurate as far as I can see.

What is funny, however, is how you pretend to be afraid of wasting time
in case you enlist URLs on NiteCo, while simultaneously you're killing
insane amounts of time bickering in vain that this project won't succeed.

Now that's hypocrisy, isn't it?

Quote:
How about comparing the time it would take my 400 MHz laptop to perform a merge sort on 1 million random integers to the time it would take you to perform the same task?
The time it would take the computer is negligible, but I bet you will be busy for a very long time.
Now that's an example of a stupid comparison!

It's unrelated to the subject, because array sorting and retrieving
semantically relevant documents that match certain keywords from a resource pool are not on the same level of complexity.

Your example basically shows your complete ignorance regarding this subject.

An accurate example would be to test how long a human would need to paint a portrait against how long a computer would need.

The winner is clear from the start: a human may take 1 year, but the computer will consume an infinite amount of time (thus never presenting anything of merit).

Quote:
In the same way, a spider can tirelessly index the web and store the information in a database.
I have to interrupt you here.

Collecting data randomly and dumping it somewhere is not a big deal at all.
That's why the science that studies search engines is called information retrieval and not information storage.
The art lies in how you extract things, as well as how the threads get linked to each other.

Quote:
This information can later be retrieved by a user and the search can be fine-tuned to weed out information that may not be of any interest.
You're entirely wrong, as it's not the user who retrieves anything
from the system, nor does the user affect the search results or indexing in a classical search engine (apart from giving the keywords, of course).

What happens is that the user types the search query, but it's the computer's job to interpret it and, more importantly, to sort the results according to an internal algorithm that the user understands nothing about.

On the other hand, this project gives the user full control (or at least a big slice of it) to determine what should be placed where and in which form.
Moreover, the community decides what is relevant and what is not through picking the 'good' documents,
and that's how true information retrieval and categorization is conducted.
Quote:
We don't need an artificial intelligence to index data, current methods are pretty damn good, thank you.
They are pretty damn good at presenting mediocre results,
and if that's what you want then I'm gonna leave you to it.

Quote:
Perhaps you felt the need to start this project because you weren't getting the desired results out of Google.
Wrong again! I was getting the desired results.
But that is irrelevant to the subject; what matters is that there are better ways than those used in classical search engines.

Quote:
But maybe all you need is to learn how to use the tools that you already have available. Maybe improve your knowledge of Google search syntax. There are books covering this topic.
Nah, don't need it, but thanks anyway.

Quote:
so this is my last post in this thread
I appreciate it.
 
  



Tags
chat, engine, networking, search, social


All times are GMT -5.
