LinuxQuestions.org
Old 10-10-2010, 03:07 AM   #16
unihiekka
Member
 
Registered: Aug 2005
Distribution: SuSE Linux / Scientific Linux / [K|X]ubuntu
Posts: 273

Original Poster
Rep: Reputation: 32

I can feed in chunks of text rather than the whole corpus all at once.

Quote:
Try CLucene, a C++ port of Apache's Lucene project.
Isn't CLucene a search engine, if I recall correctly?

I don't need to manipulate the texts themselves; they stay as they are. I just need to extract the words and characters from the texts, along with their frequencies, and then start crunching numbers. It's the words and characters, with all their statistics, that I would probably put in a database.
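
To make the counting step concrete, here is a minimal C++ sketch of tallying word and character frequencies one chunk at a time. The names `FrequencyTable` and `count_chunk` are made up for illustration, and the sketch assumes that words are separated by whitespace, that a chunk never ends in the middle of a word, and that one `char` is one character (so no multi-byte encodings and no punctuation stripping).

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Running totals; the same object is reused for every chunk of the corpus.
// (Hypothetical type, named here only for the example.)
struct FrequencyTable {
    std::map<std::string, std::size_t> words;
    std::map<char, std::size_t> chars;
};

// Add the counts from one chunk to the totals already held in `table`.
// Assumption: a chunk never ends in the middle of a word.
void count_chunk(const std::string& chunk, FrequencyTable& table) {
    // Count every non-whitespace character.
    for (char c : chunk) {
        if (!std::isspace(static_cast<unsigned char>(c))) {
            ++table.chars[c];
        }
    }
    // Count every whitespace-separated word.
    std::istringstream in(chunk);
    std::string word;
    while (in >> word) {
        ++table.words[word];
    }
}
```

Calling `count_chunk` once per chunk on the same `FrequencyTable` keeps only the totals in memory, not the corpus itself, so the chunks can be as small as is convenient.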

Last edited by unihiekka; 10-10-2010 at 03:09 AM.
 
Old 10-10-2010, 03:37 AM   #17
unihiekka
Member
 
Registered: Aug 2005
Distribution: SuSE Linux / Scientific Linux / [K|X]ubuntu
Posts: 273

Original Poster
Rep: Reputation: 32
Just something that I thought of a couple of minutes ago. I started off on the premise that I need a database, but do you guys actually think a database would be appropriate?

I mean, I can just create objects whose data members hold all the statistics, instantiate one every time I encounter a new word or character, and save them in plain text files. That would still resemble a database, but would not require additional libraries.
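
Roughly, that idea could look like the following C++ sketch, which uses only the standard library. Everything in it is an assumption made for illustration: the `WordStats` fields, the function names, and the one-line-per-word text format `word count first_position`. It also assumes that words contain no whitespace.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical per-word record; add whatever statistics are needed.
struct WordStats {
    std::size_t count = 0;           // total occurrences seen so far
    std::size_t first_position = 0;  // position of the first occurrence
};

using StatsTable = std::map<std::string, WordStats>;

// Record one occurrence; a new object is created the first time a word is seen.
void record(StatsTable& table, const std::string& word, std::size_t position) {
    auto it = table.find(word);
    if (it == table.end()) {
        WordStats fresh;
        fresh.first_position = position;
        it = table.emplace(word, fresh).first;
    }
    ++it->second.count;
}

// Turn the table into plain text, one "word count first_position" line per entry.
std::string serialize(const StatsTable& table) {
    std::ostringstream out;
    for (const auto& entry : table) {
        out << entry.first << ' ' << entry.second.count << ' '
            << entry.second.first_position << '\n';
    }
    return out.str();
}

// Read the same format back; parsing stops at the first malformed line.
StatsTable parse(const std::string& text) {
    StatsTable table;
    std::istringstream in(text);
    std::string word;
    WordStats stats;
    while (in >> word >> stats.count >> stats.first_position) {
        table[word] = stats;
    }
    return table;
}
```

Writing the result of `serialize` to a file with `std::ofstream` gives the plain text file, and `parse` restores the table on the next run. The price of skipping a database is that the whole table has to fit in memory and is rewritten in full on every save.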

It was just a thought. Any thoughts of yours are always much appreciated.

Last edited by unihiekka; 10-10-2010 at 03:39 AM.
 
Old 10-10-2010, 05:19 AM   #18
mericet
Member
 
Registered: Jul 2009
Posts: 50

Rep: Reputation: 8
Quote:
Originally Posted by unihiekka
Isn't CLucene a search engine, if I recall correctly?

I don't need to manipulate the texts themselves; they stay as they are. I just need to extract the words and characters from the texts, along with their frequencies, and then start crunching numbers. It's the words and characters, with all their statistics, that I would probably put in a database.
Yes, at the highest level it is a search engine, but before a search engine can work it has to build a search index over the material to be searched. Part of that indexing process sounds very similar to what you want to do, so you could use CLucene's API for the text processing you need. It's a big API, and you can do lots of clever stuff with it; you don't have to use the whole thing end to end.

I don't think a database is really appropriate for what you want to do, which is essentially processing large amounts of text.
 
  


All times are GMT -5. The time now is 07:07 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration