LinuxQuestions.org
10-29-2008, 09:49 PM   #1
pengyou
LQ Newbie
 
Registered: Jun 2004
Posts: 26

Database software for storing documents and "unstructured" data


I am a data guru and have been hoarding information since the ripe old age of 7. I would like to see if I can use computers to help me as much as possible, especially since I am now undertaking my master's degree, and soon a Ph.D.

I want to be able to scan articles, papers and even whole books into digital form, then process them with OCR software. I want the result to go into my database as an intact file. I am OK with attaching keywords to a document, but I would like a (pretty) fast search engine that can go through the full text of the stored documents and find the words I am looking for as they appear in the actual document.

I also need to be able to store webpages, JPEGs, sound and even video. I am OK with adding keywords to these (rather than having the software scan them), but would still like the same search engine to include them when searching. I anticipate that within five years this database will be over 20 GB and will continue to grow. It will be a part of my brain.

My questions:

First of all, can someone give me some more specific keywords to help me describe what I am looking for? Most databases that I have run across are very structured, in that the user has to define fields and insert the data into those fields before it can be looked up.

Second, can anyone suggest some Linux software that will do this? Right now I only have a dual 650 MHz P3 to play with, and I am thinking of getting some 10K RPM SCSI or SATA hard drives to put in it. I will have enough money to upgrade in about a year if I need to. BTW, it will have a single user.

Thanks in advance for any help you can provide.

Last edited by pengyou; 10-29-2008 at 09:52 PM.
 
10-29-2008, 11:32 PM   #2
normscherer
Member
 
Registered: Sep 2005
Location: On the road
Distribution: Ubuntu 8.10
Posts: 40

The filesystem is a database that can store any kind of data and can have as much structure as you want. You can organize it as you wish, and there are tools to search it. Think about how it could handle your needs.
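
To make the "filesystem as a database" idea concrete, here is a minimal Python sketch of a full-text search over a directory tree. It is only an illustration, not a recommendation of a particular tool: the function name `find_files_containing`, the `.txt` filter, and the sample file names are all assumptions, and it presumes the OCR output has already been saved as plain text. It reads every file on every search, so it would get slow as the archive grows.

```python
import os
import tempfile


def find_files_containing(root, word, extensions=(".txt",)):
    """Return sorted paths under root whose text contains word (case-insensitive)."""
    needle = word.lower()
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(extensions):
                continue  # skip images, audio, video and other non-text files
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as handle:
                    if needle in handle.read().lower():
                        matches.append(path)
            except OSError:
                continue  # unreadable file: ignore it and keep going
    return sorted(matches)


if __name__ == "__main__":
    # Build a tiny throwaway archive so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "papers"))
        samples = {
            os.path.join("papers", "a.txt"): "Notes on Linux filesystems",
            os.path.join("papers", "b.txt"): "Notes on cooking",
        }
        for rel_path, text in samples.items():
            with open(os.path.join(root, rel_path), "w", encoding="utf-8") as f:
                f.write(text)
        for path in find_files_containing(root, "linux"):
            print(os.path.relpath(path, root))
```

The directory layout itself supplies whatever structure you want (by year, topic, source), while the search ignores that structure and looks only at file contents.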
 
10-30-2008, 01:20 AM   #3
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,283

Writing a good search engine is an art. I believe you can (if you have the money) license a search engine from Google to run on your local system only.
Databases can store binary objects, but whether that's a good idea is another question. You can't search or match on them. You might be better off keeping large items like books or binary items as external files and just maintaining a pointer to them in the DB.
For large amounts of data, you would be better off pre-creating the index files instead of trying to search the raw data in real time, i.e. when you load a book, index its contents immediately.
Just store the 'content indexes' in the DB. These are not the same as the indexes used to organise the tables in the DB.
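
As a rough sketch of that layout, the Python below keeps the files outside the database, stores only a path (the pointer) plus a word-to-document table (the content index), and builds the index when a document is loaded. Everything here is an assumption made for illustration: the table names `documents` and `postings`, the helper names, the simple word-splitting rule, and the example paths. It uses Python's built-in `sqlite3` module only because it needs no setup.

```python
import re
import sqlite3

WORD_RE = re.compile(r"[a-z0-9]+")  # crude tokenizer: lowercase letters and digits


def open_index(db_path=":memory:"):
    """Open (or create) the database holding pointers and the content index."""
    conn = sqlite3.connect(db_path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS documents (
            id INTEGER PRIMARY KEY,
            path TEXT UNIQUE NOT NULL,      -- pointer to the external file
            keywords TEXT NOT NULL DEFAULT ''
        );
        CREATE TABLE IF NOT EXISTS postings (
            word TEXT NOT NULL,
            doc_id INTEGER NOT NULL REFERENCES documents(id),
            PRIMARY KEY (word, doc_id)      -- also serves lookups by word
        );
        """
    )
    return conn


def add_document(conn, path, text="", keywords=""):
    """Record a pointer to path and index its words immediately, at load time."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO documents (path, keywords) VALUES (?, ?)", (path, keywords)
        )
        doc_id = cur.lastrowid
        words = set(WORD_RE.findall((text + " " + keywords).lower()))
        conn.executemany(
            "INSERT OR IGNORE INTO postings (word, doc_id) VALUES (?, ?)",
            [(word, doc_id) for word in words],
        )
    return doc_id


def search(conn, word):
    """Return the paths of all documents whose text or keywords contain word."""
    rows = conn.execute(
        "SELECT d.path FROM postings p JOIN documents d ON d.id = p.doc_id "
        "WHERE p.word = ? ORDER BY d.path",
        (word.lower(),),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = open_index()
    # Hypothetical paths; the files themselves stay on disk, outside the DB.
    add_document(conn, "/archive/books/moby_dick.txt", text="Call me Ishmael.")
    add_document(conn, "/archive/photos/whale.jpg", keywords="whale photo")
    print(search(conn, "ishmael"))
    print(search(conn, "whale"))
```

Binary items such as images or video get into the same index through their keywords alone, so one search covers both scanned text and hand-tagged media.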
 
  


All times are GMT -5.
