LinuxQuestions.org

pengyou 10-29-2008 09:49 PM

Database software for storing documents and "unstructured" data
 
I am a data guru and have been hoarding information since the ripe old age of 7 :) I would like to see if I can use computers to help me as much as possible, especially since I am now undertaking my master's degree, with a Ph.D. to follow soon.

I want to be able to scan articles, papers and even whole books into digital form, then process them with OCR software. I want the result to go into my database as an intact file. I am OK with attaching keywords to a document, but I would also like a (pretty) fast search engine that can go through all of the text in the stored documents and find the words I am looking for as they appear in the actual document.
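To illustrate why full-text search over OCR'd documents can be fast: a search engine typically builds an inverted index, mapping each word to the documents that contain it, so a query looks words up instead of rereading every file. Below is a minimal Python sketch, assuming the OCR output is already plain text; the class, method names and tokenizer are hypothetical, and real engines add stemming, ranking and on-disk storage.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Maps each word to the set of document IDs whose text contains it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_document(self, doc_id, text):
        # Index every word once, when the OCR'd text is stored.
        for word in tokenize(text):
            self.postings[word].add(doc_id)

    def search(self, query):
        # Return IDs of documents containing every query word (AND search).
        words = tokenize(query)
        if not words:
            return set()
        # Copy the first posting set so the intersections below don't modify the index.
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return result
```

For example, after `idx.add_document("notes.txt", "Scanned chapter on Linux filesystems")`, a call to `idx.search("linux filesystems")` returns `{"notes.txt"}` without rereading the text.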

I also need to be able to store web pages, JPEGs, sound and even video. I am OK with adding keywords to these (rather than having the software scan them), but I would still like the same search engine to include them when searching. I anticipate that within five years this database will be over 20 GB and will continue to grow. It will be a part of my brain :)
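One way to put keyword-tagged media and full-text documents under the same search is to give every stored item a set of searchable terms: its user-assigned keywords, plus extracted text when there is any. A minimal Python sketch, assuming an in-memory list of items; `Item`, `search` and the file names used with them are hypothetical.

```python
import re
from dataclasses import dataclass, field


def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


@dataclass
class Item:
    path: str
    keywords: list = field(default_factory=list)
    text: str = ""  # left empty for images, sound and video

    def terms(self):
        # Searchable terms: user-assigned keywords plus any extracted text.
        return set(tokenize(" ".join(self.keywords))) | set(tokenize(self.text))


def search(items, query):
    # Return paths of items whose terms include every query word.
    wanted = set(tokenize(query))
    return [item.path for item in items if wanted and wanted <= item.terms()]
```

With this design a JPEG tagged `mandarin` and an OCR'd paper mentioning Mandarin both match the query `"mandarin"`, even though only the paper has any text.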

My questions:

First of all, can someone give me some more specific keywords to help me describe what I am looking for? Most databases I have run across are very structured: the user has to define fields and insert the data into those fields before it can be looked up.

Second, can anyone suggest some Linux software that will do this? Right now I only have a dual 650 MHz P3 to play with; I am thinking of getting some 10K RPM SCSI or SATA hard drives to put in it. I will have enough money to upgrade in about a year if I need to. BTW, it will have a single user.

Thanks in advance for any help you can provide.

normscherer 10-29-2008 11:32 PM

The filesystem is a database that can store any kind of data and can have as much structure as you want. You can organize it as you wish, and there are tools to search it (find, grep and the like). Think about how it could handle your needs.
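As a rough illustration of searching the filesystem directly, the Python sketch below walks a directory tree and returns the text files that contain a word. It rests on several assumptions: `find_files_containing` is a hypothetical helper, it only checks files with the given extensions, it matches substrings case-insensitively, and it rereads every file on each query, so it gets slower as the collection grows.

```python
import os


def find_files_containing(root, word, extensions=(".txt",)):
    """Return sorted paths of files under root whose text contains word (case-insensitive)."""
    word = word.lower()
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Skip files that are not plain text, e.g. images or audio.
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            # errors="ignore" skips undecodable bytes instead of raising.
            with open(path, encoding="utf-8", errors="ignore") as f:
                if word in f.read().lower():
                    matches.append(path)
    return sorted(matches)
```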

chrism01 10-30-2008 01:20 AM

Writing a good search engine is an art. I believe you can (if you have the money) license a search engine from Google to run on your local system only.
Databases can store binary objects, but whether that's a good idea is another question: you can't search or match on their contents. You might be better off keeping large items like books or other binary files as external files and just maintaining a pointer to them (e.g. the file path) in the DB.
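A minimal sketch of the "pointer in the DB" approach using Python's built-in sqlite3 module: the book or media file stays on disk, and the table stores only its path and some metadata. The schema, the helper functions and the example path are hypothetical.

```python
import sqlite3


def open_catalog(db_path=":memory:"):
    conn = sqlite3.connect(db_path)
    # Hypothetical schema: the file itself stays on disk; the DB stores only its path.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS documents (
               id INTEGER PRIMARY KEY,
               path TEXT NOT NULL UNIQUE,
               title TEXT,
               mime_type TEXT
           )"""
    )
    return conn


def add_document(conn, path, title, mime_type):
    # Record a pointer (the path) to an external file and return its row ID.
    cur = conn.execute(
        "INSERT INTO documents (path, title, mime_type) VALUES (?, ?, ?)",
        (path, title, mime_type),
    )
    conn.commit()
    return cur.lastrowid


def path_for(conn, doc_id):
    # Look up where the file lives; None if the ID is unknown.
    row = conn.execute("SELECT path FROM documents WHERE id = ?", (doc_id,)).fetchone()
    return row[0] if row else None
```

Keeping the bytes out of the table keeps the database small, and the files remain usable by ordinary tools (viewers, players, backups).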
For large amounts of data, you would be better off pre-creating the index files instead of trying to search raw data in real time, i.e. when you load a book, index its contents immediately.
Just store the 'content indexes' in the DB. This is not the same as the indexes used to organise the tables in the DB.
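The "index at load time" idea can be sketched with Python's sqlite3 module: when a document is loaded, its words are written once into a postings table (the content index), and queries then read that table instead of the raw text. The primary key on the postings table is an ordinary table index in the sense mentioned above; it only makes lookups in the content index fast. Table names, the tokenizer and the helper functions are hypothetical, and SQLite's optional full-text search (FTS) extensions could serve the same purpose.

```python
import re
import sqlite3


def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def create_schema(conn):
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS docs (id INTEGER PRIMARY KEY, path TEXT UNIQUE);
        -- Content index: one row per (word, document) pair.
        CREATE TABLE IF NOT EXISTS postings (
            word TEXT NOT NULL,
            doc_id INTEGER NOT NULL REFERENCES docs(id),
            PRIMARY KEY (word, doc_id)
        );
        """
    )


def load_document(conn, path, text):
    # Index the contents once, at load time, instead of scanning raw text per query.
    doc_id = conn.execute("INSERT INTO docs (path) VALUES (?)", (path,)).lastrowid
    conn.executemany(
        "INSERT OR IGNORE INTO postings (word, doc_id) VALUES (?, ?)",
        [(w, doc_id) for w in set(tokenize(text))],
    )
    conn.commit()
    return doc_id


def search(conn, word):
    # Read matching documents from the content index, not from the files.
    rows = conn.execute(
        """SELECT d.path FROM postings p JOIN docs d ON d.id = p.doc_id
           WHERE p.word = ? ORDER BY d.path""",
        (word.lower(),),
    )
    return [r[0] for r in rows]
```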


All times are GMT -5.