Originally Posted by unihiekka
Isn't CLucene a search engine, if I recall correctly?
I don't need to manipulate the texts themselves; they stay as they are. I just need to pull the words and characters out of the texts, along with their frequencies, and then start crunching numbers. It's the words and characters, with all their statistics, that I would probably put in a database.
Yes, at the highest level it is a search engine, but one of the first things a search engine has to do before it can work is build a search index of the material to be searched. Part of that process sounds very similar to what you want to do, so you could use CLucene's API to run whatever text processing you need. It's a big API and you can do lots of clever stuff with it, but you don't have to use the whole thing end to end.
I don't think a database is really appropriate for what you want to do, which is essentially processing large amounts of text.
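To give a rough idea of how little machinery the counting itself needs, here is a minimal sketch in plain Python. It does not use CLucene at all; the function names (`count_words`, `count_chars`, `count_file`) and the simple `\w+` tokenizer are my own assumptions for illustration, and a real analyzer would handle punctuation, hyphenation and language-specific rules far more carefully.

```python
import re
from collections import Counter

# Assumption: a "word" is any run of letters, digits or underscores.
WORD_RE = re.compile(r"\w+")


def count_words(text):
    """Return a Counter of lowercased words found in text."""
    return Counter(WORD_RE.findall(text.lower()))


def count_chars(text):
    """Return a Counter of non-whitespace characters (case-sensitive)."""
    return Counter(ch for ch in text if not ch.isspace())


def count_file(path, encoding="utf-8"):
    """Accumulate word and character counts for one file, line by line.

    Reading line by line keeps memory use tied to the vocabulary size
    rather than the file size. A word split across a line break would
    be counted as two words by this sketch.
    """
    words, chars = Counter(), Counter()
    with open(path, encoding=encoding) as fh:
        for line in fh:
            words.update(count_words(line))
            chars.update(count_chars(line))
    return words, chars
```

You would call it as `words, chars = count_file("corpus.txt")` (the file name is hypothetical) and then look at something like `words.most_common(20)`. The texts stay untouched on disk, and only the resulting counts need to be stored anywhere, if at all.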