Please suggest document management software.

vincent90152900 · 03-07-2010, 08:24 PM

Hi,

I am an IT volunteer working in the Solomon Islands.

The Solomon Islands is a developing country without much budget, so I think Linux is ideal here.

The Solomon Islands Government has many thousands of documents in hard copy, accumulated over many years.

I want to find a solution for scanning all the documents and storing them on a server.

Then people can read the documents and update information about them via a browser.

I also need a membership function to manage users and their permissions to access documents.

Please give me some suggestions.

Many thanks.

mark_alfred · 03-07-2010, 08:34 PM

xsane is for scanners, and once you have created a pnm graphic file, you can convert it to either djvu or pdf via gscan2pdf. In fact, you may not even need xsane, and be able to use the scanner with just gscan2pdf (though I've always used xsane to first create the graphic of the hard-copy, and then used gscan2pdf to convert the graphic to a document.)

bret381 · 03-07-2010, 09:08 PM

Quote:

Originally Posted by vincent90152900

I want to find a solution for scanning all the documents and storing them on a server.
Then people can read the documents and update information about them via a browser.

I don't think you will be able to do both.... You can scan the documents into the server, but then they will be image files and can not be edited via a browser.

However, you could use a wiki on a webserver to display the documents, after they were typed in by hand... ugh and then allow the files to be edited. So it's more of a question of which you want more. Easier availability to VIEW the documents, or to be able to edit the documents.

mark_alfred · 03-07-2010, 09:12 PM

xsane has a function for converting the scanned image into text, but, admittedly, this function is quite error prone.

bret381 · 03-07-2010, 09:13 PM

well I guess it's a start anyway. I didn't know that was possible

Smartpatrol · 03-07-2010, 09:18 PM

Microsoft Sharepoint Server will do exactly what you are looking for.

mark_alfred · 03-07-2010, 09:40 PM

Quote:

Originally Posted by bret381

well I guess it's a start anyway. I didn't know that was possible

Yes, it's called "optical character recognition". It'll only work if what is being scanned has been typewritten -- it won't work with handwritten documents.

It's not the best. Just now I tested it with a document that had the following:

"So, in future, if the News is publishing a piece that clearly is opinionated, then kindly label it as an editorial, rather than mislabelling it as balanced news coverage."

What I got was:

"So, in fure, if the News is publishing a piece that clearly is opinionated, then ndly label it as editorial, rather than mislabelling it as balanced news coverage."

So, several errors in one sentence, and the rest of the document likewise had errors. Still, as you say, it may be a start. It may, however, be faster to simply hire a bunch of typists to retype the hard copies into new documents on the computer, rather than scanning them into text and subsequently reviewing and correcting them.
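To put a number on "several errors", here is a minimal sketch that compares the expected sentence with the OCR output word by word, using Python's standard `difflib` module. The function name `word_errors` is made up for this illustration; it is not part of xsane or any OCR tool.

```python
import difflib

def word_errors(expected, recognized):
    """List the word-level differences between the true text and the OCR text."""
    a, b = expected.split(), recognized.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    # Keep only the spans that differ: replaced, deleted, or inserted words.
    return [(tag, a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]

expected = ("So, in future, if the News is publishing a piece that clearly is "
            "opinionated, then kindly label it as an editorial, rather than "
            "mislabelling it as balanced news coverage.")
recognized = ("So, in fure, if the News is publishing a piece that clearly is "
              "opinionated, then ndly label it as editorial, rather than "
              "mislabelling it as balanced news coverage.")

errors = word_errors(expected, recognized)
for tag, want, got in errors:
    print(tag, want, got)
```

Running the same comparison on a full page that has been typed by hand once would give a rough error rate to weigh against the cost of retyping.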

mark_alfred · 03-07-2010, 10:03 PM

Quote:

Originally Posted by Smartpatrol

Microsoft Sharepoint Server will do exactly what you are looking for.

O3Spaces is a Linux equivalent to Sharepoint, and I'm guessing it would be less expensive.

catkin · 03-07-2010, 10:50 PM

Quote:

Originally Posted by vincent90152900

The Solomon Islands Government has many thousands of documents in hard copy, accumulated over many years.

I want to find a solution for scanning all the documents and storing them on a server.

Then people can read the documents and update information about them via a browser.

A common situation and wish.

Before you do too much work on the technology, it might be worth a quick calculation of how many person-hours it will take. Try scanning in one document and manually creating the keywords that would be necessary to search for it.
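The quick calculation could look something like the sketch below. Every figure in it (page count, seconds per scan, minutes per document for keywords) is a placeholder assumption, to be replaced with what you measure when you time one real document.

```python
def estimate_person_hours(pages, scan_seconds_per_page,
                          documents, keyword_minutes_per_document):
    """Total person-hours to scan every page and keyword every document."""
    scan_hours = pages * scan_seconds_per_page / 3600
    keyword_hours = documents * keyword_minutes_per_document / 60
    return scan_hours + keyword_hours

if __name__ == "__main__":
    # Placeholder figures only; time a sample document to get real ones.
    hours = estimate_person_hours(pages=50_000, scan_seconds_per_page=30,
                                  documents=10_000,
                                  keyword_minutes_per_document=3)
    print(f"{hours:.0f} person-hours, "
          f"about {hours / 40:.0f} person-weeks at 40 hours per week")
```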

Another aspect to consider is the data volume and implications for backup. Without OCR (and the best OCR is expensive, the free OCR unsatisfactory as posted above) each page will be a graphic. How many TB will the "many thousands documents" be?
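A back-of-the-envelope way to answer the volume question, as a sketch only: the page size, resolution, bit depth and compression ratio are all assumptions, and the real compression ratio depends heavily on the file format and the documents themselves.

```python
def uncompressed_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Raw size of one scanned page before any compression."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8

def estimate_total_gb(pages, page_bytes, compression_ratio):
    """Total storage in GB (10**9 bytes) for all pages after compression."""
    return pages * page_bytes / compression_ratio / 1e9

if __name__ == "__main__":
    # Assumed: A4 pages (8.27 x 11.69 in), 300 dpi, 8-bit greyscale,
    # and a guessed 10:1 compression ratio.
    page = uncompressed_page_bytes(8.27, 11.69, 300, 8)
    total = estimate_total_gb(pages=100_000, page_bytes=page,
                              compression_ratio=10)
    print(f"{page / 1e6:.1f} MB per raw page, about {total:.0f} GB in total")
```

Whatever total comes out needs multiplying by the number of backup copies you intend to keep.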

vincent90152900 · 03-08-2010, 03:09 AM

Quote:

Originally Posted by mark_alfred

xsane is for scanners, and once you have created a pnm graphic file, you can convert it to either djvu or pdf via gscan2pdf. In fact, you may not even need xsane, and be able to use the scanner with just gscan2pdf (though I've always used xsane to first create the graphic of the hard-copy, and then used gscan2pdf to convert the graphic to a document.)

Hi Mark,

Many thanks for your reply.

I will give it a try.

vincent90152900 · 03-08-2010, 03:17 AM

Quote:

Originally Posted by bret381

I don't think you will be able to do both.... You can scan the documents into the server, but then they will be image files and can not be edited via a browser.

However, you could use a wiki on a webserver to display the documents, after they were typed in by hand... ugh and then allow the files to be edited. So it's more of a question of which you want more. Easier availability to VIEW the documents, or to be able to edit the documents.

Hi bret381,

Many thanks for your reply.

I do not need to edit the image file.

I just want to provide some basic information about the documents so that people can search for them later.

So I need a solution to find and view the documents quickly.

Currently, it is impossible to find a document, because there are only hard copies and too many of them.

I need a solution that lets people store an image file of each document and provide basic information about it.

Then people can search for it and update the information about the document.

I need a membership feature too, because some documents should only be viewed or updated by certain members.
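To make these requirements concrete, here is a minimal sketch of such a catalogue using Python's built-in `sqlite3` module. It is not taken from any of the products mentioned in this thread; the table names, column names and helper functions are all hypothetical, and a real deployment would sit behind a web application with proper logins.

```python
import sqlite3

# Hypothetical schema: scanned file path plus searchable basic information,
# and a per-member permission row for each document.
SCHEMA = """
CREATE TABLE documents (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    description TEXT NOT NULL DEFAULT '',
    image_path TEXT NOT NULL,
    ocr_text TEXT NOT NULL DEFAULT ''
);
CREATE TABLE members (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE permissions (
    member_id INTEGER NOT NULL REFERENCES members(id),
    document_id INTEGER NOT NULL REFERENCES documents(id),
    can_update INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (member_id, document_id)
);
"""

def open_catalog(path=":memory:"):
    """Create a new catalogue database (in memory by default)."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def search(conn, member_id, term):
    """Return (id, title, image_path) for documents this member may view."""
    like = f"%{term}%"
    rows = conn.execute(
        """SELECT d.id, d.title, d.image_path
           FROM documents d
           JOIN permissions p ON p.document_id = d.id
           WHERE p.member_id = ?
             AND (d.title LIKE ? OR d.description LIKE ? OR d.ocr_text LIKE ?)
           ORDER BY d.id""",
        (member_id, like, like, like),
    )
    return rows.fetchall()

def update_description(conn, member_id, document_id, description):
    """Update the description only if the member has update permission."""
    cur = conn.execute(
        """UPDATE documents SET description = ?
           WHERE id = ? AND EXISTS (
               SELECT 1 FROM permissions
               WHERE member_id = ? AND document_id = ? AND can_update = 1)""",
        (description, document_id, member_id, document_id),
    )
    conn.commit()
    return cur.rowcount == 1
```

The image itself stays on disk as a file; only its path and the descriptive text go into the database, which keeps the database small and easy to back up.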

Thank you very much.

vincent90152900 · 03-08-2010, 03:20 AM

Quote:

Originally Posted by mark_alfred

xsane has a function for converting the scanned image into text, but, admittedly, this function is quite error prone.

I think that I will use the OCR result as part of the basic information about the document.

This information is only for searching.
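Since the OCR text will contain errors, exact keyword matching may miss documents. One possible workaround, sketched here with Python's standard `difflib` module, is approximate matching. The function name `fuzzy_hits` and the 0.75 cutoff are assumptions chosen for illustration, not settings from any OCR package.

```python
import difflib

def fuzzy_hits(query, ocr_text, cutoff=0.75):
    """Return words in the OCR text that approximately match the query."""
    # Strip surrounding punctuation and normalise case before comparing.
    words = [w.strip('.,;:"()').lower() for w in ocr_text.split()]
    words = [w for w in words if w]
    return difflib.get_close_matches(query.lower(), words, n=5, cutoff=cutoff)

# The garbled word from the OCR test earlier in the thread is still found.
print(fuzzy_hits("kindly", "then ndly label it as editorial"))
```

A lower cutoff finds more garbled words but also returns more false matches, so it is worth tuning against a few real scanned pages.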

Many thanks for your reply.

vincent90152900 · 03-08-2010, 03:23 AM

Quote:

Originally Posted by bret381

well I guess it's a start anyway. I didn't know that was possible

Hi bret381,

Many thanks for your reply.

Thank you very much.

vincent90152900 · 03-08-2010, 03:24 AM

Quote:

Originally Posted by Smartpatrol

Microsoft Sharepoint Server will do exactly what you are looking for.

Hi Smartpatrol,

Many thanks for your suggestion.

However, I am looking for an open source solution.

vincent90152900 · 03-08-2010, 03:26 AM

Quote:

Originally Posted by mark_alfred

O3Spaces is a Linux equivalent to Sharepoint, and I'm guessing it would be less expensive.

Hi Mark,

Many thanks for your suggestion.

I will take a look at it.

Thank you again.