Distribution: Ubuntu Linux 16.04, Debian 10, LineageOS 14.1
Posts: 1,572
xsane is for scanners, and once you have created a PNM graphic file, you can convert it to either DjVu or PDF via gscan2pdf. In fact, you may not even need xsane, and may be able to use the scanner with just gscan2pdf (though I've always used xsane to first create the graphic of the hard copy, and then used gscan2pdf to convert the graphic to a document).
Last edited by mark_alfred; 03-07-2010 at 09:08 PM.
I want to find a solution for scanning all the documents and storing them on the server.
Then people can read the documents and update information about them via a browser.
I don't think you will be able to do both. You can scan the documents onto the server, but then they will be image files and cannot be edited via a browser.
However, you could use a wiki on a web server to display the documents after they were typed in by hand (ugh), and then allow the files to be edited. So it's more a question of which you want more: easier availability to VIEW the documents, or the ability to edit them.
Quote:
Originally Posted by bret381
well I guess it's a start anyway. I didn't know that was possible
Yes, it's called "optical character recognition". It'll only work if what is being scanned has been typewritten -- it won't work with handwritten documents.
It's not the best. Just now I tested it with a document that had the following:
"So, in future, if the News is publishing a piece that clearly is opinionated, then kindly label it as an editorial, rather than mislabelling it as balanced news coverage."
What I got was:
"So, in fure, if the News is publishing a piece that clearly is opinionated, then ndly label it as editorial, rather than mislabelling it as balanced news coverage."
So, several errors in one sentence, and the rest of the document likewise had errors. Still, as you say, it may be a start. It may, however, be faster to simply hire a bunch of typists to retype the hard copies into new documents on the computer, rather than scanning them into text and subsequently reviewing and correcting them.
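To put a number on "several errors in one sentence", here is a rough sketch that compares the original sentence with the OCR result word by word. It uses Python's standard difflib module; the function name word_errors is just something I made up for illustration, and word-level diffing is only one possible way of scoring OCR output.

```python
import difflib

# The two sentences from the test above.
original = ("So, in future, if the News is publishing a piece that clearly is "
            "opinionated, then kindly label it as an editorial, rather than "
            "mislabelling it as balanced news coverage.")
ocr_output = ("So, in fure, if the News is publishing a piece that clearly is "
              "opinionated, then ndly label it as editorial, rather than "
              "mislabelling it as balanced news coverage.")


def word_errors(reference, hypothesis):
    """Return (tag, reference_words, ocr_words) for every span that differs."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    matcher = difflib.SequenceMatcher(None, ref_words, hyp_words, autojunk=False)
    return [
        (tag, ref_words[i1:i2], hyp_words[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


errors = word_errors(original, ocr_output)
for tag, ref, hyp in errors:
    print(tag, ref, "->", hyp)

# Count each differing span by its longer side (reference or OCR output).
wrong_words = sum(max(len(ref), len(hyp)) for _, ref, hyp in errors)
total_words = len(original.split())
print(f"{wrong_words} of {total_words} words affected")
```

Running the same comparison on a hand-checked sample page would give a rough idea of how much correction work the OCR output needs before deciding between scanning and retyping.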
The Solomon Islands Government has accumulated many thousands of documents in hard copy over many years.
I want to find a solution for scanning all the documents and storing them on the server.
Then people can read the documents and update information about them via a browser.
A common situation and wish.
Before you do too much work on the technology, it might be worth a quick calculation of how many person-hours it will take. Try scanning in one document and manually creating the keywords that would be necessary to search for it.
Another aspect to consider is the data volume and implications for backup. Without OCR (and the best OCR is expensive, the free OCR unsatisfactory as posted above) each page will be a graphic. How many TB will the "many thousands documents" be?
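As a back-of-envelope sketch of both calculations (person-hours and storage), something like the following could be used. Every number in it is a placeholder assumption, not a figure from this thread: the page count, the minutes per page, A4 paper, 300 dpi and 8-bit greyscale are all made up for illustration and should be replaced with measurements from a real test scan.

```python
from dataclasses import dataclass


@dataclass
class ScanEstimate:
    """Rough effort and storage estimate; all defaults are assumptions."""
    pages: int
    minutes_per_page: float          # scanning plus typing in keywords
    width_in: float = 8.27           # A4 width in inches
    height_in: float = 11.69         # A4 height in inches
    dpi: int = 300
    bytes_per_pixel: int = 1         # 8-bit greyscale
    compression_ratio: float = 1.0   # 1.0 means uncompressed

    def person_hours(self) -> float:
        return self.pages * self.minutes_per_page / 60

    def bytes_per_page(self) -> float:
        pixels = round(self.width_in * self.dpi) * round(self.height_in * self.dpi)
        return pixels * self.bytes_per_pixel / self.compression_ratio

    def total_terabytes(self) -> float:
        return self.pages * self.bytes_per_page() / 1e12


# Hypothetical inputs: 100,000 pages at 3 minutes each.
estimate = ScanEstimate(pages=100_000, minutes_per_page=3)
print(f"person-hours: {estimate.person_hours():.0f}")
print(f"MB per page:  {estimate.bytes_per_page() / 1e6:.1f}")
print(f"total TB:     {estimate.total_terabytes():.2f}")
```

Changing compression_ratio or dpi shows quickly how sensitive the backup volume is to the scan settings.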
Hi bret381,
Many thanks for your reply.
I do not need to edit the image file.
I just want to provide some basic information about the documents so that people can search for them later.
So I need a solution for finding and viewing the documents quickly.
Currently, it is impossible to find a document, because there are only hard copies and too many of them.
I need a solution that lets people store an image file of each document and provide basic information about it.
Then people can search for it and update the information about the document.
I need a membership feature too, because some documents should only be viewable or updatable by certain members.
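For what it's worth, the requirements above (a stored image file, basic searchable information, and per-member view/update rights) could be sketched as a small catalogue. This is only a minimal illustration using Python's built-in sqlite3 module, not a recommendation of a specific product; the table names, column names, member names and file path are all hypothetical, and a real deployment would put a web front end and proper logins on top.

```python
import sqlite3

# Hypothetical schema: one row per scanned document, plus per-member rights.
SCHEMA = """
CREATE TABLE documents (
    id          INTEGER PRIMARY KEY,
    image_path  TEXT NOT NULL,
    title       TEXT NOT NULL,
    keywords    TEXT NOT NULL DEFAULT ''
);
CREATE TABLE permissions (
    document_id INTEGER NOT NULL REFERENCES documents(id),
    member      TEXT NOT NULL,
    can_update  INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (document_id, member)
);
"""


def open_catalogue(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_document(conn, image_path, title, keywords, viewers=(), editors=()):
    """Register a scanned image; editors can view and update, viewers only view."""
    cur = conn.execute(
        "INSERT INTO documents (image_path, title, keywords) VALUES (?, ?, ?)",
        (image_path, title, keywords),
    )
    doc_id = cur.lastrowid
    rows = [(doc_id, m, 0) for m in viewers] + [(doc_id, m, 1) for m in editors]
    conn.executemany("INSERT INTO permissions VALUES (?, ?, ?)", rows)
    conn.commit()
    return doc_id


def search(conn, member, term):
    """Return documents visible to `member` whose title or keywords contain `term`."""
    pattern = f"%{term}%"
    return conn.execute(
        """SELECT d.id, d.title, d.image_path
           FROM documents d JOIN permissions p ON p.document_id = d.id
           WHERE p.member = ? AND (d.title LIKE ? OR d.keywords LIKE ?)
           ORDER BY d.id""",
        (member, pattern, pattern),
    ).fetchall()


def update_keywords(conn, member, doc_id, keywords):
    """Update the keywords only if `member` has update rights; return success."""
    cur = conn.execute(
        """UPDATE documents SET keywords = ?
           WHERE id = ? AND EXISTS (
               SELECT 1 FROM permissions
               WHERE document_id = ? AND member = ? AND can_update = 1)""",
        (keywords, doc_id, doc_id, member),
    )
    conn.commit()
    return cur.rowcount == 1


conn = open_catalogue()
doc_id = add_document(conn, "scans/0001.pnm", "Land survey report",
                      "land survey", viewers=["alice"], editors=["bob"])
print(search(conn, "alice", "survey"))
```

The image itself stays on disk as a file; only its path and the descriptive information live in the database, which keeps searches fast even when the scans are large.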