LinuxQuestions.org
11-11-2008, 03:33 AM   #1
Sundeniro
LQ Newbie
 
Registered: Nov 2008
Posts: 7

Best Distro for document management


Hi

I have a PC that I wish to retire as my Windows machine (I have to have one for work). I thought it would be really useful to make this PC a dedicated document management PC, i.e. for scanning, optically recognising the documents and then turning them into searchable PDFs.

This is something I do now on WinXP. I have been messing with a few live CDs (I'm completely new to Linux) to see which ones I like the feel of.

I have tried the following so far

OpenSUSE - runs very slowly
Ubuntu - seems quite nice
Kubuntu - failed to load
Puppy Linux - too cluttered and confusing

I will give others a go, but to reduce the time I spend trying to get a system working fully I thought it would be a good idea to post my specs and ask for suggestions on both the best distro for my hardware and the software to accomplish my document management needs.

CPU: Pentium 4, 2.8 GHz
RAM: 1.5 GB
Graphics: AGP 4x, 512 MB dedicated memory
Internet connection: currently via LAN. I may switch to WiFi, but that shouldn't be an issue as I will do that after setup and will specifically seek compatible hardware.

If you need any other info, please let me know.

And thanks very much for your help

Scanner is a Plustek PS256 Smartoffice ADF Scanner.

I also use a Network Printer.
 
11-11-2008, 04:26 AM   #2
billymayday
Guru
 
Registered: Mar 2006
Location: Sydney, Australia
Distribution: Fedora, CentOS, OpenSuse, Slack, Gentoo, Debian, Arch, PCBSD
Posts: 6,678

OpenSuse shouldn't be too slow, and I'd be wary about measuring speed based on a live CD.

I'd be looking at OpenSuse, CentOS, Debian and maybe Gentoo and Slackware.
 
11-11-2008, 07:53 AM   #3
Sundeniro
LQ Newbie
 
Registered: Nov 2008
Posts: 7

Original Poster
Why those particular distributions?

I'd really appreciate it if you could expand on your suggestions.

I have been favouring Ubuntu up until now, but am happy to admit that I am not knowledgeable on this subject, hence my request for advice.
 
11-11-2008, 08:03 AM   #4
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Anything in the top ten on the "hit list" at DistroWatch is going to be fine.

The long tent pole may be that scanner. I would do some searches.

Try here: http://www.sane-project.org/sane-mfgs.html

Found this by accident:
http://www.gjaeger.de/scanner/plustek.html
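Once a distro is installed (or running from a live CD), a quick way to see whether SANE actually detects the scanner is `scanimage -L`, which lists the devices it finds. The Python sketch below is only an illustration, not something from the linked pages: it runs that command if it is installed and picks out the device names. It assumes the usual output line shape, device `NAME' is a DESCRIPTION; the Plustek device name used in the comment is a made-up example, not real output from a PS256.

```python
import re
import shutil
import subprocess

# Assumed format of each device line printed by `scanimage -L`, e.g.
#   device `plustek:libusb:001:002' is a Plustek USB flatbed scanner
# (the name is wrapped in a backtick and a single quote).
DEVICE_LINE = re.compile(r"^device `(?P<name>[^']+)' is a (?P<desc>.+)$")


def parse_scanimage_list(output: str) -> list[tuple[str, str]]:
    """Return (device_name, description) pairs found in scanimage -L output."""
    devices = []
    for line in output.splitlines():
        match = DEVICE_LINE.match(line.strip())
        if match:
            devices.append((match.group("name"), match.group("desc")))
    return devices


def list_scanners() -> list[tuple[str, str]]:
    """Run scanimage -L if SANE's command-line front end is installed."""
    if shutil.which("scanimage") is None:
        return []  # e.g. the sane-utils package is not installed
    result = subprocess.run(["scanimage", "-L"], capture_output=True, text=True)
    return parse_scanimage_list(result.stdout)


if __name__ == "__main__":
    found = list_scanners()
    if not found:
        print("No scanner detected by SANE")
    for name, desc in found:
        print(f"{name}: {desc}")
```

If the list comes back empty while the scanner is plugged in, the backend pages linked above are the place to check whether the model is supported.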
 
11-11-2008, 08:34 AM   #5
Sundeniro
LQ Newbie
 
Registered: Nov 2008
Posts: 7

Original Poster
Yeah, the scanner does seem to be a sticking point. I'm also finding there is little in the way of decent document management and OCR software available on Linux, which is another major issue for me.

I'm a one-man business and was interested in setting up a "mirror" PC to do everything I currently do for work on my Windows box, to see whether switching over would be feasible, but I think it looks very unlikely to happen any time in the next few years.
 
11-11-2008, 09:22 AM   #6
farslayer
Guru
 
Registered: Oct 2005
Location: Willoughby, Ohio
Distribution: linuxdebian
Posts: 7,231
Blog Entries: 5

I would use Debian and install the KnowledgeTree open source document management system on it.

Debian can easily be installed in a minimal configuration without a lot of extra junk you don't need on a server, and it runs well on slightly older hardware since it lacks a lot of the bloat found in some distros.
 
11-11-2008, 09:42 AM   #7
Sundeniro
LQ Newbie
 
Registered: Nov 2008
Posts: 7

Original Poster
Thanks for the reply, but I do not want to run this PC as a server. I want it as a desktop with a scanner attached that does its own OCR and PDF conversion.
 
11-11-2008, 10:29 AM   #8
farslayer
Guru
 
Registered: Oct 2005
Location: Willoughby, Ohio
Distribution: linuxdebian
Posts: 7,231
Blog Entries: 5

Ah, so it's a document scanning station and not a document management system. Gotcha.

How to scan and OCR like a pro with open source tools (Linux.com)
OCR for Linux: Teaching Linux to Read (Linux Magazine)
Tesseract OCR - Highest rated FOSS OCR for Linux

gscan2pdf - attaches metadata to PDFs upon creation to allow indexing for searches.

Quote:
OCR (Optical Character Recognition)

The gocr or tesseract utilities are used to produce text from an image.

There is an OCR output buffer for each page and is embedded both as an annotation (pop-up note) and as plain text behind the scanned image in the PDF produced. This way, Beagle can index (i.e. search) the plain text, and the contents of the annotations can be viewed in Acrobat Reader.

In DjVu files, the OCR output buffer is embedded in the hidden text layer. Thus these can also be indexed by Beagle.

There is an interesting review of OCR software at http://groundstate.ca/ocr. An important conclusion was that 400dpi is necessary for decent results.

Let us know how you put this all together and what software you choose.

Thanks !!

Last edited by farslayer; 11-11-2008 at 10:30 AM.
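The scan, OCR, searchable-PDF chain described in the gscan2pdf quote above can also be strung together from command-line tools. The Python sketch below is an illustration under stated assumptions, not how gscan2pdf works internally: it builds commands for SANE's `scanimage` (batch-scanning TIFF pages at the 400 dpi suggested in the quote), for the `tesseract` CLI, and for poppler's `pdfunite`, then runs them if all three are installed. It assumes a tesseract version with the `pdf` output config (3.03 or later, which is newer than this thread) and a `scanimage` that supports `--format=tiff` with `--batch`; the `page%d.tif` pattern and the function names are hypothetical.

```python
import re
import shutil
import subprocess
from pathlib import Path

DPI = 400  # the OCR review quoted above found ~400 dpi necessary


def scan_command(pattern: str = "page%d.tif", dpi: int = DPI) -> list[str]:
    """scanimage batch scan: writes one TIFF per page (page1.tif, page2.tif, ...).

    Feeding from the ADF usually also needs a --source value, but the accepted
    values differ between SANE backends, so it is left out of this sketch.
    """
    return ["scanimage", "--resolution", str(dpi), "--format=tiff", f"--batch={pattern}"]


def ocr_command(image: Path, lang: str = "eng") -> list[str]:
    """tesseract's 'pdf' config writes <stem>.pdf with the OCR text under the image."""
    return ["tesseract", str(image), str(image.with_suffix("")), "-l", lang, "pdf"]


def merge_command(pages: list[Path], output: Path) -> list[str]:
    """pdfunite (from poppler-utils) joins the single-page PDFs in order."""
    return ["pdfunite", *map(str, pages), str(output)]


def page_number(path: Path) -> int:
    """Numeric page index from names like page12.tif, for a natural sort."""
    match = re.search(r"(\d+)$", path.stem)
    return int(match.group(1)) if match else 0


def scan_to_searchable_pdf(workdir: Path, output: Path) -> None:
    missing = [t for t in ("scanimage", "tesseract", "pdfunite") if shutil.which(t) is None]
    if missing:
        raise RuntimeError(f"missing tools: {', '.join(missing)}")
    workdir.mkdir(parents=True, exist_ok=True)
    # scanimage may exit non-zero when the feeder runs out of paper,
    # so judge success by the files it wrote rather than its exit status.
    subprocess.run(scan_command(), cwd=workdir)
    images = sorted(workdir.glob("page*.tif"), key=page_number)
    if not images:
        raise RuntimeError("scanimage produced no pages")
    for image in images:
        subprocess.run(ocr_command(image), check=True)
    pdfs = [image.with_suffix(".pdf") for image in images]
    if len(pdfs) == 1:
        shutil.copyfile(pdfs[0], output)
    else:
        subprocess.run(merge_command(pdfs, output), check=True)
```

On a machine with all three tools installed, something like `scan_to_searchable_pdf(Path("scans"), Path("document.pdf"))` would scan whatever is in the feeder and leave a single PDF whose hidden text layer desktop search tools can index. Keeping the command builders separate from the runner means they can be checked without a scanner attached.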
 
11-12-2008, 12:53 AM   #9
billymayday
Guru
 
Registered: Mar 2006
Location: Sydney, Australia
Distribution: Fedora, CentOS, OpenSuse, Slack, Gentoo, Debian, Arch, PCBSD
Posts: 6,678

Quote:
Originally Posted by Sundeniro
why those particular distributions?

I'd really appreciate it if you could expand on your suggestions?

I have been favouring UBUNTU up until now but am happy to admit that I am not knowledgeable on this subject hence my request for advice.
Stable, well supported, with a wide variety of software available (Brother, for example, provides drivers in binary form for at least CentOS, SUSE and Debian).

It's a very personal choice as to what distro you like.

As others have said, the main issue will be the scanner end, but you probably have the best chance of finding a ready package on a major distribution (and I'm not a big fan of the *buntus).
 
11-12-2008, 02:10 AM   #10
Sundeniro
LQ Newbie
 
Registered: Nov 2008
Posts: 7

Original Poster
Hi all

Thanks for the advice it is very much appreciated.

I think my best course of action is to build a much better Windows box, as previously planned.

Turn this machine into a Linux box in the corner of the room (I'm still not sure which distro, or whether a GNOME or KDE environment is better for my needs).

Keep doing my scanning etc. on the Windows box while I get a good book on Linux, learn as much about my distro as possible, and take a trial-and-error approach to getting the machine to do what I want. (It might take months, but I like learning new things.)

Not sure if I should ask this in another thread, but is it true that I only need a firewall on a Linux machine and no antivirus software, or is this just an old wives' tale?

Many thanks for all your kind advice.
 
All times are GMT -5.