[SOLVED] Best file format for scanned documents/pictures?
Quite some time ago the idea of digitizing my family's most important documents and photographs started going through my head. I got to work and scanned some of them, and I reached the point where I started questioning my choice of file format.
I've been saving those scans as TIFF... and sure, it's probably not a bad choice, but if I ever finish scanning loads and loads of documents, I really don't know whether it will be a good idea to have them at 5-20 MiB each. And I'm not sure enabling JPEG compression inside TIFF is the wisest idea either... at the end of the day it would just be JPEG wrapped in another file format.
Doing some quick research I came across the DjVu file format. It looks like a good format to work with: it's lighter than TIFF, and since I would always view the files on a GNU/Linux device, compatibility shouldn't be a problem.
However, I don't have much experience with file formats. I'm sure there are a dozen out there that could be good candidates.
Which file format do you consider best for preserving old documents? Please feel free to suggest any other.
We're talking about both manuscripts and pictures. So keeping them legible and true to the originals must be the priority (I don't rule out using a different file format for each purpose if you suggest it); keeping them lightweight would be a plus.
For archives, I recommend a lossless format and the highest resolution the scanner can handle. That means big files. Disk capacity is huge nowadays, so you could fit a million pictures on a big hard disk. And I bet you don't have that many.
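To put rough numbers on the disk-space argument, here is a minimal Python sketch. The 5-20 MiB per file range comes from the first post; the 8 TB disk size and the helper name `files_per_disk` are assumptions chosen only for illustration.

```python
MIB = 2**20   # bytes in one mebibyte
TB = 10**12   # bytes in one decimal terabyte, as disk vendors count

def files_per_disk(disk_bytes: int, file_bytes: int) -> int:
    """How many files of a given size fit, ignoring filesystem overhead."""
    return disk_bytes // file_bytes

disk = 8 * TB  # assumed disk size, not taken from the thread
small = files_per_disk(disk, 5 * MIB)    # lower end of the 5-20 MiB range
large = files_per_disk(disk, 20 * MIB)   # upper end of the 5-20 MiB range
print(f"5 MiB scans:  {small:,}")
print(f"20 MiB scans: {large:,}")
```

Under these assumptions the count lands somewhere between roughly 380,000 and 1.5 million files, so "a million pictures" is the right order of magnitude for a large disk.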
Of course, that means they're "heavy". But you typically don't use them as they are. If you're making a webpage, down-sample and convert to JPEG. It's easy to make smaller files with compression and down-sampling. But if all you have is a low-quality file, you're simply stuck.
I've made that mistake in the past: I scanned to JPEG and down-sampled to what I needed at the time, made MP3s of music and AVIs of movies. Now the originals are lost or destroyed, and I regret it.
You don't always know what you will use them for later. For example, if you want to make a poster of a picture, you need high resolution and the best quality possible. And who knows what the future will bring. But I guess 100 years from now, they will still want the highest quality possible, and a few megabytes or gigabytes won't matter.
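The poster example can be worked through with simple arithmetic. The sketch below is only an illustration: the 300 dpi print resolution is a common rule of thumb assumed here, and the 10 x 15 cm photo and the function name `max_print_size_cm` are hypothetical.

```python
def max_print_size_cm(width_cm: float, height_cm: float,
                      scan_dpi: int, print_dpi: int = 300) -> tuple:
    """Largest print that still has print_dpi pixels per inch.

    The scan holds scan_dpi pixels per inch of the original, so it can
    be enlarged by scan_dpi / print_dpi before resolution drops below
    the target.
    """
    factor = scan_dpi / print_dpi
    return (width_cm * factor, height_cm * factor)

# Hypothetical 10 x 15 cm photo scanned at two resolutions
print(max_print_size_cm(10, 15, scan_dpi=1200))  # enlargement factor 4
print(max_print_size_cm(10, 15, scan_dpi=300))   # no enlargement possible
```

Scanned at 1200 dpi, the photo supports a print of about 40 x 60 cm; scanned at 300 dpi, it cannot be enlarged at all without falling below the assumed print resolution.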
Thanks, Guttorm, I'm glad you shared your experience. I should have considered that scenario before.
Well, after all, storage prices have decreased significantly over the last decades. It would be a shame to have only a curtailed version of such important information available in the future just to save a few bucks.
I'll make a fair investment to make sure the next generations can enjoy those documents in their best form.
I guess for me the case is closed. But I'm leaving the thread open a little longer in case someone wants to add anything else.
For me, scanning old slides was for memory jogging, not advancing the academic knowledge level of the species. Many of the slides were also faulty - scratches, mould, ...
So for me, jpeg was fine. The bigger issue for me was the amount of time needed to harvest those worth keeping. Before digital, I used to take a heap of photos just to make sure I got one worth looking at. No deleting in the camera with film. An awful lot never made it to the scanner.
My document scanning needs are almost non-existent, so no thoughts there.
DjVu may be an endangered species. Its support is largely confined to Linux, and the Internet Archive stopped accepting it four years ago. It was more popular 20 years ago, when it had the advantage of smaller file sizes than PDF and PDF was still patented.
Even lossless file formats can be compressed into archives. In terms of space, you gain as much as, or more than, with JPEG compression alone. We could now turn this into a discussion of compressors, but that is not necessary nowadays.
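What "lossless" buys you when archiving can be shown with a small round-trip check. This is a sketch using Python's standard `lzma` module; the sample bytes are a synthetic stand-in for image data, and the helper names are hypothetical. Real scans contain noise and will compress far less than this repetitive sample, so treat the size saving here as illustrative only.

```python
import lzma

def pack(data: bytes) -> bytes:
    """Compress with LZMA, the algorithm behind .xz archives."""
    return lzma.compress(data)

def is_bit_identical(original: bytes, packed: bytes) -> bool:
    """Decompress and confirm every byte of the original survives."""
    return lzma.decompress(packed) == original

# Synthetic, highly repetitive stand-in for an uncompressed scan
sample = bytes(range(256)) * 4096
packed = pack(sample)

print(len(sample), "bytes before,", len(packed), "bytes after")
print("bit-identical after round trip:", is_bit_identical(sample, packed))
```

The point of the check is the second line of output: unlike JPEG, whatever space the archive saves costs nothing in image data.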
For me, the original file format depends on the character of the data and the potential uses. Usually I scan to TIFF, too, especially text documents and forms, as they are quickly converted into multi-page PDF files (via multi-page TIFFs) on the command line.
Images are a different topic. I still experiment a lot with JPEG 2000 and would replace JPEG and TIFF with it immediately if I were sure to find decent viewers in the future. There are some that come with the conversion tools. But nobody talks about it and I am ... "chicken-hearted" (??? Really? That is an adjective?)
Last edited by Michael Uplawski; 04-28-2021 at 11:58 AM.
Reason: language. An ongoing process...