Linux - Software: This forum is for Software issues.
Having a problem installing a new program? Want to know which application is best for the job? Post your question in this forum.
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
Ok, so the pdf that you output with OO was 42k. And that proves what exactly?
Perhaps it proves that the pdf module in OO is inefficient?
Did you try creating a pdf with any other program?
Perhaps it proves that pdf produces larger files when the document is ONE PAGE LONG?
Did you try creating a pdf that was 400 pages long?
In front of me right now I have a PDF book of 107 pages. Guess how big its file size is... 1 Meg... 5 Megs... 10 Megs?
No, 265K. And here I have another pdf book: 454 pages, 773K. Instead of being the claimed 42K per page, they end up a lot closer to 2K per page. (Which is right on par with your prototype format.)
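The per-page figures above can be checked with quick division (the sizes are exactly as quoted in the post):

```python
# (size in KB, page count) as quoted for the two pdf books above
books = [("107-page book", 265, 107), ("454-page book", 773, 454)]

for name, size_kb, pages in books:
    # both come out near 2K per page, nowhere near 42K
    print(f"{name}: {size_kb / pages:.1f} KB per page")
```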
So what's my point?
I don't think there's anything wrong with using LaTeX, but if your goal is smaller size, I think you've completely misjudged pdf.
Question: Is there a program that renders a .tex file directly, without producing a pdf or ps from it? I can't seem to find one, although I could have sworn one existed.
Comment: I realize that I do bear some undue prejudice against the pdf format, because my primary experience of it while learning computers was using the Adobe Acrobat viewer with IE over a dialup connection. However, the idea of my format is not based on pdf-hating. I think pdf is a wonderful format that balances flexibility with size. I'm trying to fill a different niche: instead of balancing flexibility and size, I want to sacrifice flexibility for size. I want to create slightly different versions optimized for each alphabet, etc. I'm very sorry if this came off as a pdf slam. All I mean is that if pdf can be such a wonderfully flexible format, then it should be possible to make it a bit more compact by sacrificing some of that flexibility.
Note: I am not sure, but I believe that pdf employs compression, while my prototype does not. However, I do intend to add that functionality if I can find an easy way to render it.
Not a bad idea, but then I'd just use html directly.
You might not care for my opinion, but I think that's the best idea you've had yet. Use html directly. It is small, and when zipped, it's incredibly tiny. Plus, _everyone_ has an html browser already installed. No extra software.
From what I can tell, TeX is basically a meta-document format. It is not meant to be used for distributing documents directly. Instead, it was made to be translated into other kinds of documents: txt, html, pdf, ps, odt, etc.
I'm very sorry if the way I've been writing makes me seem angry with you. I'm not, I respect your opinion. In many ways you are right - pdf is a compact format. In a lot of ways it is questionable what gains will be made by essentially optimizing pdf for a specific subset of its features. I do care what you think, you're an active member of this board and if memory serves me right you've helped me out before. I don't mean to be insulting.
Regarding html, my only concerns are certain unsupported features. I can place images behind text only with a table/css/image hack, and there is no clean way to specify a document's page height/width (although width is more important than height). However, I will research these more. Ultimately, I don't want to create a file format that is tiny on disk but has to expand into a much larger format in memory in order to be readable. That's exactly why myformat -> .tex -> ps is not a good idea.
I understand your concern with the html and images. It can be a bit tricky depending on your goal, especially things like overlapping images.
Can you explain to us what the purpose of this is?
You wish to write a book or some documentation for the OLPC project?
The goal is primarily a learning experience for myself, with the results to be submitted to OLPC if this learning experience yields a usable result. The idea is to write a file format similar to pdf or ps, but with a smaller feature set and a smaller final binary file size. Basically, to create a more compact pdf, so that OLPC users can fit more e-books onto their OLPC CF cards, or spend less space on the same books and have more room for other things.
My current problem is that I don't have the programming skills to write a rendering engine for an entirely new binary format. I'm hoping to find a good intermediate format that I can test and develop my new format with, i.e., create a document in the intermediate format and convert it; to view the file, temporarily convert it back to the intermediate format. Ideally the intermediate format is only temporary, until I can write the rendering engine; otherwise one would just store the document in the intermediate format.
No offense, but I don't think you quite get "it". People write books in pdf and html. That is all there is to it. Your plan can only succeed if after you finish it, you do one of two things:
1. Convert every pdf and html book that already exists to your format.
2. Run a huge campaign to convince authors that your format is somehow technically better than the other options.
To be perfectly honest, I don't see your plan working. If I wanted a "really small" document, I'd use a program like 'pdftotext'. Then I'd have the smallest format possible, plain text. Lastly, I could gzip it, and view it with 'zless'. (I've done this before)
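The "pdftotext then gzip" route above needs a real pdf to demonstrate end to end, but the compression half is easy to sketch with Python's standard-library gzip module (the sample text here is just a stand-in for pdftotext output):

```python
import gzip

# Stand-in for the plain-text output of pdftotext: repetitive English prose
text = ("The quick brown fox jumps over the lazy dog. " * 400).encode("ascii")

packed = gzip.compress(text, compresslevel=9)
print(f"plain text: {len(text)} bytes, gzipped: {len(packed)} bytes")

# zless would decompress this on the fly for viewing
assert gzip.decompress(packed) == text
```

Repetitive prose like this compresses to a small fraction of its original size, which is the point the poster is making: plain text plus gzip is already a very compact e-book.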
1. Ideally, I'd be writing a pdf-to-newformat converter. OLPC notebooks are headed to areas that for the most part don't have an internet infrastructure. These laptops are being designed so that they can be human powered, because they're intended to be sent to many areas that don't have available electricity. I don't think we're going to have too much trouble with kids surfing around online and being unable to open the pdfs they stumble across. Most e-books will have to be sent on physical media (likely an SD card), and it wouldn't be that hard for the person with a full-featured first-world computer to toss them through the converter.
2. I don't need to convert every document, because I'm not expecting OLPC to remove pdf and html viewing software from their computers. I want to provide one additional option, not completely replace a well-established standard.
3. I don't need this to "succeed". I've said before I'm doing this as a learning experience. *Then* if it yields a usable result I'll submit it to OLPC. Yes, I am reinventing the wheel, but primarily as a tool to help me understand how the wheel was built. If I can make a wheel that's a little better for driving toy cars on, instead of using wheels designed for bicycles, then so be it. If not, then OK. I'm not writing this with any delusion that it will ever become a standard.
4. If I got this format to work exactly as I want it, it would be smaller than plain text. By pushing a pdf through pdftotext, any images will obviously be stripped out, and only the plain text portions will remain. One of the ways I seek to reclaim size is by optimizing the character set for the language. For English, instead of ASCII stored in 8-bit bytes, I'm using a 7-bit character set with the non-printing characters stripped out. After allowing for all letters (capital and lowercase), numbers, and the symbols present on a standard US QWERTY keyboard (plus a couple), I still have some unused codes. These are used as tags in an XML-style way: one-character tags. Given two comparable plain-text-only documents, the newformat one will be smaller, since each character is 7 bits instead of 8. Plain ASCII will have the advantage for very small files, because it doesn't spend the 86 bits on a header, but if the file runs more than 86 characters the newformat has the advantage. Admittedly, I don't know how this affects gzipping.
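The 7-bits-per-character packing described above can be sketched in a few lines. This is only an illustration of the idea, not the poster's actual format; the function name and the MSB-first bit order are my own assumptions:

```python
def pack7(codes):
    """Pack a sequence of 7-bit values into bytes, MSB-first (hypothetical
    sketch of the newformat packing; bit order is an assumption)."""
    buf, nbits, out = 0, 0, bytearray()
    for c in codes:
        assert 0 <= c < 128  # every code must fit in 7 bits
        buf = (buf << 7) | c
        nbits += 7
        while nbits >= 8:
            nbits -= 8
            out.append((buf >> nbits) & 0xFF)
    if nbits:
        out.append((buf << (8 - nbits)) & 0xFF)  # zero-pad the final byte
    return bytes(out)

text = "Hello, OLPC!" * 10        # 120 characters
packed = pack7(text.encode("ascii"))
print(len(text), len(packed))     # 120 chars -> 840 bits -> 105 bytes
```

This saves exactly one bit per character, which matches the break-even arithmetic in the post: an 86-bit header is paid off once the document exceeds 86 characters.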
To anyone following the post, I think I've settled on html as the intermediate format. I'll have to be happy with the hack for placing an image behind text and I've decided on using a table with width fixed by a pixel value to separate pages.
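For what the fixed-width-table page hack might look like, here is a hypothetical sketch that generates one such "page"; the element layout, the 600-pixel default, and the CSS background-image approach to "image behind text" are all my own placeholder choices, not the poster's actual markup:

```python
def page_html(body, width_px=600, background=None):
    """Wrap page content in a fixed-width single-cell table; an optional
    CSS background image approximates placing an image behind text.
    (Hypothetical sketch of the html intermediate format.)"""
    style = f"width:{width_px}px;"
    if background:
        style += f"background-image:url('{background}');"
    return (f'<table style="{style}" cellpadding="0" cellspacing="0">'
            f"<tr><td>{body}</td></tr></table>")

print(page_html("<p>Page one text...</p>", background="watermark.png"))
```

One table per page, each with the same pixel width, gives the fixed page-width behavior html otherwise lacks.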