Plain Text, Archiving, and Presentation Fidelity

Posted 04-01-2010 at 07:31 PM by penguiniator

My introduction to computers was as a hobby. My first computer had Microsoft Works for MS-DOS version 1.05 installed on it. Among other things, I decided to use the computer to keep a journal. It was perfect. I could do all my writing on the computer and even edit the text without wasting paper.

Better still, I could store my journal on floppy disks, which are more durable than paper.

Several years later I decided to open and read the journal I wrote on that first computer. I no longer had Microsoft Works, and the word processor I was using by then could not open my journals. I learned my first lesson in proprietary file format lock-in that day.

I had failed to consider the long-term consequences of storing documents in proprietary formats, or really to consider formats at all. I had traded those concerns for the editing and storage efficiency that computer-based documents offered over paper ones.

I immediately searched for a format that would work across applications and operating systems (I was using OS/2 by then). I tried various formats with varying degrees of failure. The only format that worked 100% of the time was plain text. It was also clear to me that this format was likely to keep working well into the future, because it had already been in use for a long time (in computer terms).

It was also the only format that worked with 100% of the programs that offered text editing capabilities. It didn't matter if I used a word processor or a text editor, and it didn't matter if I used a Microsoft operating system or one from some other vendor.

I recognized even then that the file format of Microsoft's office software chained its customers to its platforms and that entrusting data to proprietary file formats put it at risk. I refused to use word processor formats for anything I wanted to preserve long-term.

But this created other problems. If I wanted to print a document, I still had to use a word processor, and if I wanted to store that document long-term, I had to keep it in plain text. I was not aware of typesetting systems like LaTeX at the time, so I saw only two options. I could keep two versions of my document on the computer, one in plain text and the other in a word processor format of some kind. Or I could keep a plain text version on the computer and a paper copy.

The first option had the advantage of allowing me to store the document entirely on electronic media. But this still had one major drawback. The visual representation of the document could not be preserved long-term.

The second option had all the disadvantages of paper documents, with the added drawback that the plain text file and its printed version had to be stored separately. But its visual representation was far more durable.

The long-term storage problem for office documents is now being addressed by the OpenDocument Format (ODF). In my opinion, ODF has not yet proved itself the equal of plain text in solving that problem, let alone the problem of cross-application fidelity. Until it solves the second problem, long-term preservation of the printed appearance of office documents will remain out of reach.

Portable Document Format (PDF) helps address this issue, but it fails to address others. Paper manuscripts often carry notes in the margins, struck-through text, and other information attached to them that PDF documents cannot preserve. Word processor files preserve these details better than plain text files do, unless the plain text uses additional formatting that keeps it plain text while allowing meta-information to live inside it. For these and other reasons, many users regard plain text as inferior to word processor file formats.

But in recent years the Internet has elevated the status of plain text. The promise of the World Wide Web was that collaborative publishing would be open to all. This vision, held by its original creator, was not fully realized until the invention of wikis and blogs. These do not rely on the features of word processors; they work with formats built entirely on plain text. Word processors, with their paper-centric interfaces and output medium, are becoming increasingly obsolete as this new publishing paradigm takes hold.

But ordinary authors are not necessarily savvy in the use of HTML and the other markup systems used on the World Wide Web. For this reason, simplified markup languages were created that remove the need to know HTML in order to use wikis and blogs.

The problem was that each system used its own markup, so one had to learn a different syntax for every website that ran a different system. This was an added source of confusion.

To address this issue once and for all, even simpler markup languages and utilities were created that translate their syntax into HTML and other markup systems. One such utility is txt2tags. Its syntax can be translated into HTML, several wiki formats, and LaTeX, which can in turn be converted to PDF. It also allows embedded comments, which address the issue of author notes and other information that is not part of the final document.
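To make the comment feature concrete, here is a minimal Python sketch of a converter for a tiny subset of txt2tags-style markup: `= Title =` headings, `**bold**`, `//italic//`, blank-line paragraphs, and lines beginning with `%` treated as comments. It is an illustration under those assumptions, not the real txt2tags program, which supports many more marks and output targets; the function names `t2t_inline` and `t2t_to_html` are hypothetical.

```python
import html
import re

# Hypothetical toy converter for a small txt2tags-like subset.
# It is NOT the real txt2tags; it only handles:
#   % comment   -> dropped (author notes stay in the source only)
#   = Title =   -> <h1>, == Section == -> <h2>, and so on
#   **bold**    -> <b>,  //italic// -> <i>
#   blank line  -> paragraph break


def t2t_inline(text: str) -> str:
    """Escape HTML special characters, then apply bold and italic marks."""
    text = html.escape(text)
    text = re.sub(r"\*\*(.+?)\*\*", r"<b>\1</b>", text)
    text = re.sub(r"//(.+?)//", r"<i>\1</i>", text)
    return text


def t2t_to_html(source: str) -> str:
    out = []
    para = []

    def flush() -> None:
        # Close the paragraph being collected, if any.
        if para:
            out.append("<p>" + " ".join(para) + "</p>")
            para.clear()

    for line in source.splitlines():
        if line.startswith("%"):
            continue  # comment line: kept in the source, never in the output
        heading = re.fullmatch(r"(=+)\s*(.+?)\s*\1", line)
        if heading:
            flush()
            level = len(heading.group(1))
            out.append(f"<h{level}>{t2t_inline(heading.group(2))}</h{level}>")
        elif not line.strip():
            flush()
        else:
            para.append(t2t_inline(line.strip()))
    flush()
    return "\n".join(out)


entry = "= My Journal =\n% note to self: check the date\nToday was **good** and //quiet//.\n\nSecond paragraph."
print(t2t_to_html(entry))
```

Printing the result gives one heading and two paragraphs; the `%` note to self stays in the plain text source but never reaches the HTML.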

Another utility that partly addresses this issue is Markdown. Markdown borrows conventions used in email messages and adds further features for formatting text. It converts its syntax to HTML, which allows users to create valid HTML documents with a syntax already familiar to them from reading email.
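As a rough illustration of those email conventions, the following Python sketch converts three of them to HTML: `>` quoted lines, `*emphasis*`, and `**strong**`, with blank lines separating paragraphs. It is a hypothetical toy under those assumptions, not the real Markdown implementation, which also covers lists, links, code, and much more; `md_inline` and `md_to_html` are illustrative names.

```python
import html
import re

# Hypothetical toy converter for three email-style conventions that
# Markdown adopted. Real Markdown handles far more (lists, links, code...).
#   > quoted line  -> <blockquote>
#   **strong**     -> <strong>
#   *emphasis*     -> <em>
#   blank line     -> paragraph break


def md_inline(text: str) -> str:
    """Escape HTML, then turn **x** into <strong> and *x* into <em>."""
    text = html.escape(text, quote=False)
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)  # must run before the single-* rule
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    return text


def md_to_html(source: str) -> str:
    out = []
    # Paragraphs are separated by one or more blank lines, as in email.
    for block in re.split(r"\n\s*\n", source.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        if all(line.startswith(">") for line in lines):
            inner = " ".join(md_inline(line[1:].strip()) for line in lines)
            out.append(f"<blockquote><p>{inner}</p></blockquote>")
        else:
            out.append("<p>" + md_inline(" ".join(lines)) + "</p>")
    return "\n".join(out)


message = "> Did you get my note?\n\nYes, it was *very* **clear**."
print(md_to_html(message))
```

The quoted question becomes a `<blockquote>` and the asterisks become `<em>` and `<strong>`. The `**` rule runs before the `*` rule so that a pair of double asterisks is not misread as nested single-asterisk emphasis.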

There are other markup systems, such as reStructuredText, that go further than Markdown in producing multiple output formats. They all have advantages and disadvantages. In my opinion, txt2tags has the advantage of offering multiple output formats besides HTML, and it is aimed at a wider audience than Markdown or other systems. By storing a txt2tags document together with its LaTeX and PDF versions in a single archive, the document text, its notes, and its visual representation can be preserved over a long period of time. The same source document can also produce HTML and various wiki markups. It may not be a perfect solution, but it goes a long way toward one.
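The single-archive idea can be sketched in Python with the standard `tarfile` module. The file names used below (`journal.t2t`, `journal.tex`, `journal.pdf`) and the function name `archive_document` are hypothetical, and the sketch assumes the LaTeX and PDF versions were already produced, for example by running txt2tags with its `tex` target and then pdflatex.

```python
import tarfile
from pathlib import Path


def archive_document(archive_path: str, *members: str) -> str:
    """Store a plain text source and its rendered versions in one .tar.gz.

    Each file is added under its bare name, so the archive unpacks with
    the source sitting next to its LaTeX and PDF output.
    """
    paths = [Path(m) for m in members]
    # Check every member first so a missing file never leaves a half-written archive.
    missing = [str(p) for p in paths if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"missing archive members: {missing}")
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in paths:
            tar.add(str(path), arcname=path.name)
    return archive_path


# Hypothetical usage, assuming the three files already exist:
# archive_document("journal.tar.gz", "journal.t2t", "journal.tex", "journal.pdf")
```

A plain tar or a zip file would serve equally well; what matters is that the plain text source, its embedded notes, and its rendered appearance travel together.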

These utilities preserve plain text without sacrificing presentation. They let you have your cake and eat it too, instead of forcing you to choose one or the other.

Ironically, plain text, the archaic format looked down upon during the rise of the word processor (with its potential to lock customers into a single vendor's product), is the format best suited to unseat the word processor from its dominant position. Word processors are a relic of a pre-networked world dominated by printed documents. They are ill-suited to today's instantly published, Internet-connected, platform-neutral world, where a document is more likely to appear on a blog or a wiki than to be printed.
Posted in Uncategorized


  1. Old Comment
    Insightful and very well written. Reading an article like this brings one into a very good mood.

    /* Warning, below code may muck up human parsers not expecting opinion */

    The ultimate word processing utility would be a WYSIWYM (what you see is what you mean, like LyX) centered around HTML and CSS.

    Because presentational styling is deprecated in HTML itself, users would learn to use formatting styles (Title 1, 2, 3, bullet list, block text, normal, etc.) rather than apply formatting to sections bit by bit with the array of settings on the toolbar.

    With nearly everyone in the world calling text files 'word documents' and slideshows 'powerpoints', users have lost sight of word processing concepts and replaced them with product skills. No one I know besides myself uses styles to format text; they prefer to change it to the required format with the toolbars every time. When I teach them how to apply and change styles, multiple 'oohs' and 'aahs' come out as they discover that they can save styles as templates, so that every new document opens with them, and that they can change a style across the whole document instantly.

    Another benefit of an HTML-based system is that the document could be edited on any computer on the planet, with or without extra software. In practice, though, it would mainly be edited with a dedicated program that makes HTML authoring work like LyX (not like a WYSIWYG HTML editor), and therefore more user-friendly for note taking, diaries/journals, school work, reports, etc.

    All the content could be put in a tarball (or a zip file for cross-platform compatibility), or the CSS could be inlined if the document is a single page. On that note, multiple pages (not paper-sized pieces of a document, but documents in their own right) could be kept together in one tarball and share resources such as stylesheets and images.

    A few months ago I tried to start a computer diary again. Written in HTML with nano, it looked good on paper, in elinks, and in my graphical browser. I didn't use any custom styles, favoring headings (<h1> etc.), horizontal rules, and HTML tables. The only reason I stopped was that my holiday ended and I didn't have time to continue it.

    We need to bring people out of the dark ages of proprietary formats that look different on every computer, toward formats that are future-proof through international standards and formatted so that the document is not sensitive to every pixel of change.
    Posted 04-02-2010 at 05:49 AM by William (Dthdealer)


All times are GMT -5.