By jeremy at 2006-06-29 13:52
Tidy Up Those Tags
Features - Power Tools
Written by Jeremy Garcia
Web sites are rarely maintained by the same person over a long period of time. Much more often, web pages pass from developer to developer. And since the skill (and style) of developers can differ greatly, a site’s HTML can grow to be inconsistent, even ugly, sloppy, and non-compliant with standards.
If you inherit such dubious HTML, you may long for a tool to help clean up and validate the code. Luckily, such a tool exists: it's called HTML Tidy, and it's available from http://tidy.sourceforge.net/. HTML Tidy can automatically fix a wide range of coding errors, and it can also reformat sloppy markup into neatly indented, readable code (often called pretty printing). HTML Tidy can even make the (often extremely ugly) output of specialized HTML editors such as FrontPage readable by a human.
HTML Tidy was originally written by Dave Raggett and is licensed under the W3C license.
After downloading and unpacking the source tarball, run the following from the top-level source directory:
$ /bin/sh build/gnuauto/setup.sh
You can then use the usual ./configure && make && make install sequence to complete the installation, which leaves you with the tidy utility.
In addition to fixing a wide range of coding problems, tidy can also highlight things that you need to fix manually. tidy reports each item with its line and column number, so you can easily see where the problem lies in your markup. To be safe, tidy won't generate a cleaned-up version when it finds problems it's unsure how to handle; it reports these as errors rather than warnings.
Some of the things tidy can fix automatically include missing or mismatched end tags, end tags in the wrong order, and missing slashes (/) in the end tags of anchors. tidy understands and can help correct both HTML and XHTML. Its XML support is limited: it doesn't recognize CDATA sections or DTD subsets, among other features. It's aware of ASP, JSTE, and PHP and can cope with them, but it doesn't understand the scripting languages themselves. As a result, it can easily get confused and may report missing attributes when they actually appear inside such code. Nested quotation marks on a single line of script code can also throw tidy off.
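To see the kind of report described above, you can feed tidy a deliberately broken page. The script below is a minimal sketch, not something from the original article: the function name check_sample and the file name broken.html are hypothetical, and it assumes tidy is on your PATH. It uses -e, which tells tidy to report errors and warnings only, and prints tidy's exit status (0 when clean, 1 for warnings only, 2 when errors were found).

```shell
#!/bin/sh
# Hypothetical helper (not part of tidy): write a small page containing
# mistakes tidy knows how to report, then show only its diagnostics.
# Assumes the tidy binary is on PATH.
check_sample() {
    sample=${1:-broken.html}    # hypothetical file name
    # Deliberate problems: no DOCTYPE, no <title>, and end tags
    # closed in the wrong order (</b></i> instead of </i></b>).
    cat > "$sample" <<'EOF'
<html>
<body>
<p><b><i>Bold italic text</b></i></p>
</body>
</html>
EOF
    status=0
    # -e: report errors and warnings only (no tidied markup is written)
    # -q: suppress nonessential output
    tidy -e -q "$sample" || status=$?
    # Exit codes: 0 = clean, 1 = warnings only, 2 = errors
    echo "tidy exit status: $status"
}

# Example use: diagnostics (with line and column) go to stderr,
# the status line goes to stdout.
#   check_sample broken.html
```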
Now that you understand a little about what tidy does, let's start using the program. By default, tidy reads from stdin and writes to stdout. Errors go to stderr by default, but you can use the -f file option to write them to a file instead.
The two main modes you can run tidy in are -m, which modifies the original input files in place, and -o file, which writes the output to the named file. So, the command:
$ tidy -f linuxmag.err -o linuxmag.out linuxmag.html
takes the file linuxmag.html as input and writes the tidied output to linuxmag.out and errors to linuxmag.err.
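If you run the same kind of command often, a small wrapper saves typing. The sketch below is an assumption of this edit, not something tidy provides: the function name tidy_to_files is hypothetical, it expects tidy on your PATH, and it derives the .out and .err names from the input file the same way the example above does. It returns tidy's own exit status.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of tidy): tidy FILE.html, writing the
# tidied markup to FILE.out (-o) and the diagnostics to FILE.err (-f).
# Assumes tidy is on PATH. Returns tidy's exit status (0, 1 or 2).
tidy_to_files() {
    in=${1:?usage: tidy_to_files FILE.html}
    base=${in%.html}            # strip the .html suffix, if present
    status=0
    tidy -f "$base.err" -o "$base.out" "$in" || status=$?
    return "$status"
}

# The other two styles described above, for comparison:
#   tidy < linuxmag.html > linuxmag.out 2> linuxmag.err   # stdin/stdout
#   tidy -m linuxmag.html                                 # modify in place
```

Usage: tidy_to_files linuxmag.html produces linuxmag.out and linuxmag.err next to the input.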
If you don't use the -f option to send errors to a file, piping tidy's output to a pager such as less won't let you page through the error messages, because they go to stderr rather than through the pipe. You can avoid this problem by redirecting stderr to stdout:
$ tidy -o linuxmag.out linuxmag.html 2>&1 | less
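In the command above the tidied markup already goes to linuxmag.out, so only the diagnostics reach less. If you want the same effect without writing an output file, the order of the shell redirections does the work. This is a minimal sketch with a hypothetical function name (tidy_diagnostics), assuming tidy is on your PATH.

```shell
#!/bin/sh
# Hypothetical helper: print only tidy's diagnostics on stdout so they
# can be piped to less, grep, wc, and so on. Assumes tidy is on PATH.
tidy_diagnostics() {
    # Redirection order matters: 2>&1 first points stderr at the pipe,
    # then >/dev/null discards the tidied markup written to stdout.
    tidy "$@" 2>&1 >/dev/null
}

# Example use:
#   tidy_diagnostics linuxmag.html | less
#   tidy_diagnostics linuxmag.html | grep -c Warning
```

Writing the redirections the other way round (>/dev/null 2>&1) would discard both streams, leaving nothing to page.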
To make it easier to apply several advanced options at once, tidy has a -config option that reads them from a configuration file. Refer to the "Quick Reference" on the HTML Tidy web site for a full list of options. In addition, tidy supports many other command-line options, such as -clean, which replaces FONT, NOBR, and CENTER tags with CSS, and -indent, which indents element content. For a full list of command-line options, type tidy -?.
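A configuration file holds one option: value pair per line. The sketch below writes a small one and runs tidy with it. The function and file names are hypothetical, and the chosen values are only one possible house style; the option names (indent, indent-spaces, wrap, clean, tidy-mark) are HTML Tidy configuration options, but check the Quick Reference for your version. clean: yes is the config-file counterpart of -clean, and tidy-mark: no stops tidy from adding its generator meta tag.

```shell
#!/bin/sh
# Hypothetical example: keep frequently used options in one file and
# pass it with -config. Assumes tidy is on PATH.
write_tidy_config() {
    conf=${1:-tidy.conf}        # hypothetical file name
    cat > "$conf" <<'EOF'
indent: auto
indent-spaces: 2
wrap: 72
clean: yes
tidy-mark: no
EOF
}

# Run tidy with a config file; any further arguments are passed through.
tidy_with_config() {
    conf=${1:?usage: tidy_with_config CONFIG [tidy args...]}
    shift
    tidy -config "$conf" "$@"
}

# Example use:
#   write_tidy_config tidy.conf
#   tidy_with_config tidy.conf -o linuxmag.out linuxmag.html
```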
The tidy source also comes with TidyLib, which, as the name suggests, is a library version of tidy; it is also thread-safe. In addition to TidyLib, bindings for C++, Perl, COM, .NET, and PHP are available. As of PHP 5, tidy is a standard extension that can be enabled with the --with-tidy configure option.
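Before relying on the PHP extension, you may want to confirm that your PHP build actually includes it. This is a minimal sketch, not from the original article: the function name php_has_tidy is hypothetical and it assumes the php command-line binary is on your PATH. It uses PHP's standard extension_loaded() function and turns the answer into a shell exit status.

```shell
#!/bin/sh
# Hypothetical check: does the installed PHP CLI include the tidy
# extension (for example, a PHP 5 build configured with --with-tidy)?
# Assumes php is on PATH. Exit status 0 = loaded, 1 = not loaded.
php_has_tidy() {
    php -r 'exit(extension_loaded("tidy") ? 0 : 1);'
}

# Example use:
#   if php_has_tidy; then echo "tidy extension available"; fi
```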
Using tidy, you can quickly and easily clean up HTML code and check it against current standards. While it can't fix everything, it alerts you to what it cannot fix, so you can handle those cases by hand; the rest is automatic. The latest version can even handle the often unruly HTML that Word 2000 produces, and PHP 5's tidy extension lets you correct and pretty print markup on the fly.