LinuxQuestions.org
08-05-2004, 10:16 PM   #1
cpanelskindepot
Member
 
Registered: Jun 2004
Posts: 43

Rep: Reputation: 15
Remove white spaces in HTML files using shell: possible?


Hi guys.

I want to compress my HTML files automatically with a shell script, and I wonder whether this is possible.

Say I have:
<HTML>
<HEAD>
<TITLE>Example Web Page</TITLE>
</HEAD>
<body>
<p>You have reached this web page by typing example.com,
example.net,
or example.org into your web browser.</p>
<p>These domain names are reserved for use in documentation and are not available
for registration. See <a href="http://www.rfc-editor.org/rfc/rfc2606.txt">RFC
2606</a>, Section 3.</p>
</BODY>
</HTML>

I want to run a shell script so that it becomes:

<HTML><HEAD><TITLE>Example Web Page</TITLE></HEAD> <body><p>You have reached this web page by typing example.com,
example.net,
or example.org into your web browser.</p><p>These domain names are reserved for use in documentation and are not available
for registration. See <a href="http://www.rfc-editor.org/rfc/rfc2606.txt">RFC
2606</a>, Section 3.</p></BODY></HTML>

I want all the HTML squeezed together.
The only way I can think of is to detect the whitespace between > and < and get rid of it. I need to be careful not to delete whitespace inside the text itself, i.e. not end up with Youhavereachedthiswebpage.

Is this possible?
There must be some way to do it I think.
 
08-05-2004, 10:51 PM   #2
arvind_sv
Member
 
Registered: Oct 2002
Location: Bangalore
Distribution: Gentoo Linux
Posts: 96

Rep: Reputation: 15
Suppose you have an HTML file, test.html, which you want to, er, compress. How about this?

Code:
tr -s '\n' @ <test.html | sed 's/>@</></g' | tr '@' '\n'
There are many other ways to do this. But are you ready to sacrifice readability for a few bytes? No real compression is being done here: each joined line saves just one newline byte, so a 1000-line HTML file shrinks by about 1 KB at most. Is it worth it?
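To put a number on the saving for a particular file, something like the following rough sketch could be used. The function name bytes_saved and the throwaway demo file are assumptions made for illustration. It reuses the tr/sed pipeline above, so it shares the same @ caveat, and the exact count can differ by a byte with a sed that appends a trailing newline.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption): report the byte counts
# before and after running the tr/sed pipeline on one file.
bytes_saved() {
    # $(( ... )) also strips the padding that some wc implementations print
    before=$(( $(wc -c < "$1") ))
    after=$(( $(tr -s '\n' '@' < "$1" | sed 's/>@</></g' | tr '@' '\n' | wc -c) ))
    echo "$before bytes before, $after after, $((before - after)) saved"
}

# Demonstration on a throwaway file
tmp=$(mktemp)
printf '%s\n' '<HTML>' '<HEAD>' '<TITLE>Example Web Page</TITLE>' '</HEAD>' '</HTML>' > "$tmp"
bytes_saved "$tmp"
rm -f "$tmp"
```

Each joined tag boundary removes exactly one byte, which is why the total stays so small.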

Anyway, to be safer, so that an @ in the HTML file itself does not cause any problems, try this (the \o200 octal escape is a GNU sed extension):

Code:
tr -s '\n' '\200' <test.html | sed 's/>\o200</></g' | tr '\200' '\n'
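Another way to sidestep the placeholder character altogether, sketched here as an alternative technique rather than a variant of the pipelines above, is to let sed collect the whole file in its pattern space and delete the newlines that sit directly between > and <. The function name squeeze_tags is an assumption made for illustration. Unlike tr -s, this leaves blank lines inside text untouched.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption): join lines wherever one
# tag ends a line and the next tag starts the following line.
squeeze_tags() {
    # :a, $!N, $!ba  -> keep appending lines until the last one is read
    # s/>\n\n*</></g -> drop runs of newlines sitting directly between > and <
    sed -e ':a' -e '$!N' -e '$!ba' -e 's/>\n\n*</></g'
}

# Small demonstration input
printf '%s\n' '<HTML>' '<HEAD>' '<TITLE>Example</TITLE>' '</HEAD>' \
    '<body>' '<p>one,' 'two</p>' '</BODY>' '</HTML>' | squeeze_tags
```

Because the whole file is held in memory at once, this suits ordinary web pages better than very large files.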
Arvind
 
08-05-2004, 11:14 PM   #3
arvind_sv
Member
 
Registered: Oct 2002
Location: Bangalore
Distribution: Gentoo Linux
Posts: 96

Rep: Reputation: 15
Here's an awk version of the same (it relies on a regular expression as RS, which gawk and mawk accept):

Code:
awk 'BEGIN {RS = ">\n+<"} {if (/.*HTML>/ != 0) print; else printf "%s><", $0;}' test.html
A simpler version:

Code:
awk 'BEGIN {RS = ">\n+<"} {printf "%s><", $0;}' test.html
This has a small flaw in that it prints an extra "><" at the end of the file.
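One possible way around that flaw, as a sketch only, is to print the "><" separator before every record except the first, so nothing is left dangling after the last one. The function name join_tags is an assumption made for illustration, and the same reliance on a regular-expression RS applies.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption). Requires an awk that
# accepts a regular expression as RS, such as gawk or mawk.
join_tags() {
    awk 'BEGIN { RS = ">\n+<" }
         NR > 1 { printf "><" }   # separator goes before each later record
         { printf "%s", $0 }      # record text, with no newline added
    ' "$@"
}

# Small demonstration input
printf '%s\n' '<HTML>' '<HEAD>' '</HEAD>' '</HTML>' | join_tags
```

It can also be run directly on a file, for example join_tags test.html.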

Arvind
 
  

