How to "Search & Replace" in HTML files using Perl?
Hi,
I am a Perl newbie, so please be gentle with me. :p I wish to insert <base href=http://www.ab.com> into an HTML file (say index.html) using a Perl script or Unix command. How do I do that? Any advice is much appreciated. |
In Perl, you can open an existing file for reading or appending, but to the best of my knowledge there is no simple way to insert text into the middle of one in place. You'll have to make a new file, write what you want at the beginning, and then add the stuff from the old file. Finally, you can rename the files. The nice part of doing it this way is that you still have a copy of the original if you break things.
The basic steps are as follows (you'll have to modify the examples for your exact case):
Open a new file for writing, where $outfile is a variable containing the filename, complete with path:
    open (OUT, "> $outfile") || die "Cannot create file $outfile: $!";
Open the existing file you wish to modify, where $file is a variable containing the name of the file you wish to read from:
    open (FILE, "$file") || die "Cannot open file $file: $!";
Write the line you want to the new file:
    print OUT "some text goes here\n";
Loop through the lines of the old file, writing them to the new one (each line already ends with its own newline, so don't add another):
    while ($line = <FILE>) { print OUT $line; }
Close the file handles:
    close (FILE) || die "Can't close file $file\n";
    close (OUT) || die "Can't close file $outfile\n";
Rename the files:
    rename ($file, "$file.bak");
    rename ($outfile, $file);
This may not be exactly what you need, and I haven't written a Perl script in a while, so there may be errors; but it will get you started. Also, there are examples of this on the web if you search for filehandles in Perl. Finally, a good Perl book is essential if you plan on writing scripts. |
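Put together, the steps above could look something like the following. This is only a sketch under a few assumptions: the sub name prepend_line and the temporary name "$file.new" are made up for illustration, and it uses lexical filehandles with three-argument open instead of the bareword handles shown above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: writes $text as the first line of $file and
# keeps the original content in "$file.bak".
sub prepend_line {
    my ($file, $text) = @_;
    my $outfile = "$file.new";    # temporary name, an assumption

    open(my $out, '>', $outfile) or die "Cannot create file $outfile: $!";
    open(my $in,  '<', $file)    or die "Cannot open file $file: $!";

    print {$out} "$text\n";       # the new line goes first
    while (my $line = <$in>) {
        print {$out} $line;       # $line still carries its own newline
    }

    close($in)  or die "Can't close file $file: $!";
    close($out) or die "Can't close file $outfile: $!";

    rename($file, "$file.bak") or die "Cannot back up $file: $!";
    rename($outfile, $file)    or die "Cannot replace $file: $!";
    return;
}

# Usage: perl prepend.pl index.html "some text goes here"
prepend_line(@ARGV) if @ARGV == 2;
```

Checking the return value of rename means a failed backup stops the script before the original is overwritten.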
I should read more carefully. If you want to use UNIX commands, "cat" will work for this. The man page is fairly straightforward. This will make for a much simpler script than using Perl.
|
Quote:
I toyed with appending and then substitution in a Perl script, but to no avail.
With appending, it always adds to the last line in the file. <base href=http://www.abc.com> in the last line is pretty useless. Your comment that "appending can't modify" finally enlightened me that this is not the way to go.
With substitution, I don't know why, but it just didn't work! For example, I want to replace <head> with <head><base href=http://www.abc.com> (which in effect adds the base href tag after the head tag):
    s/<head>/<head><base href=http:\/\/www.abc.com>/g;
Nothing was changed at all. |
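A likely reason nothing changed: on its own, s/// only edits the string held in $_ (or whatever variable it is bound to with =~); it never touches a file. The script still has to read the lines in, substitute, and write them back out. A minimal sketch of that, assuming the whole file fits comfortably in memory (the sub name add_base_tag is hypothetical):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: inserts a <base> tag after the first <head> in $file.
# Returns 1 if a <head> tag was found, 0 otherwise.
sub add_base_tag {
    my ($file, $url) = @_;

    open(my $in, '<', $file) or die "Cannot open file $file: $!";
    my @lines = <$in>;            # read every line into memory
    close($in);

    my $done = 0;
    for my $line (@lines) {
        # s{}{} delimiters avoid having to escape the slashes in the URL;
        # once a match has been made, later lines are left alone.
        $done = 1 if !$done && $line =~ s{<head>}{<head><base href=$url>};
    }

    # Overwrites the original; copy it first if you want a backup.
    open(my $out, '>', $file) or die "Cannot write file $file: $!";
    print {$out} @lines;
    close($out) or die "Can't close file $file: $!";
    return $done;
}

# Usage: perl addbase.pl index.html http://www.abc.com
add_base_tag(@ARGV) if @ARGV == 2;
```

The match is case-sensitive, so a file that uses <HEAD> would be left unchanged and the sub would return 0.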
Quote:
> cat index.html | sed -e 's/<head>/<head><base href=http:\/\/www.miningnews.net>/' |
What finally did work is this Unix command:
> perl -pi -e 's/<head>/<head><base href=http:\/\/www.abc.com>/' index.html
Thanks again, smannell! :) |
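For anyone wondering why the one-liner works where the bare substitution did not: -p wraps the code in a loop that reads each line into $_ and prints it afterwards, and -i sends that output back into the file being read. Roughly the same thing can be written inside a script by setting $^I, the variable behind -i. This is a sketch only; the sub name edit_in_place is hypothetical, and the '.bak' backup suffix is an assumption (the command above keeps no backup).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: approximately what
#   perl -pi.bak -e 's/<head>/<head><base href=...>/' FILE
# does, written out as a sub.
sub edit_in_place {
    my ($file, $url) = @_;

    # $^I turns on in-place editing for files read through <>;
    # '.bak' asks Perl to keep the original as "$file.bak".
    local ($^I, @ARGV) = ('.bak', $file);

    while (<>) {                  # -p: read each line into $_ ...
        s{<head>}{<head><base href=$url>};
        print;                    # ... and print it, which now lands in the file
    }
    return;
}

# Usage: perl inplace.pl index.html http://www.abc.com
edit_in_place(@ARGV) if @ARGV == 2;
```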
I see the thread is finished, but here is the easiest (imo) way ;)
Code:
open (IN,$file) or die "$!"; #<- open the file |
Hi ivantora,
Thanks for your response. Do we leave IN and OUT in the perl script, or replace them with the filenames? How do we run the perl script if we use IN and OUT? |
IN and OUT are just filehandles. The thing you need is to set $file before the other lines (forgot to do that ;) ). Like
$file = "/home/ivanatora/bleh/index.html"; |