LinuxQuestions.org


rebel 04-01-2005 11:14 AM

How to "Search & Replace" in html files using Perl?
 
Hi,

I am a Perl newbie, so please be gentle with me. :p


I wish to insert <base href=http://www.ab.com> into an HTML file (say index.html) using a Perl script or a Unix command. How do I do that?

Any advice much appreciated.

smannell 04-01-2005 02:04 PM

In Perl, you can open an existing file for reading or for appending, but to the best of my knowledge not for modifying. You'll have to create a new file, write what you want at the beginning, and then copy in the contents of the old file. Finally, rename the files. The nice part of doing it this way is that you still have a copy of the original if you break something.

The basic steps are as follows (you'll have to modify the examples for your exact case):

Open a new file for writing
open (OUT, "> $outfile") || die "Cannot create file $outfile: $!";
where $outfile is a variable containing the filename, complete with path

Open the existing file you wish to modify
open (FILE, "$file") || die "Cannot open file $file: $!";
where $file is a variable containing the name of the file you wish to read from

Write the line you want to the new file
print OUT "some text goes here\n";

Loop through the lines of the old file writing them to the new one
while ($line = <FILE>) {
    print OUT $line;    # $line already ends in "\n", so don't add another
}

Close the file handles
close (FILE) || die "Can't close file $file\n";
close (OUT) || die "Can't close file $outfile\n";

Rename the files
rename ($file, "$file.bak");
rename ($outfile, $file);
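Put together, the steps above could look roughly like the script below. This is a sketch, not smannell's exact code: it adds use strict, lexical filehandles and three-argument open, and the prepend_line helper and the temporary "$file.new" name are hypothetical. Note that it writes the new line at the very top of the file, before <html>, not inside <head>.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: write $new_text, then the old contents, to a temporary
# file, then swap the files, keeping the original as "$file.bak".
sub prepend_line {
    my ($file, $new_text) = @_;
    my $outfile = "$file.new";    # assumed name for the temporary file

    open(my $out, '>', $outfile) or die "Cannot create file $outfile: $!";
    open(my $in,  '<', $file)    or die "Cannot open file $file: $!";

    print {$out} "$new_text\n";       # the new first line
    while (my $line = <$in>) {
        print {$out} $line;           # $line already ends in "\n"
    }

    close($in)  or die "Can't close file $file: $!";
    close($out) or die "Can't close file $outfile: $!";

    rename($file, "$file.bak") or die "Can't rename $file: $!";
    rename($outfile, $file)    or die "Can't rename $outfile: $!";
}

# Process every file named on the command line, e.g. perl prepend.pl index.html
prepend_line($_, '<base href=http://www.ab.com>') for @ARGV;
```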

This may not be exactly what you need, and I haven't written a Perl script in a while, so there may be errors, but it will get you started. There are also examples of this on the web if you search for filehandles in Perl. Finally, a good Perl book is essential if you plan on writing scripts.
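For the record, Perl's open does accept a "+<" mode, which opens an existing file for both reading and writing, so a file can also be modified through a single handle: read it, seek back to the start, write the new contents, and truncate whatever is left over. A sketch under that approach (add_base_tag is a hypothetical name); unlike the new-file method, it keeps no backup.

```perl
use strict;
use warnings;

# Hypothetical helper: insert <base href=$url> after <head>, in place.
sub add_base_tag {
    my ($file, $url) = @_;
    open(my $fh, '+<', $file) or die "Cannot open $file: $!";   # read/write, no truncation
    my $html = do { local $/; <$fh> };                          # slurp the whole file
    $html =~ s{<head>}{<head><base href=$url>};
    seek($fh, 0, 0)          or die "Cannot seek in $file: $!"; # back to the start
    print {$fh} $html;                                          # overwrite with new text
    truncate($fh, tell($fh)) or die "Cannot truncate $file: $!"; # drop any leftover tail
    close($fh)               or die "Cannot close $file: $!";
}
```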

smannell 04-01-2005 02:07 PM

I should read more carefully. If you want to use UNIX commands, "cat" will work for this. The man page is fairly straightforward. This will make for a much simpler script than using Perl.

rebel 04-02-2005 08:18 AM

Quote:

Originally posted by smannell
In Perl, you can open an existing file for reading, or appending; but to the best of my knowledge not for modifying.

Thanks very much for your response, smannell. Very much appreciated. :)

I toyed with appending and then substitution in a Perl script but to no avail.


With appending, the text always goes at the end of the file, and <base href=http://www.abc.com> on the last line is pretty useless.

Your comment that "appending can't modify" finally made me realize that this is not the way to go.



With substitution, I don't know why, but it just didn't work!

For example, I want to replace <head> with <head><base href=http://www.abc.com>
(which in effect adds the base href tag after the head tag)

s/<head>/<head><base href=http:\/\/www.abc.com>/g;

Nothing was changed at all.
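The pattern itself is fine: applied to a string that contains <head>, it inserts the tag as intended. Since the script wasn't posted, these are only guesses at why nothing changed: s/// edits a string in memory (by default $_), so the file on disk stays the same unless the edited text is written back out; and <head> does not match an uppercase <HEAD> unless the /i flag is added. A small sketch (add_base is a hypothetical helper; putting the URL in a variable also avoids escaping its slashes):

```perl
use strict;
use warnings;

# Hypothetical helper: returns a copy of $html with <base href=$url> added
# right after the <head> tag. /i also matches <HEAD>; $1 keeps the original case.
sub add_base {
    my ($html, $url) = @_;                  # $html is a copy of the caller's string
    $html =~ s/(<head>)/$1<base href=$url>/gi;
    return $html;                           # only this copy changes; no file is touched
}

print add_base("<HTML><HEAD><TITLE>Test</TITLE></HEAD>\n", 'http://www.abc.com');
```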

rebel 04-02-2005 08:25 AM

Quote:

Originally posted by smannell
I should read more carefully. If you want to use UNIX commands, "cat" will work for this. The man page is fairly straightforward. This will make for a much simpler script than using Perl.

For some reason, the "cat and sed" Unix command didn't work.


> cat index.html | sed -e 's/<head>/<head><base href=http:\/\/www.miningnews.net>/'

rebel 04-02-2005 08:26 AM

What finally did work is this Unix command:


> perl -pi -e 's/<head>/<head><base href=http:\/\/www.abc.com>/' index.html
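This works because -p wraps the expression in a loop that reads each line into $_ and prints it afterwards, and -i (which sets $^I) sends that output back into the file being read instead of to the screen. The cat/sed pipeline, by contrast, only prints the edited text to standard output and leaves index.html unchanged. A rough script equivalent of the one-liner (edit_in_place is a hypothetical name; setting $^I to '.bak' keeps a backup, like -i.bak):

```perl
#!/usr/bin/perl
# Rough script equivalent of:
#   perl -pi -e 's/<head>/<head><base href=http:\/\/www.abc.com>/' index.html
use strict;
use warnings;

sub edit_in_place {
    my @files = @_;
    local $^I   = '';        # like -i: rewrite each file in place, no backup
    local @ARGV = @files;    # the <> operator reads the files named in @ARGV
    while (<>) {             # like -p: loop over every line into $_ ...
        s{<head>}{<head><base href=http://www.abc.com>};
        print;               # ... and print it, which goes into the rewritten file
    }
}

edit_in_place(@ARGV) if @ARGV;
```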



Thanks again, smannell! :)

ivanatora 04-06-2005 05:17 PM

I see the thread is finished, but here is the easiest (imo) way ;)
Code:

open (IN, $file) or die "$!";       #<- open the file
@in = <IN>;                         #<- read it into an array
close(IN);                          #<- we don't need it anymore, so close it
foreach $line (@in) {               #<- for each element of the array
    $line =~ s|<head>|<head><base href=http://www.abv.com>|;  #<- do the substitution; notice that you can use characters other than '/' as the s/// delimiters, and then you don't need to escape the /'s ;)
}
open (OUT, "> $file") or die "$!";  #<- open the file for writing
print OUT @in;                      #<- dump the modified content into it
close(OUT);                         #<- and close it
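The loop works without assigning anything back because foreach makes $line an alias for each element of @in, so editing $line edits the array itself. A minimal check of that behaviour, using made-up sample lines:

```perl
use strict;
use warnings;

my @in = ("<html>\n", "<head>\n", "<title>Test</title>\n");    # made-up sample lines

foreach my $line (@in) {
    # $line is an alias, not a copy: this changes the element inside @in
    $line =~ s|<head>|<head><base href=http://www.abv.com>|;
}

print @in;    # the second element now carries the <base> tag
```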

It's not difficult :) And you don't have to mix bash commands, Perl functions, and other stuff. Hope I helped :)

rebel 04-07-2005 08:11 PM

Hi ivanatora,

Thanks for your response.

Do we leave IN and OUT in the Perl script, or replace them with the filenames? How do we run the Perl script if we use IN and OUT?

ivanatora 04-09-2005 12:58 PM

IN and OUT are just filehandles. What you need is to set $file before the other lines (I forgot to do that ;) ), like:
$file = "/home/ivanatora/bleh/index.html";
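To cover the second question too: IN and OUT stay exactly as written, and the script is run with perl. A sketch that takes the file name from the command line instead of hard-coding $file (the script name addbase.pl is hypothetical):

```perl
#!/usr/bin/perl
# Hypothetical script name: addbase.pl
# Run it as:  perl addbase.pl /home/ivanatora/bleh/index.html
use strict;
use warnings;

sub main {
    # Take the file name from the command line instead of hard-coding $file.
    my $file = shift @ARGV or die "Usage: perl $0 FILE\n";

    open(IN, '<', $file) or die "$file: $!";    # IN is just a filehandle name
    my @in = <IN>;
    close(IN);

    foreach my $line (@in) {
        $line =~ s|<head>|<head><base href=http://www.abv.com>|;
    }

    open(OUT, '>', $file) or die "$file: $!";   # OUT is another filehandle name
    print OUT @in;
    close(OUT) or die "$file: $!";
}

main() if @ARGV;
```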


All times are GMT -5.