LinuxQuestions.org - [SOLVED] bash script to dynamically edit an html file

Hmmm...

So I tried both of these awk one-liners and they don't quite seem to work, though the first one seems to work better. This is the command I tried:

Code:

awk 'BEGIN{FS="[\\||>*<*]"}ARGV[1] == FILENAME{_[$1]=$2}ARGV[2] == FILENAME{if($5 in _){match($2,/[0-9]+/,pc);gsub(pc[0],pc[0]/2)}print $0"\n"gensub($5,_[$5],2)}' ipsandhostlist.txt sedtest.html

This script turned this:

Code:

  <tr>
        <td class=default width="30%"><a href="#172_27_1_107">172.27.1.107</a></td>
        <td class=default width="40%">Security warning(s) found</td></tr>
  <tr>

into this:

Code:

0 0 0 0<0t0r0>0
0 0 0 0<0t0r0>0
        <td class=default width="15%"><a href="#172_27_1_58">172.27.1.58</a></td>
        <td class=default width="15%"><a href="#172_27_1_58">usbedtstee01.hologic.corp</a></td>
        <td class=default width="20%">Security warning(s) found</td></tr>
        <td class=default width="20%">Security warning(s) found</td></tr>
0 0 0 0<0t0r0>0
0 0 0 0<0t0r0>0

So it did match the hostname to the ip (which is awesome!), but it also duplicated all of the other lines and inserted zeroes between the characters on most of them. So how is this script finding the ip? Is it using the regex in:

Code:

{match($2,/[0-9]+/,pc);gsub(pc[0],pc[0]/2)}

or is it pulling it from a specific field?

Oh, and for what it's worth, the second script does the same as the first, except it doesn't match the hostname to the ip. It duplicates every line and inserts the extra zeroes.

Thanks for the help so far!

Using GNU sed and bash

Code:

while read -r ip hostname;do
  # Address pattern: the table cell whose anchor is the ip with underscores
  line='<td class=default width="60%"><a href="#'${ip//./_}'">'
  ip=${ip//./\\.}        # Escape the dots: 172\.27\.1\.107
  sed -i "/$line/{ s/60%/30%/; h; s/$ip/$hostname/; x; G }" nessus.html
done < hostnames.txt

The above method runs sed once for each ip/hostname. If there are a large number of ip/hostnames it would be more efficient to create a script of sed commands so that all the editing of the file can be done with just a single run of sed.

Code:

> sedscript        # Start with an empty sedscript file
while read -r ip hostname;do
  # Address pattern: the table cell whose anchor is the ip with underscores
  line='<td class=default width="60%"><a href="#'${ip//./_}'">'
  ip=${ip//./\\.}        # Escape the dots: 172\.27\.1\.107
  echo "/$line/{ s/60%/30%/; h; s/$ip/$hostname/; x; G }" >> sedscript
done < hostnames.txt

sed -f sedscript nessus.html > newnessus.html
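For anyone who wants to watch the sedscript approach run before pointing it at a real report, here is a minimal sketch in bash with GNU sed. The two hostnames and the three-line nessus.html are made up for illustration, and everything happens in a throwaway directory. It writes the commands with printf and closes the group with "G; }", which is a slightly more portable spelling of the same thing.

```bash
# Sketch only: the file contents and hostnames below are invented for the demo.
tmp=$(mktemp -d) && cd "$tmp" || exit 1

# "ip hostname" pairs, one per line (hypothetical hosts).
printf '%s\n' '172.27.1.107 host107.example.com' \
              '172.27.1.58 host58.example.com' > hostnames.txt

# A fragment in the same shape as the rows quoted earlier in the thread.
cat > nessus.html <<'EOF'
  <tr>
        <td class=default width="60%"><a href="#172_27_1_107">172.27.1.107</a></td>
        <td class=default width="40%">Security warning(s) found</td></tr>
EOF

: > sedscript
while read -r ip hostname; do
  line='<td class=default width="60%"><a href="#'${ip//./_}'">'
  ip=${ip//./\\.}
  # s/60%/30%/ narrows the column; h saves that row in the hold space;
  # s/ip/host/ turns the working copy into the hostname row;
  # x swaps the two copies; G appends the hostname row after the ip row.
  printf '%s\n' "/$line/{ s/60%/30%/; h; s/$ip/$hostname/; x; G; }" >> sedscript
done < hostnames.txt

sed -f sedscript nessus.html > newnessus.html
cat sedscript newnessus.html
```

The ip row should come out at 30% followed by a copy carrying the hostname, while the 40% row is left alone. The command generated for 172.27.1.58 simply matches nothing in this fragment.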

Hi melee

I will try and break it down for you, oh and sorry about the duplication of the other lines <my bad> will fix that too :)

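BEGIN{FS="[\\||>*<*]"} - Set the delimiters to be used. The square brackets make this a character class, so a field ends at any single one of the listed characters: the pipe separates the columns in ipsandhost, and > and < split the tags in sedtest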

ARGV[1] == FILENAME{_[$1]=$2} - while in file ipsandhost create an array where index is ip and value is host name

ARGV[2] == FILENAME{ - while in second file sedtest

if($5 in _) - if after splitting line the fifth field (ip address) is one of the indexes in array _

{match($2,/[0-9]+/,pc); - match is a function which looks for the regex /[0-9]+/ (one or more digits) in the string held in field 2 and stores what it found in array pc

gsub(pc[0],pc[0]/2) - for the whole line (represented by $0) change all occurrences of the value stored in pc[0] (which was 60 in the example) to pc[0]/2 (ie 30);

print $0"\n"gensub($5,_[$5],2)} - print the original line ($0), then a newline ("\n"), then $0 again with the fifth field string (the ip address) replaced by the value stored in the array for that address (_[$5]), but only at the second occurrence (2). The reason for the last part is that the searched-for string has dots in it (ie the ones separating the parts of the ip address) and a dot in a regex matches any character. If you make the replacement global it will also replace 172_27_1_58, as the numbers are the same and the dot "." matches the underscore "_"

// extra stuff I should have put in. you will notice that the print above is now in the 'if'
else print} - this will now print the line as is if it does not require a change

So new line looks like:

Code:

awk 'BEGIN{FS="[\\||>*<*]"}ARGV[1] == FILENAME{_[$1]=$2}ARGV[2] == FILENAME{if($5 in _){match($2,/[0-9]+/,pc);gsub(pc[0],pc[0]/2);print $0"\n"gensub($5,_[$5],2)}else print}' ipsandhostlist.txt sedtest.html
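Two points from the breakdown are easy to check on a single row: which field holds the ip, and why an unescaped dot also hits the underscore form in the href. The sketch below is only an illustration. It uses a simplified separator ([<>] instead of the full FS above) and plain sed for the dot test, and HOST is a placeholder rather than a real hostname.

```bash
row='        <td class=default width="60%"><a href="#172_27_1_58">172.27.1.58</a></td>'

# Splitting on < and > leaves the link text (the ip) in field 5.
field5=$(printf '%s\n' "$row" | awk -F'[<>]' '{ print $5 }')
echo "$field5"

# Unescaped dots match any character, so the first hit is 172_27_1_58 in the href.
unescaped=$(printf '%s\n' "$row" | sed 's/172.27.1.58/HOST/')
echo "$unescaped"

# With the dots escaped, only the dotted link text matches.
escaped=$(printf '%s\n' "$row" | sed 's/172\.27\.1\.58/HOST/')
echo "$escaped"
```

This is why the awk version asks gensub for the second occurrence, and why the sed version escapes the dots first.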

Let me know how we go?

Kenhelm,

This is exactly what I was looking for...

It's a testament to good coding that my script (which didn't work) was about 5 times longer than the one you posted. :)

All, thanks for your help!

Quote:

echo "/$line/{ s/60%/30%/; h; s/$ip/$hostname/; x; G }" >> sedscript

I realise you said it wasn't so important, but in case not all lines have 60 in them, this will not always work as intended.
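If the widths do vary, one way around the hard-coded 60 is to read the number off the row and halve it, which is what the awk version does with match and gsub. Below is a rough bash-only sketch of that idea. halve_width is a hypothetical helper, not something from the scripts above, and it assumes the row contains a width="NN%" attribute with a whole number.

```bash
# Hypothetical helper: halve whatever width="NN%" a row carries.
halve_width() {
  local row=$1
  local re='width="([0-9]+)%"'
  if [[ $row =~ $re ]]; then
    local old=${BASH_REMATCH[1]}
    local new=$(( old / 2 ))          # integer division, so 25 becomes 12
    local from="width=\"$old%\"" to="width=\"$new%\""
    row=${row/"$from"/"$to"}          # quoted pattern is taken literally
  fi
  printf '%s\n' "$row"
}

halve_width '<td class=default width="60%"><a href="#172_27_1_107">172.27.1.107</a></td>'
halve_width '<td class=default width="40%">Security warning(s) found</td></tr>'
halve_width '  <tr>'
```

Rows without a width attribute pass through unchanged.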

Glad you have a solution.

Hey Kenhelm (or anyone):

what's this part do?

Code:

ip=${ip//./\\.}

more specifically this part

Code:

{ip//./

I understand the rest of this is the escaping of the dots, but I'm a little confused about the rest...

Thanks

${ip//./\\.} is bash parameter expansion (sometimes called 'parameter substitution' or 'variable substitution').
It's similar to the sed s/ / / command but it uses filename globbing patterns, not regular expressions.

Code:

var="some dogs are doggedly dogmatic"

echo ${var/dog/cat}        # replace first
some cats are doggedly dogmatic

echo ${var//dog/cat}        # replace all
some cats are catgedly catmatic

echo ${var/*d??/cat}        # globbing pattern
catmatic
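Applied to the ip from the script, the same expansion looks like this. It is a small sketch with variable names picked just for the demo. Because the pattern is a glob and not a regex, the dot is a literal dot here and needs no escaping on the pattern side.

```bash
ip='172.27.1.107'

anchor=${ip//./_}      # double slash: every dot becomes an underscore
regex=${ip//./\\.}     # every dot becomes backslash-dot, ready for sed
first=${ip/./_}        # single slash: only the first dot is replaced

printf '%s\n' "$anchor" "$regex" "$first"
# 172_27_1_107
# 172\.27\.1\.107
# 172_27.1.107
```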

Have you checked the revised awk as well, as I believe it works for all scenarios.

Quote:

Originally Posted by Kenhelm (Post 3938083)

Code:

var="some dogs are doggedly dogmatic"

echo ${var/dog/cat}        # replace first
some cats are doggedly dogmatic

echo ${var//dog/cat}        # replace all
some cats are catgedly catmatic

echo ${var/*d??/cat}        # globbing pattern
catmatic

I didn't even know this existed... Thanks for the education!

Quote:

Originally Posted by grail (Post 3938251)

Have you checked the revised awk as well, as I believe it works for all scenarios.

grail,

I haven't tried this yet as the script that Kenhelm provided worked perfectly. I'll give it a try though to see if it works and report back here.