LinuxQuestions.org


Nimoy 09-21-2007 12:56 PM

SED - minor changes work - Larger doesn't (working and non working code included)
 
Hi there. I am replacing strings with other strings in all the files in a directory.

Here is an example of what works

CHANGE 1: - The visible copyright notice.

for file in *.html
do
cp "$file" "$file.bak" &&
sed 's/Copyright 1999-2005 - All rights reserved/Copyright 1999-2007 - All rights reserved/g' "$file.bak" >"$file"
done


CHANGE 2: - The internal copyright notice

(In change 2 I chose # as the delimiter so as to keep sed from being confused.)

for file in *.html
do
cp "$file" "$file.bak" &&
sed 's#<meta name="copyright" content="Copyright © 1999-2005 by Fire Flower Cybernetics. All rights reserved.">#<meta name="copyright" content="Copyright © 1999-2007, Fire Flower Cybernetics. All rights reserved.">#g' "$file.bak" >"$file"
done
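
As a side note on the delimiter choice, here is a minimal sketch (the sample text is made up, not taken from the site) showing that sed accepts a character other than / after the s. This matters most when the text itself contains slashes, as closing HTML tags do:

```shell
# The same substitution written with two different delimiters.
# With "/" every slash in the pattern and replacement must be escaped;
# with "#" none of them need escaping.
with_slash=$(printf '%s\n' '</b>' | sed 's/<\/b>/<\/strong>/')
with_hash=$(printf '%s\n' '</b>' | sed 's#</b>#</strong>#')

printf '%s\n%s\n' "$with_slash" "$with_hash"
```

Both forms should give the same result; whichever delimiter is chosen must not appear unescaped inside the pattern or the replacement.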

CHANGE 3 - AND THIS IS WHERE THINGS GO WRONG: - The Google ads insertion.

I keep ending up with blank files or files where the ad doesn't show.

If I insert the same text manually it works. Any ideas?

for file in *.html
do
cp $file $file.bak &&
sed 's#</form>Make a difference - Make a donation!<br>

</td>

</tr>

</tbody>

</table>

<br>
#</form>Make a difference - Make a donation!<br>

</td>

</tr>

</tbody>

</table>

<br>
<script type="text/javascript"><!--
google_ad_client = "pub-5045815486985038";
google_ad_width = 728;
google_ad_height = 90;
google_ad_format = "728x90_as";
google_ad_type = "text";
//2007-08-14: globabilityaug2007setup
google_ad_channel = "5631073777";
google_color_border = "000000";
google_color_bg = "FFFFFF";
google_color_link = "0000FF";
google_color_text = "000000";
google_color_url = "008000";
//-->
</script>
<script type="text/javascript"
src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
</script><br><br>#"g' $file.bak >$file
done

Thanks in advance!

jozyba 09-21-2007 01:47 PM

At the end of your third example you've got a stray double-quote after the # delimiter:
Code:

</script><br><br>#"g' $file.bak >$file
..................^


PTrenholme 09-21-2007 02:09 PM

And, just to be pedantic, what's the point of the cp when sed will overwrite the file? A simple mv would be somewhat more efficient.
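
A minimal sketch of the mv variant, assuming a throwaway directory and a made-up page.html so nothing real is touched:

```shell
# Work in a temporary directory (hypothetical demo data).
workdir=$(mktemp -d)
printf 'Copyright 1999-2005 - All rights reserved\n' > "$workdir/page.html"

for file in "$workdir"/*.html; do
    # Rename the original to .bak, then let sed write the new version
    # under the original name. The .bak file is still the backup.
    mv "$file" "$file.bak" &&
    sed 's/1999-2005/1999-2007/g' "$file.bak" > "$file"
done

cat "$workdir/page.html"
```

One difference worth knowing: with mv the new file is created fresh, so it gets default ownership and permissions, whereas cp followed by overwriting keeps the original file in place. GNU sed can also do the backup itself with -i.bak.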

Nimoy 09-21-2007 04:18 PM

Quote:

Originally Posted by PTrenholme (Post 2899471)
And, just to be pedantic, what's the point of the cp when sed will overwrite the file? A simple mv would be somewhat more efficient.

As I have understood it the cp will copy a backup file to .bak so you don't lose the original when sed is overwriting the original html...

Anyhow - Still trouble in paradise:

Won't work: Getting an error message:

sed: -e expression #1, char 46: unterminated `s' command

I'm going slightly mad

Nimoy 09-21-2007 04:27 PM

File size
 
And filesizes are zero....

Nimoy 09-21-2007 04:29 PM

I also tried
 
throwing everything into textfile.sh and running it via sh textfile.sh - the only change is that the character number in the unterminated message is one higher...

jozyba 09-21-2007 05:47 PM

I think you're just using the wrong tool for the job. sed is good for processing files one line at a time; anything more is pushing it beyond what it's designed for. The error message "sed: -e expression #1, char 46: unterminated `s' command" was complaining about the newline character at the end of the first line of your sed script.
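
To illustrate the line-at-a-time behaviour, here is a small sketch with made-up two-line input. It assumes nothing beyond a standard sed; the N command is the usual way to pull a second line into the pattern space:

```shell
# Line by line: the pattern space only ever holds one line, so a pattern
# containing \n cannot match and the input passes through unchanged.
unchanged=$(printf 'one\ntwo\n' | sed 's/one\ntwo/X/')

# N appends the next input line to the pattern space, so \n can now match.
joined=$(printf 'one\ntwo\n' | sed 'N;s/one\ntwo/X/')

printf 'without N:\n%s\nwith N:\n%s\n' "$unchanged" "$joined"
```

This works for a fixed pair of lines, but it gets unwieldy quickly for a block of a dozen lines like the one in the third example.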

I think you need to load the whole file contents into a variable and then search for and replace the substring. Here's a version using bash. It would probably be much faster written in Python, Perl or Ruby:

Code:

#! /bin/bash

substring='</form>Make a difference - Make a donation!<br>

</td>

</tr>

</tbody>

</table>

<br>
'

replacement='</form>Make a difference - Make a donation!<br>

</td>

</tr>

</tbody>

</table>

<br>
<script type="text/javascript"><!--
google_ad_client = "pub-5045815486985038";
google_ad_width = 728;
google_ad_height = 90;
google_ad_format = "728x90_as";
google_ad_type = "text";
//2007-08-14: globabilityaug2007setup
google_ad_channel = "5631073777";
google_color_border = "000000";
google_color_bg = "FFFFFF";
google_color_link = "0000FF";
google_color_text = "000000";
google_color_url = "008000";
//-->
</script>
<script type="text/javascript"
src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
</script><br><br>'

for file in *.html; do
    cp "$file" "$file.bak"        # alternatively use 'mv'
    file_contents="$(<"$file.bak")"
    echo "${file_contents//$substring/$replacement}" >$file
done
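
One caveat with this kind of substitution, sketched below with made-up values: bash treats an unquoted pattern in ${var//pattern/replacement} as a glob, so characters such as *, ? or [ in the search string act as wildcards. Quoting the pattern makes it literal. The strings in the script above contain no such characters, so this is only a precaution for other search strings:

```shell
#!/bin/bash
# Illustrative values only; not taken from the thread.
contents='price: [a] and a'
substring='[a]'
replacement='X'

# Unquoted: [a] is a bracket expression matching the single letter "a".
unquoted=${contents//$substring/$replacement}

# Quoted: the pattern is taken literally, so only the text "[a]" is replaced.
quoted=${contents//"$substring"/$replacement}

printf '%s\n%s\n' "$unquoted" "$quoted"
```

Also note that "$(<file)" drops trailing newlines and echo adds exactly one back, so a file that ended with several blank lines will not come out byte-for-byte identical.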


PTrenholme 09-22-2007 09:41 AM

Quote:

Originally Posted by Nimoy (Post 2899559)
As I have understood it the cp will copy a backup file to .bak so you don't lose the original when sed is overwriting the original html...[snip]

Yes, cp will create a copy of the file, leaving the original file. On the other hand mv will rename the original file without the overhead of creating the copy. Since the original file is to be overwritten, by renaming the original you eliminate that overhead.

As to your problem, look into awk.

[edit]
I was looking at your site's source code, thinking I'd see if I could cobble together a simple awk program for you, and noticed that your html does not pass the W3C standards for html 4. (In fact, in the snippets you showed us, we see <br> instead of the expected standard construct: <br />.)

I also noticed that the "home page" included the "google" code, but that it did not seem to be working.
[/edit]

Nimoy 09-22-2007 11:02 AM

Will be testing this in a mo - Keeping you posted
 
Quote:

Originally Posted by jozyba (Post 2899646)
I think you're just using the wrong tool for the job. sed is good for processing files one line at a time; anything more is pushing it beyond what it's designed for. The error message "sed: -e expression #1, char 46: unterminated `s' command" was complaining about the newline character at the end of the first line of your sed script.

Thanks for the example!

Nimoy 09-22-2007 11:03 AM

True
 
Quote:

Originally Posted by PTrenholme (Post 2900145)
Yes, cp will create a copy of the file, leaving the original file. On the other hand mv will rename the original file without the overhead of creating the copy. Since the original file is to be overwritten, by renaming the original you eliminate that overhead.

As to your problem, look into awk.

[edit]
I was looking at your site's source code, thinking I'd see if I could cobble together a simple awk program for you, and noticed that your html does not pass the W3C standards for html 4. (In fact, in the snippets you showed us, we see <br> instead of the expected standard construct: <br />.)

I also noticed that the "home page" included the "google" code, but that it did not seem to be working.
[/edit]

Regarding the W3C Validation - It passed the validator tool last time I tinkered... Something I'll be looking into, thanks for the heads up.

Regarding the google code... odd, had people using both IE and FF test the bits that I had manually inserted.

As for the overhead - yup true, but the notion of having a backup is nice, right now the need for speed is not essential.

Nimoy 09-22-2007 12:45 PM

50: Syntax error: Bad substitution
 
Is the response I get when running the script...

Line 50 is the following:

echo "${file_contents//$substring/$replacement}" >$file

Any ideas as to what might be wrong ?

jozyba 09-22-2007 02:19 PM

It works perfectly for me using bash 3.1.17. Either you've made some changes to the script above which have introduced a syntax error, or you're not using a bash shell. I see that you're using Ubuntu - you haven't got sh symlinked to something other than bash, have you? Try:
Code:

ls -l /bin/sh
ls -l /bin/bash
bash --version


Nimoy 09-22-2007 02:36 PM

Results
 
Quote:

Originally Posted by jozyba (Post 2900376)
It works perfectly for me using bash 3.1.17. Either you've made some changes to the script above which have introduced a syntax error, or you're not using a bash shell. I see that you're using Ubuntu - you haven't got sh symlinked to something other than bash, have you? Try:
Code:

ls -l /bin/sh
ls -l /bin/bash
bash --version


ls -l /bin/sh

gave me

lrwxrwxrwx 1 root root 4 2007-08-02 14:27 /bin/sh -> dash

and

ls -l /bin/bash
-rwxr-xr-x 1 root root 700560 2007-04-11 01:32 /bin/bash

and

bash --version

GNU bash, version 3.2.13(1)-release (i486-pc-linux-gnu)
Copyright (C) 2005 Free Software Foundation, Inc.

Regarding the script I did a copy and paste job from LQ. Saved the file in gedit and sh'ed the script.

Nimoy 09-22-2007 02:37 PM

looks like a symlink
 
anything I can do ?

jozyba 09-22-2007 02:52 PM

On Ubuntu 'sh' is symlinked to 'dash', so when you call the script by running 'sh myscript.sh' it will ignore the '#!/bin/bash' at the head of the script (that line is only honoured when the file is executed directly; to dash it is just a comment) and use dash instead. dash cannot cope with the syntax in line 50 of your script.

The solution is either to run it with 'bash myscript.sh' or just to use 'chmod u+x myscript.sh' to make it executable, then call it with './myscript.sh'.
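
If you want the script to complain clearly instead of failing with "Bad substitution", a guard along these lines could go near the top. This is only a sketch; running_under_bash is a hypothetical helper name, and it relies on the fact that bash sets BASH_VERSION while dash does not:

```shell
# Hypothetical guard for the top of a bash-only script.
# BASH_VERSION is set by bash itself and is empty under dash.
running_under_bash() {
    [ -n "${BASH_VERSION:-}" ]
}

if running_under_bash; then
    echo "running under bash $BASH_VERSION"
else
    echo "not bash: run this with 'bash myscript.sh' or './myscript.sh'" >&2
fi
```

The check itself uses only plain sh syntax, so it parses under dash as well as bash.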

Nimoy 09-22-2007 03:13 PM

Weird....
 
Quote:

Originally Posted by jozyba (Post 2900394)
Bl**dy Ubuntu! :)

On Ubuntu 'sh' is symlinked to 'dash', so when you call the script by going 'sh myscript.sh' it will ignore the '#!/bin/bash' at the head of the script and use dash instead. dash cannot cope with the syntax in line 49 of your script.

The solution is either to run it with 'bash myscript.sh' or just to use 'chmod u+x myscript.sh' to make it executable, then call it with './myscript.sh'.

Tried both solutions - No error messages this time. So I presume the syntax is ok as such.

However no changes inside any of the HTML files - Got a .bak file for every file.

Maybe because the script copies the HTML files and doesn't perform the change in every html file ?

jozyba 09-22-2007 03:27 PM

Quote:

Originally Posted by Nimoy (Post 2900408)
Tried both solutions - No error messages this time. So I presume the syntax is ok as such.

Well, that's progress.

Quote:

However no changes inside any of the HTML files
The script will only make changes if the *precise* string in the "$substring" variable is found. If no changes are being made, then that substring is not being found. So now it's time for you to check that the substring in your script is correct. If your search string contains even one extra space or newline character it will not match.
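
A quick way to test this before running the whole loop, sketched with a made-up demo file (contains_exact is a hypothetical helper, not part of the script above):

```shell
#!/bin/bash
# Hypothetical helper: succeed if the file at $1 contains the exact text $2.
contains_exact() {
    local contents
    contents=$(<"$1")
    # The quoted "$2" is matched literally; the * on either side are wildcards.
    [[ $contents == *"$2"* ]]
}

# Throwaway demo file; note the single space after </a>.
demo=$(mktemp)
printf '%s\n' '<a HREF="index.html" TARGET="_top">Home</a> |' > "$demo"

if contains_exact "$demo" '<a HREF="index.html" TARGET="_top">Home</a> '; then
    echo "found"
else
    echo "not found"
fi

# Print the file with line ends and non-printing characters made visible.
sed -n 'l' "$demo"
```

The sed -n 'l' output marks each line end with $, which makes trailing spaces and stray carriage returns (shown as \r) easy to spot.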

Nimoy 09-22-2007 04:34 PM

Working!!!
 
Changed the substring I wanted to replace - so the script now looks like this:


#! /bin/bash

substring='<a HREF="index.html" TARGET="_top">Home</a> '
replacement='<script type="text/javascript"><!--
google_ad_client = "pub-5045815486985038";
google_ad_width = 728;
google_ad_height = 90;
google_ad_format = "728x90_as";
google_ad_type = "text";
//2007-08-14: globabilityaug2007setup
google_ad_channel = "5631073777";
google_color_border = "000000";
google_color_bg = "FFFFFF";
google_color_link = "0000FF";
google_color_text = "000000";
google_color_url = "008000";
//-->
</script>
<script type="text/javascript"
src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
</script><br><br>
<a HREF="index.html" TARGET="_top">Home</a>'

for file in *.html; do
cp "$file" "$file.bak" # alternatively use 'mv'
file_contents="$(<"$file.bak")"
echo "${file_contents//$substring/$replacement}" >$file
done

The above was saved into a text file called winner.sh and I have set the perms as you explained earlier in the thread so the script is executable by calling ./winner.sh

HOURS UPON HOURS OF DREARY SYNTAX REPLACEMENT HAVE NOW BEEN SHAVED AWAY FROM MY TIME SPENT MAKING THESE CHANGES - AS WELL AS FUTURE ONES!

THANKS A MILLION!!!!!!!!!!!!!

