LinuxQuestions.org

using sed to insert lines with special characters (https://www.linuxquestions.org/questions/linux-software-2/using-sed-to-insert-lines-with-special-characters-433786/)

disorderly 04-10-2006 02:35 PM

using sed to insert lines with special characters
 
i'm looking to insert a line in multiple files, and then change their file extension. i've found many examples of using sed to do this but can't get it exactly right.

i need to put the line:
PHP Code:

<?php include("/home/functions.php");validateUser($_SERVER['PHP_SELF']);?>

on line 1 of many pages (*.html, *.htm) in many directories.

i keep getting the error:
bash: syntax error near unexpected token 'include("'

i've come up with this:
cat example.html | sed '1s/^/<?php include("/home/functions.php");validateUser($_SERVER['PHP_SELF']);?>'

i think it's because of the space or something and could really use some guidance
thanks,
disorderly

disorderly 04-10-2006 02:59 PM

ok i got it to work for 1 file! i used:
myheading='<?php include("/home/functions.php");validateUser($_SERVER['PHP_SELF']);?>'
cat top.html | sed '1s/^/$myheading/'

any idea how i can repeat this process for multiple files and then change their file extension to *.php?

thanks again,
disorderly

jschiwal 04-10-2006 03:49 PM

If they are all in the same directory you can use a for loop in bash:
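myheading='<?php include("/home/functions.php");validateUser($_SERVER['\''PHP_SELF'\'']);?>'
for file in *.htm *.html; do
# double quotes so the shell expands $myheading; | is the delimiter because the text contains /
sed "1s|^|$myheading|" "$file" > "${file%.*}.php" && rm "$file"
done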

jiml8 04-10-2006 07:40 PM

I believe that your original problem was that you need to escape the quotes and the slashes in order to tell sed that you really want these inserted in the string. sed was taking them as control characters.

viz:
Code:

cat example.html | sed '1s/^/<?php include(\"\/home\/functions.php\");validateUser($_SERVER[\'PHP_SELF\']);?>

disorderly 04-11-2006 09:52 AM

jiml8, thank you for your assistance. i am still getting the error - bash: syntax error near unexpected token `;'
how can i enter a sed statement that contains the special character ";" when sed takes it as a command or something?

jschiwal, thank you as well! that is the second half! as soon as i can get this frustrating sed statement to work with a ";", i'll try it!

thanks again,
disorderly
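
The "syntax error near unexpected token" message is printed by bash, not by sed. Nothing can be escaped inside a single-quoted string, so the ' in \'PHP_SELF\' closes the string and bash then tries to parse the rest, including ( and ;, as shell syntax. A minimal sketch of one way around it, assuming an ordinary POSIX or GNU sed: keep the PHP line in a variable, write each embedded single quote as '\'', and use | as the s delimiter so the slashes in the path need no escaping. The file name example.html and the scratch directory are only for illustration.

```shell
#!/bin/bash
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
printf '<html>\n<body>hi</body>\n</html>\n' > "$workdir/example.html"

# '\'' ends the single-quoted string, adds a literal ', and reopens the string.
myheading='<?php include("/home/functions.php");validateUser($_SERVER['\''PHP_SELF'\'']);?>'

# Double quotes let bash expand $myheading before sed sees the program.
# The text contains "/" but no "|", "&" or "\", so "|" is a safe delimiter here.
first_line=$(sed "1s|^|$myheading|" "$workdir/example.html" | head -n 1)
printf '%s\n' "$first_line"
```

The ; never causes trouble once it is inside a properly closed quoted string; sed only treats it as a command separator outside the s/// text.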

disorderly 04-11-2006 01:02 PM

thanks to both of your help i've gotten this almost working except the last file in the directory always ends up blank
Code:

#!/bin/bash

for file in *.html; do
cat $file | sed '1s/^/\<\?php include(\"\/home\/functions.php\");validateUser($_SERVER[\"PHP_SELF\"])\?\>\
/' > $file
mv "${file}" "${file%.*}.php"
done

my goal is to have this working recursively so i can execute this in the top level folder and let it do its work on the 9,000+ files that need that PHP header. any idea why the last file always ends up with 0 bytes?

*note - this drove me crazy but it was easy enough to fix: if anyone is trying to insert a newline character after their sed insertion, just put a "\", then hit return. heck if i didn't know then there are others...

thanks,
disorderly

disorderly 04-11-2006 01:22 PM

nevermind it was the server just being a jerk - the script works fine. i just have to get it to work recursively in multiple directories - how can i do this?
thanks,
disorderly

disorderly 04-11-2006 01:49 PM

i haven't found any way to recursively affect directories so now i'm trying this with:
Code:

find /home/*.htm -exec sed '1s/^/\<\?php include(\"\/home\/functions.php\");validateUser($_SERVER[\"PHP_SELF\"])\?\>/' {} \;
it almost works but won't write to the files! in many examples i've seen people use the -i switch, but it doesn't work on this POS server; i.e. i need to specify the output file and i'm stuck
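
Without -i, sed needs somewhere else to write. A rough sketch, assuming a find that supports -name, -o and -exec ... {} + (all in POSIX): let find walk the tree, and for each match send sed's output straight to the new .php name, removing the original only if sed succeeded. The directory layout below is made up for the demonstration, and insertedLineHere stands in for the real PHP line.

```shell
#!/bin/bash
# Build a small throwaway tree to run against.
top=$(mktemp -d)
mkdir -p "$top/a/b"
echo '<html>one</html>' > "$top/index.html"
echo '<html>two</html>' > "$top/a/b/page.htm"
echo 'not html'         > "$top/a/notes.txt"

# find passes the matching paths to a small inline script as its arguments.
find "$top" -type f \( -name '*.html' -o -name '*.htm' \) -exec sh -c '
    for file do
        new="${file%.*}.php"
        # Write to the new name first; delete the original only on success.
        sed "1s/^/insertedLineHere/" "$file" > "$new" && rm "$file"
    done
' sh {} +

find "$top" -type f | sort
```

Because the output goes to a different file than the input, no file is ever read and overwritten at the same time.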

dive 04-11-2006 01:52 PM

Something like this should work. It checks if a file is a directory and then cd's into it and calls itself again, and then does a cd ..

Code:

for i in *
do
if [ -d "$i" ] # if * is a directory
then
cd "$i" # descend into the directory
for y in *
do

... your code here

if [ -d "$y" ] # if this is also a directory, call this program again
then
cd "$y"
html2php; # this is the name of your program, must be in your PATH or use full path in command
cd ..
fi
done
cd ..
fi
done


disorderly 04-11-2006 02:21 PM

ahh, i didn't know i could use nested loops in shell scripting - thank you for reading this post dive, i'll give that a try.
incidentally my find script above is very flaky - it deletes my files as often as it writes them correctly, dunno why but a warning for anyone that is going to use it..

dive 04-11-2006 03:01 PM

*small edit

It may be the mv line that does this. You could try adding the -v option, or maybe even use cp -v to get an idea of which filenames are being used

disorderly 04-11-2006 03:51 PM

howdy dive
i've used your code and come up with this
Code:

for i in *
do
        if [ -d "$i" ] # if * is a directory
        then
                cd "$i" # descend into the directory

                for y in *
                do

                        if [ -f "$y" && `grep '*.html' $y]
                        then
                                cat $y | sed '1s/^/insertedLineHere/' > $y
                                # mv "${file}" "${file%.*}.php"
                                cat $y >> /homefiles.txt #for testing
                        fi

                        if [ -d "$y" ] # if this is also a directory, call this program again
                        then
                                cd "$y"
                                programName.sh; # this is the name of your program
                                cd ..
                        fi
                done
        cd ..
        fi
done

but i'm pretty sure this line is wrong:
Code:

if [ -f "$y" && `grep '*.html' $y]
i'm not sure how to make sure that the only files affected are *.html, *.htm.

dive 04-11-2006 04:22 PM

if [ 'grep .htm $y' ]

should work. Don't need to test $y with -f flag since if grep is true then it must be a regular file. You don't have any dirs named with .html I take it?

disorderly 04-11-2006 07:04 PM

thanks again for reading this dive! yes you can assume none of the directories are named 'html', or 'htm'.
aha i understand the logic in dropping the -f comparison when using grep, but the script is taking in PDF and *.word files as well. isn't grep for searching inside files rather than determining a file's extension?

dive 04-11-2006 08:29 PM

Sorry been a long day...

if echo "$y" | grep ".htm"
then
...

This way grep will search whatever is piped to it, which here is the file name. But if you have pdf files whose names contain the sequence .htm you will need some other test
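
Because . matches any character in a regular expression and the pattern is not anchored, grep ".htm" also accepts names such as xhtml-spec.pdf or old.html.bak. One alternative, sketched here with a hypothetical helper called is_html, is a case pattern, which matches the whole name against shell globs and starts no extra process.

```shell
#!/bin/bash
# is_html is a hypothetical helper: it succeeds only for names ending in .htm or .html
is_html() {
    case "$1" in
        *.htm|*.html) return 0 ;;
        *)            return 1 ;;
    esac
}

# The names are only strings here; no files need to exist.
for name in index.html about.htm xhtml-spec.pdf old.html.bak; do
    if is_html "$name"; then
        echo "$name: html"
    else
        echo "$name: skipped"
    fi
done
```

Used in the loop above, the test would read: if is_html "$y"; then ... fi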

disorderly 04-12-2006 08:17 AM

thanks for the update dive. i gave up for the night last night after 13 hours and the script erased half my files and then itself. i'll try the new code now that my head is clear again :)

dive 04-12-2006 11:49 AM

I can't really see how files are being erased here. Did you try using mv with the -v flag to get a more verbose output? It will say something like test.html -> test.php for each mv. That way you can check back in terminal for errors.

If the cat and sed commands are to blame for deleting files (maybe cat'ing a file back into itself?) you could try using a temporary file:

echo "<?php include(\"/home/functions.php\");validateUser(\$_SERVER['PHP_SELF']);?>" > tempfile
cat "$y" >> tempfile
mv -v tempfile "${y%.*}.php"

disorderly 04-13-2006 02:20 PM

there's always another way...
 
hi dive - excellent alternative solution! the script worked great after that. the problem seemed to be with sed; the script produced some very strange results:
1) most files were truncated (i think sed has a memory limit or something..)
2) files were being copied & dropped in random folders
3) files were having their contents erased so the byte size was 0

here is the finished script
Code:

#!/bin/bash

# this script recursively inserts a line into the head of all files
# in the directory from where it was called
# it notes all files changed to output.txt
# it shows everything the script does in logs.txt

logs=/home/logs.txt
output=/home/output.txt
phead='<?php include("func.php");validateUser($_SERVER["PHP_SELF"]);?>'

for i in *
do
        if [ -d "$i" ] # if * is a directory
        then
                echo "1) $PWD" >> $logs
                cd "$i" # descend into the directory

                for y in *
                do
                        if echo "$y" | grep ".htm"
                        then
                                echo "2) modifying: $PWD/$y" >> $logs
                                echo $phead > temp
                                cat $y >> temp
                                mv -v temp "${y%.*}.php"
                                chmod 744 "${y%.*}.php"
                                rm -v $y
                                echo "$PWD/$y" >> $output #for testing
                        fi

                        if [ -d "$y" ]  # if a directory, call self
                        then
                                echo "3) $PWD" >> $logs
                                cd "$y"
                                sh /home/c.sh; # call self (recursively)
                                cd ..
                                echo "4) going back up to: $PWD" >> $logs
                        fi
                done
                cd ..
        fi

        if echo "$i" | grep ".htm"
                then
                        echo "5) modifying: $PWD/$i" >> $logs
                        echo $phead > temp
                        cat $i >> temp
                        mv -v temp "${i%.*}.php"
                        chmod 744 "${i%.*}.php"
                        rm -v $i
                        echo "$PWD/$i" >> $output #for testing
        fi
done

it has one limitation - some of the files in these directories were made with Frontpage [puke] and thus have folders in the same directory holding files named the same way, with the word "html" in them. for example the file "content.html" comes with two extra files: ./_vti_cnf/content.html and ./_derived/content.html_sourcecontrol whatever the f those are, and they get changed by the script too.

for anyone that is interested i also found another way to check for the file extension:
Code:

for i in *;do
        if [ ! -d "$i" ];then
                ext=${i##*.}
                echo "\$i is $i and its extension is $ext"
                if [ "$ext" = htm ]; then
                      echo "$i ends with htm"
                fi
        fi
done


dive 04-13-2006 03:01 PM

I'm glad that you got it working.

One thing: after test to see if $i is -d, it acts upon that and either changes the file, or descends into dir, and finally ascends out of it. Next it tests again for .htm and then changes the file, whether or not it is a dir. That is why folders are being changed too. Try this:

Code:

#!/bin/bash

# this script recursively inserts a line into the head of all files
# in the directory from where it was called
# it notes all files changed to output.txt
# it shows everything the script does in logs.txt

logs=/home/logs.txt
output=/home/output.txt
phead='<?php include("func.php");validateUser($_SERVER["PHP_SELF"]);?>'

for i in *
do
        if [ -d "$i" ] # if * is a directory
        then
                echo "1) $PWD" >> $logs
                cd "$i" # descend into the directory

                for y in *
                do
                        if echo "$y" | grep ".htm"
                        then
                                echo "2) modifying: $PWD/$y" >> $logs
                                echo $phead > temp
                                cat $y >> temp
                                mv -v temp "${y%.*}.php"
                                chmod 744 "${y%.*}.php"
                                rm -v $y
                                echo "$PWD/$y" >> $output #for testing
                        fi

                        if [ -d "$y" ]  # if a directory, call self
                        then
                                echo "3) $PWD" >> $logs
                                cd "$y"
                                sh /home/c.sh; # call self (recursively)
                                cd ..
                                echo "4) going back up to: $PWD" >> $logs
                        fi
                done
                cd ..

        elif echo "$i" | grep ".htm" # else if not a dir -d
                then
                        echo "5) modifying: $PWD/$i" >> $logs
                        echo $phead > temp
                        cat $i >> temp
                        mv -v temp "${i%.*}.php"
                        chmod 744 "${i%.*}.php"
                        rm -v $i
                        echo "$PWD/$i" >> $output #for testing
        fi
done


disorderly 04-13-2006 03:46 PM

aHA - nice catch dive i'll try that out tomorrow morning!
again, i appreciate all the help :)
-disorderly

dive 04-13-2006 06:38 PM

Something to test
 
Sort of going on the basis that if you call a script recursively you shouldn't need the same file writing code twice, I came up with this:

Code:

#!/bin/bash

# this script recursively inserts a line into the head of all files
# in the directory from where it was called
# it notes all files changed to output.txt
# it shows everything the script does in logs.txt

logs=/home/logs.txt
output=/home/output.txt
phead='<?php include("func.php");validateUser($_SERVER["PHP_SELF"]);?>'

for i in *
do
        if [ -d "$i" ] # if * is a directory
        then
                echo "1) $PWD" >> $logs
                cd "$i" # descend into the directory
                        sh /home/c.sh # call self (recursively)
                        echo "4) going back up to: $PWD" >> $logs
                cd ..
        elif echo "$i" | grep ".htm"
        then
                        echo "5) modifying: $PWD/$i" >> $logs
                        echo $phead > temp
                        cat $i >> temp
                        mv -v temp "${i%.*}.php"
                        chmod 744 "${i%.*}.php"
                        rm -v $i
                        echo "$PWD/$i" >> $output #for testing
        fi
done

It worked for me with a small test with dirs going 2 deep, but let me know how it works. No problem for the help btw - I love scripting and it's helped me with one of my own scripts too.
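
The same walk can also be written as a shell function that calls itself, which avoids hard-coding the script's own path (/home/c.sh) and the cd / cd .. bookkeeping. This is only a sketch: add_header is a hypothetical name, the header text is shortened, and skipping _vti_cnf and _derived is an assumption about how to leave the FrontPage folders mentioned earlier in the thread untouched.

```shell
#!/bin/bash
phead='<?php include("func.php"); ?>'   # shortened stand-in for the real header

# add_header DIR: prepend $phead to every .htm/.html file under DIR.
add_header() {
    local entry new
    for entry in "$1"/*; do
        if [ -d "$entry" ]; then
            case "${entry##*/}" in
                _vti_cnf|_derived) continue ;;   # FrontPage bookkeeping folders
            esac
            add_header "$entry"                   # recurse into the subdirectory
        elif [ -f "$entry" ]; then
            case "$entry" in
                *.htm|*.html)
                    new="${entry%.*}.php"
                    # Header first, then the old contents; remove the original on success.
                    { printf '%s\n' "$phead"; cat "$entry"; } > "$new" && rm "$entry"
                    ;;
            esac
        fi
    done
}

# Throwaway tree for the demonstration.
top=$(mktemp -d)
mkdir -p "$top/sub/_vti_cnf"
echo '<html>a</html>' > "$top/sub/content.html"
echo 'meta'           > "$top/sub/_vti_cnf/content.html"
add_header "$top"
```

Since the function takes the directory as an argument, it can be pointed at any starting folder rather than relying on the directory it was called from.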

disorderly 04-13-2006 07:13 PM

short and sweet - that's much tighter coding! it worked perfectly - the end result was the same as the former longer script. thanks for all the help - i appreciate it. this stuff really intrigues me - i've now gone out and picked up the books, "learning the bash shell, unix shell programming" & "wicked cool shell scripts." hopefully i can learn to automate some of my job so i can have more time to read slashdot at work ;)
-disorderly

dive 04-13-2006 07:56 PM

I found this quite helpful: http://www.tldp.org/LDP/abs/html/ plus doing any sort of search for bash scripting or specific commands usually comes up with useful titbits

disorderly 04-13-2006 08:19 PM

oh cool - that site is a goldmine! would you happen to know why sed couldn't work on the files i was trying to edit? the text files were between 1k and 139k

dive 04-13-2006 08:23 PM

It may be a filesize limitation of sed I guess. But I've had problems in the past escaping characters in sed and iirc that also involved disappearing files.

jschiwal 04-14-2006 06:09 AM

There are two things that you kept doing with the sed examples.

cat $file | sed '<sed program>' >$file

First you could write this as sed '<sed program>' $file. You don't need the cat command.
Second if you redirect the output to the same file you use as the input, you will zero out the file before it starts. Either produce a different file, or use the '-i' option (in situ editing).

sed '<sed program>' "${file}" >"${file%.html}.php"
rm "$file"

or

sed -i '<sed program>' "${file}"
mv "$file" "${file%.html}.php"

Also, there is an insert command in sed.
sed '1i\
<? your php line ?>' "$file" >"${file%.html}.php"

The newline is needed after the 'i\'
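
The truncation is easy to reproduce. The shell opens and empties the redirection target before sed starts, so sed finds nothing to read. In the cat $file | sed ... > $file form, cat runs alongside that truncation and may get some, all, or none of the data first, which would fit the intermittent results reported earlier in the thread. The sketch below uses made-up file names in a scratch directory and shows both the failure and the 1i\ form writing to a separate file.

```shell
#!/bin/bash
dir=$(mktemp -d)
cd "$dir" || exit 1

printf '<html>\n</html>\n' > broken.html
printf '<html>\n</html>\n' > good.html

# Wrong: ">" empties broken.html before sed opens it for reading.
sed '1s/^/header/' broken.html > broken.html
wc -c < broken.html          # the file now holds 0 bytes

# Right: insert before line 1 and write the result to a different file.
sed '1i\
<?php /* your php line */ ?>' good.html > good.php && rm good.html
cat good.php
```

The && makes sure the original is removed only after sed has finished writing the new file without an error.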

disorderly 04-20-2006 05:30 PM

thanks jschiwal for the explanation! i feel much better understanding WHY something didn't work. i put
Code:

sed '<sed program>' "${file}" >"${file%.html}.php"
into the program and it worked like a charm :)

i think i had given up on the use of the -i switch because my version of sed doesn't seem to accept it - just gives me an error. might be too old


All times are GMT -5.