LinuxQuestions.org


dougp25 10-29-2010 10:21 AM

Massive Text Replacement Project
 
I've posted here a few times, and it feels like I am making steps forward, but I still can't see the finish line. So I will try to summarize the issue as succinctly as possible, and include all relevant code, and hopefully someone can "dope-slap" me and show me what little thing I might be missing!

History: this is a large open source PHP project, a school management program of about 200 scripts. For a while I had another developer, and he wanted a German version, so he edited all the scripts and replaced the text that shows up in the browser with constants (e.g. instead of "Click Here", we have _PAGE_TEXT_CLICK). One big file defines all 2400 of them (DEFINE _PAGE_TEXT_CLICK "Click Here";). He has since moved on, and the demand for a multilingual version never materialized. As I am now the sole code maintainer, I find it very cumbersome to make changes to the project, and I want to go back to an English-only version.

I have been trying to create a script that will just do a massive search and replace through all 200 scripts (realizing that each script might only need about 10 replacements, so most of the 2400 attempted replacements will find nothing in any given file, since every DEFINE is unique).

OK, that's the history! I created this massive 'run.sed' file that looks like this:

sed -i 's|_TEACHER_EDIT_STUDENT_1_NOTES|\"Notes\"|'
sed -i 's|_TEACHER_EDIT_STUDENT_1_BIRTHCITY|\"Birth City\"|'
sed -i 's|_TEACHER_EDIT_STUDENT_1_BIRTHSTATE|\"Birth State\"|'
sed -i 's|_TEACHER_EDIT_STUDENT_1_BIRTHCOUNTRY|\"Birth Country\"|'
sed -i 's|_TEACHER_EDIT_STUDENT_1_PRVS_SCHOOLNAME|\"Prvs School Name\"|'
sed -i 's|_TEACHER_EDIT_STUDENT_1_PRVS_SCHOOLADDRESS|\"Prvs School Address\"|'

[chopped, 2400 lines]

Single attempts aimed at a single file work, so the syntax appears to be fine.

My bash file that should iterate through the scripts looks like this:

#!/bin/bash
FILES=/home/me/swift200/*.php
for f in $FILES
do
echo "Processing $f file..."
./run.sed
done

The script is executable, I am doing this as root, none of my PHP files have spaces in the name, and I am executing this from the directory that contains the PHP files.

When I run it, many of these go scrolling by:

sed:no input files

Any help is greatly appreciated. Hell, if I could, I'd offer PayPal to anyone willing to help out, because the next step is going to be to print out the 2400 lines and manually edit each file to make the changes. (I say that, but 'never give up' is a great Linux geek motto!)

Thank you!

grail 10-29-2010 10:28 AM

Quote:

sed:no input files
So this is the clue to the whole thing. I will try and help by educating instead of telling you the answer.

So my question for you is: What exactly would you type on the command line to use your run.sed script with one file?

This should then tell you exactly what is missing from your script.

fordeck 10-29-2010 10:34 AM

I think all you need is to include the $f as is listed below:

Code:

#!/bin/bash
FILES=/home/me/swift200/*.php
for f in $FILES
do
echo "Processing $f file..."
./run.sed $f
done

Oops, looks like 'grail' beat me to it.

Regards,

Fordeck

dougp25 10-29-2010 10:53 AM

No change... totally perplexed here.
I thought that was going to do it too, but I don't even get my echo statement, which should be telling me which file it's currently processing.

dougp25 10-29-2010 11:14 AM

A little update:

I made a much smaller run.sed so I can do some testing and not watch so many lines go flying by!

The bash script IS iterating through my directory, so that is good. It echoes that it is stepping through each file. But at each file, sed still says "no input file".

It seems that my bash script (replace.sh) knows what the current file is, but he's not handing it off to run.sed. I tried editing run.sed and adding a f$ to the end of each line, but sed just says "no file or directory".

I think this is real close, it's just that hand off thing.

grail 10-29-2010 11:44 AM

So let me get this straight, you have your script looking exactly like fordeck's?
Quote:

I tried editing run.sed and adding a f$ to the end of each line, but sed just says "no file or directory".
This will definitely not work: run.sed runs as a separate process, so it has no idea what $f refers to, hence the error message.
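grail's point about $f can be checked directly. The following is a minimal, self-contained bash sketch; child.sh is a hypothetical stand-in for run.sed that only reports what it can see. It also answers the "maybe the variable needs to be exported?" question: exporting would work too, but passing the file name as an argument is the more usual approach.

```shell
#!/bin/bash
# Sketch: why run.sed never saw $f. A plain shell variable exists only in
# the current process; a child script sees it only if it is passed as an
# argument or exported. child.sh is a hypothetical stand-in for run.sed.
tmp=$(mktemp -d)
cat > "$tmp/child.sh" <<'EOF'
echo "f='$f' arg1='$1'"
EOF

unset f
f=test.php                                  # ordinary, non-exported variable
no_arg=$(bash "$tmp/child.sh")              # f='' arg1=''
with_arg=$(bash "$tmp/child.sh" "$f")       # f='' arg1='test.php'
export f
exported=$(bash "$tmp/child.sh")            # f='test.php' arg1=''

printf '%s\n' "$no_arg" "$with_arg" "$exported"
rm -r "$tmp"
```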

Ok, let us try this:

1. run.sed has only one sed command in it
2. test.php file in directory we are testing in
3. replace.sh to be like:
Code:

#!/bin/bash

FILES=/home/me/swift200/*.php

echo "Manual run with file name"
./run.sed test.php

echo "Manual run with variable FILES=$FILES"
./run.sed "$FILES"

echo "Run from for loop"
for f in $FILES
do
    echo "Processing $f file..."
    ./run.sed "$f"
done

Run this like:
Code:

./replace.sh > log_file
Please post back the log_file, run.sed and test.php files.

dougp25 10-29-2010 12:12 PM

2 Attachment(s)
Did as you suggested. Many sed:no input files go flying by.

The log_file successfully goes through the entire directory. I have attached it, but for brevity, here are a few lines:

Manual run with file name
Manual run with variable FILES=/home/doug/swift200/*.php
Run from for loop
Processing /home/doug/swift200/admin_add_contact_user.php file...
Processing /home/doug/swift200/admin_add_edit_contact_1.php file...
Processing /home/doug/swift200/admin_add_edit_contact_2.php file...
Processing /home/doug/swift200/admin_add_edit_contact_3.php file...
Processing /home/doug/swift200/admin_add_edit_contact_4.php file...

run.sed is composed thusly:

sed -i 's|_BROWSER_TITLE|\"School Management System\"|'

test.php does not make the substitution (after running, _BROWSER_TITLE still remains and not "School Management System"). I have also attached test.php

I did a test about 30 minutes ago, where run.sed looked like this:

#!/bin/bash
echo "working on file $1"
sed -i 's|_BROWSER_TITLE|\"School Management System\"|'

My echo would come up like this:

Working of file .//home/doug/swift200/admin_manage_contacts.php

I thought maybe run.sed was getting handed a weird filename, and hence the "no input" problem. Or maybe the variable needs to be exported?

Thanks for the help.

PS: I had to add a .txt suffix to both or they would not upload.

dougp25 10-29-2010 01:07 PM

Possible solution? I appended a *.php to each line in run.sed, so it now looks like this:

sed -i 's|_BROWSER_TITLE|\"School Management System\"|' *.php
sed -i 's|_WELCOME|\"Welcome\"|' *.php
sed -i 's|_YES|\"Yes\"|' *.php
sed -i 's|_NO|\"No\"|' *.php

Computer is currently churning through files, using ls -lag shows me files are being updated frequently.

Thanks for all the help, it looks like this might do it! I will report back with success or not.

dougp25 10-29-2010 03:14 PM

Damn! Hope some of you are still with me on this one.

What Grail offered up worked perfectly. Unfortunately I had two bash windows open, and was working on two sets of the scripts with differing ideas. So I ran grail's plan against set 1, but showed the results of set 2, which obviously had no change.

So with the one-line run.sed, ALL scripts had _BROWSER_TITLE replaced with School Management System.

My plan of appending *.php to each line in run.sed was a bust.

Will take help from anyone, and I will be more careful reporting results! Sorry about that.

devnull10 10-29-2010 07:24 PM

Let me make this a lot easier for you! :)

You will need to modify the script to make sure your dictionary file is parsed correctly. The one I used as a test just had "VARIABLE=translated text" on each line rather than DEFINE x y, but that was only because I didn't see how your file was set up at first. I'm sure you can easily work out what you need to do there (awk is your friend; this thread will help: http://www.linuxquestions.org/questi...-count-179078/)
Then create the following shell script:

Code:

    1  #!/bin/bash
    2 
    3  for i in *.php
    4  do
    5    while read -r j
    6    do
    7      SEARCH=$(echo "$j" | cut -d = -f 1)
    8      REPLACE=$(echo "$j" | cut -d = -f 2)
    9      sed -i "s/$SEARCH/$REPLACE/g" "$i"
    10    done < dictionary
    11  done

The line numbers are just there because I did a straight copy from vi. If you want to put some debug messages in so you can see which file it is on, etc., insert a line after line 4, something like:

echo "Processing $i"

As always, test on a test system or backup before running on your live data - I may well have missed something! :)
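Following devnull10's advice to back up first, here is a minimal sketch of one way to do it: copy the directory, run whatever replacement command you choose, then review every change with diff -ru. backup_and_diff is a hypothetical helper name, not something from the thread, and it assumes there is room for a full copy next to the original directory.

```shell
#!/bin/bash
# Sketch of "back up before running": copy the directory, run the given
# replacement command, then show every change with diff -ru.
# backup_and_diff is a hypothetical helper name.
backup_and_diff() {
  local dir=${1%/}                      # drop a trailing slash, if any
  local backup="$dir.bak"
  shift
  if [ -e "$backup" ]; then
    echo "Backup $backup already exists; not overwriting it." >&2
    return 1
  fi
  cp -a "$dir" "$backup" || return 1   # copy, keeping timestamps and permissions
  "$@" || return 1                      # the replacement command itself
  diff -ru "$backup" "$dir"             # exit status 1 just means "there were changes"
}
```

Usage would look something like `backup_and_diff /home/me/swift200 ./replace.sh | less`, where replace.sh is the loop script from earlier in the thread.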

You're welcome.

dive 10-29-2010 07:50 PM

Just going back to your original sed file: you didn't actually give the sed commands a file name, which is why you got the "no input files" error.

This should give you an idea:

Code:

for f in *.php
do
  ./run.sed "$f"
done

Then in your sed file:

Code:

sed -i 's|SEARCH|REPLACE|g' "$@"
"$@" expands to all the arguments given to run.sed (in this case $f), so each sed command receives the file name.

Of course, putting *.php on the end of the sed line will also work; I'm just trying to explain why it was failing.

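The -i switch makes sed edit files in place, so it needs the file name(s) at the end of the command; with no file name, sed has nothing to edit and stops with "no input files".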

Oh, and by the way, the g on the end means 'global'; without it, sed only replaces the first occurrence on each line.
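Both of dive's points can be seen on a throwaway file. This is a small sketch assuming GNU sed (the -i behaviour differs on BSD/macOS sed); demo.php is a hypothetical scratch file created in a temporary directory.

```shell
#!/bin/bash
# Sketch (GNU sed assumed): the g flag, and -i needing a file name.
tmp=$(mktemp -d)
printf 'echo _YES . _YES;\necho _YES;\n' > "$tmp/demo.php"

# Without g only the first match on each line changes; with g, all of them.
first_only=$(sed 's|_YES|"Yes"|' "$tmp/demo.php")
every=$(sed 's|_YES|"Yes"|g' "$tmp/demo.php")

# -i with a file name rewrites that file in place.
sed -i 's|_YES|"Yes"|g' "$tmp/demo.php"
in_place=$(cat "$tmp/demo.php")

# -i with no file name reproduces the original run.sed failure.
sed -i 's|_YES|"Yes"|g' < /dev/null 2> /dev/null
no_file_status=$?

printf '%s\n' "$first_only" "$every" "$in_place" "exit status without a file: $no_file_status"
rm -r "$tmp"
```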

AnanthaP 10-29-2010 08:02 PM

I don't understand why you didn't continue with the parameterised approach of the original programmer, who has obviously placed all these variable definitions in one file. What ain't broken doesn't need to be fixed.

grail 10-30-2010 05:50 AM

Sorry I haven't replied sooner. I was in the middle of doing so last night when the power went out :(

A number of people have already pointed you in some good directions. As some have pointed out, the issue with run.sed is that the sed commands in it need to be told which files to work on. Alternatively, as I have seen in a previous post, you could turn it into a sed interpreter script, i.e. put #!/bin/sed -f (or wherever yours is located) at the top instead of #!/bin/bash.
So could look like:
Code:

#!/bin/sed -f

s/_BROWSER_TITLE/"School Management System"/

And then call it like:
Code:

./run.sed -i file
So in your script it will become:
Code:

#!/bin/bash

FILES=/home/me/swift200/*.php

for f in $FILES
do
    echo "Processing $f file..."
    ./run.sed -i "$f"
done
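Once all the s commands live in one sed script, as in grail's version, the shell loop is optional. A minimal sketch, assuming GNU sed: with -i, GNU sed treats each file separately and rewrites it in place, so a single call can take every file at once. subs.sed and demo_*.php are hypothetical names used only for this demo.

```shell
#!/bin/bash
# Sketch (GNU sed assumed): all substitutions live in one plain sed script,
# and one sed call edits every file in place. subs.sed and demo_*.php are
# hypothetical names for this demo.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

cat > subs.sed <<'EOF'
s|_BROWSER_TITLE|"School Management System"|g
s|_WELCOME|"Welcome"|g
EOF
printf 'echo _BROWSER_TITLE;\n' > demo_1.php
printf 'echo _WELCOME;\n' > demo_2.php

# -i makes GNU sed treat each file separately and rewrite it in place,
# so every file goes through the whole script once.
sed -i -f subs.sed ./*.php

out1=$(cat demo_1.php)
out2=$(cat demo_2.php)
printf '%s\n' "$out1" "$out2"
cd / && rm -r "$tmp"
```

Against the real project this would be something like `sed -i -f run.sed /home/me/swift200/*.php`, with run.sed holding only the s commands.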

Although I do like devnull10's idea of using the dictionary itself.

devnull10 10-30-2010 07:34 AM

Having thought about it, you don't need awk. I didn't realise the value was surrounded by quotes, so you can just use cut (although there is nothing wrong with awk! :)).
The following script should do what you need, based upon your dictionary definition file. Obviously replace the word dictionary with whatever your definition file is called.


Code:

#!/bin/bash

# Dictionary lines look like: DEFINE _PAGE_TEXT_CLICK "Click Here";
for i in *.php
do
  echo "Processing $i"
  while read -r KEYWORD SEARCH REST
  do
    [ -n "$SEARCH" ] || continue          # skip blank lines
    REPLACE=$(echo "$REST" | cut -d '"' -f 2)
    sed -i "s/$SEARCH/$REPLACE/g" "$i"
  done < dictionary
done
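The loop above runs one sed -i per dictionary line per file, which for 2400 definitions and about 200 scripts is roughly 480,000 sed runs. A possible alternative, sketched here and not devnull10's script, is to turn the dictionary into one sed script and apply it in a single pass. Assumptions: GNU sed (for \b and -i), dictionary lines shaped exactly like `DEFINE _PAGE_TEXT_CLICK "Click Here";`, and names made only of letters, digits and underscores. make_sed_script and apply_dictionary are hypothetical helper names. Translated text containing " or $ would still need escaping for PHP, which this sketch does not do.

```shell
#!/bin/bash
# Sketch: build ONE sed script from the dictionary and apply it in a single
# pass, instead of one sed -i per dictionary line.
# Assumes GNU sed (\b, -i) and lines like: DEFINE _PAGE_TEXT_CLICK "Click Here";

make_sed_script() {
  local keyword name text
  while read -r keyword name text || [ -n "$keyword" ]; do
    [ "$keyword" = DEFINE ] || continue      # skip blank or unrelated lines
    text=${text#\"}                          # drop the opening quote
    text=${text%\";}                         # drop the closing quote and ;
    # Escape characters that are special in a sed replacement: \ & and |
    text=$(printf '%s' "$text" | sed 's/[\\&|]/\\&/g')
    # \b keeps _NO from matching inside longer names such as _..._NOTES;
    # the added quotes turn the text into a PHP string literal.
    printf 's|\\b%s\\b|"%s"|g\n' "$name" "$text"
  done < "$1"
}

apply_dictionary() {
  local dict=$1 script status
  shift
  script=$(mktemp) || return 1
  make_sed_script "$dict" > "$script"
  sed -i -f "$script" "$@"
  status=$?
  rm -f "$script"
  return "$status"
}
```

Usage would be along the lines of `apply_dictionary dictionary /home/me/swift200/*.php`; running `make_sed_script dictionary | less` first lets you inspect the generated commands.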


dougp25 10-30-2010 08:36 AM

THANK YOU THANK YOU THANK YOU ALL!

Grail, thanks for the awesome help getting going! And you gave me another idea: why not have one huge script with the sed replacements in the middle? Sort of like:

#!/bin/bash

FILES=/home/me/swift200/*.php

for f in $FILES
do
sed -i 's|_BROWSER_TITLE|\"School Management System\"|' $f
sed -i 's|_WELCOME|\"Welcome\"|' $f
sed -i 's|_YES|\"Yes\"|' $f
sed -i 's|_NO|\"No\"|' $f
done

I am paraphrasing sort of there, but wouldn't this have worked too? Of course, I will test that out on a sample dir, but it seems like it would have got me past the variable passing.
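That loop should work, since every sed line now gets $f. One thing worth checking with this or any of the other approaches: a short name like _NO also matches the _NO inside longer names from the same project, such as _TEACHER_EDIT_STUDENT_1_NOTES, unless the longer name happens to be replaced first. A small sketch of the problem and of one fix, word boundaries (\b, a GNU sed extension):

```shell
#!/bin/bash
# Sketch (GNU sed assumed for \b): a pattern like _NO also matches the _NO
# inside longer names such as _TEACHER_EDIT_STUDENT_1_NOTES, unless the
# longer name happens to have been replaced first. Word boundaries avoid it.
line='echo _NO; echo _TEACHER_EDIT_STUDENT_1_NOTES;'

plain=$(printf '%s\n' "$line" | sed 's|_NO|"No"|g')
bounded=$(printf '%s\n' "$line" | sed 's|\b_NO\b|"No"|g')

printf 'plain:   %s\n' "$plain"      # ..._STUDENT_1"No"TES; (corrupted)
printf 'bounded: %s\n' "$bounded"    # ..._STUDENT_1_NOTES;  (untouched)
```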

Dive gives me the official dope-slap. It's just weird, because I rewrote run.sed with a $1 at the end of each line, my assumption being that bash stores variables in "built-in" places. But every time I got to run.sed and echoed $1, it always had an extra slash in front of the filename, so it would fail. Thanks for reminding me about the g flag as well.

Devnull10, now THAT is one clean-looking way to do this! I rewrote the dictionary file and ran yours with a few tweaks; that was awfully easy.

Thanks everyone! I can now move forward a whole lot faster! My O'Reilly order, Classic Shell Scripting and Learning the Bash Shell, is on its way.

Of course, now I need to make sure all the substitutions are valid PHP... God forbid there's a missing semicolon somewhere.
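For the "is it still valid PHP?" step, PHP's command-line linter (php -l) can syntax-check each file without running it. A minimal sketch, assuming the php CLI is installed; lint_php_dir is a hypothetical helper name and the directory is just an example.

```shell
#!/bin/bash
# Sketch: after the substitutions, ask PHP's own linter (php -l) to
# syntax-check every file and list the ones that fail. Assumes the php
# CLI is installed; lint_php_dir is a hypothetical helper name.
lint_php_dir() {
  local dir=${1:-.} f failed=0
  for f in "$dir"/*.php; do
    [ -e "$f" ] || continue               # no .php files: glob stays literal
    if ! php -l "$f" > /dev/null 2>&1; then
      echo "Syntax error: $f"
      failed=1
    fi
  done
  return "$failed"
}

# Usage: lint_php_dir /home/me/swift200
```

php -l only catches parse errors such as a missing semicolon; a constant that was replaced with the wrong text will still lint cleanly, so the diff review is still worth doing.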


All times are GMT -5.