LinuxQuestions.org

trashbird1240 01-03-2007 09:15 AM

Recursive Text Editing: Where to begin?
 
Howdy forum,

I have a series of Stata scripts (.do-files) that were saved in Windows and that I just converted to Linux. Stata interprets the scripts just fine; however, when I look at them in a text editor, I get "^M" at the end of every line. This hampers ESS's syntax highlighting. So I want to accomplish a couple of things, mainly getting these files into a readable format.

The files are arranged in subdirectories of two separate directories, themselves subdirectories of /home/joel.

1. Go through each file and change "c:/" to "~" and in general change Windows pathnames to UNIX pathnames.
2. fromdos <old-do-file> new-do-file
3. pack up the old-do-files into a tree structure just like the current one and tar-then-bzip2 it so that I only have the new ones in my directories.

The last step seems like the easiest part. I thought of using sed to accomplish goal #1, and fromdos (slackware) is pretty easy to use and always does the job correctly.
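
For the archiving step, one possible sketch follows. It assumes the old copies carry a .bak suffix and that GNU tar is in use; the scratch directory, file names, and archive name are all made up for the demo. Feeding tar a list of relative paths from find keeps the directory tree intact inside the archive.

```shell
# Scratch tree with made-up names: *.bak are the old Windows-format copies.
demo=$(mktemp -d)
mkdir -p "$demo/data/projA/sub"
echo old1 > "$demo/data/projA/clean.do.bak"
echo old2 > "$demo/data/projA/sub/model.do.bak"
echo new  > "$demo/data/projA/clean.do"
cd "$demo"

# Use bzip2 when it is installed, otherwise fall back to gzip.
if command -v bzip2 >/dev/null 2>&1; then flag=j; ext=bz2; else flag=z; ext=gz; fi
archive="old-do-files.tar.$ext"

# -print0 with --null keeps unusual file names intact; "-T -" reads the
# file list from stdin. Relative paths preserve the directory tree.
find data -type f -name '*.bak' -print0 | tar --null "-c${flag}f" "$archive" -T -

tar -tf "$archive" | sort                       # check the contents first
find data -type f -name '*.bak' -exec rm {} +   # then drop the old copies
```

Listing the archive before deleting anything is deliberate: the old files are removed only once they are known to be safely packed.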

HOWEVER, because of the directory structure, I need to do this recursively. Where do I start writing a script that will recursively search for the files I need to edit and change, and archive?

Also, I've been learning and using bash quite a lot -- would it be better to use some other shell (e.g., zsh) to interpret this script?

Thanks,
Joel

Nick_Battle 01-03-2007 09:49 AM

> Where do I start writing a script that will recursively search
> for the files I need to edit and change, and archive?

Does find(1) not give you what you want?

Cheers,
-nick

colucix 01-03-2007 10:00 AM

Quote:

Originally Posted by trashbird1240
Howdy forum,

Stata interprets the scripts just fine, however when I look at them in a text editor, I get "^M" at the end of every line. This hampers the use of ESS' syntax highlighting.

A method to remove the ^M and any other control character is by means of the col command, e.g.

Code:

col -b < input_file > output_file

billymayday 01-03-2007 10:30 AM

Sounds to me like the files are simply in DOS text format (i.e., with CR/LF at the end of each line). You say you converted them to Linux, but did you run dos2unix on each file? This will remove the extra control characters automatically.

How many subdirectories are you talking about?

trashbird1240 01-03-2007 11:50 AM

Quote:

Originally Posted by Nick_Battle
> Where do I start writing a script that will recursively search
> for the files I need to edit and change, and archive?

Does find(1) not give you what you want?

I suppose it does: I just tried it and it prints full pathnames on STDOUT. So, I'll just pipe them from find into whatever command I use to do the editing.

Joel

trashbird1240 01-03-2007 11:52 AM

Quote:

Originally Posted by billymayday
Sounds to me like the files are simply in DOS text format (i.e., with CR/LF at the end of each line). You say you converted them to Linux, but did you run dos2unix on each file? This will remove the extra control characters automatically.

I said I will convert them using fromdos. fromdos is the Slackware version of dos2unix.

Quote:

Originally Posted by billymayday
How many subdirectories are you talking about?

I'm talking about subdirectories of subdirectories, dude. Recursion to the max.

Seriously, I organize my projects by researchers I work for, then the projects that they are doing, then sometimes by section of the analysis I'm doing.

Thanks for the help -- I think I'm getting a good place to start.

Joel

billymayday 01-03-2007 01:55 PM

Joel, I'm with Nick: look at find pretty carefully, especially the -exec option.
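
As a rough illustration of -exec (a sketch only: the scratch directory and file names are invented for the demo and are not the real project tree), find can match both .do and .ado files and hand each path to a command:

```shell
# Build a throwaway tree; every name here is hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/data/projA/sub" "$demo/ado/personal"
printf 'use "c:/x.dta"\r\n' > "$demo/data/projA/sub/clean.do"
printf 'program define foo\r\n' > "$demo/ado/personal/foo.ado"
printf 'not a script\n' > "$demo/data/projA/notes.txt"
cd "$demo"

# Quote the -name patterns so the shell does not expand them first.
# {} stands for each path found; \; marks the end of the -exec command.
found=$(find data ado/personal -type f \( -name '*.do' -o -name '*.ado' \) -exec echo {} \; | sort)
printf '%s\n' "$found"
```

Running with echo first shows exactly which files would be touched; swapping echo for the real editing command comes afterwards.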

trashbird1240 01-19-2007 03:41 PM

Okay, now that I've been working on this for a while, it's time to update you:

Here's what I've tried:
Code:

find ./data/ ./ado/personal/ -depth -iregex ^.+\\.[a]*do$ -exec sed '/c:/s/\\/\//g;/c:/s/c:/\~/g;s/^.*^M$//' {} \;
The sed command does exactly what it should, and the find command finds the files just fine. My question is: how do I redirect the output so that it actually edits the files (i.e., saves STDOUT to the file being edited)? Right now it prints the modified output, and testing the commands individually shows that they do what I want, but the actual files are untouched.

For example, if I enter

Code:

find ./data/ ./ado/personal/ -depth -iregex ^.+\\.[a]*do$ -exec sed '/c:/s/\\/\//g;/c:/s/c:/\~/g;s/^.*^M$//' {} > {} \;
The command executes without errors, but the files remain the same, even though I thought I could direct the output back to each file with {} > {}. If I pipe the first command through grep instead

Code:

find ./data/ ./ado/personal/ -depth -iregex ^.+\\.[a]*do$ -exec sed '/c:/s/\\/\//g;/c:/s/c:/\~/g;s/^.*^M$//' {} \;|grep "~"
Then I see that my edits are happening. However, the files are still the same.

How do I actually edit the files? (I've tried /w in sed but that does something different from what I want)

Thanks,
Joel
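
A small demonstration of why the {} > {} attempt leaves the files untouched (a sketch with a made-up file in a scratch directory): the redirection is processed by the calling shell, not by find, so it is applied once to find's whole output and never per file.

```shell
demo=$(mktemp -d)   # scratch directory; a.do is a made-up file
cd "$demo"
printf 'use "c:/x.dta"\n' > a.do

# The shell removes "> {}" from the command line before find starts, so
# find never sees it: all output goes to one file literally named "{}".
find . -name '*.do' -exec sed 's/c:/~/' {} > {} \;

ls          # lists a.do plus the stray file {}
cat a.do    # still contains c:/
```

This is also why no error appears: from find's point of view the command is identical to the first attempt, only with its standard output sent elsewhere.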

ntubski 01-19-2007 10:20 PM

Quote:

Originally Posted by man sed
-i[SUFFIX], --in-place[=SUFFIX]

edit files in place (makes backup if extension supplied)

Or output to a different file: {} > {}.new
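
Both suggestions might look roughly like this when combined with find. This is a sketch under assumptions: the scratch tree, the .bak suffix, and the sample file contents are invented. Two details are worth noting. A per-file redirect such as > file.new has to run inside a child shell started by -exec, because a redirect typed directly on the find command line is handled once by the calling shell. And a pattern like s/^.*^M$// matches the entire line, so it would blank every CR/LF-terminated line; the version here strips only the trailing carriage return.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/data/projA"
printf '%s\r\n' 'use "c:\stata\x.dta"' 'summarize' > "$demo/data/projA/clean.do"
cr=$(printf '\r')   # a literal carriage return, i.e. the "^M"

# Option 1: write a converted copy next to each file. sh -c runs one small
# shell per file, so the redirect applies to that file ($1) only.
find "$demo/data" -type f -name '*.do' \
    -exec sh -c 'tr -d "\r" < "$1" > "$1.new"' sh {} \;

# Option 2: edit in place, keeping each original as <name>.bak.
# The three expressions: strip the trailing CR, turn backslashes into
# slashes on lines mentioning c:, then replace c: with ~.
find "$demo/data" -type f -name '*.do' \
    -exec sed -i.bak -e "s/$cr\$//" -e '/c:/s|\\|/|g' -e 's|c:|~|g' {} \;

cat "$demo/data/projA/clean.do"
```

With the suffix supplied to -i, the untouched originals stay available for archiving afterwards.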

trashbird1240 01-22-2007 10:00 AM

thanks -- I will try it today.
Joel

trashbird1240 01-22-2007 11:51 AM

it worked

Joel

ygloo 01-22-2007 12:17 PM

...................


All times are GMT -5.