Need 'sed command' guru

valnar · 02-23-2007, 08:56 AM

I believe sed is the command I want to use, although there may be a more appropriate one.

I have about 1000 *.htm files on a Microsoft SharePoint server, all created in and saved from MS Word. The resulting files have problems opening in certain web browsers, and I tracked it down to the Smart Tag lines in those files. With the SmartTag information removed, they work fine. Instead of opening each file individually with Notepad to remove those lines, I was hoping to use one or two Linux commands to scour the whole directory and remove them for me.

Every file contains these lines among other code, which I want removed or replaced with a null or space.

Code:

<o:SmartTagType namespaceuri="urn:schemas-microsoft-com:office:smarttags"
 name="place"/>
<o:SmartTagType namespaceuri="urn:schemas-microsoft-com:office:smarttags"
 name="PlaceType"/>
<o:SmartTagType namespaceuri="urn:schemas-microsoft-com:office:smarttags"
 name="PlaceName"/>

As you can see, the three longer lines are identical, and the three shorter ones differ only slightly. Is there a series of sed commands or other such utility that would go through each of my .HTM files and remove all instances of the above?

Thanks,
Robert

pixellany · 02-23-2007, 09:13 AM

Here is a really good tutorial on sed:
http://www.grymoire.com/Unix/Sed.html#uh-8

I am rusty here, but I think something like this may work:

sed '/SmartTag/d' <oldfile >newfile

edit: Yes, this works. The text between the slashes can be any character or word fragment. If the string is found, the whole line is deleted.
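To try this without touching a real file, here is a minimal sketch. The sample content and the temporary directory are made up for illustration, and it assumes each SmartTag entry sits on a single line:

```shell
#!/bin/sh
# Build a small sample file (hypothetical content) to test the command on.
tmpdir=$(mktemp -d)
cat > "$tmpdir/oldfile" <<'EOF'
<html>
<o:SmartTagType namespaceuri="urn:schemas-microsoft-com:office:smarttags" name="place"/>
<p>Hello</p>
</html>
EOF

# Delete every line containing "SmartTag" and write the result to a new file.
# The original file is read only, so it stays unchanged.
sed '/SmartTag/d' < "$tmpdir/oldfile" > "$tmpdir/newfile"

# Show the result.
cat "$tmpdir/newfile"
```

Because the output goes to a separate file, you can compare oldfile and newfile before replacing anything.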

muha · 02-23-2007, 09:28 AM

If you want to be more specific (quite specific really):

Code:

sed -i '/<o:SmartTagType\snamespaceuri="urn:schemas-microsoft-com:office:smarttags"\sname="[a-zA-Z]*"\/>/d' file

The -i parameter makes sed use the file for input AND FOR OUTPUT.
So be careful what you overwrite.
Use:
sed -i.bak 'etc
To save .bak backups of the original file.
Or don't use -i at all to first check the output on screen.
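The preview-then-backup routine might look like the sketch below. The sample file and the shortened pattern are only for illustration (the full pattern above would be used the same way):

```shell
#!/bin/sh
# Sample file with hypothetical content, for trying the commands safely.
workdir=$(mktemp -d)
printf '%s\n' \
  '<p>before</p>' \
  '<o:SmartTagType namespaceuri="urn:schemas-microsoft-com:office:smarttags" name="place"/>' \
  '<p>after</p>' > "$workdir/sample.htm"

# 1) Preview: without -i the result is printed and the file is not modified.
sed '/<o:SmartTagType /d' "$workdir/sample.htm"

# 2) Edit in place, keeping the original as sample.htm.bak.
sed -i.bak '/<o:SmartTagType /d' "$workdir/sample.htm"

# 3) Compare backup and edited file to see exactly which lines were removed.
#    diff exits non-zero when the files differ, hence the "|| true".
diff "$workdir/sample.htm.bak" "$workdir/sample.htm" || true
```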

[a-zA-Z]* is the only part that is variable. It matches any string of letters, of any length including zero.
Note the \/ near the end: the slash has to be escaped because / is also the delimiter of the pattern.
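One caveat: in the sample at the top of the thread each tag wraps onto a second line. If the files really are wrapped like that (and it is not just the forum's formatting), a pattern that expects the whole tag on one line will not match. A rough sketch for that case, assuming every tag starts with <o:SmartTagType and ends at the next /> and that nothing else worth keeping shares those lines:

```shell
#!/bin/sh
# Sample file (hypothetical) with each tag wrapped over two lines.
demodir=$(mktemp -d)
cat > "$demodir/wrapped.htm" <<'EOF'
<html>
<o:SmartTagType namespaceuri="urn:schemas-microsoft-com:office:smarttags"
 name="place"/>
<o:SmartTagType namespaceuri="urn:schemas-microsoft-com:office:smarttags"
 name="PlaceType"/>
<p>Hello</p>
</html>
EOF

# On a line that opens a SmartTagType tag: keep appending the next line (N)
# until the collected text contains the closing "/>", then delete all of it.
sed '/<o:SmartTagType/{
:a
/\/>/!{
N
ba
}
d
}' "$demodir/wrapped.htm" > "$demodir/cleaned.htm"

# Show the result.
cat "$demodir/cleaned.htm"
```

This deletes whole lines, so check the output on a few real files before running it with -i.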

valnar · 02-23-2007, 09:29 AM

Great link. Thanks.

I will muddle through it, but I was hoping some guru would know how to write up the script to do it for a whole directory real quick.

Robert

valnar · 02-23-2007, 09:32 AM

Muha, does your command run it for a whole directory of files?

colucix · 02-23-2007, 10:36 AM

Quote:

Originally Posted by muha

The -f parameter makes sed use the file for input AND FOR OUTPUT.

Actually, I thought this option was "-i". Indeed, from the sed man page:

Quote:

-f script-file, --file=script-file
add the contents of script-file to the commands to be executed

-i[SUFFIX], --in-place[=SUFFIX]
edit files in place (makes backup if extension supplied)

BTW, using the command proposed by muha (except for "\ " instead of "\s" to escape the blank spaces), you can use "find" with the -exec option, as in

Code:

find /some/dir -name "*.htm" -exec sed -i '/<o:SmartTagType\ namespaceuri="urn:schemas-microsoft-com:office:smarttags"\ name="[a-zA-Z]*"\/>/d' {} \;

where /some/dir is the actual path of the directory from which to start the search for *.htm files (see man find for details). Since this command modifies your files, here are some safety rules:
1) execute the find command without the -exec option, just to see if the result of the search is what you expected (that is it found only the files you want to modify)
2) execute a safe command with the -exec option to check the correct syntax of the find command itself, e.g.

Code:

find /some/dir -name "*.htm" -exec ls -l {} \;

3) make a backup of the files you're going to modify!
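Put together, the three steps could look like the sketch below. The directory tree is a throwaway stand-in for /some/dir, the pattern is shortened for readability, and sed -i without a suffix assumes GNU sed:

```shell
#!/bin/sh
# Hypothetical directory tree standing in for /some/dir.
topdir=$(mktemp -d)
mkdir -p "$topdir/sub"
for f in "$topdir/a.htm" "$topdir/sub/b.htm"; do
  printf '%s\n' \
    '<p>text</p>' \
    '<o:SmartTagType namespaceuri="urn:schemas-microsoft-com:office:smarttags" name="place"/>' > "$f"
done
# A non-.htm file that must not be touched.
echo '<o:SmartTagType example in a text note' > "$topdir/notes.txt"

# 1) Run find alone to see which files would be modified.
find "$topdir" -name "*.htm"

# 2) Check the -exec syntax with a harmless command.
find "$topdir" -name "*.htm" -exec ls -l {} \;

# 3) Back up the whole tree, then do the real edit (GNU sed assumed for -i).
cp -R "$topdir" "$topdir.backup"
find "$topdir" -name "*.htm" -exec sed -i '/<o:SmartTagType /d' {} \;
```

find descends into subdirectories by default, so files like sub/b.htm are edited as well.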

valnar · 02-23-2007, 11:50 AM

You people are wonderful. Thank you so much.

Robert

valnar · 02-23-2007, 01:21 PM

I'd like to use these commands at a Windows prompt, and they aren't working correctly with the GnuWin32 versions. Is there a Linux/Unix Bash interpreter that I can load on a Windows box that gives me Bash and most of the main GNU utilities to run from within a Windows/NTFS partition?

Robert

jschiwal · 02-23-2007, 01:35 PM

Yes. I use Cygwin at work. You could install Cygwin or Cygwin/X. Installing Cygwin/X would allow you to, for example, ssh -X into a Linux machine and run a program there, with the program's window displayed on your local Windows machine. I use a 3-line script at work that uses ls, sed, enscript and ps2pdf to produce PDF catalogs for commercial backups.

Extracting spot names and clients from thousands of schedules into a sorted, tab-separated file lets me use a simple one-liner with grep to locate spots or clients quickly at my encoding station. There is no way I could install any kind of database on my encoding station. The text utilities like grep, sort, uniq, sed and awk are very handy to use at work.

muha · 02-27-2007, 06:35 AM

Quote:

Originally Posted by colucix

Actually, I thought this option was "-i".

Correct

My bad! I've edited my post.