LinuxQuestions.org
Old 02-23-2007, 08:56 AM   #1
valnar
Member
 
Registered: May 2005
Posts: 66

Rep: Reputation: 15
Need 'sed command' guru


I believe sed is the command I want to use, although there may be a more appropriate one.

I have about 1000 *.htm files on a Microsoft SharePoint server, all of which were created in and saved from MS Word. The resulting files have problems opening in certain web browsers, and I tracked it down to the Smart Tag lines in those files. When you remove the SmartTag information, they work fine. Instead of opening each file individually with Notepad to remove those lines, I was hoping to use one or two Linux commands to scour the whole directory and remove them for me.

Every file contains these lines among other code, which I want removed or replaced with a null or space.


Code:
<o:SmartTagType namespaceuri="urn:schemas-microsoft-com:office:smarttags"
 name="place"/>
<o:SmartTagType namespaceuri="urn:schemas-microsoft-com:office:smarttags"
 name="PlaceType"/>
<o:SmartTagType namespaceuri="urn:schemas-microsoft-com:office:smarttags"
 name="PlaceName"/>
As you can see, the three longer lines are identical and the three shorter ones differ slightly. Is there a series of sed commands, or some other utility, that would go through each of my .htm files and remove all instances of the above?

Thanks,
Robert
 
Old 02-23-2007, 09:13 AM   #2
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
Here is a really good tutorial on sed:
http://www.grymoire.com/Unix/Sed.html#uh-8

I am rusty here, but I think something like this may work:

sed '/SmartTag/d' <oldfile >newfile

Edit: Yes, this works. Between the slashes you can put any character or word fragment. If the string is found on a line, that line is deleted.
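One caveat worth checking: in the listing from post #1 each tag appears to span two lines, and only the first of them contains "SmartTag". If the files really are laid out that way (an assumption; the break could just be forum wrapping), the one-line delete leaves the ` name="..."/>` line behind. Below is a minimal sketch with GNU sed on a made-up sample file; the file names and the temporary directory are hypothetical, and the second command only handles tags split over at most two lines.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)

# Hypothetical sample mimicking the listing in post #1 (tag split over two lines).
cat > "$workdir/oldfile" <<'EOF'
<html>
<o:SmartTagType namespaceuri="urn:schemas-microsoft-com:office:smarttags"
 name="place"/>
<p>Keep me</p>
</html>
EOF

# One-line delete: removes only the line that contains "SmartTag".
sed '/SmartTag/d' "$workdir/oldfile" > "$workdir/newfile_single"

# Two-line aware delete: on a line that opens the tag, append the next
# line (N) unless the tag already closes on this line, then delete it all.
sed '/<o:SmartTagType/{/\/>/!N;d;}' "$workdir/oldfile" > "$workdir/newfile_pair"

# Compare the two results.
cat "$workdir/newfile_single"
cat "$workdir/newfile_pair"
```

Diffing the two output files against the original shows quickly which form your files need.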

Last edited by pixellany; 02-23-2007 at 09:18 AM.
 
Old 02-23-2007, 09:28 AM   #3
muha
Member
 
Registered: Nov 2005
Distribution: xubuntu, grml
Posts: 451

Rep: Reputation: 38
If you want to be more specific (quite specific really):
Code:
sed -i '/<o:SmartTagType\snamespaceuri="urn:schemas-microsoft-com:office:smarttags"\sname="[a-zA-Z]*"\/>/d' file
The -i parameter makes sed use the file for input AND FOR OUTPUT.
So be careful what you overwrite.
Use:
sed -i.bak 'etc
To save .bak backups of the original file.
Or don't use -i at all to first check the output on screen.

[a-zA-Z]* is the only part that varies. It matches any run of letters, of any length including zero.
Note the \/ near the end: the slash has to be escaped because / is also the delimiter around the pattern.
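The preview-then-backup routine described above could look like the sketch below. It assumes GNU sed (for `-i` and `\s`) and a tag that sits on a single line; the sample file `page.htm` and the temporary directory are made up for the demonstration.

```shell
# Throwaway directory with one hypothetical sample file.
workdir=$(mktemp -d)
page="$workdir/page.htm"

printf '%s\n' \
  '<head>' \
  '<o:SmartTagType namespaceuri="urn:schemas-microsoft-com:office:smarttags" name="PlaceType"/>' \
  '</head>' > "$page"

# The sed script from the post, kept in a variable so it is typed only once.
script='/<o:SmartTagType\snamespaceuri="urn:schemas-microsoft-com:office:smarttags"\sname="[a-zA-Z]*"\/>/d'

# 1) Preview: without -i the result only goes to the screen, the file is untouched.
sed "$script" "$page"

# 2) Edit in place and keep the original as page.htm.bak.
sed -i.bak "$script" "$page"
```

If the preview looks wrong, nothing has been changed yet; if the in-place edit looks wrong, the .bak copy still holds the original.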

Last edited by muha; 02-27-2007 at 06:36 AM. Reason: -f is changed to -i for saving to a file
 
Old 02-23-2007, 09:29 AM   #4
valnar
Member
 
Registered: May 2005
Posts: 66

Original Poster
Rep: Reputation: 15
Great link. Thanks.

I will muddle through it, but I was hoping some guru would know how to write up the script to do it for a whole directory real quick.

Robert
 
Old 02-23-2007, 09:32 AM   #5
valnar
Member
 
Registered: May 2005
Posts: 66

Original Poster
Rep: Reputation: 15
Muha, does your command run it for a whole directory of files?
 
Old 02-23-2007, 10:36 AM   #6
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Quote:
Originally Posted by muha
The -f parameter makes sed use the file for input AND FOR OUTPUT.
Actually, I thought this option was "-i". Indeed, from the sed man page:
Quote:
-f script-file, --file=script-file
add the contents of script-file to the commands to be executed

-i[SUFFIX], --in-place[=SUFFIX]
edit files in place (makes backup if extension supplied)
BTW, using the command proposed by muha (except for "\ " instead of "\s" to escape the blank spaces), you can use "find" with the -exec option, as in
Code:
find /some/dir -name "*.htm" -exec sed -i '/<o:SmartTagType\ namespaceuri="urn:schemas-microsoft-com:office:smarttags"\ name="[a-zA-Z]*"\/>/d' {} \;
where /some/dir is the actual path of the directory from which to start the search for *.htm files (see man find for details). Since this command modifies your files, here are some safety rules:
1) execute the find command without the -exec option, just to see if the result of the search is what you expected (that is, it found only the files you want to modify)
2) execute a safe command with the -exec option to check the correct syntax of the find command itself, e.g.
Code:
find /some/dir -name "*.htm" -exec ls -l {} \;
3) make a backup of the files you're going to modify!
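Put together, the three safety steps and the final command might run like this. It is only a sketch under assumptions: GNU sed, tags on a single line, and two temporary directories standing in for /some/dir and for a backup location of your choice; the sample files are invented.

```shell
site=$(mktemp -d)     # stands in for /some/dir
backup=$(mktemp -d)   # hypothetical backup location

# Invented sample tree: two .htm files and one file that must stay untouched.
mkdir -p "$site/sub"
for f in "$site/a.htm" "$site/sub/b.htm"; do
  printf '%s\n' '<head>' \
    '<o:SmartTagType namespaceuri="urn:schemas-microsoft-com:office:smarttags" name="place"/>' \
    '</head>' > "$f"
done
printf 'not html\n' > "$site/notes.txt"

# 1) See which files the search finds.
find "$site" -name "*.htm"

# 2) Check the -exec syntax with a harmless command.
find "$site" -name "*.htm" -exec ls -l {} \;

# 3) Back up the whole tree before modifying anything.
cp -R "$site/." "$backup/"

# Now run the real edit on every file found.
find "$site" -name "*.htm" -exec sed -i '/<o:SmartTagType\ namespaceuri="urn:schemas-microsoft-com:office:smarttags"\ name="[a-zA-Z]*"\/>/d' {} \;
```

Because find descends into subdirectories, the backup copies the whole tree rather than just the top level.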

Last edited by colucix; 02-23-2007 at 10:37 AM.
 
Old 02-23-2007, 11:50 AM   #7
valnar
Member
 
Registered: May 2005
Posts: 66

Original Poster
Rep: Reputation: 15
You people are wonderful. Thank you so much.

Robert
 
Old 02-23-2007, 01:21 PM   #8
valnar
Member
 
Registered: May 2005
Posts: 66

Original Poster
Rep: Reputation: 15
I'd like to use these commands at a Windows prompt, and they aren't working correctly with the GnuWin32 versions. Is there a Linux/Unix Bash interpreter that I can load on a Windows box that gives me Bash and most of the main GNU utilities to run from within a Windows/NTFS partition?

Robert
 
Old 02-23-2007, 01:35 PM   #9
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 682
Yes. I use Cygwin at work. You could install Cygwin or Cygwin/X. Installing Cygwin/X would allow you, for example, to ssh -X into a Linux machine and run a program there with the program's window displayed on your local Windows machine. I use a 3-line script at work that uses ls, sed, enscript and ps2pdf to produce PDF catalogs for commercial backups.

Extracting spot names and clients from thousands of schedules into a sorted, tab-separated file allows me to use a simple one-liner with grep to locate spots or clients quickly at my encoding station. There is no way I could install any kind of database on my encoding station. The text utilities like grep, sort, uniq, sed and awk are very handy to use at work.
 
Old 02-27-2007, 06:35 AM   #10
muha
Member
 
Registered: Nov 2005
Distribution: xubuntu, grml
Posts: 451

Rep: Reputation: 38
Quote:
Originally Posted by colucix
Actually, I thought this option was "-i".
Correct. My bad! I've edited my post.
 
  



Tags
replace, search, sed


All times are GMT -5.
