I believe sed is the command I want to use, although there may be a more appropriate one.
I have about 1000 *.htm files on a Microsoft SharePoint server, all of them created in MS Word and saved as HTML from Word. The resulting files have problems opening in certain web browsers, and I tracked it down to the Smart Tag lines in those files. Once the SmartTag information is removed, the files open fine. Instead of opening each file individually in Notepad to remove those lines, I was hoping to use one or two Linux commands to scour the whole directory and remove them for me.
Every file contains these lines among other code, which I want removed or replaced with a null or space.
As you can see, three of the lines are identical, and the three shorter ones differ only slightly. Is there a series of sed commands, or some other utility, that would go through each of my .HTM files and remove all instances of the above?
If you want to be more specific (quite specific really):
Code:
sed -i '/<o:SmartTagType\snamespaceuri="urn:schemas-microsoft-com:office:smarttags"\sname="[a-zA-Z]*"\/>/d' file
The -i option makes sed edit the file in place, so the file is used for input AND FOR OUTPUT.
So be careful what you overwrite.
Use:
sed -i.bak 'etc
to save a .bak backup of each original file.
Or leave out -i at first to check the output on screen.
[a-zA-Z]* is the only part that varies. It matches any string of letters, of any length including zero.
Note the \/: the slash is escaped because / also delimits the pattern.
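To try the expression safely before touching real files, something like the following could be used. It is only a sketch: the sample file and its contents are made up for illustration, not taken from the actual SharePoint files, and it assumes GNU sed (\s and -i with a suffix are GNU features).

```shell
# Throwaway sandbox; sample.htm and its contents are hypothetical.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.htm" <<'EOF'
<html>
<o:SmartTagType namespaceuri="urn:schemas-microsoft-com:office:smarttags" name="place"/>
<o:SmartTagType namespaceuri="urn:schemas-microsoft-com:office:smarttags" name="City"/>
<p>Hello</p>
</html>
EOF

# The delete expression from the post above, kept in a variable for reuse.
pattern='/<o:SmartTagType\snamespaceuri="urn:schemas-microsoft-com:office:smarttags"\sname="[a-zA-Z]*"\/>/d'

# Preview only: without -i the result goes to the screen and the file is untouched.
sed "$pattern" "$tmpdir/sample.htm"

# Edit in place, keeping the original as sample.htm.bak.
sed -i.bak "$pattern" "$tmpdir/sample.htm"
```

If the preview still shows SmartTag lines, the real files probably differ from the sample (extra attributes, different spacing, tags split over several lines) and the expression needs adjusting before -i is used.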
Last edited by muha; 02-27-2007 at 06:36 AM.
Reason: -f is changed to -i for saving to a file
where /some/dir will be the actual path of the directory from which to start the search for *.htm files (see man find for details). Since this command modifies your files, here are some safety rules:
1) execute the find command without the -exec option, just to see if the result of the search is what you expected (that is, that it found only the files you want to modify)
2) execute a harmless command with the -exec option to check the syntax of the find command itself, e.g.
Code:
find /some/dir -name "*.htm" -exec ls -l {} \;
3) make a backup of the files you're going to modify!
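Put together, the three rules might look like the sketch below. The sandbox directory, the file names and the backup location are all hypothetical stand-ins for /some/dir, and the sed expression is the one from the earlier reply (GNU sed assumed).

```shell
# Hypothetical sandbox standing in for /some/dir.
dir=$(mktemp -d)
mkdir -p "$dir/sub"
printf '%s\n' '<o:SmartTagType namespaceuri="urn:schemas-microsoft-com:office:smarttags" name="place"/>' '<p>one</p>' > "$dir/a.htm"
printf '%s\n' '<p>two</p>' > "$dir/sub/b.htm"
printf '%s\n' 'not html' > "$dir/notes.txt"

# 1) Search only: check that exactly the intended files are listed.
find "$dir" -name "*.htm"

# 2) Harmless -exec: checks the find syntax without changing anything.
find "$dir" -name "*.htm" -exec ls -l {} \;

# 3) Back up the whole tree before modifying it.
backup=$(mktemp -d)
cp -a "$dir/." "$backup/"

# Only then run the real edit on every file found.
pattern='/<o:SmartTagType\snamespaceuri="urn:schemas-microsoft-com:office:smarttags"\sname="[a-zA-Z]*"\/>/d'
find "$dir" -name "*.htm" -exec sed -i "$pattern" {} \;
```

Note that -name "*.htm" is case-sensitive, so files ending in .HTM would not be found by this pattern as written.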
I'd like to use these commands at a Windows prompt, and they aren't working correctly with the GnuWin32 versions. Is there a Linux/Unix Bash interpreter that I can load on a Windows box that gives me Bash and most of the main GNU utilities to run from within a Windows/NTFS partition?
Yes. I use Cygwin at work. You could install Cygwin or Cygwin/X. Installing Cygwin/X would allow you, for example, to ssh -X into a Linux machine and run a program there with its window displayed on your local Windows machine. I use a 3-line script at work that uses ls, sed, enscript and ps2pdf to produce PDF catalogs for commercial backups.
I also extract spot names and clients from thousands of schedules into a sorted, tab-separated file, which lets me use a simple grep one-liner to locate spots or clients quickly at my encoding station. There is no way I could install any kind of database on my encoding station. The text utilities like grep, sort, uniq, sed and awk are very handy to use at work.