Need help to strip XML & XSL tags from multiple files
Hello,
I want to write a Bash script that automatically merges multiple XSLT files so that they upload to the client side faster. Here is the logic I want to use:
1. Merge 2 or more .xsl files together using 'cat'.
2. Strip all occurrences of the following lines (they appear at the top and bottom of every .xsl file): Code:
<?xml version='1.0'?> Code:
<?xml version='1.0'?> Code:
</xsl:stylesheet>
(Note: I had to remove the URL that appeared in the lines above so that this forum would accept my post.)
Thank you, Daniel |
edit files with sed
If you're going to start by creating one big file, then you can feed the result to sed to strip out the tags. See man sed: its commands take addresses (line numbers or patterns) that select which lines to process or skip. You will have to figure out which works better for you.
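To make the point about addresses concrete, here is a rough sketch of both styles. The sample lines are invented for illustration (the xmlns URL is left out, as in the original post), so adjust the patterns to your real files.

```shell
#!/bin/sh
# Sketch only: made-up sample input standing in for one .xsl file.
sample=$(cat <<'EOF'
<?xml version='1.0'?>
<xsl:stylesheet version="1.0">
<xsl:template name="a"/>
</xsl:stylesheet>
EOF
)

# Line-number addresses: delete lines 1-2 and the last line ($).
by_number=$(printf '%s\n' "$sample" | sed -e '1,2d' -e '$d')

# Pattern addresses: delete any matching line, wherever it appears.
by_pattern=$(printf '%s\n' "$sample" | sed -e '/^<?xml /d' -e '/xsl:stylesheet/d')

printf '%s\n' "$by_number" "$by_pattern"
```

Line numbers only work if every file has the same layout; patterns are usually the safer choice once several files have been merged into one stream.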
In a shell script you can do all sorts of things like count lines with wc, create temp files, etc. That's what makes programming entertaining. Please also see http://www.catb.org/~esr/faqs/smart-questions.html |
cat *.xml | xml_cleanup
xml_cleanup: Code:
#!/bin/sed -f |
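For anyone reading along, a sed command file of this kind might look roughly like the sketch below. The patterns and the file name are my own guesses based on the lines quoted in the first post, not the script that was actually posted. Running it with `sed -f` avoids depending on where sed lives; with a `#!/bin/sed -f` first line and `chmod +x` the file can also be executed directly, provided sed really is at that path on your system.

```shell
#!/bin/sh
# Sketch: keep the sed commands in a file and run them with "sed -f".
tmpdir=$(mktemp -d)

# Hypothetical command file; the patterns are assumptions, not the original script.
cat > "$tmpdir/xml_cleanup.sed" <<'EOF'
# Delete XML declarations and stylesheet open/close tags.
/^<?xml /d
/^<xsl:stylesheet/d
/^<\/xsl:stylesheet>/d
EOF

cleaned=$(printf '%s\n' "<?xml version='1.0'?>" '<xsl:stylesheet version="1.0">' \
  '<xsl:variable name="x"/>' '</xsl:stylesheet>' |
  sed -f "$tmpdir/xml_cleanup.sed")

printf '%s\n' "$cleaned"
rm -rf "$tmpdir"
```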
Thank you for your help, bigearsbilly. Based on your example I was able to create my own version of xml_cleanup (included below):
Code:
# Remove all occurrences of the following lines from the merged file |
Some ways to make it shorter/prettier:
Code:
# Remove all occurrences of the following lines from the merged file
A bang after an address or address range "negates" it. '[[:blank:]]' matches a single space or tab, the same as a bracket expression containing a literal tab and a space. (I sometimes find it more cumbersome to type, but it is easier to understand and just as long to read; shorter if you give credit for the deleted, no-longer-needed explanation.)
Finally, I don't believe "cat" is necessary anywhere here. "sed" operates on all files given to it as arguments, i.e. you might say it "self cats". |
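A small sketch of those three points together, using two throwaway files whose names and contents are made up for the example. Note that when sed is given several files, line numbers keep counting across them as if they had been concatenated.

```shell
#!/bin/sh
# Sketch: negated address range, [[:blank:]], and sed reading several files itself.
tmpdir=$(mktemp -d)

# Two hypothetical input files (two lines each).
printf '%s\n' "<?xml version='1.0'?>" '  <a/>' > "$tmpdir/one.xsl"
printf '\t<b/>\n</xsl:stylesheet>\n' > "$tmpdir/two.xsl"

# 2,3!d              : delete every line that is NOT in the range 2-3
# s/^[[:blank:]]*//  : strip leading spaces and tabs
# No cat needed: sed treats the two files as one continuous stream.
result=$(sed -e '2,3!d' -e 's/^[[:blank:]]*//' "$tmpdir/one.xsl" "$tmpdir/two.xsl")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```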
Your code is indeed shorter and more efficient. Thank you.
However, I still have a problem. I want to insert a comment after the 2nd line of the file, but when I uncomment the code below I get the following error message from sed: unknown command: `<'. I tested with different strings and found that whatever character appears at position 1 is automatically flagged as an "unknown command".
xslCleanup.sed Code:
# Remove all occurrences of the following lines from the merged file Code:
<!-- THIS FILE WAS GENERATED AUTOMATICALLY ON <date> AT <time>. DO NOT EDIT. -->
Daniel |
hats off to archtoad!
Quote:
after the \ have you?
Inserting the date, hmmm, I don't reckon so; not in plain old sed.
One can also delete spaces like: Code:
#!/usr/bin/sed -nf |
That was it. I removed the ^M character and everything worked.
Thank you. Daniel |
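In case someone else hits the same error: the sketch below assumes the stray ^M came from DOS (CR LF) line endings in the sed command file, which is the usual cause but only a guess here. A carriage return sitting between the backslash and the newline means the backslash is no longer the last character on the line, so the text that follows is not read as part of the append/insert command. The file names are placeholders.

```shell
#!/bin/sh
# Sketch: strip carriage returns from a sed command file, then use it.
tmpdir=$(mktemp -d)

# Simulate a command file saved with DOS line endings (CR LF).
printf '2a\\\r\n<!-- GENERATED FILE. DO NOT EDIT. -->\r\n' > "$tmpdir/dos.sed"

# Remove every carriage return so the backslash ends its line again.
tr -d '\r' < "$tmpdir/dos.sed" > "$tmpdir/unix.sed"

# 2a\ appends the following text after line 2 of the input.
fixed=$(printf '%s\n' 'line 1' 'line 2' 'line 3' | sed -f "$tmpdir/unix.sed")

printf '%s\n' "$fixed"
rm -rf "$tmpdir"
```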
For the benefit of all, here is the final version of my 'sed' command file:
Code:
# Remove all occurrences of the following lines from the merged file |
Thanks for the compliments.
Would: Code:
4i <\!-- Created on `date` --> |
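One possible way to get the date in, sketched under the assumption that the sed command is given on the command line rather than in a command file: backticks inside a file read with `sed -f` are just text, because it is the shell, not sed, that runs `date`. Here the shell builds the comment first and then hands sed a finished append command. The comment wording, the date format, and the variable names are only illustrative.

```shell
#!/bin/sh
# Sketch: let the shell expand the date, then pass the finished text to sed.
stamp=$(date '+%Y-%m-%d at %H:%M:%S')
comment=$(printf '<!-- Generated automatically on %s. DO NOT EDIT. -->' "$stamp")

# Inside double quotes, \\ becomes a single backslash followed by a real newline,
# so sed sees:  2a\  then the comment text on the next line.
stamped=$(printf '%s\n' 'line 1' 'line 2' 'line 3' |
  sed -e "2a\\
$comment")

printf '%s\n' "$stamped"
```

This only holds up as long as the inserted text contains no backslashes or newlines of its own; a date stamp in the format above is safe in that respect.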