10-09-2005, 11:59 PM
|
#1
|
LQ Newbie
Registered: Oct 2005
Location: St-Bruno, Quebec, Canada
Distribution: RH ES 4
Posts: 6
Rep:
|
Need help to strip XML & XSL tags from multiple files
Hello,
I want to write a Bash script that automatically merges multiple XSLT files into one, for faster upload to the client side.
Here is the logic I want to use:
1. Merge 2 or more .xsl files together using 'cat'.
2. Strip all occurrences of the following lines (they appear at the top and bottom of every .xsl file):
Code:
<?xml version='1.0'?>
<xsl:stylesheet version="1.0" xmlns:xsl="some url">
</xsl:stylesheet>
3. Add back the following lines at the top of the merged file:
Code:
<?xml version='1.0'?>
<xsl:stylesheet version="1.0" xmlns:xsl="some url">
4. Add back the following line at the end of the merged file:
Code:
</xsl:stylesheet>
I need help to write the find & replace commands (using sed, awk or whatever) needed to strip the unwanted lines.
(note: I had to remove the URL that appeared in the lines above so that this forum would accept my post)
Thank you
Daniel
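For reference, the four steps above could be sketched as a single shell function. This is only a sketch under a few assumptions: the name merge_xsl is made up, the xmlns value is the same "some url" placeholder used in the post, and each wrapper tag is assumed to sit on its own line, as in the listing.

```shell
#!/bin/sh
# merge_xsl (hypothetical name): merge the .xsl files given as arguments.
# Assumes every wrapper tag is alone on its line, as in the post above.
merge_xsl() {
    # Step 3: write the header once, at the top.
    printf '%s\n' "<?xml version='1.0'?>" \
        '<xsl:stylesheet version="1.0" xmlns:xsl="some url">'
    # Steps 1 and 2: sed reads all files in order and drops each wrapper line.
    sed -e '/<?xml version/d' \
        -e '/<xsl:stylesheet/d' \
        -e '/<\/xsl:stylesheet/d' "$@"
    # Step 4: write the closing tag once, at the end.
    printf '%s\n' '</xsl:stylesheet>'
}
```

Usage would be something like merge_xsl a.xsl b.xsl > merged.xsl (the file names are placeholders).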
Last edited by dfrechet; 10-10-2005 at 01:48 PM.
|
|
|
10-10-2005, 12:53 AM
|
#2
|
LQ Newbie
Registered: Aug 2005
Location: California
Distribution: whatever the customer pays for
Posts: 10
Rep:
|
edit files with sed
If you're going to start by creating one big file, then you can feed the result to sed to strip out the tags. See man sed: it accepts addresses that select which lines to process or skip, so you will have to work out which approach suits you better.
In a shell script you can do all sorts of things like count lines with wc, create temp files, etc. That's what makes programming entertaining.
Please also see http://www.catb.org/~esr/faqs/smart-questions.html
|
|
|
10-10-2005, 06:15 AM
|
#3
|
Senior Member
Registered: Mar 2004
Location: england
Distribution: Mint, Armbian, NetBSD, Puppy, Raspbian
Posts: 3,516
|
cat *.xsl | xml_cleanup
xml_cleanup:
Code:
#!/bin/sed -f
# add to beginning
1i\
<?xml version='1.0'?>\
<xsl:stylesheet version="1.0" xmlns:xsl="some url">
# stick at end
$a\
</xsl:stylesheet>
# remove
/<xsl:stylesheet.*>/d
/<\/xsl:stylesheet.*>/d
# a plain ? is literal in a basic regular expression
/<?xml version/d
|
|
|
10-10-2005, 01:39 PM
|
#4
|
LQ Newbie
Registered: Oct 2005
Location: St-Bruno, Quebec, Canada
Distribution: RH ES 4
Posts: 6
Original Poster
Rep:
|
Thank you for your help, bigearsbilly. Based on your example I was able to create my own version of xml_cleanup (included below):
Code:
# Remove all occurrences of the following lines from the merged file
s/<?xml version.*>//
s/<xsl:stylesheet.*>//
s/<\/xsl:stylesheet.*>//
# Add the following lines at the beginning of the merged file
1i\
<?xml version='1.0'?>\
<xsl:stylesheet version="1.0" xmlns:xsl="some url">
# Add the following line at the end of the merged file
$a\
</xsl:stylesheet>
# Remove leading blanks from each line (the square brackets contain a tab and a space)
s/^[ ]*//
# Reduce strings containing multiple blanks to single blanks (each pair of square brackets contain a tab and a space)
s/[ ][ ]*/ /g
# Remove DOS line breaks (^M)
s/\r//
# Delete blank lines
/^$/d
Daniel
Last edited by dfrechet; 10-10-2005 at 04:14 PM.
|
|
|
10-10-2005, 05:29 PM
|
#5
|
Senior Member
Registered: Oct 2004
Location: Houston, TX (usa)
Distribution: MEPIS, Debian, Knoppix,
Posts: 4,727
|
Some ways to make it shorter/prettier:
Code:
# Remove all occurrences of the following lines from the merged file
# except the 1st 2 or the last, as the case may be
1,2!s,<?xml version.*>,,
1,2!s,<xsl:stylesheet.*>,,
$!s,</xsl:stylesheet.*>,,
# Reduce strings containing multiple blanks to single blanks
s,[[:blank:]][[:blank:]]*, ,g
# Remove any leading blank from each line
s,^ ,,
# Remove DOS line breaks (^M)
s,\r,,
# Delete blank lines
/^$/d
Notes
A bang after an address range "negates" it.
'[[:blank:]]' is the same as "a tab and a space". (I sometimes find it more cumbersome to type, but it is easier to understand & just as long to read -- shorter if you give credit for the deleted, no longer needed explanation.)
Finally, I don't believe "cat" is necessary anywhere here. "sed" operates on all files given to it as arguments, i.e. you might say it "self cats".
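A small sketch of the three notes above, in case it helps. The function names are made up for illustration. It relies on sed numbering lines continuously across all the files it is given, so an address such as 1,2 refers to the first two lines of the combined input.

```shell
#!/bin/sh
# print_outside (hypothetical name): print every line EXCEPT those in the
# given address range. The "!" after the address negates it. Line numbers
# run on across all input files, so no cat is needed.
print_outside() {
    range=$1
    shift
    sed -n "$range"'!p' "$@"
}

# squeeze_blanks (hypothetical name): collapse each run of spaces or tabs
# to a single space. The pattern requires at least one blank; a bare
# '[[:blank:]]*' would also match the empty string between characters.
squeeze_blanks() {
    sed 's,[[:blank:]][[:blank:]]*, ,g' "$@"
}
```

For example, print_outside 1,2 a.xsl b.xsl would skip only the first two lines of a.xsl and print everything else from both files.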
Last edited by archtoad6; 10-10-2005 at 05:54 PM.
|
|
|
10-10-2005, 09:01 PM
|
#6
|
LQ Newbie
Registered: Oct 2005
Location: St-Bruno, Quebec, Canada
Distribution: RH ES 4
Posts: 6
Original Poster
Rep:
|
Your code is indeed shorter and more efficient. Thank you.
However, I still have a problem. I want to insert a comment after the 2nd line of the file, but when I uncomment the code below I get the following error message from sed: unknown command: `<'. I tested with different strings and found that whatever character appears at position 1 is automatically flagged as an "unknown command".
xslCleanup.sed
Code:
# Remove all occurrences of the following lines from the merged file
# except the 1st 2 or the last, as the case may be
1,2!s,<?xml version.*>,,
1,2!s,<xsl:stylesheet.*>,,
$!s,</xsl:stylesheet.*>,,
# Insert the following line after the 2nd line at the top of the file
##3i\
##<!-- THIS FILE IS GENERATED AUTOMATICALLY. DO NOT EDIT. -->
# Reduce strings containing multiple consecutive spaces (not tabs) to single spaces
# (the pattern below is two spaces followed by *)
s,  *, ,g
# Remove any leading blank from each line
s,^ ,,
# Remove DOS line breaks (^M)
s,\r,,
# Delete blank lines
/^$/d
Also, is it possible to insert the current date and time in a line using 'sed'? For example:
Code:
<!-- THIS FILE WAS GENERATED AUTOMATICALLY ON <date> AT <time>. DO NOT EDIT. -->
How can this be done?
Daniel
Last edited by dfrechet; 10-11-2005 at 06:42 AM.
|
|
|
10-11-2005, 03:03 AM
|
#7
|
Senior Member
Registered: Mar 2004
Location: england
Distribution: Mint, Armbian, NetBSD, Puppy, Raspbian
Posts: 3,516
|
hats off to archtoad!
Quote:
3i\
<!-- THIS FILE IS GENERATED AUTOMATICALLY. DO NOT EDIT. -->
|
This works OK for me. You haven't got a space or a DOS ^M after the \, have you?
Inserting the date, hmmm, I don't reckon so; not in plain old sed.
One can also delete blank lines like:
Code:
#!/usr/bin/sed -nf
/./p
I.e. -n = no default print, then print any line with at least 1 character.
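The same idea as a shell function, as a rough sketch (the name drop_empty is made up). Note that "." matches any character, so a line holding only a space, or only a DOS ^M, still counts as non-empty and is printed.

```shell
#!/bin/sh
# drop_empty (hypothetical name): print only lines that contain at least
# one character. -n turns off the default print; /./p prints matching lines.
drop_empty() {
    sed -n '/./p' "$@"
}
```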
|
|
|
10-11-2005, 07:13 AM
|
#8
|
LQ Newbie
Registered: Oct 2005
Location: St-Bruno, Quebec, Canada
Distribution: RH ES 4
Posts: 6
Original Poster
Rep:
|
That was it. I removed the ^M character and everything worked.
Thank you.
Daniel
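For anyone hitting the same error: a sed command file saved with DOS line endings has a ^M after the trailing \ of 3i\, which is what breaks the command. A minimal sketch for checking and cleaning such a file follows; the function names are made up, and tr and grep are used only in their standard forms.

```shell
#!/bin/sh
# has_cr (hypothetical name): succeed if any given file contains a
# carriage return (^M).
has_cr() {
    cr=$(printf '\r')
    grep -q "$cr" "$@"
}

# strip_cr (hypothetical name): write the file to stdout with every
# carriage return removed.
strip_cr() {
    tr -d '\r' < "$1"
}
```

Something like strip_cr xslCleanup.sed > clean.sed would then give a command file that sed can read.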
|
|
|
10-11-2005, 08:14 AM
|
#9
|
LQ Newbie
Registered: Oct 2005
Location: St-Bruno, Quebec, Canada
Distribution: RH ES 4
Posts: 6
Original Poster
Rep:
|
For the benefit of all, here is the final version of my 'sed' command file:
Code:
# Remove all occurrences of the following lines from the merged file
# except the 1st 2 lines
1,2!s,<?xml version.*>,,
1,2!s,<xsl:stylesheet.*>,,
s,</xsl:stylesheet.*>,,
# Insert the following line after the 2nd line at the top of the file
3i\
<!-- THIS FILE IS GENERATED AUTOMATICALLY. DO NOT EDIT. -->
# Add the following line at the end of the merged file
$a\
</xsl:stylesheet>
# Reduce strings containing multiple consecutive spaces (not tabs) to single spaces
# (the pattern below is two spaces followed by *)
s,  *, ,g
# Remove any leading blank from each line
s,^ ,,
# Remove DOS line breaks (^M)
s,\r,,
# Remove comments inserted automatically by Stylus Studio
/<!-- Stylus Studio/,/-->/d
# Delete blank lines
/^$/d
|
|
|
10-12-2005, 06:52 AM
|
#10
|
Senior Member
Registered: Oct 2004
Location: Houston, TX (usa)
Distribution: MEPIS, Debian, Knoppix,
Posts: 4,727
|
Thanks for the compliments.
Would:
Code:
4i <\!-- Created on `date` -->
solve your date stamping problem?
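One caveat: backticks are expanded by the shell, not by sed, so they only take effect when the sed command is typed on the command line inside double quotes, not when it sits in a file read with -f. A possible sketch that builds the comment in the shell and hands it to sed is below. The function names and the date format are assumptions, and the insertion point (before line 3) follows the earlier 3i\ example.

```shell
#!/bin/sh
# stamp_comment (hypothetical name): build the XML comment.
# The date and time formats are an assumption; adjust to taste.
stamp_comment() {
    printf '<!-- THIS FILE WAS GENERATED AUTOMATICALLY ON %s AT %s. DO NOT EDIT. -->\n' \
        "$(date '+%Y-%m-%d')" "$(date '+%H:%M:%S')"
}

# insert_stamp (hypothetical name): insert the comment before line 3 of
# the input. Inside double quotes the shell turns \\ into the single
# backslash that sed's "i" command expects, and expands $comment.
insert_stamp() {
    comment=$(stamp_comment)
    sed "3i\\
$comment" "$@"
}
```

Usage might look like insert_stamp merged.xsl > stamped.xsl (the file names are placeholders).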
|
|
|
All times are GMT -5. The time now is 09:03 PM.