LinuxQuestions.org
Old 10-09-2005, 11:59 PM   #1
dfrechet
LQ Newbie
 
Registered: Oct 2005
Location: St-Bruno, Quebec, Canada
Distribution: RH ES 4
Posts: 6

Rep: Reputation: 0
Need help to strip XML & XSL tags from multiple files


Hello,

I want to write a BASH file to automatically merge multiple XSLT files together for faster upload to the client side.

Here is the logic I want to use:
1. Merge 2 or more .xsl files together using 'cat'.

2. Strip all occurrences of the following lines (they appear at the top and bottom of every .xsl file):
Code:
<?xml version='1.0'?>
<xsl:stylesheet version="1.0" xmlns:xsl="some url">
</xsl:stylesheet>
3. Add back the following lines at the top of the merged file:
Code:
<?xml version='1.0'?>
<xsl:stylesheet version="1.0" xmlns:xsl="some url">
4. Add back the following line at the end of the merged file:
Code:
</xsl:stylesheet>
I need help to write the find & replace commands (using sed, awk or whatever) needed to strip the unwanted lines.

(note: I had to remove the URL that appeared in the lines above so that this forum would accept my post)

Thank you
Daniel

Last edited by dfrechet; 10-10-2005 at 01:48 PM.
 
Old 10-10-2005, 12:53 AM   #2
sgla1
LQ Newbie
 
Registered: Aug 2005
Location: California
Distribution: whatever the customer pays for
Posts: 10

Rep: Reputation: 0
edit files with sed

If you're going to start by creating one big file, then you can feed the result to sed to strip out the tags. See man sed: its commands take addresses (line numbers, ranges, or patterns) that select which lines to process or skip, so you will have to figure out which works better for you.

In a shell script you can do all sorts of things like count lines with wc, create temp files, etc. That's what makes programming entertaining.
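
As a minimal sketch of the line-address idea, assuming every input file carries its two header lines first and its closing tag on the very last line (the file names and sample content below are made up for the demo):

```shell
#!/bin/sh
# Create two throwaway sample files (hypothetical names and content).
tmp=$(mktemp -d)
for tag in a b; do
  printf '%s\n' \
    "<?xml version='1.0'?>" \
    '<xsl:stylesheet version="1.0" xmlns:xsl="some url">' \
    "<$tag/>" \
    '</xsl:stylesheet>' > "$tmp/$tag.xsl"
done

# '1,2d' deletes lines 1-2 and '$d' deletes the last line.
# sed runs once per file here, so the addresses are relative to each file.
body=$(for f in "$tmp/a.xsl" "$tmp/b.xsl"; do
  sed -e '1,2d' -e '$d' "$f"
done)

printf '%s\n' "$body"
rm -rf "$tmp"
```

This only works when the header and footer really sit on those exact lines; pattern addresses are the safer choice if that is not guaranteed.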

Please also see http://www.catb.org/~esr/faqs/smart-questions.html
 
Old 10-10-2005, 06:15 AM   #3
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: FreeBSD, Debian, Mint, Puppy
Posts: 3,282

Rep: Reputation: 172Reputation: 172
cat *.xsl | xml_cleanup

xml_cleanup:
Code:
#!/bin/sed -f


# add to beginning

1i\
<?xml version='1.0'?>\
<xsl:stylesheet version="1.0" xmlns:xsl="some url">

# stick at end

$a\
</xsl:stylesheet>

# remove

# (in a basic regular expression a plain ? is literal, so it needs no backslash)
/<?xml version/d
/<xsl:stylesheet.*>/d
/<\/xsl:stylesheet.*>/d
 
Old 10-10-2005, 01:39 PM   #4
dfrechet
LQ Newbie
 
Registered: Oct 2005
Location: St-Bruno, Quebec, Canada
Distribution: RH ES 4
Posts: 6

Original Poster
Rep: Reputation: 0
Thank you for your help bigearsbilly. Based on your example I was able to create my own version of xml_cleanup (included below):

Code:
# Remove all occurrences of the following lines from the merged file
s/<?xml version.*>//
s/<xsl:stylesheet.*>//
s/<\/xsl:stylesheet.*>//

# Add the following lines at the beginning of the merged file
1i\
<?xml version='1.0'?>\
<xsl:stylesheet version="1.0" xmlns:xsl="some url">

# Add the following line at the end of the merged file
$a\
</xsl:stylesheet>

# Remove leading blanks from each line (the square brackets contain a tab and a space)
s/^[ 	]*//

# Reduce strings containing multiple blanks to single blanks (each pair of square brackets contain a tab and a space)
s/[ 	][ 	]*/ /g

# Remove DOS line breaks (^M)
s/\r//

# Delete blank lines
/^$/d
Daniel

Last edited by dfrechet; 10-10-2005 at 04:14 PM.
 
Old 10-10-2005, 05:29 PM   #5
archtoad6
Senior Member
 
Registered: Oct 2004
Location: Houston, TX (usa)
Distribution: MEPIS, Debian, Knoppix,
Posts: 4,727
Blog Entries: 15

Rep: Reputation: 230Reputation: 230Reputation: 230
Some ways to make it shorter/prettier:
Code:
# Remove all occurrences of the following lines from the merged file
# except the 1st 2 or the last, as the case may be
1,2!s,<?xml version.*>,,
1,2!s,<xsl:stylesheet.*>,,
  $!s,</xsl:stylesheet.*>,,

# Reduce strings containing multiple blanks to single blanks 
s,[[:blank:]][[:blank:]]*, ,g

# Remove any leading blank from each line
s,^ ,,

# Remove DOS line breaks (^M)
s,\r,,

# Delete blank lines
/^$/d
Notes
A bang after an address range "negates" it.

'[[:blank:]]' is the same as "a tab and a space". (I sometimes find it more cumbersome to type, but it is easier to understand & just as long to read -- shorter if you give credit for the deleted, no longer needed explanation.)

Finally, I don't believe "cat" is necessary anywhere here. "sed" operates on all files given to it as arguments -- i.e. you might say it "self cats".
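
A small sketch of both notes, using two made-up text files: when sed is given several files without cat, line numbers keep counting across them, and '$' means the last line of the last file.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf 'a1\na2\n' > "$tmp/a.txt"
printf 'b1\nb2\n' > "$tmp/b.txt"

# '1,2!' applies the command to every line OUTSIDE the range 1-2.
# b1 and b2 count as lines 3 and 4, so only they get the "X " prefix.
marked=$(sed '1,2!s,^,X ,' "$tmp/a.txt" "$tmp/b.txt")
printf '%s\n' "$marked"

# '$' addresses only the final line of the whole input, not of each file.
last=$(sed -n '$p' "$tmp/a.txt" "$tmp/b.txt")
printf '%s\n' "$last"

rm -rf "$tmp"
```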

Last edited by archtoad6; 10-10-2005 at 05:54 PM.
 
Old 10-10-2005, 09:01 PM   #6
dfrechet
LQ Newbie
 
Registered: Oct 2005
Location: St-Bruno, Quebec, Canada
Distribution: RH ES 4
Posts: 6

Original Poster
Rep: Reputation: 0
Your code is indeed shorter and more efficient. Thank you.

However, I still have a problem. I want to insert a comment after the 2nd line of the file, but when I uncomment the code below I get the following error message from sed: unknown command: `<'. I tested with different strings and found that whatever character appears at position 1 is automatically flagged as an "unknown command".

xslCleanup.sed
Code:
# Remove all occurrences of the following lines from the merged file
# except the 1st 2 or the last, as the case may be
1,2!s,<?xml version.*>,,
1,2!s,<xsl:stylesheet.*>,,
  $!s,</xsl:stylesheet.*>,,

# Insert the following line after the 2nd line at the top of the file
##3i\
##<!-- THIS FILE IS GENERATED AUTOMATICALLY. DO NOT EDIT. -->

# Reduce strings containing multiple consecutive spaces (not tabs) to single spaces
s,  *, ,g

# Remove any leading blank from each line
s,^ ,,

# Remove DOS line breaks (^M)
s,\r,,

# Delete blank lines
/^$/d
Also, is it possible to insert the current date and time in a line using 'sed'? For example:
Code:
<!-- THIS FILE WAS GENERATED AUTOMATICALLY ON <date> AT <time>. DO NOT EDIT. -->
How can this be done?

Daniel

Last edited by dfrechet; 10-11-2005 at 06:42 AM.
 
Old 10-11-2005, 03:03 AM   #7
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: FreeBSD, Debian, Mint, Puppy
Posts: 3,282

Rep: Reputation: 172Reputation: 172
hats off to archtoad!

Quote:
3i\
<!-- THIS FILE IS GENERATED AUTOMATICALLY. DO NOT EDIT. -->
this works OK for me, you haven't got a space or DOS ^M
after the \ have you?

Inserting date, hmmm, don't reckon so; not in plain old sed.

one can also delete blank lines like:

Code:
#!/usr/bin/sed -nf

/./p
I.e. -n = no default print, then print any line with at least 1 character.
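
A quick sketch on made-up input comparing that form with the /^$/d form used earlier in the thread. Both drop truly empty lines; a line holding only spaces or a ^M still counts as having a character and is kept by either.

```shell
#!/bin/sh
sample=$(printf 'first\n\nsecond\n\n\nthird')

# -n turns off automatic printing; /./p prints lines with at least one character.
kept_p=$(printf '%s\n' "$sample" | sed -n '/./p')

# Default printing stays on; /^$/d deletes lines that are completely empty.
kept_d=$(printf '%s\n' "$sample" | sed '/^$/d')

printf '%s\n' "$kept_p"
```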
 
Old 10-11-2005, 07:13 AM   #8
dfrechet
LQ Newbie
 
Registered: Oct 2005
Location: St-Bruno, Quebec, Canada
Distribution: RH ES 4
Posts: 6

Original Poster
Rep: Reputation: 0
That was it. I removed the ^M character and everything worked.
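
For anyone hitting the same error, a rough sketch of one way to check a sed command file for carriage returns and strip them. The file names are made up, and the sample file is written with DOS line endings on purpose.

```shell
#!/bin/sh
tmp=$(mktemp -d)

# A hypothetical command file saved with DOS line endings: the "\" after 3i
# is followed by a CR, so sed no longer sees it as the end of the line.
printf '3i\\\r\n<!-- GENERATED -->\r\n' > "$tmp/dos.sed"

cr=$(printf '\r')

# Count the lines that contain a carriage return.
before=$(grep -c "$cr" "$tmp/dos.sed" || true)

# tr -d '\r' removes every carriage return from the file.
tr -d '\r' < "$tmp/dos.sed" > "$tmp/unix.sed"
after=$(grep -c "$cr" "$tmp/unix.sed" || true)
echo "lines with CR: before=$before after=$after"

# The cleaned command file now inserts the comment before line 3.
result=$(printf 'l1\nl2\nl3\n' | sed -f "$tmp/unix.sed")
printf '%s\n' "$result"

rm -rf "$tmp"
```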

Thank you.

Daniel
 
Old 10-11-2005, 08:14 AM   #9
dfrechet
LQ Newbie
 
Registered: Oct 2005
Location: St-Bruno, Quebec, Canada
Distribution: RH ES 4
Posts: 6

Original Poster
Rep: Reputation: 0
For the benefit of all, here is the final version of my 'sed' command file:

Code:
# Remove all occurrences of the following lines from the merged file
# except the 1st 2 lines
1,2!s,<?xml version.*>,,
1,2!s,<xsl:stylesheet.*>,,
s,</xsl:stylesheet.*>,,

# Insert the following line after the 2nd line at the top of the file
3i\
<!-- THIS FILE IS GENERATED AUTOMATICALLY. DO NOT EDIT. -->

# Add the following line at the end of the merged file
$a\
</xsl:stylesheet>

# Reduce strings containing multiple consecutive spaces (not tabs) to single spaces
s,  *, ,g

# Remove any leading blank from each line
s,^ ,,

# Remove DOS line breaks (^M)
s,\r,,

# Remove comments inserted automatically by Stylus Studio
/<!-- Stylus Studio/,/-->/d

# Delete blank lines
/^$/d
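
A sketch of how a command file in this spirit might be run end to end, without cat. The two sample .xsl files, their content, and the name cleanup.sed are made up for the demo, and the command file is a shortened variant rather than the full one above. It was written with GNU sed in mind and relies on the appended closing tag still being written when the last input line is deleted.

```shell
#!/bin/sh
tmp=$(mktemp -d)

# Two tiny stylesheets, each with its own header and closing tag.
for name in a b; do
  printf '%s\n' \
    "<?xml version='1.0'?>" \
    '<xsl:stylesheet version="1.0" xmlns:xsl="some url">' \
    "<xsl:template name=\"$name\"/>" \
    '</xsl:stylesheet>' > "$tmp/$name.xsl"
done

# Shortened command file; the quoted 'EOF' keeps the shell from touching \ and $.
cat > "$tmp/cleanup.sed" <<'EOF'
# Blank out repeated headers (keep lines 1-2) and every closing tag
1,2!s,<?xml version.*>,,
1,2!s,<xsl:stylesheet.*>,,
s,</xsl:stylesheet.*>,,
# Insert a comment before line 3 and append one closing tag at the end
3i\
<!-- THIS FILE IS GENERATED AUTOMATICALLY. DO NOT EDIT. -->
$a\
</xsl:stylesheet>
# Delete the lines that were blanked out
/^$/d
EOF

# sed reads both files as one stream, so no cat is needed.
merged=$(sed -f "$tmp/cleanup.sed" "$tmp/a.xsl" "$tmp/b.xsl")
printf '%s\n' "$merged"

rm -rf "$tmp"
```

The merged result should contain a single XML declaration and opening tag at the top, the generated comment, both templates, and one closing tag at the end.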
 
Old 10-12-2005, 06:52 AM   #10
archtoad6
Senior Member
 
Registered: Oct 2004
Location: Houston, TX (usa)
Distribution: MEPIS, Debian, Knoppix,
Posts: 4,727
Blog Entries: 15

Rep: Reputation: 230Reputation: 230Reputation: 230
Thanks for the compliments.

Would:
Code:
4i <\!-- Created on  `date`  -->
solve your date stamping problem?
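
One caveat: backticks are expanded by the shell, not by sed, so a line like that only picks up the date when the sed commands are passed on the command line inside double quotes, not when they sit in a command file read with -f. A minimal sketch of the shell-side approach, with a made-up date format and sample input:

```shell
#!/bin/sh
# Build the comment in the shell, where command substitution is available.
stamp=$(date '+%Y-%m-%d AT %H:%M:%S')
comment="<!-- THIS FILE WAS GENERATED AUTOMATICALLY ON $stamp. DO NOT EDIT. -->"

# The double quotes let the shell expand $comment before sed sees the script.
# "3i\\" followed by a newline is the multi-line form of the insert command.
stamped=$(printf 'line1\nline2\nline3\n' | sed -e "3i\\
$comment")

printf '%s\n' "$stamped"
```

This assumes the comment text contains no backslashes or newlines, which holds for the date format used here. Further -f and -e options can be combined on the same sed command line, so the rest of the commands can stay in the file.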
 
  


All times are GMT -5. The time now is 06:36 AM.

Main Menu
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration