LinuxQuestions.org
Old 06-23-2008, 12:24 PM   #1
gnashley
Amigo developer
 
Registered: Dec 2003
Location: Germany
Distribution: Slackware
Posts: 4,928

Rep: Reputation: 612
Reformat 'pretty' xml to one-line entries


I have some xml files (actually aiml) which are mostly formatted in standard xml style, with matching opening and closing tags. The content between the opening and closing tags stretches across multiple lines.
How can I reformat each tag set into one long line? I assume that 'awk' is probably the easiest way to do it, but I'm happy to use sed or perl if they are easier.

The matching tags are <category> and </category>. It would save an extra step later if all tabs and multiple spaces were normalized to single spaces.

For instance, something like this:
Code:
<category>
    <pattern>TEXT </pattern>
    <template>TEXT   </template>
</category>
should come out like this:
Code:
<category><pattern>TEXT </pattern><template>TEXT </template></category>
I have several more operations to do on each tag-set, but they can all be done more easily on a line-by-line basis, so I'd like to do the above first.
Anybody know any good one-liners?
 
Old 06-23-2008, 01:39 PM   #2
brianmcgee
Member
 
Registered: Jun 2007
Location: Munich, Germany
Distribution: RHEL, CentOS, Fedora, SLES (...)
Posts: 399

Rep: Reputation: 40
Code:
# xmllint --noblanks file.xml > unpretty.xml
Make it pretty again:

Code:
# xmllint --format unpretty.xml > pretty.xml
 
Old 06-23-2008, 04:34 PM   #3
gnashley
Amigo developer
 
Registered: Dec 2003
Location: Germany
Distribution: Slackware
Posts: 4,928

Original Poster
Rep: Reputation: 612
I hadn't heard of xmllint. I tried it, but it doesn't do exactly what I want. It strips out all white space. I need to preserve white space, but only as single spaces.
Also, my goal is to transform the whole document into non-xml code.
Let me restate what I want:

Concatenate all text between <category> and </category> into a single line. It doesn't matter if the <category> and </category> tags are stripped off as well. Because the spacing of the original documents is quite irregular, I can't seem to come up with a dependable combination of substitutions using sed. I'm pretty sure that awk is gonna be the best for this. I'm still going to do several more steps on each line, which can be handled with (mostly) simple substitutions.
Here's another example of how messed up the input file can be:
Code:
<category>
    <pattern>TEXT </pattern>
    <template>TEXT   </template>
  </category>               <category> 
<pattern>TEXT </pattern>
              <template>TEXT   </template>
</category>
Again, that should come out something like this:

Code:
<category> <pattern>TEXT </pattern> <template>TEXT </template> </category>
<category> <pattern>TEXT </pattern> <template>TEXT </template> </category>
Tabs and multiple spaces should be reduced to single spaces, but I can do that afterwards, too.
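One possible way to do this with awk, offered as a rough sketch rather than a tested solution for real aiml files: join every input line into one buffer, collapse each run of spaces and tabs to a single space, then print each <category> ... </category> span on its own line. The function name flatten_categories is made up for illustration. The sketch assumes the tags are written exactly as <category> and </category> (no attributes, no nesting), and it reads the whole input into memory. Anything outside a category pair is dropped.

```shell
# Hypothetical helper: print each <category>...</category> block on one line.
# Assumes plain <category> tags (no attributes) and input that fits in memory.
flatten_categories() {
    awk '
        # Join all input lines into one buffer, separated by a space.
        { buf = buf " " $0 }

        END {
            # Collapse runs of spaces, tabs and carriage returns to one space.
            gsub(/[ \t\r]+/, " ", buf)

            open  = "<category>"
            close_tag = "</category>"

            # Repeatedly cut out the next <category>...</category> span.
            while ((s = index(buf, open)) > 0) {
                buf = substr(buf, s)            # drop text before the opening tag
                e = index(buf, close_tag)
                if (e == 0)
                    break                       # unmatched opening tag: stop
                print substr(buf, 1, e + length(close_tag) - 1)
                buf = substr(buf, e + length(close_tag))
            }
        }
    ' "$@"
}
```

It can be called with a file name or fed from a pipe, for example `flatten_categories file.xml > oneline.txt` (both file names are placeholders). Whitespace between tags ends up as a single space, so the later per-line substitutions can work on predictable input.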
 
  

