Reformat 'pretty' xml to one-line entries
I have some xml files (actually aiml) which are mostly formatted in a standard xml-style with opening and closing tags which match. The content between the opening and closing tags stretches across multiple lines.
How can I reformat each tag set into one long line? I assume that 'awk' is probably the easiest way to do it, but I don't mind using sed or perl if they are easier. The matching tags are <category> and </category>. It would save an extra step later if all tabs and runs of multiple spaces were normalized to single spaces. For instance, something like this: Code:
<category>
Code:
<category><pattern>TEXT </pattern><template>TEXT </template></category>
Anybody know any good one-liners?
Code:
# xmllint --noblanks file.xml > unpretty.xml
Code:
# xmllint --format unpretty.xml > pretty.xml
I hadn't heard of xmllint. I tried it, but it doesn't do exactly what I want. It strips out all white space. I need to preserve white space, but only as single spaces.
Also, my goal is to transform the whole document into non-xml code. Let me restate what I want: Concatenate all text between <category> and </category> into a single line. It doesn't matter if the <category> and </category> tags are stripped off as well. Because the spacing of the original documents is quite irregular, I can't seem to come up with a dependable combination of substitutions using sed. I'm pretty sure that awk is gonna be the best for this. I'm still going to do several more steps on each line, which can be handled with (mostly) simple substitutions. Here's another example of how messed up the input file can be: Code:
<category>
Code:
<category> <pattern>TEXT </pattern> <template>TEXT </template> </category>
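For what it's worth, here is a rough awk sketch of the "join everything between <category> and </category>" idea. It is not a tested answer for your real files. It assumes that category elements are never nested, that each <category> starts a line with nothing important before it, and that nothing important follows </category> on its line. The function name flatten_categories is made up for illustration.

```shell
# Hypothetical helper: print each <category>...</category> block on one line.
# Assumes non-nested <category> elements, one opening tag per line.
flatten_categories() {
    awk '
    # An opening tag starts a new, empty buffer.
    /<category>/ { inside = 1; buf = "" }

    # While inside a block, append the current line, separated by a space.
    inside { buf = buf " " $0 }

    # A closing tag ends the block: squeeze whitespace and print one line.
    /<\/category>/ && inside {
        gsub(/[ \t\r]+/, " ", buf)   # tabs, CRs and space runs -> one space
        sub(/^ /, "", buf)           # drop the leading separator
        sub(/ $/, "", buf)           # drop a trailing space, if any
        print buf
        inside = 0
    }
    ' "$@"
}
```

You would call it as something like `flatten_categories input.aiml > flat.txt` (file names are placeholders). Lines outside any category block are dropped, and the tags themselves are kept. If you want them gone, an extra `sub()` before the `print` should do it. It uses `[ \t\r]` rather than `[[:space:]]` because some older awk builds don't understand POSIX character classes.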