LinuxQuestions.org - URGENT: Help for Find and Replace with AWK

Dear colleagues,

I have a big problem replacing strings with AWK. I'm integrating an XML converter into a legacy application and I need to parse the XML files it receives.

Because the received XML files are not well formatted, I need to remove blank lines (no problem there), put each XML tag on its own line, and finally parse the XML blocks so that a C module can convert them to specific flat files.

cat $FILE | awk '( NF > 0 ) { print $0 }' > $FILE_TMP1
if [ -f "$FILE_TMP1" ] ; then
    cat $FILE_TMP2 | awk -v RS='><' -v OFS='><' '{print $0}' > $FILE_TMP2
else
    echo "-69258"
    exit -1
fi

for i in $(ls -1rt *.$FILE_TYPE)
do
    while read -r LINE
    do
        PARAM=`echo $LINE | cut -f 2 -d "<" | cut -f 1 -d ">"`
        if [[ "$PARAM" = "$INICI_BLOC_ORD" ]] ; then
            SUFIX=`date +%Y%m%d%H%M%S`
            FILE="ORD_"$SUFIX
            FILE_FI=$FILE".PROC"
            touch ./$FILE
        elif [ "$PARAM" = "$FI_BLOC_ORD" ] ; then
            echo "$FI_FILE_ORD" >> ./$FILE
            mv ./$FILE $DIR_PROC"/"$FILE_FI
        else
            if [[ -f "$FILE" ]] ; then
                echo "$LINE" >> ./$FILE
            fi
        fi
    done < $i
done
....

But the command:
$ cat $FILE_TMP2 | awk -v RS='><' -v OFS='><' '{print $0}' > $FILE_TMP2
doesn't work well, because this is what appears in $FILE_TMP2:
...
<NamespaceR:SrvcId
STR</NamespaceR:SrvcId
<NamespaceR:PlanningCode
T</NamespaceR:PlanningCode
<NamespaceR:ProtocolType
COM</NamespaceR:ProtocolType
<NamespaceR:FileRef
NRXS505987257RR6</NamespaceR:FileRef
</NamespaceR:RoutingTable
...
when I was expecting this:
...
<NamespaceR:SrvcId>STR</NamespaceR:SrvcId>
<NamespaceR:PlanningCode>T</NamespaceR:PlanningCode>
<NamespaceR:ProtocolType>COM</NamespaceR:ProtocolType>
<NamespaceR:FileRef>NRXS505987257RR6</NamespaceR:FileRef>
</NamespaceR:RoutingTable>
....

The ">" characters are missing and the resulting layout is not what I want, because I need to replace
"><"
with
">
<"

Any suggestions?

Best Regards!

Xavier

Have you looked at SED? It is well-suited to search and replace.

My favorite SED and AWK tutorials---and more: http://www.grymoire.com/Unix/
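To make the sed suggestion concrete, here is a minimal sketch of the substitution the original poster describes (turning every "><" into ">", a newline, then "<"). The function name split_tags and the sample input are only illustrations, not something from the thread. The backslash followed by a real line break inside the replacement is the portable way to insert a newline, so it should not depend on GNU extensions.

```shell
#!/bin/sh
# split_tags: hypothetical helper name, used only for this illustration.
# Reads XML-like text on stdin and breaks the line between adjacent tags.
split_tags() {
    # The replacement is ">", an escaped literal newline, then "<".
    sed 's/></>\
</g'
}

# Sample input is made up, using tag names in the style of the thread.
printf '%s\n' '<NamespaceR:SrvcId>STR</NamespaceR:SrvcId><NamespaceR:PlanningCode>T</NamespaceR:PlanningCode>' | split_tags
```

Tags that already have text between them (such as ">STR<") are left alone, because the pattern only matches a ">" immediately followed by "<".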

Also, it's better to put your code in [code] tags---preserves formatting and is easier to read.

That could get hairy. You might want to look at Perl, which has several XML parsing modules.

Tabs, Blanks, Tags and Trips!

I'm sorry!

Here is the corrected command line from my post:
But the command:
$ cat $FILE_TMP2 | awk -v RS='><' -v OFS='>\n<' '{print $0}' > $FILE_TMP2
doesn't work well, because in $FILE_TMP2 this ...
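A note on why that awk command may behave this way, with a sketch of an alternative. In awk, OFS is only used between fields (for example in print $1, $2), while ORS is what gets written after each record, so setting OFS does not put anything back where RS was cut. In addition, POSIX leaves an RS of more than one character unspecified, and some traditional awk implementations use only its first character; the output shown, split at every ">", looks consistent with that. Writing to the same file that cat is still reading is also unreliable. The sketch below avoids RS altogether and uses gsub on each line; split_tags_awk is a hypothetical name and the sample input is made up.

```shell
#!/bin/sh
# split_tags_awk: hypothetical helper name, not from the thread.
# Keeps the default record separator and inserts a newline between
# adjacent tags with gsub, which POSIX awk provides.
split_tags_awk() {
    awk '{ gsub(/></, ">\n<"); print }'
}

# Made-up sample line; write the result to a different file than the input.
printf '%s\n' '<NamespaceR:FileRef>NRXS505987257RR6</NamespaceR:FileRef></NamespaceR:RoutingTable>' | split_tags_awk
```

In a real script this would be called as something like split_tags_awk < "$FILE_TMP1" > "$FILE_TMP2", so that input and output are distinct files.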

Great, colleagues! Thanks a lot!

I'm now using the following command to trim blank lines and tabs and to put each XML tag on its own line (ksh, AIX 5.3):

cat $FILE_TMP2 | awk '( NF > 0 ) { print $0 }' | sed -e 's/ //g' | sed -e 's/></>***</g' | tr -s '*' '\012' > $TMP_FILE

This is the correct order; if I change it, it doesn't work: the last line is lost!

The sequence is as follows:

For trimming blank lines...
cat $FILE | awk '( NF > 0 ) { print $0 }'

For trimming tabs...
| sed -e 's/ //g'

and finally, to avoid more than one XML tag on one line...
| sed -e 's/></>***</g' | tr -s '*' '\012' > $TMP_FILE

Cheers!

One more time, thanks a lot!

Xavier
:)
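For anyone landing on this thread later: the three steps above (drop blank lines, strip tabs, split adjacent tags) can probably be done in a single awk call, which also avoids the "***" placeholder. With tr -s '*' '\012', any asterisk that happens to be in the data would be turned into a newline too. This is only a sketch under assumptions: clean_xml is a hypothetical name, the sample file is made up, and it assumes the character being removed is a tab, as the description says.

```shell
#!/bin/sh
# clean_xml: hypothetical helper; the name and sample data are assumptions.
# $1 is the input file; the cleaned text goes to stdout.
clean_xml() {
    awk 'NF > 0 {
        gsub(/\t/, "")        # drop tab characters
        gsub(/></, ">\n<")    # start a new line between adjacent tags
        print
    }' "$1"
}

# Small made-up input: a tab between tags, a blank line, one normal line.
sample=$(mktemp)
printf '<a>\t<b>x</b></a>\n\n<c>y</c>\n' > "$sample"
clean_xml "$sample"
rm -f "$sample"
```

In the original script this would be used as clean_xml "$FILE" > "$TMP_FILE", with the output going to a different file than the input.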