LinuxQuestions.org

onacorpuscle 12-05-2007 02:04 PM

URGENT: Help for Find and Replace with AWK
 
Dear colleagues,

I have a big problem replacing strings with AWK. I'm integrating an XML converter into a legacy application, and I need to parse the incoming XML files.

Because the XML files we receive are not well formatted, I need to remove blank space (no problem there), put each XML tag on its own line, and finally parse the XML blocks to convert them into specific flat files with a C module.

# drop empty lines
awk '( NF > 0 ) { print $0 }' "$FILE" > "$FILE_TMP1"
if [ -f "$FILE_TMP1" ] ; then
    # try to put each tag on its own line
    awk -v RS='><' -v OFS='><' '{ print $0 }' "$FILE_TMP1" > "$FILE_TMP2"
else
    echo "-69258"
    exit -1
fi

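for i in $(ls -1rt *."$FILE_TYPE")
do
    while read -r LINE
    do
        # tag name: the text between the first "<" and the next ">"
        PARAM=$(echo "$LINE" | cut -f 2 -d "<" | cut -f 1 -d ">")
        if [[ "$PARAM" = "$INICI_BLOC_ORD" ]] ; then
            # start of a block: create a new work file
            SUFIX=$(date +%Y%m%d%H%M%S)
            FILE="ORD_$SUFIX"
            FILE_FI="$FILE.PROC"
            touch "./$FILE"
        elif [[ "$PARAM" = "$FI_BLOC_ORD" ]] ; then
            # end of a block: append $FI_FILE_ORD and move the file to $DIR_PROC
            echo "$FI_FILE_ORD" >> "./$FILE"
            mv "./$FILE" "$DIR_PROC/$FILE_FI"
        elif [[ -f "$FILE" ]] ; then
            # inside a block: copy the line into the work file
            echo "$LINE" >> "./$FILE"
        fi
    done < "$i"
done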
....
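For every input line, the PARAM assignment in the loop above starts an `echo` and two `cut` processes, which adds up on large files. A minimal sketch, assuming a POSIX-style shell such as ksh, takes the same tag name (the text between the first "<" and the next ">", for the one-tag-per-line input the loop expects) with parameter expansion instead. The function name `tag_name` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the text between the first "<" and the
# following ">" of a line, i.e. the value the loop stores in PARAM.
tag_name() {
    line=$1
    name=${line#*<}     # drop everything up to and including the first "<"
    name=${name%%>*}    # drop the first ">" and everything after it
    printf '%s\n' "$name"
}

# Example: prints NamespaceR:SrvcId
tag_name '<NamespaceR:SrvcId>STR</NamespaceR:SrvcId>'
```

Inside the loop the expansion can be used directly (`PARAM=${LINE#*<}; PARAM=${PARAM%%>*}`), so no extra process or subshell is started per line. Like `cut`, it returns the whole line unchanged when the line contains no "<".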

But the command:
$ cat $FILE_TMP2 | awk -v RS='><' -v OFS='><' '{print $0}' > $FILE_TMP2
doesn't work well; this is what ends up in $FILE_TMP2:
...
<NamespaceR:SrvcId
STR</NamespaceR:SrvcId
<NamespaceR:PlanningCode
T</NamespaceR:PlanningCode
<NamespaceR:ProtocolType
COM</NamespaceR:ProtocolType
<NamespaceR:FileRef
NRXS505987257RR6</NamespaceR:FileRef
</NamespaceR:RoutingTable
...
when what I was expecting is:
...
<NamespaceR:SrvcId>STR</NamespaceR:SrvcId>
<NamespaceR:PlanningCode>T</NamespaceR:PlanningCode>
<NamespaceR:ProtocolType>COM</NamespaceR:ProtocolType>
<NamespaceR:FileRef>NRXS505987257RR6</NamespaceR:FileRef>
</NamespaceR:RoutingTable>
....

The ">" characters are missing and the result is not what I need. I need to replace
"><"
with
">
<"

Any suggestions?

Best Regards!

Xavier

pixellany 12-05-2007 02:17 PM

Have you looked at SED? It is well-suited to search and replace.
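Following the SED suggestion, a minimal sketch that puts a newline between adjacent tags. GNU sed accepts `\n` in the replacement, but not every sed does (whether AIX's does is an assumption not checked here); the portable POSIX form is a backslash followed by a real newline inside the quoted script. The function name `split_tags` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read XML on stdin and write it to stdout with
# every "><" turned into ">" + newline + "<".
split_tags() {
    # The backslash-newline inside the quotes is how POSIX sed writes a
    # literal newline in the replacement; "</g'" must start at column 0,
    # otherwise the indentation would be inserted too.
    sed 's/></>\
</g'
}

# Example: prints <a>x</a> and <b>y</b> on two separate lines
printf '%s\n' '<a>x</a><b>y</b>' | split_tags
```

Typical use would be `split_tags < "$FILE_TMP1" > "$FILE_TMP2"`, with the output going to a different file from the input.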

My favorite SED and AWK tutorials---and more: http://www.grymoire.com/Unix/

Also, it's better to put your code in [code] tags---preserves formatting and is easier to read.

chrism01 12-05-2007 06:32 PM

That could get hairy. You might want to look at Perl, which has several XML parsing modules.

onacorpuscle 12-07-2007 12:22 PM

Tabs, Blanks, Tags and Trips!
 
I'm sorry!

Here is the corrected command line from my post:
But the command:
$ cat $FILE_TMP2 | awk -v RS='><' -v OFS='>\n<' '{print $0}' > $FILE_TMP2
doesn't work well, because in $FILE_TMP2 this ...
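A likely explanation for the missing ">", offered as a hedged reading of the output in the first post: POSIX awk uses only the first character of RS as the record separator and leaves multi-character RS unspecified. gawk treats it as a regular expression, but some other awks (possibly the AIX one) use just the first character, so RS='><' acts like RS='>' and every ">" is consumed as a separator, which is exactly the pattern shown above. Also, OFS is only written between fields when awk rebuilds $0 (or in `print a, b`); `print $0` ends each record with ORS, so OFS='>\n<' has no effect here. Finally, `cat $FILE_TMP2 | ... > $FILE_TMP2` lets the shell truncate $FILE_TMP2 while, or before, cat reads it. A sketch that sidesteps all three keeps the default RS and splits each line with gsub(); both function names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: split adjacent tags with gsub() so RS keeps its
# default (newline) and the behaviour is the same in any POSIX awk.
split_tags_awk() {
    # ">\n<" is an awk string literal; "\n" is a newline character.
    awk '{ gsub(/></, ">\n<"); print }'
}

# Hypothetical helper: rewrite file "$1" through a temporary file, since
# "cmd < f > f" or "cat f | cmd > f" can truncate f before it is read.
split_tags_file() {
    split_tags_awk < "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Example: prints <r>, <s>1</s> and </r> on three lines
printf '%s\n' '<r><s>1</s></r>' | split_tags_awk
```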

Great, colleagues! Thanks a lot!

I'm now using the following command to remove blank lines and tabs and to put every XML tag on its own line (ksh, AIX 5.3):

cat $FILE_TMP2 | awk '( NF > 0 ) { print $0 }' | sed -e 's/ //g' | sed -e 's/></>***</g' | tr -s '*' '\012' > $TMP_FILE

The order of the commands matters: if I change it, the pipeline doesn't work and the last line is lost!

Step by step:

To remove blank lines...
cat $FILE | awk '( NF > 0 ) { print $0 }'

To remove tabs...
| sed -e 's/ //g'

and finally, to avoid having more than one XML tag on a line...
| sed -e 's/></>***</g' | tr -s '*' '\012' > $TMP_FILE
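The three steps can also be sketched as one function, assuming the goal is the same as in the post (drop empty lines, delete blanks, one tag per line) and that sed supports POSIX character classes. `[[:blank:]]` matches both space and tab, whereas the sed expression as shown above contains a space. The sketch also splits with sed directly instead of the `***` placeholder, because `tr -s '*' '\012'` turns every `*` into a newline, including any that occur in the data. Note that deleting all blanks also removes spaces inside element text and between attributes. The name `normalize_xml` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: normalize loosely formatted XML read on stdin.
#   1. awk 'NF > 0' drops empty and whitespace-only lines (as in the post)
#   2. s/[[:blank:]]//g deletes every space and tab
#   3. the second s command puts a newline between adjacent tags
normalize_xml() {
    awk 'NF > 0' | sed 's/[[:blank:]]//g
s/></>\
</g'
}

# Example: prints <a>1</a>, <b>2</b> and <c>3</c> on three lines
printf '  <a>1</a>\t<b> 2 </b>\n\n   \n<c>3</c>\n' | normalize_xml
```

It would be called as `normalize_xml < "$FILE" > "$TMP_FILE"`.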

Cheers!

One more time, thanks a lot!

Xavier
:)


All times are GMT -5.