URGENT: Help for Find and Replace with AWK
Dear colleagues
I have a big problem replacing strings with AWK. I'm integrating an XML converter into a legacy application and I need to parse the incoming XML files. Because the received XML files are not well formatted, I need to eliminate blank spaces (no problem), then split the XML tags so that each one is on its own line, and finally parse the XML blocks and convert them into specific flat files with a C module.

    cat $FILE | awk '( NF > 0 ) { print $0 }' > $FILE_TMP1
    if [ -f "$FILE_TMP1" ] ; then
        cat $FILE_TMP1 | awk -v RS='><' -v OFS='><' '{print $0}' > $FILE_TMP2
    else
        echo "-69258"
        exit -1
    fi
    for i in $(ls -1rt *.$FILE_TYPE)
    do
        while read -r LINE
        do
            PARAM=`echo $LINE | cut -f 2 -d "<" | cut -f 1 -d ">"`
            if [[ "$PARAM" = "$INICI_BLOC_ORD" ]] ; then
                SUFIX=`date +%Y%m%d%H%M%S`
                FILE="ORD_"$SUFIX
                FILE_FI=$FILE".PROC"
                touch ./$FILE
            elif [ "$PARAM" = "$FI_BLOC_ORD" ] ; then
                echo "$FI_FILE_ORD" >> ./$FILE
                mv ./$FILE $DIR_PROC"/"$FILE_FI
            else
                if [[ -f "$FILE" ]] ; then
                    echo "$LINE" >> ./$FILE
                fi
            fi
        done < $i
    done
    ....

But the command:

    $ cat $FILE_TMP2 | awk -v RS='><' -v OFS='><' '{print $0}' > $FILE_TMP2

doesn't work well, because this appears in $FILE_TMP2:

    ...
    <NamespaceR:SrvcId STR</NamespaceR:SrvcId
    <NamespaceR:PlanningCode T</NamespaceR:PlanningCode
    <NamespaceR:ProtocolType COM</NamespaceR:ProtocolType
    <NamespaceR:FileRef NRXS505987257RR6</NamespaceR:FileRef
    </NamespaceR:RoutingTable
    ...

when I was expecting this:

    ...
    <NamespaceR:SrvcId>STR</NamespaceR:SrvcId>
    <NamespaceR:PlanningCode>T</NamespaceR:PlanningCode>
    <NamespaceR:ProtocolType>COM</NamespaceR:ProtocolType>
    <NamespaceR:FileRef>NRXS505987257RR6</NamespaceR:FileRef>
    </NamespaceR:RoutingTable>
    ....

The ">" characters are missing and the resulting layout is not the one I want, because I need to replace "><" with ">", a line break, and "<".

Any suggestions?

Best regards!
Xavier
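A likely explanation for the missing ">" characters: POSIX awk takes the record separator from the first character of RS only (a multi-character RS is unspecified; gawk and some others treat it as a regular expression). With RS='><' read that way, every ">" ends a record and is consumed, which matches the output shown above. OFS only matters when fields are rebuilt; the text written after each record is ORS. Separately, `cat $FILE_TMP2 | ... > $FILE_TMP2` reads and writes the same file, and the shell truncates it when it sets up the redirection. A minimal sketch that sidesteps all three issues, assuming one input file and a different output file; `split_tags` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: put every XML tag boundary on its own line with plain awk.
# split_tags is a hypothetical helper: reads stdin, writes stdout.
split_tags() {
    # Skip blank lines (NF == 0), then turn each "><" into ">" newline "<".
    # RS is left at its default, so no ">" is eaten as a separator.
    awk 'NF > 0 { gsub(/></, ">\n<"); print }'
}

# Usage: read one file and write to a *different* file, e.g.
#   split_tags < "$FILE_TMP1" > "$FILE_TMP2"
printf '%s\n' '<NamespaceR:SrvcId>STR</NamespaceR:SrvcId><NamespaceR:PlanningCode>T</NamespaceR:PlanningCode>' | split_tags
```

Running the sample prints `<NamespaceR:SrvcId>STR</NamespaceR:SrvcId>` and `<NamespaceR:PlanningCode>T</NamespaceR:PlanningCode>` on two separate lines.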
Have you looked at SED? It is well-suited to search and replace.
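As a sketch of the sed route suggested above, assuming a POSIX sed: a newline in the replacement text is written as a backslash followed by a real line break. GNU sed also accepts `\n` there, but that is an extension and not portable. `split_tags_sed` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: replace every "><" with ">" + newline + "<" using POSIX sed.
# The trailing backslash escapes the literal newline in the replacement;
# the continuation line starts at column 0 so no spaces are inserted.
split_tags_sed() {
    sed 's/></>\
</g'
}

printf '%s\n' '<NamespaceR:FileRef>NRXS505987257RR6</NamespaceR:FileRef></NamespaceR:RoutingTable>' | split_tags_sed
```

This prints `<NamespaceR:FileRef>NRXS505987257RR6</NamespaceR:FileRef>` and `</NamespaceR:RoutingTable>` on separate lines.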
My favorite SED and AWK tutorials, and more: http://www.grymoire.com/Unix/

Also, it's better to put your code in [code] tags: that preserves formatting and is easier to read.
That could get hairy. You might want to look at Perl, which has several XML parsing modules.
Tabs, Blanks, Tags and Trips!
I'm sorry!
Here is the corrected command line from my post:

But the command:

    $ cat $FILE_TMP2 | awk -v RS='><' -v OFS='>\n<' '{print $0}' > $FILE_TMP2

doesn't work well, because in $FILE_TMP2 this ...

Great colleagues! Thanks a lot! I'm using the following command to trim blank lines and tabs and to put every XML tag on its own line (ksh, AIX 5.3):

    cat $FILE_TMP2 | awk '( NF > 0 ) { print $0 }' | sed -e 's/ //g' | sed -e 's/></>***</g' | tr -s '*' '\012' > $TMP_FILE

This order is the correct one; if I change it, it doesn't work: the last line is lost! The sequence is as follows.

For trimming blank lines:

    cat $FILE | awk '( NF > 0 ) { print $0 }'

For trimming tabs:

    | sed -e 's/ //g'

And finally, to avoid more than one XML tag per line:

    | sed -e 's/></>***</g' | tr -s '*' '\012' > $TMP_FILE

Cheers! Once more, thanks a lot!

Xavier :)
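Two side effects of the pipeline above are worth knowing: the `***` placeholder plus `tr -s '*' '\012'` also turns any `*` that appears in the data into a line break, and `s/ //g` deletes every space, including those between attributes (`<a b="1">` becomes `<ab="1">`) and inside text values. A minimal single-awk sketch, assuming the goal is to drop blank lines, trim only leading and trailing spaces and tabs, and split tags without a placeholder; `normalize_xml` is a hypothetical helper name, and this is not an exact reproduction of the pipeline's behaviour:

```shell
#!/bin/sh
# Sketch (assumption: inner spaces should be kept). One awk pass that
#  1. skips blank lines (NF == 0),
#  2. trims leading/trailing spaces and tabs only, so spaces inside
#     text values and between attributes survive,
#  3. splits "><" (or "> <" with blanks in between) onto separate lines
#     without a placeholder, so literal '*' in the data is left alone.
normalize_xml() {
    awk 'NF > 0 {
        gsub(/^[ \t]+|[ \t]+$/, "")
        gsub(/>[ \t]*</, ">\n<")
        print
    }'
}

# Usage, reading one file and writing another:
#   normalize_xml < "$FILE_TMP2" > "$TMP_FILE"
printf '\t<Item code="A*1">\n\n  <Desc>two words</Desc></Item>  \n' | normalize_xml
```

The sample prints `<Item code="A*1">`, `<Desc>two words</Desc>` and `</Item>` on three lines; the `*` and the space in "two words" are preserved. Collapsing blanks between tags is a design choice here; drop the `[ \t]*` if such whitespace must be kept.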