LinuxQuestions.org


saurabhmehan 02-11-2011 06:21 AM

Help in Parsing data
 
I have the string below:

Code:


Transaction_ID:SDP-DM-151204679 , Transaction_DateTime:2011-02-11 00:00:15 GMT+05:30 , Transaction_Mode:WAP , Circle_ID:4 , Circle_Name:BJ ,Zone: , CustomerID:B_31563486 , MSISDN:7870904329 , IMSI:405876122068099 , IMEI: , Sub_Profile:Pre-Paid , CPID:Nazara , CPNAME:Nazara , Content_ID:NA , Content_Name:Java%20Games , Base_Price:50.0 , Charge_Code:code50 , Content_Price:50 , Content_Status: , Content_Type:Games , Other_Info:] , Static_ID:BJ#25848082 , Original_Content_Owner_ID:Nazara , External_Correlation_Id:2011021023585279271 , Product_Name: , Sender_MSISDN:56363 , Subscription_Channel: , Subscription_Type: , Location:BJ , PPL_FLAG:TRUE , CustomCDRInterceptor - CDR Info[Optional_Field1: , Optional_Field2: ,



I need output like this:

Code:


SDP-DM-151204679,2011-02-11 00:00:15 GMT+05:30,WAP,4,BJ,,B_31563486,7870904329,405876122068099,,Pre-Paid,Nazara,Nazara,NA,Java%20Games,50.0,code50,50,,Games,,BJ#25848082,Nazara,2011021023585279271,,56363,,,BJ,TRUE,,,


syg00 02-11-2011 06:47 AM

Show us what you have attempted - we can't (won't) write it for you.
Help maybe.

saurabhmehan 02-11-2011 06:53 AM

Parsing problem
 
I am trying the following code, but it does not work as expected:
Code:

awk -F"," '{print split ( $0, s, ":" ); print $2}' filename
Quote:

Originally Posted by syg00 (Post 4254977)
Show us what you have attempted - we can't (won't) write it for you.
Help maybe.


colucix 02-11-2011 07:21 AM

Quote:

Originally Posted by saurabhmehan (Post 4254984)
awk -F"," '{print split ( $0, s, ":" ); print $2}' filename

This does not handle the Transaction_DateTime field correctly, because its value contains multiple colons. You also have to deal with the extra spaces before and after the commas. Something like this should do the trick:
Code:

awk 'BEGIN{RS=ORS=","}!/\n/{if ( $0 ~ /Transaction_DateTime/ ) $0 = gensub(/Transaction_DateTime:(.*)/,"\\1","g"); else $0 = gensub(/^.*:]*(.*)/,"\\1","g"); gsub(/^ +| +$/,""); print}END{printf "\n"}' file
This also takes into account the extra bracket in the "Other_Info:]" field. I'm not sure whether that is a typo, but the solution is the same either way. Hope this helps.
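To make the colon problem concrete, here is a minimal sketch using one record copied from the sample line. The helper names count_pieces and value_after_first_colon are made up for illustration. It uses only split(), index() and substr(), so it should behave the same in any awk.

```shell
#!/bin/sh
# One record taken from the sample line in the question.
rec='Transaction_DateTime:2011-02-11 00:00:15 GMT+05:30'

# Hypothetical helper: count the pieces produced by splitting on every colon.
count_pieces() {
    printf '%s\n' "$1" | awk '{ print split($0, parts, ":") }'
}

# Hypothetical helper: keep everything after the first colon only.
value_after_first_colon() {
    printf '%s\n' "$1" | awk '{ print substr($0, index($0, ":") + 1) }'
}

count_pieces "$rec"             # prints 5
value_after_first_colon "$rec"  # prints 2011-02-11 00:00:15 GMT+05:30
```

Splitting on ":" yields five pieces, so printing only the second one loses most of the timestamp. Cutting at the first colon keeps the value whole.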

grail 02-11-2011 08:56 AM

Or something like:
Code:

awk '{print $2}' FS="[]:]" ORS="," RS="[ \t]*,[ \t]*" file
If you want the output to end with a newline, you can add an END block that prints one.

colucix 02-11-2011 09:37 AM

Quote:

Originally Posted by grail (Post 4255074)
Or something like:
Code:

awk '{print $2}' FS="[]:]" ORS="," RS="[ \t]*,[ \t]*" file
If you want the output to end with a newline, you can add an END block that prints one.

Yes, it would be easy. But, as I already mentioned, the problem is the extra colons in the Transaction_DateTime field. Your code drops the minutes, seconds and time zone, since they end up in $3, $4 and $5. Anyway, I'm sure you can come up with a more compact version of my code in post #4. ;)

grail 02-11-2011 10:20 AM

Sorry ... good point ... how about:
Code:

awk 'sub(/^[^:]+:[]]?/,"")' ORS="," RS="[ \t]*,[ \t]*" file
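A regular expression as RS is an extension (gawk and mawk accept it, but POSIX leaves a multi-character RS unspecified), so the one-liner above may not run in every awk. As a rough sketch of the same idea without that dependency, the version below splits each line on commas and strips the key from every field. The function name strip_keys is made up for this example, and the input format is assumed to be the one shown in the first post.

```shell
#!/bin/sh
# strip_keys is a hypothetical helper name, not something from this thread.
# It reads "Key:Value , Key:Value , ..." lines on stdin and prints only the
# values, comma-separated, one output line per input line.
strip_keys() {
    awk -F',' '{
        out = ""
        for (i = 1; i <= NF; i++) {
            v = $i
            sub(/^[^:]*:[]]?/, "", v)       # drop the key, its colon, and an optional stray "]"
            gsub(/^[ \t]+|[ \t]+$/, "", v)  # trim blanks around the value
            out = out (i > 1 ? "," : "") v
        }
        print out                           # print also supplies the trailing newline
    }'
}

# Shortened sample built from fields of the line in the first post.
printf '%s\n' 'Transaction_DateTime:2011-02-11 00:00:15 GMT+05:30 , Circle_Name:BJ ,Zone: , Other_Info:] , MSISDN:7870904329' | strip_keys
# prints 2011-02-11 00:00:15 GMT+05:30,BJ,,,7870904329
```

Because only the text up to the first colon is removed, the colons inside the timestamp survive, and a trailing comma in the input still produces a trailing empty field in the output.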

colucix 02-11-2011 02:32 PM

Quote:

Originally Posted by grail (Post 4255159)
Sorry ... good point ... how about:
Code:

awk 'sub(/^[^:]+:[]]?/,"")' ORS="," RS="[ \t]*,[ \t]*" file

Awesome! :)

saurabhmehan 02-15-2011 10:14 AM

Thanks
 
Thanks Grail.
Quote:

Originally Posted by grail (Post 4255159)
Sorry ... good point ... how about:
Code:

awk 'sub(/^[^:]+:[]]?/,"")' ORS="," RS="[ \t]*,[ \t]*" file


grail 02-15-2011 10:50 AM

No probs ... remember to mark as SOLVED once you have a solution.


All times are GMT -5.