LinuxQuestions.org


saurabhmehan 02-15-2011 09:29 AM

Want to improve the performance of script
 
Hi All,

I have written the script below, but it takes a very long time to run: it looks up only about 3,500 records, read from an input file, in a log file of approximately 12 GB.
The script reads a CSV file whose lines each hold two values, transaction_id and mobile_number, searches the log file for lines that contain both of those strings plus the static string "CustomCDRInterceptor", and then formats the matching data in the prescribed format.

The script is as follows:
Code:

#!/bin/bash
if [ $# -ne 2 ]
then
    echo "Error in $0 - Invalid Argument Count"
    echo "Syntax: $0 input_file output_file"
    exit
fi

awk -F"," '{print $1 , $2}' $1 |
while read a b
do
  output=`cat $2 | grep "CustomCDRInterceptor" | grep "$a" | grep "$b" | cut -d"|" -f6 | awk -F"," '{print $4,",",$28,",",$27,",",$17,",",$12,","$21,",",$11,",",$26,",",$14,",",$6,",",$30,",",$31,",",$19,",",$5,",",$22,",",$10,",",$9,",",$20,",",$15,",",$29,",",substr($32,1,match($32,/\]/)-1),",",$23,",",$18,",",$24,",",$7,",",$13,",",$2,",",$25,",",$16,",",$8,",",$1,",",$3,","}'`
  #echo $output
  echo $output | perl -F, -lane 's/^\s*[- \w\[]+:(.*?)\s*$/$1/ foreach @F; print join ",", @F'
done

Sample data from the input file looks like this:
Code:

8273518145,SDP-DM-152281623
9062995078,SDP-DM-152281631
7870856010,SDP-DM-152281650
8445208702,SDP-DM-152281662
8923084825,SDP-DM-152281668
9061161091,SDP-DM-152281712
8401832603,SDP-DM-152281733
8273522929,SDP-DM-152281837
8341646298,SDP-DM-152281851
9062930630,SDP-DM-152281868

Sample data from the log file is as follows:
Code:

15-Feb-2011 20:56:36,538|8401131793|subscription_app|-23e57aa%3A12e29c422b8%3A502a|ChargeAmount|REInterceptor -  Is already Rated [Yes] RatedPrice [0.0]
15-Feb-2011 20:56:36,538|8401131793|subscription_app|-23e57aa%3A12e29c422b8%3A502a|ChargeAmount|ChargingInterceptor - subscriber details processed sucessfully- {arg0.referenceCode=balanceEnquiry:true;subsChannel:Unknown;channelType:Subscription;transactionId:-23e57aa%3A12e29c422b8%3A502a;pricePtAvl:true;eventType:subscription;contentId:4945;serviceId:CR03;Circle_Name:GJ;Circle_ID:5;isRated:Yes;productName:VAS0003ALL;basePrice:0.0;subsType:RECURRING;Sub_Profile:Pre-Paid, arg0.endUserIdentifier=8401131793, arg0.charge.description= Retrieve-Balance , arg0.charge.currency=INR, arg0.charge.code=, arg0.charge.amount=0.0}
15-Feb-2011 20:56:36,539|8445862834|subscription_app|5a1fa24a%3A12e29cb5fb3%3A1d75|ChargeAmount|CustomCDRInterceptor - CDR Info[Optional_Field1:,Subscription_Channel:Unknown,Optional_Field2:,Transaction_ID:,Content_ID:4945,IMEI:,Product_Name:VAS0003ALL,PPL_FLAG:,Charge_Code:,Base_Price:0.0,CustomerID:B_55822315,Circle_Name:UK,Sender_MSISDN:,IMSI:405818123375012,Content_Status:,Location:UK,Circle_ID:18,Original_Content_Owner_ID:,CPNAME:default_provider,Content_Price:0.0,Zone:,Content_Name:,Static_ID:UK#37453052,External_Correlation_Id:5a1fa24a%3A12e29cb5fb3%3A1d75,Subscription_Type:RECURRING,MSISDN:8445862834,Transaction_Mode:Subscription,Transaction_DateTime:2011-02-15 20:56:36 GMT+05:30,Content_Type:,Sub_Profile:Pre-Paid,CPID:,Other_Info:]
15-Feb-2011 20:56:36,539|8401131793|subscription_app|-23e57aa%3A12e29c422b8%3A502a|ChargeAmount|CustomCDRInterceptor - CDR Info[Optional_Field1:,Subscription_Channel:Unknown,Optional_Field2:,Transaction_ID:,Content_ID:4945,IMEI:,Product_Name:VAS0003ALL,PPL_FLAG:,Charge_Code:,Base_Price:0.0,CustomerID:B_44445354,Circle_Name:GJ,Sender_MSISDN:,IMSI:405927121139030,Content_Status:,Location:GJ,Circle_ID:5,Original_Content_Owner_ID:,CPNAME:default_provider,Content_Price:0.0,Zone:Default,Content_Name:,Static_ID:GJ#32697724,External_Correlation_Id:-23e57aa%3A12e29c422b8%3A502a,Subscription_Type:RECURRING,MSISDN:8401131793,Transaction_Mode:Subscription,Transaction_DateTime:2011-02-15 20:56:36 GMT+05:30,Content_Type:,Sub_Profile:Pre-Paid,CPID:,Other_Info:]
15-Feb-2011 20:56:36,540|8445862834|subscription_app|5a1fa24a%3A12e29cb5fb3%3A1d75|ChargeAmount|GetBalance|PaymentPlugin-Request -  Get User Balance of: 8445862834
15-Feb-2011 20:56:36,540|8401131793|subscription_app|-23e57aa%3A12e29c422b8%3A502a|ChargeAmount|GetBalance|PaymentPlugin-Request -  Get User Balance of: 8401131793
15-Feb-2011 20:56:36,545|8445862834|subscription_app|5a1fa24a%3A12e29cb5fb3%3A1d75|ChargeAmount|GetBalance|PaymentPlugin-Response -  Retrieved Balance Bucket: 1;20091003;20110810;21.50;|


i92guboj 02-15-2011 09:51 AM

For starters, I'd use just one of awk, bash, or perl, not all three interpreters together. That alone will probably cut down on forking, RAM usage, and context switching, and it avoids moving that much data from one process to another. This matters all the more because you are running awk and perl inside a loop, so they will probably be invoked thousands of times.
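
A minimal sketch of that idea, assuming the input columns are ordered as in the sample input (mobile number first, transaction id second) and that the mobile number is always the second "|"-separated field of the log line, as it is in the sample log. The function name search_cdr is made up for illustration. It loads the CSV into memory once and then reads the log a single time, instead of once per input record:

```shell
#!/bin/bash
# search_cdr INPUT_CSV LOG_FILE   (hypothetical name, illustration only)
# Prints field 6 of every CustomCDRInterceptor log line whose mobile
# number and transaction id both appear in INPUT_CSV.
search_cdr() {
    awk -F'|' '
        # First file (the CSV): remember the transaction ids per mobile number.
        NR == FNR {
            split($0, pair, ",")
            ids[pair[1]] = ids[pair[1]] pair[2] "\n"
            next
        }
        # Second file (the log): assumes field 2 holds the mobile number.
        ($2 in ids) && index($0, "CustomCDRInterceptor") {
            n = split(ids[$2], want, "\n")   # last element is empty
            for (i = 1; i < n; i++)
                if (index($0, want[i])) { print $6; break }
        }
    ' "$1" "$2"
}
```

It would be called as search_cdr input_file log_file. The output is the raw sixth field, so the formatting step still has to be applied to it.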

grail 02-15-2011 09:57 AM

Well, my first two observations are these:

1. Why use cat, grep, cut, awk and perl to do a job that could easily all be done in perl, and probably even in awk alone?

2. As $2 is listed in your 'Syntax' as the 'output' file, why does it never receive any output, and why is it used as input instead?
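
As a rough sketch of the first point, the cut, awk, and perl formatting steps could be folded into one awk program that looks values up by key name instead of by position. The function name format_cdr is hypothetical, and only the first four output columns of the original script are shown (Transaction_ID, Transaction_DateTime, Transaction_Mode, Circle_ID, which are positions 4, 28, 27, and 17 in the sample log). Like the original, it assumes that no value contains a comma:

```shell
#!/bin/bash
# format_cdr (hypothetical name): reads log lines on stdin and prints
# selected "CDR Info[...]" values by key name, comma-separated.
format_cdr() {
    awk '
    {
        # Keep only the text between "CDR Info[" and the closing "]".
        start = index($0, "CDR Info[")
        if (start == 0) next
        body = substr($0, start + 9)
        sub(/\]$/, "", body)

        # Split into key:value pairs; only the first ":" ends the key,
        # so timestamps such as 20:56:36 stay intact.
        split("", val)               # portable way to empty the array
        n = split(body, kv, ",")
        for (i = 1; i <= n; i++) {
            c = index(kv[i], ":")
            if (c) val[substr(kv[i], 1, c - 1)] = substr(kv[i], c + 1)
        }
        print val["Transaction_ID"] "," val["Transaction_DateTime"] "," \
              val["Transaction_Mode"] "," val["Circle_ID"]
    }'
}
```

Looking fields up by name means the output no longer depends on the order of the fields in the log. The remaining columns would be added to the print statement in the same way.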

kirtimaan_bkn 02-15-2011 10:03 AM

Perl is the tool you need for parsing large files.


All times are GMT -5.