LinuxQuestions.org

beca123456 11-24-2016 03:05 PM

Concatenate string through variable in awk
 
Hi,

input.tab (pipe-separated):
Code:

DATE|PRODUCTS|Customer_A|Customer_B|Customer_C
01Jan|meat:fruit:dairy|0,4:21,8:3,55|90,123:34,2:54,111|0,0:1,0:0,12
02Jan|fruit:meat:other|12,0:1,34:432,9|134,0:322,3:45,0|32,56:54,0:654,0


$1: Transaction date

$2: describes the order of the product types, separated by ":" (the order changes from one record to another)

$3-$NF: Customer transactions.
. The product types are separated by ":" and follow the order given in $2
. Numbers at the left and right of the comma are the "purchased" and "sold" items respectively

For example, on 01Jan Customer_A:
- purchased 0 meat, 21 fruit, 3 dairy
- sold 4 meat, 8 fruit, 55 dairy

But on 02Jan Customer_A:
- purchased 12 fruit, 1 meat, 432 other
- sold 0 fruit, 34 meat, 9 other
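
To make the layout concrete, here is a minimal sketch that expands one customer column into one line per product. The function name describe_customer and its column argument are made up for illustration, and plain awk is assumed to behave like gawk for this.

```shell
# Hypothetical helper: list purchased/sold per product for one customer column.
# $1 = 1-based field number of the customer column (3 = Customer_A)
describe_customer() {
  awk -F'|' -v col="$1" '
    NR == 1 { name = $col; next }        # header line: remember the customer name
    {
      n = split($2, prod, ":")           # product order for this record
      split($col, pair, ":")             # one "purchased,sold" pair per product
      for (i = 1; i <= n; i++) {
        split(pair[i], v, ",")           # v[1] = purchased, v[2] = sold
        printf "%s %s %s purchased=%d sold=%d\n", $1, name, prod[i], v[1], v[2]
      }
    }'
}

printf '%s\n' \
  'DATE|PRODUCTS|Customer_A|Customer_B|Customer_C' \
  '01Jan|meat:fruit:dairy|0,4:21,8:3,55|90,123:34,2:54,111|0,0:1,0:0,12' \
  '02Jan|fruit:meat:other|12,0:1,34:432,9|134,0:322,3:45,0|32,56:54,0:654,0' |
  describe_customer 3
```

For column 3 this prints six lines, for instance "01Jan Customer_A fruit purchased=21 sold=8" and "02Jan Customer_A fruit purchased=12 sold=0".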


OBJECTIVE: for each date, count the customers who sold fruit, list their names, and prepend both to the original line:
Code:

Number_Customer|Customers|DATE|PRODUCTS|Customer_A|Customer_B|Customer_C
2.00|Customer_A_(21,8); Customer_B_(34,2)|01Jan|meat:fruit:dairy|0,4:21,8:3,55|90,123:34,2:54,111|0,0:1,0:0,12
1.00|Customer_C_(32,56)|02Jan|fruit:meat:other|12,0:1,34:432,9|134,0:322,3:45,0|32,56:54,0:654,0


MY CODE SO FAR:
Code:

gawk '
BEGIN{FS=OFS="|"}
NR==1{
        for(j=3; j<=NF; j++){
                cust_j=$j
        }
        print "Number_Customer|Customers" FS $0
}

NR>1{
        # Identify the "fruit" data in FORMAT
        a=split($2,b,":")

        for(i=1; i<=a; i++){
                if(b[i] ~ /^fruit$/){
                        index_fruit=i
                }
        }

        # Extract sold fruit (i.e. number on the right of the comma) in each "Customer_X" fields
        # Concatenate Customers in variable "string"
        for(j=3; j<=NF; j++){
                split($j,c,":")
                split(c[index_fruit],d,",")
                if(d[2] > 0){
                        x+=1
                        customer_j=cust_j"_("c[index_fruit]")"
                }
                else{
                        x+=0
                        customer_j=""
                }
               
                string=string";"customer_j
        }
       
        # Print fields
        if(x!=0){
                printf("%.2f\|%s\|%s\n",x,string,$0)
                x=0
                string=""
        }
        else{
                print "0.00" FS "-" FS $0
        }
}' input.tab

With this code I get the following output. The customer names are wrong (only the last one from the loop is kept), and I get extra ";" separators:
Code:

Number_Customer|Customers|DATE|PRODUCTS|Customer_A|Customer_B|Customer_C
2.00|;Customer_C_(21,8);Customer_C_(34,2);|01Jan|meat:fruit:dairy|0,4:21,8:3,55|90,123:34,2:54,111|0,0:1,0:0,12
1.00|;;;Customer_C_(32,56)|02Jan|fruit:meat:other|12,0:1,34:432,9|134,0:322,3:45,0|32,56:54,0:654,0

I don't understand why, in $2 of the output, the customer names are wrong while the figures are correct.
Probably because the NR==1 block only keeps the value from the last iteration of the loop...

grail 11-25-2016 03:13 AM

I am not sure why you have the 'else' when building the customer string, as you only want to add a customer when the sold value is greater than zero. The empty customer_j it sets is still appended with a ";", which is where the extra separators come from.

I ended up building my own to see where we differ. Your problem is with your original for loop:
Code:

for(j=3; j<=NF; j++){
  cust_j=$j
}

Here, cust_j is a single scalar variable (awk does not substitute the value of j into the name), so each iteration overwrites it and it always ends up holding the last customer name. Try making it an array ;)
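
The difference can be seen in isolation with a small sketch run on the header line only. The wrapper name scalar_vs_array is made up for illustration.

```shell
# Hypothetical demo: the same loop assigning to a scalar and to an array.
scalar_vs_array() {
  awk -F'|' '{
    for (j = 3; j <= NF; j++) {
      cust_j  = $j     # one scalar literally named "cust_j", overwritten on every pass
      cust[j] = $j     # array element keyed by the column number j
    }
    print "scalar: " cust_j
    for (j = 3; j <= NF; j++)
      print "cust[" j "]: " cust[j]
  }'
}

printf '%s\n' 'DATE|PRODUCTS|Customer_A|Customer_B|Customer_C' | scalar_vs_array
# scalar: Customer_C
# cust[3]: Customer_A
# cust[4]: Customer_B
# cust[5]: Customer_C
```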

Here is mine as a comparison:
Code:

BEGIN{ FS=OFS="|" }

NR == 1{
  for(i = 3; i <= NF; i++)
    cust[i] = $i

  print "Number_Customer|Customers",$0
}

NR > 1{
  n = split($2, f, ":")

  for(i = 1; i <= n; i++)
    if(f[i] == "fruit")
      pos = i

  for(i = 3; i <= NF; i++){
    split($i, s, "[:,]")
    if(s[pos * 2] > 0){
      custs = custs (custs != "" ?";":"")cust[i]"_("s[pos * 2 - 1]","s[pos * 2]")"
      n_cust++
    } 
  }

  if( custs != "" )
    print n_cust,custs,$0

  custs = ""
  n_cust = 0
}

And here is my output:
Code:

Number_Customer|Customers|DATE|PRODUCTS|Customer_A|Customer_B|Customer_C
2|Customer_A_(21,8);Customer_B_(34,2)|01Jan|meat:fruit:dairy|0,4:21,8:3,55|90,123:34,2:54,111|0,0:1,0:0,12
1|Customer_C_(32,56)|02Jan|fruit:meat:other|12,0:1,34:432,9|134,0:322,3:45,0|32,56:54,0:654,0
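
A possible variant of the script above, offered as a sketch under a few assumptions rather than as the posted solution: it formats the count with %.2f and joins customers with "; " as in the requested output, resets pos for every record, and prints the 0.00 and "-" placeholders from the original attempt when nobody sold fruit (the script above prints nothing for such a date). The wrapper name fruit_sellers and the 03Jan record are made up for illustration, and plain awk is assumed to behave like gawk here.

```shell
# Hypothetical wrapper: reads input.tab-style data from the given files or stdin.
fruit_sellers() {
  awk '
    BEGIN { FS = OFS = "|" }

    NR == 1 {
      for (i = 3; i <= NF; i++) cust[i] = $i   # header: customer name per column
      print "Number_Customer", "Customers", $0
      next
    }

    {
      pos = 0; n_cust = 0; custs = ""          # reset per-record state
      n = split($2, f, ":")
      for (i = 1; i <= n; i++)
        if (f[i] == "fruit") pos = i           # position of fruit in this record

      if (pos)
        for (i = 3; i <= NF; i++) {
          split($i, s, "[:,]")                 # s[2*pos-1] = purchased, s[2*pos] = sold
          if (s[pos * 2] + 0 > 0) {
            custs = custs (custs != "" ? "; " : "") cust[i] "_(" s[pos * 2 - 1] "," s[pos * 2] ")"
            n_cust++
          }
        }

      # Always print a line; "-" stands in for an empty customer list
      printf "%.2f%s%s%s%s\n", n_cust, OFS, (custs != "" ? custs : "-"), OFS, $0
    }' "$@"
}

printf '%s\n' \
  'DATE|PRODUCTS|Customer_A|Customer_B|Customer_C' \
  '01Jan|meat:fruit:dairy|0,4:21,8:3,55|90,123:34,2:54,111|0,0:1,0:0,12' \
  '02Jan|fruit:meat:other|12,0:1,34:432,9|134,0:322,3:45,0|32,56:54,0:654,0' \
  '03Jan|fruit:meat|5,0:1,1|2,0:0,3|0,0:4,4' |
  fruit_sellers
```

With a real file it would be called as fruit_sellers input.tab. The made-up 03Jan record has no fruit seller and comes out as "0.00|-|03Jan|fruit:meat|5,0:1,1|2,0:0,3|0,0:4,4".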



All times are GMT -5.