[SOLVED] Help with this piece of script ?

Ezzmazz · 04-15-2019, 11:40 AM

Hello,

Can you help me with this piece of script ?

I have this file :

Code:

July 2018,19%,46%
August 2018,20%,45%
September 2018,20%,41%
October 2018,21%,39%
November 2018,21%,39%
December 2018,21%,41%
January 2019,25%,46%
February 2019,27%,50%

I need to calculate the difference between consecutive values in the second column, taking the lines two at a time:

For example:

July -> August
CPU : +1%

August -> September
CPU : +0%

September -> October
CPU : +1%

To do this, I use this script :

Code:

awk -F'[ ,%]' 'FNR==0{next}
               FNR>1{print m " → " $1;printf "CPU : %+d%%%s\n",$3-u,ORS}
               {m=$1;u=$3}
              ' CLUSTER_1.txt

But I'm not sure I understand it well. Can you help me?

I think I understand the beginning:

FNR = Number of records in the file

As long as the number of records in the file is 0, then continue.

If FNR>1, then print the variable m, followed by an arrow, then the element in column 1, then print "CPU : ".

But I don't understand this part :

Code:

 %+d%%%s\n",$3-u,ORS}
               {m=$1;u=$3}

In fact, I don't understand how the calculation is done.

Can you show me?

Thank you !

Turbocapitalist · 04-15-2019, 11:57 AM

It can be rearranged slightly for clarity:

Code:

awk -F'[ ,%]' 'FNR==0{next}
               FNR>1{
                       print m " → " $1;
                       printf "CPU : %+d%%%s\n",$3-u,ORS;
               }
               {m=$1;u=$3}
              ' CLUSTER_1.txt

Look up the printf function in the AWK manual ("man awk").
The % is a special character in a printf format, so to print a literal % you have to write it as %%.

Normal variables are just plain names; they don't have $ in front. The $ is only used for fields, such as $1 or $3.
So m=$1 and u=$3 just save the first and third fields for later as m and u respectively. The calculation is done the next time through with $3 - u, which subtracts the previous record's third field (saved in u) from the current record's third field.

The fields are separated by a pattern [ ,%], so a space or comma or percentage sign will mark the border between fields.

The body of the above AWK script could be expressed something like this in pseudo-code:

Code:

if(FNR==0) then {
        next
}

if (FNR>1 ) then {
        print m " → " $1;
        printf ("CPU : %+d%%%s\n",$3-u,ORS);
}

if (true) then {
        m=$1;
        u=$3
}
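To watch the saved value being used, here is a small sketch. The three input lines are copied from the file in the question; the trace wording and the shell variable name trace are made up for illustration.

```shell
# Feed three sample lines to awk and show, per record, what u holds.
trace=$(printf '%s\n' 'July 2018,19%,46%' 'August 2018,20%,45%' 'September 2018,20%,41%' |
awk -F'[ ,%]' '
    # First record: nothing has been saved yet, so there is nothing to subtract.
    FNR==1 { printf "FNR=%d %s: nothing saved yet\n", FNR, $1 }
    # Later records: u still holds the third field of the previous record.
    FNR>1  { printf "FNR=%d %s: %d - %d = %+d\n", FNR, $1, $3, u, $3-u }
    # Runs for every record: remember this record for the next pass.
    { m=$1; u=$3 }
')
printf '%s\n' "$trace"
```

This should print "FNR=1 July: nothing saved yet", then "FNR=2 August: 20 - 19 = +1" and "FNR=3 September: 20 - 20 = +0". Note that FNR is already 1 on the first line.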

MadeInGermany · 04-15-2019, 12:39 PM

FNR==0 is never true, because FNR is already 1 for the first record of a file. You can omit it.

printf:
The %+d refers to the first following argument ($3-u) and prints it as an integer that always carries a sign.
The %% is a literal % character.
The %s refers to the second following argument (ORS).
ORS is "\n" (=linefeed) by default, so you can just as well say: printf ("CPU : %+d%%\n\n", $3-u)
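A quick way to try the format on its own, without any input file, is a BEGIN block. This is only a sketch; the three numbers and the shell variable name out are arbitrary.

```shell
out=$(awk 'BEGIN {
    # %+d always prints a sign, %% prints a literal percent sign.
    printf "%+d%%\n", 1
    printf "%+d%%\n", 0
    printf "%+d%%\n", -4
}')
printf '%s\n' "$out"
```

This should print +1%, +0% and -4% on separate lines, which is why an unchanged value shows up as +0% in the output.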

Ezzmazz · 04-16-2019, 03:52 AM

Hello

Thank you for those explanations! I understand this script better now!

One more thing I didn't really understand:

Where and how are the values in the second column selected?

I mean, if I want to extract the column with the CPU value, I run:

Code:

awk -F',' '{print $2}' CLUSTER_1.txt

And here I specify that I select field 2. But with this script, I don't understand how the values of the second column are selected!

Concretely, what are $1 and $3?

Excuse me for my beginner's questions!

Turbocapitalist · 04-16-2019, 04:12 AM

The secret sauce is in the input Field Separator (FS) as defined by the pattern set by -F. There are several ways to assign the Field Separator a value or a pattern; the -F option is one of them. Your pattern is the bracket expression [ ,%], so awk starts a new field every time it finds one of the characters between the brackets: a space, a comma, or a percentage sign. So consider this line:

August 2018,20%,45%

With your pattern, "August" would be the first field, "2018" the second, "20" the third, and "45" the fifth. Yes, the fifth: the % and the comma after "20" are two separate separators, so there is an empty fourth field between them.

Now if you modify the pattern to [ ,%]+ then a run of consecutive separator characters counts as a single separator, and that last one will be the fourth.

Code:

awk '{$1=$1;print;}' FS='[ ,%]'  OFS='|'  CLUSTER_1.txt
awk '{$1=$1;print;}' FS='[ ,%]+' OFS='|'  CLUSTER_1.txt
awk '{$1=$1;print;}' FS='[ ,%]+' OFS="\t" CLUSTER_1.txt

(The $1=$1 is a trick to get AWK to reformat the line using the Output Field Separator (OFS))
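If you want to see the field numbers directly, a loop over NF can print each field next to its index. This is a rough sketch: the sample line is the one discussed above, and the shell variable names (line, show, single, plus) are just for illustration.

```shell
line='August 2018,20%,45%'
# Print every field with its number; the brackets make empty fields visible.
show='{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'

single=$(printf '%s\n' "$line" | awk -F'[ ,%]'  "$show")
plus=$(printf '%s\n' "$line" | awk -F'[ ,%]+' "$show")

printf 'FS=[ ,%%]\n%s\n\nFS=[ ,%%]+\n%s\n' "$single" "$plus"
```

With the first pattern there should be six fields, with $4 and $6 empty and 45 in $5. With the second pattern there should be five, with 45 in $4 and one empty field left over after the final %.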

Then you can look at the first field and the third field. Or look at the second and fifth fields:

Code:

awk '{print $1,$3;}' FS='[ ,%]'  OFS='|'  CLUSTER_1.txt
awk '{print $2,$5;}' FS='[ ,%]'  OFS='|'  CLUSTER_1.txt

There's a good overview in the AWK tutorial at The Grymoire. See also "man awk".

Ezzmazz · 04-17-2019, 04:41 AM

Hello !

Thank you for these clarifications! I finally understand how it works!

So, with this file:

Code:

July 2018 CLUSTER_ALE_01 19% 46%
August 2018 CLUSTER_ALE_01 20% 45%
September 2018 CLUSTER_ALE_01 20% 41%
October 2018 CLUSTER_ALE_01 21% 39%
November 2018 CLUSTER_ALE_01 21% 39%
December 2018 CLUSTER_ALE_01 21% 41%
January 2019 CLUSTER_ALE_01 25% 46%
February 2019 CLUSTER_ALE_01 27% 50%

I took your script and added a line for RAM:

Code:

awk -F'[ ,%]' 'FNR==0{next}
               FNR>1{
                       print m " → " $1;
                       printf "CPU : %+d%%%s",$4-u,ORS
                       printf "RAM : %+d%%%s\n",$6-o,ORS
               }
               {m=$1;u=$4}
               {m=$1;o=$6}
              ' CLUSTER_1.txt

And the result is :

Code:

July → August
CPU : +1%
RAM : -1%

August → September
CPU : +0%
RAM : -4%

September → October
CPU : +1%
RAM : -2%

October → November
CPU : +0%
RAM : +0%

November → December
CPU : +0%
RAM : +2%

December → January
CPU : +4%
RAM : +5%

January → February
CPU : +2%
RAM : +4%

We agree that the "u" and the "o" I added are just variable names that receive the contents of $4 and $6?

Thank you again!

Turbocapitalist · 04-17-2019, 04:58 AM

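Sure. There are a number of ways to write it:

Code:

awk -F'[ ,%]+' '
# Field numbers match the original layout (July 2018,19%,46%);
# with the extra cluster-name column, CPU is $4 and RAM is $5.
FNR>1{
        print m " → " $1;
        printf("CPU : %+d%%%s",$3-cpu,ORS);
        printf("RAM : %+d%%%s",$4-ram,ORS);
        printf("%s", ORS);
}
{
        m=$1;cpu=$3;ram=$4
}
' CLUSTER_1.txt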
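For the later file that has the cluster name as an extra third column, the numbers shift by one when the [ ,%]+ pattern is used. A minimal sketch, assuming that layout; the two sample lines are copied from that file, and "->" stands in for the arrow:

```shell
out=$(printf '%s\n' 'July 2018 CLUSTER_ALE_01 19% 46%' 'August 2018 CLUSTER_ALE_01 20% 45%' |
awk -F'[ ,%]+' '
    # Fields here: $1 month, $2 year, $3 cluster name, $4 CPU, $5 RAM.
    FNR>1 {
        print m " -> " $1
        printf "CPU : %+d%%\n", $4-cpu
        printf "RAM : %+d%%\n", $5-ram
    }
    # Save the current values for the next record.
    { m=$1; cpu=$4; ram=$5 }
')
printf '%s\n' "$out"
```

For these two lines it should print "July -> August", "CPU : +1%" and "RAM : -1%", the same figures as the first block of the result posted above.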

MadeInGermany · 04-17-2019, 03:10 PM

With shell builtins:

Code:

nr=0
while IFS=' ,' read month year pcpu pram junk
do
  cpu=${pcpu%\%}
  ram=${pram%\%}
  if [ $((nr+=1)) -gt 1 ]
  then
    echo "$omonth → $month"
    printf "CPU : %+d%%\n" $((cpu - ocpu))
    printf "RAM : %+d%%\n" $((ram - oram))
    echo
  fi
  omonth=$month; ocpu=$cpu; oram=$ram
done < CLUSTER_1.txt

Note: unlike awk, the shell's $(( )) arithmetic is integer-only, so bash cannot handle decimals, for example 4.2%
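For comparison, awk can do the subtraction on decimal values, but the %+d format would cut off the fraction, so a floating-point format such as %+.1f is needed. A small sketch; the decimal percentages below are invented sample data, not values from the real file:

```shell
out=$(printf '%s\n' 'July 2018,19.5%,46%' 'August 2018,21.0%,45%' |
awk -F'[ ,%]+' '
    # %+.1f keeps the sign and prints one digit after the decimal point.
    FNR>1 { printf "%s -> %s\nCPU : %+.1f%%\n", m, $1, $3-cpu }
    # Save the current values for the next record.
    { m=$1; cpu=$3 }
')
printf '%s\n' "$out"
```

This should print "July -> August" followed by "CPU : +1.5%".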

Ezzmazz · 04-18-2019, 08:43 AM

Hello !

Thank you for your answers! I now understand the use of awk better!