[SOLVED] awk behavior unpredictable

master-of-puppets · 09-30-2014, 02:54 PM

On my Debian wheezy box, the output of df with the -m option (1 MB blocks), piped through awk to pull out the "Size" and "Used" values, looks like this:

Code:

[19:07:35] billsb@sideshow:/share/es-ops/scripts/BUILD_FARM $ df -m /export/ws/bob | awk '{print $2, $3}'
1M-blocks Used
1032124 531937

So to get my values you'd think I'd have to use:

Code:

df -m /export/ws/bob | awk '{print $2, $3}' | sed '1d'

But the loop that awks these values uses printf and doesn't seem to need the | sed '1d':

Code:

df -m "${WORKSPACES2[@]/#//export/ws/}" | awk '
BEGIN  { "date +'%m-%d-%y'" | getline date;
             printf "%s",date }
    NR > 1 { printf ",%s,%s", $2, $3; }
    END    { printf "\n"}' >> "$OUTPUT_DIR/$HOSTNAME.csv"

On my squeeze boxes, the output of df with the -m option, piped through the same awk command for "Size" and "Used", looks like this:

Code:

[19:16:31] billsb@simpsons:/share/es-ops/scripts/BUILD_FARM $ df -m /export/ws/bart | awk '{print $2, $3}'
1M-blocks Used

1140818 56858

It puts a blank line between the headers and the values. When I run it with sed '1d' it looks like this:

Code:

[19:18:14] billsb@simpsons:/share/es-ops/scripts/BUILD_FARM $ df -m /export/ws/bart | awk '{print $2, $3}' | sed '1d'

1140820 56856

I have 4 hosts: sideshow, simpsons, moes, and flanders. The folders I'm trying to extract the "Size" and "Used" values from are:

Code:

case "$HOSTNAME" in
	simpsons) WORKSPACES=(bart_avail bart_used homer_avail homer_used lisa_avail lisa_used marge_avail marge_used releases_avail releases_used rt-private_avail rt-private_used simpsons-ws0_avail simpsons-ws0_used simpsons-ws1_avail simpsons-ws1_used simpsons-ws2_avail simpsons-ws2_used vsimpsons-ws_avail vsimpsons-ws_used) ;;
	moes)     WORKSPACES=(barney_avail barney_used carl_avail carl_used lenny_avail lenny_used moes-ws2_avail moes-ws2_used) ;;
	flanders) WORKSPACES=(flanders-ws0_avail flanders-ws0_used flanders-ws1_avail flanders-ws1_used flanders-ws2_avail flanders-ws2_used maude_avail maude_used ned_avail ned_used rod_avail rod_used todd_avail todd_used to-delete_avail to-delete_used) ;;
esac

The path base is:

Code:

BASE=/export/ws

On sideshow (debian wheezy) the output looks like this:

Code:

,,,
sideshow
,bob_size,bob_used,mel_size,mel_used,sideshow-ws2_size,sideshow-ws2_used
09-25-14,1032124,508509,1032124,683647,1032108,46787
09-28-14,1032124,519385,1032124,690727,1032108,178159
09-29-14,1032124,519385,1032124,691161,1032108,178159
09-30-14,1032124,520456,1032124,711363,1032108,180249

Which is the desired output. On the others the output gets all messed up. Can anybody see by looking at the output of df on debian squeeze what might be happening?

I'm sure it's just a difference in df and awk between the systems.

This is squeeze:

Code:

No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 6.0.7 (squeeze)
Release:        6.0.7
Codename:       squeeze

Code:

GNU Awk 3.1.7

This is wheezy:

Code:

No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 7.3 (wheezy)
Release:        7.3
Codename:       wheezy

Code:

mawk 1.3.3 Nov 1996, Copyright (C) Michael D. Brennan

compiled limits:
max NF             32767
sprintf buffer      2040

master-of-puppets · 09-30-2014, 05:51 PM

Nobody wants to help? Hey, at least I tried the best I could on my own before asking for help. Sheesh!

rknichols · 09-30-2014, 08:01 PM

I doubt you'll find many people who want to dig through all that code. Have you looked at the actual output from df on each system? One problem might be that df can insert extra line breaks when mount point names get too long. You can use the "-P" (--portability) option to prevent that.

My other suggestion is to try the standard debugging technique of putting a "set -x" command at the top of the script and then examining the output in detail to see what is actually getting executed. Saving the output from df in a file rather than piping it directly to awk might also prove instructive.
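
A minimal sketch of both suggestions, assuming GNU df and any POSIX awk. The function name size_used_csv and the /tmp/df.out path are made up for illustration; /export/ws/bart is the path from the first post.

```shell
#!/bin/bash
# Hypothetical helper: read `df -P -m` output on stdin and print
# ",size,used" for every line after the header.
size_used_csv() {
    awk 'NR > 1 { printf ",%s,%s", $2, $3 } END { printf "\n" }'
}

# -P asks df for the POSIX output format, in which each filesystem
# is always printed on exactly one line.
df -P -m / | size_used_csv

# To see exactly what awk receives, save the raw output and display
# invisible characters (cat -A is a GNU coreutils option):
#   df -m /export/ws/bart > /tmp/df.out
#   cat -A /tmp/df.out
```

With -P the header reads "1048576-blocks" instead of "1M-blocks", but the field positions that the awk script relies on stay the same.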

metaschima · 09-30-2014, 08:09 PM

I agree that it is a lot of code and a rather vague description of what you need, and you expect someone to respond in 3 hours.

It shouldn't be too difficult for you to debug it, but nearly impossible for us with the amount of info you gave. Look at the output of 'df' on each system and you are likely to figure it out, if not, post the output. Also, adding functions will help clarify what is going on.

master-of-puppets · 09-30-2014, 09:03 PM

Quote:

Originally Posted by rknichols

I doubt you'll find many people who want to dig through all that code. Have you looked at the actual output from df on each system? One problem might be that df can insert extra line breaks when mount point names get too long. You can use the "-P" (--portability) option to prevent that.

My other suggestion is to try the standard debugging technique of putting a "set -x" command at the top of the script and then examining the output in detail to see what is actually getting executed. Saving the output from df in a file rather than piping it directly to awk might also prove instructive.

I'll add the output of df at the top of my post and I will use pseudo code to show what's supposed to be happening. I have been pulling my hair out all day trying to figure it out. Please stay with me.

master-of-puppets · 09-30-2014, 09:05 PM

Quote:

Originally Posted by metaschima

I agree that it is a lot of code and rather vague description of what you need, and you expect someone to respond in 3 hours.

It shouldn't be too difficult for you to debug it, but nearly impossible for us with the amount of info you gave. Look at the output of 'df' on each system and you are likely to figure it out, if not, post the output. Also, adding functions will help clarify what is going on.

I'll add the output of df at the top of my post and I will use pseudo code to show what's supposed to be happening. I have been pulling my hair out all day trying to figure it out. Please stay with me.

grail · 09-30-2014, 10:41 PM

So a few things:

1. Others have already advised to look at your df output

2. sed code is not needed in original awk as it says NR > 1, which means process all but the first line

3. The above obviously does not work, as now the data starts on the third line. So I see your options as:

a. See point 1

b. NR > 2

c. NR > 1 && NF

You are going to have to learn more about awk if you are going to continue to use it. May I suggest looking at the manual
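
To make the difference between the options concrete, here is a small sketch run against made-up input rather than a real df. The sample function is hypothetical and only mimics the symptom from the squeeze box (header, empty line, data line); the 1140818 and 56858 figures are from the first post, the other columns are placeholders.

```shell
#!/bin/bash
# Made-up sample mimicking the squeeze symptom: header, empty line, data line.
# Only the 1140818/56858 figures come from the thread; the rest is placeholder text.
sample() {
    printf '%s\n' \
        'Filesystem 1M-blocks Used Available Use% Mounted on' \
        '' \
        'dev 1140818 56858 avail pct /export/ws/bart'
}

# Original filter: skips only the header, so the empty line contributes ",,".
sample | awk 'NR > 1 { printf ",%s,%s", $2, $3 } END { printf "\n" }'

# Option b: skip the first two lines unconditionally.
sample | awk 'NR > 2 { printf ",%s,%s", $2, $3 } END { printf "\n" }'

# Option c: skip the header and any line with no fields (NF is 0 on an empty line).
sample | awk 'NR > 1 && NF { printf ",%s,%s", $2, $3 } END { printf "\n" }'
```

On this input the first command prints ",,,1140818,56858" and the other two print ",1140818,56858". Option b would drop the first data line on a host where df prints no extra line, so option c looks like the safer choice for a script shared between hosts. Note that NF only filters lines that are completely empty; a line carrying a single field would still get through.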

NevemTeve · 09-30-2014, 11:25 PM

My df(1) manual suggests using the '-P' option to get a 'standard, portable output format'.

master-of-puppets · 09-30-2014, 11:35 PM

Quote:

Originally Posted by grail

So a few things:

1. Others have already advised to look at your df output

2. sed code is not needed in original awk as it says NR > 1, which means process all but the first line

3. The above obviously does not work, as now the data starts on the third line. So I see your options as:

a. See point 1

b. NR > 2

c. NR > 1 && NF

You are going to have to learn more about awk if you are going to continue to use it. May I suggest looking at the manual

You are awesome.

The first pass:

Code:

[21:32:31] billsb@sideshow:/share/es-ops/scripts/BUILD_FARM $ bash -x ./test.sh
+ OUTPUT_DIR=/share/es-ops/Build_Farm_Reports/WorkSpace_Reports
+ BASE=/export/ws
++ date +%m-%d-%y
+ TODAY=09-30-14
++ hostname
+ HOSTNAME=sideshow
+ case "$HOSTNAME" in
+ WORKSPACES3=("bob_avail" "bob_used" "mel_avail" "mel_used" "sideshow-ws2_avail" "sideshow-ws2_used")
+ '[' -f test.csv ']'
++ hostname
+ '[' sideshow == sideshow ']'
+ echo sideshow
+ separator=,
+ for v in '"${WORKSPACES3[@]}"'
+ echo -n ,bob_avail
+ for v in '"${WORKSPACES3[@]}"'
+ echo -n ,bob_used
+ for v in '"${WORKSPACES3[@]}"'
+ echo -n ,mel_avail
+ for v in '"${WORKSPACES3[@]}"'
+ echo -n ,mel_used
+ for v in '"${WORKSPACES3[@]}"'
+ echo -n ,sideshow-ws2_avail
+ for v in '"${WORKSPACES3[@]}"'
+ echo -n ,sideshow-ws2_used
+ echo
+ case "$HOSTNAME" in
+ WORKSPACES4=("bob" "mel" "sideshow-ws2")
+ df -m /export/ws/bob /export/ws/mel /export/ws/sideshow-ws2
+ awk '
BEGIN  { "date +%m-%d-%y" | getline date;
                         printf "%s",date }
        NR > 1 && NF { printf ",%s,%s", $2, $3; }
        END    { printf "\n"}'

[21:32:35] billsb@sideshow:/share/es-ops/scripts/BUILD_FARM $ cat test.csv
sideshow
,bob_avail,bob_used,mel_avail,mel_used,sideshow-ws2_avail,sideshow-ws2_used
09-30-14,1032124,531937,1032124,722301,1032108,151337

Good so far, and now the second pass:

Code:

[21:32:40] billsb@sideshow:/share/es-ops/scripts/BUILD_FARM $ bash -x ./test.sh
+ OUTPUT_DIR=/share/es-ops/Build_Farm_Reports/WorkSpace_Reports
+ BASE=/export/ws
++ date +%m-%d-%y
+ TODAY=09-30-14
++ hostname
+ HOSTNAME=sideshow
+ case "$HOSTNAME" in
+ WORKSPACES3=("bob_avail" "bob_used" "mel_avail" "mel_used" "sideshow-ws2_avail" "sideshow-ws2_used")
+ '[' -f test.csv ']'
+ '[' -f test.csv ']'
++ hostname
+ '[' sideshow == sideshow ']'
+ case "$HOSTNAME" in
+ WORKSPACES5=("bob" "mel" "sideshow-ws2")
+ df -m /export/ws/bob /export/ws/mel /export/ws/sideshow-ws2
+ awk '
BEGIN  { "date +%m-%d-%y" | getline date;
                         printf "%s",date }
        NR > 1 && NF { printf ",%s,%s", $2, $3; }
        END    { printf "\n"}'

[21:33:59] billsb@sideshow:/share/es-ops/scripts/BUILD_FARM $ cat test.csv
sideshow
,bob_avail,bob_used,mel_avail,mel_used,sideshow-ws2_avail,sideshow-ws2_used
09-30-14,1032124,531937,1032124,722301,1032108,151337
09-30-14,1032124,531937,1032124,722301,1032108,151337

Thanks so much this is SOLVED!!!!!

pan64 · 10-01-2014, 12:48 AM

I would not put that date call into awk, instead:

Code:

awk -v A="$(date +%m-%d-%y)" ' BEGIN { printf A } '

looks better, at least to me.

About looking for valid lines, you can also do the following:

Code:

df -m | awk '/% \// { printf ",%s,%s", $2, $3; }'
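
Putting the two ideas together, a rough sketch might look like the following. The csv_row name and the today variable are invented for this example, and it assumes the usual df layout where the Use% column sits directly before the mount point.

```shell
#!/bin/bash
# Hypothetical helper: $1 is the date string; df output arrives on stdin.
csv_row() {
    awk -v today="$1" '
        BEGIN  { printf "%s", today }        # "%s" keeps a stray % in the value harmless
        /% \// { printf ",%s,%s", $2, $3 }   # only lines where "% /" joins Use% and the mount point
        END    { printf "\n" }'
}

df -m / | csv_row "$(date +%m-%d-%y)"
```

The pattern match replaces the NR test, so the header and any empty line are skipped without counting lines. Passing the variable through printf "%s" rather than using it as the format string avoids surprises if the value ever contains a percent sign.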

grail · 10-01-2014, 02:37 AM

Is there something wrong with awk's date abilities?

Code:

awk 'BEGIN{print strftime("%m-%d-%Y")}'

ntubski · 10-01-2014, 09:53 AM

Quote:

Originally Posted by grail

Is there something wrong with awk's date abilities?

Portability:

Quote:

9.1.5 Time Functions

gawk provides the following functions for working with timestamps. They are gawk extensions; they are not specified in the POSIX standard. However, recent versions of mawk (see Other Versions) also support these functions.
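
One way to live with that portability gap is to probe for strftime() and fall back to the date command. This is only a sketch: the today function is a made-up name, and it assumes an awk without strftime() exits with an error when the function is called.

```shell
#!/bin/bash
# Hypothetical helper: use awk's strftime() where the installed awk provides it,
# otherwise fall back to the external date command.
today() {
    if awk 'BEGIN { print strftime("%m-%d-%y") }' >/dev/null 2>&1; then
        awk 'BEGIN { print strftime("%m-%d-%y") }'
    else
        date +%m-%d-%y
    fi
}

today
```

For a script that has to run on both the gawk and the mawk hosts, calling date directly is arguably simpler than the probe.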

master-of-puppets · 10-01-2014, 03:22 PM

Quote:

Originally Posted by pan64

I would not put that date call into awk, instead:

Code:

awk -v A="$(date +%m-%d-%y)" ' BEGIN { printf A } '
looks better at least for me.

About looking for valid lines you can do the following too:
df -m | awk '/% \// { printf ",%s,%s", $2, $3; }'

I'll try it when I get some time, thanks. Plus, I'm taking an awk tutorial to save you guys headaches.

---------- Post added 10-01-14 at 03:23 PM ----------

Quote:

Originally Posted by grail

Is there something wrong with awk's date abilities?

Code:

awk 'BEGIN{print strftime("%m-%d-%Y")}'

Thanks I'll give it a try when I get the chance.

master-of-puppets · 10-01-2014, 03:23 PM

Quote:

Originally Posted by ntubski

Portability:

Awesome information thank you very much.

master-of-puppets · 10-01-2014, 03:25 PM

Quote:

Originally Posted by grail

So a few things:

1. Others have already advised to look at your df output

2. sed code is not needed in original awk as it says NR > 1, which means process all but the first line

3. The above obviously does not work, as now the data starts on the third line. So I see your options as:

a. See point 1

b. NR > 2

c. NR > 1 && NF

You are going to have to learn more about awk if you are going to continue to use it. May I suggest looking at the manual

How do I give you credit for solving the issue? I looked around for a "Solved" button but couldn't find one.