LinuxQuestions.org
Old 10-02-2014, 01:27 AM   #1
master-of-puppets
Member
 
Registered: Jun 2011
Posts: 49

Rep: Reputation: Disabled
Round numbers to two digits after the decimal point


This is taken from a script I wrote to find the top 5 consumers of space in each workspace directory.

This command:
Code:
find $BASE/$v -printf "%u  %s\n" | awk '{user[$1]+=$2}; END{ for( i in user) print i " " user[i]}' | sed -e '/^[0-9]/d' | sed -e 's/root//g' | sed -e '/^ [0-9]/d' | awk '{print $1, $2/1024/1024/1024, "GB"}' | sort -n -r -k2 | head -5
gives me output like this:
Code:
Top 5 consumers of space per workspace on server sideshow 09-29-14
,,,
,,,
Top 5 consumers on workspace bob
,,,
radickj 97.0708 GB
nichols2 90.4442 GB
sherryr 75.3845 GB
rabii 67.4304 GB
lefevre 39.0694 GB
,,,
Top 5 consumers on workspace mel
,,,
akrishna 125.225 GB
somyalip 124.585 GB
mvijayas 105.741 GB
release 102.279 GB
vuhang 83.4457 GB
,,,
Top 5 consumers on workspace sideshow-ws2
,,,
marlette 124.913 GB
iyershan 35.785 GB
starkd 19.3732 GB
jcook 3.8147e-06 GB
baylisn 3.8147e-06 GB
,,,
and I want to round column 2 to have two digits after the decimal point.

This one gets me a little bit closer (good old Perl):
Code:
#!/bin/bash
find . -printf "%u  %s\n" | awk '{user[$1]+=$2}; END{ for( i in user) print i " " user[i]}' | perl -ne '
    @x = split;
    for ($i = 0; $i <= $#x; $i++) {
        if ($x[$i] =~ /^[0-9]*\.[0-9]+$/) {
            $x[$i] = int ($x[$i] * 100 + .5) / 100;
        };
        print "$x[$i] ";
    };
    print "\n";' | sed -e '/^[0-9]/d' | sed -e 's/root//g' | sed -e '/^ [0-9]/d' | awk '{print $1, $2/1024/1024/1024, "GB"}' | sort -n -r -k2
with output like this:
Code:
[23:41:22] billsb@sideshow:~ $ bash -x ./test2.sh
+ find . -printf '%u  %s\n'
+ awk '{user[$1]+=$2}; END{ for( i in user) print i " " user[i]}'
+ perl -ne '
    @x = split;
    for ($i = 0; $i <= $#x; $i++) {
        if ($x[$i] =~ /^[0-9]*\.[0-9]+$/) {
            $x[$i] = int ($x[$i] * 100 + .5) / 100;
        };
        print "$x[$i] ";
    };
+ sed -e '/^[0-9]/d'
    print "\n";'
+ sed -e s/root//g
+ sed -e '/^ [0-9]/d'
+ awk '{print $1, $2/1024/1024/1024, "GB"}'
+ sort -n -r -k2
billsb 7.65496 GB
fongs 0.234436 GB
But obviously there are more than two digits after the decimal point.

Found this nice sed one liner:
Code:
sed -re 's/([0-9]+\.[0-9]{2})[0-9]+/\1/g' "$OUTPUTDIR"/*_top_5_per_workspace_*
Output:
Code:
Top 5 consumers on workspace ned
,,,
agnewjo 261699 GB
keowng 115.72 GB
hayesa 107.61 GB
mirajeev 79.96 GB
mcgleena 74.28 GB
,,,
Top 5 consumers on workspace rod
,,,
tsilvers 42.63 GB
htalgery 14.34 GB
brianr 11.59 GB
xwang27 10.57 GB
ashok 9.78 GB
,,,
Top 5 consumers on workspace todd
,,,
williamc 30.24 GB
brianr 18.80 GB
mcgleena 17.51 GB
kinnairp 16.11 GB
dnair 12.24 GB
,,,
Top 5 consumers on workspace to-delete
,,,
dstein 10.89 GB
jdavis 5.77e-08 GB
msebree 2.36e-05 GB
But there's one issue with "e-0x":
Code:
jdavis 5.77e-08 GB
msebree 2.36e-05 GB
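
The sed expression cuts digits off rather than rounding them, and it leaves any exponent suffix in place, which is where `5.77e-08` comes from. As a rough sketch only, awk's printf can round the second column of lines that are already in the `name value GB` form and pass everything else through. The function name `round_gb_column` is made up for this example, and the three-field `GB` layout is assumed from the output above.

```shell
#!/bin/sh
# round_gb_column: hypothetical helper name, not part of the original script.
# Rounds column 2 of "name value GB" lines to two decimals and passes every
# other line (headings, ",,," separators) through unchanged.
round_gb_column() {
    awk 'NF == 3 && $3 == "GB" { printf("%s %.2f GB\n", $1, $2); next } { print }'
}
```

It could be run as `round_gb_column < report.csv`. Because `%.2f` converts the field to a number first, a value such as `5.77e-08` comes out as `0.00` instead of keeping its exponent.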

Last edited by master-of-puppets; 10-02-2014 at 07:10 AM.
 
Old 10-02-2014, 05:22 AM   #2
master-of-puppets
Member
 
Registered: Jun 2011
Posts: 49

Original Poster
Rep: Reputation: Disabled
I figured it out:
Code:
find . -printf "%u  %s\n" | awk '{user[$1]+=$2}; END{ for( i in user) print i " " user[i]}' | sed -e '/^[0-9]/d' | sed -e 's/root//g' | sed -e '/^ [0-9]/d' |  awk '{printf($1",""%.2f\n", $2/1024/1024/1024)}'
output when run against my home directory:
Code:
[03:18:35] billsb@sideshow:~ $  find . -printf "%u  %s\n" | awk '{user[$1]+=$2}; END{ for( i in user) print i " " user[i]}' | sed -e '/^[0-9]/d' | sed -e 's/root//g' | sed -e '/^ [0-9]/d' |  awk '{printf($1",""%.2f\n", $2/1024/1024/1024)}'
fongs,0.23
billsb,7.65
How can I put the GB in the third column again?

I figured that out too:
Code:
| awk '{$NF=$NF"GB"; print}'
Now I get:
Code:
fongs,0.23GB
billsb,7.65GB

Last edited by master-of-puppets; 10-02-2014 at 05:47 AM.
 
Old 10-02-2014, 09:13 AM   #3
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,999

Rep: Reputation: 3190
I can't say I kept following your other thread to find out how you got here, but I can tell you (if no one has before) that there is no reason to use multiple seds and awks.

I am not sure why you would have lines starting with a number, or with a space and then a number, but if you want to exclude 'root', simply put that as an if in your first awk.
Your last awk can also be folded into the first by using printf (or look up OFMT, I think that is what it is called from memory, in the manual) to display your output as needed.

Further investigation in the manual will also show that awk can do exponentiation, so you do not have to divide by 1024 multiple times.

I also see no need for the additional step (I assume another awk) to display GB: just put it at the end of your printf format, before the newline.
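
To make the OFMT versus printf hint concrete, here is a small sketch rather than a definitive recipe. Both helper names are invented for this example, and the input is assumed to be `owner bytes` lines like the ones the first awk prints.

```shell
#!/bin/sh
# gb_with_ofmt and gb_with_printf are hypothetical helper names for this sketch.
# Both read "owner bytes" lines on stdin.

# OFMT controls how print converts non-integer numbers to text.
# Values that happen to be whole numbers are still printed as integers.
gb_with_ofmt() {
    awk 'BEGIN { OFMT = "%.2f" } { print $1, $2 / 2^30, "GB" }'
}

# printf formats every value the same way, whole numbers included.
gb_with_printf() {
    awk '{ printf("%s %.2f GB\n", $1, $2 / 2^30) }'
}
```

With OFMT an owner holding exactly 1 GiB would show up as `1 GB` instead of `1.00 GB`, so printf is probably the more predictable choice for a report.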
 
2 members found this post helpful.
Old 10-02-2014, 01:59 PM   #4
master-of-puppets
Member
 
Registered: Jun 2011
Posts: 49

Original Poster
Rep: Reputation: Disabled
I tried putting the GB at the end of the printf in several different ways. I played around with that printf six ways from Sunday until I found that extra awk solution. Maybe you could show me how. Thanks grail. I'll try your other suggestions too; it's just that there's an outage on our build servers right now.
 
Old 10-02-2014, 09:21 PM   #5
master-of-puppets
Member
 
Registered: Jun 2011
Posts: 49

Original Poster
Rep: Reputation: Disabled
grail, you were asking why I used sed to remove any line starting with a number and any line starting with a space and a number:
Code:
fongs 251723429
billsb 8219608787
4294967294 430626867
root 454
25 15609
The first part of my find command:
Code:
find  . -printf "%u  %s\n" | awk '{user[$1]+=$2}; END{ for( i in user) print i " " user[i]}'
shows the total bytes for each user and returns two rows that have two columns of numbers. If you can show me how to do the arithmetic in awk, and how to remove those number-only rows without sed, I would appreciate it.

Thanks.

Last edited by master-of-puppets; 10-02-2014 at 09:23 PM.
 
Old 10-02-2014, 09:49 PM   #6
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,999

Rep: Reputation: 3190
Hmmm ... your output suggests that you have two users without names, whose user IDs are 4294967294 and 25. The second may well exist, but the first is extremely unlikely (and, if I remember correctly, not even possible). So either you are not showing the full command or something is very wrong somewhere, which should probably be investigated before continuing.

1. As to GB, I do not see why the following would not work:
Code:
printf("%s,%.2fGB\n", $1, $2/1024/1024/1024)
2. To exclude, say, 'root', a simple 'if' inside your 'for' loop would suffice.

3. As for the arithmetic, I believe I have pointed you to the manual page previously, so a little bit of work on your side should be enough to work that one out.
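
One way the three points above might fit into a single awk call, offered as a sketch under a few assumptions. `top_consumers` is a made-up name. The input is assumed to be the `owner bytes` lines from find. Owners that consist only of digits are skipped on the assumption that they are user IDs without a name, since find's `%u` prints the numeric ID in that case.

```shell
#!/bin/sh
# top_consumers: hypothetical helper name, not from the thread.
# Reads "owner bytes" lines on stdin (the format find -printf "%u  %s\n" emits)
# and prints the N largest owners as "owner,GB-value,GB" with two decimals.
top_consumers() {
    n="${1:-5}"
    awk '
        { bytes[$1] += $2 }                      # sum bytes per owner
        END {
            for (owner in bytes) {
                if (owner == "root") continue        # point 2: skip root
                if (owner ~ /^[0-9]+$/) continue     # skip owners that are only a numeric ID
                # point 3: 2^30 replaces /1024/1024/1024
                # point 1: the GB label sits inside the printf format
                printf("%s,%.2f,GB\n", owner, bytes[owner] / 2^30)
            }
        }
    ' | sort -t, -k2,2nr | head -n "$n"
}
```

It would slot in as `find "$BASE/$v" -printf "%u  %s\n" | top_consumers 5`. The sort stays outside awk because the order of `for (owner in bytes)` is not defined.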
 
1 members found this post helpful.
Old 10-02-2014, 10:04 PM   #7
master-of-puppets
Member
 
Registered: Jun 2011
Posts: 49

Original Poster
Rep: Reputation: Disabled
Okay, thanks grail. Would these improvements make the script more efficient? In other words, would the script run any faster? It takes 2 days on a couple of my servers. I'll go ahead and look at the man page for awk.

Last edited by master-of-puppets; 10-03-2014 at 01:00 AM.
 
Old 10-02-2014, 11:50 PM   #8
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,999

Rep: Reputation: 3190
Hmmm ... that is hard to say without knowing the amount of data being processed. If it currently takes 2 days, I would suggest you overhaul the whole thing, as that sounds unacceptable from a business usability standpoint.

Generally, if you can remove superfluous calls to multiple commands things should run at least a little quicker.
So looking at something from the first post:
Code:
#!/bin/bash
find . -printf "%u  %s\n" | awk '{user[$1]+=$2}; END{ for( i in user) print i " " user[i]}' | perl -ne '
    @x = split;
    for ($i = 0; $i <= $#x; $i++) {
        if ($x[$i] =~ /^[0-9]*\.[0-9]+$/) {
            $x[$i] = int ($x[$i] * 100 + .5) / 100;
        };
        print "$x[$i] ";
    };
    print "\n";' | sed -e '/^[0-9]/d' | sed -e 's/root//g' | sed -e '/^ [0-9]/d' | awk '{print $1, $2/1024/1024/1024, "GB"}' | sort -n -r -k2
Here every stage after a | (awk, perl, three seds, another awk, and sort) is an additional external command called on the output of the previous one.

One thing I would note is that you are calling everything inside a bash script, yet you also call something as powerful as Perl. It would probably make a lot of sense to rewrite the entire script in Perl, as it is an exceptional tool and well suited to all the tasks you are handing to the external commands (except maybe find, but I am not a Perl guru, so it may be able to do that as well).

I can say for certain that it will easily perform all the tasks for which you are currently using:

bash
sed
awk
sort

I realise this would be a large amount of work, but the benefit would be of an equal magnitude (at least looking at a high level; again, there are many others on here with more Perl experience who may weigh in and advise).
 
1 members found this post helpful.
Old 10-03-2014, 12:57 AM   #9
master-of-puppets
Member
 
Registered: Jun 2011
Posts: 49

Original Poster
Rep: Reputation: Disabled
Thanks a lot grail. I will need to break down and take some time on the weekend to learn the ins and outs from scratch, because I agree with you: piping all the data through each and every one of these external commands takes time.

Okay so now let me show you what I cobbled together from what I found online adding my own clumsy external commands.

The script in its entirety:
Code:
#!/bin/bash
OUTPUT_DIR=/share/es-ops/Build_Farm_Reports/WorkSpace_Reports
BASE=/export/ws
TODAY=`date +"%m-%d-%y"`
HOSTNAME=`hostname`
case "$HOSTNAME" in
        sideshow) WORKSPACES=(bob mel sideshow-ws2) ;;
        simpsons) WORKSPACES=(bart homer lisa marge releases rt-private simpsons-ws0 simpsons-ws1 simpsons-ws2 vsimpsons-ws) ;;
        moes)     WORKSPACES=(barney carl lenny moes-ws2) ;;
        flanders) WORKSPACES=(flanders-ws0 flanders-ws1 flanders-ws2 maude ned rod todd to-delete) ;;
esac
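# ${HOSTNAME} needs braces, otherwise bash reads $HOSTNAME_top_5_..._ as one (unset) variable name
if ! [ -f "$OUTPUT_DIR/${HOSTNAME}_top_5_per_workspace_$TODAY.csv" ]; then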
echo "Top 5 consumers of space per workspace on server `hostname` $TODAY" > $OUTPUT_DIR/"$HOSTNAME"_top_5_per_workspace_$TODAY.csv
echo ",,," >> $OUTPUT_DIR/"$HOSTNAME"_top_5_per_workspace_$TODAY.csv
echo ",,," >> $OUTPUT_DIR/"$HOSTNAME"_top_5_per_workspace_$TODAY.csv
for v in "${WORKSPACES[@]}"
do
echo "Top 5 consumers on workspace $v" >> $OUTPUT_DIR/"$HOSTNAME"_top_5_per_workspace_$TODAY.csv
echo ",,," >> $OUTPUT_DIR/"$HOSTNAME"_top_5_per_workspace_$TODAY.csv
#find $BASE/$v -printf "%u  %s\n" | awk '{user[$1]+=$2}; END{ for( i in user) print i " " user[i]}' | sed -e '/^[0-9]/d' | sed -e 's/root//g' | sed -e '/^ [0-9]/d' | awk '{print $1, $2/1024/1024/1024, "GB"}' | sort -n -r -k2 | head -5 >> $OUTPUT_DIR/"$HOSTNAME"_top_5_per_workspace_$TODAY.csv
find  $BASE/$v -printf "%u  %s\n" | awk '{user[$1]+=$2}; END{ for( i in user) print i " " user[i]}' | sed -e '/^[0-9]/d' | sed -e 's/root//g' | sed -e '/^ [0-9]/d' |  awk '{printf($1",""%.2f\n", $2/1024/1024/1024)}' | awk '{$NF=$NF",GB"; print}' | sort -t, -k+2 -n -r | head -5 >> $OUTPUT_DIR/"$HOSTNAME"_top_5_per_workspace_$TODAY.csv
echo ",,," >> $OUTPUT_DIR/"$HOSTNAME"_top_5_per_workspace_$TODAY.csv
done
fi
And a small sample of output:
Code:
Top 5 consumers of space per workspace on server sideshow 10-02-14
,,,
,,,
Top 5 consumers on workspace bob
,,,
radickj,97.36,GB
nichols2,90.35,GB
sherryr,70.74,GB
rabii,67.48,GB
lefevre,39.07,GB
,,,
Top 5 consumers on workspace mel
,,,
somyalip,143.54,GB
mvijayas,117.08,GB
release,102.27,GB
vuhang,87.04,GB
akrishna,85.89,GB
,,,
Top 5 consumers on workspace sideshow-ws2
,,,
marlette,97.30,GB
iyershan,35.78,GB
starkd,23.39,GB
maoze,3.61,GB
linalb,2.98,GB
,,,
I have another script that combines the data from all four build servers along with data that has been generated by two other scripts and then an email script that attaches the .csv file to an email and sends it to our admin who will be making graphs and presentations out of this stuff.

I'll check back with you after I have figured out how to rewrite this thing and get the same kind of output in less time.

Thanks for all your help. Oh and I agree we should rewrite that original Perl script that gives disk usage statistics. Perl seems so much faster.

Last edited by master-of-puppets; 10-03-2014 at 01:08 AM.
 
Old 10-03-2014, 02:18 AM   #10
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,999

Rep: Reputation: 3190
Well here are some options that make it a little cleaner (can't say that any of these changes would improve the speed directly):
Code:
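# Braces keep bash from reading $HOSTNAME_top_5_per_workspace_ as one variable name
path_to_file="$OUTPUT_DIR/${HOSTNAME}_top_5_per_workspace_$TODAY.csv"

if ! [[ -f "$path_to_file" ]]; then
  # Here-document bodies and terminators start in column 1 (<<- strips only tabs, not spaces)
  cat >>"$path_to_file" <<EOF
Top 5 consumers of space per workspace on server $(hostname) $TODAY
,,,
,,,
EOF
  for v in "${WORKSPACES[@]}"
  do
    # One awk sums per user, skips root, converts to GB (2^30 bytes) and formats to two decimals
    cat >>"$path_to_file" <<EOF
Top 5 consumers on workspace $v
,,,
$(find "$BASE/$v" -printf "%u  %s\n" | awk '{user[$1]+=$2}; END{ for(i in user)if(i != "root")printf("%s, %.2fGB\n",i,user[i]/2^30)}' | sort -t, -k2 -n -r | head -5)
,,,
EOF
  done
fi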
 
2 members found this post helpful.
Old 10-03-2014, 10:04 AM   #11
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,774

Rep: Reputation: 2081
Quote:
Originally Posted by grail
the benefit would be of an equal magnitude
I disagree. Notice that the number of external programs called is constant, i.e. not proportional to the size of the input. I would guess that the part taking most of the time is
Code:
find . -printf "%u  %s\n"
because it has to crawl over a lot of files. Disk I/O takes a lot of time.
 
1 members found this post helpful.
Old 10-03-2014, 12:34 PM   #12
master-of-puppets
Member
 
Registered: Jun 2011
Posts: 49

Original Poster
Rep: Reputation: Disabled
Wow grail, awesome. That's the kind of awk code that I couldn't write on my own unless I broke down and learned awk from scratch, which would take some time. I will break down and learn it, but this is awesome. I think it will help speed things up, and it's more elegant and cleaner too. Thanks a million.

---------- Post added 10-03-14 at 12:35 PM ----------

Quote:
Originally Posted by ntubski
I disagree, notice that the number of external programs called is constant, i.e. not proportional to the input. I would guess that the part that is taking most of the time is
Code:
find . -printf "%u  %s\n"
because it has to crawl over a lot of files. Disk I/O takes a lot of time.
Do you think du would be faster? I should look online to see some speed comparisons between du and find. Thanks for the input.
 
Old 10-03-2014, 01:55 PM   #13
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: Mint, Armbian, NetBSD, Puppy, Raspbian
Posts: 3,515

Rep: Reputation: 239
Code:
 du -s */  | sort -rn | nl|  head -5
     1  202864  s001/
     2  60676   x007/
     3  11800   v039/
     4  8268    t501/
     5  8268    ray/
 
1 members found this post helpful.
Old 10-03-2014, 10:15 PM   #14
master-of-puppets
Member
 
Registered: Jun 2011
Posts: 49

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by bigearsbilly
Code:
 du -s */  | sort -rn | nl|  head -5
     1  202864  s001/
     2  60676   x007/
     3  11800   v039/
     4  8268    t501/
     5  8268    ray/
Thanks so much. Do you know whether this is faster than find?
 
Old 10-04-2014, 10:32 AM   #15
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,774

Rep: Reputation: 2081
Quote:
Originally Posted by master-of-puppets
Do you think du would be faster?
I don't think it would be much faster because it still has to read the same data from the disk. It would make your script a lot simpler though, as bigearsbilly's post demonstrated, because it does the summing up for you, so it's still a better choice than find.
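
As a rough sketch of the du route with the two-decimal GB formatting from earlier in the thread: note that du totals per directory, not per file owner, so this only matches the per-user report if each user's files sit under that user's own directory. The names `kib_to_gb` and `top_dirs_gb` are invented for this example, and `-k` is passed so that du reports 1 KiB blocks regardless of its default.

```shell
#!/bin/sh
# kib_to_gb and top_dirs_gb are hypothetical helper names for this sketch.

# Convert "KiB<TAB>name" lines (the du -sk output format) to "name,value,GB"
# with two decimals. 1 GiB = 2^20 KiB. Assumes names contain no tabs or newlines.
kib_to_gb() {
    awk -F '\t' '{ printf("%s,%.2f,GB\n", $2, $1 / 2^20) }'
}

# Print the N largest immediate subdirectories of DIR (default N is 5).
top_dirs_gb() {
    dir="$1"
    n="${2:-5}"
    ( cd "$dir" && du -sk -- */ ) | sort -rn | head -n "$n" | kib_to_gb
}
```

For one workspace it might be called as `top_dirs_gb "$BASE/$v" 5`. Sorting happens on the raw KiB counts before conversion, so small directories that all round to `0.00` still keep their true order.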

For speed, I think you should look at disk quota systems. Those tools sit at the level of the filesystem so they can compute the sum incrementally as files are added rather than starting from scratch each time.
 
1 members found this post helpful.
  


All times are GMT -5.
