[SOLVED] Round numbers to two digits after the decimal point
So I can't say I kept following your other thread to find out how you got here, but I can tell you (if no one has before) that there is no reason to use multiple seds and awks.
Now I am not sure why you would have lines starting with numbers, or a space and then a number, but if you want to exclude 'root', simply put that as an if in your first awk.
Your last awk can then also be folded into the first by using printf (or look up OFMT in the manual; it controls how print formats numbers) to display your output as needed.
Also, further investigation in the manual will show you can do exponentiation in awk, so you do not have to divide by 1024 multiple times.
I also see no need for the additional step (I assume another awk) to display GB ... just put it at the end of your printf format, before the newline.
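For instance, OFMT is indeed the right variable; a minimal sketch of both formatting routes (the input number here is made up):

```shell
# Two ways to get two decimals in awk: an explicit printf format, or the
# OFMT variable, which controls how plain print renders non-integer numbers.
echo '3.14159' | awk '{ printf("%.2f\n", $1) }'
echo '3.14159' | awk 'BEGIN { OFMT = "%.2f" } { print $1 + 0 }'
# both lines print 3.14
```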
I tried putting the GB at the end of the printf in several different ways. I played around with that printf six ways this side of Sunday until I found that extra-awk solution. Maybe you could show me how. Thanks, grail. I'll try your other solutions too; it's just that there's an outage on our build servers right now.
find . -printf "%u %s\n" | awk '{user[$1]+=$2}; END{ for( i in user) print i " " user[i]}'
shows total bytes per user and returns two rows that have two columns of numbers. Now if you can show me how to do the arithmetic in awk, and how to remove those rows that contain only numbers without sed, I would appreciate it.
Thanks.
Last edited by master-of-puppets; 10-02-2014 at 09:23 PM.
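One way the numbers-only rows could be dropped without sed, sketched on fabricated input: awk can guard the summing action with a regex on the first field, so unresolved numeric uids never make it into the totals (2^30 is one GB; GNU awk also accepts 2**30).

```shell
# Skip records whose first field is purely numeric (an unresolved uid)
# before summing, replacing the separate sed '/^[0-9]/d' step.
printf '4294967294 100\nalice 2147483648\n' |
awk '$1 !~ /^[0-9]+$/ { user[$1] += $2 }
     END { for (i in user) printf("%s %.2fGB\n", i, user[i] / 2^30) }'
# alice 2.00GB
```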
Hmmm ... your output would suggest that you have 2 users without names, with user ids 4294967294 and 25. The second may be real, but the first is extremely unlikely (and, if I remember correctly, not even possible). So either you are not showing the full command, or something is very wrong somewhere and should probably be investigated before continuing.
1. As for GB, I do not see why the following would not work (pass the username as a %s argument instead of splicing it into the format string, which would break on any % in the data):
Code:
printf("%s %.2fGB\n", $1, $2/1024/1024/1024)
2. To exclude, say, 'root', a simple 'if' inside your 'for' loop would suffice.
3. As for the arithmetic, I believe I have provided the manual page to you previously, so a little work on your side should sort that one out.
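Putting those three points together on some made-up find output (the usernames and sizes are fabricated), the whole filter/format/sort chain fits in one awk plus the sort:

```shell
# Sum bytes per user, skip root inside the END loop, print GB to two
# decimals, then sort descending on the comma-separated size column.
printf 'root 99\nalice 2147483648\nbob 1073741824\n' |
awk '{ user[$1] += $2 }
     END { for (i in user)
               if (i != "root")
                   printf("%s,%.2f,GB\n", i, user[i] / 2^30) }' |
sort -t, -k2 -nr
# alice,2.00,GB
# bob,1.00,GB
```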
Okay, thanks grail. Now, would these improvements make the script more efficient? In other words, would the script run any faster? It takes 2 days on a couple of my servers. I'll go ahead and look at the man page for awk. Thanks, grail.
Last edited by master-of-puppets; 10-03-2014 at 01:00 AM.
Hmmm ... hard to say without knowing the amount of data being processed. If it currently takes 2 days, I would suggest the whole thing needs an overhaul, as that sounds unacceptable from a business usability standpoint.
Generally, if you can remove superfluous calls to multiple commands, things should run at least a little quicker.
So, looking at something from the first post:
Code:
#!/bin/bash
find . -printf "%u %s\n" | awk '{user[$1]+=$2}; END{ for( i in user) print i " " user[i]}' | perl -ne '
@x = split;
for ($i = 0; $i <= $#x; $i++) {
if ($x[$i] =~ /^[0-9]*\.[0-9]+$/) {
$x[$i] = int ($x[$i] * 100 + .5) / 100;
};
print "$x[$i] ";
};
print "\n";' | sed -e '/^[0-9]/d' | sed -e 's/root//g' | sed -e '/^ [0-9]/d' | awk '{print $1, $2/1024/1024/1024, "GB"}' | sort -n -r -k2
Every pipe in that chain is a call to an additional external command, separate from the previous one.
One thing I would note is that you are calling everything inside a bash script, yet you also call something as powerful as Perl ... it would probably make a lot of sense to re-write the entire script simply in Perl, as it is an exceptional tool and well versed in all the tasks you are delegating to external commands (even the find step, via Perl's File::Find module, though I am not a Perl guru).
I can say for certain that it will easily perform all tasks where you are currently using:
bash
sed
awk
sort
I realise this would be a large amount of work, but the benefit would be of equal magnitude (at least at a high level; there are many others on here with more Perl experience who may weigh in and advise).
Thanks a lot, grail. I will need to break down and take some time over the weekend to learn the ins and outs from scratch, because I agree with you: all these external commands take time as the data is piped through each and every one of them.
Okay, so now let me show you what I cobbled together from what I found online, adding my own clumsy external commands.
The script in its entirety:
Code:
#!/bin/bash
OUTPUT_DIR=/share/es-ops/Build_Farm_Reports/WorkSpace_Reports
BASE=/export/ws
TODAY=`date +"%m-%d-%y"`
HOSTNAME=`hostname`
case "$HOSTNAME" in
sideshow) WORKSPACES=(bob mel sideshow-ws2) ;;
simpsons) WORKSPACES=(bart homer lisa marge releases rt-private simpsons-ws0 simpsons-ws1 simpsons-ws2 vsimpsons-ws) ;;
moes) WORKSPACES=(barney carl lenny moes-ws2) ;;
flanders) WORKSPACES=(flanders-ws0 flanders-ws1 flanders-ws2 maude ned rod todd to-delete) ;;
esac
if ! [ -f "$OUTPUT_DIR/${HOSTNAME}_top_5_per_workspace_$TODAY.csv" ]; then
echo "Top 5 consumers of space per workspace on server `hostname` $TODAY" > $OUTPUT_DIR/"$HOSTNAME"_top_5_per_workspace_$TODAY.csv
echo ",,," >> $OUTPUT_DIR/"$HOSTNAME"_top_5_per_workspace_$TODAY.csv
echo ",,," >> $OUTPUT_DIR/"$HOSTNAME"_top_5_per_workspace_$TODAY.csv
for v in "${WORKSPACES[@]}"
do
echo "Top 5 consumers on workspace $v" >> $OUTPUT_DIR/"$HOSTNAME"_top_5_per_workspace_$TODAY.csv
echo ",,," >> $OUTPUT_DIR/"$HOSTNAME"_top_5_per_workspace_$TODAY.csv
#find $BASE/$v -printf "%u %s\n" | awk '{user[$1]+=$2}; END{ for( i in user) print i " " user[i]}' | sed -e '/^[0-9]/d' | sed -e 's/root//g' | sed -e '/^ [0-9]/d' | awk '{print $1, $2/1024/1024/1024, "GB"}' | sort -n -r -k2 | head -5 >> $OUTPUT_DIR/"$HOSTNAME"_top_5_per_workspace_$TODAY.csv
find $BASE/$v -printf "%u %s\n" | awk '{user[$1]+=$2}; END{ for( i in user) print i " " user[i]}' | sed -e '/^[0-9]/d' | sed -e 's/root//g' | sed -e '/^ [0-9]/d' | awk '{printf($1",""%.2f\n", $2/1024/1024/1024)}' | awk '{$NF=$NF",GB"; print}' | sort -t, -k+2 -n -r | head -5 >> $OUTPUT_DIR/"$HOSTNAME"_top_5_per_workspace_$TODAY.csv
echo ",,," >> $OUTPUT_DIR/"$HOSTNAME"_top_5_per_workspace_$TODAY.csv
done
fi
And a small sample of output:
Code:
Top 5 consumers of space per workspace on server sideshow 10-02-14
,,,
,,,
Top 5 consumers on workspace bob
,,,
radickj,97.36,GB
nichols2,90.35,GB
sherryr,70.74,GB
rabii,67.48,GB
lefevre,39.07,GB
,,,
Top 5 consumers on workspace mel
,,,
somyalip,143.54,GB
mvijayas,117.08,GB
release,102.27,GB
vuhang,87.04,GB
akrishna,85.89,GB
,,,
Top 5 consumers on workspace sideshow-ws2
,,,
marlette,97.30,GB
iyershan,35.78,GB
starkd,23.39,GB
maoze,3.61,GB
linalb,2.98,GB
,,,
I have another script that combines the data from all four build servers along with data that has been generated by two other scripts and then an email script that attaches the .csv file to an email and sends it to our admin who will be making graphs and presentations out of this stuff.
I'll check back with you after I have figured out how to rewrite this thing and get the same kind of output in less time.
Thanks for all your help. Oh and I agree we should rewrite that original Perl script that gives disk usage statistics. Perl seems so much faster.
Last edited by master-of-puppets; 10-03-2014 at 01:08 AM.
Well, here are some options that make it a little cleaner (I can't say that any of these changes would improve the speed directly):
Code:
path_to_file="$OUTPUT_DIR/${HOSTNAME}_top_5_workspace_$TODAY.csv"
if ! [[ -f "$path_to_file" ]]; then
cat>>"$path_to_file"<<-EOF
Top 5 consumers of space per workspace on server $(hostname) $TODAY
,,,
,,,
EOF
for v in "${WORKSPACES[@]}"
do
cat>>"$path_to_file"<<-EOF
Top 5 consumers on workspace $v
,,,
$(find "$BASE/$v" -printf "%u %s\n" | awk '{user[$1]+=$2}; END{ for(i in user)if(i != "root")printf("%s, %.2fGB\n",i,user[i]/2**30)}' | sort -t, -k+2 -n -r | head -5)
,,,
EOF
done
fi
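As an aside, the sort|head tail of that pipeline can be sanity-checked in isolation with fabricated rows: -t, splits on commas, -k2 with -n -r sorts the size field numerically descending, and head keeps the top entries.

```shell
# Fabricated "user,size,GB" rows to exercise just the sorting stage.
printf 'a,1.50,GB\nb,9.75,GB\nc,3.25,GB\n' | sort -t, -k2 -nr | head -2
# b,9.75,GB
# c,3.25,GB
```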
I disagree; notice that the number of external programs called is constant, i.e. not proportional to the input. I would guess that the part taking most of the time is
Code:
find . -printf "%u %s\n"
because it has to crawl over a lot of files. Disk I/O takes a lot of time.
Wow, grail, awesome! That's the kind of awk code that I couldn't write on my own unless I broke down and learned awk from scratch, which would take some time. I will break down and learn it, but this is awesome. I think it will help speed things up, and it's more elegant and cleaner too. Thanks a million.
---------- Post added 10-03-14 at 12:35 PM ----------
Quote:
Originally Posted by ntubski
I disagree; notice that the number of external programs called is constant, i.e. not proportional to the input. I would guess that the part taking most of the time is
Code:
find . -printf "%u %s\n"
because it has to crawl over a lot of files. Disk I/O takes a lot of time.
Do you think du would be faster? I should look online for some speed comparisons between du and find. Thanks for the input.
I don't think it would be much faster, because it still has to read the same data from the disk. It would make your script a lot simpler, though, as bigearsbilly's post demonstrated, since du does the summing for you, so it's still a better choice than find.
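For a per-directory total, a throwaway sketch with GNU du (the demo directory is created on the fly; per-user grouping would still need find's %u):

```shell
# du sums a tree on its own; with GNU du, -s gives one total per argument
# and -b counts apparent bytes.  The directory here is a disposable demo.
demo=$(mktemp -d)
printf '12345' > "$demo/f"            # a 5-byte file
du -sb "$demo"                        # prints total bytes and the path
rm -rf "$demo"
```

The total includes directory overhead, so the exact number varies by filesystem.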
For speed, I think you should look at disk quota systems. Those tools sit at the filesystem level, so they can maintain the per-user totals incrementally as files change rather than starting from scratch each time.