Linux - Newbie: This Linux forum is for members that are new to Linux.
Just starting out and have a question?
If it is not in the man pages or the how-to's, this is the place!
12-08-2012, 08:53 AM, #1
Senior Member
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,676
Once again... awk... awk... awk
I have a file, file1, containing 50000 entries (floating-point numbers only).
Here is what I am doing:
1. Counting the total number of lines
2. Counting the lines whose value is less than 1 (val1)
3. Counting the lines whose value is greater than 1 (val2)
4. Calculating the percentage of both val1 and val2.
This is what I did (note: all the code below is part of a script that generates file1):
Code:
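# total number of lines
sum=$(awk 'END{print NR}' file1)
# val1: lines whose value is less than 1
# (n+0 prints 0 instead of an empty string when nothing matches)
val1=$(awk '$1<1 { n++ } END{ print n+0 }' file1)
prctg1=$(echo | awk "{print $val1*100/$sum}")
# val2: lines whose value is greater than 1
val2=$(awk '$1>1 { n++ } END{ print n+0 }' file1)
prctg2=$(echo | awk "{print $val2*100/$sum}")
# printf expands \t in every shell; plain echo only does so in some shells
printf '%s\t%s\t%s\n' "$sum" "$prctg1" "$prctg2"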
It works fine up to this point. But I want to combine the val1 and prctg1 commands into a single awk one-liner. I tried, but I am probably making some syntax mistake and have no clue where. Any suggestions on how I can combine them?
BTW, LQ has always been very helpful to me. In fact, I am in the learning phase of awk, so I applied what I've learned so far. But I'm still counting on your help again.
Last edited by shivaa; 12-08-2012 at 01:24 PM.
Reason: Error rectified
12-08-2012, 09:25 AM, #2
LQ Veteran
Registered: Sep 2003
Location: the Netherlands
Distribution: lfs, debian, rhel
Posts: 8,848
So we meet again
Is there a specific reason why you use multiple awk statements? All your requirements can be met with a single awk statement:
Code:
awk 'BEGIN{val1=0;val2=0}/^[\.0]\./{val1++}/^[1-9]/{val2++}END{ print "sum: ",NR, "val1: ", val1, "("val1*100/NR"%)", "val2: ", val2, "("val2*100/NR"%)"}' infile
or in a bit more readable form:
Code:
awk 'BEGIN{
val1 = 0 ;
val2 = 0
}
/^[\.0]\./ { val1++ } # less than 1
/^[1-9]/ { val2++ } # 1 or greater
END{
print "sum: ",NR, "val1: ", val1, "("val1*100/NR"%)", "val2: ", val2, "("val2*100/NR"%)"
}' infile
Output looks like this:
Code:
sum: 13 val1: 7 (53.8462%) val2: 6 (46.1538%)
12-08-2012, 09:40 AM, #3
Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 6,378
As druuna has ably answered the important question, let me just add a correction to some wasteful code:
Code:
# below is a useless use of echo
prctg2=$(echo | awk "{print $val2*100/$sum}")
prctg2=$(awk 'BEGIN{print $val2*100/$sum}' val2=$val2 sum=$sum)
# the above negates the problem of letting the shell interfere with any of the data
12-09-2012, 04:44 AM, #4
LQ Veteran
Registered: Sep 2003
Location: the Netherlands
Distribution: lfs, debian, rhel
Posts: 8,848
@shivaa: I noticed you read grail's reply and mine. If this is solved, could you add the [SOLVED] tag?
BTW: if you ever do need the calculated values outside of awk, you can do the following:
Code:
#!/bin/bash
# option 1
echo "------------------------------------"
echo -e "One way (using process substitution)\n"
while read SUM VAL1 PCT1 VAL2 PCT2
do
# do your stuff here
echo "Sum : $SUM"
echo "Val1 : $VAL1"
echo "Percentage1 : $PCT1"
echo "Val2 : $VAL2"
echo "Percentage2 : $PCT2"
done < <(awk 'BEGIN{val1=0;val2=0}/^[\.0]\./{val1++}/^[1-9]/{val2++}END{ print NR, val1, val1*100/NR, val2, val2*100/NR}' infile)
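# option 2
# note: by default bash runs each command of a pipeline in a subshell,
# so SUM, VAL1 etc. are no longer set after the loop's 'done'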
echo -e "\n-----------------------------"
echo -e "An alternative (using a pipe)\n"
awk 'BEGIN{val1=0;val2=0}/^[\.0]\./{val1++}/^[1-9]/{val2++}END{ print NR, val1, val1*100/NR, val2, val2*100/NR}' infile | \
while read SUM VAL1 PCT1 VAL2 PCT2
do
# do your stuff here
echo "Sum : $SUM"
echo "Val1 : $VAL1"
echo "Percentage1 : $PCT1"
echo "Val2 : $VAL2"
echo "Percentage2 : $PCT2"
done
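If you need the values in a plain POSIX sh, which has no process substitution, and still want them set after reading, a here-document is one more option. This is a minimal sketch that assumes non-empty input with one number per line; the printf only stands in for infile, and the awk uses numeric comparisons ($1 < 1, $1 >= 1) instead of the regexps above:

```shell
#!/bin/sh
# printf stands in for infile here; with real data use: line=$(awk '...' infile)
line=$(printf '%s\n' 0.2 0.5 1.5 3 |
  awk '$1 < 1  { val1++ }
       $1 >= 1 { val2++ }
       END { print NR, val1+0, val1*100/NR, val2+0, val2*100/NR }')

# read takes its input from a here-document, so it runs in the current shell
# and the variables are still set afterwards (unlike the pipe in option 2).
read SUM VAL1 PCT1 VAL2 PCT2 <<EOF
$line
EOF

echo "Sum : $SUM"
echo "Val1 : $VAL1"
echo "Percentage1 : $PCT1"
echo "Val2 : $VAL2"
echo "Percentage2 : $PCT2"
```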
12-09-2012, 08:36 AM, #5
Senior Member
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,676
Original Poster
Thanks @druuna and @grail. I actually haven't tested it yet; that's why I've kept this thread unsolved.
12-11-2012, 05:46 AM, #6
Senior Member
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,676
Original Poster
Code:
# below is a useless use of echo
prctg2=$(echo | awk "{print $val2*100/$sum}")
prctg2=$(awk 'BEGIN{print $val2*100/$sum}' val2=$val2 sum=$sum)
# the above negates the problem of letting the shell interfere with any of the data
Hi Grail, following your suggestion above, after running these two commands:
Code:
val2=$(awk '/^[1-9]/ {val2++} END{ print val2}' file1)
sum=$(awk 'END{print NR}' file1)
When I invoke:
Code:
prctg2=$(awk 'BEGIN{print $val2*100/$sum}' val2=$val2 sum=$sum)
It gives me errors such as awk: division by zero or nawk: illegal field $(). I tried both plain /usr/bin/awk and /usr/xpg4/bin/awk. Also, could you explain the use of val2=$val2 sum=$sum after the awk program?
Last edited by shivaa; 12-11-2012 at 06:11 AM.
12-11-2012, 06:40 AM, #7
Senior Member
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,676
Original Poster
Code:
awk 'BEGIN{
val1 = 0 ;
val2 = 0
}
/^[\.0]\./ { val1++ } # less than 1
/^[1-9]/ { val2++ } # 1 or greater
END{
print "sum: ",NR, "val1: ", val1, "("val1*100/NR"%)", "val2: ", val2, "("val2*100/NR"%)"
}' infile
Hi Druuna, your solution is working fine. But I still find pattern-based matching confusing, so:
1. Can I make the following changes instead of using regexp patterns? (assuming that infile contains only floating-point numbers):
Code:
$1 < 1 { val1++ } # less than 1
$1 > 1 { val2++ } # greater than 1
2. (Please don't mind me asking...) Does /^[\.0]\./ mean all values starting with .0? And what does \./ mean here... all values that are .0.?
Last edited by shivaa; 12-11-2012 at 06:44 AM.
12-11-2012, 07:21 AM, #8
LQ Veteran
Registered: Sep 2003
Location: the Netherlands
Distribution: lfs, debian, rhel
Posts: 8,848
Quote:
Originally Posted by shivaa
1. Can I make the following changes instead of using regexp patterns? (assuming that infile contains only floating-point numbers):
Code:
$1 < 1 { val1++ } # less than 1
$1 > 1 { val2++ } # greater than 1
|
Have you tried it? You do need to make one of the comparisons >= or <=, otherwise a value of exactly 1 (e.g. 1.0000) won't be counted.
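A minimal sketch of the numeric-comparison variant with the >= fix, assuming one number per line in field 1 (the printf stands in for infile). It also uses printf to round the percentages to two decimals, which is an optional change rather than something the thread asked for:

```shell
#!/bin/sh
# Counts values below 1 and values of 1 or more; printf rounds to 2 decimals.
result=$(printf '%s\n' 0.25 0.5 1.0000 2.75 |
  awk '
    $1 <  1 { val1++ }   # less than 1
    $1 >= 1 { val2++ }   # 1 or greater, so 1.0000 is counted too
    END {
      if (NR == 0) exit 1   # empty input: avoid dividing by zero
      printf "sum: %d val1: %d (%.2f%%) val2: %d (%.2f%%)\n",
             NR, val1, val1*100/NR, val2, val2*100/NR
    }')
echo "$result"   # prints: sum: 4 val1: 2 (50.00%) val2: 2 (50.00%)
```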
Quote:
|
Originally Posted by shivaa
2. (Please don't mind me asking...) Does /^[\.0]\./ mean all values starting with .0? And what does \./ mean here... all values that are .0.?
|
Values that start with a dot OR a 0 (zero) followed by a dot.
I believe I made a mistake in the original regexp, but it works for your data because all the entries seem to start with a leading zero (0.01 vs .01). It can be rewritten as: Values that start with 1 -> 9
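For comparison, one possible corrected regexp makes the leading zero optional with ^0?\., so that both 0.01 and .01 count as less than 1 (the original ^[\.0]\. misses .01, whose second character is 0, not a dot). This is only a sketch: ? is an ERE operator, which POSIX awk supports but very old awk versions may not, and neither regexp handles a bare 0, negative numbers or exponent notation such as 5e-1. That is one reason the numeric comparisons $1 < 1 and $1 >= 1 are the more robust choice.

```shell
#!/bin/sh
counts=$(printf '%s\n' 0.01 .01 0.5 1.5 10 |
  awk '/^0?\./  { below++ }   # optional leading 0, then a literal dot
       /^[1-9]/ { above++ }   # first character is a digit from 1 to 9
       END { print below+0, above+0 }')
echo "$counts"   # prints: 3 2
```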
12-11-2012, 08:10 AM, #9
Senior Member
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,676
Original Poster
Quote:
|
^[\.0]\. Values that start with a dot OR a 0 (zero) followed by a dot.
|
Oops... All along I had been reading such patterns as ".0", but it actually means values beginning with either a "." or a "0", not with ".0".
For instance (please correct me if I am wrong):
^[abc] means all values beginning with either a, b or c. It does not mean all values beginning with abc! I hope this clears up all my previous doubts as well.
Likewise, if I want to count three ranges, $1 <= 0.01; 0.01 < $1 < 0.1; $1 >= 0.1, I can also use patterns built with such regexps! I'll surely try it.
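For numeric ranges like these, plain numeric comparisons are simpler than regexps. One caveat: awk does not read 0.01 < $1 < 0.1 as a range the way mathematics does, so the middle test is written with &&. A minimal sketch, assuming one number per line in field 1 (the printf stands in for the real file):

```shell
#!/bin/sh
ranges=$(printf '%s\n' 0.005 0.01 0.05 0.1 0.5 |
  awk '$1 <= 0.01              { r1++ }   # $1 <= 0.01
       $1 > 0.01 && $1 < 0.1   { r2++ }   # 0.01 < $1 < 0.1
       $1 >= 0.1               { r3++ }   # $1 >= 0.1
       END { print r1+0, r2+0, r3+0 }')
echo "$ranges"   # prints: 2 1 2
```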
Many thanks, druuna... I'm at a loss for words! You've done a great job!!
---------------------------
Hi Grail, I'm waiting for your response now (please see my reply above).
12-11-2012, 08:26 AM, #10
LQ Veteran
Registered: Sep 2003
Location: the Netherlands
Distribution: lfs, debian, rhel
Posts: 8,848
Quote:
Originally Posted by shivaa
For instance (please correct me, if I am wrong):
^[abc] means all values beginning with either a, b or c. It does not mean all values beginning with abc! I hope this clears up all my previous doubts as well.
|
That is correct.
You might want to revisit this site: Regex Tutorial, Examples and Reference, especially the part on Character Classes or Character Sets.
And this from the wiki page:
Quote:
[ ]
A bracket expression. Matches a single character that is contained within the brackets. For example, [abc] matches "a", "b", or "c". [a-z] specifies a range which matches any lowercase letter from "a" to "z". These forms can be mixed: [abcx-z] matches "a", "b", "c", "x", "y", or "z", as does [a-cx-z].
The - character is treated as a literal character if it is the last or the first (after the ^) character within the brackets: [abc-], [-abc]. Note that backslash escapes are not allowed. The ] character can be included in a bracket expression if it is the first (after the ^) character: []abc].
|
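A quick way to see the difference between a bracket expression and a literal prefix in awk; the word list is just illustrative sample data:

```shell
#!/bin/sh
# A bracket expression matches ONE character from the set, so ^[abc] keeps
# every line whose first character is a, b or c; ^abc needs the literal
# three-character prefix "abc".
bracket=$(printf '%s\n' apple banana cherry date abc | awk '/^[abc]/')
literal=$(printf '%s\n' apple banana cherry date abc | awk '/^abc/')
echo "$bracket"   # apple, banana, cherry and abc, one per line
echo "$literal"   # abc only
```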
12-11-2012, 09:18 AM, #11
Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 6,378
Quote:
|
It gives me errors such as awk: division by zero or nawk: illegal field $(). I tried both plain /usr/bin/awk and /usr/xpg4/bin/awk. Also, could you explain the use of val2=$val2 sum=$sum after the awk program?
|
I cannot vouch for nawk. I am using gawk, so maybe nawk does not like variables being set after the program. You could simply try using the -v option to set them.
Placing the variable assignments after the quoted code is just my preference when setting several variables, instead of using -v several times.
12-11-2012, 11:16 AM, #12
Senior Member
Registered: Nov 2005
Distribution: Debian
Posts: 2,053
Quote:
Originally Posted by grail
I cannot vouch for nawk. I am using gawk so maybe nawk does not like the setting of variables after. You could simply try using the -v option to set them.
|
You'll need to use -v with gawk as well; the plain var=val form performs the assignment after the BEGIN rule has run:
Quote:
6.1.3.2 Assigning Variables on the Command Line
When the assignment is preceded with the -v option ... the variable is set at the very beginning, even before the BEGIN rules execute. ... Otherwise, the variable assignment is performed ... after the processing of the preceding input file argument.
|
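Putting the last few replies together, here is a minimal sketch of the corrected command, with hard-coded sample values for val2 and sum. Besides the -v timing issue, the original program wrote $val2 and $sum; inside awk, $ is the field operator, so $val2 means "the field whose number is the value of val2". With val2 and sum still unset in BEGIN, both refer to $0, which is empty there, so the division is by zero, which would be consistent with the division by zero error reported above.

```shell
#!/bin/sh
val2=25      # sample value (assumption)
sum=200      # sample value (assumption)

# Operand form: x=5 is assigned only when awk reaches it among the file
# operands, i.e. after BEGIN has run, so x is still empty inside BEGIN.
# (With only a BEGIN rule, awk exits without reading any input.)
begin_x=$(awk 'BEGIN { print "inside BEGIN, x is [" x "]" }' x=5)
echo "$begin_x"   # prints: inside BEGIN, x is []

# -v form: assigned before BEGIN runs. Note val2 and sum carry no leading $.
prctg2=$(awk -v val2="$val2" -v sum="$sum" 'BEGIN { print val2*100/sum }')
echo "$prctg2"    # prints: 12.5
```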
12-11-2012, 06:50 PM, #13
Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 6,378
Thanks ntubski ... I was not aware of this variation 
12-31-2012, 04:56 AM, #14
Senior Member
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,676
Original Poster
Many thanks @druuna & @grail!
Ciao!