LinuxQuestions.org
12-08-2012, 08:53 AM   #1
shivaa
Senior Member

Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,797
Blog Entries: 4

Rep: Reputation: 285
Once again... awk.. awk... awk


I have a file, file1, containing 50000 entries (floating-point numbers only).

And I am calculating:
1. The total number of lines (sum)
2. The number of lines whose value is less than 1 (val1)
3. The number of lines whose value is greater than 1 (val2)
4. The percentages of val1 and val2 relative to sum.

Here is what I did (note: all of the code below is part of the script that generates file1):
Code:
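#sum
sum=$(awk 'END{print NR}' file1)

#calculating val1 (n+0 prints 0 instead of an empty string when nothing matches)
val1=$(awk '$1<1 { n++ } END{ print n+0 }' file1)
prctg1=$(echo | awk "{print $val1*100/$sum}")

#calculating val2
val2=$(awk '$1>1 { n++ } END{ print n+0 }' file1)
prctg2=$(echo | awk "{print $val2*100/$sum}")

# printf expands \t portably; plain echo does not in every shell
printf '%s\t%s\t%s\n' "$sum" "$prctg1" "$prctg2"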
It works up to this point. But I want to combine the val1 and prctg1 commands into a single awk one-liner. I tried, but I am probably making a syntax mistake and have no clue where. Any suggestions on how I can combine them?

BTW, LQ has always been very helpful to me. In fact, I am still learning awk, so I applied what I've learned so far, but I am counting on your help again.

Last edited by shivaa; 12-08-2012 at 01:24 PM. Reason: Error rectified
 
12-08-2012, 09:25 AM   #2
druuna
LQ Veteran

Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2374
So we meet again

Is there a specific reason why you use multiple awk statements? All your requirements can be done with one awk statement:
Code:
awk 'BEGIN{val1=0;val2=0}/^[\.0]\./{val1++}/^[1-9]/{val2++}END{ print "sum: ",NR, "val1: ", val1, "("val1*100/NR"%)", "val2: ", val2, "("val2*100/NR"%)"}' infile
or in a bit more readable form:
Code:
awk 'BEGIN{
  val1 = 0 ;
  val2 = 0
}
/^[\.0]\./ { val1++ } # less than 1
/^[1-9]/   { val2++ } # larger than one
END{ 
  print "sum: ",NR, "val1: ", val1, "("val1*100/NR"%)", "val2: ", val2, "("val2*100/NR"%)"
}' infile
Output looks like this:
Code:
sum:  13 val1:  7 (53.8462%) val2:  6 (46.1538%)
 
1 member found this post helpful.
12-08-2012, 09:40 AM   #3
grail
Guru

Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,513

Rep: Reputation: 1895
As druuna has ably answered the main question, let me just add a correction to some wasteful code:
Code:
# below is a useless use of echo
prctg2=$(echo | awk "{print $val2*100/$sum}")

prctg2=$(awk 'BEGIN{print $val2*100/$sum}' val2=$val2 sum=$sum)
# the above negates the problem of letting the shell interfere with any of the data
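For reference, here is a sketch of the same idea with the two fixes that come up later in the thread: the values are passed with -v, so they already exist when BEGIN runs, and they are referenced inside awk as plain variables (val2, sum), because $val2 would mean "the field whose number is val2". The sample values below are made up for illustration:

```shell
# Made-up sample values; in the real script they come from the awk counts.
val2=6
sum=13

# -v assigns the awk variables before BEGIN executes. Inside awk they are
# plain variables (val2, sum), not field references ($val2, $sum).
prctg2=$(awk -v val2="$val2" -v sum="$sum" 'BEGIN { print val2 * 100 / sum }')
echo "$prctg2"
```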
 
1 member found this post helpful.
12-09-2012, 04:44 AM   #4
druuna
LQ Veteran

Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2374
@shivaa: I noticed you read grail's reply and mine. If this is solved, can you add the [SOLVED] tag?

BTW: If you ever do need the calculated values outside of awk you can do the following:
Code:
#!/bin/bash
# option 1
echo "------------------------------------"
echo -e "One way (using process substitution)\n"

while read -r SUM VAL1 PCT1 VAL2 PCT2
do
  # do your stuff here
  echo "Sum         : $SUM"
  echo "Val1        : $VAL1"
  echo "Percentage1 : $PCT1"
  echo "Val2        : $VAL2"
  echo "Percentage2 : $PCT2"
done < <(awk 'BEGIN{val1=0;val2=0}/^[\.0]\./{val1++}/^[1-9]/{val2++}END{ print NR, val1, val1*100/NR, val2, val2*100/NR}' infile)

# option 2
echo -e "\n-----------------------------"
echo -e "An alternative (using a pipe)\n"

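# note: by default bash runs each part of a pipeline in a subshell, so SUM etc. are not set after 'done'
awk 'BEGIN{val1=0;val2=0}/^[\.0]\./{val1++}/^[1-9]/{val2++}END{ print NR, val1, val1*100/NR, val2, val2*100/NR}' infile | \
while read -r SUM VAL1 PCT1 VAL2 PCT2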
do
  # do your stuff here
  echo "Sum         : $SUM"
  echo "Val1        : $VAL1"
  echo "Percentage1 : $PCT1"
  echo "Val2        : $VAL2"
  echo "Percentage2 : $PCT2"
done
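Because the awk command prints exactly one line, a loop isn't strictly needed. One POSIX sh alternative (a sketch, not the only way) is to word-split that line into the positional parameters with set --; unlike option 2, the assignments then happen in the current shell. The sample data is made up, and the pattern is the corrected ^0\. discussed further down the thread:

```shell
# Made-up sample data standing in for infile.
infile=$(mktemp)
printf '%s\n' 0.25 0.5 1.5 2.75 > "$infile"

# awk prints one line of five numbers; the unquoted $(...) is split on
# whitespace and set -- stores the words in $1..$5 of the current shell.
set -- $(awk '/^0\./ { val1++ } /^[1-9]/ { val2++ }
              END { print NR, val1+0, val1*100/NR, val2+0, val2*100/NR }' "$infile")
rm -f "$infile"

SUM=$1 VAL1=$2 PCT1=$3 VAL2=$4 PCT2=$5
echo "Sum: $SUM, Val1: $VAL1 ($PCT1%), Val2: $VAL2 ($PCT2%)"
```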
 
1 member found this post helpful.
12-09-2012, 08:36 AM   #5
shivaa
Senior Member

Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,797
Blog Entries: 4

Original Poster
Rep: Reputation: 285
Thanks @druuna and @grail. I have not actually tested it yet, which is why I have kept this thread unsolved.
 
12-11-2012, 05:46 AM   #6
shivaa
Senior Member

Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,797
Blog Entries: 4

Original Poster
Rep: Reputation: 285
Code:
# below is a useless use of echo
prctg2=$(echo | awk "{print $val2*100/$sum}")

prctg2=$(awk 'BEGIN{print $val2*100/$sum}' val2=$val2 sum=$sum)
# the above negates the problem of letting the shell interfere with any of the data
Hi Grail, regarding what you said above: after running the two commands below:
Code:
val2=$(awk '/^[1-9]/ {val2++} END{ print val2}' file1)
sum=$(awk 'END{print NR}' file1)
When I invoke:
Code:
prctg2=$(awk 'BEGIN{print $val2*100/$sum}' val2=$val2 sum=$sum)
It gives me errors such as awk: division by zero or nawk: illegal field $(). I tried both plain /usr/bin/awk and /usr/xpg4/bin/awk. Also, could you explain the purpose of val2=$val2 sum=$sum after the print action?

Last edited by shivaa; 12-11-2012 at 06:11 AM.
 
12-11-2012, 06:40 AM   #7
shivaa
Senior Member

Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,797
Blog Entries: 4

Original Poster
Rep: Reputation: 285
Code:
awk 'BEGIN{
  val1 = 0 ;
  val2 = 0
}
/^[\.0]\./ { val1++ } # less than 1
/^[1-9]/   { val2++ } # larger than one
END{ 
  print "sum: ",NR, "val1: ", val1, "("val1*100/NR"%)", "val2: ", val2, "("val2*100/NR"%)"
}' infile
Hi Druuna, your solution is working fine. But I still find pattern-based searching confusing, so:

1. Can I make the following changes instead of using patterns (assuming that infile contains only floating-point numbers)?
Code:
$1 < 1 { val1++ } # less than 1
$1 > 1 { val2++ } # larger than one
2. (Please don't mind me asking.) Does /^[\.0]\./ mean all values starting with .0? And what does the \./ mean here: all values that are .0. ?

Last edited by shivaa; 12-11-2012 at 06:44 AM.
 
12-11-2012, 07:21 AM   #8
druuna
LQ Veteran

Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2374
Quote:
Originally Posted by shivaa
1. Can I make following changes, instead of using patterns? (assuming that infile has only numerical floating numbers):
Code:
$1 < 1 { val1++ } # less than 1
$1 > 1 { val2++ } # larger than one
Have you tried it? You do need to change one of the comparisons to >= or <=, otherwise a value of exactly 1 (such as 1.0000) won't be counted.
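A minimal sketch of the numeric-comparison approach, assuming every line holds a single number in field 1. Putting values equal to 1 into val2 (via >=) is just one possible choice; the sample data is made up:

```shell
# Made-up sample data standing in for infile.
infile=$(mktemp)
printf '%s\n' 0.25 .5 1.0000 1.5 > "$infile"

# Numeric tests also count ".5" (no leading zero), which ^[\.0]\. misses,
# and ">=" puts a value of exactly 1 (1.0000) into val2.
result=$(awk '$1 < 1 { val1++ } $1 >= 1 { val2++ }
              END { print NR, val1+0, val1*100/NR, val2+0, val2*100/NR }' "$infile")
rm -f "$infile"
echo "$result"
```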

Quote:
Originally Posted by shivaa
2. (Please do not mind if I ask that.. ) Does /^[\.0]\./ means all values starting with .0? And what does \./ means here... all values that are .0. ??
Code:
^[\.0]\.
Values whose first character is a dot OR a 0 (zero), and whose second character is a dot.

I do believe I made a mistake in the original regexp, but it works for your data because all the entries seem to start with a leading zero (0.01 rather than .01). It can be rewritten as:
Code:
^0\.
Code:
^[1-9]
Values that start with a digit from 1 to 9.
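druuna's one-liner with the corrected ^0\. pattern might look like this (a sketch on made-up sample data). Like the original, it assumes values are written with a leading digit (0.01 rather than .01), and since ^[1-9] matches anything starting with 1 to 9, a value of exactly 1 such as 1.0000 lands in val2:

```shell
# Made-up sample data standing in for infile.
infile=$(mktemp)
printf '%s\n' 0.01 0.5 1.0000 12.3 > "$infile"

# ^0\.   : a literal 0 followed by a literal dot (values below 1)
# ^[1-9] : first character is a digit from 1 to 9 (values of 1 or more)
out=$(awk '/^0\./ { val1++ } /^[1-9]/ { val2++ }
           END { print "sum: ", NR, "val1: ", val1+0, "(" val1*100/NR "%)", "val2: ", val2+0, "(" val2*100/NR "%)" }' "$infile")
rm -f "$infile"
echo "$out"
```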
 
1 member found this post helpful.
12-11-2012, 08:10 AM   #9
shivaa
Senior Member

Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,797
Blog Entries: 4

Original Poster
Rep: Reputation: 285
Quote:
^[\.0]\. Values that start with a dot OR a 0 (zero) followed by a dot.
Oops... All along I had been reading such patterns as ".0", whereas [\.0] actually means values beginning with either a "." or a "0", not with ".0".

For instance (please correct me, if I am wrong):
^[abc] means all values beginning with either a, b or c. It does not mean all values beginning with abc! I hope this clears up my previous doubts as well.

Likewise, if I want to match $1 <= 0.01; 0.01 < $1 < 0.1; $1 >= 0.1 (i.e. 3 ranges), I can also use such regexp patterns! I will certainly try it.
Many thanks, druuna... I am lost for words! You've done a great job!!
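For the three ranges above, numeric comparisons are probably simpler and safer than regexps, since a regexp only sees characters, not values. A minimal sketch on made-up sample data; the counter names r1, r2 and r3 are hypothetical:

```shell
# Made-up sample data standing in for infile.
infile=$(mktemp)
printf '%s\n' 0.005 0.01 0.05 0.1 0.7 > "$infile"

# r1: $1 <= 0.01    r2: 0.01 < $1 < 0.1    r3: $1 >= 0.1
out=$(awk '$1 <= 0.01            { r1++ }
           $1 > 0.01 && $1 < 0.1 { r2++ }
           $1 >= 0.1             { r3++ }
           END { printf "%d %d %d\n", r1, r2, r3 }' "$infile")
rm -f "$infile"
echo "$out"
```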

---------------------------

Hi Grail, I am waiting for your response now (please refer to my reply above).
 
12-11-2012, 08:26 AM   #10
druuna
LQ Veteran

Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2374
Quote:
Originally Posted by shivaa
For instance (please correct me, if I am wrong):
^[abc] .....Means all values beginning either with a a or b or c. It does not mean all values beginning with abc! I hope it will clear all my previous doubts as well .
That is correct.

You might want to revisit this site: Regex Tutorial, Examples and Reference, especially the page on Character Classes or Character Sets.

And this from the wiki page:
Quote:
[ ]
A bracket expression. Matches a single character that is contained within the brackets. For example, [abc] matches "a", "b", or "c". [a-z] specifies a range which matches any lowercase letter from "a" to "z". These forms can be mixed: [abcx-z] matches "a", "b", "c", "x", "y", or "z", as does [a-cx-z].

The - character is treated as a literal character if it is the last or the first (after the ^) character within the brackets: [abc-], [-abc]. Note that backslash escapes are not allowed. The ] character can be included in a bracket expression if it is the first (after the ^) character: []abc].
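A quick way to see the difference between ^[abc] (one character from the set) and ^abc (the literal sequence "abc"), using made-up input lines:

```shell
# Made-up input lines.
lines='apple
banana
cat
abc
dog'

# ^[abc]: the first character is a, b or c
set_match=$(printf '%s\n' "$lines" | awk '/^[abc]/' | paste -s -d ' ' -)
# ^abc: the line starts with the three characters "abc"
seq_match=$(printf '%s\n' "$lines" | awk '/^abc/' | paste -s -d ' ' -)

echo "[abc]: $set_match"
echo "abc:   $seq_match"
```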
 
12-11-2012, 09:18 AM   #11
grail
Guru

Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,513

Rep: Reputation: 1895
Quote:
It's giving me errors, like awk: division by zero or nawk: illegal field $().. . I tried simple /usr/bin/awk as well as /usr/xpg4/bin/awk. Also could you explain the use of val2=$val2 sum=$sum after print action?
I cannot vouch for nawk. I am using gawk so maybe nawk does not like the setting of variables after. You could simply try using the -v option to set them.

Placing the variable assignments after the quoted code is just my preference when setting multiple variables, instead of using -v several times.
 
12-11-2012, 11:16 AM   #12
ntubski
Senior Member

Registered: Nov 2005
Distribution: Debian
Posts: 2,455

Rep: Reputation: 843
Quote:
Originally Posted by grail
I cannot vouch for nawk. I am using gawk so maybe nawk does not like the setting of variables after. You could simply try using the -v option to set them.
You'll need to use -v for gawk as well; the plain var=val form performs the assignment after the BEGIN rule has run:

Quote:
6.1.3.2 Assigning Variables on the Command Line

When the assignment is preceded with the -v option ... the variable is set at the very beginning, even before the BEGIN rules execute. ... Otherwise, the variable assignment is performed ... after the processing of the preceding input file argument.
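A small experiment (made-up values) shows the timing difference: the -v value is visible in BEGIN, while the trailing b=2 is only assigned when awk reaches it in the argument list, so END sees it but BEGIN does not. The same timing explains the division by zero reported earlier: in grail's BEGIN block nothing had been assigned yet, and $val2 and $sum refer to fields of the (empty) record rather than to variables, so both evaluate to zero.

```shell
# b=2 is placed before the (empty) input file /dev/null, so awk performs
# the assignment after BEGIN but before reading the file; END then sees it.
# Giving an input file also keeps awk from waiting on stdin.
out=$(awk -v a=1 'BEGIN { printf "BEGIN: a=%s b=%s\n", a, b }
                  END   { printf "END: a=%s b=%s\n", a, b }' b=2 /dev/null)
echo "$out"
```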
 
1 member found this post helpful.
12-11-2012, 06:50 PM   #13
grail
Guru

Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,513

Rep: Reputation: 1895
Thanks ntubski ... I was not aware of this variation
 
12-31-2012, 04:56 AM   #14
shivaa
Senior Member

Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,797
Blog Entries: 4

Original Poster
Rep: Reputation: 285
Many thanks @druuna & @grail!
Ciao!
 
  

