LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Old 12-08-2012, 08:53 AM   #1
shivaa
Senior Member
 
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,800
Blog Entries: 4

Rep: Reputation: 286
Once again... awk.. awk... awk


I have a file, file1, containing 50000 entries (floating-point numbers only).

From it I am calculating:
1. The total number of lines (sum)
2. The number of lines whose value is less than 1 (val1)
3. The number of lines whose value is greater than 1 (val2)
4. The percentage of the total that val1 and val2 each represent

This is what I did (note: all of the code below is part of the script that generates file1):
Code:
#sum
sum=$(awk 'END{print NR}' file1)

#calculating val1
val1=$(awk '$1<1 { n++ } END{ print n }' file1)
prctg1=$(echo | awk "{print $val1*100/$sum}")

#calculating val2
val2=$(awk '$1>1 { n++ } END{ print n }' file1)
prctg2=$(echo | awk "{print $val2*100/$sum}")

printf '%s\t%s\t%s\n' "$sum" "$prctg1" "$prctg2"
It works fine up to this point. But I want to combine the val1 and prctg1 commands into a single awk one-liner. I tried, but I am probably making a syntax mistake and cannot find it. Any suggestions on how to combine them?

BTW, LQ has always been so helpful to me. In fact, I am still learning awk, so I applied what I have learned so far, but I am hoping for your help once again.

Last edited by shivaa; 12-08-2012 at 01:24 PM. Reason: Error rectified
 
Old 12-08-2012, 09:25 AM   #2
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
So we meet again

Is there a specific reason why you use multiple awk statements? All your requirements can be done with one awk statement:
Code:
awk 'BEGIN{val1=0;val2=0}/^[\.0]\./{val1++}/^[1-9]/{val2++}END{ print "sum: ",NR, "val1: ", val1, "("val1*100/NR"%)", "val2: ", val2, "("val2*100/NR"%)"}' infile
or in a bit more readable form:
Code:
awk 'BEGIN{
  val1 = 0 ;
  val2 = 0
}
/^[\.0]\./ { val1++ } # less than 1
/^[1-9]/   { val2++ } # larger than one
END{ 
  print "sum: ",NR, "val1: ", val1, "("val1*100/NR"%)", "val2: ", val2, "("val2*100/NR"%)"
}' infile
Output looks like this:
Code:
sum:  13 val1:  7 (53.8462%) val2:  6 (46.1538%)
 
Old 12-08-2012, 09:40 AM   #3
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
As druuna has ably answered the important question, let me just add a correction for some wasted code:
Code:
# below is a useless use of echo
prctg2=$(echo | awk "{print $val2*100/$sum}")

prctg2=$(awk 'BEGIN{print $val2*100/$sum}' val2=$val2 sum=$sum)
# the above negates the problem of letting the shell interfere with any of the data
 
Old 12-09-2012, 04:44 AM   #4
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
@shivaa: I noticed you have read grail's reply and mine. If this is solved, could you mark the thread with the [SOLVED] tag?

BTW: If you ever do need the calculated values outside of awk you can do the following:
Code:
#!/bin/bash
# option 1
echo "------------------------------------"
echo -e "One way (using process substitution)\n"

while read SUM VAL1 PCT1 VAL2 PCT2
do
  # do your stuff here
  echo "Sum         : $SUM"
  echo "Val1        : $VAL1"
  echo "Percentage1 : $PCT1"
  echo "Val2        : $VAL2"
  echo "Percentage2 : $PCT2"
done < <(awk 'BEGIN{val1=0;val2=0}/^[\.0]\./{val1++}/^[1-9]/{val2++}END{ print NR, val1, val1*100/NR, val2, val2*100/NR}' infile)

# option 2
echo -e "\n-----------------------------"
echo -e "An alternative (using a pipe)\n"

awk 'BEGIN{val1=0;val2=0}/^[\.0]\./{val1++}/^[1-9]/{val2++}END{ print NR, val1, val1*100/NR, val2, val2*100/NR}' infile | \
while read SUM VAL1 PCT1 VAL2 PCT2
do
  # do your stuff here
  echo "Sum         : $SUM"
  echo "Val1        : $VAL1"
  echo "Percentage1 : $PCT1"
  echo "Val2        : $VAL2"
  echo "Percentage2 : $PCT2"
done
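A third variant, offered only as a sketch: with the pipe form, bash normally runs the while loop in a subshell, so variables set inside the loop are gone once it ends. Feeding the awk output to read through a here-document keeps them in the current shell. The file name sample.txt and its four values are made up for the illustration, and the pattern assumes values written with a leading zero.

```shell
# Hypothetical sample data, for illustration only
printf '%s\n' 0.25 0.50 1.75 2.00 > sample.txt

# Collect the five values as one space-separated line
stats=$(awk '/^0\./ { val1++ } /^[1-9]/ { val2++ }
  END { print NR, val1+0, val1*100/NR, val2+0, val2*100/NR }' sample.txt)

# The here-document feeds that line to read in the current shell
read -r SUM VAL1 PCT1 VAL2 PCT2 <<EOF
$stats
EOF

echo "Sum: $SUM  Val1: $VAL1 ($PCT1%)  Val2: $VAL2 ($PCT2%)"
```

After the read, $SUM, $VAL1 and the others can be used anywhere later in the script, not just inside a loop body.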
 
Old 12-09-2012, 08:36 AM   #5
shivaa
Senior Member
 
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,800

Original Poster
Blog Entries: 4

Rep: Reputation: 286
Thanks @druuna and @grail. I have not actually tested it yet, which is why I have left this thread unsolved.
 
Old 12-11-2012, 05:46 AM   #6
shivaa
Senior Member
 
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,800

Original Poster
Blog Entries: 4

Rep: Reputation: 286
Code:
# below is a useless use of echo
prctg2=$(echo | awk "{print $val2*100/$sum}")

prctg2=$(awk 'BEGIN{print $val2*100/$sum}' val2=$val2 sum=$sum)
# the above negates the problem of letting the shell interfere with any of the data
Hi grail, following what you said above, after running the two commands below:
Code:
val2=$(awk '/^[1-9]/ {val2++} END{ print val2}' file1)
sum=$(awk 'END{print NR}' file1)
When I invoke:
Code:
prctg2=$(awk 'BEGIN{print $val2*100/$sum}' val2=$val2 sum=$sum)
It gives me errors such as awk: division by zero or nawk: illegal field $(). I tried plain /usr/bin/awk as well as /usr/xpg4/bin/awk. Also, could you explain the use of val2=$val2 sum=$sum after the print action?

Last edited by shivaa; 12-11-2012 at 06:11 AM.
 
Old 12-11-2012, 06:40 AM   #7
shivaa
Senior Member
 
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,800

Original Poster
Blog Entries: 4

Rep: Reputation: 286
Code:
awk 'BEGIN{
  val1 = 0 ;
  val2 = 0
}
/^[\.0]\./ { val1++ } # less than 1
/^[1-9]/   { val2++ } # larger than one
END{ 
  print "sum: ",NR, "val1: ", val1, "("val1*100/NR"%)", "val2: ", val2, "("val2*100/NR"%)"
}' infile
Hi druuna, your solution is working fine. But I still find pattern-based matching confusing, so:

1. Can I make the following changes instead of using patterns (assuming that infile contains only floating-point numbers)?
Code:
$1 < 1 { val1++ } # less than 1
$1 > 1 { val2++ } # larger than one
2. (Please do not mind my asking.) Does /^[\.0]\./ mean all values starting with .0? And what does \./ mean here: all values that are .0.?

Last edited by shivaa; 12-11-2012 at 06:44 AM.
 
Old 12-11-2012, 07:21 AM   #8
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Quote:
Originally Posted by shivaa
1. Can I make the following changes instead of using patterns (assuming that infile contains only floating-point numbers)?
Code:
$1 < 1 { val1++ } # less than 1
$1 > 1 { val2++ } # larger than one
Have you tried? You do need to change one of the comparisons to >= or <=, otherwise a value of exactly 1 (1.0000) won't be counted by either rule.
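As a rough sketch of that point, here is the numeric-comparison version with >= on the second rule, so that a value of exactly 1 lands in val2. The file name sample.txt and its contents are invented for the example, and the guard against an empty file is an addition that is not in the original posts.

```shell
# Hypothetical sample data: note the value exactly equal to 1
printf '%s\n' 0.25 0.999 1.0000 1.75 3.5 > sample.txt

result=$(awk '
  $1 <  1 { val1++ }   # strictly less than 1
  $1 >= 1 { val2++ }   # 1 or larger, so 1.0000 is counted here
  END {
    if (NR == 0) { print "no data"; exit }   # avoid dividing by zero
    printf "sum: %d val1: %d (%.2f%%) val2: %d (%.2f%%)\n",
           NR, val1, val1*100/NR, val2, val2*100/NR
  }' sample.txt)
echo "$result"
```

With these five values it prints sum: 5 val1: 2 (40.00%) val2: 3 (60.00%). With a plain > on the second rule, 1.0000 would be counted by neither rule and the two percentages would no longer add up to 100.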

Quote:
Originally Posted by shivaa
2. (Please do not mind my asking.) Does /^[\.0]\./ mean all values starting with .0? And what does \./ mean here: all values that are .0.?
Code:
^[\.0]\.
Values whose first character is a dot OR a 0 (zero), and whose second character is a dot.

I believe I made a mistake in the original regexp, but it works for your data because all the entries seem to start with a leading zero (0.01 rather than .01). It can be rewritten as:
Code:
^0\.
Code:
^[1-9]
Values that start with a digit from 1 to 9.
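To see the difference between the two "less than 1" patterns, a small test along these lines may help. It is only a sketch: values.txt and its five test strings are made up, and they are deliberately odd so that the patterns disagree.

```shell
# Hypothetical test values, one per line
printf '%s\n' 0.5 .5 ..5 00.5 1.5 > values.txt

# For each value, report whether each pattern matches
matches=$(awk '{
  a = ($0 ~ /^[\.0]\./) ? "yes" : "no"   # original pattern
  b = ($0 ~ /^0\./)     ? "yes" : "no"   # simplified pattern
  print $0, a, b
}' values.txt)
echo "$matches"
```

The columns are value, original pattern, simplified pattern. Only 0.5 matches both, and ..5 matches just the original. Note that .5 matches neither, so data written without a leading zero would not be counted by either regexp; a numeric test such as $1 < 1 does not have that problem.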
 
Old 12-11-2012, 08:10 AM   #9
shivaa
Senior Member
 
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,800

Original Poster
Blog Entries: 4

Rep: Reputation: 286
Quote:
^[\.0]\. Values that start with a dot OR a 0 (zero) followed by a dot.
Oops... From the beginning I had been reading such patterns as .0, whereas it actually means values beginning with either a "." or a "0", not with .0.

For instance (please correct me if I am wrong):
^[abc] means all values beginning with either a, b or c. It does not mean all values beginning with abc! I hope this clears up all my previous doubts as well.

Likewise, if I want to match $1<=0.01; 0.01 < $1 < 0.1; $1 >=0.1 (i.e. 3 ranges), I can use regexps like these for that too. I will certainly try it.
Many thanks druuna... I am short of words! You've done a great job!!
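For ranges like these, numeric comparisons are probably easier to get right than regexps. A minimal sketch follows, assuming one number per line; the file name ranges.txt, its values and the counter names low, mid and high are all made up for the example. The middle range is spelled out with && because awk does not accept the chained form 0.01 < $1 < 0.1 in the mathematical sense.

```shell
# Hypothetical sample data, two values per range
printf '%s\n' 0.005 0.01 0.05 0.099 0.1 0.7 > ranges.txt

buckets=$(awk '
  $1 <= 0.01            { low++ }    # first range:  value <= 0.01
  $1 > 0.01 && $1 < 0.1 { mid++ }    # second range: 0.01 < value < 0.1
  $1 >= 0.1             { high++ }   # third range:  value >= 0.1
  END { print low+0, mid+0, high+0 } # +0 prints 0 instead of an empty string
' ranges.txt)
echo "$buckets"
```

With this data each range gets two values, so the output is 2 2 2.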

---------------------------

Hi grail, I am waiting for your response now (please refer to my reply above).
 
Old 12-11-2012, 08:26 AM   #10
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Quote:
Originally Posted by shivaa
For instance (please correct me if I am wrong):
^[abc] means all values beginning with either a, b or c. It does not mean all values beginning with abc! I hope this clears up all my previous doubts as well.
That is correct.

You might want to revisit this site: Regex Tutorial, Examples and Reference, especially the section Character Classes or Character Sets.

And this from the wiki page:
Quote:
[ ]
A bracket expression. Matches a single character that is contained within the brackets. For example, [abc] matches "a", "b", or "c". [a-z] specifies a range which matches any lowercase letter from "a" to "z". These forms can be mixed: [abcx-z] matches "a", "b", "c", "x", "y", or "z", as does [a-cx-z].

The - character is treated as a literal character if it is the last or the first (after the ^) character within the brackets: [abc-], [-abc]. Note that backslash escapes are not allowed. The ] character can be included in a bracket expression if it is the first (after the ^) character: []abc].
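The bracket forms quoted above can be tried out with a few lines of shell. This is just an illustrative sketch: the word list, the file name words.txt and the helper function show are invented for the example.

```shell
# Hypothetical word list, one word per line
printf '%s\n' apple banana cherry xylophone -dash zebra > words.txt

# Helper (hypothetical name): print the words matching a regex on one line
show() {
  awk -v re="$1" '$0 ~ re { printf "%s%s", sep, $0; sep = " " } END { print "" }' words.txt
}

set1=$(show '^[abc]')      # first character is a, b or c
set2=$(show '^[a-cx-z]')   # first character is in a..c or in x..z
set3=$(show '^[abc-]')     # a trailing - is a literal dash

echo "$set1"
echo "$set2"
echo "$set3"
```

The first line lists apple, banana and cherry; the second adds xylophone and zebra; the third adds -dash instead, because the final - in the brackets is taken literally.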
 
Old 12-11-2012, 09:18 AM   #11
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
Quote:
It gives me errors such as awk: division by zero or nawk: illegal field $(). I tried plain /usr/bin/awk as well as /usr/xpg4/bin/awk. Also, could you explain the use of val2=$val2 sum=$sum after the print action?
I cannot vouch for nawk. I am using gawk, so maybe nawk does not like the variables being set after the program. You could simply try using the -v option to set them.

Placing the setting of the variables after the quoted code is just a preference I have for setting multiple variables instead of using -v several times.
 
Old 12-11-2012, 11:16 AM   #12
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,781

Rep: Reputation: 2081
Quote:
Originally Posted by grail
I cannot vouch for nawk. I am using gawk, so maybe nawk does not like the variables being set after the program. You could simply try using the -v option to set them.
You'll need to use -v for gawk as well; the plain var=val form performs the assignment after the BEGIN rule has been run:

Quote:
6.1.3.2 Assigning Variables on the Command Line

When the assignment is preceded with the -v option ... the variable is set at the very beginning, even before the BEGIN rules execute. ... Otherwise, the variable assignment is performed ... after the processing of the preceding input file argument.
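A short sketch of the difference, using the counts from druuna's sample output (6 and 13) as stand-in values. One further point worth noting: inside the single-quoted awk program, $val2 is not the shell variable but a field reference (field number val2), so the awk variables are written here without the dollar sign. The names late and early are made up for the example.

```shell
val2=6
sum=13

# Trailing assignments are applied only after BEGIN has run, so inside
# BEGIN both awk variables are still empty. The test on sum avoids the
# division by zero that the original command ran into.
late=$(awk 'BEGIN { print (sum ? val2*100/sum : "unset") }' val2="$val2" sum="$sum")

# With -v the variables are set before BEGIN executes
early=$(awk -v val2="$val2" -v sum="$sum" 'BEGIN { print val2*100/sum }')

echo "trailing assignment: $late"
echo "with -v:             $early"
```

The first command prints unset and the second prints 46.1538, which matches the percentage in druuna's sample output.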
 
Old 12-11-2012, 06:50 PM   #13
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
Thanks ntubski ... I was not aware of this variation
 
Old 12-31-2012, 04:56 AM   #14
shivaa
Senior Member
 
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,800

Original Poster
Blog Entries: 4

Rep: Reputation: 286
Many thanks @druuna & @grail!
Ciao!
 
  

