LinuxQuestions.org

Mike_V 11-23-2010 10:57 AM

find the total of numbers that are higher than x in a text file with numbers (using awk??)
 
Hi there,

I have a file: list.txt that contains this:

Code:

1
2
3
5
4
6
9
8
2
1
3
6
4
7
9

and I want to count how many numbers are higher than x. If x is 7, then the answer in my example above would be 3. FYI: My actual files have hundreds of values in one column and also contain decimals.

I can imagine that awk may do the trick, but could not find it in my regular references:
http://www.grymoire.com/Unix/Awk.html
http://www.gnu.org/software/gawk/manual/gawk.html

Your help is appreciated a lot!

dugan 11-23-2010 11:07 AM

Code:

awk '{if ($1>7) print $1}' input.txt | wc -l

Mike_V 11-23-2010 11:23 AM

Thanks dugan. It indeed works for the sample data.

There are two issues:

1.
If I try your line with 8-10 instead of 8-9, it doesn't work, even if I add 10 and 11 to my list. (Is it limited to one digit?)

2.
More importantly, as you also acknowledged, my real data is more complex.
Here is a sample of the real data:

Code:

0.0820013
0.0294894
0.0269461
0.0327966
0.0877525
0.0385039
0.0271613
0.0284816
0.0623967
0.0427087

and I would like to know how many are above, say, 0.05.

Thanks!

dugan 11-23-2010 11:26 AM

I edited my post to give you a better answer.

I don't mind providing a single line to help with homework, but may I ask what the real-world problem was?

The original solution, btw, was indeed limited to one digit integers:

Code:

cat input.txt | egrep '^[8-9]$' | wc -l

I recommend learning enough about regular expressions to understand why.
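
To see why the regex version stops at one digit: `[8-9]` matches exactly one character, and the `^` and `$` anchors require the whole line to be that one character, so `10` and `11` can never match. Below is a minimal sketch comparing the regex count with awk's numeric comparison. The sample values are made up for illustration and the variable names are arbitrary.

```shell
# Sample values (made up for illustration): two one-digit and two two-digit numbers.
sample='8
9
10
11'

# The regex sees text, not numbers: only the lines "8" and "9" match.
regex_count=$(printf '%s\n' "$sample" | grep -E -c '^[8-9]$')

# awk compares the first field numerically, so all four values exceed 7.
numeric_count=$(printf '%s\n' "$sample" | awk '$1 > 7 { n++ } END { print n }')

echo "regex: $regex_count, numeric: $numeric_count"
```

The numeric comparison also handles decimals without any change, which is why it suits the real data better than a character class.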

Mike_V 11-23-2010 12:01 PM

your edited first post indeed does the trick! Very nice. Thanks Dugan!

My real-world problem: people lie in an MRI scanner so that we can measure changes in neuronal activity over time. For 10 minutes their brain is measured every 2.5 seconds (one measurement is one 3D volume; there are 240 volumes, creating a 4D dataset). People are instructed to lie as still as possible, but they all move to some degree regardless. We perform rigid-body motion correction on each volume (fitting each volume to the first volume and storing the result as a new 4D dataset). The file above is the relative motion correction in 3D space (so in the x-y-z directions) in millimeters. By relative I mean the change from one volume to the next, not absolute (the change from the first volume). I want to know how many times a person has moved more than .5 mm. That's what your one-liner is going to do... I have to do this for a couple of hundred subjects. So my life just got a lot easier, thanks! I'm a psychologist working with brain data... in the process I've learned some programming, but I should indeed learn a bit more about regular expressions.
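
For the "couple of hundred subjects" part, one way to apply the same count to many files is a small shell loop. This is only a sketch under assumptions: the file names (`subj01.txt`, `subj02.txt`), the helper name `count_all`, and the sample values are all hypothetical, and the threshold is passed in as a parameter rather than hard-coded.

```shell
# Hypothetical layout: one motion file per subject, one value per line.
dir=$(mktemp -d)
printf '0.08\n0.02\n0.06\n' > "$dir/subj01.txt"
printf '0.01\n0.02\n' > "$dir/subj02.txt"

# Print "<file name> <count above threshold>" for every file given.
# Usage: count_all THRESHOLD FILE...
count_all() {
    thr=$1; shift
    for f in "$@"; do
        awk -v thr="$thr" -v name="$(basename "$f")" \
            'BEGIN { n = 0 } $1 > thr { n++ } END { print name, n }' "$f"
    done
}

count_all 0.05 "$dir"/subj*.txt
```

Starting the counter at zero in `BEGIN` means a subject with no values above the threshold still gets a line with a 0 rather than a blank.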

H_TeXMeX_H 11-23-2010 01:27 PM

You can also do it completely in awk:

Code:

awk '{ if ( $1 > 0.05 ) num++ }END{ print num }' test.txt

Awk is neat because it has C-like and sometimes C-compatible syntax (printf). Great for working with tables of data, and with floating point arithmetic.

Mike_V 11-23-2010 03:10 PM

H_TeXMeX_H: also big thanks! This is even easier to combine in an awk one-liner with some other stats that I need to extract.
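
As a sketch of what combining the count with other statistics in a single awk pass might look like: the thread does not say which statistics were needed, so the mean and maximum below are assumptions chosen for illustration. The data are the first five values of the decimal sample posted earlier.

```shell
# First five values of the decimal sample from the thread.
data='0.0820013
0.0294894
0.0269461
0.0327966
0.0877525'

# One pass over the input: count values above thr, and track sum and maximum.
stats=$(printf '%s\n' "$data" | awk -v thr=0.05 '
    BEGIN { n = 0 }
    NR == 1 || $1 > max { max = $1 + 0 }   # + 0 stores the value as a number
    $1 > thr { n++ }
    { sum += $1 }
    END { if (NR > 0) printf "count=%d mean=%.4f max=%.4f\n", n, sum / NR, max }
')
echo "$stats"
```

The `printf` call is the C-compatible part mentioned above: the format string works the same way as in C.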

grail 11-23-2010 05:29 PM

Probably not of great value here but you can also use awk to condense things like this (at the expense of readability):
Code:

awk '{c[($1>0.5)]++}END{print c[1]}' file
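
For anyone puzzling over the condensed version: in awk a comparison evaluates to 1 when true and 0 when false, so `c[($1>0.5)]++` tallies each line into either `c[1]` or `c[0]`. A small sketch with made-up values that prints both buckets:

```shell
# Made-up values: two are above 0.5, three are not.
values='0.7
0.2
0.9
0.1
0.5'

# ($1 > 0.5) is 1 or 0, so c[1] counts values above and c[0] counts the rest.
tally=$(printf '%s\n' "$values" | awk '
    { c[($1 > 0.5)]++ }
    END { print "above:", c[1], "not above:", c[0] }
')
echo "$tally"
```

Note that 0.5 itself lands in the "not above" bucket because the comparison is strict.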

Mike_V 11-23-2010 07:57 PM

One more additional question (and it's not crucial, but it would be nice to solve). If I run this one:

Code:

awk '{ if ( $1 > 0.05 ) num++ }END{ print num }' test.txt

and for one file there is not a single number larger than 0.05, the output will be empty (=nothing).

Is it easy to output a zero in that case (how do the "if then else" rules work in awk??)

barriehie 11-23-2010 09:20 PM

@Mike_V: With regard to learning a bit more about regular expressions (regexps), this got me started: http://www.regular-expressions.info/tutorial.html

grail 11-23-2010 09:23 PM

Code:

BEGIN{c=0}
or
Code:

END{if(c)print c; else print 0}
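
The two fragments above slot into the earlier one-liner roughly as follows. This sketch assumes `c` stands for the plain counter (called `num` in H_TeXMeX_H's version), not the array `c` from the condensed one-liner. The sample file is a throwaway with no value above 0.05. The third variant is not from the thread; adding 0 is a common awk idiom for forcing a numeric result.

```shell
# Throwaway sample file in which no value exceeds 0.05.
f=$(mktemp)
printf '0.01\n0.02\n0.03\n' > "$f"

# Variant 1: initialise the counter so END always has a number to print.
v1=$(awk 'BEGIN { c = 0 } { if ($1 > 0.05) c++ } END { print c }' "$f")

# Variant 2: leave the counter unset and decide in END what to print.
v2=$(awk '{ if ($1 > 0.05) c++ } END { if (c) print c; else print 0 }' "$f")

# Variant 3 (not from the thread): adding 0 turns an unset counter into 0.
v3=$(awk '{ if ($1 > 0.05) c++ } END { print c + 0 }' "$f")

echo "$v1 $v2 $v3"
rm -f "$f"
```

All three print 0 for this file, and print the ordinary count when at least one value exceeds the threshold.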

H_TeXMeX_H 11-24-2010 02:52 AM

Quote:

Originally Posted by Mike_V (Post 4168832)
One more additional question (and it's not crucial, but it would be nice to solve). If I run this one:

Code:

awk '{ if ( $1 > 0.05 ) num++ }END{ print num }' test.txt

and for one file there is not a single number larger than 0.05, the output will be empty (=nothing).

Is it easy to output a zero in that case (how do the "if then else" rules work in awk??)

It's true: in that case it outputs nothing.

Yes, C-like syntax = if else clauses:

Code:

awk '{ if ( $1 > 0.05 ) num++ }END{ if ( num ) print num; else print 0 }' test.txt

or, as grail suggests, you can initialize it yourself for safety (I usually do anyway to avoid stuff like this):

Code:

awk 'BEGIN{num=0}{ if ( $1 > 0.05 ) num++ }END{ print num }' test.txt

Mike_V 11-24-2010 09:51 AM

excellent! thanks again


All times are GMT -5.