11-23-2010, 10:57 AM
|
#1
|
Member
Registered: Apr 2009
Location: Boston MA
Distribution: CentOS 6.2 x86_64 GNU/Linux
Posts: 59
Rep:
|
find the total of numbers that are higher than x in a text file with numbers (using awk??)
Hi there,
I have a file: list.txt that contains this:
Code:
1
2
3
5
4
6
9
8
2
1
3
6
4
7
9
and I want to count how many numbers are higher than x. If x is 7, then the answer in my example above would be 3. FYI: my actual files have hundreds of values in one column and also contain decimals.
I can imagine that awk may do the trick, but could not find it in my regular references:
http://www.grymoire.com/Unix/Awk.html
http://www.gnu.org/software/gawk/manual/gawk.html
Your help is appreciated a lot!
Last edited by Mike_V; 11-23-2010 at 11:00 AM.
|
|
|
11-23-2010, 11:07 AM
|
#2
|
LQ Guru
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 11,352
|
Code:
awk '{if ($1>7) print $1}' input.txt | wc -l
Last edited by dugan; 11-23-2010 at 11:20 AM.
|
|
|
11-23-2010, 11:23 AM
|
#3
|
Member
Registered: Apr 2009
Location: Boston MA
Distribution: CentOS 6.2 x86_64 GNU/Linux
Posts: 59
Original Poster
Rep:
|
Thanks dugan. It indeed works for the sample data.
There are two issues:
1.
If I try your line with 8-10 instead of 8-9, it doesn't work, even if I add 10 and 11 to my list. (Is it limited to one digit?)
2.
More importantly, as you also acknowledged, my real data is more complex.
Here is a sample of the real data:
Code:
0.0820013
0.0294894
0.0269461
0.0327966
0.0877525
0.0385039
0.0271613
0.0284816
0.0623967
0.0427087
and I would like to know how many are above, say, 0.05.
Thanks!
Last edited by Mike_V; 11-23-2010 at 11:24 AM.
|
|
|
11-23-2010, 11:26 AM
|
#4
|
LQ Guru
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 11,352
|
I edited my post to give you a better answer.
I don't mind providing a single line to help with homework, but may I ask what the real-world problem was?
The original solution, btw, was indeed limited to one digit integers:
Code:
cat input.txt | egrep '^[8-9]$' | wc -l
I recommend learning enough about regular expressions to understand why.
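As a small illustration of the difference: the bracket expression `[8-9]` matches exactly one character, so `^[8-9]$` can only ever match the lines `8` and `9`, whereas awk compares the field as a number. The values and the temporary file below are made up for the example.

```shell
# Illustrative data: single digits, two-digit numbers, and a decimal.
tmp=$(mktemp)
printf '%s\n' 7 8 9 10 11 8.5 > "$tmp"

# Regex approach: [8-9] is a single-character class, so only "8" and "9" match.
regex_count=$(grep -c -E '^[8-9]$' "$tmp")

# Numeric approach: awk compares $1 as a number, so 10, 11 and 8.5 count too.
awk_count=$(awk '$1 > 7 { n++ } END { print n+0 }' "$tmp")

echo "regex: $regex_count  awk: $awk_count"
rm -f "$tmp"
```

On this data the regex finds 2 lines and the numeric comparison finds 5.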
Last edited by dugan; 11-23-2010 at 11:33 AM.
|
|
|
11-23-2010, 12:01 PM
|
#5
|
Member
Registered: Apr 2009
Location: Boston MA
Distribution: CentOS 6.2 x86_64 GNU/Linux
Posts: 59
Original Poster
Rep:
|
Your edited first post indeed does the trick! Very nice. Thanks, dugan!
My real-world problem: people lie in an MRI scanner so that we can measure changes in neuronal activity over time. For 10 minutes their brain is measured every 2.5 seconds (one measurement is called a volume, in 3D; there are 240 volumes, creating a 4D dataset). People are instructed to lie as still as possible, but they still move to some degree. We perform rigid-body motion correction on each volume (fitting each volume to the first volume and storing the result as a new 4D dataset).
The file above is the relative motion correction in 3D space (so in the x-y-z directions) in millimeters. By relative I mean the change from one volume to the next, not the absolute change from the first volume. I want to know how many times a person has moved more than .5 mm. That's what your one-liner is going to do. I have to do this for a couple of hundred subjects, so my life just got a lot easier, thanks! I'm a psychologist working with brain data. In the process I've learned some programming, but I should indeed learn a bit more about regular expressions.
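For a batch of subjects, the one-liner could be wrapped in a shell loop. This is only a sketch: the scratch directory, the file names (`sub01.txt`, `sub02.txt`), the values and the 0.5 threshold are placeholders, not the actual study files.

```shell
# Hypothetical layout: one motion file per subject in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 0.12 0.61 0.03 0.75 > "$dir/sub01.txt"
printf '%s\n' 0.10 0.20 0.30 > "$dir/sub02.txt"

# Print "file<TAB>count" for every subject; n+0 prints 0 when nothing matches.
report=$(
  for f in "$dir"/*.txt; do
    n=$(awk -v t=0.5 '$1 > t { n++ } END { print n+0 }' "$f")
    printf '%s\t%s\n' "$(basename "$f")" "$n"
  done
)
echo "$report"
rm -rf "$dir"
```

Passing the threshold with `-v t=...` keeps the awk program unchanged if the cutoff needs to differ between analyses.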
Last edited by Mike_V; 11-23-2010 at 12:06 PM.
|
|
|
11-23-2010, 01:27 PM
|
#6
|
LQ Guru
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
|
You can also do it completely in awk:
Code:
awk '{ if ( $1 > 0.05 ) num++ }END{ print num }' test.txt
Awk is neat because it has C-like, and sometimes C-compatible, syntax (e.g. printf). It is great for working with tables of data and with floating-point arithmetic.
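A rough sketch of the printf point, using the ten values posted earlier in the thread. The variable name `t` and the wording of the output line are arbitrary choices for the example.

```shell
# The ten sample values posted earlier in the thread.
tmp=$(mktemp)
printf '%s\n' 0.0820013 0.0294894 0.0269461 0.0327966 0.0877525 \
              0.0385039 0.0271613 0.0284816 0.0623967 0.0427087 > "$tmp"

# -v sets an awk variable from the shell; printf works much like C's printf.
summary=$(awk -v t=0.05 '$1 > t { n++ } END { printf "%d of %d values exceed %.2f\n", n, NR, t }' "$tmp")
echo "$summary"
rm -f "$tmp"
```

For this sample the line reads "3 of 10 values exceed 0.05".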
|
|
|
11-23-2010, 03:10 PM
|
#7
|
Member
Registered: Apr 2009
Location: Boston MA
Distribution: CentOS 6.2 x86_64 GNU/Linux
Posts: 59
Original Poster
Rep:
|
H_TeXMeX_H: also big thanks! This is even easier to combine in an awk one-liner with some other stats that I need to extract.
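As an illustration of combining several statistics in one pass, here is a minimal sketch. Which statistics are actually needed is not stated in the thread, so the count, maximum and mean below, as well as the data and the 0.5 threshold, are assumptions.

```shell
tmp=$(mktemp)
printf '%s\n' 0.10 0.70 0.40 0.80 > "$tmp"

# One pass: track the maximum, count values above t, and accumulate the sum.
stats=$(awk -v t=0.5 '
  NR == 1 || $1 > max { max = $1 }
  $1 > t              { n++ }
                      { sum += $1 }
  END { if (NR > 0) printf "count=%d max=%.2f mean=%.2f\n", n, max, sum / NR }
' "$tmp")
echo "$stats"
rm -f "$tmp"
```

The `NR > 0` guard avoids a division by zero on an empty file.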
|
|
|
11-23-2010, 05:29 PM
|
#8
|
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,037
|
Probably not of great value here but you can also use awk to condense things like this (at the expense of readability):
Code:
awk '{c[($1>0.5)]++}END{print c[1]}' file
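To unpack the condensed form: the comparison `($1>0.5)` evaluates to 1 when true and 0 when false, and that result is used as the array index, so `c[1]` counts the matches and `c[0]` counts everything else. A small sketch with made-up values:

```shell
tmp=$(mktemp)
printf '%s\n' 0.2 0.9 0.6 0.1 0.7 > "$tmp"

# ($1 > 0.5) is 1 when true and 0 when false, so it can index the array c.
counts=$(awk '{ c[($1 > 0.5)]++ } END { print c[1]+0, c[0]+0 }' "$tmp")
echo "$counts"   # above-threshold count, then the rest
rm -f "$tmp"
```

Adding `+0` makes an element that was never incremented print as 0 instead of an empty string.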
|
|
|
11-23-2010, 07:57 PM
|
#9
|
Member
Registered: Apr 2009
Location: Boston MA
Distribution: CentOS 6.2 x86_64 GNU/Linux
Posts: 59
Original Poster
Rep:
|
One more additional question (and it's not crucial, but it would be nice to solve). If I run this one:
Code:
awk '{ if ( $1 > 0.05 ) num++ }END{ print num }' test.txt
and for one file there is not a single number larger than 0.05, the output will be empty (=nothing).
Is it easy to output a zero in that case (how do the "if then else" rules work in awk??)
|
|
|
11-23-2010, 09:23 PM
|
#11
|
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,037
|
or
Code:
END{if(c)print c; else print 0}
Last edited by grail; 11-23-2010 at 09:25 PM.
|
|
|
11-24-2010, 02:52 AM
|
#12
|
LQ Guru
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
|
Quote:
Originally Posted by Mike_V
One more additional question (and it's not crucial, but it would be nice to solve). If I run this one:
Code:
awk '{ if ( $1 > 0.05 ) num++ }END{ print num }' test.txt
and for one file there is not a single number larger than 0.05, the output will be empty (=nothing).
Is it easy to output a zero in that case (how do the "if then else" rules work in awk??)
|
It's true: in that case it outputs nothing.
Yes, C-like syntax = if else clauses:
Code:
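# Test in END: an else branch in the main rule would reset num on every non-matching line.
awk '{ if ( $1 > 0.05 ) num++ }END{ if ( num ) print num; else print 0 }' test.txt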
or, as grail suggests, you can initialize it yourself for safety (I usually do anyway, to avoid problems like this):
Code:
awk 'BEGIN{num=0}{ if ( $1 > 0.05 ) num++ }END{ print num }' test.txt
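Another common idiom is to print `num+0`, which forces a numeric value. The sketch below compares the behaviours on a made-up file that has no value above the threshold; an awk variable that was never assigned is both the empty string and zero, which is why a plain `print num` shows nothing.

```shell
tmp=$(mktemp)
printf '%s\n' 0.01 0.02 0.03 > "$tmp"   # nothing above 0.05

# Uninitialized num prints as an empty string.
plain=$(awk '$1 > 0.05 { num++ } END { print num }' "$tmp")

# Adding 0 forces a numeric value, so an unset num prints as 0.
forced=$(awk '$1 > 0.05 { num++ } END { print num+0 }' "$tmp")

# Initializing in BEGIN has the same effect.
init=$(awk 'BEGIN { num = 0 } $1 > 0.05 { num++ } END { print num }' "$tmp")

echo "plain='$plain' forced='$forced' init='$init'"
rm -f "$tmp"
```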
Last edited by H_TeXMeX_H; 11-24-2010 at 02:55 AM.
|
|
|
11-24-2010, 09:51 AM
|
#13
|
Member
Registered: Apr 2009
Location: Boston MA
Distribution: CentOS 6.2 x86_64 GNU/Linux
Posts: 59
Original Poster
Rep:
|
excellent! thanks again
|
|
|
All times are GMT -5.