LinuxQuestions.org
Old 08-19-2013, 05:07 AM   #1
gav251
LQ Newbie
 
Registered: Feb 2010
Posts: 18

Rep: Reputation: 0
linear reg using awk script


Hi all,
I need an awk script that uses columns $1 (x) and $3 (y) from a file and prints the regression data to a new file. The following code handles the mathematics:

'# Usage: awk -f linreg.awk file
#

{ x[NR] = $1; y[NR] = $2;
sx += x[NR]; sy += y[NR];
sxx += x[NR]*x[NR];
sxy += x[NR]*y[NR];
}

END{
det = NR*sxx - sx*sx;
a = (NR*sxy - sx*sy)/det;
b = (-sx*sxy+sxx*sy)/det;
print a, b;
# for(i=1;i<=NR;i++) print x[i],a*x[i]+b;
}
'
gav.
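
For reference, the sums accumulated by the script correspond to the usual closed-form least-squares fit of a line y = a x + b. The symbols n (for NR) and D (for det) are notation introduced here for illustration; the right-hand sides follow the script term by term:

```latex
\[
\begin{aligned}
D &= n\sum_i x_i^2 - \Bigl(\sum_i x_i\Bigr)^2 \\
a &= \frac{n\sum_i x_i y_i - \sum_i x_i \sum_i y_i}{D} \qquad \text{(slope)} \\
b &= \frac{\sum_i x_i^2 \sum_i y_i - \sum_i x_i \sum_i x_i y_i}{D} \qquad \text{(intercept)}
\end{aligned}
\]
```

D is zero when fewer than two distinct x values are read, in which case a and b are undefined.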
 
Old 08-19-2013, 06:05 AM   #2
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Hi gav251,

As mentioned in one of your earlier threads: please post a relevant example of the input file and the expected output you require. Details are what we need/want.
 
Old 08-19-2013, 09:54 AM   #3
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
I am not sure I follow ... you say you want regression data and that the code you have presented produces the data ... what is left for us to answer?
 
Old 08-20-2013, 04:20 AM   #4
gav251
LQ Newbie
 
Registered: Feb 2010
Posts: 18

Original Poster
Rep: Reputation: 0
Hi,
running the script with awk -f gives
-nan -nan
The file looks like this:
1.00 1.61 0.476234
2.00 1.78 0.576613
3.00 1.81 0.593327
I need a file, reg.fil, that contains the slope and intercept.
gav.
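
A result of nan or -nan from these formulas points to 0/0, which some awk implementations print instead of aborting. That happens when det is zero, for instance when no lines were read or when $1 is the same (or empty) on every line. The following is only a sketch of a guarded variant, not the original script: the wrapper name linreg_safe and the error message are made up for illustration, and it reads $1 and $2 as the posted code does.

```shell
# linreg_safe [FILE...]: hypothetical wrapper around the posted formulas
# that refuses to divide when the determinant is zero.
linreg_safe() {
  awk '
    # Accumulate the sums over lines that have at least two fields.
    NF >= 2 { n++; sx += $1; sy += $2; sxx += $1*$1; sxy += $1*$2 }
    END {
      det = n*sxx - sx*sx
      if (n < 2 || det == 0) {
        print "linreg: need at least two rows with distinct x values (n=" (n+0) ")" > "/dev/stderr"
        exit 1
      }
      # Slope first, then intercept, as in the posted script.
      print (n*sxy - sx*sy)/det, (sxx*sy - sx*sxy)/det
    }' "$@"
}

# The three sample lines from this post, read from standard input.
printf '1.00 1.61 0.476234\n2.00 1.78 0.576613\n3.00 1.81 0.593327\n' | linreg_safe
# prints: 0.1 1.53333
```

If the guard fires on a real file, the message shows how many usable rows were counted, which helps tell an empty or misnamed input apart from a column problem.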
 
Old 08-20-2013, 04:43 AM   #5
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Rep: Reputation: 783
Not sure I understand what you want


This script ( LinerReg.awk )
Code:
#!/usr/bin/awk -f

{ x[NR] = $1; y[NR] = $2;
sx += x[NR]; sy += y[NR];
sxx += x[NR]*x[NR];
sxy += x[NR]*y[NR];
}

END{
det = NR*sxx - sx*sx;
a = (NR*sxy - sx*sy)/det;
b = (-sx*sxy+sxx*sy)/det;
print a, b;
# for(i=1;i<=NR;i++) print x[i],a*x[i]+b;
}

your input ( Input.file )
Code:
1.00 1.61 0.476234
2.00 1.78 0.576613
3.00 1.81 0.593327
Code:
./LinerReg.awk Input.file
gives
Code:
0.1 1.53333
if you want to redirect that to a file,
Code:
./LinerReg.awk Input.file > Output.file
 
Old 08-20-2013, 04:51 PM   #6
PTrenholme
Senior Member
 
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,187

Rep: Reputation: 354
Perhaps you wanted something like this (note the second END section; awk runs multiple END sections in the order they appear):
Code:
#!/bin/gawk -f

{ x[NR] = $1 ; y[NR] = $2;
  sx += x[NR]; sy += y[NR];
  sxx += x[NR]*x[NR];
  sxy += x[NR]*y[NR];
}

END{
  det = NR*sxx - sx*sx;
  a = (NR*sxy - sx*sy)/det;
  b = (-sx*sxy+sxx*sy)/det;
  printf("Slope=%5.3f, Intercept=%5.3f\n",a, b);
}
END{
  printf("\nX\tApprox\tError\n")
  for(i=1;i<=NR;i++) {
    estimate=a*x[i]+b;
    printf("%5.3f\t%5.3f\t%5.3f\n",x[i],estimate,estimate-y[i]);
  }
}
For your sample data, this is the output:
Code:
$ ./gav251.gawk data 
Slope=0.100, Intercept=1.533

X       Approx  Error
1.000   1.633   0.023
2.000   1.733   -0.047
3.000   1.833   0.023
 
Old 08-21-2013, 04:20 AM   #7
gav251
LQ Newbie
 
Registered: Feb 2010
Posts: 18

Original Poster
Rep: Reputation: 0
Hi PTrenholme,
this is perfect, but I keep getting a 'file not found' error even though I have gawk installed and the permissions are OK.
Firerat, yours is also perfect, but how do I change it to use $1 vs $3?
Many, many thanks,
Gav
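
Switching columns comes down to changing which field is read into y. One way to avoid editing the script each time is to pass the column numbers in as awk variables with -v. This is a minimal sketch under that assumption; the names linreg_cols, xc and yc are made up for this illustration and appear nowhere else in the thread.

```shell
# linreg_cols XCOL YCOL [FILE...]: hypothetical wrapper that fits
# y = a*x + b using the given column numbers for x and y.
linreg_cols() {
  xc=$1 yc=$2
  shift 2
  awk -v xc="$xc" -v yc="$yc" '
    # $xc and $yc are the fields whose numbers were passed with -v.
    { x = $xc; y = $yc; sx += x; sy += y; sxx += x*x; sxy += x*y }
    END {
      det = NR*sxx - sx*sx
      print (NR*sxy - sx*sy)/det, (sxx*sy - sx*sxy)/det
    }' "$@"
}

# Columns 1 and 3 of the three-line sample posted earlier in the thread.
printf '1.00 1.61 0.476234\n2.00 1.78 0.576613\n3.00 1.81 0.593327\n' | linreg_cols 1 3
# prints: 0.0585465 0.431632
```

With a file on disk the call would be linreg_cols 1 3 yourfile, and linreg_cols 1 2 yourfile reproduces the original $1/$2 behaviour.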
 
Old 08-21-2013, 04:39 AM   #8
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Rep: Reputation: 783
Sorry, I don't understand your question.

Regarding 'file not found':

PTrenholme 'made up' the file name "gav251.gawk". Substitute the name of your own gawk script (including the path).
 
Old 08-21-2013, 08:57 AM   #9
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
In addition to the suggestion above, the same goes for the file you are passing to the awk script.
 
Old 08-21-2013, 09:21 AM   #10
PTrenholme
Senior Member
 
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,187

Rep: Reputation: 354
From your question I assume that you've not been using Linux very long. So here's a "step-by-step" description of how I wrote the script, prepared the data file, and tested the script. (In what follows, everything from the sharp (#) onward is a comment, not something you need to enter.)

All of what follows is assumed to take place in a terminal window. Before you start, go to the post above where the script is displayed, highlight the code inside the code block, and copy it to your clipboard. Then open a text editor, paste the code into the editor, and save it in some folder. (I used the kate editor and saved the script as ~/tmp/awk/gav251/gav251.gawk. I had, of course, created that directory [mkdir -p ~/tmp/awk/gav251/] before I saved anything there.) Most editors will save files in ~/Documents by default. (kate is a KDE programming editor, but for this a simple "Notepad" clone, e.g. leafpad, would suffice. Do not use a word processor: they add "markup" characters that really mess up the script file.)

Oh, in the following, the $ and > are the bash input prompts which should be automatically displayed.
Code:
$ cd ~/tmp/awk/gav251/ # Change your working directory to the folder in which the script was saved.
$ chmod +x gav251.gawk # Make the script executable
$ cat <<EOF >data      # You may already have your data file. This is just an example. The name, "data," is arbitrary.
> 1.00 1.61 0.476234
> 2.00 1.78 0.576613
> 3.00 1.81 0.593327
> EOF
$ ./gav251.gawk  data  # And run the script using the new data file
Note that the file names are quite arbitrary. You're free to use anything you want. (Of course, short names are easier to type, and names containing special characters need to be quoted when used.)

Oh, to make a change in the script, just open it in the text editor, make the change, and save it. If you use that same name when you save it, you won't need to run the chmod again, but doing so will do no harm.
 
Old 08-22-2013, 04:28 AM   #11
gav251
LQ Newbie
 
Registered: Feb 2010
Posts: 18

Original Poster
Rep: Reputation: 0
Hi
Firerat
if I change $2 to $3 on line 3, the output is
0 0
I need to use columns 1 and 3 from my data file.

Hi PTrenholme
Sorry, initially I missed the '.' in the filename, so I got the 'file not found' error. Now the script is 777 and I get:
./s2.gawk: 4: ./s2.gawk: x[NR]: Permission denied
./s2.gawk: 4: ./s2.gawk: y[NR]: Permission denied
./s2.gawk: 5: ./s2.gawk: sx: Permission denied
./s2.gawk: 5: ./s2.gawk: sy: Permission denied
./s2.gawk: 6: ./s2.gawk: sxx: Permission denied
Cannot open load file '+='
line 0: util.c: No such file or directory

./s2.gawk: 10: ./s2.gawk: END{: Permission denied
./s2.gawk: 11: ./s2.gawk: det: Permission denied
./s2.gawk: 12: ./s2.gawk: Syntax error: "(" unexpected
I know this is 99% done.
Gav.
 
Old 08-22-2013, 04:57 AM   #12
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
I think we need to back up a bit here, as the error messages are becoming more misleading.

Using code tags around everything you paste in, please provide the following:
Code:
$ cat data.file
<output here>
$ cat s2.gawk
<output here>
$ ls -l data.file s2.gawk
<output here>
$ gawk --version
<output here>
$ ./s2.gawk data.file
<output here>
Please replace all <output here> with your actual output
 
Old 08-22-2013, 05:01 AM   #13
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Rep: Reputation: 783
I get
Code:
0.0585465 0.431632
As for all those 'Permission denied' errors:
you are missing the shebang.

Code:
#!/usr/bin/gawk -f

{ x[NR] = $1 ; y[NR] = $2;
  sx += x[
confirm the path to gawk with
Code:
which gawk
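
Messages of the form "./s2.gawk: 4: ... Permission denied" are typical of a shell trying to run the awk lines as commands, which is what happens when the kernel finds no interpreter line. The #! must be the very first two characters of the file; a blank line or stray text above it is enough to break it. Whether that is what happened to s2.gawk is an assumption, since its contents were not posted. A small sketch of a check, with has_shebang being a made-up helper name:

```shell
# has_shebang FILE: succeed if the first line of FILE starts with "#!".
# (has_shebang is a hypothetical helper name, not a standard command.)
has_shebang() {
  first=
  IFS= read -r first < "$1"
  case $first in
    '#!'*) return 0 ;;
    *)     return 1 ;;
  esac
}

# Demonstration on two throwaway files: one correct, one with a blank
# line above the interpreter line.
tmp=$(mktemp -d)
printf '%s\n' '#!/usr/bin/awk -f' 'END { print NR }' > "$tmp/good.awk"
printf '%s\n' '' '#!/usr/bin/awk -f' 'END { print NR }' > "$tmp/bad.awk"

has_shebang "$tmp/good.awk" && echo "good.awk: shebang on line 1"
has_shebang "$tmp/bad.awk"  || echo "bad.awk: no shebang on line 1"

# Passing the script with -f works either way, because awk treats
# a line starting with # as a comment.
printf 'a\nb\nc\n' | awk -f "$tmp/bad.awk"    # prints 3
rm -r "$tmp"
```

Running the script as awk -f s2.gawk data.file (or gawk -f ...) sidesteps the interpreter line entirely, which makes it a quick way to tell a shebang problem from a problem in the script itself.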
 
Old 08-23-2013, 05:10 AM   #14
gav251
LQ Newbie
 
Registered: Feb 2010
Posts: 18

Original Poster
Rep: Reputation: 0
Hi
It is awk that I am unfamiliar with, since I am a biophysicist, not a programmer; I find it a very useful UNIX tool. Please don't worry about general admin (I have been using UNIX since '93). When I said 'almost there', it was because the Firerat script works fine; it is when I put $3 in line 3 that I get
linreg.awk rmsd.tab
0 0

I don't see how to fix this. How do I modify this

#!/usr/bin/awk -f

{ x[NR] = $1; y[NR] = $3;
sx += x[NR]; sy += y[NR];
sxx += x[NR]*x[NR];
sxy += x[NR]*y[NR];
}

END{
det = NR*sxx - sx*sx;
a = (NR*sxy - sx*sy)/det;
b = (-sx*sxy+sxx*sy)/det;
print a, b;
# for(i=1;i<=NR;i++) print x[i],a*x[i]+b;
}

then you guys are rid of me..
Gav.
 
Old 08-23-2013, 05:39 AM   #15
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Rep: Reputation: 783
Firstly, it was your script; I just added the shebang (the #! bit).
Second, [code]your code here[/code] makes things much easier to read (we are 'coders', we care about that).

Third, I copied and pasted your code and it worked just fine. I even switched it to nawk: still fine.

So the only thing I can think of right now is that your input is different.

I've tried to 'pollute' mine, but have not succeeded.

can you post the output of
Code:
cat -A rmsd.tab
Warning:


[code]
your output here
[/code]


if it is very long..
Code:
cat -A rmsd.tab | head
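
In the cat -A output, GNU cat marks each line end with $, tabs with ^I and carriage returns with ^M. An output of 0 0 from the script suggests that sy and sxy both stayed at zero, that is, $3 evaluated to 0 on every line. One possible cause is that awk sees fewer than three fields per line; that is an assumption about rmsd.tab, not something confirmed here. A quick way to look at what awk itself sees, with show_fields being a made-up helper name:

```shell
# show_fields [FILE...]: hypothetical helper that prints, for each line,
# the field count and the first and third fields in brackets.
show_fields() {
  awk '{ printf "%d: NF=%d x=[%s] y=[%s]\n", NR, NF, $1, $3 }' "$@"
}

# A three-column line and a two-column line, for comparison.
printf '1.00 1.61 0.476234\n2.00 0.576613\n' | show_fields
# 1: NF=3 x=[1.00] y=[0.476234]
# 2: NF=2 x=[2.00] y=[]
```

An empty y=[] on every line of the real file would account for the 0 0 result; show_fields rmsd.tab | head limits the output for a long file.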

Last edited by Firerat; 08-23-2013 at 05:40 AM.
 
  

