Hi,
I am trying to calculate the standard deviation of a vector.
I wrote a short R script that reads a vector from a file and calculates its sd.
Code:
#!/usr/bin/env Rscript
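# Read the input file name from the command line
args <- commandArgs(TRUE)
openfile <- args[1]
# Load the file and flatten it into a numeric vector
md <- read.table(openfile)
x <- as.numeric(unlist(md))
# Print the standard deviation
sd(x)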
I run it from the terminal like this:
./sd.script.R vec
The example vector is this:
1404208
1470129
1384566
1572675
1450707
1410318
1458955
1462355
1469413
1467187
The output is this:
51702.08
I get the same value from Excel's STDEV function.
I know that STDEV in Excel is based on a sample.
But I didn't find anything about sampling in the R description of sd
(http://stat.ethz.ch/R-manual/R-patch...s/html/sd.html).
Anyway, I did a lot of analysis using the R function, and now I want to change my script to work only with bash tools while getting the same results.
I found:
Code:
awk '{sum+=$1; sumsq+=$1*$1}END{print sqrt(sumsq/NR - (sum/NR)**2)}' vec
and
Code:
awk '{sum+=$1; array[NR]=$1} END {for(x=1;x<=NR;x++){sumsq+=((array[x]-(sum/NR))**2);}print sqrt(sumsq/NR)}' vec
and many others.
But they all calculate the sd differently from R. Their output is:
49048.9
I guess the result differs because these commands do not treat the data as a sample the way R does.
That is a big difference for my analysis.
Any idea how to get the same results?
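For reference, the two numbers above differ by exactly the factor sqrt(n/(n-1)): 49048.9 * sqrt(10/9) is about 51702.08. That fits the awk commands dividing by n (population sd), while the R help for sd states that it uses the denominator n - 1 (sample sd). Below is a minimal sketch of an awk version that divides by n - 1. The function name sample_sd is hypothetical, and it assumes one numeric value in the first column of each line.

```shell
#!/bin/sh
# sample_sd (hypothetical name): sample standard deviation of column 1,
# using the n - 1 denominator. Reads the given files, or stdin if none.
sample_sd() {
    awk '
        NF { n++; v[n] = $1; sum += $1 }      # skip blank lines, keep values
        END {
            if (n < 2) { print "NA"; exit }   # undefined for fewer than 2 values
            mean = sum / n
            for (i = 1; i <= n; i++) {        # second pass over stored values
                d = v[i] - mean
                ss += d * d
            }
            printf "%.2f\n", sqrt(ss / (n - 1))
        }
    ' "$@"
}
```

Calling sample_sd vec on the example vector should print 51702.08. The sketch sums squared deviations from the mean instead of using sumsq/NR - mean^2, since that one-pass form can lose precision with values this large. With 600 values the two denominators differ only by a factor of sqrt(600/599), roughly 1.0008.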
BTW, the vector I showed is a smaller version of what I actually have. My real vectors have 600+ values.
Thanks!