Hi,

I am trying to calculate the standard deviation of a vector.

I wrote a short R script that reads a vector from a file and calculates its sd.

Code:

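#!/usr/bin/env Rscript
# Usage: ./sd.script.R FILE  (prints the standard deviation of the values in FILE)
args <- commandArgs(TRUE)          # trailing command-line arguments only
openfile <- args[1]
md <- read.table(openfile)         # one value per line; blank lines are skipped
x <- as.numeric(unlist(md))        # flatten the data frame to a numeric vector
sd(x)                              # sample sd: denominator n - 1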

I run it from the terminal like this:

./sd.script.R vec

The example vector (the file vec, one value per line) is:

1404208
1470129
1384566
1572675
1450707
1410318
1458955
1462355
1469413
1467187

The output is this:

51702.08

I get the same result with Excel's STDEV function.

I know that Excel's STDEV is based on a sample, but I didn't find anything about sampling in R's description of sd (http://stat.ethz.ch/R-manual/R-patch...s/html/sd.html).
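For reference, the two usual definitions differ only in the denominator. The sample standard deviation s divides by n - 1; this is what Excel's STDEV computes, and R's sd help page states that, like var, it uses denominator n - 1. The population standard deviation σ divides by n (Excel's STDEVP).

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2}, \qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2}
```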

Anyway, I've done a lot of analysis with the R function, and now I want to change my script to use only bash tools but still get the same results.

I found:

Code:

awk '{sum+=$1; sumsq+=$1*$1}END{print sqrt(sumsq/NR - (sum/NR)^2)}' vec

and

Code:

awk '{sum+=$1; array[NR]=$1} END {for(x=1;x<=NR;x++){sumsq+=((array[x]-(sum/NR))^2);}print sqrt(sumsq/NR)}' vec

and many others.

But they all calculate the sd differently from R; their output is:

49048.9

I guess the result is different because these don't treat the data as a sample the way R does.
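A quick check with the numbers in this post (n = 10) suggests the denominator accounts for the whole gap: since s² = SS/(n - 1) and σ² = SS/n for the same sum of squared deviations SS, the two differ by a factor of sqrt(n/(n - 1)).

```latex
s = \sigma\sqrt{\frac{n}{n-1}}
\quad\Longrightarrow\quad
49048.90 \times \sqrt{\tfrac{10}{9}} \approx 49048.90 \times 1.0540926 \approx 51702.08
```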

And that is a big difference for my analysis.
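A minimal sketch, assuming the denominator is the only difference: R's sd() divides by n - 1, while both awk snippets above divide by NR. The function name sample_sd is hypothetical. It also skips blank lines, as read.table does (awk's NR would otherwise count them), and prints 7 significant digits, R's default, instead of awk's default of 6.

```shell
# Hypothetical helper: sample standard deviation (denominator n - 1),
# the same definition R's sd() uses. Reads the given file(s), or stdin.
sample_sd() {
  awk '
    NF { n++; sum += $1; val[n] = $1 }     # skip blank lines, like read.table
    END {
      if (n < 2) { print "NA"; exit }      # R sd() also gives NA for one value
      mean = sum / n
      for (i = 1; i <= n; i++) ss += (val[i] - mean) ^ 2
      printf "%.7g\n", sqrt(ss / (n - 1))  # n - 1 instead of NR; 7 digits like R
    }' "$@"
}
```

Usage: sample_sd vec. For the 10-value example vector this prints 51702.08, matching the R output.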

Any idea how to get the same results?

BTW, the vector I showed is a smaller version of what I actually have; my real vectors have 600+ values.
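With 600+ values the n vs n - 1 gap shrinks (sqrt(600/599) is about 1.0008, roughly 0.08%), but it does not vanish, so matching R exactly still needs n - 1. If a single pass that does not keep the values in memory is preferred, Welford's online update is one common option; it is swapped in here as an alternative technique, not something from the snippets above. A sketch with the hypothetical function name welford_sd:

```shell
# Hypothetical helper: one-pass sample sd (denominator n - 1) using
# Welford's online update; the values are never stored.
welford_sd() {
  awk '
    NF {                            # skip blank lines, like read.table
      n++
      delta = $1 - mean             # distance from the old running mean
      mean += delta / n             # update the running mean
      m2 += delta * ($1 - mean)     # accumulate squared deviations
    }
    END {
      if (n < 2) { print "NA"; exit }
      printf "%.7g\n", sqrt(m2 / (n - 1))   # n - 1 denominator, as in R sd()
    }' "$@"
}
```

Usage: welford_sd vec. Updating the mean incrementally avoids subtracting two large sums (sumsq and sum²/n), which is where the one-pass sum-of-squares formula can lose precision on large values.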

Thanks!