LinuxQuestions.org Programming forum: Need computationally fast approximation to standard deviation.
11-21-2008, 12:51 PM   #2
pixellany
LQ Veteran

Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Rep:
Wow!! Quite a screenful. I have essentially no experience with finding optimum algorithms for computer programming. All I can say is that the formula for standard deviation is pretty simple. It's not obvious why it would represent a huge processing load. If speed is an issue, my instinct is that you would want to keep a running tally of Summation(X^2) and then take the root when you are done. Sorry--I don't have a "sigma" key.
11-21-2008, 01:41 PM   #3
H_TeXMeX_H
LQ Guru

Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep:
There may be some hints here: http://en.wikipedia.org/wiki/Algorit...ating_variance

Note: variance = stdev * stdev

Also, if you know assembly, that would help.
11-21-2008, 02:02 PM   #4
jiml8
Senior Member

Registered: Sep 2003
Posts: 3,171

Original Poster
Rep:
Quote:
 Originally Posted by pixellany Wow!! Quite a screenful. I have essentially no experience with finding optimum algorithms for computer programming. All I can say is that the formula for standard deviation is pretty simple. It's not obvious why it would represent a huge processing load. If speed is an issue, my instinct is that you would want to keep a running tally of Summation(X^2) and then take the root when you are done. Sorry--I don't have a "sigma" key.
Yeah, I have been thinking about exactly that. In fact, when I compute the RMS power from the I and Q components, I wind up with those squares - and I take the square root to return the power.

But I then turn around and do a lot of filtering before I do my detection work. I have been looking at the implications of filtering power squared rather than power and I have to do some math to see what it all means.

It does appear to make sense to not take the square root, filter the hell out of it, then square it again, sum it, and square root it again...if I can make it work and show how it is OK. My client is good at this; I'll have to be able to justify it.

I am just wondering if there is a quick algorithm - something that gets me away from those square root computations.

11-21-2008, 02:31 PM   #5
jiml8
Senior Member

Registered: Sep 2003
Posts: 3,171

Original Poster
Rep:
Quote:
 Originally Posted by H_TeXMeX_H There may be some hints here: http://en.wikipedia.org/wiki/Algorit...ating_variance note: variance = stdev * stdev Also, if you know assembly, that would help.
That page does have some interesting ideas...

And I do know assembly, but will avoid it in this application if at all possible. We do have to upgrade these systems in the future, and assembly is very non-portable.

11-21-2008, 04:06 PM   #6
H_TeXMeX_H
LQ Guru

Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep:
You don't have to use assembly, but it might be faster if you know what you're doing. If it must be portable, though, don't use it.
11-21-2008, 04:45 PM   #7
jlinkels
Senior Member

Registered: Oct 2003
Location: Bonaire
Distribution: Debian Wheezy/Jessie/Stretch/Sid, Linux Mint DE
Posts: 4,631

Rep:
Splendid idea to use the statistical value of the amplitude and see if it changes over time. If your problem is the squaring and square-rooting, why not use lookup tables for those functions? If you need more accuracy, you could even linearly interpolate between table entries. Much more accuracy than that you would probably not need.

jlinkels
11-21-2008, 05:54 PM   #8
salasi
Senior Member

Registered: Jul 2007
Location: Directly above centre of the earth, UK
Distribution: SuSE, plus some hopping
Posts: 4,064

Rep:
some random-ish thoughts...

Quote:
 Originally Posted by jiml8 I am particularly interested in the case where a broadband carrier is present, and a narrowband carrier appears in part of the spectrum occupied by the broadband carrier.
I don't understand what you mean by a broadband carrier; a carrier is inherently a narrowband signal*, except for spread spectrum, for which you have either a simultaneous array of narrowband signals or a succession of narrowband signals, depending.

(* This comment is incorrect, if you include flaws in the transmission equipment; probably 'should be' should replace 'is')

Quote:
 Now, the broadband carrier commonly does not have a uniform spectral density (the exact spectral distribution depends on the type of modulation and the type of data it carries), so its amplitude is not constant across the envelope. I have observed variances of as much as 8 dB from peak to valley in one carrier that I have been observing.
OK, you seem to be using the word 'carrier' in a non-standard way to mean the whole signal, and not just the non-encoding part of the signal; is that the cause of my confusion? Could you also say what the modulation scheme is, please?
Quote:
 The appearance of a narrowband carrier within the broadband envelope should be represented as a +3dB shift in the power amplitude within that band. Thus, should a narrowband carrier appear in what is a valley in the broadband carrier, the +3dB shift might not bring the resultant power up to the level of higher power density components of the broadband carrier, and I therefore will have to do some work to detect it.
Could you take moving averages (integer numbers, for preference) for each bin, and when the ratios (to previous values? to the rest of the values?) of certain bins suddenly pop up, would that be sufficiently reliable, or would it be too noise prone?

Quote:
 Briefly, an L-band tuner feeds I and Q signals to a digital signal processor which digitizes the data then performs an FFT. The output of the FFT is fed across the PCI bus to the host computer, which is running Linux. I do not have control of the algorithms on this DSP, though I have made a proposal to my client which would have him using a product of ours (mine and an associate) where we COULD move more processing into the DSP. But for now I am stuck.
Not immediately relevant, but does the FFT include any windowing (eg, hamming)?

Quote:
 At the present time, I am getting one complete FFT every 5 ms, but my computation overhead means that I am spending 45 ms on each pass through my processing loop. I would like to speed this up, and a faster processor isn't an option at this point because this is a drop-in second generation code for a deployed system.
The obvious suggestion to speed up the floating point math part on a PC is not to do the floating point math part on a PC!

It will be faster to do the math in some fixed point format; floating point is a whole lot more forgiving when it comes to dynamic range issues, and avoiding those issues with fixed point takes real care.

This may not be a big issue, if you are dealing with a ratio; if you scale the numbers so that the main signal is somewhere in the middle of the dynamic range (approximately), if you do get something that overloads, that is a bad sign...but is that bad sign necessarily the bad sign that you are looking for, or could it be some other effect (eg, broadband interference)?

Quote:
 The problem is that a standard deviation is computationally VERY expensive.
Naturally. But remember, it has been your choice to attack this problem via standard deviations. You could consider a simple approximation to the square roots, but I'd guess that this won't help as much these days as it once did (floating point being a lot faster than it once was). You'd only know the answer definitively if you tried it.
Quote:
 I can ease the pain by storing intermediate results in RAM (even though this will cost a LOT of RAM),
I would have expected that this was the only workable possibility (storing anything to disk at every sample is going to be a disaster), and it needs the amount of RAM that it needs... Of course, integer variables would take up less space than floating point numbers.
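One instance of the "simple approximation to the square roots" salasi mentions: halve the float's exponent bits for a rough first guess, then refine with Newton iterations. The constant 0x1fbd1df5 is the well-known bias value for this bit trick; the function name is invented here, and whether this beats the FPU on a given machine is exactly the sort of thing that would have to be measured.

```c
#include <stdint.h>
#include <string.h>

static float approx_sqrt(float x)
{
    uint32_t i;
    float y;

    memcpy(&i, &x, sizeof i);     /* reinterpret the bits without UB */
    i = (i >> 1) + 0x1fbd1df5;    /* crude guess: exponent roughly halved */
    memcpy(&y, &i, sizeof y);

    y = 0.5f * (y + x / y);       /* Newton step: y -> (y + x/y)/2 */
    y = 0.5f * (y + x / y);       /* second step for near single precision */
    return y;
}
```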

11-21-2008, 07:19 PM   #9
ta0kira
Senior Member

Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Rep:
If you actually use σ then you must have some sort of finite interval, i.e. you can't expand your interval to infinity. That implies that older data should somehow be less relevant; therefore, maybe you can keep two running pseudo-statistics based on an exponential decay function. The first thing that comes to mind is something I use to detect departures from consistent activity (though not for signals or hardware), which essentially comes down to this:
Code:
I = e^(-Ct)

I ≡ importance
C ≡ a constant
t ≡ elapsed time
Using something like that (probably with the Δt since the last update), you can find something comparable to a time-weighted average with:
Code:
x̄new = x̄old·I + xnew·(1 - I)
As far as a standard deviation, you might perform that in the same manner with something like this:
Code:
σnew² = σold²·I + (x - x̄)²·(1 - I)
ta0kira

Last edited by ta0kira; 11-22-2008 at 03:54 PM.
11-23-2008, 05:41 PM   #10
jiml8
Senior Member

Registered: Sep 2003
Posts: 3,171

Original Poster
Rep:
Lookup tables?

Quote:
 Originally Posted by jlinkels Splendid idea to use the statistical value of the amplitude and see if it changes over time. If your problem is the squaring and square-rooting, why not use lookup tables for those functions? If you need more accuracy, you could even linearly interpolate between table entries. Much more accuracy than that you would probably not need. jlinkels
I had given no thought at all to lookup tables. The idea never occurred to me.

How delightfully primitive!

However, I could generate it at initialization time, and if I want square roots covering -30.0 dB squared to -90 dB squared at 0.1 dB intervals, that would only take about 300 KB of 32-bit numbers. It would save a lot of processing time...

I am going to play with this idea a bit...

11-23-2008, 06:48 PM   #11
jiml8
Senior Member

Registered: Sep 2003
Posts: 3,171

Original Poster
Rep:
A lookup table turns out to be very simple to generate. The program below is a standalone that works. I will integrate the two subroutines into my code and do some timing experiments a bit later on. This looks like it will be pretty fast, though; actually, I can't see any approach being faster, and the accuracy appears to be more than I really need.

To play with this, compile it using the -lm option to gcc, then specify a value between 900 and 8100 and it will give the square root. Compare with a calculator and see that this is pretty good.
Code:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

float *create_sq_rt(void);
float get_sqrt(float *, double);

int main(int argc, char *argv[])
{
    float *sqrttable, ans;
    double useval;

    if (argc > 1) {
        useval = atof(argv[1]);
        if (useval >= 900 && useval <= 8100) {
            sqrttable = create_sq_rt();
            ans = get_sqrt(sqrttable, useval);
            printf("answer is %f\n", ans);
            if (sqrttable)
                free(sqrttable);
        }
    }
    return 0;
}

float *create_sq_rt(void)
{
    float *sqrttable;
    int idx;

    /* range is 30^2 (900) to 90^2 (8100) by tenths.
     * Since we want integer indexes, multiply by 10: 900.0 maps to 9000,
     * 900.1 to 9001, etc.  To start the array index at 0, subtract 9000.
     * 72001 entries so that 8100.0 itself is in range; an integer loop
     * avoids the drift of repeatedly adding 0.1 to a double. */
    sqrttable = malloc(72001 * sizeof(float));
    for (idx = 0; idx <= 72000; idx++)
        sqrttable[idx] = (float)sqrt((9000 + idx) / 10.0);
    return sqrttable;
}

float get_sqrt(float *sqrttable, double square)
{
    /* round to the nearest tenth rather than truncating */
    int sqint = (int)((square * 10) - 9000 + 0.5);
    return sqrttable[sqint];
}
11-23-2008, 07:01 PM   #12
Quakeboy02
Senior Member

Registered: Nov 2006
Distribution: Debian Squeeze 2.6.32.9 SMP AMD64
Posts: 3,318

Rep:
Jim, would it be faster to convert everything up front to integers to allow you to use solely integer math after the conversion process?
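If the samples were rescaled to integers up front, even the square root could stay integer: this is the classic bit-by-bit integer square root, needing no FPU and no libm. It is shown here as a general illustration of the integer-math route, not something from the thread; whether the up-front scaling pays for itself depends on the dynamic range the fixed-point representation must cover.

```c
#include <stdint.h>

/* Returns floor(sqrt(x)) using only shifts, adds, and compares. */
static uint32_t isqrt32(uint32_t x)
{
    uint32_t res = 0;
    uint32_t bit = 1u << 30;   /* highest power of four in 32 bits */

    while (bit > x)            /* start at the largest useful bit */
        bit >>= 2;

    while (bit != 0) {
        if (x >= res + bit) {
            x   -= res + bit;
            res  = (res >> 1) + bit;
        } else {
            res >>= 1;
        }
        bit >>= 2;
    }
    return res;
}
```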
11-23-2008, 07:04 PM   #13
jiml8
Senior Member

Registered: Sep 2003
Posts: 3,171

Original Poster
Rep:
Quote:
 Originally Posted by salasi some random-ish thoughts... I don't understand what you mean by a broadband carrier; a carrier is inherently a narrowband signal*, except for spread spectrum, for which you have either a simultaneous array of narrowband signals or a succession of narrowband signals, depending. (* This comment is incorrect, if you include flaws in the transmission equipment; probably 'should be' should replace 'is') OK, you seem to be using the word 'carrier' in a non-standard way to mean the whole signal, and not just the non-encoding part of the signal; is that the cause of my confusion? Could you also say what the modulation scheme is, please?
Yes, I am following my client's somewhat sloppy use of the language here.

Quote:
 Could you take moving averages (integer numbers, for preference) for each bin, and when the ratios (to previous values? to the rest of the values?) of certain bins suddenly pop up, would that be sufficiently reliable, or would it be too noise prone?
I don't see any way ratios help here when I have to look at absolute power.

Quote:
 Not immediately relevant, but does the FFT include any windowing (eg, hamming)?
Of course.

Quote:
 The obvious suggestion to speed up the floating point math part on a PC, is not to do the floating point math part on a PC! It will be faster to do the math in some fixed point format; floating point is a whole lot more forgiving when it comes to dynamic range issues and avoiding dynamic range issues with fixed point takes real care.
I cannot avoid floating point. I have taken a hard look at that issue.

Quote:
 I would have expected that this was the only workable possibility (storing anything to disk at every sample is going to be a disaster), and it needs that amount of ram that it needs... Of course, variables that are integers would take up less space than floating point numbers.
This DSP board brings with it a hard memory limit of 896 MB. This could become a serious problem, and I have to keep it in mind at all points. I am working on relieving that problem, but so far the vendor has been distinctly unhelpful. This is lending some strength to my attempt to get the client to go with our (my associate's and mine) solution, which would give me a lot more flexibility AND a bigger piece of the pie.

11-23-2008, 07:11 PM   #14
jiml8
Senior Member

Registered: Sep 2003
Posts: 3,171

Original Poster
Rep:
Quote:
 Originally Posted by ta0kira If you actually use σ then you must have some sort of finite interval, i.e. you can't expand your interval to infinity. That implies that older data should somehow be less relevant; therefore, maybe you can keep two running pseudo-statistics based on an exponential decay function. The first thing that comes to mind is something I use to detect departures from consistent activity (though not for signals or hardware), which essentially comes down to this: Code: I = e^(-Ct) (I ≡ importance, C ≡ a constant, t ≡ elapsed time) Using something like that (probably with the Δt since the last update), you can find something comparable to a time-weighted average with: Code: x̄new = x̄old·I + xnew·(1 - I) As far as a standard deviation, you might perform that in the same manner with something like this: Code: σnew² = σold²·I + (x - x̄)²·(1 - I) ta0kira
I am not interested in the intelligence in the signal, just the appearance and disappearance of the envelope. Thus, though I am indeed doing time-weighted averaging, I am not permitting a decay function as I would if I were interested in the signal content; a sample 1 second ago is as relevant as the most recent sample. In fact, handling the statistics when the signal routinely displays major changes in spectral distribution is giving me a bit of heartburn, particularly where I have to identify this collision case. Such a change shows as a larger standard deviation in that signal bin, which decreases the detectability of the new signal.

Right now, I am just trying to determine whether I can reliably detect the appearance of a new signal in an occupied band (which is a problem), and I am trying to determine what kind of false alarm rate will be acceptable, and how I can filter it out upstream.

Also, I am trying to come up with a very fast way to do this...

11-23-2008, 07:19 PM   #15
jiml8
Senior Member

Registered: Sep 2003
Posts: 3,171

Original Poster
Rep:
Quote:
 Originally Posted by Quakeboy02 Jim, would it be faster to convert everything up front to integers to allow you to use solely integer math after the conversion process?
I keep looking at that, and I would love to do it.

But at this point it is not at all obvious to me that I can do it so I am continuing with floating point.

First you make it work, then you optimize it - and it happens that what I am working on here is groundbreaking; no one else is doing what we are doing. The approach was recently patented by my client and I see no showstoppers in it though I do see a number of issues - such as the one being discussed here.

So for now I'll minimize extraneous issues (such as doing integer math where it is naturally floating point) and I'll make it work. Then I'll see if I can speed it up that way.
