LinuxQuestions.org
Old 05-10-2016, 12:10 PM   #1
mpapet
Member
 
Registered: Nov 2003
Location: Los Angeles
Distribution: debian
Posts: 548

Rep: Reputation: 72
Applied Math Problem: Identify Outlier in Numeric Trend


I'm a DIY programmer working on a GIS topic in PHP.

I've got an array of elevations in meters, in a time series. The elevation values are sampled frequently and definitely trend slowly up/down. It's from a Garmin GPS, and sometimes the values written are not physically possible.

Good set: 33.33, 33.35, 33.40, 33.41, 33.41, 33.41, 33.39

Bad set: 33.33, 33.35, 36.8, 38.9, 44.00, 33.41, 33.39

In the bad set, a road cannot climb 3 meters in about 1 second, and then suddenly fall down. GPS collection isn't perfect in these small devices.

I tried a z-score with a standard deviation and it sort of works, but the z-score differences are small.
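
A minimal Python sketch of why a plain z-score separates these values poorly, using the bad set above. The function name `z_scores` is made up for illustration, and computing the statistics over the whole series with the population standard deviation is an assumption about how the z-score was calculated.

```python
from statistics import fmean, pstdev

def z_scores(values):
    """Return (x - mean) / std for every sample, using whole-series statistics."""
    mean = fmean(values)
    std = pstdev(values)  # population standard deviation (assumed choice)
    return [(x - mean) / std for x in values]

bad = [33.33, 33.35, 36.8, 38.9, 44.00, 33.41, 33.39]
for x, z in zip(bad, z_scores(bad)):
    print(f"{x:6.2f}  z = {z:+.2f}")
```

The three spikes drag the mean up to about 36.2 and inflate the standard deviation to about 3.8, so only 44.00 scores above 2, while 36.8 ends up closer to the mean than the good readings do.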

Does someone know of a trend analysis type algorithm that would easily identify those bad values? I was looking at a low-pass filter, but it's not clear if that is overkill, or even a good tool.

If it's not obvious, I am not mathematically talented.

Last edited by mpapet; 05-10-2016 at 03:28 PM.
 
Old 05-10-2016, 12:39 PM   #2
mostlyharmless
Senior Member
 
Registered: Jan 2008
Distribution: Arch/Manjaro, might try Slackware again
Posts: 1,851
Blog Entries: 14

Rep: Reputation: 284
For your particular problem, I wouldn't approach it statistically or with a filter: just do it the way you identified it. Put a limit on the physically possible. Flag any values where the change is implausible, say, greater than 0.25 meters. You can change the threshold number with some experimentation.

[edit] Strictly speaking, what I have suggested is a filter, though simple. I am a big believer in less processing of data with fewer assumptions. I did assume your data was a fixed time interval time series from your description.
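
A rough sketch of that limit check in Python (it ports to PHP line for line). The name `flag_jumps` is hypothetical, 0.25 m is just the starting threshold suggested above, and the sketch assumes a fixed sampling interval and a trustworthy first sample. Each value is compared against the last accepted one rather than its immediate neighbor, so a run of bad readings does not cause the first good reading after it to be flagged as well.

```python
def flag_jumps(elevations, max_step=0.25):
    """Return indices of samples that jump implausibly from the last good one.

    max_step is the largest believable change (meters) per sample interval.
    Assumes a fixed sampling interval and that the first sample is good.
    """
    flagged = []
    last_good = 0  # index of the most recent accepted sample
    for i in range(1, len(elevations)):
        gap = i - last_good  # intervals elapsed since the last good sample
        if abs(elevations[i] - elevations[last_good]) > max_step * gap:
            flagged.append(i)
        else:
            last_good = i
    return flagged

good = [33.33, 33.35, 33.40, 33.41, 33.41, 33.41, 33.39]
bad = [33.33, 33.35, 36.8, 38.9, 44.00, 33.41, 33.39]
print(flag_jumps(good))  # []
print(flag_jumps(bad))   # [2, 3, 4]
```

The allowance grows with the number of skipped samples so that a genuine slow climb during a long run of rejected readings is not rejected forever.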

Last edited by mostlyharmless; 05-10-2016 at 03:07 PM.
 
Old 05-10-2016, 01:16 PM   #3
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}
Posts: 6,269
Blog Entries: 24

Rep: Reputation: 4196
Mostlyharmless' suggestion of placing some limit, or error band, on possible values is sound, but your description lacks the detail needed to make a good choice for those limits.

You say that the samples are a time series, "sampled frequently", but you need to define that sample frequency much better, as well as how that relates to displacement in space, or speed.

If the samples are in linear time, once per second for example, then how much change in elevation is too much depends critically on speed.

On the other hand, if samples are ticked off by distance, such as one sample per X rotations of a wheel, then it is not a time series at all; speed does not appear in the data, and it would be somewhat easier to say how much is too much.

There is also the possibility that samples are neither time nor distance related, such as if the driver is told to take a sample at each intersection, in which case you can't easily pick an error limit as it would then be route dependent!
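
To make the time-sampled case concrete, here is a back-of-the-envelope sketch in Python. The function name, the 10 % grade, and the three example speeds are all assumptions chosen for illustration; substitute the steepest grade and fastest speed you expect on your own routes.

```python
import math

def max_climb_per_sample(speed_mps, max_grade, interval_s=1.0):
    """Largest plausible elevation change (meters) between two samples.

    speed_mps  -- speed along the road in meters per second
    max_grade  -- steepest expected grade as rise over run (0.10 = 10 %)
    interval_s -- time between samples in seconds
    """
    slope_angle = math.atan(max_grade)
    return speed_mps * math.sin(slope_angle) * interval_s

for label, speed in [("walking", 1.5), ("cycling", 8.0), ("driving", 25.0)]:
    limit = max_climb_per_sample(speed, max_grade=0.10)
    print(f"{label:8s} {limit:.2f} m per sample")
```

With these made-up numbers a car at 25 m/s on a 10 % grade changes elevation by roughly 2.5 m each second, while a walker manages about 0.15 m, which is why a single fixed limit cannot suit every kind of track.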
 
Old 05-10-2016, 02:43 PM   #4
michaelk
Moderator
 
Registered: Aug 2002
Posts: 25,745

Rep: Reputation: 5924
Based upon "GPS data in a time series" I assume it is just linear time, maybe 1 sample every 1-5 seconds, but I could be wrong.

Is this a GPS receiver that just logs position data to a file that you download, or are you recording "raw" NMEA messages? If the latter is confusing then never mind. Just curious, because if you were recording NMEA messages then you could possibly check for altitude accuracy at that time and throw out that value without having to do much extra post-processing.
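
For the NMEA case, a hedged sketch of that kind of check in Python. It reads the fix quality, satellite count, HDOP and altitude from a GGA sentence; the function names and the thresholds (HDOP at most 2.0, at least 5 satellites) are assumptions, not recommended values. HDOP describes horizontal geometry, so it is only a rough proxy for altitude quality (VDOP is reported in the separate GSA sentence), and the checksum is not validated here.

```python
def parse_gga(sentence):
    """Pull fix quality, satellite count, HDOP and altitude out of a GGA sentence."""
    body = sentence.strip().split("*")[0]  # drop the trailing checksum
    fields = body.split(",")
    if not fields[0].endswith("GGA") or len(fields) < 10:
        raise ValueError("not a GGA sentence")
    return {
        "fix_quality": int(fields[6] or 0),  # 0 means no fix
        "satellites": int(fields[7] or 0),
        "hdop": float(fields[8]) if fields[8] else None,
        "altitude_m": float(fields[9]) if fields[9] else None,
    }

def altitude_usable(fix, max_hdop=2.0, min_satellites=5):
    """True when the fix looks good enough to trust its altitude (assumed limits)."""
    return (
        fix["fix_quality"] > 0
        and fix["altitude_m"] is not None
        and fix["hdop"] is not None
        and fix["hdop"] <= max_hdop
        and fix["satellites"] >= min_satellites
    )

# Widely circulated sample sentence, used here only as test input.
sample = "$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47"
fix = parse_gga(sample)
print(fix["altitude_m"], altitude_usable(fix))
```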
 
Old 05-10-2016, 03:11 PM   #5
mpapet
Member
 
Registered: Nov 2003
Location: Los Angeles
Distribution: debian
Posts: 548

Original Poster
Rep: Reputation: 72
Quote:
Originally Posted by astrogeek View Post
If the samples are in linear time, once per second for example, then how much change in elevation is too much depends critically on speed.
We're talking about conventional roads here and high frequency sample rates like 1 per second. There's no road where you gain/lose 1 meter or more in just a second.

I think I'm going to do a moving average as that will catch the bad readings, then I can replace the bad reading with the moving average.
 
Old 05-10-2016, 03:21 PM   #6
mpapet
Member
 
Registered: Nov 2003
Location: Los Angeles
Distribution: debian
Posts: 548

Original Poster
Rep: Reputation: 72
Quote:
Originally Posted by michaelk View Post
Based upon "GPS data in a time series" I assume it is just linear time, maybe 1 sample every 1-5 seconds, but I could be wrong.

Is this a GPS receiver that just logs position data to a file that you download, or are you recording "raw" NMEA messages? If the latter is confusing then never mind. Just curious, because if you were recording NMEA messages then you could possibly check for altitude accuracy at that time and throw out that value without having to do much extra post-processing.
At the moment I'm using a mobile phone app that writes your track to .gpx file. Bicyclists and runners use standalone GPS devices from brands like Garmin. It's the same idea, only using my mobile phone.

I am checking altitude accuracy using USGS data. The altitude written into the .gpx file from my mobile phone is very wrong. But checking *every* point, when the file has one point per second, is time consuming and not really useful. My idea is to check a few points and adjust the .gpx data to get close enough.
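
A minimal sketch of the "check a few points and adjust" idea, in Python. It assumes the phone's error is a roughly constant offset over the track, which may not hold in practice. The function names and all the numbers are made up for illustration; real reference elevations would come from the USGS lookups.

```python
from statistics import fmean

def estimate_offset(gps_elevations, reference_points):
    """Average (reference - GPS) difference at a few spot-checked indices.

    reference_points maps a sample index to a trusted elevation in meters.
    """
    diffs = [ref - gps_elevations[i] for i, ref in reference_points.items()]
    return fmean(diffs)

def apply_offset(gps_elevations, offset):
    """Shift every elevation by the same estimated offset."""
    return [e + offset for e in gps_elevations]

track = [52.1, 52.3, 52.2, 52.6, 52.9]  # made-up GPS elevations
checked = {0: 40.0, 4: 41.0}            # made-up reference elevations
offset = estimate_offset(track, checked)
corrected = apply_offset(track, offset)
print(round(offset, 2), [round(e, 2) for e in corrected])
```

If the differences at the checked points vary a lot from one another, a single offset is the wrong model and more reference points are needed.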

Eventually, I'll do the same adjustments for the occasional bad longitude/latitude values. That's a different problem though.

Thank you for your interest.

Last edited by mpapet; 05-10-2016 at 03:23 PM.
 
Old 05-10-2016, 06:49 PM   #7
michaelk
Moderator
 
Registered: Aug 2002
Posts: 25,745

Rep: Reputation: 5924
Just as an FYI, navigation systems use a Kalman filter to get rid of outlying sensor data. Much too complicated for your project.

http://bilgin.esme.org/BitsAndBytes/...lterforDummies
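
For the curious, a toy one-dimensional version in Python. This is a sketch only and nothing like what a real navigation system runs: it models the elevation as a constant that drifts slowly, and it rejects a reading when it lies more than `gate_sigma` standard deviations from the prediction. The function name and all three tuning values are made-up assumptions.

```python
def kalman_1d(measurements, process_var=0.01, meas_var=0.05, gate_sigma=3.0):
    """Scalar Kalman filter for a slowly drifting value, with outlier gating.

    Returns (estimates, rejected_indices). Variances are in meters squared
    and are made-up tuning values.
    """
    estimate = measurements[0]  # assume the first sample is good
    variance = meas_var
    estimates = [estimate]
    rejected = []
    for i, z in enumerate(measurements[1:], start=1):
        # Predict: the value is modeled as constant, so only uncertainty grows.
        variance += process_var
        innovation = z - estimate
        innovation_var = variance + meas_var
        if innovation ** 2 > gate_sigma ** 2 * innovation_var:
            rejected.append(i)  # too surprising: skip the update
        else:
            gain = variance / innovation_var
            estimate += gain * innovation
            variance *= 1.0 - gain
        estimates.append(estimate)
    return estimates, rejected

bad = [33.33, 33.35, 36.8, 38.9, 44.00, 33.41, 33.39]
estimates, rejected = kalman_1d(bad)
print(rejected)  # [2, 3, 4]
```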
 
Old 05-11-2016, 08:09 AM   #8
rtmistler
Moderator
 
Registered: Mar 2011
Location: USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu, Debian
Posts: 9,883
Blog Entries: 13

Rep: Reputation: 4930
Quote:
Originally Posted by mpapet View Post
We're talking about conventional roads here and high frequency sample rates like 1 per second. There's no road where you gain/lose 1 meter or more in just a second.
Sure, there are road conditions where you can gain/lose 1 meter in less than a second; you're not considering velocity here, are you? You can sample GPS faster; however, it also depends on whether the correlators in the device will provide you with output faster. We've used 5 Hz and 10 Hz on devices, but there are faster ones, just not commercial. I recommend something using a binary protocol like OSP or other proprietary binary protocols so you can get much more detailed information. However, this all sort of falls apart because commercial GPS promises +/- 3 meter accuracy.
Quote:
Originally Posted by michaelk View Post
Just as an FYI, navigation systems use a Kalman filter to get rid of outlying sensor data. Much too complicated for your project.

http://bilgin.esme.org/BitsAndBytes/...lterforDummies
Agreed, and whether or not you're looking directly at the messages, those messages are already past the correlators in the device. There are no GPS devices where you can see their raw data as received from the satellites; they already have their filter in the mix, because that is their product.

You might want to look into one of the highly accurate GPS devices which claim to use dead reckoning. However, my caveat is that while we've "tried", two or "ten" things interfere. Firstly, the accuracy of the accelerometers on a phone is horrible, and they'll want 16-bit accelerometer accuracy in 10 dimensions: staticXYZ, magXYZ, gyroXYZ, and, for your case, barometric. Secondly, all the vendors who promise this somehow can't give me a demo/devkit board which actually works. They just tell me, "Oh, we've used the blah-blah chip with our GPS module, ... that works." and then they never answer the phone or email again. You need to calibrate the position sensors and ensure that there are no magnetic interferences (and on a road, there always are: cars/trucks, sign posts, iron content in the soil, underground pipes). And you'll need to control the temperature with enough stability to get the gyros accurate.

Last edited by rtmistler; 05-11-2016 at 08:12 AM.
 
Old 05-11-2016, 08:36 AM   #9
michaelk
Moderator
 
Registered: Aug 2002
Posts: 25,745

Rep: Reputation: 5924
Now you're talking about an inertial navigation package... GPS is just an input and not the main source.
 
Old 05-11-2016, 12:39 PM   #10
mpapet
Member
 
Registered: Nov 2003
Location: Los Angeles
Distribution: debian
Posts: 548

Original Poster
Rep: Reputation: 72
Quote:
Originally Posted by mostlyharmless View Post
For your particular problem, I wouldn't approach it statistically or with a filter: just do it the way you identified it. Put a limit on the physically possible. Flag any values where the change is implausible, say, greater than 0.25 meters. You can change the threshold number with some experimentation.

[edit] Strictly speaking, what I have suggested is a filter, though simple. I am a big believer in less processing of data with fewer assumptions. I did assume your data was a fixed time interval time series from your description.
This is what I'm going to do. I'm time constrained and I too prefer the simplest answer. I'm using consumer-level GPS devices and that's all. I'll do a moving average, as that isn't hard to do with a small array. array_push/array_pop
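
A sketch of that moving-average replacement, written in Python rather than PHP. The function name, the window of 3 samples and the 0.5 m tolerance are assumptions to tune by experiment. The window holds already-cleaned values, so a spike never contaminates the average used to judge the next sample.

```python
from collections import deque

def smooth_outliers(elevations, window=3, tolerance=0.5):
    """Replace samples far from the trailing moving average with that average."""
    recent = deque(maxlen=window)  # oldest value drops off the front automatically
    cleaned = []
    for x in elevations:
        if recent:
            average = sum(recent) / len(recent)
            if abs(x - average) > tolerance:
                x = average  # treat as a bad reading and substitute
        recent.append(x)
        cleaned.append(x)
    return cleaned

bad = [33.33, 33.35, 36.8, 38.9, 44.00, 33.41, 33.39]
print([round(v, 2) for v in smooth_outliers(bad)])
```

In PHP the equivalent small-array window would use array_push to add the newest value and array_shift to drop the oldest; array_pop removes the newest value instead. One caveat of this approach: a genuine, sustained step larger than the tolerance would keep being replaced, so the tolerance has to exceed the largest real change per sample.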

Last edited by mpapet; 05-11-2016 at 12:42 PM.
 
Old 05-12-2016, 08:38 AM   #11
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,671
Blog Entries: 4

Rep: Reputation: 3945
You can also use other statistics like "Standard Deviation."

You might wish to calculate the distance (Pythagorean Theorem ...) from one point to the next, and consider whether it just had you moving several miles in a few seconds' time. If so, consider larger ranges until the distance ... the velocity of the traveler ... becomes "plausible" again. The intermediate points within that range might be outliers.
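
A rough Python sketch of that plausibility check. It treats the Earth as locally flat so the Pythagorean theorem applies, which is an approximation that is only reasonable over short hops. The function names, the 50 m/s speed limit and the sample coordinates are made up for illustration.

```python
import math

EARTH_RADIUS_M = 6371000.0

def flat_distance_m(lat1, lon1, lat2, lon2):
    """Pythagorean distance on a locally flat Earth (fine for short hops)."""
    mean_lat = math.radians((lat1 + lat2) / 2.0)
    dx = math.radians(lon2 - lon1) * math.cos(mean_lat) * EARTH_RADIUS_M
    dy = math.radians(lat2 - lat1) * EARTH_RADIUS_M
    return math.hypot(dx, dy)

def flag_impossible_speed(points, max_speed_mps=50.0):
    """points: list of (time_s, lat_deg, lon_deg). Returns suspect indices."""
    flagged = []
    last_good = 0  # index of the most recent plausible point
    for i in range(1, len(points)):
        t0, lat0, lon0 = points[last_good]
        t1, lat1, lon1 = points[i]
        elapsed = t1 - t0
        distance = flat_distance_m(lat0, lon0, lat1, lon1)
        if elapsed <= 0 or distance / elapsed > max_speed_mps:
            flagged.append(i)
        else:
            last_good = i
    return flagged

track = [
    (0, 34.0500, -118.2500),
    (1, 34.0501, -118.2500),
    (2, 34.1000, -118.2500),  # jumps several kilometers in one second
    (3, 34.0503, -118.2500),
    (4, 34.0504, -118.2500),
]
print(flag_impossible_speed(track))  # [2]
```

Measuring from the last plausible point, rather than from the immediate neighbor, is what widens the range until the implied speed becomes believable again.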
 
  

