
USB Receive and Rapid Parse, GPS Example

Posted 06-11-2013 at 08:17 AM by rtmistler

This is more of a general programming example which can be applied to other systems, in addition to Linux.

This is a discussion of an application programming scenario where a GPS device is attached via a serial USB connection to the system. Challenges faced in this project include:
  • Detecting a USB Serial Resource on your System
  • Configuring and Controlling a USB Serial Resource
  • Processing Receive Data from the USB Serial Resource
  • De-Framing GPS Data
  • Parsing GPS Data
The project also requires a general understanding of GPS data packets. For the purposes of this discussion, I'm considering NMEA-based GPS packets.
General Overview of GPS Data
Before I begin, I'll briefly review the GPS packet format so that you, the reader, can understand the eventual parse algorithm. Note first that GPS data does not always appear exactly as the NMEA references list it; in fact, with the advent of the GLONASS system, the set of packets has expanded. Further, many manufacturers have custom protocol schemes, some of which are binary. One reason is that binary messages can more easily include raw data from the satellites, so that users of their chips can develop more customized location fix solutions, should they want that capability. For the purposes of this discussion, I'm limiting it to ASCII-based messages from a device. And while you'll find examples elsewhere of how messages such as these are decoded, my intention here is less to show GPS packet processing and more to use a protocol project of this scope to illustrate how to accomplish a similar task.

GPS packets follow a general set of rules, or syntax.
  • All Packets Start with $
  • All Packets End with CR LF
  • Before the End of Packet is an NMEA Checksum: two Hexadecimal Characters prefixed with *
  • The Checksum is an Exclusive OR of all Bytes Between the $ and *
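The checksum rule above is small enough to show in full. This is a minimal sketch, not code from the project; nmea_checksum() and nmea_sentence_ok() are hypothetical helper names. It XORs the bytes between the $ and the *, then compares the result with the two hex characters that follow the *.

```c
#include <stddef.h>
#include <string.h>

/* XOR of every byte between the '$' and the '*' (both excluded). */
unsigned char nmea_checksum(const char *data, size_t len)
{
    unsigned char sum = 0;

    for (size_t i = 0; i < len; i++)
        sum ^= (unsigned char)data[i];
    return sum;
}

/* Value of one hexadecimal character, or -1 if it is not one. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Check a whole sentence such as "$GPGGA,...*47". Anything after the two
 * hex characters (for example CR LF) is ignored.
 * Returns 1 if the transmitted checksum matches the computed one, else 0. */
int nmea_sentence_ok(const char *s)
{
    const char *star;
    int hi, lo;

    if (s == NULL || s[0] != '$')
        return 0;
    star = strchr(s, '*');
    if (star == NULL)
        return 0;
    hi = hex_digit(star[1]);
    lo = (hi < 0) ? -1 : hex_digit(star[2]);   /* don't read past a short string */
    if (hi < 0 || lo < 0)
        return 0;
    return nmea_checksum(s + 1, (size_t)(star - s - 1)) == (unsigned char)(hi * 16 + lo);
}
```

For example, nmea_sentence_ok() accepts "$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47" and rejects the same sentence if the trailing 47 is altered.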

Sample GPS Packets
  • $GPRMC,181044.400,A,4237.26664,N,07142.50176,W,0.0,0.0,220513,0.0,W*64
  • $GPGGA,181044.400,4237.26664,N,07142.50176,W,1,13,0.8,098.47,M,-33.9,M,,*59
  • $GNGSA,A,3,31,23,16,03,06,20,13,32,,,,,1.8,0.8,1.7*2B
  • $GNGSA,A,3,75,66,76,65,85,,,,,,,,1.8,0.8,1.7*26
  • $GPGSV,3,1,12,03,18,184,31,06,30,164,45,13,20,316,44,14,01,143,*7C
  • $GPGSV,3,2,12,16,71,235,46,20,22,253,43,23,44,307,46,27,38,145,*74
  • $GPGSV,3,3,12,29,18,043,22,31,42,080,43,32,22,224,28,30,00,000,46*7A
  • $GLGSV,4,1,13,74,25,066,22,66,32,311,44,76,29,289,38,75,59,003,38*60
  • $GLGSV,4,2,13,65,39,240,39,84,34,061,28,83,06,016,,67,00,347,*69
  • $GLGSV,4,3,13,72,07,194,,72,38,086,40,86,04,161,,85,33,119,39*68
  • $GLGSV,4,4,13,72,07,194,,,,,,,,,,,,,*59

As you can see, there's a lot of data within these packets. If you are merely interested in your location, much of this data is not useful; however, if details such as the satellites being used for the fix, the strength of their signals, or other information are of use to you, you would parse all of these messages to extract the data you require.

In this case, the data arrives several times per second over a 115,200 bps serial data link, so rapid processing of the data is important.

Detecting a USB Serial Resource on your System
There are likely other ways to detect USB serial resources on your system; the way I do it depends on the types of serial-to-USB converters being used. Many of the devices I use employ the FTDI FT232R chip, which is handled by the ftdi_sio driver. There are other types of USB serial chips and drivers, for instance the Silicon Labs CP210x. The common point is that these chips make a USB device appear in your system as a serial connection, which gets mapped to /dev/ttyUSBx. Your kernel may not support these drivers, although most current kernels do; check your kernel config file to see what other options you have. In general, these two are the most common ones I've seen, and they typically work with most kernels.

Remember that everything in Linux is a file. Therefore you can determine what serial USB resources you have by looking at the following file:
/proc/tty/driver/usbserial
This file contains the details about any USB serial devices detected in your system. It gives the USB device resource each one is mapped to, as well as its vendor ID and product ID. These numbers tell you what type of device you have, so you can determine what baud rate to set your port to, and also how you'll communicate with it. Here's a typical view of the usbserial file:
# cat /proc/tty/driver/usbserial
usbserinfo:1.0 driver:2.0
0: module:ftdi_sio name:"FTDI USB Serial Device" vendor:0403 product:6001 num_ports:1
port:1 path:usb-0000:00:1d.1-1.2
1: module:ftdi_sio name:"FTDI USB Serial Device" vendor:0403 product:6008 num_ports:1
port:1 path:usb-0000:00:1d.1-1.4
What this file is saying is that there are two USB serial devices, both FTDI, as shown by the "ftdi_sio" module name as well as the vendor ID of 0x0403. Further, one of them is on resource 0, meaning /dev/ttyUSB0, and the other is on resource 1, /dev/ttyUSB1. We can tell them apart because product ID 0x6001 appears on the line for resource 0, and product ID 0x6008 on the line for resource 1. You need some prior knowledge on your part to know that you have two distinctly different devices and how they will identify themselves in your system.
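If you want the program, rather than a person, to read this file, one approach is to scan it line by line for the vendor and product IDs you expect. This is a sketch that assumes the line format shown in the listing above; parse_usbserial_line() and find_usb_serial() are hypothetical names. The listing was captured from a root shell, so reading the file may require the same privileges.

```c
#include <stdio.h>
#include <string.h>

/* Parse one device line of /proc/tty/driver/usbserial, e.g.
 *   0: module:ftdi_sio name:"FTDI USB Serial Device" vendor:0403 product:6001 ...
 * Fills in the ttyUSB index and the vendor/product IDs and returns 0.
 * Header lines ("usbserinfo:...") and other lines return -1. */
int parse_usbserial_line(const char *line, unsigned *index,
                         unsigned *vendor, unsigned *product)
{
    const char *v, *p;

    if (sscanf(line, "%u:", index) != 1)
        return -1;
    v = strstr(line, "vendor:");
    p = strstr(line, "product:");
    if (v == NULL || p == NULL)
        return -1;
    if (sscanf(v, "vendor:%x", vendor) != 1 || sscanf(p, "product:%x", product) != 1)
        return -1;
    return 0;
}

/* Find the /dev/ttyUSBn path for a vendor/product pair.
 * Returns 0 and writes the path into buf, or -1 if not found or unreadable. */
int find_usb_serial(unsigned want_vendor, unsigned want_product,
                    char *buf, size_t buflen)
{
    FILE *fp = fopen("/proc/tty/driver/usbserial", "r");
    char line[512];
    unsigned idx, ven, prod;
    int rc = -1;

    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof line, fp) != NULL) {
        if (parse_usbserial_line(line, &idx, &ven, &prod) == 0 &&
            ven == want_vendor && prod == want_product) {
            snprintf(buf, buflen, "/dev/ttyUSB%u", idx);
            rc = 0;
            break;
        }
    }
    fclose(fp);
    return rc;
}
```

With the listing above, find_usb_serial(0x0403, 0x6001, path, sizeof path) would produce "/dev/ttyUSB0". Devices with identical IDs still need the further discrimination discussed next.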

Note also that if you have two devices with matching vendor ID and product ID values, you then need further discrimination of the devices, if possible; for instance, a serial number or other attribute which the usbserial driver reads and makes available to your program. Another situation may be that you have two attached devices with exactly the same identities, each of which behaves differently with the commands and data being transferred. Whatever the scheme needs to be, you ultimately need to identify the correct ports in your system.

Configuring and Controlling a USB Serial Resource
Once your ports are detected, you need to connect to them, configure them, read from them, and write to them.

For the most part I've found that ports work best in RAW mode. I have experimented with these settings a bit and found that, if you're using the most typical N81 configuration (no parity, 8 data bits, 1 stop bit), the main things to change are the speed and RAW mode. You can of course set the bits per character, parity, and stop bits; almost everything is N81, but I did have one case of odd parity, hence my sample code shows this capability.

Here are some sample code clips to open and configure your port. Note: I do these steps before I open the port for communications.

Set Port Rate on Serial USB Resource /dev/ttyUSB0 and Set for RAW Mode
#define _DEFAULT_SOURCE     /* cfmakeraw() and cfsetspeed() are BSD extensions in glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <termios.h>
#include <unistd.h>

#define SERIAL_4800_RATE    "4800"
#define SERIAL_9600_RATE    "9600"
#define SERIAL_19200_RATE   "19200"
#define SERIAL_38400_RATE   "38400"
#define SERIAL_57600_RATE   "57600"
#define SERIAL_115200_RATE  "115200"

int set_port_rate(const char *port, const char *rate)
{
    struct termios term_ios;
    speed_t new_rate;
    int fd;

    if (port == NULL || port[0] == '\0') {
        fprintf(stderr, "set_port_rate: Invalid(NULL) port\n");
        return -1;
    }
    if ((fd = open(port, O_RDWR | O_NOCTTY)) < 0) {
        fprintf(stderr, "set_port_rate: Error opening %s. %d(%s)\n", port, errno, strerror(errno));
        return -1;
    }
    /* My app and devices support only 115200 and 38400 rates */
    if (rate != NULL && strcmp(rate, SERIAL_115200_RATE) == 0)
        new_rate = B115200;
    else
        new_rate = B38400;

    memset(&term_ios, 0, sizeof(term_ios));

    /* Read the port settings as they are now */
    if (tcgetattr(fd, &term_ios) < 0) {
        close(fd);
        return -1;
    }

    /* Take the port settings and convert them to RAW mode */
    cfmakeraw(&term_ios);

    /* Configure the speed to match what was requested */
    cfsetspeed(&term_ios, new_rate);

    /* Flush the port and write the new configuration, then release the port */
    tcflush(fd, TCIFLUSH);
    if (tcsetattr(fd, TCSANOW, &term_ios) < 0) {
        close(fd);
        return -1;
    }
    close(fd);
    return 0;
}
Set Port Parity Example
int set_port_parity(const char *port, tcflag_t parity)
{
    struct termios term_ios;
    int fd;

    if (port == NULL || port[0] == '\0') {
        fprintf(stderr, "set_port_parity: Invalid(NULL) port\n");
        return -1;
    }
    if ((fd = open(port, O_RDWR | O_NOCTTY)) < 0) {
        fprintf(stderr, "set_port_parity: Error opening %s. %d(%s)\n", port, errno, strerror(errno));
        return -1;
    }

    memset(&term_ios, 0, sizeof(term_ios));
    if (tcgetattr(fd, &term_ios) < 0) {
        close(fd);
        return -1;
    }
    /* Enable parity: pass PARODD for odd parity, or 0 for even parity */
    term_ios.c_cflag &= ~PARODD;
    term_ios.c_cflag |= (PARENB | parity);
    tcflush(fd, TCIFLUSH);
    if (tcsetattr(fd, TCSANOW, &term_ios) < 0) {
        close(fd);
        return -1;
    }
    close(fd);
    return 0;
}
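The rate and parity helpers above each open the device separately to apply one setting. If you would rather configure the descriptor you will actually read from, the raw N81 settings described earlier can be applied to it directly. This is a sketch: make_raw_n81() and configure_port_n81() are hypothetical names, and setting CLOCAL (ignore modem control lines) plus VMIN=1/VTIME=0 are my assumptions for a device without handshake lines, not settings from the original code.

```c
#define _DEFAULT_SOURCE 1   /* cfmakeraw() is a BSD extension in glibc */
#include <termios.h>
#include <unistd.h>

/* Turn a termios structure into raw mode with 8 data bits, no parity, 1 stop bit. */
void make_raw_n81(struct termios *t)
{
    cfmakeraw(t);                         /* no echo, no line editing, no CR/LF mapping */
    t->c_cflag &= ~(CSIZE | PARENB | CSTOPB);
    t->c_cflag |= CS8 | CREAD | CLOCAL;   /* 8 data bits, receiver on, ignore modem lines */
    t->c_cc[VMIN] = 1;                    /* a blocking read() returns after 1 byte */
    t->c_cc[VTIME] = 0;
}

/* Apply raw N81 at the given speed (e.g. B115200) to an already-open descriptor.
 * Returns 0 on success, -1 on failure (for example, fd is not a tty). */
int configure_port_n81(int fd, speed_t speed)
{
    struct termios t;

    if (tcgetattr(fd, &t) < 0)
        return -1;
    make_raw_n81(&t);
    if (cfsetispeed(&t, speed) < 0 || cfsetospeed(&t, speed) < 0)
        return -1;
    if (tcflush(fd, TCIFLUSH) < 0)
        return -1;
    return tcsetattr(fd, TCSANOW, &t);
}
```

Configuring the open descriptor avoids opening the device twice; an odd-parity device would additionally set PARENB | PARODD after make_raw_n81().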
Examples of Port open(), read(), and write() functions

    int portHandle;
    char inBuf[4096];   /* Most USB serial port implementations use a max of 4096 for the buffer */
    char *inPtr;
    ssize_t retVal;

    /* Device_Name is a string such as "/dev/ttyUSB0" */
    if ((portHandle = open(Device_Name, O_RDWR | O_NOCTTY | O_NONBLOCK)) == -1) {
        fprintf(stderr, "Failed to open port %s:%d:%s\n", Device_Name, errno, strerror(errno));
    }

    inPtr = &inBuf[0];

    retVal = read(portHandle, inPtr, sizeof(inBuf));

    /* >0 means you got data at inPtr, the start of inBuf; the value is how many bytes you got */
    /* 0 means no data was waiting (possible when VMIN is 0); a hung-up port, such as an unplugged device, also returns 0 */
    /* -1 means check errno: EAGAIN (11 on Linux) just means no data on this non-blocking port, other values are errors */

    if (write(portHandle, "?", 1) < 0) {
        /* Check errno for EINTR and you can try again */
        /* Otherwise an error, note it and see if the port has failed or something else */
    }
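The write() comments above suggest retrying on EINTR. A small helper along these lines also copes with partial writes, where write() accepts fewer bytes than requested. write_all() is a hypothetical name, and this is only a sketch of the retry idea.

```c
#include <errno.h>
#include <unistd.h>

/* Write all len bytes, retrying on EINTR and on partial writes.
 * Returns len on success, or -1 with errno set on a real error. */
ssize_t write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t left = len;

    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted by a signal: just try again */
            return -1;         /* includes EAGAIN on a full non-blocking port */
        }
        p += n;
        left -= (size_t)n;
    }
    return (ssize_t)len;
}
```

On the non-blocking port opened above, a full output buffer shows up as EAGAIN, which this sketch reports as an error; a caller could instead wait with select() for the port to become writable and retry.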
Processing Receive Data from the USB Serial Resource
What do you do with the data once you receive it? And how do you architect your application to persistently read from this port, as well as parse the data for use?

A point to note:
Most kernel usbserial implementations support a 4096-byte buffer; I derived this from viewing the driver sources. You may change it, but I have never found a case where that was required, at least for serial data. In fact, if you find that you're receiving that much data per read, you are taking far too long to parse your data: at this device's data rate, 4096 bytes is almost a whole second's worth. The sample data shown above is roughly 800 characters; five times over, that comes to about 4,000 characters per second. A good application should not allow much data to accumulate in the USB receive buffer.
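To put rough numbers on this timing, here is my own arithmetic, assuming N81 framing, where every character costs a start bit, 8 data bits, and a stop bit, i.e. 10 bits on the wire. The two helper names are hypothetical.

```c
/* Characters per second a serial line can carry. Each character costs a start
 * bit, the data bits, any parity bit and the stop bit(s): 10 bits for N81. */
unsigned long line_bytes_per_sec(unsigned long bps, unsigned bits_per_char)
{
    return bps / bits_per_char;
}

/* Seconds of traffic at bytes_per_sec needed to fill a buffer of buf_bytes. */
double seconds_to_fill(unsigned long buf_bytes, double bytes_per_sec)
{
    return (double)buf_bytes / bytes_per_sec;
}
```

line_bytes_per_sec(115200, 10) gives 11,520 bytes per second, so even a saturated line needs about 0.36 s to fill a 4096-byte buffer. At the roughly 4,000 bytes per second estimated above it takes a little over a second, and the GPS traffic uses only about a third of the line's capacity.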

Earlier you saw an example of a read() call. Whatever data you receive, you must process it fully before your next read; otherwise your application will never keep up. Consider that the GPS device is continually sending data over the 115,200 bps serial port, say five times per second. As mentioned, this could reach about 4,000 bytes per second if all messages were filled to capacity. Caching the data in the primary receive buffer may come to mind; later I'll discuss this possibility for other architectural reasons. In the receiver logic, however, it never makes sense to retain data that you could not parse before the buffer fills to capacity: that would mean your de-frame and parse routines can never keep up, and no matter how much excess space you allocate for receiver overflow, you will eventually run out.

The most common form of persistent receive I use is to call select() to determine whether there is data on the receive descriptor. If not, my forever loop continues. If there is, I read it into my receive buffer and then use a byte-oriented parser to de-frame the data into individual messages.
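The select()-then-read pattern just described might look like the following sketch. read_when_ready() is a hypothetical name; the timeout lets the forever loop do other work between checks.

```c
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd to become readable, then read what is there.
 * Returns bytes read (>0), 0 on timeout or when no data was actually available,
 * or -1 on error or end of file. */
ssize_t read_when_ready(int fd, char *buf, size_t len, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int rc;
    ssize_t n;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    rc = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (rc < 0)
        return (errno == EINTR) ? 0 : -1;   /* a signal is not a failure */
    if (rc == 0)
        return 0;                           /* timeout: nothing to do yet */

    n = read(fd, buf, len);
    if (n < 0)
        return (errno == EAGAIN || errno == EINTR) ? 0 : -1;
    if (n == 0)
        return -1;                          /* readable but empty: most likely hung up */
    return n;
}
```

In the forever loop this might be used as n = read_when_ready(portHandle, inBuf, sizeof(inBuf), 200), handing any n > 0 bytes to the de-framer and treating -1 as a port failure; the 200 ms timeout is an arbitrary choice.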

De-Framing and Parsing the GPS Data
The de-framing process consists of stripping the protocol "rules" out of the received data and extracting solely the GPS messages. I also consider validating the checksum to be part of the de-framing process. These definitions and rules are really my own terms, based on my history and experience, so please don't be confused if you've seen references that use the terms differently.

My de-framing process:
  • Looks for the start of a message, the DOLLAR SIGN $
  • Processes the next FIVE characters, then determines if those characters match a known or desired GPS message
  • Searches for the end of the message data, the ASTERISK *
  • Verifies the Checksum
  • Allows the Message Parser to process each message type
These descriptions are somewhat simplified; there are some details which are unique to each protocol, including this one, so I'll explain some of those details for your benefit.

As you can see from the GPS overview, there are a lot of message types. The first action one should take is to communicate with their GPS device and configure it so that it only sends the message types which you are interested in seeing it send out. Also configure it to send those messages at a rate which makes the best sense for your application. In my example case, I do use the GPRMC, GPGGA, GNGSA, GPGSV, and GLGSV messages. I further request that these be sent five times per second, or at a 5 Hertz (Hz) rate.

Next, if you observe the definition of the protocol, you'll see that the messages all start with $ and all end with CR LF, which are 0x0d 0x0a. Since the GPS device is always sending data when powered and configured, I synchronize with it by searching for the LF first and treating that character as my SOM, "Start Of Message". Why do this? Because CR (EOM) and LF (SOM) are ASCII control characters, and they are never encountered within a message. Given the GPS protocol syntax, neither are the DOLLAR and ASTERISK; however, searching for the SOM first lets me detect the sequence SOM-$, which gives me full assurance that a new GPS packet is indeed being received. It also lets me detect a random fault more easily without discarding too much data. For instance, what if I receive a random SOM that is not followed by a DOLLAR? My state machine goes back to searching for the next SOM, checking each byte received and discarding all of them until it detects the exact sequence SOM-$.

Once I receive this two-character start sequence, my next action is to save the next 5 bytes and then expect the 6th byte to be a COMMA. This follows the protocol syntax: all the messages I'm seeking have 5-character names, and a COMMA is required after the message name. If this is violated, the parser immediately goes back to the start state and searches again for the SOM-$ sequence. Once I've verified that I have 5 characters followed by a COMMA, I check those 5 characters to see if they match any of the message names I'm expecting; if not, it's back to the start state. Note that once the SOM-$ sequence is detected, I clear my checksum. As I store each of the next 5 characters, I verify that it is an ASCII letter, because that's what it should be, and XOR its value into my checksum.
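The name check just described can be a short table lookup. In this sketch the enum, the table, and lookup_msg_type() are hypothetical; the table lists the five messages requested in my example configuration, and I assume capital letters, which is what the standard NMEA names use.

```c
#include <string.h>

/* Hypothetical codes for the five messages requested in my example setup. */
enum nmea_type { MSG_UNKNOWN = -1, MSG_GPRMC, MSG_GPGGA, MSG_GNGSA, MSG_GPGSV, MSG_GLGSV };

static const char *const wanted_names[] = { "GPRMC", "GPGGA", "GNGSA", "GPGSV", "GLGSV" };

/* name points at the 5 bytes saved after the SOM-$ sequence. Each must be a
 * capital ASCII letter; returns the matching type, or MSG_UNKNOWN, which
 * tells the parser to go back to the start state. */
enum nmea_type lookup_msg_type(const char *name)
{
    for (int i = 0; i < 5; i++)
        if (name[i] < 'A' || name[i] > 'Z')
            return MSG_UNKNOWN;
    for (int i = 0; i < (int)(sizeof wanted_names / sizeof wanted_names[0]); i++)
        if (memcmp(name, wanted_names[i], 5) == 0)
            return (enum nmea_type)i;
    return MSG_UNKNOWN;
}
```

Keeping the decoded type in a variable, rather than in separate states, matches the approach of retaining the message type for recall once the copy is done.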

After verifying that the message is a known GPS message type and that the syntax rules are being followed, I change state and copy the data into a message-specific parse buffer until I either see the ASTERISK or reach a length limit. I also XOR each received byte into the checksum. Each GPS message has a maximum length, reached when all attributes are filled; so I keep a storage buffer to accumulate new data until I have a full message, and I limit the copy in case I never encounter the ASTERISK indicating that the message is complete.

In the normal case (a good message), I verify that each byte is within the acceptable range of ASCII values, XOR each byte into the checksum, and store each byte in my local buffer. Once I reach the ASTERISK, I enter a "receive and verify checksum" state, which reads in the transmitted checksum and verifies that it matches the computed checksum.

A shortcut is to not invoke library calls to validate the ASCII integrity, but instead to check that the byte value is within the correct range, such as 0x20-0x7e.

Updating the computed checksum is trivial; it is a simple XOR operation. It is not necessary to have a separate copy state per message type; instead, retain variables that describe the copy length limit and the decoded message type, so that once the copy is done the correct message parse function can be called.

Retaining state, meaning the received bytes, the checksum, and any other data required, is important. This code is what I call re-entrant (more precisely, resumable): it can be interrupted by higher-priority processes, and it can suspend its processing until further data is available. As an example, you may receive "SOM$GPR", which is likely the beginning of a GPRMC message, but no further data beyond the "R". The next time you read from the serial port, you may get 150 characters containing the full remainder of the GPRMC message as well as an entire next message of a different type. There are no guarantees; therefore, each time you suspend processing of your data, you must be able to tell where you last left off. This is why the state machine stores the actual message bytes and the checksum locally. The state machine "state" itself is also retained, which tells us whether or not we have properly passed the start sequence criteria. The limit on the copy length is there for errors: if we do not receive the ASTERISK for a very long time, or not until the next message is complete, we detect that we've copied too much, reset all state and storage to indicate that no message is pending, and wait for a proper start sequence to begin anew. This gives us a chance to recover our framing.

Sorry, this is a lot of verbiage which may be confusing, therefore I'm including a few visual examples for the reader's benefit.

Sample Good GPRMC

State machine states:
0 - Search for SOM (LF)
1 - Search for DOLLAR
2 - Search for COMMA
3 - Search for ASTERISK
4 - Search for EOM (CR)

State Machine Operation Description: A chunk of anywhere from 1 to n characters is received; process each character, one at a time, until there are no more received characters to process.

State 0: (Search for SOM)
    If (character is SOM) change to State 1

State 1: (Search for DOLLAR)
    If (character is DOLLAR) change to State 2, reset checksum, and clear copy buffer to store new message
    else change to State 0

State 2: (Search for COMMA)
    If (character is COMMA)
        XOR character into checksum, verify command type
        If (command OK) set message copy limit and change to State 3
        else change to State 0
    else if (copied character count >= 5) change to State 0
    else copy character to buffer, increase counter, XOR character into checksum

State 3: (Search for ASTERISK)
    If (character is ASTERISK) change to State 4, reset received counter
    else if (copied character count >= message copy limit) change to State 0
    else copy character to buffer, increase counter, XOR character into checksum

State 4: (Search for EOM)
    If (character is EOM)
        verify checksum (convert 2 bytes ASCII into single byte)**
        If (checksum OK) process message
        change to State 0
    else if (character received count >= 2) change to State 0
    else copy character to local checksum buffer, increment received counter

** Converting 2-byte ASCII to a single byte.
Since the protocol is entirely visible, printable ASCII, the checksum is sent in expanded, printable hexadecimal form. That is, if the checksum is 0x2a, you will see "2A", which is really the two bytes 0x32 0x41. You need to convert those two values into the single byte value 0x2a, so that when you compare it against your computed checksum, you can properly evaluate the outcome.

The example therefore plays out as:

LF Received, go to State 1
$ Received, go to State 2
, Received, evaluate command as GPRMC = OK, go to State 3
* Received, go to State 4
EOM Received, evaluated checksum is OK, process full GPRMC from the local buffer, and reset to search for the next SOM
An example of a failed message could be a GPRMC message that is cut off partway through and immediately followed by a new SOM and a complete GNGSA message.

Here we would see a valid SOM followed by a DOLLAR, evaluate GPRMC as a valid message type, and start copying. The message then turns out to be a fragment: we never encounter the ASTERISK, but instead encounter a new SOM. I've left this case out of the example logic above; however, the code does check for SOM, EOM, DOLLAR, and similar characters in unexpected locations. When the SOM appears in an unexpected location, the code discards the GPRMC in progress and proceeds to State 1 to search for a DOLLAR, which in this case it will see. If the GNGSA turns out to be properly formed once it has been fully received, it will be processed. The GPRMC, however, is corrupted; there is no way to recover it, so that data is discarded.
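The states, the checksum conversion, and the resynchronization on an unexpected SOM can be combined into one resumable routine. The following is a sketch, not the author's code: nmea_parser, nmea_init(), and nmea_feed() are hypothetical names, a single copy limit (NMEA_MAX_BODY, an assumed value) replaces per-message limits, and the name check only verifies five capital letters. All parser state lives in the struct, so a message split across two read() calls is completed on the second call.

```c
#include <stddef.h>
#include <string.h>

#define NMEA_MAX_BODY 82   /* assumed copy limit for the bytes between '$' and '*' */
enum { ST_SOM, ST_DOLLAR, ST_NAME, ST_BODY, ST_CKSUM };   /* States 0-4 above */
typedef void (*nmea_handler)(const char *msg, size_t len, void *user);

/* Everything needed to resume exactly where the previous read() left off. */
struct nmea_parser {
    int state;
    unsigned char sum;            /* running XOR of the bytes after '$' */
    char buf[NMEA_MAX_BODY + 1];  /* "GPRMC,...": bytes between '$' and '*' */
    size_t len;
    char ck[2];                   /* the two transmitted checksum characters */
    size_t ck_len;
    nmea_handler handler;         /* called once per verified message; may be NULL */
    void *user;
};

static int hexval(unsigned char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

void nmea_init(struct nmea_parser *p, nmea_handler handler, void *user)
{
    memset(p, 0, sizeof *p);      /* state 0 (ST_SOM), empty buffers */
    p->handler = handler;
    p->user = user;
}

/* Feed any number of received bytes; returns how many messages passed the checksum. */
int nmea_feed(struct nmea_parser *p, const char *data, size_t n)
{
    int good = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char)data[i];
        if (c == '\n') {          /* LF is the SOM: resynchronize from any state */
            p->state = ST_DOLLAR;
            continue;
        }
        switch (p->state) {
        case ST_SOM:              /* discard everything until the next LF */
            break;
        case ST_DOLLAR:           /* a SOM must be followed by '$' */
            if (c == '$') { p->sum = 0; p->len = 0; p->state = ST_NAME; }
            else p->state = ST_SOM;
            break;
        case ST_NAME:             /* five capital letters, then a COMMA */
            if ((p->len < 5 && c >= 'A' && c <= 'Z') || (p->len == 5 && c == ',')) {
                p->buf[p->len++] = (char)c;
                p->sum ^= c;
                if (c == ',')     /* a check against the wanted names would go here */
                    p->state = ST_BODY;
            } else {
                p->state = ST_SOM;
            }
            break;
        case ST_BODY:             /* copy printable bytes until the ASTERISK */
            if (c == '*') {
                p->ck_len = 0;
                p->state = ST_CKSUM;
            } else if (c < 0x20 || c > 0x7e || c == '$' || p->len >= NMEA_MAX_BODY) {
                p->state = ST_SOM;    /* bad byte or over-long message: drop it */
            } else {
                p->buf[p->len++] = (char)c;
                p->sum ^= c;
            }
            break;
        case ST_CKSUM:            /* two hex characters, then CR (EOM) */
            if (c == '\r' && p->ck_len == 2) {
                int hi = hexval((unsigned char)p->ck[0]);
                int lo = hexval((unsigned char)p->ck[1]);
                if (hi >= 0 && lo >= 0 && (unsigned char)(hi * 16 + lo) == p->sum) {
                    p->buf[p->len] = '\0';
                    good++;
                    if (p->handler != NULL)
                        p->handler(p->buf, p->len, p->user);
                }
                p->state = ST_SOM;
            } else if (p->ck_len < 2) {
                p->ck[p->ck_len++] = (char)c;
            } else {
                p->state = ST_SOM;
            }
            break;
        }
    }
    return good;
}
```

In the receive loop you would call nmea_feed(&parser, inBuf, n) with whatever n bytes read() returned. The handler receives text such as "GPRMC,181044.400,A,..." without the $ and checksum, and can dispatch on the first five characters. A truncated GPRMC followed by a new LF is simply abandoned, as in the failed-message case above.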
At this point, once each message has been de-framed, most of the work is complete. What you do next is take the byte buffer containing the de-framed GPS message and extract the information you need from it. Do this as quickly as possible, because you still need to get your process back to checking the receive buffer. One option is to send the de-framed GPS message through a pipe to another process for parsing, rather than blocking to do that processing inline.
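For the extraction itself, one simple approach, which is an assumption rather than necessarily what the author's parsers do, is to split the de-framed message at its commas. Be careful with strtok(): it treats consecutive delimiters as one, so the empty fields NMEA uses for missing values (such as the ",," near the end of the GPGGA sample) would shift every following field. split_fields() is a hypothetical helper that keeps empty fields.

```c
#include <stddef.h>

/* Split a de-framed message ("GPRMC,181044.400,A,...") in place at each COMMA.
 * Unlike strtok(), empty fields (",,") are kept as empty strings, so field
 * numbers stay aligned with the message definition. Returns the field count;
 * if max_fields is reached, the rest of the message is dropped. */
size_t split_fields(char *msg, char *fields[], size_t max_fields)
{
    size_t count = 0;

    if (max_fields == 0)
        return 0;
    fields[count++] = msg;
    for (char *p = msg; *p != '\0'; p++) {
        if (*p == ',') {
            *p = '\0';                /* terminate the current field */
            if (count == max_fields)
                break;                /* out of slots: stop here */
            fields[count++] = p + 1;
        }
    }
    return count;
}
```

Applied to the body of the GPRMC sample, field 0 is "GPRMC", field 1 the time "181044.400", field 2 the status "A", and so on; each parse function can then convert only the fields it needs.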