LinuxQuestions.org

tsg 03-09-2011 09:07 AM

writing "\r" to /dev/ttyS* causes segmentation fault
 
This is a C++ program I wrote years ago that has been chugging along fine with no problems until I recently upgraded from Slackware 13.0 to 13.1. Suddenly the process dies with a segfault. I recompiled against the new libraries and still no luck.

This is the offending bit of code:

Code:

int comm_handler::transmit(const byte* out, size_t len) {
        size_t written = 0;
        ssize_t count;
        int fail = FALSE;

        while ((written < len) && !fail) {
                count = write(fd,out+written,len-written);
                if (count == -1) {
                        syslog(LOG_ERR,"transmit failed: %s",strerror(errno));
                        fail = TRUE;
                }
                else
                        written += count;
        }
        if ((written < len) || fail)
                return -1;
        else
                return 1;
}

When it is called with out = "ATH0\r" and len = 5, I get a segfault. When I call it with out = "ATH0 " (note trailing space), it goes through. Unfortunately I need a carriage return at the end or the modem doesn't process the command.

Can anyone explain why a carriage return would cause a segfault?
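
A side note on the write loop above: if the process has signal handlers installed without SA_RESTART, write() on a slow device such as a serial port can return -1 with errno set to EINTR, and the loop as posted treats that as a hard failure. The following is only a minimal sketch of a more tolerant loop, not the poster's code. The free function write_all() is a hypothetical stand-in for comm_handler::transmit(), and it returns 0 on success rather than 1.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Hypothetical helper (not from the original program): write all `len` bytes
// to `fd`, retrying after short writes and after EINTR.
// Returns 0 on success, -1 on error with errno left as set by write().
int write_all(int fd, const unsigned char* out, size_t len) {
    assert(out != nullptr || len == 0);
    size_t written = 0;
    while (written < len) {
        ssize_t count = write(fd, out + written, len - written);
        if (count == -1) {
            if (errno == EINTR)
                continue;   // interrupted by a signal before any byte was written; retry
            return -1;      // real error; the caller can log strerror(errno)
        }
        written += static_cast<size_t>(count);
    }
    return 0;
}
```

This would not explain a segfault by itself, but it removes one source of spurious "transmit failed" results while debugging.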

dwhitney67 03-09-2011 09:44 AM

The code you presented looks fine, so I wonder if this is really the source of the SEGV. Have you debugged the code with gdb to verify that all of the parameter values and class member data (i.e. 'fd') are valid?

tsg 03-09-2011 09:58 AM

My knowledge of gdb is limited, but I have used it to step through the function. It fails consistently on the write() function. I tried to debug it with a core dump, but my libraries don't have debugging symbols, so I don't get much meaningful data. fd, as far as I can tell, is valid (at least, it's the same value returned by the "open" function earlier), as are out and len.

I've used minicom on the serial port to communicate with the modem and that much seems to work. It's really messing with me because the only thing that's changed is a minor version of the operating system and I haven't seen anything in the changelogs that might indicate this would be an issue.

It also segfaults if out is "ATH0\n", "ATH0\r ", "\rATH0".

dwhitney67 03-09-2011 10:05 AM

When you call transmit(), are you hard-coding the string for 'out', or is it a variable that you are passing? If the latter, can you show how it is declared.

tsg 03-09-2011 10:30 AM

It's part of a loop that attempts to reset the modem:

Code:

  266          for (tries=0; tries < 3; tries++) {
  267                  syslog(LOG_DEBUG,"DEBUG: sending \"ATH0\"");
  268                  if (transmit((byte*)"ATH0\r",5) < 0 )
  269                          continue;
  270                  syslog(LOG_DEBUG,"DEBUG: waiting for \"OK\"");
  271                  if (waitfor((byte*)"OK\r",3,MODEM_TIMEOUT) <= 0)
  272                          continue;
  273                  syslog(LOG_DEBUG,"DEBUG: done waiting for \"OK\"");
  274                  done = TRUE;
  275                  break;
  276          }

Line 268 is where it crashes, but it also happens to be the first time the method is called.

For grins, I connected the same modem to another machine with the same operating system and had the same problem.

theNbomr 03-09-2011 11:04 AM

Wild-ass shot in the dark: something has made your literal strings look like multi-byte characters, or the system is incorrectly expecting them to be. I've never really understood the mechanisms behind all of that hocus-pocus, but it seems reasonable that it might occur with an OS upgrade.

--- rod.

tsg 03-09-2011 11:14 AM

Now that you mention it, during the install, the setup program asked me something about UTF. I didn't understand what it was asking (and can't remember what it said), but offered a "safe option" which I took. I wonder if that had anything to do with it.

dwhitney67 03-09-2011 11:17 AM

Check the setting of the LANG environment variable.

Also, are you certain there are not any problems (i.e. buffer overflow) in syslog()?

tsg 03-09-2011 11:33 AM

Quote:

Originally Posted by dwhitney67 (Post 4284433)
Check the setting of the LANG environment variable.

for my account (the one I'm compiling the program in): LANG=en_US
for root (who's running the program): LANG is not set

Quote:

Also, are you certain there are not any problems (i.e. buffer overflow) in syslog()?

I hadn't thought to look, but there is nothing in syslog or messages that looks at all helpful.

EDIT(2): sorry, I misunderstood. The syslog()'s were added after the SEGV showed up as a means of tracing it.

EDIT: It also doesn't seem to matter if write is passed a byte* or char*, it still crashes.

tsg 03-09-2011 12:08 PM

I will also accept suggestions for a better way to do this, i.e. write AT commands to a US Robotics modem.

dwhitney67 03-09-2011 01:07 PM

On my system, root has LANG set to en_US.UTF-8, as does the regular-user account.

I do not have a USR device to play with, much less any other device that uses the /dev/ttyS* interfaces. Thus when I attempt to send data to a device such as /dev/ttyS0, I always get an "Input/output" error.

theNbomr 03-09-2011 01:36 PM

What happens if you try sending individual characters/bytes sequentially?

Code:

    transmit((byte*)"A",1);
    transmit((byte*)"T",1);
    transmit((byte*)"H",1);
    transmit((byte*)"0",1);
    transmit((byte*)"\r",1);

Just trying to see whether it is the carriage return, or the carriage return in combination, or...?

What if you contrive your string data to be read from somewhere at runtime, rather than as embedded literal strings?

Did any permissions/ownership on your device pseudo-files change during the upgrade? Any changes applied in the udev system? What about the driver version for that serial port (really a stretch for something as tried & true as a standard serial port, but...?)

Do you fully initialize the serial port before using it (i.e. set all of the appropriate termios parameters); something new might be applying a setting that didn't exist before.

--- rod.
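
On the point about fully initializing the port: a configuration that starts from tcgetattr() and only flips selected bits inherits every other flag from whatever the driver or a previous program left behind. One way to rule that out is to build the termios settings from a zeroed struct. The sketch below is an assumption-laden illustration, not code from the thread: make_raw_8n1() is a hypothetical helper name, and cfmakeraw() is a BSD/glibc extension rather than part of POSIX.

```cpp
#include <cassert>
#include <cstring>
#include <termios.h>

// Hypothetical helper: fill `t` with a fully specified raw 8N1 configuration.
// It starts from a zeroed struct so that no flag inherited from the driver
// or from a previous user of the port survives.
void make_raw_8n1(termios& t, speed_t baud) {
    std::memset(&t, 0, sizeof t);
    cfmakeraw(&t);                    // clears ICANON, ECHO, ISIG, OPOST, ICRNL, IXON; sets CS8
    t.c_cflag |= (CLOCAL | CREAD);    // ignore modem control lines, enable the receiver
    t.c_cflag &= ~(PARENB | CSTOPB);  // no parity, one stop bit
    t.c_iflag &= ~(IXOFF | IXANY);    // cfmakeraw() does not touch these two
    t.c_cc[VMIN] = 0;                 // read() returns immediately...
    t.c_cc[VTIME] = 0;                // ...with whatever bytes are available
    cfsetispeed(&t, baud);
    cfsetospeed(&t, baud);
}
```

The result would be applied with tcsetattr(fd, TCSANOW, &t), checking the return value, and optionally read back with tcgetattr() to confirm the driver accepted it.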

tsg 03-09-2011 01:50 PM

Quote:

Originally Posted by dwhitney67 (Post 4284541)
On my system, root has LANG set to en_US.UTF-8, as does the regular-user account.

Changing LANG doesn't seem to help.

tsg 03-09-2011 02:06 PM

I may have figured something out: it may not be the transmit that is choking it but the receive, hence the carriage return causing the issue. I have the modem configured not to echo any input, so it's not going to generate any output until the carriage return. That's run in a separate handler:

Code:

void comm_handler::io_handler(int status) {
        int res,x;
        byte buf[255];
        sigset_t newset, oldset;
        sigfillset(&newset);

        sigprocmask(SIG_BLOCK,&newset,&oldset);
        res = read(fd,buf,255);
        for (x=0; x<res;x++) {
                b.push(buf[x]);
        }
        sigprocmask(SIG_SETMASK,&oldset,NULL);
}

which is set up with:

Code:

void comm_handler::init_port() {
        io.sa_handler = comm_handler::io_handler;
        sigemptyset(&io.sa_mask);
        sigaddset(&io.sa_mask,SIGIO);
        sigaction(SIGIO,&io,NULL);
        fcntl(fd,F_SETOWN,getpid());
}

I have to plead ignorance to a lot of this since I grabbed it from an online tutorial somewhere and it worked; I don't understand a lot of what it is doing. I believe that init_port() registers io_handler() as a method to be called anytime there is input available at the serial port, triggered by a SIGIO signal. I don't have a clue what the sigprocmask stuff is doing.

b is a buffer class that I wrote to contain the input data. I'll have to go digging in there to see if maybe I screwed something up, although it has been working for years.

This is getting deep, and I appreciate the help.
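
For what it's worth, the sigprocmask() pair in io_handler() blocks every signal for the duration of the read and then restores the previous mask. Two things stand out in the setup as posted, though the thread does not establish that either is the cause: init_port() never assigns io.sa_flags, so unless io is zeroed elsewhere that field is indeterminate, and the handler calls b.push(), which is not safe inside a signal handler if it allocates memory. The following is a minimal sketch of a more conservative arrangement. All names (g_input_ready, on_sigio, install_sigio_handler) are hypothetical, and the choice of SA_RESTART is an assumption about what the program wants.

```cpp
#include <cassert>
#include <cstring>
#include <signal.h>

// Hypothetical names throughout; none of this is from the original program.
static volatile sig_atomic_t g_input_ready = 0;

// The handler only records that input is pending. Reading the port and
// filling the buffer happen later, in ordinary (non-signal) context.
extern "C" void on_sigio(int) {
    g_input_ready = 1;
}

// Install the handler with every field of struct sigaction set explicitly.
// Returns 0 on success, -1 on failure (errno set by sigaction()).
int install_sigio_handler() {
    struct sigaction sa;
    std::memset(&sa, 0, sizeof sa);   // no stray SA_* flags left over
    sa.sa_handler = on_sigio;
    sigemptyset(&sa.sa_mask);         // SIGIO itself is already blocked while its handler runs
    sa.sa_flags = SA_RESTART;         // restart interrupted calls such as write()
    return sigaction(SIGIO, &sa, nullptr);
}
```

The main loop would then test g_input_ready, clear it, and call read() on the port until it returns 0 or -1, pushing bytes into the buffer from there. The fcntl() calls with F_SETOWN and O_ASYNC stay as they are.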

wje_lq 03-09-2011 02:35 PM

Quote:

Originally Posted by tsg (Post 4284479)
I will also accept suggestions for a better way to do this, i.e. write AT commands to a US Robotics modem.

Oh! Well, then.

It turns out I have one of those. A US Robotics Courier. Once a month I use it to dial out to the National Bureau of Standards folks to find out what time it is. I wrote a C program which does this. As long as I was doing it, I wrote (inside the source file) a 950-line comment on how to program your modem in C. Can't hurt to give it a look-see, right?

For at least the next two weeks, you can get it here.

tsg 03-09-2011 02:47 PM

This is my open_port function, in case it sheds any light...

Code:

int comm_handler::open_port(const char* modemdevice) {

        fd = open(modemdevice, O_RDWR | O_NOCTTY );
        if (fd == -1) {
                syslog(LOG_ERR,"Could not open comm port");
                exit(-1);
        }
        else {
                fcntl(fd,F_SETFL,O_ASYNC);
                tcgetattr(fd,&options);
                options.c_cflag |= (CLOCAL | CREAD);
                options.c_cflag &= ~PARENB;
                options.c_cflag &= ~CSTOPB;
                options.c_cflag &= ~CSIZE;
                options.c_cflag |= CS8;
                options.c_lflag &= ~(ICANON | ECHO | ECHOE | ISIG);
                options.c_oflag &= ~OPOST;
                options.c_iflag &= ~(ICRNL|IXON|IXOFF);
                options.c_cc[VMIN]=0;
                options.c_cc[VTIME]=0;
                cfsetispeed(&options,BAUDRATE);
                cfsetospeed(&options,BAUDRATE);
                tcsetattr(fd,TCSANOW,&options);
        }
        syslog(LOG_ERR,"comm_handler::open_port reports fd = %d",fd);
        return fd;
}


tsg 03-09-2011 03:15 PM

Quote:

Originally Posted by wje_lq (Post 4284634)
Oh! Well, then.

It turns out I have one of those. A US Robotics Courier. Once a month I use it to dial out to the National Bureau of Standards folks to find out what time it is. I wrote a C program which does this. As long as I was doing it, I wrote (inside the source file) a 950-line comment on how to program your modem in C. Can't hurt to give it a look-see, right?

For at least the next two weeks, you can get it here.

Cool. Grabbed a copy and am going through it now. Thanks a lot!

tsg 03-09-2011 03:30 PM

Now I'm getting these in my syslogs...

Code:

segfault at 2e ip 0000002e sp bfeceb80 error 4 in alarmd[8048000+11000]

"alarmd" is the name of the executable. I have no idea what anything else means.

theNbomr 03-09-2011 06:01 PM

Another thought. After you did the upgrade, did you reboot? Is it possible that some of the machinery that makes the serial port work is now out of sync with the kernel/driver/libraries, and needs a reboot to get things right again? That wouldn't explain why minicom seems to be immune to the problem, but easy enough to try, even if it is a bit Windows-ish.

Pretty sure the 'ip' and 'sp' in your error message are the Instruction Pointer and Stack Pointer registers, respectively. You'd find those useful, I think, if you had a core dump to analyze.

--- rod.
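
To put the rest of that log line in context: on x86, the number after "at" is the address that faulted, and "error N" is the CPU's page-fault error code, where bit 0 means a protection violation rather than an unmapped page, bit 1 means a write, bit 2 means user mode, and bit 4 means an instruction fetch (reported only when no-execute support is active). The small decoder below is a sketch with hypothetical function names, meant only to make the bit layout concrete.

```cpp
#include <cassert>
#include <string>

// Decode the low bits of the x86 page-fault error code that the kernel prints
// as "error N" in a "segfault at ..." log line.
std::string describe_fault(unsigned long error_code) {
    std::string s;
    s += (error_code & 0x4) ? "user-mode " : "kernel-mode ";
    if (error_code & 0x10)
        s += "instruction fetch";
    else
        s += (error_code & 0x2) ? "write" : "read";
    s += (error_code & 0x1) ? ", protection violation" : ", page not mapped";
    return s;
}

// True when the faulting address equals the instruction pointer, i.e. the CPU
// faulted while fetching the instruction itself (a jump or call through a bad
// pointer, or a return to a corrupted address).
bool faulted_on_fetch(unsigned long fault_addr, unsigned long ip) {
    return fault_addr == ip;
}
```

For the line above, describe_fault(4) gives "user-mode read, page not mapped", and the fault address and ip are both 0x2e. That combination suggests the program tried to execute code at address 0x2e, which usually points at a bad function pointer or return address rather than at the data passed to write().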

tsg 03-10-2011 08:10 AM

Quote:

Originally Posted by theNbomr (Post 4284842)
Another thought. After you did the upgrade, did you reboot? Is it possible that some of the machinery that makes the serial port work is now out of sync with the kernel/driver/libraries, and needs a reboot to get things right again? That wouldn't explain why minicom seems to be immune to the problem, but easy enough to try, even if it is a bit Windows-ish.

I did reboot. It's part of the process of the installation and I did it again when I noticed the program had stopped working.

Quote:

Pretty sure the 'ip' and 'sp' in your error message are the Instruction Pointer and Stack Pointer registers, respectively. You'd find those useful, I think, if you had a core dump to analyze.

And if I had any clue what to do with them. Alas, my debugging skills are mediocre at best. If you have a pointer to where I might be able to learn some more, I'd appreciate it. I have googled but it looks a little like dark magic to me and I don't have a clue where to start.

dwhitney67 03-10-2011 08:31 AM

Have you tried to build and run any other program? I wonder if your GCC installation is fried.

tsg 03-10-2011 09:14 AM

I haven't, but I have a sneaking suspicion it may be a hosed library. At the moment I'm downgrading the machine to the last known good installation where the program worked. I have to get it running again. I'm wondering if it didn't have something to do with the upgrade procedure since both boxes were upgraded the same way. I may grab an old machine and do a fresh install of the new OS and see if my program will run on it.

theNbomr 03-10-2011 10:05 AM

Well, my only take on the IP & SP matter is that the IP is a very low value (0x2E). I don't know this for sure, but my hunch is that it is in code that is part of the C startup code, and executing before it has reached your C main() function. By extension of this logic, it would indeed point to a problem with the C startup code that may come from a shareable object library.

Did this problem start without having re-built your application, or did it work fine until after you re-built it? If the former, then it would seem also to point to a shared object library problem. Is it possible that there is a 64/32-bit aspect to the problem (i.e. ld.so finding the wrong version first, now, where it previously found the correct one first)? If you run ldd against your compiled application, does it list any libraries that have changed since your upgrade? Any libs that exist in more than one location?

Sorry to be throwing ideas at you in a scattershot fashion. I'm just trying to open up some ideas for exploration.

--- rod.

tsg 03-10-2011 11:04 AM

Quote:

Originally Posted by theNbomr (Post 4285593)
Well, my only take on the IP & SP matter is that the IP is a very low value (0x2E). I don't know this for sure, but my hunch is that it is in code that is part of the C startup code, and executing before it has reached your C main() function. By extension of this logic, it would indeed point to a problem with the C startup code that may come from a shareable object library.

Did this problem start without having re-built your application, or did it work fine until after you re-built it? If the former, then it would seem also to point to a shared object library problem. Is it possible that there is a 64/32-bit aspect to the problem (i.e. ld.so finding the wrong version first, now, where it previously found the correct one first)? If you run ldd against your compiled application, does it list any libraries that have changed since your upgrade? Any libs that exist in more than one location?

I rebuilt the application because it wouldn't stay running. I have run into this in the past with an operating system upgrade (it started with Slackware 7, I believe) and usually a recompile fixes it. I will run ldd and see what it says and also check for duplicate libraries.

Quote:

Sorry to be throwing ideas at you in a scattershot fashion. I'm just trying to open up some ideas for exploration.

Don't apologize. I appreciate any help you can give me. At this point I'm out of ideas, so even scattershot ones are more than welcome.

tsg 03-10-2011 12:42 PM

Downgrading the server worked. The application runs fine. Now to find out why.


All times are GMT -5.