writing "\r" to /dev/ttyS* causes segmentation fault
This is a C++ program I wrote years ago that has been chugging along fine with no problems until I recently upgraded from Slackware 13.0 to 13.1. Suddenly the process dies with a segfault. I recompiled against the new libraries and still no luck.
This is the offending bit of code: Code:
int comm_handler::transmit(const byte* out, size_t len) {
Can anyone explain why a carriage return would cause a segfault? |
The code you presented looks fine, so I wonder if this is really the source of the SEGV. Have you debugged the code using gdb, to verify that all of the parameter values and class member data (i.e. 'fd') are valid?
|
My knowledge of gdb is limited, but I have used it to step through the function. It fails consistently on the write() function. I tried to debug it with a core dump, but my libraries don't have debugging symbols so I don't get much meaningful data. fd, as far as I can tell, is valid (at least, it's the same value returned by the "open" function earlier), as are out and len.
I've used minicom on the serial port to communicate with the modem and that much seems to work. It's really messing with me because the only thing that's changed is a minor version of the operating system and I haven't seen anything in the changelogs that might indicate this would be an issue. It also segfaults if out is "ATH0\n", "ATH0\r ", "\rATH0". |
When you call transmit(), are you hard-coding the string for 'out', or is it a variable that you are passing? If the latter, can you show how it is declared?
|
It's part of a loop that attempts to reset the modem:
Code:
for (tries=0; tries < 3; tries++) {
For grins, I connected the same modem to another machine with the same operating system and had the same problem. |
Wild-ass shot in the dark: something has made your literal strings look like multi-byte characters, or the system is incorrectly expecting them to be. I've never really understood the mechanisms behind all of that hocus-pocus, but it seems reasonable that it might occur with an OS upgrade.
--- rod. |
Now that you mention it, during the install, the setup program asked me something about UTF. I didn't understand what it was asking (and can't remember what it said), but offered a "safe option" which I took. I wonder if that had anything to do with it.
|
Check the setting of the LANG environment variable.
Also, are you certain there are no problems (i.e. a buffer overflow) in syslog()? |
For root (who's running the program), LANG is not set.
EDIT(2): Sorry, I misunderstood. The syslog() calls were added after the SEGV showed up, as a means of tracing it.
EDIT: It also doesn't seem to matter whether write() is passed a byte* or a char*; it still crashes. |
I will also accept suggestions for a better way to do this, i.e. to write AT commands to a US Robotics modem.
|
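On the request for a better way to send AT commands: the body of transmit() is not shown in the thread, so the following is only a minimal sketch of one common pattern, not a reconstruction of the original. It assumes a plain blocking file descriptor; the name write_all is hypothetical, and unsigned char stands in for the program's own byte type. The loop retries when write() is interrupted by a signal (EINTR) or writes fewer bytes than requested, both of which can happen on a serial port.

```cpp
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Hypothetical helper (not from the original program): write all len bytes
// of out to fd, retrying on EINTR and on short writes.
// Returns 0 on success, -1 on error (errno is left as set by write()).
int write_all(int fd, const unsigned char* out, std::size_t len) {
    std::size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, out + done, len - done);
        if (n < 0) {
            if (errno == EINTR) {
                continue;  // interrupted by a signal before writing; retry
            }
            return -1;     // real error; caller can inspect errno
        }
        done += static_cast<std::size_t>(n);  // advance past what was written
    }
    return 0;
}
```

A caller would pass the command together with its terminator, for example the five bytes of "ATH0\r", and check the return value before waiting for the modem's response.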
On my system, root has LANG set to en_US.UTF-8, as does the regular-user account.
I do not have a USR device to play with, much less any other device that uses the /dev/ttyS* interfaces. Thus when I attempt to send data to a device such as /dev/ttyS0, I always get an "Input/output" error. |
What happens if you try sending individual characters/bytes sequentially?
Code:
transmit((byte*)"A",1);
What if you contrive your string data to be read from somewhere at runtime, rather than as embedded literal strings? Did any permissions/ownership on your device pseudo-files change during the upgrade? Any changes applied in the udev system? What about the driver version for that serial port (really a stretch for something as tried & true as a standard serial port, but...?) Do you fully initialize the serial port before using it (i.e. set all of the appropriate termios parameters)? Something new might be applying a setting that didn't exist before.
--- rod. |
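To make the termios suggestion concrete, here is a hedged sketch of setting every relevant flag explicitly instead of inheriting whatever the system left on the port. The helper name set_raw_8n1 is hypothetical, and 9600 baud 8N1 is only an assumption, since the thread does not say what the modem link uses. Clearing ICRNL, INLCR and IGNCR is the part that touches carriage-return handling on input.

```cpp
#include <termios.h>

// Hypothetical helper: put a termios structure into raw 8N1 mode.
// The 9600 baud rate is an assumption, not taken from the original program.
void set_raw_8n1(struct termios& t) {
    // Input: no break handling, no parity marking, no CR/NL translation,
    // no software flow control.
    t.c_iflag &= ~(IGNBRK | BRKINT | PARMRK | ISTRIP |
                   INLCR | IGNCR | ICRNL | IXON);
    // Output: no post-processing (e.g. no NL -> CR NL translation).
    t.c_oflag &= ~OPOST;
    // Local: no echo, no line buffering, no signal characters.
    t.c_lflag &= ~(ECHO | ECHONL | ICANON | ISIG | IEXTEN);
    // Control: 8 data bits, no parity, 1 stop bit, receiver enabled,
    // modem control lines ignored.
    t.c_cflag &= ~(CSIZE | PARENB | CSTOPB);
    t.c_cflag |= CS8 | CREAD | CLOCAL;
    // read() returns as soon as at least one byte is available.
    t.c_cc[VMIN] = 1;
    t.c_cc[VTIME] = 0;
    cfsetispeed(&t, B9600);
    cfsetospeed(&t, B9600);
}
```

In use, the structure would first be filled with tcgetattr(fd, &t), passed through the helper, and applied with tcsetattr(fd, TCSANOW, &t), checking both return values.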
I may have figured something out: it may not be the transmit that is choking it but the receive, hence the carriage return causing the issue. I have the modem configured not to echo any input, so it's not going to generate any output until the carriage return. That's run in a separate handler:
Code:
void comm_handler::io_handler(int status) {
Code:
void comm_handler::init_port() {
b is a buffer class that I wrote to contain the input data. I'll have to go digging in there to see if maybe I screwed something up, although it has been working for years. This is getting deep, and I appreciate the help. |
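The thread does not show how io_handler is registered or what it does internally. If it is installed as a SIGIO signal handler (an assumption), one thing worth checking is whether it calls anything that is not async-signal-safe: syslog(), memory allocation and most buffer-class operations fall in that category, and a library upgrade can change whether such calls happen to survive. A minimal sketch of the usual alternative follows, with hypothetical names throughout: the handler only sets a flag, and the main loop does the actual read.

```cpp
#include <signal.h>

// Hypothetical flag; sig_atomic_t is the type guaranteed safe to write
// from a signal handler.
volatile sig_atomic_t g_input_ready = 0;

// Async-signal-safe handler: record that input arrived and return.
extern "C" void on_sigio(int /*signo*/) {
    g_input_ready = 1;
}

// Install the handler for SIGIO. Returns 0 on success, -1 on failure.
int install_sigio_handler() {
    struct sigaction sa = {};
    sa.sa_handler = on_sigio;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGIO, &sa, nullptr);
}

// Called from the main loop. Returns true if a notification arrived since
// the last call; the caller should then read until no input remains.
bool take_input_ready() {
    if (!g_input_ready) {
        return false;
    }
    g_input_ready = 0;
    return true;
}
```

With this split, the buffer class and any logging run in normal program context, so a fault there shows up in a debugger as an ordinary call stack instead of inside a signal frame.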
Quote:
It turns out I have one of those. A US Robotics Courier. Once a month I use it to dial out to the National Bureau of Standards folks to find out what time it is. I wrote a C program which does this. As long as I was doing it, I wrote (inside the source file) a 950-line comment on how to program your modem in C. Can't hurt to give it a look-see, right? For at least the next two weeks, you can get it here. |
This is my open_port function, in case it sheds any light...
Code:
int comm_handler::open_port(const char* modemdevice) { |
Now I'm getting these in my syslogs...
Code:
segfault at 2e ip 0000002e sp bfeceb80 error 4 in alarmd[8048000+11000] |
Another thought. After you did the upgrade, did you reboot? Is it possible that some of the machinery that makes the serial port work is now out of sync with the kernel/driver/libraries, and needs a reboot to get things right again? That wouldn't explain why minicom seems to be immune to the problem, but it's easy enough to try, even if it is a bit Windows-ish.
Pretty sure the 'ip' and 'sp' in your error message are the Instruction Pointer and Stack Pointer registers, respectively. You'd find those useful, I think, if you had a core dump to analyze. --- rod. |
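As a reading aid for that log line: on x86 the number after "error" is the page-fault error code, whose low bits say what kind of access failed. The sketch below decodes those bits; the function names are hypothetical. For the line above, error 4 decodes to a user-mode read of a page that is not present, and the faulting address (2e) equals the instruction pointer (0000002e), which is consistent with the process trying to execute code at address 0x2e, for instance after a jump through an overwritten return address or function pointer. That is one interpretation of the line, not a diagnosis.

```cpp
#include <string>

// Decode the low bits of an x86 page-fault error code as printed in
// kernel "segfault at ..." log lines.
std::string describe_fault(unsigned long error_code) {
    std::string s;
    s += (error_code & 0x1) ? "protection violation" : "page not present";
    s += (error_code & 0x2) ? ", write" : ", read";
    s += (error_code & 0x4) ? ", user mode" : ", kernel mode";
    if (error_code & 0x10) {
        s += ", instruction fetch";  // only reported when the CPU tracks it
    }
    return s;
}

// True when the faulting address equals the instruction pointer, i.e. the
// fault happened while fetching the instruction itself.
bool faulted_on_fetch(unsigned long fault_addr, unsigned long ip) {
    return fault_addr == ip;
}
```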
Have you tried to build and run any other program? I wonder if your GCC installation is fried.
|
I haven't, but I have a sneaking suspicion it may be a hosed library. At the moment I'm downgrading the machine to the last known good installation where the program worked. I have to get it running again. I'm wondering if it didn't have something to do with the upgrade procedure since both boxes were upgraded the same way. I may grab an old machine and do a fresh install of the new OS and see if my program will run on it.
|
Well, my only take on the IP & SP matter is that the IP is a very low value (0x2E). I don't know this for sure, but my hunch is that it is in code that is part of the C startup code, and executing before it has reached your C main() function. By extension of this logic, it would indeed point to a problem with the C startup code that may come from a shareable object library.
Did this problem start without having re-built your application, or did it work fine until after you re-built it? If the former, then it would also seem to point to a shared object library problem. Is it possible that there is a 64/32-bit aspect to the problem (i.e. ld.so now finding the wrong version first, where it previously found the correct one first)? If you run ldd against your compiled application, does it list any libraries that have changed since your upgrade? Any libs that exist in more than one location? Sorry to be throwing ideas at you in a scattershot fashion. I'm just trying to open up some ideas for exploration. --- rod. |
Downgrading the server worked. The application runs fine. Now to find out why.
|