[SOLVED] program segfaults 80% of the time on OpenBSD 6.2, but never on Linux

Andy Alt · 11-13-2018, 06:42 PM

I'm maintaining a program called rmw. I'm normally not a BSD user but sometimes test it on OpenBSD 6.2. Last night when I tested, I found it segfaults about 80% of the time when it's run (no arguments are needed to reproduce the segfault).

gdb reports the fault at L107. I don't see any problems with the code though, and as I mentioned, it never segfaults on Linux.

Any feedback would be appreciated.

EDIT: I found what may be the cause and am working on it. I'll post here with the results...

UPDATE: Nope. I've refactored a bit of related code, added a few commits, but still haven't solved the problem.

jggimi · 11-14-2018, 11:28 AM

OpenBSD 6.2 is no longer supported. It was end-of-life with the release of 6.4 on October 18. You may want to run your tests on this newest release, or, for bug reporting purposes, on a recent development snapshot. It may help to know that the OpenBSD Project publishes two releases a year, and only supports the most recent two.
Your gdb backtrace may provide an indication of what is occurring within isspace(3), and perhaps point to a root cause.
While there are a few OpenBSD users like me here, this forum is not an official support channel for the OS, and our expertise is limited.

Andy Alt · 11-15-2018, 10:14 PM

Quote:

Originally Posted by jggimi

OpenBSD 6.2 is no longer supported. It was end-of-life with the release of 6.4 on October 18. You may want to run your tests on this newest release, or, for bug reporting purposes, on a recent development snapshot. It may help to know that the OpenBSD Project publishes two releases a year, and only supports the most recent two.

Ok, I upgraded to 6.4 last night. Same results.

Quote:

Your gdb backtrace may provide an indication of what is occurring within isspace(3), and perhaps point to a root cause.

I'm not very proficient yet with debugging tools but I'll keep practicing.

Quote:

While there are a few OpenBSD users like me here, this forum is not an official support channel for the OS, and our expertise is limited.

Is there a good forum you'd recommend for a problem like this?

This is the most current version of the function causing difficulties...

Code:

/*
 *
 * trim_white_space: remove trailing blanks, tabs, newlines, carriage returns
 *
 */
void
trim_white_space (char *str)
{
  /* Advance pointer until NULL terminator is found */
  while (*str != '\0')
    str++;

  /* set pointer to segment preceding NULL terminator */
  str--;

  while (isspace ((unsigned int)*str))
  {
    *str = '\0';
    str--;
  }

  return;
}

jggimi · 11-16-2018, 07:00 AM

The problem occurs within the isspace(3) library function. I assume that str contains an invalid address. A similar value may be a valid address in a Linux process address space, which could explain why you don't see the error appear on Linux systems.

The OpenBSD Project provides support via email, specifically through its mailing lists. The bugs@ list is for bug reporting, but at this point it is not clear that there is a bug in the OpenBSD C library to report.

If this were my problem, my assumption would be it was my application at fault, and probably my application's handling of str. I'd initiate an informal query on the misc@ mailing list, asking for debugging assistance of my application, to obtain help discovering the root cause. If it turns out later that it's a problem with the C library, then I'd have enough information at that time to make a formal bug report.

GazL · 11-16-2018, 10:42 AM

Your code will underflow the buffer if the buffer is empty, or contains only isspace() characters, which could potentially cause a segfault and is something you should address.

Other than that, most likely, you're passing the function a bad pointer from elsewhere in your code. Checking for str == NULL and either writing out an error, or just returning from the function without doing anything would probably also be a good idea.

Lastly, I don't believe you need the (unsigned int) cast but that's a minor point.
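
For anyone following along, one way to avoid the underflow described above is to work from the string's length instead of stepping a pointer backward. This is only a minimal sketch, not the rmw code; the name trim_trailing_space is hypothetical, and returning silently on NULL is an assumption (reporting an error would work just as well).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from rmw: strip trailing whitespace
 * in place without ever reading before the start of the buffer. */
void
trim_trailing_space (char *str)
{
  if (str == NULL)
    return;

  size_t len = strlen (str);

  /* len > 0 is tested first, so str[len - 1] is only read while it is
   * inside the buffer; an empty or all-blank string stops at len 0. */
  while (len > 0 && isspace ((unsigned char) str[len - 1]))
    len--;

  /* Terminate the string right after the last non-blank character. */
  str[len] = '\0';
}
```

Because the index never goes below zero, an empty string and a string of only blanks both end up as "" with no read or write outside the buffer.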

Andy Alt · 11-16-2018, 08:59 PM

@GazL great answer!

All fixed up.

NULL strings weren't the problem in this case (I use quite a bit of error-checking in my program). The address was going out of bounds during my subtraction, going past &str[0] in the wrong direction.

This was happening when the rmw config file was being read. The config files on my Linux system and my BSD system had some subtle differences, and that may be why it didn't reproduce on the Linux system. However, there are some lines with only white space in my Linux rmw config file, so really I'm pretty sure Linux should have segfaulted. My guess is that something at some level, system or compiler, prevented the address from "underflowing" even without the proper code. (I forgot to mention that the segfault doesn't happen on OSX either.) But I think this is a good change and I surely do appreciate talking through the problem with both of you.

Code:

void
trim_white_space (char *str)
{
  if (str == NULL)
  {
    MSG_ERROR;
    fprintf (stderr, _("String passed to %s is NULL.\n\
Please report this bug to the rmw development team. Exiting...\n"), __func__);
    exit (EXIT_FAILURE);
  }
  char *pos_0 = str;
  /* Advance pointer until NULL terminator is found */
  while (*str != '\0')
    str++;

  /* set pointer to segment preceding NULL terminator */
  if (str != pos_0)
    str--;
  else
    return;

  while (isspace ((unsigned int)*str))
  {
    *str = '\0';
    if (str != pos_0)
      str--;
    else
      break;
  }

  return;
}

Quote:

Lastly, I don't believe you need the (unsigned int) cast but that's a minor point.

This is mentioned in the isspace() man page:

Quote:

NOTES
The standards require that the argument c for these functions is either
EOF or a value that is representable in the type unsigned char. If the
argument c is of type char, it must be cast to unsigned char, as in the
following example:

It's always worked fine for me without any casting at all and I only recently made that change. But it seems I should change it from int to char if I want to keep things compliant. I'm not sure.. as you say, it's a minor point but worth mentioning for the people who are learning.
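
For the people who are learning, here is a minimal sketch of the cast the note asks for, assuming a platform where plain char may be signed. The wrapper name is_space_char is hypothetical and is not part of rmw or the C library.

```c
#include <ctype.h>

/* Hypothetical wrapper: convert the char to unsigned char before it
 * is promoted to int, so bytes above 0x7F reach isspace() as values
 * in 0..UCHAR_MAX rather than as negative numbers. */
int
is_space_char (char c)
{
  return isspace ((unsigned char) c) != 0;
}
```

As far as I can tell, a cast to unsigned int does not give the same guarantee: where char is signed, a negative char is sign-extended first, so the value handed to isspace() is still outside the range the standard allows.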

GazL · 11-17-2018, 04:03 AM

Quote:

This is mentioned in the isspace() man page:

It's always worked fine for me without any casting at all and I only recently made that change. But it seems I should change it from int to char if I want to keep things compliant. I'm not sure.. as you say, it's a minor point but worth mentioning for the people who are learning.

You're quite right, I was in error there. I'm very much still learning myself, so thanks for the correction.

Best of luck with your project.

P.S.

Trying to improve my understanding of this issue, I found this helpful:

http://www.network-theory.co.uk/docs...cintro_71.html

Personally, I think ARM and PowerPC have the right idea by choosing to use unsigned char as their char type. It seems a far more sensible choice.
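
Whether plain char is signed is implementation-defined, which is why the same code can behave differently from one platform to another. A small sketch for checking what a given compiler does follows; the function name char_is_signed is hypothetical.

```c
#include <limits.h>

/* Returns 1 if plain char is signed on this platform, 0 if unsigned.
 * CHAR_MIN is 0 exactly when char has the same range as unsigned char. */
int
char_is_signed (void)
{
  return CHAR_MIN < 0;
}
```

Code that always casts to unsigned char before calling the <ctype.h> functions does not need to care about the answer.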