LinuxQuestions.org

bloozman23 01-19-2013 07:00 PM

segmentation fault bash script
 
When the bash script below is invoked, it prints the following
error messages:

Calculating results for Mix - 1
./06_hresult.sh: line 15: 2075 Segmentation fault
139
Calculating results for Mix - 2
./06_hresult.sh: line 15: 2076 Segmentation fault
139
Calculating results for Mix - 4
./06_hresult.sh: line 15: 2077 Segmentation fault
139
Calculating results for Mix - 8
./06_hresult.sh: line 15: 2078 Segmentation fault
139
Calculating results for Mix - 16
./06_hresult.sh: line 15: 2079 Segmentation fault
139
Calculating results for Mix - 32
./06_hresult.sh: line 15: 2080 Segmentation fault
139
Calculating results for Mix - 64
./06_hresult.sh: line 15: 2081 Segmentation fault
139


Script:
Code:

#!/bin/bash


ceilMix=64
ceilItr=10

mix=1

while [ "$mix" -le "$ceilMix" ]
do
        echo "Calculating results for Mix - $mix"
        HResults -T 00020  -p -t -I /media/dvone/hmmtest/mlf/allmlf-test.mlf /media/dvone/hmmtest/dics/wlist "/media/dvone/hmmtest/mlf/recout_mix${mix}_hmm${ceilItr}.mlf" > "/media/dvone/hmmtest/results/result_mix${mix}_hmm${ceilItr}"
        echo $?
        mix=$((mix*2))
done

System: Ubuntu 11.10, 64-bit, 1 GB RAM.

Ser Olmy 01-19-2013 07:32 PM

The segmentation fault occurs after the "Calculating results" text but before the echo $? statement. This only leaves one possibility: The "HResults" command in line 12, whatever that is.
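
The 139 printed by echo $? supports this reading. Bash reports 128 + N when a command is killed by signal N, so 139 corresponds to signal 11. A minimal sketch of that arithmetic in C follows; the function names are made up for illustration, and a program could in principle also exit with code 139 on its own, so treat the result as a strong hint, not proof.

```c
#include <signal.h>

/* Hypothetical helper: bash reports 128 + N when a command is killed by
   signal N. Returns N for such a status, or 0 for an ordinary exit code. */
int signal_from_shell_status(int status)
{
    return (status > 128 && status < 256) ? status - 128 : 0;
}

/* Hypothetical helper: nonzero if the shell status looks like a SIGSEGV. */
int is_segfault_status(int status)
{
    return signal_from_shell_status(status) == SIGSEGV;
}
```

On Linux, signal_from_shell_status(139) returns 11, which is SIGSEGV.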

bloozman23 01-19-2013 09:19 PM

Valgrind says:

Code:

==3768== Invalid read of size 2
==3768==    at 0x4C59FE1: getenv (getenv.c:90)
==3768==    by 0x4C8F21D: __libc_message (libc_fatal.c:67)
==3768==    by 0x4D140F4: __fortify_fail (fortify_fail.c:32)
==3768==    by 0x4D140A6: __stack_chk_fail (stack_chk_fail.c:29)
==3768==    by 0x804C2E0: OutTrans (HResults.c:1092)
==3768==    by 0x2020201F: ???
==3768==  Address 0x20202020 is not stack'd, malloc'd or (recently) free'd
==3768==
==3768==
==3768== Process terminating with default action of signal 11 (SIGSEGV)
==3768==  Access not within mapped region at address 0x20202020
==3768==    at 0x4C59FE1: getenv (getenv.c:90)
==3768==    by 0x4C8F21D: __libc_message (libc_fatal.c:67)
==3768==    by 0x4D140F4: __fortify_fail (fortify_fail.c:32)
==3768==    by 0x4D140A6: __stack_chk_fail (stack_chk_fail.c:29)
==3768==    by 0x804C2E0: OutTrans (HResults.c:1092)
==3768==    by 0x2020201F: ???
==3768==  If you believe this happened as a result of a stack
==3768==  overflow in your program's main thread (unlikely but
==3768==  possible), you can try to increase the size of the
==3768==  main thread stack using the --main-stacksize= flag.
==3768==  The main thread stack size used in this run was 8388608.
==3768==
==3768== HEAP SUMMARY:
==3768==    in use at exit: 1,036,774 bytes in 72 blocks
==3768==  total heap usage: 236 allocs, 164 frees, 4,462,990 bytes allocated
==3768==
==3768== LEAK SUMMARY:
==3768==    definitely lost: 0 bytes in 0 blocks
==3768==    indirectly lost: 0 bytes in 0 blocks
==3768==      possibly lost: 0 bytes in 0 blocks
==3768==    still reachable: 1,036,774 bytes in 72 blocks
==3768==        suppressed: 0 bytes in 0 blocks
==3768== Reachable blocks (those to which a pointer was found) are not shown.
==3768== To see them, rerun with: --leak-check=full --show-reachable=yes
==3768==
==3768== For counts of detected and suppressed errors, rerun with: -v
==3768== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 13 from 6)
Segmentation fault


chrism01 01-20-2013 06:30 PM

Well here
Code:

==3768==    by 0x804C2E0: OutTrans (HResults.c:1092)
it's hinting at line 1092 in the C source. The error may have originated elsewhere, but that's where it failed.
When I used to do C, SIGSEGV (signal 11) was usually caused by writing beyond the end of an array, often a 'string' array, overwriting the terminating \0.
This could cause other data to be corrupted. Alternatively, a subsequent read would read into the memory region beyond and crash.
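
Two details in the valgrind output fit this theory. __stack_chk_fail is the routine GCC's stack protector calls when it finds the stack guard value overwritten, and the faulting address 0x20202020 consists of four bytes of 0x20, the ASCII space character. That is what one would expect if a string padded with spaces ran past the end of a stack buffer, though the trace alone does not prove it. A small sketch for checking such an address (the function name is hypothetical):

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if every byte of a 32-bit value equals ch,
   otherwise 0. Useful for spotting addresses that are really text. */
int all_bytes_equal(uint32_t value, unsigned char ch)
{
    for (int shift = 0; shift < 32; shift += 8) {
        if (((value >> shift) & 0xFFu) != ch)
            return 0;
    }
    return 1;
}
```

Here all_bytes_equal(0x20202020u, ' ') returns 1, so the "address" is just four spaces.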

bloozman23 01-20-2013 07:42 PM

Code:

/* OutTrans: output aligned transcriptions using best path in grid */
void OutTrans(void)
{
  char refBuf[4096];          /* no checking of output length so */
  char testBuf[4096];          /* these are generous sizes */
 
  strcpy(refBuf," LAB: ");
  strcpy(testBuf," REC: ");
  AppendCell(nTest,nRef,testBuf,refBuf);
  printf("Aligned transcription: %s vs %s\n", labfn, recfn);
  printf("%s\n",refBuf);
  printf("%s\n",testBuf);
  fflush(stdout);
}

Here is the OutTrans function.

Ser Olmy 01-20-2013 07:48 PM

Quote:

Originally Posted by bloozman23 (Post 4874362)
Code:

  char refBuf[4096];          /* no checking of output length so */
  char testBuf[4096];          /* these are generous sizes */


I think that qualifies as "famous last words" in C.

Which line is number 1092?

chrism01 01-20-2013 07:55 PM

Code:

no checking of output length
says it all really ;)

I can't debug your code for you from here, but basically you need to check the size/len of all the vars mentioned there. It looks like the actual error will be in the AppendCell() fn (or a fn called from there...).
OutTrans() itself doesn't do any var manipulation; just assigns start values and prints the results.
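
As a rough illustration of what "checking the size" could look like, here is a sketch of a length-checked append. It is not taken from the HResults source; append_checked is a hypothetical helper, and the real fix would have to be applied wherever the buffers are actually written.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends src to the NUL-terminated string in dst
   without ever writing past dst[cap - 1]. Returns 0 on success, or -1 if
   src does not fit or dst is not terminated (dst is left unchanged). */
int append_checked(char *dst, size_t cap, const char *src)
{
    const char *end = memchr(dst, '\0', cap);
    size_t used, need;

    if (end == NULL)
        return -1;                  /* dst is not terminated within cap */
    used = (size_t)(end - dst);
    need = strlen(src);
    if (need >= cap - used)
        return -1;                  /* no room for src plus the final NUL */
    memcpy(dst + used, src, need + 1);
    return 0;
}
```

A caller would pass sizeof refBuf or sizeof testBuf as cap and stop, or report an error, when the function returns -1 instead of writing past the end.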

bloozman23 01-20-2013 09:52 PM

Quote:

Originally Posted by Ser Olmy (Post 4874366)
I think that qualifies as "famous last words" in C.

Which line is number 1092?

Lines 1080-1092:
void OutTrans(void)
{
  char refBuf[4096];          /* no checking of output length so */
  char testBuf[4096];          /* these are generous sizes */

  strcpy(refBuf," LAB: ");
  strcpy(testBuf," REC: ");
  AppendCell(nTest,nRef,testBuf,refBuf);
  printf("Aligned transcription: %s vs %s\n", labfn, recfn);
  printf("%s\n",refBuf);
  printf("%s\n",testBuf);
  fflush(stdout);
}

bloozman23 01-20-2013 09:58 PM

Quote:

Originally Posted by chrism01 (Post 4874369)
Code:

no checking of output length
says it all really ;)

I can't debug your code for you from here, but basically you need to check the size/len of all the vars mentioned there. It looks like the actual error will be in the AppendCell() fn (or a fn called from there...).
OutTrans() itself doesn't do any var manipulation; just assigns start values and prints the results.

Thank you, Chris, for your valuable clues.
For reference, here is AppendCell():
Code:

/* AppendCell: path upto grid[i][j] to tb and rb (recursive) */
void AppendCell(int i, int j, char *tb, char *rb)
{
  char *rlab,*tlab;
  LabId rid=NULL,tid=NULL;
  char empty[1];

  if (i<0 || j<0)
      HError(3391,"AppendCell: Trace back failure");
  empty[0] = '\0'; rlab = tlab = empty;
  switch (grid[i][j].dir) {
  case DIAG:
      tid  = lTest[i]; tlab = tid->name;
      rid  = lRef[j]; rlab = rid->name;
      AppendCell(i-1,j-1,tb,rb); break;
  case HOR:
      tid  = lTest[i]; tlab = tid->name;
      rid = NULL; rlab = empty;
      AppendCell(i-1,j,tb,rb); break;
  case VERT:
      tid = NULL; tlab = empty;
      rid  = lRef[j]; rlab = rid->name;
      AppendCell(i,j-1,tb,rb); break;
  case NIL:
      return;
  }
  if (tid != nulClass && rid != nulClass)
      AppendPair(rb,rlab,tb,tlab);
}


jpollard 01-21-2013 12:37 PM

It isn't the size, guys - it is copying a total of 7 bytes.

The problem is more likely that these buffers are passed to another structure, and then stored.

The buffers are on the stack, so when the function returns, anything that is pointing to them will be corrupted by other functions.

Notice the function AppendCell - the parameters are tb and rb. These are the stacked 4k arrays passed to AppendPair... so does AppendPair happen to require heap allocated strings?

If not, then there is no need for the 4k arrays - constant strings would work just as well, and save 8k of stack space as a benefit.

