LinuxQuestions.org
Old 04-20-2006, 03:29 PM   #1
uckl_lyk
LQ Newbie
 
Registered: Apr 2006
Posts: 2

Rep: Reputation: 0
system call sem_wait() interrupted


Hi all,

My program creates 15 pairs of threads, each pair associated with its own semaphore. For example, pair 1 consists of T1 and T2: T1 calls sem_wait() and T2 calls sem_post() on semaphore S1. sem_getvalue() is called before and after both sem_wait() and sem_post(). The other 14 pairs of threads follow the same flow. In addition, one extra thread periodically checks all 15 semaphore values with sem_getvalue().

I ran a series of simulation tests to evaluate the program's performance. The program runs well for a few hours, but once in a while sem_wait() fails with "system call interrupted" [errno 4]. The signal interrupts that sem_wait(), but T1 simply goes on to call sem_wait() again, and that next call always succeeds without interruption. The interruption puzzles me because it is unpredictable. I would expect T1 to block in sem_wait() indefinitely until T2 calls sem_post().

My questions are:
  1. "Who" is sending the interrupt signal? The program does not send any interrupt signal.
  2. Under what conditions will sem_wait() get interrupted? Is it related to priority inversion, deadlock, NPTL or a kernel issue? [Note: Threads were created with default attributes]
  3. How can I avoid getting an interrupted system call on sem_wait()?
  4. Where should I start troubleshooting?

For your information
bash$ uname -a
Linux LinuxDB 2.4.21-4.ELsmp #1 SMP Fri Oct 3 17:52:56 EDT 2003 i686 i686 i386 GNU/Linux

bash$ ls /lib/libpthread*
/lib/libpthread-0.10.so /lib/libpthread.so.0

bash$ ls /lib/i686/libpthread*
/lib/i686/libpthread-0.10.so /lib/i686/libpthread.so.0

bash$ /lib/libc.so.6
GNU C Library stable release version 2.3.2, by Roland McGrath et al.
Copyright (C) 2003 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 3.2.3 20030502 (Red Hat Linux 3.2.3-20).
Compiled on a Linux 2.4.20 system on 2003-10-02.
Available extensions:
GNU libio by Per Bothner
crypt add-on version 2.1 by Michael Glad and others
linuxthreads-0.10 by Xavier Leroy
The C stubs add-on version 2.1.2.
BIND-8.2.3-T5B
NIS(YP)/NIS+ NSS modules 0.19 by Thorsten Kukuk
Glibc-2.0 compatibility add-on by Cristian Gafton
libthread_db work sponsored by Alpha Processor Inc
Thread-local storage support included.
Report bugs using the `glibcbug' script to <bugs@gnu.org>.

bash$ /lib/tls/libc.so.6
GNU C Library stable release version 2.3.2, by Roland McGrath et al.
Copyright (C) 2003 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 3.2.3 20030502 (Red Hat Linux 3.2.3-20).
Compiled on a Linux 2.4.20 system on 2003-10-02.
Available extensions:
GNU libio by Per Bothner
crypt add-on version 2.1 by Michael Glad and others
NPTL 0.60 by Ulrich Drepper
RT using linux kernel aio
The C stubs add-on version 2.1.2.
BIND-8.2.3-T5B
NIS(YP)/NIS+ NSS modules 0.19 by Thorsten Kukuk
Glibc-2.0 compatibility add-on by Cristian Gafton
Thread-local storage support included.
Report bugs using the `glibcbug' script to <bugs@gnu.org>.

bash$ g++ --version
g++ (GCC) 3.2.3 20030502 (Red Hat Linux 3.2.3-20)
Copyright (C) 2002 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

bash$ ldd --version
ldd (GNU libc) 2.3.2
Copyright (C) 2003 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.

bash$ getconf GNU_LIBPTHREAD_VERSION
NPTL 0.60

bash$ ldd test_program
libnsl.so.1 => /lib/libnsl.so.1 (0xb75d5000)
librt.so.1 => /lib/tls/librt.so.1 (0xb75c1000)
libpthread.so.0 => /lib/tls/libpthread.so.0 (0xb75b1000)
libstdc++.so.5 => /usr/lib/libstdc++.so.5 (0xb74fe000)
libm.so.6 => /lib/tls/libm.so.6 (0xb74dc000)
libc.so.6 => /lib/tls/libc.so.6 (0xb73a5000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xb739b000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0xb75eb000)

Makefile Compilation Option
CC = g++
CFLAGS = -g -O3 -Wall \
-DPOSIX_C_SOURCE=199506L -D_REENTRANT
LDFLAGS = -lc -mt -lnsl -lrt -lpthread

Moreover, the program's memory usage decreases gradually over time.

The program also runs on Sun Solaris 2.8, where sem_wait() has not been interrupted so far.

Any help would be highly appreciated. Thanks.
 
Old 04-20-2006, 03:56 PM   #2
Mara
Moderator
 
Registered: Feb 2002
Location: Grenoble
Distribution: Debian
Posts: 9,696

Rep: Reputation: 232
Quote:
"Who" is sending the interrupt signal? The program does not send any interrupt signal.
I understand you get EINTR. If so, it is not an interrupt (or at least, not always); it means your program has received a signal. POSIX states that sem_wait() is interruptible by the delivery of a signal, and that is what happens here. The question is which signal it is.
Quote:
Under what conditions will sem_wait() get interrupted? Is it related to priority inversion, deadlock, NPTL or a kernel issue? [Note: Threads were created with default attributes]
That is mostly answered above. If you can't think of a reason for a signal, it may be a bug in the pthread library. I can't recall the dates, but there was such a bug that caused one function to send signals. Those signals can be safely ignored, but sem_wait() is still interrupted. Is an update an option?
Quote:
How can I avoid getting an interrupted system call on sem_wait()?
You don't want this. No interrupts means no scheduling, which also means the other thread would never get a chance to enter sem_post().
Quote:
Where should I start troubleshooting?
There is a workaround you can use now: call sem_wait() in a loop that ignores EINTR and calls it again. That will work, and in most typical cases it does no harm.
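
For illustration, the retry loop could look roughly like the sketch below. The wrapper name sem_wait_retry() is made up for this example and is not part of any library; the sketch assumes that any error other than EINTR should be passed back to the caller unchanged.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stddef.h>

/* Hypothetical helper: wait on sem, calling sem_wait() again whenever
 * a signal interrupts it. Returns 0 once the semaphore has been
 * decremented, or -1 with errno set for any error other than EINTR. */
int sem_wait_retry(sem_t *sem)
{
    int rc;

    assert(sem != NULL);
    do {
        rc = sem_wait(sem);
    } while (rc == -1 && errno == EINTR);
    return rc;
}
```

T1 would then call sem_wait_retry(&S1) wherever it calls sem_wait(&S1) today. This only hides the interruption; it does not explain where the signal comes from.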
 
Old 04-21-2006, 03:40 AM   #3
uckl_lyk
LQ Newbie
 
Registered: Apr 2006
Posts: 2

Original Poster
Rep: Reputation: 0
Hello,

I really appreciate your quick reply. From your explanation, sem_wait() stops when a signal is received, and the signal could have a few possible causes, such as priority inversion, deadlock, bugs in the pthread library or a kernel issue. If I decide to update the pthread library or glibc, what steps can I take without rebooting the system? The program runs on a live system, so any library update needs careful consideration. Finding the root cause is also important.

For your information
bash$ rpm -qa | grep glibc
glibc-profile-2.3.2-95.3
glibc-common-2.3.2-95.3
glibc-utils-2.3.2-95.3
glibc-kernheaders-2.4-8.34
glibc-devel-2.3.2-95.3
glibc-2.3.2-95.3
glibc-headers-2.3.2-95.3

bash$ rpm -qa | grep kernel
kernel-smp-2.4.21-4.EL
kernel-utils-2.4-8.37
kernel-2.4.21-4.EL
kernel-pcmcia-cs-3.1.31-13
kernel-source-2.4.21-4.EL

In fact, thread T1 is a queue thread. T1 is blocked on a mutex until it is popped from the queue, and only then proceeds to the sem_wait() call. In addition to the signal issue, I have also found that some threads seem to hang after the program has run for some time.

Is this really caused by NPTL/glibc bugs? The memory usage decreases gradually as well.

Any help would be highly appreciated. Thanks in advance.
 
Old 04-22-2006, 04:27 PM   #4
Mara
Moderator
 
Registered: Feb 2002
Location: Grenoble
Distribution: Debian
Posts: 9,696

Rep: Reputation: 232
When uptime is important, I would postpone the update for now. It is one of the most important libraries, and any problem with it is likely to be clearly visible...

Instead, it is worth checking which signal is received. That would make clear whether it is that bug or something different. Have you ever written a program with signal-handling routines? It would be a good idea to write simple routines for all signals (where possible). Each routine can be very simple: just print the signal name, write it to a file, etc. Do you know how to write such code, or do you need help?

Adding signal handling should not break the program. For testing, however, I'd recommend a program similar in structure to the one you have, but written just for tests.
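
A test program of that kind could start from something like the sketch below, which runs a single T1/T2 pair on one semaphore. It is only a guess at the structure described in the first post: ROUNDS, run_pair() and the counters are made up for this example, and the real program's queue, mutex and sem_getvalue() monitoring are left out.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

#define ROUNDS 1000            /* hypothetical iteration count */

static sem_t s1;
static int waits_done;         /* successful sem_wait() calls in T1 */
static int eintr_seen;         /* sem_wait() calls interrupted by a signal */

/* T1: waits ROUNDS times, counting and retrying interrupted waits. */
static void *t1_waiter(void *arg)
{
    int i;

    (void)arg;
    for (i = 0; i < ROUNDS; i++) {
        while (sem_wait(&s1) == -1) {
            if (errno != EINTR)
                return NULL;   /* unexpected error: give up */
            eintr_seen++;
        }
        waits_done++;
    }
    return NULL;
}

/* T2: posts ROUNDS times and never blocks. */
static void *t2_poster(void *arg)
{
    int i;

    (void)arg;
    for (i = 0; i < ROUNDS; i++) {
        int rc = sem_post(&s1);
        assert(rc == 0);
        (void)rc;
    }
    return NULL;
}

/* Run one pair to completion. Returns the final semaphore value
 * (0 when every post was matched by a wait) or -1 on setup failure. */
int run_pair(void)
{
    pthread_t t1, t2;
    int value = -1;

    waits_done = 0;
    eintr_seen = 0;
    if (sem_init(&s1, 0, 0) != 0)
        return -1;
    /* Start the poster first: it never blocks, so cleanup stays simple. */
    if (pthread_create(&t2, NULL, t2_poster, NULL) != 0) {
        sem_destroy(&s1);
        return -1;
    }
    if (pthread_create(&t1, NULL, t1_waiter, NULL) != 0) {
        pthread_join(t2, NULL);
        sem_destroy(&s1);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    sem_getvalue(&s1, &value);
    sem_destroy(&s1);
    return value;
}
```

Build it with -pthread. After run_pair() returns, eintr_seen tells you how many waits were interrupted, which you can compare against whatever your signal-logging routines recorded.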
 
  

