LinuxQuestions.org
10-16-2009, 09:30 AM   #1
mobrientx
LQ Newbie
 
Registered: Oct 2009
Posts: 1

Rep: Reputation: 0
centos v5.2: occasional socket select() problem when zero timeout is specified


Hi,

I've posted this problem in the CentOS forum at www.centos.org, but I would also like input from the wider Linux community, including people who may have seen this problem but don't commonly visit the CentOS forum.

We're using multi-CPU, multi-core servers from Aberdeen, which are basically repackaged Supermicro servers.

uname -a

2.6.18-92.el5 #1 SMP Tue Jun 10 18:49:47 EDT 2008 i686 i686 i386 GNU/Linux

rpm -qa kernel\* | sort

kernel-2.6.18-92.el5
kernel-headers-2.6.18-92.el5

This problem has been noted with UDP sockets. We're not sure if it also happens with TCP sockets.

Occasionally, when a non-blocking UDP socket is polled using the select() function with a zeroed timeval structure, the select() call stalls for just over 70 minutes instead of returning immediately. We wish to respond quickly when packets appear spontaneously on this socket, but the peer socket only very rarely transmits a packet spontaneously. It is common for no such packet to arrive on this socket for many hours.
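For illustration, the polling pattern described above can be instrumented to time each call. This is only a sketch in Python, not our application code: the function name `timed_zero_poll`, the 0.5 second stall threshold, and the loopback endpoint are assumptions made for the example. Python's `select.select()` with a timeout of 0 is documented as a poll that never blocks, so any call that exceeds the threshold would be a stall of the kind described.

```python
import select
import socket
import time


def timed_zero_poll(sock, stall_threshold_s=0.5):
    """Poll sock once with a zero timeout.

    Returns (readable, elapsed_seconds, stalled). The threshold is an
    arbitrary value chosen for this sketch.
    """
    start = time.monotonic()
    # A timeout of 0 asks select() to report readiness and return at once.
    readable, _, _ = select.select([sock], [], [], 0)
    elapsed = time.monotonic() - start
    return bool(readable), elapsed, elapsed > stall_threshold_s


if __name__ == "__main__":
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 0))  # hypothetical local endpoint
    sock.setblocking(False)

    stalls = 0
    for _ in range(50):  # about one second of polling at 50 Hz
        readable, elapsed, stalled = timed_zero_poll(sock)
        if stalled:
            stalls += 1
            print("select() took %.3f s with a zero timeout" % elapsed)
        if readable:
            data, addr = sock.recvfrom(65535)
        time.sleep(1.0 / 50)

    print("stalled polls:", stalls)
    sock.close()
```

Logging the elapsed time of every call this way would at least confirm whether the stall happens inside select() itself rather than elsewhere in the polling loop.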

It seems more than coincidental that 0xFFFFFFFF microseconds equals approximately 71 minutes, 35 seconds. We hypothesize that the usec component of the zeroed timeval structure provided to select() is occasionally being decremented to 0xFFFFFFFF (or the equivalent in "jiffies") before the OS tests whether it is equal to zero. Thus, we incur a 71 minute, 35 second timeout. We poll this socket at quite a high rate (e.g. 50 Hz), and this problem might occur once or twice over 12 hours. It is apparently quite sensitive to precisely when the select() function is called in relation to whatever clocks drive the OS to decrement socket timeouts.
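The arithmetic behind that figure can be checked with a few lines of Python. The helper names below are made up for this example, and `wrapped_usec` only models the suspected 32-bit unsigned wraparound; it is not taken from the kernel source.

```python
USEC_PER_SEC = 1000000


def wrapped_usec(value):
    """Model a 32-bit unsigned decrement (the hypothesized bug, not kernel code)."""
    return (value - 1) & 0xFFFFFFFF


def usec_to_min_sec(usec):
    """Split a microsecond count into whole minutes and fractional seconds."""
    whole_sec, rem_usec = divmod(usec, USEC_PER_SEC)
    minutes, sec = divmod(whole_sec, 60)
    return minutes, sec + rem_usec / USEC_PER_SEC


if __name__ == "__main__":
    usec = wrapped_usec(0)  # a zero timeout decremented once
    minutes, sec = usec_to_min_sec(usec)
    print("%#x usec = %d min %.3f s" % (usec, minutes, sec))
    # prints: 0xffffffff usec = 71 min 34.967 s
```

So a zero value that wraps to 0xFFFFFFFF microseconds would produce a wait of about 4294.97 seconds, which matches the stall we observe.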

We have searched the Red Hat bug list, the CentOS forum, and this site and have not found any similar complaints about select() with a zeroed timeout. Has anyone else observed this behavior? Is there a remedy that entails something other than avoiding zero timeouts or running a watchdog on threads that might perform zero-timeout select() calls? Our product also employs a library that may perform zero-timeout select() calls, so we'd prefer an OS-level solution. We didn't notice anything in the CentOS v5.3 release notes to indicate that such a problem has been recognized and addressed.
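To make the watchdog option concrete, here is a rough sketch of what is meant by it. Everything in it is hypothetical: the `Heartbeat` class, the `watchdog` function, the time limits, and the `on_stall` callback are names invented for this example, and what the callback should actually do to recover a stuck thread is left open. The polling thread calls `beat()` after each select() returns, and the watchdog reports when no beat has arrived within the allowed age.

```python
import threading
import time


class Heartbeat:
    """Records the last time a polling loop reported progress."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last = time.monotonic()

    def beat(self):
        with self._lock:
            self._last = time.monotonic()

    def age(self):
        with self._lock:
            return time.monotonic() - self._last


def watchdog(heartbeat, max_age_s, on_stall, stop_event, check_period_s=0.1):
    """Call on_stall(age) once each time the heartbeat goes stale."""
    reported = False
    while not stop_event.wait(check_period_s):
        age = heartbeat.age()
        if age > max_age_s:
            if not reported:
                on_stall(age)
                reported = True
        else:
            reported = False  # the loop recovered, so re-arm the watchdog


if __name__ == "__main__":
    hb = Heartbeat()
    stop = threading.Event()
    dog = threading.Thread(
        target=watchdog,
        args=(hb, 0.3, lambda age: print("poll loop stalled for %.2f s" % age), stop),
    )
    dog.start()

    for _ in range(25):  # healthy polling at 50 Hz
        hb.beat()
        time.sleep(1.0 / 50)
    time.sleep(1.0)      # simulated stall: no beats for one second

    stop.set()
    dog.join()
```

This only detects the stall. It does not help with library code that performs its own zero-timeout select() calls without reporting progress, which is why an OS-level fix would be preferable.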

I am not an OS-level programmer, so I don't have a good feel for whether this problem is due to a unique interaction between CentOS v5.2 and our particular Aberdeen server hardware. If it isn't peculiar to our hardware, I'd have thought there would already be plenty of posts about this issue online.

Despite the vast number of Linux installations, I suppose it's possible a problem such as this might go unnoticed for an extended period of time. It manifests very infrequently given the number of opportunities. And one might only recognize it happens if the socket they are polling using select() with a zeroed timeout only very, very rarely receives packet traffic. Otherwise, the select() would return due to the reception of that traffic.

Thanks,
Mark

Last edited by mobrientx; 10-16-2009 at 10:34 AM.
 
10-31-2009, 01:26 PM   #2
DrLove73
Senior Member
 
Registered: Sep 2009
Location: Srbobran, Serbia
Distribution: CentOS 5.5 i386 & x86_64
Posts: 1,118
Blog Entries: 1

Rep: Reputation: 129
CentOS released version 5.4 a few weeks ago. It would be wise to upgrade your systems, since bugs are fixed constantly and this problem might already be solved.
 
  


Tags
kernel, select, socket, stalls, timeout

