LinuxQuestions.org
01-22-2011, 12:24 PM   #1
exceed1
read call is hanging and ls -l /proc/PID/fd shows broken pipe on file descriptor


Hi

It was a little tricky to come up with a good subject, but I'll explain it better here.

Not long ago we moved from a dirty hack, a mounted sshfs area, to the more comfortable solution of NFS (although it is unencrypted). We need a large amount of space every day, so we chose NFS, which is also the protocol people usually use for large storage these days.

A backup script backs up some large databases every day. This works fine when it writes to the sshfs area mounted from the filesystem on the NFS server, but it fails when it writes to the same directory mounted over NFS. The messages file says the NFS server is not responding when it actually is, and we have no problems writing to the directory while the message appears in the log, so I suspect a kernel bug related to NFS.

I ran a system call trace on the backup script, and it seems to hang for a while on a read call (at this point we still have no problems creating files in the NFS-mounted directory). The read call shows which file descriptor number it is trying to read from. The /proc/PID/fd directory lists all file descriptors the process is using at the moment, and there I see a symbolic link to the file it is trying to read from (it is actually a pipe, not a file, but it was easier to explain that way). When I run "file 12" (the symbolic link is named "12") I get "broken symbolic link to pipe[:a number]", and I don't understand why. The symbolic link appears to link to itself for some reason. Other processes also have broken symbolic links to pipes or files; these are random processes I checked that have nothing to do with NFS, and they appear to be working flawlessly.
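
A minimal Python sketch of one way to inspect such a descriptor, assuming a Linux /proc filesystem. The helper names describe_fd and find_pipe_peers are made up for illustration, and the demo uses a freshly created pipe as a stand-in for descriptor 12 of the backup script. On Linux the link text for a pipe has the form pipe:[inode], which is a label rather than a path, so a tool that looks for a file with that name would not find one; whether that fully explains the "broken symbolic link" message from file here is an assumption.

```python
import os
import stat


def describe_fd(pid, fd):
    """Return (link_text, is_pipe) for one open descriptor of a process."""
    link = f"/proc/{pid}/fd/{fd}"
    # For a pipe the link text looks like "pipe:[<inode>]"; it is a label, not a path.
    link_text = os.readlink(link)
    # stat() on the /proc entry itself reaches the real object behind the label.
    is_pipe = stat.S_ISFIFO(os.stat(link).st_mode)
    return link_text, is_pipe


def find_pipe_peers(link_text):
    """List (pid, fd) pairs whose descriptor carries the same link text."""
    peers = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        fd_dir = f"/proc/{pid}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or permission denied
        for fd in fds:
            try:
                if os.readlink(f"{fd_dir}/{fd}") == link_text:
                    peers.append((int(pid), int(fd)))
            except OSError:
                continue  # descriptor was closed while scanning
    return peers


if __name__ == "__main__":
    r, w = os.pipe()  # demo pipe, standing in for the script's descriptor
    text, is_pipe = describe_fd(os.getpid(), r)
    print(text, is_pipe, os.path.exists(text))
    print(find_pipe_peers(text))
```

Running find_pipe_peers with the link text of the hanging descriptor (as root, to see other users' processes) should list the process holding the other end of the pipe, which may be the more useful one to trace.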

The read system call hangs for about 10 minutes and then fails. The output of the system call trace is too large for me to capture with less or more, so I am unable to see the return value of the read call. I guess I could redirect the output to a file, but I don't really see how it would help to know whether the read failed, because I can't see what it is actually reading from other than a broken symlink to a pipe (which links to itself, as explained earlier).
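
A small Python sketch of how a read on a pipe behaves in general, not taken from the backup script. The function name pipe_read_state and the 0.5 second timeout are arbitrary choices for illustration. It shows that a read on an empty pipe waits for as long as some process still holds the write end open, and only returns end-of-file once every write end is closed. If that is what is happening here, the hanging read would be a symptom of the writer being stalled rather than of the reader itself.

```python
import os
import select


def pipe_read_state(read_fd, timeout=0.5):
    """Classify what a read() on read_fd would do right now."""
    ready, _, _ = select.select([read_fd], [], [], timeout)
    if not ready:
        # No data yet, and at least one writer still has the pipe open.
        return "would block"
    data = os.read(read_fd, 4096)
    # An empty result means every write end has been closed.
    return "eof" if data == b"" else "data"


if __name__ == "__main__":
    r, w = os.pipe()
    print(pipe_read_state(r))   # writer open, nothing written yet
    os.write(w, b"dump chunk")
    print(pipe_read_state(r))   # data is available
    os.close(w)
    print(pipe_read_state(r))   # all write ends closed
```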

While we are getting the "NFS server not responding" message mentioned above, we have no problem writing to the NFS area; nothing hangs. When we run the backup script against the sshfs-mounted filesystem (on the same NFS server), it runs without problems, so the problem seems to be with NFS, but I just can't see how, and I don't think it is related to problems on the NFS server. We don't have access to the NFS server; we could request to have it checked, but I doubt it would do any good. We have tried different NFS options, such as the number of bytes allowed per transfer between client and server, hard limits and so on.
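
A short Python sketch for checking which NFS options are actually in effect on the client, assuming the kernel exposes them in /proc/mounts in the usual "device mountpoint type options dump pass" layout. The function name nfs_mount_options is made up for illustration, and the option names printed in the demo (hard, soft, timeo, retrans, rsize, wsize, proto) are my guess at the options described above, not something confirmed for this setup.

```python
def nfs_mount_options(mounts_text):
    """Map each NFS mount point to its options, parsed from /proc/mounts-style text."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, filesystem type, options, dump, pass.
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        options = {}
        for item in fields[3].split(","):
            key, _, value = item.partition("=")
            # Flags without a value (for example "hard") are stored as True.
            options[key] = value or True
        result[fields[1]] = options
    return result


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        mounts = nfs_mount_options(f.read())
    for mount_point, opts in mounts.items():
        wanted = ("hard", "soft", "timeo", "retrans", "rsize", "wsize", "proto")
        print(mount_point, {k: opts[k] for k in wanted if k in opts})
```

Comparing this output with the options passed to mount can show whether the client and server negotiated different values than the ones requested.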

I also ran tcpdump. The client communicates with the NFS server periodically, although the "not responding" message is written to the log right after the communication ends, which is weird given that we have no problems writing to the NFS area and see no I/O errors. I really suspect a kernel bug here.

Does anyone know why there is a broken symlink to a pipe in the /proc directory? (This is also the case for other processes that are unrelated to NFS and seem to work without problems.)

Does anyone know how we can resolve the problem of the NFS area not working while the backup script is running?