LinuxQuestions.org
01-04-2010, 01:54 AM   #1
mam2
LQ Newbie
 
Registered: Dec 2009
Posts: 4

Rep: Reputation: 0
Deadlock in NFS4


I have a strange problem. I am connecting servers over NFSv4: the shared directories are on servers running Debian 4, while the server that reads from them runs Debian 5.0.3. One of the servers sharing directories suddenly stops responding: its share can no longer be listed from the Debian 5 server, df hangs, and the web application that uses the share stops answering any request that touches the shared directory, because the request blocks. The load on the server then starts to climb until the server cannot respond at all (load over 90). I have found many entries in the syslog that refer to this, such as:

ma25555 kernel: [1200285.732919] nfs: server 10.xxx.xxx.xxx not responding, still trying
Dec 31 08:16:33 ma25555 kernel: [1200289.815378] INFO: task java:9702 blocked for more than 120 seconds.
Dec 31 08:16:33 ma25555 kernel: [1200289.835249] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 31 08:16:33 ma25555 kernel: [1200289.857500] java D 0000000000000000 0 9702 1
Dec 31 08:16:33 ma25555 kernel: [1200289.871244] ffff81039d9e3948 0000000000000086 0000000000000000 0000000000000292
Dec 31 08:16:33 ma25555 kernel: [1200289.891554] ffff81032f943670 ffff81083ccc7470 ffff81032f9438f8 000000010000000c
Dec 31 08:16:33 ma25555 kernel: [1200289.908401] ffff8108395bb240 0000000000000000 00000000ffffffff 0000000000000000
Dec 31 08:16:33 ma25555 kernel: [1200289.924310] Call Trace:
Dec 31 08:16:33 ma25555 kernel: [1200290.011013] [<ffffffffa021f3ca>] :sunrpc:rpc_wait_bit_killable+0x0/0x31
Dec 31 08:16:33 ma25555 kernel: [1200290.028766] [<ffffffffa021f3f4>] :sunrpc:rpc_wait_bit_killable+0x2a/0x31
Dec 31 08:16:33 ma25555 kernel: [1200290.048191] [<ffffffff804293f2>] __wait_on_bit+0x40/0x6e
Dec 31 08:16:33 ma25555 kernel: [1200290.068537] [<ffffffffa021f3ca>] :sunrpc:rpc_wait_bit_killable+0x0/0x31
Dec 31 08:16:33 ma25555 kernel: [1200290.089700] [<ffffffff8042948c>] out_of_line_wait_on_bit+0x6c/0x78
Dec 31 08:16:33 ma25555 kernel: [1200290.111979] [<ffffffff8024622f>] wake_bit_function+0x0/0x23
Dec 31 08:16:33 ma25555 kernel: [1200290.120914] [<ffffffffa021c2e9>] :sunrpc:xprt_connect+0x89/0x123
Dec 31 08:16:33 ma25555 kernel: [1200290.139567] [<ffffffffa021f98f>] :sunrpc:__rpc_execute+0xe6/0x223
Dec 31 08:16:33 ma25555 kernel: [1200290.157657] [<ffffffffa0219bcb>] :sunrpc:rpc_run_task+0x4f/0x56
Dec 31 08:16:34 ma25555 kernel: [1200290.171380] [<ffffffffa0219c67>] :sunrpc:rpc_call_sync+0x3e/0x5b
Dec 31 08:16:34 ma25555 kernel: [1200290.397448] [<ffffffffa02c3ed2>] :nfs:nfs4_proc_access+0x142/0x1c0
Dec 31 08:16:34 ma25555 kernel: [1200290.415733] [<ffffffff803b656c>] __alloc_skb+0x7f/0x12d
Dec 31 08:16:34 ma25555 kernel: [1200290.431886] [<ffffffff8031a31d>] __next_cpu+0x19/0x26
Dec 31 08:16:34 ma25555 kernel: [1200290.439891] [<ffffffff802295fc>] find_busiest_group+0x254/0x6dc
Dec 31 08:16:34 ma25555 kernel: [1200290.465581] [<ffffffff8020ab0d>] __switch_to+0x34c/0x35e
Dec 31 08:16:34 ma25555 kernel: [1200290.473941] [<ffffffffa02ae1e8>] :nfs:nfs_do_access+0x163/0x30c
Dec 31 08:16:34 ma25555 kernel: [1200290.491637] [<ffffffffa02ae481>] :nfs:nfs_permission+0xf0/0x15f
Dec 31 08:16:34 ma25555 kernel: [1200290.513582] [<ffffffff802a2227>] permission+0xb5/0x118
Dec 31 08:16:34 ma25555 kernel: [1200290.529537] [<ffffffff802a37af>] __link_path_walk+0x150/0xd05
Dec 31 08:16:34 ma25555 kernel: [1200290.542843] [<ffffffff802a43aa>] path_walk+0x46/0x8b
Dec 31 08:16:34 ma25555 kernel: [1200290.810421] [<ffffffff802a46d6>] do_path_lookup+0x158/0x1cf
Dec 31 08:16:34 ma25555 kernel: [1200290.823349] [<ffffffff802a34e1>] getname+0x140/0x1a7
Dec 31 08:16:34 ma25555 kernel: [1200290.971416] [<ffffffff802a5045>] __user_walk_fd+0x37/0x4c
Dec 31 08:16:34 ma25555 kernel: [1200290.985607] [<ffffffff8029e15d>] vfs_stat_fd+0x1b/0x4a
Dec 31 08:16:34 ma25555 kernel: [1200290.995950] [<ffffffff80221fbc>] do_page_fault+0x5d8/0x9c8
Dec 31 08:16:34 ma25555 kernel: [1200291.015844] [<ffffffff8029e1e8>] sys_newstat+0x19/0x31
Dec 31 08:16:34 ma25555 kernel: [1200291.028568] [<ffffffff8031e0a7>] __up_read+0x13/0x8a
Dec 31 08:16:34 ma25555 kernel: [1200291.040526] [<ffffffff8042a6a9>] error_exit+0x0/0x60
Dec 31 08:16:34 ma25555 kernel: [1200291.092521] [<ffffffff8020beca>] system_call_after_swapgs+0x8a/0x8f
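
When the syslog holds many reports like the one above, the hung-task lines can be pulled out mechanically to see which processes are stuck. The following is a minimal Python sketch and not part of the original report: the function name `find_blocked_tasks` and the regular expression are illustrative assumptions, derived only from the message format visible in the excerpt above.

```python
import re

# Pattern for the kernel's hung-task report, as it appears in the excerpt:
#   INFO: task java:9702 blocked for more than 120 seconds.
# (Assumption: other kernels may word this message differently.)
HUNG_TASK = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_blocked_tasks(lines):
    """Return a list of (command, pid, seconds) for each hung-task report."""
    found = []
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            found.append(
                (match.group("comm"),
                 int(match.group("pid")),
                 int(match.group("secs")))
            )
    return found


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python find_blocked.py < /var/log/syslog
    for comm, pid, secs in find_blocked_tasks(sys.stdin):
        print(f"{comm} (pid {pid}) blocked for more than {secs} s")
```

Run against the excerpt above, this would report the single java task with pid 9702. Counting how many distinct processes show up, and when, may help correlate the hangs with the "not responding" messages.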


I tested the connection between the two servers with ping for a full day and everything was fine (zero packets lost).
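
Ping only exercises ICMP, whereas the blocked task in the call trace appears to be waiting in sunrpc:xprt_connect, that is, on the RPC transport connection. A check of the NFS TCP port may therefore be more telling than ping. Below is a minimal Python sketch under stated assumptions: NFSv4 over TCP on the default port 2049, and a hypothetical helper name `nfs_port_reachable`. The server address is taken from the command line because it is masked in the log above.

```python
import socket
import sys


def nfs_port_reachable(host, port=2049, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Assumption: the NFS server listens on TCP port 2049 (the NFSv4 default).
    A successful connect only shows the port accepts connections; it does
    not prove that the NFS service itself is answering requests.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Usage (hypothetical): python nfs_check.py <server-address>
    if len(sys.argv) != 2:
        sys.exit("usage: nfs_check.py <server-address>")
    ok = nfs_port_reachable(sys.argv[1])
    print("port 2049 reachable" if ok else "port 2049 NOT reachable")
```

Running this in a loop during the same period as the ping test would show whether TCP connections to the NFS port fail while ICMP still succeeds.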

Three other servers running Debian 4 are working fine.

Any help would be appreciated.

Last edited by mam2; 01-04-2010 at 01:58 AM.
 
01-06-2010, 06:00 AM   #2
mam2
LQ Newbie
 
Registered: Dec 2009
Posts: 4

Original Poster
Rep: Reputation: 0
Question

Please help me. If my problem is not clear, tell me and I will clarify it.

Thanks in advance
 
  


All times are GMT -5.
