LinuxQuestions.org
Old 10-15-2018, 04:35 PM   #1
krishnapa
LQ Newbie
 
Registered: Mar 2017
Posts: 12

Rep: Reputation: Disabled
Server became unresponsive


Hi All,

We had to reboot one of our AWS instances because it had become unresponsive. This is what I found while investigating:
1) CPU utilization was very high (>35%)
2) The number of connections in CLOSE_WAIT was high (>180)
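
A CLOSE_WAIT count like the one in point 2 can be re-checked after the reboot. The following is a minimal sketch, assuming a Linux host where `/proc/net/tcp` (and `/proc/net/tcp6`) list one socket per line with the state as a hex code in the fourth column, where `08` means CLOSE_WAIT. The function names are made up for illustration.

```python
from pathlib import Path

# Hex state code used in /proc/net/tcp for CLOSE_WAIT.
CLOSE_WAIT = "08"


def count_close_wait(proc_net_tcp_text: str) -> int:
    """Count sockets whose state column (4th field) is CLOSE_WAIT."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) > 3 and fields[3] == CLOSE_WAIT:
            count += 1
    return count


def count_close_wait_on_host() -> int:
    """Sum CLOSE_WAIT sockets over the IPv4 and IPv6 tables, if present."""
    total = 0
    for name in ("/proc/net/tcp", "/proc/net/tcp6"):
        path = Path(name)
        if path.exists():
            total += count_close_wait(path.read_text())
    return total
```

Calling `count_close_wait_on_host()` from cron or a monitoring agent and logging the result would show whether the number climbs steadily before the next hang. A steadily growing count would suggest that an application is not closing sockets after the peer has disconnected.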

As the server was not responding, we rebooted it. We now see the following entries in /var/log/messages:
=============================================
Oct 14 20:26:52 AWS IP collectd[2072]: apache: curl_easy_perform failed: Operation timed out after 15000 milliseconds with 0 out of 0 bytes received
Oct 14 20:26:52 AWS IP collectd[2072]: read-function of plugin `apache//prd0' failed. Will suspend it for 30.000 seconds.
Oct 14 20:27:01 AWS IP systemd: Started Session 23803 of user root.
Oct 14 20:27:01 AWS IP systemd: Starting Session 23803 of user root.
Oct 14 20:27:14 AWS IP collectd[2072]: apache: curl_easy_perform failed: Failed connect to localhost:443; Operation now in progress
Oct 14 20:27:14 AWS IP collectd[2072]: read-function of plugin `apache//prd0' failed. Will suspend it for 60.000 seconds.
Oct 14 20:28:01 AWS IP systemd: Started Session 23804 of user root.
Oct 14 20:28:01 AWS IP systemd: Starting Session 23804 of user root.
Oct 14 20:28:01 AWS IP systemd: Started Session 23805 of user root.
Oct 14 20:28:01 AWS IP systemd: Starting Session 23805 of user root.
Oct 14 20:28:01 AWS IP sync_user: User synchronisation complete.
Oct 14 20:28:22 AWS IP collectd[2072]: apache: curl_easy_perform failed: Operation timed out after 15000 milliseconds with 0 out of 0 bytes received
Oct 14 20:28:22 AWS IP collectd[2072]: read-function of plugin `apache//prd0' failed. Will suspend it for 120.000 seconds.
Oct 14 20:29:01 AWS IP systemd: Started Session 23806 of user root.
Oct 14 20:29:01 AWS IP systemd: Starting Session 23806 of user root.
Oct 14 20:29:05 AWS IP dhclient[1278]: DHCPREQUEST on eth0 to 10.40.*.* port 67 (xid=0x1a0ebfb1)
Oct 14 20:29:05 AWS IP dhclient[1278]: DHCPACK from 10.40.*.* (xid=0x1a0ebfb1)
Oct 14 20:29:07 AWS IP dhclient[1278]: bound to 10.40.*.* -- renewal in 1358 seconds.
Oct 14 20:29:52 AWS IP kernel: INFO: task khugepaged:243 blocked for more than 120 seconds.
Oct 14 20:29:52 AWS IP kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 14 20:29:52 AWS IP kernel: khugepaged D ffff880174a08000 0 243 2 0x00000080
Oct 14 20:29:52 AWS IP kernel: Call Trace:
Oct 14 20:29:52 AWS IP kernel: [<ffffffff816ab6d9>] schedule+0x29/0x70
Oct 14 20:29:52 AWS IP kernel: [<ffffffff816acd0d>] rwsem_down_read_failed+0x10d/0x1a0
Oct 14 20:29:52 AWS IP kernel: [<ffffffff81333c78>] call_rwsem_down_read_failed+0x18/0x30
Oct 14 20:29:52 AWS IP kernel: [<ffffffff816aa870>] down_read+0x20/0x40
Oct 14 20:29:52 AWS IP kernel: [<ffffffff811ec857>] khugepaged_scan_mm_slot+0x67/0xcf0
Oct 14 20:29:52 AWS IP kernel: [<ffffffff8109a6c0>] ? internal_add_timer+0x70/0x70
Oct 14 20:29:52 AWS IP kernel: [<ffffffff811ed61b>] khugepaged+0x13b/0x480
Oct 14 20:29:52 AWS IP kernel: [<ffffffff810b34b0>] ? wake_up_atomic_t+0x30/0x30
Oct 14 20:29:52 AWS IP kernel: [<ffffffff811ed4e0>] ? khugepaged_scan_mm_slot+0xcf0/0xcf0
Oct 14 20:29:52 AWS IP kernel: [<ffffffff810b252f>] kthread+0xcf/0xe0
Oct 14 20:29:52 AWS IP kernel: [<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40
Oct 14 20:29:52 AWS IP kernel: [<ffffffff816b8798>] ret_from_fork+0x58/0x90
Oct 14 20:29:52 AWS IP kernel: [<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40
Oct 14 20:29:52 AWS IP kernel: INFO: task httpd:102024 blocked for more than 120 seconds.
Oct 14 20:29:52 AWS IP kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 14 20:29:52 AWS IP kernel: httpd D ffff880804d8af70 0 102024 2059 0x00000080
Oct 14 20:29:52 AWS IP kernel: Call Trace:
Oct 14 20:29:52 AWS IP kernel: [<ffffffff816ab6d9>] schedule+0x29/0x70
Oct 14 20:29:52 AWS IP kernel: [<ffffffff816acfc5>] rwsem_down_write_failed+0x225/0x3a0
Oct 14 20:29:52 AWS IP kernel: [<ffffffff81333ca7>] call_rwsem_down_write_failed+0x17/0x30
Oct 14 20:29:52 AWS IP kernel: [<ffffffff816aa8bd>] down_write+0x2d/0x3d
Oct 14 20:29:52 AWS IP kernel: [<ffffffff811bc140>] SyS_mprotect+0xd0/0x290
Oct 14 20:29:52 AWS IP kernel: [<ffffffff816b89fd>] system_call_fastpath+0x16/0x1b
Oct 14 20:29:52 AWS IP kernel: INFO: task httpd:102025 blocked for more than 120 seconds.
Oct 14 20:29:52 AWS IP kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 14 20:29:52 AWS IP kernel: httpd D ffff880eeabe8fd0 0 102025 2059 0x00000080
Oct 14 20:29:52 AWS IP kernel: Call Trace:
Oct 14 20:29:52 AWS IP kernel: [<ffffffff810c2ef8>] ? check_preempt_curr+0x78/0xa0
Oct 14 20:29:52 AWS IP kernel: [<ffffffff816ab6d9>] schedule+0x29/0x70
Oct 14 20:29:52 AWS IP kernel: [<ffffffff816acd0d>] rwsem_down_read_failed+0x10d/0x1a0
Oct 14 20:29:52 AWS IP kernel: [<ffffffff810c63fb>] ? wake_up_q+0x5b/0x80
Oct 14 20:29:52 AWS IP kernel: [<ffffffff81333c78>] call_rwsem_down_read_failed+0x18/0x30
Oct 14 20:29:52 AWS IP kernel: [<ffffffff816aa870>] down_read+0x20/0x40
Oct 14 20:29:52 AWS IP kernel: [<ffffffff816b3a0c>] __do_page_fault+0x37c/0x450
Oct 14 20:29:52 AWS IP kernel: [<ffffffff816b3b15>] do_page_fault+0x35/0x90
Oct 14 20:29:52 AWS IP kernel: [<ffffffff816af8f8>] page_fault+0x28/0x30
Oct 14 20:29:52 AWS IP kernel: INFO: task httpd:102026 blocked for more than 120 seconds.
Oct 14 20:29:52 AWS IP kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 14 20:29:52 AWS IP kernel: httpd D ffff880eeabedee0 0 102026 2059 0x00000080
Oct 14 20:29:52 AWS IP kernel: Call Trace:
Oct 14 20:29:52 AWS IP kernel: [<ffffffff816ab6d9>] schedule+0x29/0x70
Oct 14 20:29:52 AWS IP kernel: [<ffffffff816acfc5>] rwsem_down_write_failed+0x225/0x3a0
Oct 14 20:29:52 AWS IP kernel: [<ffffffff81333ca7>] call_rwsem_down_write_failed+0x17/0x30
Oct 14 20:29:52 AWS IP kernel: [<ffffffff816aa8bd>] down_write+0x2d/0x3d
Oct 14 20:29:52 AWS IP kernel: [<ffffffff811bc140>] SyS_mprotect+0xd0/0x290
Oct 14 20:29:52 AWS IP kernel: [<ffffffff816b89fd>] system_call_fastpath+0x16/0x1b
Oct 14 20:29:52 AWS IP kernel: INFO: task httpd:102027 blocked for more than 120 seconds.
Oct 14 20:29:52 AWS IP kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 14 20:29:52 AWS IP kernel: httpd D ffff880eeabe9fa0 0 102027 2059 0x00000080
Oct 14 20:29:52 AWS IP kernel: Call Trace:
Oct 14 20:29:52 AWS IP kernel: [<ffffffff816ab6d9>] schedule+0x29/0x70
Oct 14 20:29:52 AWS IP kernel: [<ffffffff816acfc5>] rwsem_down_write_failed+0x225/0x3a0
Oct 14 20:29:52 AWS IP kernel: [<ffffffff81333ca7>] call_rwsem_down_write_failed+0x17/0x30
Oct 14 20:29:52 AWS IP kernel: [<ffffffff816aa8bd>] down_write+0x2d/0x3d
Oct 14 20:29:52 AWS IP kernel: [<ffffffff811bc140>] SyS_mprotect+0xd0/0x290
Oct 14 20:29:52 AWS IP kernel: [<ffffffff816b89fd>] system_call_fastpath+0x16/0x1b
Oct 14 20:29:52 AWS IP kernel: INFO: task httpd:102028 blocked for more than 120 seconds.
Oct 14 20:29:52 AWS IP kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 14 20:29:52 AWS IP kernel: httpd D ffff880eeabecf10 0 102028 2059 0x00000080
Oct 14 20:29:52 AWS IP kernel: Call Trace:
Oct 14 20:29:52 AWS IP kernel: [<ffffffff816ab6d9>] schedule+0x29/0x70
Oct 14 20:29:52 AWS IP kernel: [<ffffffff816acfc5>] rwsem_down_write_failed+0x225/0x3a0
Oct 14 20:29:52 AWS IP kernel: [<ffffffff811b3dd1>] ? handle_mm_fault+0x691/0xfa0
Oct 14 20:29:52 AWS IP kernel: [<ffffffff81333ca7>] call_rwsem_down_write_failed+0x17/0x30
Oct 14 20:29:52 AWS IP kernel: [<ffffffff816aa8bd>] down_write+0x2d/0x3d
Oct 14 20:29:52 AWS IP kernel: [<ffffffff811bc140>] SyS_mprotect+0xd0/0x290
Oct 14 20:29:52 AWS IP kernel: [<ffffffff816b89fd>] system_call_fastpath+0x16/0x1b
Oct 14 20:29:52 AWS IP kernel: INFO: task httpd:102029 blocked for more than 120 seconds.
Oct 14 20:29:53 AWS IP kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 14 20:29:53 AWS IP kernel: httpd D ffff880eeabe8000 0 102029 2059 0x00000080
Oct 14 20:29:53 AWS IP kernel: Call Trace:
Oct 14 20:29:53 AWS IP kernel: [<ffffffff816ab6d9>] schedule+0x29/0x70
Oct 14 20:29:53 AWS IP kernel: [<ffffffff816acd0d>] rwsem_down_read_failed+0x10d/0x1a0
Oct 14 20:29:53 AWS IP kernel: [<ffffffff81333c78>] call_rwsem_down_read_failed+0x18/0x30
Oct 14 20:29:53 AWS IP kernel: [<ffffffff816aa870>] down_read+0x20/0x40
Oct 14 20:29:53 AWS IP kernel: [<ffffffff816b3a0c>] __do_page_fault+0x37c/0x450
Oct 14 20:29:53 AWS IP kernel: [<ffffffff816b3b15>] do_page_fault+0x35/0x90
Oct 14 20:29:53 AWS IP kernel: [<ffffffff816af8f8>] page_fault+0x28/0x30
Oct 14 20:29:53 AWS IP kernel: INFO: task httpd:102030 blocked for more than 120 seconds.
Oct 14 20:29:53 AWS IP kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 14 20:29:53 AWS IP kernel: httpd D ffff880eeabeaf70 0 102030 2059 0x00000080
Oct 14 20:29:53 AWS IP kernel: Call Trace:
Oct 14 20:29:53 AWS IP kernel: [<ffffffff816ab6d9>] schedule+0x29/0x70
Oct 14 20:29:53 AWS IP kernel: [<ffffffff816acd0d>] rwsem_down_read_failed+0x10d/0x1a0
Oct 14 20:29:53 AWS IP kernel: [<ffffffff81333c78>] call_rwsem_down_read_failed+0x18/0x30
Oct 14 20:29:53 AWS IP kernel: [<ffffffff816aa870>] down_read+0x20/0x40
Oct 14 20:29:53 AWS IP kernel: [<ffffffff816b3a0c>] __do_page_fault+0x37c/0x450
Oct 14 20:29:53 AWS IP kernel: [<ffffffff816b3b15>] do_page_fault+0x35/0x90
Oct 14 20:29:53 AWS IP kernel: [<ffffffff816af8f8>] page_fault+0x28/0x30
Oct 14 20:29:53 AWS IP kernel: INFO: task httpd:102031 blocked for more than 120 seconds.
Oct 14 20:29:53 AWS IP kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 14 20:29:53 AWS IP kernel: httpd D ffff880eeabebf40 0 102031 2059 0x00000080
Oct 14 20:29:53 AWS IP kernel: Call Trace:
Oct 14 20:29:53 AWS IP kernel: [<ffffffff816ab6d9>] schedule+0x29/0x70
Oct 14 20:29:53 AWS IP kernel: [<ffffffff816acd0d>] rwsem_down_read_failed+0x10d/0x1a0
Oct 14 20:29:53 AWS IP kernel: [<ffffffff81333c78>] call_rwsem_down_read_failed+0x18/0x30
Oct 14 20:29:53 AWS IP kernel: [<ffffffff816aa870>] down_read+0x20/0x40
Oct 14 20:29:53 AWS IP kernel: [<ffffffff816b3a0c>] __do_page_fault+0x37c/0x450
Oct 14 20:29:53 AWS IP kernel: [<ffffffff816b3b15>] do_page_fault+0x35/0x90
Oct 14 20:29:53 AWS IP kernel: [<ffffffff816af8f8>] page_fault+0x28/0x30
Oct 14 20:30:01 AWS IP systemd: Started Session 23809 of user root.
Oct 14 20:30:01 AWS IP systemd: Starting Session 23809 of user root.
Oct 14 20:30:02 AWS IP systemd: Started Session 23808 of user root.
Oct 14 20:30:02 AWS IP systemd: Starting Session 23808 of user root.
Oct 14 20:30:02 AWS IP systemd: Started Session 23807 of user root.
Oct 14 20:30:02 AWS IP systemd: Starting Session 23807 of user root.
Oct 14 20:30:02 AWS IP sync_user: User synchronisation complete.
==================================================

Can someone help me understand the root cause of this?

Thanks,

Last edited by krishnapa; 10-15-2018 at 04:37 PM.
 
Old 10-15-2018, 07:08 PM   #2
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,125

Rep: Reputation: 4120
Code:
kernel: INFO: task khugepaged:243 blocked for more than 120 seconds.
You really don't want kernel threads hanging, especially ones that reallocate page tables.
I would be inclined to turn transparent hugepages off altogether - this looks like a pretty clear article.
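
Before changing anything, it helps to confirm which transparent hugepage mode is active. This is a minimal sketch, assuming the mainline sysfs path `/sys/kernel/mm/transparent_hugepage/enabled`, whose content looks like `[always] madvise never` with the active option in brackets. Some older vendor kernels expose the setting under a different path, so treat the path as an assumption. The function names are illustrative.

```python
from pathlib import Path

# Assumed location of the THP switch on mainline kernels.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(text: str) -> str:
    """Return the bracketed option, e.g. 'always' from '[always] madvise never'."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    raise ValueError(f"no active option found in {text!r}")


def read_thp_mode(path: Path = THP_ENABLED) -> str:
    """Read the sysfs file and return the active THP mode."""
    return active_thp_mode(path.read_text())
```

If `read_thp_mode()` returns `always`, writing `never` to the same file as root switches THP off until the next reboot. Making that permanent is distribution-specific and not covered here.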
 
Old 10-16-2018, 05:56 AM   #3
krishnapa
LQ Newbie
 
Registered: Mar 2017
Posts: 12

Original Poster
Rep: Reputation: Disabled
So is there an issue with the hardware or with Apache?

Thanks
 
Old 10-17-2018, 10:08 AM   #4
Habitual
LQ Veteran
 
Registered: Jan 2011
Location: Abingdon, VA
Distribution: Catalina
Posts: 9,374
Blog Entries: 37

Rep: Reputation: Disabled
Tell us about the kernel.

Quote:
Originally Posted by krishnapa
Code:
=============================================
Oct 14 20:27:14 AWS IP collectd[2072]: read-function of plugin `apache//prd0' failed. Will suspend it for 60.000 seconds.
Oct 14 20:28:22 AWS IP collectd[2072]: apache: curl_easy_perform failed: Operation timed out after 15000 milliseconds with 0 out of 0 bytes received
Oct 14 20:29:52 AWS IP kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Please show us your edits to the Apache file.conf.
curl in an Apache file.conf?

Running
Code:
echo 0 | sudo tee /proc/sys/kernel/hung_task_timeout_secs
disables the message, but it won't remove the cause.
What is plugin prd0?
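
Rather than silencing the message, it may be more useful to see which tasks the kernel reported as hung. Below is a minimal sketch that counts "blocked for more than N seconds" reports per task name. It assumes the log text has the same format as the excerpt in post #1. The regular expression and function name are illustrative, not part of any tool.

```python
import re
from collections import Counter

# Matches e.g. "INFO: task httpd:102024 blocked for more than 120 seconds."
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def summarize_hung_tasks(log_text: str) -> Counter:
    """Count hung-task reports per task name in syslog-style text."""
    counts = Counter()
    for match in HUNG_TASK.finditer(log_text):
        counts[match.group("name")] += 1
    return counts
```

Run against the excerpt in post #1, this would report one `khugepaged` entry and eight `httpd` entries. The stack traces there show the httpd workers waiting in `down_write` (from `SyS_mprotect`) or `down_read` (from the page-fault path), which suggests they were all contending for the same read/write semaphore rather than failing independently.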

Last edited by Habitual; 10-17-2018 at 10:15 AM.
 
Old 10-17-2018, 02:10 PM   #5
sgrlscz
Member
 
Registered: Aug 2008
Posts: 123

Rep: Reputation: 84
I've seen these kinds of messages on VMs when I/O overwhelms disk caching. The default settings are not necessarily the best for a VM environment that is backed by fast disk arrays.

There are 2 kernel parameters that can be adjusted to control disk caching. They are:
  • vm.dirty_background_ratio - the percentage of memory that can hold dirty pages before the kernel's background flusher threads start writing them to disk
  • vm.dirty_ratio - the maximum percentage of memory that can hold dirty pages; once you hit this limit, processes that write block until dirty pages have been written to disk
When I ran into this problem a month or two ago, I found this blog post https://lonesysadmin.net/2013/12/22/...m-dirty_ratio/ that has a good explanation of different options for tuning the cache depending on application behaviour. It's not the newest post I've found on this problem, but they all pointed to the same 2 kernel parameters, usually with less information about what they do.
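
To get a feel for what the two ratios mean in bytes on a given machine, here is a rough sketch. It assumes the ratios are read from `/proc/sys/vm/` and that the caller supplies an estimate of the memory the kernel applies them to (for example `MemAvailable` from `/proc/meminfo`). The kernel's own accounting of "dirtyable" memory differs in detail, and the ratios are ignored when `vm.dirty_bytes` or `vm.dirty_background_bytes` is set to a non-zero value, so treat the result as an approximation. The function names are illustrative.

```python
from pathlib import Path
from typing import Tuple


def read_vm_sysctl(name: str) -> int:
    """Read an integer tunable from /proc/sys/vm/<name>, e.g. 'dirty_ratio'."""
    return int(Path("/proc/sys/vm", name).read_text().strip())


def dirty_thresholds(
    available_bytes: int, background_ratio: int, ratio: int
) -> Tuple[int, int]:
    """Return (background_flush_start, writer_block_limit) in bytes.

    Both ratios are percentages of the memory estimate passed in.
    """
    background = available_bytes * background_ratio // 100
    hard_limit = available_bytes * ratio // 100
    return background, hard_limit
```

For example, `dirty_thresholds(16 * 1024**3, read_vm_sysctl("dirty_background_ratio"), read_vm_sysctl("dirty_ratio"))` gives the approximate thresholds for a host with 16 GiB of usable memory. On a large instance even a small percentage is a lot of dirty data to flush at once, which is why the ratios are often lowered for this kind of problem.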
 
  


All times are GMT -5. The time now is 11:24 PM.
