LinuxQuestions.org - Trying to understand this readout from top -b

Dedicated server crashes every 24 hours; Linux amateur cannot diagnose

Hello, my name is David. I am a writer and part-time freelance web developer with only a minimal understanding of servers and how they work. I recently started renting a dedicated server from 1and1, but for a while now I have been having trouble with server crashes, about once every 24 hours, which require me to go into the control panel and reboot.

I have added code to my .htaccess, started using a script from zbblock to intercept bots from China, added some IP ranges to my blacklist to block access to my server, etc., but am still having trouble. I'm using a Drupal 7 codebase to run perhaps 7 sites and, in a separate directory, a Drupal 6 codebase to run 2 sites.

I have read a lot of material on the forums but do not feel competent to try to diagnose what is going on.

If anyone is able to help me out, please tell me what information you need. I have images of all the processes and can provide screenshots of the SSH session. I also have all the specs.

Thanks very much for any assistance. Regards,

Trying to understand this readout from top -b

Hello, I am trying to understand why my server crashes once a day. I thought I'd start by sharing this. Does anyone see anything glaringly wrong?

Please forgive my ignorance. I have some websites on a dedicated server running CentOS 6.2 and Plesk Panel 11.0.

Thank you for any assistance.

Showing top output won't help much, because memory usage keeps changing from moment to moment.

Do you have the system's sar reports, or did you notice anything unusual in dmesg or the server logs (/var/log/...)?
Also, what do you mean by a server crash? Does the httpd process get killed unexpectedly?
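If sysstat is installed, its collector saves a daily data file, and sar can print every report from one of those files in a single run: `-A` selects all reports and `-f` names the file to read. On CentOS 6 the daily files normally live in /var/log/sa as saDD, where DD is the day of the month; treat that path, and the helper name `sa_file_for_day` below, as assumptions to check on your own server.

```shell
# Hypothetical helper: path of sysstat's daily data file for a given day of the month.
# Assumes the default CentOS/RHEL location /var/log/sa/saDD.
sa_file_for_day() {
  # Strip one leading zero so printf does not read "08" or "09" as octal,
  # then zero-pad back to two digits.
  printf '/var/log/sa/sa%02d\n' "${1#0}"
}

# Usage (as root): dump every sar report for the 9th into one file to paste here.
#   sar -A -f "$(sa_file_for_day 9)" > sar-all.txt
```

One file per day keeps the paste manageable while still covering the hours leading up to a crash.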

And what is reported about the crash? Or did it just hang, requiring a hard reboot?
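When a box stops without an obvious reason, the kernel sometimes leaves a clue in dmesg or /var/log/messages first. A minimal sketch for filtering log text for a few well-known kernel messages (OOM-killer activity, panics, hung-task warnings, segfaults and call traces); the function name and the pattern list are illustrative assumptions, not an exhaustive set:

```shell
# Hypothetical helper: print log lines that often accompany a crash or hang.
# The patterns are a short, case-insensitive sample; extend them as needed.
scan_crash_clues() {
  grep -i -E 'out of memory|oom-killer|killed process|kernel panic|blocked for more than|segfault|call trace'
}

# Usage (as root):
#   scan_crash_clues < /var/log/messages
#   dmesg | scan_crash_clues
```

No output only means none of these patterns appeared; a hard hang can leave nothing in the logs at all.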

Hello, thanks for your reply. I've installed sysstat. Is there a way to collect all the reports at once? Otherwise I can copy and paste the results individually. Here is the output from pidstat (below). Please tell me what else I can provide.

PS: Yes, the server stops unexpectedly, and so does the httpd process. Nothing about the crash is reported in the logs, and yes, it requires a reboot from the 1&1 control panel. It happens fairly regularly. I can also provide screenshots of what appears in the Plesk panel.

Please advise, and many thanks.

Code:

Linux 2.6.32-220.13.1.el6.x86_64 (mysite.com)    02/09/2013      _x86_64_        (4 CPU)

01:32:46 PM      PID    %usr %system  %guest    %CPU  CPU  Command
01:32:46 PM        1    0.00    0.00    0.00    0.00    0  init
01:32:46 PM        3    0.00    0.00    0.00    0.00    0  migration/0
01:32:46 PM        4    0.00    0.51    0.00    0.51    0  ksoftirqd/0
01:32:46 PM        6    0.00    0.00    0.00    0.00    0  watchdog/0
01:32:46 PM        7    0.00    0.00    0.00    0.00    1  migration/1
01:32:46 PM        9    0.01    1.24    0.00    1.25    1  ksoftirqd/1
01:32:46 PM       10    0.00    0.00    0.00    0.00    1  watchdog/1
01:32:46 PM       11    0.00    0.00    0.00    0.00    2  migration/2
01:32:46 PM       13    0.00    8.18    0.00    8.18    2  ksoftirqd/2
01:32:46 PM       14    0.00    0.00    0.00    0.00    2  watchdog/2
01:32:46 PM       15    0.00    0.00    0.00    0.00    3  migration/3
01:32:46 PM       17    0.00    0.85    0.00    0.85    3  ksoftirqd/3
01:32:46 PM       18    0.00    0.00    0.00    0.00    3  watchdog/3
01:32:46 PM       19    0.00    0.00    0.00    0.00    0  events/0
01:32:46 PM       20    0.00    0.00    0.00    0.00    1  events/1
01:32:46 PM       21    0.00    0.01    0.00    0.01    2  events/2
01:32:46 PM       22    0.00    0.00    0.00    0.00    3  events/3
01:32:46 PM       28    0.00    0.00    0.00    0.00    1  sync_supers
01:32:46 PM       29    0.00    0.00    0.00    0.00    2  bdi-default
01:32:46 PM       34    0.00    0.00    0.00    0.00    0  kblockd/0
01:32:46 PM       35    0.00    0.00    0.00    0.00    1  kblockd/1
01:32:46 PM       36    0.00    0.00    0.00    0.00    2  kblockd/2
01:32:46 PM       37    0.00    0.01    0.00    0.01    3  kblockd/3
01:32:46 PM       57    0.00    0.00    0.00    0.00    3  khungtaskd
01:32:46 PM       58    0.00    0.00    0.00    0.00    2  kswapd0
01:32:46 PM       60    0.00    0.00    0.00    0.00    2  khugepaged
01:32:46 PM      141    0.00    0.04    0.00    0.04    2  kslowd000
01:32:46 PM      142    0.00    0.04    0.00    0.04    2  kslowd001
01:32:46 PM      260    0.00    0.00    0.00    0.00    1  scsi_eh_0
01:32:46 PM      385    0.00    0.00    0.00    0.00    3  kjournald
01:32:46 PM      454    0.00    0.00    0.00    0.00    1  udevd
01:32:46 PM      786    0.00    0.00    0.00    0.00    1  kdmflush
01:32:46 PM      788    0.00    0.00    0.00    0.00    1  kdmflush
01:32:46 PM      829    0.00    0.00    0.00    0.00    3  jbd2/dm-0-8
01:32:46 PM      834    0.00    0.01    0.00    0.01    3  jbd2/dm-1-8
01:32:46 PM      989    0.00    0.02    0.00    0.02    1  flush-253:0
01:32:46 PM      990    0.00    0.00    0.00    0.00    2  flush-253:1
01:32:46 PM     1191    0.00    0.00    0.00    0.00    3  dhclient
01:32:46 PM     1251    0.07    0.07    0.00    0.14    3  rsyslogd
01:32:46 PM     1513    0.00    0.00    0.00    0.00    1  sw-cp-serverd
01:32:46 PM     1523    0.00    0.00    0.00    0.00    1  sshd
01:32:46 PM     1661    0.01    0.04    0.00    0.06    1  master
01:32:46 PM     1744    0.00    0.00    0.00    0.00    0  mysqld_safe
01:32:46 PM     1930    2.12    0.92    0.00    3.04    1  mysqld
01:32:46 PM     2033    0.00    0.00    0.00    0.00    1  named
01:32:46 PM     2157    0.02    0.00    0.00    0.02    0  sw-engine
01:32:46 PM     2165    0.35    0.77    0.00    1.12    2  sw-collectd
01:32:46 PM     2179    0.00    0.00    0.00    0.00    2  crond
01:32:46 PM    21282    0.01    0.00    0.00    0.01    3  spamd
01:32:46 PM    21284    0.00    0.00    0.00    0.00    0  spamd
01:32:46 PM    21285    0.00    0.00    0.00    0.00    1  spamd
01:32:46 PM    27789    0.00    0.00    0.00    0.00    2  httpd
01:32:46 PM    27791    0.00    0.00    0.00    0.00    2  httpd
01:32:46 PM    27792    0.09    0.01    0.00    0.11    3  httpd
01:32:46 PM    27793    0.09    0.01    0.00    0.10    3  httpd
01:32:46 PM    27794    0.11    0.02    0.00    0.13    3  httpd
01:32:46 PM    27795    0.10    0.01    0.00    0.11    3  httpd
01:32:46 PM    27796    0.09    0.01    0.00    0.11    3  httpd
01:32:46 PM    27797    0.09    0.01    0.00    0.11    3  httpd
01:32:46 PM    27798    0.09    0.01    0.00    0.10    3  httpd
01:32:46 PM    27799    0.09    0.01    0.00    0.10    3  httpd
01:32:46 PM    27881    0.08    0.01    0.00    0.10    3  httpd
01:32:46 PM    27882    0.09    0.01    0.00    0.10    1  httpd
01:32:46 PM    27883    0.09    0.01    0.00    0.11    3  httpd
01:32:46 PM    29064    0.06    0.01    0.00    0.07    3  httpd
01:32:46 PM    29067    0.06    0.01    0.00    0.07    3  httpd
01:32:46 PM    29068    0.06    0.01    0.00    0.07    3  httpd
01:32:46 PM    29636    0.00    0.02    0.00    0.02    0  psa-pc-remote
01:32:46 PM    29660    0.00    0.00    0.00    0.00    0  qmgr
01:32:46 PM    31330    0.35    0.26    0.00    0.61    0  pickup
01:32:46 PM    32006    0.00    0.00    0.00    0.00    2  sshd
01:32:46 PM    32008    0.00    0.00    0.00    0.00    3  bash
01:32:46 PM    32115    0.00    0.00    0.00    0.00    2  trivial-rewrite
01:32:46 PM    32122    0.00    0.00    0.00    0.00    2  top
01:32:46 PM    32198    0.00    0.00    0.00    0.00    1  cleanup
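A listing this long is easier to read when sorted by the %CPU column. A small awk sketch, assuming exactly the layout above (12-hour clock, so the time occupies two fields, PID is field 3, %CPU field 7 and Command field 9); the name `pidstat_top` is hypothetical:

```shell
# Hypothetical helper: print the N busiest processes from pidstat output as
# "%CPU PID Command", highest first (default N = 5).
pidstat_top() {
  # Keep only data rows (field 3 is a numeric PID); header and banner lines drop out.
  awk '$3 ~ /^[0-9]+$/ { print $7, $3, $9 }' | sort -k1,1 -rn | head -n "${1:-5}"
}

# Usage:
#   pidstat | pidstat_top 10
```

On the listing above this puts ksoftirqd/2 (8.18) first and mysqld (3.04) second.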

Code:

Memory Free and Used

01:20:01 PM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit
01:30:01 PM   2298940   5727872     71.36    247480   3640216   1721660     17.24
01:40:01 PM   2289980   5736832     71.47    248004   3645128   1722060     17.24
01:50:01 PM   2281724   5745088     71.57    248632   3649240   1716580     17.19
02:00:01 PM   2274872   5751940     71.66    249264   3653932   1711628     17.14
02:10:01 PM   2267532   5759280     71.75    249780   3658232   1707208     17.09
02:20:01 PM   2263972   5762840     71.79    250256   3660640   1701896     17.04
02:30:01 PM   2253244   5773568     71.93    250680   3665444   1701872     17.04
02:40:01 PM   2239048   5787764     72.11    251160   3669876   1707016     17.09
02:50:01 PM   2225956   5800856     72.27    251732   3674888   1711676     17.14
03:00:01 PM   2184896   5841916     72.78    252324   3691716   1716676     17.19
03:10:01 PM   2176232   5850580     72.89    252844   3696176   1715116     17.17
03:20:01 PM   2171704   5855108     72.94    253584   3700304   1708868     17.11
03:30:01 PM   2152756   5874056     73.18    254192   3705232   1717716     17.20
Average:      2236989   5789823     72.13    250764   3670079   1712306     17.15
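sar's %memused counts buffers and page cache as "used", but that memory is mostly given back to programs when they need it. Subtracting kbbuffers and kbcached from kbmemused gives a rough figure for memory actually held by processes and the kernel. A minimal awk sketch over `sar -r` lines like the ones above; the helper name is hypothetical, and it assumes the 12-hour time format, so kbmemfree is field 3:

```shell
# Hypothetical helper: memory in use excluding buffers and page cache, per sar -r sample.
# Assumed field layout (12-hour clock): time AM/PM kbmemfree kbmemused %memused kbbuffers kbcached ...
sar_mem_excl_cache() {
  awk '$3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ {
    total = $3 + $4          # kbmemfree + kbmemused = RAM visible to the kernel
    used  = $4 - $6 - $7     # kbmemused minus kbbuffers and kbcached
    printf "%s %s total=%dMB used_excl_cache=%dMB (%.1f%%)\n", $1, $2, total / 1024, used / 1024, 100 * used / total
  }'
}

# Usage:
#   sar -r | sar_mem_excl_cache
```

For the 01:30 sample this gives about 1797 MB (22.9%) outside buffers and cache, and for 03:30 about 1869 MB (23.9%), compared with the 71 to 73% that %memused reports. The header and the Average line are skipped because their fourth field is not a whole number.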

Code:

[root@u-------0 /]# dmesg | less
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.32-220.13.1.el6.x86_64 (mockbuild@c6b6.bsys.dev.centos.org) (gcc version 4.4.6 20110731 (Red Hat 4.4.6-3) (GCC) ) #1 SMP Tue Apr 17 23:56:34 BST 2012
Command line: ro root=/dev/sda1 console=tty0 console=ttyS0,57600 crashkernel=auto
KERNEL supported cpus:
  Intel GenuineIntel
  AMD AuthenticAMD
  Centaur CentaurHauls
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009e000 (usable)
 BIOS-e820: 000000000009e000 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000d0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000d5f50000 (usable)
 BIOS-e820: 00000000d5f50000 - 00000000d5f61000 (ACPI data)
 BIOS-e820: 00000000d5f61000 - 00000000d5f62000 (ACPI NVS)
 BIOS-e820: 00000000d5f62000 - 00000000d8000000 (reserved)
 BIOS-e820: 00000000f8000000 - 00000000fc000000 (reserved)
 BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000fff00000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000228000000 (usable)
DMI present.
SMBIOS version 2.5 @ 0xF7530
DMI: FUJITSU SIEMENS D2721-H1                      /D2721-H1, BIOS 6.00 R1.05.2721.H1              06/26/2008
Phoenix BIOS detected: BIOS may corrupt low RAM, working around it.
e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
e820 update range: 0000000000000000 - 0000000000001000 (usable) ==> (reserved)
e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
last_pfn = 0x228000 max_arch_pfn = 0x400000000
MTRR default type: uncachable
MTRR fixed ranges enabled:
  00000-9FFFF write-back
  A0000-BFFFF uncachable
  C0000-CFFFF write-protect
  D0000-E3FFF uncachable
  E4000-FFFFF write-protect
MTRR variable ranges enabled:
  0 base 000000000000 mask FFFF80000000 write-back
  1 base 000080000000 mask FFFFC0000000 write-back
  2 base 0000C0000000 mask FFFFF0000000 write-back
  3 base 0000D0000000 mask FFFFFC000000 write-back
  4 base 0000D4000000 mask FFFFFE000000 write-back
  5 base 0000D6000000 mask FFFFFF000000 write-back
  6 disabled
  7 disabled
TOM2: 0000000228000000 aka 8832M
x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
e820 update range: 00000000d7000000 - 0000000100000000 (usable) ==> (reserved)
last_pfn = 0xd5f50 max_arch_pfn = 0x400000000
initial memory mapped : 0 - 20000000
Using GB pages for direct mapping
init_memory_mapping: 0000000000000000-00000000d5f50000
 0000000000 - 00c0000000 page 1G
 00c0000000 - 00d5e00000 page 2M
 00d5e00000 - 00d5f50000 page 4k
kernel direct mapping tables up to d5f50000 @ 10000-13000
init_memory_mapping: 0000000100000000-0000000228000000
 0100000000 - 0200000000 page 1G
 0200000000 - 0228000000 page 2M
kernel direct mapping tables up to 228000000 @ 12000-14000
RAMDISK: 371ab000 - 37feff73

Code:

[root@u----0 /]# dmesg | grep -i memory
initial memory mapped : 0 - 20000000
init_memory_mapping: 0000000000000000-00000000d5f50000
init_memory_mapping: 0000000100000000-0000000228000000
Reserving 129MB of memory at 48MB for crashkernel (System RAM: 8832MB)
PM: Registered nosave memory: 000000000009e000 - 00000000000a0000
PM: Registered nosave memory: 00000000000a0000 - 00000000000d0000
PM: Registered nosave memory: 00000000000d0000 - 0000000000100000
PM: Registered nosave memory: 00000000d5f50000 - 00000000d5f61000
PM: Registered nosave memory: 00000000d5f61000 - 00000000d5f62000
PM: Registered nosave memory: 00000000d5f62000 - 00000000d8000000
PM: Registered nosave memory: 00000000d8000000 - 00000000f8000000
PM: Registered nosave memory: 00000000f8000000 - 00000000fc000000
PM: Registered nosave memory: 00000000fc000000 - 00000000fec00000
PM: Registered nosave memory: 00000000fec00000 - 00000000fec10000
PM: Registered nosave memory: 00000000fec10000 - 00000000fee00000
PM: Registered nosave memory: 00000000fee00000 - 00000000fee01000
PM: Registered nosave memory: 00000000fee01000 - 00000000fff00000
PM: Registered nosave memory: 00000000fff00000 - 0000000100000000
Your BIOS doesn't leave a aperture memory hole
PM: Registered nosave memory: 0000000020000000 - 0000000024000000
Memory: 8008160k/9043968k available (5085k kernel code, 689288k absent, 346520k reserved, 7228k data, 1244k init)
please try 'cgroup_disable=memory' option if you don't want memory cgroups
Initializing cgroup subsys memory
Freeing initrd memory: 14611k freed
Non-volatile memory driver v1.3
crash memory driver: version 1.1
Freeing unused kernel memory: 1244k freed
Freeing unused kernel memory: 1040k freed
Freeing unused kernel memory: 1756k freed
[drm] nouveau 0000:03:00.0: 0: memory 0MHz core 500MHz shader 1200MHz voltage 1100mV fanspeed 100%
[drm] nouveau 0000:03:00.0: c: memory 0MHz shader 1600MHz
[TTM] Zone  kernel: Available graphics memory: 4013406 kiB.
[TTM] Zone  dma32: Available graphics memory: 2097152 kiB.
[drm] nouveau 0000:03:00.0: Stolen system memory at: 0x00d6000000
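The boot-time `Memory:` line above accounts for the full 8832MB: the total, minus `absent` (address ranges with no usable RAM behind them, such as the ranges the BIOS map marks as reserved below 4 GB), minus `reserved` (memory set aside at boot, which should include the 129MB crashkernel area), equals what the kernel reports as available. A small awk sketch that checks this arithmetic; the helper name is hypothetical and it assumes the 2.6.32-style line format shown above:

```shell
# Hypothetical helper: check total - absent - reserved against "available"
# in the kernel's boot-time "Memory:" line (2.6.32-style format assumed).
dmesg_mem_accounting() {
  awk '/^Memory: [0-9]+k\/[0-9]+k available/ {
    split($2, a, "/")                      # a[1]="8008160k" (available), a[2]="9043968k" (total)
    avail = a[1] + 0; total = a[2] + 0     # awk keeps the leading number and drops the "k"
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^absent/)   absent   = $(i - 1) + 0
      if ($i ~ /^reserved/) reserved = $(i - 1) + 0
    }
    printf "total=%dk absent=%dk reserved=%dk computed=%dk reported=%dk\n",
           total, absent, reserved, total - absent - reserved, avail
  }'
}

# Usage:
#   dmesg | dmesg_mem_accounting
```

With the values above, 9043968 - 689288 - 346520 = 8008160, exactly the reported figure. So the roughly 1 GB difference between 8832MB and the memory available to Linux appears to be fixed at boot by the firmware's memory map and the kernel's reservations, rather than accumulating while the server runs.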

Quote:

Please use [code][/code] tags around commands and screenshots that you're sharing. This option is available in the menu when you reply to a thread.

Both physical memory and swap look sufficient for the httpd process to run (as you can see, most of your CPU is idle and httpd is using only a few hundred MB of memory). To me, though, it sounds like a problem with the BIOS, or possibly a hardware fault.

Notice this in the dmesg output above:

Code:

~# dmesg | less
.......
SMBIOS version 2.5 @ 0xF7530
DMI: FUJITSU SIEMENS D2721-H1 /D2721-H1, BIOS 6.00 R1.05.2721.H1 06/26/2008
Phoenix BIOS detected: BIOS may corrupt low RAM, working around it.
.......

One more thing to check is /var/log/httpd. What does it say?

Thank you very much for your help; it's greatly appreciated. Here are some excerpts from the error log in /var/log/httpd. Are there other logs you'd like me to post excerpts from?

A sampling from the January 13 error log:

Code:

PHP Warning:  Directive 'safe_mode' is deprecated in PHP 5.3 and greater in Unknown on line 0
[Sun Jan 13 03:20:04 2013] [error] python_init: Python version mismatch, expected '2.6.5', found '2.6.6'.
[Sun Jan 13 03:20:04 2013] [error] python_init: Python executable found '/usr/bin/python'.
[Sun Jan 13 03:20:04 2013] [error] python_init: Python path being used '/usr/lib64/python26.zip:/usr/lib64/python2.6/:/usr/lib64/python2.6/plat-linux2:/usr/lib64/python2.6/lib-tk:/usr/lib64/python2.6/lib-old:/usr/lib64/python2.6/lib-dynload'.
[Sun Jan 13 03:20:04 2013] [notice] mod_python: Creating 4 session mutexes based on 256 max processes and 0 max threads.
[Sun Jan 13 03:20:04 2013] [notice] mod_python: using mutex_directory /tmp
[Sun Jan 13 03:20:04 2013] [warn] RSA server certificate CommonName (CN) `Parallels Panel' does NOT match server name!?
[Sun Jan 13 03:20:04 2013] [warn] RSA server certificate CommonName (CN) `Parallels Panel' does NOT match server name!?
[Sun Jan 13 03:20:04 2013] [warn] Init: Name-based SSL virtual hosts only work for clients with TLS server name indication support (RFC 4366)

A sampling from the January 19 error log (thousands of these sorts of messages):

Code:

[Sat Jan 19 11:10:40 2013] [error] [client 220.181.89.128] File does not exist: /var/www/vhosts/default/htdocs/robots.txt
[Sat Jan 19 11:13:01 2013] [error] [client 66.249.73.55] File does not exist: /var/www/vhosts/default/htdocs/artist
[Sat Jan 19 11:13:20 2013] [error] [client 66.249.73.17] File does not exist: /var/www/vhosts/default/htdocs/romantic
[Sat Jan 19 11:13:20 2013] [error] [client 66.249.73.134] File does not exist: /var/www/vhosts/default/htdocs/files
[Sat Jan 19 11:13:42 2013] [error] [client 66.249.73.215] File does not exist: /var/www/vhosts/default/htdocs/robots.txt
[Sat Jan 19 11:13:42 2013] [error] [client 66.249.73.200] File does not exist: /var/www/vhosts/default/htdocs/tags

Another sampling from the January 19 error log (thousands of these sorts of messages):

Code:

PHP Warning:  Directive 'safe_mode' is deprecated in PHP 5.3 and greater in Unknown on line 0
[Sat Jan 19 01:17:54 2013] [error] python_init: Python version mismatch, expected '2.6.5', found '2.6.6'.
[Sat Jan 19 01:17:54 2013] [error] python_init: Python executable found '/usr/bin/python'.
[Sat Jan 19 01:17:54 2013] [error] python_init: Python path being used '/usr/lib64/python26.zip:/usr/lib64/python2.6/:/usr/lib64/python2.6/plat-linux2:/usr/lib64/python2.6/lib-tk:/usr/lib64/python2.6/lib-old:/usr/lib64/python2.6/lib-dynload'.
[Sat Jan 19 01:17:54 2013] [notice] mod_python: Creating 4 session mutexes based on 256 max processes and 0 max threads.
[Sat Jan 19 01:17:54 2013] [notice] mod_python: using mutex_directory /tmp
[Sat Jan 19 01:17:54 2013] [warn] RSA server certificate CommonName (CN) `Parallels Panel' does NOT match server name!?
[Sat Jan 19 01:17:54 2013] [warn] RSA server certificate CommonName (CN) `Parallels Panel' does NOT match server name!?
[Sat Jan 19 01:17:54 2013] [warn] Init: Name-based SSL virtual hosts only work for clients with TLS server name indication support (RFC 4366)
[Sat Jan 19 01:17:54 2013] [notice] Apache/2.2.15 (Unix) DAV/2 mod_fcgid/2.3.6 mod_python/3.3.1 Python/2.6.6 mod_ssl/2.2.15 OpenSSL/1.0.0-fips mod_perl/2.0.4 Perl/v5.10.1 configured -- resuming normal operations
[Sat Jan 19 01:17:55 2013] [notice] caught SIGTERM, shutting down
[Sat Jan 19 01:17:56 2013] [notice] suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Sat Jan 19 01:17:56 2013] [warn] RSA server certificate CommonName (CN) `Parallels Panel' does NOT match server name!?
[Sat Jan 19 01:17:56 2013] [warn] RSA server certificate CommonName (CN) `Parallels Panel' does NOT match server name!?
[Sat Jan 19 01:17:56 2013] [warn] Init: Name-based SSL virtual hosts only work for clients with TLS server name indication support (RFC 4366)

NOTE: the SIGTERM issue was addressed in this thread:

Code:

http://www.linuxquestions.org/questions/linux-server-73/apache-error-log-caught-sigterm-shutting-down-876010/
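To see how often Apache is being stopped, and whether that lines up with the crash times, the `caught SIGTERM` notices can be counted per day. A minimal sketch; the helper name is hypothetical, and it assumes the Apache 2.2 timestamp format shown above ([Sat Jan 19 01:17:55 2013]) and an error log at /var/log/httpd/error_log:

```shell
# Hypothetical helper: count Apache "caught SIGTERM" notices per day, busiest day first.
# Timestamp assumed as "[Sat Jan 19 01:17:55 2013]", so month, day and year are fields 2, 3 and 5.
sigterm_per_day() {
  awk '/caught SIGTERM/ { day = $2 " " $3 " " $5; sub(/\]$/, "", day); n[day]++ }
       END { for (d in n) print n[d], d }' | sort -k1,1 -rn
}

# Usage (as root):
#   sigterm_per_day < /var/log/httpd/error_log
```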

A sampling from the January 20 error log (thousands of these sorts of messages):

Code:

[Sun Jan 20 03:48:54 2013] [error] [client 208.115.111.73] File does not exist: /var/www/vhosts/default/htdocs/robots.txt
[Sun Jan 20 03:49:06 2013] [error] [client 66.249.73.37] File does not exist: /var/www/vhosts/default/htdocs/artist
[Sun Jan 20 03:49:06 2013] [error] [client 66.249.73.56] File does not exist: /var/www/vhosts/default/htdocs/artist
[Sun Jan 20 03:49:10 2013] [error] [client 66.249.73.53] File does not exist: /var/www/vhosts/default/htdocs/artist
[Sun Jan 20 03:49:18 2013] [error] [client 66.249.73.47] File does not exist: /var/www/vhosts/default/htdocs/artist
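With thousands of these lines, a tally by requested path shows which missing URLs are requested most often. A minimal awk sketch; the helper name is hypothetical, and it assumes the path is the last field of each line, as in the samples above:

```shell
# Hypothetical helper: tally "File does not exist" errors by requested path, most frequent first.
missing_by_path() {
  awk '/File does not exist:/ { n[$NF]++ } END { for (p in n) print n[p], p }' | sort -k1,1 -rn
}

# Usage (as root):
#   missing_by_path < /var/log/httpd/error_log | head -n 20
```

On the January 20 sample above this reports 4 requests for .../htdocs/artist and 1 for .../htdocs/robots.txt.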

Not sure if this will help, but I took this screenshot this morning (attached). The gaps show how long the crash lasted. If you look carefully, you will notice a drop from a high of 2.4 GB to 2.2 GB.

Yesterday I disabled several domain forwards that I had set up in my Plesk control panel (i.e., forwarding traffic from domain1.com to domain2.com).

That had an immediate effect on memory usage, but over the course of the day usage climbed again.

I also reduced the maximum number of visitors in my.cnf from 500 to 200.

If the BIOS has an issue, could that mean I have more or less permanently lost 1 GB or so of memory?

Also, as of this morning, the server has not crashed since I made the changes above.

Thanks again for any help / insights you can provide.

Hello,

Based on the logs and everything above, I'd conclude it's possibly a problem with the BIOS chip, not just physically but with its configuration/setup. I have also googled the error, and almost every discussion points to a problem with the BIOS. To be honest, though, I am not sure exactly where the problem is or how to solve it. I'll keep searching and will get back to you if I find something helpful.

Quote:

Originally Posted by shivaa (Post 4888411)

Thank you very much for your help. I have emailed tech support about the BIOS issue and will let you know what I hear back.

Response back from the tech department:

"This message indicates that the BIOS uses a range of memory addresses that is generally used by the Linux kernel, and that the kernel recognizes this and will not attempt to use that memory range. This will not cause any errors in the operation of your server."

1. What was the last thing you changed before the crashes started?

2. Can you show the contents of your httpd.conf and .htaccess files?
NB: best practice is not to use .htaccess files, but to put the directives inside the relevant <Directory> section of httpd.conf.
That gives better performance (httpd.conf is read once when Apache starts, whereas .htaccess files are re-read on every request) and better security (.htaccess files live in the document root, so they are potentially hackable).

3. Show the OS type:

Code:

cat /etc/*release*
uname -a

4. Regarding the crashes, have you looked at /var/log/messages?
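To see what the server was doing just before each unplanned reboot, you can print the lines logged immediately before each boot. A minimal sketch using GNU grep's -B (context before) option; the helper name is hypothetical, and the default boot marker is the first kernel message in the dmesg output posted earlier in the thread, which is an assumption about how your /var/log/messages looks:

```shell
# Hypothetical helper: print the lines logged just before each boot marker.
# $1 = boot marker (fixed string; empty means the default below),
# $2 = number of lines of context to show before it (default 10).
before_each_boot() {
  grep -F -B "${2:-10}" -- "${1:-Initializing cgroup subsys cpuset}"
}

# Usage (as root):
#   before_each_boot < /var/log/messages
```

A few of the lines shown may belong to the new boot itself (for example rsyslogd starting up); the entries above those are the last ones written before the crash.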