LinuxQuestions.org
Linux - Server This forum is for the discussion of Linux Software used in a server related context.

Old 05-18-2018, 07:19 AM   #1
1s440
LQ Newbie
 
Registered: Mar 2018
Posts: 18

Rep: Reputation: Disabled
Root cause analysis for server crash


Hi all,

The server crashed due to high CPU load and is somehow up again now. How can I analyse the cause of the failure?
I checked /var/log/syslog, dmesg and messages but found nothing.

Can anyone suggest where to look?
 
Old 05-18-2018, 07:28 AM   #2
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 10,672

Rep: Reputation: 3175
High CPU load is usually not a reason for a crash.
It would be nice to know what was running; you can probably find something in the logs of those apps.
 
Old 05-18-2018, 07:38 AM   #3
1s440
LQ Newbie
 
Registered: Mar 2018
Posts: 18

Original Poster
Rep: Reputation: Disabled
Some of the messages are like this.
kernel: [694662.597621] free:1994 slab:2523 mapped:5 pagetables:0 bounce:0
jupiter kernel: [694662.597636] DMA free:4064kB min:60kB low:72kB high:88kB active:3964kB inactive:3684kB present:16160kB pages_scanned:13851 all_unreclaimable? yes
jupiter kernel: [694662.597643] lowmem_reserve[]: 0 1002 1002 1002
jupiter kernel: [694662.597650] DMA32 free:3912kB min:4016kB low:5020kB high:6024kB active:715948kB inactive:248440kB present:1026160kB pages_scanned:1744182 all_unreclaimable? yes
[694662.597658] lowmem_reserve[]: 0 0 0 0
: [694662.597664] DMA: 494*4kB 11*8kB 1*16kB 2*32kB 0*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 4064kB
jupiter kernel: [694662.597679] DMA32: 32*4kB 1*8kB 10*16kB 7*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3912kB
jupiter kernel: [694662.597699] 35 total pagecache pages
jupiter kernel: [694662.597703] Swap cache: add 314292, delete 314292, find 4998880/5004846
jupiter kernel: [694662.597714] Free swap = 0kB
jupiter kernel: [694662.597724] Total swap = 1048568kB
jupiter kernel: [694662.603505] 264192 pages of RAM
jupiter kernel: [694662.603526] 6120 reserved pages
jupiter kernel: [694662.603529] 4276 pages shared
jupiter kernel: [694662.603532] 0 pages swap cached
jupiter kernel: [694662.603537] Out of memory: kill process 1290 (slapd) score 24211 or a child
jupiter kernel: [694662.603561] Killed process 1290 (slapd)
jupiter kernel: [694672.737297] smbd invoked oom-killer: gfp_mask=0x1201d2, order=0, oomkilladj=0
jupiter kernel: [694672.737315] Pid: 18717, comm: smbd
 
Old 05-18-2018, 07:47 AM   #4
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware & Android
Posts: 8,867

Rep: Reputation: 913
High CPU load isn't supposed to cause a crash on Linux. Are you running M$ Windows??
Give us real details on hardware, software, distro, RAM & cache, and what the load was caused by. Are you an experienced sysadmin? Is your box online? Secured? Patches applied?

Random resets can also be software or malware related. Kernel panics are reported on screen but not logged, iirc. As for RAM errors, I don't know; those (= segmentation faults, for historical reasons) usually just kill a process.

Maybe it's an unpredictable effect of the new Spectre/Meltdown patches?
 
Old 05-18-2018, 08:20 AM   #5
1s440
LQ Newbie
 
Registered: Mar 2018
Posts: 18

Original Poster
Rep: Reputation: Disabled
We are not running MS Windows, but the VM was installed through vCenter. Unfortunately I'm not sure what caused the crash, as I am still investigating. I am not an experienced system admin. Patches are not applied; it is still an old version, Debian 6. I also observed some kernel errors in /var/log/messages. I'm really not sure what the reason behind the crash is.
 
Old 05-18-2018, 08:34 AM   #6
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 10,672

Rep: Reputation: 3175
please post those "some kernel error messages"....
 
Old 05-18-2018, 08:59 AM   #7
1s440
LQ Newbie
 
Registered: Mar 2018
Posts: 18

Original Poster
Rep: Reputation: Disabled
May 13 10:03:07 jupiter kernel: [62385504.845584] CPU 1: hi: 186, btch: 31 usd: 172
May 13 10:03:07 jupiter kernel: [62385504.845586] Active:308091 inactive:188791 dirty:0 writeback:0 unstable:0
May 13 10:03:07 jupiter kernel: [62385504.845587] free:12215 slab:2820 mapped:4 pagetables:1595 bounce:0
May 13 10:03:07 jupiter kernel: [62385504.845589] DMA free:8124kB min:68kB low:84kB high:100kB active:2912kB inactive:1776kB present:16256kB pages_scanned:9101 all_unreclaimable? yes
May 13 10:03:07 jupiter kernel: [62385504.845590] lowmem_reserve[]: 0 873 2016 2016
May 13 10:03:07 jupiter kernel: [62385504.845594] Normal free:40272kB min:3744kB low:4680kB high:5616kB active:58636kB inactive:752620kB present:894080kB pages_scanned:1429067 all_unreclaimable? yes
May 13 10:03:07 jupiter kernel: [62385504.845595] lowmem_reserve[]: 0 0 9144 9144
May 13 10:03:07 jupiter kernel: [62385504.845598] HighMem free:464kB min:512kB low:1736kB high:2964kB active:1170816kB inactive:768kB present:1170432kB pages_scanned:3528704 all_unreclaimable? yes
May 13 10:03:07 jupiter kernel: [62385504.845599] lowmem_reserve[]: 0 0 0 0
May 13 10:03:07 jupiter kernel: [62385504.845603] DMA: 31*4kB 30*8kB 27*16kB 19*32kB 13*64kB 8*128kB 5*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 8124kB
May 13 10:03:07 jupiter kernel: [62385504.845608] Normal: 9166*4kB 12*8kB 8*16kB 2*32kB 1*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 40216kB
May 13 10:03:07 jupiter kernel: [62385504.845613] HighMem: 32*4kB 0*8kB 1*16kB 0*32kB 1*64kB 0*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 464kB
May 13 10:03:07 jupiter kernel: [62385504.845617] 73 total pagecache pages
May 13 10:03:07 jupiter kernel: [62385504.845619] Swap cache: add 7404283, delete 7404282, find 1147266373/1147714023
May 13 10:03:07 jupiter kernel: [62385504.845620] Free swap = 0kB
May 13 10:03:07 jupiter kernel: [62385504.845621] Total swap = 979956kB
May 13 10:03:07 jupiter kernel: [62385504.848256] 524288 pages of RAM
May 13 10:03:07 jupiter kernel: [62385504.848257] 294912 pages of HIGHMEM
May 13 10:03:07 jupiter kernel: [62385504.848258] 5254 reserved pages
May 13 10:03:07 jupiter kernel: [62385504.848259] 8161 pages shared
May 13 10:03:07 jupiter kernel: [62385504.848260] 1 pages swap cached
May 13 10:03:07 jupiter kernel: [62385504.848261] 0 pages dirty
May 13 10:03:07 jupiter kernel: [62385504.848261] 0 pages writeback
May 13 10:03:07 jupiter kernel: [62385504.848262] 4 pages mapped
May 13 10:03:07 jupiter kernel: [62385504.848263] 2820 pages slab
May 13 10:03:07 jupiter kernel: [62385504.848264] 1595 pages pagetables
May 13 14:04:53 jupiter kernel: [62385586.555477] smbd invoked oom-killer: gfp_mask=0x1200d2, order=0, oomkilladj=0
May 13 14:04:53 jupiter kernel: [62385586.555481] Pid: 28410, comm: smbd Not tainted 2.6.26-2-686 #1
May 13 14:04:53 jupiter kernel: [62385586.555504] [<c015919e>] oom_kill_process+0x4f/0x195
May 13 14:04:53 jupiter kernel: [62385586.555515] [<c01595c8>] out_of_memory+0x14e/0x17f
May 13 14:04:53 jupiter kernel: [62385586.555520] [<c015b530>] __alloc_pages_internal+0x2b8/0x34e
May 13 14:04:53 jupiter kernel: [62385586.555524] [<c015b5d2>] __alloc_pages+0x7/0x9
May 13 14:04:53 jupiter kernel: [62385586.555526] [<c0156da9>] __grab_cache_page+0x2b/0x5b
May 13 14:04:53 jupiter kernel: [62385586.555530] [<f896f6be>] ext3_write_begin+0x51/0x16d [ext3]
May 13 14:04:53 jupiter kernel: [62385586.555544] [<f88e276b>] do_get_write_access+0x2f8/0x331 [jbd]
May 13 14:04:53 jupiter kernel: [62385586.555553] [<c0157728>] generic_file_buffered_write+0xef/0x553
May 13 14:04:53 jupiter kernel: [62385586.555560] [<f88e246a>] journal_stop+0x148/0x151 [jbd]
May 13 14:04:53 jupiter kernel: [62385586.555566] [<c0157ff4>] __generic_file_aio_write_nolock+0x468/0x4cb
May 13 14:04:53 jupiter kernel: [62385586.555571] [<c0156be9>] find_lock_page+0x19/0x7c
May 13 14:04:53 jupiter kernel: [62385586.555576] [<c01580a9>] generic_file_aio_write+0x52/0xa9
May 13 14:04:53 jupiter kernel: [62385586.555580] [<f896bf99>] ext3_file_write+0x19/0x83 [ext3]
May 13 14:04:53 jupiter kernel: [62385586.555587] [<c0174506>] do_sync_write+0xbf/0x100
May 13 14:04:53 jupiter kernel: [62385586.555597] [<c0131a20>] autoremove_wake_function+0x0/0x2d
May 13 14:04:53 jupiter kernel: [62385586.555603] [<c01776f9>] sys_stat64+0x1e/0x23
May 13 14:04:53 jupiter kernel: [62385586.555607] [<c01bac85>] security_file_permission+0xc/0xd
May 13 14:04:53 jupiter kernel: [62385586.555614] [<c0174447>] do_sync_write+0x0/0x100
May 13 14:04:53 jupiter kernel: [62385586.555616] [<c0174c78>] vfs_write+0x83/0x120
May 13 14:04:53 jupiter kernel: [62385586.555619] [<c017524a>] sys_write+0x3c/0x63
May 13 14:04:53 jupiter kernel: [62385586.555622] [<c0103857>] sysenter_past_esp+0x78/0xb1
May 13 14:04:53 jupiter kernel: [62385586.555628] [<c02b0000>] quirk_ali7101_acpi+0x27/0x63
May 13 14:04:53 jupiter kernel: [62385586.555633] =======================
May 13 14:04:53 jupiter kernel: [62385586.555635] Mem-info:
May 13 14:04:53 jupiter kernel: [62385586.555635] DMA per-cpu:
 
Old 05-18-2018, 12:50 PM   #8
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 10,672

Rep: Reputation: 3175
Quote:
smbd invoked oom-killer
This probably means samba ran into an out-of-memory problem, and that probably caused the crash (although I'm not 100% sure).
You might need to check your samba-related settings (and probably the version of your samba packages?).
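To see who triggered the OOM killer and who was actually killed, the two kinds of lines can be pulled out of the log. This is only a rough Python sketch: the function name oom_events is invented, and the sample lines are copied from the logs posted earlier in the thread. Note that the process named in "invoked oom-killer" is the one whose allocation could not be satisfied, while the victim is chosen by score, so the two can differ.

```python
import re

INVOKED = re.compile(r"(\S+) invoked oom-killer")
KILLED = re.compile(r"Killed process (\d+) \(([^)]+)\)")

def oom_events(lines):
    """Return ('invoked', name) and ('killed', name) tuples in log order."""
    events = []
    for line in lines:
        m = INVOKED.search(line)
        if m:
            events.append(("invoked", m.group(1)))
            continue
        m = KILLED.search(line)
        if m:
            events.append(("killed", m.group(2)))
    return events

# Lines taken from the logs posted in this thread.
sample = [
    "jupiter kernel: [694662.603561] Killed process 1290 (slapd)",
    "May 13 14:04:53 jupiter kernel: [62385586.555477] smbd invoked "
    "oom-killer: gfp_mask=0x1200d2, order=0, oomkilladj=0",
]
events = oom_events(sample)
print(events)  # [('killed', 'slapd'), ('invoked', 'smbd')]
```

In a real run one would feed it the lines of /var/log/messages instead of the sample list. Here the killed process (slapd) and the invoking process (smbd) are different, so smbd showing up in the message is not by itself proof that samba used the memory.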
 
Old Today, 03:38 AM   #9
1s440
LQ Newbie
 
Registered: Mar 2018
Posts: 18

Original Poster
Rep: Reputation: Disabled
The Samba version is 2.4, but this never happened earlier. So is this not a kernel issue?

I found this in the samba log, but I think it is from after the crash.
smbd/process.c:smbd_process(2068)
receive_message_or_smb failed: NT_STATUS_END_OF_FILE, exiting

Last edited by 1s440; Today at 04:04 AM.
 
Old Today, 04:15 AM   #10
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 16,572

Rep: Reputation: 2423
Yes, it is not a kernel issue. I see no evidence of a "crash" - as in kernel oops.

You have something consuming all your memory - not necessarily smbd, it may just be a victim. But whatever it is, it is bad enough to be impacting the system. The high CPU is probably memory-management trying to locate free-able page frames. Once the oom-killer gets enough memory back, the system will appear to come back to life.
Till it all happens again.
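One way to see the exhaustion in the posted dump is to compare each zone's "free" value with its "min" watermark. Below is a small Python sketch of that comparison. The function name zones_below_min is hypothetical, and the sample lines are the zone lines from the May 13 log with the prefix and trailing fields trimmed.

```python
import re

ZONE = re.compile(r"(\w+) free:(\d+)kB min:(\d+)kB")

def zones_below_min(lines):
    """Return the names of zones whose free memory is under the 'min' watermark."""
    low = []
    for line in lines:
        m = ZONE.search(line)
        if m and int(m.group(2)) < int(m.group(3)):
            low.append(m.group(1))
    return low

# Zone lines from the May 13 dump, trimmed to the fields used here.
sample = [
    "DMA free:8124kB min:68kB low:84kB high:100kB",
    "Normal free:40272kB min:3744kB low:4680kB high:5616kB",
    "HighMem free:464kB min:512kB low:1736kB high:2964kB",
]
print(zones_below_min(sample))  # ['HighMem']
```

HighMem is under its minimum, and the same dump shows "Free swap = 0kB", which fits the picture of something having eaten all available memory.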

You need to check your monitoring history data to see what was happening over time - it may give some hints depending on what is being recorded.
 
Old Today, 06:28 AM   #11
1s440
LQ Newbie
 
Registered: Mar 2018
Posts: 18

Original Poster
Rep: Reputation: Disabled
Is there any monitoring that I could add to the server so that all of this gets saved to a log?
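Existing tools such as sar (from the sysstat package) or atop record this kind of history. If installing packages is not an option, a very small home-made logger could look roughly like the Python sketch below. Everything here is an assumption for illustration: the log path /var/log/memwatch.log, the function names, and the availability of a Python 3 interpreter, which an old Debian release may not have by default.

```python
import os
import time

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value_in_kB} dict."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info

def top_rss(limit=5):
    """Return (rss_kb, pid, name) for the processes with the largest resident size."""
    procs = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open("/proc/%s/status" % pid) as fh:
                fields = dict(l.split(":", 1) for l in fh if ":" in l)
            rss = int(fields["VmRSS"].split()[0])
            procs.append((rss, int(pid), fields["Name"].strip()))
        except (IOError, OSError, KeyError, ValueError):
            continue  # process exited meanwhile, or kernel thread without VmRSS
    return sorted(procs, reverse=True)[:limit]

def log_once(path="/var/log/memwatch.log"):  # hypothetical log location
    """Append one timestamped line with free memory, free swap and top consumers."""
    with open("/proc/meminfo") as fh:
        mem = parse_meminfo(fh.read())
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(path, "a") as out:
        out.write("%s MemFree=%s SwapFree=%s top=%s\n"
                  % (stamp, mem.get("MemFree"), mem.get("SwapFree"), top_rss()))

# Parsing check on made-up sample text (not taken from the server).
sample = "MemTotal:  2076136 kB\nSwapFree:  0 kB\n"
parsed = parse_meminfo(sample)
```

Calling log_once() every minute or so, for example from cron, would leave a trail showing which process was growing before the next OOM event.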
 
  


All times are GMT -5. The time now is 10:31 PM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration