The server crashed due to high CPU load and somehow it is up again now. How can I analyse the cause of the failure?
I checked /var/log/syslog, dmesg and /var/log/messages but found nothing.
High CPU load is usually not a reason for a crash.
It would be nice to know what was running; you can probably find something in the logs of those applications.
High CPU load isn't supposed to cause a crash on Linux. Are you running M$ Windows??
Give us real details: hardware, software, distro, RAM & cache, and what the load was caused by. Are you an experienced sysadmin? Is your box online? Secured? Patches applied?
Random resets can also be software or malware related. Kernel panics are reported on screen, but not logged, IIRC. RAM errors I don't know about; segmentation faults (called that for historical reasons) usually just shut down a process.
Maybe it's an unpredictable effect of the new Spectre/Meltdown patches?
We are not running MS Windows, but the VM was installed through vCenter. Unfortunately I am not sure what caused the crash; I am still investigating. I am not an experienced system admin. Patches are not applied; it is still an old version. I also observed some kernel errors in /var/log/messages. I am really not sure what the reason behind the crash is.
This probably means Samba ran into an out-of-memory problem, and that probably caused the crash (although I'm not 100% sure).
You might need to check your Samba-related settings (and probably the version of your Samba packages).
The Samba version is 2.4, but this never happened before. So is this not a kernel issue?
I found this in the Samba log, but I think it is from after the crash.
smbd/process.c:smbd_process(2068)
receive_message_or_smb failed: NT_STATUS_END_OF_FILE, exiting
Yes, it is not a kernel issue. I see no evidence of a "crash" - as in kernel oops.
You have something consuming all your memory - not necessarily smbd, it may just be a victim. But whatever it is, it is bad enough to be impacting the system. The high CPU is probably memory-management trying to locate free-able page frames. Once the oom-killer gets enough memory back, the system will appear to come back to life.
Until it all happens again.
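To check whether the oom-killer really fired, and which process triggered it or was killed, you could scan the kernel messages for its typical log lines. Below is a minimal sketch in Python. The function names, the default log path and the two regular expressions are my own assumptions; the exact wording of these messages varies between kernel versions, so treat the patterns as a starting point.

```python
import re

# Loose patterns for messages the kernel commonly logs around an OOM kill.
# The wording differs between kernel versions, so these are assumptions.
INVOKED = re.compile(r"(\S+) invoked oom-killer")
KILLED = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def find_oom_events(lines):
    """Return (line_number, kind, detail) tuples for OOM-related log lines."""
    events = []
    for number, line in enumerate(lines, start=1):
        match = INVOKED.search(line)
        if match:
            # The process whose allocation triggered the oom-killer.
            events.append((number, "invoked", match.group(1)))
            continue
        match = KILLED.search(line)
        if match:
            # The process the oom-killer chose as its victim.
            detail = f"{match.group(2)} (pid {match.group(1)})"
            events.append((number, "killed", detail))
    return events


def scan_log(path="/var/log/messages"):
    """Print every OOM-related line found in the given log file."""
    with open(path, errors="replace") as log:
        for number, kind, detail in find_oom_events(log):
            print(f"{path}:{number}: {kind}: {detail}")
```

Calling `scan_log()` as root (or `scan_log("/var/log/syslog")` on distros that log there) lists the hits. Keep in mind that the killed process is only the victim; it is not necessarily the one that ate the memory.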
You need to check your monitoring history data to see what was happening over time - it may give some hints depending on what is being recorded.
Not really - the data are not exposed or retained by default. Most servers probably have something like sysstat, but it tends not to be useful for historical analysis of (particularly) process metrics.
Maybe look at something like collectl - there will be a learning curve.
Another "quick look" option would be to use "top" and add the "swap" column - at least you can see who is using swap at that time. Maybe set up your own monitor - run it in batch mode with a delay of something like 10-20 minutes and write it to a file for later analysis.
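If you would rather script the "own monitor" idea than parse top's batch output, a rough sketch in Python could read the VmSwap field from /proc/<pid>/status and append the biggest swap users to a file. All function names and the output format here are made up for illustration, and it assumes a kernel that exposes VmSwap in the status file (very old kernels do not).

```python
import os
import time


def parse_status(text):
    """Extract the process name and VmSwap (in kB) from /proc/<pid>/status text."""
    name, swap_kb = "?", 0
    for line in text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("VmSwap:"):
            # The line looks like "VmSwap:     1234 kB".
            swap_kb = int(line.split()[1])
    return name, swap_kb


def swap_snapshot(proc_root="/proc"):
    """Return [(swap_kb, pid, name)] for processes using swap, largest first."""
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        status_path = os.path.join(proc_root, entry, "status")
        try:
            with open(status_path, errors="replace") as status:
                name, swap_kb = parse_status(status.read())
        except OSError:
            continue  # process exited meanwhile, or no permission
        if swap_kb > 0:
            rows.append((swap_kb, int(entry), name))
    return sorted(rows, reverse=True)


def log_snapshot(out_path, top_n=10):
    """Append one timestamped snapshot of the top swap users to out_path."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(out_path, "a") as out:
        for swap_kb, pid, name in swap_snapshot()[:top_n]:
            out.write(f"{stamp} pid={pid} name={name} swap_kb={swap_kb}\n")
```

Something like `log_snapshot("/var/tmp/swap-usage.log")` run from cron every 10-20 minutes would give you a history to look through after the next incident. The path is only an example.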
Thanks for the information. How can I keep logging with collectl? I could not find the option. Any idea how long the logs are stored?
I found another logging tool, "atop". It writes directly to /var/log and the logs are kept for 28 days.
Hello all,
I am getting the error below when I try to run the atop command. Can anyone help me with this?
error while loading shared libraries: libncurses.so.6: cannot open shared object file: No such file or directory