LinuxQuestions.org
01-30-2023, 02:46 PM   #1
dragaotecomodo
LQ Newbie
 
Registered: Jan 2023
Posts: 5

Rep: Reputation: 0
Server froze. I believe it was caused by bad cgroup memory allocation. How can I prove it?


Good morning, I have a server that froze!

While looking for the cause I found messages related to cgroups and memory. How can I confirm whether it crashed because of a cgroup memory misallocation?

lab$ sar -f /var/log/sa/sa30


11:10:01 AM all 9.39 0.00 10.06 11.71 0.00 68.85
11:20:02 AM all 16.80 0.00 8.95 5.05 0.00 69.20
11:30:01 AM all 4.14 0.00 5.88 7.12 0.00 82.85
Average: all 5.89 0.00 5.73 5.70 0.00 82.68
12:28:37 PM LINUX RESTART
12:39:30 PM LINUX RESTART
12:40:01 PM CPU %user %nice %system %iowait %steal %idle
12:50:01 PM all 1.66 0.01 3.55 0.22 0.00 94.56
Average: all 1.66 0.01 3.55 0.22 0.00 94.56
12:57:58 PM LINUX RESTART
01:50:01 PM CPU %user %nice %system %iowait %steal %idle
02:00:01 PM all 1.56 0.00 1.83 0.07 0.00 96.54
02:10:01 PM all 0.72 0.00 1.01 0.06 0.00 98.22
02:20:01 PM all 2.11 0.00 1.16 0.07 0.00 96.66
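
A small sketch for pulling the restart markers out of sar text output like the listing above. The function name restart_times is made up for illustration; it only assumes that each marker line contains the literal text LINUX RESTART, as it does here.

```shell
# restart_times: hypothetical helper, reads `sar` text output on stdin
# and prints only the timestamp of each "LINUX RESTART" marker.
restart_times() {
    # Drop everything from "LINUX RESTART" onward so only the time remains,
    # whether or not the locale prints AM/PM.
    awk '/LINUX RESTART/ { sub(/[ \t]*LINUX RESTART.*/, ""); print }'
}
```

It could be run as `sar -f /var/log/sa/sa30 | restart_times`. In the listing above the samples arrive every ten minutes and the last one before the first restart is 11:30:01 AM, so collection appears to have stopped about an hour before the 12:28:37 PM reboot. That window is probably the one worth searching in the other logs.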


lab log $ lspci | grep ERROR
7f:14.2 System peripheral: Intel Corporation Haswell-E Integrated Memory Controller 0 Channel 0 ERROR Registers (rev 02)
7f:14.3 System peripheral: Intel Corporation Haswell-E Integrated Memory Controller 0 Channel 1 ERROR Registers (rev 02)
7f:17.2 System peripheral: Intel Corporation Haswell-E Integrated Memory Controller 1 Channel 0 ERROR Registers (rev 02)
7f:17.3 System peripheral: Intel Corporation Haswell-E Integrated Memory Controller 1 Channel 1 ERROR Registers (rev 02)
ff:14.2 System peripheral: Intel Corporation Haswell-E Integrated Memory Controller 0 Channel 0 ERROR Registers (rev 02)
ff:14.3 System peripheral: Intel Corporation Haswell-E Integrated Memory Controller 0 Channel 1 ERROR Registers (rev 02)
ff:17.2 System peripheral: Intel Corporation Haswell-E Integrated Memory Controller 1 Channel 0 ERROR Registers (rev 02)
ff:17.3 System peripheral: Intel Corporation Haswell-E Integrated Memory Controller 1 Channel 1 ERROR Registers (rev 02)



lab$ ls -lha dmesg
-rw-r--r-- 1 root root 121K Jan 30 12:57 dmesg

lab$ cat dmesg |egrep -i "Memory|error|fail"

Reserving 145MB of memory at 48MB for crashkernel (System RAM: 264192MB)
PM: Registered nosave memory: 000000000009c000 - 00000000000a0000
PM: Registered nosave memory: 00000000000a0000 - 00000000000e0000
PM: Registered nosave memory: 00000000000e0000 - 0000000000100000
PM: Registered nosave memory: 000000007a289000 - 000000007af0b000
PM: Registered nosave memory: 000000007af0b000 - 000000007b93b000
PM: Registered nosave memory: 000000007b93b000 - 000000007bab4000
PM: Registered nosave memory: 000000007bae9000 - 000000007baff000
PM: Registered nosave memory: 000000007bb00000 - 0000000090000000
PM: Registered nosave memory: 0000000090000000 - 00000000feda8000
PM: Registered nosave memory: 00000000feda8000 - 00000000fedac000
PM: Registered nosave memory: 00000000fedac000 - 00000000ff310000
PM: Registered nosave memory: 00000000ff310000 - 0000000100000000
Memory: 264373124k/270532608k available (5325k kernel code, 2193048k absent, 3966436k reserved, 7013k data, 1276k init)
please try 'cgroup_disable=memory' option if you don't want memory cgroups
Initializing cgroup subsys memory
Freeing initrd memory: 16711k freed
ipmi_si ipmi_si.0: Could not enable interrupts, failed set, using polled mode.
ERST: Error Record Serialization Table (ERST) support is initialized.
Non-volatile memory driver v1.3
crash memory driver: version 1.1
Freeing unused kernel memory: 1276k freed
Freeing unused kernel memory: 800k freed
Freeing unused kernel memory: 1588k freed
megaraid_sas 0000:03:00.0: Controller type: MR,Memory size is: 1024MB
ACPI Error: No handler for Region [SYSI] (ffff884053edf2b8) [IPMI] (20090903/evregion-319)
ACPI Error: Region IPMI(7) has no handler (20090903/exfldio-295)
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PMI0._GHL] (Node ffff8820538b41a0), AE_NOT_EXIST
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PMI0._PMC] (Node ffff8820538b41f0), AE_NOT_EXIST
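
One way to look for evidence of the cgroup hypothesis, sketched under assumptions: when a memory cgroup hits its limit and the kernel kills a task, the kernel log normally carries a line beginning with "Memory cgroup out of memory", while a system-wide shortage shows "Out of memory" or "invoked oom-killer". The boot-time lines in the dmesg output above ("Initializing cgroup subsys memory" and the cgroup_disable=memory hint) are informational and do not indicate a failure by themselves. find_oom_events is a hypothetical helper name, and the exact wording may differ between kernel versions, so the match is deliberately loose.

```shell
# find_oom_events: hypothetical helper.
#   $1 = a saved kernel log, e.g. /var/log/messages or a dmesg dump
# Prints every line that looks like OOM-killer activity. Lines starting
# with "Memory cgroup out of memory" suggest a cgroup limit was hit;
# the other matches suggest a system-wide memory shortage.
find_oom_events() {
    logfile="$1"
    # Case-insensitive, because capitalisation varies between kernels.
    grep -iE 'out of memory|oom-killer' "$logfile"
}
```

For example, `find_oom_events /var/log/messages`. Empty output does not rule out a memory problem, since a hard freeze can stop the message from ever reaching the disk.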
 
01-31-2023, 07:14 AM   #2
dragaotecomodo
LQ Newbie
 
Registered: Jan 2023
Posts: 5

Original Poster
Rep: Reputation: 0
lab$ uname -a
Linux 2.6.32-431.el6.x86_64 #1 SMP Sun Nov 10 22:19:54 EST 2013 x86_64 x86_64 x86_64 GNU/Linux

lab$ head /proc/meminfo
MemTotal: 264393500 kB
MemFree: 228819792 kB
Buffers: 330072 kB
Cached: 22385284 kB
SwapCached: 284 kB
Active: 6716752 kB
Inactive: 21804160 kB
Active(anon): 3330700 kB
Inactive(anon): 2477788 kB
Active(file): 3386052 kB

lab$ free -m
total used free shared buffers cached
Mem: 258196 34797 223399 0 322 21925
-/+ buffers/cache: 12548 245647
Swap: 8191 0 8191
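
The free -m output above can be reduced to one figure: memory used by applications as a share of total RAM, ignoring buffers and cache. The sketch below assumes the older procps layout shown here, with a "-/+ buffers/cache:" row; used_pct_from_free is a hypothetical helper name.

```shell
# used_pct_from_free: hypothetical helper, reads old-style `free -m`
# output on stdin and prints application memory use as a percentage
# of total RAM (buffers and cache excluded).
used_pct_from_free() {
    awk '
        $1 == "Mem:" { total = $2 }                 # total RAM in MB
        $1 == "-/+"  { used = $3; have_used = 1 }   # used minus buffers/cache
        END {
            # Print nothing if the expected rows were not found.
            if (total > 0 && have_used) printf "%.1f\n", used * 100 / total
        }
    '
}
```

Run as `free -m | used_pct_from_free`. For the numbers above it gives 4.9 (12548 of 258196 MB). Keep in mind that this was taken after the reboot, so it says nothing about memory use at the moment of the freeze.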
 
01-31-2023, 07:41 AM   #3
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,993

Rep: Reputation: 7337
I see an ACPI error, but probably that is irrelevant. Otherwise those grepped logs are more or less useless.
https://community.hpe.com/t5/prolian...4#.Y5bW0HbMKHQ
 
01-31-2023, 09:12 AM   #4
dragaotecomodo
LQ Newbie
 
Registered: Jan 2023
Posts: 5

Original Poster
Rep: Reputation: 0
I grepped the messages log with the audit entries filtered out, and this is what came back:

Jan 27 12:28:47 rsyslogd: imuxsock does not run because we could not aquire any socket
Jan 27 12:28:47 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="3974" x-info="http://www.rsyslog.com"] start
Jan 27 12:28:47 rsyslogd-2066: could not load module '/lib64/rsyslog/imjournal.so', dlopen: /lib64/rsyslog/imjournal.so: cannot open shared object file: No such file or directory
Jan 27 12:28:47 rsyslogd: the last error occured in /etc/rsyslog.conf, line 13:"$ModLoad imjournal"
Jan 27 12:28:47 rsyslogd-3003: invalid or yet-unknown config file command - have you forgotten to load a module? [try http://www.rsyslog.com/e/3003 ]
Jan 27 12:28:47 rsyslogd: the last error occured in /etc/rsyslog.conf, line 14:"$IMJournalStateFile imjournal.state"
Jan 27 12:28:47 rsyslogd-2124: CONFIG ERROR: could not interpret master config file '/etc/rsyslog.conf'. [try http://www.rsyslog.com/e/2124 ]
Jan 27 12:37:07 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="3974" x-info="http://www.rsyslog.com"] exiting on signal 15.
Jan 27 12:39:40 rsyslogd: imuxsock does not run because we could not aquire any socket
Jan 27 12:39:40 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="3876" x-info="http://www.rsyslog.com"] start
Jan 27 12:39:40 rsyslogd-2066: could not load module '/lib64/rsyslog/imjournal.so', dlopen: /lib64/rsyslog/imjournal.so: cannot open shared object file: No such file or directory
Jan 27 12:39:40 rsyslogd: the last error occured in /etc/rsyslog.conf, line 13:"$ModLoad imjournal"
Jan 27 12:39:40 rsyslogd-3003: invalid or yet-unknown config file command - have you forgotten to load a module? [try http://www.rsyslog.com/e/3003 ]
Jan 27 12:39:40 rsyslogd: the last error occured in /etc/rsyslog.conf, line 14:"$IMJournalStateFile imjournal.state"
Jan 27 12:39:40 rsyslogd-2124: CONFIG ERROR: could not interpret master config file '/etc/rsyslog.conf'. [try http://www.rsyslog.com/e/2124 ]


-- reboot --

Jan 27 13:45:05 rsyslogd: imuxsock does not run because we could not aquire any socket
Jan 27 13:45:05 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="4454" x-info="http://www.rsyslog.com"] start
Jan 27 13:45:05 rsyslogd-2066: could not load module '/lib64/rsyslog/imjournal.so', dlopen: /lib64/rsyslog/imjournal.so: cannot open shared object file: No such file or directory
Jan 27 13:45:05 rsyslogd: the last error occured in /etc/rsyslog.conf, line 13:"$ModLoad imjournal"
Jan 27 13:45:05 rsyslogd-3003: invalid or yet-unknown config file command - have you forgotten to load a module? [try http://www.rsyslog.com/e/3003 ]
Jan 27 13:45:05 rsyslogd: the last error occured in /etc/rsyslog.conf, line 14:"$IMJournalStateFile imjournal.state"
Jan 27 13:45:05 rsyslogd-2124: CONFIG ERROR: could not interpret master config file '/etc/rsyslog.conf'. [try http://www.rsyslog.com/e/2124 ]
 
01-31-2023, 09:37 AM   #5
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,993

Rep: Reputation: 7337
What OS is it exactly? You ought to fix that missing imjournal issue.
 
01-31-2023, 09:50 AM   #6
dragaotecomodo
LQ Newbie
 
Registered: Jan 2023
Posts: 5

Original Poster
Rep: Reputation: 0
It is:

Red Hat Enterprise Linux Server release 6.5 (Santiago)
 
01-31-2023, 09:50 AM   #7
dragaotecomodo
LQ Newbie
 
Registered: Jan 2023
Posts: 5

Original Poster
Rep: Reputation: 0
What does this imjournal problem mean?
 
01-31-2023, 10:20 AM   #8
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,993

Rep: Reputation: 7337
Quote:
/lib64/rsyslog/imjournal.so: cannot open shared object file: No such file or directory
I think this is a problem and it should be fixed.
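
A minimal sketch for finding which modules named by legacy $ModLoad lines have no matching .so file, which is the situation the quoted error describes. missing_rsyslog_modules is a hypothetical helper; it assumes plain module names rather than absolute paths, and the module directory /lib64/rsyslog is taken from the error message.

```shell
# missing_rsyslog_modules: hypothetical helper.
#   $1 = rsyslog config file, e.g. /etc/rsyslog.conf
#   $2 = module directory, e.g. /lib64/rsyslog
# Prints the name of every module requested with $ModLoad that has no
# corresponding <name>.so file in the module directory.
missing_rsyslog_modules() {
    conf="$1"
    moddir="$2"
    # Commented-out lines are skipped because their first field is not
    # exactly "$ModLoad"; trailing comments after the name are ignored.
    awk 'NF >= 2 && tolower($1) == "$modload" { print $2 }' "$conf" |
    while read -r mod; do
        mod="${mod%.so}"    # tolerate names written with the suffix
        [ -e "$moddir/$mod.so" ] || echo "$mod"
    done
}
```

Usage: `missing_rsyslog_modules /etc/rsyslog.conf /lib64/rsyslog`. If it prints imjournal, one possible fix is to comment out lines 13 and 14 of /etc/rsyslog.conf, the "$ModLoad imjournal" and "$IMJournalStateFile imjournal.state" lines named in the log. As far as I know imjournal reads the systemd journal, which RHEL 6 does not use.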