LinuxQuestions.org
Linux - Server This forum is for the discussion of Linux Software used in a server related context.

Old 04-25-2024, 05:06 PM   #31
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 26,637

Rep: Reputation: 7965

Quote:
Originally Posted by lenainjaune
There was a freeze on 23/04, but unfortunately someone rebooted before a photo was taken. We have just warned users to take a photo before restarting. At least, since the problem is still there with the cloned system disk, this demonstrates that the problem is not related to the disk.

Yes, we will! We are considering putting two systems in redundancy, so that when the first becomes unresponsive the second takes over ... and, more importantly, we will manage our telephony ourselves with our Asterisk system.

As we managed to keep the screen always on (parameter consoleblank=0 in the GRUB configuration), we noticed that a simulated crash with a kernel panic (we followed this method to achieve it) is also displayed on the login screen.

Is that sufficient to get information from before the freeze?

We also experimented with running a journalctl command at boot (so before login) by modifying /etc/rc.local to run the detached command journalctl --follow &. In this case the screen is flooded continuously with no pause (however, we discovered that Ctrl + s stops it and permits access to another tty with Ctrl + Alt + F2 or similar). This flood is strange because, once logged in, the same command does not flood. We suppose this is not the right way to do it.
Bolded a piece for emphasis only...you have been told this several times now, and that is the ONLY information that can help diagnose this issue. Not sure of the thought process in your diagnostic methods, but after the system freezes...it's ALREADY FROZEN. Quite obviously nothing will get logged after that point. You don't tell us what version/distro of Linux you're using, but a 4.x kernel is pretty old. You can either look in /var/log for a file (messages, syslog, etc...usual suspects) and inspect those, or you can look at "journalctl --list-boots", for a list of the log files and look at an older one with "journalctl -b <whatever number>", which will have the info in it.
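
A minimal sketch of that check, assuming the journal is persistent (that is, /var/log/journal/ exists) so that earlier boots are still available. The function names previous_boot_tail and previous_boot_errors are only illustrative and not part of any tool:

```shell
# Show the last N journal lines of the boot before the current one
# (default 200). On a box that froze and was rebooted, these are the
# messages written just before the freeze.
previous_boot_tail() {
    journalctl -b -1 -n "${1:-200}" --no-pager
}

# Same boot, restricted to priority "err" and worse.
previous_boot_errors() {
    journalctl -b -1 -p err --no-pager
}

# Usage, as root:
#   journalctl --list-boots     # list the boots the journal knows about
#   previous_boot_tail 300
#   previous_boot_errors
```

If journalctl --list-boots shows only the current boot, the journal is probably kept in memory only and earlier boots were not saved.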

Have you checked to see if disk space is running out? You mention asterisk as a PBX...a full disk can also cause problems, especially with long/undeleted voicemails. Regardless, it sounds like you need to actually hire a consultant to come take care of this, based on what you're posting.
 
Old 04-26-2024, 10:15 AM   #32
lenainjaune
LQ Newbie
 
Registered: Apr 2024
Posts: 18

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by wpeckham
Information from before the freeze, in particular JUST before the freeze so it is likely to capture the cause, is the ONLY information that might be seriously helpful. AT the freeze logging will stop and you will get no information, and AFTER the freeze is also after the reboot and the cause information may be gone for good.
Ah! We supposed that what was displayed when it froze would indicate the cause ... So what is the purpose of what is displayed?

How can we track the problem before the freeze?

Are external monitoring tools like Nagios the only solution, or can we do the same locally (change the debug level, or audit different targets more deeply, since the traditional logs did not reveal anything)?

Quote:
Originally Posted by wpeckham
If I understand correctly:
1. If you move the drive to a different identical machine, that one does freeze.
That would eliminate the original machine hardware EXCEPT the drive.
2. IF cloned to a new drive, it will still freeze. That eliminates the drive itself.

If those are both true, we have eliminated all of the hardware and only a software issue can be left.

What has changed about the software or configuration in the few weeks just before this started?
Yes, you have understood well. We have long supposed that the problem is not about hardware. The only thing we are not really sure about is the RAM. Did we use the RAM already in place, or the RAM of the computer we moved from? We will test it again to make sure we did not miss this step and to definitively eliminate the hardware cause.

The freezing problem has been here for years, but until now we left it as is, as it occurred about once every 1 or 2 months (before that we had a PABX which crashed really often, so this discomfort turned out to be more acceptable).

But over the last few months the problem has become more frequent, reaching one freeze per week.

---

We also tried to install netdata to monitor what happens in the system, but there is a conflict when installing it. For now we have abandoned this idea, and we did not dig further, in order to preserve the system and avoid a bigger problem.

Last edited by lenainjaune; 04-26-2024 at 10:23 AM.
 
Old 04-26-2024, 10:37 AM   #33
lenainjaune
LQ Newbie
 
Registered: Apr 2024
Posts: 18

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by TB0ne
Bolded a piece for emphasis only...you have been told this several times now, and that is the ONLY information that can help diagnose this issue.
We do not understand what you are telling us! You talk about bold ... do you refer to the fact that we bolded some pieces of code to make them more readable, or is it about something else?

Quote:
Originally Posted by TB0ne
Not sure of the thought process in your diagnostic methods, but after the system freezes...it's ALREADY FROZEN. Quite obviously nothing will get logged after that point.
OK! As we said before, we believed that what is displayed at the time of a freeze was sufficient to determine the cause.

Quote:
Originally Posted by TB0ne

You don't tell us what version/distro of Linux you're using, but a 4.x kernel is pretty old.
It is indicated in the first post (OP): Debian 9 (kernel: 4.9.0-19-amd64)

Yes, it is old, but it was the kernel embedded in the image when we installed it.

Quote:
Originally Posted by TB0ne
You can either look in /var/log for a file (messages, syslog, etc...usual suspects) and inspect those, or you can look at "journalctl --list-boots", for a list of the log files and look at an older one with "journalctl -b <whatever number>", which will have the info in it.
Yes, we already looked at the logs with journalctl --err, as indicated in the OP. As a result, we disabled ACPI, the UPS monitor and, more recently, the inventory agent.

We do not understand why you suggest looking at the boot logs ... since, if there is a freeze, nothing will be logged to the files.

Also, does journalctl centralize all logs, or is it necessary to explore them one by one?

Quote:
Originally Posted by TB0ne

Have you checked to see if disk space is running out? You mention asterisk as a PBX...a full disk can also cause problems, especially with long/undeleted voicemails.
Code:
root@host:~# LANG=C df -h
Filesystem                     Size  Used Avail Use% Mounted on
udev                           1.7G     0  1.7G   0% /dev
tmpfs                          342M  916K  342M   1% /run
/dev/mapper/ipbx--vg-root      14G  6.1G  6.6G  49% /
tmpfs                          1.7G   12K  1.7G   1% /dev/shm
tmpfs                          5.0M     0  5.0M   0% /run/lock
tmpfs                          1.7G     0  1.7G   0% /sys/fs/cgroup
/dev/mapper/ipbx--vg-tmp       880M   36K  818M   1% /tmp
/dev/mapper/ipbx--vg-var       4.7G  3.6G  853M  82% /var
/dev/mapper/ipbx--vg-home       51G   36M   48G   1% /home
/dev/sda1                      234M   61M  161M  28% /boot
tmpfs                          342M     0  342M   0% /run/user/0
root@host:~# LANG=C df -hi
Filesystem                    Inodes IUsed IFree IUse% Mounted on
udev                            425K   404  425K    1% /dev
tmpfs                           428K   563  427K    1% /run
/dev/mapper/ipbx--vg-root       872K  513K  360K   59% /
tmpfs                           428K     4  428K    1% /dev/shm
tmpfs                           428K     4  428K    1% /run/lock
tmpfs                           428K    15  428K    1% /sys/fs/cgroup
/dev/mapper/ipbx--vg-tmp         57K    33   57K    1% /tmp
/dev/mapper/ipbx--vg-var        311K   18K  293K    6% /var
/dev/mapper/ipbx--vg-home       3.3M   373  3.3M    1% /home
/dev/sda1                        61K   341   61K    1% /boot
tmpfs                           428K    11  428K    1% /run/user/0
=> Nothing alarming? Maybe /var ...
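
To make this check repeatable, here is a small sketch that flags any filesystem at or above a usage threshold. It reads df output on stdin, so it also works on saved output like the listing above. The helper name flag_full and the default of 80% are assumptions chosen for illustration, and mount points containing spaces are not handled:

```shell
# Print "<mount point> <Use%>" for every filesystem whose usage is at or
# above the given threshold in percent (default 80).
# Expects the usual df column layout on stdin: the fifth field is Use%,
# the sixth is the mount point. The header line is skipped.
flag_full() {
    awk -v limit="${1:-80}" 'NR > 1 {
        use = $5
        sub(/%/, "", use)          # strip the percent sign before comparing
        if (use + 0 >= limit) print $6 " " $5
    }'
}

# Usage on a live system (-P keeps each filesystem on one line):
#   LANG=C df -P  | flag_full 80     # disk space
#   LANG=C df -Pi | flag_full 80     # inodes
```

Fed with the df -h listing above and a threshold of 80, only /var (82%) is reported.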

Quote:
Originally Posted by TB0ne
Regardless, it sounds like you need to actually hire a consultant to come take care of this, based on what you're posting.
Yes, but no, as we will abandon this IPBX and replace it with Asterisk. Furthermore, we want to understand how to diagnose such a breakdown.

Last edited by lenainjaune; 04-27-2024 at 04:21 AM.
 
Old 04-26-2024, 10:58 AM   #34
wpeckham
LQ Guru
 
Registered: Apr 2010
Location: Continental USA
Distribution: Debian, Ubuntu, RedHat, DSL, Puppy, CentOS, Knoppix, Mint-DE, Sparky, VSIDO, tinycore, Q4OS,Manjaro
Posts: 5,640

Rep: Reputation: 2697
Quote:
Originally Posted by lenainjaune
Ah! We supposed that what was displayed when it froze would indicate the cause ... So what is the purpose of what is displayed?

How can we track the problem before the freeze?
Once the freeze starts NOTHING new will be logged or displayed. So the thing on the monitor will be the VERY LAST THING sent to it before the freeze starts. There is a pretty good chance that message WILL pertain to the cause.

I am glad you will be replacing these systems. I have not seen any mention about what changed to start the problem, but if you have been living with this since the system was new you have far more patience than I.

Troubleshooting these things takes a clear and pretty complete understanding of what the system does (hardware and software) and clear and logical progression of eliminating potential causes until there is only one left. That is not rocket science, but does require training or a deeply analytical mind. Training in higher mathematics seems to help, but you CAN develop a good technique. It may take time.

Last edited by wpeckham; 04-26-2024 at 11:06 AM.
 
Old Today, 07:18 AM   #35
lenainjaune
LQ Newbie
 
Registered: Apr 2024
Posts: 18

Original Poster
Rep: Reputation: 0
We have just replaced the RAM with RAM we consider known-good (its users did not report any problem), and we are testing the old RAM with MemTest86 (v5.01) to check its reliability.

We strongly suspect a telephony software bug, but if we want to work around it we must know what the problem is (resources, network, database, etc.).

In parallel, we are trying again to install a monitoring tool.

You said that:
Quote:
Originally Posted by wpeckham
IF it is software, and those logs are not giving you useful data, you might need to turn on better logging. Be warned, additional logging may degrade performance, but is the option most likely to give you useful "root cause" information.
What should we change to get better logging?
 
Old Today, 08:27 AM   #36
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 26,637

Rep: Reputation: 7965
Quote:
Originally Posted by lenainjaune
OK! As we said before, we believed that what is displayed at the time of a freeze was sufficient to determine the cause.
And as I said before, *THERE ARE MULTIPLE LOGS*. I bolded a piece of text that I wrote...very obviously, when the system freezes it's going to stop logging. You were given two exact commands to not only show you the previous logs, but how to display them. Did you read/understand/use those commands????
Quote:
Originally Posted by lenainjaune
Yes, we already looked at the logs with journalctl --err, as indicated in the OP. As a result, we disabled ACPI, the UPS monitor and, more recently, the inventory agent. We do not understand why you suggest looking at the boot logs ... since, if there is a freeze, nothing will be logged to the files. Also, does journalctl centralize all logs, or is it necessary to explore them one by one?
No...again, when the system freezes it is *NOT GOING TO LOG ANYTHING*. You need the messages just BEFORE the freeze...can't be more plain than that. And if you need specific instructions to check any/all log files you think may be related, you really should hire someone to do this. These are basic troubleshooting steps.
Quote:
Originally Posted by lenainjaune
We have just replaced the RAM with RAM we consider known-good (its users did not report any problem), and we are testing the old RAM with MemTest86 (v5.01) to check its reliability. We strongly suspect a telephony software bug, but if we want to work around it we must know what the problem is (resources, network, database, etc.).

In parallel, we are trying again to install a monitoring tool. You said that:
Quote:
Originally Posted by wpeckham
IF it is software, and those logs are not giving you useful data, you might need to turn on better logging. Be warned, additional logging may degrade performance, but is the option most likely to give you useful "root cause" information.
What should we change to get better logging?
Did you look in the manuals/documentation for the telephony software???

The /var partition appears to be a bit full...again, if it is FILLED totally with voicemails/etc., it could cause a problem. Those things may be lost/deleted after the crash, which would recover that space.
Quote:
Originally Posted by lenainjaune
Yes, but no, as we will abandon this IPBX and replace it with Asterisk. Furthermore, we want to understand how to diagnose such a breakdown.
We've all been trying to tell you, but it appears you don't have much experience in such things. It would be far better to hire someone local to you and get them to walk through things with you, that are specific to your environment.
 
Old Today, 10:53 AM   #37
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 2,798

Rep: Reputation: 1201
Maybe you are running out of RAM?

Set SystemMaxUse=100M in /etc/systemd/journald.conf
Ensure you have a directory /var/log/journal/
so the log survives a reboot.
Ensure you have some swap configured (1GB is enough).
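
The steps above could be scripted roughly as follows. This is only a sketch: the helper name set_system_max_use is made up for illustration, it relies on GNU sed (as shipped with Debian), and restarting systemd-journald afterwards is an assumption about how to make the change take effect, not something stated in the post:

```shell
# Set SystemMaxUse in a journald.conf-style file.
#   $1 = path to the config file, $2 = size (for example 100M)
# An existing (possibly commented-out) SystemMaxUse= line is rewritten;
# if there is none, the setting is appended at the end of the file.
set_system_max_use() {
    if grep -Eq '^[#[:space:]]*SystemMaxUse=' "$1"; then
        sed -i -E "s|^[#[:space:]]*SystemMaxUse=.*|SystemMaxUse=$2|" "$1"
    else
        printf 'SystemMaxUse=%s\n' "$2" >> "$1"
    fi
}

# Usage, as root:
#   set_system_max_use /etc/systemd/journald.conf 100M
#   mkdir -p /var/log/journal            # directory for the on-disk journal
#   systemctl restart systemd-journald   # assumed step to apply the change
#   swapon --show                        # list configured swap areas
#   free -h                              # current RAM and swap usage
```

Once the journal is on disk, journalctl --list-boots should list earlier boots after the next reboot.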
 
  


All times are GMT -5. The time now is 04:03 PM.
