LinuxQuestions.org


landysaccount 01-13-2009 11:22 AM

segfault error
 
Hello;

Today I ran into a big problem (for me): my router/firewall, running Debian Etch 4.0 with a 2.6.28 kernel, locked up. I could ping it and ping the internet, but I couldn't connect to it. I turned on the monitor and saw the following:

I tried to login:

login[1879]: segfault at 0 ip b7ec346d sp bff0e060 error 4 in libpam.so.0.79[b7ebf000+7000]

tried to reboot with ctrl-alt-del:

shutdown[2883]: segfault at bff4327d ip b7e27141 sp bfd1421c error 4 in libc-2.3.6.so[b7dc8000+127000]

It just locked up. I pressed the reset button, got a whole bunch of segfault errors, and it didn't come back up.

I have no idea what's going on.

What's wrong?
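
For readers unfamiliar with these kernel messages, here is a minimal sketch of how one such line breaks down. It is an illustration only, not something posted in the thread: the names `SEGFAULT_RE`, `decode_error` and `parse_segfault` are made up for this example. The `error N` decoding follows the standard x86 page-fault error-code bits (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode).

```python
import re

# Matches kernel lines such as:
#   login[1879]: segfault at 0 ip b7ec346d sp bff0e060 error 4 in libpam.so.0.79[b7ebf000+7000]
# The trailing "in object[base+size]" part is optional (some lines lack it).
SEGFAULT_RE = re.compile(
    r"(?P<proc>[\w.-]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+)"
    r"(?: in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def decode_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    cause = "protection violation" if code & 1 else "page not present"
    access = "write" if code & 2 else "read"
    mode = "user" if code & 4 else "kernel"
    return f"{mode}-mode {access}, {cause}"


def parse_segfault(line):
    """Return a dict of fields for one segfault line, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    info = m.groupdict()
    info["meaning"] = decode_error(int(info["err"]))
    # Offset of the faulting instruction inside the mapped object, if given.
    if info["base"] is not None:
        info["offset"] = int(info["ip"], 16) - int(info["base"], 16)
    return info


line = ("login[1879]: segfault at 0 ip b7ec346d sp bff0e060 "
        "error 4 in libpam.so.0.79[b7ebf000+7000]")
info = parse_segfault(line)
print(info["proc"], info["obj"], hex(info["offset"]), "|", info["meaning"])
```

For the login line this prints `login libpam.so.0.79 0x446d | user-mode read, page not present`: the program read address 0 (a null pointer) while executing code 0x446d bytes into libpam.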

unSpawn 01-13-2009 06:23 PM

I don't know. Could you boot a Live CD, mount the partitions read-only, and try to figure out from the logs whether something got installed, de-installed, updated, reconfigured, et cetera recently?

GaijinPunch 01-13-2009 06:55 PM

Are you logging in on the console or via SSH?

landysaccount 01-13-2009 09:11 PM

Quote:

Originally Posted by GaijinPunch (Post 3407625)
Are you logging in on the console or via SSH?

I can't login at all.

Could it be a hardware problem?

GaijinPunch 01-13-2009 10:20 PM

Quote:

I tried to login:

How did you *try* to log in?

Quote:

Could it be a hardware problem?

It most definitely could be. Segfaults generally occur when a program tries to access memory it isn't allowed to touch. My latest debacle was with brand-new hardware: I couldn't compile anything that took more than a few minutes without a segmentation fault. I ran memtest86 for a full day with no errors, then swapped the memory out: voilà, problem solved.

On that note, answer the obvious questions:
1: Has any software changed?
2: Has any hardware changed?

If either is yes, investigate there. If not, as suggested by unSpawn, you should try to boot a LiveCD, mount the drive, and look for hints in the logs. If that doesn't work, you can try swapping out hardware to pinpoint it. As hinted, I would start with memory. It's cheap these days, and easy to do. To really test it, I would compile the kernel in an endless loop. I got a script from somewhere (I forget where... I think the Gentoo forums) that did this and exited when there was an error.
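
A minimal sketch of that kind of loop, for illustration. This is not the original script from the Gentoo forums (which isn't reproduced in this thread); the function name `run_until_failure` and the example build command are assumptions.

```python
import subprocess


def run_until_failure(cmd, max_runs=None):
    """Run cmd repeatedly and stop at the first non-zero exit status.

    Returns (successful_runs, last_return_code). With max_runs=None the
    loop only ends when a run fails. A negative return code means the
    command was killed by a signal (e.g. -11 for SIGSEGV on Linux).
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return runs, result.returncode
        runs += 1
    return runs, 0
```

Hypothetical usage from inside a kernel source tree: `run_until_failure(["sh", "-c", "make clean && make"])`. On healthy hardware the same build should succeed every time, so a failure after some number of clean runs is the interesting result.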

Tinkster 01-14-2009 03:57 AM

Quote:

Originally Posted by landysaccount (Post 3407718)
I can't login at all.

Could it be a hardware problem?

If you didn't have any updates: yes, by all means.
I've seen segfaults as an indicator of both a buggy/dying
chipset and dying RAM.


Cheers,
Tink

landysaccount 01-14-2009 03:05 PM

Hello.

I put the problem aside because I didn't want to stress over it. When I came home, I plugged in the box, turned it on, and it booted up like a charm.

I don't know what magic solved the problem, but I will leave it on to see if the problem happens again. I was thinking of replacing the HD and the RAM and reinstalling Debian to see if it works flawlessly. I can't recreate the problem since I don't know how or when it happened, which is frustrating.

I will take a look at /var/log/messages and some other logs to see if I come across something there.

What do you guys recommend I do with this piece of crap?

Thank you.

landysaccount 01-14-2009 03:27 PM

After looking at my /var/log/messages file I see this:

Jan 12 22:20:01 trahersa-test squid[1719]: Squid Parent: child process 2665 exited due to signal 6
Jan 12 22:20:01 trahersa-test kernel: squid[1719]: segfault at 48100813 ip 48100813 sp bff5dfbc error 4 in libnss_files-2.3.6.so[b7c44000+9000]
Jan 12 22:48:14 trahersa-test -- MARK --
Jan 12 22:54:57 trahersa-test dhcpd: Wrote 5 leases to leases file.
Jan 12 23:08:15 trahersa-test -- MARK --
Jan 12 23:28:15 trahersa-test -- MARK --
Jan 12 23:48:15 trahersa-test -- MARK --
Jan 13 00:08:15 trahersa-test -- MARK --
Jan 13 00:28:15 trahersa-test -- MARK --
Jan 13 00:48:15 trahersa-test -- MARK --
Jan 13 01:08:15 trahersa-test -- MARK --
Jan 13 01:28:15 trahersa-test -- MARK --
Jan 13 01:39:50 trahersa-test dhcpd: Wrote 5 leases to leases file.
Jan 13 02:08:15 trahersa-test -- MARK --
Jan 13 02:18:15 trahersa-test kernel: exim4[1605]: segfault at 81f2b28 ip 0805b3dd sp bfed9f00 error 4 in exim4[8048000+a5000]
Jan 13 02:28:15 trahersa-test -- MARK --
Jan 13 02:48:16 trahersa-test -- MARK --
Jan 13 03:08:16 trahersa-test -- MARK --
Jan 13 03:28:16 trahersa-test -- MARK --
Jan 13 03:48:16 trahersa-test -- MARK --
Jan 13 04:08:16 trahersa-test -- MARK --
Jan 13 04:09:01 trahersa-test kernel: cron[2830]: segfault at 1a ip b7f0c08c sp bff2faac error 4 in libpam.so.0.79[b7f0b000+7000]
Jan 13 04:17:01 trahersa-test kernel: cron[2831]: segfault at 1a ip b7f0c08c sp bff2faac error 4 in libpam.so.0.79[b7f0b000+7000]
Jan 13 04:28:16 trahersa-test -- MARK --
Jan 13 04:39:01 trahersa-test kernel: cron[2832]: segfault at 1a ip b7f0c08c sp bff2faac error 4 in libpam.so.0.79[b7f0b000+7000]
Jan 13 05:02:01 trahersa-test kernel: cron[2833]: segfault at 10000 ip b7f0f48a sp bff2ff20 error 4 in libpam.so.0.79[b7f0b000+7000]
Jan 13 05:09:01 trahersa-test kernel: cron[2834]: segfault at 10000 ip b7f0f48a sp bff2ff20 error 4 in libpam.so.0.79[b7f0b000+7000]
Jan 13 05:17:01 trahersa-test kernel: cron[2835]: segfault at 10000 ip b7f0f48a sp bff2ff20 error 4 in libpam.so.0.79[b7f0b000+7000]
Jan 13 05:28:16 trahersa-test -- MARK --
Jan 13 05:39:01 trahersa-test kernel: cron[2836]: segfault at 10000 ip b7f0f48a sp bff2ff20 error 4 in libpam.so.0.79[b7f0b000+7000]
Jan 13 06:08:16 trahersa-test -- MARK --
Jan 13 06:09:01 trahersa-test kernel: cron[2837]: segfault at 10000 ip b7f0f48a sp bff2ff20 error 4 in libpam.so.0.79[b7f0b000+7000]
Jan 13 06:17:01 trahersa-test kernel: cron[2838]: segfault at 10000 ip b7f0f48a sp bff2ff20 error 4 in libpam.so.0.79[b7f0b000+7000]
Jan 13 06:25:01 trahersa-test kernel: cron[2839]: segfault at 10000 ip b7f0f48a sp bff2ff20 error 4 in libpam.so.0.79[b7f0b000+7000]
Jan 13 06:39:01 trahersa-test kernel: cron[2840]: segfault at 10000 ip b7f0f48a sp bff2ff20 error 4 in libpam.so.0.79[b7f0b000+7000]
Jan 13 07:08:17 trahersa-test -- MARK --
Jan 13 07:09:01 trahersa-test kernel: cron[2841]: segfault at 10000 ip b7f0f48a sp bff2ff20 error 4 in libpam.so.0.79[b7f0b000+7000]
Jan 13 07:17:01 trahersa-test kernel: cron[2842]: segfault at 10000 ip b7f0f48a sp bff2vf20 error 4 in |ibpam.so.0.79[b7f0b000+7000]
Jan 13 07:28:17 trahersa-t<E5>st -- MARK --
Jan 13 07:39:01 trahersa-test kernel: cron[2843]: segfault at 10000 ip b7f0f48a sp bff2ff20 error 4 in libpam.so.0.79[b7f0b000+7000]
Jan 13 08:08:17 trahersa-test -- MARK --
Jan 13 08:0;:01 trahersa-test kernel: cron[2844]: segfault at b7f116ae ip b7f0e420 sp bff30030 error 7 in lybpam.so>0.79[b7f0b000+7000]
Jan 13 08:17:01 trahersa-test kernel: cron[2845]: sugfault at b7f116ae ip b7f0e420 sp bff30030 error 7 in libpam.so.0.79[b7f0b000+7000]
...
...
..
..
Jan 13 11:03:05 trahersa-test kernel: login[1879]: segfault at 0 ip b7ec346d sp bff0e060 error 4 in libpam.so.0.79[b7ebf000+7000]
Jan 13 11:04:49 trahersa-test kernel: login[2857]: segfault at 0 ip b7ed246d sp bfd1c670 error 4 in libpam.so.0.79[b7ece000+7000]
Jan 13 11:04:55 trahersa-test kernel: login[1887]: segfault at 0 ip b7f4146d sp bf98dae0 error 4 in libpam.so.0.79[b7f3d000+7000]
Jan 13 11:04:59 trahersa-test kernel: shutdown[2883]: segfault at bff4327d ip b7e27141 sp bfd1421c error 4 in libc-2.3.6.so[b7dc8000+127000]
Jan 13 11:06:44 trahersa-test kernel: login[2874]: segfault at 0 ip b7eee46d sp bfe38790 error 4 in libpam.so.0.79[b7eea000+7000]
Jan 13 11:06:50 trahersa-test kernel: login[2884]: segfault at 0 ip b7f9d46d sp bfce7e40 error 4 in libpam.so.0.79[b7f99000+7000]
Jan 13 11:06:56 trahersa-test shutdown[2902]: shutting down for system reboot
Jan 13 11:06:56 trahersa-test kernel: rc[2905]: segfault at 3131d807 ip 3131d807 sp bfacdf20 error 4 in libnsl-2.3.6.so[b7e1a000+12000]
Jan 13 11:06:56 trahersa-test kernel: sulogin[2907]: segfault at 9 ip b7e94d11 sp bff7440c error 4 in libc-2.3.6.so[b7de4000+127000]
Jan 13 11:07:03 trahersa-test kernel: sulogin[2929]: segfault at 0 ip b7f89218 sp bfc955f8 error 4 in ld-2.3.6.so[b7f7f000+15000]
Jan 13 11:07:07 trahersa-test kernel: sulogin[2946]: segfault at 0 ip b7fe4466 sp bfeef990 error 6 in ld-2.3.6.so[b7fd9000+15000]
Jan 13 11:07:07 trahersa-test kernel: udevd[2947]: segfault at fc ip ffffe4ab sp bfacfd94 error 6






Can anyone figure something out after seeing that?
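
One detail in the log is worth a closer look: several entries between 07:17 and 08:17 contain odd characters (`bff2vf20`, `|ibpam`, `t<E5>st`, `08:0;:01`, `lybpam.so>0.79`, `sugfault`). The sketch below is an illustration only, not something any poster ran. It compares each odd string with the spelling used on the neighbouring lines and reports which bits differ. The helper names and the "expected" spellings are assumptions, as is reading `<E5>` as the single byte 0xE5.

```python
def bit_diffs(expected: bytes, observed: bytes):
    """List (index, expected_byte, observed_byte, xor_mask) for differing bytes."""
    if len(expected) != len(observed):
        raise ValueError("strings must have equal length")
    return [
        (i, e, o, e ^ o)
        for i, (e, o) in enumerate(zip(expected, observed))
        if e != o
    ]


def is_single_bit(mask: int) -> bool:
    """True when exactly one bit is set in mask."""
    return mask != 0 and mask & (mask - 1) == 0


# (assumed intended text, text as it appears in the log)
PAIRS = [
    (b"bff2ff20", b"bff2vf20"),
    (b"libpam", b"|ibpam"),
    (b"trahersa-test", b"trahersa-t\xe5st"),
    (b"08:09:01", b"08:0;:01"),
    (b"libpam.so.0.79", b"lybpam.so>0.79"),
    (b"segfault", b"sugfault"),
]

for expected, observed in PAIRS:
    for index, e, o, mask in bit_diffs(expected, observed):
        print(f"{observed!r}[{index}]: {e:#04x} -> {o:#04x}, "
              f"xor {mask:#04x}, single bit: {is_single_bit(mask)}")
```

Under those assumptions every difference is a single flipped bit (mask 0x10 in five places, 0x02 and 0x80 in the other two). Damage like that in text on its way to the log file would be consistent with failing RAM or other hardware rather than with a software bug.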

Tinkster 01-14-2009 04:23 PM

Open the case and remove dust from the motherboard ;}

Could be thermal issues that cause RAM (and/or chips) to
start failing after some time of running.



Cheers,
Tink

landysaccount 01-14-2009 06:11 PM

This is a brand-new system and it's clean: no dust.

Tinkster 01-14-2009 06:58 PM

In that case it's a thermal fault of some sort; take it to the shop
and have them check it. If it were a software-related fault it would
be consistent, not fine at boot and broken later.

landysaccount 01-15-2009 06:50 PM

Ok. It happened again. I left the system on and today, approximately 24 hours later, I got a segfault. This time it was different. I was logged in with PuTTY and that connection didn't close. I was able to do other things on the system, but when I tried to log in with another PuTTY session I just couldn't. It let me type the username and password, and after I pressed Enter it closed the window.

I rebooted and it came up all right.

While it was booting I noticed this message:
rtc_cmos
rtc0: alarm up to one day.


I don't know what that means. I googled it and it's something about acpid, which I don't have installed, and ACPI is also disabled in the BIOS.

What else might be causing this problem? It looks like hardware. Could it be a thermal issue like Tinkster mentioned?

GaijinPunch 01-15-2009 07:52 PM

I would highly recommend replacing the memory. It's cheap and easy to do. If that's not it, the worst case is that you have some extra memory (which in most cases you can use somewhere). This definitely points to faulty hardware of some sort. As pointed out, software-related segfaults are consistent; this is very inconsistent, and from your log, different applications are segfaulting.
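
To make the "different applications" point concrete, here is a small sketch that tallies segfault lines per program and per object the kernel named. It is an illustration under assumptions: `FAULT_RE`, `tally_faults` and `SAMPLE` are made-up names, and the sample is a handful of lines copied from the log posted above, not the whole file.

```python
import re
from collections import Counter

# Program name, then (optionally) the object the kernel reported the fault in.
FAULT_RE = re.compile(
    r"(?P<proc>[\w.-]+)\[\d+\]: segfault at .*? error \d+"
    r"(?: in (?P<obj>[^\[\s]+)\[)?"
)


def tally_faults(lines):
    """Count segfault lines per (program, object) pair."""
    counts = Counter()
    for line in lines:
        m = FAULT_RE.search(line)
        if m:
            counts[(m.group("proc"), m.group("obj") or "(unknown)")] += 1
    return counts


SAMPLE = [
    "Jan 12 22:20:01 trahersa-test kernel: squid[1719]: segfault at 48100813 ip 48100813 sp bff5dfbc error 4 in libnss_files-2.3.6.so[b7c44000+9000]",
    "Jan 13 02:18:15 trahersa-test kernel: exim4[1605]: segfault at 81f2b28 ip 0805b3dd sp bfed9f00 error 4 in exim4[8048000+a5000]",
    "Jan 13 04:09:01 trahersa-test kernel: cron[2830]: segfault at 1a ip b7f0c08c sp bff2faac error 4 in libpam.so.0.79[b7f0b000+7000]",
    "Jan 13 04:17:01 trahersa-test kernel: cron[2831]: segfault at 1a ip b7f0c08c sp bff2faac error 4 in libpam.so.0.79[b7f0b000+7000]",
    "Jan 13 04:28:16 trahersa-test -- MARK --",
    "Jan 13 11:07:07 trahersa-test kernel: udevd[2947]: segfault at fc ip ffffe4ab sp bfacfd94 error 6",
]

for (proc, obj), n in tally_faults(SAMPLE).most_common():
    print(f"{n:3d}  {proc:8s} {obj}")
```

Pointing it at the real file (for example `tally_faults(open("/var/log/messages"))`) would give the full picture. Even this sample shows four unrelated programs faulting in four different places, which a single software bug is unlikely to explain.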

Interested to hear the outcome.

landysaccount 01-16-2009 12:14 PM

Ok.

I took the box to the shop and the technician there recommended replacing the CPU and fan, which they did for free. I will let the box run for a while to see if the problem happens again. If it does, I will replace the memory.

I will keep you posted.

landysaccount 01-16-2009 12:17 PM

But what does this really mean, anyway:

rtc_cmos
rtc0: alarm up to one day.


All times are GMT -5.