LinuxQuestions.org


arfon 02-16-2009 05:22 PM

Debian box fell down go boom...
 
We have a Debian box and when it boots in Single-user (rescue mode), it comes up fine.

When it boots normally, udev and several RC scripts segfault. The logs are useless, I've checked messages, debug, dmesg and kern.log and NONE recorded the problems.

Damnedest thing I have ever seen.

Anyway, this box has been running for over a year and has a lot of data on it...

Is there any way to repair the Debian installation without touching anything else?

Is there a magic apt-get command???

stress_junkie 02-16-2009 07:35 PM

I don't know how you would fix the operating system before you know the cause of the problem. Have you considered that you could have a hardware failure? The segfault says that a process is making an invalid request to access memory. The fact that this never happens when networking is turned off says that maybe a NIC is broken.

Try running memtest86 for a quick pass/fail. Also consider swapping a known good NIC into the machine.

unSpawn 02-16-2009 07:37 PM

Besides, I thought you opted for reinstalling from scratch, then you just stopped replying, now a new thread? What gives?

arfon 02-16-2009 10:12 PM

Quote:

I don't know how you would fix the operating system before you know the cause of the problem. Have you considered that you could have a hardware failure? The segfault says that a process is making an invalid request to access memory. The fact that this never happens when networking is turned off says that maybe a NIC is broken.
Not a HW problem, I swapped the disc into a known good box and it does the same thing. The HW also boots and runs a Live CD just fine.


Quote:

Besides, I thought you opted for reinstalling from scratch, then you just stopped replying, now a new thread? What gives?
I was until we found out that we can't just re-install. The company that made this server will not support it and wants us to buy a whole new box. I can get the OS on it but not the specialized applications that are also on it.

If there's a way to re-install the base Debian packages OR upgrade the packages, it will probably work.

If this was a Slackware box, I'd slap the install CDs in and do a base install (since I'm familiar with Slackware). I don't know how to do this with Debian.

stress_junkie 02-17-2009 07:57 AM

Quote:

Originally Posted by arfon (Post 3446309)
Not a HW problem, I swapped the disc into a known good box and it does the same thing. The HW also boots and runs a Live CD just fine.

Okay. I agree it's not hardware. But having said that I think that you are SOL. I hate to say it.

You could try doing an apt-get upgrade but your application may depend on keeping the same version of Debian.
Code:

apt-get update
apt-get upgrade

I'd make an image of the system first using partimage. But, as I said, I doubt that this is the answer.

Maybe this is an opportunity for your business to find a new vendor for this application. You're already hosed so the worst thing that could happen already has happened.

rweaver 02-17-2009 08:14 AM

It sounds like at least some of your shared libraries are corrupted. Updating to a newer version of them would likely fix the problem, but if this machine is running specific applications from a vendor, then you're tied to whatever versions they're using and a change would likely break the system. I would be far less concerned about the machine and far more concerned about the vendor refusing to support an item they deployed.

You can force installation of packages that are already installed. Get a list of installed packages with dpkg -l, and force the reinstall with apt-get install --reinstall packagename. I'd try to isolate which programs were segfaulting, ldd them, find the related packages, and force a reinstall of those.
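The ldd-then-reinstall approach described above can be sketched roughly as follows. This is only an illustration, not a tested recovery procedure: the helper names ldd_paths and owning_packages are made up for this example, and it assumes dpkg -S can resolve each path exactly as ldd prints it.

```shell
#!/bin/sh
# Hypothetical helper: read `ldd` output on stdin and print only the
# absolute library paths it mentions.
ldd_paths() {
    # Typical lines: "libc.so.6 => /lib/libc.so.6 (0x...)"
    #            or: "/lib/ld-linux.so.2 (0x...)"
    awk '$2 == "=>" && $3 ~ /^\// { print $3; next } $1 ~ /^\// { print $1 }'
}

# Hypothetical helper: list the packages that own a binary and the
# shared libraries it links against.
owning_packages() {
    { echo "$1"; ldd "$1" | ldd_paths; } |
        while read -r path; do
            # dpkg -S prints "package: /path" for files it knows about.
            dpkg -S "$path" 2>/dev/null
        done | cut -d: -f1 | sort -u
}

# Example (only prints the commands, so they can be reviewed first):
# for pkg in $(owning_packages /sbin/udevd); do
#     echo apt-get install --reinstall "$pkg"
# done
```

Printing the commands instead of running them leaves room to check the package list against whatever the vendor's applications depend on before anything is changed.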

arfon 02-17-2009 08:18 AM

I'm thinking the "apt-get update" may work. I am surprised that Debian has no "apt-get repair" feature.


I'll try it today and let you all know what happens.

rweaver 02-17-2009 11:23 AM

Quote:

Originally Posted by arfon (Post 3446898)
I'm thinking the "apt-get update" may work. I am surprised that Debian has no "apt-get repair" feature.


I'll try it today and let you all know what happens.

Just make sure you have good backups... and you might want to make sure your /etc/apt/sources.list isn't set to "stable", "testing" or "unstable" (rather than a release codename) unless you're sure you want to upgrade releases. This is especially an issue since lenny just hit stable.
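A quick way to check for this before running an upgrade might look like the sketch below. The function name suite_aliases is invented for this example, and it assumes the classic one-line "deb URI suite components" format with the suite in the third field.

```shell
#!/bin/sh
# Hypothetical helper: print deb/deb-src lines whose suite is an alias
# (stable, testing, unstable, oldstable) instead of a release codename.
# $1 = sources file to check (defaults to /etc/apt/sources.list).
suite_aliases() {
    awk '$1 == "deb" || $1 == "deb-src" {
        suite = $3
        sub(/[^a-z].*/, "", suite)   # "stable/updates" -> "stable"
        if (suite == "stable" || suite == "testing" ||
            suite == "unstable" || suite == "oldstable") print
    }' "${1:-/etc/apt/sources.list}"
}

# Example:
# suite_aliases /etc/apt/sources.list
```

Any line it prints follows whatever release currently carries that alias, so an apt-get upgrade after a new stable release would start pulling packages from the new release.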

ebmi 02-19-2009 01:52 AM

apt-get does kinda have a 'repair' feature. I've used it a couple times when my hard drive on my old computer decided to lose some bits and corrupt a few files. The command to use is

Code:

# apt-get install --reinstall packagename
The only problem is you have to know which file belongs to which package. You can find this out with the apt-file command,

Code:

# apt-file search file_name
This will return the package(s) that contain the given file name.

EDIT:
It may also be useful to know that the above apt-get command can miss conf files, i.e. files in the /etc directory. In order to restore conf files that have gone missing, you need to pass an option to dpkg through apt-get like this

Code:

# apt-get -o DPkg::Options::="--force-confmiss" install --reinstall packagename
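For files that are already installed, dpkg -S /path/to/file also reports the owning package without needing apt-file. To find out which files are corrupted in the first place, one option is to compare them against the checksum lists that many packages leave in /var/lib/dpkg/info. The sketch below is an assumption-laden illustration: failed_files is a made-up helper name, not every package ships an .md5sums list, and the debsums package (if it is installed) does the same job more thoroughly.

```shell
#!/bin/sh
# Hypothetical helper: print the files from a dpkg-style md5sums list
# that fail verification.
# $1 = path to the list, $2 = root the listed paths are relative to (default /).
failed_files() {
    # Make the list path absolute, because we change directory below.
    list=$(cd "$(dirname "$1")" && pwd)/$(basename "$1")
    ( cd "${2:-/}" && LC_ALL=C md5sum -c "$list" 2>/dev/null ) |
        sed -n 's/: FAILED.*$//p'
}

# Example: name every package that has at least one damaged file.
# for list in /var/lib/dpkg/info/*.md5sums; do
#     [ -n "$(failed_files "$list")" ] && basename "$list" .md5sums
# done
```

Each package name printed by the loop is a candidate for the apt-get install --reinstall command shown above.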

nx5000 02-19-2009 03:53 AM

What does ldd `which udevd` give?

Are you able to reproduce the segfault manually?
If yes, can you run it through strace?

Quote:

The logs are useless, I've checked messages, debug, dmesg and kern.log and NONE recorded the problems.
Probably because your disks are not yet mounted read-write at that point in the boot.
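If the segfault can be reproduced by hand, a trace could be captured and trimmed down to the last few system calls before the crash, roughly as sketched here. The helper name before_segv and the trace path are made up for this example, and it assumes GNU grep and a writable location for the trace file.

```shell
#!/bin/sh
# Hypothetical helper: print the N lines leading up to (and including)
# the first SIGSEGV recorded in a strace output file.
# $1 = trace file, $2 = number of context lines (default 10).
before_segv() {
    grep -m 1 -B "${2:-10}" 'SIGSEGV' "$1"
}

# Example, run where the fault can be reproduced:
# strace -f -o /tmp/udevd.trace `which udevd`
# before_segv /tmp/udevd.trace 15
```

The files opened or mapped just before the signal are often a good hint as to which library or package to reinstall.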


All times are GMT -5.