LinuxQuestions.org
Old 02-02-2009, 10:38 AM   #31
aaroman
LQ Newbie
 
Registered: Nov 2003
Location: Romania
Distribution: Fedora 8
Posts: 6

Rep: Reputation: 0

Does anybody have a solution to this issue? I'm facing the same thing, and I can tell that it's not a power issue. I also tried more than one kernel version and stripped a lot of things out of the kernel...
 
Old 02-02-2009, 10:51 AM   #32
dgermann
Member
 
Registered: Aug 2004
Distribution: Ubuntu 8.04.1 desk; Red Hat 9.0 server
Posts: 296

Original Poster
Rep: Reputation: 30

aaroman--

What I have found works for me is to run fsck on the drive. The easiest way I have found to do that is to run this:
Code:
doug@doug2:~$ sudo tune2fs -c 1 /dev/sda1
Reboot (this forces a file system check on the way up), then run this:
Code:
doug@doug2:~$ sudo tune2fs -c 17 /dev/sda1
Then whenever this happens I do the whole process all over again. That is not a solution to the root cause, but it at least lets me get on with my work for another month or two.
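
For anyone following along, here is a minimal sketch of how the effect of the two tune2fs calls could be checked. `tune2fs -l` lists the superblock fields, including "Mount count" and "Maximum mount count". The helper name `mount_counts` is made up for illustration, and /dev/sda1 is simply the device used in the commands above.

```shell
# Hypothetical helper: pick the two mount-count fields out of `tune2fs -l` output.
# It reads the listing on stdin, so it also works on saved output.
mount_counts() {
    grep -E '^(Mount count|Maximum mount count):'
}

# Typical use (needs root; /dev/sda1 is the device from the post above):
#   sudo tune2fs -l /dev/sda1 | mount_counts
```

With the maximum set to 1, the file system counts as due for a check at the next boot. Setting it back to 17 afterwards returns to a check every 17 mounts.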

There is an extensive thread on all that has been suggested and tried, here: http://ubuntuforums.org/showthread.php?t=970006

Please let us know what happens for you when you try this!
 
Old 02-02-2009, 12:35 PM   #33
aaroman
LQ Newbie
 
Registered: Nov 2003
Location: Romania
Distribution: Fedora 8
Posts: 6

Rep: Reputation: 0
The computer is a node in a cluster (the problem nodes have quad processors, 8 or 4 GBytes of RAM, a mainboard with integrated video and LAN, and that's about it).
I had the problem with almost all of those nodes, but it seems that the problem is "fixed" for all of them (9 days up and running so far) except one, which seems to run for a few days before the reboots start again. At first it was hard to stop, but now simply rebooting the node manually seems to fix the problem for a while.
I couldn't find the cause. I suspected something with crond, but that wasn't the case. I recompiled the kernel, leaving out everything unnecessary. I actually tried a couple of 2.6 kernel versions, and they behaved the same way. I might be wrong, but it seems that at least in some cases the reboots started after the node got a DHCPOFFER, so it might be something related to the network card (e1000e driver, if it matters).
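
One way to test the DHCPOFFER hunch would be to pull the dhclient messages out of a log and compare their timestamps with the reboot times. The sketch below is only that, a sketch: the helper name `dhcp_events` is hypothetical, the default log path is an assumption that varies between distributions, and on a diskless node the local log is gone after a reboot, so it assumes the nodes forward syslog to the server.

```shell
# Hypothetical helper: list DHCP client messages from a syslog-style file.
# $1 is the log file; /var/log/messages is an assumed default, not a given.
dhcp_events() {
    grep -E 'DHCP(DISCOVER|OFFER|REQUEST|ACK|NAK)' "${1:-/var/log/messages}"
}

# Typical use on the log server:
#   dhcp_events /var/log/messages | tail -n 20
```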

Since this is a diskless cluster, I tried creating a fresh ramdisk, enlarged it from 32M to 64M, and disabled running fsck on it (with tune2fs). Along with recompiling a smaller kernel, this seems to work partially: the problem does not appear as often, or on as many nodes, as before. It looks like the fsck thing won't help me, because the nodes load a fresh filesystem from the server at each reboot.

There are also other nodes with hyperthreading Pentium processors, which seem to work with no issue whatsoever.

Last edited by aaroman; 02-02-2009 at 12:39 PM.
 
Old 02-02-2009, 01:46 PM   #34
PTrenholme
Senior Member
 
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,149

Rep: Reputation: 330
Are you running a 64-bit OS or just an expanded memory optioned 32-bit one? I ask because the "High Precision Event Timer" in my AMD 64-bit processor does not seem to respond to some events as well as it should, causing excessive waits. That might be (but probably isn't) your problem, since the periodicity of your problem would be unlikely for an event timeout.

In fact, your comment about a possible network relation prompts me to ask, "Does your system make a DHCP connection when it boots with a 24 hour lease? What happens when the lease expires?" An expired lease shouldn't trigger a reboot, but perhaps you have a process using the connection when the lease expires that is triggering a reboot.
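
The lease question can be answered from the client side, since dhclient records the renew, rebind and expire times of each lease in its leases file. A minimal sketch follows; the helper name `lease_times` is hypothetical, and the path in the usage line is an assumption that may not match a given distribution.

```shell
# Hypothetical helper: show the renew/rebind/expire times recorded by dhclient.
# $1 is the leases file; its location differs between distributions.
lease_times() {
    grep -E '^[[:space:]]*(renew|rebind|expire) ' "$1"
}

# Typical use (path is an assumption):
#   lease_times /var/lib/dhclient/dhclient.leases
```

If the reboots line up with the renew or expire times, that would point at the DHCP client or the network driver rather than the file system.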
 
Old 02-02-2009, 01:52 PM   #35
dgermann
Member
 
Registered: Aug 2004
Distribution: Ubuntu 8.04.1 desk; Red Hat 9.0 server
Posts: 296

Original Poster
Rep: Reputation: 30

PTrenholme--

Mine is a 32-bit machine, and my DHCP lease with Comcast has been in effect for well over 6 months.

aaroman--

What if you did the fsck on the server, since it clones its sessions to the diskless machines?
 
Old 02-02-2009, 02:07 PM   #36
aaroman
LQ Newbie
 
Registered: Nov 2003
Location: Romania
Distribution: Fedora 8
Posts: 6

Rep: Reputation: 0
Mine is 64 bits. Yes, the lease time is 24 hours. I don't think there is a process that triggers a reboot when the lease is renewed, unless it's dhclient itself, or even the network driver.

dgermann: Well, I actually made a fresh file system to be served to the nodes. There shouldn't be any problem with it. Besides that, the other nodes work OK with the same file system.
There are some directories from the server which are mounted on the clients, but again, most nodes work OK; only some (right now only one) have the issue. If the issue were in the file system, either in the RAM disk or on the server, it should appear on all nodes, since they are practically identical.

This thing seems to start at random, but once it starts, the node reboots every hour, with pretty good precision. That doesn't look like a file system issue. It looks more like a watchdog of some sort.
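
The "every hour, with pretty good precision" observation can be put into numbers. Here is a minimal sketch that assumes the boot times have been collected as epoch seconds, one per line and oldest first (for example by having each node append the output of `date +%s` at boot to a file on one of the server-mounted directories). Both that collection step and the helper name `boot_gaps` are assumptions for illustration.

```shell
# Hypothetical helper: read boot times as epoch seconds (one per line, oldest
# first) on stdin and print the gap between consecutive boots in whole minutes.
boot_gaps() {
    awk 'NR > 1 { printf "%d\n", ($1 - prev) / 60 } { prev = $1 }'
}

# Typical use (boots.txt is a made-up file name):
#   boot_gaps < boots.txt
```

A run of values very close to 60 would support the watchdog theory. Checking `lsmod` for a loaded watchdog driver (module names often end in `_wdt`, or `softdog` for the software one) would be a natural next step.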
 
Old 02-02-2009, 09:19 PM   #37
dgermann
Member
 
Registered: Aug 2004
Distribution: Ubuntu 8.04.1 desk; Red Hat 9.0 server
Posts: 296

Original Poster
Rep: Reputation: 30

aaroman--

Yup, it's sure baffling. But it is good to know there are others having the same issue--it proves we're not crazy--or at least I will believe it is proof of such!

What OS are you using? Mine is Ubuntu 8.04.1.

The idea for the fsck came from someone running Red Hat or the free version of it, I forget the name....
 
Old 02-03-2009, 07:28 AM   #39
farslayer
Guru
 
Registered: Oct 2005
Location: Willoughby, Ohio
Distribution: linuxdebian
Posts: 7,231
Blog Entries: 5

Rep: Reputation: 189
Since the reboot occurs 60 minutes after power-on, could it somehow be related to power management? For example, the system starts to put something to sleep, and the machine basically just falls over at that point? Since you are running fsck on the drives, that would indicate it didn't do a soft reboot, but went down hard.
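
Whether the machine went down hard can also be checked from the login records: `last -x` lists boot records together with shutdown entries, and a clean reboot leaves a "shutdown" record while a hard reset does not. The sketch below simply counts the two kinds of record; the helper name `boot_vs_shutdown` is hypothetical.

```shell
# Hypothetical helper: count boot and clean-shutdown records in `last -x`
# output read from stdin.
boot_vs_shutdown() {
    awk '$1 == "reboot"   { boots++ }
         $1 == "shutdown" { downs++ }
         END { printf "boots=%d clean_shutdowns=%d\n", boots, downs }'
}

# Typical use:
#   last -x | boot_vs_shutdown
```

Many more boots than clean shutdowns would fit the "went down hard" reading.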

Have you tried disabling APM or ACPI, or checking for a BIOS update for the motherboard?

Also odd that it occurs for a day or so, then works fine for a month or two in between.

Yep it's a shot in the dark, but it looks like everything else you've tried has been so far as well..

Gotta love a mystery... or not...
 
Old 02-04-2009, 12:35 AM   #40
arnuld
Member
 
Registered: Dec 2005
Location: Punjab (INDIA)
Distribution: Arch
Posts: 209

Rep: Reputation: 30
Quote:
Originally Posted by dgermann
PTrenholme--
Yours is a logical deduction. Unfortunately, I blew the execution of what you suggested.

I do not know anything about aptitude and have never used it before. Don't understand its screens. I ran update manager and even had it check for updates, but it found none. I then ran synaptic and had it check for updates and it did not highlight any.

If you don't know then learn. It is the Linux way of doing things. Forget Synaptic, use apt-get.
 
Old 02-04-2009, 04:06 AM   #41
aaroman
LQ Newbie
 
Registered: Nov 2003
Location: Romania
Distribution: Fedora 8
Posts: 6

Rep: Reputation: 0
Well, we're using some derivative of Red Hat, I think... it's Fermi Linux. It's lightweight; we don't need fancy things on the nodes, as they are used for computation only.
On the server it's CentOS.

Yes, I tried to disable APM and ACPI, but that way for some reason it works with one processor only. I should start disabling one thing at a time.
I'll try a newer kernel, I'll even look for a new BIOS if needed, and if the problem is solved I'll report back.
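
When disabling one thing at a time, it helps to confirm which switches the running kernel was actually booted with. The kernel documents boot parameters such as `acpi=off`, `apm=off`, and the less drastic `acpi=noirq` and `pci=noacpi`. The one-processor symptom with ACPI fully off is likely because the kernel then no longer reads the CPU list from the ACPI tables. A minimal sketch, with the hypothetical helper name `cmdline_has`:

```shell
# Hypothetical helper: succeed if parameter $1 appears as a whole word in the
# kernel command line passed as $2 (normally the contents of /proc/cmdline).
cmdline_has() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Typical use on a running node:
#   cmdline_has acpi=off "$(cat /proc/cmdline)" && echo "ACPI is off"
```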
 
Old 02-04-2009, 08:36 PM   #42
dgermann
Member
 
Registered: Aug 2004
Distribution: Ubuntu 8.04.1 desk; Red Hat 9.0 server
Posts: 296

Original Poster
Rep: Reputation: 30
aaroman--

Please let us know.

Thanks!
 
Old 02-05-2009, 12:03 AM   #43
arnuld
Member
 
Registered: Dec 2005
Location: Punjab (INDIA)
Distribution: Arch
Posts: 209

Rep: Reputation: 30
I wonder why, even after such frustrating problems, people still use Ubuntu, the Windows of Linux culture (sorry, I couldn't resist).

Try using Arch (text-based configuration and package management system), Gentoo (the only meta-distribution, as Larry the Cow said) or Debian (rock solid).
 
Old 02-05-2009, 11:48 AM   #44
PTrenholme
Senior Member
 
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,149

Rep: Reputation: 330
I believe "Ubuntu" is, mostly, a repackaging of "Debian testing" with some additional "non-free" repositories containing programs that might not be legally installed in countries that permit software copyrights to be enforced.
 
Old 02-05-2009, 12:10 PM   #45
aaroman
LQ Newbie
 
Registered: Nov 2003
Location: Romania
Distribution: Fedora 8
Posts: 6

Rep: Reputation: 0
As you can see, I have the same issue, and there's no Ubuntu or Debian anywhere near the cluster. I kind of suspect something related to the kernel.
 
  

