LinuxQuestions.org
Linux - Software This forum is for Software issues.
Having a problem installing a new program? Want to know which application is best for the job? Post your question in this forum.

08-27-2010, 12:19 AM #1
centguy
Member

Registered: Feb 2008
Posts: 627
Blog Entries: 1

Reputation: 48
Linux turned off by itself. Suspect overheating


I have an office workstation that I installed Linux on, and I ran
an 8-core job on it. The job was expected to finish in 6 days,
but it stopped after 2 days: when I came to the office this morning,
the workstation was completely turned off for some reason.

Since the air conditioner in the office is turned off overnight, I suspect
the temperature gets too high for the CPU. Is there a script to
output the temperature to a file periodically so that I can pinpoint the
exact cause of the shutdown?

Thanks!
 
08-27-2010, 12:59 AM #2
14moose
Member

Registered: May 2010
Posts: 83

Reputation: Disabled
Hi -

You're exactly right: excessive heat *can* (and will) cause a PC to shut down or reboot.

So can the wrong UPS. For example, we had an HP ProLiant D22 with what I thought was a more-than-adequate APC UPS. But this particular server would spontaneously turn itself off, intermittently, because of this problem (sensitivity to "step-approximated sine wave" output):

http://social.microsoft.com/forums/e...-90aed04c8231/

http://bizsupport1.austin.hp.com/biz...ctID=c01669175

Both of these issues (power waveform and system temperature) are hardware related, and really need to be addressed as such.

Having said all that: yes, there is a way to monitor your system's temperature with a script, using "lm_sensors":

http://www.linuxjournal.com/article/6712

http://www.tech-faq.com/how-to-monit...mperature.html

http://www.lm-sensors.org/wiki/UsefulLinks

'Hope that helps!

Last edited by 14moose; 08-27-2010 at 01:00 AM.
 
1 member found this post helpful.
08-27-2010, 05:05 AM #3
centguy
Member

Registered: Feb 2008
Posts: 627

Original Poster
Blog Entries: 1

Reputation: 48
Hey. I would appreciate more explicit help rather than having to hunt down each alternative. I appreciate your comment!
 
08-27-2010, 05:36 AM #4
i92guboj
Gentoo support team

Registered: May 2008
Location: Lucena, Córdoba (Spain)
Distribution: Gentoo
Posts: 4,083

Reputation: 405
If you have lm_sensors installed, something like this should suffice:

Code:
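#!/bin/sh
# Append a timestamp and the current sensor readings to a log file
# every 10 seconds, until the script is killed.

while :; do
  date >> /root/sensors.txt      # when the sample was taken
  sensors >> /root/sensors.txt   # current lm_sensors readings
  sync                           # flush to disk before a possible crash
  sleep 10s
done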
The date might not be relevant to you, or you may want to tweak its format to show just the hour; your call.

The sync command will ensure that everything gets written, since you don't know when the system will hang or shut down.

You might also want to tune the sleep time and the path to the log file.
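Following those suggestions, here is a minimal sketch of the same logger with the log path, the interval and the number of samples passed in as arguments instead of being hard-coded. The function name `log_sensors` and its argument order are made up for illustration; it still assumes lm_sensors' `sensors` command is installed and already reports readings.

```shell
#!/bin/sh
# Hypothetical variant of the logger above with the log path, interval
# and sample count passed in as arguments.
# Assumes lm_sensors' `sensors` command is installed and working.

# Usage: log_sensors LOGFILE [INTERVAL_SECONDS] [SAMPLES]
# SAMPLES=0 (the default) means "loop until killed".
log_sensors() {
  logfile=$1
  interval=${2:-10}
  samples=${3:-0}
  n=0
  while [ "$samples" -eq 0 ] || [ "$n" -lt "$samples" ]; do
    {
      date '+%Y-%m-%d %H:%M:%S'   # full timestamp; '+%H:%M' gives just the time of day
      sensors                     # current readings from lm_sensors
      echo                        # blank line between samples
    } >> "$logfile"
    sync                          # flush to disk in case the machine dies next
    n=$((n + 1))
    sleep "$interval"
  done
}
```

To run it unattended you could end the script with a call such as `log_sensors /root/sensors.txt 10` and start it with `nohup sh templog.sh &` (the file name is only an example).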
 
1 member found this post helpful.
08-27-2010, 11:24 PM #5
centguy
Member

Registered: Feb 2008
Posts: 627

Original Poster
Blog Entries: 1

Reputation: 48
Thanks i92guboj!

I installed lm_sensors with yum, and I don't know what to do with this:

Quote:
[root@centos52-64-dell ~]# sensors
No sensors found!
Make sure you loaded all the kernel drivers you need.
Try sensors-detect to find out which these are.
[root@centos52-64-dell ~]# sensors-detect
# sensors-detect revision 5291 (2008-06-23 23:40:46 -0700)

This program will help you determine which kernel modules you need
to load to use lm_sensors most effectively. It is generally safe
and recommended to accept the default answers to all questions,
unless you know what you're doing.

We can start with probing for (PCI) I2C or SMBus adapters.
Do you want to probe now? (YES/no):
Probing for PCI bus adapters...
Use driver `i2c-i801' for device 0000:00:1f.3: Intel ICH9

We will now try to load each adapter module in turn.
Module `i2c-i801' already loaded.
If you have undetectable or unsupported I2C/SMBus adapters, you can have
them scanned by manually loading the modules before running this script.

We are now going to do the I2C/SMBus adapter probings. Some chips may
be double detected; we choose the one with the highest confidence
value in that case.
If you found that the adapter hung after probing a certain address,
you can specify that address to remain unprobed.

Next adapter: SMBus I801 adapter at 0500 (i2c-0)
Do you want to scan it? (YES/no/selectively):
Client found at address 0x50
Probing for `Analog Devices ADM1033'... No
Probing for `Analog Devices ADM1034'... No
Probing for `SPD EEPROM'... Yes
(confidence 8, not a hardware monitoring chip)
Probing for `EDID EEPROM'... No
Client found at address 0x52
Probing for `Analog Devices ADM1033'... No
Probing for `Analog Devices ADM1034'... No
Probing for `SPD EEPROM'... Yes
(confidence 8, not a hardware monitoring chip)
Probing for `EDID EEPROM'... No

Some chips are also accessible through the ISA I/O ports. We have to
write to arbitrary I/O ports to probe them. This is usually safe though.
Yes, you do have ISA I/O ports even if you do not have any ISA slots!
Do you want to scan the ISA I/O ports? (YES/no):
Probing for `National Semiconductor LM78' at 0x290... No
Probing for `National Semiconductor LM78-J' at 0x290... No
Probing for `National Semiconductor LM79' at 0x290... No
Probing for `Winbond W83781D' at 0x290... No
Probing for `Winbond W83782D' at 0x290... No

Some Super I/O chips may also contain sensors. We have to write to
standard I/O ports to probe them. This is usually safe.
Do you want to scan for Super I/O sensors? (YES/no):
Probing for Super-I/O at 0x2e/0x2f
Trying family `National Semiconductor'... No
Trying family `SMSC'... No
Trying family `VIA/Winbond/Fintek'... No
Trying family `ITE'... Yes
Found `ITE IT8718F Super IO Sensors' Success!
(address 0x290, driver `it87')
Probing for Super-I/O at 0x4e/0x4f
Trying family `National Semiconductor'... No
Trying family `SMSC'... No
Trying family `VIA/Winbond/Fintek'... No
Trying family `ITE'... No

Some south bridges, CPUs or memory controllers may also contain
embedded sensors. Do you want to scan for them? (YES/no):
Silicon Integrated Systems SIS5595... No
VIA VT82C686 Integrated Sensors... No
VIA VT8231 Integrated Sensors... No
AMD K8 thermal sensors... No
AMD K10 thermal sensors... No
Intel Core family thermal sensor... Success!
(driver `coretemp')
Intel AMB FB-DIMM thermal sensor... No

Now follows a summary of the probes I have just done.
Just press ENTER to continue:

Driver `it87' (should be inserted):
Detects correctly:
* ISA bus, address 0x290
Chip `ITE IT8718F Super IO Sensors' (confidence: 9)

Driver `coretemp' (should be inserted):
Detects correctly:
* Chip `Intel Core family thermal sensor' (confidence: 9)

Do you want to overwrite /etc/sysconfig/lm_sensors? (YES/no):
Starting lm_sensors: No sensors found!
Make sure you loaded all the kernel drivers you need.
Try sensors-detect to find out which these are.
[FAILED]
[root@centos52-64-dell ~]# sensors
No sensors found!
Make sure you loaded all the kernel drivers you need.
Try sensors-detect to find out which these are.
[root@centos52-64-dell ~]#
 
08-28-2010, 12:40 AM #6
14moose
Member

Registered: May 2010
Posts: 83

Reputation: Disabled
Hi -

1. In the first post, we tried to:
a) answer your original question: "Yes, there are tools to monitor hardware status and yes, they are scriptable"

b) Suggest at least one other possible cause BESIDES temperature that you might want to be aware of (there are, of course, many other possibilities, too)

c) Give you enough information to at least start off on your quest

d) Remind you that monitoring is good, but prevention is better. lm_sensors isn't going to do any good, for example, unless somebody moves the PC to either a refrigerated environment (ideal) or at least a reasonably cool, well-ventilated room. You might also want to consider opening the PC up and blowing the dust out. Couldn't hurt.

2. Whatever you do with lm_sensors, you need to approach it with "eyes wide open". Again, the links I cited should help you interpret the data.

3. Your first "data interpretation question" appears to be "Did the sucker even install correctly? Is it configured?"

By the looks of your message, the answer appears to be "No".

Q: Did you try running "sensors-detect"? What happened?

Q: Did you try running "sensors"? What happened?

Q: What's in your /etc/sysconfig/lm_sensors config file (which, I believe, replaces the "/etc/sensors.conf" file mentioned in the 2003 Linux Journal article I linked above)?

Q: Did you get a chance to scan the LJ article? Does it look like some of the information (in that version of lm_sensors) might be useful to your current scenario?

Q: Do you see anything in the system log (/var/log/messages)?

Thank you in advance!
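On the system-log question, a quick check is to search /var/log/messages for lines that tend to accompany overheating. This is only a sketch: the function name `find_thermal_events` and the keyword list are assumptions, not anything lm_sensors provides. Also note that if the motherboard's own protection cut the power, the kernel may not have had a chance to log anything, so an empty result does not rule out heat.

```shell
#!/bin/sh
# Sketch: list syslog lines that often accompany overheating or an
# unclean power-off. The keyword list is an assumption; widen or
# narrow it to match what your kernel actually logs.

# Hypothetical helper. Usage: find_thermal_events [LOGFILE]
find_thermal_events() {
  log=${1:-/var/log/messages}          # CentOS default syslog file
  grep -iE 'thermal|temperature|overheat|throttl|critical|shutdown' "$log"
}
```

For example, `find_thermal_events` checks the current log, and `find_thermal_events /var/log/messages.1` would check the previous rotated copy, if your system keeps one under that name.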
 
08-28-2010, 01:10 AM #7
i92guboj
Gentoo support team

Registered: May 2008
Location: Lucena, Córdoba (Spain)
Distribution: Gentoo
Posts: 4,083

Reputation: 405
You might need to configure your sensors first. That can vary from distro to distro, so I suggest you check the docs, wikis, etc. for your distro. There must be some easy guide to configuring your hardware sensors.

Chances are that you need to load the right kernel module for your sensors to work, provided that your distro ships such a module by default.
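For the hardware in post #5, the sensors-detect summary names two drivers that "should be inserted": `it87` and `coretemp`. A minimal sketch of loading them by hand follows; it must run as root, it assumes the running kernel actually ships both modules, and the helper name `load_sensor_modules` is hypothetical.

```shell
#!/bin/sh
# Sketch: load the hwmon drivers that sensors-detect recommended in
# post #5 (coretemp and it87). Must run as root. Assumes the running
# kernel ships these modules; if one is missing, modprobe prints an
# error and the loop moves on to the next one.

# Hypothetical helper. Usage: load_sensor_modules MODULE...
# Returns 0 only if every module loaded.
load_sensor_modules() {
  status=0
  for mod in "$@"; do
    if modprobe "$mod"; then
      echo "loaded $mod"
    else
      echo "could not load $mod" >&2
      status=1
    fi
  done
  return "$status"
}
```

Run as root, for example `load_sensor_modules coretemp it87 && sensors`; `lsmod | grep -E 'coretemp|it87'` shows whether they stayed loaded. If modprobe reports that a module is not found, the running kernel probably does not include that driver, and `sensors` cannot report that chip.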
 
08-28-2010, 03:55 AM #8
centguy
Member

Registered: Feb 2008
Posts: 627

Original Poster
Blog Entries: 1

Reputation: 48
Thanks i92guboj. I realize this may not be a walk in the park.

14moose: I posted the output of three commands (sensors, sensors-detect,
and sensors again) in post #5 of this thread. Thanks.
 
08-28-2010, 12:24 PM #9
14moose
Member

Registered: May 2010
Posts: 83

Reputation: Disabled
Sigh...

1. Thank you for posting the output. Sorry I didn't notice you posted *multiple* different commands.

2. Q: Independent of getting lm_sensors (and, possibly *other* monitoring tools) working, have you got the "hardware" and "environment" issues squared away?

Specifically:
a) *do* you have your PC in a "cool" room now?
b) *Have* you considered UPS, motherboard or other potential issues?

3. Q: What (if anything) is in your /etc/sysconfig/lm_sensors config file?

4. Q: Did you get a chance to look at the LinuxJournal article?
Or any of the other resources I linked to?

5. What *is* your hardware configuration?
A desktop PC? A blade server?

6. What is your OS? CentOS 5.5?

Thank you in advance .. PSM
 
09-02-2010, 06:47 PM #10
14moose
Member

Registered: May 2010
Posts: 83

Reputation: Disabled
centguy -

Did you resolve the problem? Did you get lm_sensors working, and an automated "software alert" system set up? Did you get the PC moved to a well ventilated room? Has the PC shut itself down and/or rebooted again?

Inquiring minds want to know
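As for an automated "software alert", one possible sketch skips parsing the `sensors` output and reads the kernel's hwmon files directly; those `temp*_input` files hold millidegrees Celsius. It assumes the temperature drivers are already loaded. The sysfs paths are an assumption (the layout differs between kernel versions, so two common locations are tried), and `check_temps` is a hypothetical name.

```shell
#!/bin/sh
# Sketch of a tiny temperature alert. Reads hwmon temp*_input files,
# which hold millidegrees Celsius, and prints a warning for any reading
# at or above the limit. The sysfs paths are an assumption: the layout
# differs between kernel versions, so two common locations are tried.

# Hypothetical helper. Usage: check_temps LIMIT_C [FILE...]
# Returns 1 if any reading reached the limit, 0 otherwise.
check_temps() {
  limit=$1
  shift
  if [ "$#" -eq 0 ]; then
    set -- /sys/class/hwmon/hwmon*/temp*_input \
           /sys/class/hwmon/hwmon*/device/temp*_input
  fi
  hot=0
  for f in "$@"; do
    [ -r "$f" ] || continue            # skips globs that matched nothing
    c=$(( $(cat "$f") / 1000 ))        # millidegrees -> whole degrees C
    if [ "$c" -ge "$limit" ]; then
      echo "WARNING: $f reads ${c}C (limit ${limit}C)"
      hot=1
    fi
  done
  return "$hot"
}
```

Saved in a script that cron runs every few minutes and that ends with a call such as `check_temps 80 | logger -t tempcheck`, any warnings would land in the system log; 80 degrees C is only an illustrative limit, not a recommendation for this CPU.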
 
09-02-2010, 09:49 PM #11
centguy
Member

Registered: Feb 2008
Posts: 627

Original Poster
Blog Entries: 1

Reputation: 48
Nope. Actually, I'm kind of giving up. I don't have time to mess with the drivers.

The key question is: how do I load the drivers, as mentioned in the sensors-detect output?
Quote:
Make sure you loaded all the kernel drivers you need.
BTW, I am using CentOS 5.2 quite a lot!
 
  


All times are GMT -5.