My son is innocent!
I had been getting angry with my 2.5 year old son daily because I thought he was hitting the reset button on my slackbox. See, I work nights so his mother gets up with him in the morning... but when she gets up, she's been falling asleep on the couch! So he gets to run around and do whatever he wants until she or I get up.
The slackbox was being reset literally daily. Each day, I cringed at the possibility of corrupt files. Each day, though, everything was fine. The first time he did this (I watched him flick off the power bar), I lost everything, which is why I've been frustrated with these daily occurrences. So I was up with him instead, and even before anyone could touch the box, I noticed it had been reset. It just sits there at a login prompt ready for use. It always has messages about a bad filesystem getting fixed up (if I do a regular "shutdown -r now", I don't get the bad filesystem messages, so I know it is being reset incorrectly somehow). How can I find out what is causing my system to reset every day like this (in the middle of the night, apparently, since it's fine before I go to bed)? Is there a process? Is there some key combo other than CTRL+ALT+DEL that my cat might be stepping on? Even if there was, I leave the box completely logged out before I go to bed (I just leave it on so all my servers can run). I know the cat can't push the reset button. Any insight would be appreciated. Thanks. |
Maybe you got hardware problems that cause a reboot?
|
Check the logs, see if it's happening at the same time every night and what is happening right before the reset.
|
What logs do I check?
|
last
|
if by "last" you mean /var/log/lastlog, that just contains a bunch of ^ and @ symbols. could you be more specific?
|
Check /var/log/dmesg and /var/log/messages. Also, do you have any cron jobs scheduled to run at night?
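A rough sketch of how the cron part of this check might be done from the shell. The helper name `list_cron_scripts` is made up for illustration, and it assumes the cron runner only executes files that have the executable bit set:

```shell
# Hypothetical helper: print the executable files in a cron directory,
# one per line. Non-executable files are skipped, on the assumption
# that the cron runner ignores them.
list_cron_scripts() {
    dir=$1
    for f in "$dir"/*; do
        # -f: regular file, -x: executable by the current user
        if [ -f "$f" ] && [ -x "$f" ]; then
            basename "$f"
        fi
    done
}

# Example use (paths are assumptions about a typical layout):
#   list_cron_scripts /etc/cron.daily
#   crontab -l                      # the current user's own crontab
#   tail -n 50 /var/log/messages    # most recent syslog entries
```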
|
I'm not really sure what to make of dmesg and messages in /var/log. There is a lot to go through in them, and I'm not advanced with linux yet. As far as cron goes, I think what you're looking for is /etc/cron.daily? If so, the only thing in that directory is a script called slocate which runs "/usr/bin/updatedb -c" and another script called logrotate which runs "/usr/sbin/logrotate /etc/logrotate.conf"
|
I'm with meranto. Sounds like a possible hardware issue. It sounds like you keep it running all the time (or try to anyway). What's the ventilation like around the box and what measures do you take to reduce heat inside the box?
|
My server started to do spontaneous reboots - every once in a while, and I did not take it too seriously... but then the period between two reboots became shorter... and shorter... now rebooting every day... and even faster... up to the dreaded situation where it would reboot again before the computer had a chance to reach the command prompt.
That was the moment I decided that the hardware was wasted and I swapped it for the spare server I had been building in anticipation of this. So yeah... most probably you've got a hardware problem :-) On the other hand, I once had a situation where a server would reboot exactly once a week, at the same time. Nothing wrong with the server hardware... Turned out that the UPS it was connected to performed a "self-test" once a week, thereby cutting power to the attached computers for just a tiny fraction of a second. All computers but one managed to keep running! Eric |
I suppose it's possible that there is a hardware problem... but rebooting in the same time-frame every day? I just got home from work, and the PC is fine. It's been on all day. It only happens overnight. Is there a way I can check what time it is rebooting?
I think I'll reboot it before I go to bed tonight. If it's been on all day without rebooting, it can sure withstand 8 hours overnight after a fresh reboot. I'm confused by the comment about me trying to keep it running all day... isn't keeping it running all day a good thing? It runs servers... :\ |
You could try the following:
As root, at the command prompt, enter:
grep -e "Nov 14" /var/log/messages > /home/user/test
What this means is this: you are looking for the expression "Nov 14" in the file "messages", which is located in the /var/log/ directory. The ">" means you are sending the result of your search to the "test" file located in /home/user.
Next, you would need Midnight Commander to do the following:
1. Open mc (Midnight Commander), then navigate down /home/user until you highlight the file "test".
2. Press F3 to view the file.
3. While viewing, press F7 to bring up the search tool.
4. Enter "restart" in the search field, and then choose OK. The word "restart" will be highlighted. This will tell you the time that the computer was restarted on Nov. 14.
5. Press F7 again and the search will take you to the second instance of your computer restarting. Repeat until all restart times have been shown.
Instead of "test" you could name the file Nov14. Do the same for Nov. 13, etc. |
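For anyone who would rather skip Midnight Commander, the same search can be sketched as a pipeline. The function name `restart_times` is made up for illustration. It assumes the classic syslog line layout ("Mon DD HH:MM:SS host message") and that the syslog daemon writes a line containing "restart" at boot, as the steps above rely on:

```shell
# Hypothetical helper: print the time of every syslog line for a given
# day that mentions "restart".
# Assumes lines look like: "Nov 14 04:41:02 host syslogd: restart."
restart_times() {
    logfile=$1
    day=$2
    # First grep narrows to the day, second to restart lines,
    # awk keeps only the HH:MM:SS column.
    grep -e "$day" "$logfile" | grep -i "restart" | awk '{ print $3 }'
}

# Example use:
#   restart_times /var/log/messages "Nov 14"
```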
I would also guess hardware issue (not much help, huh? :) )
I also think the logs may be helpful. If you are having trouble knowing what to look for in the logs, they aren't doing much good. "mv" them to some other name before bed. Then, when you wake up and see that the server has rebooted, the logs have "mostly" useful info. Alternatively, you can "echo" in a distinctive mark (in combination with ">>") before you go to bed. The issue will clearly occur after the mark. |
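The "echo a mark with >>" idea could look roughly like this. Both function names and the marker text are made up for illustration, and this is only a sketch of the approach:

```shell
# Append a distinctive marker line to a log file (">>" appends,
# it does not overwrite).
mark_log() {
    logfile=$1
    marker=$2
    echo "$marker" >> "$logfile"
}

# Print only the lines that follow the LAST occurrence of the marker.
# If the marker never appears, the whole file is printed.
lines_after_mark() {
    logfile=$1
    marker=$2
    awk -v m="$marker" '
        $0 == m { n = 0; next }   # a marker line resets the buffer
        { buf[++n] = $0 }         # remember lines seen since the marker
        END { for (i = 1; i <= n; i++) print buf[i] }
    ' "$logfile"
}

# Example use (as root, before bed and again in the morning):
#   mark_log /var/log/messages "=== BEDTIME MARK ==="
#   lines_after_mark /var/log/messages "=== BEDTIME MARK ==="
```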
Quote:
Code:
#last |
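To see whether the reboots cluster at one time of day, the output of `last` can be tallied. `last reboot` on its own already lists only the reboot records. The function name `reboot_times` below is made up for illustration, and it assumes each reboot record starts with the word "reboot" and contains an HH:MM boot time:

```shell
# Read last-style output on stdin and count how often each boot time
# (HH:MM) occurs among the "reboot" records.
reboot_times() {
    awk '$1 == "reboot" {
        # The first field shaped like HH:MM is taken as the boot time.
        for (i = 2; i <= NF; i++)
            if ($i ~ /^[0-9][0-9]:[0-9][0-9]$/) { print $i; break }
    }' | sort | uniq -c
}

# Example use:
#   last | reboot_times
```

A single time with a high count would point at something scheduled rather than something random.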
Re: My son is innocent!
Quote:
I should warn you, though, that there are other bugs that have been introduced with the more recent versions. Still, I'm generally happy with this app. |
Re: Re: My son is innocent!
#last says it's rebooting every morning at 04:40 every day for the past 8 days.
I figured if it's happening at the exact same time every day, maybe it IS a cron job. Well I noticed that whenever I ran slocate it said the database hasn't been updated in 8 days (!) So I ran the "slocate" script in /etc/cron.daily, which runs updatedb. I did this a few times. Each time, it would either reboot or lock up. I tried running updatedb myself instead of using the script in cron.daily and it still rebooted or froze. I guess we found the culprit. Is there a way to get updatedb working again, or am I going to have to disable that cron job? If the latter, how do I disable it? Thanks for all the help! |
is this a new computer?
Other than being a hardware issue it could also be a heating issue. Your motherboard may have a default reboot for excessive heat. You can check that in your BIOS. Just another thought is all. Best of luck to you. |
I know this problem.
Try swapping out the PSU (power supply) |
It's a brand new PSU.
To the last 2 replies... are you saying that even though it happens at the exact same time every morning, and it appears to be the updatedb command causing the reboot, that you still believe it to be a hardware issue? |
Given that it seems to be an updatedb issue. I would think it's 1 of 2 things:
1. There are corruptions in the filesystems that updatedb is performing on. 2. It's a harddrive issue so when certain spots of the harddrive (bad sectors?) are accessed the system becomes unstable. Not a linux expert so can't comment further. Good luck. |
Doesn't Slackware by default run updatedb at 4:30 am every day? It does so on mine.
|
If I run updatedb manually, it reboots (or sometimes just freezes.)
|
Updatedb (the slocate script in /etc/cron.daily) runs at 4:40AM every day by default. You've found the problem.
Running updatedb or slocate -u uses lots of resources. It's very IO intensive on the hard drive and also uses lots of CPU cycles. The fact that the machine reboots or freezes every time this command runs means that the machine will probably reboot or freeze under any other similar command that puts a good amount of stress on the system. A great stress test for this sort of thing is recompiling the kernel. First things to check: make sure the CPU fan isn't blocked by a misplaced cable or large amounts of dust accumulation. Also check the BIOS to see if you can get CPU / Motherboard temperature readings. Try switching out the power supply, if you have another available. When you get large current draws from high system loads, the power may fluctuate from the PS which may in turn reboot the machine. It seems to me that either heat on the CPU or a bad power supply is at fault here. -- Shade |
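A lighter way to put similar disk load on the box, one directory tree at a time, might be to read every file under a chosen directory. This is only a sketch and not something proposed in the thread. The function name `read_all_files` is made up, and it will not load the CPU the way a kernel compile does:

```shell
# Read every regular file under a directory (staying on one filesystem)
# and throw the data away, then print how many files were found.
# If the machine freezes partway through one particular tree, that
# narrows down where the trouble is.
read_all_files() {
    dir=$1
    # -xdev: do not cross into other mounted filesystems
    find "$dir" -xdev -type f -exec cat {} + > /dev/null 2>&1
    find "$dir" -xdev -type f | wc -l
}

# Example use (run against one tree at a time):
#   read_all_files /usr
#   read_all_files /home
```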
Re: My son is innocent!
Quote:
One of these days you or your wife will wake up and find your 2 year old has wandered outside and got hit by a car!!!!!! NEVER leave a two year old alone and unwatched!!! |
Thanks Dr. Phil, but I was quite aware of that already. I see you've never made mistakes before, and therefore don't understand. That aside, he's not capable of undoing all of the locks to get outside. And even if he was capable of that, he still can't undo the deadbolt for the outside door, which requires a key.
Anyway, this is hardly relevant to LinuxQuestions.org. If you'd like to talk to me about parenting, you can email me at josh.darrell@gmail.com. |
Could it be possible that we're overengineering the problem here?
Have you tried reinstalling the updatedb package and checking your filesystem? It might be a corrupt binary for whatever reason ... thinking about it, you might want to consider a reinstall of glibc, too, just in case that bit got corrupted somewhere along the line. I'm sort of clutching at straws here, but reinstallation is cheaper than buying new hardware to replace possibly faulty hardware. Like yuchai and Ilgar I highly recommend doing a manual disk check. Do you have a custom kernel? Perhaps there's some kernel level problem with disk access. I would rule out software problems before I started chasing hardware gremlins, personally. - Piete "I've still got more opinions!" Sartain. |
I would love to do a manual disk check, but I don't know how.
I am using the default 2.4.x kernel that comes with slack 10.2, so no custom kernel here. As for reinstalling the updatedb package, which package would that be? I don't see an updatedb package in /var/log/packages. Btw to everyone, I did chmod -x slocate in /etc/cron.daily, so as to prevent updatedb from running at 4:40am, and the PC was okay when I got up this morning. It did not freeze/reset. Thank you for the suggestions piete. Now if I could just figure out how to do them :) |
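The chmod -x workaround described above can be wrapped up with a check that it took effect. The function name `disable_cron_script` is made up for illustration, and it relies on the cron runner skipping non-executable files, which matches what was observed here:

```shell
# Clear the executable bits on a cron script and report the result.
disable_cron_script() {
    script=$1
    chmod a-x "$script" || return 1
    if [ -x "$script" ]; then
        echo "still executable: $script"
        return 1
    fi
    echo "disabled: $script"
}

# Example use (as root):
#   disable_cron_script /etc/cron.daily/slocate
# To turn the job back on later:
#   chmod +x /etc/cron.daily/slocate
```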
What do the sensors say about your temperature? Try doing
cat /proc/acpi/thermal_zone/THRM/temperature - doesn't have to be as root. If the temperature is already at a high level, updatedb could be the thing that would put it over the edge. |
cat: /proc/acpi/thermal_zone/THRM/temperature: No such file or directory
~$ cd /proc/acpi -bash: cd: /proc/acpi: No such file or directory :( |
Alrighty then, let's have a look at this. For expediency's sake, I shall give you the commands and tell you precisely how to use them, but I would have a look at the man pages for them, too, to double-check what we're gonna do =)
First off, a manual disk check:
1. I do manual checks (when necessary) from a boot disk (you can use disk 1 of your Slackware system for our purposes), but I feel sure you should be able to do it from within the system. Anyway, it doesn't hurt using a boot disk, since you'll be checking the root partition, and you don't really want it mounted at the time anyway.
2. Boot Slack disk1, you should find yourself at a console. The command you need is `fsck`. fsck itself is only a frontend to the other file checkers; check the man pages to be sure you know what you're after. I'll run through an example, below. Because I don't know what filesystem you're using or how your box is set up, I will make the assumptions that you're using ext2 and your root partition is /dev/hda1. It should cover the basics enough to get you started, at any rate. If you need more help, we can cross that bridge when we get to it. The filesystem check on ext2 is called either `fsck -t ext2`, `fsck.ext2` or its actual binary `e2fsck`. I will be using e2fsck, which has its own associated man page, and you won't get too confused then, I hope ;) The commands that are used to check consistency in the boot scripts can be found in /etc/rc.d/rc.S .
3. We want to check a disk for errors and attempt recovery on any errors we find.
Code:
#e2fsck -pccfkv
Please please please, if anyone knows better - tell me! I would be happy running this on my machine, but I don't wanna wreck someone else's!
Secondly, reinstallation (and location!) of updatedb
Basically, `cat /var/log/packages/* | grep updatedb` will get you the location of updatedb, but ... I always found that a bit of a pain, so I adapted it and turned it into a script:
Code:
#!/bin/bash
Code:
#findme "/var/log/packages/*" updatedb
Code:
piete@Melchior:~$ scripts/findme "/var/log/packages/*" updatedb
I hope I've covered everything you need to set you straight, but you know where we are if you need additional help! Good luck, and I hope to hear your problems are all gone next time I read a message from you ;) - Piete "Oh no not another essay" ... =D |
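The body of the findme script did not survive in the post above, so what follows is not Piete's script. It is a guess at something that accepts the same two arguments (a quoted glob of package list files and a file name) and prints which package lists mention that name:

```shell
# Hypothetical stand-in for the "findme" idea: print the name of each
# package list file that mentions the given file name.
findme() {
    pattern=$1   # quoted glob, e.g. "/var/log/packages/*"
    name=$2
    # $pattern is deliberately left unquoted so the shell expands it.
    # grep -l prints only the names of files that contain a match.
    grep -l -- "$name" $pattern 2>/dev/null | while read -r pkg; do
        basename "$pkg"
    done
}

# Example use:
#   findme "/var/log/packages/*" updatedb
```

Because the glob is expanded by plain word splitting, this sketch would misbehave on paths containing spaces.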
two crazy things
Hi there
First, my mother at a version i am forbidden to say (but it's over 40) keeps crashing Windows in a cybercafe. It doesn't happen in Linux. My girlfriend does crash everything at version 23.6 ;) Second, you should think about installing a video camera in front of the monitor and record everything. (this is what i would do). I did it, indeed, and I found interesting things ;) May the Force be with you! |
I use reiserfs.. I read the man page for reiserfsck, and it suggested I run reiserfsck --check --logfile check.log /dev/hda2 (since hda2 is where my / is located.)
Now, what I don't understand is how I'm supposed to write a logfile when booting from CD. But anyway, that doesn't even matter because I booted from Slack CD1 and ran reiserfsck --check /dev/hda2 (no logfile) and it told me it couldn't see a reiserfs superblock or something like that, and wanted me to run with the --rebuild-sb option. Before doing that, I figured I should probably ask about it on here first. |
Having a squint at the reiserfsck man page it doesn't appear that --rebuild-sb is inherently destructive. I know I would be happy running it, but again, it's not my box and I don't run reiserfs =/
Not stunningly robust, but this little script will backup some of your config files. http://www.kaear.co.uk/linux/sysbackup.sh If I recall correctly, edit the SAVE variable in the script to point to where you want stuff saved (because it destroys the directory to empty it, make sure you give it a path that DOES NOT already exist) then run it with `sysbackup.sh --backup`. For a more complete backup, copy /etc, any configs you've edited outside of /etc, and any MySQL databases you have out of / and somewhere more secure. I reckon there's a 95% chance that there's nothing wrong with your filesystem (although why it can't find a valid superblock is a bit of a worry) and there's some other problem with your box, but, there's always that 5% chance that there is something else wrong and you risk losing data. I'm not trying to be a scaremonger, just trying to warn you of possibilities. If you've got everything backed up and safe to mess with, then, give the --rebuild-sb a go. - Piete. |
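For the "copy /etc somewhere more secure" part, a plain tar archive is one option. This is a sketch, not the sysbackup.sh script linked above. The function name `backup_dir` and the destination path in the example are made up:

```shell
# Archive one directory into a gzip-compressed tarball.
# Refuses to overwrite an existing archive.
backup_dir() {
    src=$1    # directory to save, e.g. /etc
    dest=$2   # archive to create, ideally on a different disk
    if [ -e "$dest" ]; then
        echo "refusing to overwrite $dest" >&2
        return 1
    fi
    # -C changes into the parent first so the archive holds relative
    # paths (etc/...) rather than absolute ones.
    tar -czf "$dest" -C "$(dirname "$src")" "$(basename "$src")"
}

# Example use (destination is an assumption, pick a disk other than /):
#   backup_dir /etc /mnt/otherdisk/etc-backup.tar.gz
```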
do you have your slackware boot disk handy?
reboot with that and run fsck with your root mounted as read-only. btw, i'm with Dr. Phil and Nurse Nancy on this one
|
Hey folks,
So here's what happened... I booted from the Slack 10.2 Install Disc 1, and tried "reiserfsck --check /dev/hda2" again. For whatever reason, it worked this time. It ran a check and told me that I had 5 fatal errors that would only be corrected with the --rebuild-tree option. So I bit down and ran "reiserfsck --rebuild-tree /dev/hda2." It ran its course, said it corrected all the errors, and returned me to my # prompt. I ejected the Slack CD, and gave the 3-finger salute. The PC booted just fine! I installed the slocate package, ran a "touch /var/lib/slocate/slocate.db" and then an "updatedb" and it all worked. So if this happens again, it'll probably be time to replace this old 20 GB hard drive. Thanks for all the help guys. I learned stuff! |
As a final comment, I would like to commend people for this discussion - very good advice, very good encouragement, ending up with a good solution. That is what makes this community no. 1.
|