LinuxQuestions.org

Slackware forum: My son is innocent! (https://www.linuxquestions.org/questions/slackware-14/my-son-is-innocent-382988/)

Seiken 11-14-2005 06:32 AM

My son is innocent!
 
I had been getting angry with my 2.5-year-old son every day because I thought he was hitting the reset button on my slackbox. See, I work nights, so his mother gets up with him in the morning... but when she gets up, she's been falling asleep on the couch! So he gets to run around and do whatever he wants until she or I get up.

The slackbox was being reset literally daily. Each day, I cringed at the possibility of corrupt files. Each day, though, everything was fine. The first time he did this (I watched him flick off the power bar), I lost everything, which is why I've been frustrated with these daily occurrences.

So I was up with him instead, and even before anyone could touch the box, I noticed it had been reset. It just sits there at a login prompt, ready for use. There are always boot messages about a bad filesystem being fixed up. (If I do a regular "shutdown -r now", I don't get the bad filesystem messages, so I know it is somehow being reset uncleanly.)

How can I find out what is causing my system to reset every day like this? (It happens in the middle of the night, apparently, since it's fine before I go to bed.) Is there a process? Is there some key combo other than CTRL+ALT+DEL that my cat might be stepping on? Even if there were, I leave the box completely logged out before I go to bed (I just leave it on so all my servers can run). I know the cat can't push the reset button.

Any insight would be appreciated. Thanks.

meranto 11-14-2005 06:45 AM

Maybe you got hardware problems that cause a reboot?

cs-cam 11-14-2005 07:02 AM

Check the logs, see if it's happening at the same time every night and what is happening right before the reset.

Seiken 11-14-2005 07:27 AM

What logs do I check?

sweetnsourbkr 11-14-2005 08:59 AM

last

Seiken 11-14-2005 10:16 AM

If by "last" you mean /var/log/lastlog, that just contains a bunch of ^ and @ symbols. Could you be more specific?

mdg 11-14-2005 11:03 AM

Check /var/log/dmesg and /var/log/messages. Also, do you have any cron jobs scheduled to run at night?

Seiken 11-14-2005 11:16 AM

I'm not really sure what to make of dmesg and messages in /var/log. There is a lot to go through in them, and I'm not advanced with Linux yet. As far as cron goes, I think what you're looking for is /etc/cron.daily? If so, the only things in that directory are a script called slocate, which runs "/usr/bin/updatedb -c", and another script called logrotate, which runs "/usr/sbin/logrotate /etc/logrotate.conf".

dracolich 11-14-2005 02:20 PM

I'm with meranto. Sounds like a possible hardware issue. It sounds like you keep it running all the time (or try to anyway). What's the ventilation like around the box and what measures do you take to reduce heat inside the box?

Alien Bob 11-14-2005 03:54 PM

My server started to do spontaneous reboots - every once in a while, and I did not take it too seriously... but then the period between two reboots became shorter... and shorter... now rebooting every day... and even faster... up to the dreaded situation where it would reboot again before the computer had a chance to reach the command prompt.
That was the moment I decided that the hardware was wasted and I swapped it for the spare server I had been building in anticipation of this.

So yeah... most probably you've got a hardware problem :-)

On the other hand, I once had a situation where a server would reboot exactly once a week, at the same time. Nothing wrong with the server hardware...
Turned out that the UPS it was connected to performed a "self-test" once a week, thereby cutting power to the attached computers for just a tiny fraction of a second. All computers but one managed to keep running!

Eric

Seiken 11-14-2005 08:48 PM

I suppose it's possible that there is a hardware problem... but rebooting in the same time-frame every day? I just got home from work, and the PC is fine. It's been on all day. It only happens overnight. Is there a way I can check what time it is rebooting?

I think I'll reboot it before I go to bed tonight. If it's been on all day without rebooting, it can sure withstand 8 hours overnight after a fresh reboot.

I'm confused by the comment about me trying to keep it running all day... isn't keeping it running all day a good thing? It runs servers... :\

zzak 11-14-2005 10:35 PM

You could try the following:
As root, at the command prompt, enter:
grep -e "Nov 14" /var/log/messages > /home/user/test

What this means is this:
You are looking for an expression "Nov 14" in the file "messages"
which is located in the /var/log/ directory.

The ">" means you are sending the result of your search to
the "test" file located in /home/user.

Next, you would need Midnight Commander to do the following:
Open mc (Midnight Commander), then navigate down /home/user until
you highlight the file "test".
Press F3 to view the file.
When viewing, press F7 to bring up the search tool.
Enter "restart" in the search field, and then choose OK.
The word "restart" will be highlighted. This will tell you
the time that the computer was restarted on Nov. 14
Repeat F7 and the search will take you to the second instance
of your computer restarting.
Repeat until all restart times have been shown.

Instead of "test" you could name the file Nov14.
Do the same for Nov. 13 etc.
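The grep-then-search-in-mc steps above can also be done in one pass from the shell. A minimal sketch, assuming the classic syslog line format and that syslogd writes a line containing "restart" each time it starts (as far as I know, Slackware's sysklogd logs something like `syslogd 1.4.1: restart.`). The function name `restart_times` is made up for this example:

```shell
# restart_times DAY [LOG]: print the timestamp of every line logged on DAY
# that mentions "restart" (syslogd logs one when it starts at boot).
# Assumes the classic syslog format "Mon DD HH:MM:SS host ...", where
# single-digit days are space-padded, e.g. "Nov  4".
# LOG defaults to /var/log/messages.
restart_times() {
    local day="$1" log="${2:-/var/log/messages}"
    # Keep lines that start with the date and mention "restart",
    # then print only the month, day and time fields.
    grep "^${day} " "$log" | grep -i 'restart' | awk '{ print $1, $2, $3 }'
}
```

For example, `restart_times "Nov 14"` as root. Because the box was reset rather than shut down, the lines just before each restart are the last things that reached the disk, so `grep -B 20 restart /var/log/messages` is worth a look too.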

shilo 11-14-2005 11:41 PM

I would also guess hardware issue (not much help, huh? :) )

I also think the logs may be helpful. If you are having trouble knowing what to look for in the logs, they aren't doing much good. "mv" them to some other name before bed. Then, when you wake up and see that the server has rebooted, the logs will contain mostly relevant info. Alternatively, you can "echo" a distinctive mark into them (in combination with ">>") before you go to bed. The issue will clearly occur after the mark.
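A sketch of the "mark the log before bed" idea, assuming you append straight to the log file as suggested (root is needed for /var/log/messages). The marker text and the function names `mark_log` and `after_mark` are arbitrary choices for this example:

```shell
# Arbitrary marker text; anything unlikely to appear in a real log works.
MARK="=== BEDTIME MARK ==="

# mark_log [LOG]: append the marker plus the current date to the log.
mark_log() {
    local log="${1:-/var/log/messages}"
    echo "$MARK $(date)" >> "$log"
}

# after_mark [LOG]: print only what was logged after the LAST marker
# (prints the whole file if no marker is found).
after_mark() {
    local log="${1:-/var/log/messages}"
    awk -v m="$MARK" 'index($0, m) { last = NR } { line[NR] = $0 }
        END { for (i = last + 1; i <= NR; i++) print line[i] }' "$log"
}
```

Run `mark_log` before bed and `after_mark | less` in the morning. `logger "bedtime mark"` is an alternative that sends the marker through syslogd itself instead of appending to the file behind its back.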

slackMeUp 11-15-2005 12:00 AM

Quote:

Originally posted by Seiken
If by "last" you mean /var/log/lastlog, that just contains a bunch of ^ and @ symbols. Could you be more specific?
By 'last' he actually means...

Code:

#last
It's a command... it displays the last logins, and reboots...
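`last` reads the login records, and the pseudo-user "reboot" gets an entry each time the system boots, so `last reboot` lists just the reboots. To spot a "same time every night" pattern, a short sketch can tally them by clock time. It assumes the common column layout where the time is the 8th field (`reboot system boot <kernel> <day> <month> <date> <HH:MM>`); check that against your own output first. `reboot_clock` is a hypothetical helper name:

```shell
# reboot_clock: read `last reboot` output on stdin and count reboots per
# clock time, most frequent first. Assumes field 8 holds HH:MM.
reboot_clock() {
    awk '$1 == "reboot" { count[$8]++ }
         END { for (t in count) print count[t], t }' | sort -rn
}
```

`last reboot | reboot_clock` would put the most common boot time on the first line, with its count in front of it.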

dhave 11-15-2005 01:12 AM

Re: My son is innocent!
 
Quote:

Originally posted by Seiken
I had been getting angry with my 2.5-year-old son every day because I thought he was hitting the reset button on my slackbox. See, I work nights, so his mother gets up with him in the morning... but when she gets up, she's been falling asleep on the couch! So he gets to run around and do whatever he wants until she or I get up.

The slackbox was being reset literally daily. Each day, I cringed at the possibility of corrupt files. Each day, though, everything was fine. The first time he did this (I watched him flick off the power bar), I lost everything, which is why I've been frustrated with these daily occurrences.
[...]
Any insight would be appreciated. Thanks.

You could wait until an upgrade of the kid is available. I did this, and the current version of my son (he's at v. 17.4 now) is no longer goofing things up on my computer. In fact, he helps fix it. When my kid was at v. 2.x and even v. 3.x, however, I had a lot of problems of the type you describe.

I should warn you, though, that there are other bugs that have been introduced with the more recent versions. Still, I'm generally happy with this app.

cs-cam 11-15-2005 03:17 AM

Re: Re: My son is innocent!
 
Quote:

Originally posted by dhave
You could wait until an upgrade of the kid is available. I did this, and the current version of my son (he's at v. 17.4 now) is no longer goofing things up on my computer. In fact, he helps fix it. When my kid was at v. 2.x and even v. 3.x, however, I had a lot of problems of the type you describe.

I should warn you, however, that there are other bugs that have been introduced with the more recent versions. Still, I'm generally happy with this app.

Dork ;)

Seiken 11-15-2005 07:41 AM

#last says it's been rebooting at 04:40 every morning for the past 8 days.

I figured if it's happening at the exact same time every day, maybe it IS a cron job. Well, I noticed that whenever I ran slocate, it said the database hadn't been updated in 8 days (!). So I ran the "slocate" script in /etc/cron.daily, which runs updatedb. I did this a few times. Each time, it would either reboot or lock up. I tried running updatedb myself instead of using the script in cron.daily, and it still rebooted or froze. I guess we found the culprit.

Is there a way to get updatedb working again, or am I going to have to disable that cron job? If the latter, how do I disable it?

Thanks for all the help!

Thanotos 11-15-2005 11:35 AM

is this a new computer?

Other than a general hardware issue, it could also be a heat issue.
Your motherboard may be set to reboot automatically when it overheats.

You can check that in your BIOS.

Just another thought is all. Best of luck to you.

Dankles 11-15-2005 11:40 AM

I know this problem.
Try swapping out the PSU (power supply)

Seiken 11-15-2005 02:07 PM

It's a brand new PSU.

To the last 2 replies... are you saying that even though it happens at the exact same time every morning, and it appears to be the updatedb command causing the reboot, that you still believe it to be a hardware issue?

yuchai 11-15-2005 02:21 PM

Given that it seems to be an updatedb issue, I would think it's one of two things:

1. There is corruption in the filesystems that updatedb is scanning.
2. It's a hard drive issue, so when certain spots of the hard drive (bad sectors?) are accessed, the system becomes unstable.

Not a linux expert so can't comment further. Good luck.

Ilgar 11-15-2005 02:21 PM

Doesn't Slackware by default run updatedb at 4:30 am every day? It does so on mine.

Ilgar 11-15-2005 02:24 PM

Quote:

Originally posted by yuchai
Given that it seems to be an updatedb issue, I would think it's one of two things:

1. There is corruption in the filesystems that updatedb is scanning.
2. It's a hard drive issue, so when certain spots of the hard drive (bad sectors?) are accessed, the system becomes unstable.

Not a linux expert so can't comment further. Good luck.

We've posted at the same time -- yes, I'd come to the same conclusion. Why not run a disk check or updatedb manually? In either case all corners of the HDD will be accessed, and we'll see if there's a hardware problem.

Seiken 11-15-2005 02:31 PM

If I run updatedb manually, it reboots (or sometimes just freezes.)

dracolich 11-15-2005 03:15 PM

Quote:

I'm confused by the comment about me trying to keep it running all day... isn't keeping it running all day a good thing? It runs servers... :\
I just wanted to be sure I understood the situation correctly. My idea is that throughout the day the hardware inside the box builds up a great amount of heat, and heat has a dramatic effect on overall system performance. If you have a small box, or the inside of the box is loaded with disk drives, cables and high-performance cards, then it's going to create a LOT of heat. After a while this can cause performance loss and even system "hiccups". That's why I asked about current ventilation and cooling.

Shade 11-15-2005 06:51 PM

Updatedb (the slocate script in /etc/cron.daily) runs at 4:40AM every day by default. You've found the problem.
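Ilgar mentions 4:30 and Shade says 4:40; root's crontab is the authority on your own box (`crontab -l` as root). A small sketch that pulls the run time out of any crontab line mentioning a given string, assuming the standard five-field format with plain numbers in the minute and hour fields (it won't handle `*/5` or ranges). `cron_time_of` is a made-up name:

```shell
# cron_time_of PATTERN: read crontab text on stdin and print "HH:MM PATTERN"
# for each non-comment line containing PATTERN.
# Field 1 is the minute and field 2 the hour in the five-field format.
cron_time_of() {
    local pattern="$1"
    awk -v p="$pattern" '!/^[ \t]*#/ && index($0, p) {
        printf "%02d:%02d %s\n", $2, $1, p }'
}
```

For example, `crontab -l | cron_time_of /etc/cron.daily`.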

Running updatedb or slocate -u uses lots of resources. It's very IO intensive on the hard drive and also uses lots of CPU cycles.

The fact that the machine reboots or freezes every time this command runs means that it will probably reboot or freeze under any other command that puts a similar amount of stress on the system. A great stress test for this sort of thing is recompiling the kernel.
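If you'd rather reproduce the load without touching updatedb, walking the whole root filesystem with `find` is a rough stand-in for the kind of metadata-heavy disk scan updatedb does. This is only a sketch, not a proper stress test: `-xdev` keeps find on one filesystem and `-ls` forces a stat of every entry. `walk_tree` is a hypothetical name:

```shell
# walk_tree [DIR]: visit (and stat) every entry on DIR's filesystem,
# then report how many entries were visited. Errors are discarded.
walk_tree() {
    local dir="${1:-/}"
    find "$dir" -xdev -ls 2>/dev/null | wc -l
}
```

Run `walk_tree /` as root. If the box also dies during this, the updatedb binary itself is probably not to blame; if it survives, either the test wasn't heavy enough or the problem is more specific to what updatedb touches.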

First things to check: make sure the CPU fan isn't blocked by a misplaced cable or large amounts of dust accumulation.
Also check the BIOS to see if you can get CPU / Motherboard temperature readings.

Try switching out the power supply, if you have another available. When high system load causes large current draws, the power supply's output may fluctuate, which may in turn reboot the machine.

It seems to me that either heat on the CPU or a bad power supply are at fault here.

-- Shade

freakyg 11-16-2005 05:31 AM

Re: My son is innocent!
 
Quote:

Originally posted by Seiken
See, I work nights, so his mother gets up with him in the morning... but when she gets up, she's been falling asleep on the couch! So he gets to run around and do whatever he wants until she or I get up.

First of all, a 2-year-old should NEVER EVER be left to run around and do whatever they want!!!!!!!!!!!!!!!!!
One of these days you or your wife will wake up and find your 2-year-old has wandered outside and gotten hit by a car!!!!!! NEVER leave a two-year-old alone and unwatched!!!

Seiken 11-16-2005 06:01 AM

Thanks Dr. Phil, but I was quite aware of that already. I see you've never made mistakes before, and therefore don't understand. That aside, he's not capable of undoing all of the locks to get outside. And even if he was capable of that, he still can't undo the deadbolt for the outside door, which requires a key.

Anyway, this is hardly relevant to LinuxQuestions.org. If you'd like to talk to me about parenting, you can email me at josh.darrell@gmail.com.

piete 11-16-2005 06:56 AM

Could it be possible that we're overengineering the problem here?

Have you tried reinstalling the updatedb package and checking your filesystem? It might be a corrupt binary for whatever reason ... thinking about it, you might want to consider a reinstall of glibc, too, just in case that bit got corrupted somewhere along the line.

I'm sort of clutching at straws here, but reinstallation is cheaper than buying new hardware to replace possibly faulty hardware. Like yuchai and Ilgar, I highly recommend doing a manual disk check. Do you have a custom kernel? Perhaps there's some kernel-level problem with disk access. Personally, I would rule out software problems before I started chasing hardware gremlins.

- Piete "I've still got more opinions!" Sartain.

Seiken 11-16-2005 10:07 AM

I would love to do a manual disk check, but I don't know how.

I am using the default 2.4.x kernel that comes with slack 10.2, so no custom kernel here.

As for reinstalling the updatedb package, which package would that be? I don't see an updatedb package in /var/log/packages.

Btw to everyone, I did chmod -x slocate in /etc/cron.daily, so as to prevent updatedb from running at 4:40am, and the PC was okay when I got up this morning. It did not freeze/reset.
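The `chmod -x` trick works because the runner that processes /etc/cron.daily only executes files that have the execute bit set. That is how run-parts style runners behave; I'm assuming Slackware's does the same, which the quiet night above suggests. A sketch that lists which scripts would still run; `would_run` is a made-up name:

```shell
# would_run [DIR]: list the regular, executable files in a cron directory,
# i.e. the scripts a run-parts style runner would execute.
would_run() {
    local dir="${1:-/etc/cron.daily}" f
    for f in "$dir"/*; do
        [ -f "$f" ] && [ -x "$f" ] && basename "$f"
    done
    return 0
}
```

To re-enable the job later: `chmod +x /etc/cron.daily/slocate`.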

Thank you for the suggestions piete. Now if I could just figure out how to do them :)

mjjzf 11-16-2005 10:52 AM

What do the sensors say about your temperature? Try doing
cat /proc/acpi/thermal_zone/THRM/temperature
- doesn't have to be as root. If the temperature is already at a high level, updatedb could be the thing that would put it over the edge.
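That file only exists when the kernel was built with ACPI support, which would explain a missing /proc/acpi on this stock 2.4 kernel (an assumption about this particular build). Newer kernels also expose readings under /sys/class/thermal in millidegrees Celsius. A sketch that prints whichever readings exist; `show_temps` and the `ROOT` test override are hypothetical:

```shell
# show_temps: print any kernel thermal readings that exist.
# ROOT (hypothetical) lets you point it at a fake tree for testing.
show_temps() {
    local root="${ROOT:-}" f t found=0
    # Older ACPI interface: the file holds text like "temperature: 45 C".
    for f in "$root"/proc/acpi/thermal_zone/*/temperature; do
        [ -r "$f" ] || continue
        echo "$f: $(cat "$f")"
        found=1
    done
    # Newer sysfs interface: an integer in millidegrees Celsius.
    for f in "$root"/sys/class/thermal/thermal_zone*/temp; do
        [ -r "$f" ] || continue
        t=$(cat "$f" 2>/dev/null) || continue
        case "$t" in ''|*[!0-9]*) continue ;; esac
        echo "$f: $(( t / 1000 )) C"
        found=1
    done
    if [ "$found" -eq 0 ]; then
        echo "no kernel thermal readings found; try lm_sensors or the BIOS" >&2
        return 1
    fi
}
```

If it finds nothing, lm_sensors (the `sensors` command) or the BIOS hardware monitor screen are the remaining options.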

Seiken 11-16-2005 11:02 AM

cat: /proc/acpi/thermal_zone/THRM/temperature: No such file or directory
~$ cd /proc/acpi
-bash: cd: /proc/acpi: No such file or directory

:(

piete 11-16-2005 12:06 PM

Alrighty then, let's have a look at this. For expediency's sake, I shall give you the commands and tell you precisely how to use them, but I would have a look at their man pages too, to double-check what we're gonna do =)

First off, a manual disk check:

1. I do manual checks (when necessary) from a boot disk (you can use disk 1 of your Slackware system for our purposes), but I feel sure you should be able to do it from within the system. Anyway, it doesn't hurt using a boot disk, since you'll be checking the root partition, and you don't really want it mounted at the time anyway.

2. Boot Slack disk 1; you should find yourself at a console. The command you need is `fsck`. fsck itself is only a frontend to the filesystem-specific checkers, so check the man pages to be sure you know what you're after. I'll run through an example below.

Because I don't know what filesystem you're using or how your box is set up, I will make the assumptions that you're using ext2 and your root partition is /dev/hda1. It should cover the basics enough to get you started, at any rate. If you need more help, we can cross that bridge when we get to it.

The filesystem check on ext2 is called either `fsck -t ext2`, `fsck.ext2` or its actual binary, `e2fsck`. I will be using e2fsck, which has its own man page, so you won't get too confused, I hope ;)

The commands that are used to check consistency in the boot scripts can be found in /etc/rc.d/rc.S .

3. We want to check a disk for errors and attempt recovery on any errors we find.

Code:

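      #e2fsck -pccfkv /dev/hda1
I highly recommend you `man e2fsck` to see what I'm doing, but basically we're forcing a check (-f) that runs a non-destructive read-write badblocks scan (-cc), keeps any bad blocks already on the list (-k), reports verbosely (-v) and automatically fixes any errors it can (-p). /dev/hda1 is the example root partition assumed above; substitute your own.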

Please, please, please, if anyone knows better, tell me! I would be happy running this on my machine, but I don't wanna wreck someone else's!
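Since the partition shouldn't be mounted read-write while it's checked (that's why the check is done from the boot CD), a small guard can consult /proc/mounts before running e2fsck or reiserfsck. A sketch, assuming the usual /proc/mounts layout (device, mount point, type, options). Note that the root device sometimes shows up there as `/dev/root` rather than its real name, so an "is not mounted" answer for / deserves a second look. `safe_to_fsck` and `MOUNTS` are made-up names:

```shell
# safe_to_fsck DEV: succeed if DEV is unmounted or mounted read-only,
# fail if it is mounted read-write. MOUNTS overrides the table for testing.
safe_to_fsck() {
    local dev="$1" table="${MOUNTS:-/proc/mounts}" opts
    # Field 1 is the device, field 4 the comma-separated mount options.
    opts=$(awk -v d="$dev" '$1 == d { print $4; exit }' "$table")
    case ",$opts," in
        ,,)     echo "$dev is not mounted: OK to check" ;;
        *,ro,*) echo "$dev is mounted read-only: OK to check" ;;
        *)      echo "$dev is mounted read-write ($opts): boot from the CD first" >&2
                return 1 ;;
    esac
}
```

For example, `safe_to_fsck /dev/hda1 && e2fsck -pccfkv /dev/hda1`.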

Secondly, reinstallation (and location!) of updatedb

Basically, `cat /var/log/packages/* | grep updatedb` will get you the location of updatedb, but... I always found that a bit of a pain, so I adapted it and turned it into a script:

Code:

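#!/bin/bash
# findme: search every file matching a glob for a pattern
# usage: findme "/var/log/packages/*" updatedb

pattern="$2"
for nfile in $1         # $1 left unquoted on purpose so the glob expands
do
        if [ -f "$nfile" ] && grep -q -- "$pattern" "$nfile"
        then
                echo "Found in ${nfile}:"
                grep -- "$pattern" "$nfile"
        fi
done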

Copy that into a file (I call mine "findme"), make it executable with `chmod +x findme`, and then run it like this:

Code:

      #findme "/var/log/packages/*" updatedb
It'll return something that looks like this:

Code:

piete@Melchior:~$ scripts/findme "/var/log/packages/*" updatedb
Found in /var/log/packages/dcron-2.3.3-x86_64-1:
dcron: with cron, such as the nightly indexing with updatedb.
Found in /var/log/packages/slocate-2.7-x86_64-1:
etc/updatedb.conf.new
usr/man/man1/updatedb.1.gz

So, I reckon you need to reinstall the slocate package (uhm, please ignore the fact that I'm using a 64-bit machine, you want to reinstall slocate-i486) to fix updatedb.
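The script above answers "which package mentions updatedb?"; to ask "which package installed this exact file?", match whole lines instead. A sketch, assuming the Slackware package log lists paths without the leading slash, one per line (as in the output above). One caveat, as far as I know: files a package creates as symlinks in its install script (updatedb may be one, pointing at slocate) don't appear in that list, which would explain why only the config file and man page show up above. `which_pkg` and `PKGDIR` are hypothetical names:

```shell
# which_pkg PATH: print the installed package(s) whose file list contains
# PATH exactly. PKGDIR overrides /var/log/packages for testing.
which_pkg() {
    local path="${1#/}" dir="${PKGDIR:-/var/log/packages}" hits f
    # Package logs list paths without the leading slash, one per line;
    # -x matches whole lines, -F treats the path as a fixed string.
    hits=$(grep -lxF -- "$path" "$dir"/* 2>/dev/null) || true
    if [ -z "$hits" ]; then
        echo "no package file list contains $path exactly" >&2
        return 1
    fi
    for f in $hits; do
        basename "$f"
    done
}
```

For example, `which_pkg /usr/bin/slocate`.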

I hope I've covered everything you need to set you straight, but you know where we are if you need additional help!

Good luck, and I hope to hear your problems are all gone next time I read a message from you ;)
- Piete "Oh no not another essay" ...

=D

folkenfanel 11-16-2005 05:50 PM

two crazy things
 
Hi there

First, my mother, at a version I am forbidden to say (but it's over 40), keeps crashing Windows in a cybercafe. It doesn't happen in Linux.

My girlfriend does crash everything at version 23.6 ;)

Second, you should think about installing a video camera in front of the monitor to record everything (this is what I would do). I did it, indeed, and I found interesting things ;)

May the Force be with you!

Seiken 11-16-2005 09:34 PM

I use reiserfs.. I read the man page for reiserfsck, and it suggested I run reiserfsck --check --logfile check.log /dev/hda2 (since hda2 is where my / is located.)

Now, what I don't understand is how I'm supposed to write a logfile when booting from CD. But anyway, that doesn't even matter because I booted from Slack CD1 and ran reiserfsck --check /dev/hda2 (no logfile) and it told me it couldn't see a reiserfs superblock or something like that, and wanted me to run with the --rebuild-sb option. Before doing that, I figured I should probably ask about it on here first.

piete 11-17-2005 07:18 AM

Having a squint at the reiserfsck man page, it doesn't appear that --rebuild-sb is inherently destructive. I know I would be happy running it, but again, it's not my box and I don't run reiserfs =/

Not stunningly robust, but this little script will backup some of your config files.
http://www.kaear.co.uk/linux/sysbackup.sh

If I recall correctly, edit the SAVE variable in the script to point to where you want stuff saved (because it deletes that directory to empty it, make sure you give it a path that DOES NOT already exist), then run it with `sysbackup.sh --backup`.

For a more complete backup, copy /etc, any configs you've edited outside of /etc, and any MySQL databases you have off of / to somewhere more secure.
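For the /etc part, a dated tarball on another disk is a simple option. A minimal sketch; `backup_etc` is a made-up name, and /mnt/backup in the usage line is only an example mount point on a different disk:

```shell
# backup_etc DEST [SRC]: write a dated, gzipped tarball of SRC (default /etc)
# into the existing directory DEST and print the archive's path.
backup_etc() {
    local dest="$1" src="${2:-/etc}"
    [ -d "$dest" ] || { echo "destination $dest does not exist" >&2; return 1; }
    local out="$dest/$(basename "$src")-$(date +%Y%m%d-%H%M%S).tar.gz"
    # -C makes the archive store "etc/..." rather than absolute paths.
    tar -czf "$out" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    echo "$out"
}
```

For example, `backup_etc /mnt/backup`. For MySQL, `mysqldump --all-databases > all.sql` gives a consistent dump rather than copying live data files.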

I reckon there's a 95% chance that there's nothing wrong with your filesystem (although why it can't find a valid superblock is a bit of a worry) and there's some other problem with your box, but there's always that 5% chance that there is something wrong with it and you risk losing data.

I'm not trying to be a scaremonger, just trying to warn you of possibilities.

If you've got everything backed up and safe to mess with, then, give the --rebuild-sb a go.

- Piete.

chrisortiz 11-17-2005 11:17 AM

Do you have your Slackware boot disk handy?

Reboot with that and run fsck with your root partition mounted read-only.


btw, I'm with Dr. Phil and Nurse Nancy on this one


Quote:

First of all, a 2-year-old should NEVER EVER be left to run around and do whatever they want!!!!!!!!!!!!!!!!!
Use duct tape and NyQuil, but when they get to around version 4.3 they develop an immunity to both.

Seiken 11-17-2005 11:48 AM

Hey folks,

So here's what happened... I booted from the Slack 10.2 Install Disc 1, and tried "reiserfsck --check /dev/hda2" again. For whatever reason, it worked this time. It ran a check and told me that I had 5 fatal errors that would only be corrected with the --rebuild-tree option. So I bit down and ran "reiserfsck --rebuild-tree /dev/hda2." It ran its course, said it corrected all the errors, and returned me to my # prompt. I ejected the Slack CD, and gave the 3-finger salute. The PC booted just fine! I installed the slocate package, ran a "touch /var/lib/slocate/slocate.db" and then an "updatedb" and it all worked.

So if this happens again, it'll probably be time to replace this old 20 GB hard drive.

Thanks for all the help guys. I learned stuff!

mjjzf 11-17-2005 03:11 PM

As a final comment, I would like to commend people for this discussion - very good advice, very good encouragement, ending up with a good solution. That is what makes this community no. 1.


All times are GMT -5.