View/Stop any and all cron jobs?
I have a server that seems to hang/die sometime every week between Friday afternoon and early Monday morning. I'm having a hard time troubleshooting what's causing it, but I imagine it's something in cron.weekly (?). How can I find out if there's anything scheduled to run between those times? Do I have to check crontab for each and every user or what? Thanks.
|
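On the question of checking each user's crontab one by one: a loop over the cron spool directory can do it in one pass. This is only a minimal sketch. It assumes the per-user crontabs live in /var/spool/cron (the usual Vixie-cron location on Red Hat style systems; Debian style systems tend to use /var/spool/cron/crontabs) and that it is run as root. The function name list_user_crontabs is made up for illustration.

```shell
# Print the active (non-comment, non-blank) lines of every per-user crontab.
# The spool directory is an assumption; pass a different one as $1 if needed.
list_user_crontabs() {
    spool=${1:-/var/spool/cron}
    for f in "$spool"/*; do
        [ -f "$f" ] || continue          # skip subdirectories and an empty glob
        printf '== %s ==\n' "$(basename "$f")"
        grep -Ev '^[[:space:]]*(#|$)' "$f"
    done
    return 0
}
```

Calling list_user_crontabs with no argument reads the default directory. System-wide jobs in /etc/crontab are not covered by this and need a separate look.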
Re: View/Stop any and all cron jobs?
Quote:
even though a user space process (anything that isn't run by root) shouldn't be able to crash the box... Cheers, Tink |
Thanks. There's NOTHING in there. However, when I look in /var/spool/anacron, there are two entries (one in cron.daily and one in cron.weekly) that are just the past two dates when this thing has crashed. I tried to do a "more cron.daily" but all it gives me is that date. Any ideas?
|
I'm not using anacron (I start "at" jobs in rc.local instead ;)) ... maybe there's an anacron log directory? :) Cheers, Tink |
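For what it's worth, the files under /var/spool/anacron are anacron's timestamp files: each one holds only the date (YYYYMMDD) on which that job set last ran, which is why "more cron.daily" shows nothing but a date. Below is a small sketch for printing them all at once. The function name show_anacron_stamps is hypothetical, and the directory is the one from the post above.

```shell
# Print the last-run date recorded in each anacron timestamp file.
# Default directory is taken from the thread; pass another one as $1 if needed.
show_anacron_stamps() {
    dir=${1:-/var/spool/anacron}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue          # nothing to do if the glob matched nothing
        printf '%s last ran on %s\n' "$(basename "$f")" "$(cat "$f")"
    done
    return 0
}
```

These files only record when a job set ran, not what it did, so they narrow down timing rather than cause.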
Ok, after checking my "lastlog", I see this in there, followed by "last message repeated..." for about 1000 lines, clogging up my syslog and having gpm run at 99.9% of the CPU....
Nov 25 08:28:41 praesto1 gpm[938]: Error in read()ing first: No such file or directory
Nov 25 08:28:49 praesto1 last message repeated 202659 times |
Try running strace gpm &> file
Then look at the file and see what's failing. It could be that certain devices or something like that are missing. It could also be a bad config file. |
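To make "see what's failing" quicker: strace marks a failed call with "= -1" followed by the errno name, so those lines can be pulled out of the saved file. A rough sketch, where the function name failed_syscalls is invented here and "file" is whatever name the strace output was saved under:

```shell
# Show only the system calls that returned an error (= -1 ESOMETHING).
# $1 is the file the strace output was redirected into.
failed_syscalls() {
    grep -E '= -1 E[A-Z]+' "$1"
}
```

Not every hit is a real problem. For example, a failed open of /etc/ld.so.preload is normal on most systems, so look for failures on device nodes or config files instead.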
Here's what the file tells me....
execve("/usr/sbin/gpm", ["gpm"], [/* 51 vars */]) = 0
uname({sys="Linux", node="praesto1", ...}) = 0
brk(0) = 0x805a500
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 x40017000
open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or direct
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=69846, ...}) = 0
old_mmap(NULL, 69846, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40018000
close(3) = 0
open("/lib/i686/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0@\307\1"..., 1024) 4
fstat64(3, {st_mode=S_IFREG|0755, st_size=5779542, ...}) = 0
old_mmap(NULL, 1291464, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x4002a00
mprotect(0x4015c000, 38088, PROT_NONE) = 0
old_mmap(0x4015c000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 1000) = 0x4015c000
old_mmap(0x40162000, 13512, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP YMOUS, -1, 0) = 0x40162000
close(3) = 0
munmap(0x40018000, 69846) = 0
brk(0) = 0x805a500
brk(0x805a680) = 0x805a680
brk(0x805b000) = 0x805b000
open("/var/run/gpm.pid", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0600, st_size=5, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 018000
read(3, "1411\n", 4096) = 5
close(3) = 0
munmap(0x40018000, 4096) = 0
kill(1411, SIG_0) = 0
open("/dev/tty0", O_WRONLY) = 3
ioctl(3, 0x541c, 0x8056fa0) = 0
close(3) = 0
fork() = 1416
--- SIGCHLD (Child exited) ---
_exit(0) = ? |
Do the same but with strace -f
That way it will follow child processes. The problem here isn't in the main process; it's a fork that's causing the problems. |
Ok, here's the output...
execve("/usr/sbin/gpm", ["gpm"], [/* 51 vars */]) = 0
uname({sys="Linux", node="praesto1", ...}) = 0
brk(0) = 0x805a500
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 x40017000
open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or direct
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=69846, ...}) = 0
old_mmap(NULL, 69846, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40018000
close(3) = 0
open("/lib/i686/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0@\307\1"..., 1024) 4
fstat64(3, {st_mode=S_IFREG|0755, st_size=5779542, ...}) = 0
old_mmap(NULL, 1291464, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x4002a00
mprotect(0x4015c000, 38088, PROT_NONE) = 0
old_mmap(0x4015c000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 1000) = 0x4015c000
old_mmap(0x40162000, 13512, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP YMOUS, -1, 0) = 0x40162000
close(3) = 0
munmap(0x40018000, 69846) = 0
brk(0) = 0x805a500
brk(0x805a680) = 0x805a680
brk(0x805b000) = 0x805b000
open("/var/run/gpm.pid", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0600, st_size=5, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 018000
read(3, "1411\n", 4096) = 5
close(3) = 0
munmap(0x40018000, 4096) = 0
kill(1411, SIG_0) = 0
open("/dev/tty0", O_WRONLY) = 3
ioctl(3, 0x541c, 0x8056fa0) = 0
close(3) = 0
fork() = 1416
--- SIGCHLD (Child exited) ---
_exit(0) = ? |
Weird... my output looks nothing like that.
Are you using the newest gpm? |
Oops, my bad... I had killed gpm when it started taking up 99.9% of the CPU! Here's the output after starting it again:
execve("/usr/sbin/gpm", ["gpm"], [/* 51 vars */]) = 0
uname({sys="Linux", node="praesto1", ...}) = 0
brk(0) = 0x805a500
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = x40017000
open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=69846, ...}) = 0
old_mmap(NULL, 69846, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40018000
close(3) = 0
open("/lib/i686/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0@\307\1"..., 1024) = 1 4
fstat64(3, {st_mode=S_IFREG|0755, st_size=5779542, ...}) = 0
old_mmap(NULL, 1291464, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x4002a000
mprotect(0x4015c000, 38088, PROT_NONE) = 0
old_mmap(0x4015c000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x 1000) = 0x4015c000
old_mmap(0x40162000, 13512, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_AN YMOUS, -1, 0) = 0x40162000
close(3) = 0
munmap(0x40018000, 69846) = 0
brk(0) = 0x805a500
brk(0x805a680) = 0x805a680
brk(0x805b000) = 0x805b000
open("/var/run/gpm.pid", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0600, st_size=5, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x 018000
read(3, "1411\n", 4096) = 5
close(3) = 0
munmap(0x40018000, 4096) = 0
kill(1411, SIG_0) = 0
open("/dev/tty0", O_WRONLY) = 3
ioctl(3, 0x541c, 0x8056fa0) = 0
close(3) = 0
fork() = 1539
[pid 1539] close(0) = 0
[pid 1539] close(1) = 0
[pid 1539] close(2) = 0
[pid 1539] open("/dev/console", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 0
[pid 1539] setsid() = 1539
[pid 1539] chdir("/") = 0
[pid 1539] umask(022) = 022
[pid 1539] gettimeofday({1069786274, 130636}, NULL) = 0
[pid 1539] getpid() = 1539
[pid 1539] open("/var/run//gpmiF88HQ", O_RDWR|O_CREAT|O_EXCL, 0600) = 1
[pid 1539] open("/var/run//gpmiF88HQ", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 2
[pid 1539] getpid() = 1539
[pid 1539] fstat64(2, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
[pid 1539] mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 1, 0) = 0x40018000
[pid 1539] write(2, "1539\n", 5) = 5
[pid 1539] close(2) = 0
[pid 1539] munmap(0x40018000, 4096) = 0
[pid 1539] link("/var/run//gpmiF88HQ", "/var/run/gpm.pid") = -1 EEXIST (File ists)
[pid 1539] open("/var/run/gpm.pid", O_RDONLY) = 2
[pid 1539] fstat64(2, {st_mode=S_IFREG|0600, st_size=5, ...}) = 0
[pid 1539] mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 1, 0) = 0x40018000
[pid 1539] read(2, "1411\n", 4096) = 5
[pid 1539] unlink("/var/run//gpmiF88HQ") = 0
[pid 1539] brk(0x805e000) = 0x805e000
[pid 1539] time([1069786274]) = 1069786274
[pid 1539] open("/etc/localtime", O_RDONLY) = 3
[pid 1539] fstat64(3, {st_mode=S_IFREG|0644, st_size=1017, ...}) = 0
[pid 1539] mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 1, 0) = 0x40019000
[pid 1539] read(3, "TZif\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\4\0\0\0\4\0".. 4096) = 1017
[pid 1539] close(3) = 0
[pid 1539] munmap(0x40019000, 4096) = 0
[pid 1539] getpid() = 1539
[pid 1539] rt_sigaction(SIGPIPE, {0x401143c0, [], 0x4000000}, {SIG_DFL}, 8) =
[pid 1539] socket(PF_UNIX, SOCK_DGRAM, 0) = 3
[pid 1539] fcntl64(0x3, 0x2, 0x1, 0x40114190) = 0
[pid 1539] connect(3, {sin_family=AF_UNIX, path="/dev/log"}, 16) = -1 ENOENT o such file or directory)
[pid 1539] close(3) = 0
[pid 1539] rt_sigaction(SIGPIPE, {SIG_DFL}, NULL, 8) = 0
[pid 1539] time([1069786274]) = 1069786274
[pid 1539] getpid() = 1539
[pid 1539] rt_sigaction(SIGPIPE, {0x401143c0, [], 0x4000000}, {SIG_DFL}, 8) =
[pid 1539] socket(PF_UNIX, SOCK_DGRAM, 0) = 3
[pid 1539] fcntl64(0x3, 0x2, 0x1, 0x40114190) = 0
[pid 1539] connect(3, {sin_family=AF_UNIX, path="/dev/log"}, 16) = -1 ENOENT o such file or directory)
[pid 1539] close(3) = 0
[pid 1539] rt_sigaction(SIGPIPE, {SIG_DFL}, NULL, 8) = 0
[pid 1539] fstat64(0, {st_mode=S_IFCHR|0600, st_rdev=makedev(5, 1), ...}) = 0
[pid 1539] ioctl(0, 0x5401, {B38400 opost isig icanon echo ...}) = 0
[pid 1539] mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 1, 0) = 0x40019000
[pid 1539] write(0, "gpm: oops() invoked from gpn.c(2"..., 36) = 36
[pid 1539] write(0, "gpm already running as pid 1411:"..., 59) = 59
[pid 1539] munmap(0x40019000, 4096) = 0
[pid 1539] _exit(1) = ?
--- SIGCHLD (Child exited) ---
_exit(0) = ? |
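The trace above shows gpm reading PID 1411 from /var/run/gpm.pid, probing it with kill(1411, SIG_0), and then the child giving up with "gpm already running as pid 1411". The same check can be made by hand before starting a trace. A minimal sketch, with a made-up function name pidfile_status, that should be run as root so the probe is not refused for permission reasons:

```shell
# Report whether the PID recorded in a pid file still belongs to a live process.
# $1 is the pid file, e.g. /var/run/gpm.pid from the trace.
pidfile_status() {
    pidfile=$1
    if [ ! -f "$pidfile" ]; then
        echo "missing"
        return 0
    fi
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only checks whether the PID can be signalled.
    # As a non-root user it also fails for other users' processes, so run as root.
    if kill -0 "$pid" 2>/dev/null; then
        echo "running $pid"
    else
        echo "stale $pid"
    fi
}
```

For example, pidfile_status /var/run/gpm.pid printing "running 1411" would match what the trace shows, namely that a second gpm will refuse to start.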
What are the contents of /etc/crontab (grep /etc/crontab -e "^[\*,0-9]") and of the dirs containing cronjobs (grep /etc/crontab -e run-parts)?
|
First one:
01 * * * * root run-parts /etc/cron.hourly
02 1 * * * root run-parts /etc/cron.daily
22 4 * * 0 root run-parts /etc/cron.weekly
42 4 1 * * root run-parts /etc/cron.monthly |
Second one:
# run-parts
01 * * * * root run-parts /etc/cron.hourly
02 1 * * * root run-parts /etc/cron.daily
22 4 * * 0 root run-parts /etc/cron.weekly
42 4 1 * * root run-parts /etc/cron.monthly
Same thing?? |
Ahhh, OK. Could you list the contents of those four dirs please?
|
Cron.hourly:
[root@praesto1 cron.hourly]# ls -la | more
total 12
drwxr-xr-x 2 root root 4096 Jan 3 2002 .
drwxr-xr-x 59 root root 8192 Nov 25 08:16 ..
Cron.Daily:
[root@praesto1 cron.daily]# ls -la | more
total 44
drwxr-xr-x 2 root root 4096 Nov 6 13:12 .
drwxr-xr-x 59 root root 8192 Nov 25 08:16 ..
-rwxr-xr-x 1 root root 276 Jun 24 2001 0anacron
-rwxr-xr-x 1 root root 51 Sep 4 2001 logrotate
-rwxr-xr-x 1 root root 402 Aug 31 2001 makewhatis.cron
-rwxr-xr-x 1 root root 104 Sep 6 2001 rpm
-rwxr-xr-x 1 root root 132 Oct 24 2002 run_nohup.sh
-rwxr-xr-x 1 root root 132 Jun 24 2001 slocate.cron
-rwxr-xr-x 1 root root 91 Aug 13 2001 sysstat
-rwxr-xr-x 1 root root 193 Nov 28 2001 tmpwatch
Cron.Weekly:
[root@praesto1 cron.weekly]# ls -la | more
total 20
drwxr-xr-x 2 root root 4096 Nov 6 13:12 .
drwxr-xr-x 59 root root 8192 Nov 25 08:16 ..
-rwxr-xr-x 1 root root 277 Jun 24 2001 0anacron
-rwxr-xr-x 1 root root 399 Aug 31 2001 makewhatis.cron
Cron.Monthly:
[root@praesto1 cron.monthly]# ls -la | more
total 16
drwxr-xr-x 2 root root 4096 Nov 6 13:12 .
drwxr-xr-x 59 root root 8192 Nov 25 08:16 ..
-rwxr-xr-x 1 root root 278 Jun 24 2001 0anacron |
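The four listings above can also be gathered in one go instead of changing into each directory. A small sketch, with a hypothetical function name list_cron_dirs, that prints only the executable regular files in each directory, on the assumption that those are the ones run-parts will actually run:

```shell
# List the executable scripts in each run-parts directory under a base dir.
# The base defaults to /etc, matching the /etc/crontab entries in the thread.
list_cron_dirs() {
    base=${1:-/etc}
    for period in hourly daily weekly monthly; do
        dir=$base/cron.$period
        [ -d "$dir" ] || continue
        echo "$dir:"
        for f in "$dir"/*; do
            # only regular files with the execute bit set are printed
            [ -f "$f" ] && [ -x "$f" ] && echo "  $(basename "$f")"
        done
    done
    return 0
}
```

From there each script can be read with more or cat to see what it really does, which matters most for anything unfamiliar.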
Last time you did that, gpm was already running. Do:
killall -9 gpm
rm /var/run/gpm.pid
Then try the strace again. This time it should show what it's missing. |
New Output....
execve("/usr/sbin/gpm", ["gpm"], [/* 51 vars */]) = 0
uname({sys="Linux", node="praesto1", ...}) = 0
brk(0) = 0x805a500
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, x40017000
open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or dire
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=69846, ...}) = 0
old_mmap(NULL, 69846, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40018000
close(3) = 0
open("/lib/i686/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0@\307\1"..., 1024 4
fstat64(3, {st_mode=S_IFREG|0755, st_size=5779542, ...}) = 0
old_mmap(NULL, 1291464, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x4002a
mprotect(0x4015c000, 38088, PROT_NONE) = 0
old_mmap(0x4015c000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 1000) = 0x4015c000
old_mmap(0x40162000, 13512, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|M YMOUS, -1, 0) = 0x40162000
close(3) = 0
munmap(0x40018000, 69846) = 0
brk(0) = 0x805a500
brk(0x805a680) = 0x805a680
brk(0x805b000) = 0x805b000
open("/var/run/gpm.pid", O_RDONLY) = -1 ENOENT (No such file or dire
open("/dev/tty0", O_WRONLY) = 3
ioctl(3, 0x541c, 0x8056fa0) = 0
close(3) = 0
fork() = 1830
_exit(0) = ? |
That's weird... well, it dies, but it didn't write to the log file at all.
My only suggestion is to recompile the newest gpm, or to get the rpm for it. |
# run-parts
01 * * * * root run-parts /etc/cron.hourly (etc)
Is it me or do the times look a bit wonky? I mean (running Vixie-cron) I always thought the fields were: "minutes, hours, day of month, month, day of week"? Not that it will help you solve your problem, but the times they're run at look weird. If you can verify the contents of the other cronjobs are "sane" by your standards, then at least you can say your problem probably hasn't got to do with one of these. The only thing I can't recognize is /etc/cron.daily/run_nohup.sh, but then it's run at such a time that it couldn't be part of killing the box. Definitely a weird problem, especially because of the period you mention. Doesn't sysstat show weird fluctuations in CPU or memory usage over that period? |
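For reference, the field order quoted above (minutes, hours, day of month, month, day of week) can be checked against the posted entries by splitting one apart. Lines in /etc/crontab carry one extra column, the user to run as, before the command. A minimal sketch with an invented function name, describe_cron_line:

```shell
# Label the fields of one /etc/crontab line (system format, with a user column).
describe_cron_line() {
    # read splits on whitespace and does not glob-expand the "*" fields
    printf '%s\n' "$1" | {
        read -r minute hour dom month dow user command
        printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s user=%s command=%s\n' \
            "$minute" "$hour" "$dom" "$month" "$dow" "$user" "$command"
    }
}

describe_cron_line '22 4 * * 0 root run-parts /etc/cron.weekly'
# prints: minute=22 hour=4 day-of-month=* month=* day-of-week=0 user=root command=run-parts /etc/cron.weekly
```

Day-of-week 0 is Sunday, so that entry reads as 04:22 every Sunday. That does fall inside the Friday-to-Monday window from the first post, though on its own it says nothing about whether cron.weekly is the culprit.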