i am the admin for a linux box at work and was kinda forced into something i was not trained for.
forced isn't really the word for it since i showed interest in it, but what training i received was informal and brief.
anyway, on to my question. the box works great until the W2K box it relies on goes down. when the win is down for more than an hour and is then brought back up, the lin gets flooded and locks up. the only way to fix this is to hard-boot it. i think that doing this several times over the last few months has killed one of the main functions in the lin.
all of the maintenance on the lin is done remotely via Webmin, but sometimes i just want to log in and work in a loving place. when i log out, though, the screen goes black and the lin locks up. this also occurs when trying to reboot from the console. i have no idea where to look to find the problem, nor do i have any idea what the problem could be.
re-installing is a definite NO from the boss, since we house over 1000 html pages and 100 scripts that create more pages and perform functions that are vital to our business. boss doesn't want to run the risk of losing ANYTHING even though we have backups. boss is very anti-linux (very disappointing for me) and is pushing me to work on porting everything to W2K.
i just caught myself rambling/ranting and i am sorry.
if anyone knows the answer to the reboot/logout locking problem please help. any ideas will help. thanks. (please no smart donkey responses)
thanks in advance.
joe
I seem to have this problem too, especially when using a realtek chipset. the only way i have found so far to remedy it with realtek (driver 8139too.o) is to set the card to run at 10 Mbps instead of 100, using mii-tool.
eg
ifconfig eth0 down
mii-tool -A 10baseT-FD eth0
ifconfig eth0 up
This will solve the problem for that chipset. (note i experienced this problem on rh 8, slackware 9 rc2 and CRUX)
(but this could also be down to my setup as i'm running a very old laptop with a 16 bit pcmcia bus, but this sorted my problems out)
what exactly is the relationship between the
win-machine and Linux-box, how does the
Linux one depend on the presence of the
windows thing?
One idea (if this is a specific problem of
one service) would be to run a cron job that
checks for the presence of the win-machine
every 5 minutes (ping, connect to the SQL
database, whatever you are doing on it). If it's
gone, the job switches the service on Linux
off, and switches it back on in a later cycle
once the machine is up again.
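A rough sketch of that idea follows. The host name, the flag file and the stop/start steps are placeholders, since the thread does not say which service is involved, and the ping options assume the Linux iputils ping.

```shell
#!/bin/sh
# Sketch only: WIN_HOST and FLAG_FILE are hypothetical placeholders.
WIN_HOST="${WIN_HOST:-winbox.example}"
FLAG_FILE="${FLAG_FILE:-/var/run/winbox.down}"

# Succeeds if the host answers one ping within 5 seconds.
host_is_up() {
    ping -c 1 -w 5 "$1" >/dev/null 2>&1
}

# Pure decision step.
# $1 = host state (up|down), $2 = service state (running|stopped).
decide_action() {
    case "$1:$2" in
        down:running) echo stop ;;
        up:stopped)   echo start ;;
        *)            echo none ;;
    esac
}

# One watchdog cycle. The flag file records that the service was switched off.
watchdog_main() {
    if host_is_up "$WIN_HOST"; then host=up; else host=down; fi
    if [ -e "$FLAG_FILE" ]; then svc=stopped; else svc=running; fi
    case "$(decide_action "$host" "$svc")" in
        stop)  touch "$FLAG_FILE" ;;  # a real script would also stop the service here
        start) rm -f "$FLAG_FILE" ;;  # a real script would also start it again here
    esac
}
```

To use it, call watchdog_main at the end of the script and run the script from cron every 5 minutes, for example with a crontab entry like `*/5 * * * * /usr/local/bin/winbox-watchdog.sh` (the path is made up).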
tink,
this is the set up. we have the two boxes. the win is the server that houses our public page, which kinda mirrors the lin. the lin is used to download weather images from different agencies as well as text forecasts and observations. (if i get too weather technical please forgive) we download this every minute and store each for one month. we interface with the lin via http, filling out forms and making weather related products that we monitor thru the lin.
now this is where the win comes in. the win is mounted under / for an easy transfer of data. every time a forecast/observation is downloaded, a copy is created and saved to the win so that when someone from the outside (outside of our office) wishes to see a specific forecast for a specific airport or city, they have an ever-ready** source.
now as far as the win going down and the lin choking following the reboot, i am not sure if this is a coincidence or not, but i am guessing that when the win comes back up the lin is flooded. by what, i am not sure. this is only theory and no evidence is there on the true problem. so this is only a problem that we get when the win goes down.
as far as checking for one service choking the lin, i don't think so. the crontab has about 30 or so jobs, varying from every minute to once an hour to once every couple of hours. not all of them run to the win. some run straight to the web, some go via ftp, and i have a couple mirroring different sites. i believe all but two are running perl scripts. with the win mounted i am not sure how to check to see if it is still mounted, and if i do get that part i don't know how to shut down the specific cron jobs.
** I work on an Air Force base and sometimes the main connection to the internet is disconnected. we download all of the information so that we can still perform our duty even without internet access.
sorry for the length of this post, but you asked for the relationship of the two boxes and in order to explain i had to give a slight background first.
Quote:
sorry for the length of this post, but you asked for the relationship of the two boxes and in order to explain i had to give a slight background first.
Not to worry, there's no such thing as too much
information :)
Quote:
now this is where the win comes in. the win is mounted under / for an easy transfer of data. every time a forecast/observation is downloaded then a copy is created and saved to the win
Hmm ... so the windows machine is not really actively
talking at the Linux box, is it?
Quote:
this is only theory and no evidence is there on the true problem. so this is only a problem that we get when the win goes down.
Did you look at /var/log/messages, debug,
syslog for any indication about what's going
on during the return of the win-box? Maybe
even your cron-jobs are spitting out info
into some sort of log?
Quote:
with the win mounted i am not sure how to check to see if it is still mounted, and if i do get that part i don't know how to shut down the specific cron jobs.
The easiest thing would be to ping it (if it
completely dies, that is ... ) and then unmount
the mount-point if you don't get a response.
You'll have to find out which cron-jobs do
the copying, though, and build some sort
of check into them, as well, like "only copy
if mount lists the smb-share".
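As a sketch of such a check: the function below looks for the mount point in the kernel's mount table instead of parsing the output of mount. The mount point /mnt/win in the usage example is made up, and matching the filesystem types smbfs and cifs is an assumption about how the share was mounted.

```shell
#!/bin/sh
# Succeeds only if $1 is currently mounted as an SMB share.
# $2 = mount table to read (optional, defaults to /proc/mounts).
share_is_mounted() {
    awk -v mp="$1" '
        $2 == mp && ($3 == "smbfs" || $3 == "cifs") { found = 1 }
        END { exit !found }
    ' "${2:-/proc/mounts}"
}

# Copy $1 into mount point $2, but only while the share is mounted.
# $3 = mount table to read (optional).
copy_if_mounted() {
    if share_is_mounted "$2" "$3"; then
        cp "$1" "$2"/
    else
        echo "skipping copy: $2 is not a mounted SMB share" >&2
        return 1
    fi
}
```

A cron script could then call something like `copy_if_mounted forecast.txt /mnt/win` instead of a bare cp, so the copy is skipped rather than left hanging or written to the local disk when the share is gone.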
a response on what the log file says may take a while. these crashes aren't as regular as the normal win box. sometimes it takes a whole month before it crashes..... lol
but yes i will try and remember to keep you posted.
Quote:
boss is very anti-linux (very disappointing for me) and is pushing me to work on porting everything to W2K.
Quote:
these crashes aren't as regular as the normal win box. sometimes it takes a whole month before it crashes..... lol
I forgot to respond to this in the first place ;)
Tell your boss to remove the cause of the
problem, not the victim ;)
Anyway, I hope you'll be able to work around
it with a little support from LQ and then point
out to your boss that a solution to the problem
on the other end of that line (the winDOHs box)
would have cost him megabuxx ;)
ok, had to hard boot the linux box for the first time in a long while. don't know what caused the problem, but according to /var/log/messages it looks like it ran out of memory.
here is an excerpt that covers the time span that the box was down.
Apr 24 17:46:09 lfi-ws-linuxwx kernel: Out of Memory: Killed process 1033 (mysqld).
Apr 24 17:46:14 lfi-ws-linuxwx kernel: Out of Memory: Killed process 1850 (missionboard.cg).
Apr 24 17:46:19 lfi-ws-linuxwx sshd(pam_unix)[2009]: session closed for user joe
Apr 24 17:46:35 lfi-ws-linuxwx kernel: Out of Memory: Killed process 1950 (missionboard.cg).
Apr 24 17:47:04 lfi-ws-linuxwx ftpd[2098]: connection from lfi-ws-aosf1.langley.af.mil [131.6.72.47]
Apr 24 17:47:10 lfi-ws-linuxwx ftpd[2098]: FTP session closed
Apr 24 17:47:22 lfi-ws-linuxwx kernel: Out of Memory: Killed process 1953 (missionboard.cg).
Apr 24 17:47:30 lfi-ws-linuxwx kernel: Out of Memory: Killed process 1980 (missionboard.cg).
Apr 24 17:47:51 lfi-ws-linuxwx kernel: Out of Memory: Killed process 30631 (httpd).
Apr 24 17:49:48 lfi-ws-linuxwx kernel: Out of Memory: Killed process 30631 (httpd).
Apr 24 17:50:01 lfi-ws-linuxwx kernel: Out of Memory: Killed process 32059 (httpd).
Apr 24 17:50:01 lfi-ws-linuxwx kernel: Out of Memory: Killed process 32422 (httpd).
Apr 24 17:50:01 lfi-ws-linuxwx kernel: Out of Memory: Killed process 32424 (httpd).
Apr 24 17:50:01 lfi-ws-linuxwx kernel: Out of Memory: Killed process 32538 (httpd).
Apr 24 17:50:01 lfi-ws-linuxwx kernel: Out of Memory: Killed process 32669 (httpd).
Apr 24 17:50:01 lfi-ws-linuxwx kernel: Out of Memory: Killed process 32670 (httpd).
Apr 24 17:50:01 lfi-ws-linuxwx kernel: Out of Memory: Killed process 1708 (httpd).
Apr 24 17:50:01 lfi-ws-linuxwx kernel: Out of Memory: Killed process 31994 (httpd).
Apr 24 17:50:01 lfi-ws-linuxwx kernel: Out of Memory: Killed process 32423 (httpd).
Apr 24 17:50:01 lfi-ws-linuxwx kernel: Out of Memory: Killed process 1103 (httpd).
Apr 24 17:50:01 lfi-ws-linuxwx kernel: Out of Memory: Killed process 902 (xfs).
Apr 24 17:50:03 lfi-ws-linuxwx kernel: Out of Memory: Killed process 1868 (convert).
Apr 24 17:50:14 lfi-ws-linuxwx last message repeated 2 times
Apr 24 17:50:22 lfi-ws-linuxwx kernel: Out of Memory: Killed process 29694 (smbd).
Apr 24 17:50:24 lfi-ws-linuxwx kernel: Out of Memory: Killed process 1907 (convert).
Apr 24 17:50:53 lfi-ws-linuxwx kernel: Out of Memory: Killed process 2120 (httpd).
Apr 24 17:51:00 lfi-ws-linuxwx kernel: Out of Memory: Killed process 2127 (httpd).
Apr 24 17:51:33 lfi-ws-linuxwx kernel: Out of Memory: Killed process 2132 (httpd).
Apr 24 17:51:37 lfi-ws-linuxwx kernel: Out of Memory: Killed process 2140 (httpd).
Apr 24 17:51:40 lfi-ws-linuxwx kernel: Out of Memory: Killed process 2141 (httpd).
Apr 24 17:51:50 lfi-ws-linuxwx kernel: Out of Memory: Killed process 2142 (httpd).
Apr 24 17:52:23 lfi-ws-linuxwx kernel: Out of Memory: Killed process 2216 (missionboard.cg).
Apr 24 17:53:06 lfi-ws-linuxwx kernel: Out of Memory: Killed process 2147 (httpd).
Apr 24 17:53:18 lfi-ws-linuxwx kernel: Out of Memory: Killed process 2229 (httpd).
Apr 24 17:53:44 lfi-ws-linuxwx kernel: Out of Memory: Killed process 2243 (missionboard.cg).
Apr 24 17:53:51 lfi-ws-linuxwx kernel: Out of Memory: Killed process 2143 (httpd).
Apr 24 17:53:59 lfi-ws-linuxwx kernel: Out of Memory: Killed process 2149 (httpd).
Apr 24 18:12:17 lfi-ws-linuxwx syslogd 1.4-0: restart.
apr 24 18:12 was when i got it back up.
do you know why this would happen (Apr 24 17:53:59 lfi-ws-linuxwx kernel: Out of Memory: Killed process *)?? i am not sure, unless RH 7.1 has a memory leak. any suggestions????
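One way to see which programs the kernel killed most often is to count the "Out of Memory" lines per process name. This is only a sketch: it assumes the line format shown in the excerpt above, and it shows what was killed, which is not necessarily what used up the memory.

```shell
#!/bin/sh
# Count OOM kills per process name, most frequent first.
# $1 = log file, e.g. /var/log/messages
count_oom_kills() {
    awk '/Out of Memory: Killed process/ {
             name = $NF                       # last field, e.g. "(httpd)."
             gsub(/^\(|[().]+$/, "", name)    # strip the "(" and the trailing ")."
             count[name]++
         }
         END {
             for (n in count) printf "%d %s\n", count[n], n
         }' "$1" | sort -rn
}
```

Running `count_oom_kills /var/log/messages` prints one line per process name with its kill count in front.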
Apr 23 10:59:42 lfi-ws-linuxwx last message repeated 2 times
Apr 23 10:59:42 lfi-ws-linuxwx ftpd[26534]: CWD /var/www/cgi-bin
Apr 23 10:59:42 lfi-ws-linuxwx ftpd[26534]: PWD
Apr 23 10:59:42 lfi-ws-linuxwx last message repeated 2 times
Apr 23 10:59:42 lfi-ws-linuxwx ftpd[26534]: TYPE ASCII
Apr 23 10:59:42 lfi-ws-linuxwx ftpd[26534]: PORT
Apr 23 10:59:42 lfi-ws-linuxwx ftpd[26534]: STOR newmetwatchtool.cgi
Apr 23 10:59:42 lfi-ws-linuxwx ftpd[26534]: xferlog (recv): 1 lfi-ws-aosf2.langley.af.mil 84345 /var/www/cgi-bin/newmetwatchtool.cgi a _ i r lisa ftp 0 * c
Apr 23 11:00:27 lfi-ws-linuxwx ftpd[26534]: QUIT
Apr 23 11:00:27 lfi-ws-linuxwx ftpd[26534]: FTP session closed
Apr 23 11:00:30 lfi-ws-linuxwx su(pam_unix)[18853]: session closed for user root
Apr 23 11:00:35 lfi-ws-linuxwx sshd(pam_unix)[18807]: session closed for user lisa
Apr 23 16:31:50 lfi-ws-linuxwx ucd-snmp[1077]: Received SNMP packet(s) from 131.6.8.214
Apr 23 16:31:51 lfi-ws-linuxwx kernel: smb_request: result -104, setting invalid
Apr 23 16:31:51 lfi-ws-linuxwx kernel: smb_retry: successful, new pid=939, generation=12
Apr 23 16:31:52 lfi-ws-linuxwx kernel: hdd: No disk in drive
Apr 23 16:31:53 lfi-ws-linuxwx kernel: end_request: I/O error, dev 02:00 (floppy), sector 0
Apr 24 04:02:17 lfi-ws-linuxwx kernel: smb_trans2_request: result=-104, setting invalid
Apr 24 04:02:18 lfi-ws-linuxwx kernel: smb_retry: successful, new pid=939, generation=13
Apr 24 04:23:30 lfi-ws-linuxwx su(pam_unix)[16683]: session opened for user news by (uid=0)
Apr 24 04:23:31 lfi-ws-linuxwx su(pam_unix)[16683]: session closed for user news
Apr 24 15:53:50 lfi-ws-linuxwx ftpd[31406]: connection from lfi-ws-aosf1.langley.af.mil [***.***.***.***]
Apr 24 15:53:51 lfi-ws-linuxwx ftpd[31406]: USER joe
Apr 24 15:53:51 lfi-ws-linuxwx ftpd[31406]: PASS password
Apr 24 15:53:53 lfi-ws-linuxwx ftpd[31406]: FTP LOGIN FROM lfi-ws-aosf1.langley.af.mil [***.***.***.***], joe
Apr 24 15:53:53 lfi-ws-linuxwx ftpd[31406]: SYST
Apr 24 15:53:53 lfi-ws-linuxwx ftpd[31406]: PWD
Apr 24 15:53:53 lfi-ws-linuxwx last message repeated 2 times
Apr 24 15:53:53 lfi-ws-linuxwx ftpd[31406]: CWD /var/www
Apr 24 15:53:53 lfi-ws-linuxwx ftpd[31406]: PWD
Apr 24 15:53:54 lfi-ws-linuxwx last message repeated 2 times
Apr 24 15:53:54 lfi-ws-linuxwx ftpd[31406]: TYPE ASCII
Apr 24 15:53:54 lfi-ws-linuxwx ftpd[31406]: PORT
Apr 24 15:53:54 lfi-ws-linuxwx ftpd[31406]: LIST
Apr 24 15:53:57 lfi-ws-linuxwx ftpd[31406]: PWD
Apr 24 15:53:57 lfi-ws-linuxwx ftpd[31406]: CWD html
Apr 24 15:53:57 lfi-ws-linuxwx ftpd[31406]: PWD
Apr 24 15:53:57 lfi-ws-linuxwx last message repeated 2 times
Apr 24 15:53:57 lfi-ws-linuxwx ftpd[31406]: TYPE ASCII
Apr 24 15:53:57 lfi-ws-linuxwx ftpd[31406]: PORT
Apr 24 15:53:57 lfi-ws-linuxwx ftpd[31406]: LIST
Apr 24 15:54:03 lfi-ws-linuxwx ftpd[31406]: PWD
Apr 24 15:54:03 lfi-ws-linuxwx ftpd[31406]: CWD htdig
Apr 24 15:54:04 lfi-ws-linuxwx ftpd[31406]: PWD
Apr 24 15:54:04 lfi-ws-linuxwx last message repeated 2 times
Apr 24 15:54:04 lfi-ws-linuxwx ftpd[31406]: CWD /var/www/html
Apr 24 15:54:04 lfi-ws-linuxwx ftpd[31406]: PWD
Apr 24 15:54:16 lfi-ws-linuxwx last message repeated 2 times
Apr 24 15:54:16 lfi-ws-linuxwx ftpd[31406]: CWD /home/joe
Apr 24 15:54:16 lfi-ws-linuxwx ftpd[31406]: PWD
Apr 24 15:54:16 lfi-ws-linuxwx last message repeated 2 times
Apr 24 15:54:16 lfi-ws-linuxwx ftpd[31406]: CWD /var/www/html
Apr 24 15:54:17 lfi-ws-linuxwx ftpd[31406]: PWD
Apr 24 15:54:17 lfi-ws-linuxwx last message repeated 2 times
Apr 24 15:54:17 lfi-ws-linuxwx ftpd[31406]: TYPE ASCII
Apr 24 15:54:17 lfi-ws-linuxwx ftpd[31406]: PORT
Apr 24 15:54:17 lfi-ws-linuxwx ftpd[31406]: RETR index.htm
Apr 24 15:54:17 lfi-ws-linuxwx ftpd[31406]: xferlog (send): 1 *****computer name******5395 /var/www/html/index.htm a _ o r joe ftp 0 * c
Apr 24 15:54:33 lfi-ws-linuxwx ftpd[31406]: QUIT
Apr 24 15:54:33 lfi-ws-linuxwx ftpd[31406]: FTP session closed
Apr 24 17:45:54 lfi-ws-linuxwx sshd(pam_unix)[2009]: session opened for user joe by (uid=0)
this is the messages file from before the out of memory section. it goes back one day. btw ******computer name***** = deleted for security reasons, and the ip addresses have been deleted for the same reason
A few questions,
How much ram do you have?
How big is the swap?
What type of CPU and how many?
Is the machine running the distro in your profile (man9)?
Are you running X on this box and if so what version and what type of video hardware?
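The RAM, swap and CPU questions can usually be answered from /proc on the box itself. A small sketch, assuming the usual MemTotal:/SwapTotal: lines in /proc/meminfo and one "processor" entry per CPU in /proc/cpuinfo (the function names are made up):

```shell
#!/bin/sh
# Print total RAM and total swap.
# $1 = meminfo-style file (optional, defaults to /proc/meminfo).
mem_summary() {
    awk '$1 == "MemTotal:" || $1 == "SwapTotal:" { print $1, $2, $3 }' \
        "${1:-/proc/meminfo}"
}

# Print the number of CPUs the kernel sees.
# $1 = cpuinfo-style file (optional, defaults to /proc/cpuinfo).
cpu_count() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}
```

Calling `mem_summary` and `cpu_count` with no arguments reads the live files; the CPU type is on the "model name" lines of /proc/cpuinfo.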