LinuxQuestions.org
02-07-2012, 08:37 AM   #1
EnderX
Member
 
Registered: Nov 2006
Posts: 64

Rep: Reputation: 15
SuSE 11 server - ssh/scp apparently failing at odd times - probable log locations?


Our main server is running SuSE 11 (not sure exactly which version), and normally things work well. At odd times, however, it seems to give up on ssh/scp connections. I recognize I'm probably not explaining this well, but I'm not really sure what's going on.

The major symptom shows up in our remote system database replication. The Perl script that does the replicating uses Net::SCP::Expect to pull the replication logs from our main system over to the remote systems. When the problem occurs, the logs show as having been built, but no remote system can pick them up. The word 'remote' here is slightly misleading: a copy of the same 'remote' setup is present on the main system to make copies of the remote db if need be, and that also fails to pick up the logs. Running the process manually causes it to appear to freeze briefly, then issue a message on the next step saying that the expected file isn't present.

The second symptom is that when the problem occurs, the main server can be pinged but cannot be reached by ssh (tried from PuTTY on a Windows XP system). The message returned is 'connection refused'.

I know how to 'solve' the problem: resetting the machine (full shutdown and manual restart) at the system itself brings it back up, and the problem is gone. However, that doesn't protect it from the next time the problem occurs. It's the underlying cause that I'm interested in tracking down at the moment.

Given this, which logs are the likeliest places to start looking for why this might be happening? I don't know enough about how the various system logs are built and maintained to really know where I should be looking.
 
02-08-2012, 02:40 AM   #2
acid_kewpie
Moderator
 
Registered: Jun 2001
Location: UK
Distribution: Gentoo, RHEL, Fedora, Centos
Posts: 43,417

Rep: Reputation: 1975
/var/log/secure and /var/log/messages would be your boys here. As I read it, both of the symptoms look like they are the same thing. Can you telnet to port 22 when it is in this state? That would be a useful demarcation point to know if there is still anything listening in the first place. (ssh -v to the server will also tell you if it is at least connecting).

Why are you rebooting the box instead of trying to restart the sshd service? Is this a lack of knowledge or does something make that impossible?
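
For anyone who wants to script the port 22 check from another Linux box, here is a rough sketch. It rests on a few assumptions: `main-server` is a placeholder hostname, `check_port` is a helper name made up for this example, and it relies on bash's /dev/tcp feature plus the coreutils `timeout` command being available.

```shell
#!/bin/bash
# check_port is a hypothetical helper: it succeeds (exit status 0) only if
# a TCP connection to HOST:PORT can be opened within 5 seconds.
check_port() {
    local host=$1 port=$2
    # bash treats /dev/tcp/HOST/PORT in a redirection as "open a TCP socket".
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# main-server is a placeholder; substitute the real hostname or IP.
if check_port main-server 22; then
    echo "port 22 is accepting connections"
else
    echo "nothing reachable on port 22 (sshd may not be running)"
fi
```

If the port turns out to be closed while ping still works, restarting only the daemon from the console is the next thing to try. On SuSE 11 that should be `/etc/init.d/sshd restart` (or the `rcsshd restart` shortcut) as root, but verify the script name on your own system. `ssh -v user@main-server` from the client then shows how far a connection attempt gets.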
 
02-08-2012, 04:58 AM   #3
jbradshaw
LQ Newbie
 
Registered: Feb 2012
Location: County Antrim, UK
Distribution: CentOS
Posts: 1

Rep: Reputation: Disabled
If you're using a VPN (especially PPTP), check the MTU on the tunnel interface (ppp0 or similar). We had problems with SSH/SCP dropping on large file transfers and edits because the PPTP packet couldn't fit in a 1500-byte Ethernet frame.

Dropping the MTU of the ppp0 interface (which is tunneled over eth0) to 1400 did the trick...
/sbin/ifconfig ppp0 mtu 1400
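
To see whether MTU is really the problem before changing anything, one option is to send pings that are not allowed to fragment. The sketch below only works out the payload size to use: an ICMP echo payload is the MTU minus 28 bytes (20-byte IPv4 header plus 8-byte ICMP header). `ping_payload_for_mtu` is a name invented for this example and `remote-host` is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: largest ICMP echo payload that fits in a single
# packet of the given MTU (20-byte IPv4 header + 8-byte ICMP header).
ping_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

ping_payload_for_mtu 1500   # prints 1472
ping_payload_for_mtu 1400   # prints 1372

# With the Linux iputils ping, -M do forbids fragmentation, so a payload
# too large for the path fails instead of being split up.
# remote-host is a placeholder:
#   ping -M do -c 3 -s "$(ping_payload_for_mtu 1400)" remote-host
```

If the 1472-byte ping fails across the tunnel while the 1372-byte one gets through, a lower MTU is likely to help. Keep in mind that an MTU set with ifconfig only lasts until the interface goes down, so it probably needs to be reapplied whenever the tunnel reconnects.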
 
02-08-2012, 07:57 AM   #4
EnderX
Member
 
Registered: Nov 2006
Posts: 64

Original Poster
Rep: Reputation: 15
@acid_kewpie:

I was turning off the system out of ignorance of the situation; I did not know which service needed to be restarted, but I did know that restarting everything was guaranteed to restart what was needed.

Thank you for your suggestions as to where to look. The messages log indicated that sshd hit the oom_killer within the time range where the problem is known to have occurred. Tracking that through the other logs on the system, it looks like other processes have done the same thing at around the same time on other nights. I now suspect that the problem (or the final piece of it, anyway) is our database's vacuum process, which is called daily about 10 minutes before the oom_killer is triggered. Given that three or four other processes are also running against the database within that time window, I suspect that adding the vacuum simply pushes memory usage past the limit.
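
For anyone chasing something similar, this is roughly how the out-of-memory lines can be pulled out of the log. It is a sketch, not exactly what was run here: `oom_events` is a made-up helper name, the sample log lines are invented for illustration, and the pattern is kept loose because the kernel's wording differs between versions.

```shell
#!/bin/sh
# Hypothetical helper: print lines from a syslog file that look like
# OOM-killer activity. Case-insensitive and deliberately loose, since the
# exact kernel messages vary by version.
oom_events() {
    grep -iE 'oom[-_]killer|out of memory|killed process' "$1"
}

# Invented sample lines, only to show the filter working. On the server
# itself the call would be:  oom_events /var/log/messages
sample=$(mktemp)
cat > "$sample" <<'EOF'
Feb  7 02:10:01 main-server /usr/sbin/cron[4321]: (dbuser) CMD (nightly-vacuum)
Feb  7 02:20:13 main-server kernel: perl invoked oom-killer: gfp_mask=0x201d2, order=0
Feb  7 02:20:13 main-server kernel: Out of memory: kill process 2345 (sshd) score 1234 or a child
Feb  7 02:20:13 main-server kernel: Killed process 2345 (sshd)
EOF
oom_events "$sample"
rm -f "$sample"
```

Comparing the timestamps of the matching lines against the cron schedule is what points at the vacuum window. Logging `free -m` once a minute across that window should show how close memory gets to the limit.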
 
  


All times are GMT -5. The time now is 11:45 AM.
