LinuxQuestions.org


marafa 09-26-2008 08:53 AM

ssh_exchange_identification: Connection closed by remote host
 
i have a setup with about 30 SUSE Linux Enterprise Server 10 Service Pack 1 machines and a backup script that in pseudo code looks like:

for server in 1 to 30
do
ssh $server tar -cz $logdir && scp $logdir.tar.gz $central_log_repo &
done

the problem is that while all the servers get the logs tarred, the return trip doesn't always work. random servers log this error:
ssh_exchange_identification: Connection closed by remote host
where of course remote host is the $central_log_repo
there is no trace on $central_log_repo of any connection attempt by the offending server(s)


in summary, on one run server5 could scp the tarfile to $central_log_repo and on another it might fail because of the ssh error. how do i fix this?

trickykid 09-26-2008 10:48 AM

Probably need some more verbose output. Any way to add a -v to see where it might be failing?
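For example (just an illustration using the variable names from the pseudo code; the debug file path is made up), the -v output could be captured per server so it isn't lost when the job runs in the background:

# hypothetical: scp -v writes its debug chatter to stderr, so save it to a file per server
scp -vvv $logdir.tar.gz $central_log_repo 2> /tmp/scp_debug_$server.log

The tail of that file on a failing server should show how far the connection got before the remote end closed it.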

tredegar 09-26-2008 10:56 AM

Quote:

for server in 1 to 30
do
ssh $server tar -cz $logdir && scp $logdir.tar.gz $central_log_repo &
done
In your "script" you are starting 30 tarring jobs across 30 servers all simultaneously. They then try to send back the tarred log.
Maybe the 30 servers are trying to connect to make the scp transfer all at the same time.
Perhaps you have a limit set somewhere, and after a certain number of connections is reached, further connections are refused. If the remote servers all take different times to finish the tar job, the code works. If they all finish at (nearly) the same time, it breaks.

Try man limits.conf and this link: http://www.linuxweblog.com/limit-users-pam
I believe iptables can also set limits. You should look into that too.
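One specific limit worth checking (an assumption on my part, nothing in the thread confirms it yet) is sshd's own throttle on unauthenticated connections on $central_log_repo. "ssh_exchange_identification: Connection closed by remote host" is the typical client-side symptom when sshd drops a connection before the banner exchange, which is what happens once MaxStartups is exceeded:

# on $central_log_repo: see whether the throttle is set (the compiled-in default is small, typically 10)
grep -i maxstartups /etc/ssh/sshd_config

# hypothetical change in /etc/ssh/sshd_config: allow 30 pending unauthenticated
# connections, start dropping randomly above that, refuse everything at 60
MaxStartups 30:30:60

rcsshd restart   # on SLES; /etc/init.d/sshd restart also works

An iptables rule (for example the connlimit match on port 22) could cause the same symptom, so iptables -L -n on the central host is worth a look too.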

trickykid 09-26-2008 11:15 AM

Quote:

Originally Posted by tredegar (Post 3292567)
In your "script" you are starting 30 tarring jobs across 30 servers all simultaneously.

Actually no, that's not correct. The script he posted would do them one at a time. But you shouldn't put an & at the end of the scp command, so you can make sure the scp copy finishes.

If you're just copying logs, you should either set up a loghost to capture these, or create a script that's run via cron on each server.
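A rough sketch of the cron approach (the paths, times and destination below are made up, not taken from the vendor script) -- each server pushes its own tarball at a slightly different minute, so the central host never sees 30 connections in the same second:

# hypothetical /etc/cron.d/logship entry on each server; stagger the minute per host
17 2 * * * root tar -czf /tmp/$(hostname)-logs.tar.gz /var/log/myapp && scp /tmp/$(hostname)-logs.tar.gz backup@central_log_repo:/srv/logs/

A real central loghost (e.g. syslog-ng) avoids the ssh/scp step entirely, but that's a bigger change than tweaking one script.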

tredegar 09-26-2008 11:49 AM

Quote:

Actually no, that's not correct. The script he posted would do them one at a time. But you shouldn't put an & at the end of the scp command, so you can make sure the scp copy finishes.
Thanks for clearing that up.

marafa 09-26-2008 01:08 PM

the script does throw each job into the background. actually, i have to honestly say that
1. the ssh part is more like
ssh $server remote_backup.sh $central_log_repo &
i didn't want to complicate the issue. but all the "master" script on $central_log_repo does is call a secondary remote_backup script and throw it into the background.

2. the script is not mine. it came with a commercial application. the vendor will listen to my recommendations and may code them into the script.

3. i use clusterssh to manage these 30 servers, and if i run scp $central_log_repo/file /tmp/. on all of them at once i see the same error pop up on some servers.

4. /etc/security/limits.conf is all hashed out (every line commented). nothing out of the ordinary as far as i can see.

5. both of you are right: from the output on screen, i see that it does start 30 scp processes all at the same second, more or less, but one after the other.

trickykid 09-26-2008 02:10 PM

Quote:

Originally Posted by marafa (Post 3292672)
5. both of you are right: from the output on screen, i see that it does start 30 scp processes all at the same second, more or less, but one after the other.

I seriously doubt it's an issue with the number of connections. As mentioned before, get more verbose output to see what's causing the connection to fail.

I did a similar test to yours: ssh into one box that kicks off an scp to another. It seems putting an & at the end of the whole connection string puts the initial ssh into the background, not the remote scp.
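To make that concrete (using the pseudo code from the first post, not the real vendor script), where the quoting ends decides which machine runs the scp and what the trailing & actually backgrounds:

# no quotes: the local shell parses the && itself, so the scp runs on the local
# machine and the & backgrounds the whole "ssh then scp" pair as one local job
ssh $server tar -cz $logdir && scp $logdir.tar.gz $central_log_repo &

# quoted: tar and scp both run on the remote host, and the trailing &
# still only backgrounds the local ssh session carrying them
ssh $server "tar -cz $logdir && scp $logdir.tar.gz $central_log_repo" &

Either way the surrounding loop fires off all 30 ssh sessions within a second or two of each other.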

marafa 09-26-2008 04:26 PM

Quote:

Originally Posted by trickykid (Post 3292727)
I seriously doubt it's an issue with the number of connections. As mentioned before, get more verbose output to see what's causing the connection to fail.

I did a similar test to yours: ssh into one box that kicks off an scp to another. It seems putting an & at the end of the whole connection string puts the initial ssh into the background, not the remote scp.

yes that's right .. and part of the script that runs on the remote machine does the scp. so the scp is also in the background inside that remote script.

and from my tests (reducing the number of servers to 8, for example, among other things) the only thing left is the number of connections, but i don't know where else to look.

i would love to fix this and am open to other suggestions

ps. -v -v -v was put on the remote script and i didn't get anything extra, neither on the tty nor in the log

pps. btw, this error also shows up when i use clusterssh to scp from the repo machine to the 30 servers at the same instant, without any script being used (see point 3 above)
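one experiment i could suggest to the vendor, if it really is the number of simultaneous connections (the 2-second stagger below is just a guess, and $server_list stands in for however the real script enumerates the hosts):

for server in $server_list
do
    ssh $server remote_backup.sh $central_log_repo &
    sleep 2   # stagger the launches so the scp connections back to $central_log_repo don't all land in the same second
done
wait          # let every backgrounded ssh finish before the master script exits

if the error disappears with the stagger but comes back without it, that would confirm a connection limit on $central_log_repo rather than a problem in the script itself.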

marafa 09-30-2008 05:06 AM

solution found at: http://archive.netbsd.se/?ml=openssh...7-10&t=5430083

