[SOLVED] rsh problem: command does not complete

Tiphys · 07-01-2011, 09:44 AM

Hello,

I am using rsh to execute programs on a cluster. I want to automate a process: instead of logging in to each node separately and executing a program, I want to do it remotely, something like:

rsh t01 sh my_script.sh

where the script has lots of commands and specific C programs in it. The problem is that, although my_script.sh completes execution (I checked with top and with the files that the script uses, no problem there), the whole rsh command just hangs and does not exit for a long time (hours); interrupting and aborting it does not work.

I tried the -n option with rsh, no luck there. I also tried running it as a background job, no luck either. What am I doing wrong here? Can it be because of unclosed file descriptors in the C programs that I use in my script? I do not think so, because these programs exit successfully and, as far as I know, all open file descriptors are closed upon exit.

The full rsh command I use is:

rsh t01 sh my_script.sh > log_file 2>&1

Your help is appreciated.

MensaWater · 07-01-2011, 10:52 AM

Quit using rsh for starters. It is a very insecure protocol. You should use ssh instead. With ssh you can set up trusts between the machines as you can with rsh and the other r commands, but this trust uses keys rather than hostnames or IPs, so it can't be spoofed as easily.

With ssh you can do a command line like:
ssh t01 'sh my_script.sh > log_file 2>&1'
to make it create a log on the remote host.
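The quoting is what decides where the log ends up. Below is a minimal sketch that shows this without any remote host: fake_remote is a hypothetical stand-in that, like rsh and ssh, joins its arguments with spaces and hands them to a shell running somewhere else (here, simply another directory). The directory names are made up for illustration.

```shell
#!/bin/sh
# Stand-in for "ssh host ..." / "rsh host ...": join the arguments with spaces
# and pass them to a shell that runs in a different place (another directory).
remote_dir=$(mktemp -d)   # hypothetical "remote host" working directory
local_dir=$(mktemp -d)    # hypothetical local working directory

fake_remote() { ( cd "$remote_dir" && sh -c "$*" ); }

cd "$local_dir" || exit 1

# Unquoted: the local shell handles the redirect, so the log is written locally.
fake_remote echo hello > local_log 2>&1

# Quoted: the redirect travels with the command, so the log is written "remotely".
fake_remote 'echo hello > remote_log 2>&1'

echo "local:  $(ls "$local_dir")"
echo "remote: $(ls "$remote_dir")"
```

With the original unquoted form, rsh itself still has to carry the output back to the local log file, so it keeps waiting on the remote stdout/stderr.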

aysheaia · 07-01-2011, 03:04 PM

Quote:

Originally Posted by Tiphys

rsh t01 sh my_script.sh
The problem is that [...] the whole rsh command just hangs and does not exit for a long time (for hours)

Quote:

Originally Posted by MensaWater

You should use ssh instead.

Funnily, I often have this kind of problem with ssh. Do you execute background jobs remotely?
If so, there is an explanation of the problem at http://www.snailbook.com/faq/background-jobs.auto.html
I presume that rsh could have the same kind of problem.

The solution is given on that page: "redirect the background process stdin/stdout/stderr streams (e.g. to files, or /dev/null if you don't care about them)."
The drawback is that you will no longer have your logs in a local file. But you could get them back with scp.
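The effect can be reproduced locally, without rsh or ssh. In this minimal sketch a pipe reader (cat) plays the role of the rsh/ssh client: it only returns once every process holding the stream has closed it. The 3-second sleep is an arbitrary stand-in for a long-running background job.

```shell
#!/bin/sh
# cat stands in for rsh/ssh: it reads until every writer has closed the pipe.

# Case 1: the background job inherits stdout, so the reader waits for it
# even though the launching shell exits immediately.
start=$(date +%s)
sh -c 'sleep 3 &' | cat
held=$(( $(date +%s) - start ))

# Case 2: the background job has stdin/stdout/stderr redirected, so the pipe
# closes as soon as the launching shell exits and the reader returns at once.
start=$(date +%s)
sh -c 'sleep 3 </dev/null >/dev/null 2>&1 &' | cat
released=$(( $(date +%s) - start ))

echo "inherited stdout: ${held}s, redirected: ${released}s"
```

The first case should take about as long as the sleep, the second should return almost immediately.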

Tiphys · 07-02-2011, 03:28 AM

Well, I prefer ssh too, however it is not installed on the nodes. Installing ssh is the last resort for me at this point. I use ssh only to connect to the master node, then I use rsh for remote execution on the other nodes. It is a warewulf cluster by the way, which is good but old.

What is interesting is that, if I run my code without redirections, the problem still occurs, the logs are printed to stdout and stderr, the programs and commands complete successfully, then it just hangs again at the end of the script. So I guess these redirections I use for logging are not the problem.

Reuti · 07-02-2011, 04:47 AM

Are the started background processes still listed in the bash builtin jobs? You can try to remove them from there with disown. Maybe it helps.

Tiphys · 07-02-2011, 06:41 AM

OK, I solved my problem. The cause is lamboot, which I use in my script: lamboot does not close its file descriptors when given the -v option. The lamboot manpage says:

Quote:

Closing stdio

The stdio of each LAM daemon on a remote host that is launched by lamboot is closed by default. Normally, the stdio of the LAM daemon launched on the local host is left open so that the internal LAM tstdio(3) package works properly. However, it is sometimes desirable to close the stdio of the local LAM daemon as well. For example:

rsh somenode lamboot -s hostfile

This is because rsh waits for two conditions before exiting: lamboot to exit, and stdout / stderr to be closed. Without -s, stdout / stderr would not be closed, and rsh (and ssh) will hang even though lamboot had completed. -s causes the stdout / stderr of the local LAM daemon to be closed upon invocation, which will allow rsh to complete. Using -s will not affect lamboot in any other way, but it will prevent the tstdio(3) package from working properly.

That was exactly what I was looking for, since it happens exactly that way: my rsh command hangs even though the lamboot command seems to complete successfully. As the manpage says, without -s, stdout / stderr would not be closed, and rsh (and ssh) will hang even though lamboot had completed. I just used the -s option and it worked.

I am posting this in case someone faces the same problem.

Thanks for your suggestions.

Reuti · 07-02-2011, 07:05 AM

Are you speaking about LAM/MPI? Unless you have a legacy application, I would suggest moving to Open MPI or MPICH2. Both now operate without the need of any daemon. LAM/MPI has been EOL for some time now.

Tiphys · 07-02-2011, 02:28 PM

Yeah, I would like to move on to Open-MPI + Beowulf cluster which is very comfortable to run parallel applications without any headaches, and I know LAM/MPI is only in maintenance state and they are collaborating with Open-MPI.

This is the wise and logical thing to do, however, you know, sometimes you do not have enough control over your supervisor in some subjects, and this is one of them unfortunately.

MensaWater · 07-05-2011, 08:33 AM

So how old is this setup that it doesn't allow for ssh which has been around for many many years? Are you sure you even have to "install" it? It may already be there and simply not be in use.

jhumkey · 03-19-2014, 09:35 AM

There are times rsh is "better suited" than ssh.

Yes, I know ssh is "more secure" but . . . we have hosts that run HACMP on AIX (similar things exist on Linux) . . . basically a pair of hosts operate as one. One is primary (doesn't matter which one) and the other watches the main. If the main dies for any reason, the secondary takes over and answers to the one and only common hostname.

For example (NY = New York, where the system is located; abc is the application/host) . . . so from far away, when I 'rsh NYabc' I get either NYabcA or NYabcB, whichever is primary at the moment (I don't care which it is). If NYabcA is primary and pretending to be NYabc to the outside world and NYabcA dies . . . NYabcB seamlessly takes over and becomes NYabc as far as the rest of the world knows. HACMP provides this monitoring for life and the automated failover.

So why is rsh better than ssh (in this example)? Because . . . ssh will realize the underlying host has changed and FAIL with "HOST Identification has changed, Someone is doing something Nasty" messages. ssh (unfortunately in this case) determines the underlying host has changed for that common NYabc name, and balks at completing the connection. With rsh . . . since the .rhosts file exists on both underlying computers (NYabcA and NYabcB) . . . the 'rsh NYabc' works, without being puzzled by the underlying host change.

So, in a "high availability" automatic host swapping environment . . . if you're reaching out from one central controller to multiple HA-nodes, where the underlying host at that HA-node can change . . . rsh/rcp actually works better than ssh. These (master and all HA-children) are all on a relatively secure INTRAnet and not exposed to the world at large, so there is limited actual chance of a true interloper.

I realize this is an old thread . . . but I've experienced the same issue: 'rsh XXabc' just hangs forever. (I don't have root on these systems.) Last time this happened, someone had changed (and broken) the /etc/hosts file on the remote host that I was trying to attach to. As a result . . . I could nslookup and ping from the source host, but the destination host couldn't properly reverse-lookup the name of the source . . . and that blocks rsh/rcp/rlogin from working.

So . . . make sure /etc/hosts and name lookups are working for the return path of destination back to initiating host. See if that is causing your issue. (I realize the original poster/issue is long gone . . . but for those with "rsh hanging" issues . . . this might help.)
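A minimal sketch of such a check, under loud assumptions: hosts_roundtrip is a hypothetical helper, it only inspects a hosts-format file (not DNS or any other name service), and it treats the first name on the first line for an address as the name a reverse lookup would return. For lookups that go through DNS, nslookup in both directions on both hosts remains the direct test.

```shell
#!/bin/sh
# hosts_roundtrip FILE NAME   (hypothetical helper, hosts-format files only)
# Forward: take the address from the first line of FILE that lists NAME.
# Reverse: take the first name on the first line that carries that address.
# Exit 0 if the round trip returns NAME, 1 if NAME has no address, 2 on mismatch.
hosts_roundtrip() {
    awk -v want="$2" '
        { sub(/#.*/, "") }                      # strip comments
        NF < 2 { next }                         # skip blank or incomplete lines
        !($1 in canon) { canon[$1] = $2 }       # first name seen for each address
        addr == "" {
            for (i = 2; i <= NF; i++)
                if ($i == want) { addr = $1; break }
        }
        END {
            if (addr == "") { print want ": no address found"; exit 1 }
            print want " -> " addr " -> " canon[addr]
            exit (canon[addr] == want ? 0 : 2)
        }
    ' "$1"
}

# Example with a throwaway file and made-up entries; in practice point it at
# /etc/hosts on the destination host and ask for the source host name.
tmp=$(mktemp)
printf '%s\n' '10.0.0.1 controller' '10.0.0.2 nodeA.site nodeA' > "$tmp"
hosts_roundtrip "$tmp" controller
hosts_roundtrip "$tmp" nodeA || echo "nodeA resolves back to a different name"
```

Whether a mismatch actually matters depends on which name the .rhosts entries use, so treat a non-zero result as a hint to look closer rather than a verdict.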

MensaWater · 03-19-2014, 12:22 PM

Why reply to a thread that was marked as resolved years before you joined?

You may convince yourself that doing insecure transfers over the internet is somehow expedient and therefore preferable but I'd say it isn't at all.

Your example is a clear indication of misunderstanding both of the issue with rsh and the facilities of ssh.

With rsh ANYONE can say they are the system you say they are because only host name is checked.

With ssh you can put the same key for both the hosts on the same user so that either host will be seen as the "trusted" host but no one that doesn't have access to the private key will be able to spoof either host. That is to say, just because you ran ssh-keygen on one host is no reason you can't copy the key from that host to another rather than running a separate ssh-keygen. In fact it is quite common to do this in cluster scenarios where any node might make the connection.

There are in fact rare scenarios in INTERNAL transfers where you might use the more insecure tools for speedy file transfers (e.g. if you're transferring a full database dump) but those should be the exception rather than the norm. In such situations it is normal to open the commands:
a) Only for specific access (i.e. use something like iptables to restrict the inbound access only to the IP of the remote server so they have to spoof both the name and the IP)
b) Only for a short period of time for a specific transfer. At one job I worked at they had it set up so that if you forgot to close the access you'd opened for such a transfer, it would be closed automatically after 24 hours.

You'll never convince me that compromising security for expediency is a good general practice.

jhumkey · 03-19-2014, 01:33 PM

When I joined, or when the original problem was posted . . . is not relevant.

If someone years later, was having problems with "rsh hanging" and Googled their way here (like I just did now in Mar 2014) . . . posting additional information that might help them solve their NEW-CURRENT problem . . . is rarely harmful.

My point was . . .

rsh can fail and hang, if name lookup (or ip lookup) is failing in either the forward or reverse directions. So check the /etc/hosts file and name/ip lookups in both directions. If you can't fix it . . . consult your admin (someone with root) and get them to help. THAT was my primary point.

You misread . . . I was very careful to say "INTRAnet not exposed to the world at large" . . . not INTERNET exposed to everyone.

NO, I'm not suggesting that rsh is EVER more secure. My point was, that in the (real life) example I referred to . . . even with "authorized keys" and "known hosts" files set on all three hosts involved . . . SSH FAILS COMPLETELY. So I use rsh (even though it's inherently less secure) because . . . it at least works.

In your example multiple nodes appeared to be connecting in to one unchanging point. MY example, was one point connecting to a distant end point where the host at the distant end point was changing underneath the same hostname. (After the primary fails and the backup takes its place.) Even with "authorized hosts" and "known hosts" established and set in all three locations of this example . . . the HOST key changes after the failover. So ssh on my host, reaching out and seeing that (now different) host key . . . thinks a "man in the middle" attack is in progress and fails.

I didn't use rsh in this example because "it's better than ssh" (it's not) . . . I used it because, in this example, ssh FAILS to function. (Sensing a "man in the middle" attack, when none is actually occurring.)

MensaWater · 03-20-2014, 08:06 AM

My point is that you don't really understand how ssh works.

Suggesting host files are somehow increasing the security of rsh is incredibly off base.

I have yet to run into a situation where multiple changing end points occur on a daily basis and I can't imagine such a thing. Since the keys are "software" rather than "hardware" they can easily be moved to various end points (or source points) and the trusts set up. There must be a finite number of these and you would be far better served by automating the key movement than by avoiding keys to use insecure tools.

On occasion we have trading partners that change servers and do not migrate keys. In such a case we always:
a) Remove the old known_hosts entry
b) Do a manual connection and accept the new key.
We do NOT do the above unless we have positive confirmation that the host was in fact supposed to have changed. As I've often told folks here it is unlikely we could script this and even if we could it would be an extremely bad idea. Such changes should not be occurring daily.

If in fact you have no confidence in the end points or source points you should NEVER be using a trusted setup via rsh but instead always require login and password because in fact there is no way to actually "trust" them. If you are doing such a setup I'd suggest you really are compromising security for your organization.

Since it is clear you'd rather think you're right than learn I'll let your next post be the last word on this.

jhumkey · 03-20-2014, 09:09 AM

I'm pretty clear on how ssh works, and the multiple ways it's better than the ancient and insecure rsh.

You keep putting words in my mouth, as if you think I'm claiming rsh is more secure. It's not. I never said that. I never hinted at that. ssh is far more secure in multiple ways, in almost every way. (Not only authenticating the endpoints, and authenticating the people connecting, but encrypting the traffic once the connection is made, for example). So please stop placing the claim on me that I'm saying rsh is more secure than ssh. It's not. You are the one that keeps placing that false claim on me.

I used rsh for my case . . . because ssh was failing. (And, my choice was use the less-secure-rsh or FAIL. So I picked the option that would work at the time.)

You have apparently never had to deal with (not my favorite software by a long stretch) HACMP . . . whenever the primary host dies, or the admins need to do maintenance on it (software patches / hardware maintenance) and they "failover" to the alternate host (either automatically for a failure, or manually for maintenance) . . . ssh begins failing for me at that point, because it sees a new host in place of one it was familiar with. I can't get the admins to swap "host" level keys over to the alternate. And as I said, since you always need some clear/certain way to identify hosts there is some justification in "not swapping host level keys".

I'm not the only person to have this problem. Other websites suggest changes IF YOU ARE ON an "INTRAnet" . . . not open to the world at large . . . that allow ssh to ignore the swapping endpoint. Like . . .

#!/bin/ksh
print ''
print 'Run a common command on all hosts'
print ''
print ===--- Plant1
ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no Plant1_apphostname "$@" 2>&1
print ===--- Plant2
ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no Plant2_apphostname "$@" 2>&1

OBVIOUSLY . . . this turns off "endpoint checking", and that's bad. Yes, you still have encrypted traffic, and yes, you use your .ssh/authorized_keys2 files, but it gets around that shifting known host issue, and as such, should only be used on limited access INTRAnet networks, where the likelihood of a true "man in the middle" is very low.

Example taken from . . . http://linuxcommando.blogspot.com/20...-checking.html (So, while you seem to think I'm nuts . . . it appears lots of other people are nuts with me . . . and facing the same issue . . . )

ssh is better in almost every way. (But until I found a way to turn off endpoint checking it was TOTALLY FAILING, so I couldn't use it)

You win.