Quote:
Originally Posted by mgok
I am trying to test whether reading a large text file (15 GB) is faster on one computer or on multiple computers. I assume that multiple will be faster, but the lab I am using has been having issues and this test will help determine what is wrong.
I am new to using Linux and using ssh to access other computers. What I have so far is two Java programs: Time and Read. Time just returns the current time in milliseconds, and Read goes through the large text file line by line using a Scanner. There are 15 computers available for me to use in the lab, and my question is how to run the Read Java class at the same time on all 15 computers (hopefully increasing performance speed)?
From my own research, using & lets you run multiple jobs at once, but whenever I do a command like: "ssh computer1 java Read &" the terminal says that the program's main class cannot be found. And when I try something like: "ssh computer1 cd /directoryOfFile && java Read &", the job says it is done immediately, which must mean it is not running the program, since it takes some time to go through a 15 GB file. Any tips?
Thank you!!
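For reference, a minimal sketch of a self-timing version of the Read program from the question, combining what Time and Read do. This is an assumption, not the poster's code: the class name TimedRead and the default file name big.txt are hypothetical, and it uses a BufferedReader instead of a Scanner, which is usually faster for plain line-by-line reading.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TimedRead {
    /** Reads the file line by line and returns the number of lines seen. */
    static long countLines(Path file) throws IOException {
        long lines = 0;
        // ISO-8859-1 maps every byte to a char, so invalid UTF-8 cannot abort the read.
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.ISO_8859_1)) {
            while (in.readLine() != null) {
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default file name; pass the real path as the first argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "big.txt");
        long start = System.currentTimeMillis();   // the value the Time program reports
        long lines = countLines(file);
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(lines + " lines in " + elapsed + " ms");
    }
}
```

Run as "java TimedRead /path/to/file" on each machine; each run prints its own elapsed time, so the numbers from one machine and from several can be compared directly.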
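Well, putting the run in the background DOES cause it to terminate immediately: 1. the ssh connection is terminated, and 2. processes attached to the terminal (stdin/stdout/stderr) are terminated automatically on logout. There are two other problems as well. ssh runs the command in your home directory on the remote machine, so "java Read" probably cannot find Read.class there, which is why the main class is reported missing. And in "ssh computer1 cd /directoryOfFile && java Read &" it is your local shell that interprets the && and the &, so only the cd runs on computer1. Quoting the whole remote command fixes that: "ssh computer1 'cd /directoryOfFile && nohup java Read &'" might do the job. The "nohup" makes the process ignore the hangup signal, so the automatic termination doesn't occur. It also redirects any of stdin, stdout and stderr that are attached to a terminal: stdout and stderr go to the log file nohup creates (nohup.out), and stdin is detached from the terminal. Over a plain "ssh host command" they usually are not terminals, so add an explicit redirect such as "nohup java Read > read.log 2>&1 &" if you want the output saved to a file.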
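If you would rather start all 15 runs from one place and time them together, one option is to keep one ssh client per machine running locally instead of detaching with nohup. The sketch below does that with Java's ProcessBuilder. The class name ParallelRead and the host names computer1 to computer15 are hypothetical; it assumes key-based (password-less) ssh login and that Read.class sits in /directoryOfFile on every machine.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ParallelRead {
    // Directory from the question; assumed to hold Read.class on every host.
    static final String DIR = "/directoryOfFile";

    /** Builds the ssh command for one host. The remote part is a single
     *  argument, so cd, && and java all run on the remote machine. */
    static List<String> sshCommand(String host, String dir) {
        List<String> cmd = new ArrayList<>();
        cmd.add("ssh");
        cmd.add("-o");
        cmd.add("BatchMode=yes");   // fail instead of prompting for a password
        cmd.add(host);
        cmd.add("cd " + dir + " && java Read");
        return cmd;
    }

    /** Hypothetical host names computer1 .. computerN. */
    static List<String> hosts(int n) {
        List<String> names = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            names.add("computer" + i);
        }
        return names;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        List<Process> running = new ArrayList<>();
        long start = System.currentTimeMillis();
        for (String host : hosts(15)) {
            ProcessBuilder pb = new ProcessBuilder(sshCommand(host, DIR));
            pb.redirectErrorStream(true);                 // merge stderr into stdout
            pb.redirectOutput(new File(host + ".log"));   // one local log file per host
            running.add(pb.start());                      // start() does not wait
        }
        for (Process p : running) {
            p.waitFor();                                  // block until this host is done
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("All hosts finished in " + elapsed + " ms");
    }
}
```

Because start() returns immediately, all 15 reads run at the same time, and the elapsed time printed at the end is set by the slowest machine. Whatever Read prints on each host ends up in that host's .log file.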
As a side note, the speed depends on your file server. If the file is on NFS, reading it from several machines can be faster, but only as long as the data the clients ask for is already in the server's memory buffers.
If it is not, the whole thing slows down: buffers have to be reloaded from disk, the NFS server can run out of service daemons (nfsd threads), and your network slows down...
It really depends on several things: the network topology, and then the nature of the file server(s). For clusters, Gluster tends to do a much better job than NFS (it can use more memory for cache...), and using multiple Gluster servers spreads the network load out.
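A rough worked estimate shows why the network can dominate. Assume, hypothetically, 1 Gbit/s Ethernet, that each of the 15 machines reads the whole 15 GB file from one NFS server over that server's single 1 Gbit/s link, and ignore protocol overhead and client-side caching:

$$
t_{\text{one client}} \ge \frac{15\ \text{GB} \times 8\ \text{bit/byte}}{1\ \text{Gbit/s}} = \frac{120\ \text{Gbit}}{1\ \text{Gbit/s}} = 120\ \text{s}
$$

$$
t_{\text{15 clients}} \ge \frac{15 \times 120\ \text{Gbit}}{1\ \text{Gbit/s}} = 1800\ \text{s} \approx 30\ \text{min}
$$

So under those assumptions, 15 simultaneous readers would each see roughly 1/15 of the server's bandwidth. If the load were spread evenly over k servers, each with its own 1 Gbit/s link (an idealization), the bound for all 15 clients would drop to about 1800/k seconds.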