LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
12-24-2019, 12:50 AM   #1
pritamthecoder
LQ Newbie
 
Registered: Dec 2019
Posts: 1

Rep: Reputation: Disabled
Error While Opening Multiple SSH Connections through PBS in Linux Server


I have a cluster with one master node and 8 worker nodes. I generally use PBS scripts to dispatch processes from the master to the worker nodes. Here I am talking about a single worker node. Each node has 32 cores, so I tried to send 32 concurrent processes from the master to one particular worker node. But only some of them actually run, while the rest do not.

Just for testing, I am running a program that goes through a large enough loop (to keep a core busy for a while) and then prints the number passed to it.

Code:
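#include <stdio.h>
#include <stdlib.h>   /* atol */

int main(int argc, char *argv[])
{
  /* volatile keeps the compiler from optimizing the busy loop away */
  volatile long j = 0;
  long i;

  if (argc < 2) {
    fprintf(stderr, "usage: %s <number>\n", argv[0]);
    return 1;
  }
  for (i = 0; i < 1000000000L; i++)
    j++;
  printf("%ld\n", atol(argv[1]));  /* %ld matches the long returned by atol */
  return 0;                         /* exit status 0 signals success */
}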
This is compiled into an executable named test_pbs.

My PBS script is as follows:

Code:
#!/bin/sh

#PBS -N nfs
#PBS -l nodes=8:ppn=32
#PBS -o "<path>/stdout.log"
#PBS -e "<path>/stderr.log"

echo "Starting the sieving..."
# Start 20 copies of test_pbs (arguments 0-19) on the same node, in the background
i=0
while [ "$i" -le 19 ]; do
  ssh <node_url> "<path>/test_pbs $i" &
  i=$((i + 1))
done
# Wait for every background ssh session before the job script exits
wait
It should print 0-19 in the stdout.log file, but only some of the numbers appear. For the rest, stderr.log contains the line "ssh_exchange_identification: read: Connection reset by peer". The cluster runs CentOS, and I have also checked /etc/security/limits.conf:

Code:
# /etc/security/limits.conf
#
#This file sets the resource limits for the users logged in via PAM.
#It does not affect resource limits of the system services.
#
#Also note that configuration files in /etc/security/limits.d directory,
#which are read in alphabetical order, override the settings in this
#file in case the domain is the same or more specific.
#That means for example that setting a limit for wildcard domain here
#can be overriden with a wildcard setting in a config file in the
#subdirectory, but a user specific setting here can be overriden only
#with a user specific setting in the subdirectory.
#
#Each line describes a limit for a user in the form:
#
#<domain>        <type>  <item>  <value>
#
#Where:
#<domain> can be:
#        - a user name
#        - a group name, with @group syntax
#        - the wildcard *, for default entry
#        - the wildcard %, can be also used with %group syntax,
#                 for maxlogin limit
#
#<type> can have the two values:
#        - "soft" for enforcing the soft limits
#        - "hard" for enforcing hard limits
#
#<item> can be one of the following:
#        - core - limits the core file size (KB)
#        - data - max data size (KB)
#        - fsize - maximum filesize (KB)
#        - memlock - max locked-in-memory address space (KB)
#        - nofile - max number of open file descriptors
#        - rss - max resident set size (KB)
#        - stack - max stack size (KB)
#        - cpu - max CPU time (MIN)
#        - nproc - max number of processes
#        - as - address space limit (KB)
#        - maxlogins - max number of logins for this user
#        - maxsyslogins - max number of logins on the system
#        - priority - the priority to run user process with
#        - locks - max number of file locks the user can hold
#        - sigpending - max number of pending signals
#        - msgqueue - max memory used by POSIX message queues (bytes)
#        - nice - max nice priority allowed to raise to values: [-20, 19]
#        - rtprio - max realtime priority
#
#<domain>      <type>  <item>         <value>
#

#*               soft    core            0
#*               hard    rss             10000
#@student        hard    nproc           20
#@faculty        soft    nproc           20
#@faculty        hard    nproc           50
#ftp             hard    nproc           0
#@student        -       maxlogins       4

# End of file
* soft memlock unlimited
* hard memlock unlimited
But there is no hard-coded limit on the number of processes that can be spawned or on the maximum number of logins. I have also tried opening 20 or more concurrent connections to the node by hand, and that works.
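As the header of limits.conf itself notes, files in /etc/security/limits.d are read afterwards and can override it, so a limits.conf with no nproc or nofile line does not by itself prove that no such limit applies. A minimal sketch, assuming a Linux node, that prints the limits actually in effect by reading /proc/self/limits; show_limits is a hypothetical helper name, not part of PBS or OpenSSH.

```sh
#!/bin/sh
# Sketch: print the process and open-file limits in effect.
# /proc/self/limits is Linux-specific; here it is read by grep itself,
# which inherits its limits from the calling shell.
# show_limits is a hypothetical helper, not part of PBS or OpenSSH.
show_limits() {
    grep -E '^(Limit|Max processes|Max open files)' /proc/self/limits
}

show_limits
```

Running the same grep through ssh, for example ssh <node_url> "grep -E 'Max processes|Max open files' /proc/self/limits", shows what a non-interactive session on that node receives, and listing /etc/security/limits.d/ on the node shows which override files exist.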

So my question is: where does the issue arise?
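One way to narrow down where the issue arises is to reproduce the same burst from the master's command line, outside PBS, and record which indices fail. Unlike opening sessions by hand, this starts all connections at nearly the same instant. A minimal sketch; launch_and_check is a hypothetical helper, and it assumes the remote program exits with status 0 on success (a nonzero return such as "return 1" would be counted as a failure).

```sh
#!/bin/sh
# Sketch: start N copies of a command at once, then wait for each one
# individually so every failure is attributed to its index.
# launch_and_check is a hypothetical helper; plain POSIX sh has no
# "local", so n, i, pids and failed are global variables here.
launch_and_check() {
    n=$1
    shift                      # the remaining arguments form the command
    pids=""
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" "$i" &            # the index is appended as the last argument
        pids="$pids $!"
        i=$((i + 1))
    done
    failed=0
    i=0
    for pid in $pids; do
        if ! wait "$pid"; then # wait returns that child's exit status
            echo "index $i failed" >&2
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    echo "$failed"             # number of failed copies, on stdout
}
```

Usage would be along the lines of launch_and_check 20 ssh <node_url> "<path>/test_pbs", since ssh joins the program path and the appended index into one remote command. If the same indices fail here as in the batch job, PBS is probably not involved.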
 
12-24-2019, 09:24 PM   #2
berndbausch
LQ Addict
 
Registered: Nov 2013
Location: Tokyo
Distribution: Mostly Ubuntu and Centos
Posts: 6,316

Rep: Reputation: 2002
Quote:
Originally Posted by pritamthecoder
Now, it should print 0-19 in the stdout.log file. But some of them are printed and for the rest I am getting the line ssh_exchange_identification: read: Connection reset by peer in the stderr.log file.
Is there consistency? I.e., does the connection to a given node always succeed or fail, or does the connection to a given node sometimes succeed, and sometimes fail?

As a first step, I would check /var/log/secure on the failing nodes and use the ssh client's -v option to get more detail about the failure. For even more detail, use -vv and -vvv.
It's also possible to run the ssh server with a debug option (sshd -d), though this keeps it in the foreground, and in that mode it handles only a single connection.
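Following the -v suggestion, a minimal sketch that gives each concurrent connection its own verbose client log and afterwards lists the connections that were reset. NODE and BIN are hypothetical placeholders for the node address and the test program path, and run_verbose and reset_logs are hypothetical helper names.

```sh
#!/bin/sh
# Sketch: capture verbose ssh client output per connection, then find the
# connections whose log shows a reset.
# NODE and BIN are hypothetical placeholders; override them in the environment.
NODE=${NODE:-node01}
BIN=${BIN:-/path/to/test_pbs}

# Start $1 connections at once; -v writes client debug output to stderr,
# which is redirected to one file per index.
run_verbose() {
    i=0
    while [ "$i" -lt "$1" ]; do
        ssh -v "$NODE" "$BIN $i" 2> "ssh-debug-$i.log" &
        i=$((i + 1))
    done
    wait
}

# Print the names of the log files in directory $1 that contain the reset message.
reset_logs() {
    grep -l 'Connection reset by peer' "$1"/ssh-debug-*.log 2>/dev/null
}
```

For example, run_verbose 20 followed by reset_logs . names the failed indices; the timestamps of those attempts can then be matched against the sshd entries in /var/log/secure on the node.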

DISCLAIMER: I know nothing about PBS and can't tell if your problem is related to the scheduler in any way.
 
  


All times are GMT -5.