05-11-2008, 08:00 AM   #1
Question on job submission on a Linux Cluster

Hello All,

This post concerns my questions about Torque. I am working on an IBM BladeCenter JS21 Linux cluster.

I am new to Torque. Until now I have been writing parallel programs with MPI + C++ on an IBM BladeCenter cluster (running Linux) where Torque-2.1.1 is installed. The cluster consists of an admin node and 14 computing nodes.

To acquaint myself with Torque, I wrote a simple MPI + C++ program (it just creates files) that looks like the following:

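// ##### Test_PBS_Pgm.cpp ######

#include <mpi.h>
#include <cstdio>
#include <string>
#include <string.h>
#include <iostream>

int main(int argc, char *argv[])
{
    int User_Close_GtkWindow_MPI = 0;
    int Query_FileName_Len;
    int i;
    int myid, numprocs;

    double startwtime = 0.0, endwtime;
    int namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    // Start MPI, then look up the total process count and this process's rank
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Get_processor_name(processor_name, &namelen); // name of the host running this rank

    // CREATE FILES, file names depend on the individual process ID (MPI rank)
    std::string InPut_FileName = "/home/yongchen/Temp___";
    char strMyID[16];
    snprintf(strMyID, sizeof(strMyID), "%d", myid);
    InPut_FileName += strMyID;
    FILE *fileSegedDoc = fopen(InPut_FileName.c_str(), "w");
    if (fileSegedDoc != NULL)
        fclose(fileSegedDoc);

    MPI_Finalize();
    return 0;
}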

The program runs successfully with the following command:

mpiexec -n NUMBER_NODES ./Test_PBS_Pgm

Later, I tried to submit the job using Torque. Unfortunately, I encountered some problems.

I start the 14 computing nodes using the command

mpdboot -n 14

Then I use the command

mpdtrace

to see whether all 14 computing nodes are ready.

'mpdtrace' produces a list of the node names of the running computing nodes. The list is shown below. It is the same as the contents of the file mpd.hosts (mpd.hosts is used in a script later).


The script for submitting the job is shown below:

// ###### #################
#PBS -l nodes=3:ppn=1
$MPIRUN -np $NCPUS -machinefile ../mpd.hosts $myPROG >& out2
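
For comparison, here is a minimal sketch of a complete job script. It rests on assumptions that the fragment above does not confirm: a Bourne-style shell, an MPICH2/mpd-style mpiexec that accepts -machinefile, and the standard Torque variables PBS_NODEFILE (the per-job list of allocated nodes) and PBS_O_WORKDIR (the directory qsub was run from). The helper name count_slots is made up for illustration. The sketch takes the host list from Torque instead of ../mpd.hosts; that is a choice of the sketch, not part of the original script.

```shell
#!/bin/sh
#PBS -l nodes=3:ppn=1

# count_slots FILE: print the number of non-empty lines in a node file.
# Torque writes one line per allocated processor slot to $PBS_NODEFILE.
# (count_slots is a hypothetical helper name.)
count_slots() {
    grep -c . "$1"
}

# Launch only when running inside a Torque job, where PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ]; then
    cd "$PBS_O_WORKDIR" || exit 1
    NPROCS=$(count_slots "$PBS_NODEFILE")
    # Assumption: mpd is already running on the allocated nodes and
    # mpiexec is the MPICH2/mpd variant that accepts -machinefile.
    mpiexec -machinefile "$PBS_NODEFILE" -n "$NPROCS" ./Test_PBS_Pgm > out2 2>&1
fi
```

Outside a Torque job the launch branch is skipped, so the script can be read or sourced on the admin node without starting anything.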

I use 'qsub' with this script to submit the job.

After submitting, I get a job number like YY.xcat1 (xcat1 is the name of the admin node).

The three files Temp___0, Temp___1, and Temp___2 are created successfully.

However, when I run 'mpdtrace' again, the node names hpc12, hpc13, and hpc14 are absent from the list compared with the previous list. If I submit the job again, i.e., execute the command 'qsub' again, the job fails. This means that after the job has run, hpc12, hpc13, and hpc14 have dropped out of the set of running computing nodes for an unknown reason. This is confirmed by a further observation: in the file out2, I can see the following error message

mpiexec-hpc14: cannot connect to local mpd (/tmp/mpd2.console.yong)
possible causes:
1. no mpd is running on this host
2. an mpd is running but was started without a "console"
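
The before/after comparison of the 'mpdtrace' list described above can be done mechanically. The following is a rough sketch, assuming that 'mpdtrace' prints one short host name per line and that mpd.hosts holds the same names one per line; the function name missing_hosts is made up for illustration.

```shell
#!/bin/sh
# missing_hosts EXPECTED ACTUAL
# Print the host names listed in file EXPECTED that do not appear in file ACTUAL.
# (missing_hosts is a hypothetical helper name.)
missing_hosts() (
    exp_sorted=$(mktemp) || exit 1
    act_sorted=$(mktemp) || exit 1
    sort -u "$1" > "$exp_sorted"
    sort -u "$2" > "$act_sorted"
    # comm -23 keeps only the lines that are unique to the first file
    comm -23 "$exp_sorted" "$act_sorted"
    rm -f "$exp_sorted" "$act_sorted"
)

# Run the check only where mpdtrace and the host file are actually available.
if command -v mpdtrace >/dev/null 2>&1 && [ -f ../mpd.hosts ]; then
    ring_now=$(mktemp) || exit 1
    mpdtrace > "$ring_now"
    missing_hosts ../mpd.hosts "$ring_now"
    rm -f "$ring_now"
fi
```

Running it once before 'qsub' and once after the job finishes would show which nodes left the list in between.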

However, with the command 'pbsnodes' I can see that all 14 nodes are 'free'.

In addition, in the output file Test_PBS_Pgm.oYY, I can see that hpc12, hpc13, and hpc14 were used for this job, but the names of the created files were Temp___0, Temp___1, and Temp___2.
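
One point about the file names: the program builds them from the MPI rank (myid), and ranks within a job are always numbered from 0, whatever the host names are. The sketch below only illustrates that numbering. The function name rank_files is made up, and the pairing of a particular rank with a particular host is an assumption for the example, not something the job output confirms.

```shell
#!/bin/sh
# rank_files PREFIX: read host names (one per line) on stdin and print
# "rank host file" for each, numbering ranks from 0 as MPI does.
# (rank_files is a hypothetical helper name.)
rank_files() {
    awk -v prefix="$1" 'NF { printf "%d %s %s%d\n", n, $1, prefix, n; n++ }'
}

# Example with the three hosts named in the post, in an assumed order.
printf 'hpc12\nhpc13\nhpc14\n' | rank_files /home/yongchen/Temp___
# prints:
# 0 hpc12 /home/yongchen/Temp___0
# 1 hpc13 /home/yongchen/Temp___1
# 2 hpc14 /home/yongchen/Temp___2
```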

Another problem concerns execution speed.

The job runs very quickly when I start it directly with 'mpiexec -n NUMBER_NODES ./Test_PBS_Pgm'.

However, it takes 10 seconds if I submit the job using 'qsub'.

I ran 'qstat' after 'qsub' and observed that the job stayed in the 'Q' state (i.e., waiting in the queue for execution) for several seconds, even though there were no other jobs in the queue at all.
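
The time spent in the 'Q' state can be measured rather than estimated by eye. The following is a rough sketch, assuming that 'qstat -f JOBID' prints a line of the form "job_state = Q" and that 'date +%s' is available; the function names job_state and seconds_queued and the script name in the usage comment are made up for illustration.

```shell
#!/bin/sh
# job_state: read "qstat -f" output on stdin and print the job state letter.
# (job_state and seconds_queued are hypothetical helper names.)
job_state() {
    awk -F' = ' '$1 ~ /job_state/ { print $2 }'
}

# seconds_queued JOBID: poll once per second while the job is in state Q,
# then print roughly how many seconds it spent waiting.
seconds_queued() {
    start=$(date +%s)
    while [ "$(qstat -f "$1" 2>/dev/null | job_state)" = "Q" ]; do
        sleep 1
    done
    echo $(( $(date +%s) - start ))
}

# Usage on the cluster (my_job.pbs is a placeholder script name):
#   jobid=$(qsub my_job.pbs)
#   seconds_queued "$jobid"
```

The one-second polling interval limits the resolution, so the result is only good to about a second.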

I have been stuck on these problems for a long time.

Please help me. I would appreciate any help very much.


