Hydra - Running jobs
Jobs are run on Hydra by submitting a jobscript to the Torque queuing system with the command:
qsub myscript
where myscript contains the relevant Torque commands.
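Because qsub writes the job.id to standard output, the identifier can be captured in a shell variable for later use. The following is a minimal sketch rather than an official Hydra tool: the function name submit_job and the QSUB override variable are hypothetical and only allow the function to be tried on a machine without Torque.

```shell
#!/bin/sh
# Submit a jobscript and print the job.id that qsub reports.
# QSUB is a hypothetical override (not a Torque feature) that lets the
# function be tried with a stand-in command on a machine without Torque.
submit_job() {
    if [ ! -f "$1" ]; then
        echo "no such jobscript: $1" >&2
        return 1
    fi
    "${QSUB:-qsub}" "$1"
}

# Typical use on Hydra:
#   JOBID=$(submit_job myscript) && echo "Submitted as $JOBID"
```

Checking that the file exists first gives a clearer error than the one qsub would report for a mistyped script name.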
Below are some generic examples of scripts with brief descriptions of each of the various Torque components. These may be adapted to suit your needs. Please note that you only need to change the placeholder values (such as MyJobName, Your-email-Address, HH:MM:SS and MyProgram+Arguments) in order to get a functioning jobscript for Torque:
Sample Torque Jobscript for a Sequential Job
#!/bin/sh
#PBS -V
### Job name
#PBS -N MyJobName
### Join queuing system output and error files into a single output file
#PBS -j oe
### Send email to user when job ends or aborts
#PBS -m ae
### email address for user
#PBS -M Your-email-Address
### Queue name that job is submitted to
#PBS -q hydra
### Request nodes (NB: THIS IS REQUIRED)
#PBS -l nodes=1,walltime=HH:MM:SS
# This job's working directory
echo Working directory is $PBS_O_WORKDIR
cd $PBS_O_WORKDIR
echo Running on host `hostname`
echo Time is `date`
# Run the executable
MyProgram+Arguments
NOTES:
- All lines beginning with #PBS are interpreted as Torque commands directly to the queuing system.
- Output and error messages will be joined into a file that will be called something like MyJobName.oXXX in the directory from which the job is submitted.
- MyJobName should be a concise but identifiable alphanumeric name for the job (starting with a letter, NOT a number).
- MyProgram+Arguments should include the full path to the program.
- The routing queue hydra will allocate your job to an appropriate queue for execution.
- If your job does not request a specific walltime in the format HH:MM:SS, it will be allocated only 1 hour by default.
- A copy of this sample script can be obtained here.
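The naming and walltime rules in the notes above can be checked before a jobscript is submitted. The helpers below are a sketch under stated assumptions: valid_jobname and to_walltime are hypothetical names, not Torque commands, and the name check only enforces the "alphanumeric, starting with a letter" rule given above.

```shell
#!/bin/sh
# Hypothetical helpers for checking values before they go into a jobscript.

# Succeeds if the name is alphanumeric and starts with a letter.
valid_jobname() {
    case $1 in
        [A-Za-z]*) ;;
        *) return 1 ;;
    esac
    case $1 in
        *[!A-Za-z0-9]*) return 1 ;;
    esac
    return 0
}

# Convert a number of seconds into the HH:MM:SS form used by walltime.
to_walltime() {
    secs=$1
    printf '%02d:%02d:%02d\n' $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
}

valid_jobname flicprop3 && echo "flicprop3 is a valid job name"
valid_jobname 3flicprop || echo "3flicprop is rejected"
to_walltime 5400    # prints 01:30:00
```

The value printed by to_walltime can be pasted directly after walltime= in the #PBS -l line.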
Sample Torque Jobscript for an MPI Job
#!/bin/sh
#PBS -V
### Job name
#PBS -N MyMPIJobName
### Output files
#PBS -j oe
### Mail to user when job ends or aborts
#PBS -m ae
### Mail address for user
#PBS -M Your-email-Address
### Queue name
#PBS -q hydra
### Number of nodes
#PBS -l nodes=X:ppn=2,walltime=HH:MM:SS
# Calculate the number of processors to be used
NP=$(wc -l $PBS_NODEFILE | awk '{print $1}')
# This job's working directory
echo Working directory is $PBS_O_WORKDIR
cd $PBS_O_WORKDIR
echo Running on host `hostname`
echo Time is `date`
echo Using nodes
cat $PBS_NODEFILE
# Run the executable
mpiexec -n $NP MyProgram+Arguments
NOTES:
- All lines beginning with #PBS are interpreted as PBS commands.
- Output and error messages will be joined into a file that will be called something like MyMPIJobName.oXXX in the directory from which the job is submitted.
- MyMPIJobName should be a concise but identifiable alphanumeric name for the job, starting with a letter, NOT a number.
- nodes=X:ppn=2
X is the number of nodes requested.
ppn=2 requests two processors on each requested node: this should be included.
- MyProgram+Arguments should include the full path to the program.
Note the use of mpiexec instead of mpirun to run MPI programs under Torque.
Also note that the -n option gives the number of MPI processes to run, which can be up to 2x the number of nodes requested (X).
- If a specific walltime is not requested by your job in the format HH:MM:SS, it will only be allocated 1 hour by default
- A copy of this sample script can be obtained here
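To see how the NP calculation in the script relates to nodes=X:ppn=2, note that Torque lists each allocated node once per processor in the file named by $PBS_NODEFILE, so the line count equals the number of MPI processes. The sketch below can be run outside a job: it builds a mock node file imitating nodes=3:ppn=2, and the host names node01 to node03 are made up for illustration.

```shell
#!/bin/sh
# Count MPI processes and distinct nodes from a Torque-style node file.
# Outside a job PBS_NODEFILE is unset, so a mock file is built here that
# imitates nodes=3:ppn=2 (the host names are invented for the example).
mock_nodefile=no
if [ -z "${PBS_NODEFILE:-}" ]; then
    PBS_NODEFILE=$(mktemp)
    printf '%s\n' node01 node01 node02 node02 node03 node03 > "$PBS_NODEFILE"
    mock_nodefile=yes
fi

# One line per processor, so the line count is the process count.
NP=$(wc -l < "$PBS_NODEFILE" | tr -d ' ')
# Distinct host names give the number of nodes.
NNODES=$(sort -u "$PBS_NODEFILE" | wc -l | tr -d ' ')

echo "MPI processes: $NP on $NNODES nodes"

# Remove the mock file if one was created.
if [ "$mock_nodefile" = yes ]; then
    rm -f "$PBS_NODEFILE"
fi
```

With the mock file this reports 6 processes on 3 nodes, matching the "up to 2x the number of nodes" rule above.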
Checking a Job's Status in the Queue
Once a job has been submitted to Torque via the qsub command, a job.id of the form XXX.hydra will be displayed on the screen. This job.id is useful for following the progress of your job via the qstat command. To check on a job's status in the queue on Hydra, type:
qstat
Output similar to the following will be displayed:
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
787.hydra        flicprop3        wkamleh          2873:45: R large
886.hydra        flicprop0        wkamleh          3115:56: R large
915.hydra        flicprop2        wkamleh          2370:40: R large
920.hydra        dyn.mparappi     mparappi         00:25:22 R small
921.hydra        dyn.mparappi     mparappi                0 Q seq
You can readily identify your job's place and status in the queuing system using either the job.id or the name you provided in the jobscript.
The above sample output shows 3 jobs running on the queue large, 1 job running on the queue small and 1 queued job that has not yet started on the queue seq.
NOTE: The job names flicprop0, flicprop2 and flicprop3 are useful in that they are concise and uniquely identify jobs by name. However, the other 2 jobs in the queue are badly named: both are called dyn.mparappi, so they cannot be told apart by name.
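The per-queue tally described above can also be produced automatically. The following sketch assumes the column layout of the sample listing, with two header lines and the state and queue in the last two columns. The function name summarise_jobs is hypothetical, and the input is the sample listing from this page rather than live data.

```shell
#!/bin/sh
# Summarise qstat-style output: number of jobs per state and queue.
# Skips the two header lines and uses the last two columns (S and Queue).
summarise_jobs() {
    awk 'NR > 2 { count[$(NF-1) " " $NF]++ }
         END    { for (k in count) print k, count[k] }' | sort
}

summarise_jobs <<'EOF'
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
787.hydra        flicprop3        wkamleh          2873:45: R large
886.hydra        flicprop0        wkamleh          3115:56: R large
915.hydra        flicprop2        wkamleh          2370:40: R large
920.hydra        dyn.mparappi     mparappi         00:25:22 R small
921.hydra        dyn.mparappi     mparappi                0 Q seq
EOF
```

For the sample listing this prints "Q seq 1", "R large 3" and "R small 1", one per line. On Hydra the same function could read live data with qstat | summarise_jobs, assuming the qstat layout there matches the sample.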
Deleting a Queued Job
To delete a queued or running job, type:
qdel job.id
where job.id is the identifier shown in the output of qstat.
NOTE: You will only be able to delete your own jobs.
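When several jobs need to be removed, the job.id values can be picked out of qstat-style output instead of being typed by hand. The sketch below only prints the qdel commands and does not run them. The function name qdel_commands_for is hypothetical, and the column positions are assumed from the sample listing on this page.

```shell
#!/bin/sh
# Print (not run) a qdel command for each queued job owned by a given user.
# Reads qstat-style text on standard input: column 3 is the user and the
# second-to-last column is the job state (Q means queued).
qdel_commands_for() {
    awk -v user="$1" 'NR > 2 && $3 == user && $(NF-1) == "Q" { print "qdel", $1 }'
}

qdel_commands_for mparappi <<'EOF'
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
920.hydra        dyn.mparappi     mparappi         00:25:22 R small
921.hydra        dyn.mparappi     mparappi                0 Q seq
EOF
```

For this input the only line printed is qdel 921.hydra, because job 920 is already running. Printing the commands first makes it possible to review them before any job is actually deleted.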
PLEASE READ THE MANUAL PAGES FOR THESE TORQUE COMMANDS (man qsub, man qstat, man qdel)!