The South Australian Partnership for Advanced Computing (SAPAC)

Hydra - Running jobs

Jobs are run on Hydra by submitting a jobscript to the Torque queuing system with the command:

  qsub myscript

where myscript contains the relevant Torque commands.

Below are some generic example scripts, with a brief description of each Torque component. They may be adapted to suit your needs. To get a functioning jobscript you only need to change the placeholder values (shown in red on the web page), such as MyJobName, Your-email-Address, HH:MM:SS and MyProgram+Arguments:


Sample Torque Jobscript for a Sequential Job

#!/bin/sh

#PBS -V

### Job name
#PBS -N MyJobName

### Join queuing system output and error files into a single output file
#PBS -j oe

### Send email to user when job ends or aborts
#PBS -m ae

### email address for user
#PBS -M Your-email-Address

### Queue name that job is submitted to
#PBS -q hydra

### Request nodes NB THIS IS REQUIRED
#PBS -l nodes=1,walltime=HH:MM:SS

# This job's working directory (stop if it cannot be entered)
echo "Working directory is $PBS_O_WORKDIR"
cd "$PBS_O_WORKDIR" || exit 1
echo "Running on host $(hostname)"
echo "Time is $(date)"

# Run the executable
MyProgram+Arguments

NOTES:

  1. All lines beginning with #PBS are interpreted as Torque directives and passed directly to the queuing system.
  2. Output and error messages will be joined into a single file, called something like MyJobName.oXXX (where XXX is the job number), in the directory from which the job is submitted.
  3. MyJobName should be a concise but identifiable alphanumeric name for the job, starting with a letter, NOT a number.
  4. MyProgram+Arguments should include the full path to the program.
  5. The routing queue hydra will allocate your job to an appropriate queue for execution.
  6. If your job does not request a specific walltime in the format HH:MM:SS, it will be allocated only 1 hour by default.
  7. A copy of this sample script can be obtained here.
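
The naming and walltime rules in the notes above can be checked before a jobscript is submitted. The following is only a sketch: the helper functions valid_job_name and valid_walltime are hypothetical names invented for this example and are not part of Torque. They test the format this page asks for, not everything the queuing system itself might accept.

```shell
#!/bin/sh
# Hypothetical helpers (not Torque commands): sanity-check the placeholder
# values before copying them into a jobscript.

# Succeeds if the name starts with a letter and is purely alphanumeric.
valid_job_name() {
    case "$1" in
        [A-Za-z]*) ;;                 # must start with a letter
        *) return 1 ;;
    esac
    case "$1" in
        *[!A-Za-z0-9]*) return 1 ;;   # reject any non-alphanumeric character
    esac
    return 0
}

# Succeeds if the argument looks like HH:MM:SS (minutes and seconds 00-59).
valid_walltime() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+:[0-5][0-9]:[0-5][0-9]$'
}

for name in MyJobName 3rdRun; do
    if valid_job_name "$name"; then
        echo "$name: ok"
    else
        echo "$name: bad name"
    fi
done

for wt in 02:30:00 90; do
    if valid_walltime "$wt"; then
        echo "$wt: ok"
    else
        echo "$wt: bad walltime"
    fi
done
```

Here MyJobName and 02:30:00 pass, while 3rdRun (starts with a digit) and 90 (not in HH:MM:SS form) are rejected.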


Sample Torque Jobscript for an MPI Job

#!/bin/sh

#PBS -V

### Job name
#PBS -N MyMPIJobName

### Output files
#PBS -j oe

### Mail to user when job ends or aborts
#PBS -m ae

### Mail address for user
#PBS -M Your-email-Address

### Queue name
#PBS -q hydra

### Number of nodes and processors per node NB THIS IS REQUIRED
#PBS -l nodes=X:ppn=2,walltime=HH:MM:SS


# Calculate the number of processors to be used
NP=$(wc -l < "$PBS_NODEFILE" | awk '{print $1}')


# This job's working directory (stop if it cannot be entered)
echo "Working directory is $PBS_O_WORKDIR"
cd "$PBS_O_WORKDIR" || exit 1
echo "Running on host $(hostname)"
echo "Time is $(date)"
echo "Using nodes"
cat "$PBS_NODEFILE"

# Run the executable
mpiexec -n $NP MyProgram+Arguments

NOTES:

  1. All lines beginning with #PBS are interpreted as Torque directives and passed directly to the queuing system.
  2. Output and error messages will be joined into a single file, called something like MyMPIJobName.oXXX (where XXX is the job number), in the directory from which the job is submitted.
  3. MyMPIJobName should be a concise but identifiable alphanumeric name for the job, starting with a letter, NOT a number.
  4. nodes=X:ppn=2
    X is the number of nodes requested.
    ppn=2 requests two processors per node; this should always be included.
  5. MyProgram+Arguments should include the full path to the program.
    Note the use of mpiexec instead of mpirun to run MPI programs under Torque.
    Also note that the -n option gives the number of MPI processes to run, which can be up to 2 x X (twice the number of nodes requested).
  6. If your job does not request a specific walltime in the format HH:MM:SS, it will be allocated only 1 hour by default.
  7. A copy of this sample script can be obtained here
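
To see why counting the lines of $PBS_NODEFILE gives the number of processors, the calculation can be tried outside the queue on a mock node file. This is a sketch only: the host names node01 and node02 are made up, and the file is created by hand here, whereas inside a real job Torque sets $PBS_NODEFILE itself, listing each node once per processor allocated on it.

```shell
#!/bin/sh
# Mock-up of the node file for a request of nodes=2:ppn=2.
# node01 and node02 are invented host names; in a real job Torque
# provides PBS_NODEFILE and it must not be overwritten like this.
PBS_NODEFILE=$(mktemp)
cat > "$PBS_NODEFILE" <<'EOF'
node01
node01
node02
node02
EOF

# One line per allocated processor, so the line count is the value for -n.
NP=$(wc -l < "$PBS_NODEFILE" | awk '{print $1}')

# Number of distinct nodes (the X in nodes=X:ppn=2).
NODES=$(sort -u "$PBS_NODEFILE" | wc -l | awk '{print $1}')

echo "processors: $NP"
echo "nodes: $NODES"

rm -f "$PBS_NODEFILE"
```

With two nodes and ppn=2 this reports 4 processors on 2 nodes, which matches the upper limit of 2 x X for the -n option.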


Checking a Job's Status in the Queue

Once a job has been submitted to Torque via the qsub command, a job.id of the form XXX.hydra will be displayed on the screen. This job.id is useful for following the progress of your job with the qstat command. To check on a job's status in the queue on Hydra, type:

  qstat

Output similar to the following will be displayed:

Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
787.hydra        flicprop3        wkamleh          2873:45: R large
886.hydra        flicprop0        wkamleh          3115:56: R large
915.hydra        flicprop2        wkamleh          2370:40: R large
920.hydra        dyn.mparappi     mparappi         00:25:22 R small
921.hydra        dyn.mparappi     mparappi                0 Q seq

You can readily identify your job's place and status in the queueing system using either the job.id or the name you provided in the jobscript.

The above sample output shows three jobs running on the queue large, one job running on the queue small, and one queued job on the queue seq that has not yet started.

NOTE: The job names flicprop0, flicprop2 and flicprop3 are useful in that they are concise and uniquely identify each job. The other two jobs in the queue, however, are badly named: both are called dyn.mparappi, so they cannot be told apart by name.
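
When the queue is long, the same reading of the qstat listing can be done with awk. The sketch below counts jobs per queue and state. It assumes the six-column layout of the sample output above and uses a copy of that sample (wrapped in a hypothetical sample_qstat function) so that it can be tried anywhere. On Hydra the output of qstat itself would be piped into the awk command instead.

```shell
#!/bin/sh
# Stand-in for qstat: prints the sample listing from this page.
sample_qstat() {
cat <<'EOF'
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
787.hydra        flicprop3        wkamleh          2873:45: R large
886.hydra        flicprop0        wkamleh          3115:56: R large
915.hydra        flicprop2        wkamleh          2370:40: R large
920.hydra        dyn.mparappi     mparappi         00:25:22 R small
921.hydra        dyn.mparappi     mparappi                0 Q seq
EOF
}

# Skip the two header lines, then count jobs for each queue/state pair.
# Column 5 is the state (R = running, Q = queued), column 6 the queue.
summary=$(sample_qstat |
    awk 'NR > 2 && NF >= 6 { n[$6 " " $5]++ }
         END { for (k in n) print k, n[k] }' |
    sort)

echo "$summary"
```

For the sample listing this prints one line per queue: three running jobs on large, one queued job on seq and one running job on small.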


Deleting a Queued Job

To delete a queued or running job, type:

  qdel job.id

where job.id is the one shown for that job in the output of qstat.

NOTE: You will only be able to delete your own jobs.
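
Before deleting several jobs at once it can help to list the qdel commands first and check them by eye. The sketch below is a dry run under stated assumptions: it prints a qdel command for each queued (state Q) job of one user and does not run anything. The functions sample_qstat and queued_job_ids are hypothetical names for this example, the listing is the sample qstat output from this page, and the column layout is assumed to match it.

```shell
#!/bin/sh
# Stand-in for qstat: prints the sample listing from this page.
sample_qstat() {
cat <<'EOF'
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
787.hydra        flicprop3        wkamleh          2873:45: R large
886.hydra        flicprop0        wkamleh          3115:56: R large
915.hydra        flicprop2        wkamleh          2370:40: R large
920.hydra        dyn.mparappi     mparappi         00:25:22 R small
921.hydra        dyn.mparappi     mparappi                0 Q seq
EOF
}

# Read a qstat-style listing on standard input and print the job.id of
# every job that belongs to user $1 and is still queued (state Q).
queued_job_ids() {
    awk -v user="$1" 'NR > 2 && $3 == user && $5 == "Q" { print $1 }'
}

# Dry run: show the commands instead of executing them.
cmds=$(sample_qstat | queued_job_ids mparappi | sed 's/^/qdel /')
echo "$cmds"
```

For the sample listing the only queued job of user mparappi is 921.hydra, so a single qdel line is printed. Once the list looks right, the printed commands can be typed or pasted at the prompt.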


PLEASE READ THE MANUAL PAGES FOR THESE TORQUE COMMANDS (qsub, qstat AND qdel)!!
