The South Australian Partnership for Advanced Computing

Titan - Running jobs

Jobs are run on Titan by submitting a jobscript to the Torque queuing system with the command:

  qsub myscript

where myscript contains the relevant Torque commands.
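The submission step can be wrapped in a small shell function that checks the jobscript before handing it to qsub. This is only a sketch: the function name submit_job is hypothetical and not part of Torque, and it assumes that qsub prints the new job's identifier on standard output.

```shell
# Hypothetical wrapper around qsub; the name submit_job is not part of Torque.
submit_job() {
    script=$1
    # Refuse to submit a script that does not exist or cannot be read
    if [ ! -r "$script" ]; then
        echo "error: cannot read jobscript '$script'" >&2
        return 1
    fi
    # qsub is only available where the Torque client tools are installed
    if ! command -v qsub >/dev/null 2>&1; then
        echo "error: qsub not found on this machine" >&2
        return 2
    fi
    # qsub prints the identifier of the new job on standard output
    qsub "$script"
}

# Example use: keep the identifier for later use with qstat or qdel
# jobid=$(submit_job myscript) && echo "Submitted as $jobid"
```

Capturing the identifier at submission time saves looking it up again when checking or deleting the job.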

Below are some generic examples of scripts with brief descriptions of each of the various Torque components. These may be adapted to suit your needs. Please note that you only need to change the placeholder values (for example MyJobName, Your-email-Address, HH:MM:SS and MyProgram+Arguments) in order to get a functioning jobscript for Torque:


Sample Torque Jobscript for a Sequential Job

#!/bin/sh

#PBS -V

### Job name
#PBS -N MyJobName

### Join queuing system output and error files into a single output file
#PBS -j oe

### Send email to user when job ends or aborts
#PBS -m ae

### email address for user
#PBS -M Your-email-Address

### Queue name that job is submitted to
#PBS -q titan

### Request nodes NB THIS IS REQUIRED
#PBS -l nodes=1:ppn=1,walltime=HH:MM:SS

# This job's working directory
echo "Working directory is $PBS_O_WORKDIR"
cd "$PBS_O_WORKDIR" || exit 1
echo "Running on host $(hostname)"
echo "Time is $(date)"

# Run the executable
MyProgram+Arguments

NOTES:

  1. All lines beginning with #PBS are interpreted by the queuing system as Torque directives.
  2. Output and error messages will be joined into a single file, named something like MyJobName.oXXX, in the directory from which the job is submitted.
  3. MyJobName should be a concise but identifiable alphanumeric name for the job (starting with a letter, NOT a number).
  4. MyProgram+Arguments should include the full path to the program.
  5. The routing queue titan will allocate your job to an appropriate queue for execution.
  6. If your job does not request a specific walltime in the format HH:MM:SS, it will be allocated only 1 hour by default.
  7. A copy of this sample script can be obtained here.
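
Notes 3 and 6 can be checked before submission with two small helpers. This is a sketch only: the names valid_job_name and to_walltime are hypothetical, and the name check is one strict reading of note 3 (letters and digits only, starting with a letter).

```shell
# Hypothetical helpers; neither name is part of Torque.

# Succeeds if the name starts with a letter and is purely alphanumeric.
valid_job_name() {
    case $1 in
        [A-Za-z]*) ;;                 # must start with a letter
        *) return 1 ;;
    esac
    case $1 in
        *[!A-Za-z0-9]*) return 1 ;;   # reject anything non-alphanumeric
    esac
    return 0
}

# Convert a duration in seconds to the HH:MM:SS form used by walltime.
to_walltime() {
    secs=$1
    printf '%02d:%02d:%02d\n' $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
}

valid_job_name MyJobName && echo "MyJobName is acceptable"
valid_job_name 3rdRun || echo "3rdRun is rejected"
echo "#PBS -l nodes=1:ppn=1,walltime=$(to_walltime 5400)"
```

The last line prints a resource request for 90 minutes, which can be pasted into the jobscript in place of the HH:MM:SS placeholder.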


Sample Torque Jobscript for a Parallel MPI Job

#!/bin/sh

#PBS -V

### Job name
#PBS -N MyMPIJobName

### Output files
#PBS -j oe

### Mail to user when job ends or aborts
#PBS -m ae

### Mail address for user
#PBS -M Your-email-Address

### Queue name
#PBS -q titan

### Number of nodes
#PBS -l nodes=X:ppn=2,walltime=HH:MM:SS


# Calculate the number of processors to be used
NP=$(wc -l < "$PBS_NODEFILE")


# This job's working directory
echo "Working directory is $PBS_O_WORKDIR"
cd "$PBS_O_WORKDIR" || exit 1
echo "Running on host $(hostname)"
echo "Time is $(date)"
echo "Using nodes"
cat "$PBS_NODEFILE"

# Run the executable
mpirun -np "$NP" -machinefile "$PBS_NODEFILE" MyProgram+Arguments

NOTES:

  1. All lines beginning with #PBS are interpreted by the queuing system as Torque directives.
  2. Output and error messages will be joined into a single file, named something like MyMPIJobName.oXXX, in the directory from which the job is submitted.
  3. MyMPIJobName should be a concise but identifiable alphanumeric name for the job, starting with a letter, NOT a number.
  4. nodes=X:ppn=2
    X is the number of nodes requested.
    ppn=2 indicates that each requested node has two processors; this should be included.
  5. MyProgram+Arguments should include the full path to the program.
    Note that the -np option sets the number of MPI processes to run, which can be up to 2x the number of nodes requested (X).
    Note also that the -machinefile $PBS_NODEFILE option passes the Torque-assigned list of nodes to MPICH.
  6. If your job does not request a specific walltime in the format HH:MM:SS, it will be allocated only 1 hour by default.
  7. A copy of this sample script can be obtained here.
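
The processor count used for -np can be illustrated without a running job by simulating the node file. The sketch below assumes that Torque writes one line per allocated processor to $PBS_NODEFILE, so a request of nodes=3:ppn=2 yields six lines. The host names are made up for illustration.

```shell
# Simulate the node file for a request of nodes=3:ppn=2.
# The host names below are hypothetical.
nodefile=$(mktemp)
for node in node01 node02 node03; do
    # one line per allocated processor, so each node appears twice
    echo "$node" >> "$nodefile"
    echo "$node" >> "$nodefile"
done

# Same calculation as in the jobscript: one MPI process per line
NP=$(wc -l < "$nodefile")
echo "MPI processes: $NP"

# Number of distinct nodes in the allocation
NODES=$(sort -u "$nodefile" | wc -l)
echo "Distinct nodes: $NODES"

rm -f "$nodefile"
```

With three nodes and ppn=2 this gives six processes on three distinct nodes, which matches the "up to 2x the number of nodes" rule in note 5.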


NOTE:

Some of the nodes available on Titan have larger local disks (40GB or 20GB) and up to 512MB RAM. These nodes are tagged with the properties "disk40", "disk20" and "mem512", as appropriate. If your job requires any of these resources, they must be requested explicitly in the Torque job submission script. To do so, alter the "#PBS -l" line, which is the resource list request. For example,

 #PBS -l nodes=2:ppn=2:disk40:mem512,walltime=02:00:00

requests 2 nodes, each with 2 CPUs, a 40GB disk drive and 512MB RAM, for a 2 hour period.
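
The resource list is a plain string, so it can be assembled from its parts. The helper below is a sketch with a hypothetical name (resource_list); it joins the node count, ppn and any node properties with colons and then appends the walltime.

```shell
# Hypothetical helper: assemble a Torque resource list.
# Usage: resource_list NODES PPN WALLTIME [PROPERTY...]
resource_list() {
    spec="nodes=$1:ppn=$2"
    walltime=$3
    shift 3
    # Append each requested node property, separated by colons
    for prop in "$@"; do
        spec="$spec:$prop"
    done
    echo "$spec,walltime=$walltime"
}

echo "#PBS -l $(resource_list 2 2 02:00:00 disk40 mem512)"
echo "#PBS -l $(resource_list 1 1 01:00:00)"
```

The first call reproduces the example request for two nodes with 40GB disks and 512MB RAM; the second shows a request with no extra properties.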


Checking a Job's Status in the Queue

Once a job has been submitted to Torque via the qsub command, a job.id of the form XXX.titan will be displayed on the screen. This job.id is helpful for tracking the progress of your job via the qstat command. To check on a job's status in the queue on Titan, type:

  qstat

Output similar to the following will be displayed:

Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
787.titan        flicprop3        wkamleh          2873:45: R student
886.titan        flicprop0        wkamleh          3115:56: R student
915.titan        flicprop2        wkamleh          2370:40: R student
920.titan        dyn.mparappi     mparappi         00:25:22 R student
921.titan        dyn.mparappi     mparappi                0 Q student
      

You can readily identify your job's place and status in the queuing system using either the job.id or the name you provided in the jobscript.

The above sample output shows 4 jobs running in the queue student and 1 queued job that has not yet started.

NOTE: The job names flicprop0, flicprop2 and flicprop3 are useful in that they are concise and uniquely identify jobs by name. The other 2 jobs in the queue, however, are badly named: both are called dyn.mparappi, so they cannot be told apart by name.
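
The qstat listing is column-based, so it can be summarised with awk. The sketch below works on a copy of the sample output rather than on a live queue; the function name sample_qstat is hypothetical. On Titan the same awk commands could be fed from qstat itself, assuming the column layout shown in the sample.

```shell
# Stand-in for qstat: prints a copy of the sample output.
sample_qstat() {
    cat <<'EOF'
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
787.titan        flicprop3        wkamleh          2873:45: R student
886.titan        flicprop0        wkamleh          3115:56: R student
915.titan        flicprop2        wkamleh          2370:40: R student
920.titan        dyn.mparappi     mparappi         00:25:22 R student
921.titan        dyn.mparappi     mparappi                0 Q student
EOF
}

# Skip the two header lines, then tally jobs by the state column (field 5)
sample_qstat | awk 'NR > 2 { count[$5]++ }
    END { printf "running=%d queued=%d\n", count["R"], count["Q"] }'

# Show the job.id and state of every job owned by one user (field 3)
sample_qstat | awk -v user=mparappi 'NR > 2 && $3 == user { print $1, $5 }'
```

For the sample this reports four running jobs and one queued job, followed by the two jobs owned by mparappi.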


Deleting a Queued Job

To delete a queued or running job, type:

  qdel job.id

where the job.id is that given by the output of qstat.

NOTE: You will only be able to delete your own jobs.
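
When several jobs need to be removed, the job.id values can be picked out of the qstat listing first. The following is a dry-run sketch on two lines of the sample output: it selects the jobs of one user that are still queued (state Q) and prints the qdel commands instead of running them.

```shell
# Two lines copied from the sample qstat output
sample='920.titan        dyn.mparappi     mparappi         00:25:22 R student
921.titan        dyn.mparappi     mparappi                0 Q student'

# Select jobs owned by the given user that are still queued (state Q)
queued_ids=$(printf '%s\n' "$sample" |
    awk -v user=mparappi '$3 == user && $5 == "Q" { print $1 }')

# Dry run: print each command rather than executing it
for id in $queued_ids; do
    echo "qdel $id"
done
```

Printing the commands first makes it easy to review the list before deleting anything.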


PLEASE READ THE MANUAL PAGES FOR THESE TORQUE COMMANDS!

Manual pages can be read by invoking the man command. For example,

   man qstat

will bring up the manual page for the Torque qstat command. Navigation through manpages is done via the following keys:

  • Space Bar - Scroll forward a page at a time

  • Down Arrow - Scroll forward a line at a time

  • "B" key - Scroll backwards a page at a time

  • Up Arrow - Scroll backwards a line at a time

  • "Q" key - Quit the manual and return to the command line prompt

 
