Aquila - Running jobs
Jobs are run on Aquila by submitting a jobscript to the Torque queuing system. To submit a jobscript, issue the command:
qsub myscript
where myscript contains the relevant Torque commands.
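When qsub accepts a job it prints the job identifier (of the form XXX.aquila) on standard output, so a small wrapper can record it for later use. The following is a minimal sketch in POSIX shell rather than csh. The function name submit_job is hypothetical, and it assumes qsub is on your PATH.

```shell
# Hypothetical helper: submit a jobscript and print the job id that qsub returns.
submit_job() {
    script=$1
    # Refuse to submit a jobscript that does not exist or cannot be read.
    if [ ! -r "$script" ]; then
        echo "submit_job: cannot read $script" >&2
        return 1
    fi
    # qsub writes the job id to standard output; pass it straight through.
    qsub "$script"
}
```

It could be used as: jobid=$(submit_job myscript) && echo "Submitted as $jobid"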
Below are some generic examples of jobscripts, with brief descriptions of the various Torque components. These may be adapted to suit your needs. Note that you only need to change the placeholder values, such as MyJobName, Your-email-Address, HH:MM:SS and MyProgram+Arguments, to get a functioning jobscript for Torque:
Sample Torque Jobscript for a Sequential Job
#!/bin/csh
#PBS -V
### Job name
#PBS -N MyJobName
### Join queuing system output and error files into a single output file
#PBS -j oe
### Send email to user when job ends or aborts
#PBS -m ae
### email address for user
#PBS -M Your-email-Address
### Queue name that job is submitted to
#PBS -q aquila
### Request nodes NB THIS IS REQUIRED
#PBS -l ncpus=1,nodes=ppn=1,mem=1GB,walltime=HH:MM:SS
# This job's working directory
echo Working directory is $PBS_O_WORKDIR
cd $PBS_O_WORKDIR
echo Running on host `hostname`
echo Time is `date`
# Run the executable
MyProgram+Arguments
NOTES:
- All lines beginning with #PBS are interpreted as Torque directives and passed directly to the queuing system.
- Output and error messages will be joined into a file that will be called something like MyJobName.oXXX in the directory from which the job is submitted.
- MyJobName should be a concise but identifiable alphanumeric name for the job (starting with a letter, NOT a number).
- MyProgram+Arguments should include the full path to the program.
- If a specific amount of memory is not requested by your job, it will only be allocated 1GB by default. Your job will terminate if it tries to access more than the requested amount of memory.
- If a specific walltime is not requested by your job in the format HH:MM:SS, it will only be allocated 1 hour by default.
- A copy of this sample script can be obtained here.
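Because walltime must be given as HH:MM:SS, it can help to convert an estimate in seconds into that form before editing the jobscript. The following is a minimal sketch in POSIX shell; the function name to_walltime is hypothetical and is not part of Torque.

```shell
# Hypothetical helper: convert a number of seconds into HH:MM:SS for walltime=.
to_walltime() {
    total=$1
    hours=$((total / 3600))
    minutes=$(((total % 3600) / 60))
    seconds=$((total % 60))
    # Zero-pad each field to at least two digits.
    printf '%02d:%02d:%02d\n' "$hours" "$minutes" "$seconds"
}
```

For example, to_walltime 5400 prints 01:30:00, which would be written in the jobscript as walltime=01:30:00.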
Sample Torque Jobscript for an OpenMP Job
#!/bin/csh
#PBS -V
### Job name
#PBS -N MyOpenMPJobName
### Output files
#PBS -j oe
### Mail to user when job ends or aborts
#PBS -m ae
### Mail address for user
#PBS -M Your-email-Address
### Queue name
#PBS -q aquila
### Number of processors, amount of memory and time required
#PBS -l ncpus=XX,nodes=ppn=XX,mem=YYGB,walltime=HH:MM:SS
# This job's working directory
echo Working directory is $PBS_O_WORKDIR
cd $PBS_O_WORKDIR
echo Running on host `hostname`
echo Time is `date`
setenv OMP_NUM_THREADS XX
# Run the executable
MyProgram+Arguments
NOTES:
- All lines beginning with #PBS are interpreted as PBS commands.
- Output and error messages will be joined into a file that will be called something like MyOpenMPJobName.oXXX in the directory from which the job is submitted.
- MyOpenMPJobName should be a concise but identifiable alphanumeric name for the job, starting with a letter, NOT a number.
- ncpus=XX,nodes=ppn=XX: XX is the number of processors requested. This should equal the number of OpenMP threads you require. The ncpus, nodes and ppn parts should all be included.
- mem=YYGB: YY is the amount of memory requested, in GB. If a specific amount of memory is not requested by your job, it will only be allocated 1GB by default.
- setenv OMP_NUM_THREADS XX: XX is the number of OpenMP threads requested.
- MyProgram+Arguments should include the full path to the program.
- If a specific walltime is not requested by your job in the format HH:MM:SS, it will only be allocated 1 hour by default.
- A copy of this sample script can be obtained here
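Since XX appears in three places in the OpenMP jobscript (ncpus, ppn and OMP_NUM_THREADS), a quick check before submission can catch a mismatch. The following is a minimal sketch in POSIX shell, assuming the jobscript follows the layout of the sample above with XX replaced by numbers. The function name check_threads is hypothetical.

```shell
# Hypothetical helper: succeed only if ncpus, ppn and OMP_NUM_THREADS agree.
# Assumes the "#PBS -l" and "setenv OMP_NUM_THREADS" lines look like the sample.
check_threads() {
    script=$1
    ncpus=$(sed -n 's/^#PBS -l .*ncpus=\([0-9][0-9]*\).*/\1/p' "$script")
    ppn=$(sed -n 's/^#PBS -l .*ppn=\([0-9][0-9]*\).*/\1/p' "$script")
    threads=$(sed -n 's/^setenv OMP_NUM_THREADS  *\([0-9][0-9]*\).*/\1/p' "$script")
    if [ -n "$ncpus" ] && [ "$ncpus" = "$ppn" ] && [ "$ncpus" = "$threads" ]; then
        return 0
    fi
    echo "check_threads: ncpus=$ncpus ppn=$ppn OMP_NUM_THREADS=$threads" >&2
    return 1
}
```

It could be used as: check_threads myscript && qsub myscript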
Sample Torque Jobscript for an MPI Job
#!/bin/csh
#PBS -V
### Job name
#PBS -N MyMPIJobName
### Output files
#PBS -j oe
### Mail to user when job ends or aborts
#PBS -m ae
### Mail address for user
#PBS -M Your-email-Address
### Queue name
#PBS -q aquila
### Number of nodes, amount of memory and time required
#PBS -l ncpus=XX,nodes=ppn=XX,mem=YYGB,walltime=HH:MM:SS
# This job's working directory
echo Working directory is $PBS_O_WORKDIR
cd $PBS_O_WORKDIR
echo Running on host `hostname`
echo Time is `date`
# Run the executable
mpirun -np XX MyProgram+Arguments
NOTES:
- All lines beginning with #PBS are interpreted as PBS commands.
- Output and error messages will be joined into a file that will be called something like MyMPIJobName.oXXX in the directory from which the job is submitted.
- MyMPIJobName should be a concise but identifiable alphanumeric name for the job, starting with a letter, NOT a number.
- ncpus=XX,nodes=ppn=XX: XX is the number of processors requested. This should equal the value given to mpirun -np. The ncpus, nodes and ppn parts should all be included.
- mem=YYGB: YY is the amount of memory requested, in GB. If a specific amount of memory is not requested by your job, it will only be allocated 1GB by default.
- MyProgram+Arguments should include the full path to the program.
- If a specific walltime is not requested by your job in the format HH:MM:SS, it will only be allocated 1 hour by default.
- A copy of this sample script can be obtained here
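Rather than typing XX a second time for mpirun -np, the process count can be derived from the allocation. The following is a minimal sketch in POSIX shell. It assumes that Torque on Aquila sets the PBS_NODEFILE environment variable to a file with one line per allocated processor, as Torque usually does; this has not been confirmed for Aquila. The function name count_mpi_slots is hypothetical.

```shell
# Hypothetical helper: count the processor slots listed in the Torque node file.
# Assumption: $PBS_NODEFILE has one line per allocated processor.
count_mpi_slots() {
    nodefile=${1:-$PBS_NODEFILE}
    # grep -c . counts the non-empty lines in the file.
    grep -c . "$nodefile"
}
```

Inside a jobscript written for sh it could be used as: mpirun -np "$(count_mpi_slots)" MyProgram+Arguments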
Checking a Job's Status in the Queue
Once a job has been submitted to Torque via the qsub command, a job.id of the form XXX.aquila will be displayed on the screen. This job.id is useful for tracking the progress of your job with the qstat command. To check on a job's status in the queue on Aquila, type:
qstat
Output similar to the following will be displayed:
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
787.aquila       flicprop3        wkamleh          2873:45: R aquila
886.aquila       flicprop0        wkamleh          3115:56: R aquila
915.aquila       flicprop2        wkamleh          2370:40: R aquila
920.aquila       dyn.mparappi     mparappi         00:25:22 R aquila
921.aquila       dyn.mparappi     mparappi         0        Q aquila
You can readily identify your job's place and status in the queueing system using either the job.id or the name you provided in the jobscript.
The above sample output shows four jobs running on the queue aquila and one queued job that has not yet started.
NOTE: The job names flicprop0, flicprop2 and flicprop3 are useful in that they are concise and uniquely identify jobs by name. However, the other two jobs in the queue are badly named: both are called dyn.mparappi, so they cannot be told apart by name.
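The running and queued counts can also be read off qstat output mechanically. The following is a minimal sketch in POSIX shell, assuming the output has the two header lines and the column layout of the sample above, with the state (R or Q) in the second-to-last column. The function name count_state is hypothetical.

```shell
# Hypothetical helper: count jobs in a given state from qstat output on stdin.
# Assumes two header lines and the state in the second-to-last field.
count_state() {
    state=$1
    awk -v s="$state" 'NR > 2 && $(NF-1) == s { n++ } END { print n + 0 }'
}
```

It could be used as: qstat | count_state R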
Deleting a Queued Job
To delete a queued or running job, type:
qdel job.id
where job.id is the identifier given in the output of qstat.
NOTE: You will only be able to delete your own jobs.
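When several jobs need to be deleted, their job.ids can be picked out of the qstat listing instead of being copied by hand. The following is a minimal sketch in POSIX shell, assuming the qstat layout shown earlier (two header lines, job.id in the first column, user in the third). The function name job_ids_for_user is hypothetical.

```shell
# Hypothetical helper: print the job ids owned by a user, from qstat output on stdin.
# Assumes two header lines, the job id in field 1 and the user name in field 3.
job_ids_for_user() {
    user=$1
    awk -v u="$user" 'NR > 2 && $3 == u { print $1 }'
}
```

It could be used as: qstat | job_ids_for_user "$USER"
Review the printed list first; each id can then be passed to qdel.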
PLEASE READ THE MANUAL PAGES FOR THESE TORQUE COMMANDS (man qsub, man qstat, man qdel)!