Running jobs

Jobs are submitted to the cluster using Sun's Cluster Runtime Environment (CRE). The CRE is easy to use, but does not support batch queues. Under CRE, jobs are placed on the most lightly loaded nodes. If no unloaded nodes are available, multiple processes will be run on each node. This can be very inefficient and should be avoided if possible.

Running Jobs using CRE

With Sun's Cluster Runtime Environment (CRE), both sequential and parallel jobs are submitted to the cluster using the mprun command. To run a sequential job, the command is

    mprun myprogram

where myprogram denotes the standard command you would use to execute your program on the front end, including specification of command line parameters and any input or output files. You can also use redirects for standard input and output.

For a parallel job, the command is

    mprun -np N myprogram

where N is the number of processors you want to use. By default the CRE will automatically select the nodes used for executing your job.

If instead you want to specify particular nodes for your job to run on, you need to create a machine file that contains a list of which nodes the program should run on, and how many processes should be run on each of the nodes. For example, to run a 16-processor job on nodes orion04, orion05, orion06 and orion07, with 4 processes per node (i.e. one process on each processor), the machine file (or rank map in CRE jargon) should look like:
and the job should be submitted using the -Mf option to specify the machine file (called machinefile in this example):

    mprun -np 16 -Mf machinefile myprogram

The CRE is configured to see two separate clusters: the main cluster (called orion10) and the dual-processor front end (called orion00). By default, jobs run on the main cluster. If you want to run a short parallel test job on the front end, you will need to specify the cluster using

    mprun -c orion00 -np N myprogram

You can check what jobs and processes are running using the mpps command, which is similar to the Unix ps command.
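The mprun invocations described above can be collected into one small helper. The following is a minimal sketch, not part of the CRE: the function name mprun_cmd and its argument order are hypothetical, and it only prints the command line (using the -np, -Mf and -c options mentioned above) so that it can be checked before a job is actually submitted.

```shell
#!/bin/sh
# Hypothetical helper (not part of CRE): prints an mprun command line
# instead of running it, so it can be inspected before submission.
# Usage: mprun_cmd NPROCS MACHINEFILE CLUSTER PROGRAM [ARGS...]
# Pass "" for MACHINEFILE or CLUSTER to leave that option out.
mprun_cmd() {
    mc_nprocs=$1
    mc_machinefile=$2
    mc_cluster=$3
    shift 3
    mc_cmd="mprun"
    # -c selects a cluster other than the default (e.g. the front end)
    if [ -n "$mc_cluster" ]; then
        mc_cmd="$mc_cmd -c $mc_cluster"
    fi
    # -np is only added for parallel jobs
    if [ "$mc_nprocs" -gt 1 ]; then
        mc_cmd="$mc_cmd -np $mc_nprocs"
    fi
    # -Mf names the machine file (rank map) listing the nodes to use
    if [ -n "$mc_machinefile" ]; then
        mc_cmd="$mc_cmd -Mf $mc_machinefile"
    fi
    echo "$mc_cmd $*"
}

mprun_cmd 1 "" "" myprogram             # sequential job
mprun_cmd 16 machinefile "" myprogram   # 16 processes on the listed nodes
mprun_cmd 4 "" orion00 myprogram        # short test job on the front end
```

Once the printed line looks right, it can be pasted at the prompt or run with eval.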
You can check the status and time-averaged load of the nodes using the mpinfo command.
You can kill jobs using the mpkill command (using the job ID returned by mpps), or alternatively you can use the standard kill command on the process running on the front end. DO NOT use kill -9, since that will not allow the process running on the front end to first terminate the sub-processes running on each node.

NOTE: The mpps and mpkill commands are rather fragile and not very tolerant of jobs or machines crashing. You may therefore see processes listed by mpps that are not actually running, and you may not be able to kill processes with mpkill.

NOTE: Background jobs submitted using mprun will be suspended waiting on input, even if they do not require any. To avoid this problem, use the -n flag, which tells the program to read standard input from /dev/null, for example:

    mprun -n -np 8 myprogram > output.file &
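The warning against kill -9 above comes down to how signals work: a plain kill sends SIGTERM, which a process can catch and use to clean up, whereas kill -9 sends SIGKILL, which cannot be caught. The sketch below illustrates this with a stand-in script. It does not use mprun, and the frontend function and status file are hypothetical names made up for the illustration.

```shell
#!/bin/sh
# Stand-in for a front-end process (NOT mprun): it starts one worker and,
# when it receives SIGTERM, stops the worker before exiting.
frontend() {
    fe_status_file=$1
    sleep 30 &
    fe_worker=$!
    # SIGTERM (plain "kill") can be trapped, so the cleanup below runs.
    # SIGKILL ("kill -9") cannot be trapped, so the worker would be left behind.
    trap 'kill "$fe_worker" 2>/dev/null; echo "worker stopped" > "$fe_status_file"; exit 0' TERM
    wait "$fe_worker"
}

status_file=$(mktemp)
frontend "$status_file" &
frontend_pid=$!
sleep 1                      # give the stand-in time to install its trap
kill "$frontend_pid"         # plain kill sends SIGTERM
wait "$frontend_pid"
result=$(cat "$status_file")
rm -f "$status_file"
echo "$result"               # shows the message written by the trap handler
```

Replacing the plain kill with kill -9 in this sketch leaves the status file empty and the sleep worker still running, which is the situation the warning is meant to prevent on the compute nodes.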