The South Australian Partnership for Advanced Computing

Porting programs

Sequential Programs

Sequential programs should run without change on a single processor of the cluster. You can therefore use the cluster without knowing how to write parallel programs, simply by submitting multiple sequential jobs.

Automatic Parallelization

If you want to speed up your program by running it concurrently on multiple processors, the easiest way is to ask the compiler to try to parallelize your standard C, C++ or Fortran program automatically. The effectiveness of automatic parallelization depends heavily on the structure of the program, and it does not always work well. The resulting parallel program uses shared memory, so it can only be run on a single node of the cluster, i.e. on at most 4 processors.
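As a rough illustration of what "structure of the program" means here, the sketch below contrasts two loops. The function names are hypothetical, and whether a given compiler actually parallelizes the first loop depends on the compiler and its options (for C code, possible pointer aliasing can also get in the way).

```c
#include <stddef.h>

/* Independent iterations: a[i] depends only on b[i] and c[i], so the
   loop is a natural candidate for automatic parallelization. */
void scale_add(double *a, const double *b, const double *c,
               double s, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        a[i] = b[i] + s * c[i];
}

/* Loop-carried dependence: a[i] needs the value of a[i-1] computed in
   the previous iteration, so the loop as written must run in order. */
void running_sum(double *a, size_t n)
{
    size_t i;
    for (i = 1; i < n; i++)
        a[i] += a[i - 1];
}
```

Programs dominated by loops of the first kind tend to benefit most; loops of the second kind usually need to be restructured by hand before they can run in parallel.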

For example, to use the f90 compiler to automatically parallelize a standard Fortran program, you will need to use the compiler flags

  -autopar -stackvar
      

when compiling and linking the program.

The number of parallel processes (or threads) you want to use is specified by an environment variable PARALLEL that must be set either in your current shell or in a runfile script. This should be at most 4 on Orion, which will allocate 1 process per processor on a node.

You will probably also need to increase the stack size for the parallel program to run (otherwise you may see a Segmentation fault error when you try to run the program). Use the unlimit command to remove the stack size limit. You may also need to set the STACKSIZE environment variable to increase the stack size for each thread; its value is given in kilobytes. For example, the following commands set up the environment to run 4 processes with 32 MBytes of stack for each process:

unlimit
setenv PARALLEL 4
setenv STACKSIZE 32768

You will then need to submit the job (see the section on running jobs) using the command for running on multiple processors, e.g.

mprun -np 4 myprogram

You should always check the performance of the parallel program on different numbers of processors, and compare it to the performance of the sequential program. There is no point running on 4 processors if the program runs just as fast on 2 processors (or 1 processor!).
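One simple way to make this comparison is to compute the speedup and parallel efficiency from measured run times. The helper functions below are a minimal sketch with hypothetical names; the timings you feed in would come from your own runs.

```c
/* Speedup of a parallel run relative to the sequential run:
   sequential time divided by parallel time. */
double speedup(double t_seq, double t_par)
{
    return t_seq / t_par;
}

/* Parallel efficiency: speedup per processor. A value of 1.0 is ideal;
   values well below 1.0 mean processors are being wasted. */
double efficiency(double t_seq, double t_par, int nprocs)
{
    return speedup(t_seq, t_par) / nprocs;
}
```

For example, with made-up timings of 100 seconds sequentially and 50 seconds on 4 processors, the speedup is 2.0 and the efficiency is only 0.5, which suggests that 2 processors may serve the job just as well.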

Check Sun's documentation (e.g. on Fortran programming) and the man pages for the compilers (cc or f90) for more information on using automatic parallelization.

Parallel Programming

Alternatively, you can port or write your programs using a standard parallel programming language or library. Programs written using High Performance Fortran (HPF), Message Passing Interface (MPI), or OpenMP (shared memory directives) can be compiled and run on the cluster. OpenMP programs can only be run on one node (4 processors) since they use shared memory. HPF programs can be run on up to 32 processors (this is a license restriction). MPI jobs can be run on any number of processors.
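To give a flavour of the directive-based approach, here is a minimal OpenMP sketch in C. The function name is hypothetical, and the compiler option needed to enable OpenMP is not shown; check the man page for your compiler. If OpenMP is not enabled, the directive is ignored and the loop simply runs sequentially.

```c
#include <stddef.h>

/* Dot product of two vectors. The directive asks for the loop
   iterations to be shared among threads, with each thread's partial
   sum combined into sum when the loop finishes. */
double dot(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
    long i;  /* signed loop index, as older OpenMP versions require */

    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < (long)n; i++)
        sum += x[i] * y[i];

    return sum;
}
```

The same source file therefore serves as both the sequential and the shared-memory parallel version of the routine.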

HPF

Programs written in Fortran 90 and using Fortran 90 array syntax can be ported to HPF fairly simply by adding compiler directives to specify the distribution of arrays over processors. For more information on HPF and the Portland Group HPF compiler that is provided on Orion, see the PGHPF User's Guide and the PGHPF Reference Manual. There is a good online HPF Programming Course from Edinburgh Parallel Computing Centre. The High Performance Fortran Handbook by C.H. Koelbel et al. is a useful reference, and there is a copy in the CSSM library.

Programs written in Connection Machine Fortran (CMF) can easily be ported to HPF. A simple script called cmf2hpf, which automatically converts CMF code to HPF, is available in /usr/local/bin on Orion. Be sure to read the comments at the top of the script before using it, since it is not a general converter and will not work for all CMF codes. Since the cluster has a different architecture to the CM-5, you may need to tune your program to get good performance on the cluster.

MPI

You can use MPI to parallelize programs written in Fortran, C or C++. MPI is much more difficult to program than HPF or OpenMP, because you must distribute the data and code the communication between processes yourself, but it typically gives better performance. For more information on MPI, you can look at this list of materials for learning MPI. There is a good online MPI Programming Course from Edinburgh Parallel Computing Centre. A standard reference book is Using MPI: Portable Parallel Programming with the Message-Passing Interface, by William Gropp, Ewing Lusk and Anthony Skjellum, MIT Press, 1994.
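Part of the extra effort in MPI is bookkeeping that HPF and OpenMP do for you, such as deciding which process owns which part of an array. The sketch below shows one common block decomposition in plain C. The type and function names are hypothetical, and no MPI calls are made; in a real MPI program the rank and nprocs arguments would be obtained from MPI_Comm_rank and MPI_Comm_size.

```c
/* Half-open index range [begin, end) owned by one process. */
typedef struct {
    long begin;
    long end;
} block_range;

/* Split n items over nprocs processes as evenly as possible.
   The first (n % nprocs) ranks each receive one extra item. */
block_range block_for_rank(long n, int rank, int nprocs)
{
    long base = n / nprocs;   /* items every rank gets */
    long extra = n % nprocs;  /* leftover items to hand out */
    block_range r;

    r.begin = rank * base + (rank < extra ? rank : extra);
    r.end = r.begin + base + (rank < extra ? 1 : 0);
    return r;
}
```

Each process would then loop only over its own range and exchange boundary or result data with the other processes by sending messages.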

Note that the Sun HPC ClusterTools provides a good parallel debugger and performance analyzer known as Prism, which is based on software developed for the Connection Machine. See the Sun documentation for more information.

Parallel Scientific Software Libraries

For some programs, most of the run time is spent in standard routines such as solving linear systems, FFTs, or computing eigenvalues. In that case, you can use libraries containing parallel versions of these routines, which should speed up your program without requiring you to write any parallel code.

The Sun HPC ClusterTools provides the Sun Scalable Scientific Subroutine Library (Sun S3L), which contains optimized parallel versions of standard mathematical routines. See the Sun documentation for more information.

Sun Performance Library is a sequential scientific software library that includes a version of LAPACK.

For help with porting programs and optimizing performance on the cluster, contact the SAPAC helpdesk.

Orion User's Guide
