The South Australian Partnership for Advanced Computing (SAPAC)

Problems, features and workarounds

Sun ClusterTools

  • mpps continues to display details of some jobs that are no longer running, particularly if they have crashed or been killed. Only jobs whose STATE is RUN should be running on the machine; the rest are phantom or zombie processes. Even some jobs listed as RUN may no longer exist.
    If mpps lists a job that is not actually running, please notify the SAPAC helpdesk, who can remove it by running mpkill -C on the master node (only root can do this).

  • Killing a job using mpkill does not always work, particularly if the job has hung or has hit some other system problem. In that case, find the process ID of the job using ps and kill it with the standard Unix kill command. DO NOT use kill -9: that signal cannot be caught, so the process gets no chance to terminate its sub-processes running on the cluster.

  • Occasionally a job submission script will report a spurious error saying that it cannot find the executable file. Check whether the job has actually started running; if it has not, resubmit it. The second attempt usually works.

  • Sometimes parallel jobs will hang without doing anything. Killing the job and rerunning it usually works.

  • Background jobs submitted using mprun will be suspended waiting on standard input, even if they do not read any input. To avoid this, use the -n flag, which makes the job read standard input from /dev/null, e.g.

      mprun -n -np 16 myprogram > output.file & 
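
The manual kill procedure described above can be sketched as a small shell helper. This is only an illustration under stated assumptions: terminate_job is a hypothetical name, not part of Sun ClusterTools, and the process ID is assumed to have been found with ps beforehand. It sends the default TERM signal, never -9, and then waits for the process to disappear.

```shell
#!/bin/sh
# Sketch: stop a hung job with the default TERM signal instead of kill -9.
# terminate_job is a hypothetical helper, not a ClusterTools command.

terminate_job() {
    target=$1            # process ID found with ps
    remaining=${2:-10}   # seconds to wait for the process to exit

    # Plain kill sends SIGTERM, which the process can catch and use to
    # shut down its sub-processes. SIGKILL (-9) cannot be caught.
    kill "$target" 2>/dev/null || return 1   # no such process, or not ours

    # Poll until the process is gone or the timeout expires.
    while [ "$remaining" -gt 0 ]; do
        kill -0 "$target" 2>/dev/null || return 0
        sleep 1
        remaining=$((remaining - 1))
    done

    # Still alive after the timeout: report it rather than escalating.
    if kill -0 "$target" 2>/dev/null; then
        return 2
    fi
    return 0
}
```

A return status of 2 means the process ignored the signal within the timeout; in that case it is probably better to contact the helpdesk than to escalate to kill -9.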

PGHPF Compiler

  • Do not create a subdirectory called pghpf in the directory where you compile programs with pghpf, or the compile will fail with the error

       pghpf-error-Unable to open - .pghpfrc

  • The -fast compiler optimization flag (or any optimization level above -O2, e.g. -O4 or -O5) is not passed on to the native Sun compiler. You need to tell the pghpf compiler explicitly to pass the option on, using the -W0 flag. So instead of -fast alone, use -fast -W0,-fast.

  • Occasionally the PGHPF compiler will give a compilation error when compiling programs or subroutines that contain intrinsic functions such as cos, exp, log, etc. with -fast. Recompiling the affected routine with -W0,-xvector=no should fix the problem.

  • The PGHPF compiler is unable to compile some programs or subroutines at the highest optimization level, i.e. using -fast. Try compiling at a lower optimization level, e.g. -O4 -W0,-O4.

  • Initializing arrays in their declarations can cause problems with the PGHPF compiler. Assigning the initial values in executable statements after the declarations avoids the problem.

  • Using FORMAT statements can cause the PGHPF 3.2 compiler to die when compiling HPF programs, giving Segmentation Fault, Bus Error or INTERNAL errors. Even when it does compile, the program may crash. In this case you will need to remove the FORMAT statements and just include the appropriate formatting in the WRITE statements.
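
The optimization workarounds above amount to retrying the compile at successively lower levels. A minimal shell sketch of that idea follows. The names HPF_COMPILER and compile_with_fallback are hypothetical, and the final -O2 step is an assumption based on the note that only levels above -O2 need the -W0 pass-through.

```shell
#!/bin/sh
# Sketch: try pghpf at successively lower optimization levels.
# HPF_COMPILER and compile_with_fallback are hypothetical names.

HPF_COMPILER=${HPF_COMPILER:-pghpf}

compile_with_fallback() {
    # Levels above -O2 are paired with a matching -W0 pass-through so the
    # native Sun compiler sees the same optimization level.
    for flags in "-fast -W0,-fast" "-O4 -W0,-O4" "-O2"; do
        # $flags is left unquoted on purpose so it splits into words.
        if $HPF_COMPILER $flags "$@"; then
            echo "compiled with: $flags"
            return 0
        fi
    done
    echo "all optimization levels failed" >&2
    return 1
}
```

It would be called with the usual compiler arguments, for example compile_with_fallback myprogram.hpf, where the file name is a placeholder. The function prints the first set of flags that compiled cleanly.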


Please contact the Orion system administrators at the SAPAC helpdesk with any other problems you encounter.

 
