

Domain decomposition

To understand some aspects of compiling and running POP, it helps to know how POP breaks up a problem to run on different threads and processors. Note that even the serial versions decompose the domain in order to achieve better performance on cache-based microprocessors.

In POP, the full horizontal domain size (nx_global,ny_global) is broken up into domains or blocks. The size of these blocks can be chosen to achieve better performance as described below. Any block size can be chosen, but to avoid padding the domain with extra points, the block size in each direction should be chosen such that it divides the global domain size in that direction evenly.
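The divisibility condition can be checked with a little arithmetic before building. The following is a minimal sketch, not part of POP: the function names are hypothetical and the grid sizes are illustrative values only. It reports, for one direction, how many blocks are needed and how many padding points a given block size would introduce.

```python
def blocks_and_padding(n_global, block_size):
    """Return (number of blocks, padded points) along one direction."""
    if n_global <= 0 or block_size <= 0:
        raise ValueError("sizes must be positive")
    n_blocks = -(-n_global // block_size)  # ceiling division
    padding = n_blocks * block_size - n_global
    return n_blocks, padding


def even_block_sizes(n_global):
    """List every block size that divides n_global with no padding."""
    return [b for b in range(1, n_global + 1) if n_global % b == 0]


if __name__ == "__main__":
    nx_global, ny_global = 320, 384  # illustrative sizes, not a recommendation
    for bx in (16, 20, 48):
        n_blocks, padding = blocks_and_padding(nx_global, bx)
        print(f"block size {bx}: {n_blocks} blocks, {padding} padding points")
```

A padding value of zero means the block size divides the global size evenly in that direction; any other value is the number of extra points the domain would be padded with.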

Once the domain has been decomposed into blocks, the blocks are distributed among the processors or nodes, ignoring blocks that contain only land points. Two distributions are available: a load-balanced distribution, which tries to give all processors an equal amount of work, and a Cartesian distribution, which ensures that each block's north, south, east and west neighbors remain nearest neighbors. A load-balanced distribution is generally better for the baroclinic section of the code; a Cartesian distribution is better for the barotropic solver. Different distributions can be specified for the baroclinic and barotropic parts of the code.
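To make land-block elimination and load balancing concrete, here is a small sketch. It is not POP's distribution algorithm: the greedy heuristic, the function names, and the use of the ocean-point count as a measure of work are all assumptions chosen for illustration.

```python
def ocean_points_per_block(ocean_mask, bx, by):
    """Count ocean points in each block; keys are (block_i, block_j)."""
    counts = {}
    for j, row in enumerate(ocean_mask):
        for i, is_ocean in enumerate(row):
            key = (i // bx, j // by)
            counts[key] = counts.get(key, 0) + (1 if is_ocean else 0)
    return counts


def distribute_load_balanced(counts, n_procs):
    """Greedy assignment: heaviest block goes to the least-loaded processor."""
    # Blocks with no ocean points are dropped and never assigned.
    active = {k: w for k, w in counts.items() if w > 0}
    loads = [0] * n_procs
    owner = {}
    for key, work in sorted(active.items(), key=lambda kv: (-kv[1], kv[0])):
        p = loads.index(min(loads))
        owner[key] = p
        loads[p] += work
    return owner, loads


if __name__ == "__main__":
    # Hypothetical 4x4 mask: 1 = ocean, 0 = land.
    mask = [
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [1, 0, 1, 1],
        [0, 0, 1, 0],
    ]
    counts = ocean_points_per_block(mask, bx=2, by=2)
    owner, loads = distribute_load_balanced(counts, n_procs=2)
    print(owner, loads)
```

Note that this assignment pays no attention to which blocks are neighbors, which is the property a Cartesian distribution preserves.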

Such a domain decomposition allows some flexibility in tuning the model for the best performance. Generally, a smaller block size improves processor performance on cache-based microprocessors, gives a better load balance, and eliminates more land points. However, smaller block sizes add complexity to the communication routines (boundary updates, global reductions) and result in a performance penalty for the barotropic solver. The user will need to experiment with a few combinations to find the best configuration for the simulation being run.
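One way to see why smaller blocks increase the cost of boundary updates is to compare the number of boundary points with the number of interior points. The sketch below assumes each block is stored with a ring of ghost (halo) points of fixed width around its interior; the halo width of 2 and the function name are assumptions made for illustration, not values taken from this section.

```python
def halo_fraction(bx, by, halo=2):
    """Fraction of a block's stored points that are halo (ghost) points."""
    interior = bx * by
    total = (bx + 2 * halo) * (by + 2 * halo)
    return (total - interior) / total


if __name__ == "__main__":
    for size in (8, 16, 32, 64):
        print(f"{size}x{size} block: halo fraction {halo_fraction(size, size):.2f}")
```

Under these assumptions the halo fraction grows as the block shrinks, so a larger share of each block's data must be refreshed by communication rather than computed locally.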

CESM1 Notes

During the CESM1 configure stage, the CESM1 POP2 scripts interact with the CESM1 scripts to automatically specify a recommended domain decomposition, based upon the domain size, the processor count, and the grid used (e.g., gx1v6, gx3v7, or tx0.1v2).




2010-01-26