The example Network Queuing System (NQS) shell script shown below is a complete script, suitable for actual use. This script first prepares each component for execution and then executes the components simultaneously. While the specifics shown below are a recommended method for running the coupled system on NCAR's Cray computers, variations on this method will also work, some of which may be more appropriate for different situations.
The coupled model batch job script below is a top-level NQS batch job script which in turn calls five separate subscripts called "setup scripts." These subscripts, "cpl.setup.csh," "atm.setup.csh," "ice.setup.csh," "lnd.setup.csh," and "ocn.setup.csh," are responsible for building their respective executable codes and gathering any required input data files (see also § 5). The setup scripts receive input from the parent NQS script by means of several environment variables. After calling the setup subscripts, the parent NQS script executes the Coupler and all component models simultaneously as background processes. The parent script then waits for these background processes to complete; when they do, the coupled run is finished, and the NQS script saves the stdout output and terminates. A more detailed explanation of the various parts of the NQS script follows the example.
#=======================================================================
# This is a CSM coupled model NQS batch job script
#=======================================================================

#-----------------------------------------------------------------------
# (a) Set NQS options
#-----------------------------------------------------------------------
# QSUB -q reg                   # select batch queue
# QSUB -lT 5:10:00 -lt 5:00:00  # set CPU time limits
# QSUB -lM 35Mw -lm 20Mw        # set memory limits
# QSUB -mb -me -eo              # combine stderr & stdout
# QSUB -s /bin/csh              # select shell script
# QSUB                          # no more QSUB options

#-----------------------------------------------------------------------
# (b) Set env variables available to model setup scripts (below)
#     CASE, CASESTR, RUNTYPE, ARCH  , MAXCPUS, MSGLIB , SSD,
#     MSS , MSSDIR , MSSRPD , MSSPWD, RPTDIR , CSMROOT, CSMSHARE
#-----------------------------------------------------------------------
setenv CASE     test.00         # case name
setenv CASESTR  '(CSM test)'    # short descriptive text string
setenv RUNTYPE  initial         # run type
setenv ARCH     C90             # machine architecture
setenv MAXCPUS  8               # max number of CPUs available
setenv MSGLIB   MPI             # message passing library
setenv SSD      TRUE            # SSD is available?
setenv MSS      FALSE           # MSS is available?
setenv MSSDIR   /DOE/csm/$CASE  # MSS directory path name
setenv MSSRPD   365             # MSS file retention period
setenv MSSPWD   'rosebud'       # MSS file password
setenv RPTDIR   $HOME           # where restart pointer files are saved
setenv CSMROOT  /fs/cgd/csm     # root directory for model codes
setenv CSMSHARE $CSMROOT/share  # directory of "shared" code

#-----------------------------------------------------------------------
# (c) Specify input, output, and execution directories
#     o the component model setup.csh scripts must be in $NQSDIR
#     o stdout & stderr output is saved in $LOGDIR
#-----------------------------------------------------------------------
set EXEDIR = /tmp/doe/$CASE     # model runs here
set NQSDIR = ~doe/$CASE         # model setup scripts are here
set LOGDIR = $NQSDIR            # stdout output goes here

#-----------------------------------------------------------------------
# (d) Prepare component models for execution
#     o create execution directories: $EXEDIR/[atm|cpl|lnd|ice|ocn]
#     o execute the component model setup scripts located in $NQSDIR
#       (these scripts have access to env variables set above)
#     o see the man page for details about the Cray assign function
#-----------------------------------------------------------------------
setenv FILENV ./.assign            # allow separate .assign files for each model
set LID = "`date +%y%m%d-%H%M%S`"  # create a unique log file ID

mkdir -p $EXEDIR
cd $EXEDIR
foreach model (cpl atm ice lnd ocn)
  mkdir $EXEDIR/$model
  cd    $EXEDIR/$model
  $NQSDIR/$model.setup.csh >&! $model.log.$LID || exit 2
end

#-----------------------------------------------------------------------
# (e) Execute models simultaneously (allocating CPUs)
#-----------------------------------------------------------------------
rm $TMPDIR/*.$LOGNAME.*            # rm any old msg pipe files
ja $TMPDIR/jacct                   # start Cray job accounting

cd $EXEDIR/cpl
env NCPUS=$MAXCPUS cpl -l 0 -n 5 -t 600 < cpl.parm >>&! cpl.log.$LID &
cd $EXEDIR/atm
env NCPUS=$MAXCPUS atm -l 1      -t 600 < atm.parm >>&! atm.log.$LID &
cd $EXEDIR/ocn
env NCPUS=$MAXCPUS ocn -l 2      -t 600 < ocn.parm >>&! ocn.log.$LID &
cd $EXEDIR/ice
env NCPUS=2        ice -l 3      -t 600 < ice.parm >>&! ice.log.$LID &
cd $EXEDIR/lnd
env NCPUS=$MAXCPUS lnd -l 4      -t 600 < lnd.parm >>&! lnd.log.$LID &

wait                               # wait for all background processes to complete
ja -tsclh $TMPDIR/jacct            # end Cray job accounting

#-----------------------------------------------------------------------
# (f) save model output (stdout & stderr) to $LOGDIR
#-----------------------------------------------------------------------
cd $EXEDIR
gzip -v  */*.log.$LID
cp -p    */*.log.$LID* $LOGDIR

#=======================================================================
# End of nqs shell script
#=======================================================================
Items (a) through (f) in the above job script are now reviewed.
(a) Set NQS options
The Network Queuing System (NQS) is a special facility available under UNICOS (Cray's version of UNIX). NQS is a batch job facility: you submit your job to NQS and NQS runs your job. The QSUB options set here select the queue, the maximum memory required, the maximum time required, the combining of the NQS script's stdout and stderr, and the shell to interpret the NQS script. See the qsub man page on a UNICOS computer for more information.
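A script such as this is submitted to NQS with the qsub command. For example, assuming the script above were saved in a file named run.csm.csh (a hypothetical name):

qsub run.csm.csh    # submit the coupled model job to the NQS batch system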
(b) Set environment variables for use in the model setup scripts
While the execution of the Coupler and the component models is done explicitly in this NQS script, the building of executables from source code, the gathering of necessary input data files, and any other pre-execution preparation is deferred to the subscripts "cpl.setup.csh," "atm.setup.csh," "ice.setup.csh," "lnd.setup.csh," and "ocn.setup.csh." The 14 environment variables set in the NQS script may be used by the setup scripts to prepare their respective codes for execution. These environment variables are intended specifically as input to the component model setup scripts; they are not intended to be accessed by the component model executables, and it is strongly suggested that component model binaries contain no hard-coded dependence on them. The environment variables are: CASE, CASESTR, RUNTYPE, ARCH, MAXCPUS, MSGLIB, SSD, MSS, MSSDIR, MSSRPD, MSSPWD, RPTDIR, CSMROOT, and CSMSHARE; see section (b) of the script above for the purpose of each.
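As an illustration only, a setup script might consult these variables roughly as in the fragment below. The data file name and restart pointer file name are assumptions for the sake of the example, not the names used by the actual CSM setup scripts.

#-----------------------------------------------------------------------
# Hypothetical fragment of a component setup script (e.g. atm.setup.csh)
# showing how the env variables set by the parent NQS script might be
# used; file names below are illustrative only.
#-----------------------------------------------------------------------
echo "preparing case $CASE $CASESTR : $RUNTYPE run on $ARCH using $MSGLIB"

if ("$RUNTYPE" == "initial") then
  cp $CSMROOT/data/atm/ic.nc .             # gather initial data (hypothetical path)
else
  set rfile = `cat $RPTDIR/rpointer.atm`   # restart pointer file name is hypothetical
  cp $rfile .                              # gather restart data
endif

if ("$MSS" == "TRUE") then
  echo "output will be archived to MSS directory $MSSDIR (retention $MSSRPD days)"
endif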
(c) Specify input, output, and execution directories
Here we specify the directory where the model will run, the directory where the model setup scripts are found, and the directory where stdout and stderr output data will be saved when the simulation finishes.
(d) Prepare component models for execution
Here the execution directory and component model subdirectories are created, and the Coupler and component model setup scripts are invoked. The purpose of the setup scripts is to build their respective executable codes, document what source code and data files are being used, and gather or create any necessary input data files. It is recommended that each component model have its own, separate setup script. This natural decomposition allows the persons responsible for a given model to create an appropriate setup script for their model without being concerned with the details of another model. A sketch of such a script appears below. Setting $FILENV to ./.assign allows each executable to create and use its own, independent assign file. assign is a UNICOS-specific file I/O utility that may or may not be used by the various executables. See the assign man page on a UNICOS system for more details.
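The skeleton below suggests what one such setup script might look like. It is only a sketch: the source code location, make invocation, and parameter file contents are assumptions, not the actual CSM build procedure. Note that the parent NQS script invokes each setup script from within that model's execution directory and aborts if the setup script returns a nonzero exit status.

#-----------------------------------------------------------------------
# Hypothetical skeleton for ocn.setup.csh; directory layout, make
# target, and parameter file contents are illustrative only.
#-----------------------------------------------------------------------
#!/bin/csh -f
set rundir = $cwd              # parent NQS script invokes this from $EXEDIR/ocn
echo "------- building ocn for case $CASE -------"
cd $CSMROOT/models/ocn         # hypothetical source code location
gmake ARCH=$ARCH || exit 1     # build; nonzero exit tells the parent script to abort
cp ocn $rundir   || exit 1     # place the executable in its execution directory
cd $rundir
# create the stdin parameter file read by the ocn executable (contents illustrative)
cat >! ocn.parm << EOF
 &ocn_in  case_name = '$CASE' /
EOF
exit 0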
(e) Execute component models simultaneously
In section (d), via the setup scripts, all necessary pre-execution preparations were completed; at this point, all models are ready to be run. In this section we execute the Coupler and all component models simultaneously as background processes. Setting the NCPUS environment variable on each command line allows one to specify a different number of CPUs for each component model. The "-l" command line options are used by the message passing system, MPI, to assign logical process numbers to the component models. The "-t 600" option tells MPI how many seconds to wait for a message to be received before assuming that an error has occurred and the message will never be sent. The wait command pauses the script until all of the background processes have completed, i.e., until the coupled run has finished. The ja command is a UNICOS job accounting utility which provides data on CPU time used, memory used, etc. See the ja man page on a UNICOS system for more details.
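As an optional refinement, a simple completion check could be appended after the wait (and before the log files are compressed in section (f)). The sketch below is hypothetical: the string it searches for is illustrative and would need to match whatever message each component actually prints at the end of a successful run.

#-----------------------------------------------------------------------
# Optional post-run check (hypothetical): verify that each component
# reported normal completion in its log file.
#-----------------------------------------------------------------------
foreach model (cpl atm ice lnd ocn)
  grep 'normal end of run' $EXEDIR/$model/$model.log.$LID >& /dev/null
  if ($status != 0) echo "WARNING: $model may not have completed normally"
end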
(f) Save model output (stdout & stderr)
The stdout output file from each component model (which also contains its stderr output) is compressed and copied to the directory $LOGDIR.