The example Network Queuing System (NQS) shell script shown below is a complete script, suitable for actual use. This script first prepares each component for execution and then executes the components simultaneously. While the specifics shown below are a recommended method for running the coupled system on NCAR's Cray computers, variations on this method will also work, some of which may be more appropriate for different situations.
The coupled model batch job script below is a top-level NQS batch job script which in turn calls five separate subscripts called "setup scripts." These subscripts, "cpl.setup.csh," "atm.setup.csh," "ice.setup.csh," "lnd.setup.csh," and "ocn.setup.csh," are responsible for building their respective executable codes and gathering any required input data files (see also subsections II and III). The setup scripts receive input data from the parent NQS script by means of several environment variables. After calling the setup subscripts, the parent NQS script executes the Coupler and all component models simultaneously as background processes. The parent script waits for these background processes to complete; when they do, the coupled run has completed and the NQS script saves stdout output and terminates. Following the example is a more detailed explanation of what is being done in the various parts of the NQS script.
#=======================================================================
# This is a CSM coupled model NQS batch job script
#=======================================================================
#-----------------------------------------------------------------------
# (a) Set NQS options
#-----------------------------------------------------------------------
# QSUB -q reg                      # select batch queue
# QSUB -lT 5:10:00 -lt 5:00:00     # set CPU time limits
# QSUB -lM 35Mw -lm 20Mw           # set memory limits
# QSUB -mb -me -eo                 # combine stderr & stdout
# QSUB -s /bin/csh                 # select shell script
# QSUB                             # no more QSUB options
#-----------------------------------------------------------------------
#-----------------------------------------------------------------------
# (b) Set env variables available to model setup scripts (below)
#     CASE, CASESTR, RUNTYPE, ARCH , MAXCPUS, MSGLIB, SSD,
#     MSS , MSSDIR , MSSRPD , MSSPWD, RPTDIR
#-----------------------------------------------------------------------
setenv CASE    test.00             # case name
setenv CASESTR '(CSM test)'        # short descriptive text string
setenv RUNTYPE initial             # run type
setenv ARCH    CRAY                # machine architecture
setenv MAXCPUS 8                   # max number of CPUs available
setenv MSGLIB  MPI                 # message passing library
setenv SSD     TRUE                # is SSD available?
setenv MSS     TRUE                # is MSS available?
setenv MSSDIR  /DOE/csm/$CASE      # MSS directory path name
setenv MSSRPD  365                 # MSS file retention period
setenv MSSPWD  'rosebud'           # MSS file password
setenv RPTDIR  $HOME               # where restart pointer files are saved
#-----------------------------------------------------------------------
# (c) Specify input, output, and execution directories
#     o the component model setup.csh scripts must be in $NQSDIR
#     o stdout & stderr output is saved in $LOGDIR
#-----------------------------------------------------------------------
set EXEDIR = /usr/tmp/doe/$CASE    # model runs here
set NQSDIR = ~doe/$CASE            # model setup scripts are here
set LOGDIR = $NQSDIR               # stdout output goes here
#-----------------------------------------------------------------------
# (d) Prepare component models for execution
#     o create execution directories: $EXEDIR/[atm|cpl|lnd|ice|ocn]
#     o execute the component model setup scripts located in $NQSDIR
#       (these scripts have access to env variables set above)
#     o see the man page for details about the Cray assign function
#-----------------------------------------------------------------------
setenv FILENV ./.assign            # allow separate .assign files for each model
set LID = "`date +%y%m%d-%H%M%S`"  # create a unique log file ID
mkdir -p $EXEDIR
cd $EXEDIR
foreach model (cpl atm ice lnd ocn)
   mkdir $EXEDIR/$model
   cd $EXEDIR/$model
   $NQSDIR/$model.setup.csh >&! $model.log.$LID || exit 2
end
#-----------------------------------------------------------------------
# (e) Execute models simultaneously (allocating CPUs)
#-----------------------------------------------------------------------
rm $TMPDIR/*.$LOGNAME.*            # rm any old msg pipe files
ja $TMPDIR/jacct                   # start Cray job accounting
cd $EXEDIR/cpl
env NCPUS=$MAXCPUS cpl -l 0 -n 5 -t 600 < cpl.parm >>&! cpl.log.$LID &
cd $EXEDIR/atm
env NCPUS=$MAXCPUS atm -l 1 -t 600 < atm.parm >>&! atm.log.$LID &
cd $EXEDIR/ocn
env NCPUS=$MAXCPUS ocn -l 2 -t 600 < ocn.parm >>&! ocn.log.$LID &
cd $EXEDIR/ice
env NCPUS=2 ice -l 3 -t 600 < ice.parm >>&! ice.log.$LID &
cd $EXEDIR/lnd
env NCPUS=$MAXCPUS lnd -l 4 -t 600 < lnd.parm >>&! lnd.log.$LID &
wait                               # wait for all background processes to finish
ja -tsclh $TMPDIR/jacct            # end Cray job accounting
#-----------------------------------------------------------------------
# (f) save model output (stdout & stderr) to $LOGDIR
#-----------------------------------------------------------------------
cd $EXEDIR
gzip -v */*.log.$LID
cp -p */*.log.$LID* $LOGDIR
#=======================================================================
# End of nqs shell script
#=======================================================================
Items (a) through (f) in the above job script are now reviewed.
(a) Set NQS options
The Network Queuing System (NQS) is a special facility available under UNICOS (Cray's version of UNIX). NQS is a batch job facility: you submit your job to NQS and NQS runs your job. The QSUB options set here select the queue, the maximum memory required, the maximum time required, the combining of the NQS script's stdout and stderr, and the shell to interpret the NQS script. See the qsub man page on a UNICOS computer for more information.
(b) Set environment variables for use in the model setup scripts
While the Coupler and the component models are executed explicitly in this NQS script, building executables from source code, gathering necessary input data files, and any other pre-execution preparation are deferred to the subscripts "cpl.setup.csh," "atm.setup.csh," "ice.setup.csh," "lnd.setup.csh," and "ocn.setup.csh." The 12 environment variables set in the NQS script may be used by the setup scripts to prepare their respective codes for execution. These environment variables are intended specifically as input to the component model setup scripts; they are not intended to be accessed by component model executables. It is strongly suggested that component model binaries contain no hard-coded dependence on these environment variables. The environment variables are:
CASE: case name
CASESTR: short descriptive text string
RUNTYPE: run type
ARCH: machine architecture
MAXCPUS: maximum number of CPUs available
MSGLIB: message passing library
SSD: whether an SSD is available
MSS: whether the MSS is available
MSSDIR: MSS directory path name
MSSRPD: MSS file retention period
MSSPWD: MSS file password
RPTDIR: where restart pointer files are saved
(c) Specify input, output, and execution directories
Here we specify the directory where the model will run, the directory where the model setup scripts are found, and the directory where stdout and stderr output data will be saved when the simulation finishes.
(d) Prepare component models for execution
Here the execution directory and component model subdirectories are created and the Coupler and component model setup scripts are invoked. The purpose of the setup scripts is to build their respective executable codes, document what source code and data files are being used, and gather or create any necessary input data files. It is recommended that each component model have its own, separate setup script. This natural decomposition of code allows the persons responsible for a given model to create an appropriate setup script for their model without being confused by the details of another model. Setting $FILENV to ./.assign allows each executable to create and use its own, independent assign file. Assign is a UNICOS-specific file I/O utility that may or may not be used by the various executables. See the assign man page on a UNICOS system for more details.
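The "|| exit 2" in the setup loop of the job script depends on each setup script returning a nonzero exit status when it fails. The following is a minimal sketch of that convention, not part of the distributed scripts: "demo.setup.sh" is a hypothetical stand-in for a real setup script, and the commands are limited to constructs that should behave the same in csh and in Bourne-type shells.

```shell
# demo.setup.sh is a hypothetical stand-in for a real setup script.
# It reports which model it is preparing and succeeds only for "atm".
echo 'echo "building $1"; test "$1" = atm' > demo.setup.sh

# Success: the command after && runs.
sh demo.setup.sh atm > atm.setup.log && echo "atm is ready"

# Failure: the command after || runs. The job script puts "exit 2" here,
# so a failed setup stops the job before any model is started.
sh demo.setup.sh ocn > ocn.setup.log || echo "ocn setup failed"
```

Because each setup script writes to its own log file, the log of the failing model is the first place to look when the job stops early.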
(e) Execute component models simultaneously
In section (d), the setup scripts took care of all necessary pre-execution preparations, so at this point all models are ready to be run. In this section we execute the Coupler and all component models simultaneously as background processes. Setting environment variables on the command line allows one to specify a different number of CPUs for each component model. The "-l" command line options are used by the message passing system, MPI, to assign logical process numbers to the component models. The "-t 600" options tell MPI how many seconds to wait for a message to be received before assuming that an error has occurred and that the message will never be sent. The ja command is a UNICOS job accounting utility which provides data on CPU time used, memory used, etc. See the ja man page on a UNICOS system for more details.
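As an illustration of the pattern used in section (e), the following sketch starts two stand-in processes in the background, gives each its own NCPUS value through env, and blocks until both have exited. The "sh -c" commands and the "*.demo" file names are hypothetical placeholders for the real executables and their log files; only constructs that should behave the same in csh and in Bourne-type shells are used.

```shell
# Each stand-in "model" sleeps briefly, then records the NCPUS value it
# was given. The trailing & puts it in the background so the next one
# starts immediately.
env NCPUS=8 sh -c 'sleep 1; echo "atm NCPUS=$NCPUS" > atm.demo' &
env NCPUS=2 sh -c 'sleep 1; echo "ice NCPUS=$NCPUS" > ice.demo' &

# Block here until every background process has exited.
wait

# Both output files are complete at this point.
cat atm.demo ice.demo
```

The value set with env applies only to the one command that follows it, which is what lets each model receive a different CPU count from the same parent script.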
(f) Save model output (stdout & stderr)
Each component model writes its stdout output, combined with its stderr output, to a separate file.
These files are compressed and saved to the directory $LOGDIR.
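The compress-and-copy step in section (f) can be tried on a small scale as sketched below. The directory names "demo_exe" and "demo_logs" and the log ID "990101-120000" are made up for the illustration. The point to notice is the trailing "*" in the cp pattern: gzip renames each file by appending ".gz", so the pattern used for the copy has to allow for the new suffix.

```shell
# Build a tiny stand-in for $EXEDIR with one log file per model.
rm -rf demo_exe demo_logs
mkdir -p demo_exe/atm demo_exe/ocn demo_logs
echo "atm output" > demo_exe/atm/atm.log.990101-120000
echo "ocn output" > demo_exe/ocn/ocn.log.990101-120000

cd demo_exe
# Compress every log with this ID; each file gains a .gz suffix.
gzip -v */*.log.990101-120000
# The trailing * matches the .gz suffix that gzip just added.
cp -p */*.log.990101-120000* ../demo_logs
cd ..

ls demo_logs
```

After this runs, "demo_logs" holds the two compressed logs, while the originals remain, also compressed, in the model subdirectories.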
Here we describe all the necessary pre-execution preparation of the Coupler. This is
done by a C shell setup script called "cpl.setup.csh," which is called by the parent NQS
batch job shell script described in subsection I. The purpose of the setup script is to build
an executable Coupler code, gather and/or create any necessary data files, and to document
exactly which code and data files are being used in a given simulation.
The Coupler has its own, separate subdirectory, $EXEDIR/cpl (see subsection I), in
which the setup script is run, in which the Coupler executable resides, and in which all the
Coupler's input and output files are kept.
The environment variables
$CASE, $CASESTR, $RUNTYPE, $ARCH, $MSGLIB, $MAXCPUS, $SSD,
$MSS, $MSSDIR, $MSSRPD, $MSSPWD, and $RPTDIR
are input from the parent NQS script and are available for use (see subsection I).
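The mechanism at work is ordinary environment inheritance: a variable placed in the environment of a parent is visible to any script the parent starts. The sketch below shows this with "demo.child.sh," a hypothetical stand-in for a setup script. The parent NQS script uses setenv, which is specific to csh; the sketch uses env instead, which has the same effect for a single child and works in csh and in Bourne-type shells alike.

```shell
# demo.child.sh is a hypothetical stand-in for a setup script such as
# cpl.setup.csh. It only reports the values it inherited.
echo 'echo "case=$CASE runtype=$RUNTYPE"' > demo.child.sh

# Run the child with two variables placed in its environment.
env CASE=test.00 RUNTYPE=initial sh demo.child.sh
```

The child never receives these values as arguments; it reads them from its environment, just as the setup scripts read $CASE and $RUNTYPE.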
Note that the example NQS script in subsection I makes use of separate and independent
setup subscripts for the Coupler and each component model. While this is not required,
it is a natural decomposition of code, and is highly recommended. Because the Coupler,
or any other component model, has its own setup script, the persons responsible for this code
can edit this file as necessary without being confused by extraneous details about other
component models, and without the danger of inadvertently introducing errors into the
top-level NQS run script or the setup scripts of the other component models.
An example Coupler setup script is given below:
(a) Build an executable
The goal here is to build an executable Coupler binary in the current working directory.
This is done by first identifying a source code directory,
copying all files from that directory,
selecting a resolution-dependent "dims.h" file,
executing a script that creates the necessary include files needed by the makefile,
and invoking a makefile.
The dimensions of the component models (i.e. the number of x and y grid points) must
be known to the Coupler at compile time and are all specified in one
resolution-dependent file: dims.h. Notice how an appropriate dims.h file is selected.
In general, several dims.h.* files may be available for building a Coupler binary
suitable for various component model resolutions.
The details of how to build the executable (e.g., preprocessor options and compiler options)
are contained in the standard Makefile.
See the User's Guide subsection IV, or a make man page on any UNIX system for more information.
(b) Document the source code used
Here we make a detailed listing of the source code used, a list of revision control system
(CVS) information, and a list of the contents of the current working directory.
This information can be used to identify what Coupler source code, input namelist, etc. were
used in a particular simulation. This is not necessary but is strongly suggested.
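The documentation commands can be tried on a stand-in source directory, as sketched below. The directory "demo_doc" and the contents of its two files are invented for the illustration; in particular, the header line containing "CVS" is hypothetical and is not taken from the Coupler source. The pattern "src/*.[hF]" matches files in src whose names end in ".h" or ".F".

```shell
# Create a stand-in source directory with one .F file and one .h file.
rm -rf demo_doc
mkdir -p demo_doc/src
echo "c     CVS revision tag: (hypothetical header line)" > demo_doc/src/main.F
echo "c     grid dimensions"                              > demo_doc/src/dims.h

cd demo_doc
# Record where we are and exactly which files are present.
pwd ; ls -alF src
# Print every line mentioning CVS, prefixed by the file it came from.
grep CVS src/*.[hF]
cd ..
```

Since the setup script's output is redirected to a log file by the parent job script, this listing ends up stored with the rest of the run's output.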
(c) Create an input parameter namelist file
Here we are creating a Coupler input parameter namelist file. To create an appropriate
namelist we must know and specify whether the run is an initial run, a continuation run,
a branch run, or a regeneration run. Checking the value of the environment variable
$RUNTYPE gives this information. See the User's Guide section on Coupler Input
Parameters for a complete description of input namelist variables.
The environment variables $CASE, $CASESTR, and $MSSDIR are used to
construct a descriptive text string found in history files
and to create history and restart data path names.
Recall from the discussion in the User's Guide subsection I that these environment
variables are set in the NQS script specifically for use by this, and other, setup scripts.
This particular input namelist was selected just to illustrate how the "cpl.setup.csh"
script works. For a detailed discussion of Coupler input parameters,
along with several example input namelists, see section J.
Each component model has a separate subdirectory,
$EXEDIR/atm, $EXEDIR/ice, $EXEDIR/lnd, or $EXEDIR/ocn (see subsection I),
in which the setup script is run, and where the executable and all input and output files are kept.
As in the Coupler setup script, the environment variables
$CASE, $CASESTR, $RUNTYPE, $ARCH, $MSGLIB, $MAXCPUS, $SSD,
$MSS, $MSSDIR, $MSSRPD, $MSSPWD, and $RPTDIR
are set by the parent NQS script and are available for the setup scripts to use (see subsection I).
These environment variables are not intended to be accessed by the component model codes themselves.
It is recommended that the executable codes use input namelist parameters instead of accessing
environment variables.
If the UNICOS-specific assign function is used prior to executing a code,
the assigning of I/O units should be done here in the setup script.
Setting the environment variable FILENV to ./.assign in the top-level run script (see subsection I)
allows each component model to do its own I/O unit assignments, using its own .assign file,
without conflicting with the I/O unit assignments of other models.
See the assign man page on a UNICOS system for more details.
The original source code was developed using the CVS revision control system, but only
one "tagged" version of the Coupler is available within any source code distribution.
This information can be used to identify the code contained in a particular distribution.
A user may wish to modify their copy of the Coupler source code.
Before doing so, it is strongly recommended that
one first study the Pseudo-Code section in this document in conjunction with
studying the source code file "main.F." This should provide a good overview of
how the Coupler works, a necessary prerequisite for successful code modification.
I.II   Preparing the Flux Coupler for a Coupled Run
#=======================================================================
# File name cpl.setup.csh
# Purpose prepares Flux Coupler (aka "cpl") for execution
# Assumptions these env variables have been set by a parent shell:
# CASE, CASESTR, RUNTYPE, ARCH, MSGLIB, MAXCPUS, SSD,
# MSS, MSSDIR, MSSRPD, MSSPWD, RPTDIR
#=======================================================================
#-----------------------------------------------------------------------
# (a) Build an executable
#-----------------------------------------------------------------------
mkdir src
cd src
set SRCDIR = ~/csm/cpl4.0
cp -fp $SRCDIR/* .
cp -fp dims.h.T31_x3 dims.h
Makeprep
make EXEC=cpl4
cd ..
ln -s src/cpl4 cpl
#-----------------------------------------------------------------------
# (b) document the source code used
#-----------------------------------------------------------------------
pwd ; ls -alF src
grep CVS src/*.[hF]
#-----------------------------------------------------------------------
# (c) Create an input parameter namelist file
#-----------------------------------------------------------------------
if ($RUNTYPE == 'initial' ) then
   set rest_type  = 'initial'
   set rest_bfile = '/unused/rest_bfile'
   set rest_date  = 00010101
else if ($RUNTYPE == 'continue') then
   set rest_type  = 'continue'
   set rest_bfile = '/unused/rest_bfile'
   set rest_date  = -999
else if ($RUNTYPE == 'branch' ) then
   set rest_type  = 'branch'
   set rest_bfile = '/KAUFF/csm/b003.01/cpl/r0010-01-01'
   set rest_date  = 00010101
else if ($RUNTYPE == 'regen' ) then
   set rest_type  = 'regen'
   set rest_bfile = "$MSSDIR/cpl/r1999-01-01"
   set rest_date  = 19990101
else
   echo 'unknown RUNTYPE = ' $RUNTYPE ; exit -1
endif
cat >! cpl.parm << EOF
&cpl_parm
case_name = '$CASE '
case_desc = '$CASESTR '
rest_type = '$rest_type '
rest_dir = '$MSSDIR/cpl/ '
rest_pfile = '$RPTDIR/cpl.$CASE.rpointer '
rest_bfile = '$rest_bfile '
rest_date = $rest_date
rest_freq = 'monthly'
stop_option = 'newyear'
hist_freq = 'monthly'
/
EOF
cat cpl.parm
#=======================================================================
# End of Coupler setup shell script
#=======================================================================
Items (a) through (c) in the above Coupler setup script are now reviewed.
I.III   Preparing Component Models for a Coupled Run
Preparing the component models (atmosphere, ice, land, ocean) for a coupled run
is analogous to preparing the Coupler (see subsection II).
Such preparation is done in the component model setup scripts
"atm.setup.csh," "ice.setup.csh," "lnd.setup.csh," and "ocn.setup.csh,"
which are called by the parent NQS batch job shell script described in subsection I.
The purpose of the setup scripts is to build their respective executable codes,
document what source code and data files are being used,
and gather or create any necessary input data files.
I.IV   Source Code Maintenance
The distribution FORTRAN source code for the Flux Coupler comes with a Makefile
which uses the Cray f90 pre-processor, compiler, and loader to create
object files and link them into an executable code. Makefiles tend to be
very site-specific, so the user is expected to modify the Makefile as necessary
if the Cray f90 system is not available.
This page is maintained by kauff @ ucar.edu