To run a case, the user must submit the batch script $CASE.$MACH.run. Before doing so, the user should also modify env_run.xml to suit the particular run.
env_run.xml contains variables which may be modified during
the course of a model run. These variables comprise coupler namelist
settings for the model stop time, model restart frequency, coupler
history frequency and a flag that determines whether the run is treated
as a continuation run. In general, the user only needs to set the
variables $STOP_OPTION and $STOP_N. The
other coupler settings will then be given consistent and reasonable
default values. These default settings guarantee that restart files
are produced at the end of the model run.
As mentioned above, variables that control runtime settings are found in env_run.xml. In the following, we focus on the handling of run control (e.g. length of run, continuing a run) and output data. We also give a more detailed description of CCSM restarts.
Before a job is submitted to the batch system, the user should first check that the batch submission lines in $CASE.$MACH.run are appropriate. These lines should be checked and modified as needed for account numbers, time limits, and stdout/stderr file names. The user should then modify env_run.xml to set the key run-time settings, as outlined below:
CONTINUE_RUN
Determines if the run is a restart run. Set to FALSE when initializing a startup, branch or hybrid case. Set to TRUE when continuing a run. (logical)
When you first begin a branch, hybrid or startup run, CONTINUE_RUN must be set to FALSE. When you successfully run and get a restart file, you will need to change CONTINUE_RUN to TRUE for the remainder of your run. Details of performing model restarts are provided below.
RESUBMIT
Enables the model to automatically resubmit a new run. To get multiple runs, set RESUBMIT to a value greater than 0; RESUBMIT is then decremented and the case is resubmitted. Automatic resubmission stops when the RESUBMIT value reaches 0.
Long CCSM runs can easily outstrip supercomputer queue time limits. For this reason, a case is usually run as a series of jobs, each restarting where the previous finished.
STOP_OPTION
Ending simulation time.
Valid values are: [none, never, nsteps, nstep, nseconds, nsecond, nminutes, nminute, nhours, nhour, ndays, nday, nmonths, nmonth, nyears, nyear, date, ifdays0, end] (char)
STOP_N
Provides a numerical count for $STOP_OPTION. (integer)
STOP_DATE
Alternative yyyymmdd date option, negative value implies off. (integer)
REST_OPTION
Restart write interval.
Valid values are: [none, never, nsteps, nstep, nseconds, nsecond, nminutes, nminute, nhours, nhour, ndays, nday, nmonths, nmonth, nyears, nyear, date, ifdays0, end] (char)
REST_N
Number of intervals to write a restart. (integer)
REST_DATE
Model date to write restart, yyyymmdd
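The variables listed above can also be changed programmatically. The following is a minimal Python sketch, assuming env_run.xml stores each variable as an `<entry id="..." value="..." />` element; that layout, the `config_definition` root element in the sample, and the helper name `set_entries` are assumptions made for illustration and should be checked against the actual file in the case directory.

```python
import xml.etree.ElementTree as ET


def set_entries(xml_text, updates):
    """Return xml_text with the 'value' attribute of matching entries updated.

    Assumes (hypothetically) that each variable is an element of the form
    <entry id="NAME" value="VALUE" />.
    """
    root = ET.fromstring(xml_text)
    remaining = dict(updates)
    for entry in root.iter("entry"):
        name = entry.get("id")
        if name in remaining:
            entry.set("value", str(remaining.pop(name)))
    if remaining:
        # Refuse to silently ignore a misspelled variable name.
        raise KeyError("not found: " + ", ".join(sorted(remaining)))
    return ET.tostring(root, encoding="unicode")


# Hypothetical, shortened stand-in for env_run.xml.
SAMPLE = """<config_definition>
<entry id="CONTINUE_RUN" value="FALSE" />
<entry id="STOP_OPTION" value="ndays" />
<entry id="STOP_N" value="5" />
</config_definition>"""

# After the first successful run has produced restart files,
# switch the case to a continuation run.
updated = set_entries(SAMPLE, {"CONTINUE_RUN": "TRUE"})
```

Raising an error for unknown names is a deliberate choice here: a mistyped variable would otherwise leave the run settings unchanged without any warning.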
By default,
STOP_OPTION = ndays
STOP_N = 5
STOP_DATE = -999
The default setting is only appropriate for initial testing. Before a
longer run is started, update the stop times based on the case
throughput and batch queue limits. For example, if the model runs 5
model years/day, set RESUBMIT = 30, STOP_OPTION = nyears, and STOP_N = 5.
The model will then run in five year increments, and stop after 30
submissions.
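The arithmetic behind such a choice can be sketched in Python. This is only an illustration under stated assumptions: the function name `plan_run` and its parameters are hypothetical, throughput is taken as model years per wall-clock day, and no safety margin below the queue limit is applied (in practice some margin is advisable).

```python
import math


def plan_run(total_years, years_per_wallclock_day, queue_limit_hours):
    """Return (years_per_job, number_of_jobs) for a run split into whole years.

    years_per_job is a candidate for STOP_N with STOP_OPTION = nyears.
    """
    # Whole model years that fit into one job at the given throughput.
    years_per_job = math.floor(years_per_wallclock_day * queue_limit_hours / 24.0)
    if years_per_job < 1:
        raise ValueError(
            "queue limit too short for one model year; "
            "consider a finer STOP_OPTION such as nmonths"
        )
    # Jobs needed to cover the requested simulation length.
    jobs = math.ceil(total_years / years_per_job)
    return years_per_job, jobs


# 150 model years at 5 model years/day with a 24-hour queue limit.
print(plan_run(150, 5, 24))  # (5, 30)
```

The second value is the total number of jobs. Whether RESUBMIT should be set to that count or to one less (if the initial submission is not counted) is worth confirming against the run script before starting a long run.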
Each CCSM component produces its own output datasets consisting of history, restart and output log files. Component history files are in netCDF format whereas component restart files may be in netCDF or binary format and are used to either exactly restart the model or to serve as initial conditions for other model cases.
Archiving is a phase of a CCSM model run where the generated output
data is moved from $RUNDIR (normally $EXEROOT/run) to a local disk area
(short-term archiving)
and subsequently to a long-term storage system (long-term
archiving). It has no impact on the production run except to clean up
disk space and help manage user quotas. Short and long-term archiving
environment variables are set in the env_mach_specific
file. Although short-term and long-term archiving are implemented
independently in the scripts, there is a dependence between the two
since the short-term archiver must be turned on in order for the
long-term archiver to be activated. In env_run.xml, several
variables control the behavior of short and long-term
archiving. These are described below.
LOGDIR
Extra copies of the component log files will be saved here.
DOUT_S
If TRUE, short term archiving will be turned on.
DOUT_S_ROOT
Root directory for short term archiving. This directory must be visible to compute nodes.
DOUT_S_SAVE_INT_REST_FILES
If TRUE, perform short term archiving on all interim restart files, not just those at the end of the run. By default, this value is FALSE. This is for expert users ONLY and requires expert knowledge. We will not document this further in this guide.
DOUT_L_MS
If TRUE, perform long-term archiving on the output data.
DOUT_L_MSROOT
Root directory on mass store system for long-term data archives.
DOUT_L_HTAR
If TRUE, the long-term archiver will store history data in annual tar files.
DOUT_L_RCP
If TRUE, long-term archiving is done via the rcp command (this is not currently supported).
DOUT_L_RCP_ROOT
Root directory for long-term archiving on the rcp remote machine (this is not currently supported).
Several important points need to be made about archiving:
By default, short-term archiving is enabled and long-term archiving is disabled.
All output data is initially written to $RUNDIR.
Unless a user explicitly turns off short-term archiving, files will be moved to $DOUT_S_ROOT at the end of a successful model run.
If long-term archiving is enabled, files will be moved to $DOUT_L_MSROOT by $CASE.$MACH.l_archive, which is run as a separate batch job after the successful completion of a model run.
Users should generally turn off short-term archiving when developing new CCSM code.
If long-term archiving is not enabled, users must monitor quotas and usage in the $DOUT_S_ROOT/ directory and should manually clean up these areas on a frequent basis.
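The dependence between the two archivers described above lends itself to a quick consistency check before a job is submitted. The sketch below is an assumption-laden illustration, not part of the CCSM scripts: the function `check_archive_settings` is hypothetical, settings are passed in as a plain dictionary of strings, and a missing variable is treated as FALSE or empty.

```python
def check_archive_settings(settings):
    """Return a list of problems found in the archiving settings.

    'settings' maps variable names (e.g. DOUT_S) to string values.
    Missing variables are treated as FALSE or empty in this sketch.
    """
    problems = []
    short_on = settings.get("DOUT_S", "FALSE").upper() == "TRUE"
    long_on = settings.get("DOUT_L_MS", "FALSE").upper() == "TRUE"

    # The long-term archiver is only activated when short-term archiving is on.
    if long_on and not short_on:
        problems.append("DOUT_L_MS is TRUE but DOUT_S is FALSE")
    if short_on and not settings.get("DOUT_S_ROOT"):
        problems.append("DOUT_S is TRUE but DOUT_S_ROOT is not set")
    if long_on and not settings.get("DOUT_L_MSROOT"):
        problems.append("DOUT_L_MS is TRUE but DOUT_L_MSROOT is not set")
    return problems


# Long-term archiving requested without short-term archiving.
for problem in check_archive_settings(
    {"DOUT_S": "FALSE", "DOUT_L_MS": "TRUE", "DOUT_L_MSROOT": "/USER/csm/b40.B2000"}
):
    print(problem)
```

Returning a list rather than raising on the first problem lets all inconsistencies be reported in one pass.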
Standard output generated from each CCSM component is saved in a "log
file" for each component in $RUNDIR.
Each time the model is run, a
single coordinated datestamp is incorporated in the filenames of all
output log files associated with that run. This common datestamp is
generated by the run script and is of the form YYMMDD-hhmmss, where
YYMMDD are the Year, Month, Day and hhmmss are the hour, minute and
second that the run began (e.g. ocn.log.040526-082714). Log files are
also copied to a user specified directory using the variable $LOGDIR
in env_run.xml. The default is a 'logs' subdirectory beneath the
case directory.
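When sorting or grouping log files by run, the datestamp can be built and parsed with standard date formatting. The following Python sketch assumes only the YYMMDD-hhmmss form and the `component.log.datestamp` naming shown in the example above; the helper names `log_name` and `parse_stamp` are hypothetical.

```python
from datetime import datetime

STAMP_FORMAT = "%y%m%d-%H%M%S"  # YYMMDD-hhmmss


def log_name(component, started):
    """Build a log file name such as ocn.log.040526-082714."""
    return "{}.log.{}".format(component, started.strftime(STAMP_FORMAT))


def parse_stamp(filename):
    """Recover the run start time from the trailing datestamp of a log name."""
    stamp = filename.rsplit(".", 1)[-1]
    return datetime.strptime(stamp, STAMP_FORMAT)


start = datetime(2004, 5, 26, 8, 27, 14)
print(log_name("ocn", start))  # ocn.log.040526-082714
```

Because every log file from one run shares the same datestamp, grouping files by the value returned from `parse_stamp` collects the logs of all components for that run.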
By default, each component also periodically writes history files (usually monthly) in netCDF format and also writes netCDF or binary restart files in the $RUNDIR directory. The history and log files are controlled independently by each component. History output control (i.e. output fields and frequency) is set in the Buildconf/$component.buildnml.csh files.
The raw history data does not lend itself well to easy time-series analysis. For example, CAM writes one or more large netCDF history file(s) at each requested output period. While this behavior is optimal for model execution, it makes it difficult to analyze time series of individual variables without having to access the entire data volume. Thus, the raw data from major model integrations is usually postprocessed into more user-friendly configurations, such as single files containing long time-series of each output field, and made available to the community.
As an example, for the following settings
DOUT_S = TRUE
DOUT_S_ROOT = /ptmp/$user/archive
DOUT_L_MS = TRUE
DOUT_L_MSROOT = /USER/csm/b40.B2000
the run will automatically submit $CASE.$MACH.l_archive to the queue upon its completion to archive the data. The system is not bulletproof, and the user will want to verify at regular intervals that the archived data is complete, particularly during long-running jobs.
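One simple way to carry out such a check is to compare a listing of archived file names against the months the run should have covered. The Python sketch below is illustrative only: it assumes monthly history files whose names contain a dot-separated YYYY-MM field, and both that naming pattern and the function `missing_months` are hypothetical, so the pattern should be adapted to the actual archive contents.

```python
def missing_months(filenames, first_year, last_year):
    """Return the YYYY-MM stamps in [first_year, last_year] with no matching file.

    Assumes (hypothetically) that each monthly file name contains a
    dot-separated field of the form YYYY-MM.
    """
    present = set()
    for name in filenames:
        present.update(name.split("."))
    expected = [
        "{:04d}-{:02d}".format(year, month)
        for year in range(first_year, last_year + 1)
        for month in range(1, 13)
    ]
    return [stamp for stamp in expected if stamp not in present]


# Hypothetical listing for one component with July of year 1 missing.
listing = [
    "case.comp.h0.{:04d}-{:02d}.nc".format(1, m) for m in range(1, 13) if m != 7
]
print(missing_months(listing, 1, 1))  # ['0001-07']
```

To check each component separately, filter the listing by component before calling the function, since the sketch treats a month as present if any file carries its stamp.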