Next: 6 Testing the CCSM Up: UsersGuide Previous: 4 Building the CCSM   Contents


5 Running the CCSM

This section describes some of the practical aspects of setting up a short test run or a longer production run. The best way to configure scripts for either kind of run is to use the ccsm gui, described elsewhere. This section provides guidance on setting up a configuration manually.

5.1 Start up

There are a number of steps required to configure a CCSM run manually.

5.1.1 Running a simple test case of the fully coupled model on the NCAR IBM machine, blackforest

Assume that the new case name is mytest1 and that the root directory of the CCSM model is /home/$LOGNAME/ccsm2.0/ .

5.1.2 Changing the configuration

To modify the model configuration, several changes must be made to the main run script. Assume a model case name of mytest2 for this case. Go through the section above using mytest2 as the case name. Then in addition to the changes in the section above, modify the main script, mytest2.run, as follows:

As an aside, there is a standard shorthand naming convention for configuration setups. In particular,

This case naming convention is used in the gui, and the tested configurations are summarized in the section describing supported configurations.

5.1.3 Changing the RUNTYPE

RUNTYPE is set in the main run script and determines how the CCSM run is to be started. RUNTYPE can be "startup", "continue", "branch", or "hybrid".

startup represents a new case started from some model specific initial files or state.
continue is a continuation of a case and guarantees exact restart capability.
branch is like a continuation run, but the CASE name is changed. A set of restart files is used to start a branch run, and exact restart is guaranteed if the source code and model input have not changed. Typically, however, the purpose of a branch run is to evaluate the impact of a modification to the model. The exact restart guarantee ensures that any differences between the original run and the branch run are due to the modification introduced in the branch run.
hybrid is a startup from atmosphere and land initial condition files and ocean and ice restart files. The model is started as if it were a startup case, with a 1-day lag in the start of the ocean model. An exact restart is not guaranteed because the atmosphere and land models use "initial" files and because of the ocean time lag.


For startup and continue runs, no specific script changes are required. For continue runs, the appropriate restart files must be placed in the executable directories. The scripts attempt to do this automatically by searching directories for restart files. For branch runs, the environment variables REFCASE and REFDATE must be set to the name of the previous case and the date in the REFCASE run from which the new branch run will start. Those restart files must be available to the new case. For hybrid runs, the environment variables REFCASE, REFDATE, and BASEDATE must be set in the main script. These represent the prior case, the date in the prior case, and the new starting date for this case. Hybrid runs allow a change to both the case and the starting date.

The RUNTYPES startup, branch, and hybrid are all used to start a new case. The RUNTYPE continue is used to continue any run, no matter what initial RUNTYPE was used.
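
The requirements above can be summarized as a small lookup table. The following Python sketch is illustrative only and is not part of the CCSM scripts: the names RUNTYPE, REFCASE, REFDATE, and BASEDATE come from the text, while the function missing_settings and the table REQUIRED_VARS are hypothetical.

```python
# Illustrative only: this table restates the text above; it is not CCSM code.
REQUIRED_VARS = {
    "startup": [],
    "continue": [],
    "branch": ["REFCASE", "REFDATE"],
    "hybrid": ["REFCASE", "REFDATE", "BASEDATE"],
}


def missing_settings(env):
    """Return the variables that the chosen RUNTYPE needs but env does not set."""
    runtype = env.get("RUNTYPE", "")
    if runtype not in REQUIRED_VARS:
        raise ValueError(f"unknown RUNTYPE: {runtype!r}")
    # A variable counts as missing when it is absent or set to an empty string.
    return [name for name in REQUIRED_VARS[runtype] if not env.get(name)]
```

For example, calling missing_settings with RUNTYPE set to branch and only REFCASE defined reports that REFDATE still needs to be set.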

5.1.4 Running on a new machine

The CCSM release is targeted at the NCAR IBM SP. Due to the distinct nature of individual computer sites, many aspects of the CCSM scripts may need to be changed when running on a new machine. These include

General guidance about specific aspects of the scripts that need to be changed in order to run on different machines can be found in the scripts/tools/test.a1.mods.* files. The * represents the hostname of a machine where the model has already been run. Each file contains the machine-specific modifications that are required.

5.1.5 Getting into production

There is a specific procedure that should be used to start a production run. The model defaults are set to run a short test case. The general procedures for starting a production run are as follows.

5.2 What is the ccsm_joe file?

The ccsm_joe (Job Operating Environment) file is created by the main CCSM run script every time it executes. It can be thought of as a case specific resource file for CCSM. The ccsm_joe file contains a summary of some of the CCSM-specific environment variables. It is a useful debugging tool as it summarizes many of the important variables in the latest run.

It is also used by other scripts to determine the case specific variables. The CCSM harvester ($CASE.har), archiver (ccsm_archive), and a number of the scripts in the CCSM tools directory ($SCRIPTS/tools) use this file to set case specific variables.

5.3 How does auto-RESUBMIT work?

The CCSM model will automatically resubmit the main run script if the integer in the file $SCRIPTS/RESUBMIT is greater than zero. In section (h) of the main run script, the script reads the value in the RESUBMIT file, decides whether to resubmit and, if so, decrements the integer in the RESUBMIT file. When the integer reaches zero, automatic resubmission stops. This lets the user guard against runaway jobs. Initially, users should set the RESUBMIT integer to a moderate value, such as 2. Once confidence has been established, the integer in RESUBMIT can be increased. The default value is zero, so the script does not resubmit automatically by default.

When using RESUBMIT, RUNTYPE should usually be continue; otherwise the same initial period of the run will likely be run over and over.
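
The counter logic described above can be sketched in a few lines. This Python version is a minimal illustration under the assumption that the RESUBMIT file holds a single integer. It is not the code in section (h) of the run script, and the function name should_resubmit is hypothetical.

```python
from pathlib import Path


def should_resubmit(resubmit_file):
    """Return True and decrement the counter if it is greater than zero.

    Assumes the file contains a single integer; an empty file counts as zero.
    """
    path = Path(resubmit_file)
    count = int(path.read_text().strip() or 0)
    if count <= 0:
        return False  # counter exhausted: do not resubmit
    path.write_text(f"{count - 1}\n")  # decrement so the chain eventually stops
    return True
```

With the counter initially set to 2, the first two calls return True and the third returns False, which is the behavior that bounds a chain of resubmitted jobs.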

5.3.1 Runaway jobs

Occasionally, the model will stop prematurely (due to a hardware problem or a model problem). If this happens, the scripts will often continue to resubmit themselves, which usually leads to a "runaway" situation. To stop runaway jobs, first set the resubmit parameter to zero in the RESUBMIT file, then kill any currently active runaway jobs.

5.4 Batch queuing challenges

Generally, each machine at each site has a unique batch queueing environment. This is less true for IBM machines, which seem to use loadleveler nearly universally. Even with loadleveler, however, different sites have different loadleveler configurations. In particular, users may need to change the network parameter and class. With any queueing system, users need to be aware of queue names, processor resources, and time limits.

CCSM has been run under loadleveler, NQS, LSF, and PBS on various machines. In the $TOOLS directory are files named test.a1.mods.*. These files provide guidance about both hardware and batch setups for specific machines.

The default values in the CCSM script provide batch commands for loadleveler, NQS, and PBS batch systems. The file $TOOLS/test.a1.mods.nirvana provides guidance for an LSF queueing system.

Users need to be careful to implement appropriate changes to the CCSM scripts for their particular environment.

NOTE: Both the main run script and the harvester script are set up to run in batch environments, and both need to be modified when using non-NCAR batch queueing systems.

5.5 Modifying source code

Source code is provided with the CCSM release. It is not unreasonable to make code changes directly in the files within the CCSM models directories and then rebuild and run the model.

However, the recommended approach is to create directories named src.atm, src.lnd, src.ice, src.ocn, and src.cpl in the directory where the case-specific scripts reside, $SCRIPTS/$CASE. Users can then copy source code files from the main CCSM models directories into these component-specific directories and then modify those files directly. By default, the CCSM scripts and build environment will use files in the src.* directories before files in the models source directories. In other words, source code in src.* directories has higher priority than source code in the models directories.

There are a number of benefits to doing this. First, it preserves the release source code so that differences can be examined later and users can backtrack to the generic release source if desired. Second, it allows users to configure multiple experiments, each with its own unique source code changes, without requiring copies of the entire source tree. For instance, a sensitivity study could be carried out on a parameter specified in the source code by copying one source file into the src.* directory for each of a number of cases and then modifying that file in each src.* directory.
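
The priority rule can be pictured as a search path in which the case-specific src.* directory comes first. The Python sketch below only illustrates that idea and does not reproduce the CCSM build environment. The function resolve_source and its parameters are hypothetical.

```python
from pathlib import Path


def resolve_source(filename, case_dir, component, models_dirs):
    """Return the first existing copy of filename, preferring src.<component>.

    case_dir stands for $SCRIPTS/$CASE; models_dirs lists the release source
    directories for the component. Returns None if no copy is found.
    """
    search_path = [Path(case_dir) / f"src.{component}"]
    search_path += [Path(d) for d in models_dirs]
    for directory in search_path:
        candidate = directory / filename
        if candidate.is_file():
            return candidate  # earlier directories take priority
    return None
```

A file copied into src.atm therefore shadows the release copy of the same name, while every file that was not copied still resolves to the release source.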

5.6 CCSM Data Management

This section briefly outlines the data flow for a production run. All binaries, model input, and model output exist at some point in specific directories in the executable area. The scripts generate the model binaries and get model input. The model then executes, and model output (restart files, history files, and log files) is written directly into the executable directories. Once the model has completed execution, the script ccsm_archive is run and the model output is moved into the archive directory. Subdirectories are created in the archive directory for each component, along with a restart directory and a restart.tars directory. The most recent set of restart files, including pointer files, is copied into the restart subdirectory of the archive directory. That directory is then tarred up and copied to the restart.tars directory.

Once model output has been archived, the harvester executes, checking files in the archive directory. If a file has already been copied to the mass store, the copy in the archive directory is removed. If the file has not yet been copied to the mass store, the harvester performs that copy. In effect, the harvester must pass through the model output files in the archive area twice before removing them. On the first pass, the file is copied to the mass store, and on the second pass the existence of the mass store copy is verified and the local copy of the file is removed.
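
The two-pass behavior can be modeled with two sets of file names. The Python sketch below is a simplified illustration only. It does not call any mass store commands and is not the $CASE.har script, and the function harvest_pass and the file names used with it are hypothetical.

```python
def harvest_pass(archive_files, mass_store):
    """Run one harvester pass; both arguments are sets, updated in place.

    A file already present in the mass store is removed from the archive;
    a file not yet in the mass store is copied there and kept locally.
    """
    for name in sorted(archive_files):  # iterate over a copy so the set can change
        if name in mass_store:
            archive_files.discard(name)  # copy verified: remove the local file
        else:
            mass_store.add(name)  # first sighting: copy to the mass store
```

Starting from an archive holding two files and an empty mass store, the first pass copies both files and removes nothing, and the second pass empties the archive.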

5.7 What is harvesting doing?

The CCSM harvester is a separate job that saves the model output to a separate file storage device, generally known as a mass storage system. This could be the NCAR mass store, an HPSS, or similar. The overall CCSM2.0 data flow is described separately. One part of the data flow is the movement of model output from the executable directories to a temporary archive directory by a script called ccsm_archive. The harvester is then used to move data from the archive directory to the mass storage system.

The harvester script can be found in the main $SCRIPTS/$CASE directory and is usually named $CASE.har. The harvester can be run interactively or in batch mode. The harvester provided should be considered a template for customizing the harvesting process. Each user and each site may have different needs for harvesting. The harvesting script largely takes advantage of scripts in the $SCRIPTS/tools directory.

Overall, the harvester works as follows:

The harvester takes advantage of the local copy of ccsm_joe to determine many of the case specific variables. A number of harvester variables are set in the main run script including $LMSOUT, $MACOUT, and $RFSOUT.

5.8 Monitoring the integration

Several steps must be taken to monitor the integration. These include monitoring the model run, archiving, harvesting, and disk quotas.

5.9 Data processing

All history output files are netCDF files and conform to the CF netCDF metadata convention. Many tools for processing, analyzing and visualizing netCDF files can be found on the netCDF web site: www.unidata.ucar.edu/netcdf.

5.10 Comparing output to NCAR controls

NCAR control output is normally available on the NCAR mass storage system. Check the CCSM web page www.cesm.ucar.edu for the latest information on model output availability.


csm@ucar.edu