After making changes to the model it is critical to test that the model still works as it did before the changes. We provide a mechanism for doing basic system testing on model changes using the same infrastructure as given in Chapter 2.
The script "test-model.pl" in the "cam1/models/atm/cam/test/system" directory runs a suite of basic testing for the model. This is designed for when you are changing the model code, or porting to another machine. It provides a good check that the basic functionality of the model still works despite the changes the user has introduced. Although, the user could run these tests on their own, the script provides a nice easy to use, interface to basic testing. It also has the advantage that it was designed for testing so it does look for likely "gotcha's" that could occur with code changes. In this section we will go over how to use "test-model.pl" for basic acceptance testing. First, we go over the command-line arguments that can be given to "test-model.pl", next we discuss some of the environment variables that are useful to use with "test-model.pl", and last we discuss how to "test-model.pl" in machine batch ques. The discussion of command-line arguments includes how to change resolutions and how to run at different labs. The discussion on environment variables includes how to over-ride the default behavior with environment variables so that you can control such things as number of CPU's and nodes for the parallel decomposition.
"test-model.pl" is built around the build/run scripts in the "cam1/models/atm/cam/bld" directory. All of the configuration, build, and run steps are taken from the scripts in "cam1/models/atm/cam/bld". As such many of the details on the usage of "test-model.pl" are covered in the section on "run-model.pl" 2.1.2.
In this section we go over the simplest, most straightforward usage of "test-model.pl". This includes interactive use at NCAR, interactive use at other cooperating labs, and batch submission.
Interactive usage at NCAR:
The basic operation of "test-model.pl" for interactive submission is straightforward.
On an NCAR machine the user need only log onto the machine they wish to test on and type:
cd cam1/models/atm/cam/test/system/
test-model.pl
Interactive use at other cooperating labs:
Running interactively at a cooperating lab (dao, ornl, llnl, or nersc) requires only slightly more effort. The only requirements here are to designate which lab you are running at and the directory location of the model datasets. For example, on the NERSC machine "gseaborg" I might do the following (using csh):
setenv LAB "nersc"
setenv CSMDATA /u4/erik/data
cd cam1/models/atm/cam/test/system/
test-model.pl
By putting the above environment variables in my ".cshrc", the settings will be established every time I log in to the given machine. Then I can run "test-model.pl" as above without having to remember special options or having to set these variables each time. For information on how to obtain the model datasets, see Section 2.1.1.
Batch usage:
Batch scripts that execute "test-model.pl" have been set up for each lab. These are typically named "$LAB_batch.csh"; for example, there are batch scripts named llnl_compass.csh, nersc_batch.csh, llnl_blue.csh, ncar_batch.csh, and ornl_batch.csh. Each batch script is set up with the specific batch commands, and possibly some environment variable settings, and then simply runs test-model.pl with both the "-clean" and "-nofail" options. The "-clean" option makes sure object files are rebuilt by cleaning out the build directory, and "-nofail" runs the entire script even if a test fails. Afterwards, look at the "test.$arch.log" file to see whether the script successfully completed all of the tests. To submit a script to batch, follow the directions given at the top of the specific batch script. For example, "ornl_batch.csh" is submitted with:
env SCRIPT_DIR=`pwd` llsubmit ornl_batch.csh
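After the batch job completes, check the end of the log file for the test results. As a minimal sketch, on an IBM AIX machine (where "$arch" expands to "aix"; the exact file name depends on your platform) you might type:
tail -20 test.aix.log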
"test-model.pl" is designed such that the common settings the user might want to control can be set either by command line arguments or by environment variables that are set before the user runs the script. In some cases, the user can control the script by either a command-line argument -- or a environment variable as up to the user. In this section we go over the possible command line arguments to the script. We include discussion on how to change the resolution for the tests, how to run at different labs, how to do a port-validation against a trusted machine, how to have test-model.pl compare to a control code library, and how to control the selection of the tests that are performed.
The "-help" option to "test-model.pl" lists all of the possible command-line options.
Set CSMDATA to /fs/cgd/csm/inputdata
process_args:: Process the input arguments
Usage: perl test-model.pl [options]
Options are:
-lab = Set the lab you are running at
(of ncar ornl dao nersc llnl default) [ncar]
(Also set by setting the env variable LAB)
-help = Help (this message, also lists tests performed)
-noclean = Don't clean the old directories out.
-nofail = Continue even if errors are found
-resume = Continue a previous run of the script at the point it left off at
-errgro mach(plat) = List the remote machine and platform to use as the
baseline for error-growth tests (ie. -e "babyblue.ucar.edu(aix)" )
(list of valid platforms are: dec_osf linux solaris irix aix)
(tests must be in the default location on the remote machine)
-skip dy:#:res = Skip to given dynamics and test # (or range of numbers)
(example -skip sld:9 start with sld dynamics test no. 9)
(or -skip fv:2-4 start with fv dynamics test no. 2 and do up to test 4)
(or -skip all:2 start at tests 2 for all dynamics)
(or -skip eul:2-4:T42L26 do tests 2 through 4 for eul at T42 with 26 levs)
-compare cam-root-dir = Compare to given version of the model in this directory
This is the root directory of the cam1 directory tree
(Example -compare /home/erik/cam1)
(Also set by setting the env variable CONT_ROOTDIR)
Batch operation: To submit to batch you need to create and/or edit the
submission information at the top of a script. This directory contains
simple batch submission scripts for each lab that can be used for this
purpose. Most likely the que name and possibly the number of nodes might
need to be changed. You need to edit either the PBS, NQS, or Loadleveler
sections, depending on which queuing system your machine supports.
To specify non-standard options for batch operation edit the execution
statement in the batch script for test-model.pl.
submit to batch as either
env SCRIPT_DIR=`pwd` qsub ncar_batch.csh
env SCRIPT_DIR=`pwd` llsubmit ncar_batch.csh
List of tests that are performed:
for each dynamics at these resolutions:
eul: Horizontal Resolution: T21 # of vertical levels: 26
fv: Horizontal Resolution: 4x5 # of vertical levels: 18
sld: Horizontal Resolution: T21 # of vertical levels: 26
01_initial_debug_run_SPMD
02_initial_debug_run_nonSPMD
03_initial
04_restart
05_initial_compare_to_restart
06_control_nonpert
07_error_growth_adiabatic
08_error_growth_adiabatic_pert
09_error_growth_full_physics
10_error_growth_full_physics_pert
11_control_adiabatic
12_control_full_physics
13_somT42 only eul dynamics
14_366phys_and_lsm only eul dynamics
Terminating at CAM_test.pm line 341.
Table 2.2: Command-line arguments to test-model.pl

| Option | Description |
|---|---|
| -lab LAB | Sets the lab you are running at (ncar, dao, ornl, llnl, or nersc). |
| -help | Prints the help message, which lists the usage as well as information about the tests that will be performed. |
| -noclean | Don't clean out the directories before building. By default the directories are cleaned and a complete build of the whole system is done. |
| -nofail | Don't stop if an error is encountered; try to run the other tests as well. |
| -resume | Restart the script at the point where it left off when it failed. |
| -errgro "machine-domain-name(platform)" | Calculate the error growth relative to the given trusted machine. |
| -skip dyn:tests:resolution | Skip some tests, running only the tests specified. Also used to set the resolution at which the tests are performed. |
| -compare control-source-directory | Compare this code to the given control source code. |
test-model.pl is set up to run with usable defaults at the following labs: ncar, nersc, dao, llnl, and ornl. The default settings used are those given in the "CAM_lab.pm" Perl module. When running at a lab other than the default you will have to either use the "-lab" command-line option or set the environment variable "LAB" to use the defaults for that lab. Platform-specific settings are determined by the machine you are running test-model.pl on.
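For example (as noted in the "-help" output above, the "-lab" option and the LAB environment variable are interchangeable), the following two invocations are equivalent ways to select the ORNL defaults, using csh:
setenv LAB "ornl"
test-model.pl
test-model.pl -lab ornl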
Besides LAB, the other required environment variable is CSMDATA, the location of the input datasets. At NCAR, CCSM input datasets are stored on an NFS-mounted disk available on all machines at "/fs/cgd/csm/inputdata". CAM developers who just need the datasets to run CAM stand-alone can get a copy from the "CAM Developers page" at:
http://www.cgd.ucar.edu/~cam/cam_data.shtml
The file is a GNU zipped tar file. To unpack it:
gunzip cam1.7.scidac-atm.datasets.tar.gz
tar xvf cam1.7.scidac-atm.datasets.tar
The directory created is called "inputdata" and the directory structure is as follows:
inputdata
inputdata/atm ------------------- CCSM Atmosphere component datasets
inputdata/atm/cam1 -------------- CAM component datasets
inputdata/atm/cam1/inic --------- Initial condition datasets
inputdata/atm/cam1/inic/gaus ---- Gaussian initial condition datasets
inputdata/atm/cam1/inic/fv ------ Finite-volume dy-core initial condition datasets
inputdata/atm/cam1/inic/sld ----- Semi-Lagrange dy-core initial condition datasets
inputdata/atm/cam1/ggas --------- Greenhouse gas datasets
inputdata/atm/cam1/sst ---------- Sea-surface temperature datasets
inputdata/atm/cam1/ozone -------- Ozone datasets
inputdata/atm/cam1/hrtopo ------- High-resolution topography (used when interpolating datasets)
inputdata/atm/cam1/rad ---------- Radiation datasets
inputdata/lnd ------------------- CCSM Land component datasets
inputdata/lnd/clm2 -------------- CLM2 component datasets
inputdata/lnd/clm2/inidat ------- CLM2 initial condition datasets
inputdata/lnd/clm2/inidat/cam --- CLM2 initial condition datasets for use with CAM
inputdata/lnd/clm2/srfdat ------- CLM2 time-invariant surface description datasets
inputdata/lnd/clm2/srfdat/cam --- CLM2 surface description datasets for use with CAM
Note: The clm2 datasets do not include the "mksrfdat" and "rawdata" directories and datasets required to create new surface datasets at different resolutions. If you need to create new surface datasets at resolutions not provided in the "inidat" and "srfdat" directories, you will need to obtain the datasets at NCAR in the "/fs/cgd/csm/inputdata/lnd/clm2/mksrfdat" directory.
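Once unpacked, point CSMDATA at the new "inputdata" directory so test-model.pl can find the datasets. For example, if you unpacked the tar file under "/home/erik" (a hypothetical location), using csh:
setenv CSMDATA /home/erik/inputdata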
An important feature of test-model.pl is the ability to compare to a previous program library. This is useful to ensure that the changes you are making do not change answers compared to the previous version. The default way of running "test-model.pl" without command-line options does not do a comparison to another version (although if comparisons have been done and the files still exist it will run comparisons -- this will be discussed later). Using the command-line option "-compare" you can compare to a previous program library by giving the full path to the root of the library to compare to. For example, if my test library is checked out under "/fs/cgd/data0/erik/test_cam1", I am in the test directory "/fs/cgd/data0/erik/test_cam1/models/atm/cam/test/system", and I want to compare the "test_cam1" directory with the base model I've checked out under "/fs/cgd/data0/erik/base_cam1", then I use "-compare" as follows:
cd /fs/cgd/data0/erik/test_cam1/models/atm/cam/test/system
test-model.pl -compare /fs/cgd/data0/erik/base_cam1
When the "-compare" option is used, tests 6, 11, and 12 are done (see the list of tests in Section 4.1.2 above); if "-compare" is not used, these tests are skipped. Test 6 is compared to test 5, test 11 to test 7, and test 12 to test 9. If all three of these comparisons are identical at the end of the model run, the model is identified as being bit-for-bit with the control library. This is reported at the end of test-model.pl (and in the log file) as follows:
If any of the above comparisons is not identical, the difference is reported as not identically zero with the following at the end of test-model.pl:
Many times, differences relative to a control library are intended to be bit-for-bit. As explained above, simply by using the "-compare" option test-model.pl can easily identify whether two model libraries give identical answers. It is more difficult, however, to verify whether changes are within machine roundoff. Using the "-compare" option and plotting both the comparison and the error-growth tests, the user can verify whether the changes are within roundoff. Typically this is done by plotting the root-mean-square (RMS) differences of temperature for both the control comparison and the error growth. The "*.cprout" files already report the RMS differences for each field. A simple way to extract the RMS differences of temperature is with the UNIX command "grep", as follows:
grep " RMS T " *.cproutA program to plot up the cprout files is included in the "cam1/models/atm/cam/bld" directory, called "graphgrowth.csh". This c-shell script uses "xgraph" to plot up any of the "*.cprout" files created from running test-model.pl. The script has a simple prompt interface asking the user the following questions.
The figures below show plots for differences that are within roundoff and for differences that are not within roundoff and hence answer-changing. When the line for the comparison is below or very near the relevant error-growth curve (adiabatic or full-physics, as appropriate), the differences are within roundoff. If the comparison curve rises above the error-growth curve, the differences are greater than roundoff.
A summary of the tests performed is shown above in Section 4.1.2, in the output given when the user uses the "-help" option. To do a subset of the default tests, use the "-skip" option. "-skip" allows you to pick the dynamics and either a range of tests to perform or a specific test to start at and continue from.
To change resolution, use the "-skip" option to specify the dynamics to use, the tests to run, and the horizontal and vertical resolution to run the selected tests at. For example, to run tests 1 through 14 of test-model.pl with Eulerian dynamics at T42 resolution with 26 levels you would do the following:
test-model.pl -skip eul:1-14:T42L26
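Likewise, following the examples in the "-help" output above, to run only tests 2 through 4 with finite-volume dynamics at the default 4x5 resolution:
test-model.pl -skip fv:2-4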
Part of the default tests that "test-model.pl" runs are error-growth tests on the machine the user is running on. Error-growth tests run a simulation and then compare it with a second simulation whose initial conditions have been randomly perturbed by an amount equal to machine roundoff. The nature of the model's equations is such that these two simulations will continue to diverge until the model states are completely different. Since this divergence ("error growth") takes simulation time to develop, we can use it as a benchmark to find problems in the model. We can also use this error-growth process to validate that simulations performed on one machine will give a climate similar to another. To do this, we compare the non-perturbed simulations on the two machines to the error growth of the trusted machine.
To do this with test-model.pl we use the "-errgro" option. The "-errgro" option expects a machine domain name followed by the platform type in parentheses. So, for example, to compare error growth on the machine I am running on to the NCAR machine "blackforest.ucar.edu", which is an IBM machine running AIX (so type "aix"), I would do the following:
On blackforest:
test-model.pl
On the machine to test:
test-model.pl -errgro "blackforest.ucar.edu(aix)"
Before running with the "-errgro" option you need to run the error-growth tests on the trusted machine (instead of running the whole suite, you can use the "-skip" option to run just the error-growth tests). The files from the trusted machine will then be copied from the remote machine using "scp". You will need either to have scp set up so that passwords are not required between the two machines in question, or to enter a password interactively. Hence, you may not be able to use this option in batch mode.
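For example, since tests 7 through 10 are the error-growth tests in the list above, on the trusted machine you might run only those tests (a sketch, assuming the range form of "-skip" works with "all" as it does for a single dynamics):
test-model.pl -skip all:7-10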
Table 2.3: List of platform names to use

| Platform | Name to use |
|---|---|
| IBM | aix |
| Sun-OS | sunos |
| SGI | sgi |
| Compaq-alpha | dec_osf |
| Linux | linux |
List of variables that can be set before running "test-model.pl". Note that all of these other than "CONT_SRCDIR" are referenced in Table 2.3 and Table 2.4; CONT_SRCDIR is an environment variable used only with test-model.pl.
In the "cam1/models/atm/cam/test/system" directory there are several sample batch scripts for use at various specific labs or machines.
env SCRIPT_DIR=`pwd` llsubmit ncar_batch.csh
The "-help" option lists the specific tests that are perofrmed. See Section 4.1.2 for the list of specific tests. Here is a list of what the bottom-line of these tests are:
Questions on these pages can be sent to erik@ucar.edu.