Subsections

13.1 Model Testing Procedures for the CCSM
- 13.1.1 Development Testing Steps
- 13.1.2 Ongoing Test Steps
13.2 Model Validation Procedures for the CCSM
13.3 Port Validation of the CCSM

13 System Testing and Validation

Regular system testing and validation of the CCSM is required to ensure that model quality and integrity are maintained throughout the development process. This section establishes the system testing standards and the procedures that will be used to verify that the standards have been met. It is assumed that component model development teams have unit tested their components before making them available for system testing. See section 12 (Component Model Testing) for more information on testing individual components and on unit testing of individual subroutines and modules within components.

There are two general categories of model evaluations: frequent short test runs and infrequent long validation integrations.

Model testing refers to short (3 to 31 day) model runs designed to verify that the underlying mechanics and performance of the coupled model continue to meet specifications. This includes verifying that the model actually starts up and runs, benchmarking model performance and the relative speed and cost of each model component, and checking that the model restarts exactly. These tests are done on each of the target platforms. Model testing does not address whether the model answer is correct; it merely verifies that the model operates mechanically as specified.

Model validation involves longer (at least 1 year) integrations to ensure that the model results are in acceptable agreement with both previous model climate statistics and observed characteristics of the real climate system. Model validation occurs with each minor CCSM version (e.g. CCSM2.1, CCSM2.2) or at the request of the CCSM scientists and working groups. Once requested, model validation is carried out only after CCSM scientists have been consulted and the model testing phase has been successfully completed. The model validation results are documented on a publicly accessible web page (http://www.cesm.ucar.edu/models/ccsm2.0beta/testing/status.html).

Port validation is defined as verification that the differences between two otherwise identical model simulations obtained on different machines or using different environments are caused by machine roundoff errors only.

13.1 Model Testing Procedures for the CCSM

Formal testing of the CCSM is required for each tagged version of the model. The CCSM quality assurance lead is responsible for ensuring that these tests are run, either by running them personally or by having a qualified person run them. If a model component is identified as having a problem, the liaison for that component is expected to make resolving that problem their highest priority. The results of the testing and benchmarking will be included in the tagged model to document the run characteristics of the model. The actual testing and analysis scripts will be part of the CCSM CVS repository to encourage use by outside users.

13.1.1 Development Testing Steps

1. Successful build: CCSM shall compile on each of the target platforms with no changes to the scripts, code or datasets.
2. Successful startup: CCSM will start from an initial state and run for 10 days.
3. Successful restart: CCSM will start from an initial state and halt after 5 days, then restart and run from day 6 to day 10.
4. Successful branch: CCSM will start from an initial state and halt after 5 days, then carry out a branch case with only a case name change and run from day 6 to day 10.
5. Exact restart: A bit-for-bit match must occur between the 10-day initial run and both the restart run and the branch run, using the same number of processors.
6. Signal trapping: A signal trapping test should be conducted with the environment variable DEBUG set to true in the Makefile.
7. Other diagnostics: A diagnostic test will be performed with info_dbug set to level 2 in the coupler input.
8. Port diagnostics: A port diagnostic test will be performed with info_bcheck set to level 3 in the coupler input, and a 10-day run will be carried out.
9. Performance benchmarking: The total CPU time, memory usage, output volume, GAU cost, disk space use and wall clock time for the 10-day run will be recorded. The relative cost of each component will also be recorded.
10. Test report: The results of all steps above are to be documented in a test report with emphasis on results, comparisons to the previous test and recommendations for improvements. Any faults or defects observed shall be noted and must be brought to the attention of the liaison responsible for that component and the software engineering manager.
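
The exact restart check can be automated by comparing output files byte for byte. The following is a minimal sketch, not part of the CCSM scripts: the function names are hypothetical, and it assumes that the files being compared hold only model data. If a file embeds run-specific metadata such as the case name or a creation date (as may happen in a branch run), the field data would have to be extracted and compared instead.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bit_for_bit(reference_dir, candidate_dir, pattern="*"):
    """Compare same-named files in two directories byte for byte.

    Returns a sorted list of file names that are missing from the candidate
    directory or whose contents differ. An empty list means an exact match.
    """
    reference_dir, candidate_dir = Path(reference_dir), Path(candidate_dir)
    mismatches = []
    for ref_file in sorted(reference_dir.glob(pattern)):
        if not ref_file.is_file():
            continue
        cand_file = candidate_dir / ref_file.name
        if not cand_file.is_file() or file_digest(ref_file) != file_digest(cand_file):
            mismatches.append(ref_file.name)
    return mismatches
```

An empty return value from bit_for_bit for the day-10 output of the initial run and of the restart run would indicate that the restart is exact.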

13.1.2 Ongoing Test Steps

1. Smoke test: A major criterion in evaluating the effectiveness of a test procedure is the length of time that has elapsed since the system was last tested. To detect system or software changes, an automated six-day test run will be made each weekend with the latest CCSM distribution on each of the supported platforms. A restart test will be conducted on the first weekend of each month.
2. Test report: The results of step 1 will be automatically documented in a test report.
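
An automated weekend job needs a rule for deciding when the monthly restart test is due. The sketch below is one possible rule, under a stated assumption: the "first weekend" is taken to be the weekend containing the first Saturday of the month. The function names are hypothetical and are not part of any CCSM script.

```python
import datetime


def is_first_weekend(run_date):
    """True if run_date is a Saturday or Sunday of the month's first weekend.

    Assumption: the "first weekend" is the one containing the first Saturday
    of the month, so a Sunday that falls on the 1st does not count.
    """
    weekday = run_date.weekday()  # Monday == 0 ... Saturday == 5, Sunday == 6
    if weekday == 5:
        return run_date.day <= 7
    if weekday == 6:
        # The preceding Saturday must be the first Saturday of the same month.
        return 2 <= run_date.day <= 8
    return False


def weekend_tests(run_date):
    """Return the names of the tests to run on a given weekend date."""
    tests = ["smoke"]
    if is_first_weekend(run_date):
        tests.append("restart")
    return tests
```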

13.2 Model Validation Procedures for the CCSM

Model validation occurs with each minor CCSM version (e.g. CCSM2.1, CCSM2.2) or at the request of the CCSM scientists and working groups. Before starting a validation run, the CCSM quality assurance lead will consult with the CCSM scientists to design the validation experiment.

Pre-Validation Run Steps:

1. Tests successfully: The model version to be validated will successfully complete the testing steps outlined above.
2. Scientist sign-on: The CCSM scientists must agree to make themselves available to informally analyze the results during the run and to formally review the results within one week of the completion of the run.

Validation Steps:

1. Comparison with previous model runs: The result agrees with previous model runs.
2. Comparison with observed climate: The result agrees with observed climate.

13.3 Port Validation of the CCSM

13.3.1 Background

Port validation is defined as verification that the differences between two otherwise identical model simulations obtained on different machines or using different environments are caused by machine roundoff errors only. Roundoff errors can be caused by using two machines with different internal floating point representation, or by using a different number of processing elements on the same machine which may cause a known re-ordering of some calculations, or by using different compiler versions or options (on a single machine or different machines) which parse internal computations differently.
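
The effect of re-ordering a calculation can be seen with three double-precision numbers. The snippet below is only an illustration of the general floating-point behavior, not CCSM code: summing the same values in two different orders gives results that differ in the lowest-order bits.

```python
import sys

# Three values summed in two different orders, as might happen when a
# reduction is split across a different number of processing elements.
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c
right_to_left = a + (b + c)

# The two results differ only in the lowest-order bits.
difference = abs(left_to_right - right_to_left)
relative_difference = difference / abs(left_to_right)

# sys.float_info.epsilon is the spacing of doubles just above 1.0.
machine_epsilon = sys.float_info.epsilon

print(left_to_right == right_to_left)  # False
print(relative_difference <= machine_epsilon)  # True
```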

The following paper offers a primary reference for port validation (hereafter referred to as RW):

Rosinski, J.M. and D.L. Williamson: The Accumulation of Rounding Errors and Port Validation for Global Atmospheric Models. SIAM Journal on Scientific Computing, Vol. 18, No. 2, March 1997.

As established in RW, three conditions of model solution behavior must be fulfilled to successfully validate a port of atmospheric general circulation models:

1. during the first few timesteps, differences between the original and ported solutions should be within one to two orders of magnitude of machine rounding;
2. during the first few days, growth of the difference between the original and ported solutions should not exceed the growth of an initial perturbation introduced into the lowest-order bits of the original solution;
3. the statistics of a long simulation must be representative of the climate of the model as produced by the original code.

The extent to which these conditions apply to models other than an atmospheric model has not yet been established. Also, note that the third condition is not the focus of this section (see section 13.2).

13.3.2 Full CCSM Port Validation

Validation of the full CCSM system, defined as the combination of all active model components participating in the full computation, is a two-step process:

1. Validate each model as a standalone system.
2. Validate the coupled system.

Validation of each component model alone should be performed by the model developers, and it may not be necessary to perform the standalone tests as part of regular, frequent validation testing.

To validate the fully coupled CCSM, the objective is to establish a procedure which will allow one to conclude confidently that the port of the full system (all components active) is valid. However, there are at least two potential problems which should be noted:

1. Will the procedure be sufficient to draw conclusions confidently? That is, it must have little potential to conclude a good port when the port is, in fact, bad.
2. Upon a conclusion that the port is bad, it is likely that no information will be available pinpointing which component of the full system is suspect.

13.3.3 Recommended Procedure

The general procedure for port validation of the full CCSM is to examine the growth of differences between two solutions over a suitable number of integral timesteps. This error growth can be compared to the growth of differences between two solutions on a single machine, where the differing solution was produced by introducing a random perturbation of the smallest amplitude which can be felt by the model at the precision of the machine.
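
As an illustration of such a perturbation, the sketch below applies a random relative change to each value of a field. It is not the procedure used in the CCM. The relative form v * (1 + amplitude * r), the default amplitude of 1.0E-14, the fixed seed and the function name are all assumptions made for this example.

```python
import random


def perturb_field(values, amplitude=1.0e-14, seed=0):
    """Return a copy of values with a random relative perturbation applied.

    Each value v becomes v * (1 + amplitude * r), with r drawn uniformly
    from [-1, 1]. The relative form and the default amplitude are
    assumptions for this sketch, not the CCM procedure itself.
    """
    rng = random.Random(seed)  # fixed seed so the perturbed run is repeatable
    return [v * (1.0 + amplitude * rng.uniform(-1.0, 1.0)) for v in values]
```

For temperatures near 300 K this changes each value by at most about 3.0E-12 K, which is only a few units in the last place of a double-precision number.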

It is recommended that the procedure examine the growth of differences in a state variable which resides at the primary physical interface (that is, the surface), where the accumulation of errors in all components will act quickly and where the action of the CCSM coupler is also significant (for example, grid mapping).

It is also recommended that the procedure be performed on a coupled system where the exchange of information between active components is frequent. Exchanges of information only at model day boundaries may mask the detection of an invalid port because the magnitude of the error differences could reach roundoff saturation levels prior to an exchange of data. See example 5 in section 13.3.4.

The recipe for CCSM validation is as follows:

1. run the CCSM on a selected machine on which confidence in the solution has been established;
2. re-run the CCSM on the same machine, introducing an initial error perturbation in the atmospheric model 3-D temperature using the procedure available in the CCM (see -need a web link-);
3. run the CCSM on the target machine using the same code, same model input namelist files, and same model input data files, and compare the error growth in the perturbed solution versus the error growth in the ported solution.

The errors should satisfy the first two conditions described in RW.
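
Once the two error-growth curves from the recipe are available, the first two RW conditions can be screened numerically before the curves are inspected by eye. The sketch below is a rough screening aid only. The function name and the thresholds (number of early timesteps, allowed orders of magnitude, growth margin) are assumptions for illustration and are not values taken from RW.

```python
import sys


def check_rw_conditions(perturbation_rms, port_rms, field_scale,
                        early_steps=3, orders_of_magnitude=2.0,
                        growth_margin=1.0):
    """Check the first two RW conditions on two error-growth curves.

    perturbation_rms: RMS difference per timestep, control vs. perturbed run
    port_rms:         RMS difference per timestep, control vs. ported run
    field_scale:      typical magnitude of the field (e.g. about 300 for K)

    The thresholds (early_steps, orders_of_magnitude, growth_margin) are
    illustrative assumptions, not values taken from RW.
    """
    rounding = sys.float_info.epsilon * field_scale
    limit = rounding * 10.0 ** orders_of_magnitude

    # Condition 1: early differences stay near machine rounding.
    condition_1 = all(d <= limit for d in port_rms[:early_steps])

    # Condition 2: the port difference does not outgrow the perturbation.
    condition_2 = all(p <= growth_margin * q
                      for p, q in zip(port_rms, perturbation_rms))

    return condition_1, condition_2
```

A pointwise comparison is stricter than a visual comparison of the two curves, so growth_margin may need to be relaxed in practice.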

Specific recommendations for a port validation of CCSM:

Item                  Recommendation
length of test        5-30 days
field to examine      2-D surface temperature on atmospheric grid
frequency of samples  every timestep
size of perturbation  smallest which can be felt on original machine (1.0E-14)
error statistic       RMS difference of field, area-averaged

Note that the field being examined must be processed using the full machine precision. The field must be saved at full machine precision during the model history archival step, and the error statistic must be computed at full machine precision.
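
The error statistic can be sketched as follows for a field stored as rows of longitude values, one row per latitude. This is an illustrative sketch rather than the CCSM analysis script: the function name is hypothetical, and the cos(latitude) area weighting is an assumption that suits a regular latitude-longitude grid (a Gaussian grid would use its own quadrature weights). Python floats are double precision, and math.fsum keeps the summation from adding rounding error of its own.

```python
import math


def area_weighted_rms_difference(field_a, field_b, latitudes_deg):
    """Area-averaged RMS difference of two 2-D fields on a lat-lon grid.

    field_a, field_b: lists of rows, one row of longitude values per latitude
    latitudes_deg:    latitude of each row, in degrees

    Assumption: cells are weighted by cos(latitude), which approximates cell
    area on a regular latitude-longitude grid.
    """
    weighted_squares = []
    weights = []
    for row_a, row_b, lat in zip(field_a, field_b, latitudes_deg):
        w = math.cos(math.radians(lat))
        for a, b in zip(row_a, row_b):
            weighted_squares.append(w * (a - b) ** 2)
            weights.append(w)
    # math.fsum avoids adding rounding error of its own to the statistic.
    return math.sqrt(math.fsum(weighted_squares) / math.fsum(weights))
```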

13.3.4 Port Validation Examples

Example 1. Perturbation Error Growth

Typical perturbation error growth, shown as the globally averaged RMS difference of surface temperature between a control run and a run with a low-order bit perturbation of CCM on 16 PEs of the IBM SP. Two days (144 atmospheric timesteps) are shown. Note that the first few timesteps satisfy the first condition of RW.

[Figure: images/ex01.eps]

Example 2. Machine Port
The black line is the perturbation error growth on the original machine (same as example 1). The red line is the growth in differences between the simulation on the original machine and the simulation on 64 PEs of an SGI Origin 2000, and the blue line is the growth in differences from a simulation on 32 PEs of an IBM SP. Note that the first two days (144 timesteps) satisfy the second condition of RW.

[Figure: images/ex02.eps]

Example 3. Bad Port I
Same as example 2, but the blue line is a port where the default greenhouse gas concentration was modified accidentally in the atmospheric source code. The first and second conditions of RW are violated.

[Figure: images/ex03.eps]

Example 4. Bad Port II
Same as example 2, but the blue line is a port where the second-order diffusion coefficient was raised by 15% in the atmospheric model namelist input. The first and second conditions of RW are violated.

[Figure: images/ex04.eps]

Example 5. Frequency of Model Data Exchange
Same as example 2, but the blue line is a port where the ocean model vertical diffusion coefficient was lowered intentionally. The first and second RW conditions are satisfied even though the port was deliberately made bad. The problem is that the ocean and atmosphere were directed to exchange data only at day boundaries (every 72 atmospheric timesteps), so the coupler did not communicate the ocean solution to the atmosphere until the start of the second day. The error in the ocean model solution had already reached the roundoff saturation level by the time the atmospheric model received the information. For port validation, this example demonstrates that exchanges of data between components must occur more frequently than the time scale at which the roundoff error reaches a level (saturated) value.

[Figure: images/ex05.eps]
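
The requirement illustrated by example 5 can be expressed as a simple check on a perturbation error-growth curve. The sketch below is an assumption-laden illustration: the function names are hypothetical, the plateau is taken to be the largest value in the curve (so the run must be long enough to reach saturation), and the 0.9 threshold is arbitrary.

```python
def saturation_step(rms_curve, plateau_fraction=0.9):
    """Index of the first timestep at which the curve reaches its plateau.

    The plateau is taken to be the largest value in the curve, and
    plateau_fraction is an arbitrary threshold chosen for this sketch.
    """
    threshold = plateau_fraction * max(rms_curve)
    return next(step for step, value in enumerate(rms_curve)
                if value >= threshold)


def exchange_is_frequent_enough(exchange_interval_steps, rms_curve):
    """True if components exchange data before the error saturates."""
    return exchange_interval_steps < saturation_step(rms_curve)
```

With an exchange interval of 72 atmospheric timesteps, as in example 5, the check fails for any curve that saturates within the first model day.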

csm@ucar.edu
