Complex software such as the CCSM requires extensive testing in order to prevent model defects and to provide stable, solid models to work with. Layered testing has been shown to be the most effective approach to catching software defects. Layered testing refers to testing at different levels: testing individual subroutines as well as more complex systems. There may also be several layers of progressively more complex systems to test. Testing the individual component models stand-alone is an example of testing a system less complex than the entire CCSM. Unit-testing, the testing of individual subroutines or modules, is the first layer. Unit-testing by itself will not catch defects that depend on relationships between different modules, but testing the entire system will sometimes miss errors within an individual module. That is why using both extremes is useful in catching model defects. Section covers testing for the entire CCSM modeling system; this section goes over testing of individual model components and unit-testing of subroutines and modules within those components.
Another way to help eliminate code errors is periodic code-reviews. Code-reviews can be implemented in many different fashions, but in general they involve having at least one person besides the author go through the written code and examine the implementation for both design and errors. Jones-1986[3] states that ``the average defect-detection rate is only 25 percent for unit testing, 35 percent for function testing, and 45 percent for integration testing. In contrast, the average effectiveness of design and code inspections are 55 percent and 60 percent.'' McConnell-1993[4] also notes that, as well as being more effective in catching errors, code-reviews catch different types of errors than testing does. In addition, when developers realize their code will be reviewed, they tend to be more careful themselves when programming.
Since the CCSM and the component models take substantial computer resources to run, catching errors early can cut computing costs significantly. In addition, as pointed out by McConnell-1993[4], development time decreases dramatically when formal quality assurance methods, including code-reviews, are implemented.
Each component model needs to develop and maintain its own suite of tests. It is recommended that each component model's development team analyze the kinds of testing required for its model and write the results down in a formal testing-plan. Creating automated tests that run a suite of standard tests is also useful to ensure that the models work and continue to work as needed. This is especially useful for making sure models continue to work on multiple platforms. McConnell-1996[5] refers to this as the daily ``build and smoke test'': you build and run your code daily to ensure that it continues to work and doesn't just sit there and ``smoke''.
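As a rough illustration of an automated ``build and smoke test'', the sketch below runs a list of named commands and records which ones exit successfully. The function name run_smoke_tests and the two placeholder steps are hypothetical; a real suite would call a component model's actual build and run scripts, which are not specified here.

```python
import subprocess
import sys


def run_smoke_tests(commands, timeout=600):
    """Run each (name, argv) pair; return a dict mapping name -> True if it passed."""
    results = {}
    for name, argv in commands:
        try:
            completed = subprocess.run(argv, capture_output=True, timeout=timeout)
            results[name] = completed.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            # A missing executable or a hung run both count as failures.
            results[name] = False
    return results


# Hypothetical placeholder steps (assumption): stand-ins for a model's
# real build script and a short test run.
steps = [
    ("build", [sys.executable, "-c", "print('build ok')"]),
    ("short_run", [sys.executable, "-c", "print('run ok')"]),
]

outcome = run_smoke_tests(steps)
for name, passed in outcome.items():
    print(f"{name}: {'PASS' if passed else 'FAIL'}")
```

Running such a script from a scheduler once a day on each supported platform is one possible way to obtain the daily check described above.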
In order to design a comprehensive testing plan we want to take advantage of the following types of tests.
Unit-tests are a good way to flush out certain types of defects. Since unit-tests only run on one subroutine, they are easier to use, are faster to build and run, allow more comprehensive testing on a wider range of input data, help document how to use the subroutine and how to check for valid answers, and allow faster testing of individual pieces. By building and maintaining unit-tests, the same tests can be run and used by other developers as part of a more comprehensive testing package. Without maintained unit-tests, developers often do less testing than required, since system tests are so much harder to do, or they have to ``hack'' together their own unit-tests for each change. By maintaining unit-tests we allow others to leverage previous work and provide a format for doing extensive checking quickly.
Good unit-tests will do the following:
By analyzing the code to be tested, different test cases can be designed to ensure that all logical statements are exercised in the unit-test. Similarly, input can be designed to test logical threshold states (boundary analysis). Testing scientific validity is of course the most difficult, but testing states where the answer is known analytically can sometimes be useful. Ensuring (or measuring) the degree to which energy, heat, or mass is conserved for conservative processes can also often be done. These types of tests may be applied to more complex functional and system tests as well.
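The three kinds of checks just described (boundary analysis, a case with an analytically known answer, and a conservation check) can be sketched as follows. This is only a minimal illustration: precip_phase and diffuse are hypothetical toy routines invented for the example, not routines from any CCSM component.

```python
import math
import unittest

FREEZING_K = 273.15  # threshold used by the hypothetical phase routine


def precip_phase(temp_k):
    """Hypothetical routine: classify precipitation by temperature in kelvin."""
    return "snow" if temp_k < FREEZING_K else "rain"


def diffuse(field, coeff):
    """Hypothetical routine: one explicit diffusion step on a periodic 1-D grid."""
    n = len(field)
    return [
        field[i] + coeff * (field[i - 1] - 2.0 * field[i] + field[(i + 1) % n])
        for i in range(n)
    ]


class ToyRoutineTests(unittest.TestCase):
    def test_threshold_boundary(self):
        # Boundary analysis: exercise both branches and the threshold itself.
        self.assertEqual(precip_phase(273.14), "snow")
        self.assertEqual(precip_phase(FREEZING_K), "rain")
        self.assertEqual(precip_phase(273.16), "rain")

    def test_known_analytic_answer(self):
        # A uniform field has zero gradient, so diffusion must leave it unchanged.
        uniform = [3.0] * 5
        self.assertEqual(diffuse(uniform, 0.25), uniform)

    def test_mass_is_conserved(self):
        # On a periodic grid the total amount should not change.
        field = [1.0, 4.0, 2.0, 0.5]
        after = diffuse(field, 0.25)
        self.assertTrue(math.isclose(sum(after), sum(field), rel_tol=1e-12))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ToyRoutineTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The conservation test uses a tolerance rather than exact equality because floating-point summation order can introduce round-off; the tolerance itself is one way of measuring the degree of conservation.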
Functional tests take a given sub-set of the system and test this set for a particular functionality. Scientific functional tests are common. For example, the Column Radiation Model (CRM) is used to check the radiation part of the atmospheric model. Quite often, scientific functional tests are implemented as name-list options to component models; for example, the atmospheric model has a name-list item to test dynamics only by turning the physics off. Functional tests could also be created for infrastructure issues such as parallel decomposition or the handling of input or output data. Important functional tests should be maintained in CVS as separate modules that include the directories maintained for the main component model.
System tests for a given component model need to ensure that the model compiles, builds, and runs, and that it passes important model requirements. For example, most models require that restarted simulations give results that are bit-for-bit identical to those of continuous simulations. This requirement can be tested fairly easily.
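A restart test of this kind can be sketched with a toy model. Everything below is an assumption made for illustration: step, run, write_restart, and read_restart are hypothetical stand-ins for a component model's time-stepping and restart I/O, and pickle is used only because it stores floating-point values exactly.

```python
import pickle


def step(state):
    """Hypothetical model step: advance a tiny nonlinear system by one time step."""
    x, y = state["x"], state["y"]
    return {
        "x": x + 0.01 * (y - x * x),
        "y": y + 0.01 * (x - 0.5 * y),
        "nstep": state["nstep"] + 1,
    }


def run(state, nsteps):
    """Advance the model nsteps times and return the final state."""
    for _ in range(nsteps):
        state = step(state)
    return state


def write_restart(state):
    """Serialize the full model state (stand-in for writing a restart file)."""
    return pickle.dumps(state)


def read_restart(blob):
    """Restore the model state (stand-in for reading a restart file)."""
    return pickle.loads(blob)


def bits(state):
    """Exact representation of the state, so the comparison is bit-for-bit."""
    return (state["x"].hex(), state["y"].hex(), state["nstep"])


initial = {"x": 0.3, "y": 1.7, "nstep": 0}

# Continuous 20-step run versus 10 steps, a restart, and 10 more steps.
continuous = run(initial, 20)
first_half = run(initial, 10)
restarted = run(read_restart(write_restart(first_half)), 10)

restart_is_bit_for_bit = bits(restarted) == bits(continuous)
print("restart test:", "PASS" if restart_is_bit_for_bit else "FAIL")
```

Comparing the hexadecimal form of each value, rather than using a tolerance, is what makes the check bit-for-bit: any state that the restart fails to save, or saves at reduced precision, would show up as a mismatch.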
Unit-tests for component models should meet the following minimum requirements.
Component model system tests should meet the following minimum requirements.
Formal reviews in which the code is gone through line-by-line, in groups or in pairs, have been shown to be one of the most effective ways to catch errors (McConnell-1993[4]). As such, it is recommended that component model development teams create a strategy for regularly reviewing the code.
Code reviews can be implemented in different ways.
It is recommended that development teams provide a mechanism to review incremental changes and also hold formal walk-throughs of important pieces of code as a group. Walk-throughs serve two purposes: the design is communicated to a larger group, and the design and implementation are also reviewed by the entire group.
By adopting quality assurance techniques, CCSM model codes can be of greater quality, development time can be lowered, and machine time can be cut by decreasing errors.