12 Component Model Testing, Unit-Testing, Code Reviews

Complex software such as the CCSM requires extensive testing in order to prevent model defects and to provide stable, solid models to work with. Layered testing has been shown to be the most effective approach to catching software defects. Layered testing refers to testing on different levels: testing individual subroutines as well as more complex systems. There may also be several layers of progressively more complex systems to test. Testing the individual component models stand-alone is an example of a system less complex than the entire CCSM. Unit-testing, the testing of individual subroutines or modules, is the first layer. Unit-testing by itself will not catch defects that depend on relationships between different modules, but testing the entire system will sometimes not catch errors within an individual module. That is why using both extremes is useful in catching model defects. Testing for the entire CCSM modeling system is covered in a separate section; this section goes over testing of individual model components and unit-testing of subroutines and modules within those components.

Another way to help eliminate code errors is periodic code-reviews. Code-reviews can be implemented in many different fashions, but in general they involve having at least one person besides the author go through the written code and examine the implementation both for design and for errors. Jones-1986[3] states that ``the average defect-detection rate is only 25 percent for unit testing, 35 percent for function testing, and 45 percent for integration testing. In contrast, the average effectiveness of design and code inspections are 55 percent and 60 percent.'' McConnell-1993[4] also notes that, as well as being more effective in catching errors, code-reviews catch different types of errors than testing does. In addition, when developers realize their code will be reviewed, they tend to be more careful themselves when programming.

Since the CCSM and the component models take substantial computer resources to run, catching errors early can cut computing costs significantly. In addition, as pointed out by McConnell-1993[4], development time decreases dramatically when formal quality assurance methods, including code-reviews, are implemented.

12.1 Component Model Testing

Each component model needs to develop and maintain its own suite of tests. It is recommended that each component model's development team analyze the kinds of testing required for its model and write the result down in a formal testing-plan. Creating automated tests that run a suite of standard tests is also useful to ensure the models work and continue to work as needed. This is especially useful for making sure models continue to work on multiple platforms. McConnell-1996[5] refers to this as the daily ``build and smoke test'': every day you build and run your code to ensure it continues to work and doesn't just sit there and ``smoke''.
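As an illustration of the automated ``build and smoke test'' idea, the following is a minimal sketch in Python, not part of the CCSM tooling. The function name `run_smoke_suite` and the step list `EXAMPLE_STEPS` are hypothetical; the placeholder commands stand in for whatever build and short-run commands a given component model actually uses.

```python
import subprocess
import sys


def run_smoke_suite(steps):
    """Run each (name, command) step in order and stop at the first failure.

    Returns a list of (name, passed) tuples for the steps that were attempted.
    """
    results = []
    for name, command in steps:
        completed = subprocess.run(command, capture_output=True, text=True)
        passed = completed.returncode == 0
        results.append((name, passed))
        print(f"{'PASS' if passed else 'FAIL'}: {name}")
        if not passed:
            # Print the tail of stderr so the failure can be diagnosed from the log.
            print(completed.stderr.strip()[-500:])
            break
    return results


# Hypothetical stand-ins: replace these with the model's real build command
# and a short test run. They are assumptions made for illustration only.
EXAMPLE_STEPS = [
    ("build", [sys.executable, "-c", "print('build ok')"]),
    ("short run", [sys.executable, "-c", "print('run ok')"]),
]
```

Calling `run_smoke_suite(EXAMPLE_STEPS)` from a nightly scheduled job on each supported platform would give the daily check described above. Stopping at the first failure is a design choice: a model that fails to build cannot produce a meaningful run.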

12.1.1 Designing Good Tests

In order to design a comprehensive testing plan we want to take advantage of the following types of tests.

unit-testing: Testing done on a single subroutine or module.
functional-testing: Testing for a given functional group of subroutines or modules, for example, testing model dynamics alone without the model physics.
system-testing: Testing done on the whole system.

12.1.2 Unit-tests

Unit-tests are a good way to flush out certain types of defects. Since unit-tests only run on one subroutine, they are easier to use, are faster to build and run, allow more comprehensive testing on a wider range of input data, help document how to use the routine and check for valid answers, and allow faster testing of individual pieces. When unit-tests are built and maintained, the same tests can be run and used by other developers as part of a more comprehensive testing package. Without maintained unit-tests, developers often do less testing than required, since system tests are so much harder to do, or they have to ``hack'' together their own unit-tests for each change. By maintaining unit-tests we allow others to leverage previous work and provide a format to quickly do extensive checking.

Good unit-tests will do the following:

Check applicable requirements.
Exercise every line of code.
Check that the full range of possible input data works (e.g. if temperature is an input, check that values near both the minimum and maximum possible values work).
Perform boundary analysis: check that logical statements that refer to threshold states are correct.
Check for bad input data.
Test for scientific validity.

By analyzing the code to be tested, different test cases can be designed to ensure that all logical statements are exercised in the unit-test. Similarly, input can be designed to test logical threshold states (boundary analysis). Testing scientific validity is of course the most difficult. However, testing states where the answer is known analytically can sometimes be useful. Ensuring (or measuring) the degree to which energy, heat, or mass is conserved for conservative processes can also often be done. These types of tests may be applied to more complex functional and system tests as well.
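The checklist above can be sketched as a small unit-test file. This is a minimal Python illustration under stated assumptions, not CCSM code: the routines `precip_phase` and `diffuse`, the threshold `FREEZING_K`, and the valid range `T_MIN_K` to `T_MAX_K` are all hypothetical and exist only to show range checks, boundary analysis, bad-input checks, a known-answer test, and a conservation check.

```python
import math
import unittest

FREEZING_K = 273.15               # threshold used by the hypothetical routine
T_MIN_K, T_MAX_K = 150.0, 350.0   # assumed valid input range (illustration only)


def precip_phase(temp_k):
    """Hypothetical routine under test: classify precipitation by temperature."""
    if math.isnan(temp_k) or not (T_MIN_K <= temp_k <= T_MAX_K):
        raise ValueError(f"temperature out of range: {temp_k}")
    return "snow" if temp_k < FREEZING_K else "rain"


def diffuse(mass, rate=0.25):
    """Hypothetical conservative process: exchange mass between neighbouring cells."""
    new = list(mass)
    for i in range(len(mass) - 1):
        flux = rate * (mass[i] - mass[i + 1])
        new[i] -= flux       # what leaves cell i ...
        new[i + 1] += flux   # ... arrives in cell i + 1
    return new


class PrecipPhaseTest(unittest.TestCase):
    def test_full_input_range(self):
        # Values at both ends of the valid range must be accepted.
        self.assertEqual(precip_phase(T_MIN_K), "snow")
        self.assertEqual(precip_phase(T_MAX_K), "rain")

    def test_threshold_boundary(self):
        # Boundary analysis: just below and exactly at the threshold.
        self.assertEqual(precip_phase(FREEZING_K - 1.0e-9), "snow")
        self.assertEqual(precip_phase(FREEZING_K), "rain")

    def test_bad_input_is_rejected(self):
        for bad in (-1.0, 1.0e6, float("nan")):
            with self.assertRaises(ValueError):
                precip_phase(bad)


class DiffuseTest(unittest.TestCase):
    def test_known_answer(self):
        # A uniform field is a steady state, so the answer is known exactly.
        self.assertEqual(diffuse([2.0, 2.0, 2.0]), [2.0, 2.0, 2.0])

    def test_mass_is_conserved(self):
        before = [4.0, 1.0, 0.5, 2.5]
        after = diffuse(before)
        self.assertTrue(
            math.isclose(math.fsum(after), math.fsum(before), rel_tol=1.0e-12)
        )
```

Saved as a file, the tests can be run with `python -m unittest <file>`. The conservation test measures the total to a tolerance rather than demanding exact equality, since floating-point round-off can change the last bits of a sum.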

12.1.3 Functional-tests

Functional tests take a given sub-set of the system and test this set for a particular functionality. Scientific functional tests are common. For example, the Column Radiation Model (CRM) is used to check the radiation part of the atmospheric model. Quite often scientific functional tests are implemented as name-list options to component models; for example, the atmospheric model has a name-list item to test dynamics only by turning the physics off. Functional tests could also be created for infrastructure issues such as parallel decomposition or handling of input or output data. Important functional tests should be maintained in CVS as separate modules that include the directories maintained for the main component model.

12.1.4 System-tests

System tests for a given component model need to ensure that the model compiles, builds, and runs, and that it passes important model requirements. For example, most models require that restarted simulations give results that are bit-for-bit identical to those of continuous simulations. This requirement can be tested fairly easily.
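The restart requirement can be illustrated with a toy model. The following Python sketch is an assumption-laden illustration, not a CCSM test: `step`, `run`, `write_restart`, `read_restart`, and `restart_is_bit_for_bit` are hypothetical names, and the arithmetic in `step` is arbitrary. The point is the shape of the test: run continuously, run again with a save and restore in the middle, and compare the raw bytes of the final states.

```python
import struct


def step(state):
    """Advance a hypothetical model state by one time step (toy arithmetic)."""
    return [0.5 * x + 0.1 * (i + 1) for i, x in enumerate(state)]


def run(state, nsteps):
    """Advance the state nsteps times; the input list is not modified."""
    for _ in range(nsteps):
        state = step(state)
    return state


def write_restart(state):
    """Serialize the state as raw 64-bit floats so no precision is lost."""
    return struct.pack(f"<{len(state)}d", *state)


def read_restart(blob):
    """Rebuild the state from the bytes produced by write_restart."""
    return list(struct.unpack(f"<{len(blob) // 8}d", blob))


def restart_is_bit_for_bit(initial, nsteps, restart_at):
    """Compare a continuous run with a run that is restarted part-way through."""
    continuous = run(initial, nsteps)
    restart_blob = write_restart(run(initial, restart_at))
    restarted = run(read_restart(restart_blob), nsteps - restart_at)
    # Compare bytes rather than values so the check is truly bit-for-bit.
    return write_restart(continuous) == write_restart(restarted)
```

For example, `restart_is_bit_for_bit([1.0, 2.0, 3.0], nsteps=10, restart_at=4)` compares a 10-step run against a 4-step run followed by a restart and 6 more steps. For a real component model the same comparison would be made between the output or restart files of the two runs.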

12.1.5 CCSM Testing requirements and implementation details

Unit-tests for component models should meet the following minimum requirements.

Be maintained in CVS, either with the rest of the model's source code or as a separate module. The module and/or directory name should be easily identifiable, such as `unit_testers'.
Have documentation on how to use it.
Check error conditions so that any problem is reported with a printed error message.
Prompt for any input in a useful way (so you don't have to read the code to figure out that you have to enter something).
Have a Makefile associated with it. It may be useful to leverage off the main Makefile so that the compiler options are the same and so that platform dependencies don't have to be maintained twice.
In general, unit-tests should be run with as many compiler debug options on as possible (bounds checking, signal trapping, etc.).

Component model system tests should meet the following minimum requirements.

Ensure that the given model will compile, build and run on at least one production platform.
Ensure that the given model will work with the CCSM system on at least one production platform.

12.2 Code-Reviews

Formal reviews, in which the code is gone through line-by-line in groups or in pairs, have been shown to be one of the most effective ways to catch errors (McConnell-1993[4]). As such, it is recommended that component model development teams create a strategy for regularly reviewing the code.

12.2.1 Strategies for Implementation of Code-Reviews

Code reviews can be implemented in different ways.

Code librarian - Before code is checked into CVS it goes through a ``librarian'' who is responsible not only for testing and validating the changes, but also for reviewing them for design and adherence to code standards.
Peer reviews - Before code is checked into CVS a peer developer reviews the changes.
Pair programming - All code is developed with two people looking at the same screen (one of the practices of Extreme Programming [2]).
Formal configuration management - All code modifications are presented to a configuration management team that extensively reviews and tests changes and incorporates them as dictated by project management.
Formal group walk-through - Code is presented and gone through by an entire group.
Formal individual walk-through - Different individuals are assigned and take responsibility to review different subroutines.

It is recommended that development teams provide a mechanism to review incremental changes and also hold formal walk-throughs of important pieces of code in a group. The group walk-through serves two purposes: the design is communicated to a larger group, and the design and implementation are reviewed by the entire group.

By adopting quality assurance techniques, CCSM model codes can be of greater quality, development time can be lowered, and machine time can be cut by decreasing errors.

