Debugging Tests That Fail

This section describes steps for identifying why a test failed. The primary information on reviewing and debugging a run is in the Section called Troubleshooting runtime problems in Chapter 8.

First, verify that the test case is no longer in the batch queue. Once it has left the queue, review the possible test results and compare them with the result in the TestStatus file. Next, review the TestStatus.out file for any additional information about what the test did. Finally, go to the troubleshooting section and work through the various log files.
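The file checks above can be sketched in Python. This is a minimal illustration, not part of the test scripts. The function names parse_test_status and summarize_test are hypothetical, and the sketch assumes the status is the first whitespace-delimited token on the first non-empty line of TestStatus. Adjust it if your TestStatus file is laid out differently.

```python
from pathlib import Path


def parse_test_status(text):
    """Return the status token from the contents of a TestStatus file.

    Assumption: the status is the first whitespace-delimited token on the
    first non-empty line (for example "PASS", "FAIL", or "RUN").
    """
    for line in text.splitlines():
        tokens = line.split()
        if tokens:
            return tokens[0]
    return None  # empty file: no status recorded


def summarize_test(test_dir):
    """Print the status and any extra detail found in a test directory."""
    test_dir = Path(test_dir)
    status = parse_test_status((test_dir / "TestStatus").read_text())
    print(f"status: {status}")

    # TestStatus.out may hold additional information about what the test did.
    details = test_dir / "TestStatus.out"
    if details.exists():
        print(details.read_text())
    else:
        print("no TestStatus.out file found")
```

Calling summarize_test on a test case directory prints the recorded status followed by the contents of TestStatus.out, if that file is present. Checking the batch queue is left out because the command depends on the machine's scheduler.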

There are a few other things to mention. If the TestStatus file contains "RUN" but the job is no longer in the queue, the job most likely either timed out because it exceeded its specified wall clock time, or it hung or exited abnormally due to a run-time error. Check the batch log files to see whether the job was killed due to a time limit. If it was, increase the time limit and either resubmit the job, or generate a new test case and update the time limit before submitting it.
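The reasoning for a "RUN" status can be written as a small decision function. The sketch below is illustrative only. The names TIME_LIMIT_PATTERNS, find_time_limit_lines, and triage are hypothetical, and the search phrases are assumptions based on messages that some batch schedulers write when a job exceeds its time limit. They will need adjusting for your machine, and a phrase such as "walltime" may also match harmless lines, so treat any match as a hint to read the log rather than as proof.

```python
# Assumed, scheduler-specific phrases; adjust these for your batch system.
TIME_LIMIT_PATTERNS = ("TIME LIMIT", "walltime", "TERM_RUNLIMIT")


def find_time_limit_lines(log_text, patterns=TIME_LIMIT_PATTERNS):
    """Return the batch log lines that contain any pattern, ignoring case."""
    lowered = [p.lower() for p in patterns]
    return [
        line
        for line in log_text.splitlines()
        if any(p in line.lower() for p in lowered)
    ]


def triage(status, in_queue, log_text=""):
    """Suggest a next step from the TestStatus value and the queue state.

    in_queue must be determined by the caller, since the queue query
    command differs from machine to machine.
    """
    if status != "RUN":
        return "compare the status with the list of possible test results"
    if in_queue:
        return "job is still queued or running; wait for it to finish"
    if find_time_limit_lines(log_text):
        return "likely timed out; increase the wall clock limit and resubmit"
    return "job hung or exited abnormally; check the run log files"
```

For example, triage("RUN", False, log_text) with a batch log that mentions a time limit points toward increasing the wall clock limit, while the same call with no matching lines points toward the run log files.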

Also, a test case can fail either because the job didn't run properly or because the test conditions (e.g. exact restart) weren't met. Try to determine which of the two applies. If a test is failing early in a run, it's usually best to set up a standalone case with the same configuration and debug the problem there. If the run completes but the test conditions (e.g. exact restart) are not met, the model must be debugged in the context of those test conditions.

Not all tests will pass for all model configurations. For more information, please check the known problems page for this release to find out which machines have problems with which compsets and/or resolutions.