Debugging Failed Tests

This section describes steps for identifying why a test failed. The primary information on reviewing and debugging a run is in the Section called Troubleshooting runtime problems in Chapter 10.

First, verify that the test case is no longer in the batch queue. Once it has left the queue, review the possible test results and compare them to the result in the TestStatus file. Next, review the TestStatus.out file for any additional information about what the test did. Finally, go to the troubleshooting section and work through the various log files.
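The comparison step above can be partly automated. The following is a minimal sketch, not part of the test scripts themselves. It assumes that each non-empty line of TestStatus starts with a status token followed by the test name, and that "PASS" marks a successful result. The function names and the sample test name are hypothetical.

```python
def parse_test_status(text):
    """Split TestStatus contents into (status, test_name) pairs.

    Assumption: each non-empty line has the form "<STATUS> <test name>".
    """
    results = []
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if not parts:
            continue  # skip blank lines
        status = parts[0]
        name = parts[1].strip() if len(parts) > 1 else ""
        results.append((status, name))
    return results


def needs_attention(results):
    """Return the entries whose status is anything other than PASS.

    Assumption: "PASS" is the token that marks a successful result.
    """
    return [(status, name) for status, name in results if status != "PASS"]


# Hypothetical file contents, for illustration only.
sample = "PASS ERS.example_test\nRUN ERS.example_test.compare\n"
for status, name in needs_attention(parse_test_status(sample)):
    print(f"{name}: {status}")
```

To use this on a real test, read the TestStatus file in the test case directory into a string and pass it to parse_test_status. Any entry that is reported should then be checked against TestStatus.out and the log files.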

Two further points are worth noting. If the TestStatus file contains "RUN" but the job is no longer in the queue, the job may have timed out because the wall clock limit on the batch submission was too short, or it may have hung due to a run-time error. Check the batch log files to see whether the job was killed due to a time limit. If it was, increase the time limit and either resubmit the job or generate a new test case and update its time limit before submitting it.
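The "RUN" check described above can be expressed as a small decision helper. This is a sketch under stated assumptions: the function name and return labels are hypothetical, and the time-limit patterns are typical of messages written by Slurm, PBS, and LSF but may not match the batch system on a given machine. Adjust them to what the local batch log actually contains.

```python
import re

# Assumed time-limit messages; these vary by scheduler and site.
TIME_LIMIT_PATTERNS = [
    re.compile(r"DUE TO TIME LIMIT", re.IGNORECASE),   # typical of Slurm
    re.compile(r"walltime.*exceeded", re.IGNORECASE),  # typical of PBS
    re.compile(r"TERM_RUNLIMIT", re.IGNORECASE),       # typical of LSF
]


def triage_run_status(status, in_queue, batch_log_text):
    """Suggest a next step for a test whose TestStatus result is `status`.

    status         -- result token read from TestStatus, e.g. "RUN"
    in_queue       -- True if the job is still in the batch queue
    batch_log_text -- contents of the batch log file as a string
    """
    if status != "RUN":
        return "not-applicable"
    if in_queue:
        return "wait-for-job"
    if any(p.search(batch_log_text) for p in TIME_LIMIT_PATTERNS):
        return "increase-time-limit"
    return "check-logs-for-runtime-error"
```

A result of "increase-time-limit" corresponds to raising the wall clock limit and resubmitting. A result of "check-logs-for-runtime-error" means no time-limit message was found, so the remaining log files should be reviewed.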

A test case can also fail either because the job didn't run properly or because the test conditions (e.g. exact restart) weren't met. Try to determine which of the two applies. If a test fails early in a run, it's usually best to set up a standalone case with the same configuration to debug the problem. If the run completes normally but the test conditions (e.g. exact restart) are not met, the model must be debugged in the context of those test conditions.

Not all tests will pass for all model configurations. Some of the issues we are aware of are