Next: 4.4 Porting the model Up: 4. Simple Code Modifications Previous: 4.2 Adding New Output   Contents

4.3 Trouble-Shooting Model Changes

If the cause of abnormal termination is unclear, the user should first rerun the model single-threaded with SPMD off, because abnormal termination in a multi-tasked job can produce confusing ancillary error messages.

We address several possible causes of model failure. Resource allocation errors will be addressed first, followed by remedies for suspected coding errors. Finally, analysis tools are described for physics formulation errors (i.e., where there is an error in modifications to a prognostic variable calculation).

4.3.1 Resource Allocation Errors

On Linux systems, the default stack size is sometimes too small for higher-resolution runs or for runs on multiple processors. When this happens, the model usually fails with a segmentation fault, and the user should try increasing the stack size. The stack size can be set to its maximum with the limit command (a csh/tcsh built-in). Typing limit alone prints the system resource limits. To set the stack size to its maximum, type limit stacksize unlimited.
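For users who launch the model from a wrapper script rather than directly from the shell, the same check can be sketched in Python with the standard resource module (Unix only). This is a minimal sketch, not part of the model distribution: the function names describe and raise_stack_limit are hypothetical, and the change only affects the Python process itself and any processes it starts afterwards.

```python
import resource


def describe(value):
    """Render an rlimit value, mapping RLIM_INFINITY to 'unlimited'."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value // 1024} kB"


def raise_stack_limit():
    """Raise the soft stack limit to the hard limit.

    Returns (old_soft, new_soft). If the system refuses the change,
    the limit is left as it was and both values are equal.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    if soft != hard:
        try:
            resource.setrlimit(resource.RLIMIT_STACK, (hard, hard))
        except (ValueError, OSError):
            pass  # some systems reject the request; keep the old limit
    new_soft, _ = resource.getrlimit(resource.RLIMIT_STACK)
    return soft, new_soft


if __name__ == "__main__":
    old, new = raise_stack_limit()
    print("stack size before:", describe(old))
    print("stack size after: ", describe(new))
```

A process can raise its soft limit only as far as the hard limit, so if the hard limit itself is too small, it has to be raised by the system administrator.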

When running the message-passing code on multiple processors, it is necessary to place the limit command in the user's shell startup script. Because the message-passing software usually starts new processes, the user must make sure that these processes have the larger stack size when they are started by MPI. An easy way to confirm that new shells have the larger stack size is to execute the command rsh machine limit (where machine is the name of a computer on which to start the remote shell).

Once the stack size has been increased, try running the model again. If the failure was caused by a stack size that was too small, the model should now run to completion.

4.3.2 Coding Errors

For debugging purposes only, we suggest that statically allocated memory locations and/or stack space be initialized to "indefinite". Furthermore, array bounds checking should be turned on if possible. The standard Makefile achieves this if configure is invoked with the -debug option.
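The idea behind initializing memory to "indefinite" can be illustrated with a small Python analogue. This is only a sketch of the principle, not the mechanism used by the Fortran compiler or the Makefile: the helper names make_field and unset_indices are hypothetical, and NaN stands in for the "indefinite" value. A field pre-filled with NaN exposes any element the code forgot to assign, whereas a field pre-filled with zero would silently yield a plausible-looking result.

```python
import math


def make_field(n):
    """Allocate a field of n values pre-filled with NaN instead of zero."""
    return [math.nan] * n


def unset_indices(field):
    """Return the indices of elements that were never assigned."""
    return [i for i, value in enumerate(field) if math.isnan(value)]


field = make_field(5)
for i in range(4):  # deliberate bug: the last element is never assigned
    field[i] = 273.15 + i

print("unassigned elements:", unset_indices(field))
# Any arithmetic that touches the unassigned element also becomes NaN,
# so the error cannot hide in a plausible-looking total.
print("sum is NaN:", math.isnan(sum(field)))
```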

If the model runs but produces incorrect or suspicious history files, a quick and easy-to-use diagnostic program, cprnc, is provided in the directory models/atm/cam/tools. This program performs a statistical analysis of differences in history file data. No command line arguments are required. cprnc compares fields of the same name in each file and prints statistics on the number of differences found, the location and magnitude of the worst absolute difference, the location and magnitude of the worst relative difference, the RMS difference, the maximum and minimum field values, and the average field values.
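To make the listed statistics concrete, the following Python sketch computes the same kinds of quantities for two fields held as flat lists of numbers. It is not the cprnc source and does not read history files: the function name compare_fields and the dictionary keys are hypothetical, and the relative difference is assumed here to be |a - b| / max(|a|, |b|), which may differ from the definition cprnc uses.

```python
import math


def compare_fields(a, b):
    """Compare two equally sized, non-empty sequences of field values."""
    if len(a) != len(b) or not a:
        raise ValueError("fields must be non-empty and the same size")

    num_diffs = 0
    worst_abs = (0.0, None)  # (magnitude, index)
    worst_rel = (0.0, None)  # (magnitude, index)
    sum_sq = 0.0

    for i, (x, y) in enumerate(zip(a, b)):
        diff = abs(x - y)
        if diff > 0.0:
            num_diffs += 1
        sum_sq += diff * diff
        if diff > worst_abs[0]:
            worst_abs = (diff, i)
        # Assumed definition of relative difference (see lead-in).
        denom = max(abs(x), abs(y))
        rel = diff / denom if denom > 0.0 else 0.0
        if rel > worst_rel[0]:
            worst_rel = (rel, i)

    return {
        "num_diffs": num_diffs,
        "worst_abs": worst_abs,
        "worst_rel": worst_rel,
        "rms": math.sqrt(sum_sq / len(a)),
        "max": (max(a), max(b)),
        "min": (min(a), min(b)),
        "mean": (sum(a) / len(a), sum(b) / len(b)),
    }


if __name__ == "__main__":
    stats = compare_fields([1.0, 2.0, 4.0], [1.0, 2.5, 3.0])
    for name, value in stats.items():
        print(f"{name:10s} {value}")
```

Reporting the location together with the magnitude of the worst difference is what makes such output useful for debugging, since it points to the grid point where two runs diverge most.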


Jim McCaa 2004-10-22