Chapter 7. Porting CESM

Table of Contents
Porting to a new machine
Port Validation

One of the first steps many users face is getting the CESM model running on their local machine. This section describes two different ways of going about that. The first is to set up a case using a generic machine, get that case running, and then back the local changes out into a new set of machine settings. The second is to create new machine settings directly, create a case, test it, and iterate on the machine settings. There is considerable similarity and overlap between the two methods. The generic method is likely to produce a running case faster, but eventually users will want to set up the CESM scripts so their local machine is supported out-of-the-box; this greatly eases setting up cases and benefits groups of users by requiring that the port be done only once. Finally, some steps to validate the model are recommended.

Note: When porting using either of the two methods described above, you will want to initially get a dead ("X") compset running at a low resolution. You could, for instance, start with an X compset at resolution f45_g37; this allows you to determine whether all the prerequisite software is in place and working. Once that is working, move to an A compset at resolution f45_g37. Once that works, run a B compset at resolution f45_g37. Finally, when all the previous steps have run correctly, run your target compset and resolution.
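
For illustration, assuming a hypothetical machine name mymachine that is already supported by the scripts (the case names here are also placeholders, and a generic machine would additionally require the extra command line arguments described in the next section), the progression might look like

> cd cesm/scripts
> create_newcase -case port_x -res f45_g37 -compset X -mach mymachine
> create_newcase -case port_a -res f45_g37 -compset A -mach mymachine
> create_newcase -case port_b -res f45_g37 -compset B -mach mymachine

Each case should be configured, built, and run successfully before moving on to the next.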

Porting to a new machine

Porting using a generic machine

This section describes how to set up a case using a generic machine name and then, within that case, how to modify the scripts to get it running on a local machine. In this section, the case name test1 and the generic machine generic_linux_intel will be used in the example, but the specific case name, generic machine, resolution, and compset to test are at the discretion of the user.

  1. Run create_newcase choosing a generic machine name that is closest to the local machine type. Typing

    
> create_newcase -l
    
    will provide a list of possible machines. The generic machines start with the name "generic_". The generic machines are different from the supported machines because extra inline documentation is provided and the user will have to modify some of the resolved scripts.

    Additional command line arguments are required for the generic machines to help set up some of the local environment variables. Typing

    
> create_newcase -h
    
    provides a description of the command line arguments. The create_newcase command will look something like this for a generic machine
    
> cd cesm/scripts
    > create_newcase -case test1 \
                     -res f19_g16 \
                     -compset X \
                     -mach generic_linux_intel \
                     -scratchroot /ptmp/username \
                     -din_loc_root_csmdata /home/ccsm/inputdata \
                     -max_tasks_per_node 8
    

  2. Run configure.

    
> cd test1
    > configure -case
    
    If there are errors at this step, the best approach might be to port starting from the machine files instead of a generic machine. See the Section called Porting via user defined machine files.

  3. Edit the scripts to be consistent with the local machine. Search for "GENERIC_USER" in the scripts. That tag will highlight inline documentation and areas that will likely need to be modified. In particular, modifications will be needed in the following files.

    • env_mach_specific is where modules, paths, or machine environment variables need to be set. See the "GENERIC_USER" inline documentation in that file.

    • Macros.generic_linux_intel is the gmake Macros file for the system. In general, the entire file should be reviewed, but pay particular attention to the comments about setting the paths for the netcdf and mpi external libraries. See the "GENERIC_USER" inline documentation in that file. In general, NETCDF_PATH and MPICH_PATH need to be set. They can be set in the Macros file, but they could also be set in the default user paths, by an explicit addition to the local path in the env_mach_specific file, or by setting the NETCDF_PATH and MPICH_PATH environment variables in the env_mach_specific file (see the sketch after this list). If you want the value in the Macros file to always be used, you may need to comment out the if statement that checks whether the variable is set elsewhere before overriding it with a hardwired value. While CESM supports the use of pnetcdf in pio, it is generally best to ignore that feature during initial porting; pio works well with standard netcdf.

    • test1.generic_linux_intel.run is the job submission script. Modifications are needed there to address the local batch environment and the job launch. See the "GENERIC_USER" inline documentation in that file.
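
    As a concrete illustration of the env_mach_specific, NETCDF_PATH, and MPICH_PATH comments above, a minimal sketch of possible env_mach_specific additions is shown below (csh syntax). The module names and library paths are assumptions only and must be replaced with whatever the local system actually provides; values set here are only picked up if the Macros file is written to use them.

    
# Hypothetical env_mach_specific additions; module names and paths are
    # site-specific examples, not values shipped with CESM.
    module load intel
    module load netcdf
    setenv NETCDF_PATH /usr/local/netcdf
    setenv MPICH_PATH  /usr/local/mpich2
    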

  4. Build the case

    
> ./test1.generic_linux_intel.build
    
    This step will often fail if the compiler paths, compiler versions, compiler options, library paths, or machine environment variables are not set properly. Review and edit the env_mach_specific and Macros.generic_linux_intel files, clean the build,
    
> ./test1.generic_linux_intel.clean_build
    
    and try rebuilding again.

  5. Run the job using the local job submission command. qsub is used here as an example.

    
> qsub test1.generic_linux_intel.run
    
    The job will fail to submit if the batch commands are not set properly. The job could also fail to run if the launch command is incorrect or if the batch commands are not consistent with the job's resource needs. Review the run script and try resubmitting.
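
    To make this concrete, the sketch below shows the kind of batch directives that might appear at the top of the run script on a PBS-based cluster. PBS itself is an assumption here, and the job name, queue, node count, tasks per node, and walltime are placeholders that must match the local batch system and the case's processor layout.

    
#PBS -N test1
    #PBS -q batch
    #PBS -l nodes=4:ppn=8
    #PBS -l walltime=01:00:00
    #PBS -j oe
    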

  6. Once a case is running, the local setup for the case can be converted into a specific set of machine files, so that future cases can be set up using the user defined machine name rather than the generic machine and should be able to run out-of-the-box. This step is very similar to the steps associated with porting using user defined machine files; see the Section called Porting via user defined machine files.

    Basically, files in cesm/scripts/ccsm_utils/Machines will be added or modified to support the user defined machine out-of-the-box. An env_machopts, Macros, and mkbatch file will be added and the config_machines.xml file will be modified. First, pick a name that will be associated with the local machine. Generally, that's the name of the local machine, but it could be anything. The name bugsbunny will be used in the description that follows, and the bugsbunny setup will be based on the test1 example case above, which is assumed to be running on bugsbunny. To add bugsbunny to the list of supported machines, do the following

    • Edit cesm/scripts/ccsm_utils/Machines/config_machines.xml. Add an entry for bugsbunny. A good idea is to copy one of the existing entries and then edit it. The machine specific env variables that need to be set in config_machines.xml for bugsbunny are already set in the env files in the test1 case directory that was created from the generic machine. Those values can be translated directly into the config_machines.xml entry for bugsbunny, at least as a starting point. In some cases, variables might need to be made more general. For instance, the user name of the person doing the port and the name of the initial test case should not appear in the variable definitions.

    • Copy the env_mach_specific file from the test1 case directory to cesm/scripts/ccsm_utils/Machines as follows

      
> cd cesm/scripts/test1
      > cp env_mach_specific ../ccsm_utils/Machines/env_machopts.bugsbunny
      

    • Copy the Macros file from the test1 case directory to cesm/scripts/ccsm_utils/Machines as follows

      
> cd cesm/scripts/test1
      > cp Macros.generic_linux_intel  ../ccsm_utils/Machines/Macros.bugsbunny
      
      Then edit the cesm/scripts/ccsm_utils/Machines/Macros.bugsbunny file and delete everything up to the lines
      
#===============================================================================
      # The following always need to be set
      
      That first section of the Macros file is added automatically when a case is configured, so it should not be included in the machine specific settings.

    • Create a mkbatch.bugsbunny file in cesm/scripts/ccsm_utils/Machines. The easiest way to do this is probably to copy the mkbatch.generic_linux_intel file from that directory to mkbatch.bugsbunny

      
> cd cesm/scripts/ccsm_utils/Machines
      > cp mkbatch.generic_linux_intel mkbatch.bugsbunny
      
      Then edit mkbatch.bugsbunny to match the changes made in the test1.generic_linux_intel.run file in the test1 case. Remove the GENERIC_USER inline documentation and, where that documentation existed, update the batch commands and job launch commands to be consistent with the test1 run script. The first part of the mkbatch script computes values that can be used in the batch commands; it might require some extra iteration to get this working for all cases, processor counts, and processor layouts.

    • Test the new machine setup. Create a new case based on test1 using the bugsbunny machine setup

      
> cd cesm/scripts
      > create_newcase -case test1_bugsbunny \
                       -res f19_g16 \
                       -compset X \
                       -mach bugsbunny 
      
      Then configure, build, and run the case, and confirm that test1_bugsbunny runs fine and is consistent with the original test1 case. Once that works, test other configurations, then move on to port validation; see the Section called Port Validation.

Porting via user defined machine files

This section describes how to add support for a new machine using machine specific files. The basic approach is to add support for the new machine to the CESM scripts directly and then to test and iterate on that setup. Files in cesm/scripts/ccsm_utils/Machines will be added or modified to support the user defined machine out-of-the-box. An env_machopts, Macros, and mkbatch file will be added and the config_machines.xml file will be modified. First, pick a name that will be associated with the local machine. Generally, that's the name of the local machine, but it could be anything. wilycoyote will be used in the description to follow. It's also helpful to identify an existing supported machine that is similar to your machine to use as a starting point in porting. If the user defined machine is a linux cluster with an intel compiler, then after reviewing the current supported machines using


> cd cesm/scripts
> ./create_newcase -l

dublin_intel, hadley, or generic_linux_intel would be good candidates as starting points. Starting with a generic machine provides some additional inline documentation to aid in porting. If a generic machine is used, search for the tag "GENERIC_USER" in the scripts for additional documentation. In the example below, dublin_intel will be used as the starting point. To add wilycoyote to the list of supported machines, do the following

  • Edit cesm/scripts/ccsm_utils/Machines/config_machines.xml. Add an entry for wilycoyote. A good idea is to copy one of the existing entries and then edit the values for wilycoyote. You could start with the dublin_intel settings, although nearly any machine will do. There are several variable settings here; the definitions of these variables can be found in the appendices, see Appendix D, Appendix E, Appendix F, Appendix G, and Appendix H. Some of the important ones are MACH, which should be set to wilycoyote; EXEROOT, which should be set to a generic working directory like /tmp/scratch/$CCSMUSER/$CASE; DIN_LOC_ROOT_CSMDATA, which should be set to the path to the ccsm inputdata directory; BATCHQUERY and BATCHJOBS, which specify the query and submit command lines for batch jobs and are used to chain jobs together in production; and MAX_TASKS_PER_NODE, which sets the maximum number of tasks allowed on each hardware node.

  • Copy an env_machopts file to env_machopts.wilycoyote. Start with the dublin_intel file.

    
> cd cesm/scripts/ccsm_utils/Machines
    > cp env_machopts.dublin_intel env_machopts.wilycoyote
    
    Edit env_machopts.wilycoyote to change the environment setup, paths, modules, and environment variables to be consistent with wilycoyote.

  • Copy a Macros file to Macros.wilycoyote. Start with the dublin_intel file.

    
> cd cesm/scripts/ccsm_utils/Machines
    > cp Macros.dublin_intel Macros.wilycoyote
    
    Then review and edit the Macros.wilycoyote file as needed. Pay particular attention to the compiler name and the netcdf and mpi paths. While the compiler options for a given compiler are fairly consistent across machines, the way the compiler is invoked and the local paths for libraries are not. While CESM supports the use of pnetcdf in pio, it is generally best to ignore that feature during initial porting; pio works well with standard netcdf.

  • Copy a mkbatch file to mkbatch.wilycoyote. Start with the dublin_intel file.

    
> cd cesm/scripts/ccsm_utils/Machines
    > cp mkbatch.dublin_intel mkbatch.wilycoyote
    
    Then edit mkbatch.wilycoyote to be consistent with wilycoyote. In particular, the batch commands and the job launch will probably need to be changed. The batch commands and setup make up the first section of the script; the job launch can be found by searching for the string "CSM EXECUTION" (a hypothetical launch line is sketched at the end of this list).

  • After an initial pass is made to set up the new machine files, try creating, building, and running a case. Getting this to work will be an iterative process. Changes will probably be made both in the machine files in cesm/scripts/ccsm_utils/Machines for wilycoyote and in the case as testing proceeds. Whenever the machine files are updated, a new case should be set up. Whenever something is changed in the case scripts to fix a problem, that change should be migrated back to the wilycoyote machine files. In general, it's probably easiest to modify the machine files and create new cases until the case configures successfully. Once the case is configuring, it's often easiest to edit the case scripts to fix problems in the build and run. Once a case is running, those changes in the case need to be backed out into the wilycoyote machine files, and then those machine files can be tested with a new case.

    
> cd cesm/scripts
    > create_newcase -case test_wily1 \
                     -res f19_g16 \
                     -compset X \
                     -mach wilycoyote 
    > cd test_wily1
    > configure -case
    > ./test_wily1.wilycoyote.build
    > qsub test_wily1.wilycoyote.run
    
    Eventually, the machine files should work for any user and any configuration for wilycoyote.
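
As referenced in the mkbatch.wilycoyote item above, the job launch near the "CSM EXECUTION" string often reduces to a single MPI launch line. The sketch below is hypothetical: the launcher and its arguments, the task count, the executable name, and the log file redirection are all assumptions that must be matched to the local MPI installation and to what the generated run script actually expects.

# Hypothetical job launch (csh) near the "CSM EXECUTION" string.
# The launcher, task count, executable name, and log redirection are
# assumptions -- adjust them to the local system and the generated run script.
mpirun -np 64 ./ccsm.exe >&! ccsm.log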