Doing perturbation error growth tests

Perturbation error growth tests are a way to validate a port of the model to a new machine, or to verify that code changes affect the answers only at the roundoff level. The steps are the same in either case, but the discussion below assumes you are validating a port to a new machine (with reminders in parentheses that it could also be for code modifications). The basic idea is to run a case on the trusted machine (trusted code), run a second case with initial conditions perturbed by roundoff, and compare the results of the two. The difference between these two simulations (the error) grows over time and traces out a curve that we compare with the error growth on the new machine (code changes). The error growth on the new machine is the difference between the non-perturbed state on the trusted machine and the non-perturbed state on the new machine (code changes). If the new machine (code changes) is well-behaved, its error growth curve should be similar to the error growth curve from the perturbed run on the trusted machine. If it is NOT well-behaved, the differences caused by the new machine (code changes) will be larger than those caused by the roundoff perturbation. In summary, the simulations and steps that need to be performed are:

  1. Run a simulation with the trusted code on the trusted machine.

  2. Run a simulation with the trusted code on the trusted machine with initial conditions perturbed by roundoff (using a namelist item to do so).

  3. Run a non-perturbed simulation on the new machine (or with the code changes on the trusted machine).

  4. Do a plot of the RMS difference of TSOI between simulation 1 and simulation 2.

  5. Do a plot of the RMS difference of TSOI between simulation 1 and simulation 3.

  6. Compare the two plots in steps 4 and 5.

  7. If the plots compare well, the new machine (code changes) is running as well as the trusted machine.

  8. If the plots do NOT compare well, the new machine is NOT running as well as the trusted machine. Typically the recommendation here is to lower the optimization level on the new machine and try again (or, in the case of code changes, to modify or simplify the changes so that the results come out closer).
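The two curves compared in steps 4 through 6 can be written down explicitly. As a sketch, assume an unweighted RMS over the N values of TSOI at a given history time t (the exact definition cprnc uses, for example any weighting, may differ), and let x^T, x^{T'} and x^N denote TSOI from simulations 1, 2 and 3:

```latex
E_{\mathrm{pert}}(t) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\bigl(x_i^{T}(t) - x_i^{T'}(t)\bigr)^2},
\qquad
E_{\mathrm{new}}(t) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\bigl(x_i^{T}(t) - x_i^{N}(t)\bigr)^2}
```

The new machine (code changes) behaves well when E_new(t) tracks E_pert(t) over the three days instead of sitting well above it.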

Now we will give a detailed description of the procedure with examples and the exact steps to perform.

Using Perturbation Error Growth Analysis to Verify a Port to a New Machine

  1. Running non-perturbed on trusted machine

    The first step is to run a non-perturbed case on the trusted machine. All of the steps must use the same compset and the same resolution. For these examples we use 2-degree resolution (f19_g16) with the ICN compset for 2000 conditions. Run each case for three days from a cold start.

    Example 4-5. Example non-perturbed error growth simulation

    
    > cd scripts
    > create_newcase -case trustedMachinePergro0 -compset ICN -res f19_g16 \
    -mach bluefire
    > cd trustedMachinePergro0
    # Set the non-perturbed PERGRO use-case
    > xmlchange -file env_conf.xml -id CLM_NML_USE_CASE -val pergro0
    # Set coldstart on so arbitrary initial conditions will be used
    > xmlchange -file env_conf.xml -id CLM_FORCE_COLDSTART -val on
    # Set PERGRO on in the configure
    > $EDITOR env_conf.xml  # add "-pergro on" to CLM_CONFIG_OPTS
    # Now configure and build
    > configure -case
    > trustedMachinePergro0.build
    # Set it to run for three days and turn archiving off
    > xmlchange -file env_run.xml -id STOP_N -val 3 
    > xmlchange -file env_run.xml -id DOUT_S -val FALSE
    # Run the case, then save the history file output for later use
    > bsub < trustedMachinePergro0.run
    
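Because archiving is turned off, the history file should remain in the case's run directory, while the cprnc step later expects it in the case directory under scripts. A minimal sketch of that "save for later use" step in POSIX shell follows; save_pergro_history is a hypothetical helper, not part of the CLM scripts, and the run directory is passed in explicitly because its location depends on your machine and setup.

```shell
#!/bin/sh
# save_pergro_history: copy a case's CLM h0 history file(s) from its run
# directory into the case directory used by the cprnc comparison.
# Hypothetical helper; arguments: case name, run directory, case directory.
save_pergro_history() {
  case_name=$1
  run_dir=$2
  case_dir=$3
  found=0
  for f in "$run_dir"/"$case_name".clm2.h0.*.nc; do
    # If nothing matched, the glob stays literal, so skip non-files
    [ -f "$f" ] || continue
    cp "$f" "$case_dir"/ || return 1
    found=1
  done
  if [ "$found" -eq 0 ]; then
    echo "no $case_name.clm2.h0.*.nc files found in $run_dir" >&2
    return 1
  fi
}
```

For example, `save_pergro_history trustedMachinePergro0 /path/to/run/dir .` from inside the case directory, where the run directory path is a placeholder for your own.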

  2. Running perturbed on the trusted machine

    The next step is to run a perturbed case on the trusted machine.

    Example 4-6. Example perturbed error growth simulation

    
    > cd scripts
    > create_newcase -case trustedMachinePergroRnd -compset ICN -res f19_g16 \
    -mach bluefire
    > cd trustedMachinePergroRnd
    # Set the perturbed PERGRO use-case
    > xmlchange -file env_conf.xml -id CLM_NML_USE_CASE -val pergro
    # Set coldstart on so arbitrary initial conditions will be used
    > xmlchange -file env_conf.xml -id CLM_FORCE_COLDSTART -val on
    # Set PERGRO on in the configure
    > $EDITOR env_conf.xml  # add "-pergro on" to CLM_CONFIG_OPTS
    # Now configure and build
    > configure -case
    > trustedMachinePergroRnd.build
    # Set it to run for three days and turn archiving off
    > xmlchange -file env_run.xml -id STOP_N -val 3 
    > xmlchange -file env_run.xml -id DOUT_S -val FALSE
    # Run the case, then save the history file output for later use
    > bsub < trustedMachinePergroRnd.run
    
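The perturbed case differs from the non-perturbed one only in its case name and use-case (pergro rather than pergro0), and the new-machine case differs only in its name and -mach value. A minimal sketch, assuming a POSIX shell, that prints (but does not run) the command sequence for any of the three cases so they cannot drift apart; pergro_case_cmds is a hypothetical helper, and the "-pergro on" edit to CLM_CONFIG_OPTS is still left as a manual step.

```shell
#!/bin/sh
# pergro_case_cmds: print (not run) the setup commands from the examples
# for one PERGRO case. Hypothetical helper; review the output before use.
pergro_case_cmds() {
  case_name=$1  # e.g. trustedMachinePergroRnd
  use_case=$2   # pergro0 (non-perturbed) or pergro (perturbed)
  mach=$3       # e.g. bluefire or intrepid
  case "$use_case" in
    pergro|pergro0) ;;
    *) echo "use-case must be pergro or pergro0, not '$use_case'" >&2
       return 1 ;;
  esac
  # bsub is the LSF submit command used on bluefire; other machines
  # may need a different batch command on the last line.
  cat <<EOF
cd scripts
create_newcase -case $case_name -compset ICN -res f19_g16 -mach $mach
cd $case_name
xmlchange -file env_conf.xml -id CLM_NML_USE_CASE -val $use_case
xmlchange -file env_conf.xml -id CLM_FORCE_COLDSTART -val on
# by hand: add "-pergro on" to CLM_CONFIG_OPTS in env_conf.xml
configure -case
$case_name.build
xmlchange -file env_run.xml -id STOP_N -val 3
xmlchange -file env_run.xml -id DOUT_S -val FALSE
bsub < $case_name.run
EOF
}
```

For example, `pergro_case_cmds newMachinePergro0 pergro0 intrepid` prints the sequence for the new-machine case.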

  3. Running non-perturbed on the new machine

    The next step is to run a non-perturbed case on the new machine. Here we will demonstrate using the machine intrepid.

    
    > cd scripts
    > create_newcase -case newMachinePergro0 -compset ICN -res f19_g16 \
    -mach intrepid
    > cd newMachinePergro0
    # Set the non-perturbed PERGRO use-case
    > xmlchange -file env_conf.xml -id CLM_NML_USE_CASE -val pergro0
    > xmlchange -file env_conf.xml -id CLM_FORCE_COLDSTART -val on
    # Set PERGRO on in the configure
    > $EDITOR env_conf.xml  # add "-pergro on" to CLM_CONFIG_OPTS
    # Now configure and build
    > configure -case
    > newMachinePergro0.build
    # Set it to run for three days and turn archiving off
    > xmlchange -file env_run.xml -id STOP_N -val 3 
    > xmlchange -file env_run.xml -id DOUT_S -val FALSE
    # Run the case, then save the history file output for later use
    # (bsub is LSF's submit command; use the new machine's batch command if it differs)
    > bsub < newMachinePergro0.run
    

  4. Plotting the differences

    You can use the cprnc program to compute root mean square differences between the relevant history files. See the Section called Using the cprnc tool to compare two history files in Chapter 2 for more information on it and how to build it. On many platforms you will need to set some environment variables in order to complete the build (see the Section called Common environment variables and options used in building the FORTRAN tools in Chapter 2 for more information on building the tools).

    
    # Build the cprnc program
    > cd models/lnd/clm/tools/cprnc
    > gmake
    # Now go to your case directory and run cprnc on the trusted-machine with and without
    # perturbation
    > cd ../../../../../scripts/trustedMachinePergro0
    > ../../models/lnd/clm/tools/cprnc/cprnc trustedMachinePergro0.clm2.h0.001-01-01.00000.nc \
    ../trustedMachinePergroRnd/trustedMachinePergroRnd.clm2.h0.001-01-01.00000.nc > trustedPergro.log
    # Copy the history file from the new machine into ../newMachinePergro0 on this machine
    # And now run cprnc on the trusted-machine and the new machine both without perturbation
    > ../../models/lnd/clm/tools/cprnc/cprnc trustedMachinePergro0.clm2.h0.001-01-01.00000.nc \
    ../newMachinePergro0/newMachinePergro0.clm2.h0.001-01-01.00000.nc > newPergro.log
    # Now extract the RMS differences of TSOI for both comparisons
    > grep RMS trustedPergro.log | grep -w TSOI | awk '{print $3}' > trustedPergro.dat
    > grep RMS newPergro.log     | grep -w TSOI | awk '{print $3}' > newPergro.dat
    # And plot the two curves up to your screen
    > env TYPE=x11 RMSDAT=newPergro.dat RMSDAT2=trustedPergro.dat ncl \
    ../../models/lnd/clm/tools/ncl_scripts/pergroPlot.ncl
    
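Before plotting, it can help to confirm that both .dat files contain the same number of samples, since a missing history time or an extra RMS line would shift one curve relative to the other. A minimal sketch in POSIX shell; check_pergro_dat is a hypothetical helper name, not one of the CLM tools.

```shell
#!/bin/sh
# check_pergro_dat: confirm both RMS series are non-empty and have the same
# number of samples before they are plotted against each other.
# Hypothetical helper, not part of the CLM tools.
check_pergro_dat() {
  # Arithmetic expansion strips the padding some wc implementations add
  n1=$(( $(wc -l < "$1") ))
  n2=$(( $(wc -l < "$2") ))
  if [ "$n1" -eq 0 ] || [ "$n1" -ne "$n2" ]; then
    echo "sample count mismatch or empty file: $1 has $n1, $2 has $n2" >&2
    return 1
  fi
  echo "$n1 samples in each file"
}
```

For example, `check_pergro_dat newPergro.dat trustedPergro.dat` before running the NCL plot.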
    Here is a sample plot for two trusted machines (using data that Sheri Mickelson provided to us). The green line is the error growth for bluefire, and the red is the error growth for intrepid.

    Figure 4-1. Sample Perturbation Error Growth Curve
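The plot is the primary check. As a numerical complement, the sketch below pairs the two RMS series sample by sample and reports the ratio of the new-machine (code changes) error to the perturbation error. It assumes one RMS value per line, as produced by the grep/awk commands in the plotting step; compare_pergro and its default tolerance factor of 10 are hypothetical choices, not a CLM criterion.

```shell
#!/bin/sh
# compare_pergro: pair two one-value-per-line RMS series and print, for each
# sample: index, new-machine RMS, trusted perturbation RMS, and their ratio.
# Returns non-zero if any ratio exceeds the tolerance factor.
# Hypothetical helper; the default factor of 10 is an illustrative choice.
compare_pergro() {
  new_dat=$1       # e.g. newPergro.dat (simulation 1 vs 3)
  trusted_dat=$2   # e.g. trustedPergro.dat (simulation 1 vs 2)
  factor=${3:-10}
  paste "$new_dat" "$trusted_dat" | awk -v f="$factor" '
    {
      n++
      # Guard against division by zero when the perturbation error is 0
      if ($2 > 0)      r = $1 / $2
      else if ($1 > 0) r = 1e30
      else             r = 1
      printf "%d %g %g %g\n", n, $1, $2, r
      if (r > f + 0) bad++
    }
    END { exit (bad > 0) }'
}
```

For example, `compare_pergro newPergro.dat trustedPergro.dat || echo "error grows faster than the roundoff perturbation"`. A ratio test is only a rough screen: error growth curves are noisy, so a single sample above the factor deserves a look at the plot rather than an automatic failure.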