                Quick-Start to using your own datasets in clm4
                ===============================================

Assumptions: You are already familiar with the use of the cpl7 scripts for
             creating cases to run with "standalone" clm. See the
             Quickstart.GUIDE and the README files and documentation in the
             scripts directory for more information on this process. We also
             assume that the env variable $CSMDATA points to the location of
             the standard datasets for your machine
             (/fis/cgd/cseg/csm/inputdata on bluefire). We also assume that
             the following variables point to the values you want to use for
             your case (a consolidated example of these settings is given at
             the end of step 1 below). The mask is included as part of the
             resolution for your case, and SIM_YEAR and SIM_YEAR_RANGE will
             be set appropriately for the particular use case that you choose
             for your compset (i.e. 1850_control, 20thC_transient, etc.).

             SIM_YEAR -------- Simulation year       (i.e. 1850, or 2000)
             SIM_YEAR_RANGE -- Simulation year range (i.e. constant, or 1850-2000)
             MASK ------------ Land mask             (i.e. navy, USGS, or gx1v6)

Process:

0.) Why do this?

   An alternative to the steps below is to create your case and hand-edit the
   relevant namelists to point to your own datasets. One reason for the
   process below is so that we can do automated testing on dataset inclusion.
   But it also provides the following functionality to the user:

   a.) New cases with the same datasets only require a small change to
       env_conf.xml and env_run.xml (steps 5, 6, and 8).
   b.) You can clone new cases based on a working case, without having to
       hand-edit all of the namelists for the new case in the same way.
   c.) The process checks for the existence of files when cases are
       configured, so the scripts verify that datasets exist rather than you
       finding out at run time after the job has been submitted to batch.
   d.) The process checks for valid namelists, making it less likely that you
       introduce an error or typo into the namelists.
   e.) The *.input_data_list files will be accurate for your case, so you can
       use the check_input_data script to do queries on the files.
   f.) Your dataset names will be closer to standard names, and easier to
       include in standard clm (with the exception of creation dates).
   g.) The regional extraction script (see 3.b below) will automatically
       create files with names following this convention.

1.) Create your own dataset area -- link it to the standard dataset location

   Create a directory to put your own datasets in (such as
   /ptmp/$USER/my_inputdata). Use the script link_dirtree to link the
   standard datasets into this location. If you already have complete control
   over the datasets in $CSMDATA, you can skip this step.

      setenv MYCSMDATA /ptmp/$USER/my_inputdata
      scripts/link_dirtree $CSMDATA $MYCSMDATA

   If you do this you can find the files you've added with...

      find $MYCSMDATA -type f -print

   and you can find the files that are linked to the standard location with...

      find $MYCSMDATA -type l -print

   A consolidated sketch of these settings and commands is given below.
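   The following is a minimal sketch pulling together the variables assumed
   above with the step-1 commands. The paths and values are examples only
   (taken from the bluefire defaults above); substitute the inputdata
   location for your machine and the mask/simulation-year settings implied by
   your compset. MASK, SIM_YEAR, and SIM_YEAR_RANGE are treated here simply
   as shell convenience variables, used later to construct the file names in
   step 3.

      # example values only -- adjust for your machine and compset
      setenv CSMDATA        /fis/cgd/cseg/csm/inputdata   # standard inputdata area
      setenv MYCSMDATA      /ptmp/$USER/my_inputdata      # your own dataset area
      setenv MASK           gx1v6                         # land mask
      setenv SIM_YEAR       1850                          # simulation year
      setenv SIM_YEAR_RANGE constant                      # simulation year range

      # link the standard datasets into your area and spot-check the result
      scripts/link_dirtree $CSMDATA $MYCSMDATA
      find $MYCSMDATA -type l -print | head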
2.) Establish a "user dataset identifier name" string

   You need a unique identifier for your datasets for a given resolution,
   mask, area, simulation year, and simulation year range. The identifier can
   be any string you want -- but we have the following suggestions.

   Suggestions for global grids:

      setenv MYDATAID ${degLat}x${degLon}

   Suggestions for regional grids -- either give the number of points in the
   grid:

      setenv MYDATAID nxmpt_citySTATE
      setenv MYDATAID nxmpt_cityCOUNTRY
      setenv MYDATAID nxmpt_regionCOUNTRY
      setenv MYDATAID nxmpt_region

   or give the total size of the gridcells:

      setenv MYDATAID nxmdeg_citySTATE
      setenv MYDATAID nxmdeg_cityCOUNTRY

   For example:

      setenv MYDATAID 10x15                  -- global 10x15 grid
      setenv MYDATAID 1x1pt_boulderCO        -- single point for Boulder, CO
      setenv MYDATAID 5x5pt_boulderCO        -- 5x5 region around Boulder, CO
      setenv MYDATAID 1x1deg_boulderCO       -- 1x1 degree region around
                                                Boulder, CO
      setenv MYDATAID 13x12pt_f19_alaskaUSA1 -- 13x12 gridcells from the f19
                                                (1.9x2.5) global resolution
                                                over Alaska

3.) Add your own datasets in the standard locations in that area

3.a) Create datasets using the standard tools valid for any specific points

   Use the tools in models/lnd/clm/tools to create new datasets -- tools such
   as mkgriddata, mksurfdata, mkdatadomain, and the regridding tools in
   ncl_scripts (see models/lnd/clm/bld/namelist_files/namelist_defaults_usr_files.xml
   for the exact syntax for all files). Then copy the files into the
   following locations:

      surfdata:   $MYCSMDATA/lnd/clm2/surfdata/surfdata_${MYDATAID}_simyr${SIM_YEAR}.nc
      fatmgrid:   $MYCSMDATA/lnd/clm2/griddata/griddata_${MYDATAID}.nc
      fatmlndfrc: $MYCSMDATA/lnd/clm2/griddata/fracdata_${MYDATAID}_${MASK}.nc
      domainfile: $MYCSMDATA/atm/datm7/domain.clm/domain.lnd.${MYDATAID}_${MASK}.nc

3.b) Use the regional extraction script to get regional datasets from the
     global ones

   Use the getregional_datasets.pl script to extract regional datasets of
   interest. Note, the script works on all files other than the "finidat"
   file, as that is a 1D vector file. For example, run the extraction for
   data from 52-73 North latitude, 190-220 longitude, which creates a 13x12
   gridcell region from the f19 (1.9x2.5) global resolution over Alaska:

      cd models/lnd/clm/tools/ncl_scripts
      ./getregional_datasets.pl -sw 52,190 -ne 73,220 -id $MYDATAID \
          -mycsmdata $MYCSMDATA

   Repeat this process if you need files for multiple SIM_YEAR and
   SIM_YEAR_RANGE values.

4.) Set up your case

   Follow the standard steps for executing "scripts/create_newcase" and
   customize your case as appropriate, i.e.

      ./create_newcase -case my_userdataset_test -res pt1_pt1 -compset I1850 \
          -mach bluefire

   The above example implies that: MASK=gx1v6, SIM_YEAR=1850, and
   SIM_YEAR_RANGE=constant.

5.) Edit env_run.xml in the case to point to your new dataset area

   Edit DIN_LOC_ROOT_CSMDATA in env_run.xml to point to $MYCSMDATA:

      ./xmlchange -file env_run.xml -id DIN_LOC_ROOT_CSMDATA -val $MYCSMDATA

6.) Edit env_conf.xml in the case to point to your user dataset identifier name

   Edit CLM_USRDAT_NAME (and CLM_PT1_NAME) to point to $MYDATAID:

      ./xmlchange -file env_conf.xml -id CLM_USRDAT_NAME -val $MYDATAID
      ./xmlchange -file env_conf.xml -id CLM_PT1_NAME    -val $MYDATAID

7.) Configure the case as normal

      ./configure -case

8.) Run your case as normal
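   Putting steps 4 through 8 together, a complete worked example for a case
   using the datasets prepared above might look like the following. This is
   only a sketch: the case name is arbitrary, and the final build and submit
   commands assume the usual <case>.<mach> script convention of the cpl7
   scripts on bluefire -- follow whatever build and submit procedure you
   normally use on your machine.

      # step 4: create the case (this choice implies MASK=gx1v6,
      #         SIM_YEAR=1850, and SIM_YEAR_RANGE=constant)
      cd scripts
      ./create_newcase -case my_userdataset_test -res pt1_pt1 -compset I1850 \
          -mach bluefire
      cd my_userdataset_test

      # step 5: point the case at your own dataset area
      ./xmlchange -file env_run.xml  -id DIN_LOC_ROOT_CSMDATA -val $MYCSMDATA

      # step 6: point the case at your user dataset identifier name
      ./xmlchange -file env_conf.xml -id CLM_USRDAT_NAME -val $MYDATAID
      ./xmlchange -file env_conf.xml -id CLM_PT1_NAME    -val $MYDATAID

      # step 7: configure -- missing datasets are reported here rather than
      #         at run time
      ./configure -case

      # step 8: build and submit as you normally would, for example:
      ./my_userdataset_test.bluefire.build
      bsub < my_userdataset_test.bluefire.run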