CESM Research Tools: CLM4 in CESM1.0.4 User's Guide Documentation
Chapter 2. Using the CLM tools to create your own input datasets
mksurfdata creates surface datasets from a grid dataset and raw data files at half-degree resolution. The resulting files describe the surface characteristics needed by CLM: the fraction of each grid cell covered by the different land-unit types, the fraction for each vegetation type, and properties such as soil color and soil texture. To run mksurfdata you can either use the mksurfdata.pl script, which creates namelists for you using the build-namelist XML database, or run it by hand using a namelist that you provide (possibly modeled after an example in the models/lnd/clm/tools/mksurfdata directory). The namelist for mksurfdata is sufficiently complex that we recommend using the mksurfdata.pl tool to build it. The next section describes how to use the mksurfdata.pl script, and the following section gives more details on running mksurfdata by hand and on its namelist input variables.
The script mksurfdata.pl can be used to run the mksurfdata program for several configurations, resolutions, simulation years and simulation year ranges. It creates the needed namelists for you and moves the files over to your inputdata directory location. It also creates a list of the files created; for developers this file is also a script to import the files into the svn inputdata repository. It uses the build-namelist XML database to determine the correct input files to use, and for transient cases it creates the appropriate mksrf_fdynuse file with the list of files for each year needed for the case. In the case of urban single-point datasets (where surface datasets are actually input into mksurfdata) it does the additional processing required so that the output dataset can be used once again by mksurfdata. Because it figures out the namelist and input files for you, we recommend that you use this script to create standard surface datasets. If you need to create surface datasets for customized cases, you might need to run mksurfdata on its own, but you can use mksurfdata.pl with the "-debug" option to give you a namelist to start from. For help on mksurfdata.pl, use the "-help" option as below:
> cd models/lnd/clm/tools/mksurfdata
> mksurfdata.pl -help
SYNOPSIS
     mksurfdata.pl [options]
OPTIONS
     -crop                         Add in crop datasets
     -dinlc [or -l]                Enter the directory location for inputdata
                                   (default /fs/cgd/csm/inputdata)
     -debug [or -d]                Don't actually run -- just print out what would happen if ran.
     -dynpft "filename"            Dynamic PFT/harvesting file to use
                                   (rather than create it on the fly)
                                   (must be consistent with first year)
     -exedir "directory"           Directory where mksurfdata program is
                                   (by default assume it's in the current directory)
     -glc_nec "number"             Number of glacier elevation classes to use (by default 0)
     -irrig                        If you want to include irrigated crop in the output file.
     -years [or -y]                Simulation year(s) to run over (by default 1850,2000)
                                   (can also be a simulation year range: i.e. 1850-2000)
     -help  [or -h]                Display this help.
     -nomv                         Don't move the files to inputdata after completion.
     -res   [or -r] "resolution"   Resolution(s) to use for files (by default all ).
     -rcp   [or -c] "rep-con-path" Representative concentration pathway(s) to use for
                                   future scenarios
                                   (by default -999.9, where -999.9 means historical ).
     -usrname "clm_usrdat_name"    CLM user data name to find grid file with.

     NOTE: years, res, and rcp can be comma delimited lists.

OPTIONS to override the mapping of the input gridded data with hardcoded input

     -pft_frc "list of fractions"  Comma delimited list of percentages for veg types
     -pft_idx "list of veg index"  Comma delimited veg index for each fraction
     -soil_cly "% of clay"         % of soil that is clay
     -soil_col "soil color"        Soil color (1 [light] to 20 [dark])
     -soil_fmx "soil fmax"         Soil maximum saturated fraction (0-1)
     -soil_snd "% of sand"         % of soil that is sand
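To run the script with an optimized mksurfdata for a 4x5 degree grid and 1850 conditions on bluefire, build mksurfdata with optimization turned on, then run mksurfdata.pl with the -years option set to 1850 and the -res option set to the 4x5 resolution.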
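The invocation described above can be sketched as a small helper that assembles the command line. This is only an illustration: the helper name build_mksurfdata_command is hypothetical, the resolution string "4x5" is assumed to be the name the XML database uses for the 4x5 degree grid, and only options that appear in the help output above are used. The sketch prints the command rather than running it, and it does not cover building mksurfdata itself.

```python
def build_mksurfdata_command(years, resolutions, debug=False, nomv=False):
    """Assemble an argument list for mksurfdata.pl (hypothetical helper).

    Only options listed in the help output are used. Years and resolutions
    are joined with commas because the help text says both can be
    comma delimited lists.
    """
    cmd = [
        "mksurfdata.pl",
        "-years", ",".join(str(y) for y in years),
        "-res", ",".join(resolutions),
    ]
    if debug:
        cmd.append("-debug")  # print what would happen without running
    if nomv:
        cmd.append("-nomv")   # do not move the files to inputdata afterwards
    return cmd


# 1850 conditions on the 4x5 grid; "-debug" keeps this a dry run.
command = build_mksurfdata_command(years=[1850], resolutions=["4x5"], debug=True)
print(" ".join(command))
```

Starting with "-debug" is a cautious choice: according to the help text it prints what would happen without running mksurfdata, so the generated namelist can be inspected first.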
In the above section we show how to run mksurfdata through the mksurfdata.pl script using input datasets that are in the build-namelist XML database. When you are running with input datasets that are NOT available in the XML database, you either need to add them as outlined in Chapter 3, or you need to run mksurfdata by hand, as we outline here.
When running mksurfdata by hand you will need to prepare your own input namelist. There are sample namelists that are set up for running on the NCAR machine bluefire. You will need to change the filepaths to run on a different machine. The sample namelists include:
mksurfdata.namelist -- standard sample namelist.
mksurfdata.regional -- sample namelist to build for a regional grid dataset (5x5_amazon)
mksurfdata.singlept -- sample namelist to build for a single point grid dataset (1x1_brazil)
mksrf_fdynuse names a file that lists the filepaths to other files. The filepaths
inside this file will have to be changed as well. The file is read with a formatted
read, so you must keep the line lengths the same: the year must stay in the same
position on each line, even with the new filenames. One advantage of the mksurfdata.pl
script is that it will create the mksrf_fdynuse file for you.
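The constraint on the position of the year can be illustrated with a short sketch. It assumes, without confirmation from this guide, that each line of the file is a filepath, padding spaces, and then the year as the last field; the function name and the example paths are hypothetical. Check the layout against one of the sample files before relying on it.

```python
def swap_path_keep_year_column(old_line, new_path):
    """Replace the filepath in one line, keeping the year in the same columns.

    Assumed layout (not confirmed by the guide): '<path><spaces><year>'.
    """
    line = old_line.rstrip("\n")
    body = line.rstrip()                 # line without trailing blanks
    trailing = line[len(body):]          # keep trailing blanks so the length is unchanged
    year = body.split()[-1]              # last field is taken to be the year
    year_col = len(body) - len(year)     # column where the year starts
    if len(new_path) >= year_col:
        raise ValueError("new path is too long to keep the year column fixed")
    # Pad the new path with blanks up to the original year column.
    return new_path.ljust(year_col) + year + trailing


# Hypothetical example line: path padded to column 60, then the year.
old = "/old/inputdata/pftdata_example_1850.nc".ljust(60) + "1850"
new = swap_path_keep_year_column(old, "/my/inputdata/pftdata_example_1850.nc")
print(repr(new))
```

If the new path does not fit before the year column, the sketch raises an error instead of shifting the year, since a shifted year would break the formatted read.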
We list the namelist items below. Most of them are filepaths to the input half-degree resolution datasets that are scaled to the resolution of your grid dataset. You must first specify the input grid dataset for the output resolution:
mksrf_fgrid
Grid dataset
mksrf_ffrac
land fraction and land mask dataset
mksrf_fglacier
Glacier dataset
mksrf_flai
Leaf Area Index dataset
mksrf_flanwat
Land water dataset
mksrf_forganic
Organic soil carbon dataset
mksrf_fmax
Max fractional saturated area dataset
mksrf_fsoicol
Soil color dataset
mksrf_fsoitex
Soil texture dataset
mksrf_ftopo
Topography dataset (this is used to limit
the extent of urban regions and is used for glacier multiple elevation classes)
mksrf_furban
Urban dataset
mksrf_fvegtyp
PFT vegetation type dataset
mksrf_fvocef
Volatile Organic Compound Emission Factor
dataset
mksrf_fdynuse
"Dynamic land use" file for transient
land-use/land-cover changes. This is an ASCII text file that lists the filepath
of the file for each year, followed by the year it represents (note: you MUST change the
filepaths inside the file when running on a machine NOT at NCAR).
We always use this file, even for creating datasets of a fixed year. Also note
that when using the "pft_" settings this file will be an XML-like file with settings
for PFTs rather than filepaths (see the Section called Experimental options to mksurfdata below).
all_urban
If entire area is urban (typically used for
single-point urban datasets, that you want to be exclusively urban)
mksrf_firrig
Irrigation dataset, if you want to
activate the irrigation model over generic cropland
(experimental mode, normally NOT used)
mksrf_gridnm
Name of output grid resolution (if not
set the files will be named according to the number of longitudes by latitudes)
mksrf_gridtype
Type of grid (default is 'global')
nglcec
Number of glacier multiple elevation classes.
Can be 0, 1, 3, 5, or 10. When using the resulting dataset with CLM you can then run
with glc_nec set to either 0 or this value.
(Experimental: normally use the default of 0. When running with the land-ice
model, in practice only 10 has been used.)
numpft
Number of Plant Functional Types (PFTs)
in the input vegetation dataset (mksrf_fvegtyp). Change
this to 20 if you want to create a dataset with prognostic crop activated. The
vegetation dataset also needs to have the prognostic crop types on it.
(Experimental: normally not changed from the default of 16.)
outnc_large_files
If output should be in NetCDF large file
format
outnc_double
If output should be in double
precision (normally we turn this on)
pft_frc
array of fractions to override PFT
data with for all gridpoints (experimental mode, normally NOT used).
pft_idx
array of PFT indices to override PFT
data with for all gridpoints (experimental mode, normally NOT used).
soil_clay
percent clay soil to override
all gridpoints with (experimental mode, normally NOT used).
soil_color
Soil color to override
all gridpoints with (experimental mode, normally NOT used).
soil_fmax
Soil maximum fraction to override
all gridpoints with (experimental mode, normally NOT used).
soil_sand
percent sandy soil to
override all gridpoints with (experimental mode, normally NOT used).
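Some of the constraints in the list above can be checked before mksurfdata is run. The following is a minimal sketch, not part of the CLM tools: the function name is hypothetical, and treating 16 and 20 as the only expected values of numpft is an assumption drawn from the descriptions above rather than a documented rule.

```python
ALLOWED_NGLCEC = (0, 1, 3, 5, 10)  # values listed for nglcec above


def check_settings(nglcec=0, numpft=16, glc_nec=0):
    """Return a list of human-readable problems (empty if none are found)."""
    problems = []
    if nglcec not in ALLOWED_NGLCEC:
        problems.append("nglcec must be one of %s, got %r" % (ALLOWED_NGLCEC, nglcec))
    # CLM can be run with glc_nec of either 0 or the value used for nglcec.
    if glc_nec not in (0, nglcec):
        problems.append("glc_nec must be 0 or %r, got %r" % (nglcec, glc_nec))
    # Assumption: 16 (default) or 20 (prognostic crop) are the expected values.
    if numpft not in (16, 20):
        problems.append("numpft is normally 16, or 20 with prognostic crop, got %r" % numpft)
    return problems


for problem in check_settings(nglcec=10, numpft=16, glc_nec=5):
    print(problem)
```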
After creating your namelist, when running on a non-NCAR machine you will need to get the files from the inputdata repository. To retrieve the files needed for mksurfdata, run the check_input_data script on your namelist as shown below; the script can also export the data to your local disk.
Example 2-7. Getting the raw datasets for mksurfdata to your local machine using the check_input_data script
> cd models/lnd/clm/tools/mksurfdata
# First remove any quotes and copy into a filename that can be read by the
# check_input_data script
> sed "s/'//g" namelist > clm.input_data_list
# Run the script with -export and give the location of your inputdata with $CSMDATA
> ../../../../../scripts/ccsm_utils/Tools/check_input_data -datalistdir . \
 -inputdata $CSMDATA -check -export
# You must then do the same with the fdynuse file referred to in the namelist
# in this case we add a file = to the beginning of each line
> awk '{print "file = "$1}' pftdyn_hist_simyr2000-2000.txt > clm.input_data_list
# Run the script with -export and give the location of your inputdata with $CSMDATA
> ../../../../../scripts/ccsm_utils/Tools/check_input_data -datalistdir . \
 -inputdata $CSMDATA -check -export
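The two text transformations in the example above (the sed and awk steps) can also be written as a short sketch, which may help where those tools are not available. The function names and the sample paths are hypothetical; the sketch only prepares the text and does not call check_input_data.

```python
def namelist_to_data_list(namelist_text):
    """Same effect as sed "s/'//g": remove every single quote."""
    return namelist_text.replace("'", "")


def fdynuse_to_data_list(fdynuse_text):
    """Same effect as awk '{print "file = "$1}': keep only the first
    whitespace-separated field of each line and prefix it with 'file = '."""
    lines = []
    for line in fdynuse_text.splitlines():
        fields = line.split()
        first = fields[0] if fields else ""
        lines.append("file = " + first + "\n")
    return "".join(lines)


# Hypothetical inputs for illustration only.
print(namelist_to_data_list(" mksrf_fgrid = '/data/grid_example.nc'\n"), end="")
print(fdynuse_to_data_list("/data/pft_example_1850.nc      1850\n"), end="")
```

As in the shell example, each result would be written to clm.input_data_list before running check_input_data.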
The options pft_frc, pft_idx, soil_clay, soil_color, soil_fmax, and soil_sand are also new and considered experimental. They provide a way to override the PFT and soil values for all grid points with values that you set. This is useful for running single-point tower sites where the soil type and vegetation are known. Note that when you use pft_frc, all other landunits will be zeroed out, and the sum of your pft_frc array MUST equal 100.0. Also note that when using the "pft_" options, the mksrf_fdynuse file will be an XML-like file with PFT settings instead of a list of filepaths. Unlike the file of filepaths, you will have to create this file by hand; mksurfdata.pl will NOT create it for you (other than the first year, which will be set to the values entered on the command line). Note that when PTCLM is run, it CAN create these files for you from a simpler format (see the Section called Dynamic Land-Use Change Files for use by PTCLM in Chapter 6). Instead of a filepath, each entry is a list of XML elements that give information on the PFTs and harvesting, for example:
<pft_f>100</pft_f><pft_i>1</pft_i><harv>0,0,0,0,0</harv><graz>0</graz>
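A minimal sketch of producing such an entry while enforcing the rule that pft_frc must sum to 100.0 follows. The function name is hypothetical. Writing several PFTs as comma-delimited values inside pft_f and pft_i is an assumption based on the harv element and the command-line option descriptions, and any other content the real file needs on each line (such as the year) is not covered here.

```python
def pft_override_entry(pft_frc, pft_idx, harv=(0, 0, 0, 0, 0), graz=0):
    """Format one XML-like entry from PFT percentages and indices.

    Assumption: multiple PFTs are written as comma-delimited lists.
    """
    if len(pft_frc) != len(pft_idx):
        raise ValueError("pft_frc and pft_idx must have the same length")
    # The guide requires the percentages to sum to 100.0; a tiny tolerance
    # is used here only to absorb floating-point rounding.
    if abs(sum(pft_frc) - 100.0) > 1e-9:
        raise ValueError("pft_frc must sum to 100.0, got %r" % sum(pft_frc))

    def fmt(value):
        return format(value, "g")  # 100 -> '100', 62.5 -> '62.5'

    return "<pft_f>%s</pft_f><pft_i>%s</pft_i><harv>%s</harv><graz>%s</graz>" % (
        ",".join(fmt(v) for v in pft_frc),
        ",".join(str(i) for i in pft_idx),
        ",".join(fmt(v) for v in harv),
        fmt(graz),
    )


# Reproduces the single-PFT example shown above.
print(pft_override_entry([100], [1]))
```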
In this section we give recommendations on how to use mksurfdata so that your results are similar to the files that we created with it.
If you look at the standard surface datasets that we have created and provided for use,
there are three practices that we have followed consistently in each (you also see these in
the sample namelists and in the mksurfdata.pl script). The first is
that we always output data in double precision (hence outnc_double
is set to .true.). The next is that we always use the procedure
for creating transient datasets (using mksrf_fdynuse) even when
creating datasets for a fixed simulation year. This ensures that the fixed year
datasets will be consistent with the transient datasets. When this is done a
"surfdata.pftdyn" dataset will be created, but it will NOT be used in CLM. If you look
at the sample namelist mksurfdata.namelist you will note that it
sets mksrf_fdynuse to the file pftdyn_hist_simyr2000.txt, where the single file entered is
the same PFT file used in the rest of the namelist (as mksrf_fvegtyp).
The last practice is that we always set mksrf_ftopo,
even if glacier elevation classes are NOT active. This is
important in limiting urban areas based on topographic height, and hence is important
to use all the time. The glacier multiple elevation classes will be used as well if
you are running a compset with the active glacier model.
There are two other important practices for creating urban single point datasets. The
first is that you will often want to set all_urban to
.true. so that the dataset will have 100% of the gridcell output
as urban rather than some mix of urban, vegetation types, and other landunits. The
next practice is that most of our specialized urban datasets have custom values for
the urban parameters, hence we do NOT want to use the global urban dataset to get
urban parameters. Instead, we use a previous version of the surface dataset for the urban
parameters. However, in order to do this, we need to append onto the previous surface
dataset the grid and land mask/land fraction information from the grid and fraction
datasets. This is done in mksurfdata.pl using the NCO
program ncks. An example of doing this for the Mexico City, Mexico
urban surface dataset is as follows:
> ncks -A $CSMDATA/lnd/clm2/griddata/griddata_1x1pt_mexicocityMEX_c090715.nc \
 $CSMDATA/lnd/clm2/surfdata/surfdata_1x1_mexicocityMEX_simyr2000_c100407.nc
> ncks -A $CSMDATA/lnd/clm2/griddata/fracdata_1x1pt_mexicocityMEX_navy_c090715.nc \
 $CSMDATA/lnd/clm2/surfdata/surfdata_1x1_mexicocityMEX_simyr2000_c100407.nc
The final issue is how to build mksurfdata. When NOT optimized,
mksurfdata is very slow and can take many hours to days to run,
even for medium resolutions such as one or two degree. So usually you will want
to run it optimized. You may also want to use shared memory parallelism using
OpenMP with the SMP option. The problem with running optimized is that
answers will differ between optimized and non-optimized builds for most
compilers. So if you want answers to be the same as a previous surface dataset, you
will need to run on the same platform and at the same optimization level. Likewise, running
with or without OpenMP may also change answers (for most compilers it will NOT, however
it does for the IBM compiler). However, answers should be the same regardless of the
number of threads used when OpenMP is enabled. Note that the output surface datasets
have attributes that describe whether the file was written out optimized or not,
with threading or not, and the number of threads used, so that the user can more
easily match datasets created previously. For more information on the different
compiler options for the CLM4 tools see the Section called Common environment variables
and options used in building the FORTRAN tools.