Adding data for use by PTCLM

PTCLM Group Site Lists

The "sitegroupname" option to PTCLM1.110726 looks for groups of sites in the files in the PTCLM_sitedata directory under the PTCLM directory. You can add new names available for this option including your own lists of sites, by adding more files in this directory. There are three files for each "sitegroupname": $SITEGROUP_sitedata.txt, $SITEGROUP_soildata.txt and $SITEGROUP_pftdata.txt (where $SITEGROUP is the name that would be entered as "sitegroupname" to PTCLM). Each file needs to have the same list of sites, but gives different information: site data, PFT data, and soil data respectively. Although the site codes need to be the same between the three files, the files do NOT have to be in the same order. Each file has a one-line header that lists the contents of each column which are separated by commas. The first column for each of the files is the "site_code" which must be consistent between the three files. The site code can be any unique character string, but in general we use the AmeriFlux site code.

Site data file: $SITEGROUP_sitedata.txt): The header for this file is:


site_code,name,state,lon,lat,elev,startyear,endyear,alignyear
The columns: name, state, and elevation are informational only. Name is a longer descriptive name of the site, and state is the state for U.S. sites or country for non U.S. sites. The columns: lon and lat are the longitude and latitude of the location in decimal degrees. The last three columns are the start and ending year for the data and the align year for an 1850 case for the data. The align year is currently unused.

Soil data file: $SITEGROUP_soildata.txt): The header for this file is:


site_code,soil_depth,n_layers,layer_depth,layer_sand%,layer_clay%
The first three fields after "site_code" are currently unused. The only two that are used are the percent sand and clay columns to set the soil texture.

PFT data file: $SITEGROUP_pftdata.txt): The header for this file is:


site_code,pft_f1,pft_c1,pft_f2,pft_c2,pft_f3,pft_c3,pft_f4,pft_c4,pft_f5,pft_c5
This file gives the vegetation coverage for the different vegetation types for the site. The file only supports up to five PFT's at the same time. The columns with "pft_f" are the fractions for each PFT, and the columns with "pft_c" is the integer index of the given PFT. Look at the pft-physiology file to see what the PFT index for each PFT type is.

Dynamic Land-Use Change Files for use by PTCLM

There is a mechanism for giving site-specific land-use change in PTCLM. Adding site specific files to the PTCLM_sitedata directory under PTCLM allows you to specify the change in vegetation and change in harvesting (for the CN model) for that site. Files are named: $SITE_dynpftdata.txt. There is a sample file for the US-Ha1 site called: US-Ha1_dynpftdata.txt. The file has a one-line header with the information that the file has, and then one-line for each year with a transition. The header line is as follows:


trans_year,pft_f1,pft_c1,pft_f2,pft_c2,pft_f3,pft_c3,pft_f4,pft_c4,pft_f5,pft_c5,har_vh1,har_vh2, \
har_sh1,har_sh2,har_sh3,graze,hold_harv,hold_graze
This file only requires a line for each year where a transition or harvest happens. As in the "pftdata" file above "pft_f" refers to the fraction and "pft_c" refers to the PFT index, and only up to five vegetation types are allowed to co-exist. The last eight columns have to do with harvesting and grazing. The last two columns are whether to hold harvesting and/or grazing constant until the next transition year and will just be either 1 or 0. This file will be converted by the PTCLM_sitedata/cnvrt_trnsyrs2_pftdyntxtfile.pl script in the PTCLM directory to a format that mksurfdata can read that has an entry for each year for the range of years valid for the compset in question.

Converting AmeriFlux Data for use by PTCLM

AmeriFlux data comes in comma separated format and is available from: http://public.ornl.gov/ameriflux/dataproducts.shtml. Before you download the data you need to agree to the usage terms.

Here is a copy of the usage terms from the web-site on June/13/2011.

"The AmeriFlux data provided on this site are freely available and were furnished by individual AmeriFlux scientists who encourage their use. Please kindly inform the appropriate AmeriFlux scientist(s) of how you are using the data and of any publication plans. Please acknowledge the data source as a citation or in the acknowledgments if the data are not yet published. If the AmeriFlux Principal Investigators (PIs) feel that they should be acknowledged or offered participation as authors, they will let you know and we assume that an agreement on such matters will be reached before publishing and/or use of the data for publication. If your work directly competes with the PI's analysis they may ask that they have the opportunity to submit a manuscript before you submit one that uses unpublished data. In addition, when publishing, please acknowledge the agency that supported the research. Lastly, we kindly request that those publishing papers using AmeriFlux data provide preprints to the PIs providing the data and to the data archive at the Carbon Dioxide Information Analysis Center (CDIAC)."

The above agreement applies to the "US-UMB" dataset imported into our repository as well, and Gil Bohrer is the PI on record for that dataset.

The CESM can NOT handle missing data, so we recommend using the "Level 4" Gap filled datasets. The fields will also need to be renamed. The "WS" column becomes "WIND", "PREC" becomes "PRECmms", "RH" stays as "RH", "TA" becomes "TBOT", "Rg" becomes "FSDS", "Rgl" becomes "FLDS", "PRESS" becomes "PSRF". "ZBOT" can just be set to the constant of "30" (m). The units of Temperature need to be converted from "Celsius" to "Kelvin" (use the value in SHR_CONST_TKFRZ in the file models/csm_share/shr/shr_const.F90 of 273.15. The units of Pressure also need to be converted from "kPa" to "Pa". LATIXY, and LONGXY should also be set to the latitude and longitude of the site. With data hourly or half-hourly the default settings will be fine. But, if the temporal frequency is less than that see Example 5-7 for an example of how to change it.

PTCLM transient example over a shorter time period

Example 6-5. Example of running PTCLM for transient land-use 1991-2006 for US-Ha1 on bluefire

This is an example of using PTCLM for Harvard Forest (AmeriFlux site code US-Ha1). In order to do this we would've needed to have converted the AmeriFlux data into NetCDF format as show in the the Section called Converting AmeriFlux Data for use by PTCLM section above. Also note that this site has a site-specific dynamic land-use change file for it PTCLM_sitedata/US-Ha1_dynpftdata.txt in the PTCLM directory and this file will be used for land-use change and harvesting rather than the global dataset.


> cd scripts/ccsm_utils/Tools/lnd/clm/PTCLM
# We are going to use forcing data over 1991 to 2006, but we need to start with
# a transient compset to do so, so we use the 20th Century transient: 1850-2000
# Note: When creating the fpftdyn dataset for this site it will use the 
#     PTCLM_sitedata/US-Ha1_dynpftdata.txt
# file for land-use change and harvesting
> ./PTCLM.py -m bluefire -s US-Ha1 -d $MYCSMDATA --sitegroupname AmeriFlux \
-c I_1850-2000_CN
> mkdir $MYCSMDATA/atm/datm7/CLM1PT_data/1x1pt_US-Ha1
> cd $MYCSMDATA/atm/datm7/CLM1PT_data/1x1pt_US-Ha1
# Copy data in NetCDF format to this directory, filenames should be YYYY-MM.nc
# The fieldnames on the file should be: 
#    FLDS,FSDS,LATIXY,   LONGXY,   PRECTmms,PSRF,RH,TBOT,WIND,ZBOT
# With units
#    W/m2,W/m2,degrees_N,degrees_E,mm/s,    Pa,  %, K,   m/s, m
# The time coordinate units should be: days since YYYY-MM-DD 00:00:00
> cd ../../../../../US-Ha1_I_1850-2000_CN
# We need to turn cold-start on, so it doesn't expect an initial condition file
# preferably, you would generate your own initial condition file and then use
# the --finidat option to PTCLM to enter it
> ./xmlchange -file env_conf.xml -id CLM_FORCE_COLDSTART -val on
# Now we need to set the start date to 1991, and have it cycle forcing data
# over 1991 to 2006
> ./xmlchange -file env_conf.xml -id RUN_STARTDATE -val 1991-01-01
> ./xmlchange -file env_conf.xml -id DATM_CLMNCEP_YR_ALIGN -val 1991
> ./xmlchange -file env_conf.xml -id DATM_CLMNCEP_YR_END -val 2006
> ./xmlchange -file env_conf.xml -id CLM_NAMELIST_OPTS -val \
# Similarly for Nitrogen deposition data we cycle over: 1991 to 2006
"model_year_align_ndep=1991,stream_year_first_ndep=1991,stream_year_last_ndep=2006"
# Now configure the case, and we'll edit the DATM namelist for prescribed aerosols
> ./configure -case
# We also need to change the DATM to run with aerosols over the 1991-2006 period
cat << EOF > patch.diff
*** datm.buildnml.csh.orig  2011-06-14 09:28:20.000000000 -0600
--- datm.buildnml.csh 2011-06-14 09:28:57.000000000 -0600
***************
*** 32,38 ****
     dataMode       = 'CLMNCEP'
     domainFile     = '$DOMAINFILE'
     streams        = 'clm1PT.1x1pt_US-Ha1.stream.txt 1991 1991 2006 ',
!                     'presaero.stream.txt 1849 1849 2006'
     vectors        = 'null','null'
     mapmask        = 'nomask','nomask'
     mapalgo        = 'nn','nn'
--- 32,38 ----
     dataMode       = 'CLMNCEP'
     domainFile     = '$DOMAINFILE'
     streams        = 'clm1PT.1x1pt_US-Ha1.stream.txt 1991 1991 2006 ',
!                     'presaero.stream.txt 1991 1991 2006'
     vectors        = 'null','null'
     mapmask        = 'nomask','nomask'
     mapalgo        = 'nn','nn'
EOF
# Apply the above patch to the DATM build namelist file
> patch Buildconf/datm.buildnml.csh patch.diff