CCSM4/CESM1 Output Filename Requirements

21 February 2011

This document presents naming conventions for CESM output files, which fall into two broad categories: those files generated by the CCSM4/CESM1 component models at run-time ("model output data") and those created by post-processing the run-time files ("post-processed data"). This document describes the filename conventions of both types of files, for both CCSM4 and CESM.

At the present time, this document does not describe the output files generated by the data-assimilation version of the CESM ("dart" files).

1. CESM Model Output Data Filenames

The general filename formats for output files generated at run-time by the CESM component models are:

$output = $CASE.$scomp.$type.[$string.]$date[$ending]
$log = $CASE.$gcomp.$ltype.$logdate

where:

[] denotes an optional filename element

$output denotes any model history, restart, initial, or diagnostic output file

$log denotes any model log file

$CASE = (A-Z,a-z,0-9), "." (dot), and/or "_" (underscore).
$CASE is the case-name character string, which must be 80 or fewer characters long. See the CESM User's Guide for more information on the definition of a $CASE.

$scomp = (cam2,clm2,pop,cice,glc,cpl,datm,dice,dlnd,docn)
$scomp is the string that indicates the specific component-model name.

$gcomp = (atm,lnd,ocn,ice,glc,cpl)
$gcomp is the string that indicates the generic component-model name.

$type = (h*,r*,i*,d*)
$type is a one-, two-, or three-character string which denotes the output file type. In the future, more characters may be required. $type must begin with h (history), r (restart), i (initial), or d (diagnostic) and may be followed by up to two optional characters (a-z,0-9). Certain restart $type strings have special meanings: rs (cam surface restart), rh (restart history), and rd (restart diagnostic).
One known exception: the cice model sometimes generates history files whose $type string is of unlimited length and may include the "_" (underscore) character or other characters.

$string
$string is an optional, typically short character string composed of (A-Z,a-z,0-9), "." (dot), "_" (underscore), and/or "-" (dash), which is used to further identify the history file type.

$ltype = log
$ltype is the character string log which denotes the model output log type.

$date = (yyyy-mm-dd-sssss, yyyy-mm-dd, yyyy-mm, yyyy)
$date is the model date string which is based on a yyyy-mm-dd-sssss convention where:
- yyyy (0000,9999) is the year string (if necessary, yyyy can be increased to more than four digits, but four is the minimum and anything greater than four requires script and code modifications)
- mm (01,02,...,12) is the month string
- dd (01,02,...,31) is the day string
- sssss (00000,86399) is the seconds string.
$logdate = yymmdd-hhmmss
$logdate is the log-file date string, which is a real-world date string of the form indicated in the definition.
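
As an illustration of the two date conventions above, the following Python sketch classifies a $date string by its form and parses a $logdate. The function names are hypothetical and are not part of any CESM script; the checks cover only the ranges stated in the definitions, and the year is assumed to have at least four digits.

```python
import re
from datetime import datetime

# One pattern per $date form; the year allows four or more digits.
_DATE_FORMS = {
    "yyyy-mm-dd-sssss": re.compile(r"(\d{4,})-(\d{2})-(\d{2})-(\d{5})"),
    "yyyy-mm-dd": re.compile(r"(\d{4,})-(\d{2})-(\d{2})"),
    "yyyy-mm": re.compile(r"(\d{4,})-(\d{2})"),
    "yyyy": re.compile(r"(\d{4,})"),
}


def classify_date(date):
    """Return the name of the $date form, or None if the string is invalid."""
    for form, pattern in _DATE_FORMS.items():
        match = pattern.fullmatch(date)
        if match is None:
            continue
        fields = [int(group) for group in match.groups()]
        if len(fields) > 1 and not 1 <= fields[1] <= 12:      # mm
            return None
        if len(fields) > 2 and not 1 <= fields[2] <= 31:      # dd
            return None
        if len(fields) > 3 and not 0 <= fields[3] <= 86399:   # sssss
            return None
        return form
    return None


def parse_logdate(logdate):
    """Parse a yymmdd-hhmmss $logdate into a datetime (real-world time)."""
    return datetime.strptime(logdate, "%y%m%d-%H%M%S")
```

For example, classify_date("0002-01-01-00000") reports the full yyyy-mm-dd-sssss form, while a seconds value of 86400 is rejected because it falls outside the stated range.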

$ending = (.bin, .nc, .hdr)
$ending denotes the file format. The following optional endings are supported: .bin (binary), .nc (netCDF), and .hdr (ascii text). The absence of $ending implies binary format.
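
The $output format defined above can be sketched as a regular expression. The Python below is a minimal illustration, not an official parser: the names OUTPUT_RE and parse_output_name are hypothetical, the component list is taken from the $scomp definition, and the cice exception for long $type strings is deliberately not handled. Because $CASE and $string may both contain dots, the sketch relies on the known $scomp names to locate the end of the case name.

```python
import re

# Specific component names, from the $scomp definition.
SCOMPS = ("cam2", "clm2", "pop", "cice", "glc", "cpl",
          "datm", "dice", "dlnd", "docn")

OUTPUT_RE = re.compile(
    r"(?P<case>[A-Za-z0-9._]{1,80})"                 # $CASE, at most 80 characters
    r"\.(?P<scomp>" + "|".join(SCOMPS) + r")"        # $scomp
    r"\.(?P<type>[hrid][a-z0-9]{0,2})"               # $type: h*, r*, i*, d*
    r"(?:\.(?P<string>[A-Za-z0-9._-]+?))?"           # optional $string
    r"\.(?P<date>\d{4,}(?:-\d{2}(?:-\d{2}(?:-\d{5})?)?)?)"  # $date
    r"(?P<ending>\.(?:bin|nc|hdr))?"                 # optional $ending
)


def parse_output_name(name):
    """Split a model output filename into its elements, or return None."""
    match = OUTPUT_RE.fullmatch(name)
    return match.groupdict() if match else None


fields = parse_output_name("b40.1850.track1.1deg.006.cam2.h0.0400-01.nc")
print(fields)
```

Elements that are absent from the filename, such as $string in the example, come back as None.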

CCSM3 model developers were required to conform to the following points. Many of them are still valid for CESM:

Model output data streams should conform to these standards as much as possible. Unique diagnostic files or other output files could have some non-standard naming conventions if that makes them easier to identify.
Restart dates are for the current timestep to run, so would be, for example, 0002-01-01-00000 if written at the end of year 1, month 12.
All restart filenames have the format $CASE.$scomp.r*.yyyy-mm-dd-sssss, unless information is in the rpointer files about what auxiliary files are required.
All history filenames have the format $CASE.$scomp.h?[.$string].[$date][$ending]. The second character of the type and the optional string are left up to the developer. There are no requirements, although optional string names must be coordinated with the CESM archiving scripts.
All monthly average history files look like $CASE.$scomp.h?[.$string].yyyy-mm[$ending]. The date string, NOT THE TYPE, tells you whether the file is yearly, monthly, daily averaged or other.
History files and restart history files are connected via $CASE.$scomp.h* and $CASE.$scomp.rh* even though the dates might be quite different. The dates for restart history files must conform with the restart date. The date for history files is set by the averaging strategy or the last date data were written.
Care must be used when not using the optional history character. For instance, for a case where you might have $CASE.$scomp.h.yyyy.nc and $CASE.$scomp.h.yyyy-mm.nc for an annual average and monthly average outputs, the date makes the files unique. However, their restart equivalents will have identical names because the date string is the yyyy-mm-dd-sssss of the restart. There will be a conflict. It is up to the model developer, and generally not the user, to make this robust.
In short:
- All restart-file types start with r.
- All history-file types start with h.
- All initial-file types start with i.
- All restart-file dates are of the form yyyy-mm-dd-sssss.
- All restarts of history files use the history type prepended with an r.
- The history-file date is the only indication of the type of average contained in the file. For example, yyyy-mm indicates the history file contains monthly averaged fields.
- Some models have multiple restarts, some have multiple histories, etc. Model developers should decide whether to attach an optional character to the r, h or i file-type designator and what that character would be.
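
The summary points above, and the restart-name conflict described earlier, can be illustrated with a short Python sketch. All function names here are hypothetical, and the mapping from date form to averaging period follows the "yearly, monthly, daily averaged or other" wording rather than any particular component's behavior.

```python
from collections import Counter


def averaging_from_date(date):
    """Infer the averaging period of a history file from its $date form."""
    forms = {1: "annual", 2: "monthly", 3: "daily"}
    return forms.get(len(date.split("-")), "other")


def restart_history_names(case, scomp, history_types, restart_date):
    """Restart-history name for each history stream: the type gains an 'r'."""
    return [f"{case}.{scomp}.r{htype}.{restart_date}" for htype in history_types]


def name_conflicts(names):
    """Return the names that occur more than once."""
    return sorted(name for name, count in Counter(names).items() if count > 1)


# An annual and a monthly stream that both use the bare type "h" have distinct
# history filenames (mycase.pop.h.0001.nc and mycase.pop.h.0001-12.nc), but
# their restart-history files share the restart date and therefore collide.
clash = restart_history_names("mycase", "pop", ["h", "h"], "0002-01-01-00000")
print(name_conflicts(clash))

# Giving each stream its own optional character avoids the conflict.
safe = restart_history_names("mycase", "pop", ["h0", "h1"], "0002-01-01-00000")
print(name_conflicts(safe))
```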

2. CESM Model Output File Locations

Depending on the stage of the model execution and the options selected in the CESM case, run-time output files will reside in different locations. See the CESM User's Guide for the definitions of $RUNDIR, $DOUT_S_ROOT, and $DOUT_L_MSROOT and how they are set.

During execution, files in the $RUNDIR directory are:

$RUNDIR/$output

$RUNDIR/$log

If the short-term archiving option is active, files will be moved to the $DOUT_S_ROOT directory following execution. In this case, the files are segregated by component and type (history, restart or log):

$DOUT_S_ROOT/$CASE/$gcomp/$subdir/$output (history and diagnostics files)
$DOUT_S_ROOT/$CASE/rest/$date/$output (all files needed for restart)
$DOUT_S_ROOT/$CASE/$gcomp/logs/$log (log files)

$DOUT_S_ROOT/$CASE/$gcomp/rest/$date/$output (additional restart)

If the long-term archiving option is active, files will be moved to the $DOUT_L_MSROOT directory after they have been staged to the short-term location. In this case, the files are segregated by component and type (history, restart, or log):

$DOUT_L_MSROOT/$CASE/$gcomp/$subdir/$output (history and diagnostics files)
$DOUT_L_MSROOT/$CASE/rest/$date/$output (all files needed for restart)
$DOUT_L_MSROOT/$CASE/$gcomp/log/$log (log files)

$DOUT_L_MSROOT/$CASE/$gcomp/rest/$date/$output (additional restart)

$DOUT_S_ROOT is the short-term archiving root directory
$DOUT_L_MSROOT is the long-term archiving root directory
$subdir = [init,hist] is a subdirectory for initial or history files
$rdate = yyyy-mm-dd-sssss is the restart date for all files in the rest subdirectory (written as $date in the rest paths above)
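
A minimal Python sketch of the short-term archive layout follows. It is an illustration only: the function names are hypothetical, and the pairing of specific and generic component names (in particular for the data models datm, dice, dlnd, and docn) is an assumption that this document does not state explicitly. Diagnostic files are left out because their subdirectory is not specified here.

```python
from pathlib import PurePosixPath

# ASSUMPTION: pairing of $scomp with $gcomp; the data-model entries are guesses.
GCOMP = {
    "cam2": "atm", "datm": "atm",
    "clm2": "lnd", "dlnd": "lnd",
    "pop": "ocn", "docn": "ocn",
    "cice": "ice", "dice": "ice",
    "glc": "glc", "cpl": "cpl",
}

# First character of $type -> $subdir, for history and initial files only.
SUBDIR = {"h": "hist", "i": "init"}


def short_term_path(dout_s_root, case, scomp, ftype, filename):
    """Short-term archive location of a history or initial file."""
    subdir = SUBDIR.get(ftype[:1])
    if subdir is None:
        raise ValueError(f"no subdirectory rule for type {ftype!r}")
    return PurePosixPath(dout_s_root, case, GCOMP[scomp], subdir, filename)


def restart_set_path(dout_s_root, case, restart_date, filename):
    """Location of a file in the complete restart set for one restart date."""
    return PurePosixPath(dout_s_root, case, "rest", restart_date, filename)


print(short_term_path("/archive", "mycase", "cam2", "h0",
                      "mycase.cam2.h0.0001-01.nc"))
```

The same functions describe the long-term layout if the long-term root is passed in place of the short-term root, since the two directory structures shown above are parallel.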

Example of Long-Term Archive Filenames:

The * below represents an "optional" character.

/$DOUT_L_MSROOT/$CASE/atm/hist/$CASE.cam2.h*.yyyy-mm.nc
........................./hist/$CASE.cam2.h*.yyyy-mm-dd-sssss.nc
........................./init/$CASE.cam2.i.yyyy-mm-dd-sssss.nc
......................lnd/hist/$CASE.clm2.h*.yyyy-mm.nc
........................./hist/$CASE.clm2.h*.yyyy-mm-dd-sssss.nc
......................ocn/hist/$CASE.pop.h*.yyyy-mm.nc
........................./hist/$CASE.pop.hm.yyyy-mm-dd-sssss.nc (movie stream)
........................./hist/$CASE.pop.hs.yyyy-mm-dd-sssss.nc (snapshot stream)
......................ice/hist/$CASE.cice.h.yyyy-mm.nc
........................./init/$CASE.cice.i.yyyy-mm-dd-sssss.nc
......................cpl/hist/$CASE.cpl.h*.yyyy.nc
........................./hist/$CASE.cpl.h*.yyyy-mm.nc
........................./hist/$CASE.cpl.h*.yyyy-mm-dd.nc
........................./hist/$CASE.cpl.h*.yyyy-mm-dd-sssss.nc
......................rest/yyyy-mm-dd-sssss/$CASE.cam2.h0.YYYY-12.nc
........................../yyyy-mm-dd-sssss/$CASE.cam2.i.yyyy-01-01-00000.nc
........................../yyyy-mm-dd-sssss/$CASE.cam2.r.yyyy-01-01-00000.nc
........................../yyyy-mm-dd-sssss/$CASE.cam2.rs.yyyy-01-01-00000.nc
........................../yyyy-mm-dd-sssss/$CASE.cice.r.yyyy-01-01-00000.nc
........................../yyyy-mm-dd-sssss/$CASE.clm2.h0.YYYY-12.nc
........................../yyyy-mm-dd-sssss/$CASE.clm2.r.yyyy-01-01-00000
........................../yyyy-mm-dd-sssss/$CASE.clm2.r.yyyy-01-01-00000.nc
........................../yyyy-mm-dd-sssss/$CASE.cpl.r.yyyy-01-01-00000.nc
........................../yyyy-mm-dd-sssss/$CASE.pop.r.yyyy-01-01-00000
........................../yyyy-mm-dd-sssss/$CASE.pop.r.yyyy-01-01-00000.hdr
........................../yyyy-mm-dd-sssss/$CASE.pop.ro.yyyy-01-01-00000
........................../yyyy-mm-dd-sssss/$CASE.rpointer.atm
........................../yyyy-mm-dd-sssss/$CASE.rpointer.drv
........................../yyyy-mm-dd-sssss/$CASE.rpointer.ice
........................../yyyy-mm-dd-sssss/$CASE.rpointer.lnd
........................../yyyy-mm-dd-sssss/$CASE.rpointer.ocn.ovf
........................../yyyy-mm-dd-sssss/$CASE.rpointer.ocn.restart
........................../yyyy-mm-dd-sssss/$CASE.rpointer.ocn.tavg

where YYYY = one year prior to yyyy.

3. Post-Processed Data Filenames

In the preceding section, naming conventions were established for model output data files generated by the CESM model as it executes. In this section, the conventions are extended in order to define rules for data files that result from the post-processing of CESM model output data.

Post-processed data files may include temporal averages (eg, seasonal, annual, or decadal), spatial averages (eg, zonal, meridional, global), timeseries, or other diagnosed quantities (eg, meridional overturning streamfunction, barotropic streamfunction). The following rules are intended to provide a consistent structure for naming each of these types of files.

The development of these naming conventions was guided by the desire to:

maintain a close and logical connection to the original model output filenames
allow for the easy identification of the processed files' contents from the filename itself
separate at a high level the standard CESM processed data, which are available to the general CESM community, from the CESM model output data or the more specialized processed data, such as data intended for the IPCC inter-comparison project
allow for the creation of unique filenames

Users are free to use their own naming conventions in their own personal directories, but filenames of all data files that are written to NCAR HPSS /CCSM/csm/ directories must follow the conventions described in this document.

If there are particular post-processing circumstances that are not addressed in this document, it is important to discuss them with CSEG first, prior to creating non-standard filenames. Only after the issues have been resolved and the documentation updated should the files be created with new naming conventions.

3.1 Filename Requirements for Post-Processed CESM Data

Data files which result from the post-processing of /CCSM/csm model output may be stored in the /CCSM/csm directory on the NCAR mass-storage system only if the filenames conform to the following general format:

$DIRNAME/$FILENAME

where

$DIRNAME = /CCSM/csm/$CASE/$gcomp/$subdir/$tdir/$tperiod

$FILENAME = $CASE.$scomp.$type.$SSTRING.[$TPREFIX.]$TSTRING[.$ending]

Quantities within square brackets [] are optional; $CASE, $gcomp, $scomp, and $type are as defined in Section 1.

The following are definitions of the various components of $DIRNAME and $FILENAME; note that several examples follow below, to illustrate the use of these options.

$subdir = (proc,ipcc)
$subdir differentiates broad classifications of processed data files at a high level. proc is used for the standard suite of CESM post-processing; ipcc is used for the more specialized processing required for the IPCC intercomparisons (data have been re-gridded and variable names have been changed)

$tdir = (tavg,tseries)
$tdir is used to distinguish time-averages from time series; note that the tseries directory can include timeseries of time-averaged quantities; see examples below

$tperiod = [hourly${N},daily,monthly,seasonal,annual,${N}year]
$tperiod denotes the time period over which the data were processed. ${N} = 2,3,... Use of $tperiod = seasonal is deferred; see $TPREFIX below

$SSTRING
$SSTRING provides a flexible means to describe non-temporal aspects of the file contents; the rules governing $SSTRING are listed below

$TPREFIX
$TPREFIX is intended to serve the highly specialized function of identifying specific seasons in the case in which $tperiod = seasonal. This option is deferred, pending further discussion

$TSTRING
$TSTRING provides a means to describe temporal aspects of the file contents, either a specific time period or a range of time periods represented in the processed file; the rules governing $TSTRING are listed below

$ending = [nc,bin]
$ending is an optional filename suffix used to describe the file format
- $ending = .nc indicates the file is in netCDF format
- $ending = .bin or $ending absent indicates the file is in native binary format

Both $SSTRING and $TSTRING have additional rules that are intended to allow for the creation of a unique filename that helps to unambiguously identify the contents of the file:

$SSTRING Format: substring1[_substring2[_substring3...]]

$SSTRING Rules:

The complete absence of $SSTRING has a particular meaning: there have been no spatial operations done on the original model history-file contents, and all of the original history-file variables are contained within the post-processed data file
$SSTRING may contain one or more "descriptors," each of which is denoted by substring${N}, ${N}=1,2,... in the format above. A descriptor identifies important aspects of the file contents, such as the names of fields that have been extracted from the original history files, or spatial operations that have been performed. Certain standard descriptor names have been established and are cataloged below
If there are multiple descriptors in $SSTRING, each is separated from the other by the underscore character ("_")
$SSTRING may contain field information. For netCDF files, use the short_name value(s), such as UU, VVEL, or UVEL_VVEL
The absence of a field name in $SSTRING indicates that all fields from the original history file are included in the processed file
Naming conventions for some spatial operators have been established:

zavg -- zonally averaged data
gavg -- globally averaged data
nh -- northern-hemisphere averaged data
sh -- southern-hemisphere averaged data
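
The $SSTRING rules above can be sketched in a few lines of Python. The function name build_sstring is hypothetical, and placing field names before spatial operators is an assumption about ordering that these rules do not mandate.

```python
# Spatial operator names established in this section.
SPATIAL_OPERATORS = {"zavg", "gavg", "nh", "sh"}


def build_sstring(fields=(), operators=()):
    """Join field names and spatial operators with underscores.

    An empty result means $SSTRING is absent: all fields are present and
    no spatial operation was applied.
    """
    unknown = sorted(set(operators) - SPATIAL_OPERATORS)
    if unknown:
        raise ValueError(f"unknown spatial operator(s): {unknown}")
    return "_".join([*fields, *operators])


print(build_sstring(["UVEL", "VVEL", "WVEL"]))
print(build_sstring(["UU"], ["zavg"]))
```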

The format and rules for $TSTRING, which is intended to indicate the time or time periods of the original data files which were processed in order to create this file, are as follows:

$TSTRING Format: datestring1[_operator_datestring2]

$TSTRING Rules:

$TSTRING datestrings must follow the conventions for model output files, eg
- yyyy-mm-dd-sssss -- instantaneous
- yyyy-mm-dd -- daily average
- yyyy-mm -- monthly average
- yyyy -- annual average
$TSTRING datestrings may be separated by an operator which describes the temporal operations that were performed on the original data file(s) in order to produce this file. The following temporal operators are defined
- tavg -- time average between consecutive time slices
- cat -- concatenation of all consecutive time slices contained within a series of history files, which creates a timeseries
If present, the $TSTRING temporal operator is separated from datestring1 and datestring2 by the underscore character ("_")
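
As a sketch of how the $TSTRING rules combine with the $FILENAME format, the Python below assembles a post-processed filename from its parts. The function names are hypothetical, and $TPREFIX is omitted because its use is deferred.

```python
# Temporal operators defined in the $TSTRING rules.
TEMPORAL_OPERATORS = ("tavg", "cat")


def build_tstring(date1, operator=None, date2=None):
    """Return datestring1, or datestring1_operator_datestring2."""
    if operator is None and date2 is None:
        return date1
    if operator not in TEMPORAL_OPERATORS or date2 is None:
        raise ValueError("a range needs an operator (tavg or cat) and a second date")
    return f"{date1}_{operator}_{date2}"


def build_filename(case, scomp, ftype, tstring, sstring="", ending="nc"):
    """Assemble $CASE.$scomp.$type.[$SSTRING.]$TSTRING[.$ending]."""
    parts = [case, scomp, ftype]
    if sstring:               # an absent $SSTRING contributes no element
        parts.append(sstring)
    parts.append(tstring)
    if ending:                # an absent $ending implies native binary
        parts.append(ending)
    return ".".join(parts)


name = build_filename("b40.1850.track1.1deg.006", "pop", "h",
                      build_tstring("0200", "tavg", "0249"), sstring="TEMP")
print(name)
```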

IMPORTANT NOTE: Because of character-string limitations on the NCAR MSS, the above $TSTRING conventions could not be followed for CCSM4/CESM CMIP5 cases. From April 2010 until the retirement of the NCAR MSS in 2011, the following departures from the conventions were in effect:

"yyyymm-yyyymm" was used instead of "yyyy-mm_cat_yyyy-mm"
if the case name included the string "track1", it was excluded from the file name (but not the path name).
if the case name was still too long, and it contained the string "1deg" or "2deg", this part of the string was changed to "1d" or "2d"

The upshot is that a postprocessed file that should have been named, for example:

/CCSM/csm/b40.20th.track1.1deg.007/lnd/proc/tseries/monthly/b40.20th.track1.1deg.007.clm2.h0.QSNWCPICE_NODYNLNDUSE.1850-01_cat_2005-12.nc

(138 characters long) was written instead as

/CCSM/csm/b40.20th.track1.1deg.007/lnd/proc/tseries/monthly/b40.20th.1deg.007.clm2.h0.QSNWCPICE_NODYNLNDUSE.185001-200512.nc

(125 characters) which is under the character limit. Note that the $CASE is never changed in the path, just in the actual filename.
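
The three departures can be sketched as a small Python function. This is an illustration under stated assumptions, not the script that was actually used: the function name is hypothetical, the character limit is not given in this document and is therefore a caller-supplied parameter (max_len), and "still too long" is interpreted here as the full path exceeding that limit.

```python
import re

# yyyy-mm_cat_yyyy-mm  ->  yyyymm-yyyymm
_CAT_RANGE = re.compile(r"(\d{4})-(\d{2})_cat_(\d{4})-(\d{2})")


def cmip5_filename(dirname, case, remainder, max_len):
    """Apply the April 2010 to 2011 departures to the filename only.

    dirname   -- $DIRNAME, which always keeps the unmodified $CASE
    case      -- $CASE as it would normally begin the filename
    remainder -- everything in the filename after "$CASE."
    max_len   -- ASSUMED path-length limit; the real value is not documented here
    """
    remainder = _CAT_RANGE.sub(r"\1\2-\3\4", remainder)
    # Drop the "track1" element from the case name used in the filename.
    short_case = ".".join(p for p in case.split(".") if p != "track1")
    name = f"{short_case}.{remainder}"
    # Only if the path is still too long: 1deg -> 1d, 2deg -> 2d.
    if len(f"{dirname}/{name}") > max_len:
        short_case = short_case.replace("1deg", "1d").replace("2deg", "2d")
        name = f"{short_case}.{remainder}"
    return name


print(cmip5_filename(
    "/CCSM/csm/b40.20th.track1.1deg.007/lnd/proc/tseries/monthly",
    "b40.20th.track1.1deg.007",
    "clm2.h0.QSNWCPICE_NODYNLNDUSE.1850-01_cat_2005-12.nc",
    max_len=128,  # placeholder value for illustration only
))
```

With the placeholder limit shown, the function reproduces the shortened filename from the example above; a smaller limit would additionally trigger the 1deg to 1d substitution.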

3.1.1 Standard Filename String Names

Several descriptive names have been identified as "standard" CESM analysis string names and are therefore restricted to the meaning defined below. Presently, standard names have been established only for post-processed ocean files. Users of other CESM components are encouraged to submit additional candidates for standard string names, and should be free to develop suitable descriptive string names to suit their own applications, as long as they do not conflict with the reserved strings listed below.

In the event that more than one, but not all, of the original component output fields are extracted into a processed file, the user must decide how to describe the file contents. For a small number of fields, the user may decide that it is best to combine the individual field names, such as UVEL_VVEL_WVEL. For a large number of fields, the user may instead decide to define a meaningful string which describes the collection of fields. The user who decides to define such a string must register that string name with the CSEG prior to storing the files in the official proc or ipcc directories.

SST (ocn) -- sea surface temperature
SSS (ocn) -- sea surface salinity
BSF (ocn) -- barotropic streamfunction
k${N} (ocn) -- vertical k-index, where ${N} =1,2,... (eg, k1 means "k=1")
${N}m (ocn) -- vertical distance in meters from the surface; eg, 50m means "at 50m depth"
hyphen ("-") -- the hyphen character can be embedded in a single substring to indicate an inclusive range (eg, k1-k3 means "k=1 through k=3, inclusive")

3.1.2 Examples

Combinations of the various string names are intended to provide sufficient flexibility to describe the file contents. The following examples are used to illustrate standard practices:

Directory names ($DIRNAME)

/CCSM/csm/b40.1850.track1.1deg.006/atm/proc/tseries/annual
/CCSM/csm/b40.1850.track1.1deg.006/cpl/proc/tseries/monthly
/CCSM/csm/b40.1850.track1.1deg.006/ice/proc/tavg/annual
/CCSM/csm/b40.1850.track1.1deg.006/ocn/proc/tavg/10year
/CCSM/csm/b40.1850.track1.1deg.006/atm/ipcc/tavg/annual

One Field

/CCSM/csm/b40.1850.track1.1deg.006/atm/proc/tseries/b40.1850.track1.1deg.006.cam2.h0.UU.0400-01_cat_0499-12.nc

timeseries of monthly average atmospheric UU velocities, for the period January 400 through December 499, inclusive

Multiple Fields

/CCSM/csm/b40.1850.track1.1deg.006/ocn/proc/tavg/5year/b40.1850.track1.1deg.006.pop.h.UVEL_VVEL_WVEL.0300_tavg_0304.nc

5-year time average of pop U, V, and W velocities, averaged over years 300 through 304, inclusive

All Fields

/CCSM/csm/b40.1850.track1.1deg.006/atm/proc/tavg/10year/b40.1850.track1.1deg.006.cam2.h0.0400_tavg_0409.nc

10-year time average of all atmospheric model h0 history fields, averaged over years 400 through 409, inclusive (note the absence of $SSTRING)

Zonal, Global, and Hemispheric Averages

/CCSM/csm/b40.1850.track1.1deg.006/atm/proc/tseries/monthly/b40.1850.track1.1deg.006.cam2.h0.UU_zavg.0400-01_cat_0499-12.nc
timeseries of the monthly zonally averaged atmospheric model UU velocity field, for the period January 0400 through December 0499, inclusive

/CCSM/csm/b40.1850.track1.1deg.006/ice/proc/tseries/monthly/b40.1850.track1.1deg.006.cice.h.vvel_nh.0400-01_cat_0499-12.nc
timeseries of the monthly northern-hemisphere averaged ice-model v velocity field for the period January 400 through December 499, inclusive

Annual Averages

/CCSM/csm/b40.1850.track1.1deg.006/ocn/proc/tavg/annual/b40.1850.track1.1deg.006.pop.h.0880.nc
annual average of all ocean-model history fields, averaged over year 0880

/CCSM/csm/b40.1850.track1.1deg.006/ocn/proc/tavg/50year/b40.1850.track1.1deg.006.pop.h.TEMP.0200_tavg_0249.nc
50-year average of the pop temperature field, TEMP, averaged over years 200 through 249, inclusive

/CCSM/csm/b40.1850.track1.1deg.006/ocn/proc/tseries/annual/b40.1850.track1.1deg.006.pop.h.SALT.0200_cat_0249.nc
timeseries of the annual averages of the pop salinity field, SALT, from years 200 through 249, inclusive

Designating Time Periods in Filenames
- annual average -- although the conventions allow for more than one way to identify an annual average, the use of $TSTRING = YYYY is recommended; eg,
  
  /CCSM/csm/b40.1850.track1.1deg.006/ocn/proc/tavg/b40.1850.track1.1deg.006.pop.h.0500.nc
  is the annual average of all pop history fields for year 0500

Miscellaneous
- cat and tavg usage -- note the following associations between _cat_ and tseries, and between _tavg_ and tavg:

  /CCSM/csm/b40.1850.track1.1deg.006/atm/proc/tseries/b40.1850.track1.1deg.006.cam2.h0.UU.0400-01_cat_0499-12.nc
  /CCSM/csm/b40.1850.track1.1deg.006/atm/proc/tavg/10year/b40.1850.track1.1deg.006.cam2.h0.0400_tavg_0409.nc

3.1.3 Conventions for Ensemble Averages

Conventions for Ensemble-Average files have not been addressed by this document. The development of good filenaming conventions for these ensemble-average files will require thoughtful analysis of existing requirements.
