The amount of data generated during CCSM simulations would quickly become unmanageable without establishing data management conventions. The goals of the CCSM data conventions are unifying the format of data files generated by the various CCSM components, cataloging the data generated by numerous CCSM production runs, documenting the contents of a data file, and enabling software tools that display data and perform operations on data files. Also see the CCSM Data Policy at www.cesm.ucar.edu/experiments.
CCSM selected netCDF as the standard data format for CCSM related datasets. All CCSM models either create netCDF or provide a filter to convert history files into netCDF. The use of netCDF makes CCSM output data readily accessible to a variety of existing graphics and analysis packages.
Here are the CCSM netCDF Conventions adopted for the output datasets of the CCSM. The convention is designed for the representation of gridded geophysical data. The CCSM netCDF Convention is very similar to the COARDS Conventions, with a few exceptions and additions to meet CCSM requirements.
The CCSM also has Case and File Name conventions to help keep track of the numerous simulations and their output data.
This document contains a description of the netCDF conventions adopted for the output datasets of the NCAR CCSM. The convention is designed for the representation of gridded geophysical data.
The netCDF interface enables but does not require the creation of self-describing datasets. The purpose of this convention is to require conforming datasets to contain sufficient metadata that they are self-describing in the sense that each variable in the file has an associated description of what it represents, including physical units if appropriate, and that each value can be located in space and time. Also required is a higher level description pertaining to the variables as a group that gives the source of the data and a history of manipulations that may have been performed on it.
An important potential benefit of a convention is that it can enable software tools that display data and perform operations on specified subsets of the data to do their tasks with minimal user intervention. It is possible using netCDF to provide the metadata describing how a field is located in time and space in many different ways that a human reader would immediately recognize as equivalent. The purpose in restricting how the metadata is represented is to make it practical to write software that allows a machine to be able to figure out what data goes where without need for human intervention. The main restrictions we impose are to require the use of coordinate variables and several variable attributes. No restrictions are placed on the order of a variable's dimensions.
The long_name attribute is required for all variables.
The use of coordinate variables is required for all dimensions that correspond to 1D space or time coordinates. Since coordinate variables as defined in section 2.3.1 of the NetCDF User's Guide are only adequate for describing rectilinear grids, we define a coordinates attribute below to deal with data on irregular grids.
The units attribute is required for all variables used to describe coordinates, and for all other variables for which it is appropriate. The values of the units attributes are character arrays that are recognized by UNIDATA's udunits package whenever possible. Exceptions are that the units "degree(s)" are not allowed. This convention describes some new values of the units attribute to be used in certain circumstances in which no udunits conformable value exists.
The global attributes title, source, and history are required to provide a minimal description of what the data is, where it came from, and what's been done to it.
The Conventions attribute is required to identify the location of this document.
In addition to the required features listed above, several conventions have been established to help in the description of gridded data that does not represent pointwise values of the field, but rather represents some characteristic of the field over intervals in space and/or time. A common example of this is time averaged data.
This convention has been significantly influenced by the COARDS convention. The differences that may result in files that are not COARDS compliant are the following:
The NCAR-CCSM convention is more general than COARDS. It is possible to write netCDF files that are compliant with both NCAR-CCSM and COARDS, but NCAR-CCSM conventions don't restrict files to being COARDS compliant.
The long_name attribute is required for all variables. The value is a character array that contains a long descriptive name.
The units attribute is required for all variables that describe coordinates, and for all other variables whose values represent a dimensional quantity. The values of the units attributes are character arrays that are recognized by udunits whenever possible. The units degree(s) are not allowed (see section 11.2.3 for appropriate units of longitude and latitude). udunits doesn't have specifications for dimensionless quantities like percent, ppb, and fraction. We recommend that the units attribute still be used, but have not standardized any dimensionless specifiers except for those to indicate vertical coordinates as defined below.
The term coordinate variable is defined in the netCDF User's Guide to be a 1D array that is used to describe a coordinate in a rectilinear grid. This convention uses the term coordinate variable in that sense. When non-rectilinear grids are used then coordinate information must be represented using higher dimension arrays which do not fit the coordinate variable model. To handle this case the coordinates attribute is introduced below. The term coordinate will be used generically to refer to either the 1D or multidimensional case.
The use of coordinate variables is required for all space or time coordinates that can be represented as 1D arrays.
The values of a coordinate variable must be either strictly increasing or strictly decreasing. The values need not be evenly spaced. Missing values are not permitted. A multidimensional coordinate may use the _FillValue or missing_value attributes to indicate grid points that are not used, e.g., in a reduced grid where the number of longitudes on a latitude circle decreases as the latitude gets closer to the poles.
The units attribute is required for all space and time coordinates. The reason for this requirement is that the units attributes can be used by
applications (possibly in conjunction with other metadata) to identify what type of coordinate axis corresponds to each of a variable's
dimensions. This correspondence does not need to be established by placing restrictions on the dimension ordering.
A. Time
We require the units for a time coordinate to be parsable by udunits as in the following modified excerpt from the udunits documentation:
days since 1992-10-8 15:15:42.5 -6:00
indicates days since October 8th, 1992 at 3 hours, 15 minutes and 42.5 seconds in the afternoon in the time zone which is six hours to the west of Coordinated Universal Time (i.e. Mountain Daylight Time). The time zone specification can also be written without a colon using one or two-digits (indicating hours) or three or four digits (indicating hours and minutes).
The acceptable units for time are listed in the file udunits.dat. The most commonly used of these strings (and their abbreviations) includes day (d), hour (hr, h), minute (min) and second (sec, s). Plural forms are also acceptable. The date string may include date alone; date and time; or date, time, and time zone. The date string is required.
We recommend that the unit year be used with caution. The udunits package defines a year to be exactly 365.242198781 days (the interval between 2 successive passages of the sun through vernal equinox). It is not a calendar year. Udunits includes the following definitions for years: a common_year is 365 days, a leap_year is 366 days, a Julian_year is 365.25 days, and a Gregorian_year is 365.2425 days.
The calendar calculations done by the udunits package use a mixed Gregorian/Julian calendar, i.e., dates prior to 1582-10-15 are assumed to use the Julian calendar. Time coordinates that use other calendars are thus not able to make use of the udunits library for this purpose. However, it is still required to use the time unit format described above as this contains all the information required to make calendar calculations once the calendar has been specified. We describe a calendar attribute for the time coordinate variable below that may be used for this purpose.
Coordinate variables representing climatological time (an axis of 12 months, 4 seasons, etc. that is located in no particular year) should be encoded like other time axes but with the added restriction that they be encoded to begin in the year 0000. For example,
days since 0000-06-15 00:00 0
indicates days since beginning of the model run starting Jun 15, 0Z.
Time coordinates used for paleoclimate research my involve calendars based on different orbital parameters from those of the present. In this
case additional information besides that contained in the calendar attribute may be required. A description of the orbital parameters may be
included by using the attribute orbital_parameters described below. A complete description of the calendar may be included using the global
attribute define_calendar.
B. Dimensional Vertical Coordinates
The acceptable units for dimensional vertical (depth or height) coordinate variables are:
Plural forms are also acceptable.
The direction of positive (i.e., the direction in which the coordinate is increasing), whether up or down, cannot in all cases be inferred from the units. The direction of positive is useful for applications displaying the data. For this reason the attribute positive as defined in the COARDS convention is required if the vertical axis units are not a valid unit of pressure. Otherwise its inclusion is optional. The positive attribute may have the value up or down (case insensitive).
For example, if an oceanographic netCDF file encodes the depth of the surface as 0 and the depth of 1000 meters as 1000 then the axis would use attributes as follows:
axis_name:units="meters"; axis_name:positive="down";
If, on the other hand, the depth of 1000 meters were represented as -1000 then the value of the positive attribute would have been up. If the units attribute value is a valid pressure unit the default value of the positive attribute is down.
Notice that a vertical coordinate variable will be identifiable either by having units of pressure or by the presence of the positive attribute with a
value of up or down (case insensitive).
C. Dimensionless Vertical Coordinates
For dimensionless vertical coordinates we introduce conventions designed to facilitate the calculation of the corresponding dimensional coordinates that are required to locate the data spatially. For each of the dimensionless vertical coordinates described below an example of the convention in CDL notation is given for a dimension with the name z.
float z(z) ; z:long_name = "hybrid level at layer midpoints" ; z:units = "hybrid_sigma_pressure" ; z:positive = "down" ; z:A_var = "hyam" ; z:B_var = "hybm" ; z:P0_var = "pref" ; z:PS_var = "psurf" ;
A units attribute of hybrid_sigma_pressure means that the pressure at gridpoint (x(i),y(j),z(k)) is given by p(i,j,k) = A(k)*PO + B(k)*PS(i,j) where the variable names for A, B, P0, and PS are given by the attributes A_var, B_var, P0_var, and PS_var respectively.
float z(z) ; z:long_name = "sigma level at layer midpoints" ; z:units = "sigma_level" ; z:positive = "down" ; z:B_var = "z" ; z:P0_var = "ptop" ; z:PS_var = "psurf" ;
A units attribute of sigma_level means that the pressure at gridpoint (x(i),y(j),z(k)) is given by p(i,j,k) = P0 + B(k)*(PS(i,j)-P0) where the variable names for B, P0, and PS are given by the attributes B_var, P0_var, and PS_var respectively.
We recommend that the rule for converting from a dimensionless to a dimensional coordinate be included as a global attribute whose name is same as that of the units specifier with the string define_ prepended. Here is an example for the CCM3 vertical coordinate:
// global attributes: :define_hybrid_sigma_pressure = "\n", "Pressure at a grid point (lon(i),lat(j),lev(k)) is computed \n", "using the formula: \n", " p(i,j,k) = A(k)*PO + B(k)*PS(i,j) \n", "where A, B, PO, and PS are contained in the variables whose \n", "names are given by the attributes of the vertical coordinate \n", "variable A_var, B_var, P0_var, and PS_var respectively. \n", "" ;
D. Latitude
The recommended unit of latitude is degrees_north. Also acceptable are degree_north, degree_N, and degrees_N.
E. Longitude
The recommended unit of longitude is degrees_east (eastward positive). Also acceptable are degree_east, degree_E, and degrees_E. The unit degrees_west (westward positive) is not recommended because it implies a negative conversion factor from degrees_east.
Longitudes may be represented modulo 360. Thus, for example, -180, 180, and 540 are all valid representations of the International Dateline and 0
and 360 are both valid representations of the Prime Meridian. Note, however, that the sequence of numerical longitude values stored in the
netCDF file must be monotonic in a non-modulo sense.
F. Non-Rectilinear Coordinates
When multidimensional variables are required to describe a coordinate, these variables are identified as coordinates by use of the coordinates attribute. The value of the attribute is a string containing the names of variables that describe the coordinate system. There must be at least as many coordinates as there are dimensions for the field variable. There may be more however as in the case of describing a field defined along a trajectory through space.
Examples in CDL notation:
1. Latitude and longitude coordinates both requiring 2D variables:
dimensions: nlon = 128 ; nlat = 64 ; lev = 18 ; variables: float lon(nlat,nlon) ; lon:long_name = "longitude" ; lon:units = "degrees_east" ; float lat(nlat,nlon) ; lat:long_name = "latitude" ; lat:units = "degrees_north" ; float lev(lev) ; lev:long_name = "level" ; lev:units = "mbar" ; float T(lev,nlat,nlon) ; T:long_name = "temperature" ; T:units = "K" ; T:coordinates = "lon lat lev" ;
The coordinates attribute of T tells you that grid point (i,j,k) is located at (lon(i,j),lat(i,j),z(k)) (in FORTRAN index conventions). The location of this position in physical space is determined by the interpretations of the coordinates themselves. Because of this there is no restriction on the order in which the coordinate names appear in the coordinate attribute string. Notice that the vertical coordinate is required to use a coordinate variable.
2. Longitude coordinate requiring 2D variable and missing values:
dimensions: nlon = 128 ; lat = 64 ; variables: float lon(lat,nlon) ; lon:long_name = "longitude" ; lon:units = "degrees_east" ; lon:_FillValue = -999.f ; float lat(lat) ; lat:long_name = "latitude" ; lat:units = "degrees_north" ; float PS(lat,nlon) ; PS:long_name = "surface pressure" ; PS:units = "Pa" ; PS:coordinates = "lon lat" ; PS:_FillValue = 1.e36f
The 2D longitude coordinate indicates that the longitude values depend on the latitude, and there are missing values. This is the situation when using a grid in which the number of longitude points on a latitude circle decreases going towards the poles.
3. Trajectories:
dimensions: time = 1000 ; variables: float lon(time) ; lon:long_name = "longitude" ; lon:units = "degrees_east" ; float lat(time) ; lat:long_name = "latitude" ; lat:units = "degrees_north" ; float z(time) ; z:long_name = "level" ; z:units = "km" ; z:positive = "up" ; double time(time) ; time:long_name = "time" ; time:units = "days since 1970-01-01 00:00:00" ; time:calendar = "gregorian" ; float O3(time) ; O3:long_name = "ozone concentration" ; O3:units = "ppbv" ; O3:coordinates = "lon lat z time" ;
The required global attributes are intended to provide information about where the data came from and what has been done to it. This information is mainly for the benefit of human readers. These attributes are all character arrays. For readability in ncdump outputs it is recommended to embed newline characters into the arrays to break them into lines.
In order to calculate a new date and time given a base date, base time and a time increment one must know what calendar to use. For this purpose we recommend that the attribute calendar be assigned to time coordinate (or be global) whenever its units are of the form time units since base date, base time. The values currently defined for calendar are:
In paleoclimate research the calendar may be described with reference to an orbit that is different from that of present day. This should be described by using an attribute of the time coordinate (or global) called orbital_parameters to give a list of the names of variables that contain the eccentricity, obliquity, and perihelion data. The method of attaching calendar dates to the orbit can be described using the global attribute define_calendar, for example:
// global attributes: :define_calendar = "\n", "First day and length of angular months for 126 kyr B.P. based \n", "on a 365 day year with vernal equinox fixed to March 21 (the \n", "Day/Month values refer to the present calendar): \n", " Day/Month Length \n", "January 25/12 34 \n", "February 28/01 31 \n", "March 28/02 32 \n", "April 01/04 30 \n", "May 01/05 29 \n", "June 30/05 27 \n", "July 26/06 28 \n", "August 24/07 28 \n", "September 21/08 28 \n", "October 18/09 32 \n", "November 20/10 32 \n", "December 21/11 34 \n", "" ;
It is often the case that data on a grid does not represent the point values of some field variable but instead represents some characteristic of the field (i.e., the result of some mathematical operation performed on the field values) over intervals of space and/or time. Typically this might be a weighted average or perhaps the minimum or maximum values over the intervals. Hence we require methods to describe both the characteristic of the field over intervals and what the intervals are that correspond to the grid points.
To represent the characteristic of the field over intervals we introduce attributes of the form coord_op, where the string coord should be replaced by the actual name of the coordinate, for example lat_op, or time_op. The values of these attributes are character arrays that describe the operation that has been performed on the corresponding coordinate. The currently defined values are:
"point" | value is at a point (default, does not need to be specified) |
"minimum" | minimum of values over an interval |
"maximum" | maximum of values over an interval |
"sum" | sum of values over an interval |
"average" | average of values over an interval |
"rms" | root mean square of values over an interval |
"range" | difference between maximum and minimum values over an interval |
coord_op attributes may be applied to individual variables or may be global which implies that the operation has been applied to all variables
which use the indicated coordinate. If coord_op is a global attribute, it may still be applied to a variable to override the value of the global
attribute.
To represent the intervals we add the attribute bounds to the appropriate coordinate. The value of bounds is the name of the variable that contains the interval boundaries. In the case where the grid is defined by coordinate variables, and where the intervals are contiguous this can be a variable whose dimension size is one larger than the dimension size of the corresponding coordinate variable. The relationship between a coordinate variable, say x, and the variable that contains the interval bounds, say x_bound, is that the value x(i) is contained in the interval with boundaries x_bound(i) and x_bound(i+1). In the more general case (still 1D) where the intervals may be disjoint or possibly overlapping then the variable containing the bounds is 2D with the 2nd dimension (in C notation) being the same size as the dimension of the corresponding coordinate variable. In this case the relationship is that the value x(i) is contained in the interval with boundaries x_bound(0,i) and x_bound(1,i).
Examples in CDL notation:
1. Time averaged variable, contiguous intervals:
dimensions: time_bound = 4 ; time = 3 ; variables: double time(time) ; time:long_name = "time" ; time:units = "days since 1970-01-01 00:00:00" ; time:calendar = "gregorian" ; time:bounds = "time_bound" ; double time_bound(time_bound) ; time_bound:long_name = "time interval boundaries" ; time_bound:units = "days since 1970-01-01 00:00:00" ; float gaTS(time) ; gaTS:long_name = "global average surface temperature" ; gaTS:units = "K" ; gaTS:time_op = "average" ; data: time = .25, .5, .75 ; time_bound = 0., .25, .5, .75 ;
The variable gaTS represents 6 hour time averages. The first time sample corresponds to a time average starting at 1970-01-01 0Z and ending at 1970-01-01 6Z. The values of the time coordinate are arbitrarily set to the end of the corresponding averaging interval.
2. Time averaged variable, disjoint intervals:
dimensions: d2 = 2 ; time = 3 ; variables: double time(time) ; time:long_name = "time" ; time:units = "days since 1970-01-01 00:00:00" ; time:calendar = "gregorian" ; time:bounds = "time_bound" ; double time_bound(d2,time) ; time_bound:long_name = "time interval boundaries" ; time_bound:units = "days since 1970-01-01 00:00:00" ; float gaTS(time) ; gaTS:long_name = "global average surface temperature" ; gaTS:units = "K" ; gaTS:time_op = "average" ; data: time = 31., 396., 761. ; time_bound(0,*) = 0., 365., 730. ; time_bound(1,*) = 31., 396., 761. ;
The variable gaTS represents monthly time averages for January 1970, 1971, and 1972.
It is useful to adopt a convention for certain types of dimensions that do not map into netCDF coordinate variables, but that aren't simply generic ordinal dimensions. For example, we save certain quantities associated with particular islands or ocean basins. Here the coordinate value is the character string containing the island or basin name which cannot be used as a netCDF coordinate. One can use character string variables that have the same name as the dimension with _label attached to associate these values with the corresponding dimensions. e.g,.
dimensions: time = UNLIMITED ; // (12 currently) z_t = 45 ; nchar = 132 ; islands = 8 ; basins = 8 ; variables: double time(time) ; time:long_name = "time" ; time:units = "days since 0000-00-00 00:00:00" ; float z_t(z_t) ; z_t:long_name = "Depth (T grid)" ; z_t:units = "centimeters" ; z_t:positive = "down" ; char islands_label(islands, nchar) ; islands_label:long_name = "Islands" ; char basins_label(basins, nchar) ; basins_label:long_name = "Ocean Basins" ; float T_horz(time, basins, z_t) ; T_horz:long_name = "Horizontal Average Potential Temperature" ; T_horz:units = "celsius" ; float pisle(time, islands) ; pisle:long_name = "Island Streamfunction" ; pisle:units = "centimeters^3/second" ;
For grids that are rectilinear in a cartographic projection space the units of the coordinate variables (typically meters or km) are not sufficient to allow one to determine which one corresponds to the abscissa and which to the ordinate. Nor do the units allow one to determine the order in which the coordinates should be used as arguments to a function that returns the corresponding longitude and latitude values. We suggest the attribute proj_coordinates be used to name the coordinate variables that represent the abscissa and ordinate respectively in the cartesian system (order is important here). proj_coordinates can be a variable attribute or a global attribute allowing variable override.
We recommend the USGS software package proj as the reference implementation for cartographic projections. This means that proj defines the parameter interface and transformation behavior for projections, although you are free to use any implementation that agrees with proj. To define the projection, we recommend therefore using the attribute proj_parameters whose value is a string that would be used to describe the projection to the proj software. The proj software is described in a user's manual that may be viewed by clicking on the highlighted proj string and then on the highlighted string PROJ.4.3.ps.gz.
Example in CDL notation:
Mercator projection, central meridian at 90W.
dimensions: x = 128 ; y = 64 ; z = 18 ; time = 100 ; variables: float x(x) ; x:long_name = "x-coordinate" ; x:units = "km" ; float y(y) ; y:long_name = "y-coordinate" ; y:units = "km" ; float z(z) ; z:long_name = "level" ; z:units = "km" ; z:positive = "up" ; double time(time) ; time:long_name = "time" ; time:units = "days since 1970-01-01 00:00:00" ; time:calendar = "noleap" ; float U(time,z,y,x) ; U:long_name = "zonal wind component" ; U:units = "m/s" ; U:proj_coordinates = "x y" ; // global attributes: :proj_parameters = "+proj=merc +lon_0=90W" ;
If the projection is not found in the proj software then use the attribute define_projection to supply a complete description or reference to where a description can be found.
We recommend making the direction of all flux quantities explicit by the use of the variable attribute flux_direction. The values should be directions such as up, down, north, south, east, or west.
NCAR Climate Systems Model
UNIDATA NetCDF Home Page
UNIDATA's NetCDF Conventions Page
UNIDATA udunits Software
U.S. Geological Survey proj Software
Brian Eaton, eaton@ncar.ucar.edu