The parallel IO (PIO) library is included with CCSM4 and is built automatically as part of the CCSM build. Several CCSM4 components use PIO to read and/or write data. PIO provides a set of interfaces that support serial netcdf, parallel netcdf, or binary IO transparently. Users can configure PIO at run time to switch the IO method (serial netcdf, parallel netcdf, or binary data) and to tune various PIO parameters that affect IO performance.
CCSM4 prefers that data be written in CF-compliant netcdf format to a single file that is independent of any parallel decomposition information. Historically, data was written by gathering global arrays onto a root processor and then writing them from that processor to an external file with serial netcdf. Reading reversed the process: the root processor read the data and scattered it. This method is relatively robust, but it does not scale in memory or in performance and leaves little room for tuning, because each full global array must fit in the root processor's memory and all IO passes through that single task.
PIO works as follows. When PIO is initialized, it is given the IO method (serial netcdf, parallel netcdf, or binary data) and the number and layout of the IO processors. These IO parameters define the set of processors that take part in IO, which can range from a single processor to all of them. The data, its name, and its decomposition are also provided to PIO, and the model writes data through the PIO interface in its own model-specific decomposition. Inside PIO, the data is rearranged into a "stride 1" decomposition on the IO processors, in which each IO processor holds a contiguous section of the global array, and is then written either serially using netcdf or in parallel using pnetcdf.
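The rearrangement step can be pictured with a small single-process Python model. This is a sketch under stated assumptions, not PIO's implementation or API: it assumes a 1-D global field, compute tasks that own arbitrary global indices (round-robin in the example), and an IO decomposition in which each IO task receives one contiguous block. The names `io_block_bounds` and `rearrange` are hypothetical, and real PIO moves the data with MPI messages rather than Python lists.

```python
from typing import Dict, List, Tuple


def io_block_bounds(n_global: int, n_io: int) -> List[Tuple[int, int]]:
    """Split global indices [0, n_global) into n_io contiguous, near-equal blocks."""
    base, extra = divmod(n_global, n_io)
    bounds, start = [], 0
    for io_rank in range(n_io):
        size = base + (1 if io_rank < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def rearrange(compute_data: Dict[int, Dict[int, float]], n_global: int, n_io: int) -> List[List[float]]:
    """Move every (global index -> value) pair from its compute task to the IO
    task whose contiguous block contains that index.

    compute_data maps compute-task rank -> {global index: value} in the
    model's own decomposition (hypothetical representation); the result
    holds one contiguous buffer per IO task.
    """
    bounds = io_block_bounds(n_global, n_io)
    io_buffers = [[0.0] * (hi - lo) for lo, hi in bounds]
    for owned in compute_data.values():
        for gidx, value in owned.items():
            # Linear search over the IO blocks keeps the sketch short;
            # a real rearranger would precompute this mapping.
            for io_rank, (lo, hi) in enumerate(bounds):
                if lo <= gidx < hi:
                    io_buffers[io_rank][gidx - lo] = value
                    break
    return io_buffers


if __name__ == "__main__":
    n_global = 10
    # Round-robin model decomposition over 4 compute tasks: task t owns t, t+4, t+8, ...
    compute_data = {t: {g: float(10 * g) for g in range(t, n_global, 4)} for t in range(4)}
    io_buffers = rearrange(compute_data, n_global, n_io=2)
    print(io_buffers)  # [[0.0, 10.0, 20.0, 30.0, 40.0], [50.0, 60.0, 70.0, 80.0, 90.0]]
    # A "serial write" emits the IO tasks' contiguous chunks one after another,
    # which reproduces the global array in file order.
    written = [v for chunk in io_buffers for v in chunk]
    print(written == [float(10 * g) for g in range(n_global)])  # True
```

With `n_io=1` the single IO task ends up holding the whole array, which is the classic gather-to-root case.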
Using PIO has several benefits. First, even with serial netcdf, memory use can be reduced significantly, because the global arrays are decomposed across the IO processors and written serially in chunks, so each IO processor holds only its own chunk rather than an entire global array. This becomes critical as CCSM4 runs at higher resolutions, where limited memory makes it essential to minimize global arrays. Second, pnetcdf can be turned on transparently, potentially improving IO performance. Third, PIO parameters such as the number of IO tasks and their layout can be tuned on a machine-by-machine basis to reduce memory use and optimize performance. Fourth, the traditional approach of gathering and writing (or reading and scattering) global arrays can be recovered by setting the number of IO tasks to 1 and using serial netcdf.
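The memory argument in the first and fourth points is simple arithmetic. The sketch below assumes a hypothetical 3600 x 2400 grid with 60 levels stored as 8-byte values, and it counts only the IO-side data buffer (not rearranger or netcdf internal buffers). It compares the largest buffer any one IO task must hold with 1 IO task (the classic gather-to-root case) and with 64 IO tasks. The function name `io_buffer_bytes` is hypothetical.

```python
def io_buffer_bytes(n_global_points: int, n_io_tasks: int, bytes_per_value: int = 8) -> int:
    """Largest per-IO-task data buffer when a global array is split into
    n_io_tasks contiguous blocks (ignores rearranger and netcdf overheads)."""
    if n_io_tasks < 1:
        raise ValueError("need at least one IO task")
    # Integer ceiling division: the biggest block gets the leftover points.
    points_per_task = (n_global_points + n_io_tasks - 1) // n_io_tasks
    return points_per_task * bytes_per_value


if __name__ == "__main__":
    # Hypothetical high-resolution 3-D field: 3600 x 2400 columns, 60 levels.
    n_points = 3600 * 2400 * 60
    for n_io in (1, 64):
        print(n_io, io_buffer_bytes(n_points, n_io))
    # prints:
    # 1 4147200000
    # 64 64800000
```

With one IO task the buffer is the whole field, roughly 4.1 GB; with 64 IO tasks it shrinks by a factor of 64 to roughly 65 MB per IO task.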
By default, CCSM4 uses the serial netcdf implementation of PIO, and pnetcdf support in PIO is turned off. Several components provide namelist inputs that allow PIO to use pnetcdf. To use pnetcdf, a pnetcdf library (analogous to the netcdf library) must be available on the local machine, and PIO must be built with pnetcdf support turned on. This is done as follows:
1. Locate the local copy of pnetcdf. It must be version 1.1.1 or later.
2. Set LIB_PNETCDF in the Macros file to the directory containing the pnetcdf library (e.g. /contrib/pnetcdf1.1.1/lib).
3. Add PNETCDF_PIO to the pio CONFIG_ARGS variable in the Macros file, and set it to the top-level directory of a standard pnetcdf installation (e.g. /contrib/pnetcdf1.1.1).
4. If the model has already been built, run the clean_build script.
5. Run the build script to rebuild PIO and the full CCSM4 system.
6. Change the component IO namelist settings to pnetcdf, and set an appropriate number of IO tasks and their layout.
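Choosing the number of IO tasks and their layout in the last step can be reasoned about with a sketch like the following. It assumes, hypothetically, that a layout is described by three values: the number of IO tasks, a stride between them, and the rank of the first IO task. The names `n_io_tasks`, `stride`, and `root` are illustrative only and are not the actual namelist variable names, which should be taken from each component's documentation. The function lists the MPI ranks that would perform IO and rejects layouts that do not fit within the component's task count.

```python
from typing import List


def io_task_ranks(n_tasks: int, n_io_tasks: int, stride: int, root: int = 0) -> List[int]:
    """Return the ranks that act as IO tasks: root, root + stride, root + 2*stride, ...

    Hypothetical layout model for illustration; raises ValueError if the
    layout does not fit within n_tasks ranks.
    """
    if n_io_tasks < 1 or stride < 1 or root < 0:
        raise ValueError("need n_io_tasks >= 1, stride >= 1 and root >= 0")
    ranks = [root + i * stride for i in range(n_io_tasks)]
    if ranks[-1] >= n_tasks:
        raise ValueError(f"layout needs rank {ranks[-1]}, but only {n_tasks} tasks exist")
    return ranks


if __name__ == "__main__":
    # 128 tasks on hypothetical 16-core nodes: a stride of 16 puts one IO task on each node.
    print(io_task_ranks(128, 8, 16))  # [0, 16, 32, 48, 64, 80, 96, 112]
    # A single IO task reproduces the classic root-only IO.
    print(io_task_ranks(128, 1, 1))  # [0]
```

Setting the stride equal to the number of cores per node is one common way to spread IO memory use and network traffic across nodes rather than concentrating it on one.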
The PNETCDF_PIO variable tells PIO to build with pnetcdf support turned on. The LIB_PNETCDF variable tells the CCSM Makefile to link in the pnetcdf library at the link step of the CCSM4 build.
There is an ongoing effort among CCSM, PIO developers, pnetcdf developers, and hardware vendors to understand and improve IO performance in the various library layers. To learn more about PIO, see http://code.google.com/p/parallelio.