NCAR's Experience Porting and Running CESM2 on a Medium-sized Linux Cluster
NCAR typically runs fully coupled CESM with the finite-volume dynamical core on large supercomputers, using 4096 cores on yellowstone and 2160 cores on cheyenne. However, we also port, run, and regularly test CESM on a more moderately sized Linux cluster.
NCAR's Climate and Global Dynamics (CGD) division maintains a medium-sized Linux cluster called hobart to support research and development.
This page details our experiences on hobart that might help other institutions port and run CESM2 on their Linux clusters.
* NOTE * This page is for informational purposes only. Please use the DiscussCESM forums to post questions about porting and running on your particular Linux cluster.
Linux Cluster Hardware Specifications
- Single login node with the following specifications:
Hostname : hobart
Operating System : CentOS Linux release 7.2.1511 (Core) x86_64
Kernel : 3.10.0-327.el7.x86_64
Processor(s) : 16 X Intel(R) Xeon(R) CPU W5580 @ 3.20GHz
CPU MHz : 3192.072
Total Memory : 74.05 GB
Total Swap : 1.04 GB
- 32 compute nodes with the following specifications for each node:
Operating System : CentOS Linux release 7.2.1511 (Core) x86_64
Kernel : 3.10.0-327.el7.x86_64
Processor(s) : 48 X Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
CPU MHz : 2300.000
Total Memory : 98.59 GB
Total Swap : 1.04 GB
- Available shared disk space for run and build directories :
- 5.0 TB
- Interconnect network fabric :
- QLogic InfiniBand, QDR with PSM
Linux Cluster Software Specifications
- CESM2 release code
- Python 2.7 or later in the 2.x series (Python 3 is not supported)
- Queueing system : PBS
- Fortran 2003 compiler :
- Test Case #1 used Intel 126.96.36.199
- Test Case #2 used GNU 5.4.0
- MPI library
- Test Case #1 used mvapich v2.2.1 compiled with Intel and QLogic libraries
- Test Case #2 used openmpi v2.0.2 compiled with gcc
- NetCDF4 library
- Test Case #1 used NetCDF v4.3.2 compiled with Intel, and parallel-NetCDF v1.7.0 compiled with Intel and the mvapich v2.2.1 library
- Test Case #2 used NetCDF v188.8.131.52 compiled with gcc-5.4.0
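Before attempting a port, it can help to confirm that the software listed above is visible on the login node. The following is a minimal sketch, not part of CESM or CIME: the helper names `python_version_ok` and `have_tool` are hypothetical, and the command names checked (`python`, `qsub`, `mpif90`, `nc-config`) are assumptions about how a typical cluster exposes Python, PBS, the MPI Fortran wrapper, and NetCDF.

```shell
#!/bin/sh
# Hypothetical preflight helpers; these are illustrative, not CESM/CIME tools.

# Succeeds when a version string such as "2.7.5" is Python 2.7 or later
# within the 2.x series (Python 3 is rejected).
python_version_ok() {
    major=${1%%.*}      # text before the first dot
    rest=${1#*.}        # text after the first dot
    minor=${rest%%.*}   # text before the next dot
    [ "$major" = "2" ] && [ "$minor" -ge 7 ] 2>/dev/null
}

# Succeeds when the named command is found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report which of the assumed command names are available.
for tool in python qsub mpif90 nc-config; do
    if have_tool "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done
```

On a real system the version string could be obtained with `python -c 'import sys; print("%d.%d" % sys.version_info[:2])'` and passed to `python_version_ok`. Clusters that use environment modules may need the relevant modules loaded before these commands appear on PATH.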
Test Case #1 Description
- Fully-coupled 1850 simulation at 1 degree resolution using 32 nodes (1536 cores) and 16 nodes (768 cores)
create_newcase --case b.e20.B1850.f09_g17.01.intel --res f09_g17 --compset B1850 --compiler intel --mpilib mvapich2
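The core counts quoted for the test cases follow from the 48 cores per compute node given in the hardware specifications. As a small sketch of that arithmetic (the function names `total_cores` and `nodes_needed` are hypothetical and only for illustration):

```shell
#!/bin/sh
# Total cores available when using a given number of nodes.
total_cores() {
    nodes=$1
    cores_per_node=$2
    echo $(( $nodes * $cores_per_node ))
}

# Smallest number of whole nodes that can hold a given number of MPI tasks
# (integer ceiling division).
nodes_needed() {
    tasks=$1
    cores_per_node=$2
    echo $(( ($tasks + $cores_per_node - 1) / $cores_per_node ))
}

total_cores 32 48     # prints 1536
total_cores 16 48     # prints 768
nodes_needed 768 48   # prints 16
```

The same calculation applies to other clusters by substituting their own cores-per-node value.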
Test Case #2 Description
- Fully-coupled 1850 simulation at 1 degree resolution using 16 nodes (768 cores)
create_newcase --case b.e20.B1850.f09_g17.01.gnu --res f09_g17 --compset B1850 --compiler gnu --mpilib openmpi
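After `create_newcase`, a CESM2 case is normally set up, built, and submitted from inside the new case directory. The sketch below prints those steps as a dry run rather than executing them. It assumes the case directory is created under the current directory with the name passed to `--case`; the `run` and `case_workflow` helpers and the `DRY_RUN` variable are hypothetical, and the sketch does not reproduce the balanced processor layout used for the timings reported on this page.

```shell
#!/bin/sh
# Dry-run sketch: print each step instead of executing it.
# Set DRY_RUN=0 to execute for real (requires a ported CESM2 installation).
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Standard CESM2 case steps, run from inside the case directory.
case_workflow() {
    casedir=$1
    run cd "$casedir"
    run ./case.setup    # create the run scripts and user namelist files
    run ./case.build    # compile the model for this case
    run ./case.submit   # submit the run to the batch system (PBS here)
}

case_workflow b.e20.B1850.f09_g17.01.gnu
```

Printing the commands first makes it easy to review the sequence before running it on a new machine.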
Fully-coupled Run Performance
The CESM2.0 timing table lists the timings and the balanced model decomposition across MPI tasks for the test cases listed above. Your performance may vary depending on your cluster configuration.
Porting and Testing
This NCAR medium-sized Linux cluster is included in routine regression testing as part of CESM model development. Please refer to the CIME User's Guide for details regarding porting and testing on a new machine.