Blind Evaluation of Lossy Data-Compression in LENS
We are currently conducting a blind experiment to evaluate the impact of data compression on the LENS community project data. We have contributed three additional runs to the LENS project and have compressed and reconstructed the CAM output of one or two of these new ensemble runs. The challenge is for climate scientists to identify which of the additional ensemble members (31-33) have had their atmospheric data compressed and reconstructed. In particular, we are interested in feedback from the climate community detailing which ensemble member(s) you believe have been compressed and, more importantly, why. We are especially interested in details of the analysis that led to your conclusion.
For more information, see our preliminary work in [1], which proposes a number of quality metrics that can be used to determine whether it is acceptable to compress particular variables from the Community Earth System Model (CESM). Results in [1] indicate that it is possible to achieve a compression ratio of 5:1 using the fpzip [2, 3] compression algorithm without introducing statistically distinguishable changes to output fields. We have also applied fpzip to the LENS project data and obtained the following average compression ratios (compressed size divided by original size, so smaller is better) for the monthly, daily and 6-hourly LENS data:
COMPARISON OF COMPRESSION METHODS ON LENS DATA
COMPRESSION RATIO | MONTHLY | DAILY | 6-HOURLY |
---|---|---|---|
netCDF-4 compressed/original | 0.51 | 0.70 | 0.63 |
fpzip compressed/original | 0.15 | 0.22 | 0.18 |
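The table reports compressed size as a fraction of original size, while the text quotes compression as "N to 1". The short Python sketch below converts between the two conventions using the values from the table; the helper names `to_ratio` and `space_saved` are illustrative only and are not part of fpzip or any LENS tooling.

```python
# Compressed/original size fractions copied from the table above.
fractions = {
    "netcdf-4": {"monthly": 0.51, "daily": 0.70, "6-hourly": 0.63},
    "fpzip": {"monthly": 0.15, "daily": 0.22, "6-hourly": 0.18},
}


def to_ratio(fraction):
    """Convert a compressed/original fraction into the N of an N:1 ratio."""
    return 1.0 / fraction


def space_saved(fraction):
    """Fraction of the original storage that compression frees up."""
    return 1.0 - fraction


for method, by_frequency in fractions.items():
    for frequency, frac in by_frequency.items():
        print(
            f"{method:9s} {frequency:9s} "
            f"{to_ratio(frac):4.1f}:1  saves {space_saved(frac):.0%}"
        )
```

Under this conversion, a 5:1 ratio corresponds to a compressed/original fraction of 0.20, which falls within the range of the fpzip values in the table (0.15 to 0.22).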
Please direct questions or feedback to Allison Baker (abaker at ucar.edu).
[1] A.H. Baker, H. Xu, J.M. Dennis, M.N. Levy, D. Nychka, S.A. Mickelson, J. Edwards, M. Vertenstein, A. Wegener, "A Methodology for Evaluating the Impact of Data Compression on Climate Simulation Data," Proc. of the 23rd International ACM Symposium on High Performance Parallel and Distributed Computing (HPDC14), Vancouver, Canada, 2014, pp. 203-214.
[2] Peter Lindstrom and Martin Isenburg, "Fast and Efficient Compression of Floating-Point Data," IEEE Transactions on Visualization and Computer Graphics, 12(5):1245-1250, September-October 2006.
[3] http://computation.llnl.gov/casc/fpzip/