Atmospheric Chemistry and Physics: an interactive open-access journal of the European Geosciences Union
Discussion papers
© Author(s) 2019. This work is distributed under
the Creative Commons Attribution 4.0 License.

Submitted as: research article 31 Jul 2019

Review status
This discussion paper is a preprint. A revision of the manuscript is under review for the journal Atmospheric Chemistry and Physics (ACP).

How should we aggregate data? Methods accounting for the numerical distributions, with an assessment of aerosol optical depth

Andrew M. Sayer (1, 2) and Kirk D. Knobelspiesse (2)
  • (1) GESTAR, Universities Space Research Association, Columbia, MD, USA
  • (2) NASA Goddard Space Flight Center, Greenbelt, MD, USA

Abstract. Many applications of geophysical data – whether from surface observations, satellite retrievals, or model simulations – rely on aggregates produced at coarser spatial (e.g. degrees) and/or temporal (e.g. daily, monthly) resolution than the highest available from the technique. Almost all these aggregates report the arithmetic mean and standard deviation as summary statistics, and these are the statistics that data users employ in their analyses. These statistics are most meaningful for Normally distributed data; however, for some quantities, such as aerosol optical depth (AOD), it is well known that distributions on large scales are closer to Lognormal, for which the geometric mean and standard deviation would be more appropriate. This study presents a method, based on the Shapiro-Wilk test, to assess whether a given sample of data is more consistent with an underlying Normal or Lognormal distribution, and applies it to AOD frequency distributions on a spatial scale of 1° and on daily, monthly, and seasonal temporal scales. A broadly consistent picture is obtained from Aerosol Robotic Network (AERONET), Multiangle Imaging Spectroradiometer (MISR), Moderate Resolution Imaging Spectroradiometer (MODIS), and Goddard Earth Observing System Version 5 Nature Run (G5NR) data. These data sets are complementary: AERONET has the highest AOD accuracy but is sparse; MISR and MODIS represent different satellite retrieval techniques and sampling; and, as a model simulation, G5NR is spatiotemporally complete. As time scales increase from days to months to seasons, the data become increasingly consistent with Lognormal rather than Normal distributions, and the difference between arithmetic and geometric mean AOD grows, with the geometric mean systematically smaller. Assuming Normality systematically overstates both the typical level of AOD and its variability.
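The distribution check and the two sets of summary statistics described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: in place of the full Shapiro-Wilk coefficients it uses the Shapiro-Francia statistic W′ (the squared correlation between the sorted sample and approximate expected Normal order statistics), a simpler and closely related approximation to Shapiro-Wilk W. A sample is labelled Lognormal when log(AOD) gives the larger W′; because both statistics come from the same sample size, and for a fixed size the p-value is a monotonic function of W′, the larger W′ also corresponds to the larger p-value. The function names (`shapiro_francia_w`, `classify`, `summary`) are hypothetical, and the sketch requires strictly positive AOD.

```python
import math
import random
import statistics
from statistics import NormalDist


def shapiro_francia_w(sample):
    """Shapiro-Francia W': squared correlation between the sorted sample and
    approximate expected order statistics of a standard Normal (Blom scores).
    Values near 1 indicate consistency with a Normal distribution."""
    x = sorted(sample)
    n = len(x)
    if n < 5:
        raise ValueError("need at least 5 values")
    std_normal = NormalDist()  # N(0, 1)
    m = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    x_mean, m_mean = statistics.fmean(x), statistics.fmean(m)
    sxm = sum((xi - x_mean) * (mi - m_mean) for xi, mi in zip(x, m))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    smm = sum((mi - m_mean) ** 2 for mi in m)
    if sxx == 0:
        raise ValueError("sample has zero variance")
    return sxm * sxm / (sxx * smm)


def classify(aod):
    """Label a sample 'Lognormal' if log(AOD) is closer to Normal than AOD itself."""
    if any(v <= 0 for v in aod):
        raise ValueError("log transform requires strictly positive AOD")
    w_lin = shapiro_francia_w(aod)
    w_log = shapiro_francia_w([math.log(v) for v in aod])
    label = "Lognormal" if w_log > w_lin else "Normal"
    return label, w_lin, w_log


def summary(aod):
    """Arithmetic and geometric mean/standard deviation of positive AOD values."""
    logs = [math.log(v) for v in aod]
    return {
        "arith_mean": statistics.fmean(aod),
        "arith_sd": statistics.stdev(aod),
        "geo_mean": math.exp(statistics.fmean(logs)),
        # geometric SD is a dimensionless multiplicative factor (>= 1)
        "geo_sd": math.exp(statistics.stdev(logs)),
    }


if __name__ == "__main__":
    rng = random.Random(42)
    # synthetic AOD sample with median 0.15 (illustrative, not real data)
    aod = [rng.lognormvariate(math.log(0.15), 0.6) for _ in range(200)]
    print(classify(aod))
    print(summary(aod))
```

Because the geometric mean of positive values can never exceed their arithmetic mean, `summary` always reports `geo_mean <= arith_mean`, consistent with the systematically smaller geometric means noted above.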
There is considerable regional heterogeneity in the results: in low-AOD regions such as the open ocean and mountains, the AOD difference is often small enough (< 0.01) to be unimportant for many applications, especially on daily time scales. However, in continental outflow regions and near source regions over land, and on monthly or seasonal time scales, the difference is frequently larger than the Global Climate Observing System (GCOS) goal uncertainty for a climate data record (the larger of 0.03 or 10 %). This is important because it shows that the choice of averaging method can, and often does, introduce systematic effects larger than the total GCOS goal uncertainty. Using three well-studied AERONET sites, the magnitude of estimated AOD trends is shown to be sensitive to the choice of arithmetic vs. geometric means, although the signs are consistent. The main recommendations from the study are that (1) the distribution of a geophysical quantity should be analysed in order to assess how best to aggregate it; (2) ideally, AOD aggregates such as satellite level 3 products (but also ground-based data and model simulations) should report the geometric mean or median rather than (or in addition to) the arithmetic mean AOD; and (3) as this is unlikely in the short term due to the computational burden involved, users can, as a stopgap, calculate geometric mean monthly aggregates from widely available daily mean data, since daily aggregates are less sensitive to the choice of aggregation scheme than monthly or seasonal ones. Further, distribution shapes can have implications for the validity of statistical metrics often used to compare and evaluate data sets. The methodology is not restricted to AOD and can be applied to other quantities.
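The stopgap in recommendation (3), together with the GCOS comparison and the trend sensitivity check, can be sketched as follows. This is a minimal sketch under stated assumptions: daily means arrive as hypothetical `(year, month, daily_mean_aod)` tuples; non-positive daily values are simply dropped (a simplification, since some retrievals can report slightly negative AOD); the 10 % part of the GCOS goal is taken relative to the monthly arithmetic mean, which is an assumption; and trends are ordinary least-squares slopes in AOD per year, not necessarily the trend method used in the study. All function names are hypothetical.

```python
import math
import random
import statistics
from collections import defaultdict


def monthly_aggregates(daily):
    """Aggregate daily mean AOD into monthly arithmetic and geometric means.

    `daily` is an iterable of (year, month, daily_mean_aod) tuples (hypothetical
    layout). Non-positive values are dropped because the geometric mean needs
    positive inputs. Returns {(year, month): (arith_mean, geo_mean, n_days)}.
    """
    by_month = defaultdict(list)
    for year, month, aod in daily:
        if aod > 0:
            by_month[(year, month)].append(aod)
    return {
        key: (statistics.fmean(v), statistics.geometric_mean(v), len(v))
        for key, v in sorted(by_month.items())
    }


def exceeds_gcos_goal(arith, geo):
    """True if |arith - geo| is larger than the GCOS goal uncertainty, taken here
    (assumption) as the larger of 0.03 or 10 % of the arithmetic mean."""
    return abs(arith - geo) > max(0.03, 0.10 * arith)


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    x_mean, y_mean = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


def trend_per_year(monthly):
    """Linear trends (AOD per year) of the monthly arithmetic and geometric means."""
    keys = list(monthly)
    t = [year + (month - 1) / 12 for year, month in keys]  # decimal years
    arith = [monthly[k][0] for k in keys]
    geo = [monthly[k][1] for k in keys]
    return ols_slope(t, arith), ols_slope(t, geo)


if __name__ == "__main__":
    rng = random.Random(0)
    daily = [
        # synthetic: median AOD rising slowly over three years (illustrative only)
        (y, m, rng.lognormvariate(math.log(0.12 + 0.01 * (y - 2016)), 0.7))
        for y in range(2016, 2019) for m in range(1, 13) for _ in range(28)
    ]
    monthly = monthly_aggregates(daily)
    flagged = sum(exceeds_gcos_goal(a, g) for a, g, _ in monthly.values())
    print(f"{flagged} of {len(monthly)} months exceed the assumed GCOS goal")
    print("trend (arith, geo) per year:", trend_per_year(monthly))
```

Working from decimal years rather than list positions keeps the slope correct when some months are missing, which is common for sparse records such as AERONET.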

Latest update: 20 Oct 2019
Published by Copernicus Publications.
Short summary
Data about the Earth are routinely obtained from satellite observations, model simulations, and ground-based or other measurements. These data come at different space and time scales, and it is common to average them to reduce gaps and make them easier to use. How the data should be averaged depends on the underlying distribution of the quantity. This study presents a method to determine how to aggregate data appropriately, and applies it to data sets describing atmospheric aerosol levels.