Atmospheric Chemistry and Physics An interactive open-access journal of the European Geosciences Union
© Author(s) 2018. This work is distributed under
the Creative Commons Attribution 4.0 License.
Research article
05 Jan 2018
Review status
This discussion paper is a preprint. A revision of this manuscript was accepted for the journal Atmospheric Chemistry and Physics (ACP) and is expected to appear here in due course.
Associativity Analysis of SO2 and NO2 for Alberta Monitoring Data Using KZ Filtering and Hierarchical Clustering
Joana Soares1, Paul Andrew Makar1, Yayne-abeba Aklilu2, and Ayodeji Akingunola1
1Air Quality Modelling and Integration Section, Air Quality Research Division, Environment and Climate Change, Toronto (ON), M3H 5T4, Canada
2Environmental Monitoring and Science Division, Alberta Environment and Parks, Edmonton (AL), Canada
Abstract. Associativity analysis is a powerful tool to deal with large-scale datasets by clustering the data on the basis of (dis)similarity, and can be used to assess the efficacy and design of air-quality monitoring networks. We describe here our use of Kolmogorov-Zurbenko filtering and hierarchical clustering of NO2 and SO2 passive and continuous monitoring data, to analyse and optimize air quality networks for these species in the province of Alberta, Canada. The methodology applied in this study assesses dissimilarity between monitoring station time series based on two metrics: 1-R, R being the Pearson correlation coefficient, and the Euclidean distance. We have combined the analytic power of hierarchical clustering with the spatial information provided by deterministic air quality model results, using the gridded time series of model output as potential station locations, as a proxy for assessing monitoring network design and for network optimization. We find that both metrics should be used to evaluate the similarity between monitoring time series, since this allows a cross-comparison in terms of temporal variation and magnitude of concentrations to assess potential station redundancy. Here, the relative level of potential redundancy of an existing monitoring location was ranked according to each dissimilarity metric, with sites forming clusters at low values of both 1-R and Euclidean distance being the most redundant. We demonstrate that clustering results depend on the air contaminant analyzed, reflecting the difference in the respective emission sources of SO2 and NO2 in the region under study. Our work shows that much of the signal identifying the sources of NO2 and SO2 emissions resides in shorter time scales (hourly to daily) due to short-term variation of concentrations. The methodology nevertheless identifies stations mainly influenced by seasonality if larger time scales (weekly to monthly) are considered.
We have found that data consisting of longer-term averages may lose the short-term variation needed to identify local sources, implying that long-term averaged observations are not suitable for source identification purposes. In addition to averaging time, round-off levels in data reports and the accuracy of instrumentation were also shown to have a negative influence on the clustering results. We have performed the first dissimilarity analysis based on gridded air-quality model output, and have shown that the methodology is capable of generating maps of sub-regions within which a single station will represent the entire sub-region, to a given level of dissimilarity. Maps of this nature may be combined with other georeferenced data (e.g. road networks, power availability) to assist in monitoring network design. We have also shown that our methodology is capable of identifying different sampling methodologies, as well as identifying outliers (station time series that are markedly different from all others in a given dataset).
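The two dissimilarity metrics named in the abstract, and the reason both are needed, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the shrinking-window treatment of series edges, the average-linkage rule, the filter settings (window 3, 2 iterations) and the three synthetic "stations" are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math
from statistics import mean


def kz_filter(series, window, iterations):
    """Kolmogorov-Zurbenko filter: repeated centred moving average.

    Assumption: the window simply shrinks at the series edges.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    out = list(series)
    for _ in range(iterations):
        out = [mean(out[max(0, i - half): i + half + 1]) for i in range(len(out))]
    return out


def one_minus_r(x, y):
    """1 - R, with R the Pearson correlation (undefined for constant series)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def euclidean(x, y):
    """Euclidean distance between two equally long time series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def agglomerate(data, metric):
    """Agglomerative clustering with average linkage (an assumed choice).

    Returns the merges in order as (cluster_a, cluster_b, dissimilarity_level).
    """
    clusters = [(name,) for name in data]

    def linkage(c1, c2):
        return mean(metric(data[a], data[b]) for a in c1 for b in c2)

    merges = []
    while len(clusters) > 1:
        level, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merges.append((clusters[i], clusters[j], level))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical hourly series over ten days (not real monitoring data).
hours = range(240)
raw = {
    "A": [10 + 5 * math.sin(2 * math.pi * t / 24) for t in hours],
    "B": [20 + 10 * math.sin(2 * math.pi * t / 24) for t in hours],  # same timing, doubled magnitude
    "C": [10 + 5 * math.cos(2 * math.pi * t / 24) for t in hours],   # same magnitude, shifted timing
}

# Light smoothing before clustering; the settings are illustrative only.
filtered = {name: kz_filter(values, 3, 2) for name, values in raw.items()}

merges_r = agglomerate(filtered, one_minus_r)
merges_e = agglomerate(filtered, euclidean)
print("first merge under 1-R:      ", merges_r[0][:2])
print("first merge under Euclidean:", merges_e[0][:2])
```

In this synthetic case the two metrics pair the stations differently: 1-R groups the stations that vary together in time regardless of magnitude, while the Euclidean distance groups those with similar concentration levels. A pair that merges at low levels under both metrics would be the candidate for potential redundancy in the sense described above.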
Citation: Soares, J., Makar, P. A., Aklilu, Y.-A., and Akingunola, A.: Associativity Analysis of SO2 and NO2 for Alberta Monitoring Data Using KZ Filtering and Hierarchical Clustering, Atmos. Chem. Phys. Discuss., in review, 2018.


Latest update: 25 Apr 2018
Short summary
Grouping data on the basis of (dis)similarity can be used to assess the efficacy of monitoring networks. The data are cross-compared in terms of temporal variation and magnitude of concentrations, and sites are ranked according to their level of potential redundancy. The methodology can be applied to measurement data, helping to identify sites with different measuring technologies or data flaws, and to model output, generating maps of the areas that a monitoring site spatially represents.