
Haze detection using persistent homology

Cite as: AIP Conference Proceedings 2111, 020012 (2019); https://doi.org/10.1063/1.5111219


Published Online: 27 June 2019

N. F. S. Zulkepli, M. S. M. Noorani, F. A. Razak, M. Ismail, and M. A. Alias


© 2019 Author(s).
Haze Detection Using Persistent Homology
N. F. S. Zulkepli 1, a), M. S. M. Noorani 2, F. A. Razak 3, M. Ismail 4 and M. A. Alias 5
1, 2, 3, 4, 5 School of Mathematical Sciences, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia
a) Corresponding author: farihasyaqina@yahoo.com

Abstract. Haze is an environmental issue that greatly affects human health, the economy and ecology. Particulate matter with an aerodynamic diameter below 10 micrometers (PM10) is the major pollutant during haze periods. Existing methods focus on statistical analysis to provide a quantitative analysis of PM10. Persistent homology is a tool in topological data analysis (TDA) that provides qualitative information, known as the topological features of data, by detecting birth and death points that persist across multiple scales. This raises a question about the relation between persistent homology and haze: can persistent homology detect haze? This study addresses the question by describing the qualitative structure of PM10 and detecting topological changes during haze episodes from 2000 to 2015 at the Klang air quality monitoring station. The paper shows that the topological features change during haze episodes.

INTRODUCTION

Haze and air pollution are common issues in the Klang Valley area of Malaysia. As an industrial area and a mainstream economic region, the Klang Valley has experienced a deterioration of air quality due to industrialization and urbanization [1, 2, 3]. The Klang air quality monitoring station is frequently affected by traffic-related pollution since it is located on a major road in an industrial and high-density residential area [4]. During haze episodes, Klang station is one of the locations that is severely affected [5]. Particulate matter (PM10) has been identified as a major pollutant during haze episodes, which are caused by biomass burning in Sumatra and Kalimantan, both in neighboring Indonesia [3, 4].
Severe haze was recorded in 2005, 2013, 2014 and 2015 [5]. In August 2005, the highest PM10 concentration at Klang station was recorded at 590 μg/m³ [6]. The Malaysian Ambient Air Quality Guidelines (MAAQG) state that 24-hour average PM10 concentrations exceeding 150 μg/m³ are considered unhealthy for humans [7]. Klang station showed a drastic increase in PM10 concentration during the haze episode of 2005. Although previous work [6, 8, 9] provides statistical analysis by comparing PM10 concentrations with the MAAQG, it does not provide qualitative information (topological features) such as 0-dimensional features (connected components) and 1-dimensional features (holes).
Topological features of air pollution data might provide new information and useful knowledge through analysis of the “shapes” of the data. This work uses a tool of topological data analysis (TDA) known as persistent homology to study PM10 at Klang station (2000-2015) during haze episodes by looking at changes in topological features (or “shapes”). Persistent homology gives a precise definition of topological features in computational topology and encodes the information in a barcode and a persistence diagram (PD). This method has been applied to detect financial crises [10], wheeze signals [11] and complex dynamical systems [12]. However, previous work has not explored persistent homology on environmental issues such as haze. This paper addresses this research gap by studying the topological features extracted from PM10 data using persistent homology to detect haze periods.

The 2018 UKM FST Postgraduate Colloquium
AIP Conf. Proc. 2111, 020012-1–020012-6; https://doi.org/10.1063/1.5111219
Published by AIP Publishing. 978-0-7354-1843-1/$30.00

DATA

The daily average PM10 data from 1 January 2000 to 31 December 2015 were obtained from the Department of the Environment, Malaysia (DOE). This work characterizes the topological features of PM10 concentration at the Klang air quality monitoring station month by month, based on the daily data. In total, 192 monthly data sets of PM10 (16 years × 12 months) from Klang station were used in this study. The Malaysian Ambient Air Quality Guidelines (MAAQG) set the safe level of PM10 for human health at 150 μg/m³; concentrations that exceed this level can have adverse health effects. Missing data were treated by mean substitution [13]. The time series of daily PM10 concentration at Klang station from 1 January 2000 to 31 December 2015 is shown in Fig. 1.
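
As a minimal sketch of the pre-processing described above, the Python snippet below fills missing readings by mean substitution and flags days above the MAAQG level. It assumes that missing days are marked with `None` and that the mean is taken over all observed values supplied; the text does not state which mean (overall or monthly) was used. The function names and sample values are hypothetical and are not DOE data.

```python
from statistics import mean

MAAQG_LIMIT = 150.0  # 24-hour PM10 guideline in ug/m^3, as stated in the text


def mean_substitute(values):
    """Replace missing readings (None) with the mean of the observed readings."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to average")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def exceedance_days(values, limit=MAAQG_LIMIT):
    """Return the indices of days whose concentration exceeds the guideline."""
    return [i for i, v in enumerate(values) if v > limit]


# Hypothetical daily averages (not real DOE data); None marks a missing day.
daily = [60.0, None, 90.0, 210.0, None, 40.0]
filled = mean_substitute(daily)
print(filled)
print(exceedance_days(filled))
```

Filling with the mean leaves the average of the series unchanged but reduces its variability, which is worth keeping in mind when many days in one month are missing.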

[Figure 1 plot: vertical axis, concentration (ticks 0 to 700); horizontal reference line labelled MAAQG.]

FIGURE 1. Time series of PM10 concentration for Klang station from 1 January 2000 to 31 December 2015. The red line is the Malaysian Ambient Air Quality Guidelines (MAAQG) level of 150 μg/m³.

BACKGROUND

Phase Space Construction

Time delay embedding is a technique for time series data and is required in this work. Because this research analyses time series data in a topological sense, the topological information is not readily available from the series in its standard form. However, by applying Takens’ embedding theorem [14] with time delay $\tau$ and embedding dimension $m$, a time series $x_0, x_1, \ldots, x_{n-1}$ can be reconstructed in phase space as the vectors $x_n(m, \tau) = (x_n, x_{n+\tau}, \ldots, x_{n+(m-1)\tau})$. In this work, the time delay $\tau$ and embedding dimension $m$ were fixed at 1 and 3, respectively. Since the data are analysed by month, the same values of $\tau$ and $m$ are applied to every monthly segment of the daily time series, because different settings across months with haze and without haze would affect the results. A previous study [15] used $\tau = 1$ and $m = 3$, and another study [16] chose $m = 3$.
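
The reconstruction above can be sketched in a few lines of Python. This is an illustrative helper, not the authors' code: the name `delay_embed` and the sample series are hypothetical, and the defaults simply mirror the values of the time delay (1) and embedding dimension (3) stated in the text.

```python
def delay_embed(series, m=3, tau=1):
    """Return the delay vectors (x_n, x_{n+tau}, ..., x_{n+(m-1)tau})."""
    if m < 1 or tau < 1:
        raise ValueError("m and tau must be positive integers")
    # The last valid start index leaves room for m - 1 further steps of size tau.
    last_start = len(series) - (m - 1) * tau
    return [
        tuple(series[n + k * tau] for k in range(m))
        for n in range(max(last_start, 0))
    ]


# Hypothetical short series, for illustration only.
x = [10, 20, 30, 40, 50]
cloud = delay_embed(x, m=3, tau=1)
print(cloud)
```

A series of length N yields N - (m - 1)·tau vectors, so a 31-day month gives 29 points in three dimensions under these settings.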

Persistent Homology

Data studied by persistent homology [17] are usually finite metric spaces known as point cloud data. A point cloud is data represented by a sequence of points in Euclidean space, $\mathbb{R}^n$ [18]. Extracting topological features from point cloud data requires the computation of simplicial complexes. The main interest of this study in persistent homology is the formation of birth and death points with respect to a scaling parameter $\epsilon$ (the filtration value). These are obtained by constructing simplicial complexes on the point cloud data (see the Vietoris–Rips (VR) construction below). From these constructions, 0-dimensional features (connected components), 1-dimensional features (holes), 2-dimensional features (voids) and so on are produced [17].
A common choice of simplicial complex is the Vietoris–Rips (VR) complex. Two points in the point cloud are connected if the distance between them is less than or equal to $\epsilon = \epsilon_1$, which forms the simplicial complex $VR_{\epsilon_1}$. As the filtration value increases from $\epsilon_1$ to $\epsilon_2$, more points become connected, and $VR_{\epsilon_1}$ is contained in $VR_{\epsilon_2}$. This produces a filtered simplicial complex, $VR_{\epsilon_1} \subseteq VR_{\epsilon_2}$, and simplicial complexes continue to form as the filtration value $\epsilon$ increases [19]. The birth points $b$ and death points $d$ of topological features are obtained from the Vietoris–Rips complexes with respect to the filtration value $\epsilon$.
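
To illustrate the nesting of Vietoris–Rips complexes, the sketch below lists only the edges (the 1-skeleton) at two filtration values, using the rule stated above that two points are joined when their distance is at most the filtration value. Higher-dimensional simplices are omitted for brevity. The function name `rips_edges` and the point cloud are hypothetical.

```python
from itertools import combinations
from math import dist


def rips_edges(points, eps):
    """Return index pairs (i, j) whose Euclidean distance is at most eps."""
    return {
        (i, j)
        for (i, p), (j, q) in combinations(enumerate(points), 2)
        if dist(p, q) <= eps
    }


# Hypothetical point cloud in R^3.
points = [(0, 0, 0), (1, 0, 0), (1, 2, 0), (5, 5, 5)]
edges_small = rips_edges(points, eps=1.0)
edges_large = rips_edges(points, eps=2.5)
print(sorted(edges_small))
print(sorted(edges_large))
print(edges_small <= edges_large)  # every edge at the smaller scale survives
```

The subset check at the end is the edge-level version of the inclusion between the complexes at the smaller and larger filtration values.
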
All features captured by persistent homology are displayed in a barcode plot. A barcode is a graphical representation that visualizes the birth and death points $(b, d)$ of the features as a collection of horizontal lines plotted against the filtration value $\epsilon$. Each feature is represented by one horizontal line whose left end point is the birth point and whose right end point is the death point [18]. Another representation of $(b, d)$ is the persistence diagram (PD), which summarizes the features as a two-dimensional point set with multiplicities. A point $(b, d)$ with multiplicity $p$ represents $p$ features that appear for the first time at filtration value $\epsilon_1 = b$ and disappear at $\epsilon_2 = d$. Every topological feature appears before it disappears, so $b < d$ and its point lies above the diagonal line $b = d$. The difference between the death and birth values, $d - b$, gives the persistence of a feature. A feature is more persistent the larger the value of $d - b$. The $x$-axis and $y$-axis of the PD represent the birth $b$ and death $d$ of the features, respectively.
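
As a small worked example of these definitions, the sketch below groups birth and death pairs by multiplicity and ranks them by persistence (death minus birth). The pairs are made up for illustration and are not results from the PM10 data; `summarize_diagram` is a hypothetical helper name.

```python
from collections import Counter


def summarize_diagram(pairs):
    """Group (birth, death) pairs by multiplicity and rank them by persistence."""
    multiplicity = Counter(pairs)
    rows = [(b, d, d - b, p) for (b, d), p in multiplicity.items()]
    # Most persistent features first.
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical (birth, death) pairs, for illustration only.
pairs = [(0, 5), (0, 5), (0, 40), (12, 15)]
for b, d, persistence, p in summarize_diagram(pairs):
    print(f"b={b}, d={d}, persistence={persistence}, multiplicity={p}")
```

In a persistence diagram the pair (0, 40) would sit farthest above the diagonal, while (12, 15) would lie close to it.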

RESULTS

The main objective of this study is to characterize the topological features during haze episodes, since such features will be useful for haze detection. Based on the chronology of haze in Malaysia, four major haze episodes that affected the Klang Valley area occurred between 2000 and 2015: in August 2005, June 2013, March 2014, and September and October 2015 [5]. The underlying idea is to capture such events through their topological features, which can be done using persistent homology, a tool in TDA.
The first step is to pre-process the data by applying time delay embedding to the daily average PM10 time series for each month at Klang station. The time delay embedding with $\tau = 1$ and $m = 3$ transforms the data into higher-dimensional ($m = 3$) data. The data are partitioned by month to form monthly point clouds. Each point cloud represents the phase space with $\tau = 1$ and $m = 3$ for one month (January 2000 to December 2015). Next, the Vietoris–Rips (VR) simplicial complex is constructed for each point cloud with respect to the filtration value $\epsilon$. The maximum filtration value is fixed at a constant ($\epsilon_{\max} = 700$). Thus, the filtration value $\epsilon$ lies between 0 and 700, i.e. $0 \le \epsilon \le \epsilon_{\max}$.
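
The pre-processing step can be sketched as follows. The snippet groups a daily series by calendar month and embeds each month separately, which is one reading of the procedure above; it assumes that delay vectors do not cross month boundaries, a detail the text does not spell out. The function name and the placeholder values are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def monthly_point_clouds(start, values, m=3, tau=1):
    """Group daily values by (year, month), then delay-embed each month."""
    by_month = defaultdict(list)
    for offset, value in enumerate(values):
        day = start + timedelta(days=offset)
        by_month[(day.year, day.month)].append(value)

    clouds = {}
    for key, series in by_month.items():
        n_vectors = max(len(series) - (m - 1) * tau, 0)
        clouds[key] = [
            tuple(series[n + k * tau] for k in range(m)) for n in range(n_vectors)
        ]
    return clouds


# Hypothetical placeholder values covering January and February 2000.
values = [float(v) for v in range(60)]  # 31 days in January, 29 in February 2000
clouds = monthly_point_clouds(date(2000, 1, 1), values)
for key, cloud in clouds.items():
    print(key, len(cloud))
```

Applied to the full 2000 to 2015 record, this grouping would produce one point cloud per month, each ready for the Vietoris–Rips construction.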
Persistent homology was computed using the R package TDA [20]. All the topological information was captured in barcodes and persistence diagrams (PD). The topological features of months with haze and without haze were compared, and this paper analyses the topological differences for connected components (0-dimensional) and holes (1-dimensional). December is one of the months that experiences the northeast monsoon, which brings heavy rainfall to the country and washes the pollutants down to the ground [6], so December months were taken as months without haze. Fig. 2 shows the barcodes and persistence diagrams (PD) for the selected months with severe haze (August 2005, June 2013, March 2014, September 2015 and October 2015) [5] at Klang and for months in which no haze occurred (December 2005, December 2013, December 2014 and December 2015) [6].

FIGURE 2. (a) Barcodes for months of severe haze. (b) The corresponding persistence diagrams (PD) for (a). (c) Barcodes for months with no haze. (d) The corresponding persistence diagrams (PD) for (c). The black lines and black dots represent 0-dimensional features. The red lines and red triangles represent 1-dimensional features.

In Fig. 2, the black lines and black dots represent connected components (0-dimensional features), and the red lines and red triangles represent holes (1-dimensional features). During haze episodes (Fig. 2(a)), the black lines of the connected components are longer than those in Fig. 2(c), which indicates a topological difference between months with haze and months without haze. A long black line means that the feature persists over a wider range of filtration values. The birth points (left end points) of connected components are always 0. In Fig. 2(b), there are holes (red triangles) far from the origin, whereas the holes in Fig. 2(d) accumulate close to the origin. Fig. 2 thus shows clear differences in the barcodes and PDs between months with haze and months without haze: during haze, connected components die at larger filtration values and holes lie farther from the origin. The basic technique for identifying haze is to observe the concentration of PM10. A high concentration of PM10 indicates a deterioration of air quality in an area, which results in the haze phenomenon. This work provides an alternative technique that detects haze in a qualitative sense. Since the topological features differ between months with haze and months without haze, we conclude that persistent homology can detect haze.
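
The observation that connected components die later during haze can be illustrated with a toy computation. The sketch below is not the R package TDA and does not reproduce Fig. 2; it is a Kruskal-style union-find pass over pairwise distances, which gives the death values of 0-dimensional features in a Vietoris–Rips filtration under the convention that an edge enters at its length. Both series are synthetic, and all names are hypothetical.

```python
from itertools import combinations
from math import dist


def h0_deaths(points):
    """Death values of 0-dimensional features in a Vietoris-Rips filtration.

    Every component is born at 0 and dies when an edge first merges it into
    another component. The last surviving component never dies and is omitted.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(length)
    return deaths


def embed(series, m=3, tau=1):
    """Delay-embed a series into m-dimensional vectors with delay tau."""
    return [
        tuple(series[n + k * tau] for k in range(m))
        for n in range(len(series) - (m - 1) * tau)
    ]


# Synthetic series, NOT measured PM10 values.
calm = [50, 52, 49, 51, 50, 53, 48, 50]
spiky = [50, 52, 49, 300, 450, 53, 48, 50]

print(max(h0_deaths(embed(calm))))
print(max(h0_deaths(embed(spiky))))
```

In this toy case the series with a sharp spike produces a much larger maximum death value, which corresponds to a longer bar in the 0-dimensional barcode.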

CONCLUSION

This work provides an innovative approach that characterizes PM10 using persistent homology and leads to haze detection. While statistical analysis of PM10 focuses on quantifying its concentration, this work provides a new approach for observing the qualitative information (topological features) of the data. The results show changes in the behavior of topological features for months with severe haze episodes at Klang station. These results illustrate that topological features extracted using persistent homology can provide topological information about the air pollutant PM10, and that such features can be used for haze detection. This work offers an alternative method that complements existing statistical methods for detecting haze. In the future, these topological features will be developed for early detection of haze involving other pollutants such as carbon monoxide (CO), nitrogen dioxide (NO₂) and ozone (O₃).

ACKNOWLEDGMENTS

The authors would like to express their utmost gratitude to Universiti Kebangsaan Malaysia for the Research
University Grant (DIP-2017-011), Ministry of Education Malaysia Grant FRGS/1/2017/STG06/UKM/01/1 and to
Department of the Environment (DOE) for providing the air quality data. The authors also gratefully acknowledge
Mr. Muhammad Taufik Bin Mohd Yusof from School of Mathematical Sciences, UKM for being part of the persistent
homology research group.

REFERENCES
1. O. H. L. Ling, K. H. Ting, A. Shaharuddin, A. Kadaruddin, and M. J. Yaakob, Environment Asia 3, 123–128 (2010).
2. O. H. L. Ling, K. H. Ting, A. Shaharuddin, A. Kadaruddin, and M. J. Yaakob, Sains Malaysiana 41, 179–191 (2012).
3. S. Z. Azmi, M. T. Latif, A. S. Ismail, and L. Juneng, Air Quality, Atmosphere and Health 3, 53–64 (2010).
4. R. Afroz, M. N. Hassan, and N. A. Ibrahim, Environmental Research 92, 71–77 (2003).
5. Department of Environment Malaysia, Chronology of haze episodes in Malaysia, available at http://www.doe.gov.my.
6. N. F. F. M. Yusof, N. A. Ramli, A. S. Yahaya, N. Sansuddin, N. A. Ghazali, and W. A. Madhoun, Environ. Monit. Assess. 163, 655–667 (2010).
7. Department of Environment Malaysia, General information of air pollutant index, available at http://www.doe.gov.my.
8. S. R. A. Rahman, S. N. S. Ismail, M. F. Ramli, M. T. Latif, E. Z. Abidin, and S. M. Praveena, World Environment 5, 1–11 (2015).
9. A. M. Abdullah, M. A. A. Samah, and Y. J. Tham, Open Environ. Sci. 6, 13–19 (2012).
10. M. Gidea and Y. Katz, Physica A: Statistical Mechanics and its Applications 491, 820–834 (2018).
11. S. Emrani, T. Gentimis, and H. Krim, IEEE Signal Process. Lett. 21, 459–463 (2014).
12. K. Mittal and S. Gupta, Chaos: An Interdisciplinary Journal of Nonlinear Science 27, 051102 (2017).
13. T. D. Pigott, Educational Research and Evaluation 7, 353–383 (2001).
14. F. Takens, in Dynamical Systems and Turbulence, Warwick 1980 (Springer, 1981), pp. 366–381.
15. Y. Umeda, Transactions of the Japanese Society for Artificial Intelligence 32(3), D-G72_1-12 (2017).
16. F. A. Khasawneh and E. Munch, Proceedings of the ASME 2014 International Mechanical Engineering Congress and Exposition (2014).
17. G. Carlsson, Bulletin of the American Mathematical Society 46, 255–308 (2009).
18. R. Ghrist, Bulletin of the American Mathematical Society 45, 61–75 (2008).
19. C. M. M. Pereira and R. F. de Mello, Expert Systems with Applications 42, 6026–6038 (2015).
20. B. T. Fasy, J. Kim, F. Lecci, C. Maria, and V. Rouvreau, Statistical tools for topological data analysis, see https://cran.r-project.org/web/packages/TDA/TDA.pdf.
