Abstract— Ozone analysis is the process of identifying meaningful patterns that facilitate the prediction of future trends. One of the common techniques used for ozone analysis is clustering, a popular method that contributes significant knowledge to time series data mining by aggregating similar data into specific groups. However, identifying significant patterns in ground-level ozone is quite a challenging task, especially after the clustering task has been applied. This paper proposes a pattern discovery method for ground-level ozone. The proposed method consists of Agglomerative Hierarchical Clustering (AHC) with Dynamic Time Warping (DTW) as the distance measure. AHC has been applied to a Malaysian ozone change dataset collected from Putrajaya for the year 2006. Subsequently, the patterns have been extracted using the Apriori Association Rules (AAR) algorithm. The experiment shows that the extracted patterns are related to CO, which is an interesting relation in accordance with the literature.

Keywords— Hierarchical Clustering, Dynamic Time Warping, Ground-Level Ozone, Apriori Association Rules

[...] would lead to inferior indications regarding the ozone. This is due to multiple reasons. First, the results of clustering change significantly depending on the clustering technique and the distance measure used. There are several clustering techniques, such as partitioning (e.g. k-means, k-medoids, etc.) and hierarchical (e.g. agglomerative and divisive). Likewise, there are different distance measures that can be used with the clustering technique, such as Euclidean, Minkowski and Dynamic Time Warping (DTW). This wide range of choices leads to different clustering results. In addition, evaluating the results of clustering is itself a challenging task, for which different evaluation methods have been proposed. All of these reasons make the process of identifying significant patterns from ozone clustering results a difficult task. This paper proposes the Apriori Association Rules algorithm in order to extract patterns from the clustering results. This paper is an extension of our previous study (Sammour and Othman, 2016).
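The pairing of agglomerative hierarchical clustering with DTW can be illustrated with a minimal sketch. It is not the implementation used in the study: the average-linkage merge rule, the absolute-difference local cost, and the four short "daily profiles" are assumptions made for illustration, and the names `dtw_distance` and `agglomerative` are hypothetical.

```python
from itertools import combinations


def dtw_distance(a, b):
    """Dynamic Time Warping distance with |x - y| as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] holds the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # b[j-1] is matched again
                                    cost[i][j - 1],      # a[i-1] is matched again
                                    cost[i - 1][j - 1])  # both series advance
    return cost[n][m]


def agglomerative(series, k):
    """Merge the two closest clusters until k clusters remain.

    Average linkage is an assumption; the source does not state the linkage.
    Returns a list of clusters, each a list of indices into `series`.
    """
    # Pairwise DTW distances between the individual series, computed once.
    dist = {(i, j): dtw_distance(series[i], series[j])
            for i, j in combinations(range(len(series)), 2)}

    def linkage(c1, c2):
        total = sum(dist[(min(i, j), max(i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(len(series))]
    while len(clusters) > k:
        # Pick the pair of clusters (p < q) with the smallest linkage value.
        p, q = min(combinations(range(len(clusters)), 2),
                   key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


# Hypothetical toy profiles (not taken from the Putrajaya dataset):
# two days with a low afternoon peak and two days with a high one.
days = [
    [0.004, 0.005, 0.032, 0.036, 0.005],
    [0.004, 0.008, 0.055, 0.046, 0.009],
    [0.014, 0.005, 0.113, 0.089, 0.014],
    [0.019, 0.011, 0.105, 0.078, 0.019],
]
clusters = agglomerative(days, k=2)
print(clusters)
```

Because DTW lets a peak in one series align with a slightly earlier or later peak in another, two days with the same shape but shifted timing can still end up close together, which a point-by-point Euclidean distance would not guarantee.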
Pattern: starting with high value.

For cluster 2, the first interval begins at 0.004 ppb and ends at 0.005 ppb. The second interval begins at 0.008 ppb, increases sharply to the maximum peak of 0.055 ppb at 2 pm and then decreases to 0.046 ppb at 5 pm. The third interval decreases to 0.009 ppb. This cluster has a standard pattern.

For cluster 3, the first interval begins at 0.014 ppb and ends at 0.005 ppb. The second interval shows an increase of values reaching the peak of 0.113 ppb at 2 pm and ends at 0.089 ppb. In the third interval, the values decrease to 0.014 ppb. A remarkable pattern can be noticed in this cluster: a sharp increase followed by a sharp decrease of values.

For cluster 4, the first interval begins at 0.015 ppb and ends at 0.006 ppb. The second interval shows multiple peaks, with the values reaching 0.053 ppb at 4 pm and 0.049 ppb at 5 pm. The third interval shows a decrease of values to 0.007 ppb.

For cluster 5, the first interval begins at 0.004 ppb and ends at 0.005 ppb. The second interval shows three peaks: 0.032 ppb at 12 pm, 0.036 ppb at 2 pm and 0.035 ppb at 4 pm, and ends at 0.029 ppb. The third interval shows an unstable decrease until reaching 0.005 ppb. This cluster has a standard pattern.

For cluster 6, the first interval begins at 0.015 ppb and then shows a stable decrease until 8 am, reaching 0.008 ppb. The second interval shows two peaks of 0.034 ppb at 12 pm and 0.039 ppb at 3 pm. The third interval shows a gradual decrease of the values reaching 0.005 ppb. The pattern embedded in this cluster is the high value of the starting point.

For cluster 7, the first interval begins at 0.019 ppb and ends at 0.011 ppb. The second interval shows two peaks of 0.078 ppb at 4 pm and 0.074 ppb at 2 pm. The third interval shows a gradual decrease reaching 0.019 ppb. A remarkable pattern can be noticed in this cluster: a sharp decrease of the values with multiple peaks.

For cluster 8, the first interval begins at 0.033 ppb and ends at 0.016 ppb. The second interval shows a maximum peak of 0.059 ppb at 3 pm and then ends at 0.051 ppb. The third interval shows a sharp decrease of values reaching 0.017 ppb. The pattern embedded in this cluster is the high value of the starting point.
For cluster 9, the first interval begins at 0.025 ppb and ends at 0.007 ppb. The second interval shows a sharp increase of values reaching a maximum of 0.062 ppb at 2 pm and ends at 0.049 ppb. The third interval shows a gradual decrease of the values reaching 0.014 ppb. The pattern embedded in this cluster is the high value of the starting point.

C. Comparison between K=5 and K=9

This section compares the two numbers of clusters, 9 and 5, which have been analyzed in the previous sections. The comparison is based on multiple variables, including the starting values of ozone, the maximum peak, the maximum peak of the median and the ending values. Table 3 shows the values for 5 clusters.

TABLE III. VALUES USING CLUSTER K = 5

Days  Class   Morning  Morning  Afternoon  Afternoon   Evening  Standard category
      (K=5)   start    end      max        median max  end
60    4       0.005    0.006    0.09       0.037       0.005    Good
8     5       0.033    0.016    0.093      0.059       0.017    Moderate
104   3       0.017    0.007    0.105      0.058       0.012    Unhealthy for Sensitive Groups
173   1       0.004    0.005    0.115      0.061       0.008    Unhealthy
19    2       0.014    0.005    0.148      0.113       0.014    Very Unhealthy

As shown in Table 3, the 'unhealthy' category contains 173 days, nearly half of the year, which seems to be an overestimated categorization. This means that this category should be divided into more categories. In contrast, the 'moderate' category contains only eight days, which seems to be an underestimated categorization; in general, this category is expected to contain more days. Table 4 shows the values for 9 clusters.

TABLE IV. VALUES USING CLUSTER K = 9

Days  Class   Morning  Morning  Afternoon  Afternoon   Evening  Proposed category
      (K=9)   start    end      max        median max  end
38    5       0.004    0.005    0.06       0.036       0.005    Very Good
22    6       0.015    0.008    0.09       0.039       0.005    Good
63    4       0.015    0.006    0.093      0.053       0.007    High Moderate
121   2       0.004    0.005    0.096      0.055       0.009    Moderate
8     8       0.033    0.016    0.093      0.059       0.017    Low Moderate
20    9       0.025    0.007    0.077      0.062       0.014    Unhealthy for Sensitive Groups
52    1       0.008    0.005    0.115      0.076       0.007    Very Unhealthy for Sensitive Groups
21    7       0.019    0.011    0.105      0.078       0.019    Unhealthy
19    3       0.014    0.005    0.148      0.113       0.014    Very Unhealthy

As shown in Table 4, unlike the standard 5-category scheme, the 9-category scheme provides a better description of the days of the year by offering more categories. For instance, the 'unhealthy' category has been split into two categories, 'unhealthy' and 'very unhealthy for sensitive groups'. These categories contain a reasonable number of days. In addition, the 'moderate' category has been split into three categories: 'high moderate', 'moderate' and 'low moderate'. Similarly, these categories contain a reasonable number of days. Finally, the 'good' category has also been divided into two categories, 'very good' and 'good'.

D. Extracting Patterns Using AAR

Determining the factors that affect ozone is a difficult task. Akimoto et al. (2015) conducted a study to analyze the causes of ground-level ozone in Japan using 20 years of data. They found, surprisingly, that despite the decrease of NOx and NMHC (considered the main causes of increasing ozone), ground-level ozone rates were still increasing. Based on their judgment, they attributed this to transportation.

Hence, it is a challenging task to identify the factors that affect ground-level ozone. This study nevertheless attempts to present an analysis for specific cases of extreme growth of ozone rates. To that end, the association rules approach has been used to clarify the factors that would increase the rates of ozone in Putrajaya, Malaysia. Table 5 depicts the results of applying association rules by showing the most significant patterns with the highest confidences.

TABLE V. SIGNIFICANT PATTERNS EXTRACTED USING AAR

#   Factor 1   Factor 2   =>  Ozone  Confidence
1   NOx=0.003  Temp=33.7  =>  0.089  1.00
2   NOx=0.003  Temp=32.1  =>  0.088  1.00
3   NOx=0.003  Temp=30.4  =>  0.070  1.00
4   NOx=0.003  Temp=29.9  =>  0.061  1.00
5   NOx=0.003  Temp=29.2  =>  0.059  1.00
6   NOx=0.003  NO2=0.013  =>  0.059  1.00
7   NOx=0.003  NO2=0.002  =>  0.070  1.00
8   NOx=0.003  CO=1.7     =>  0.070  1.00
9   NOx=0.003  CO=1.61    =>  0.061  1.00
10  NOx=0.003  CO=0.31    =>  0.088  1.00
11  NOx=0.003  CO=0.27    =>  0.086  1.00
12  NOx=0.003  CO=0.16    =>  0.078  1.00
13  CO=0.75    -          =>  0.148  1.00

As we can see in Table 5, the first 12 rules are associated with two factors whereas the other rules are associated with a single factor. In particular, the first five rules (i.e. 1-5) are associated with NOx and temperature, where an increase of temperature with NOx = 0.003 leads to an increase in the ozone rates. In addition, the following two rules (i.e. 6 and 7) are associated with NOx and NO2, where a decrease of NO2 with NOx = 0.003 would lead to an increment in the ozone rates. The following five rules (i.e. 8-12) are associated with NOx and CO, where a decrease of CO with NOx = 0.003 (especially for rules 10 and 11) would lead to an increment in the ozone rates.

On the other hand, the remaining 8 rules (i.e. 13-20) are associated with a single factor, which is CO. In fact, these rules are related to the peak or highest rates of ozone. Although there is no direct relation between the CO values and the ozone rates, CO is, as a general view, related to transportation. This supports the finding of Akimoto et al. (2015), which implies that transportation is one of the main reasons behind the growth of ground-level ozone rates.
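How a rule of the form "Factor 1, Factor 2 => Ozone" receives its confidence can be sketched in a few lines of Python. This is an illustrative, simplified Apriori, not the procedure used in the study: the records below are hypothetical (only the item labels imitate those in Table 5), the encoding of each measurement as a `name=value` string and the thresholds `min_support=2` and `min_confidence=1.0` are assumptions, and the function names are invented for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(items): support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a, b in combinations(frequent, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


def rules_for(itemsets, target_prefix, min_confidence):
    """Rules 'antecedent => one target item', confidence = supp(A and B) / supp(A)."""
    rules = []
    for itemset, supp in itemsets.items():
        targets = [i for i in itemset if i.startswith(target_prefix)]
        if len(targets) != 1 or len(itemset) < 2:
            continue
        antecedent = itemset - {targets[0]}
        # The antecedent is a subset of a frequent itemset, so it is frequent too.
        conf = supp / itemsets[antecedent]
        if conf >= min_confidence:
            rules.append((tuple(sorted(antecedent)), targets[0], conf))
    return sorted(rules)


# Hypothetical records: one set of discretised measurements per observation.
records = [
    {"NOx=0.003", "Temp=33.7", "O3=0.089"},
    {"NOx=0.003", "Temp=33.7", "O3=0.089"},
    {"NOx=0.003", "Temp=29.2", "O3=0.059"},
    {"CO=0.75", "O3=0.148"},
    {"CO=0.75", "O3=0.148"},
    {"NOx=0.003", "CO=0.75", "O3=0.070"},
]
frequent = apriori(records, min_support=2)
rules = rules_for(frequent, "O3=", min_confidence=1.0)
for antecedent, ozone, conf in rules:
    print(", ".join(antecedent), "=>", ozone, f"{conf:.2f}")
```

Confidence alone does not show how often a rule applies: an antecedent that occurs in a single record always yields a confidence of 1.00, which is why the sketch filters by a minimum support before computing confidences.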
TABLE V. SIGNIFICANT PATTERNS EXTRACTED USING AAR (CONTINUED)

#   Factor 1   Factor 2   =>  Ozone  Confidence
14  CO=0.91    -          =>  0.147  1.00
15  CO=0.7     -          =>  0.143  1.00
16  CO=0.44    -          =>  0.140  1.00
17  CO=0.63    -          =>  0.140  1.00
18  CO=0.60    -          =>  0.139  1.00
19  CO=0.59    -          =>  0.139  1.00
20  CO=0.78    -          =>  0.131  1.00

V. CONCLUSION

This study has proposed a pattern extraction method for ground-level ozone using the Apriori Association Rules method. The data used in the experiment were collected from LESTARI (2016), the Institute for Environment and Development in Malaysia and the Asia Pacific. First, AHC with DTW was applied to the dataset in order to cluster the days based on the ozone levels. Subsequently, the proposed AAR was applied to extract significant patterns. The experiment shows that the extracted patterns are related to CO, which is an interesting relation in accordance with the literature. This study used ground-level ozone data from a single year. Hence, using a dataset covering multiple years in future research may help to identify frequent patterns that facilitate the determination of important factors of ozone.

ACKNOWLEDGEMENT

This study is supported by Universiti Kebangsaan Malaysia (UKM) and funded by research grant (FRGS/1/2012/SG05/UKM/02/6).

REFERENCES

[1] J. Adame, et al. (2012) 'Application of cluster analysis to surface ozone, NO2 and SO2 daily patterns in an industrial area in Central-Southern Spain measured with a DOAS system', Science of the Total Environment, Vol. 429, pp. 281-291.
[2] M. Aghagolzadeh, et al. (2007) 'A Hierarchical Clustering Based on Mutual Information Maximization', Image Processing, 2007. ICIP 2007. IEEE International Conference on, pp. I-277-I-280.
[3] S.S. Ahmad and N. Aziz (2013) 'Spatial and temporal analysis of ground level ozone and nitrogen dioxide concentration across the twin cities of Pakistan', Environmental monitoring and assessment, Vol. 185 No. 4, pp. 3133-3147.
[4] AirNow 'Ozone and Your Health' [online] https://cfpub.epa.gov/airnow/index.cfm?action=ozone_health.index#request.PDFPath#ozone-c.pdf (Accessed 2016).
[5] H. Akimoto, et al. (2015) 'Analysis of monitoring data of ground-level ozone in Japan for long-term trend during 1990–2010: causes of temporal and spatial variation', Atmospheric environment, Vol. 102, pp. 302-310.
[6] M.L. Bell, et al. (2007) 'Climate change, ambient ozone, and health in 50 US cities', Climatic Change, Vol. 82 No. 1-2, pp. 61-76.
[7] M.L. Bell, et al. (2004) 'Ozone and short-term mortality in 95 US urban communities, 1987-2000', Jama, Vol. 292 No. 19, pp. 2372-2378.
[8] C. Borgelt (2004) 'Recursion Pruning for the Apriori Algorithm', FIMI.
[9] J.N. Christensen, et al. (2015) 'Unraveling the sources of ground level ozone in the Intermountain Western United States using Pb isotopes', Science of the Total Environment, Vol. 530, pp. 519-525.
[10] Centers for Disease Control and Prevention (1997) NIOSH pocket guide to chemical hazards.
[11] P. Hájek and V. Olej (2012) 'Ozone prediction on the basis of neural networks, support vector regression and methods with uncertainty', Ecological Informatics, Vol. 12, pp. 31-42.
[12] M. Kandil, et al. (2014) 'Prediction of Maximum Ground Ozone Levels using Neural Network', International Journal of Computing and Digital Systems, Vol. 3 No. 2, pp. 133-140.
[13] LESTARI 'The Institute for Environment and Development in Malaysia and the Asia Pacific' [online] http://www.ukm.my/lestari.
[14] P. Li, et al. (2013) 'Observational studies and a statistical early warning of surface ozone pollution in Tangshan, the largest heavy industry city of North China', International journal of environmental research and public health, Vol. 10 No. 3, pp. 1048-1061.
[15] T.W. Liao (2005) 'Clustering of time series data—a survey', Pattern recognition, Vol. 38 No. 11, pp. 1857-1874.
[16] Y. Liu, et al. (2010) 'Understanding of internal clustering validation measures', Data Mining (ICDM), 2010 IEEE 10th International Conference on, IEEE, pp. 911-916.
[17] A. Monteiro, et al. (2012) 'Trends in ozone concentrations in the Iberian Peninsula by quantile regression and clustering', Atmospheric environment, Vol. 56, pp. 184-193.
[18] S. Rani and G. Sikka (2012) 'Recent techniques of clustering of time series data: A Survey', Int. J. Comput. Appl, Vol. 52 No. 15, pp. 1-9.
[19] S.J.T.B.K. Ray and K.B. Ensor (2009) 'A Model-based Approach for Clustering Air Quality Monitoring Networks in Houston, Texas'.
[20] K. Saithanu and J. Mekparyup (2012) 'Clustering of Air Quality and Meteorological Variables Associated with High Ground Ozone Concentration in the Industrial Areas, at the East of Thailand', International Journal of Pure and Applied Mathematics, Vol. 81 No. 3, pp. 505-515.
[21] M. Sammour and Z. Othman (2016) 'An Agglomerative Hierarchical Clustering with Various Distance Measurements for Ground Level Ozone Clustering in Putrajaya, Malaysia', International Journal on Advanced Science, Engineering and Information Technology, Vol. 6 No. 6, pp. 1127-1133.
[22] J. Smith, et al. (2013) 'Characterization of Gulf of Mexico background ozone concentrations', 12th Annual CMAS Conference, Chapel Hill, NC.
[23] W. Sun, et al. (2015) 'Prediction of surface ozone episodes using clusters based generalized linear mixed effects models in Houston–Galveston–Brazoria area, Texas', Atmospheric Pollution Research, Vol. 6 No. 2, pp. 245-253.
[24] W. Tamas, et al. (2016) 'Hybridization of Air Quality Forecasting Models Using Machine Learning and Clustering: An Original Approach to Detect Pollutant Peaks', Aerosol and Air Quality Research, Vol. 16 No. 2, pp. 405-416.
[25] Y.Y. Toh, et al. (2013) 'The influence of meteorological factors and biomass burning on surface ozone concentrations at Tanah Rata, Malaysia', Atmospheric environment, Vol. 70, pp. 435-446.
[26] R. Vingarzan (2004) 'A review of surface ozone background levels and trends', Atmospheric environment, Vol. 38 No. 21, pp. 3431-3442.
[27] S. Wang, et al. (2013) 'New Spatiotemporal Clustering Algorithms and their Applications to Ozone Pollution', 2013 IEEE 13th International Conference on Data Mining Workshops, IEEE, pp. 1061-1068.
[28] A. Wilson, et al. (2014) 'Modeling the effect of temperature on ozone-related mortality', The Annals of Applied Statistics, Vol. 8 No. 3, pp. 1728-1749.