AEGAEUM JOURNAL ISSN NO: 0776-3808

An Application of Spatial Data Mining in the Study of Corona Virus (COVID-19)
Pandemic through Statistical Approach

M. Rajathi (1) & Dr. R. Arumugam (2)

(1) Department of Education, Periyar Maniammai Institute of Science & Technology,
Thanjavur-613 403, Tamilnadu, India.
(2) Department of Mathematics, Periyar Maniammai Institute of Science & Technology,
Thanjavur-613 403, Tamilnadu, India.
Email: rajathiarumugam@pmu.edu, arumugamr2@gmail.com

Abstract
In the technological world, the Spatial Database Management System (SDBMS) plays a vital role
in the study of neighbourhood relations. A core requirement of spatial data mining is to investigate
the neighbours of many objects in a single run of a typical data mining algorithm. This means that a
spatial data mining algorithm has to process neighbourhood relations efficiently. Integrating spatial
data mining algorithms with the capabilities of an SDBMS helps to provide a general concept of
neighbourhood relations and its efficient implementation. This paper focuses on the neighbourhood
relation of the Corona virus (COVID-19) pandemic in India as on 30th March 2020. SPSS and SQL
queries have been used for the significance study of the corona virus data.
Keywords: Spatial Database, COVID-19, SPSS, Pandemic

1. Introduction
The explosive growth of spatial databases has far outpaced the human ability to
interpret the data. This creates an urgent need for new techniques and tools that
support humans in transforming the data into useful information and knowledge. A Spatial
Database Management System (SDBMS) is a database system for the management of
spatial data [2]. Spatial Data Mining (SDM) is the technique of discovering the implicit
regularities, rules or patterns hidden in large spatial databases [1][4][5][6].
A spatial database system is a database system that offers spatial data
types (SDT) in its data model and query language. We use sets of components
as a general representation of spatial objects [3]. The spatial database management system
needs to be able to retrieve, from a large collection of objects in some space, those lying
within a particular area without scanning the whole set. For that, spatial indexing is mandatory.
The DBMS supports several spatial index structures, e.g. the R-tree [8], which are used to
speed up the processing of spatial queries or nearest-neighbour queries [7]. The SDBMS must
also be able to connect objects from different classes through some spatial relationship.
Commercial RDBMSs (e.g. Oracle) can be used to integrate the basic operations for spatial
data mining [10]. The topological relations [9] between two objects A and B are derived from
the nine intersections of the interiors, the boundaries and the complements of A and B with
each other. In addition, such tools are commonly designed to find customer purchasing
patterns in market basket data [11].
The 2019 corona virus disease (COVID-19) epidemic in China is a global health
threat [12] and is by far the largest outbreak of atypical pneumonia since
the Severe Acute Respiratory Syndrome (SARS) outbreak in 2003. Within
weeks of the initial outbreak, the total number of cases and deaths exceeded those of
SARS [13]. A statistical study on the impact of dengue fever in Thanjavur District
using SPSS was made by Arumugam et al. (2019) [17].
The outbreak was first discovered in late December 2019, when clusters of
pneumonia cases of unknown etiology were found to be associated with
epidemiologically linked exposure to a seafood market and untraced exposures in the
city of Wuhan, Hubei Province [14]. Since then, the number of cases has continued to
escalate exponentially within and beyond Wuhan, spreading to all 34 regions of China by 30th
January 2020. On the same day, the World Health Organization (WHO) declared the
COVID-19 outbreak a public health emergency of international concern [15]. In this paper, a
set of database queries on the corona virus (COVID-19) pandemic is tested and
presented for mining the spatial database, and the SPSS tool is used for the statistical study.

Volume 8, Issue 4, 2020 http://aegaeum.com/ Page No: 1698

2. Methods and Materials
2.1 Spatial Data and Spatial Database System
In different fields there is a need to manage spatial data, i.e. data associated with
space. One prominent example of spatial data is the satellite data for
the corona virus (COVID-19). To extract information from satellite data, the data have to be
processed with respect to a spatial frame of reference, typically the Earth's surface. But
satellite data are not the only spatial data, and the Earth's surface is not the
only frame of reference. Since the advent of relational database systems there have been
attempts to manage such data in databases.
This paper is concerned with the requirements and techniques for managing objects in space
that have an identity and well-defined numbers of active cases, recoveries and deaths. Here we
discuss spatial database systems in this restricted sense. A query or command that is executed
on spatial data is known as a spatial query.
For example, queries are given for the following questions:
1. Which states are affected by Corona?
2. How many people are affected?
3. How many people have recovered?
4. How many deaths have occurred?
Many such queries are described in this paper, and the information is listed using a
query language based on spatial queries.
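
The four questions above map onto simple selection and aggregate queries. The following is a minimal sketch in Python with the standard sqlite3 module, assuming a single table named pandemic whose columns follow Tables 1 and 2. Only the seven states listed in full in those tables are loaded, so the totals cover that subset and not the whole of India. SQLite is used purely for illustration and is not necessarily the system used in this study.

```python
import sqlite3

# Rows copied from Tables 1 and 2: (State, Active, Recoveries, Deaths).
rows = [
    ("Maharashtra", 215, 25, 7),
    ("Kerala", 202, 20, 1),
    ("Karnataka", 83, 5, 3),
    ("Delhi", 72, 6, 2),
    ("Uttar Pradesh", 72, 11, 0),
    ("Rajasthan", 60, 3, 0),
    ("Goa", 5, 0, 0),
]

conn = sqlite3.connect(":memory:")
# Assumed schema; the column names mirror the table headers in the paper.
conn.execute(
    "CREATE TABLE pandemic ("
    "State TEXT PRIMARY KEY, Active INTEGER, Recoveries INTEGER, Deaths INTEGER)"
)
conn.executemany("INSERT INTO pandemic VALUES (?, ?, ?, ?)", rows)

# Question 1: which states are affected?
affected_states = [
    r[0]
    for r in conn.execute(
        "SELECT State FROM pandemic "
        "WHERE Active + Recoveries + Deaths > 0 ORDER BY State"
    )
]

# Questions 2 to 4: how many people are affected, recovered, dead?
total_cases, total_recovered, total_deaths = conn.execute(
    "SELECT SUM(Active + Recoveries + Deaths), SUM(Recoveries), SUM(Deaths) "
    "FROM pandemic"
).fetchone()

print(affected_states)
print(total_cases, total_recovered, total_deaths)
```

The total of 792 cases for these seven states agrees with the sum of the Total column in Tables 1 and 2.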

Figure 1: Timeline of the pandemic spread across India (As on 30th March 2020)
Secondly, suppose our database keeps the details of state name, affected areas,
recoveries, deaths and total active cases, listed here on the basis of the spatial information.
Then a question such as "list the top five corona-affected states" or "in which states is the
number of active cases greater than five" is a non-spatial query, whereas a database management
system with spatial data and spatial queries is essentially required for questions about
spatial relationships.
3. Analysis
3.1 Modeling the Spatial Database
Corona-affected places (i.e. active cases in the different states of India), recoveries
from the corona virus (COVID-19) and deaths can be modelled using the SQL query language;
the queries and their outputs are displayed in Table 1, Table 2 and Table 3.
Query Model:
3.2 Neighborhood relation of the Corona States
The neighbourhood relation says that the mutual influence among two or more states
(i.e. objects) depends on factors such as the topology, the maintenance of social distancing and
the practice of respiratory hygiene. For example, a concentration of population can cause different
degrees and levels of the Corona pandemic in neighbouring locations. The topological,
social distancing and respiratory hygiene relations are binary relations.


3.3 DBMS Support


Commercial Relational Database Management Systems (RDBMS) (e.g.
Oracle) can be used to integrate the essential operations for spatial data mining [10]. All
the capabilities of these systems can be used effectively in spatial data mining applications.
The DBMS supports many spatial index structures, e.g. the R-tree [8]. They are used to
speed up the processing of spatial queries or nearest-neighbour queries [7]. If the spatial
objects are fairly complex, however, retrieving the neighbours of some object in this way is
still very time consuming, owing to the complexity of evaluating neighbourhood
relations on such objects. Furthermore, when creating all neighbourhood paths from a
given source object, a very large number of neighbours operations have to be performed.
Finally, many spatial databases are static because there are not many updates on objects
[16].
Therefore, materializing the relevant neighbourhood graphs by using the concept of
neighbourhood indices would be quite useful in executing spatial queries.

3.4 Creating Neighboring State Index


To create a neighbouring Corona pandemic state index ImaxDB, a spatial join on
DB with respect to the neighbourhood relation is performed. For each pair of objects
returned by the spatial join we then have to determine the exact distance, the direction
relation and the topological relation.
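
The index construction described above can be sketched as follows. This is a minimal illustration and not the implementation used in the study: the regions A, B and C are hypothetical axis-aligned rectangles (stand-ins for state boundaries), the spatial join is a plain nested loop rather than an R-tree join, and the names Region, topology, direction and build_index are introduced only for this example. The topological relation is reduced to three coarse cases (disjoint, meet, overlap).

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Region:
    """Hypothetical rectangular region standing in for a state boundary."""
    name: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    @property
    def center(self):
        return ((self.xmin + self.xmax) / 2, (self.ymin + self.ymax) / 2)


def topology(a, b):
    """Coarse topological relation between two rectangles."""
    if a.xmax < b.xmin or b.xmax < a.xmin or a.ymax < b.ymin or b.ymax < a.ymin:
        return "disjoint"
    # Interiors intersect only if the overlap is strict on both axes.
    if a.xmax > b.xmin and b.xmax > a.xmin and a.ymax > b.ymin and b.ymax > a.ymin:
        return "overlap"
    return "meet"  # boundaries touch, interiors do not intersect


def direction(a, b):
    """Direction of b seen from a, taken from the dominant axis of the offset."""
    (ax, ay), (bx, by) = a.center, b.center
    dx, dy = bx - ax, by - ay
    if abs(dx) >= abs(dy):
        return "east" if dx > 0 else "west"
    return "north" if dy > 0 else "south"


def build_index(regions, max_dist):
    """Materialize (distance, direction, topology) for every pair within max_dist."""
    index = {}
    for a in regions:
        for b in regions:
            if a is b:
                continue
            (ax, ay), (bx, by) = a.center, b.center
            dist = hypot(bx - ax, by - ay)
            if dist <= max_dist:
                index[(a.name, b.name)] = (dist, direction(a, b), topology(a, b))
    return index


regions = [
    Region("A", 0, 0, 2, 2),
    Region("B", 2, 0, 4, 2),      # shares an edge with A
    Region("C", 10, 10, 12, 12),  # far away from both
]
index = build_index(regions, max_dist=3.0)
for pair, entry in sorted(index.items()):
    print(pair, entry)
```

Once the index is materialized, the neighbours of a region are found by a dictionary lookup instead of a repeated geometric evaluation, which is the motivation given in Section 3.3.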

Table 1: Query model: SELECT TOP 5 * FROM pandemic

Query1
Sl No   State           Active   Recoveries   Deaths   Total
1       Maharashtra     215      25           7        247
2       Kerala          202      20           1        223
3       Karnataka       83       5            3        91
4       Delhi           72       6            2        80
5       Uttar Pradesh   72       11           0        83

Table 2: SELECT * FROM pandemic WHERE State='Rajasthan' OR State='Goa'

Query2
State       Active   Recoveries   Deaths   Total
Rajasthan   60       3            0        63
Goa         5        0            0        5

Table 3: SELECT State, Active FROM pandemic WHERE Active>'5'

Query3
State           Active cases
Karnataka       83
Delhi           72
Uttar Pradesh   72
Telangana       70
Gujarat         69
Rajasthan       60
Tamil Nadu      67
Chandigarh      9
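
The output of Query3 omits Maharashtra (215) and Kerala (202), although both counts exceed five. This is the result a comparison against the quoted literal '5' gives when the Active values are compared as text, because '215' and '202' sort before '5' character by character. The sketch below reproduces both behaviours in SQLite via Python's sqlite3 module. Storing Active as text is an assumption made here only to illustrate the effect; the column type used in the study is not stated. The state counts are taken from Tables 1 to 3.

```python
import sqlite3

# Active cases taken from Tables 1, 2 and 3.
active = {
    "Maharashtra": 215, "Kerala": 202, "Karnataka": 83, "Delhi": 72,
    "Uttar Pradesh": 72, "Telangana": 70, "Gujarat": 69, "Rajasthan": 60,
    "Tamil Nadu": 67, "Chandigarh": 9, "Goa": 5,
}

conn = sqlite3.connect(":memory:")
# Assumption for illustration: the counts are stored as TEXT, not INTEGER.
conn.execute("CREATE TABLE pandemic (State TEXT, Active TEXT)")
conn.executemany(
    "INSERT INTO pandemic VALUES (?, ?)",
    [(state, str(count)) for state, count in active.items()],
)

# Text comparison: '215' > '5' is false because '2' sorts before '5'.
as_text = [
    r[0] for r in conn.execute("SELECT State FROM pandemic WHERE Active > '5'")
]

# Numeric comparison: cast the column and compare with an unquoted 5.
as_number = [
    r[0]
    for r in conn.execute(
        "SELECT State FROM pandemic WHERE CAST(Active AS INTEGER) > 5"
    )
]

print(sorted(as_text))
print(sorted(as_number))
```

With a numeric comparison, every state in this sample except Goa (5 active cases) satisfies the condition.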

3.5 Descriptive Statistics using SPSS (For significant study)


Table 4: Descriptive Statistics for Corona virus (COVID-19) pandemic

Statistics
                          State   Active     Recovery   Death   Total
N Valid                   30      27         27         27      27
N Missing                 0       3          3          3       3
Mean                              43.85      3.81       1.00    48.67
Std. Error of Mean                10.527     1.279      .342    11.838
Median                            23.00a     1.00a      .55a    24.00a
Mode                              1          0          0       1
Std. Deviation                    54.700     6.645      1.776   61.512
Variance                          2992.131   44.157     3.154   3783.769
Skewness                          2.141      2.223      2.492   2.204
Std. Error of Skewness            .448       .448       .448    .448
Kurtosis                          4.820      4.214      6.140   5.144
Std. Error of Kurtosis            .872       .872       .872    .872
Range                             214        25         7       246
Minimum                           1          0          0       1
Maximum                           215        25         7       247
Sum                               1184       103        27      1314
Percentile 10                     1.96b      .b,c       .b,c    2.20b
Percentile 20                     4.87       .05        .       5.53
Percentile 25                     6.67       .21        .       7.33
Percentile 30                     8.47       .36        .05     9.07
Percentile 40                     13.60      .68        .30     15.20
Percentile 50                     23.00      1.00       .55     24.00
Percentile 60                     39.27      1.68       .79     42.60
Percentile 70                     62.80      2.93       1.09    65.40
Percentile 75                     68.50      3.83       1.39    72.67
Percentile 80                     70.13      5.10       1.69    76.40
Percentile 90                     81.53      15.80      2.87    89.40
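
Several entries of Table 4 follow from N and the standard deviation alone, which gives a quick consistency check. The sketch below uses the standard formulas for the standard error of the mean and for the standard errors of sample skewness and kurtosis, with N = 27 and the Active column values from Table 4. The ratio of skewness to its standard error is added as a common rule-of-thumb indicator of asymmetry (values well above about 2 suggest a clearly skewed distribution); that ratio is not reported in the table.

```python
from math import sqrt

n = 27                 # valid cases (Table 4)
sd_active = 54.700     # Std. Deviation, Active column
skew_active = 2.141    # Skewness, Active column

# Standard error of the mean: SD / sqrt(N).
se_mean = sd_active / sqrt(n)

# Standard error of sample skewness.
se_skew = sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

# Standard error of sample (excess) kurtosis.
se_kurt = 2 * se_skew * sqrt((n * n - 1) / ((n - 3) * (n + 5)))

# Rule-of-thumb ratio: skewness divided by its standard error.
skew_ratio = skew_active / se_skew

print(round(se_mean, 3), round(se_skew, 3), round(se_kurt, 3))
print(round(skew_ratio, 2))
```

The three standard errors reproduce the tabulated values 10.527, .448 and .872 to three decimals.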

3.6 Analysis of Variance (ANOVA) using SPSS


Table 5: ANOVA table for Corona virus (COVID-19) pandemic

Variable   Source                                     Sum of Squares   df   Mean Square
Active     Between Groups (Combined)                  71805.150        10   7180.515
           Between Groups, Linear Term (Weighted)     54364.172        1    54364.172
           Between Groups, Linear Term (Deviation)    17440.978        9    1937.886
           Within Groups                              5990.257         16   374.391
           Total                                      77795.407        26
Death      Between Groups (Combined)                  52.186           10   5.219
           Between Groups, Linear Term (Weighted)     15.408           1    15.408
           Between Groups, Linear Term (Deviation)    36.778           9    4.086
           Within Groups                              29.814           16   1.863
           Total                                      82.000           26
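
Table 5 lists sums of squares, degrees of freedom and mean squares but not the F ratio. As a worked illustration, the F ratio for the combined between-groups effect can be recovered from the tabulated values as the between-groups mean square divided by the within-groups mean square. The sketch below does only this arithmetic; it does not compute p-values, and the variable names are introduced for the example.

```python
# (sum of squares, df) taken from Table 5.
anova = {
    "active": {"between": (71805.150, 10), "within": (5990.257, 16)},
    "death": {"between": (52.186, 10), "within": (29.814, 16)},
}

f_ratio = {}
for variable, parts in anova.items():
    ss_between, df_between = parts["between"]
    ss_within, df_within = parts["within"]
    ms_between = ss_between / df_between   # mean square between groups
    ms_within = ss_within / df_within      # mean square within groups
    f_ratio[variable] = ms_between / ms_within

for variable, f in f_ratio.items():
    print(variable, round(f, 3))
```

Each ratio has 10 and 16 degrees of freedom and would be compared with the critical value of the F distribution at the chosen significance level.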


Figure 2: Corona Virus pandemic in India as on 30th March 2020

Figure 3: Frequency level in active cases
Figure 4: Frequency level in recoveries
Figure 5: Frequency level in death stage

4. Discussion and Result


Figure 1 shows, in graphical form, the timeline of the spread of the Corona virus
(COVID-19) pandemic across India (as on 30th March 2020). Table 1 shows the query result
for the top five corona-affected states at the three levels (active, recoveries and deaths)
together with the total cases in India.
Table 2 shows the query used to check the status of the corona virus
pandemic in specific states. Table 3 lists the states returned by the query on active cases
greater than five, as on 30th March 2020 in India.
Table 4 gives the descriptive statistics of the Corona Virus (COVID-19) pandemic
as on 30th March 2020 in India. The number of active cases ranges between 1 and 215,
with mean 43.85 and standard deviation 54.7. The skewness (2.141) indicates
a lack of symmetry for the 27 corona-infected states, and the kurtosis is greater
than three (4.82 in the active stage, 4.214 in the recoveries and 6.14 in the death stage).
The distribution of active cases is therefore positively skewed and leptokurtic, which means
that it is sharply peaked with a long tail on the right side. Similarly, the recoveries and
death stages show positive skewness and leptokurtosis with N = 27.
Table 5 presents the status of the Corona pandemic in statistical
form using an ANOVA table. The difference between groups and within groups is significant
in the active cases and also in the death stage at the 5% level (i.e. p < 0.05). Figure 2
represents the corona virus pandemic in India as on 30th March 2020 in the active,
recoveries and death stages. Figure 3 shows the frequency levels in the active
cases, where the curve is right-tailed relative to the normal curve. Figure 4 shows the
frequency values in the recoveries, which give the symmetric level, and
Figure 5 represents the death level based on the frequency values (i.e. the peaked level) in
the corona virus pandemic COVID-19 in India as on 30th March 2020.
5. Conclusion
During the initial stage of the corona virus pandemic COVID-19 outbreak in India, we have
presented applications of spatial data mining based on COVID-19 data. Queries were
adopted for viewing the various types of neighbourhood corona
states. Pictorial representations are given for the spatial information, and the statistical
software SPSS is used to test the significance of the study. We have also presented the ANOVA
table for the study of significance at the 5% level.

Scope of the study


The data mining concept is essential for displaying the various parameters in a well-defined
manner. In the health sector we have to present up-to-date and exact health
information and the precautionary measures associated with the impact of the corona virus
COVID-19. Our findings can be used to formulate specific studies of neighbouring
populations based on a Spatial Database Management System to measure the impact of corona
COVID-19.

REFERENCES
[1] Agrawal, R., T. Imielinski and A. Swami, Database mining: A performance
perspective. IEEE Transactions on Knowledge and Data Engineering,
5(6):914–925, 1993.

[2] Bill, F., Fundamentals of Geographical Information Systems: Hardware,
Software and Data (in German). Heidelberg, Germany: Wichmann Publishing,
1991.

[3] Egenhofer, M.J., Reasoning about binary topological relations. In Proc. 2nd
Int. Symp. on Large Spatial Databases, Zurich, Switzerland, pp. 143–160, 1991.

[4] Ester, M., H.P. Kriegel and J. Sander, Spatial data mining: A database
approach. In Proc. 5th Int. Symp. on Large Spatial Databases, Berlin, Germany,
pp. 47–66, 1997.

[5] Ester, M., A. Frommelt, H.P. Kriegel and J. Sander, Algorithms for
characterization and trend detection in spatial databases. In Proc. 4th Int.
Conf. on Knowledge Discovery and Data Mining, New York City, NY, pp. 44–50,
1998.

[6] Fayyad, U.M.J., G. Piatetsky-Shapiro and P. Smyth, From data mining to
knowledge discovery: An overview. In Advances in Knowledge Discovery and
Data Mining. Menlo Park: AAAI Press, pp. 1–34, 1996.

[7] Gueting, R.H., An introduction to spatial database systems. VLDB Journal,
Special Issue on Spatial Database Systems, 3(4), 1994.

[8] Guttman, A., R-trees: A dynamic index structure for spatial searching. In Proc.
ACM SIGMOD Int. Conf. on Management of Data, pp. 47–54, 1984.

[9] Koperski, K. and J. Han, Discovery of spatial association rules in
geographic information databases. In Proc. 4th Int. Symp. on Large Spatial
Databases (SSD '95), Portland, ME, pp. 47–66, 1995.

[10] Koperski, K., J. Adhikary and J. Han, Knowledge discovery in spatial databases:
Progress and challenges. In Proc. SIGMOD Workshop on Research Issues in
Data Mining and Knowledge Discovery. Technical Report 96-08, University of
British Columbia, Vancouver, Canada, 1996.

[11] Ladner, R., F.E. Petry and M.A. Cobb, Fuzzy set approaches to spatial data
mining of association rules. Transactions in GIS, 7(1), 123–138, 2003.

[12] Wang, C., A novel coronavirus outbreak of global health concern.
Lancet, 395, 470–473, 2020.

[13] Hawryluck, L., SARS control and psychological effects of quarantine, Toronto,
Canada. Emerg. Infect. Dis., 10, 1206–1212, 2004.

[14] Nishiura, H., The Extent of Transmission of Novel Coronavirus in Wuhan, China.
J. Clin. Med., 9, 330, 2020.

[15] Mahase, E., China coronavirus: WHO declares international emergency as
death toll exceeds 200. BMJ (Clin. Res. Ed.), 368, m408, 2020.

[16] Ester, M., H.P. Kriegel and J. Sander, Algorithms and Applications for
Spatial Data Mining. In Geographic Data Mining and Knowledge
Discovery, Research Monographs in GIS, Taylor and Francis, 2001.

[17] Arumugam, R., C. Gowri and M. Rajathi, A Statistical Study on the Impact of
Dengue Fever in Thanjavur District Using SPSS. Compliance Engineering
Journal, Vol. 10, Issue 12, pp. 671–683, 2019.
