MSc DEGREE
IN
INFORMATION SYSTEMS

PROJECT PROPOSAL

PROJECT TITLE: COVID-19 REPORT

NAME: CHAITANYA KUMAR YADAVALLI

ID NO: K2007759

SUPERVISOR: ROBSON, MARTIN

WARRANTY STATEMENT
This is a student project. Therefore, neither the student nor Kingston University
makes any warranty, express or implied, as to the accuracy of the data or conclusion
of the work performed in the project and will not be held responsible for any
consequences arising out of any inaccuracies or omissions therein.

TABLE OF CONTENTS

1. ABSTRACT

2. INTRODUCTION AND BACKGROUND

   2.1 MOTIVATION AND CONTRIBUTION

   2.2 OVERVIEW

3. AIMS AND OBJECTIVES

4. LITERATURE REVIEW

   DISEASE REGISTRIES IN HEALTHCARE

   4.1 WHAT IS MACHINE LEARNING?

   4.2 SUPERVISED LEARNING

   4.3 ENSEMBLE LEARNING

   4.4 UNSUPERVISED LEARNING

   4.5 ARTIFICIAL NEURAL NETWORK

5. METHODOLOGY

6. TECHNOLOGY AND RESOURCES

7. ETHICAL, LEGAL, SOCIAL, SECURITY AND PROFESSIONAL CONCERNS

8. CONCLUSION AND FUTURE SCOPE

9. REFERENCES

LIST OF FIGURES:

FIGURE 1: REGRESSION MODEL
FIGURE 2: EXAMPLE OF A DECISION TREE
FIGURE 3: ARTIFICIAL NEURAL NETWORK

1. ABSTRACT
The planet is facing the COVID-19 pandemic, and every person needs to know about
the current COVID-19 situation in their locality, their nation, or even the planet
as a whole. So, here we are creating a project in which COVID-19 pandemic datasets
are collected from time to time and stored in a database. Using various data
analysis and machine learning techniques, we then forecast the outbreak and display
this information in a human-readable format, including pictorial and graphical
views. This forecast helps local governments and residents take the necessary steps
and measures.

2. INTRODUCTION AND BACKGROUND


Coronaviruses are single-stranded, positive-sense RNA viruses considered to have
some of the largest viral genomes, up to a length of approximately 32 kbp [1–5].
The largest sources of diversity for coronaviruses belong to the strains that infect
bats and birds, providing a reservoir for recombination in wild animals as well as
mutations that can enable cross-species transmission into other mammals and humans.
The COVID-19 virus was found to be most closely related to two bat SARS-like
coronaviruses, bat-SL-CoVZXC21 and bat-SL-CoVZC45, found in the Chinese horseshoe
bat Rhinolophus sinicus.

The novel coronavirus disease (COVID-19) was first reported in Wuhan, Hubei
Province, China on 31 December 2019 and then began to spread quickly all over the
world. The cumulative incidence of the causative virus (SARS-CoV-2) is growing
rapidly and has affected 196 countries and territories, with the USA, Spain, Italy,
the United Kingdom, and France among the most affected. This coronavirus outbreak
has been declared a pandemic by the World Health Organization (WHO), and the virus
is still spreading. A total of 3,581,884 confirmed positive cases and 248,558 deaths
had been reported as of 4 August 2020. The primary distinction between the
SARS-CoV-2 pandemic and related viruses, such as Severe Acute Respiratory Syndrome
(SARS) and Middle East Respiratory Syndrome (MERS), is its capacity to propagate
rapidly by human contact, leaving almost 20 percent of infected subjects as
symptom-free carriers. In addition, some studies have indicated that the disease
caused by SARS-CoV-2 is more dangerous to people with weak immune systems. Elderly
people, as well as patients with conditions such as cancer, diabetes, neurological
disorders, coronary heart disease and HIV/AIDS, are more vulnerable to severe
COVID-19 effects. The only solution we have in the absence of any curative medicine
is to slow down the spread by practicing "social distancing" to break the chain of
transmission of this virus. This SARS-CoV-2 behavior demands that a robust
mathematical framework be established to track and monitor its spread via dynamic
decision-making.
Innovative approaches for creating, deploying, managing and analyzing big data on
the growing network of infected subjects, patient information, and their group
movements are also required. Multiple data sources such as text messages, electronic
interactions, social media, and even web articles can be of great help in studying
the growth of infection through group activity. By combining this data with Machine
Learning (ML) and Artificial Intelligence (AI), researchers can predict where and
when the disease is likely to spread, and alert those areas so that the necessary
arrangements can be made. The travel history of infected subjects is automatically
monitored in order to research epidemiological associations with disease spread, and
some community-based effects have been identified. Technology needs to be built
efficiently and cost-effectively for the storage and analysis of this huge volume of
data for further processing, and it must then be coordinated through the use of
cloud computing and AI solutions. The e-commerce company Alibaba has developed cloud
and AI technologies to help China counter the coronavirus and predict the peak, size
and duration of the outbreak, which is reported to have achieved 98 percent accuracy
in real-world testing in different locations in China.

2.1 MOTIVATION AND CONTRIBUTION

ML is used to handle big data and to intelligently forecast the spread of this
disease. Cloud computing can be combined with high-speed computation to speed up the
prediction process, and several sources can be used to retrieve data in order to
reduce power consumption. In this paper, we present a prediction model deployed
using the TensorFlow framework and its various layers to accurately predict the
increase or decrease in the number of COVID-19 cases in the near future in a
specific region and to alert the locality. We present a prediction scheme based on
an ML model, which can be used to allow governments and people to react proactively
to real-time predictions. Finally, we summarize this analysis and present various
directions for further study.

2.2 OVERVIEW

The outbreak of the coronavirus, and the illness it causes, COVID-19, took the world
by storm. Each day the newsrooms filter loads of information: articles, official
documents, interviews with experts, and more. Health workers work every week to
follow hundreds of academic articles on drug studies, epidemiological data,
regulation of operations, and much more. In addition, social network sites need to
reduce noise and promote validated stories so as not to foster misinformed and
frightened users. Many countries have faced a great shortage of doctors,
ventilators, personal protective equipment and testing capacity, and in the near
future it could be much worse.

Many newsrooms, senior officials, and government bodies process tons of information
daily in search of correct, accurate, and credible COVID-19 pandemic information,
and every local resident should be aware of the current coronavirus outbreak
situation. So, here we are working to build a local COVID-19 registry. It helps
local government bodies warn people about affected zones and divide areas into
different zones based on the data we collect in various ways. We use data processing
techniques and algorithms to analyze the data obtained from various sources, and
probably many other data mining techniques to filter the dataset and get the correct
collection of data as needed by our program. Machine learning algorithms let us
continually build and train our modules so that, by adapting to incoming datasets,
we do not need to build the modules again and again. Machine learning and
mathematical analysis can be used to forecast possible virus outbreaks.

3. AIMS AND OBJECTIVES


Aim: To design and prototype a local COVID-19 registry which will support local
authorities and services, planners and epidemiologists with rich information about
local incidence, context and behavior of the disease.

Objectives:
● To elicit detailed requirements for the registry system.
● To model the system requirements using a standard modelling language (e.g. UML).
● To design the system modules:
■ A data gathering module accepting input from a small range of sources, converting
automatically to, and storing in, the registry's internal format.
■ A data quality module checking input to avoid duplication of records.
■ A data analysis module to explore the data and identify patterns and
relationships.
■ A reporting module to extract defined subsets of data for research and
publication.
● To build prototype versions of these modules and integrate them if possible.
● To demonstrate the prototype and collect feedback, iterating if possible.
● To maintain control of quality throughout.

4. LITERATURE REVIEW

In this report, we present a thorough study of the development of predictive machine
learning models, considering a collection of sample data from a specific region, and
show how Python and other programming tools can be used for statistical analysis and
data interpretation. We also describe how tools such as TensorFlow and matplotlib
are used to build and deploy a simple prototype model as they are used in real-time
systems.

DISEASE REGISTRIES IN HEALTHCARE

“A medical registry for a specific disease is a project to maintain a comprehensive
record of all patients suffering from that disease within a stated geographic
boundary. Strenuous efforts are made to ensure that all patient details are captured
from all relevant sources and duplicates eliminated. The data is scrutinized and
updated regularly. This provides a high-quality resource for clinicians to follow
the progress of the disease and the statistical efficacy of treatments used.”
(Robson, 2020).

4.1 WHAT IS MACHINE LEARNING?

Machine learning is a branch of computer science that involves advanced mathematical
algorithms and techniques, and it is probably the most commonly applied field across
information mining, analytics, artificial intelligence, database techniques, pattern
recognition, optimization techniques, etc. The algorithms are primarily categorized
into two subcategories, supervised and unsupervised learning, which are described
below along with the related areas of ensemble learning and artificial neural
networks:

4.2 SUPERVISED LEARNING

This is the most traditional learning strategy, in which the model is trained using
predefined class labels. In the context of insurance fraud identification, for
example, the class labels might be "legitimate" and "fraudulent" claims. The model
is built from the training dataset, and any new instance whose class is to be
predicted is compared against the already trained model: a claim is categorized as
legitimate if it fits a pattern similar to valid conduct, and otherwise it is listed
as unlawful. The advantage is that each class is meaningful to people and is
therefore convenient for labeling patterns. The downside is the difficulty of
collecting class labels: because there is a mass of data to log, it is expensive to
label all of it, and reports need to be recorded carefully, since false positives
treated as true negatives will build a bad perception of the insurance provider in
consumers' minds. A skewed distribution of class labels inside the training dataset
can also result in a model that does not have really good predictive accuracy.
Finally, supervised learning models are unable to identify new forms of fraud, and
experts need considerable effort to extract the labeled training samples that can be
used to create the model.
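
As an illustration, the following is a minimal sketch of supervised classification
in Python. The use of scikit-learn, the feature choices, and the synthetic
"legitimate"/"fraudulent" labels are assumptions for illustration only, not part of
this proposal's design.

    # Minimal supervised-learning sketch: train a classifier on labeled
    # claims, then predict the class of a new, unseen claim.
    from sklearn.linear_model import LogisticRegression

    # Each row is a claim: [claim_amount, days_since_policy_start]
    X_train = [[100, 400], [250, 380], [9000, 3],
               [8000, 5], [120, 365], [9500, 2]]
    y_train = ["legitimate", "legitimate", "fraudulent",
               "fraudulent", "legitimate", "fraudulent"]

    model = LogisticRegression()
    model.fit(X_train, y_train)        # learn from the labeled examples

    # A new claim is contrasted with the trained model to predict its class.
    print(model.predict([[8700, 4]]))  # expected: ['fraudulent']
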
Brief explanations of the algorithms that fall under supervised learning follow:

● REGRESSION MODEL

Regression is one of the most common statistical techniques for estimating the
relationships among variables. The interaction between a dependent variable and one
or more independent variables is modelled, analyzing how the value of the dependent
variable varies in the established relationship as the values of the independent
variables are adjusted.

FIGURE 1: REGRESSION MODEL



The regression model considers the main feature of broad data sets in the sense of
continuous data, which is expected to have a normal distribution. It is used to
calculate the impact of the different factors that affect a variable's movement:
through regression, a predictor variable determines the value of a response
variable. Here, a method known as the regression equation is used to relate the
dependent variable to all the independent variables, and through this approach the
variance of the dependent variable is explained by fitting a statistical curve to
estimate the regression function.

In predictive analysis, two sorts of regression models are used for analysis or
modelling: the linear regression model and the logistic regression model. Linear
regression is used for predicting the causal connection between dependent and
independent variables; in this model a linear equation is used as the regression
method. Logistic regression, on the other hand, is used when the dependent variables
are categories (groups). Unknown values of discrete variables are estimated, based
on known values of the independent variables, using this procedure; in prediction it
can take on a limited number of values.
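
As a sketch of the linear case, the following fits a trend line to hypothetical
daily case counts in Python. The use of scikit-learn and the synthetic numbers are
assumptions; the proposal itself only commits to Python and TensorFlow.

    # Minimal linear-regression sketch: fit a trend line to hypothetical
    # daily COVID-19 case counts and extrapolate a few days ahead.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    days = np.arange(1, 11).reshape(-1, 1)  # day index 1..10 (independent variable)
    cases = np.array([5, 8, 14, 20, 31, 40, 55, 70, 92, 113])  # dependent variable

    model = LinearRegression()
    model.fit(days, cases)                  # estimate the regression equation

    future = np.array([[11], [12], [13]])
    print(model.predict(future))            # extrapolated case counts
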

4.3 ENSEMBLE LEARNING

Ensemble learning belongs to the supervised learning division of machine learning.
Such models are built by training several related but distinct models and then
combining their predictions. This lets us find the right pattern for fresh data.

● DECISION TREE

A decision tree may be used as a basis for classification, but it is most often used
in regression. It is a tree-like model that represents decisions and their expected
outcomes; the outcomes may be the results of actions, material costs or utility. In
its tree-like form, each growing branch represents a choice between a number of
alternatives, and each leaf represents a decision. The tree partitions data into
subsets depending on the values of the input variables and allows people to evaluate
the judgment. Being quick to grasp and analyze makes decision trees popular. The
internal nodes are labelled with decision-related queries, and all branches that
come out of a node are labelled with potential responses to those questions.

FIGURE 2: EXAMPLE OF A DECISION TREE

The tree's external nodes, called the leaves, are labelled with the decision for the
query. This model can handle missing data and is also useful in selecting the
important variables. Decision trees are additionally referred to as generative
models of induction rules using logical evidence; they use most of the information
in the dataset and limit the query stage.

The concept may be applied to various potential situations (cases), reflecting the
model's adaptability, versatility and robustness, and trees may be integrated with
certain other decision models as the need or function requires. The downside lies in
implementing improvements: a small shift in the data contributes to a substantial
change in the structure of the tree. Relative to other statistical models, decision
trees are slower in prediction efficiency, and within this model the measurement is
complicated, especially regarding the use of ambiguous data.
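
To make the contrast concrete, here is a minimal sketch of a single decision tree
next to a random-forest ensemble of trees in Python. scikit-learn, the features and
the labels are illustrative assumptions.

    # Sketch: a single decision tree vs. a random-forest ensemble of trees.
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier

    # Each row: [age, number_of_chronic_conditions]; label 1 = severe outcome
    X = [[25, 0], [70, 2], [34, 0], [81, 3],
         [45, 1], [67, 2], [29, 0], [76, 1]]
    y = [0, 1, 0, 1, 0, 1, 0, 1]

    tree = DecisionTreeClassifier(max_depth=3).fit(X, y)        # one tree
    forest = RandomForestClassifier(n_estimators=50).fit(X, y)  # ensemble of trees

    new_case = [[72, 2]]
    print(tree.predict(new_case))    # prediction from the single tree
    print(forest.predict(new_case))  # aggregated prediction from the ensemble

The ensemble aggregates many slightly different trees, which damps the instability
that a small shift in the data causes in a single tree.
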

4.4 UNSUPERVISED LEARNING

Unsupervised learning does not use class labels; it pivots on looking for instances
with unusual patterns. Unsupervised learning strategies can uncover both old and new
forms of fraud because, unlike supervised learning techniques, they are not confined
to fraud patterns that have pre-defined class labels. The benefit is that such a
strategy tries to identify anything that does not follow conventional behavior and,
because it considers context, it may recognize trends that have not historically
been observed. The downside is that, due to this lack of focus, there may be
occasions where no useful information is found within the range of features selected
for training.
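
A minimal clustering sketch in Python follows; no class labels are supplied, so the
algorithm discovers the groups itself. The use of scikit-learn's KMeans and the
synthetic regional statistics are assumptions.

    # Sketch of unsupervised learning: cluster regions by case statistics
    # without any class labels.
    from sklearn.cluster import KMeans

    # Each row: [cases_per_100k, growth_rate_percent] for one region
    regions = [[10, 1.2], [12, 1.1], [300, 9.5],
               [280, 8.9], [15, 1.4], [310, 9.1]]

    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(regions)
    print(kmeans.labels_)                # cluster assignment for each region
    print(kmeans.predict([[290, 9.0]]))  # cluster for a new, unseen region
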

4.5 ARTIFICIAL NEURAL NETWORK

Artificial neural networks are used in both supervised and unsupervised settings. An
artificial neural network is a network of artificial neurons, modelled on biological
neurons, that simulates the capacity of the human nervous system to process input
signals and generate outputs. It is a sophisticated paradigm able to model highly
complicated, high-level relationships.

FIGURE 3: ARTIFICIAL NEURAL NETWORK

In predictive analytics applications, artificial neural networks are used as an
important method for learning from sample datasets and making predictions on new
data. For the computation, an input sequence from the training data is presented to
the input layer of the network and transferred to a hidden layer, which is a vector
of neurons. Depending on the performance requirements, various forms of activation
functions are used at the neurons. Each neuron's output is passed, or fed, to the
neurons of the next layer, and the result is obtained at the output layer, which may
be the predictor of new results.

Artificial neural networks have multiple configurations, and each model uses a
particular algorithm. Backpropagation is the common algorithm used dominantly in
many supervised learning problems, while clustering is the methodology most often
used when artificial neural networks are applied to unsupervised learning problems.
Neural networks have the strength to cope with non-linear relationships in the data,
they are often used to cross-check the results of statistical equations and decision
trees, and such frameworks are used in image detection problems thanks to their
pattern-recognition ability.
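
Since the proposal names TensorFlow and Keras among its tools, here is a minimal
sketch of a small feed-forward network trained by backpropagation to forecast a case
count. The layer sizes, the features and the synthetic data are assumptions.

    # Sketch of a small feed-forward neural network in TensorFlow/Keras.
    import numpy as np
    import tensorflow as tf

    # Features: [day_index, cases_yesterday]; target: cases_today (synthetic)
    X = np.array([[1, 5], [2, 8], [3, 14], [4, 20], [5, 31], [6, 40]], dtype=float)
    y = np.array([8, 14, 20, 31, 40, 55], dtype=float)

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(2,)),                    # input layer
        tf.keras.layers.Dense(16, activation="relu"),  # hidden layer of neurons
        tf.keras.layers.Dense(16, activation="relu"),  # second hidden layer
        tf.keras.layers.Dense(1),                      # output: predicted count
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, y, epochs=200, verbose=0)             # backpropagation training

    print(model.predict(np.array([[7, 55]], dtype=float)))  # next-day forecast
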

5. METHODOLOGY

The creation and implementation of these predictive models requires several phases,
through which we can forecast the future on the basis of current and historical
evidence, store it in a cloud or folder, and continually train and track the model
so as to warn users. The steps involved are:

REQUIREMENT COLLECTION

In order to create a model, the purpose of prediction must be explicit. The kind of
information to be gathered is determined by what is to be predicted, and data from
that domain will be needed to create the model.

DATA COLLECTION

Once we understand the requirements, we need to gather the datasets used to build
the model, which could come from various sources. This may be a full collection of
cases in a COVID-19 infected area. The data may arrive in a structured or an
unstructured form, and the gathered data also needs to be verified.

DATA ANALYSIS AND MASSAGING

We need to review and prepare the gathered data for study and for inclusion in the
model. Unstructured data is turned into a structured form. Once the complete data is
available in a structured manner, the consistency of the data is checked. There is a
probability that inaccurate data is present in the key dataset, or that there are
several missing values for parameters, both of which must be resolved. The adequacy
and accuracy of the predictive model rely exclusively on the consistency of the
data. This procedure is frequently referred to as data blending or data massaging,
which means translating raw data into a format usable by the analysis.
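
A minimal data-massaging sketch in Python follows, resolving the duplicates and
missing values described above. The use of pandas and the column names and values
are hypothetical.

    # Sketch of data massaging: remove duplicates and resolve missing
    # values in a raw case table before modelling.
    import pandas as pd

    raw = pd.DataFrame({
        "region": ["A", "A", "B", "B", "B"],
        "date": ["2020-08-01", "2020-08-01", "2020-08-01", "2020-08-02", None],
        "cases": [10, 10, None, 25, 30],
    })

    clean = (
        raw.drop_duplicates()         # eliminate duplicated records
           .dropna(subset=["date"])   # drop rows missing a key field
           .assign(cases=lambda d: d["cases"].fillna(0).astype(int))  # fill gaps
    )
    print(clean)
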

STATISTICS, MACHINE LEARNING

The modelling process uses a range of computational and machine learning techniques.
The two common methods used primarily in analytics are probability theory and
regression analysis, while artificial neural networks, decision trees, and support
vector machines are the machine learning tools we continue to use widely in several
predictive analytics activities. We also need to incorporate the principles of
statistics and machine learning in order to create models: machine learning
approaches provide a drastic benefit over traditional mathematical methods or
techniques, but probability and computational techniques must still be used in the
creation of any advanced model.

PREDICTIVE MODELING

Here a model is built, based on probabilistic, computational and machine learning
methods, from the datasets of instances. When the implementation is finished, a test
dataset, held out from the main dataset obtained, is checked to verify the quality
and consistency of the model; if this is successful, the model is considered correct
and will make reliable predictions on new data inserted as input into the method. In
several real-time implementations, multi-model approaches are selected for a single
problem.
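
A minimal sketch of this held-out test procedure in Python (scikit-learn assumed,
data synthetic): part of the dataset is withheld from training and used to check the
model before it is trusted.

    # Sketch: hold out part of the dataset as a test set and verify the
    # model's quality on it before deployment.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 30, size=(100, 1))        # synthetic day indices
    y = 3.0 * X.ravel() + rng.normal(0, 2, 100)  # synthetic case counts

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    model = LinearRegression().fit(X_train, y_train)

    rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
    print(f"Test RMSE: {rmse:.2f}")   # low error -> model judged acceptable
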

PREDICTION AND MONITORING

After successful predictive evaluation, the model is expected to be deployed on the
cloud platform website for regular forecasts and decision-making. The model is then
regularly monitored to ensure that it delivers the right outcomes and produces
reliable predictions.

This shows that the machine learning paradigm is not a single simple step in
producing projections of the future. It is a step-by-step process requiring various
procedures, from requirements selection through installation and testing, to achieve
successful use of the system and the consistency needed to make it a framework in
the decision-making phase.

6. TECHNOLOGY AND RESOURCES

Creating and deploying this machine learning model in real time generally involves a
range of technologies that enable the analytics which accelerate the model-driven
decision-making process. For example, a simple analytical framework can consist of
specialized software and services that handle extremely large datasets and build
models over them using cloud-based analytics tools. This form of analytical
architecture may be rather large, spanning everything from data warehouse design to
hardware decisions. The broad range of methods and techniques used in data mining
and machine learning has grown immensely, making it possible to build an interactive
framework with distributed computing devices.

Anaconda, RStudio, the Alteryx Framework, MATLAB, IBM SPSS, SAP Analytics Cloud,
TensorFlow, Keras and matplotlib are the key tools and platforms; even though we
will use only some or most of these applications and frameworks, we expect their
packages to be needed in the creation and implementation of the model.

TensorFlow is the most important tool for designing and implementing models in this
phase, and we will write the majority of the code in the Python programming
language.

7. ETHICAL, LEGAL, SOCIAL, SECURITY AND PROFESSIONAL CONCERNS

Investigation data are gathered and protected in accordance with both the European
General Data Protection Regulation and the UK Data Protection Act. Any records used
as data in this project will be created fictionally and will not refer to any real
individual. Any views or opinions expressed by individuals will be reported
anonymously.

8. CONCLUSION AND FUTURE SCOPE

The planet is now under the influence of the COVID-19 virus. This article plans to
apply a spectrum of machine learning models to a pandemic research dataset. In
conclusion, the aim is to achieve a lower Root Mean Square Error (RMSE) in
projecting COVID-19 transmission than other methods.
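
For reference, RMSE measures the average deviation between predicted and observed
values. With y_i the observed count, \hat{y}_i the model's prediction and n the
number of observations:

    RMSE = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} ( y_i - \hat{y}_i )^2 }
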

There has been a long tradition of utilizing machine learning methods in the process
of prediction and data processing. Later on, statistical models were used as
predictive models, relying on survey results from very broad data collections.
Through advances in information science and developments in computing technologies,
modern techniques and strategies have emerged and better algorithms have been
adopted over time. Machine learning algorithms have a rather strong track record of
being used as statistical software, and artificial neural networks have brought
about a breakthrough in predictive analytics: the performance or future value of an
attribute may be estimated on the basis of input parameters.

Now, with advances in machine learning and the growth of deep learning technologies,
there is an increase in the usage of deep learning models in predictive analytics,
and they are being applied in full swing to this mission. This paper offers the
potential for designing novel models for predictive modeling in the area of medical
sciences and the transmission of pandemic diseases. There remains the ability and
potential to attach new improvements to the current models presented here, to
enhance their efficiency on the challenge and to make them fit better in real-world
scenarios.

REFERENCES
Burkov, A. "The Hundred-Page Machine Learning Book."

Segaran, T. "Programming Collective Intelligence: Building Smart Web 2.0
Applications."

Hastie, T., Tibshirani, R. and Friedman, J. "The Elements of Statistical Learning:
Data Mining, Inference, and Prediction."

Abu-Mostafa, Y. S., Magdon-Ismail, M. and Lin, H.-T. "Learning From Data: A Short
Course."

Theobald, O. "Machine Learning for Absolute Beginners: A Plain English
Introduction."

Kelleher, J. D., Mac Namee, B. and D'Arcy, A. "Fundamentals of Machine Learning for
Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies."

Pujadas, E. et al. (2020) "SARS-CoV-2 viral load predicts COVID-19 mortality," The
Lancet Respiratory Medicine, online, 6 August 2020. Available at:
https://www.thelancet.com/pdfs/journals/lanres/PIIS2213-2600(20)30354-4.pdf
