MSc DEGREE
IN
INFORMATION SYSTEMS
PROJECT PROPOSAL
ID No: K2007759
WARRANTY STATEMENT
This is a student project. Therefore, neither the student nor Kingston University
makes any warranty, express or implied, as to the accuracy of the data or conclusion
of the work performed in the project and will not be held responsible for any
consequences arising out of any inaccuracies or omissions therein.
TABLE OF CONTENTS
1. ABSTRACT
2. OVERVIEW
4. LITERATURE REVIEW
5. METHODOLOGY
6. TECHNOLOGY
8. CONCLUSION
9. REFERENCES
1. ABSTRACT
The planet is facing the COVID-19 pandemic, and every person needs to know the
current COVID-19 situation in their locality, their nation, or the world. In this
project, COVID-19 datasets are collected from time to time and stored in a database.
Using various data analysis and machine-learning techniques, we then forecast the
outbreak and display this information in a human-readable format, including
pictorially or graphically. This forecast helps local governments and residents take
the necessary steps and measures.
The novel coronavirus disease (COVID-19) was first reported in Wuhan, Hubei
Province, China, on 31 December 2019, and then began to spread quickly all over the
world. The cumulative incidence of the causative virus (SARS-CoV-2) is growing
rapidly and has affected 196 countries and territories, with the USA, Spain, Italy,
the United Kingdom, and France among the most affected. This coronavirus epidemic
has been declared a pandemic by the World Health Organization (WHO), and the virus
is still spreading: a total of 3,581,884 confirmed positive cases had been reported,
resulting in 248,558 deaths, as of 4 August 2020. The primary distinction between
the SARS-CoV-2 pandemic and related viruses, such as Severe Acute Respiratory
Syndrome (SARS) and Middle East Respiratory Syndrome (MERS), is its capacity to
propagate rapidly by human contact, leaving almost 20 percent of infected subjects
as symptom-free carriers. In addition, some studies have indicated that the disease
caused by SARS-CoV-2 is more dangerous to people with weak immune systems: elderly
people, as well as patients with conditions such as cancer, diabetes, neurological
disorders, coronary heart disease, and HIV/AIDS, are more vulnerable to extreme
COVID-19 effects. The only solution we have, in the absence of any curative
medicine, is to slow the spread by practicing "social distancing" to break the
chain of transmission of this virus. This behavior of SARS-CoV-2 demands that a
robust mathematical framework be established to track its spread and to monitor it
via dynamic decision-making.
Innovative approaches for creating, deploying, managing, and analyzing big data on
the increasing network of infected subjects, patient information, and their group
movements are also required. Multiple data sources, such as text messages,
electronic interactions, social media, and even web articles, can be of great help
in studying the spread of infection through group activity. By combining this data
with Machine Learning (ML) and Artificial Intelligence (AI), researchers can
predict where and when the disease is likely to spread, and alert specific areas so
that the necessary arrangements can be made. The travel history of infected
subjects is automatically monitored in order to study epidemiological associations
with disease spread, and some community-based effects have been identified.
Technology needs to be built efficiently and cost-effectively for the storage and
analysis of these huge data for further processing, and it must then be coordinated
through the use of cloud computing and AI solutions. The e-commerce company Alibaba
has developed cloud and AI technologies to help China counter the coronavirus by
predicting the height, size, and duration of the outbreak, which is reported to
have achieved 98 percent accuracy in real-world testing at different locations in
China.
ML is used to handle big data and to intelligently forecast the spread of this
disease. Cloud computing can be combined with high-speed computation to rapidly
accelerate the prediction process, and several sources can be used to retrieve data
in order to reduce power consumption. In this paper, we present a prediction model
built with the TensorFlow system, using its various layers, to accurately predict
increases and decreases in the number of COVID-19 cases in the near future in a
specific region and to alert the locality. We present a prediction scheme based on
an ML model that allows governments and people to react proactively to real-time
predictions. Finally, we summarize this analysis and present various directions for
further study.
2. OVERVIEW
The outbreak of the coronavirus, and the illness it causes, COVID-19, took the
world by storm. Each day, newsrooms filter loads of information: articles, official
documents, interviews with experts, and more. Health workers struggle every week to
follow hundreds of academic articles on drug studies, epidemiological data,
operational regulations, and much more. In addition, social network sites need to
reduce noise and promote validated stories so as not to foster misinformed and
frightened users. Many countries have faced a great shortage of doctors,
ventilators, personal protective equipment, and testing capacity, and in the near
future it may be much worse.
Objectives
● To elicit detailed requirements for the registry system.
● To model the system requirements using a standard modelling language (e.g.
UML)
● To design the system modules:
■ A data-gathering module accepting input from a small range of sources,
automatically converting it to, and storing it in, the registry’s internal format
■ A data-quality module checking input to avoid duplication of records
■ A data analysis and reporting module to explore the data and identify patterns
and relationships
■ A reporting module to extract defined subsets of data for research and
publication
● To build prototype versions of these modules and integrate them if possible.
● To demonstrate the prototype and collect feedback, iterating if possible.
● To maintain control of quality throughout.
4. LITERATURE REVIEW
Supervised learning is the most traditional learning strategy, in which a model is
trained using predefined class labels. In the context of insurance fraud
identification, for example, the class names might be "legitimate" and "fraudulent"
claims. The model is built from the training dataset, and any new instance is
compared against the trained model to predict its class: a claim that fits a
pattern similar to valid conduct is categorized as legitimate; otherwise, it is
flagged as fraudulent. The advantage is that each class is meaningful to people and
can therefore be conveniently used to label trends. The downside is the difficulty
of collecting class labels: because there is a massive data log, labelling is
expensive, and misclassified reports (false positives recorded where there are true
negatives) can build a bad perception of the insurance provider in consumers'
minds. A skewed distribution of class labels in the training dataset can also
result in a model that does not have really good predictive accuracy. Finally,
supervised learning models are unable to identify new forms of fraud, and experts
need considerable effort to extract the labelled training samples used to create
the model.
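As an illustration of the supervised workflow described above, the sketch below trains a nearest-centroid classifier on a handful of hypothetical labelled claims. The features (claim amount in thousands, days taken to report), thresholds, and data are invented for illustration, not taken from any real system.

```python
# Minimal illustration of supervised learning: a nearest-centroid
# classifier trained on labelled claims (hypothetical data).
def train_centroids(samples):
    """samples: list of (feature_vector, label). Returns label -> centroid."""
    sums, counts = {}, {}
    for vec, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

def predict(centroids, vec):
    """Assign the label whose centroid is closest (squared Euclidean)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, vec))
    return min(centroids, key=lambda lab: dist(centroids[lab]))

# Hypothetical training data: [claim_amount_k, days_to_report]
training = [([1.0, 2.0], "legitimate"), ([1.2, 3.0], "legitimate"),
            ([9.0, 30.0], "fraudulent"), ([8.5, 28.0], "fraudulent")]
model = train_centroids(training)
print(predict(model, [1.1, 2.5]))   # → legitimate
```

A new claim is classified by comparing it with the already trained model, exactly as described above: it receives the label of the class whose pattern it most resembles.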
A brief explanation of the algorithms that fall under supervised learning follows:
● REGRESSION MODEL
Regression is one of the most common statistical techniques for estimating the
relationship between variables. It models the interaction between a dependent
variable and one or more independent variables, analyzing how the value of the
dependent variable varies within the established relationship as the values of the
independent variables are adjusted.
The regression model considers the main features of broad datasets in the sense of
continuous data, which is expected to have a normal distribution. It is used to
calculate the impact of the different factors that affect a variable's movement.
Through regression, predictor variables determine the value of a response variable.
A regression equation relates the dependent variable to all of the independent
variables, and through this approach the variance of the dependent variable is
explained by fitting a statistical curve to estimate the regression function.
In predictive analysis, two sorts of regression model are used for examination or
modelling: the linear regression model and the logistic regression model. Linear
regression is used for predicting the causal connection between dependent and
independent variables; in this model a linear equation is used as the regression
technique. Logistic regression, on the other hand, is used when the dependent
variables are categorical: unknown values of discrete variables are estimated on
the basis of established values of the independent variables, and in prediction the
dependent variable can take only a limited number of values.
Within machine learning, regression belongs to the supervised learning algorithms.
Such models are built by training different related models and then integrating
their predictive performance, which lets us find the right pattern for fresh data.
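A minimal sketch of the linear regression described above, fitting a straight line by least squares; the cumulative case counts used here are hypothetical.

```python
# Sketch of simple linear regression (ordinary least squares).
def fit_line(xs, ys):
    """Return slope and intercept minimising squared error."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Hypothetical data: cumulative cases observed on days 1..5
days = [1, 2, 3, 4, 5]
cases = [10, 14, 18, 22, 26]            # grows by ~4 per day
slope, intercept = fit_line(days, cases)
print(slope, intercept)                 # → 4.0 6.0
predicted_day6 = slope * 6 + intercept  # extrapolate one day ahead
```

The fitted regression equation (cases = 4.0 × day + 6.0) is then used exactly as described: known values of the independent variable (day) estimate the dependent variable (cases) for days not yet observed.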
● DECISION TREE
A decision tree can be the basis for classification, but it is also sometimes used
in regression. It is a tree-like model that represents decisions and their expected
outcomes; the outcomes may be the results of actions, material costs, or utility.
In its tree-like form, each growing branch represents a choice between a number of
alternatives and each leaf represents a decision. It partitions data into subsets
depending on the types of input variables, allowing people to evaluate the
judgment. Because they are quick to grasp and analyze, decision trees are popular.
The internal nodes are labelled with decision-related questions, and the branches
that come out of a node are labelled with the possible answers to the question.
The tree's outer nodes, called the leaves, are labelled with the resulting
decision. This model has the property of handling missing information and is also
useful in selecting the fundamental variables. Decision trees are additionally
referred to as generative variants of rules of deduction, using logical proof. They
use most of the information in the dataset and limit the query stage.
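The question-labelled internal nodes and decision-labelled leaves described above can be sketched as a small hard-coded tree; the questions, thresholds, and labels below are hypothetical, chosen only to show the structure.

```python
# A decision tree as nested (question, yes-branch, no-branch) nodes;
# leaves are plain strings carrying the decision. Thresholds are invented.
tree = ("amount > 5", ("reported_late", "fraudulent", "review"), "legitimate")

def decide(node, claim):
    """Walk the tree, answering each node's question, until a leaf."""
    if isinstance(node, str):          # leaf: the decision itself
        return node
    question, yes_branch, no_branch = node
    if question == "amount > 5":
        answer = claim["amount"] > 5
    else:                              # "reported_late"
        answer = claim["days_to_report"] > 14
    return decide(yes_branch if answer else no_branch, claim)

print(decide(tree, {"amount": 9, "days_to_report": 30}))  # → fraudulent
```

Each path from the root to a leaf reads as a rule of deduction, which is the sense in which decision trees generate deductive rules.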
Unsupervised learning does not use class labels; it pivots on looking for instances
with unusual patterns. Unsupervised learning strategies can uncover both old and
new forms of fraud because, unlike supervised learning techniques, they are not
confined to fraud trends with pre-defined class labels. The benefit is that they
try to identify anything that does not follow conventional behavior, and can
therefore recognize trends that have not historically been observed. The downside,
due to the lack of focus, is that there may be occasions where no useful
information is found within the range of features selected for training.
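A minimal sketch of this label-free approach: flag values that deviate from the mean by more than a chosen number of standard deviations, with no class labels involved. The data and threshold below are hypothetical.

```python
# Unsupervised anomaly detection needs no class labels: flag points far
# from the bulk of the data (here, beyond 2 standard deviations).
def anomalies(values, threshold=2.0):
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [v for v in values if abs(v - mean) > threshold * std]

daily_cases = [100, 104, 98, 101, 99, 400]   # hypothetical daily counts
print(anomalies(daily_cases))                # → [400]
```

The method flags the unusual day without ever being told what "unusual" looks like, which is exactly why it can surface patterns that were never labelled in advance.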
Artificial neural networks fall under both supervised and unsupervised models. An
artificial neural network is a network of artificial neurons, modelled on
biological neurons, that simulates the capacity of the human nervous system to
process input signals and generate outputs. It is a sophisticated paradigm able to
model highly complicated, high-end relationships.
Artificial neural networks are an important method for learning from sample
datasets and making predictions about new data in predictive analytics
applications. For the computation, an input vector from the training data is
presented to the input layer of the network and passed to a hidden layer, which is
a vector of neurons. Depending on the task, various forms of activation function
are used at the neurons. Each neuron's output is fed to the neurons of the next
layer, and the result is obtained at the output layer, which provides the
prediction for new data.
Artificial neural networks have multiple configurations, and each model uses a
particular algorithm. Backpropagation is the algorithm used dominantly in many
supervised learning problems. Artificial neural networks are also often seen in
unsupervised learning problems: clustering is the methodology used in unsupervised
learning, and it often utilizes artificial neural networks. Neural networks have
the strength to cope with non-linear relationships in the data. They are often used
to refine the results of statistical equations and decision trees, and such
frameworks are used in image detection problems because of their pattern-
recognition ability.
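The forward pass and backpropagation described above can be sketched in their simplest form: a single sigmoid neuron trained by gradient descent to learn the OR function. The learning rate, epoch count, and task are arbitrary choices made for illustration.

```python
import math
import random

# A single sigmoid neuron trained by gradient descent (the simplest case
# of backpropagation); it learns the OR function from four labelled samples.
random.seed(0)
w = [random.uniform(-1, 1) for _ in range(2)]   # weights, one per input
b = 0.0                                         # bias
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

def forward(x):
    """Forward pass: weighted sum followed by the sigmoid activation."""
    return 1 / (1 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))

for _ in range(5000):
    for x, target in data:
        out = forward(x)
        # Gradient of squared error w.r.t. the pre-activation, propagated
        # back through the sigmoid: (out - target) * sigmoid'(z).
        grad = (out - target) * out * (1 - out)
        w[0] -= 0.5 * grad * x[0]
        w[1] -= 0.5 * grad * x[1]
        b -= 0.5 * grad

print([round(forward(x)) for x, _ in data])     # → [0, 1, 1, 1]
```

A real network stacks many such neurons into layers and propagates the error gradient backwards layer by layer, but each step is this same local update.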
5. METHODOLOGY
REQUIREMENT COLLECTION
In order to create a model, the purpose of the prediction must be explicit. The
form of information to be obtained is specified by the prediction objective, and
that domain data will be needed to create the model.
DATA COLLECTION
Once we understand the requirement, we need to gather the datasets used to build
the model, which could come from various sources. This may be a full collection of
cases in a COVID-19-infected area. The data may be in a structured or an
unstructured form, and the gathered data also needs to be checked.
DATA PREPARATION
We need to review and prepare the gathered data for study and for inclusion in the
model. Unstructured data is turned into a structured form; once the complete data
is accessible in a structured manner, the consistency of the data is checked. There
is a probability that inaccurate data is found in the key dataset, or that there
are several missing values for parameters, both of which must be resolved. The
adequacy and exactness of the predictive model rely exclusively on the consistency
of the data. This procedure is frequently referred to as data blending or data
munging, which means translating raw data into a format usable by analytics.
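A toy sketch of this cleaning step, deduplicating records and filling missing values with the mean of the known ones; the records, field names, and fill strategy are hypothetical choices made for illustration.

```python
# Hypothetical raw records with a missing value and an exact duplicate,
# normalised before modelling.
raw = [
    {"region": "A", "cases": "120"},
    {"region": "B", "cases": ""},        # missing value
    {"region": "C", "cases": "95"},
    {"region": "A", "cases": "120"},     # duplicate record
]

def clean(records):
    seen, out = set(), []
    for rec in records:
        key = (rec["region"], rec["cases"])
        if key in seen:                  # drop exact duplicates
            continue
        seen.add(key)
        cases = int(rec["cases"]) if rec["cases"].isdigit() else None
        out.append({"region": rec["region"], "cases": cases})
    # fill missing values with the mean of the known ones
    known = [r["cases"] for r in out if r["cases"] is not None]
    mean = sum(known) // len(known)
    for r in out:
        if r["cases"] is None:
            r["cases"] = mean
    return out

print(clean(raw))
```

Mean-filling is only one of several reasonable strategies for missing values; interpolation from neighbouring days would be another option for time-series case counts.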
PREDICTIVE MODELING
It should be noted here that building a machine learning model is not a single step
in producing projections of the future. It is a step-by-step process requiring
various procedures, from criteria selection to installation and testing, to make
the system usable and consistent as a framework in the decision-making phase.
Creating and implementing machine learning models in real time generally involves
different technologies that enable analytics to accelerate the model-driven
decision-making process. For example, a simple analytical framework can consist of
specialized software and services that handle incredibly large datasets and create
models over them using cloud-based analytics tools. Such an analytical architecture
can be rather large, spanning everything from data warehouse design to hardware
decisions. The broad range of methods and techniques used in data mining and
machine learning has grown immensely, making it possible to build an interactive
framework with distributed computing devices.
TensorFlow is the most important tool for designing and implementing models in this
phase, and the majority of the code is written in the Python programming language.
Investigation data are gathered and protected in accordance with both the European
General Data Protection Regulation and the UK Data Protection Act. Any records used
as data in this project will be created fictionally and will not refer to any real
individual. Any views or opinions expressed by individuals will be reported
anonymously.
The planet is now under the influence of the COVID-19 virus. This article plans to
apply a spectrum of machine learning models to a pandemic research dataset. In
conclusion, a lower Root Mean Square Error (RMSE) than that of other methods is
sought in projecting COVID-19 transmission.
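RMSE itself, the error measure to be minimised above, is straightforward to compute; the predicted and actual case counts below are hypothetical.

```python
import math

# Root Mean Square Error between predicted and actual case counts.
def rmse(predicted, actual):
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual))
                     / len(actual))

actual    = [100, 120, 140, 160]   # hypothetical observed cases
predicted = [ 98, 123, 138, 161]   # hypothetical model output
print(round(rmse(predicted, actual), 3))   # → 2.121
```

Because errors are squared before averaging, RMSE penalises a few large misses more heavily than many small ones, which suits outbreak projection, where a badly wrong peak estimate is the costly failure.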
There has been a long tradition of utilizing machine learning methods for
prediction and data processing. Historically, statistical models were used as
predictive models, relying on survey results from very broad data collections.
Through advances in the world of information science and developments in computing
technology, modern techniques and strategies have been developed and better
algorithms have been adopted over time. Machine learning algorithms have a rather
strong track record as statistical software, and artificial neural networks have
brought about a breakthrough in predictive analytics: the performance or future
value of an attribute may be estimated on the basis of input parameters.
Now, with advances in machine learning and the growth of deep learning
technologies, deep learning models are increasingly used in predictive analytics
and are being applied in full swing to this mission. This paper offers the
potential for designing novel models for predictive modeling in medical science and
the transmission of pandemic diseases. There remains scope to attach new
improvements to the models presented here, to enhance their efficiency in the
challenge and to make them better fit real-world scenarios.
REFERENCES
Burkov, Andriy, The Hundred-Page Machine Learning Book.
Abu-Mostafa, Yaser S., Magdon-Ismail, Malik, and Lin, Hsuan-Tien, Learning from
Data: A Short Course.