You are on page 1of 4

ABSTRACT helps to convey oxygen which is needed for the functioning

of the body cells. It has been established that the average


Cardiovascular disease is the number
heart pumps blood enough to fill a modern tanker annually
one cause of death all over the world.
and beats for about four million times annually [1].
Data mining can help to retrieve valu-
able knowledge from available data Heart diseases are also known as cardiovascular diseases
from the health sector. It helps to train a (CVDs). Heart diseases happen to be the most common
model to predict patients’ health which cause of death globally. According to World Health Or-
will be faster as compared to clinical ganization (WHO), both males and females are equally
ex- perimentation. Various affected by heart disease. WHO estimated that 17.9 mil-
implementation of machine learning lion people are dead from cardiovascular disease in 2016
algorithms such as Logistic Regression, which is about 31% of all deaths worldwide. About 85%
K-Nearest Neigh- bor, Naïve Bayes of these global deaths are caused by stroke and heart attack
(NB), Support Vec- tor Machine, etc. [2].
have been applied on Cleveland heart Cardiovascular diseases result when there is a
datasets but there has been a limit to malfunction of the heart and blood vessels. One-fifth of
modeling using Bayesian Network deaths has been linked to cardiac and occurs when a
(BN). This research applied BN trigger interacts with an arrhythmic substrate [3].
modeling to discover the relation- ship
between 14 relevant attributes of the
Cleveland heart data collected from 1.1 Bayesian Belief Networks
UCI repository. The aim is to check how
the dependency between attributes af- It is a type of probabilistic graphical model. It is a direc-
fects the performance of the classifier. tional acyclic graph made up of nodes and edges. The
The BN produces a reliable and trans- nodes denote the random variables while the edges stand
parent graphical representation between for the causality. Each node has a conditional probability
the attributes with the ability to predict distribution (CPD) that shows the relationship of a node
new scenarios. The model has an accu- with its parents. The joint probability distribution is the
racy of 85%. It was concluded that the product of several conditional distributions. The equation
model outperformed the NB classifier depicts the dependency structure by a direct acyclic
which has an accuracy of 80%. graph. Where Pa (Xi) denotes the parent nodes of Xi.
This equa- tion is known as the chain rule [17]. ‘ The
conditional independence assumption made by Naïve
1 Introduction Bayes is some- times rigid, especially in situations where
the attributes are correlated. Hence the need for a more
The heart is a vital organ in the human body. It is flexible way of modeling- Bayesian Belief Network
essential for pumping blood in the circulatory system. (BBN). In BBN, we specify which pair of attribute is
The blood conditionally independent. Bayesian network has been
proved effective for uncertainty management in AI. It uses
probability theory and graph theory to represent the
relationship between nodes. BN modeling originated
from data mining and machine learn- ing. It reflects the
probabilistic influences from big data sets [17].

1.1.1 Some Basic Definition in Bayesian


Belief Network
Blocked
A path in between vertices A and B in a BN is blocked if
it passes through a vertex C in a way that either:

• Serial Connection ((A→C→B) or


((A←C←B)) or diverging (A←C→B) and C is
conditioned on
• Converging (A→C←B) and neither C nor its
de- scendants have received influence

D-separation
A and B are d-separated by C if all paths from a vertex
of A to a vertex of B are blocked. If A and B are d-
separated by C, then A is independent node C. A causal trail (A→C→B), eviden- tial trail
of B given C (A ⊥ B | C). (A←C→B), or common causal trail (A←C→B) is active
only if C is not observed. A common effect trail
Active Trail (A→C←B) is active only if either C or one of C’s
descen- dants is observed.
A trail is active, if there is a flow of
influence from a node A to B through a Markov’s Blanket
The Markov states that any node N is conditionally in-
dependent of another node given its Markov blanket. A
node’s Markov blanket includes all its parents, children 2 CONCLUSION
and children’s parents. Therefore, if a node is not present
in the class attribute’s Markov blanket, its value is not The research work developed a Bayesian network model for heart disease
relevant to the classification [18]. prediction in a human being. This model was built using the bnlearn
package in R. The goal of this research is to compare the effectiveness of
1.1.2 Learning Bayesian network the Bayesian classifiers in predicting heart diseases. We used two differ- ent
implementations of Bayesian classifier: the Bayesian Belief Network and
The two major operations in learning Bayesian network: the Naïve Bayes. The Bayesian Belief Network produced a graphical
representation of the depen- dencies between attributes. The obtained model
• Structural Learning helps us to identify the causal dependencies and conditional indepen-
• Parametric Learning dencies between attributes. Dataset was collected from the University of
California, Irvine machine learning reposi- tory. It consists of 303
Structure learning instances with 14 attributes; 13 nu- meric input attributes and one output.
The Bayesian Belief Network Outperformed the Naïve Bayes in the
This is the determination of the topology of the BN. prediction of heart diseases. This research will assist in making infer-
Three popular structure learning algorithms are discussed ences about heart diseases, thereby serving as a diagnostic tool to support
below: medical practitioners. As a future work, our model can be compared with
Constraint-based algorithms: other classifier.
This analyzes the relations using the Markov property of
BN with conditional independence tests. This can result References
in causal models even when learned from observational
data. In this algorithm, an undirected graph could be gener- [1] S. B. Meaghan George, S. heart Bleiberg, N. Alawa and D. Sanghavi.
ated and can be transformed into a BN using an Case study: Delivery and pay- ment reform in congestive failure at
additional independence test [17]. two large aca- demic centers. Elsevier, vol. 2, pp. 107-112, 2014.
Score-based algorithms: http://dx.doi.org/10.1016/j.hjdsi.2014.04.001
Algorithms assigns a number to each Bayesian network
and then maximize it using some search algorithm,
choos- ing the model with the highest score.
Hybrid algorithms:
This combines both the constraint and the score-based,
it uses a conditional independence test to minimize the
search space and the network score to identify the
network with optimal performance.
Parametric learning
This involves the calculation of the conditional probabil-
ities in a network topology. Parameter learning is of two
main types:
Maximum Likelihood Estimation (MLE):
This is the natural estimate for the CPDS. It simply uses
the relative frequencies with which the variable states have
occurred. According to MLE, we should fill the CPDs
such that P(data | model) is maximized.
Bayesian Estimation:
The Bayesian Parameter estimator starts with the already
existing prior conditional probability tables that express
our beliefs about the variables before the data was ob-
served.
[2] WHO, 2016. [Online].
https://www.who.int/en/news-
room/fact-
sheets/detail/cardiovascular-
diseases- (cvds) , 2016.
[3] P. d. Bijl, VictoriaDelgado and J.
J.Bax. Imaging for sudden cardiac
death risk stratification: Current
per- spective and future directions.
Elsevier:Progress in
Cardiovascular Diseases, vol. 62,
no. 3, pp. 205-211, 2019.
[4] S. Varun, G. Mounika, P. Sahoo and
K. Eswaran. Effi- cient System for
Heart Disease Prediction by applying
Logistic Regression. International
Journal of Com- puter Science and
Technology, vol. 10, no. 1, 2019.

You might also like