
applied sciences
Article
Tabular Data Generation to Improve Classification of Liver
Disease Diagnosis
Mohammad Alauthman 1, * , Amjad Aldweesh 2, * , Ahmad Al-qerem 3 , Faisal Aburub 4 , Yazan Al-Smadi 3 ,
Awad M. Abaker 5 , Omar Radhi Alzubi 6 and Bilal Alzubi 5

1 Department of Information Security, Faculty of Information Technology, University of Petra,


Amman 11196, Jordan
2 College of Computing and Information Technology, Shaqra University, Shaqra 11911, Saudi Arabia
3 Computer Science Department, Faculty of Information Technology, Zarqa University, Zarqa 13110, Jordan
4 Department of Business Intelligence and Data Analytics, University of Petra, Amman 11196, Jordan
5 Computer Science Department, College of Computing in Al-Qunfudah, Umm Al-Qura University,
Mecca 24382, Saudi Arabia
6 Computer Engineering Department, College of Computing in Al-Qunfudah, Umm Al-Qura University,
Qunfudah 24382, Saudi Arabia
* Correspondence: mohammad.alauthman@uop.edu.jo (M.A.); a.aldweesh@su.edu.sa (A.A.)

Abstract: Liver diseases are among the most common diseases worldwide. Because of their high incidence and high mortality rate, the diagnosis of these diseases is vital. Several factors harm the liver, such as obesity, undiagnosed hepatitis infection, and alcohol abuse, causing abnormal nerve function, bloody coughing or vomiting, insufficient kidney function, hepatic failure, jaundice, and hepatic encephalopathy. The diagnosis of this disease is very expensive and complex. Therefore, this work aims to assess the performance of various machine learning algorithms at decreasing the cost of predictive diagnoses of chronic liver disease. In this study, five machine learning algorithms were employed: Logistic Regression, K-Nearest Neighbor, Decision Tree, Support Vector Machine, and Artificial Neural Network (ANN). We examined the effects on prediction accuracy of Generative Adversarial Networks (GANs) and the synthetic minority oversampling technique (SMOTE). GANs are a mechanism to produce artificial data with a distribution close to the real data distribution. This is achieved by training two different networks: the generator, which seeks to produce new, realistic samples, and the discriminator, which classifies the augmented samples using supervised classification. Statistics show that the use of augmented data slightly improves the performance of the classifier.

Citation: Alauthman, M.; Aldweesh, A.; Al-qerem, A.; Aburub, F.; Al-Smadi, Y.; Abaker, A.M.; Alzubi, O.R.; Alzubi, B. Tabular Data Generation to Improve Classification of Liver Disease Diagnosis. Appl. Sci. 2023, 13, 2678. https://doi.org/10.3390/app13042678

Keywords: liver diseases; GAN; data augmentation; machine learning; classifications

Academic Editor: José Machado
Received: 15 January 2023; Revised: 8 February 2023; Accepted: 10 February 2023; Published: 19 February 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction
The liver is the largest organ in the body and is essential for food digestion and for processing the body's toxic substances. Induced by viruses and alcohol, liver damage is life-threatening. Hepatitis, cirrhosis, liver tumors, and liver cancer are but a few common liver diseases. The main causes of death are liver failure and cirrhosis [1]. Liver disease is, therefore, one of the world's biggest health problems. Chronic liver disease is a significant contributor to mortality rates globally. A range of factors that harm the liver are responsible for this disease, such as obesity, undiagnosed hepatitis infection, and alcohol abuse. The consequences of liver disease can include abnormal nerve function, coughing up or vomiting blood, insufficient kidney function, liver failure, jaundice, and hepatic encephalopathy. The diagnosis of this disease is very expensive and complex. About two million people die of liver disease worldwide annually [2].
In 2010, one million people died from cirrhosis of the liver, with millions suffering from liver cancer, according to the Global Burden of Disease (G.B.D.) project published in


B.M.C. Medicine [3]. For the prediction and diagnosis of liver diseases, machine learning
has significantly affected the biomedical field [4–6]. Machine learning guarantees improved
detection and prediction of diseases of biomedical interest and also enhances the objective
nature of decision making [7]. Medical problems can be easily solved, and diagnosis costs
can be reduced by using machine learning techniques. The main aspect of this study
is to predict the results more effectively and to reduce diagnostic costs in the medical
sector. Therefore, to classify patients as having liver disease or not, we used different classification techniques. Five machine learning techniques were applied, including Logistic Regression, K-Nearest Neighbor, Decision Tree, Support Vector Machine, and Artificial Neural Network (ANN), and their performance was assessed from different perspectives such as accuracy, precision, recall, and F1-score. Data augmentation has been developed over the last decade by treating each table column as a random variable, modeling a joint multivariate probability distribution, and then sampling from that distribution.
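The column-as-random-variable idea can be made concrete with a minimal sketch: fit a joint multivariate Gaussian to the numeric columns of a table and sample synthetic rows from it. Real tabular generators use richer joint models (copulas, mixtures, GANs); the column choices and distribution below are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "real" table: rows of (age, total bilirubin, albumin); columns are illustrative.
real = rng.normal(loc=[45.0, 1.2, 3.1], scale=[12.0, 0.8, 0.5], size=(200, 3))

# Treat each column as a random variable and fit a joint multivariate Gaussian:
# estimate the mean vector and covariance matrix from the data.
mean = real.mean(axis=0)
cov = np.cov(real, rowvar=False)

# Sample new synthetic rows from the fitted joint distribution.
synthetic = rng.multivariate_normal(mean, cov, size=100)
print(synthetic.shape)  # (100, 3)
```

Because the covariance matrix is estimated jointly, sampled rows preserve the correlations between columns rather than sampling each column independently.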
Due to the performance and flexibility they provide in representing data, generative models based on Variational Autoencoders (VAEs) and, later, GANs and their numerous extensions [8–11] have been very effective. GANs are also used to generate tabular data, especially in healthcare. For example, [12] uses GANs to generate continuous time-series medical records, and [13] proposes using GANs to generate discrete tabular data. To generate heterogeneous non-time-series continuous and/or binary data, the model in [14] combines an auto-encoder and a GAN. ehrGAN [15] produces augmented medical records.
Desk GAN [16] is a convolutional neural network approach that attempts to solve the problem of generating synthetic data. Generative Adversarial Networks (GANs) have received much recent attention because of the improved data quality they provide in a broad range of applications. Real-world data is used to produce artificial data that looks and behaves like actual data, which can significantly enhance machine learning algorithms. By using this augmented data to train on larger data volumes, the efficiency of prediction algorithms can be greatly improved [17]. GANs are neural networks that train a generator and a discriminator. These networks play an adversarial game that teaches the generator to produce artificial data with the same distribution as the real-world data. The resulting data is sent to the discriminator, together with real data, which uses a conventional supervised classifier to distinguish between real and augmented data. The effect of vanilla GANs on improving the performance of liver disease prediction is also investigated in this paper. Two sets with different sample numbers are generated in order to investigate the impact of increased data volume on the results. Liver disease is then predicted using machine learning algorithms, and classification accuracy is used to evaluate model performance.
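The two-network game described above can be sketched numerically. Below is a deliberately tiny, illustrative one-dimensional "vanilla" GAN with a linear generator and a logistic discriminator trained with the sigmoid cross-entropy game (non-saturating generator loss); the toy target distribution and hyperparameters are assumptions for demonstration, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

# Real data distribution (illustrative): N(4, 1.5).
a, b = 0.1, 0.0      # generator parameters: G(z) = a*z + b
w, c = 0.1, 0.0      # discriminator parameters: D(x) = sigmoid(w*x + c)
lr, batch = 0.01, 64

for step in range(2000):
    x_real = rng.normal(4.0, 1.5, batch)
    z = rng.normal(0.0, 1.0, batch)
    x_fake = a * z + b

    # Discriminator: gradient ascent on E[log D(real)] + E[log(1 - D(fake))].
    d_real, d_fake = sigmoid(w * x_real + c), sigmoid(w * x_fake + c)
    w += lr * (np.mean((1 - d_real) * x_real) - np.mean(d_fake * x_fake))
    c += lr * (np.mean(1 - d_real) - np.mean(d_fake))

    # Generator: gradient ascent on E[log D(fake)] (non-saturating loss).
    d_fake = sigmoid(w * x_fake + c)
    g_sig = (1 - d_fake) * w          # d log D(x_fake) / d x_fake
    a += lr * np.mean(g_sig * z)
    b += lr * np.mean(g_sig)

# Draw augmented samples from the trained generator.
samples = a * rng.normal(0.0, 1.0, 500) + b
print(samples.shape)  # (500,)
```

In the alternating updates, the generator parameters drift toward matching the real distribution because moving fake samples toward regions the discriminator labels "real" increases log D(fake); deep GANs replace the linear maps with neural networks but keep the same loss structure.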
This work is therefore aimed at assessing the performance of various machine learning algorithms in order to decrease the cost of predictive diagnoses of chronic liver disease. Since GANs' results in computer vision, text, and speech are remarkable, we investigate them as a possible means of addressing liver disease classification. We generate artificial data to overcome the lack of data available to the predictive model. Building a high-quality classification model is a challenging task when only small amounts of data are involved. In this paper, we address the training data augmentation issue, where a Generative Adversarial Network (GAN) is used as an augmentation technique. We augmented the data by doubling and tripling the number of samples. The goal of this augmentation is to increase generalization performance and stabilize the classifier to better fit the trained model, as well as to avoid overfitting with a small set of labelled samples.
This study explores applying and evaluating five classification algorithms on the
Indian Liver Patient Dataset (ILPD). The goal is to assist medical professionals in the
early diagnosis and screening of liver diseases by identifying liver patients from healthy
individuals. The algorithms are compared based on their performance factors, and the

impact of data augmentation using GAN is also investigated. The main objective of this
research is to develop a classification model with good generalizability, meaning the ability
to perform well on new, unseen data. Previous research has shown that models with
low generalizability tend to overfit the training data, which is a significant challenge in
developing accurate and reliable models.
This paper is divided into eight sections. This section has introduced the paper's problem and methodology. Section 2 presents some of the previous work done in the fields of liver disease and data augmentation. Section 3 explains the GANs framework. Section 4 provides an E.D.A. of the dataset used in this work. Section 5 presents the proposed approach. Section 6 shows the employed data augmentation techniques. Section 7 presents the experiments and results. Section 8 concludes the paper.

2. Related Works
2.1. Liver Disease Diagnosis
The liver is the main laboratory of the human body. About 20 million chemical reactions per minute occur in this organ. Here, blood proteins are synthesized (for example, immunoglobulins responsible for the so-called humoral immunity of the whole organism, and albumin, which holds the required volume of fluid in the bloodstream), as are bile acids, the substances necessary for the digestion of food in the small intestine; glucose, the main source of energy for the body, is also accumulated and broken down here [18]. The liver metabolizes fats and detoxifies toxins. The slightest violation of even one of the functions of the liver leads to serious disturbances in the work of the whole organism.
In mild cases, acute hepatitis is practically asymptomatic, being detected only during
random or targeted examination (for example, in the workplace among persons in contact
with hepatotropic poisons, or in the household with regard to poisoning with mushrooms,
etc.). In more severe cases (for example, with toxic hepatitis), the clinical symptoms of the
disease develop rapidly, often in combination with signs of general intoxication and toxic
damage to other organs and systems. During the disease, icteric coloration of the skin and
mucous membranes, whitish-clay-colored stools, saturated dark-colored (“beer-colored”)
urine, and hemorrhagic phenomena are characteristic. The color of the skin is orange or
saffron. However, in mild cases of jaundice, the yellowing of the skin and eyes may only be
noticeable in daylight. The first sign of jaundice is typically the yellowing of the whites of
the eyes and the soft palate's mucous membrane. Other symptoms may include frequent nosebleeds and petechiae, itching, slow heartbeat, depression, irritability, insomnia, and other signs of damage to the central nervous system.
The liver and spleen are slightly enlarged and slightly painful in liver disease. The liver's size may decrease with especially severe lesions and the predominance of necrotic changes (acute dystrophy).
Laboratory tests may show elevated bilirubin levels (100–300 µmol/L or higher), in-
creased activity of certain serum enzymes such as aldolase, aspartate aminotransferase, and
alanine aminotransferase (levels higher than 40 units), lactate dehydrogenase, decreased
albumin levels, and increased globulin levels. Additionally, there may be indicators of
protein deposits in sedimentary samples (such as thymol and sublimate).
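Purely to illustrate the laboratory cut-offs quoted above, a toy flagging function might look as follows. This is not a clinical decision rule: the function name, argument names, and the single-threshold simplification are assumptions for demonstration only.

```python
# Illustrative check against the thresholds cited in the text:
# bilirubin 100-300 umol/L or higher, aminotransferases above 40 units.
def elevated_markers(bilirubin_umol_l, alt_units, ast_units):
    """Return the names of markers exceeding the cited thresholds."""
    flags = []
    if bilirubin_umol_l >= 100:   # elevated bilirubin (100-300 umol/L or higher)
        flags.append("bilirubin")
    if alt_units > 40:            # alanine aminotransferase above 40 units
        flags.append("ALT")
    if ast_units > 40:            # aspartate aminotransferase above 40 units
        flags.append("AST")
    return flags

print(elevated_markers(150, 55, 35))  # ['bilirubin', 'ALT']
```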
The liver's production of fibrinogen, prothrombin, and coagulation factors V and VII is impaired, which accounts for hemorrhagic phenomena; the epidemiological situation should also be considered when identifying the nature and cause of the disease. In unclear cases, one should first think about viral hepatitis. Detection of the so-called Australian antigen is characteristic of serum hepatitis B (it is also detected in virus carriers, but rarely in other diseases). Mechanical
(subhepatic) jaundice usually occurs acutely only when a stone blocks the common bile duct in cholelithiasis. However, in this case, jaundice is preceded by an attack of biliary colic; bilirubin in the blood is mostly direct (conjugated), and the stool is discolored. With hemolytic (suprahepatic) jaundice, free (indirect) bilirubin is found in the blood, the stool is intensely colored, and the osmotic resistance of erythrocytes is usually reduced. In the case of false jaundice

(due to skin staining with carotene with prolonged and abundant consumption of oranges,
carrots, and pumpkin), the sclera is usually not colored, and hyperbilirubinemia is absent.
Scanning the liver allows for the determination of its size; with hepatitis, sometimes there
is a reduced or uneven accumulation of a radioisotope drug in the liver tissue. In some
cases, increased accumulation in the spleen occurs.
The research by Jeyalakshmi et al. [19] focuses on developing a prediction system
for liver disease using a specific type of machine learning algorithm called Convolutional
Neural Network (CNN). The aim is to improve the accuracy of liver disease diagnosis by
utilizing the CNN algorithm. The performance of the proposed system is evaluated and
compared to other traditional machine learning methods. The results show that using CNN
improves accuracy in predicting liver disease compared to other methods.
The research paper by Islam et al. [20] focuses on machine learning and deep learning
methods to analyze the essential factors affecting liver disease and make predictions about
the presence of liver disease. The study applies various algorithms such as decision trees,
random forests, and deep neural networks to the Indian Liver Patient Dataset (I.L.P.D.) and
compares their performance. The results showed that the deep neural network achieved
the highest accuracy in predicting liver disease, and essential factors affecting liver disease
were identified through feature importance analysis.
The study by Sravani et al. [21] focuses on using machine learning algorithms to predict liver diseases. It evaluates the performance of various classification algorithms, such as k-nearest neighbors, decision trees, random forests, and support vector machines, in detecting liver diseases. The performance of these algorithms is evaluated in terms of accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve. The results show that the random forest algorithm predicted liver disease best.
Belavigi et al. [22] proposed a comparison of Rprop and SAG on text datasets and a CNN on image datasets. This study uses deep learning for early liver disease prediction. All three methods use liver-related text and image datasets as inputs. Accuracy is determined by comparing the models' respective outputs. Compared to results on text datasets, image datasets are more reliable.
Singh et al. [23] designed software based on classification algorithms (including logistic
regression, random forest, and naive Bayes) to predict the risk of liver disease from a data
set with liver function test results.
Differential diagnosis in cases with a vivid clinical picture of diffuse liver damage
should be carried out with liver cirrhosis. With cirrhosis, the symptoms of the disease
are more pronounced, and the liver is usually much denser than with hepatitis; it can
be increased but often also decreased in size (atrophic phase of cirrhosis). As a rule,
splenomegaly is observed, hepatic signs (vascular telangiectasias, hepatic tongue, hepatic
palms) are often detected, and symptoms of portal hypertension may occur. Laboratory studies show significant deviations from the norm in the results of the so-called liver tests; puncture biopsy reveals disorganization of the liver structure and significant proliferation of connective tissue.

2.2. Data Augmentation


Increasing the variety of data samples is an excellent way to improve the performance of machine learning algorithms. This technique is predominantly utilized in computer vision applications, where simple transformations of existing images (flipping, cropping, and rotation) can be applied [18,24]. However, these methods only modify the existing samples and do not generate new ones. To address this issue, researchers have begun adapting augmentation algorithms to generate new samples for tabular data.
Synthetic data is data generated by combining real-world characteristics and sampling from a distribution. Each approach to generating synthetic data has its unique advantages and disadvantages. Historically, synthetic data has been used for data anonymization and software testing, and the two are sometimes even intertwined. With the rise of Big Data and deep learning, privacy has become an increasingly important concern, making

creating realistic synthetic data a critical aspect. A study published in [16] implements an unsupervised data augmentation method in a semi-supervised learning environment to make the model more consistent when training on both real and unlabeled data. This method replaces random noise with noise that is more like the real world, which makes predictions based on real data easier to reconcile with predictions based on augmented data. Another study [25] presents a data augmentation method for training convolutional neural networks (CNNs) known as random erasing.
However, this approach faces the challenge of occlusion, which is prevalent when using CNNs. To address this challenge, the study [24] proposes AutoAugment. This process identifies the most suitable data augmentation policy using reinforcement learning to find the optimal combination of decisions and feature order for maximum accuracy. Additionally, the authors of [26] propose the Fast AutoAugment algorithm, motivated by Bayesian data augmentation. This algorithm optimizes the search for missing data points and results in faster search times and improved error rates compared to AutoAugment. Meanwhile, the Population-Based Augmentation (P.B.A.) algorithm replaces the fixed augmentation policy during training, producing an optimal schedule of augmentation policies for each training period. This algorithm requires less computing power and training time, as demonstrated in [27].
The current research on liver disease diagnosis using machine learning primarily focuses on applying different algorithms to existing datasets for prediction. However, little is known about using newly generated tabular data to improve liver disease diagnosis. Such research would involve creating new datasets from relevant sources and using them to train machine learning models for improved predictions. Creating these new datasets could involve data pre-processing, feature selection, and data generation techniques such as synthetic data generation and oversampling to overcome class imbalance issues. Newly generated tabular data has the potential to provide more accurate and robust predictions in liver disease diagnosis compared to existing datasets alone. Further research in this area can help fill this knowledge gap and advance the field of liver disease diagnosis using machine learning.
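The oversampling idea mentioned above, as implemented in SMOTE, interpolates between a minority-class sample and one of its nearest minority-class neighbours. The following is a minimal numpy sketch of that idea, not the reference imbalanced-learn implementation; the function name and toy data are assumptions.

```python
import numpy as np

def smote_like(X_min, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by interpolating each chosen
    sample with one of its k nearest minority-class neighbours (SMOTE idea)."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # Distances from sample i to every minority sample.
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbors = np.argsort(d)[1:k + 1]   # skip the sample itself
        j = rng.choice(neighbors)
        lam = rng.random()                   # interpolation factor in [0, 1)
        out.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(out)

X_minority = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.2]])
synthetic = smote_like(X_minority, n_new=6)
print(synthetic.shape)  # (6, 2)
```

Because every synthetic point lies on a segment between two real minority points, the new samples stay inside the minority region rather than being arbitrary noise.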

3. I.L.P.D. Dataset: Exploratory Data Analysis


E.D.A. is an iterative brainstorming exercise around the dataset and follows no strict instructions. One question can be asked based on ideas already formed prior to the analysis, while another can be asked only after a pattern or outlier has been noticed. This means that E.D.A. does not provide an exhaustive list of approaches for exploring a dataset; rather, it is a small set of processes that can be carried out to understand the data being studied.

3.1. Dataset Description


In this paper, a dataset of 583 entries was obtained from the I.L.P.D. (Indian Liver Patient Dataset). The dataset was downloaded from the U.C.I. Machine Learning Repository (https://archive.ics.uci.edu/ml/index.php (accessed on 1 January 2023)). The I.L.P.D. dataset contains thorough information regarding 583 Indian liver patients. In particular, 416 of these are liver patients, and the remaining 167 are non-liver patients. The dataset was collected from the northeast of Andhra Pradesh in India. A class label called selector is used to split patients into liver and non-liver categories. It should be noted that patients of 89 years and older are recorded with an age of "90". Some information on the attributes is shown in Table 1.

Table 1. Description of Liver patient dataset.

Sl. No   Attribute Name                    Attribute Type   Attribute Description
1.       Age                               Numeric          Age of the patient
2.       Sex                               Nominal          Gender of the patient
3.       Total Bilirubin                   Numeric          Quantity of total bilirubin in patient
4.       Direct Bilirubin                  Numeric          Quantity of direct bilirubin in patient
5.       Alkphos Alkaline Phosphatase      Numeric          Amount of A.L.P. enzyme in patient
6.       Sgpt Alamine Aminotransferase     Numeric          Amount of S.G.P.T. in patient
7.       Sgot Aspartate Aminotransferase   Numeric          Amount of S.G.O.T. in patient
8.       Total Proteins                    Numeric          Protein content in patient
9.       Albumin                           Numeric          Amount of albumin in patient
10.      Albumin and Globulin Ratio        Numeric          Fraction of albumin and globulin in patient
11.      Class                             Numeric [1,2]    Status of liver disease in patient
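As an illustration of how this dataset might be loaded, the sketch below supplies the Table 1 column names explicitly, since the UCI CSV ships without a header row. The file name, shortened column names, and the sample rows are assumptions for demonstration; only the schema follows Table 1.

```python
import pandas as pd

# Column names following Table 1 (shortened; the UCI file has no header row).
columns = ["Age", "Sex", "Total_Bilirubin", "Direct_Bilirubin", "Alkphos",
           "Sgpt", "Sgot", "Total_Proteins", "Albumin", "A_G_Ratio", "Selector"]

# df = pd.read_csv("ilpd.csv", names=columns)   # hypothetical local file name
# Tiny in-memory stand-in with the same schema (values are made up):
df = pd.DataFrame([[65, "Female", 0.7, 0.1, 187, 16, 18, 6.8, 3.3, 0.90, 1],
                   [62, "Male",  10.9, 5.5, 699, 64, 100, 7.5, 3.2, 0.74, 1],
                   [20, "Male",   1.1, 0.5, 128, 20, 30, 3.9, 1.9, 0.95, 2]],
                  columns=columns)

# Selector: 1 = liver patient, 2 = non-liver patient; remap to 1/0 for training.
df["Selector"] = (df["Selector"] == 1).astype(int)
print(df["Selector"].tolist())  # [1, 1, 0]
```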

3.2. Exploratory Data Analysis


Most of the results we derived from this analysis confirm the research and other resources available (e.g., the Indian liver disease analyses on the Kaggle website); such patterns must be confirmed in several datasets to be beneficial for data understanding. Furthermore, more data allows us to zoom in on multiple facets of the data. As the word "exploratory" suggests, E.D.A. is only a tool that gives the analyst insight into the nature of the data. There are stronger statistical procedures, such as modelling and inference, which can help us make better use of data and draw more solid conclusions.
We have found that the results of blood tests may vary between men and women with liver disease. Without solid evidence, one could conclude that men are more severely affected by liver disease than women, or that there is a gender-related confounder (e.g., physiological differences or behavioral risk factors) which causes the difference. On the other hand, gender does not seem to influence the presence or absence of liver disease, i.e., both genders have relatively "shared" positive and negative cases. It is worth noting, however, that the average age of those diagnosed as positive is higher than the average age of those diagnosed as negative, for both women and men.
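The age observation above can be checked with a simple group-by over sex and diagnosis. The mini-table here is hypothetical data with the dataset's schema, not the actual I.L.P.D. values.

```python
import pandas as pd

# Hypothetical mini-sample with the dataset's schema (Selector 1 = liver patient).
df = pd.DataFrame({
    "Sex": ["Male", "Male", "Male", "Female", "Female", "Female"],
    "Age": [55, 60, 30, 50, 58, 28],
    "Selector": [1, 1, 2, 1, 1, 2],
})

# Mean age per gender and diagnosis, mirroring the observation above.
mean_age = df.groupby(["Sex", "Selector"])["Age"].mean()
print(mean_age.loc[("Male", 1)] > mean_age.loc[("Male", 2)])  # True
```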
A correlation matrix, as shown in Figure 1, can be an extremely concise way of examining links between numerical variables. Strong correlations exist between direct bilirubin and total bilirubin (as the former is a component of the latter), between AST and ALT (both liver enzymes), and between total protein and albumin (the latter being the most abundant protein). The albumin/globulin ratio also correlates with albumin, though this correlation is high but not as large as the others, because globulins introduce some variability into the ratio that is not shared with albumin; see the pair plots in Figures 2 and 3 for clarification.
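A correlation matrix like the one in Figure 1 can be computed in a few lines. The synthetic columns below merely mimic the relationships discussed (direct bilirubin as a component of total bilirubin, albumin as a component of total protein), so the generated numbers are illustrative assumptions, not the dataset's values.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 300

# Synthetic columns reproducing the structural relationships discussed above.
direct = rng.gamma(2.0, 1.0, n)
total_bilirubin = direct + rng.gamma(1.0, 0.5, n)   # total = direct + indirect
albumin = rng.normal(3.5, 0.5, n)
total_protein = albumin + rng.normal(3.0, 0.4, n)   # albumin + globulins

corr = pd.DataFrame({"Direct_Bilirubin": direct,
                     "Total_Bilirubin": total_bilirubin,
                     "Albumin": albumin,
                     "Total_Proteins": total_protein}).corr()
print(corr.loc["Direct_Bilirubin", "Total_Bilirubin"] > 0.8)  # True
```

The component-of relationship is what drives the high coefficient: since total bilirubin is direct bilirubin plus independent noise, their correlation is close to one by construction.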

Figure 1. Simple Correlation Plot - liver disease.

Figure 2. Pair Plot - liver disease.

Figure 3. Pair Plot - liver disease with label categories.

A pair plot is a graphical tool that allows visualizing the pairwise relationships between multiple variables in a dataset. Each variable is plotted against all the other variables in the dataset, resulting in a grid of scatterplots. A pair plot can help identify patterns and trends in the relationships between variables and is useful for exploring the structure of the data, as we can see in Figure 2. On the other hand, a pair plot with label categories (Figure 3) is a variation of the basic pair plot that includes information about label categories. Label categories are categorical variables that provide additional information about each data point, such as group membership or a binary classification label.
We see a negative trend: even if weakly, higher values of direct bilirubin are associated with a lower A/G ratio. This makes sense because both a low A/G ratio and high direct bilirubin indicate liver disease. After removing outliers, the coefficient of correlation between two variables may indicate a stronger correlation. It is important to note that outliers are not necessarily incorrect; they may represent a distinct group of observations that should be isolated rather than rejected.

4. Model Construction
In Figure 4, we can see a flowchart that outlines the process for creating the model. This chart provides an overview of the system used to augment the data and carry out classification. The augmented data generated by this system is used to train and develop the final model using various machine-learning algorithms. To get started, the raw data is split into training and test sets. The raw training data is then augmented, and the raw and augmented training data are combined to train the machine learning algorithms. The test data that was separated earlier is then used to evaluate the performance of the trained models. The GAN method was chosen in this research because it has shown high training stability and sample quality, as reported in previous studies. To make the predictive model better at identifying the decision boundaries between the two classes in the liver disease dataset, the original sigmoid cross-entropy loss function for the discriminator was used during data augmentation. For comparison purposes, SMOTE was also used in the study.
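The split-augment-combine-train-evaluate pipeline described above can be sketched end to end. For brevity, the GAN generator is replaced here by a jitter-based stand-in; the toy data, noise scale, and choice of Logistic Regression are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)

# Toy two-class data standing in for the liver disease table.
X = np.vstack([rng.normal(0.0, 1.0, (150, 4)), rng.normal(1.5, 1.0, (150, 4))])
y = np.array([0] * 150 + [1] * 150)

# 1) Split the raw data into training and test sets (the test set is held out).
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# 2) Augment the raw training data. A trained GAN generator would produce the
#    synthetic rows here; jittered copies of real rows are a stand-in.
X_aug = X_tr + rng.normal(0.0, 0.1, X_tr.shape)
X_train = np.vstack([X_tr, X_aug])
y_train = np.concatenate([y_tr, y_tr])

# 3) Train on raw + augmented data, then 4) evaluate on the untouched test set.
clf = LogisticRegression().fit(X_train, y_train)
acc = accuracy_score(y_te, clf.predict(X_te))
print(acc > 0.7)  # True
```

Keeping the test set out of the augmentation step is the crucial design choice: synthetic rows derived from test samples would leak information and inflate the measured accuracy.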

Figure 4. The architecture of the proposed approach.

5. Classification
5. Classification Algorithms
Algorithms
5.1. Artificial Neural Networks (ANN)
5.1. Artificial Neural Networks (ANN)
Artificial Neural Networks (ANNs) are a class of machine learning algorithms inspired
Artificial Neural Networks (ANNs) are a class of machine learning algorithms in-
by the human brain’s structure and function [28,29]. They consist of interconnected nodes
spired by the human
called artificial neurons,brain's
which structure
processand
andfunction
transmit[28,29]. They consist
information. of interconnected
ANNs are highly flexible
nodes called artificial neurons, which process and transmit information.
and can be used for many tasks, including image recognition, natural language ANNsprocessing,
are highly
flexible and can be used for many tasks, including image recognition, natural
and prediction. They can also handle complex, non-linear relationships between inputs language
and
processing, and prediction. They can also handle complex, non-linear relationships
outputs and learn from large amounts of data. However, ANNs can be prone to overfitting be-
tween inputs and outputs and learn from large amounts of data. However,
and require a significant amount of computational resources and careful tuning of their ANNs can be
prone to overfittingDespite
hyperparameters. and require
thesealimitations,
significant amount
ANNs haveof computational resources
become a popular and and care-
powerful
ful tuning of their hyperparameters. Despite these limitations, ANNs have become
tool for many machine-learning applications due to their ability to model complex patterns a pop-
ular and
in data. powerful tool for many machine-learning applications due to their ability to
model complex patterns in data.
5.2. Support Vector Machines (SVM)
Support Vector Machines (SVM) are a popular machine learning method used for classification and are linked to state-of-the-art learning algorithms [16,30,31]. This technique calculates the margin between classes and separates them with deliberately placed boundaries. The aim is to maximize the distance between classes, thereby reducing errors in the classification process. The SVM algorithm is highly effective in classifying non-linear data and is widely used in various domains such as text classification, image recognition, and bioinformatics [32].
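To make the margin-maximization idea concrete, here is a minimal sketch (not the solver used in the paper) of a linear SVM trained by full-batch subgradient descent on the regularized hinge loss. The toy data, regularization strength, and learning rate are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Two linearly separable Gaussian blobs, labels in {-1, +1}.
X = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])

w = np.zeros(2); b = 0.0
lam, lr = 0.01, 0.1
for _ in range(500):
    margins = y * (X @ w + b)
    viol = margins < 1                     # points violating the margin
    # Subgradient of (lam/2)*||w||^2 + mean(hinge loss)
    gw = lam * w - (y[viol][:, None] * X[viol]).sum(axis=0) / len(X)
    gb = -y[viol].sum() / len(X)
    w -= lr * gw
    b -= lr * gb

acc = (np.sign(X @ w + b) == y).mean()
print(f"training accuracy: {acc:.2f}")
```

Minimizing the hinge loss pushes every point to at least unit margin, which is exactly the "maximize the distance between classes" objective described above.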
5.3. Decision Trees (DT)
A decision tree predicts outcomes by dividing the input space into smaller subspaces through recursive partitioning. Each instance is routed through the tree according to its rules until it reaches a final node, and the method takes its name from its tree-like structure of connected nodes. In most cases, a node is one of two types: a decision node or a leaf node. The data begins at the root node at the top and flows down to the lower nodes, where it is assigned a category. We adopt this classification type in this research, starting with the most common standard binary divisions. The C4.5 decision tree, distributed by Ross Quinlan in 1993, will also be used [33,34]. In addition to being a rather traditional method, it has the power and speed to express data structures, and the diversity of its options benefits the grading process and the quality of its results.

5.4. K-Nearest Neighbours Algorithm (K.N.N.)
K-Nearest Neighbors (K.N.N.) is a widely used instance-based learning algorithm for classification and regression problems. It operates by finding the K closest points in the training set to a new data point, using a specified distance metric, and assigning the majority class label or the average target value of the nearest neighbors as the prediction for the new data point [30,35]. The strengths of K.N.N. include its simplicity, its ability to handle multi-class problems, and its suitability for large datasets. However, its performance can be affected by the choice of a distance metric, sensitivity to irrelevant features and the scale of the data, and the computational cost of finding the nearest neighbors. Despite its limitations, K.N.N. remains a popular and effective machine learning method.
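The K.N.N. procedure just described can be sketched in a few lines; this is a generic illustration with a Euclidean distance metric and invented toy data, not the implementation used in the experiments.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x_new, axis=1)  # Euclidean distance metric
    nearest = np.argsort(dists)[:k]                  # indices of k closest points
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]

X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([0.2, 0.1])))  # -> 0
print(knn_predict(X_train, y_train, np.array([5.4, 5.2])))  # -> 1
```

The example also makes the weaknesses noted above visible: every prediction computes distances to the whole training set, and rescaling any feature changes the neighbours.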

5.5. Logistic Regression Classifier (L.R.)


The Logistic Regression Classifier (L.R.) [34,36] is a widely used supervised learning
algorithm for binary and multi-class classification problems. It models the relationship
between a set of input features and the probability of a binary outcome through a logistic
function. L.R. optimizes its parameters by maximizing the likelihood of the observed class
labels in the training data. The optimized model can then be used to make predictions on
new data points by computing the estimated probability of each class and choosing the
class with the highest probability as the prediction. L.R. has several advantages, including
its simplicity, interpretability, and the availability of efficient algorithms for optimization.
However, it can be sensitive to outliers and relies on an assumption of independence between features. Despite these limitations, L.R. remains a widely used and effective method for
binary and multi-class classification problems.
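The fitting procedure described above, maximizing the likelihood of the observed labels, can be sketched with plain gradient ascent on a toy binary problem; the data, learning rate, and iteration count are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
# Two overlapping Gaussian classes.
X = np.vstack([rng.normal(-1, 1, (100, 2)), rng.normal(1, 1, (100, 2))])
y = np.hstack([np.zeros(100), np.ones(100)])

w = np.zeros(2); b = 0.0
lr = 0.1
for _ in range(300):
    p = sigmoid(X @ w + b)            # estimated class probabilities
    # Gradient ASCENT on the log-likelihood of the observed labels
    w += lr * (X.T @ (y - p)) / len(X)
    b += lr * (y - p).mean()

# Predict by choosing the class with the higher estimated probability.
acc = ((sigmoid(X @ w + b) > 0.5) == y).mean()
print(f"training accuracy: {acc:.2f}")
```

Because the log-likelihood is concave in (w, b), this simple ascent converges to the global optimum, which is one reason for the efficient optimization the text mentions.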

6. Data Augmentation Methods


6.1. Generative Adversarial Networks
In the area of data augmentation, generative models that include adversarial networks
have recently come to be regarded as one of the most fascinating and potentially fruitful
strategies. This is because the generation of images has significantly improved because of
incorporating these models. GANs are superior to other algorithms in sample quality and
usability across a broad variety of applications, including translating text to images. This is
because GANs can learn from their missteps. It is important to train not just one but two
deep neural networks in parallel for GANs to function properly. The first network, referred
to as the generator, is in charge of producing synthetic data by imitating the training
data distribution. The discriminator is the second component of a generative adversarial
network (GAN) and its function is to differentiate between real and fake data by utilizing
traditional supervised learning techniques [9]. The discriminator judges whether the generated samples are of high enough quality to pass as genuine, and this judgment drives the generator toward producing more realistic data. Algorithm 1 shows
the GAN pseudocode.

Algorithm 1: Generative Adversarial Networks (GANs) Algorithm

For number of training iterations do:
    For K steps do:
        Sample a minibatch of m noise samples {z(1), ..., z(m)} from the noise prior p_g(z)
        Sample a minibatch of m examples {x(1), ..., x(m)} from the data-generating distribution p_data(x)
        Update the discriminator by ascending its stochastic gradient:
            grad_θd (1/m) Σ_{i=1..m} [ log d(x(i)) + log(1 − d(g(z(i)))) ]
    End For
    Sample a minibatch of m noise samples {z(1), ..., z(m)} from the noise prior p_g(z)
    Update the generator by descending its stochastic gradient:
        grad_θg (1/m) Σ_{i=1..m} log(1 − d(g(z(i))))
End For
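The loop of Algorithm 1 can be exercised end-to-end on a toy problem. The sketch below is an assumption-laden illustration only (the paper's GAN works on tabular clinical features with deep networks): here the generator is the linear map g(z) = a*z + c, the discriminator is the logistic unit d(x) = sigmoid(w*x + v), the target is a 1-D Gaussian, and all learning rates and sizes are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))

def sample_real(m):
    """Real data distribution: 1-D Gaussian centred at 2."""
    return rng.normal(2.0, 0.5, m)

a, c = 1.0, 0.0      # generator parameters: g(z) = a*z + c
w, v = 0.0, 0.0      # discriminator parameters: d(x) = sigmoid(w*x + v)
lr, m, K = 0.05, 64, 1

for it in range(2000):
    for _ in range(K):                       # K discriminator steps
        z = rng.normal(0, 1, m)              # noise minibatch from p_g(z)
        x = sample_real(m)                   # data minibatch from p_data(x)
        g = a * z + c
        dx, dg = sigmoid(w * x + v), sigmoid(w * g + v)
        # Ascend gradient of mean[log d(x)] + mean[log(1 - d(g(z)))]
        w += lr * (np.mean((1 - dx) * x) - np.mean(dg * g))
        v += lr * (np.mean(1 - dx) - np.mean(dg))
    z = rng.normal(0, 1, m)
    g = a * z + c
    dg = sigmoid(w * g + v)
    # Descend gradient of mean[log(1 - d(g(z)))] w.r.t. generator params
    a -= lr * np.mean(-dg * w * z)
    c -= lr * np.mean(-dg * w)

print(f"generator mean after training: {c:.2f} (target 2.0)")
```

Even in this tiny setting, the generator's offset c drifts toward the real mean because improving against the discriminator is only possible by imitating the data distribution, which is the mechanism the text describes.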

6.2. Synthetic Minority Oversampling Technique


If the data are not evenly distributed, a problem known as unbalanced target classes
may occur, in which classifiers used by machine learning systems may be biased in favor of
one category over another. However, the actual class distribution differs significantly across datasets. As a result, we chose to use the Synthetic Minority Oversampling Technique (SMOTE) to guarantee that the data are distributed fairly. SMOTE [37] produces synthetic samples to increase the number of minority-class samples: it locates the K nearest neighbors of a minority sample, takes the difference between the sample and a chosen neighbor, multiplies that difference by a random value between 0 and 1, and adds the result to the original sample.
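The interpolation step just described can be sketched directly; this is a generic illustration of the SMOTE idea (not the library implementation used in the experiments), with invented toy minority-class data.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples: pick a minority sample,
    pick one of its k nearest minority neighbours, and interpolate
    between them with a random gap in [0, 1)."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        dists = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(dists)[1:k + 1]   # skip the point itself
        j = rng.choice(neighbours)
        gap = rng.random()                        # random value in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)

X_min = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
                  [1.1, 1.2], [0.9, 0.8], [1.0, 1.3]])
X_syn = smote_oversample(X_min, n_new=10)
print(X_syn.shape)
```

Because each synthetic point lies on a line segment between two real minority samples, the new samples stay inside the region already occupied by the minority class rather than being arbitrary noise.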

7. Experiments and Evaluation


10-fold cross-validation is widely used as the standard for conducting experiments.
The data is divided into 10 equal parts, referred to as subsets, with one subset being used for
testing while the rest are utilized for training. This process is repeated 10 times, with each
subset being used for testing only once, ensuring that every instance in the feature matrix
is included in both the testing and training phases. The results obtained from each iteration
are then averaged to produce a single classification rate after being repeated 10 times.
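The splitting scheme described above can be sketched as follows; the fold construction is generic (a hypothetical helper, not the exact tooling used in the experiments), and the shuffling seed and sample count are illustrative.

```python
import numpy as np

def kfold_indices(n, k=10, seed=0):
    """Split n sample indices into k folds; each fold serves as the test
    set exactly once while the remaining folds form the training set."""
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.hstack([folds[j] for j in range(k) if j != i])
        yield train, test

n = 25
seen = []
for train, test in kfold_indices(n, k=10):
    assert len(set(train.tolist()) & set(test.tolist())) == 0  # disjoint sets
    seen.extend(test.tolist())
assert sorted(seen) == list(range(n))  # every instance is tested exactly once
```

In an experiment, the classifier is refit on each `train` split, scored on the matching `test` split, and the 10 scores are averaged into the single classification rate reported.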

7.1. Evaluation Performance Measures


The evaluation of the various classifiers on the original dataset before augmentation
(No-AUG) after doubling the dataset (DD-AUG) and tripling the dataset (TD-AUG) was
conducted using various measures. The Receiver Operating Characteristic (ROC) param-
eters, including True Positive (TP), False Negative (FN), False Positive (FP), and True
Negative (TN), were utilized to evaluate and compare the performance. TP denotes the
number of accurate positive diagnoses where the individual is classified as healthy and
is indeed healthy; TN denotes the individual who is classified as infected and is indeed
infected; FP denotes the individual who is classified as healthy but is actually infected; and FN denotes the individual who is classified as infected but is actually healthy. The ROC parameters
demonstrate the consistency of the used classifier.
In addition to the ROC parameters, this study also employs the measures of Sensitivity,
Precision, F1, and Specificity to evaluate the model. These performance metrics are derived
from the confusion matrices and are calculated using the following equations:

Accuracy = (|TN| + |TP|) / (|TN| + |TP| + |FN| + |FP|)   (1)

Specificity = |TN| / (|TN| + |FP|)   (2)

Sensitivity = |TP| / (|TP| + |FN|)   (3)

F1-Measure = 2|TP| / (2|TP| + |FN| + |FP|)   (4)
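Equations (1)-(4) translate directly into code; the confusion-matrix counts below are invented for the example.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute Equations (1)-(4) from the four confusion-matrix counts."""
    accuracy = (tn + tp) / (tn + tp + fn + fp)       # Eq. (1)
    specificity = tn / (tn + fp)                     # Eq. (2)
    sensitivity = tp / (tp + fn)                     # Eq. (3), also called recall
    f1 = 2 * tp / (2 * tp + fn + fp)                 # Eq. (4)
    return accuracy, specificity, sensitivity, f1

# Hypothetical confusion matrix with 100 cases.
acc, spec, sens, f1 = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(acc, spec, sens, f1)
```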
It is not easy to capture all aspects of performance from a contingency table, because PPV, N.P.V., Sensitivity, and Specificity each use only part of its information. The Matthews correlation coefficient (MCC) benefits from all four numbers, providing a more comprehensive, balanced, and representative summary than any single row- or column-based measure. For the other measures put forward here, we can generalize that a higher value is better, but MCC is interpreted on a different scale: its values lie in [−1, 1], where 1 represents perfect correlation, 0 represents a random prediction, and −1 represents perfect negative correlation. In extreme cases, class imbalance affects both MCC and accuracy. MCC also increases more conservatively than the other measures: a classifier that predicts 75% of cases correctly achieves an MCC of only 0.5, and a random classifier that gets 50% of cases right (positive and negative) yields an MCC of 0, unlike measures such as accuracy, N.P.V., Sensitivity, PPV, and Specificity, which would report 0.5. This makes MCC a reliable way to examine performance, including on evenly distributed data.
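The MCC behaviour described above can be checked numerically; the confusion-matrix counts are invented to reproduce the 75%-correct and coin-flip cases mentioned in the text.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient: uses all four confusion-matrix
    cells; +1 = perfect, 0 = random, -1 = perfectly inverted prediction."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

print(mcc(tp=50, tn=50, fp=0, fn=0))    # perfect classifier -> 1.0
print(mcc(tp=25, tn=25, fp=25, fn=25))  # coin flip on balanced data -> 0.0
print(mcc(tp=75, tn=75, fp=25, fn=25))  # 75% correct -> 0.5
```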
In this view, biases can be considered as deviations from the true relationships, and we must evaluate all measures together to form an integrated picture of prediction performance. To obtain a clear picture, we can use a receiver operating characteristic (R.O.C.) analysis, which helps in finding the appropriate classifier and illustrates the tradeoff between the Specificity on one hand and the Sensitivity on the other. There are dedicated programs for drawing R.O.C. curves, and we can rely on them when our predictor outputs probabilities and provides a good degree of classification. However, we must be aware that this score is not a true value of P; rather, it represents a reliable value for ranking the predictions.
Sensitivity, or recall [38], is one of the reliable measures used to verify the ability of a model or system to retrieve the positive cases. Depending on the formula's calculations, the score for Sensitivity or recall lies in the interval [0, 1]. Specificity, in turn, is one of the reliable measures used to verify the ability of a model or system to avoid retrieving cases that are not positive. Depending on the calculations of this formula, the Specificity score also lies in the interval [0, 1] [39].

7.2. Experimental Results


In this work, various measures are applied to evaluate the performance and efficiency of the proposed approach, which builds upon different supervised machine learning algorithms. This section focuses on how these measures are implemented in different types of situations. The results of tests conducted to assess the proposed method's effectiveness are
presented, including a study of the impact of different distance methods on classification
accuracy. Information about the performance of five different types of classifiers, including
(ANN), (SVM), (L.R.), (D.T.), and K-nearest neighbor (K-NN) with no data augmentation (NO-AUG), double data augmentation (DD-AUG), and triple data augmentation (TD-AUG), is
provided in Tables 2–7. All experimental results are based on the average of 10-fold cross-
validation. The results of the SVM technique are presented in Table 2, and the experiments
conducted with different data set increments showed little difference in results.

Table 2. Evaluation of SVM Classifier for NO-AUG, DD-AUG and TD-AUG.

GAN SMOTE
Case
Accuracy Recall Precision F-Measure Accuracy Recall Precision F-Measure
NO-AUG 0.70669 0.98558 0.71304 0.82745 0.8237 0.824 0.832 0.822
SVM DD-AUG 0.71254 0.98745 0.72004 0.83215 0.9182 0.918 0.923 0.918
TD-AUG 0.71689 0.989104 0.73041 0.83545 0.9473 0.947 0.950 0.947
AVG (AUG) 0.71472 0.988277 0.72523 0.8338 0.9328 0.933 0.937 0.933

Table 3. Evaluation of D.T. Classifier for NO-AUG, DD-AUG and TD-AUG.

GAN SMOTE
Case
Accuracy Recall Precision F-Measure Accuracy Recall Precision F-Measure
NO-AUG 0.60163 0.71875 0.73105 0.72485 0.9462 0.946 0.948 0.946
D.T. DD-AUG 0.59455 0.74575 0.76924 0.74674 0.9746 0.975 0.975 0.975
TD-AUG 0.59278 0.7575 0.76553 0.74934 0.9829 0.983 0.983 0.983
AVG 0.59367 0.75163 0.76739 0.74804 0.9788 0.979 0.979 0.979
Table 4. Evaluation of kNN Classifier for NO-AUG, DD-AUG and TD-AUG.

GAN SMOTE
Case
Accuracy Recall Precision F-Measure Accuracy Recall Precision F-Measure
NO-AUG 0.67067 0.8101 0.74889 0.77829 0.9907 0.991 0.991 0.991
K.N.N. DD-AUG 0.69455 0.8145 0.75321 0.78524 0.9951 0.995 0.995 0.995
TD-AUG 0.69122 0.8784 0.77001 0.79245 0.9968 0.997 0.997 0.997
AVG 0.69289 0.8465 0.76161 0.78885 0.996 0.996 0.996 0.996

Table 5. Evaluation of L.R. Classifier for NO-AUG, DD-AUG and TD-AUG.

GAN SMOTE
Case
Accuracy Recall Precision F-Measure Accuracy Recall Precision F-Measure
NO-AUG 0.74889 0.9976 0.71306 0.83166 0.9624 0.962 0.963 0.962
L.R. DD-AUG 0.75451 0.9862 0.71786 0.83517 0.9851 0.985 0.985 0.985
TD-AUG 0.75007 0.9954 0.71724 0.84006 0.9893 0.989 0.989 0.989
AVG 0.75229 0.9908 0.71755 0.83762 0.9872 0.987 0.987 0.987

Table 6. Evaluation of ANN Classifier for NO-AUG, DD-AUG and TD-AUG.

GAN SMOTE
Case
Accuracy Recall Precision F-Measure Accuracy Recall Precision F-Measure
NO-AUG 0.6964 0.85817 0.75158 0.80135 0.5588 0.559 0.559 0.549
ANN DD-AUG 0.7024 0.8754 0.75471 0.80002 0.5473 0.547 0.546 0.546
TD-AUG 0.7094 0.8813 0.75004 0.80081 0.5035 0.504 0.527 0.499
AVG 0.7059 0.8784 0.75238 0.80042 0.5254 0.526 0.537 0.523

In Table 2, an accuracy of approximately 0.71 is observed in all Class 2 cases (NO-AUG, DD-AUG, and TD-AUG). With regard to the F-measure, the results are around 0.83, and precision ranged from 0.71304 to 0.73041 across the TD-AUG sets. On the basis of these results, we can conclude that this classifier's results are not good enough to make it an appropriate way to distinguish healthy people from liver disease patients across the different statuses of the patient. SMOTE, on the other hand, outperforms GAN in terms of accuracy, with an increase ranging from 12 to 22%.
Nonetheless, GAN outperformed SMOTE in terms of precision, recall, and F-measure. In light of this, GAN outperforms SMOTE in terms of model stability. Notably, the accuracy increase with data augmentation is quite stable: accuracy improves somewhat as data creation progresses. Additionally, the average of the evaluation metrics was calculated to emphasize the performance of the data augmentation techniques. The findings reveal that SMOTE surpasses GAN with an average accuracy of 93% across all data augmentation cases.
In Table 3, an accuracy of around 0.60 is observed in all cases, and for all Class 2 cases (NO-AUG, DD-AUG, and TD-AUG) we note the decrease in the accuracy of the samples. As regards the F-measure, the results are around 0.74, and precision ranged from 0.73105 to 0.76553 across the TD-AUG sets. Based on these findings, we may deduce that this classification does not provide satisfactory outcomes and is hence unsuitable for distinguishing healthy individuals from liver disease patients with varying degrees of illness. SMOTE, on the other hand, is superior under additional data augmentation in terms of its ability to enhance accuracy, and improvements may also be noticed in terms of recall and F-measure. SMOTE's average accuracy of 97% is a gain in accuracy of 38% when compared to GAN's average accuracy of 59%.

Table 7. Comparison with other research work.

[19] Accurate liver disease prediction system using convolutional neural network. Results: MCNN-LDPS: 90.75%; M.L.P.N.N.: 86.70%.
[20] Statistical Analysis and Identification of Important Factors of Liver Disease using Machine Learning and Deep Learning Architecture. Results: ANN 76.07%, DTREE 76.07%, R.Forest 74.36%, SVM 74.35%, MLP 74.36%, GNB 74.50%, KNN 78.63%, Logistic Regression 73.50%.
[21] Prediction of Liver Malady Using Advanced Classification Algorithms. Results: ANN 94.09%; SVM 78.09%.
[22] Prediction of Liver Disease using Rprop, S.A.G. and CNN. Results: Rprop: 69.41%; S.A.G.: 68.82%; CNN: 96.07%.
[23] Software-based prediction of liver disease with feature selection and classification techniques. Methods: LR, SMO, RF, NB, J48, IBk; the best result is L.R.: 77.4%.
[40] Supervised Machine Learning Models for Liver Disease Risk Prediction. Results: F-measure of 80.1%, precision of 80.4%, and an A.U.C. equal to 88.4% after SMOTE with 10-fold cross-validation.
[41] A Hybrid Machine Learning algorithm for Heart and Liver Disease Prediction Using Modified Particle Swarm Optimization with Support Vector Machine. Results: Recall (SVM): 62.93; Recall (P.S.O.S.V.M.): 83.62; Recall (C.P.S.O.S.V.M.): 96.55; Recall (CCPSOSVM): 97.41.
[42] Statistical Machine Learning Approaches to Liver Disease Prediction. Results: RF: 98.14% accuracy.
[43] Prediction of fatty liver disease using machine learning algorithms. Results: accuracy of R.F., NB, ANN, and LR: 87.48, 82.65, 81.85, and 76.96%, respectively.
Proposed approach (Tabular Data Generation to Improve Classification of Liver Disease Diagnosis). Results: ANN: 0.932 with SMOTE; SVM: 0.9328 with SMOTE; LR: 0.9872 with SMOTE; DT: 0.9788 with SMOTE; K-NN: 0.996 with SMOTE.

In Table 4, an accuracy of approximately 0.69 is observed in all cases, and for all Class 2 cases (NO-AUG, DD-AUG, and TD-AUG) we note the decrease in the accuracy of the samples. With regard to the F-measure, the results are around 0.79, and precision ranged from 0.74889 to 0.77001 across the TD-AUG sets. Based on these data, it appears that this classification scheme is inappropriate for discriminating between the various phases of liver disease in otherwise healthy individuals. However, in basic data augmentation scenarios, SMOTE exhibits a greater improvement in outcomes than GAN, with an average accuracy of 99%.
Table 5 shows that an accuracy of around 0.75 is observed in all cases, and for all Class 2 cases (NO-AUG, DD-AUG, and TD-AUG) we note the decrease in the accuracy of the samples. As regards the F-measure, results are around 0.84, and precision ranged from 0.71306 to 0.71786 across the TD-AUG sets. Given these data, we can conclude that this classification is an inaccurate method for identifying liver sickness in individuals who are otherwise healthy, because it fails to account for individual variations in liver function. On the other hand, in comparison to GAN, SMOTE demonstrates better stability in terms of improvement in accuracy. Indeed, we found that employing L.R. with SMOTE performs much better than GAN, with an average accuracy of 98% across all cases.
In Table 6, an accuracy of around 0.71 is observed in all cases, and for all Class 2 cases (NO-AUG, DD-AUG, and TD-AUG) we note the increase in the accuracy of the samples. With regard to the F-measure, the results are around 0.80, and precision ranged from 0.75004 to 0.75471 across the TD-AUG sets. Although GAN improves accuracy, SMOTE shows a loss in overall accuracy. To the best of our understanding, the higher the quantity of data, the larger the classifier should be to achieve better performance; in this study, we use an ANN trained for 10 epochs. As a result, we conclude that ANN performs better with GAN across all data sizes, even with the same neural network size, and is also more stable when utilizing GAN. Based on the data, it can be determined that the classification does not yield satisfactory results, thus indicating that SMOTE is an inadequate approach for diagnosing liver disease in individuals considered healthy with this classifier. The accuracy of the five machine learning classifiers is compared in Figure 5. The figure shows that most classifiers' classification performance improves for double and triple data augmentation (DD-AUG, TD-AUG) and that SMOTE outperforms GAN in overall data augmentation cases.

[Figure 5 is a bar chart of accuracy for the five machine learning classifiers (SVM, DT, KNN, ANN, LR) under NO-AUG, DD-AUG, and TD-AUG with both GAN and SMOTE.]

Figure 5. Performance comparison of different machine learning techniques.

The comparison of the results in Table 7 shows that the proposed approach has the
highest accuracy among all the other approaches for liver disease prediction. The proposed
approach uses multiple classification algorithms such as ANN, SVM, L.R., D.T., and K-NN,
and has improved accuracy by applying SMOTE on the dataset. The highest accuracy of
0.9872 is achieved using Logistic Regression with SMOTE. All of the other algorithms also
show good accuracy with SMOTE. On the other hand, in the other studies, the highest
accuracy achieved is approximately 90% using a Convolutional Neural Network (CNN),
and the highest accuracy among the traditional machine learning algorithms is around 78%
using K-NN. The results suggest that the proposed approach is better in terms of accuracy
compared to other approaches.
8. Conclusions
A growing number of people are developing liver illness as a result of the use of
excessive amounts of alcohol, the inhaling of gas, and the consumption of tainted food,
pickles, and medicines. The earlier a liver diagnosis is made, the better the prognosis.
Liver disease can be diagnosed based on blood enzyme levels. Diagnosing this condition
is a difficult and time-consuming process that can be highly expensive. This work is
consequently geared toward evaluating the performance of various machine learning
algorithms to reduce the cost of predictive diagnostics of chronic liver disease. We used five different techniques: Logistic Regression, the K-Nearest Neighbor algorithm, the Decision Tree methodology, the Support Vector Machine algorithm, and the ANN algorithm. In this study, we investigated how data augmentation with Generative Adversarial Networks (GANs) and the synthetic minority oversampling technique (SMOTE) impacted prediction accuracy. The experimental results demonstrate that
SMOTE surpasses GAN in its effectiveness when utilizing the proposed classifiers across
all data augmentation scenarios (NO-AUG, DD-AUG, and TD-AUG). Furthermore, we
found that K.N.N. outperforms others with an average accuracy of 99%. However, GAN
results demonstrate better model stability when compared to SMOTE. As a future direction,
we intend to experiment with other cost-sensitive data resampling techniques and compare
their performances with GANs over larger data augmentation.

Author Contributions: Conceptualization, M.A. and Y.A.-S.; Methodology, A.A., A.A.-q. and M.A.;
Resources, F.A.; Writing—original draft, A.A.-q. and Y.A.-S.; Writing—review & editing, A.M.A., B.A.
and O.R.A. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Acknowledgments: The authors would like to thank the Deanship of Scientific Research at Shaqra
University for supporting this research.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Lin, R.-H. An intelligent model for liver disease diagnosis. Artif. Intell. Med. 2009, 47, 53–62. [CrossRef] [PubMed]
2. Maddrey, W.C.; Sorrell, M.F.; Schiff, E.R. Schiff’s Diseases of the Liver; John Wiley & Sons: Hoboken, NJ, USA, 2011.
3. Oniśko, A.; Druzdzel, M.J.; Wasyluk, H. Learning Bayesian network parameters from small data sets: Application of Noisy-OR
gates. Int. J. Approx. Reason. 2001, 27, 165–182. [CrossRef]
4. Babu, M.S.P.; Ramana, B.V.; Kumar, B.R.S. New automatic diagnosis of liver status using bayesian classification. In Proceedings of
the International Conference on Intelligent Network and Computing (ICINC), Kuala Lumpur, Malaysia, 26–28 November 2010.
5. Domingos, P. Metacost: A general method for making classifiers cost-sensitive. In Proceedings of the Fifth ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, 15–18 August 1999; pp. 155–164.
6. Ramana, B.V.; Babu, M.S.P.; Venkateswarlu, N. A critical study of selected classification algorithms for liver disease diagnosis. Int.
J. Database Manag. Syst. 2011, 3, 101–114. [CrossRef]
7. Kim, S.; Jung, S.; Park, Y.; Lee, J.; Park, J. Effective liver cancer diagnosis method based on machine learning algorithm. In
Proceedings of the 2014 7th International Conference on Biomedical Engineering and Informatics, Dalian, China, 14–16 October
2014; pp. 714–718.
8. Al-Qerem, A.; Alsalman, Y.S.; Mansour, K. Image Generation Using Different Models of Generative Adversarial Network. In
Proceedings of the 2019 International Arab Conference on Information Technology (ACIT), Al Ain, United Arab Emirates, 3–5
December 2019; pp. 241–245.
9. Al-Qerem, A.; Kharbat, F.; Nashwan, S.; Ashraf, S.; Blaou, K. General model for best feature extraction of EEG using discrete
wavelet transform wavelet family and differential evolution. Int. J. Distrib. Sens. Netw. 2020, 16, 1550147720911009. [CrossRef]
10. Al-Qerem, A. An efficient machine-learning model based on data augmentation for pain intensity recognition. Egypt. Inform. J.
2020, 21, 241–257. [CrossRef]
11. Arjovsky, M.; Bottou, L. Towards principled methods for training generative adversarial networks. arXiv 2017, arXiv:1701.04862.
12. Borji, A. Pros and cons of gan evaluation measures. Comput. Vis. Image Underst. 2019, 179, 41–65. [CrossRef]
13. Ho, D.; Liang, E.; Chen, X.; Stoica, I.; Abbeel, P. Population based augmentation: Efficient learning of augmentation policy
schedules. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019;
pp. 2731–2741.
14. Perez, L.; Wang, J. The effectiveness of data augmentation in image classification using deep learning. arXiv 2017, arXiv:1712.04621.
15. Che, Z.; Cheng, Y.; Zhai, S.; Sun, Z.; Liu, Y. Boosting deep learning risk prediction with generative adversarial networks for
electronic health records. In Proceedings of the 2017 IEEE International Conference on Data Mining (ICDM), New Orleans, LA,
USA, 18–21 November 2017; pp. 787–792.
16. Pradhan, A. Support vector machine-a survey. Int. J. Emerg. Technol. Adv. Eng. 2012, 2, 82–85.
17. Radford, A.; Metz, L.; Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks.
arXiv 2015, arXiv:1511.06434.
18. Al-Qerem, A.; Salem, A.A.; Jebreen, I.; Nabot, A.; Samhan, A. Comparison between Transfer Learning and Data Augmentation
on Medical Images Classification. In Proceedings of the 2021 22nd International Arab Conference on Information Technology
(ACIT), Muscat, Oman, 21–23 December 2021; pp. 1–7.
19. Jeyalakshmi, K.; Rangaraj, R. Accurate liver disease prediction system using convolutional neural network. Indian J. Sci. Technol.
2021, 14, 1406–1421. [CrossRef]
20. Islam, M.K.; Alam, M.M.; Rony, M.R.A.H.; Mohiuddin, K. Statistical Analysis and Identification of Important Factors of Liver
Disease using Machine Learning and Deep Learning Architecture. In Proceedings of the 2019 3rd International Conference on
Innovation in Artificial Intelligence, Suzhou, China, 15–18 March 2019; pp. 131–137.
21. Sravani, K.; Anushna, G.; Maithraye, I.; Chetan, P.; Yeruva, S. Prediction of Liver Malady Using Advanced Classification
Algorithms. In Machine Learning Technologies and Applications: Proceedings of ICACECS 2020; Springer: Singapore, 2021; pp. 39–49.
22. Belavigi, D.; Veena, G.; Harekal, D. Prediction of liver disease using Rprop, SAG and CNN. Int. J. Innov. Technol. Expl. Eng. IJITEE
2019, 8, 3290–3295.
23. Singh, J.; Bagga, S.; Kaur, R. Software-based prediction of liver disease with feature selection and classification techniques.
Procedia Comput. Sci. 2020, 167, 1970–1980. [CrossRef]
24. Shorten, C.; Khoshgoftaar, T.M. A survey on image data augmentation for deep learning. J. Big Data 2019, 6, 1–48. [CrossRef]
25. Salimans, T.; Goodfellow, I.; Zaremba, W.; Cheung, V.; Radford, A.; Chen, X. Improved techniques for training gans. Adv. Neural
Inf. Process. Syst. 2016, 29, 2234–2242.
26. Tran, T.; Pham, T.; Carneiro, G.; Palmer, L.; Reid, I. A bayesian data augmentation approach for learning deep models. Adv. Neural
Inf. Process. Syst. 2017, 30, 2794–2803.
27. Turhan, C.G.; Bilge, H.S. Recent trends in deep generative models: A review. In Proceedings of the 2018 3rd International Conference on Computer Science and Engineering (UBMK), Sarajevo, Bosnia and Herzegovina, 20–23 September 2018; pp. 574–579.
28. Zou, J.; Han, Y.; So, S.-S. Overview of artificial neural networks. Artif. Neural Netw. 2008, 458, 14–22.
29. Ecer, F.; Ardabili, S.; Band, S.S.; Mosavi, A. Training Multilayer Perceptron with Genetic Algorithms and Particle Swarm
Optimization for Modeling Stock Price Index Prediction. Entropy 2020, 22, 1239. [CrossRef]
30. Bansal, M.; Goyal, A.; Choudhary, A. A comparative analysis of K-Nearest Neighbor, Genetic, Support Vector Machine, Decision
Tree, and Long Short Term Memory algorithms in machine learning. Decis. Anal. J. 2022, 3, 100071. [CrossRef]
31. Xia, D.; Tang, H.; Sun, S.; Tang, C.; Zhang, B. Landslide Susceptibility Mapping Based on the Germinal Center Optimization
Algorithm and Support Vector Classification. Remote Sens. 2022, 14, 2707. [CrossRef]
32. Awad, M.; Khanna, R. Support vector machines for classification. In Efficient Learning Machines; Apress: Berkeley, CA, USA, 2015;
pp. 39–66.
33. Osei-Bryson, K.-M. Evaluation of decision trees: A multi-criteria approach. Comput. Oper. Res. 2004, 31, 1933–1945. [CrossRef]
34. Saxena, R.; Sharma, S.K.; Gupta, M.; Sampada, G.C. A Novel Approach for Feature Selection and Classification of Diabetes
Mellitus: Machine Learning Methods. Comput. Intell. Neurosci. 2022, 2022, 3820360. [CrossRef] [PubMed]
35. Kataria, A.; Singh, M. A review of data classification using k-nearest neighbour algorithm. Int. J. Emerg. Technol. Adv. Eng. 2013, 3,
354–360.
36. Lemon, S.C.; Roy, J.; Clark, M.A.; Friedmann, P.D.; Rakowski, W. Classification and regression tree analysis in public health:
Methodological review and comparison with logistic regression. Ann. Behav. Med. 2003, 26, 172–181. [CrossRef]
37. Fernández, A.; Garcia, S.; Herrera, F.; Chawla, N.V. SMOTE for learning from imbalanced data: Progress and challenges, marking
the 15-year anniversary. J. Artif. Intell. Res. 2018, 61, 863–905. [CrossRef]
38. Laakso, M.; Soininen, H.; Partanen, K.; Lehtovirta, M.; Hallikainen, M.; Hänninen, T.; Helkala, E.-L.; Vainio, P.; Riekkinen, P. MRI
of the hippocampus in Alzheimer’s disease: Sensitivity, specificity, and analysis of the incorrectly classified subjects. Neurobiol.
Aging 1998, 19, 23–31. [CrossRef]
39. Sokolova, M.; Japkowicz, N.; Szpakowicz, S. Beyond accuracy, F-score and ROC: A family of discriminant measures for performance evaluation. In Proceedings of the Australasian Joint Conference on Artificial Intelligence, Hobart, Australia, 4–8 December 2006; pp. 1015–1021.
40. Dritsas, E.; Trigka, M. Supervised Machine Learning Models for Liver Disease Risk Prediction. Computers 2023, 12, 19. [CrossRef]
41. Behera, M.P.; Sarangi, A.; Mishra, D.; Sarangi, S.K. A Hybrid Machine Learning algorithm for Heart and Liver Disease Prediction
Using Modified Particle Swarm Optimization with Support Vector Machine. Procedia Comput. Sci. 2023, 218, 818–827. [CrossRef]
42. Mostafa, F.; Hasan, E.; Williamson, M.; Khan, H. Statistical Machine Learning Approaches to Liver Disease Prediction. Livers
2021, 1, 294–312. [CrossRef]
43. Wu, C.-C.; Yeh, W.-C.; Hsu, W.-D.; Islam, M.M.; Nguyen, P.A.; Poly, T.N.; Wang, Y.-C.; Yang, H.-C.; Li, Y.-C. Prediction of fatty
liver disease using machine learning algorithms. Comput. Methods Programs Biomed. 2019, 170, 23–29. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.