
Exercise #5: Implementing a Binary Logistic Regression Model Using Neural Networks with Cross-Validation


John Vincent M. Cabugnason
Computer Science and Engineering
Jose Rizal University
Mandaluyong, Philippines
johnvincent.cabugnason@my.jru.edu

Abstract

This study focuses on improving the predictive performance of models for complex binary classification problems in insurance. It explores the use of neural network architectures, the selection of appropriate activation functions, the configuration of loss functions, and regularization techniques to prevent overfitting. These techniques aim to enhance model interpretability and accuracy for different classification tasks.

Keywords: Binary Logistic Regression, Neural Networks, Machine Learning, Classification models

I. INTRODUCTION

A. Conduct some research and introduce the problem of Binary Logistic regression (BLogReg).

According to Pramoditha (2022), logistic regression is a neural network model with no hidden layers [1]. Binary Logistic Regression, or BLogReg for short, is a statistical technique that helps us understand how one or more factors affect a simple "yes" or "no" outcome. In other words, it estimates the likelihood of something happening or not happening based on certain factors. For example, BLogReg can help us predict whether a customer will buy a product based on characteristics such as age, gender, and income: by analyzing these factors, we can estimate the probability that the customer makes a purchase.

Over the last ten years, there has been a significant rise in the use of new technologies and big data analytics. These advancements have started to change the way businesses approach predictive modeling: companies now use these tools and techniques to better understand trends and patterns in their data, which helps them make more informed decisions. This shift has affected many industries, and more changes are likely in the years to come. In [4], the authors consider the problem of predicting the probability that an insurance claim will be filed in a French motor third party liability (MTPL) insurance portfolio consisting of features describing the vehicle, the driver and the geographic area of the car registration, and the number of insurance claims observed during one accounting year [4].

B. Explain the motivation for implementing BLogReg using neural networks.

The concept of using machine learning algorithms to save lives has a powerful impact on people. This study specifically focuses on addressing the biggest threats to human health and the leading causes of death. By using advanced technology like machine learning, researchers hope to find new ways to prevent or treat these health issues, ultimately improving people's quality of life and saving lives. The potential for this kind of research to make a difference is remarkable. In [3], the data set was generated by combining different types of sources in order to identify factors that classify a person's smoking status as smoker, former smoker, or non-smoker. It used 33 attributes and 10,000 instances, and the class problem was divided into three classes: Smoker, Non-Smoker, and Former Smoker. The significance of each attribute was obtained using the odds ratio, and factors were identified using classification with 10-fold cross-validation combined with the 1-1 and 1-against-all methods [3].

II. LITERATURE REVIEW

“Logistic regression and artificial neural network classification models: a methodology review”

In the medical field, predictive models are increasingly used to help diagnose health conditions and predict their progression. These models are designed to analyze data and identify patterns that help doctors make more accurate diagnoses and predictions about a patient's health. With these tools, medical professionals can provide care and treatment options tailored to each patient's needs. According to Dreiseitl and Ohno-Machado (2002), these models are built from "experience", which constitutes data acquired from actual cases [2]. The data can be prepared and organized in different ways. In some cases, it can be transformed into a
set of rules, which is commonly seen in knowledge-based expert systems. Alternatively, the data can be used as training material for statistical and machine learning models. These models learn from the data to make predictions and gain insights that can be valuable in various fields, including healthcare.

In [2], the authors note that several implementations of predictive modeling algorithms are now readily available, both as free and as commercial software [2]. The accuracy of the results produced by these models relies on three main things: the quality of the data used to create the model, how carefully the model's adjustable parameters were selected, and the criteria used to assess the model's performance. It is therefore important to build the model on high-quality data, to choose the right settings for its parameters, and to choose evaluation criteria that make the results meaningful. Paying attention to these factors yields models that are more reliable and effective in various fields, including healthcare.

In [2], the task of classifying data is to decide the class membership y of an unknown data item x0 based on a dataset D = (x1, y1), ..., (xn, yn) of data items xi with known class memberships yi [2]. There are two main ways to classify data. The first is to simply divide the data into two groups and label them as either 0 or 1. This is known as dichotomous classification. The second is more complex, as it involves modeling the probability of a data item belonging to a particular class. This approach not only assigns a class label to the data item but also provides a probability of its membership in that class. This probability can be useful in many applications, including healthcare, where it can help doctors make more accurate diagnoses and treatment decisions.

Logistic regression, artificial neural networks, k-nearest neighbors, and decision trees are all methods used to model the probability of a data item belonging to a particular class, so they belong to the second approach described above. They differ in how they approximate that probability from the data. Each method has its own strengths and weaknesses and is better suited to certain types of data and applications. By understanding the differences between these methods, we can choose the best one for a particular problem and obtain more accurate results. In [2], logistic regression and artificial neural networks are reported as the most widely used models in biomedicine, as measured by the number of publications indexed in MEDLINE: 28,500 for logistic regression, 8,500 for neural networks, 1,300 for k-nearest neighbors, 1,100 for decision trees, and 100 for support vector machines.

“Enhancing Logistic Regression Using Neural Networks for Classification in Actuarial Learning”

The past decade has seen a surge in new technologies and big data analytics, which have begun to transform the landscape of predictive modeling in many enterprises. These new technologies have made it possible to collect and analyze vast amounts of data, which has led to the development of more accurate and sophisticated predictive models. One of the most significant developments in predictive modeling in recent years has been the rise of machine learning. Machine learning algorithms can learn from data without being explicitly programmed, which makes them well suited to tasks such as predicting customer behavior or identifying fraud.

According to Tzougas and Kutzkov (2023), the ML methods that have been used so far to efficiently address alternative regression and classification problems in insurance include, for example, XGBoost, random forest (RF), decision trees (DTs), naïve Bayes, the K-nearest neighbor (K-NN) model, the AdaBoost (AB) model, the stochastic gradient boosting (SGB) model, the support vector machine (SVM) model, and NNs [4]. In addition, a very interesting and novel research direction is the combined actuarial neural network (CANN). CANN is a type of neural network that is specifically designed for actuarial applications. It combines the strengths of both traditional actuarial methods and machine learning techniques. All traditional actuarial models can be implemented in a neural network. In particular, the CANN approach can be seen as a neural network that boosts the traditional actuarial regression model.

Another major development has been the growth of big data. Big data refers to the large and complex datasets that are now being generated by businesses and organizations. These datasets can be used to train machine learning algorithms and develop more accurate predictive models. A logistic regression model is a type of statistical model that can be used to predict the probability of an event occurring. It is linear in the log-odds: the logit of the predicted probability, not the probability itself, is a linear combination of the explanatory variables. The model's output is constrained to lie between 0 and 1 and represents the probability of the event occurring; one minus that output is the probability of it not occurring. According to Tzougas and Kutzkov (2023), this approach enables us to consider several extensions of the logistic regression model and utilize methods for optimization and regularization that come as part of platforms for neural network training [4].

The use of predictive modeling is becoming increasingly widespread across a range of industries. For example, banks use predictive modeling to identify customers who are at risk of defaulting on their loans, and retailers use it to predict which products are likely to be purchased by a given customer. Overall, the past decade has seen a significant transformation in the field of predictive modeling. New technologies and big data analytics have made it possible to develop more accurate and sophisticated predictive models.
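
The logistic regression model described in this section can be written compactly as a single sigmoid unit, which is also why it can be treated as a neural network with no hidden layers. The notation below is an assumption introduced for illustration and does not come from the cited papers: $x$ is the vector of explanatory variables, $w$ and $b$ are the weights and bias, $\sigma$ is the sigmoid function, and $p_i$ is the predicted probability for the $i$-th of $n$ training items. The binary cross-entropy loss in the last line is the usual choice for training such a unit and is likewise assumed here.

```latex
\begin{aligned}
p(y = 1 \mid x) &= \sigma(w^\top x + b) = \frac{1}{1 + e^{-(w^\top x + b)}}, \\
\operatorname{logit}(p) &= \ln\frac{p}{1 - p} = w^\top x + b, \\
\mathcal{L}(w, b) &= -\frac{1}{n}\sum_{i=1}^{n}\bigl[\, y_i \ln p_i + (1 - y_i)\ln(1 - p_i) \,\bigr].
\end{aligned}
```

The second line shows that the model is linear in the log-odds, while the first line shows why its output always lies between 0 and 1.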
“Neural Networks for Macroeconomic Forecasting: A Complementary Approach to Linear Regression Models”

Over the past few years, there has been a growing interest among economists in neural networks. These tools have caught the attention of macroeconomic forecasters because of their ability to identify and replicate both simple and complex relationships between variables. Across many studies, neural networks often appear to forecast economic output and financial variables such as stock prices better than traditional linear models. In this paper, the neural network predicts outcomes better than a well-known linear regression model that was created by the department, reducing errors by anywhere from 13 to 40 percent. However, the tests that were conducted suggest there is not enough evidence to show that this gain in forecasting accuracy is statistically significant.

According to Gonzalez (2000-07), the human brain is the most complex computer known to us, and in an effort to gain a deeper understanding of it, researchers have sought to replicate its various abilities through the development of artificial intelligence [5]. According to recent research, neural networks have the potential to be effective in predicting volatile financial variables that are typically challenging to forecast using conventional statistical methods. The author suggests that the most basic forms of neural networks are closely related to standard econometric techniques, and draws comparisons between neural networks and econometric methods to help readers who are familiar with econometrics better comprehend the topic. With a better understanding of neural networks, economists can determine whether these models are useful for macroeconomic and financial forecasting.

In [5], as cognitive scientists studied the brain and its ability to learn, they identified some key characteristics that seemed particularly important to the brain's success. Neural networks were developed as a framework based on these attributes, so it helps to briefly explore them. According to Gonzalez (2000-07), the brain consists of billions of basic units known as neurons that are arranged in an extensive network. These neurons are believed to carry out the relatively uncomplicated function of selectively transmitting electrical impulses to one another [5].

“Comparison of artificial neural network and binary logistic regression for determination of impaired glucose tolerance/diabetes”

Since the inception of computing, artificial intelligence has been suggested as a means of utilizing reasoning to assist in clinical decision-making. According to Kazemnejad et al., 2008, artificial neural networks are a computer modeling technique that takes inspiration from the observed behaviors of biological neurons [6]. Discriminant analysis and logistic regression are statistical methods that have traditionally been used to create models for clinical diagnosis and treatment. However, recent studies have shown that artificial neural networks can enhance prediction accuracy in various situations, such as predicting the prognosis of breast cancer in women post-surgery, developing models for surgical decision-making for traumatic brain injury patients, and predicting survival rates of alcoholic patients with severe liver disease. Nevertheless, some studies have reported that artificial neural networks and statistical models produce comparable results.

Binary logistic regression showed that all factors were significantly associated with glucose tolerance status. Age, sex, BMI, and WHR were significant risk factors for diabetes mellitus (DM). Additionally, those with hyperlipidemia or hypertension had a higher risk of DM and impaired glucose tolerance (IGT). The mean response is the probability of a binary outcome, such as whether or not a person has a disease. The logit function is a transformation of the probability that maps it to a linear scale. This allows a linear model to describe the relationship between the logit of the probability and the predictors. The predictors can be continuous variables, such as age or weight, or categorical variables, such as gender or race. The model can be used to predict the probability of the outcome for a given set of predictors. It can also be used to identify the factors that are most strongly associated with the outcome. However, logistic regression is a statistical model, not a perfect predictor, and its results should be interpreted with caution. In [5], if the only objective is to make accurate predictions, then neural networks may be a better choice than binary logistic regression. However, if it is important to understand the relationship between the predictors and the outcome, then binary logistic regression may be a better choice.

Neural networks and binary logistic regression are both machine learning models that can be used for classification or prediction. Neural networks are typically more complex than binary logistic regression, but they can be used to model more complex relationships between the predictors and the outcome. Binary logistic regression is often more interpretable than
neural networks, and it can be used to identify the effect of individual factors on the outcome. The choice of which model to use depends on the specific objectives of the analysis.

A. Summarize and Discuss the Highlights of the related papers.

The four papers discuss the application of machine learning models for classification and prediction. Logistic regression and artificial neural networks are popular machine learning models known for their effectiveness in various domains. Logistic regression is a statistical model that predicts the probability of an event occurring by combining explanatory variables linearly. It produces an output between 0 and 1, representing the probability of the event occurring.

On the other hand, artificial neural networks are inspired by the human brain and consist of interconnected nodes capable of learning patterns in data. They are commonly used for tasks like image recognition and natural language processing. Both logistic regression and artificial neural networks have strengths and weaknesses. Logistic regression is simple and interpretable but may struggle with complex problems and numerous predictors. Neural networks can handle complex relationships but are harder to interpret and prone to overfitting.

The choice between these models depends on the analysis objectives. Neural networks are suitable for accurate predictions, while logistic regression is better for understanding predictor-outcome relationships. Other machine learning models like decision trees, support vector machines, and k-nearest neighbors can also be used for classification and prediction, depending on the data characteristics and analysis goals.

In conclusion, machine learning models are powerful tools for accurate predictions and understanding variable relationships. However, selecting the appropriate model is crucial for the specific task at hand.

III. METHODOLOGY

The dataset I use has seven attributes: sex, smoker, region, age, bmi, children, and charges. These attributes help me predict the insurance charges for smokers. The features in this dataset may be used to predict or analyze the target variable, which is likely to be the medical charges. This dataset can be used for various purposes, such as identifying factors that affect medical charges or developing models to predict medical charges based on the given features.

A. Explain the preprocessing steps needed to prepare the dataset for BLogReg modeling.

Before using the dataset for BLogReg modeling, several steps need to be taken to prepare it. First, the data needs to be cleaned by removing any missing or irrelevant information and addressing any outliers. Next, the most relevant features that are likely to affect the target variable should be selected. The data also needs to be normalized and scaled to ensure that all features are on the same scale. Finally, the dataset should be split into training and testing sets to evaluate the model's performance on new data. These steps help ensure that the data is appropriately prepared for modeling and that the model performs well when applied to new data.

B. Detail the initial neural network architecture that can be used for BLogReg.

To create an initial neural network architecture for BLogReg, we can start with an input layer that has one node per feature in the dataset. We can then add one or more hidden layers with different numbers of nodes, using activation functions like ReLU or sigmoid to introduce non-linearity. Finally, we have an output layer with a single node that represents the probability of the target variable. The network is trained using backpropagation and stochastic gradient descent to optimize the model's performance.

C. Discuss your strategy for adjusting the different hyperparameters of a Neural Network to get the best model. Limit only to the following Hyperparameters:

a. Learning Rate – 0.001
b. Epochs
c. Number of Neurons (10,20,30,40) for each Hidden Layer
d. Training/Testing Size (Set Training and Testing Size to 70/30 only)

D. Discuss the different hyperparameters of a Neural Network

Neural Networks are complex models that require careful tuning to perform well. One of the key factors that affects their performance is the learning rate, which determines how much the model's weights are updated as it learns from the data. If the learning rate is set too high, the model can overshoot the optimal weights and perform poorly. On the other hand, if the learning rate is too low, the model may take too long to converge and may not learn effectively. Finding the right balance is therefore crucial for achieving good performance.

The hidden layers of a Neural Network are like the brain of the model, where it learns to recognize patterns in the data.
The size of these layers is important, as a larger layer can learn more complex features, but may also require more data to avoid overfitting. Additionally, the activation function used in each layer can have a big impact on how the model learns. Different functions have different strengths and weaknesses, so selecting the right one is crucial for achieving good performance. By carefully tuning these hyperparameters, we can ensure that our Neural Network is learning the most important features from the data and making accurate predictions.

IV. RESULTS

A. Run your chosen dataset in the NN and present the results of the implementation.

B. Display and analyze the performance charts of the top-performing model across all epochs.

C. For additional points: Display and analyze the training chart for a specific fold.
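
The configuration described in the methodology (features scaled to a common range, a 70/30 training/testing split, a learning rate of 0.001, hidden layers of 10, 20, 30, or 40 neurons, and a single sigmoid output) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation used for this exercise: the data are synthetic stand-ins rather than the insurance dataset, only one hidden layer is used, the epoch count of 50 is arbitrary, and every function name is hypothetical.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_scaler(rows):
    """Per-feature mean and standard deviation, computed on training rows only."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def scale(rows, means, stds):
    """Standardize rows with previously fitted statistics."""
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def init_network(n_inputs, n_hidden, rng):
    """One hidden layer plus a single output node."""
    return {
        "w1": [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(net, x):
    """Return the ReLU hidden activations and the sigmoid output probability."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
              for row, b in zip(net["w1"], net["b1"])]
    prob = sigmoid(sum(w * h for w, h in zip(net["w2"], hidden)) + net["b2"])
    return hidden, prob


def train(net, X, y, lr, epochs, rng):
    """Per-sample SGD with backpropagation; returns the mean loss of each epoch."""
    history, order = [], list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        total = 0.0
        for i in order:
            hidden, prob = forward(net, X[i])
            clipped = min(max(prob, 1e-12), 1.0 - 1e-12)
            total -= y[i] * math.log(clipped) + (1 - y[i]) * math.log(1.0 - clipped)
            d_out = prob - y[i]  # cross-entropy gradient w.r.t. the output pre-activation
            for j, h in enumerate(hidden):
                if h > 0.0:  # ReLU passes gradient only where the unit is active
                    d_hid = d_out * net["w2"][j]
                    for k, v in enumerate(X[i]):
                        net["w1"][j][k] -= lr * d_hid * v
                    net["b1"][j] -= lr * d_hid
                    net["w2"][j] -= lr * d_out * h
            net["b2"] -= lr * d_out
        history.append(total / len(X))
    return history


def accuracy(net, X, y):
    """Fraction of items whose thresholded probability matches the label."""
    hits = sum((forward(net, x)[1] >= 0.5) == (t == 1) for x, t in zip(X, y))
    return hits / len(X)


# Hypothetical stand-in data: two numeric features and a 0/1 label.
rng = random.Random(42)
labels = [rng.randint(0, 1) for _ in range(200)]
rows = [[rng.gauss(1.5 if t else -1.5, 1.0), rng.gauss(10.0 if t else 6.0, 2.0)] for t in labels]

split = int(0.7 * len(rows))  # 70/30 training/testing split
means, stds = fit_scaler(rows[:split])
X_train, y_train = scale(rows[:split], means, stds), labels[:split]
X_test, y_test = scale(rows[split:], means, stds), labels[split:]

results = {}
for n_hidden in (10, 20, 30, 40):
    net = init_network(len(X_train[0]), n_hidden, rng)
    history = train(net, X_train, y_train, lr=0.001, epochs=50, rng=rng)
    results[n_hidden] = (history, accuracy(net, X_train, y_train), accuracy(net, X_test, y_test))
    print(n_hidden, round(history[0], 4), round(history[-1], 4), results[n_hidden][1:])
```

The per-epoch loss list returned by `train` is the kind of series a performance chart across epochs would plot, and comparing the training and testing accuracy stored in `results` gives a first indication of how well each hidden-layer size generalizes.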

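Because this exercise relies on cross-validation, one way to look for overfitting is to compare training and validation accuracy fold by fold. The sketch below illustrates the idea with a single sigmoid unit, that is, logistic regression treated as a network with no hidden layers. It rests on several assumptions: the data are synthetic and deliberately overlapping, five folds are used, the epoch count is arbitrary, and all function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() in a safe range
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.001, epochs=200):
    """Train a single sigmoid unit (no hidden layer) with per-sample gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def accuracy(model, X, y):
    """Fraction of items whose thresholded probability matches the label."""
    w, b = model
    hits = sum((sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5) == (t == 1)
               for x, t in zip(X, y))
    return hits / len(X)


def k_fold_indices(n, k, rng):
    """Shuffle 0..n-1 and deal the indices into k disjoint validation folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, k=5, seed=0):
    """Return one (train_accuracy, validation_accuracy) pair per fold."""
    scores = []
    for fold in k_fold_indices(len(X), k, random.Random(seed)):
        held_out = set(fold)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        X_tr, y_tr = [X[i] for i in train_idx], [y[i] for i in train_idx]
        X_va, y_va = [X[i] for i in fold], [y[i] for i in fold]
        model = fit_logistic(X_tr, y_tr)
        scores.append((accuracy(model, X_tr, y_tr), accuracy(model, X_va, y_va)))
    return scores


# Hypothetical, deliberately overlapping data so that accuracy is not trivially perfect.
rng = random.Random(7)
y = [rng.randint(0, 1) for _ in range(200)]
X = [[rng.gauss(1.0 if t else -1.0, 1.0), rng.gauss(1.0 if t else -1.0, 1.0)] for t in y]

scores = cross_validate(X, y, k=5)
for fold, (train_acc, val_acc) in enumerate(scores, start=1):
    print(f"fold {fold}: train={train_acc:.3f} val={val_acc:.3f} gap={train_acc - val_acc:+.3f}")
```

A training accuracy that stays near 1.0 while the validation accuracy is clearly lower would point to overfitting; similar values on both suggest the model generalizes.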
V. ANALYSIS

A. Interpret the results based on your observation and discuss whether or not the top model is showing signs of overfitting.

The results show that the model was able to predict the training examples in each fold perfectly, achieving an accuracy score of 1.0000. This suggests that the model learned the training data well. However, it does not necessarily mean that the model will perform just as well on new data that it hasn't seen before.

To be confident that the model can make accurate predictions on new data, we need to evaluate its performance on unseen data. A model can be too good at learning the training data and become too specialized, which makes it difficult to generalize to new data. A perfect training score on its own therefore cannot rule out overfitting, and we need to evaluate the model on validation or test data that it hasn't seen before.

By assessing the model's performance on a separate validation or test set, we can determine whether the model is overfitting or not. If the model shows high accuracy on both the training and validation/test sets, it suggests that the model is not overfitting and can generalize well to new data. Conversely, a validation accuracy well below the training accuracy would indicate overfitting. Generalizing to data that the model hasn't seen before is crucial for its practical use.

B. Discuss any challenges or limitations you encountered during the experiments.

During the experiments, one challenge I encountered was consistently obtaining a train accuracy of 1.00. This means that the model was able to perfectly predict the training examples in each fold, which may seem ideal at first. However, it raised concerns about potential overfitting and the generalizability of the model to unseen data. Another challenge I faced during the experiments was the lack of information about the model's performance on unseen data. I didn't have enough data to fully evaluate the model's effectiveness in making accurate predictions on new data that it hasn't seen before. This made it difficult to determine whether the model was truly effective and reliable in real-world scenarios.

Another challenge I encountered was the lack of detailed information about the dataset used for training and validation. Having a clear understanding of the characteristics and distribution of the data is essential in evaluating the model's performance and identifying any potential biases or limitations. Without this information, it was challenging to assess the reliability and generalizability of the model's predictions. It would have been beneficial to have insights into the dataset to ensure that the model was trained on diverse and representative data, which would enhance its ability to perform well in real-world scenarios.

VI. CONCLUSION

A. In your own words, summarize the things that you learned in this exercise.

In this paper, I delve into the world of Binary Logistic Regression (BLogReg), a powerful statistical tool that allows us to understand how different factors can influence a simple "yes" or "no" outcome. Imagine being able to predict whether a customer will buy a product or not based on their age, gender, and income. That's exactly what BLogReg enables us to do. It's like peering into the future and estimating the probability of an event happening or not happening based on specific factors.

This paper highlights the potential of combining BLogReg with neural networks, especially when it comes to healthcare. By leveraging advanced technology like machine learning, researchers aim to tackle the biggest threats to human health and find innovative ways to prevent or treat these issues. Imagine the impact it could have on saving lives and improving people's quality of life. Furthermore, the paper explores how BLogReg and neural networks have been applied in various fields, such as medical diagnosis, accurate predictive modeling, and even economic forecasting. It's fascinating to see how these techniques are revolutionizing industries and shaping the future of data analysis.

REFERENCES

[1] Pramoditha, R. (2022, May 19). Replicate a logistic regression model as an artificial neural network in keras. Towards Data Science. https://towardsdatascience.com/replicate-a-logistic-regression-model-as-an-artificial-neural-network-in-keras-cd6f49cf4b2c
[2] Dreiseitl, S., & Ohno-Machado, L. (2002). Logistic regression and artificial neural network classification models: a methodology review. Journal of Biomedical Informatics, 35(5–6), 352–359. https://doi.org/10.1016/s1532-0464(03)00034-0
[3] No title. (n.d.). Aip.org. Retrieved October 14, 2023, from https://pubs.aip.org/aip/acp/article/2203/1/020036/961282/Classification-of-the-factors-for-smoking
[4] Tzougas, G., & Kutzkov, K. (2023). Enhancing logistic regression using neural networks for classification in actuarial learning. Algorithms, 16(2), 99. https://doi.org/10.3390/a16020099
[5] Gonzalez, S. (n.d.). Neural networks for macroeconomic forecasting: A complementary approach to linear regression models. Cloudfront.net. Retrieved October 15, 2023, from https://d1wqtxts1xzle7.cloudfront.net/34194665/Gonz00-libre.pdf?1405316597=&response-content-disposition=inline%3B+filename%3DNeural_Networks_for_Macroeconomic_Foreca.pdf&Expires=1697361343&Signature=DyeJ9UA0rZUKIAgSulR2pmVfxSMKC~e5ufB8GyFiwIU0sm5cUx7WMJXc9-iGHj2Xq022tSC~2talZFMSaxq7yTHHdmOIjoW3kMJm7npROmTwWL9HSjm8cl13I8WVbeG0gx9HK96qXx7c8XOGMbOtRDCfiWi1NYGS1WvR5ZvqkW1RmsirJtP-zsOe2tW5fW97YcxLfJAhO1zgmGmHk5oW2YvLTuarES3F0~uvCy4K3Gvkptf~LexN10HMbOOCAu-z65gGLNalxJ1k0TLCMZieHBzqwWVWwKKCPvUvwPweAElzt7FjGzQYwcnei35BCMw7iDCC4pnG~b3kKft1UAfDMw__&Key-Pair-Id=APKAJLOHF5GGSLRBV4ZA
[6] Kazemnejad, A., Batvandi, Z., & Faradmal, J. (n.d.). Comparison of artificial neural network and binary logistic regression for determination of impaired glucose tolerance/diabetes. Who.int. Retrieved October 15, 2023, from https://iris.who.int/bitstream/handle/10665/117927/16_6_2010_0615_0620.pdf?sequence=1&isAllowed=y
