
Exercise #WW-M1: Predictive Modeling Strategies

Lean Angelo Enriquez


College of Computer Studies and Engineering
Jose Rizal University
Mandaluyong City, Philippines

I. INTRODUCTION

Data mining has the capability to extract hidden knowledge, also called patterns, from very large datasets. From this perspective, machine learning algorithms play a major role in the prediction process. There are two types of machine learning algorithms:
(i) Supervised learning, wherein a class attribute determines how the data objects are grouped. The classification problem falls into this category. A wide variety of supervised learning models is available, including decision trees, neural networks, and support vector machines.
(ii) Unsupervised learning, wherein the class label is unknown. Clustering is an example of this type of learning. Here too, a spectrum of models is available, including k-nearest neighbor (k-NN), graph-based clustering algorithms, and partitioning algorithms. The problem to be solved should match both the type of learning and the type of model; a model's applicability depends entirely on the structure of the data and the expected solution. [1]
By investigating the effectiveness of hybrid models that combine different techniques, various researchers have explored diverse methodologies, including neural networks and other machine learning methods, to enhance prediction accuracy. While these studies provide valuable insights, the variability in datasets, models, and outcomes underscores the complexity of the predictive task. Despite these advancements, there remains a pressing need for further investigation to refine existing models and improve the overall performance of cardiovascular disease prediction. The diverse landscape of machine learning applications in this domain emphasizes the importance of continued research to enhance the accuracy, reliability, and generalizability of predictive models, ultimately contributing to more effective clinical interventions and patient care. [2]

In simple linear regression, a forecast model uses one independent variable to predict the value of a dependent variable. The model assumes a straight-line relationship between the two variables and finds the intercept and slope coefficients that minimize the sum of squared errors. Multiple linear regression differs from simple linear regression in that it accounts for more than one independent variable when predicting values of the dependent variable: it builds on the simple model by giving each predictor its own coefficient, with the coefficients and intercept again chosen to minimize the sum of squared errors.
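For concreteness, the models just described can be written out as follows (the notation is mine; the exercise itself does not fix symbols):

    Simple linear regression:    \hat{y} = \beta_0 + \beta_1 x
    Multiple linear regression:  \hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p

    Least-squares fitting objective:
    \min_{\beta_0, \ldots, \beta_p} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2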
Prediction models are usually evaluated against a baseline (a reference point). In linear regression, simple heuristics such as predicting the mean or median of y over all x's are common baselines; the prediction model's performance is then compared against this baseline to ascertain whether the model is worth using.
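A minimal sketch of this baseline comparison in scikit-learn (the dataset here is synthetic and purely illustrative, not the exercise's actual data):

import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the unspecified training data.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 145.06 + 97.43 * X[:, 0] + rng.normal(0, 50, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Baseline: always predict the mean of y (use strategy="median" for a
# median baseline instead).
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)

# Prediction model: ordinary least-squares linear regression.
model = LinearRegression().fit(X_train, y_train)

# The model is only worth using if it clearly beats the baseline.
print("baseline R^2:", baseline.score(X_test, y_test))  # near 0
print("model R^2:   ", model.score(X_test, y_test))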
II. INITIAL MODEL PERFORMANCE

Based on my experimentation with my selected training data, the linear coefficient, or slope, of 97.43 indicates that for every one-unit rise in the independent variable, the estimated value of the dependent variable increases by 97.43 units. This reflects how strongly the model's independent and dependent variables are related. The R² value indicates how well the independent variables explain the variance of the dependent variable. An R² of 0.92 on the training data shows that the model explains 92% of its variability, while an R² of 0.90 on the test data shows that it explains 90% of the variability in unseen data. These high R² values indicate strong predictive accuracy on both the training and test datasets. The intercept of 145.06 represents the point where the regression line crosses the y-axis: when the independent variable is zero, the estimated value of the dependent variable is 145.06. This sets a baseline for the model's predictions. In conclusion, with high R² scores and large intercept and coefficient values, the linear regression model appears to fit the data well and to predict accurately.
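As a hypothetical sketch of how such figures can be read off a fitted scikit-learn model (the synthetic data below is constructed to land near the quoted values and is not the author's dataset):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in built around the slope and intercept quoted above.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(200, 1))
y = 145.06 + 97.43 * X[:, 0] + rng.normal(0, 80, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
model = LinearRegression().fit(X_train, y_train)

print("slope:    ", model.coef_[0])                 # ~97.43 in the text
print("intercept:", model.intercept_)               # ~145.06 in the text
print("train R^2:", model.score(X_train, y_train))  # 0.92 in the text
print("test R^2: ", model.score(X_test, y_test))    # 0.90 in the text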
III. CHALLENGES FACED

During model evaluation, I encountered the challenge of achieving a good fit with my training dataset. It was difficult to adjust the hyperparameters and find the right correlations to obtain a satisfactory result. I first analyzed the model and dataset to assess their compatibility, and with patience I identified the appropriate parameters and made the necessary adjustments to improve the performance of my experimentation. In short, the main challenges I faced during model evaluation were model fitting and hyperparameter tuning.

IV. STRATEGIES APPLIED

Insufficient amount of training data:
Solution Strategy: Data augmentation is the process of adding new training data to the existing data using techniques such as rotation, flipping, scaling, and noise injection. This increases the range and quantity of training data, potentially improving the model's predictive capacity. In practice, data augmentation can be performed with tools such as Keras' ImageDataGenerator for image data, or by manually adding synthesized data points for tabular data, as in the sketch below.
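One hedged sketch of the "manually adding created data points" idea for tabular data is Gaussian noise injection (jittering); the helper and its scale parameter are my own illustration, not a prescribed recipe:

import numpy as np

def augment_with_noise(X, y, copies=2, scale=0.01, seed=0):
    """Augment a tabular dataset with noise-jittered copies.

    Each copy perturbs the features with Gaussian noise proportional to
    the per-column standard deviation; targets are left unchanged.
    """
    rng = np.random.default_rng(seed)
    col_std = X.std(axis=0)
    X_aug, y_aug = [X], [y]
    for _ in range(copies):
        noise = rng.normal(0.0, scale * col_std, size=X.shape)
        X_aug.append(X + noise)
        y_aug.append(y)
    return np.vstack(X_aug), np.concatenate(y_aug)

# Example: triple the training set with lightly jittered copies.
X = np.random.rand(100, 3)
y = np.random.rand(100)
X_big, y_big = augment_with_noise(X, y)
print(X_big.shape, y_big.shape)  # (300, 3) (300,)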
Overfitting of Training Data:
Solution Strategy: Regularization approaches penalize large parameter values, preventing the model from fitting the noise in the training data. This improves the model's ability to generalize to previously unseen data. Regularization can be applied during model training using frameworks such as TensorFlow or scikit-learn, which provide built-in support for it; a minimal example follows.
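As a sketch in scikit-learn (one of the frameworks named above), ridge regression adds an L2 penalty on the coefficients; the dataset and the alpha value are illustrative assumptions:

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Small noisy dataset with many polynomial features: prone to overfitting.
rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, size=(40, 1))
y = 0.5 * x[:, 0] ** 2 + rng.normal(0, 1, size=40)
X = PolynomialFeatures(degree=10).fit_transform(x)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2)

# Ridge minimizes ||y - Xw||^2 + alpha * ||w||^2; the L2 penalty shrinks
# large coefficients so the model cannot chase noise as easily.
plain = LinearRegression().fit(X_train, y_train)
ridge = Ridge(alpha=1.0).fit(X_train, y_train)

# A smaller train/test gap for the regularized model suggests better
# generalization.
print("plain train/test R^2:", plain.score(X_train, y_train),
      plain.score(X_test, y_test))
print("ridge train/test R^2:", ridge.score(X_train, y_train),
      ridge.score(X_test, y_test))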

Underfitting of Training Data:
Solution Strategy: Model size adjustment and ensemble methods. Increase the model's complexity by adding more layers or units to a neural network, or by using deeper architectures. For ensemble methods, use techniques such as AdaBoost or random forests to aggregate several models and increase performance, as sketched below.
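One possible sketch of the ensemble idea with scikit-learn (dataset and settings are illustrative assumptions): a single shallow tree underfits, while aggregating many trees raises capacity.

from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=8, noise=10.0,
                       random_state=3)

single = DecisionTreeRegressor(max_depth=2)          # underfits
forest = RandomForestRegressor(n_estimators=200, random_state=3)
boosted = AdaBoostRegressor(n_estimators=200, random_state=3)

for name, est in [("tree", single), ("forest", forest),
                  ("adaboost", boosted)]:
    scores = cross_val_score(est, X, y, cv=5)  # R^2 per fold
    print(f"{name:8s} mean R^2 = {scores.mean():.3f}")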

V. IMPROVED MODEL PERFORMANCE

After carefully validating different prediction models and making many changes to resolve the issues I encountered, I gained relevant insights during the model review and strategy implementation described above. In particular, I faced the problem of achieving a good fit with my training dataset, which required navigating the complexity of tuning hyperparameters and finding the appropriate correlations to reach a satisfactory outcome. Through careful study of the model and dataset, together with patience, I identified suitable parameters and made the necessary adjustments to improve the performance of my experiment.

VI. CONCLUSION

To conclude my experimentation: multiple linear regression uses more than one independent variable to predict a dependent variable. Unlike simple linear regression, it considers several predictors, each with its own coefficient, in order to reduce prediction error. Throughout the process, I thoroughly evaluated the model and dataset to verify that they worked well together. With patience and comprehensive analysis, I was able to identify the appropriate parameters and make the necessary adjustments to increase the model's performance. This procedure yielded useful insights, particularly in tackling the problem of achieving a good fit with the training dataset, which involved tuning hyperparameters and determining the appropriate correlations for a good result.

VII. REFERENCES

[1] Rasheed, A. A. (2023). Improving prediction efficiency by revolutionary machine learning models. Materials Today: Proceedings, 81, 577–583. https://doi.org/10.1016/j.matpr.2021.04.014

[2] Ogunpola, A., Saeed, F., Basurra, S., Albarrak, A. M., & Qasem, S. N. (2024). Machine learning-based predictive models for detection of cardiovascular diseases. Diagnostics (Basel, Switzerland), 14(2), 144. https://doi.org/10.3390/diagnostics14020144
