Benchmark - Model Building
Logistic Regression
Logistic regression is a commonly used statistical model for predicting binary outcomes. In our case,
it is used to predict whether a deductible payment is accurate or inaccurate based on the available
variables. Logistic regression suits this problem because it provides interpretable coefficients and can
handle both categorical predictors (once encoded as indicator variables) and continuous predictors.
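As a minimal sketch, assuming a toy dataset with one categorical predictor (a hypothetical urban/rural
location flag, one-hot encoded with "rural" as the reference level) and one continuous predictor (a
hypothetical scaled claim amount), logistic regression can be fitted by batch gradient descent on the
log-loss. The data, feature names, and helper functions below are illustrative only and are not the model
actually used in this project; in practice a library implementation such as scikit-learn's
`LogisticRegression` would normally be used.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function: maps a log-odds value to a probability
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(w, x):
    # w[0] is the intercept; w[1:] are the coefficients for the predictors in x
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def log_loss(w, X, y):
    # Mean negative log-likelihood; clamp p away from 0 and 1 to avoid log(0)
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(predict_proba(w, xi), 1e-12), 1 - 1e-12)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(X)

def fit_logistic(X, y, lr=0.1, epochs=5000):
    # Batch gradient descent on the mean log-loss, starting from all-zero weights
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # derivative of log-loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

# Hypothetical toy data: [is_urban (one-hot, "rural" = 0), scaled_claim_amount]
# Label 1 = deductible payment accurate, 0 = inaccurate
X = [[0, 0.1], [0, 0.4], [1, 0.2], [1, 0.9], [0, 0.8], [1, 0.7]]
y = [1, 1, 1, 0, 0, 0]

w = fit_logistic(X, y)
print("intercept and coefficients:", [round(v, 2) for v in w])
print("P(accurate | rural, low claim):", round(predict_proba(w, [0, 0.1]), 3))
```

Because the model is linear in the log-odds, each fitted coefficient can be read directly as the change in
log-odds per unit change in its predictor, which is the interpretability property mentioned above.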
Random Forest
Random Forest is an ensemble learning method that combines many decision trees, each trained on a
bootstrap sample of the data, and aggregates their predictions (a majority vote for classification). It is
effective for both classification and regression tasks. Random Forest can capture complex, nonlinear
relationships between variables and handles high-dimensional datasets well, making it a suitable
choice for predicting deductible accuracy.
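The ensemble idea can be illustrated with a simplified, from-scratch sketch: each tree is grown on a
bootstrap sample of the rows, only a random subset of features is considered at each split, and the
forest predicts by majority vote. The two features, the data, and the function names are hypothetical,
and the trees are kept shallow for brevity; a production model would more likely use a library
implementation such as scikit-learn's `RandomForestClassifier`.

```python
import random
from collections import Counter

def gini(labels):
    # Gini impurity of a list of class labels
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, feature_ids):
    # Search thresholds on the candidate features for the lowest weighted impurity
    best = None  # (weighted_gini, feature_index, threshold)
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(X, y, max_depth, n_sub, rng):
    # Leaf: pure node or depth limit reached -> majority label
    if max_depth == 0 or len(set(y)) == 1:
        return Counter(y).most_common(1)[0][0]
    # Random subset of candidate features at each split (the "random" in random forest)
    feature_ids = rng.sample(range(len(X[0])), n_sub)
    split = best_split(X, y, feature_ids)
    if split is None:
        return Counter(y).most_common(1)[0][0]
    _, f, t = split
    left = [(row, yi) for row, yi in zip(X, y) if row[f] <= t]
    right = [(row, yi) for row, yi in zip(X, y) if row[f] > t]
    return (
        f,
        t,
        build_tree([r for r, _ in left], [l for _, l in left], max_depth - 1, n_sub, rng),
        build_tree([r for r, _ in right], [l for _, l in right], max_depth - 1, n_sub, rng),
    )

def predict_tree(node, x):
    # Internal nodes are (feature, threshold, left, right) tuples; leaves are labels
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def fit_forest(X, y, n_trees=25, max_depth=3, n_sub=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(X) row indices with replacement
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], max_depth, n_sub, rng))
    return forest

def predict_forest(forest, x):
    # Majority vote across all trees
    return Counter(predict_tree(tree, x) for tree in forest).most_common(1)[0][0]

# Hypothetical toy data: [scaled_claim_amount, scaled_policy_tenure]
# Label 1 = deductible payment accurate, 0 = inaccurate
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.7, 0.8], [0.8, 0.9], [0.9, 0.7]]
y = [1, 1, 1, 0, 0, 0]

forest = fit_forest(X, y, n_trees=25, max_depth=3, n_sub=1, seed=42)
print(predict_forest(forest, [0.0, 0.0]), predict_forest(forest, [1.0, 1.0]))
```

Setting `n_sub` below the total number of features is what distinguishes a random forest from plain
bagged trees: it decorrelates the trees, so their combined vote is typically more stable than any single
tree.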
The selected models, namely Logistic Regression, Random Forest, and Support Vector Machines
(SVM), are well-suited to the business problem of predicting whether insurance deductible payments
are accurate, for several reasons.
Interpretability
Logistic Regression provides interpretable coefficients: each one shows how a predictor variable shifts
the log-odds of an accurate versus an inaccurate deductible payment. This can provide valuable insight
into the factors influencing deductible accuracy and aid decision-making.
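In the usual notation (the symbols here are introduced for illustration), with p the probability that a
deductible payment is accurate and x_1, ..., x_k the predictors, the model and the odds-ratio reading of
a single coefficient are:

```latex
\log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k,
\qquad
\frac{\mathrm{odds}(x_j + 1)}{\mathrm{odds}(x_j)} = e^{\beta_j}
```

For example, a hypothetical coefficient of beta_j = 0.4 gives e^0.4 ≈ 1.49, so a one-unit increase in
x_j, holding the other predictors fixed, multiplies the odds of an accurate payment by about 1.49.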
Robustness
Random Forest and SVM are generally considered robust to noise and outliers in the data. Real-world
data often contain inconsistencies or outliers, and these models can limit the influence of such
erroneous data points on the overall predictions.
Performance
Logistic Regression, Random Forest, and SVM are widely used and well-established models with
demonstrated success in various domains. They have been extensively studied and optimized, and
their performance has been validated in many applications, including classification tasks similar to the
insurance deductible prediction problem.
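Whichever model is chosen, its performance on this specific problem would still be checked on held-out
data. A minimal sketch of standard binary-classification metrics follows, assuming label 1 marks an
accurate payment and is treated as the positive class; the function name and the example labels are
hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    # Count the four confusion-matrix cells for the chosen positive class
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn
    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical held-out labels and one model's predictions
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
print(classification_metrics(y_true, y_pred))
```

If inaccurate payments are the rarer and costlier case, recall on that class (calling the function with
`positive=0`) may matter more than overall accuracy when comparing Logistic Regression, Random Forest,
and SVM.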
What variables did you include or leave out and why?
The variables included in the questionnaire were selected to capture various aspects of the
customer's profile, engagement, satisfaction, and behavior, which are relevant for predicting
insurance deductible accuracy. Variables related to demographics, engagement, satisfaction, and
competitive awareness were considered to provide a comprehensive understanding of the customer's
characteristics and potential influencing factors.
Age range
This variable provides insights into the customer's age group, which can be relevant in understanding
their preferences, behaviors, and potential insurance needs.
Gender
Gender may be associated with factors that influence insurance deductible accuracy, such as risk
perception and decision-making processes.
Location category
The customer's location can impact various aspects of insurance, such as regional factors, accessibility
to services, and potential risks.
Occupation
The customer's occupation may provide insights into their lifestyle, income level, and potential risk
exposure, which can be relevant for predicting deductible accuracy.
Purchase frequency
Understanding how frequently the customer makes purchases can indicate their level of engagement
with the company and potentially reflect their overall customer value.
Feedback/submission frequency
This variable indicates the customer's willingness to provide feedback or submit inquiries, which can
offer insight into their level of engagement and potential areas for improvement.
Overall satisfaction
Understanding the customer's satisfaction level helps gauge their perception of the company's
services and their likelihood of maintaining a positive relationship.
Likelihood to recommend
This variable assesses the customer's willingness to recommend the company to others, which
reflects their overall satisfaction and loyalty.
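Before these questionnaire variables can be used by any of the models, the categorical answers must be
converted to numbers. A minimal sketch, assuming hypothetical field names and answer levels (the actual
questionnaire options are not listed here): nominal variables such as gender, location category, and
occupation are one-hot encoded, while ordered variables such as age range, the two frequency questions,
satisfaction, and likelihood to recommend are mapped to a rank scaled to [0, 1].

```python
# Hypothetical answer levels; the actual questionnaire options may differ.
NOMINAL_LEVELS = {
    "gender": ["female", "male", "other"],
    "location_category": ["urban", "suburban", "rural"],
    "occupation": ["employed", "self_employed", "retired", "student"],
}
ORDINAL_LEVELS = {
    "age_range": ["18-24", "25-34", "35-44", "45-54", "55+"],
    "purchase_frequency": ["rarely", "sometimes", "often"],
    "feedback_frequency": ["never", "sometimes", "often"],
    "overall_satisfaction": ["very_low", "low", "neutral", "high", "very_high"],
    "likelihood_to_recommend": ["very_unlikely", "unlikely", "neutral", "likely", "very_likely"],
}

def encode(record):
    features = []
    for name, levels in NOMINAL_LEVELS.items():
        # One-hot: one 0/1 column per level, so no artificial order is implied
        features.extend(1.0 if record[name] == level else 0.0 for level in levels)
    for name, levels in ORDINAL_LEVELS.items():
        # Ordinal: rank scaled to [0, 1] so the natural ordering is preserved
        features.append(levels.index(record[name]) / (len(levels) - 1))
    return features

# Hypothetical single questionnaire response
record = {
    "gender": "male",
    "location_category": "rural",
    "occupation": "retired",
    "age_range": "35-44",
    "purchase_frequency": "often",
    "feedback_frequency": "never",
    "overall_satisfaction": "high",
    "likelihood_to_recommend": "likely",
}
print(encode(record))
```

For logistic regression with an intercept, one level of each one-hot group would normally be dropped as
a reference category to avoid perfectly collinear columns; tree-based models such as Random Forest do
not need this.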