Data analysis process by specifying models.

Logistic Regression
Logistic regression is a commonly used statistical model for predicting binary outcomes. In our case,
it is used to predict whether a deductible payment is accurate or inaccurate based on the available
variables. Logistic regression suits this problem because it provides interpretable coefficients and can
handle both categorical and continuous predictors.
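
To illustrate what "interpretable coefficients" means here, the following is a minimal sketch in plain Python, not the modeling software used for this analysis. The two predictors (a frequent-buyer flag and a satisfaction score scaled to 0..1) and all data values are made up for illustration. Each fitted coefficient is turned into an odds ratio, which reads as the multiplicative change in the odds of an accurate deductible payment per unit increase in that predictor.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Estimated probability that the deductible payment is accurate."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical toy data: [is_frequent_buyer (0/1), satisfaction scaled to 0..1]
X = [[0, 0.2], [0, 0.4], [1, 0.3], [1, 0.9],
     [0, 0.8], [1, 0.7], [0, 0.1], [1, 0.6]]
y = [0, 0, 0, 1, 1, 1, 0, 1]  # 1 = accurate payment (made-up labels)

w, b = fit_logistic(X, y)
odds_ratios = [math.exp(wj) for wj in w]  # one odds ratio per predictor
```

An odds ratio above 1 means the predictor raises the odds of an accurate payment; below 1 means it lowers them.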

Random Forest
Random Forest is an ensemble learning method that combines multiple decision trees to
make predictions. It is effective for both classification and regression tasks. Random Forest can capture
complex relationships between variables and handle high-dimensional datasets, making it a suitable
choice for predicting deductible accuracy.
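
The ensemble idea can be sketched in a few lines of plain Python. This is a heavily simplified stand-in, not a full Random Forest: each "tree" is a single-split stump, trained on a bootstrap sample with one randomly chosen feature, and the forest predicts by majority vote. The feature values and labels are hypothetical.

```python
import random
from collections import Counter

def fit_stump(X, y, feature):
    """Pick the threshold on one feature that minimises training errors."""
    best = None
    for t in sorted({row[feature] for row in X}):
        for left_label in (0, 1):
            errors = sum(
                (left_label if row[feature] <= t else 1 - left_label) != label
                for row, label in zip(X, y)
            )
            if best is None or errors < best[0]:
                best = (errors, t, left_label)
    _, threshold, left_label = best
    return feature, threshold, left_label

def predict_stump(stump, row):
    feature, threshold, left_label = stump
    return left_label if row[feature] <= threshold else 1 - left_label

def fit_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        feature = rng.randrange(n_features)         # random feature choice
        forest.append(
            fit_stump([X[i] for i in idx], [y[i] for i in idx], feature)
        )
    return forest

def predict_forest(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(predict_stump(s, row) for s in forest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two scaled survey features, 1 = accurate payment
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.3],
     [0.8, 0.9], [0.9, 0.7], [0.7, 0.8], [0.9, 0.9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(X, y)
```

A real Random Forest grows deeper trees and samples a feature subset at every split, but the bootstrap-then-vote structure is the same.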

Support Vector Machines (SVM)
SVM is a supervised learning model used for both classification and regression. It works
by finding the hyperplane that best separates the data points into different classes. It handles
high-dimensional data and, with kernel functions, remains effective when the data is not linearly
separable. Here SVM is applied to predict the accuracy of deductible payments.
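
As a rough illustration of the separating hyperplane, the sketch below fits a linear soft-margin SVM in plain Python by subgradient descent on the regularized hinge loss. It covers only the linear case; kernels for non-linearly separable data are not shown. The data points, the parameter values (`lam`, `lr`, `epochs`), and the function names are assumptions chosen for the example.

```python
def fit_linear_svm(X, y, lam=0.01, lr=0.01, epochs=5000):
    """Minimise lam/2 * ||w||^2 + mean hinge loss. Labels must be -1 or +1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # inside the margin or misclassified
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def decision(w, b, x):
    """Signed score: positive means the +1 side of the hyperplane."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

# Hypothetical toy data: +1 = accurate payment, -1 = inaccurate payment
X = [[1, 1], [1, 2], [2, 1], [4, 4], [4, 5], [5, 4]]
y = [-1, -1, -1, 1, 1, 1]

w, b = fit_linear_svm(X, y)
```

The hyperplane is the set of points where `decision` returns zero; the sign of the score gives the predicted class.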

Why this best addresses the business problem.

The selected models, namely Logistic Regression, Random Forest, and Support Vector Machines
(SVM), are well-suited for addressing the business problem of accurately predicting insurance
deductible payments for several reasons.

Interpretability
Logistic Regression provides interpretable coefficients, allowing us to understand the impact of each
predictor variable on the likelihood of accurate or inaccurate deductible payments. This can provide
valuable insights into the factors influencing deductible accuracy and aid in decision-making.

Flexibility and Non-linearity
Random Forest and SVM are capable of capturing complex relationships and non-linear patterns in
the data. Insurance deductible payments may be influenced by various factors, and these models can
handle both categorical and continuous variables, making them suitable for capturing the intricate
nature of the problem.

Robustness
Random Forest is generally robust to noise and outliers in the data, and a soft-margin SVM can
tolerate some noisy points when its regularization is tuned. In real-world scenarios, data may contain
inconsistencies or outliers, and these models can limit the impact of erroneous data points on the
overall predictions.

Performance
Logistic Regression, Random Forest, and SVM are widely used and well-established models with
demonstrated success in various domains. They have been extensively studied and optimized, and
their performance has been validated in many applications, including classification tasks similar to the
insurance deductible prediction problem.
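
To compare the three models on this problem, their predictions on held-out data can be summarized with standard classification metrics. The sketch below assumes binary labels (1 = accurate payment) and uses made-up labels and predictions; the function name `summarize` is hypothetical.

```python
def summarize(y_true, y_pred):
    """Return accuracy, precision and recall for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Made-up held-out labels and one model's predictions
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

report = summarize(y_true, y_pred)
```

Running the same function on each model's predictions gives a like-for-like comparison.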

What variables did you include or leave out and why?

The variables included in the questionnaire were selected to capture various aspects of the
customer's profile, engagement, satisfaction, and behavior, which are relevant for predicting
insurance deductible accuracy. Variables related to demographics, engagement, satisfaction, and
competitive awareness were considered to provide a comprehensive understanding of the customer's
characteristics and potential influencing factors.

Age range
This variable provides insights into the customer's age group, which can be relevant in understanding
their preferences, behaviors, and potential insurance needs.

Gender
Gender can play a role in determining specific factors that may influence insurance deductible
accuracy, such as risk perception and decision-making processes.

Location category
The customer's location can impact various aspects of insurance, such as regional factors, accessibility
to services, and potential risks.

Occupation
The customer's occupation may provide insights into their lifestyle, income level, and potential risk
exposure, which can be relevant for predicting deductible accuracy.

Purchase frequency
Understanding how frequently the customer makes purchases can indicate their level of engagement
with the company and potentially reflect their overall customer value.

Recency of last purchase
The recency of the customer's last purchase provides information about their engagement and
potential responsiveness to promotional activities or changes in the insurance policy.

Average monetary value of transactions
The average monetary value can indicate the customer's spending capacity and potential influence on
the company's profitability.

Website/app interaction frequency
This variable reflects the customer's level of engagement with the company's digital platforms, which
can be an indicator of their overall satisfaction and involvement.

Customer service contact frequency
How often the customer contacts customer service can reflect their level of engagement, potential
issues or concerns, and overall customer satisfaction.

Feedback/submission frequency
This variable indicates the customer's willingness to provide feedback or submit inquiries, which can
provide insights into their level of engagement and potential areas for improvement.

Overall satisfaction
Understanding the customer's satisfaction level helps gauge their perception of the company's
services and their likelihood of maintaining a positive relationship.

Awareness of competitor offerings
This variable assesses the customer's knowledge of competitor offerings, which can impact their
decision-making and loyalty towards the company.

Comparison of prices to competitors
Understanding how the customer perceives the company's prices in comparison to competitors helps
assess their perceived value proposition.

Purchase behavior over time
This variable captures changes in the customer's purchase behavior, which can provide insights into
their loyalty, satisfaction, or potential changes in needs.

Likelihood to recommend
This variable assesses the customer's willingness to recommend the company to others, which
reflects their overall satisfaction and loyalty.
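
Before any of the models can use these questionnaire variables, the answers have to be turned into numbers. The sketch below shows one possible encoding for a few of them: one-hot columns for unordered categories and integer codes for ordered scales. The category lists, answer scales, and field names are assumptions for illustration, not the actual questionnaire options.

```python
# Hypothetical answer options; the real questionnaire may differ.
AGE_RANGES = ["18-24", "25-34", "35-44", "45-54", "55+"]
GENDERS = ["female", "male", "other"]
FREQUENCY_SCALE = {"never": 0, "rarely": 1, "sometimes": 2, "often": 3}

def one_hot(value, categories):
    """One column per category, 1 for the matching category and 0 elsewhere."""
    return [1 if value == c else 0 for c in categories]

def encode_response(resp):
    """Turn one questionnaire response (a dict) into a numeric feature list."""
    features = []
    features += one_hot(resp["age_range"], AGE_RANGES)
    features += one_hot(resp["gender"], GENDERS)
    features.append(FREQUENCY_SCALE[resp["purchase_frequency"]])  # ordered scale
    features.append(resp["overall_satisfaction"])      # assumed 1..5 rating
    features.append(resp["likelihood_to_recommend"])   # assumed 0..10 rating
    return features

example = {
    "age_range": "25-34",
    "gender": "female",
    "purchase_frequency": "often",
    "overall_satisfaction": 4,
    "likelihood_to_recommend": 9,
}
row = encode_response(example)
```

The remaining variables (location category, occupation, the other frequency questions) would follow the same two patterns.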

Provide specific screenshots from the modeling software.
