The CRISP-DM methodology is a framework originally developed by data miners to generalize the
common approaches to defining and analyzing a problem.
The framework is iterative. Going through the process once without solving the
problem is not a failure.
Business Understanding:
This is the first step in approaching any business problem. The ability to cast the problem as one or
more data science problems is the key skill in this phase. To do this, we ask these questions:
· What type of analysis will provide the information to inform that decision?
From the business case above, to see what Bob could have done better, let’s answer these key
questions:
· Decision: The company wants to decide which customers to contact regarding their insurance
policy.
· What information is needed to inform that decision: Customers who are most likely to answer a
phone call. This is not good enough and does not help us achieve our business goal.
The predictive modeling is flawed because it does not predict the information we need to make
our decision.
Bob’s boss wants a list of customers who are most likely to answer a phone call.
However, answering a call is not a good enough predictor of whether a customer is also receptive
to the offer. Bob should have thought about the business goals, then challenged his boss on whether
predicting which customers are likely to answer a call is good enough to support a business decision.
This requires both parties to examine the business problem thoroughly, keeping in mind the kind
of data and variables that give a good enough prediction of the target outcome.
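The gap between the two targets can be made concrete with a small sketch. The call records, the field names (`answered`, `purchased`), and all values below are hypothetical, invented purely for illustration. The only point is that a customer who answers is not necessarily a customer who buys.

```python
# Hypothetical call log: each record notes whether the customer answered
# the call and whether they went on to buy a policy. All values are made up.
calls = [
    {"customer": "A", "answered": True,  "purchased": False},
    {"customer": "B", "answered": True,  "purchased": True},
    {"customer": "C", "answered": False, "purchased": False},
    {"customer": "D", "answered": True,  "purchased": False},
    {"customer": "E", "answered": True,  "purchased": True},
]


def rate(records, field):
    """Fraction of records for which `field` is True."""
    return sum(r[field] for r in records) / len(records)


answer_rate = rate(calls, "answered")      # the target Bob was asked to predict
purchase_rate = rate(calls, "purchased")   # a target closer to the business goal

# Among customers who answered, how many were actually receptive?
answered_calls = [r for r in calls if r["answered"]]
purchase_rate_given_answer = rate(answered_calls, "purchased")

print(f"answer rate:              {answer_rate:.2f}")
print(f"purchase rate:            {purchase_rate:.2f}")
print(f"purchase rate | answered: {purchase_rate_given_answer:.2f}")
```

In this toy data most customers answer, but only half of those who answer buy, so a model trained to predict `answered` would rank many unreceptive customers highly.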
We’ll briefly look at some key questions Bob should answer at each stage.
Data Understanding:
The data is the raw material from which the solution will be built. The critical part of data
understanding is to estimate the cost and benefit of each data source and decide on the best
investment of your resources. We dig beneath the surface to grasp the intricacies of the business
problem and the data that is available, then match them to one or more data mining tasks for which we
have modeling tools to apply.
· Apart from customer demographic data, other important data might be: years as a customer, kinds
of insurance, subscription history, times contacted in the past, and any other variable that might be
informative.
Data Preparation:
Real-world data is seldom in the format we require for modeling. Data preparation involves one or
more of the following: Gathering, cleansing, formatting, and sampling of the data.
For the Data Preparation phase, some questions Bob needs to answer are:
· Are there outliers that can skew the result of the analysis?
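One common way to answer the outlier question is the interquartile range (IQR) rule. The sketch below is a minimal illustration, not the method used in Bob's analysis: the `iqr_outliers` helper, the multiplier `k=1.5`, and the sample values are all assumptions made for the example.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical "times contacted in the past" values for ten customers.
times_contacted = [1, 2, 2, 3, 3, 3, 4, 4, 5, 40]

outliers = iqr_outliers(times_contacted)
print("flagged as outliers:", outliers)
```

Flagged values still need a judgment call: a customer contacted 40 times may be a data-entry error or a genuine, informative case, so the rule only tells Bob where to look.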
The figure below shows a methodology map for Data analysis and predictive modeling depending
on the type of business problem.
To use the methodology map properly, we take a top-down approach, asking questions about
the type of business problem and the amount of data available until we arrive at the
predictive model we will use.
For data analysis, Bob needs to do some exploratory data analysis to answer the
following questions:
· Which variables (data fields) and customer behaviors are most correlated with a customer paying for
insurance?
· Do data fields such as years as a customer, previous insurance subscriptions, and times contacted in
the past have a statistically significant relationship with the target outcome (customer paying for insurance)?
· Are there distinctive clusters within the data each customer belongs to?
· Given the descriptive statistics, what hypothesis should be tested to make an inference?
· Given that the insurance problem is a classification problem (Yes or No), what classification algorithm
should be used? Some important ones are logistic regression, decision trees, random forests, and boosted
models.
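The first of these questions, which fields are most correlated with the target, can be explored with a plain Pearson correlation against the Yes/No outcome coded as 1/0. The sketch below uses made-up data and hypothetical field names (`years_as_customer`, `times_contacted`, `paid`). Note that a correlation coefficient alone does not establish statistical significance, which would need a separate test.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data, one position per customer. `paid` is the target
# outcome: 1 if the customer paid for insurance, 0 otherwise.
years_as_customer = [1, 2, 3, 5, 8, 10, 12, 15]
times_contacted = [4, 1, 3, 2, 4, 1, 3, 2]
paid = [0, 0, 0, 1, 0, 1, 1, 1]

features = {
    "years_as_customer": years_as_customer,
    "times_contacted": times_contacted,
}

# Correlate each candidate field with the target and rank by strength.
correlations = {name: pearson(vals, paid) for name, vals in features.items()}
ranked = sorted(correlations.items(), key=lambda kv: abs(kv[1]), reverse=True)

for name, r in ranked:
    print(f"{name:>18}: r = {r:+.2f}")
```

A ranking like this is only a first screen. It suggests which fields deserve a closer look before Bob commits to a classification algorithm.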
Remember that the data mining process is iterative. It should be repeated until the results are
in line with the business goals.
· Does the result make sense within the context of the business problem?
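One way to check a result against the business problem is to compare predictions with what actually happened on held-out customers. The sketch below assumes binary labels (1 = customer paid, 0 = did not) and uses invented values. Precision and recall are standard metrics, but choosing them here is an assumption, not something the original analysis specifies.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = paid)."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == 1 and p == 1 for a, p in pairs)
    fp = sum(a == 0 and p == 1 for a, p in pairs)
    fn = sum(a == 1 and p == 0 for a, p in pairs)
    tn = sum(a == 0 and p == 0 for a, p in pairs)
    return tp, fp, fn, tn


# Hypothetical held-out customers: what happened vs. what the model predicted.
actual = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
predicted = [1, 0, 1, 0, 0, 1, 1, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(actual, predicted)

# Precision: of the customers we would contact, what share actually pays?
precision = tp / (tp + fp)
# Recall: of the customers who would pay, what share did we identify?
recall = tp / (tp + fn)

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"precision={precision:.2f} recall={recall:.2f}")
```

For a calling campaign, precision maps directly onto the business goal: it is the fraction of contacted customers who respond positively.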
Deployment involves implementing the predictive model in an information system or business
process.
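In the insurance case, deployment could be as simple as turning model scores into a daily call list. The sketch below assumes the model has already produced a score per customer. The `build_call_list` helper, the customer IDs, the scores, and the `capacity` parameter are all hypothetical.

```python
def build_call_list(scores, capacity):
    """Return the `capacity` customer IDs with the highest predicted scores."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [customer_id for customer_id, _ in ranked[:capacity]]


# Hypothetical model output: predicted probability that each customer pays.
scores = {"C001": 0.12, "C002": 0.81, "C003": 0.47, "C004": 0.93, "C005": 0.30}

# Suppose the call center can reach two customers today.
call_list = build_call_list(scores, capacity=2)
print(call_list)
```

Ranking by score and cutting at capacity keeps the business constraint (how many calls can be made) separate from the model itself.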
Finally, Bob presents the results of his data analysis to his boss, and the company achieves a 70%
positive response.
CONCLUSION:
Developing a mental model of the data mining framework helps us understand what needs to be
done when faced with a business problem. It is important to remember that the CRISP-DM
methodology is an iterative process. So, if there are still difficulties in coming up with a solution,
we go back to the business understanding phase. A second iteration may lead to a better solution.