
ANALYTICS METHODS

BY THE END OF THIS LESSON,
YOU SHOULD KNOW:
• Categories of analytics methods.
• Methodology for data analytics.
• Popular analytics methods.
• Choosing analytical methods.

Big data analytics is the process of examining large and varied data sets --
i.e., big data -- to uncover hidden patterns, unknown correlations, market
trends, customer preferences and other useful information that can help
organizations make more-informed business decisions.

Big data analytics applications enable data scientists, predictive modelers,
statisticians and other analytics professionals to analyze growing volumes of
structured transaction data, plus other forms of data that are often left
untapped by conventional business intelligence (BI) and analytics programs.
That encompasses a mix of semi-structured and unstructured data.

RECAP: PURPOSE OF DATA ANALYTICS

• SUPPORT DECISION-MAKING.
• PROVIDE AN ADVANTAGE OVER COMPETITORS.
• GIVE INSIGHT INTO THE FUTURE.

RECAP: HEALTH CARE

[Figure: health care data turned into VALUE!]
RECAP: FOUR TYPES OF ANALYTICS
• Prescriptive – This type of analysis reveals what actions should be taken. This is the
most valuable kind of analysis and usually results in rules and recommendations for next
steps.
• Predictive – An analysis of likely scenarios of what might happen. The deliverables are
usually a predictive forecast.
• Diagnostic – A look at past performance to determine what happened and why. The
result of the analysis is often an analytic dashboard.
• Descriptive – What is happening now based on incoming data. To mine the
analytics, you typically use a real-time dashboard and/or email reports.

PRESCRIPTIVE ANALYTICS
• Prescriptive analytics is highly valuable but still rarely used.
According to Gartner, 13 percent of organizations are using
predictive but only 3 percent are using prescriptive analytics.
• Where analytics in general sheds light on a subject,
prescriptive analytics gives you a laser-like focus to answer
specific questions. It uses optimization and simulation
algorithms to advise on possible outcomes and answer:
“What should we do?”
• For example, in the health care industry, you can better manage
the patient population by using prescriptive analytics to
measure the number of patients who are clinically obese, then
add filters for factors like diabetes and LDL cholesterol levels to
determine where to focus treatment. The same prescriptive
model can be applied to almost any industry target group or
problem.

PREDICTIVE ANALYTICS
• Predictive analytics uses data to identify past patterns in order to predict the
future. It uses statistical models and forecasting techniques to
understand the future and answer: “What could happen?”

• While it does not predict what will happen, it does provide
probabilities of what could happen.

• For example, some companies are using predictive analytics for sales
lead scoring. Some companies have gone one step further and use predictive
analytics for the entire sales process, analysing lead source, number of
communications, types of communications, social media, documents, CRM
data, etc. One common type of predictive analysis is sentiment analysis, in
which the model predicts the sentiment score based on data it has. The
outcome is actionable, valuable data that can be used for business
decisions. Predictive analysis can also be very useful in optimizing
customer relationship management. By leveraging data on customer
behavior and spending trends, it is possible to more efficiently cross-sell or
upsell products.

DIAGNOSTIC ANALYTICS
Diagnostic analytics are used for discovery or to determine why something
happened. Diagnostic analysis looks at past performance to understand what
happened and why. Businesses use this type of analysis to uncover patterns
in their business processes.

The most common application is in social media, where you can use this type
of analysis to assess the number of posts, shares, mentions and fan
interactions to figure out what worked in past campaigns and what didn't.

DESCRIPTIVE ANALYTICS
• Descriptive analytics is valuable for uncovering patterns that
offer insight. Descriptive analysis examines what is
happening in real time based on incoming data. It
is often referred to as the simplest type, since it allows
you to convert big data into useful bite-sized nuggets.

• However, the results need to be monitored in real time through
email reports or a dashboard. Descriptive analytics uses data aggregation and
data mining to provide insight into the past and answer: “What
has happened?”
• A simple example of descriptive analytics would be assessing
credit risk; using past financial performance to predict a
customer’s likely financial performance. Descriptive analytics can
be useful in the sales cycle, for example, to categorize customers
by their likely product preferences and sales cycle.

WHAT DO WE SEARCH FOR IN DATA ANALYTICS?

Correlation

• A technique for investigating the relationship between two
quantitative, continuous variables, for example, age
and blood pressure.

Pattern

• A repetitive characteristic.
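
To make the idea of correlation concrete, here is a minimal sketch that computes the Pearson correlation coefficient for the age and blood pressure example. The sample values and the function name `pearson_correlation` are hypothetical and chosen only for illustration; Pearson's coefficient is one common measure of correlation, not necessarily the one intended by the slide.

```python
from math import sqrt


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Co-variation of the two variables around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical sample: age (years) and systolic blood pressure (mmHg).
ages = [25, 35, 45, 55, 65]
blood_pressure = [118, 121, 127, 133, 140]

r = pearson_correlation(ages, blood_pressure)
print(f"Pearson r = {r:.3f}")
```

A value of r close to +1 suggests that the two variables rise together, a value close to -1 that one falls as the other rises, and a value near 0 that there is little linear relationship.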

Methodology for analytics, data mining, and data science projects

Cross Industry Standard Process for Data Mining (CRISP-DM)

BUSINESS UNDERSTANDING
• Understand the problem to be solved. This may require multiple iterations before an
acceptable solution formulation appears.

• The design team should think carefully about the problem to be solved and about
the use scenario. They must ask:
• What exactly do we want to do?
• How exactly would we do it?
• What parts of this use scenario constitute possible data mining models?

DATA UNDERSTANDING
• Data is the raw material from which the solution will be built.

• It is important to understand the strengths and limitations of the data because rarely
is there an exact match with the problem. For example, historical data often are
collected for a different purpose.

• It is common for a business problem to involve several data mining tasks, with the result
of each task contributing to the solution.

DATA PREPARATION
• Often, data is not in the required form, so conversion is necessary to
reach a form that can help yield better results.

• Examples: converting data into tabular format, removing or inferring missing values,
and converting data to different types.
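
The three preparation examples above can be sketched in a few lines. This is only an illustration under stated assumptions: the records, the column names and the choice of filling missing values with the column mean are all hypothetical, and real projects often need more careful treatment of missing data.

```python
from statistics import mean

# Hypothetical raw records: numbers stored as text, some values missing.
raw_records = [
    {"customer": "A", "age": "34", "monthly_minutes": "410"},
    {"customer": "B", "age": "", "monthly_minutes": "120"},
    {"customer": "C", "age": "51"},
]

COLUMNS = ["customer", "age", "monthly_minutes"]
NUMERIC_COLUMNS = (1, 2)  # positions of "age" and "monthly_minutes"


def to_number(text):
    """Convert text to float; return None for missing or empty values."""
    if text is None or text.strip() == "":
        return None
    return float(text)


def prepare(records):
    # Step 1: tabular format, one row per record with the same columns.
    table = [[rec.get(col) for col in COLUMNS] for rec in records]
    # Step 2: convert the numeric columns from text to float.
    for row in table:
        for idx in NUMERIC_COLUMNS:
            row[idx] = to_number(row[idx])
    # Step 3: infer missing values using the mean of the known values.
    # (Assumes every numeric column has at least one known value.)
    for idx in NUMERIC_COLUMNS:
        fill = mean(row[idx] for row in table if row[idx] is not None)
        for row in table:
            if row[idx] is None:
                row[idx] = fill
    return table


table = prepare(raw_records)
for row in table:
    print(row)
```

Removing rows with missing values instead of inferring them is the other option mentioned above; which is better depends on how much data would be lost.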

MODELLING
• The primary place where data mining techniques are applied to the data.

• Typically, the output is some sort of model or pattern capturing regularities in
the data.

EVALUATION
• The aim is to assess the data mining results and to gain confidence that the results are
valid and reliable.
• Stakeholders would like to know whether the proposed model is going to do more good
than harm, or whether it would be catastrophic.
• Evaluating results of data mining includes both quantitative and qualitative
assessments.
• These evaluation techniques are statistical in nature and thus not covered in this
course.

DEPLOYMENT
• Data mining results are put into real use in order to realise some return on
investment. This involves implementing the proposed model.
• Observations from this stage may require an iteration back to the Business
Understanding stage. There, improvements and refinements to the model are made.

POPULAR ANALYTICS METHODS

• Classification and class probability estimations
• Regression
• Similarity matching
• Clustering
• Co-occurrence grouping
• Profiling

CASE: MEGATELCO
The company has a major problem with customer retention in their
wireless business. In the mid-Atlantic region, 20% of cell phone
customers leave as soon as their contracts expire, and lately it has been
getting increasingly difficult to acquire new customers.
The cell phone market has become saturated. Telco companies are
battling to attract each other’s customers while retaining their own.
Customers switching from one company to another is called “churn”,
and it is expensive all around.

CLASSIFICATION AND CLASS PROBABILITY
ESTIMATION
• Goal: To predict which class an individual belongs to.
• Question: Among all the customers of MegaTelco, which are likely to respond to a
given offer?
• Individual is a customer.
• Classes are “will respond” and “will not respond”.
• Classification task: A data mining model predicts which class an individual
belongs to.
• Class probability estimation task: Instead of predicting which class an individual
belongs to, the model predicts the probability that the
individual belongs to each class. The probability comes as a score value.
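
The difference between the two tasks can be sketched with a small model. The example below uses logistic regression trained by gradient descent, which is one possible technique and not one named in these slides; the features (years as a customer, past offers accepted), the data and all function names are hypothetical.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def train_logistic(features, labels, learning_rate=0.1, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = sigmoid(score) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict_probability(weights, bias, x):
    """Class probability estimation: score between 0 and 1 for 'will respond'."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def classify(weights, bias, x, threshold=0.5):
    """Classification: turn the score into one of the two classes."""
    if predict_probability(weights, bias, x) >= threshold:
        return "will respond"
    return "will not respond"


# Hypothetical history: [years as customer, past offers accepted] and
# whether the customer responded to an earlier offer (1) or not (0).
features = [[1, 0], [2, 0], [1, 1], [3, 1], [4, 3], [5, 2], [6, 4], [5, 4]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = train_logistic(features, labels)
p_loyal = predict_probability(weights, bias, [6, 3])
p_new = predict_probability(weights, bias, [1, 0])
print(classify(weights, bias, [6, 3]), round(p_loyal, 2))
print(classify(weights, bias, [1, 0]), round(p_new, 2))
```

The score is often more useful than the class alone, because customers can be ranked by it and the offer sent only to those above a chosen threshold.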

[Figure: if given the OFFER, one customer has a probability of 80% (WILL RESPOND) and
another a probability of 5% (WILL NOT RESPOND).]
REGRESSION
• Goal: To predict or estimate, for each individual, the numerical value of some
variable for that individual.
• Question: How much will a given customer use the service?
• Task: Predict the “service usage” property (variable) for a particular individual
typically by looking at other similar individuals in the population and their
historical usage.
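
As a minimal sketch of a regression task, the example below fits a straight line by ordinary least squares to the historical usage of other customers and uses it to estimate the usage of a new one. Linear regression is only one possible technique; the predictor (data plan size), the numbers and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    return slope * x + intercept


# Hypothetical history of other customers:
# data plan size (GB) and monthly call minutes actually used.
plan_gb = [1, 2, 3, 4, 5]
minutes_used = [110, 190, 310, 390, 500]

slope, intercept = fit_line(plan_gb, minutes_used)
estimate = predict(slope, intercept, 6)
print(f"Estimated minutes for a 6 GB customer: {estimate:.0f}")
```

Unlike classification, the output here is a number on a continuous scale and not one of a fixed set of classes.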

[Figure: for a customer who WILL RESPOND, HOW much service would she use? A LOT! or A LITTLE….]
SIMILARITY MATCHING
• Underlies other data mining tasks, such as classification, regression and
clustering.
• Goal: To identify similar individuals based on data known about them.
• One of the most popular methods for making product recommendations (finding people who are
similar to you when purchasing items).
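
A minimal sketch of similarity matching follows, assuming each customer is described by a small numeric vector and that Euclidean distance is an acceptable measure of similarity. The customers, the features (calls per week, texts per week, data in GB) and the function name `most_similar` are hypothetical.

```python
from math import dist

# Hypothetical customers: (calls per week, texts per week, data used in GB).
customers = {
    "ann": (10, 50, 2),
    "bob": (12, 45, 3),
    "cat": (40, 5, 1),
    "dan": (38, 8, 1),
    "eve": (11, 52, 2),
}


def most_similar(target, population, k=2):
    """Return the k customers closest to `target` by Euclidean distance."""
    others = [name for name in population if name != target]
    ranked = sorted(others, key=lambda name: dist(population[target], population[name]))
    return ranked[:k]


print(most_similar("ann", customers))        # closest two customers to ann
print(most_similar("cat", customers, k=1))   # closest customer to cat
```

In practice the features are usually rescaled first, so that a variable with a large numeric range does not dominate the distance.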

CLUSTERING
• Goal: To group individuals in a population together by their similarity, but not driven
by any specific purpose.
• Question: Do our customers form natural groups or segments?
• Useful in preliminary domain exploration, where the natural groups found may later suggest
other data mining tasks or approaches.
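
One way to look for natural groups is k-means, sketched below. K-means is a common clustering technique but is not named in these slides; the data (calls and texts per week), the number of clusters and the naive choice of the first k points as starting centroids are assumptions made to keep the example short and repeatable.

```python
from math import dist


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda i: dist(p, centroids[i])) for p in points]


def k_means(points, k, iterations=20):
    # Naive deterministic start: the first k points are the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        labels = assign(points, centroids)
        new_centroids = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                # New centroid is the mean of the cluster's members.
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)


# Hypothetical customers: (calls per week, texts per week).
points = [(2, 40), (30, 3), (3, 45), (28, 5), (1, 38), (32, 4)]

centroids, labels = k_means(points, k=2)
print(centroids)
print(labels)
```

No target is supplied anywhere: the groups come only from how close the customers are to each other, which is what makes the method unsupervised.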

[Figure: example customer groups: texts occasionally; calls for long hours; only receives calls;
intensive data plan; texts frequently; seldom calls or texts.]

CO-OCCURRENCE GROUPING
• Also known as frequent itemset mining, association rule discovery, market-basket
analysis.
• Goal: To find associations between individuals based on transactions involving them.
• Question: What items are commonly purchased together?
• Task: Identify similarity of objects based on their “appearing” together in
transactions.
• Example: people who bought X also bought Y.
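
The task above can be sketched by counting how often pairs of items appear in the same transaction. The baskets are hypothetical, and `confidence` here is the usual association-rule measure (the share of baskets containing X that also contain Y), used as an assumption about how "also bought" might be quantified.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, one set of items per basket.
transactions = [
    {"pizza", "noodles", "soda"},
    {"pizza", "noodles"},
    {"pizza", "salad"},
    {"noodles", "soda"},
    {"pizza", "noodles", "salad"},
]


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def confidence(baskets, x, y):
    """Fraction of baskets containing x that also contain y."""
    with_x = [b for b in baskets if x in b]
    if not with_x:
        return 0.0
    return sum(1 for b in with_x if y in b) / len(with_x)


counts = pair_counts(transactions)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)
print(confidence(transactions, "pizza", "noodles"))
```

A high confidence for "pizza, then noodles" is the kind of evidence behind a rule such as "people who bought X also bought Y".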

Hungry people who bought PIZZA also bought NOODLES, therefore, always offer NOODLES to
someone who bought PIZZA.

PROFILING
• Also known as “behaviour description”.
• Goal: To characterise the typical behaviour of an individual, group or population.
• Question: What is the typical cell phone usage of this customer segment?
• Task: Requires a complex description of night and weekend airtime averages,
international usage, roaming charges, text minutes etc.
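
A heavily simplified sketch of profiling follows. It describes typical behaviour with a single metric (monthly airtime minutes) using its mean and standard deviation, then checks whether a new month fits that profile. A real profile would cover many more measures, as the task above notes; the numbers, the tolerance of three standard deviations and the function names are hypothetical.

```python
from statistics import mean, stdev


def build_profile(history):
    """Summarise typical behaviour as the mean and standard deviation."""
    return {"mean": mean(history), "stdev": stdev(history)}


def fits_profile(profile, value, tolerance=3.0):
    """True if value lies within `tolerance` standard deviations of the mean."""
    return abs(value - profile["mean"]) <= tolerance * profile["stdev"]


# Hypothetical monthly airtime minutes for January, February and March.
history = [310, 295, 305]
profile = build_profile(history)

for month, minutes in [("typical month", 300), ("unusual month", 900)]:
    status = "fits profile" if fits_profile(profile, minutes) else "mismatch"
    print(month, minutes, status)
```

A month flagged as a mismatch is a prompt for a closer look and does not by itself prove that anything is wrong.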

JANE IS A STUDENT. THIS IS HER SERVICE USAGE PROFILE RECORDED BY HER TELCO.

[Figure: usage charts for January, February, March and April. April is a mismatch: it does not
fit the profile. ALERT!]
JACK IS A LECTURER. THIS IS HIS PURCHASE PROFILE RECORDED BY HIS CREDIT CARD COMPANY.

[Figure: purchase charts for January, February, March and April. April is a mismatch: it does not
fit the profile. FRAUD!]
WHICH ANALYTICAL METHOD?

• Often, a data analyst must be able to propose one or multiple
analytical methods to solve a business problem. However, this can
be tricky. One way is to identify whether the business problem
requires a supervised or an unsupervised data mining method by
determining whether the question has a target/purpose for the
grouping.

Q1: Do our customers naturally fall into different groups?
• Is there a target? No specific target is given.
• Hence, use unsupervised methods: clustering, co-occurrence grouping/association,
profiling/sequential.

Q2: Can we find groups of customers who have particularly high likelihoods of
cancelling their service soon after their contracts expire?
• Is there a target? Yes: will a customer leave when her contract expires?
• Hence, use supervised methods: classification, regression.

Similarity matching sits between the two groups, since it can be either.
NOW you know that….

Q2: Can we find groups of customers who have particularly high likelihoods of
cancelling their service soon after their contracts expire?
• Is there a target? Yes: will a customer leave when her contract expires?
• Task: To predict the target.
• Requires labelled data on the target.
• Hence, use supervised methods: classification, regression.

NEXT thing that you need to know…. Classification or Regression?

• WILL THIS CUSTOMER PURCHASE SERVICE S1 IF GIVEN INCENTIVE X?
• Which service package (S1, S2 or none) will a customer likely purchase if given
incentive X?
• How much will this customer use the service?

1. “Would toothbrushes often be purchased with linen?” Which analytics
method would be able to answer this question?
A. Regression
B. Classification
C. Clustering
D. Co-occurrence grouping

2. Following the previous question, “What is the common purchase rate of this pair of
items by Mr X?” Which analytics method would be able to answer this question?
A. Classification
B. Regression
C. Profiling
D. Clustering

3. Following the previous question, “But who really are these customers? Can I characterise
them?” Which analytics method would be able to answer this question?
A. Regression
B. Classification
C. Clustering
D. Co-occurrence grouping

4. “Will acquiring a new customer be profitable? How much revenue should I expect this
customer to generate?” Which analytics method would be able to answer this question?
A. Classification
B. Regression
C. Profiling
D. Clustering
5. Data has been collected on visitors’ viewing habits at a bank’s website. Which
method is used to identify pages commonly viewed together during the same visit
to the website?
A. Clustering
B. Classification
C. Regression
D. Association rules
