
ISOM3360 Data Mining for Business Analytics

Overview of Data Mining Process

Instructor: Yi Yang
Department of ISOM
Spring 2023
Last Lecture
Course overview

This Lecture
Data

Overview of data mining process

Prediction is at the heart of making
decisions under uncertainty. Our
businesses and personal lives are
riddled with such decisions.

Uncertainty constrains strategy.


Better prediction creates
opportunities for new business
structures and strategies to
compete.

Predictive machine learning
Machine learning (ML) is the process of training a piece of software,
called a model, to learn patterns from a dataset. The trained predictive
model can then make predictions about previously unseen data.

ML can be applied in situations where it is very challenging (or
impossible) to define patterns by hand.

Patterns, patterns, patterns

[Figure: an ML model induces a pattern from training images labeled "face" and "non-face", then labels a new image: "It's a face."]
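The idea of a model that learns a pattern from a dataset and then predicts on unseen data can be sketched in a few lines. This is a minimal illustration, not the method used in the course: the training data (minutes on a website and whether the visitor bought) is made up, and the "model" is a single learned threshold, chosen here only because it is easy to read.

```python
# Toy training data, made up for illustration: (minutes on site, bought?)
training = [
    (2.0, False), (3.5, False), (5.0, False),
    (9.0, True), (12.5, True), (16.49, True),
]

def train_threshold(examples):
    """Learn the threshold that misclassifies the fewest training examples."""
    values = sorted(x for x, _ in examples)
    # Candidate thresholds: midpoints between neighbouring feature values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(t):
        # Count examples where the rule "x > t" disagrees with the label.
        return sum((x > t) != y for x, y in examples)

    return min(candidates, key=errors)

def predict(threshold, x):
    """Apply the learned pattern to a new, previously unseen value."""
    return x > threshold

threshold = train_threshold(training)
print(threshold)                  # learned from the data, not written by hand
print(predict(threshold, 11.0))   # prediction for an unseen visitor
```

The threshold is never typed in by the programmer; it is induced from the labeled examples, which is the point the slide makes about patterns that are hard to define by hand.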
Predictive analytics

We use predictions to take action in a product or in a business process.

E.g., a system predicts that a user will like a new camera; the system
then sends emails about this new camera to the user.

That is the main difference from descriptive analytics.

Prediction ≠ Decision
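One way to see why a prediction is not yet a decision: the same predicted probability can lead to different actions once costs and benefits are considered. The sketch below assumes a simple expected-value rule; the function name `should_send_email` and all numbers are hypothetical and do not come from the slides.

```python
def should_send_email(p_like, value_if_like, cost_per_email):
    """Decision rule (an assumption): act only if the expected gain exceeds the cost."""
    return p_like * value_if_like > cost_per_email

# The prediction is identical in both calls: a 2% chance the user likes the camera.
p_like = 0.02

# Only the cost of acting differs, and the decision changes with it.
cheap_channel = should_send_email(p_like, value_if_like=10.0, cost_per_email=0.05)
costly_channel = should_send_email(p_like, value_if_like=10.0, cost_per_email=0.50)

print(cheap_channel, costly_channel)
```

The model supplies `p_like`; the business supplies the value and the cost. The decision depends on all three.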

Exercise
You, as a company's marketing director, want to know the answers to the
following questions. Which ones require a data mining solution?

Who are the high-value customers?

Is there an age difference between the high-value customers and the
low-value customers?

Will a particular new customer be a high-value customer?

How much in sales should I expect a new customer to generate?

Let's define customers whose amount > $100 as high-value customers.
The rest are low-value customers.

Customer  Gender  Age  Membership  Monthly Purchase  Amount
Alice     F       25   Y           5                 $120
Bob       M       40   Y           3                 $30
Charlie   M       35   Y           6                 $210
Doug      M       18   N           4                 $95
…         …       …    …           …                 …
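As a small worked example, the first two questions can be computed directly from the four rows shown in the table, using the slide's definition of high-value (amount > $100). The sketch does not attempt the last two questions, which concern customers who are not yet in the data. The field names are illustrative choices, not part of the original table.

```python
from statistics import mean

# Rows copied from the exercise table; the elided "…" row is left out.
customers = [
    {"name": "Alice", "gender": "F", "age": 25, "member": True, "monthly_purchases": 5, "amount": 120},
    {"name": "Bob", "gender": "M", "age": 40, "member": True, "monthly_purchases": 3, "amount": 30},
    {"name": "Charlie", "gender": "M", "age": 35, "member": True, "monthly_purchases": 6, "amount": 210},
    {"name": "Doug", "gender": "M", "age": 18, "member": False, "monthly_purchases": 4, "amount": 95},
]

HIGH_VALUE_THRESHOLD = 100  # dollars, from the slide's definition

high_value = [c for c in customers if c["amount"] > HIGH_VALUE_THRESHOLD]
low_value = [c for c in customers if c["amount"] <= HIGH_VALUE_THRESHOLD]

# Question 1: who are the high-value customers?
high_value_names = [c["name"] for c in high_value]

# Question 2: is there an age difference between the two groups?
mean_age_high = mean(c["age"] for c in high_value)
mean_age_low = mean(c["age"] for c in low_value)

print(high_value_names, mean_age_high, mean_age_low)
```

With only four rows, the age comparison is an arithmetic illustration and not evidence of a real difference.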

Exercise

Say you work at a digital media company that provides an online video
streaming service. You have lots of data about lots of users watching
lots of movies and TV shows. What decisions can benefit from predictive
analytics?

Data mining process

Terminology: data
Label: the thing we are predicting. The label can be the kind of animal
in a picture, a binary indicator of whether an email is spam, a housing
price, or the Chinese translation of an English sentence.

Features: variables that represent the data. The number of features can
run into the millions.

What can be features in a spam detector?

What can be features in predicting customer churn?


Example: a particular instance of data.

Dataset: a set of examples.

Name  Balance  Age  Default
Mike  123,000  50   No
Mary  51,100   40   Yes
Bill  68,000   55   No
Jim   74,000   46   Yes
Dave  23,000   44   No
Anne  100,000  50   Yes
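The terminology maps naturally onto a small data structure. The sketch below stores the table as a list of examples and separates features from the label. It assumes that `Default` is the label and that `Name` is only an identifier and not a feature; the slide does not state either explicitly.

```python
# The dataset: one dict per example (row), copied from the table above.
dataset = [
    {"name": "Mike", "balance": 123_000, "age": 50, "default": "No"},
    {"name": "Mary", "balance": 51_100, "age": 40, "default": "Yes"},
    {"name": "Bill", "balance": 68_000, "age": 55, "default": "No"},
    {"name": "Jim", "balance": 74_000, "age": 46, "default": "Yes"},
    {"name": "Dave", "balance": 23_000, "age": 44, "default": "No"},
    {"name": "Anne", "balance": 100_000, "age": 50, "default": "Yes"},
]

LABEL = "default"                # assumption: the column we want to predict
FEATURES = ["balance", "age"]    # assumption: "name" is an ID, not a feature

# X holds the features of each example, y holds the matching labels.
X = [{f: example[f] for f in FEATURES} for example in dataset]
y = [example[LABEL] for example in dataset]

print(X[0], y[0])
```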
Types of features

Numeric feature
the number of items bought by a customer (e.g., 12)

the time that a customer spends on the website (e.g., 16.49 min)

Categorical feature
Example: industry sector (Education, Computer, Agriculture, Energy, etc.)

Example: location region (Sai Kung, Sha Tin, Wan Chai, etc.)

The balance in a bank account is a ______ feature.

Zipcode (e.g. 200041) is a _____ feature.
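The practical consequence of the numeric/categorical distinction is that a numeric feature can be fed to most models as-is, while a categorical feature usually has to be encoded first. The sketch below uses one-hot encoding, a common technique that the slides do not mention; it is shown only as an illustration, with the region names taken from the slide's example.

```python
# Categories from the slide's "location region" example (the "etc." is not filled in).
REGIONS = ["Sai Kung", "Sha Tin", "Wan Chai"]

def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector with one position per category."""
    return [1 if value == c else 0 for c in categories]

items_bought = 12                       # numeric feature: usable directly
region_vec = one_hot("Sha Tin", REGIONS)  # categorical feature: encoded first

# A combined feature vector for one customer.
feature_vector = [items_bought] + region_vec
print(feature_vector)
```

Note that the encoding carries no order: "Sha Tin" is not greater or smaller than "Wan Chai", which is what separates a category from a number.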


An Example: Customer Churn

Which customers may exit the contract, and which may stay?

Descriptive analytics

What about unstructured data

Merrill Lynch cited a rule of thumb that somewhere around 90% of all
potentially usable business information may originate in unstructured
form.

Unstructured data: Text

Unstructured data: Image

Two learning paradigms
Supervised learning (prediction): learn a model that predicts labels
and can be applied to unseen data.
House price prediction (numerical label)
Credit card default (categorical (binary) label)

Unsupervised learning (relationship mining): find relationships in
training data without reference to labels.
Customer clustering

Key: is there a label that we are trying to predict?
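To make the "no labels" side concrete, here is a minimal sketch of customer clustering on a single feature. The slide does not name an algorithm; the loop below is a simple two-cluster k-means-style procedure chosen for illustration, and the spending figures are made up. Notice that no label appears anywhere in the input.

```python
def two_means_1d(values, max_iterations=10):
    """Split numbers into two groups around two centers (k-means-style, k=2)."""
    if min(values) == max(values):
        raise ValueError("need at least two distinct values")
    # Start with the smallest and largest value as the two centers.
    centers = [min(values), max(values)]
    clusters = [[], []]
    for _ in range(max_iterations):
        clusters = [[], []]
        for v in values:
            # Assign each value to its nearest center (ties go to the first).
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster.
        new_centers = [sum(c) / len(c) for c in clusters]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return clusters

# Made-up monthly spending per customer; there is no label column.
spending = [10, 12, 15, 80, 85, 90]
clusters = two_means_1d(spending)
print(clusters)
```

A supervised version of the same problem would need a label for each customer, such as "high-value" or "low-value"; here the groups emerge from the data alone.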
Supervised learning learns A->B

Input (A)           Output (B)       Application
customer            churn?           customer churn
customer            conversion?      targeting
ad, user info       click?           online advertising
email               spam?            spam filtering
customer complaint  category?        document categorization
English             Chinese          machine translation
audio               text transcript  speech recognition
CT image            coronavirus?     disease diagnosis
image, radar info   driving path     self-driving car
