2-Overview of Data Mining
Instructor: Yi Yang
Department of ISOM
Spring 2023
Last Lecture
Course overview
This Lecture
Data
Prediction is at the heart of making
decisions under uncertainty. Our
businesses and personal lives are
riddled with such decisions.
Predictive Machine learning
ML is the process of training a piece of software, called a model, that
learns patterns from a dataset. This predictive model can then
make predictions about previously unseen data.
[Figure: images labeled "face" and "non-face" are fed to an ML model, which induces a pattern from the data and labels a new image: "It's a FACE".]
Predictive analytics
Prediction ≠ Decision
Exercise
You, as a company marketing director, want to know the answers to the
following questions. Which ones require a data mining solution?
Is there an age difference between the high-value customers and the low-value
customers?
Let's define customers whose amount > $100 as high-value customers. The rest are low-value customers.
Alice    F  25  Y  5  $120
Bob      M  40  Y  3  $30
Charlie  M  35  Y  6  $210
Doug     M  18  N  4  $95
…        …  …   …  …  …
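The age question above can be checked with a plain aggregation rather than a trained model. A minimal sketch, using only the four rows visible in the table; the field names `name`, `age`, and `amount` are assumptions, since the slide's column headers are not shown, and the other columns are left out.

```python
# Rows visible on the slide. Field names are assumptions, and only the
# columns needed for the question (name, age, amount) are kept.
customers = [
    {"name": "Alice", "age": 25, "amount": 120},
    {"name": "Bob", "age": 40, "amount": 30},
    {"name": "Charlie", "age": 35, "amount": 210},
    {"name": "Doug", "age": 18, "amount": 95},
]

HIGH_VALUE_THRESHOLD = 100  # amount > $100 defines a high-value customer


def mean_age(rows):
    """Average age of a list of customer rows."""
    return sum(r["age"] for r in rows) / len(rows)


# Split customers by the high-value definition given in the exercise.
high = [r for r in customers if r["amount"] > HIGH_VALUE_THRESHOLD]
low = [r for r in customers if r["amount"] <= HIGH_VALUE_THRESHOLD]

high_mean = mean_age(high)
low_mean = mean_age(low)
print(f"high-value mean age: {high_mean}, low-value mean age: {low_mean}")
```

On the full customer table the same split-and-average step would apply; nothing is learned from the data and no prediction is made for unseen customers.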
Exercise
Data mining process
Terminology: data
Label: the thing we are predicting. The label can be the kind of
animal in a picture, a binary indicator of whether an email is spam, a housing price,
or the Chinese translation of an English sentence.
Numeric feature
Example: the number of items bought by a customer (e.g., 12)
Example: the time that a customer spends on the website (e.g., 16.49 min)
Categorical feature
Example: industry sector (Education, Computer, Agriculture, Energy,
etc.)
Example: location region (Sai Kung, Sha Tin, Wan Chai, etc.)
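The distinction between numeric and categorical features can be illustrated with a single record. This is only a sketch: the field names are hypothetical, the values reuse the examples quoted above, and classifying by Python type is a simplification (a number used as a code, such as a postal code, would still be categorical).

```python
# One hypothetical customer record. Field names are assumptions; the
# values reuse the examples from the slide.
record = {
    "items_bought": 12,               # numeric feature
    "minutes_on_site": 16.49,         # numeric feature
    "industry_sector": "Education",   # categorical feature
    "region": "Sha Tin",              # categorical feature
}


def feature_kind(value):
    """Classify a feature value as numeric or categorical by its Python type."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, but a yes/no flag is a category.
        return "categorical"
    if isinstance(value, (int, float)):
        return "numeric"
    return "categorical"


kinds = {name: feature_kind(value) for name, value in record.items()}
print(kinds)
```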
Descriptive analytics
What about unstructured data
Unstructured data: Text
Unstructured data: Image
Two learning paradigms
Supervised learning (prediction): learn a model from labeled data
that can predict labels for unseen data.
House price prediction (numerical label)
Credit card default (categorical (binary) label)
Unsupervised learning: find patterns in data without labels.
Customer clustering
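The supervised case, house price prediction with a numerical label, can be sketched in a few lines. This is an illustration under stated assumptions, not the lecture's method: the training pairs are made up (they follow price = 3 × size exactly), and one-feature least squares is simply one convenient choice of model.

```python
# Hypothetical training data: house size (square meters) and price.
# The numbers are made up for illustration and follow price = 3 * size.
sizes = [50.0, 80.0, 100.0]
prices = [150.0, 240.0, 300.0]  # the labels


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Apply the fitted line to a new feature value."""
    slope, intercept = model
    return slope * x + intercept


model = fit_line(sizes, prices)      # learn from labeled examples
predicted = predict(model, 120.0)    # predict the label of an unseen house
print(round(predicted, 2))
```

Customer clustering differs in that no `prices`-style label column exists; the algorithm only receives the features.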