Dr Dimitrios Letsios
Department of Informatics
King’s College London
Lecture Contents
- Section 3: Models
- Knowledge Representations
- Evaluation
Real-World Applications
Web Data
- PageRank assigns importance scores to web pages, used to rank results for online search queries (Google).
- Email filtering classifies new messages as spam or ham.
- Online advertising targets users with similar purchase histories.
- Social media platforms identify users with similar preferences.
Real-World Applications (2)
Risk
- Statistical calculation of bank loan default risk.
- Predicting job candidate performance in recruitment.
Real-World Applications (3)
Images
- Oil spill or deforestation detection from satellite images.
- Currency recognition in automated payment machines.
- Face recognition for police surveillance.

Engineering
- Power demand forecasting for electricity suppliers.
- Failure prediction for machine maintenance in manufacturing.
Data Sets
Contact Lens Data Set [WFH3, Table 1.1]
Data Sets (6)
Family Tree [WFH3, Figure 2.1]
Data Sets (7)
Attribute Types:
- Numeric: Continuous or discrete, with a well-defined distance between values.
- Nominal: Categorical.
- Dichotomous: Binary, boolean, or yes/no.
- Ordinal: Ordered, but without a well-defined distance, e.g. poor, reasonable, good, and excellent health quality.
- Interval: Ordered, but also measured in fixed units, e.g. temperature in degrees.
Data Sets (8)
Attribute-Relation File Format (ARFF) [WFH3, Figure 2.2]
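Since the figure itself is not reproduced here, the following is a minimal sketch of the ARFF format for an abridged weather data set; the exact attributes of [WFH3, Figure 2.2] are assumed rather than copied.

# Minimal ARFF sketch for an abridged weather data set; the attribute
# names and values are assumptions for illustration.
arff = """@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute humidity {high, normal}
@attribute play {yes, no}
@data
sunny,high,no
overcast,high,yes"""

# Split the header from the comma-separated instances.
header, _, data = arff.partition("@data\n")
instances = [line.split(",") for line in data.splitlines()]
print(instances)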
Data Sets (11)
Feature Engineering
The process of transforming raw data by selecting and constructing the attributes most suitable for the data mining problem to be solved.
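As a minimal illustration (the records and field names below are hypothetical), feature engineering may both derive new attributes from raw ones and drop attributes judged irrelevant:

# Feature-engineering sketch: derive a new attribute (BMI) from raw
# measurements and keep only attributes relevant to a health-related task.
raw_records = [
    {"id": 1, "height_m": 1.80, "weight_kg": 81.0, "eye_colour": "brown"},
    {"id": 2, "height_m": 1.65, "weight_kg": 60.0, "eye_colour": "blue"},
]

def engineer(record):
    bmi = record["weight_kg"] / record["height_m"] ** 2  # derived attribute
    # Attribute selection: eye colour is dropped as irrelevant here.
    return {"id": record["id"], "bmi": round(bmi, 1)}

print([engineer(r) for r in raw_records])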
Patterns
- Structural patterns:
  - Capture and explain aspects of the data in an explicit way.
  - Can be used to make better-informed decisions.
  - E.g. rules in if-then form.
Patterns (3)
Classification Rule
Attribute values predict the label.
if outlook == sunny and humidity == high:
    play = no
Lecture Contents
- Section 3: Models
- Knowledge Representations
- Evaluation
Concrete Tasks
Examples:
- Classification models relationships between data elements to predict classes or labels.
- Regression models relationships between data elements to predict numeric quantities.
- Clustering models relationships between instances to group them so that instances in the same group are similar.
- Association models relationships between attributes.
Concrete Tasks (2)
Classification:
- The data is classified, e.g. people can be labelled as Covid positive or negative based on their symptoms.
- Models the ways that attributes determine the class of instances.
- A supervised learning task, because it uses already-classified instances to make predictions for new instances.
Concrete Tasks (3)
Regression:
- Models the ways that attributes determine a numeric value.
- A variant of classification, but with numeric predictions instead of discrete classes.
- A supervised learning task, like classification.
- Often, the produced model is more interesting than the predicted values, e.g. which attributes affect car prices.
Concrete Tasks (4)
Clustering:
- Models similarity between instances and divides them into groups, so that instances in the same group are more similar to each other than to instances in different groups.
- E.g. partition customers into groups.
- By labelling the clusters, we may use them in meaningful ways.
- An unsupervised learning task, because the data is not labelled.
Concrete Tasks (5)
Association:
- Models how some attributes determine other attributes.
- No specific class or label.
- May examine any subset of attributes to predict any other disjoint subset of attributes.
- Usually involves only nominal data.
- E.g. use supermarket data to identify combinations of products that occur together in transactions.
Process
Process (2)
Data Mining Process
[Flow diagram: data containing examples and a question of interest feed a model; the model is evaluated and scored, yielding a prediction or insight.]
Process (3)
- Supervised learning:
  - There is a target attribute.
  - If nominal, then classification, e.g. whether to play or not in the weather data set.
  - If numeric, then numeric prediction, e.g. predicting the performance value in the CPU performance data set.
- Unsupervised learning:
  - There is no target attribute.
  - Cluster instances into groups by similarity.
  - Find attribute correlations or associations.
- There exist other data mining tasks for other types of data.
Process (8)
Step 6: Repeat
- Usually, multiple iterations of the previous steps are required to build a good enough model.
- Revise the performed steps, adapt, and reiterate.
Relation to Other Fields
Relation to Other Fields (2)
- Data mining (DM) is strongly related to machine learning (ML) and statistics.
- These fields share methods, but use them in different ways and for different reasons.
Lecture Contents
- Section 3: Models
- Knowledge Representations
- Evaluation
Knowledge Representations
- Decision Tables
- Trees
- Rules
- Linear Models
- Instance-Based Representations
- Clusters
Tables
Decision Table
- A concise visual representation specifying which actions to perform based on given conditions.
- Contains a set of attributes and a decision label for each unique combination of attribute values.
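A decision table maps each combination of attribute values to a decision, so a direct way to sketch one is a lookup table; the weather-style attribute values below are illustrative:

# Decision table: each unique combination of attribute values maps to a
# decision label. The attribute values below are illustrative.
decision_table = {
    ("sunny", "high"): "no",
    ("sunny", "normal"): "yes",
    ("overcast", "high"): "yes",
    ("rainy", "high"): "no",
}

def decide(outlook, humidity):
    # Unlisted combinations fall back to a default.
    return decision_table.get((outlook, humidity), "unknown")

print(decide("sunny", "high"))  # -> no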
Trees
Building Blocks
- Nodes specify decisions to be made.
- Branches from a node represent the possible alternatives.
- A branch connects a parent node to one of its child nodes.
- The top node, without a parent, is called the root.
- The bottom nodes, without children, are called leaves.
Trees (2)
Decision Trees:
- Branches may involve a single attribute or multiple attributes.
- We examine the value of an attribute and branch based on equality or inequality:

if temperature < 80:
    branch left
else:
    branch right
Trees (3)
Decision Trees:
- A path is a sequence of nodes such that each node is the child of the previous node in the sequence.
- An attribute can be tested more than once in a path.
- In a classification context:
  - A leaf specifies a class.
  - Each instance satisfying all decisions on the path from the root to the leaf is assigned this class, as in the sketch below.
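A minimal sketch of such classification, assuming a hand-built (not learned) decision tree for the weather data: each instance follows one path of decisions and receives the class of the leaf it reaches.

# Hand-built decision tree for the weather data: each internal node tests
# one attribute; each root-to-leaf path assigns the leaf's class.
def classify(instance):
    if instance["outlook"] == "sunny":
        # A second test on the same path, on a different attribute.
        return "no" if instance["humidity"] == "high" else "yes"
    if instance["outlook"] == "overcast":
        return "yes"                                 # leaf
    return "no" if instance["windy"] else "yes"      # rainy branch

print(classify({"outlook": "sunny", "humidity": "high", "windy": False}))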
Trees (4)
Handling missing attribute values:
- Possible solutions:
  - Ignore all instances with missing values.
  - Let each attribute take the additional value "missing".
  - Set the most popular value for each missing attribute value.
  - Make a probabilistic (weighted) choice for each missing attribute value, based on the other instances (see the sketch below).
- All these solutions propagate errors, especially as the number of missing values increases.
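A minimal sketch of the "most popular value" and probabilistic strategies, assuming instances are dictionaries and None marks a missing value:

import random
from collections import Counter

# Instances with a missing value (None) for attribute "outlook".
instances = [
    {"outlook": "sunny"}, {"outlook": "sunny"},
    {"outlook": "rainy"}, {"outlook": None},
]
counts = Counter(i["outlook"] for i in instances if i["outlook"] is not None)

# Strategy: set the most popular observed value.
most_popular = counts.most_common(1)[0][0]

# Strategy: weighted random choice based on the other instances.
weighted = random.choices(list(counts), weights=list(counts.values()))[0]

print(most_popular, weighted)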
Trees (5)
Functional Tree
- Computes a function of multiple attribute values in each node.
- Branches based on the value returned by the function:

if petal_length * petal_width > threshold:
    make decision
else:
    make different decision
Trees (6)
Regression Tree
- Predicts numeric values.
- Each node branches on the value of an attribute or on the value of a function of the attributes.
- A leaf specifies the predicted value for the corresponding instances.
Trees (7)
Model Trees
- Similar to a regression tree, except that a regression equation predicts the numeric output value in each leaf, as contrasted in the sketch below.
- A regression equation predicts a numeric quantity as a function of the attributes.
- More sophisticated than linear regression and regression trees.
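A minimal sketch contrasting the two representations; the split threshold and coefficients are made up, and the attribute names merely follow the CPU performance example:

# Regression-tree leaves store constants; model-tree leaves store a
# regression equation over the attributes. All numbers are made up.
def regression_tree(cach):
    return 40.0 if cach < 50 else 180.0           # leaves: constant predictions

def model_tree(cach, mmax):
    if cach < 50:
        return 10.0 + 0.5 * cach + 0.002 * mmax   # leaf: regression equation
    return 30.0 + 2.0 * cach + 0.004 * mmax       # leaf: regression equation

print(regression_tree(30), model_tree(30, 8000))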
Rules
Rule
- An expression in if-then format.
- The if part is the pre-condition or antecedent and consists of a series of tests.
- The then part is the conclusion or consequent and assigns values to one or more attributes.
Rules (2)
Classification rules:
- Predict the class or label of an instance.
- Can be derived from a decision tree.
- One rule can be constructed for each leaf of the tree (see the sketch below):
  - The pre-condition contains a clause for each decision along the path from the root to the leaf.
  - The conclusion is the class of the leaf.
- Rule sets constructed in this way may contain redundancies, especially if multiple leaves have the same class.
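A minimal sketch of this construction, assuming the hand-built weather tree from the Trees slides: every root-to-leaf path yields one rule whose pre-condition collects the tests along the path.

# One classification rule per leaf: walk every root-to-leaf path and
# collect the tests along it as the rule's pre-condition.
tree = ("outlook", {
    "sunny": ("humidity", {"high": "no", "normal": "yes"}),
    "overcast": "yes",
    "rainy": ("windy", {"true": "no", "false": "yes"}),
})

def rules(node, conditions=()):
    if isinstance(node, str):        # a string node is a leaf: emit one rule
        yield conditions, node
        return
    attribute, branches = node
    for value, child in branches.items():
        yield from rules(child, conditions + ((attribute, value),))

for pre, cls in rules(tree):
    tests = " and ".join(f"{a} == {v}" for a, v in pre)
    print(f"if {tests}: play = {cls}")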
Rules (3)
- Transforming a set of rules into a decision tree is also possible, but not straightforward.
- The difficulty is choosing the order of tests, starting from the root.
- The replicated subtree problem may occur, i.e. no matter which rule's test is chosen first, the subtree for the other rule is replicated in the tree.
- Sometimes, classification rules can be significantly more compact than decision trees:

if a and b:
    x
if c and d:
    x
Rules (5)
Association rules:
- Predict the value of an attribute of an instance.
- Similar to classification rules, except that they can also predict combinations of attributes.
- Express different regularities in the data set.
- There are many different association rules, even in tiny data sets (see the sketch below).
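A minimal sketch, assuming toy supermarket transactions: a common way to assess an association rule is its support (the fraction of transactions satisfying both sides) and its confidence (the fraction of antecedent-matching transactions that also match the consequent).

# Support and confidence of the rule {bread, butter} -> {milk} on toy
# transactions (item names are illustrative).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
]
antecedent, consequent = {"bread", "butter"}, {"milk"}

covered = [t for t in transactions if antecedent <= t]
both = [t for t in covered if consequent <= t]
support = len(both) / len(transactions)
confidence = len(both) / len(covered)
print(support, confidence)  # 0.25 0.5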
Rules (6)
Learning Rules:
- Rules are learned by adding new rules and refining existing rules as more instances are added to the training set.
- A refinement may add another conjunctive clause (and) to a pre-condition.

Rules may:
- contain functions of attribute values, e.g. area(rectangle),
- compare attribute values or functions of them, e.g. area(rectangle) > width(rectangle),
- recursively concern different parts of the data set, e.g. tallerThan(rectangle, triangle).
Linear Models
- A linear model is a weighted sum of attribute values.
- E.g. PRP = 2.47 · CACH + 37.06.
- All attribute values must be numeric.
- Typically visualised as a 2D scatter plot with a regression line, i.e. the linear function that best represents the data (see the fitting sketch below).
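A minimal sketch of fitting such a weighted sum by least squares; the CACH/PRP data points below are made up for illustration, not taken from the CPU performance data set.

import numpy as np

# Fit PRP = w1 * CACH + w0 by least squares on made-up data points.
cach = np.array([0.0, 8.0, 16.0, 32.0, 64.0])
prp = np.array([35.0, 60.0, 75.0, 120.0, 195.0])

A = np.stack([cach, np.ones_like(cach)], axis=1)  # columns: CACH, intercept
(w1, w0), *_ = np.linalg.lstsq(A, prp, rcond=None)
print(f"PRP = {w1:.2f} * CACH + {w0:.2f}")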
Linear Models (2)
- Linear models can also be applied to classification problems, by defining decision boundaries that separate instances belonging to different classes.
- E.g. 0.5 · PL + 0.8 · PW = 2.0.
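A minimal sketch of classifying with this boundary (PL = petal length, PW = petal width); the iris class names are assumptions for illustration:

# Classify by which side of the boundary 0.5*PL + 0.8*PW = 2.0 an
# instance falls on. The class names are illustrative assumptions.
def classify(pl, pw):
    return "versicolor" if 0.5 * pl + 0.8 * pw > 2.0 else "setosa"

print(classify(4.5, 1.5))  # versicolor
print(classify(1.4, 0.2))  # setosa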
Instance-Based Representations
Instance-Based Learning
- Instead of creating models, memorise the actual instances.
- The instances are a knowledge representation themselves.
- For new instances, search for their closest ones in the training set.
Instance-Based Representations (2)
Euclidean Distance
Metric computing the distance between instances $i$ and $i'$ with numeric attributes:

$$d(i, i') = \sqrt{\sum_{j=1}^{n} \left( x_{i,j} - x_{i',j} \right)^2}$$
Instance-Based Representations (3)
Hamming Distance
Metric computing the distance between instances $i$ and $i'$ with nominal attributes.

- Attribute $j$ contributes to $d(i, i')$:

$$\delta_j(i, i') = \begin{cases} 0, & \text{if } x_{i,j} = x_{i',j} \\ 1, & \text{if } x_{i,j} \neq x_{i',j} \end{cases}$$

- Contributes 0 if the attribute values are the same.
- Contributes 1 if the attribute values are different.
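A minimal sketch of both metrics, plus the nearest-neighbour lookup that instance-based learning performs; the training instances are made up:

import math

def euclidean(a, b):
    # Distance between numeric attribute vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hamming(a, b):
    # Each nominal attribute contributes 0 if equal, 1 otherwise.
    return sum(x != y for x, y in zip(a, b))

# Instance-based prediction: a new instance takes the class of its
# closest stored training instance (made-up numeric data).
training = [((5.1, 3.5), "setosa"), ((6.7, 3.1), "versicolor")]
new = (6.0, 3.0)
_, predicted = min(training, key=lambda t: euclidean(t[0], new))
print(predicted, hamming(("sunny", "high"), ("sunny", "normal")))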
Instance-Based Representations (4)
- Often, it is not desirable to store all training instances.
- Deciding which instances to (i) save and (ii) discard is an issue.
- Even though instance-based methods do not learn an explicit structure, the instances and the distance metric specify boundaries distinguishing the different classes.
- Some instance-based methods create rectangular regions containing instances of the same class.
Clusters
Clustering
Partitions the training set into regions, which can be:
- non-overlapping, i.e. each instance is in exactly one cluster (see the sketch below), or
- overlapping, i.e. an instance may appear in multiple clusters.
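A minimal sketch of non-overlapping clustering: one assignment step in the style of k-means, with made-up one-dimensional points and cluster centres.

# Non-overlapping clustering: assign each instance to its single nearest
# cluster centre (one k-means-style step on made-up 1-D data).
points = [1.0, 1.2, 5.0, 5.3, 9.1]
centres = [1.0, 5.0, 9.0]

clusters = {c: [] for c in centres}
for p in points:
    nearest = min(centres, key=lambda c: abs(p - c))
    clusters[nearest].append(p)   # each point joins exactly one cluster

print(clusters)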
Model Evaluation
https://xkcd.com/242/
Lecture Contents
- Section 3: Models
- Knowledge Representations
- Evaluation