Professional Documents
Culture Documents
• Classification
• Dividing up objects so that each is assigned into one of a number of (mutually exhaustive and
exclusive)categories know as Class
• Ex. People who are closely resemble, slightly resemble, not resemble some scene committing a
crime.
• predicts categorical class labels (discrete or nominal)
• classifies data (constructs a model) based on the training set and the values
(class labels) in a classifying attribute and uses it in classifying new data
• Prediction
• models continuous-valued functions, i.e., predicts unknown or missing
values
• Typical applications
• Credit approval
• Target marketing
• Medical diagnosis
• Fraud detection
7/10/22 Data Mining: Concepts and Techniques 1
Classification—A Two-Step Process
• Model construction: describing a set of predetermined classes
• Each tuple/sample is assumed to belong to a predefined class, as
determined by the class label attribute
• The set of tuples used for model construction is training set
• The model is represented as classification rules, decision trees, or
mathematical formulae
• Model usage: for classifying future or unknown objects
• Estimate accuracy of the model
• The known label of test sample is compared with the classified
result from the model
• Accuracy rate is the percentage of test set samples that are
correctly classified by the model
• Test set is independent of training set, otherwise over-fitting will
occur
• If the accuracy is acceptable, use the model to classify data tuples
whose class labels are not known
7/10/22 Data Mining: Concepts and Techniques 2
Day Wind Rain Status
Weekday No Yes Late
Weekend Yes No Late
Weekday No No Late
Weekday Yes No Late
Process (1): Model Construction
Classification
Algorithms
Training
Data
Testing
Data Unseen Data
(Weekday,No,No)
Day Wind Rain
Weekday No Yes Status?
Weekend Yes No
Weekday
Weekday
No
Yes
No
No
Late?
7/10/22 Data Mining: Concepts and Techniques 5
Supervised vs. Unsupervised Learning
A Z
Distance Measures
• Popular distance measures • Manhattan Distance(City Block
Distance):
• Euclidean Ditance:
• The CBD between two points (a1,a2)
• In 2D, training set (a1,a2) unseen and (b1,b2) is () +()
instance (b1,b2) the length • Maximum dimension distance:
(distance) is:
• The largest absolute distance
• ED between 2 points(a1,…,an), between any pair of corresponding
(b1,…,bn) in nD space attribute values
• Ex.(6.2, −7.1, −5.0, 18.3, −3.1, 8.9)
and (8.3,12.4,−4.1, 19.7, −6.2, 12.4)
is: 12.4 − (−7.1) = 19.5.