Minimize distance
But to Centers of Groups
Clustering
• First need to identify clusters
– Can be done automatically
– Often clusters determined by problem
• Then simple matter to measure distance
from new observation to each cluster
– Use the same distance measures as with memory-based reasoning
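The measurement step in the bullets above can be sketched as follows. This is a minimal illustration, assuming Euclidean distance as the measure (the slides do not say which one is used); the center coordinates, the cluster labels, and the helper names `euclidean` and `nearest_cluster` are hypothetical.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_cluster(observation, centers):
    """Return the label of the center closest to the observation."""
    return min(centers, key=lambda label: euclidean(observation, centers[label]))

# Hypothetical cluster centers on a 0-1 scale, for illustration only.
centers = {"A": (0.9, 0.8), "B": (0.3, 0.3), "C": (0.2, 0.6)}
new_observation = (0.25, 0.35)
print(nearest_cluster(new_observation, centers))  # prints "B"
```

Any other distance measure could be swapped in by replacing `euclidean`.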
Partitioning
• Define new categorical variables
– Divide data into fixed number (k) of regions
– K-means clustering
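One possible sketch of k-means partitioning is shown here. It assumes Euclidean distance, a fixed number of iterations, and starting centers supplied by the caller; the function names `assign` and `kmeans` and the sample points are hypothetical and not taken from the slides.

```python
import math

def assign(points, centers):
    """Index of the nearest center for every point (Euclidean distance)."""
    return [
        min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        for p in points
    ]

def kmeans(points, initial_centers, iterations=20):
    """Alternate assignment and mean-update steps a fixed number of times."""
    centers = list(initial_centers)
    for _ in range(iterations):
        labels = assign(points, centers)
        for j in range(len(centers)):
            members = [p for p, a in zip(points, labels) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # Final labels are computed against the final centers.
    return centers, assign(points, centers)

# Hypothetical two-variable observations, for illustration only.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (0.9, 1.0), (1.0, 0.9)]
centers, labels = kmeans(points, initial_centers=[points[0], points[3]])
print(labels)  # prints [0, 0, 0, 1, 1, 1]
```

The number of regions k is simply the number of starting centers. Results can depend on those starting centers, so in practice several starts are often compared.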
Clustering Uses
• Segment customers
– Find profitability of each,
treat accordingly
• Star classification:
– Red giants, white dwarfs,
normal
– Brightness & temperature
used to classify
• U.S. Army
– Identify sizes needed for
female soldiers
– (males – one size fits all)
Tires
• Segment customers into product
categories
– High end (they would buy Michelins)
– Intermediate & Low
• Standardize data (as in memory-based reasoning)
Raw Tire Data
BRAND          INCOME     AGE OF CAR
Michelin       $182,200   5 months
Michelin       $171,200   3 years
Goodyear       $28,800    7 years
Goodyear       $37,800    6 years
Goodyear       $42,200    5 years
Goodyear       $55,600    4 years
Goodyear       $51,200    9 years
Goodyear       $173,400   7 years
Opie’s tires   $13,400    3 years
Opie’s tires   $68,800    6 years
Standardize
• INCOME
– MIN(1,INCOME/200000)
• AGE OF CAR
– IF AGE OF CAR < 12 months: 1
– ELSE: MAX((8 - Years)/7, 0)
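The two standardization rules can be written as small functions. This is a sketch, assuming car age is expressed in years (so 5 months becomes 5/12); the function names are hypothetical.

```python
def standardize_income(income):
    """Scale income to the 0-1 range, capping at $200,000."""
    return min(1.0, income / 200_000)

def standardize_car_age(years):
    """Score 1 for cars under a year old, else (8 - years) / 7, floored at 0."""
    if years < 1:
        return 1.0
    return max((8 - years) / 7, 0.0)

# First two rows of the raw tire data: $182,200 / 5 months, $171,200 / 3 years.
print(round(standardize_income(182_200), 3), standardize_car_age(5 / 12))      # 0.911 1.0
print(round(standardize_income(171_200), 3), round(standardize_car_age(3), 3))  # 0.856 0.714
```

Both variables end up on the same 0-1 scale, so neither dominates a distance calculation merely because of its units.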
Sort Data by Outcome
BRAND          INCOME        AGE OF CAR
Michelin       High income   Bought this year
Michelin       High income   Bought 1-3 yrs ago
Goodyear       Low income    Bought 4+ yrs ago
Goodyear       Low income    Bought 4+ yrs ago
Goodyear       Low income    Bought 4+ yrs ago
Goodyear       Avg income    Bought 1-3 yrs ago
Goodyear       Avg income    Bought 4+ yrs ago
Goodyear       High income   Bought 4+ yrs ago
Opie’s tires   Low income    Bought 1-3 yrs ago
Opie’s tires   Avg income    Bought 4+ yrs ago
Standardized Training Data
BRAND          INCOME   AGE OF CAR
Michelin       0.911    1
Michelin       0.856    0.714
Goodyear       0.144    0.143
Goodyear       0.189    0.286
Goodyear       0.211    0.429
Goodyear       0.278    0.571
Goodyear       0.256    0
Goodyear       0.867    0.143
Opie’s tires   0.067    0.714
Opie’s tires   0.344    0.286
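A center for each group can be obtained by averaging the standardized rows above. The sketch below assumes the groups are the three brands and that the arithmetic mean is the summary used; the function name `cluster_means` is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Standardized training data: (brand, income, age of car).
rows = [
    ("Michelin", 0.911, 1.0), ("Michelin", 0.856, 0.714),
    ("Goodyear", 0.144, 0.143), ("Goodyear", 0.189, 0.286),
    ("Goodyear", 0.211, 0.429), ("Goodyear", 0.278, 0.571),
    ("Goodyear", 0.256, 0.0), ("Goodyear", 0.867, 0.143),
    ("Opie's tires", 0.067, 0.714), ("Opie's tires", 0.344, 0.286),
]

def cluster_means(rows):
    """Average income and car-age scores within each brand."""
    groups = defaultdict(list)
    for brand, income, car_age in rows:
        groups[brand].append((income, car_age))
    return {
        brand: (mean(i for i, _ in values), mean(a for _, a in values))
        for brand, values in groups.items()
    }

for brand, (income, car_age) in cluster_means(rows).items():
    print(f"{brand}: income={income:.3f}, car age={car_age:.3f}")
```

Swapping `mean` for `statistics.median` would give a median-based center instead.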
Identify Cluster Means
(could use median, mode)
BRAND INCOME CAR AGE