Introduction
• Course Roadmap
– Association Rules
– Cluster Analysis
– Smoothing Methods
ASSOCIATION RULES
• Also called
– Affinity Analysis
• Due to its origin in studies of customer purchase transaction databases
• Main Idea is
single transaction
Segmentation
Association rules
• Two-stage process
– Rule generation
• Apriori Algorithm
– Open RStudio
• Candidate Rules generation
– Select rules which are most likely to capture the true association
• “If-then” format
• Rule generation
– ‘concept of support’
– Support of a rule is
• Measures the degree of support the data provides for the validity of the rule
• Support
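As a minimal sketch of the support measure described above, the Python snippet below counts the transactions that contain both the antecedent and the consequent item sets and expresses that count as a percentage of all records. The `rule_support` helper and the toy transactions are illustrative assumptions, not part of the course material.

```python
def rule_support(transactions, antecedent, consequent):
    """Return (count, percent) of transactions containing every item of the rule."""
    # A rule "if antecedent then consequent" is supported by a transaction
    # only when the transaction contains all items from both item sets.
    items = set(antecedent) | set(consequent)
    count = sum(1 for t in transactions if items <= set(t))
    percent = 100.0 * count / len(transactions)
    return count, percent


# Hypothetical toy data: each transaction is the set of items in one purchase.
transactions = [
    {"tea", "milk", "sugar"},
    {"tea", "milk"},
    {"milk", "sugar"},
    {"tea", "milk", "sugar", "bread"},
]

count, percent = rule_support(transactions, {"tea", "milk"}, {"sugar"})
print(count, percent)  # 2 of the 4 transactions contain tea, milk and sugar
```

Support can be reported either as the raw count or as the percentage; the helper returns both so the two forms can be compared.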
Assignment 1
1) Which of the following is not an advantage of association rules?
The rules are transparent and easy to understand
Generates clear and simple rules
Generates too many rules
None of the above
2) In one of the frequent item-set examples, it is observed that if tea and milk are bought, then
sugar is also purchased by the customers. After generating an association rule among the given
set of items, it is inferred:
{Tea} is antecedent and {sugar} is consequent
{Tea} is antecedent and the item set {milk, sugar} is consequent
The item set {Tea, milk} is consequent and {sugar} is antecedent
The item set {Tea, milk} is antecedent and {sugar} is consequent
3) Support is:
Number of transactions with both antecedent and consequent item sets
Measures the degree of support the data provides for the validity of the rule
Expressed as a percentage of total records
All of the above
4) An online recommender system is an example of:
Cluster Analysis
Affinity analysis
Decision analysis
Both a and b
P(antecedent | consequent)
None of the above
8) A database has 5 transactions. Of these, 4 transactions include milk and bread. Further, of
these 4 transactions, 3 include cheese. Find the support percentage for the following
association rule: “if milk and bread are purchased, then cheese is also purchased”.
60%
75%
80%
None of the above
9) What are the methods to interpret the results after rule generation?
Absolute Mean
Lift ratio
Gini Index
Apriori
10) How can we best represent ‘benchmark confidence’ for the following association rule: “If X
and Y, then Z”?
{X,Y}/(Total number of transactions)
{Z}/{X,Y}
{X,Y,Z}/(Total number of transactions)
{Z}/(Total number of transactions)
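The measures named in the questions above can be sketched in a few lines of Python. This sketch assumes the usual textbook definitions: confidence is the share of antecedent transactions that also contain the consequent, benchmark confidence is the share of all transactions that contain the consequent, and the lift ratio is confidence divided by benchmark confidence. The helper names and the toy data are hypothetical.

```python
def count_containing(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t))


def rule_metrics(transactions, antecedent, consequent):
    """Return (confidence, benchmark_confidence, lift_ratio) for one rule.

    Assumes the antecedent and the consequent each occur at least once;
    otherwise the divisions below are undefined.
    """
    n_total = len(transactions)
    n_antecedent = count_containing(transactions, antecedent)
    n_consequent = count_containing(transactions, consequent)
    n_both = count_containing(transactions, set(antecedent) | set(consequent))

    confidence = n_both / n_antecedent   # estimate of P(consequent | antecedent)
    benchmark = n_consequent / n_total   # estimate of P(consequent)
    lift = confidence / benchmark        # > 1 suggests a useful association
    return confidence, benchmark, lift


# Hypothetical toy data for the rule "If X and Y, then Z".
transactions = [
    {"X", "Y", "Z"},
    {"X", "Y", "Z"},
    {"X"},
    {"Y"},
]

confidence, benchmark, lift = rule_metrics(transactions, {"X", "Y"}, {"Z"})
print(confidence, benchmark, lift)
```

A lift ratio above 1 indicates that the consequent appears more often alongside the antecedent than it does across all transactions.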
Assignment 2
3) A _____ is a tree diagram for displaying clustering results. Vertical lines represent clusters
that are joined together.
dendrogram
scattergram
scree plot
histogram
4) Which of the following is true about single-link clustering?
The distance between two clusters is the maximum distance between their members
The distance between two clusters is the minimum distance between their members
The distance between two clusters is the average distance between their members
None of the above
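The three between-cluster distances listed in the options above (minimum, maximum, and average distance between members) can be compared side by side. The sketch below assumes Euclidean distance between points; the function name and the two toy clusters are illustrative only.

```python
import math
from itertools import product


def cluster_distances(cluster_a, cluster_b):
    """Return (minimum, maximum, average) Euclidean distance between members."""
    # Every pair made of one member from each cluster.
    distances = [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]
    minimum = min(distances)                    # smallest pairwise distance
    maximum = max(distances)                    # largest pairwise distance
    average = sum(distances) / len(distances)   # mean pairwise distance
    return minimum, maximum, average


# Hypothetical toy clusters of 2-D points.
cluster_a = [(0, 0), (1, 0)]
cluster_b = [(3, 0), (5, 0)]

minimum, maximum, average = cluster_distances(cluster_a, cluster_b)
print(minimum, maximum, average)
```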
5) What would be the loss of information if we cluster the following observations into a single
group under Ward’s method?
(3, 3, 2, 0, 5, 2, 6, 4, 0)
34
35.9
34.9
35
6) What would be the loss of information if we cluster the following observations into six groups
under Ward’s method?
(0, 0, 3, 3, 1, 2, 2, 2, 4, 4, 5)
0
1
0.5
10
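For reference, the sketch below assumes that "loss of information" under Ward's method means the error sum of squares (ESS): the sum, over all groups, of the squared deviations of each observation from its group mean. The function name and the sample groupings are hypothetical and are not taken from the questions above.

```python
def error_sum_of_squares(groups):
    """ESS: squared deviations from each group's own mean, summed over groups."""
    total = 0.0
    for group in groups:
        mean = sum(group) / len(group)
        total += sum((x - mean) ** 2 for x in group)
    return total


# Hypothetical observations grouped in three different ways.
one_group = error_sum_of_squares([[1, 3, 10]])           # everything in one cluster
two_groups = error_sum_of_squares([[1, 3], [10]])        # 10 split off on its own
identical = error_sum_of_squares([[2, 2], [5, 5, 5]])    # only equal values grouped

print(one_group, two_groups, identical)
```

Grouping only identical values loses no information (ESS of 0), while merging dissimilar observations increases the ESS.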
7) The metric used for calculating distance between categorical data in clustering is:
Jaccard’s coefficient
Correlation coefficient
Matching coefficient
Maximum coordinate
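Two of the measures named above can be illustrated on binary (0/1) attribute vectors. This is a minimal sketch assuming the common definitions: the matching coefficient is the share of attributes on which two records agree (both 1 or both 0), and Jaccard's coefficient counts agreements on 1 only, ignoring attributes where both records are 0. The function names and sample records are hypothetical, and returning 0.0 when neither record has any 1 is a convention chosen for this sketch.

```python
def matching_coefficient(x, y):
    """Share of attributes where the two binary records agree (1-1 or 0-0)."""
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return matches / len(x)


def jaccard_coefficient(x, y):
    """Share of 1-1 agreements among attributes where at least one record is 1."""
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    # Convention assumed here: similarity is 0.0 when neither record has a 1.
    return both / either if either else 0.0


# Hypothetical binary records with four attributes each.
record_1 = [1, 0, 1, 0]
record_2 = [1, 1, 0, 0]

matching = matching_coefficient(record_1, record_2)
jaccard = jaccard_coefficient(record_1, record_2)
print(matching, jaccard)
```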
10) The metric based on the correlation coefficient to calculate distance can be defined as: