GRP 5

GENOBISA, CLAUDINE N.
GONZALES, JULZ NICHOLE A.
JUNSAY, RICHA ELIZ MONIQUE R.

Instructions:
• Write all the members' names in the top-left corner of the document.
• Insert your answer after each question.

Scenario 1: The Driving license trainer argues that his trainees always pass the driving
test on the first try if they have experience playing car racing games with a joystick.
Use: Driving license dataset

a) Your Task: Apply the APRIORI algorithm to determine whether the driving license
trainer's claim is valid or not. The following Apriori execution parameters should be
applied in order to obtain the required rules:
• Minsupport threshold (lowerBoundMinSupport parameter): 0.1
• Minconfidence threshold (metricType and minMetric parameters): 0.6
• Number of rules extracted (numRules parameter): 10

The result from the APRIORI algorithm shows the 4 best rules that were found. The first
rule, from playing car racing games with a joystick to passing the driving test on the
first try, has the highest confidence, 91%. Playing car racing games with a joystick covers
32 instances, and 29 of those instances also passed the driving test on the first try
(29/32 ≈ 0.91). The result shows that a trainee who plays car racing games with a joystick
almost always passes the driving test on the first try, though not in every case.
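
The 91% figure can be checked by hand from the two counts quoted in the answer. The short Python sketch below only repeats that arithmetic; it assumes that the 29 instances are the ones that satisfy both sides of the rule, and the variable names are ours, not part of the Apriori output.

```python
# Counts quoted in the answer above (assumed to come from the first rule).
antecedent_count = 32  # trainees who play car racing games with a joystick
both_count = 29        # of those, trainees who also passed on the first try

# Confidence of "joystick games => pass on first try".
confidence = both_count / antecedent_count

min_confidence = 0.6   # minMetric value given in the task
print(round(confidence * 100), "% confidence")
print("meets the threshold:", confidence >= min_confidence)
```

The rule clears the 0.6 threshold, but 3 of the 32 trainees did not pass on the first try, so "always" in the trainer's claim is slightly too strong.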

Scenario 2: The Basket-2018 dataset provides retail transactions of ABC supermarket,
which include: customer card number, amount, payment method, socio-demographic
data (gender, age, etc.) and the list of items purchased during the transaction. Use: Basket
(2018).csv

1. Discover association rules between purchased items to help the marketing department
in their upcoming special offer and sales promotion campaigns for 2019.
a) For this analysis, variables 1 to 7 should be removed. Why should variables 1
to 7 be removed?
- Variables 1 to 7 (Card number, Amount, Payment, Gender, Tenant, Income, and Age)
should be removed because they describe the customer and the payment, not the items
purchased. The goal here is to obtain association rules that represent links between
purchased items in this dataset, so these variables are not needed for the analysis.
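
As an illustration of this step outside the tool, the Python sketch below drops the first seven columns and keeps only the item columns. The two sample rows, the column spellings, and the T/F item flags are assumptions made for the example; the real Basket (2018).csv layout may differ.

```python
import csv
import io

# Hypothetical two-row sample; not taken from the real dataset.
SAMPLE = """CardNumber,Amount,Payment,Gender,Tenant,Income,Age,Bread,FrozenGoods,CannedVegetables
1001,42.5,CARD,F,NO,27000,34,T,T,F
1002,18.0,CASH,M,YES,31000,51,F,T,T
"""

N_DROPPED = 7  # variables 1 to 7 describe the customer and the payment

reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)
item_names = header[N_DROPPED:]

# Each transaction becomes the set of items flagged "T" (assumed encoding).
transactions = [
    {name for name, flag in zip(item_names, row[N_DROPPED:]) if flag == "T"}
    for row in reader
]

print(item_names)
print(transactions)
```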

b) The goal is to obtain association rules depicting links between purchased items.
The following Apriori execution parameters should be applied in order to obtain
the required rules:
o Minsupport threshold (lowerBoundMinSupport parameter): 0.1
o Minconfidence threshold (metricType and minMetric parameters): 0.7
o Number of rules extracted (numRules parameter): 5
c) Attach the result of the Apriori execution.

d) Based on the result of the Apriori execution, what is the best special offer
promotion package? Justify your answer by interpreting the result of the Apriori
execution.
- Based on the result of the Apriori execution, the best special offer promotion
package is the one in the first rule: Canned vegetables, Bread, and Frozen goods. This
rule has the highest confidence, 87%, compared to the second and third rules with 86%
and 84%, respectively.
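
To make the three parameters in (b) concrete, here is a minimal Python sketch of the Apriori idea: frequent itemsets are built level by level, and rules are then filtered by confidence and ranked. It is not the implementation used by the tool in this exercise, the five transactions are hypothetical, and the function names are ours.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for every itemset meeting min_support."""
    min_count = min_support * len(transactions)
    counts = {}
    items = sorted({item for t in transactions for item in t})
    level = [frozenset([item]) for item in items]
    while level:
        # Count each candidate and keep those that reach the threshold.
        frequent = {}
        for candidate in level:
            count = sum(candidate <= t for t in transactions)
            if count >= min_count:
                frequent[candidate] = count
        counts.update(frequent)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset.
        k = len(level[0])
        joined = {a | b for a in frequent for b in frequent if len(a | b) == k + 1}
        level = [c for c in joined
                 if all(frozenset(s) in frequent for s in combinations(c, k))]
    return counts

def best_rules(transactions, min_support, min_confidence, num_rules):
    """Return up to num_rules (confidence, lhs, rhs) tuples, best first."""
    counts = frequent_itemsets(transactions, min_support)
    found = []
    for itemset, count in counts.items():
        for size in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, size)):
                confidence = count / counts[lhs]
                if confidence >= min_confidence:
                    found.append((confidence, sorted(lhs), sorted(itemset - lhs)))
    found.sort(key=lambda rule: rule[0], reverse=True)
    return found[:num_rules]

# Hypothetical transactions, not taken from Basket (2018).csv.
transactions = [
    {"bread", "frozen goods", "canned vegetables"},
    {"bread", "frozen goods", "canned vegetables"},
    {"bread", "canned vegetables"},
    {"frozen goods", "fish"},
    {"bread", "fish"},
]

rules = best_rules(transactions, min_support=0.1, min_confidence=0.7, num_rules=5)
for confidence, lhs, rhs in rules:
    print(lhs, "=>", rhs, f"conf: {confidence:.2f}")
```

On this toy data four rules pass the 0.7 confidence threshold, so fewer than the requested five are returned; the numRules value is only an upper limit.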

2. Discover association rules between customer gender and purchased items to help the
marketing department in their promotion campaigns on:
• International Women’s Day.
• International Men’s Day.
b) For this analysis, variables 1 to 3 and 5 to 7 should be removed.
c) The goal is to obtain, for each customer gender (Gender = M and Gender = F),
association rules depicting links between gender and purchased items. The following
Apriori execution parameters should be applied in order to obtain the required rules:
o Minsupport threshold (lowerBoundMinSupport parameter): 0.1
o Minconfidence threshold (metricType and minMetric parameters): 0.8
o Number of rules extracted (numRules parameter): 10
d) Attach the result of the Apriori execution.
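
One way to picture the per-gender analysis in (c) is to split the transactions by gender and measure a rule inside each group. The Python sketch below does that on hypothetical records; the data, the rule tested, and the helper name confidence are assumptions for illustration, not output of the actual run.

```python
from collections import defaultdict

# Hypothetical records: (gender, items bought).
records = [
    ("F", {"cakes", "chocolates"}),
    ("F", {"cakes", "chocolates", "bread"}),
    ("F", {"bread"}),
    ("M", {"bread", "frozen goods"}),
    ("M", {"frozen goods"}),
]

# Group the item sets by gender so each group can be analysed separately.
by_gender = defaultdict(list)
for gender, items in records:
    by_gender[gender].append(items)

def confidence(transactions, lhs, rhs):
    """Share of the transactions containing lhs that also contain rhs."""
    covered = [t for t in transactions if lhs <= t]
    if not covered:
        return 0.0
    return sum(rhs <= t for t in covered) / len(covered)

for gender, group in sorted(by_gender.items()):
    value = confidence(group, {"cakes"}, {"chocolates"})
    print(gender, len(group), "transactions, cakes => chocolates:", value)
```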

e) Based on the result of the Apriori execution, write down the best promotion package
for the International Women’s Day? Justify your answer by interpreting the result of
the Apriori execution?
- According to the Apriori execution, cakes and chocolates are the best promotion
package for International Women's Day. The rule has a confidence value of 87% and a lift
value of 1.69; a lift above 1 means these items are bought together more often than
would be expected if they were purchased independently.

f) Based on the result of the Apriori execution, write down the best promotion package
for the International Men’s Day? Justify your answer and interpret the result of the
Apriori execution?
- Based on the Apriori execution, the best promotion package for International Men's
Day is canned vegetables, frozen goods, and bread. The rule has a confidence value of
97%, which is close to 1, and a lift value of 1.98, which means the items occur together
about twice as often as would be expected if they were purchased independently.
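
Lift is the rule's confidence divided by the share of transactions that contain the consequent. Using the rounded confidence and lift values quoted in answers (e) and (f), the Python sketch below works backwards to the consequent share that those values imply. The results are approximate because the inputs are rounded, and the function names are ours.

```python
def lift(confidence, consequent_support):
    """Lift of a rule: confidence relative to the consequent's baseline share."""
    return confidence / consequent_support

def implied_consequent_support(confidence, lift_value):
    """Rearranged lift formula: share of transactions holding the consequent."""
    return confidence / lift_value

# Rounded values as quoted in the answers above.
reported = [("Women's Day rule", 0.87, 1.69), ("Men's Day rule", 0.97, 1.98)]

shares = {}
for label, conf, lift_value in reported:
    shares[label] = implied_consequent_support(conf, lift_value)
    print(f"{label}: consequent in roughly {shares[label]:.0%} of transactions")
```

Read this way, each rule raises the chance of seeing its consequent from about one half of the transactions analysed to 87% and 97%, respectively.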

3. The supermarket noticed an increase in sales of healthy foods (Fruits & vegetables
and Fish). Public awareness of healthy eating and disease prevention increased in 2018
and has affected the consumption and sales of healthy foods. Your task is to discover
association rules between customer age group and purchases of healthy food items
(Fruits & vegetables and Fish).
a) For this analysis, variables 1 to 6 should be removed.

b) Discretize the age attribute using the Discretize filter under the Unsupervised/Attribute
filters. The discretization parameters should be as follows:
• Number of intervals set to 4 (bins parameter).
• Equal-width method (useEqualFrequency parameter set to False).
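
Equal-width discretization with 4 bins divides the range between the smallest and largest age into four intervals of the same length. The Python sketch below shows the idea on hypothetical ages; it assumes at least two distinct values, and the way it treats values that fall exactly on a boundary may differ from the Discretize filter.

```python
def equal_width_bins(values, bins=4):
    """Return (edges, labels): bin edges and the bin index of each value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins  # same width for every interval
    edges = [lo + i * width for i in range(bins + 1)]

    def bin_index(value):
        # The maximum value is placed in the last bin, not in a fifth one.
        return min(int((value - lo) / width), bins - 1)

    return edges, [bin_index(v) for v in values]

ages = [18, 25, 33, 41, 47, 58, 66, 78]  # hypothetical ages
edges, labels = equal_width_bins(ages, bins=4)
print(edges)
print(labels)
```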

c) Why should you discretize the age attribute?
- We should discretize the age attribute so that small fluctuations have less impact on
our model. Minor differences in age are mostly noise, and grouping ages into intervals
smooths them out. Discretizing also turns age into a nominal attribute, which Apriori
requires, so the age groups can appear in rules together with the remaining attributes.

d) The goal is to find association rules depicting links between age groups and healthy
food purchased items. The following Apriori execution parameters should be applied in
order to obtain the required rules:
● Minsupport threshold (lowerBoundMinSupport parameter): 0.1
● Minconfidence threshold (metricType and minMetric parameters): 0.7
● Number of rules extracted (numRules parameter): 10
e) Attach the result of the Apriori execution in text file and save it as Apriori -Age

f) Based on the result of the Apriori execution, write down if there is any relationship
between any of the 4 age groups and healthy food purchased items? Justify your
answer by interpreting the result of the Apriori execution?
- According to the result, Rule 1 shows 87% confidence that people who buy canned
vegetables and bread also buy frozen goods: canned vegetables and bread appear together
in 167 instances, and 146 of those instances also include frozen goods. Rule 2 shows 86%
confidence that people who buy frozen goods and bread also buy canned vegetables: frozen
goods and bread appear together in 170 instances, and 146 of those also include canned
vegetables. Rule 3 shows 84% confidence that people who buy canned vegetables and frozen
goods also buy bread: canned vegetables and frozen goods appear together in 173
instances, and 146 of those also include bread. None of these rules involves an age
group, which suggests that there is no relationship between age group and purchases of
healthy food items. Therefore, customers in any age group may purchase healthy foods.
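
The reasoning in this answer can be retraced with a few lines of Python: recompute each confidence from the instance counts quoted above, then check whether any item in a rule is an age group. The rule list is transcribed from the answer, and the assumption that an age attribute would be labelled with a name starting with "age" is ours.

```python
# (antecedent, consequent, antecedent count, count with consequent too)
rules = [
    ({"canned vegetables", "bread"}, {"frozen goods"}, 167, 146),
    ({"frozen goods", "bread"}, {"canned vegetables"}, 170, 146),
    ({"canned vegetables", "frozen goods"}, {"bread"}, 173, 146),
]

def mentions_age(lhs, rhs):
    # Assumed naming convention: an age-group item would start with "age".
    return any(item.lower().startswith("age") for item in lhs | rhs)

percentages = []
age_rules = []
for lhs, rhs, lhs_count, both_count in rules:
    percentages.append(round(100 * both_count / lhs_count))
    if mentions_age(lhs, rhs):
        age_rules.append((lhs, rhs))

print(percentages)
print("rules involving an age group:", len(age_rules))
```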
