
11/29/23, 11:21 PM ML Assignment 02 - Colaboratory

# importing module
import pandas as pd
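# loading the dataset
# note: the file has no header row, so as written pandas takes the first
# transaction (shrimp, almonds, avocado, ...) as column names, as data.head()
# shows; passing header=None would keep that transaction as a row of data.
data = pd.read_csv("/content/drive/MyDrive/Market_Basket_Optimisation.csv")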
# printing the shape of the dataset
data.shape

(7500, 20)

# printing the first five rows
data.head()

   shrimp          almonds    avocado     vegetables mix    green grapes  whole weat flour  yams  cottage cheese  energy drink  tomato juice  ...
0  burgers         meatballs  eggs        NaN               NaN           NaN               NaN   NaN             NaN           NaN           ...
1  chutney         NaN        NaN         NaN               NaN           NaN               NaN   NaN             NaN           NaN           ...
2  turkey          avocado    NaN         NaN               NaN           NaN               NaN   NaN             NaN           NaN           ...
3  mineral water   milk       energy bar  whole wheat rice  green tea     NaN               NaN   NaN             NaN           NaN           ...
4  low fat yogurt  NaN        NaN         NaN               NaN           NaN               NaN   NaN             NaN           NaN           ...
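
The head() output shows the first basket (shrimp, almonds, avocado, ...) used as column names. A minimal sketch, assuming the CSV holds one comma-separated transaction per line with no header, of reading it with `header=None` and turning each row into a list of items without the NaN padding. The `SAMPLE` string and the `load_transactions` helper are hypothetical stand-ins for the real file and notebook code.

```python
import io

import pandas as pd

# Hypothetical stand-in for Market_Basket_Optimisation.csv: one transaction
# per line, no header row, ragged lengths. pandas infers the column count
# from the first line, so this assumes no later line is longer than it.
SAMPLE = """shrimp,almonds,avocado
burgers,meatballs,eggs
chutney
turkey,avocado
"""


def load_transactions(source) -> list[list[str]]:
    """Read a headerless basket CSV into a list of transactions."""
    # header=None keeps the first line as data instead of column names.
    frame = pd.read_csv(source, header=None)
    # dropna() removes the NaN padding pandas adds to shorter rows.
    return [row.dropna().tolist() for _, row in frame.iterrows()]


transactions = load_transactions(io.StringIO(SAMPLE))
print(len(transactions))  # 4
print(transactions[0])    # ['shrimp', 'almonds', 'avocado']
```

With the real file, the same call on the Drive path would return one list per basket, which is also the input shape that basket encoders expect.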

# importing module
import numpy as np
# Gather all items of all transactions into one flat NumPy array
# (NaN cells become the string "nan" once the list is converted to an array)
transaction = []
for i in range(0, data.shape[0]):
    for j in range(0, data.shape[1]):
        transaction.append(data.values[i, j])
# converting to numpy array
transaction = np.array(transaction)
# Transform it into a pandas DataFrame
df = pd.DataFrame(transaction, columns=["items"])
# Put 1 next to each item to make a countable table for the group-by
df["incident_count"] = 1
# Delete NaN items (stored as the string "nan" after the NumPy conversion)
indexNames = df[df['items'] == "nan"].index
df.drop(indexNames, inplace=True)
# Making a new pandas DataFrame of item counts for the visualizations
df_table = df.groupby("items").sum().sort_values("incident_count", ascending=False).reset_index()
# Initial Visualizations
df_table.head(10).style.background_gradient(cmap='Greens')

   items              incident_count
0  mineral water      1787
1  eggs               1348
2  spaghetti          1306
3  french fries       1282
4  chocolate          1230
5  green tea          990
6  milk               972
7  ground beef        737
8  frozen vegetables  715
9  pancakes           713
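
The flatten, filter and group-by steps above can be condensed. A sketch, assuming a DataFrame shaped like `data` (one transaction per row, NaN padding): flattening the underlying array, dropping the real NaN values and calling `value_counts()` gives the same kind of item frequency table without the intermediate `"nan"` strings. The small `data` frame here is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the basket table: one transaction per row,
# NaN where a basket has fewer items than the widest one.
data = pd.DataFrame(
    [
        ["burgers", "meatballs", "eggs"],
        ["chutney", np.nan, np.nan],
        ["turkey", "avocado", np.nan],
        ["eggs", "mineral water", np.nan],
        ["mineral water", "eggs", "burgers"],
    ]
)

# Flatten all cells, drop the NaN padding, then count each item.
item_counts = (
    pd.Series(data.to_numpy().ravel())
    .dropna()
    .value_counts()
    .rename_axis("items")
    .reset_index(name="incident_count")
)
print(item_counts.head(3))
```

Because NaN is never converted to text here, no string comparison against `"nan"` is needed.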

# importing required module
import plotly.express as px
# give every item the same root node so the treemap has a single origin
df_table["all"] = "all"
# creating a treemap of the 30 most frequent items using plotly
fig = px.treemap(df_table.head(30), path=['all', "items"], values='incident_count',
                 color=df_table["incident_count"].head(30), hover_data=['items'],
                 color_continuous_scale='Greens',
                 )
# plotting the treemap
fig.show()

https://colab.research.google.com/drive/1b2mSuZT01tjOr-aD6thEBhRUTWouswj5#scrollTo=AaLUrF-sMHhi&printMode=true 1/5


Figure: treemap of the 30 most frequent items under a single root "all"; tile size and colour follow incident_count on a Greens colour scale (colour bar from about 400 to 1600). Tiles: mineral water, eggs, spaghetti, french fries, chocolate, green tea, milk, ground beef, frozen vegetables, pancakes, burgers, cake, cookies, escalope, low fat yogurt, shrimp, tomatoes, turkey, chicken, whole wheat rice, grated cheese, soup, honey, champagne, olive oil, herb & pepper, frozen smoothie, cooking oil, fresh bread, salmon.

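# importing the required module
from mlxtend.preprocessing import TransactionEncoder
# initializing the TransactionEncoder
te = TransactionEncoder()
# NOTE: `transaction` is a flat array of item strings (including the "nan"
# padding), not a list of transactions. TransactionEncoder iterates over each
# string, so every row below is one cell of the CSV and every column is a
# single character, not a product.
te_ary = te.fit(transaction).transform(transaction)
dataset = pd.DataFrame(te_ary, columns=te.columns_)
# dataset after encoding
dataset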

                   &      a      b      c      d      e      f      g      h  ...      p      r
     0  False  False  False   True  False  False   True  False   True  False  ...  False   True
     1  False  False   True   True  False  False   True  False  False  False  ...  False  False
     2  False  False  False  False  False  False   True  False   True  False  ...  False  False
     3  False  False   True  False  False  False  False  False  False  False  ...  False  False
     4  False  False   True  False  False  False  False  False  False  False  ...  False  False
   ...    ...    ...    ...    ...    ...    ...    ...    ...    ...    ...  ...    ...    ...
149995  False  False   True  False  False  False  False  False  False  False  ...  False  False
149996  False  False   True  False  False  False  False  False  False  False  ...  False  False
149997  False  False   True  False  False  False  False  False  False  False  ...  False  False
149998  False  False   True  False  False  False  False  False  False  False  ...  False  False
149999  False  False   True  False  False  False  False  False  False  False  ...  False  False

150000 rows × 27 columns
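
The 27 single-character columns come from passing a flat array of strings to the encoder: iterating over a Python string yields its characters, so each item name is treated as a "basket" of letters. A sketch of the intended one-hot table, built with plain pandas so it runs without mlxtend; the baskets are the first rows from the `head()` output above, and the variable names are illustrative. mlxtend's `TransactionEncoder` expects the same input shape, a list of transactions where each transaction is a list of items.

```python
import pandas as pd

# Baskets taken from the head() output: a list of lists of item names.
transactions = [
    ["burgers", "meatballs", "eggs"],
    ["chutney"],
    ["turkey", "avocado"],
    ["mineral water", "milk", "energy bar", "whole wheat rice", "green tea"],
]

# The pitfall: a bare string is itself iterable, so an encoder that
# receives "burgers" as a basket sees the letters b, u, r, g, e, s.
letters = sorted(set("burgers"))
print(letters)  # ['b', 'e', 'g', 'r', 's', 'u']

# Intended encoding: one boolean column per distinct item,
# one row per transaction.
items = sorted({item for basket in transactions for item in basket})
baskets = [set(basket) for basket in transactions]
onehot = pd.DataFrame(
    [[item in basket for item in items] for basket in baskets],
    columns=items,
)
print(onehot.shape)  # (4, 11)
```

With mlxtend installed, `TransactionEncoder().fit(transactions).transform(transactions)` on the same list of lists should give an equivalent boolean matrix, with `columns_` holding the sorted item names.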

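# select the names of the top 50 items by frequency (not used further below)
first50 = df_table["items"].head(50).values
# NOTE: .loc[:50] is label-based and inclusive, so this keeps the first 51
# rows (encoded cells), not the 50 most frequent item columns
dataset = dataset.loc[:50]
# shape of the dataset
dataset.shape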

(51, 27)
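
With one row per transaction and one boolean column per item, "top 50 items" would normally mean keeping the 50 most frequent columns while keeping every row. A minimal sketch under that assumption; `top_n_columns` is a hypothetical helper and the small `onehot` table is made up for illustration.

```python
import pandas as pd


def top_n_columns(onehot: pd.DataFrame, n: int) -> pd.DataFrame:
    """Keep the n most frequent item columns and every transaction row."""
    # Column sums of a boolean frame count how many baskets contain each item.
    counts = onehot.sum(axis=0).sort_values(ascending=False, kind="stable")
    return onehot[counts.index[:n]]


# Hypothetical one-hot table: 4 transactions x 4 items.
onehot = pd.DataFrame(
    {
        "eggs": [True, False, True, True],
        "milk": [False, True, True, False],
        "tea": [False, False, False, True],
        "yams": [False, False, False, False],
    }
)
top2 = top_n_columns(onehot, 2)
print(top2.shape)  # (4, 2)
```

On a correctly encoded table, indexing with the `first50` names computed above (`dataset[first50]`) would give a similar column selection.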

# importing the required module
from mlxtend.frequent_patterns import apriori, association_rules
# Extracting the most frequent itemsets via mlxtend.
# The length column has been added to make filtering easier.
frequent_itemsets = apriori(dataset, min_support=0.01, use_colnames=True)
frequent_itemsets['length'] = frequent_itemsets['itemsets'].apply(len)
# printing the frequent itemsets
frequent_itemsets


Showing 1 to 25 of 379 entries

    support               itemsets               length
0   0.9215686274509803    frozenset({'a'})       1
1   0.0392156862745098    frozenset({'b'})       1
2   0.0392156862745098    frozenset({'c'})       1
3   0.0196078431372549    frozenset({'d'})       1
4   0.09803921568627451   frozenset({'e'})       1
5   0.0392156862745098    frozenset({'g'})       1
6   0.0196078431372549    frozenset({'h'})       1
7   0.0196078431372549    frozenset({'k'})       1
8   0.0196078431372549    frozenset({'l'})       1
9   0.0196078431372549    frozenset({'m'})       1
10  0.9019607843137255    frozenset({'n'})       1
11  0.0196078431372549    frozenset({'o'})       1
12  0.0392156862745098    frozenset({'r'})       1
13  0.058823529411764705  frozenset({'s'})       1
14  0.058823529411764705  frozenset({'t'})       1
15  0.058823529411764705  frozenset({'u'})       1
16  0.0196078431372549    frozenset({'v'})       1
17  0.0392156862745098    frozenset({'y'})       1
18  0.0196078431372549    frozenset({'b', 'a'})  2
19  0.0196078431372549    frozenset({'c', 'a'})  2
20  0.0196078431372549    frozenset({'d', 'a'})  2
21  0.0196078431372549    frozenset({'e', 'a'})  2
22  0.0196078431372549    frozenset({'l', 'a'})  2
23  0.0196078431372549    frozenset({'m', 'a'})  2
24  0.8823529411764706    frozenset({'n', 'a'})  2

# printing the frequent itemsets of length 2 with support of at least 0.05
frequent_itemsets[
    (frequent_itemsets['length'] == 2) &
    (frequent_itemsets['support'] >= 0.05)
]

    support   itemsets  length
24  0.882353  (n, a)    2
55  0.058824  (s, e)    2
56  0.058824  (e, t)    2
57  0.058824  (e, u)    2

# printing the frequent itemsets of length 3
frequent_itemsets[(frequent_itemsets['length'] == 3)].head(3)

    support   itemsets   length
88  0.019608  (e, b, a)  3
89  0.019608  (l, b, a)  3
90  0.019608  (m, b, a)  3
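
`apriori` keeps every itemset whose support (the fraction of transactions containing all of its items) is at least `min_support`, growing candidates one item at a time from the survivors of the previous level and pruning any candidate with an infrequent subset. A minimal pure-Python sketch of that level-wise search, for intuition only and not mlxtend's actual code; `apriori_sketch` and the toy baskets are hypothetical.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset.
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items.
    level = {frozenset([i]): 0.0 for b in baskets for i in b}
    level = {s: support(s) for s in level}
    level = {s: v for s, v in level.items() if v >= min_support}
    frequent = dict(level)

    k = 2
    while level:
        # Join step: unions of frequent (k-1)-itemsets that have size k.
        prev = list(level)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = {c: support(c) for c in candidates}
        level = {s: v for s, v in level.items() if v >= min_support}
        frequent.update(level)
        k += 1
    return frequent


transactions = [
    ["eggs", "milk", "bread"],
    ["eggs", "milk"],
    ["eggs", "bread"],
    ["milk", "tea"],
]
result = apriori_sketch(transactions, min_support=0.5)
```

With these toy baskets `result` should hold eggs, milk, bread, {eggs, milk} and {eggs, bread}; {eggs, milk, bread} is pruned because {milk, bread} appears in only one basket.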
# We use "lift" as the metric to judge whether antecedents and consequents are
# dependent or not (lift > 1 suggests a positive association); only rules with
# a lift of at least 1.2 are kept
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.2)
rules["antecedents_length"] = rules["antecedents"].apply(len)
rules["consequents_length"] = rules["consequents"].apply(len)
rules.sort_values("lift", ascending=False)

      antecedents  consequents   antecedent support  consequent support  support   confidence  lift       leverage  conviction  zhangs_metric  ...
2445  (m, a)       (e, l, t)     0.019608            0.019608            0.019608  1.000000    51.000000  0.019223  inf         1.000000       ...
2150  (e, l, a)    (t, b)        0.019608            0.019608            0.019608  1.000000    51.000000  0.019223  inf         1.000000       ...
3886  (s, m, e)    (t, b, a)     0.019608            0.019608            0.019608  1.000000    51.000000  0.019223  inf         1.000000       ...
3885  (s, a, e)    (t, m, b)     0.019608            0.019608            0.019608  1.000000    51.000000  0.019223  inf         1.000000       ...
3884  (s, t, e)    (m, b, a)     0.019608            0.019608            0.019608  1.000000    51.000000  0.019223  inf         1.000000       ...
...   ...          ...           ...                 ...                 ...       ...         ...        ...       ...         ...            ...
539   (s, e)       (t)           0.058824            0.058824            0.019608  0.333333    5.666667   0.016148  1.411765    0.875000       ...
540   (e, t)       (s)           0.058824            0.058824            0.019608  0.333333    5.666667   0.016148  1.411765    0.875000       ...
545   (s, e)       (u)           0.058824            0.058824            0.019608  0.333333    5.666667   0.016148  1.411765    0.875000       ...
18    (e)          (c)           0.098039            0.039216            0.019608  0.200000    5.100000   0.015763  1.200980    0.891304       ...
19    (c)          (e)           0.039216            0.098039            0.019608  0.500000    5.100000   0.015763  1.803922    0.836735       ...

4890 rows × 12 columns

# Sort values based on confidence
rules.sort_values("confidence", ascending=False)

      antecedents  consequents   antecedent support  consequent support  support   confidence  lift       leverage  conviction  zhangs_metric  ...
2445  (m, a)       (e, l, t)     0.019608            0.019608            0.019608  1.0         51.0       0.019223  inf         1.00           ...
2728  (e, t, m)    (s, b)        0.019608            0.039216            0.019608  1.0         25.5       0.018839  inf         0.98           ...
2692  (s, l, e)    (t, b)        0.019608            0.019608            0.019608  1.0         51.0       0.019223  inf         1.00           ...
2693  (s, l, t)    (e, b)        0.019608            0.039216            0.019608  1.0         25.5       0.018839  inf         0.98           ...
2694  (e, l, t)    (s, b)        0.019608            0.039216            0.019608  1.0         25.5       0.018839  inf         0.98           ...
...   ...          ...           ...                 ...                 ...       ...         ...        ...       ...         ...            ...
763   (e)          (t, b, a)     0.098039            0.019608            0.019608  0.2         10.2       0.017686  1.22549     1.00           ...
2859  (e)          (n, t, c, h)  0.098039            0.019608            0.019608  0.2         10.2       0.017686  1.22549     1.00           ...
739   (e)          (m, b, a)     0.098039            0.019608            0.019608  0.2         10.2       0.017686  1.22549     1.00           ...
727   (e)          (l, b, a)     0.098039            0.019608            0.019608  0.2         10.2       0.017686  1.22549     1.00           ...
1611  (e)          (n, t, h)     0.098039            0.019608            0.019608  0.2         10.2       0.017686  1.22549     1.00           ...

4890 rows × 12 columns
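
Every metric column in the rules tables is a function of three supports: of the antecedent A, of the consequent C, and of A and C together. A small sketch of the usual definitions, checked against row 18 of the lift-sorted table, (e) -> (c), whose supports are 5/51, 2/51 and 1/51 on this 51-row dataset. `rule_metrics` is a hypothetical helper, not an mlxtend function.

```python
import math


def rule_metrics(s_a: float, s_c: float, s_ac: float) -> dict:
    """Association-rule metrics for A -> C from the three supports."""
    confidence = s_ac / s_a
    lift = confidence / s_c
    leverage = s_ac - s_a * s_c
    # Conviction is infinite when the rule never fails (confidence == 1).
    conviction = math.inf if confidence == 1 else (1 - s_c) / (1 - confidence)
    zhang = leverage / max(s_ac * (1 - s_a), s_a * (s_c - s_ac))
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
        "zhangs_metric": zhang,
    }


# Row 18 of the lift-sorted rules: (e) -> (c) on the 51-row dataset.
m = rule_metrics(s_a=5 / 51, s_c=2 / 51, s_ac=1 / 51)
for name, value in m.items():
    print(f"{name}: {value:.6f}")
```

Rounded to six decimals these give 0.200000, 5.100000, 0.015763, 1.200980 and 0.891304, the values printed for that row. A rule whose three supports are all 1/51, like row 2445, gets lift 51 and infinite conviction.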
