ML Assignment 02 - Aqsa Mushtaq

11/29/23, 11:21 PM ML Assignment 02 - Colaboratory
# importing module
import pandas as pd
# dataset
data = pd.read_csv("/content/drive/MyDrive/Market_Basket_Optimisation.csv")
# printing the shape of the dataset
data.shape
(7500, 20)
# printing the first five rows

data.head()
           shrimp    almonds     avocado    vegetables mix  green grapes  whole weat flour  yams  cottage cheese  energy drink  tomato juice
0         burgers  meatballs        eggs               NaN           NaN               NaN   NaN             NaN           NaN           NaN
1         chutney        NaN         NaN               NaN           NaN               NaN   NaN             NaN           NaN           NaN
2          turkey    avocado         NaN               NaN           NaN               NaN   NaN             NaN           NaN           NaN
3   mineral water       milk  energy bar  whole wheat rice     green tea               NaN   NaN             NaN           NaN           NaN
4  low fat yogurt        NaN         NaN               NaN           NaN               NaN   NaN             NaN           NaN           NaN
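The column names printed above (shrimp, almonds, avocado, ...) are themselves grocery items, which suggests that the first basket in the file was read as the header row. A minimal sketch of the difference, assuming the file has no header line; the three-line `sample_csv` string is a made-up stand-in for the real file:

```python
import io
import pandas as pd

# Hypothetical three-basket sample standing in for Market_Basket_Optimisation.csv
sample_csv = "shrimp,almonds,avocado\nburgers,meatballs,eggs\nchutney,,\n"

# Default behaviour: the first line becomes the column names
with_header = pd.read_csv(io.StringIO(sample_csv))

# header=None keeps the first line as data and numbers the columns 0, 1, 2
no_header = pd.read_csv(io.StringIO(sample_csv), header=None)

print(with_header.shape)
print(no_header.shape)
```

With `header=None` the first basket stays in the data, so the row count grows by one.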
# importing module
import numpy as np
# Gather all items of each transaction into a NumPy array
transaction = []
for i in range(0, data.shape[0]):
    for j in range(0, data.shape[1]):
        transaction.append(data.values[i,j])
# converting to numpy array
transaction = np.array(transaction)
# Transform them into a pandas DataFrame
df = pd.DataFrame(transaction, columns=["items"])
# Assign a count of 1 to each item so the table can be aggregated with groupby
df["incident_count"] = 1
# Delete NaN items from the dataset (np.array above turned missing values into the string "nan")
indexNames = df[df['items'] == "nan" ].index
df.drop(indexNames , inplace=True)
# Making a New Appropriate Pandas DataFrame for Visualizations
df_table = df.groupby("items").sum().sort_values("incident_count", ascending=False).reset_index()
# Initial Visualizations
df_table.head(10).style.background_gradient(cmap='Greens')
items incident_count
0 mineral water 1787
1 eggs 1348
2 spaghetti 1306
3 french fries 1282
4 chocolate 1230
5 green tea 990
6 milk 972
7 ground beef 737
8 frozen vegetables 715
9 pancakes 713
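The same item counts can be obtained without the explicit double loop. A minimal sketch, assuming a wide table with one basket per row and NaN in the unused cells; the three baskets and the name `df_counts` are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical mini dataset: one row per basket, NaN where a basket has fewer items
baskets = pd.DataFrame([
    ["eggs", "milk", np.nan],
    ["eggs", np.nan, np.nan],
    ["milk", "eggs", "green tea"],
])

# Flatten all cells into one Series, drop the missing values, count each item
counts = pd.Series(baskets.to_numpy().ravel()).dropna().value_counts()

# Same two-column layout as df_table above
df_counts = counts.rename_axis("items").reset_index(name="incident_count")
print(df_counts)
```

Here the missing cells are dropped as real NaN values, so no comparison with the string "nan" is needed.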
# importing required module

import plotly.express as px
# add a common root node so that all items share the same parent in the treemap
df_table["all"] = "all"
# creating tree map using plotly
fig = px.treemap(df_table.head(30), path=['all', "items"], values='incident_count',
                 color=df_table["incident_count"].head(30), hover_data=['items'],
                 color_continuous_scale='Greens',
                 )
# plotting the treemap
fig.show()
[Figure: Plotly treemap of the 30 most frequent items under the root node "all"; tile size and color show incident_count (color scale 'Greens', color bar from 400 to 1600).]
# importing the required module

from mlxtend.preprocessing import TransactionEncoder
# initializing the transactionEncoder
te = TransactionEncoder()
te_ary = te.fit(transaction).transform(transaction)
dataset = pd.DataFrame(te_ary, columns=te.columns_)
# dataset after encoding
dataset
                &      a      b      c      d      e      f      g      h    ...      p      r
0    False  False  False   True  False  False   True  False   True  False    ...  False   True
1    False  False   True   True  False  False   True  False  False  False    ...  False  False
2    False  False  False  False  False  False   True  False   True  False    ...  False  False
3    False  False   True  False  False  False  False  False  False  False    ...  False  False
...    ...    ...    ...    ...    ...    ...    ...    ...    ...    ...    ...    ...    ...
150000 rows × 27 columns
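The encoded columns above are single characters rather than item names. Iterating over a Python string yields its characters, so this is what one would expect if each item string in the flat `transaction` array is treated as a transaction of its own. A sketch of a per-basket encoding in plain pandas, assuming one list of items per row of the original data; `baskets`, `items`, and `onehot` are illustrative names and the three baskets are made up:

```python
import numpy as np
import pandas as pd

# Hypothetical mini dataset in the same wide layout as the CSV
data = pd.DataFrame([
    ["burgers", "meatballs", "eggs"],
    ["chutney", np.nan, np.nan],
    ["mineral water", "eggs", np.nan],
])

# A single string iterates character by character
print(sorted(set("eggs")))

# One list of items per basket, skipping missing cells
baskets = [[item for item in row if pd.notna(item)] for row in data.to_numpy()]

# One boolean column per distinct item
items = sorted({item for basket in baskets for item in basket})
onehot = pd.DataFrame(
    [[item in basket for item in items] for basket in baskets],
    columns=items,
)
print(onehot)
```

A list of lists such as `baskets` is also the input layout that mlxtend's `TransactionEncoder` is documented to accept, so the same lists could be passed to `te.fit(...).transform(...)`.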
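# select the 50 most frequent items (not used below)
first50 = df_table["items"].head(50).values
# keep the first 51 rows of the encoded dataset (.loc slicing includes the end label 50)
dataset = dataset.loc[:50]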
# shape of the dataset
dataset.shape
(51, 27)
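The shape (51, 27) follows from label-based slicing: `.loc[:50]` includes the end label, so it returns rows 0 through 50. It selects rows, not items. A small sketch of the difference on a hypothetical one-hot table, including one way to restrict the columns to a list of chosen items (`top_items` is an illustrative name):

```python
import pandas as pd

# Hypothetical one-hot table: 4 baskets, 3 items
onehot = pd.DataFrame({
    "eggs": [True, False, True, True],
    "milk": [False, True, True, False],
    "tea":  [False, False, True, False],
})

by_label = onehot.loc[:2]      # labels 0, 1, 2 -> 3 rows (end label included)
by_position = onehot.iloc[:2]  # positions 0, 1 -> 2 rows (end excluded)

# Restricting to chosen items is a column selection
top_items = ["eggs", "milk"]
subset = onehot[top_items]

print(by_label.shape, by_position.shape, subset.shape)
```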
# importing the required module

from mlxtend.frequent_patterns import apriori, association_rules
# Extracting the most frequent itemsets via mlxtend.

# The length column has been added to increase ease of filtering.
frequent_itemsets = apriori(dataset, min_support=0.01, use_colnames=True)
frequent_itemsets['length'] = frequent_itemsets['itemsets'].apply(lambda x: len(x))
# printing the frequent itemset
frequent_itemsets
index  support               itemsets               length
0      0.9215686274509803    frozenset({'a'})       1
1      0.0392156862745098    frozenset({'b'})       1
2      0.0392156862745098    frozenset({'c'})       1
3      0.0196078431372549    frozenset({'d'})       1
4      0.09803921568627451   frozenset({'e'})       1
5      0.0392156862745098    frozenset({'g'})       1
6      0.0196078431372549    frozenset({'h'})       1
7      0.0196078431372549    frozenset({'k'})       1
8      0.0196078431372549    frozenset({'l'})       1
9      0.0196078431372549    frozenset({'m'})       1
10     0.9019607843137255    frozenset({'n'})       1
11     0.0196078431372549    frozenset({'o'})       1
12     0.0392156862745098    frozenset({'r'})       1
13     0.058823529411764705  frozenset({'s'})       1
14     0.058823529411764705  frozenset({'t'})       1
15     0.058823529411764705  frozenset({'u'})       1
16     0.0196078431372549    frozenset({'v'})       1
17     0.0392156862745098    frozenset({'y'})       1
18     0.0196078431372549    frozenset({'b', 'a'})  2
19     0.0196078431372549    frozenset({'c', 'a'})  2
20     0.0196078431372549    frozenset({'d', 'a'})  2
21     0.0196078431372549    frozenset({'e', 'a'})  2
22     0.0196078431372549    frozenset({'l', 'a'})  2
23     0.0196078431372549    frozenset({'m', 'a'})  2
24     0.8823529411764706    frozenset({'n', 'a'})  2
(entries 1 to 25 of 379 shown)
# printing the frequent itemsets of length 2 with support of at least 0.05
frequent_itemsets[ (frequent_itemsets['length'] == 2) &
                   (frequent_itemsets['support'] >= 0.05) ]
     support itemsets  length
24  0.882353   (n, a)       2
55  0.058824   (s, e)       2
56  0.058824   (e, t)       2
57  0.058824   (e, u)       2
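Because each entry of the `itemsets` column is a `frozenset`, membership tests can be used in filters as well. A sketch, assuming a small hand-made table in the same layout as `frequent_itemsets`; the four rows reuse the length-2 supports reported above, and `contains_e` and `pairs_with_e` are illustrative names:

```python
import pandas as pd

# Hand-made table in the same layout as the apriori output
frequent_itemsets = pd.DataFrame({
    "support": [0.882353, 0.058824, 0.058824, 0.058824],
    "itemsets": [frozenset({"n", "a"}), frozenset({"s", "e"}),
                 frozenset({"e", "t"}), frozenset({"e", "u"})],
})
frequent_itemsets["length"] = frequent_itemsets["itemsets"].apply(len)

# Keep the pairs that contain item 'e'
contains_e = frequent_itemsets["itemsets"].apply(lambda s: "e" in s)
pairs_with_e = frequent_itemsets[(frequent_itemsets["length"] == 2) & contains_e]
print(pairs_with_e)
```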
# printing the frequent itemsets with length 3
frequent_itemsets[ (frequent_itemsets['length'] == 3) ].head(3)
     support   itemsets  length
88  0.019608  (e, b, a)       3
89  0.019608  (l, b, a)       3
90  0.019608  (m, b, a)       3
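Support is the fraction of transactions that contain every item of an itemset. With 51 rows, the supports reported above are multiples of 1/51; for example 0.019608 is 1/51 and 0.058824 is 3/51. A minimal sketch of the computation, using made-up transactions and an illustrative helper named `support`:

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset.issubset(t))
    return hits / len(transactions)

# Made-up transactions, each a set of items
transactions = [
    {"eggs", "milk"},
    {"eggs"},
    {"milk", "eggs", "green tea"},
    {"green tea"},
]

print(support({"eggs"}, transactions))          # 3 of 4 transactions
print(support({"eggs", "milk"}, transactions))  # 2 of 4 transactions
print(round(1 / 51, 6), round(3 / 51, 6))
```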
# We set our metric as "lift" to determine whether antecedents & consequents are dependent or not
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.2)
rules["antecedents_length"] = rules["antecedents"].apply(lambda x: len(x))
rules["consequents_length"] = rules["consequents"].apply(lambda x: len(x))
rules.sort_values("lift",ascending=False)
      antecedents   consequents  antecedent support  consequent support   support  confidence       lift  leverage  conviction  zhangs_metric  ...
2445       (m, a)     (e, l, t)            0.019608            0.019608  0.019608    1.000000  51.000000  0.019223         inf       1.000000  ...
2150    (e, l, a)        (t, b)            0.019608            0.019608  0.019608    1.000000  51.000000  0.019223         inf       1.000000  ...
3886    (s, m, e)     (t, b, a)            0.019608            0.019608  0.019608    1.000000  51.000000  0.019223         inf       1.000000  ...
3885    (s, a, e)     (t, m, b)            0.019608            0.019608  0.019608    1.000000  51.000000  0.019223         inf       1.000000  ...
3884    (s, t, e)     (m, b, a)            0.019608            0.019608  0.019608    1.000000  51.000000  0.019223         inf       1.000000  ...
...
539        (s, e)           (t)            0.058824            0.058824  0.019608    0.333333   5.666667  0.016148    1.411765       0.875000  ...
540        (e, t)           (s)            0.058824            0.058824  0.019608    0.333333   5.666667  0.016148    1.411765       0.875000  ...
545        (s, e)           (u)            0.058824            0.058824  0.019608    0.333333   5.666667  0.016148    1.411765       0.875000  ...
18            (e)           (c)            0.098039            0.039216  0.019608    0.200000   5.100000  0.015763    1.200980       0.891304  ...
19            (c)           (e)            0.039216            0.098039  0.019608    0.500000   5.100000  0.015763    1.803922       0.836735  ...
4890 rows × 12 columns
# Sort values based on confidence
rules.sort_values("confidence",ascending=False)
      antecedents   consequents  antecedent support  consequent support   support  confidence       lift  leverage  conviction  zhangs_metric  ...
2445       (m, a)     (e, l, t)            0.019608            0.019608  0.019608         1.0       51.0  0.019223         inf           1.00  ...
2728    (e, t, m)        (s, b)            0.019608            0.039216  0.019608         1.0       25.5  0.018839         inf           0.98  ...
2692    (s, l, e)        (t, b)            0.019608            0.019608  0.019608         1.0       51.0  0.019223         inf           1.00  ...
2693    (s, l, t)        (e, b)            0.019608            0.039216  0.019608         1.0       25.5  0.018839         inf           0.98  ...
2694    (e, l, t)        (s, b)            0.019608            0.039216  0.019608         1.0       25.5  0.018839         inf           0.98  ...
...
763           (e)     (t, b, a)            0.098039            0.019608  0.019608         0.2       10.2  0.017686     1.22549           1.00  ...
2859          (e)  (n, t, c, h)            0.098039            0.019608  0.019608         0.2       10.2  0.017686     1.22549           1.00  ...
739           (e)     (m, b, a)            0.098039            0.019608  0.019608         0.2       10.2  0.017686     1.22549           1.00  ...
727           (e)     (l, b, a)            0.098039            0.019608  0.019608         0.2       10.2  0.017686     1.22549           1.00  ...
1611          (e)     (n, t, h)            0.098039            0.019608  0.019608         0.2       10.2  0.017686     1.22549           1.00  ...
4890 rows × 12 columns
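The rule metrics in these tables can be recomputed from three supports. The sketch below uses the usual definitions of confidence, lift, leverage, conviction, and Zhang's metric and checks them against rule 545, (s, e) -> (u), whose supports correspond to 3/51, 3/51, and 1/51. `rule_metrics` is an illustrative helper written for this example, not an mlxtend function:

```python
import math

def rule_metrics(supp_a, supp_c, supp_ac):
    """Metrics for a rule A -> C from the supports of A, C, and A union C."""
    confidence = supp_ac / supp_a
    lift = confidence / supp_c
    leverage = supp_ac - supp_a * supp_c
    # Conviction is infinite when the rule never fails (confidence == 1)
    conviction = math.inf if confidence == 1 else (1 - supp_c) / (1 - confidence)
    zhang_denominator = max(supp_ac * (1 - supp_a), supp_a * (supp_c - supp_ac))
    zhang = leverage / zhang_denominator if zhang_denominator else 0.0
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
        "zhangs_metric": zhang,
    }

# Rule 545 from the table: (s, e) -> (u)
metrics = rule_metrics(supp_a=3 / 51, supp_c=3 / 51, supp_ac=1 / 51)
for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```

A lift of 51 with confidence 1.0, as in the top rows, arises when antecedent, consequent, and their union all have support 1/51, that is, when the rule rests on a single row out of 51.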
