
Hashemite University

Prince Al-Hussein bin Abdullah II Faculty for Information Technology
Department of Computer Information Systems

Data Mining Assignment-1

Supervised by Dr. Esra'a Alshdaifat

Student Name                      Student ID
Ala'a Saed Fathi Moh'd Younis     1933981
Mahmoud Anwar Mearish AlGhati     1935133
Zaid Tharwat Mustafa Ramadan      1931979
Moayad Monther Hassan Rahhal      1933519
Saif Ghassan Jamal Al-Smadi       2030389

A. Initial data exploration

A1. Data type of attributes:

Attribute                    Data type
user-id                      numeric-interval
gender                       symmetric binary
age                          numeric-interval
martial_status               symmetric binary
website_activity             ordinal
Browsed_electronics_12mo     asymmetric binary
bought_electronics_12mo      symmetric binary
bought_digital_media_18mo    asymmetric binary
bought_digital_books         symmetric binary
Payment_Method               nominal

A2. Data explanation (Excel was used to extract the median, range, etc.):

Attribute                    Median   Variance   Mean   StdDev   Max   Min   Range
user-id                      -        -          -      -        -     -     -
gender                       -        -          -      -        -     -     -
age                          47       165.856    46.1   12.879   70    17    53
martial_status               -        -          -      -        -     -     -
website_activity             -        -          -      -        -     -     -
Browsed_electronics_12mo     -        -          -      -        -     -     -
bought_electronics_12mo      -        -          -      -        -     -     -
bought_digital_media_18mo    -        -          -      -        -     -     -
bought_digital_books         -        -          -      -        -     -     -
Payment_method               -        -          -      -        -     -     -

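The summary statistics in the table can also be computed outside Excel. The following is a minimal sketch in Python, assuming the ages are available as a plain list; the values in `ages` are hypothetical and are not the assignment's data. It uses the sample variance and sample standard deviation (n - 1 denominator), which is an assumption, since the report does not say which Excel functions were used.

```python
import statistics

# Hypothetical ages for illustration only; not the assignment's data.
ages = [17, 25, 34, 47, 52, 70]

summary = {
    "median": statistics.median(ages),
    "variance": statistics.variance(ages),  # sample variance (n - 1 denominator)
    "mean": statistics.mean(ages),
    "stdev": statistics.stdev(ages),        # sample standard deviation
    "max": max(ages),
    "min": min(ages),
    "range": max(ages) - min(ages),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

For population statistics instead, `statistics.pvariance` and `statistics.pstdev` can be swapped in.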
Frequency:

Attribute                    Value             Freq
age                          0-34              65
age                          35-52             132
age                          52-end            103
gender                       f                 143
gender                       m                 157
martial_status               m                 156
martial_status               s                 144
website_activity             frequent          20
website_activity             regular           115
website_activity             seldom            165
Browsed_electronics_12mo     no                16
Browsed_electronics_12mo     yes               284
bought_electronics_12mo      yes               152
bought_electronics_12mo      no                148
bought_digital_media_18mo    no                47
bought_digital_media_18mo    yes               253
bought_digital_books         yes               126
bought_digital_books         no                174
Payment_method               credit card       42
Payment_method               bank transfer     109
Payment_method               monthly billing   28
Payment_method               web account       121

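A frequency table like the one above can be produced by counting the values of each attribute. The sketch below assumes the records are available as a list of dictionaries; the three records and the helper name `value_frequencies` are hypothetical and only illustrate the counting.

```python
from collections import Counter

# Hypothetical records for illustration only; not the assignment's data.
records = [
    {"gender": "f", "website_activity": "seldom"},
    {"gender": "m", "website_activity": "regular"},
    {"gender": "m", "website_activity": "seldom"},
]

def value_frequencies(rows, attribute):
    """Count how often each value of `attribute` occurs in `rows`."""
    return Counter(row[attribute] for row in rows)

gender_freq = value_frequencies(records, "gender")
activity_freq = value_frequencies(records, "website_activity")
print(dict(gender_freq))
print(dict(activity_freq))
```

A quick sanity check is that the frequencies of each attribute sum to the number of records.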
A3. Outliers:

To find the percentage of outliers in any attribute, apply the filter:

[ Filter -> unsupervised -> attribute -> InterquartileRange ]

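The idea behind an interquartile-range outlier check can be sketched as follows. This is a minimal illustration and not the filter's own implementation: the factor of 1.5 is the common textbook convention and is an assumption here (the tool's filter may use a different default), and the values in `ages` are hypothetical.

```python
import statistics

def outlier_percentage(values, factor=1.5):
    """Percentage of values outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    flagged = [v for v in values if v < low or v > high]
    return 100.0 * len(flagged) / len(values)

# Hypothetical values; 200 is deliberately far from the rest.
ages = [17, 25, 34, 40, 47, 52, 55, 60, 70, 200]
pct = outlier_percentage(ages)
print(f"{pct}% of the values are flagged as outliers")
```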
Clustering:

Go to the Cluster panel, choose SimpleKMeans, and click Start.
Scatter plots:

Go to the Visualize panel.
B. Data pre-processing

B1. Equi-width binning

Steps:

1. Choose the filter named “Discretize”.
2. Enter the index of the attribute in “attributeIndices”.
3. Set the number of bins to 3.
4. Click the “Apply” button.
5. Finally, the arrow indicates the result.
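What equi-width binning does to the values can be sketched in a few lines. This is an illustration of the technique, not the filter's code; the function name `equal_width_bins` and the sample ages are hypothetical.

```python
def equal_width_bins(values, bins=3):
    """Assign each value a bin index 0..bins-1 using intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        # All values are identical, so everything falls into one bin.
        return [0 for _ in values]
    # The maximum value would land in index `bins`, so clamp it to the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]

# Hypothetical ages for illustration only.
ages = [17, 20, 34, 35, 47, 52, 53, 70]
labels = equal_width_bins(ages)
print(labels)
```

Each bin covers the same span of the value range, so the number of values per bin can differ widely.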
Equi-depth binning

Steps:

1. Choose the filter named “Discretize”.
2. Enter the index of the attribute in “attributeIndices”.
3. Set the number of bins to 3.
4. Change the value of “useEqualFrequency” from False to True.
5. Click the “Apply” button.
6. Finally, the arrow indicates the result.
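Equi-depth (equal-frequency) binning can be sketched by ranking the values and cutting the ranking into equally sized groups. This is a simplified illustration under the assumption that ties may be split across bins; a real tool typically places cut points between distinct values. The function name and sample ages are hypothetical.

```python
def equal_frequency_bins(values, bins=3):
    """Assign bin indices so each bin holds roughly the same number of values."""
    # Indices of the values, ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, idx in enumerate(order):
        # Map the rank 0..n-1 proportionally onto bin indices 0..bins-1.
        labels[idx] = min(rank * bins // len(values), bins - 1)
    return labels

# Hypothetical ages for illustration only.
ages = [70, 17, 35, 52, 20, 47]
labels = equal_frequency_bins(ages)
print(labels)
```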
B2. Min-max normalization:

Steps:

1. Choose the filter named “Normalize”.
2. Set the output range of the attribute with “scale” = 1 and “translation” = 0.
3. Click the “Apply” button.
4. Finally, the arrow indicates the result.
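The computation behind min-max normalization can be sketched as below. The parameter names `scale` and `translation` mirror step 2; treating the output range as [translation, translation + scale] is an assumption about how those two settings combine, and the sample ages are hypothetical.

```python
def min_max_normalize(values, scale=1.0, translation=0.0):
    """Rescale values linearly into [translation, translation + scale]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute has no spread; map every value to the lower bound.
        return [translation for _ in values]
    return [(v - lo) / (hi - lo) * scale + translation for v in values]

# Hypothetical ages for illustration only.
ages = [17, 43.5, 70]
normalized = min_max_normalize(ages)
print(normalized)
```

With scale 1 and translation 0, the minimum maps to 0 and the maximum maps to 1.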
Z-score normalization

Steps:

1. Choose the filter named “Standardize”.
2. Click the “Apply” button.
3. Finally, the arrow indicates the result.
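Z-score normalization subtracts the mean from each value and divides by the standard deviation. The sketch below uses the sample standard deviation, which is an assumption (the report does not state which variant the filter applies); the function name and values are hypothetical.

```python
import statistics

def z_score(values):
    """Standardize values to zero mean and unit (sample) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (assumption)
    if std == 0:
        # A constant attribute cannot be scaled; return zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

# Hypothetical values for illustration only.
values = [10, 20, 30]
standardized = z_score(values)
print(standardized)
```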
B3. Discretise the Age attribute into the following categories:

Steps:

1. First, discretise the Age attribute using the equation:

2. Then find the frequency of each category using the equations:

Teenager =

Young =

Mid_Age =

Mature =

Old =
B4. Convert "Gender" to binary:

Steps:

1. Choose the filter named “NominalToBinary”.
2. Enter the index of the attribute in “attributeIndices”.
3. Click the “Apply” button.
4. Then choose the filter named “NumericToBinary”.
5. Enter the index of the attribute in “attributeIndices”.
6. Click the “Apply” button.
7. Finally, the arrow indicates the result.
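The effect of the conversion can be sketched as a simple mapping from the two gender values to 0 and 1. Which value is coded as 1 is arbitrary for a symmetric binary attribute; coding "m" as 1 below is an assumption, and the function name and sample values are hypothetical.

```python
def encode_binary(values, positive):
    """Map `positive` to 1 and every other value to 0."""
    return [1 if v == positive else 0 for v in values]

# Hypothetical gender values for illustration only.
genders = ["f", "m", "m", "f"]
encoded = encode_binary(genders, positive="m")
print(encoded)
```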
C. Association Rules Mining:

Steps:

1. First, convert any numeric attributes to nominal.
2. Select “Associate” and choose Apriori.
3. Enter the number of rules in “numRules”.
4. Click the “Start” button.
5. Finally, the arrow indicates the result.


Rules:

1. Pro and Polit: these attributes show an implication or co-occurrence (not causality). The confidence is high (0.93), which means I have high trust in this rule.

2. Sup-Gro and Polit: these attributes show an implication or co-occurrence (not causality). The confidence is high (0.93), which means I have high trust in this rule.

3. Soc-Club and Polit: these attributes show an implication or co-occurrence (not causality). The confidence is high (0.93), which means I have high trust in this rule.
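The confidence of a rule X -> Y is the support of X and Y together divided by the support of X. The sketch below shows that calculation on a tiny set of hypothetical transactions; the item names "A" and "B" and the helper functions are illustrative and are not taken from the assignment's dataset or from the Apriori implementation used above.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # The rule never applies, so report zero confidence.
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base

# Hypothetical transactions for illustration only.
transactions = [{"A", "B"}, {"A", "B"}, {"A"}, {"B"}]
conf = confidence(transactions, {"A"}, {"B"})
print(round(conf, 2))
```

A high confidence only says that Y usually appears when X appears; it does not establish that X causes Y.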
