Part A – MCQs
Q6 Consider discretizing a continuous attribute whose values are listed below:
3, 4, 5, 10, 12, 21, 32, 43, 44, 46, 52, 59, 63
Using equal-width partitioning and five bins, how many values are there in the first bin?
A. 1
B. 2
C. 3
D. 4
E. 5
Bin3 …
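The equal-width partitioning in Q6 can be checked with a minimal Python sketch. The function name `equal_width_bins` is hypothetical, and the boundary convention (left-closed bins, with the maximum placed in the last bin) is an assumption; no value in this data set falls exactly on an interior boundary, so the counts do not depend on it.

```python
def equal_width_bins(values, k):
    """Split values into k bins of equal width over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    bins = [[] for _ in range(k)]
    for v in values:
        # Assumed convention: left-closed bins; the maximum goes in the last bin.
        idx = min(int((v - lo) / width), k - 1)
        bins[idx].append(v)
    return width, bins


values = [3, 4, 5, 10, 12, 21, 32, 43, 44, 46, 52, 59, 63]
width, bins = equal_width_bins(values, 5)
print(width)    # 12.0
print(bins[0])  # [3, 4, 5, 10, 12]
```

With a width of (63 – 3)/5 = 12, the first bin covers 3 up to 15 and holds five values under these assumptions.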
Q7 Which of the following are known as qualitative data types?
(i) nominal
(ii) interval
(iii) ordinal
(iv) discrete
(v) continuous
A. (i) and (ii)
B. (ii) and (iii)
C. (i) and (iii)
D. (iii) and (iv)
E. (iv) and (v)
Q9 Suppose a group of 12 students has the following test scores:
19, 71, 48, 63, 35, 85, 69, 81, 72, 88, 100, 95
If the scores are partitioned into three bins using the equi-width method and smoothed by bin boundaries,
how many data items will be in the second bin?
A 1
B 2
C 3
D 4
E 5
Working: 19, 35, 48, 63, 69, 71, 72, 81, 85, 88, 95, 100
W = (max – min)/3 = (100 – 19)/3 = 27
Bin 1: (19 to 46) – [19, 35]
Bin 2: (47 to 73) – [48, 63, 69, 71, 72]
Bin 3: (74 to 100) – [81, 85, 88, 95, 100]
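The working above can be reproduced, and the smoothing step made explicit, with a short Python sketch. The helper name `smooth_by_boundaries` is hypothetical. It assumes left-closed bins with the maximum placed in the last bin, and it sends ties to the lower boundary; for these scores the bin membership matches the working above.

```python
def smooth_by_boundaries(bin_values):
    """Replace each value with the closer of the bin's minimum and maximum."""
    lo, hi = min(bin_values), max(bin_values)
    # Assumed tie-break: a value equally far from both goes to the lower boundary.
    return [lo if v - lo <= hi - v else hi for v in bin_values]


scores = sorted([19, 71, 48, 63, 35, 85, 69, 81, 72, 88, 100, 95])
k = 3
width = (max(scores) - min(scores)) / k  # (100 - 19) / 3 = 27.0

bins = [[] for _ in range(k)]
for s in scores:
    # Equal-width assignment; the maximum score is clamped into the last bin.
    idx = min(int((s - scores[0]) / width), k - 1)
    bins[idx].append(s)

smoothed = [smooth_by_boundaries(b) for b in bins]
print(bins[1])      # [48, 63, 69, 71, 72]
print(smoothed[1])  # [48, 72, 72, 72, 72]
```

Smoothing changes the values inside each bin but not how many items the bin holds, so the second bin still contains five items.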
Q10 Which step of the KDD process model deals with noisy data?
A Data Integration
B Data pre-processing
C Data transformation
D Data mining
E Data Interpretation
Q12 Income data for a group of 12 people have the following properties:
mean = $33000 and sd = $11000
Using z-score normalization, an income of $73600 is transformed to:
A 2.24
B 2.79
C 3.69
D 4.25
E 5.36
Working: Z-norm = (value – mean)/SD = (73600 – 33000)/ 11000 = 3.69
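The same calculation as a minimal Python sketch (the function name `z_score` is illustrative only):

```python
def z_score(value, mean, sd):
    """Z-score normalization: distance from the mean in standard deviations."""
    return (value - mean) / sd


z = z_score(73600, 33000, 11000)
print(round(z, 2))  # 3.69
```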
Q13 Income data for a group of 12 people have the following properties:
min = $55000 and max = $150000
Using min-max normalization to map to the range (0 – 1), an income of $73600 is transformed to:
A 0.543
B 0.434
C 0.365
D 0.196
E 0.098
Working: v' = (value – min)/(max – min) = (73600 – 55000)/(150000 – 55000) = 0.196
Sorted:
13 15 16 16 19 20 20 21 22 22 25 25 25 30 33 33 35 35 35 36 40 45 46 52 73
(i)
V= 200
v' =
V= 300
v' =
V= 400
v' =
V= 600
v' =
V= 750
v' =
V= 900
v' =
V= 1000
v' =
V= 1200
v' =
V’
(ii) Decimal scaling: the normalized values are:
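Decimal scaling divides every value by 10^j, where j is the smallest integer that brings the largest absolute value below 1. The sketch below is a hedged illustration only: it assumes the values to be normalized are the ones listed under part (i) (200 through 1200), and the function name `decimal_scale` is hypothetical.

```python
def decimal_scale(values):
    """Divide by the smallest power of ten that makes max(|v|) < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return j, [v / 10 ** j for v in values]


# Assumption: these are the values from part (i).
values = [200, 300, 400, 600, 750, 900, 1000, 1200]
j, scaled = decimal_scale(values)
print(j)  # 4
print(scaled)
```

Here the largest value is 1200, so j = 4 and, for example, 1200 maps to 0.12.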
$$\sigma = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i - u)^2}$$
Mean = (35 + 36 + 46 + 68 + 70 ) / 5 = 51
(35-51)^2 + (36-51)^2 + (46-51)^2 + (68-51)^2 + (70-51)^2 = 1156
StdDev = SQRT ( 1156 / 4) = 17
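The calculation above can be verified in Python. This sketch computes the sample standard deviation step by step (dividing by N – 1, as in the formula) and compares it with the standard library's `statistics.stdev`, which uses the same N – 1 divisor.

```python
import math
import statistics

data = [35, 36, 46, 68, 70]

mean = sum(data) / len(data)                      # 51.0
sum_sq = sum((x - mean) ** 2 for x in data)       # 1156.0
std_dev = math.sqrt(sum_sq / (len(data) - 1))     # sqrt(1156 / 4)

print(mean, sum_sq, std_dev)  # 51.0 1156.0 17.0

# Cross-check against the library's sample standard deviation.
library_std_dev = statistics.stdev(data)
```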
Q5) Using the Min-Max method, normalize the following data to the scale (1 – 10).
Show your calculations. Use the formula
v' = ((v – min) / (max – min)) × (new_max – new_min) + new_min
new_min = 1
new_max = 10
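V = 5.8 (David)
v' = ((5.8 – 5.5) / (6.25 – 5.5)) × 9 + 1 = 4.6
V = 6.0 (Jessica)
v' = ((6.0 – 5.5) / (6.25 – 5.5)) × 9 + 1 = 7
V = 6.25 (Mary)
v' = ((6.25 – 5.5) / (6.25 – 5.5)) × 9 + 1 = 10
V = 5.9 (Rahini)
v' = ((5.9 – 5.5) / (6.25 – 5.5)) × 9 + 1 = 5.8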
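Body Mass Index (BMI)
V = 28 (David)
v' = ((28 – 18) / (28 – 18)) × 9 + 1 = 10
V = 18 (Jessica)
v' = ((18 – 18) / (28 – 18)) × 9 + 1 = 1
V = 20 (Mary)
v' = ((20 – 18) / (28 – 18)) × 9 + 1 = 2.8
V = 21 (Rahini)
v' = ((21 – 18) / (28 – 18)) × 9 + 1 = 3.7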
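V = 125 (Jacki)
v' = ((125 – 110) / (145 – 110)) × 9 + 1 = 4.86
V = 145 (David)
v' = ((145 – 110) / (145 – 110)) × 9 + 1 = 10
V = 110 (Jessica)
v' = ((110 – 110) / (145 – 110)) × 9 + 1 = 1
V = 135 (Mary)
v' = ((135 – 110) / (145 – 110)) × 9 + 1 = 7.43
V = 120 (Rahini)
v' = ((120 – 110) / (145 – 110)) × 9 + 1 = 3.57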
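The min-max calculations for a target range of 1 to 10 can be sketched in Python as follows. The function name `min_max` is hypothetical, and the example uses only the four BMI values listed above; it assumes the minimum (18) and maximum (28) are among them, which the results of 1 and 10 for Jessica and David suggest.

```python
def min_max(values, new_min, new_max):
    """Rescale values linearly so that min -> new_min and max -> new_max."""
    lo, hi = min(values), max(values)
    scale = new_max - new_min
    return [(v - lo) / (hi - lo) * scale + new_min for v in values]


# Assumption: only the four BMI values shown in the working are used.
bmi = {"David": 28, "Jessica": 18, "Mary": 20, "Rahini": 21}
scaled = min_max(list(bmi.values()), 1, 10)

for name, value in zip(bmi, scaled):
    print(name, round(value, 2))
```

Rounded to two decimals, this gives 10.0, 1.0, 2.8 and 3.7, in the order listed.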
Min-Max Normalisation (1-10)
1.1 Open the Weather.nominal dataset and click the Choose button to choose a filter.
There are many different filters. AllFilter and MultiFilter are ways of combining filters. There are
supervised and unsupervised filters. Supervised filters use the class value for their operation; they
are less common than unsupervised filters, which don't use the class value. There are also attribute
filters and instance filters. We want to remove an attribute, so we are looking for an attribute
filter. Weka has so many filters that you simply have to learn to look around to find the one you
want.
1.3 Click on the text box of the Choose field, set attributeIndices to 3, and select OK.
1.4 Click the Apply button to see the effect (attribute 3, humidity, is removed from the dataset).
Note: The same result can be achieved by selecting the attribute and clicking the Remove button.
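To make the effect of the filter concrete, here is a plain-Python sketch of dropping the attribute at a 1-based index. It is not Weka's API: the function name `remove_attribute` is hypothetical, and the three rows are only an illustrative sample in the shape of the weather data.

```python
def remove_attribute(header, rows, index):
    """Drop the attribute at a 1-based index from the header and every row."""
    i = index - 1  # convert the 1-based attribute index to a 0-based position
    new_header = header[:i] + header[i + 1:]
    new_rows = [row[:i] + row[i + 1:] for row in rows]
    return new_header, new_rows


header = ["outlook", "temperature", "humidity", "windy", "play"]
# Illustrative sample rows only, not the full dataset.
rows = [
    ["sunny", "hot", "high", "FALSE", "no"],
    ["overcast", "hot", "high", "FALSE", "yes"],
    ["rainy", "mild", "high", "FALSE", "yes"],
]

new_header, new_rows = remove_attribute(header, rows, 3)
print(new_header)   # ['outlook', 'temperature', 'windy', 'play']
print(new_rows[0])  # ['sunny', 'hot', 'FALSE', 'no']
```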
Original File:
Step 2: Visualize the data
2.4 Click on any point in the scatter plot to see its details.
2.5 Change the X and Y axes using the bars on the right side and compare the different plots.
2.6 Jitter slider. Sometimes points sit right on top of each other; jitter adds a little
randomness to the x and y coordinates so that overlapping points become visible.
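The idea behind jitter can be sketched in a few lines of Python. This is only an illustration of the concept, not how Weka implements its slider; the function name `jitter`, the uniform noise, and the `amount` and `seed` parameters are all assumptions.

```python
import random


def jitter(points, amount, seed=None):
    """Shift each (x, y) point by uniform noise in [-amount, amount] per axis."""
    rng = random.Random(seed)
    return [
        (x + rng.uniform(-amount, amount), y + rng.uniform(-amount, amount))
        for x, y in points
    ]


# Three identical points would be drawn as a single dot without jitter.
points = [(1, 2), (1, 2), (1, 2)]
jittered = jitter(points, amount=0.1, seed=0)
for p in jittered:
    print(p)
```

After jittering, the three points land at slightly different positions, so all of them show up in the plot.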