P6989 [Total No. of Pages : 5
[5865] - 302
M.C.A. (Management)
IT 32: DATA WAREHOUSING AND DATA MINING
(2020 Pattern) (Semester - III)
Time : 2½ Hours] [Max. Marks : 50
Instructions to the candidates:
1) All questions are compulsory.
2) Draw neat & labelled diagram wherever necessary.
i) Find the wrong statement about K-means clustering.
a) K-means clustering is a method of vector quantization.
b) K-means is the same as K-nearest neighbour.
c) K-means clustering aims to partition ‘n’ observations into K clusters.
d) K-means clustering produces the final estimate of cluster centroids.
a) 2 b) 1
c) 3 d) 4
iii) _____ is an intermediate storage area used for data processing in ETL
a) Buffer b) Virtual memory
a) Fact b) Dimension
vi) The role of ETL is to ______.
a) Find erroneous data
b) Fix erroneous data
c) Both finding and fixing erroneous data
d) Filtering of the data source
vii) _____ is a data transformation process.
a) Comparison b) Projection
c) Selection d) Filtering
c) Online Terminal Protocol
d) Online Terminal Processing
ix) An approach in which the aggregated totals are stored in a multidimensional
database while the detailed data is stored in the relational database is a ______
a) MOLAP b) ROLAP
c) HOLAP d) OLAP
xi) Efficiency and scalability of data mining algorithms are related to ______.
a) Mining methodology b) User interaction
c) Time-sensitive d) Technical-sensitive
xiii) If the ETL process fetches the data separately from the host server
during the automatic load to the data warehouse, one of the challenges
involved is ______
a) the associated network may be down
d) None
xiv) Web mining helps to improve the power of web search engines by identifying
_______
a) Web pages and classifying the web documents
b) XML documents
c) Text documents
d) Database
xv) ______ is achieved by splitting the text at white spaces.
a) Text cleanup
b) Tokenization
xvi) Assigning data to one of the clusters and recomputing the centroid are the
steps in which algorithm?
a) Apriori algorithm b) Bayesian classification
c) FP tree algorithm d) K-means
xvii) A disadvantage of the KNN algorithm is that it takes _____
a) Volatile
b) Subject oriented
c) Non volatile
d) Time variant
a) Clustering b) Association
c) Classification d) Subset
Q2) a) What is a Data warehouse? Explain the need and characteristics of a Data
warehouse. [5]
b) Explain the schemas of Data warehouse. [5]
OR
a) Explain Kimball Life Cycle diagram in detail. [5]
architecture. [5]
Q3) a) What is ETL? Explain data preprocessing techniques in detail. [6]
b) What is OLAP? Describe the characteristics of OLAP. [4]
OR
transformation. [6]
b) Apply the FP Tree algorithm to construct an FP Tree and find the frequent itemsets
for the dataset given below (minimum support = 30%). [7]
3 Coconut, Dates
4 Berries, Dates
5 Apple, Coconut
OR
a) Explain data mining techniques in brief. [3]
b) How does the KNN algorithm work? [7]
Apply the KNN classification algorithm to the given dataset and predict the
class for X(P1 = 3, P2 = 7) (K = 3)
P1   P2   Class
7    7    False
7    4    False
3    4    True
1    4    True
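
The KNN procedure asked for above can be sketched in a few lines of Python. This is only an illustration under two assumptions that the question does not state: the four rows printed above are the complete training set, and Euclidean distance is the similarity measure. The names `training` and `knn_predict` are hypothetical.

```python
from collections import Counter
from math import dist

# Training rows as printed in the question: ((P1, P2), Class).
# Assumption: these four rows are the whole dataset.
training = [
    ((7, 7), False),
    ((7, 4), False),
    ((3, 4), True),
    ((1, 4), True),
]


def knn_predict(rows, query, k):
    """Return the majority class among the k rows closest to query.

    Euclidean distance is an assumption; the question names no metric.
    """
    # Sort rows by distance to the query point and keep the k nearest.
    nearest = sorted(rows, key=lambda row: dist(row[0], query))[:k]
    # Majority vote over the class labels of those neighbours.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


prediction = knn_predict(training, (3, 7), k=3)
print(prediction)  # True
```

Under these assumptions the squared distances from X to the four rows are 16, 25, 9 and 13, so the three nearest neighbours are (3, 4), (1, 4) and (7, 7), and the vote is two True against one False.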
Q5) a) What is text mining? Explain the process of text mining. [4]
b) Explain the K-means algorithm. Apply the K-means algorithm to group visitors
to a website into two groups using their ages, as follows:
15, 16, 19, 20, 21, 28, 35, 40, 42, 44, 60, 65
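
The assign-and-recompute loop of K-means on these ages can be sketched as follows. The question does not give initial centroids, so this sketch assumes the first two ages (15 and 16) as the starting centroids; a different choice can converge to a different grouping. The function name `kmeans_1d` is hypothetical.

```python
ages = [15, 16, 19, 20, 21, 28, 35, 40, 42, 44, 60, 65]


def kmeans_1d(values, centroids, max_iter=100):
    """Plain K-means on one-dimensional data; returns (clusters, centroids)."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: each centroid becomes the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        # Stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return clusters, centroids


# Assumption: initial centroids are the first two ages in the list.
clusters, centroids = kmeans_1d(ages, centroids=ages[:2])
print(clusters)                          # [[15, 16, 19, 20, 21, 28], [35, 40, 42, 44, 60, 65]]
print([round(c, 2) for c in centroids])  # [19.83, 47.67]
```

With this starting point the centroids move from (15, 16) to (15, 35.45), then (18.2, 44.86), and settle at about (19.83, 47.67).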
OR
a) Apply K-means algorithm for the given data set where K is the cluster