MID – I QUESTION PAPER
K. Vara Prasad
IIIrd year CSE – 1, 2, 3
DATA WAREHOUSE AND DATA MINING
Answer the following questions: 10 x 3 = 30 Marks
1 (a). Explain what kinds of data can be mined. Give examples.
(b). How can the quality of data be improved? Explain the various approaches and tasks used in
data preprocessing.
2 (a). Given the following data (in increasing order) for the attribute age:
13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36,
40, 45, 46, 52, 70.
i. Use min-max normalization to transform the value 35 for age onto the range
[0.0,1.0].
ii. Use z-score normalization to transform the value 35 for age, where the standard
deviation of age is 12.94 years.
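A short Python check of the arithmetic for 2(a), assuming the standard formulas v' = (v − min)/(max − min) × (new_max − new_min) + new_min for min-max normalization and v' = (v − mean)/σ for z-score normalization, with the mean computed from the 27 listed ages and σ = 12.94 taken from the question:

```python
# Worked check for Question 2(a): min-max and z-score normalization of age = 35.
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

def min_max(v, lo, hi, new_min=0.0, new_max=1.0):
    """Map v linearly from [lo, hi] onto [new_min, new_max]."""
    return (v - lo) / (hi - lo) * (new_max - new_min) + new_min

def z_score(v, mean, std):
    """Number of standard deviations that v lies from the mean."""
    return (v - mean) / std

mean_age = sum(ages) / len(ages)            # 809 / 27, about 29.96
mm = min_max(35, min(ages), max(ages))      # (35 - 13) / (70 - 13) = 22 / 57
z = z_score(35, mean_age, 12.94)            # standard deviation given in the question

print(f"min-max: {mm:.3f}")   # min-max: 0.386
print(f"z-score: {z:.3f}")    # z-score: 0.389
```

So 35 maps to about 0.386 on [0.0, 1.0] and lies about 0.39 standard deviations above the mean age.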
(b). What are the various OLAP operations used in the multidimensional data model?
Explain them in detail with an example.
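The usual OLAP operations (roll-up, drill-down, slice, dice and pivot) can be sketched on a toy in-memory cube. This is a minimal illustration, assuming a hypothetical three-dimensional sales cube (quarter × city × item) stored as a flat list of fact rows with made-up unit counts. Roll-up is shown here by dropping a dimension, not by climbing a concept hierarchy such as city → country, which would need an extra hierarchy table. The function names are illustrative only.

```python
from collections import defaultdict

# Hypothetical sales facts: (quarter, city, item, units). Values are illustrative only.
facts = [
    ("Q1", "Vancouver", "phone", 10),
    ("Q1", "Toronto",   "phone", 20),
    ("Q1", "Vancouver", "tv",     5),
    ("Q2", "Toronto",   "tv",    15),
    ("Q2", "Vancouver", "phone", 25),
    ("Q2", "Toronto",   "phone", 30),
]
DIMS = ("quarter", "city", "item")

def roll_up(rows, keep):
    """Aggregate units over every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[i] for i in idx)] += r[-1]
    return dict(totals)

def slice_(rows, dim, value):
    """Fix one dimension to a single value (named slice_ to avoid the builtin)."""
    i = DIMS.index(dim)
    return [r for r in rows if r[i] == value]

def dice(rows, **allowed):
    """Keep rows whose value lies in the allowed set for every named dimension."""
    return [r for r in rows
            if all(r[DIMS.index(d)] in vals for d, vals in allowed.items())]

def pivot(rows, row_dim, col_dim):
    """Rotate the view: show total units as a row_dim x col_dim grid."""
    grid = defaultdict(dict)
    for (r, c), units in roll_up(rows, (row_dim, col_dim)).items():
        grid[r][c] = units
    return dict(grid)

print(roll_up(facts, ("quarter",)))                   # {('Q1',): 35, ('Q2',): 70}
print(slice_(facts, "quarter", "Q1"))                 # the three Q1 rows
print(dice(facts, city={"Toronto"}, item={"phone"}))  # Toronto phone sales only
print(pivot(facts, "item", "quarter"))  # {'phone': {'Q1': 30, 'Q2': 55}, 'tv': {'Q1': 5, 'Q2': 15}}
```

Drill-down is the reverse move: `roll_up(facts, ("quarter", "city"))` restores the city detail that `roll_up(facts, ("quarter",))` summed away.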
3 (a). Explain what kinds of patterns can be mined. Give examples.
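Frequent itemsets and the association rules derived from them are the usual first example of a minable pattern. The following is a brute-force sketch, assuming a hypothetical list of five market-basket transactions: it counts the support of every 1- and 2-item combination and computes rule confidence. It is not Apriori or FP-growth, which prune the search space; it simply enumerates, which is only practical for tiny data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions (illustrative data only).
transactions = [
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "bread"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def frequent_itemsets(txns, min_support, max_size=2):
    """Brute-force support counting for all itemsets of up to max_size items."""
    counts = Counter()
    for t in txns:
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(t), k):
                counts[combo] += 1
    n = len(txns)
    return {items: c / n for items, c in counts.items() if c / n >= min_support}

def confidence(txns, lhs, rhs):
    """conf(lhs => rhs) = support(lhs together with rhs) / support(lhs)."""
    both = sum(1 for t in txns if lhs | rhs <= t)
    left = sum(1 for t in txns if lhs <= t)
    return both / left

print(frequent_itemsets(transactions, 0.5))
# {('bread',): 0.8, ('butter',): 0.6, ('milk',): 0.6, ('bread', 'butter'): 0.6}
print(confidence(transactions, {"bread"}, {"butter"}))  # 0.75
```

Here {bread, butter} is a frequent pattern at 50% minimum support, and the rule bread ⇒ butter holds with 75% confidence on this toy data.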
(b). What is data consolidation? Discuss in detail the various techniques used to consolidate
data.
MID – I OFFLINE BITS
K. Vara Prasad
IIIrd year CSE – 1, 2, 3
DATA WAREHOUSE AND DATA MINING
1. __________ is a subject-oriented, integrated, time-variant, non-volatile collection of
data in support of management decisions.
A. Data Mining. B. Data Warehousing. C. Web Mining.
D. Text Mining.
2. For a cube with n dimensions, there are a total of _______ cuboids, including the base
cuboid.
A. n cuboids B. 2n cuboids C. 1 cuboid D. 2^n cuboids
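The 2^n count follows from treating each cuboid as one subset of the n dimensions (one group-by), from the apex cuboid with no dimensions up to the base cuboid with all of them. A small sketch, assuming no concept hierarchies and the hypothetical dimension names time, item and location, that enumerates this power set:

```python
from itertools import combinations

def cuboids(dimensions):
    """Every group-by of a cube without concept hierarchies:
    from the apex cuboid () up to the base cuboid (all dimensions)."""
    return [c for k in range(len(dimensions) + 1)
            for c in combinations(dimensions, k)]

dims = ("time", "item", "location")   # hypothetical dimension names
for c in cuboids(dims):
    print(c)
print(len(cuboids(dims)))  # 8, i.e. 2 ** 3
```

With concept hierarchies the count grows: if dimension i has L_i hierarchy levels, the cube has ∏(L_i + 1) cuboids, the extra 1 accounting for the virtual top level "all".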
3. The time horizon in a data warehouse is usually __________.
A. 1-2 years B. 3-4 years C. 5-6 years D. 5-10 years
4. The correlation coefficient test is applied to _____ type of data.
A. Nominal data B. Numeric data C. Complex data D. Imaginary data
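The correlation coefficient for numeric attributes is Pearson's r. A minimal sketch of r_{A,B} = Σ(a_i − Ā)(b_i − B̄) / (n σ_A σ_B), written with plain sums so that the n factors cancel; the sample sequences in the usage lines are hypothetical:

```python
import math

def pearson_r(a, b):
    """Pearson product-moment correlation coefficient of two numeric sequences."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if sd_a == 0 or sd_b == 0:
        raise ValueError("r is undefined when an attribute is constant")
    return cov / (sd_a * sd_b)

print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 6))  # 1.0
print(round(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]), 6))  # -1.0
```

A value of r near +1 or −1 suggests the two attributes are strongly correlated (and one may be redundant during integration); r near 0 suggests no linear correlation.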
5. The chi-square test is applied to _____ type of data.
A. Nominal data B. Numeric data C. Complex data D. Imaginary data
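For nominal data the chi-square (χ²) test compares observed and expected counts in a contingency table. A minimal sketch of χ² = Σ (o_ij − e_ij)² / e_ij with e_ij = count(A = a_i) × count(B = b_j) / n, on a hypothetical 2 × 2 table of counts (gender vs. preferred reading):

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # e_ij under independence
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

# Hypothetical counts: rows = (male, female), columns = (fiction, non-fiction).
observed = [[250, 50],
            [200, 1000]]
chi2, dof = chi_square(observed)
print(f"chi2 = {chi2:.2f}, dof = {dof}")  # chi2 = 507.94, dof = 1
```

With 1 degree of freedom the critical value at the 0.001 significance level is 10.828, so a statistic this large would lead to rejecting the hypothesis that the two attributes are independent.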
6. What is summarization in data mining?
A. Setting up target data
B. Data mining procedure to sort data
C. A method to find data
D. To represent the derived data with visualizations and reports.
7. _____ is the process of converting given data into a number of frequencies.
A. Integration B. Normalization C. Binning D. Clustering
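Binning turns raw values into a small number of intervals and their frequencies. Two common variants are sketched here on the age values from Question 2(a): equal-width bins, whose counts form a histogram, and equal-frequency (equal-depth) bins followed by smoothing by bin means. The choice of 3 bins is an arbitrary assumption for illustration.

```python
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

def equal_width_counts(values, n_bins):
    """Count how many values fall in each of n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        counts[idx] += 1
    return counts

def equal_frequency_bins(values, n_bins):
    """Split sorted values into n_bins bins holding (nearly) equal numbers of values."""
    data = sorted(values)
    size, extra = divmod(len(data), n_bins)
    bins, start = [], 0
    for b in range(n_bins):
        end = start + size + (1 if b < extra else 0)
        bins.append(data[start:end])
        start = end
    return bins

def smooth_by_means(bins):
    """Replace every value in a bin by that bin's mean."""
    return [[sum(b) / len(b)] * len(b) for b in bins]

print(equal_width_counts(ages, 3))                        # [15, 10, 2]
bins = equal_frequency_bins(ages, 3)
print([round(b[0], 2) for b in smooth_by_means(bins)])    # [18.0, 28.11, 43.78]
```

Equal-width bins are simple but sensitive to outliers such as the age 70, which stretches the range; equal-frequency bins keep the same number of values in each bin regardless of skew.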
8. A capability of data mining is to build ___________ models.
A. Predictive B. Imperative C. Introspective D. Business
9. The correlation coefficient is also called ____.
A. Min-max coefficient B. Pearson coefficient C. Wavelet coefficient
D. Zero coefficient
10. The process of matching up equivalent real-world entities from multiple data sources
is called ___.
A. Normalization B. Indexing C. Materialization
D. Entity Identification Problem
11. _______ is the process of combining two or more objects into a single object.
A. Generalization B. Aggregation C. Specialization
D. Multitasking
12. In a data warehouse, load and index is ____.
A. A process to upgrade the quality of data after it is moved into a
warehouse
B. A simple initial parameter
C. A process to load the data in the data warehouse and to create necessary indexes
D. An upgrading policy to ensure the quality of data
13. The output of KDD is _______________
A. Data B. Information C. Query D. Useful Information
14. ____________ removes or reduces noise in the data and treats missing values.
A. Data preprocessing B. Data post processing C. Nullifying data
D. Normalization
15. ________ maps the core warehouse metadata to business concepts that are familiar and
useful to end users.
A. Application level metadata B. Algorithmic level metadata
C. Department Level Metadata D. Core warehouse metadata
16. In web mining, _____ is used to find the order in which URLs tend to be accessed.
A. Clustering B. Associations
C. Classification D. Sequential analysis
17. Reducing the number of attributes to solve the high-dimensionality problem is called
______________.
A. Compression B. Dimensionality reduction C. Transformation
D. Integration
18. Data transformation includes __________.
A. a process to change data from a detailed level to a summary level.
B. a process to change data from a summary level to a detailed level.
C. joining data from one source into various sources of data.
D. separating data from one source into various sources of data.
19. The type of relationship in a star schema is __________________.
A. Many-to-Many. B. One-to-One. C. One-to-Many.
D. Many-to-One.
20. The problem of finding hidden structure in unlabelled data is called ___________
A. Supervised learning B. Unsupervised learning
C. Reinforcement learning D. Semi supervised learning