Data Warehousing and Data Mining

MCQs

Unit – I

Q1. Which of the following is an essential process in which the intelligent methods are applied
to extract data patterns?

a. Warehousing
b. Data Mining
c. Text Mining
d. Data Selection

Ans: B

Q2. What are the functions of Data Mining?

a. Association and correlation analysis, classification
b. Prediction and characterization
c. Cluster analysis and Evolution analysis
d. All of the above

Ans: D

Q3. Which of the following statements is correct about data mining?

a. It can be referred to as the procedure of mining knowledge from data
b. Data mining can be defined as the procedure of extracting information from a set of
data
c. The procedure of data mining also involves several other processes like data cleaning,
data transformation, and data integration
d. All of the above

Ans: D

Q4. Which one of the following statements is correct about data cleaning?

a. It refers to the process of data cleaning
b. It refers to the transformation of wrong data into correct data
c. It refers to correcting inconsistent data
d. All of the above

Ans: D

Q5. Issues such as the efficiency and scalability of data mining algorithms come under _______

a. Performance issues
b. Diverse data type issues
c. Mining methodology and user interaction
d. All of the above

Ans: A

Q6. Which of the following correctly refers to data selection?

a. A subject-oriented, integrated, time-variant, non-volatile collection of data in support of
management
b. The actual discovery phase of a knowledge discovery process
c. The stage of selecting the right data for a KDD process
d. All of the above

Ans: C

Q7. Which of the following correctly defines the term "Discovery"?

a. It is hidden within a database and can only be recovered if one is given certain clues (an
example is encrypted information).
b. An extremely complex molecule that occurs in human chromosomes and that carries
genetic information in the form of genes.
c. It is the process of extracting implicit, previously unknown, and potentially useful
information from data
d. None of the above

Ans: C

Q8. Which one of the following models regularities or trends for objects whose behavior
changes over time?

a. Prediction
b. Evolution analysis
c. Classification
d. Both A and B

Ans: B

Q9. Issues such as "handling relational and complex types of data" come under which of the
following categories?

a. Diverse data type issues
b. Mining methodology and user interaction issues
c. Performance issues
d. All of the above

Ans: A

Q10. Which of the following is used as the first step in the knowledge discovery process?

a. Data selection
b. Data cleaning
c. Data transformation
d. Data integration

Ans: B

Q11. Which of the following refers to the step of the knowledge discovery process in which
several data sources are combined?

a. Data selection
b. Data cleaning
c. Data transformation
d. Data integration

Ans: D

Q12. Which one of the following issues must be considered before investing in data mining?

a. Compatibility
b. Functionality
c. Vendor consideration
d. All of the above

Ans: D

Q14. To remove noise and inconsistent data ____ is needed.

(a) Data Cleaning
(b) Data Transformation
(c) Data Reduction
(d) Data Integration

Ans: A

Q15. _____ studies the collection, analysis, interpretation or explanation, and presentation of data.

(a) Statistics
(b) Visualization
(c) Data Mining
(d) Clustering

Ans: A

Q16. _____ investigates how computers can learn (or improve their performance) based on data.

(a) Machine Learning
(b) Artificial Intelligence
(c) Statistics
(d) Visualization

Ans: A

Q18. ____ is the science of searching for documents or information in documents.

(a) Information Storage

Ans: B

Q19. The data mining process should be highly ______

(a) Ongoing
(b) Active
(c) Interactive
(d) Flexible

Ans: C

Q20. In a multidimensional view of data mining, the major dimensions are
data, knowledge, technologies, and _____

(a) Methods
(b) Applications
(c) Tools
(d) Files

Ans: B

Q21. An _____ is a data field, representing a characteristic or feature of a data object.

(a) Method
(b) Variable
(c) Task
(d) Attribute

Ans: D

Q22. The values of a _____ attribute are symbols or names of things.

(a) Ordinal
(b) Nominal
(c) Ratio
(d) Interval

Ans: B

Q23. “Data about data” is referred to as _____

(a) Information
(b) Database
(c) Metadata
(d) File

Ans: C

Q24. Patterns that can be discovered from a given database are of which type?

a) More than one type
b) Multiple types always
c) One type only
d) No specific type

Ans: A

Q25. _____ is not a data mining functionality.

A) Clustering and Analysis
B) Selection and interpretation
C) Classification and regression
D) Characterization and Discrimination

Ans: B

Q26. Data mining can also be applied to which of the following forms of data?

a) Data streams & Sequence data
b) Networked data
c) Text & Spatial data
d) All of these

Ans: D

Q27. _____ is the output of KDD.

a) Query
b) Useful Information
c) Data
d) Information

Ans: B

Q28. What is noise?

a) A component of a network
b) In the context of KDD and data mining, random errors in the data
c) An aspect of a data warehouse
d) None of these

Ans: B

Q29. Which of the following does not belong to data mining?

(A). Knowledge extraction
(B). Data transformation
(C). Data exploration
(D). Data archaeology

Ans: B

Q30. The type of learning used to infer a model from labeled
training data is called _____

(A). Unsupervised learning
(B). Reinforcement learning
(C). Supervised learning
(D). Machine Learning

Ans: C

Unit – II

Q1. What is the type of relationship in a star schema?

a. Many-to-many
b. One-to-one
c. Many-to-one
d. One-to-many

Ans: D

Q2. Fact tables are _______.

a. Completely denormalized
b. Partially denormalized
c. Completely normalized
d. Partially normalized

Ans: C

Q3. Which is NOT a basic conceptual schema in data modeling of data warehouses?

a. Star schema
b. Tree schema
c. Snowflake schema
d. Fact constellations

Ans: B

Q4. Which is NOT a valid layer in the conceptual view of the three-layer data warehouse
architecture?

a. Processed data layer
b. Real-time data layer
c. Derived data layer
d. Reconciled data layer

Ans: A

Q5. Which of the following is not a correct type of fact table?

a. Fact-less fact table
b. Transaction fact table
c. Integration fact table
d. Aggregate fact table

Ans: C

Q6. Which of the following is not a characteristic of a data warehouse?

a. Integrated
b. Volatile
c. Time-variant
d. Subject oriented

Ans: B

Q7. Which of the following is not considered an issue in data warehousing?

a. Optimization
b. Data transformation
c. Extraction
d. Intermediation

Ans: D

Q8. Which one is NOT considered a standard query technique?

a. Drill-up
b. Drill-across
c. DSS
d. Pivoting

Ans: C

Q9. Which of the following is not a type of business data?

a. Real-time data
b. Application data
c. Reconciled data
d. Derived data

Ans: B

Q10. A data warehouse is which of the following?

a. Can be updated by end users.
b. Contains numerous naming conventions and formats.
c. Organized around important subject areas.
d. Contains only current data.

Ans: C

Q11. A snowflake schema consists of which of the following types of tables?

a. Fact
b. Dimension
c. Helper
d. All of the above

Ans: D

Q12. The extract process is which of the following?

a. Capturing all of the data contained in various operational systems
b. Capturing a subset of the data contained in various operational systems
c. Capturing all of the data contained in various decision support systems
d. Capturing a subset of the data contained in various decision support systems

Ans: B

Q13. The generic two-level data warehouse architecture includes which of the
following?

a. At least one data mart
b. Data that can be extracted from numerous internal and external sources
c. Near real-time updates
d. All of the above

Ans: B

Q14. Which one is correct regarding MOLAP?

a. Data is stored and fetched from the main data warehouse.
b. Uses complex SQL queries to fetch data from the main warehouse.
c. Large volume of data is used.
d. All are incorrect

Ans: A

Q15. In the OLAP model, the _____ provides the multidimensional view.

a. Data layer
b. Data link layer
c. Presentation layer
d. Application layer

Ans: A

Q16. Which of the following is not true regarding characteristics of warehoused data?

a. Changed data will be added as new data
b. A data warehouse can contain historical data
c. Obsolete data are discarded
d. Users can change data once entered into the data warehouse

Ans: D

Q17. Which is the core of the multidimensional model that consists of a large
set of facts and a number of dimensions?

a. Multidimensional cube
b. Data model
c. Data cube
d. None of the above

Ans: C

Q18. Which of the following statements is incorrect?

a. ROLAPs have large data volumes
b. The data form of ROLAP is a large multidimensional array made of cubes
c. MOLAP uses sparse matrix technology to manage data sparsity
d. Access for MOLAP is faster than ROLAP

Ans: B

Q19. Which of the following standard query techniques increases the granularity?

a. Roll-up
b. Drill-down
c. Slicing
d. Dicing

Ans: B

Q20. The full form of OLAP is

a. Online Analytical Processing
b. Online Advanced Processing
c. Online Analytical Performance
d. Online Advanced Preparation

Ans: A

Q21. _____ is a standard query technique that can be used within OLAP to zoom in
to more detailed data by changing dimensions.

a. Drill-up
b. Drill-down
c. Pivoting
d. Drill-across

Ans: B

Q22. The output of an OLAP query is displayed as a

a. Pivot
b. Matrix
c. Excel
d. Both B and C

Ans: B

Q23. A _____ combines facts from multiple processes into a single fact table and
eases the analytic burden on BI applications.

a. Aggregate fact table
b. Consolidated fact table
c. Transaction fact table
d. Accumulating snapshot fact table

Ans: B

Q24. In OLAP operations, slicing is the technique of ____

a. Selecting one particular dimension from a given cube and providing a
new sub-cube
b. Selecting two or more dimensions from a given cube and providing a
new sub-cube
c. Rotating the data axes in order to provide an alternative presentation of data
d. Performing aggregation on a data cube

Ans: A
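
Note on Q24: the slice operation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cube is modeled as a plain dictionary keyed by (time, item, location), and the data, the DIMENSIONS tuple, and the helper name slice_cube are all hypothetical. Real OLAP servers expose slicing through their own query interfaces.

```python
# Hypothetical sales cube: (time, item, location) -> units sold.
cube = {
    ("Q1", "phone", "Delhi"): 10,
    ("Q1", "laptop", "Delhi"): 4,
    ("Q2", "phone", "Delhi"): 7,
    ("Q2", "phone", "Mumbai"): 5,
}

# Assumed dimension order of the keys above.
DIMENSIONS = ("time", "item", "location")


def slice_cube(cube, dimension, value):
    """Fix one dimension to a single value and return the remaining sub-cube."""
    axis = DIMENSIONS.index(dimension)
    return {
        key[:axis] + key[axis + 1:]: measure
        for key, measure in cube.items()
        if key[axis] == value
    }


# Slice on time = "Q1": the result is a two-dimensional (item, location) sub-cube.
q1_slice = slice_cube(cube, "time", "Q1")
print(q1_slice)
```

Dicing (option b) would apply the same kind of filter on two or more dimensions at once.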

Q25. Focusing on the modeling and analysis of data for decision makers,
not on daily operations or transaction processing, is known as _____

a. Integrated
b. Time-variant
c. Subject oriented
d. Non-volatile

Ans: C

Q26. Which one is not a type of fact?

a. Fully additive
b. Cumulative additive
c. Semi-additive
d. Non-additive

Ans: B

Q27. _____ refers to the currency and lineage of data in a data warehouse

a. Operational metadata
b. Business metadata
c. Technical metadata
d. End-user metadata

Ans: A

Q28. Which of the following correctly refers to the term "Data Independence"?

a. It means that the programs are not dependent on the logical attributes
b. It refers to that data that is defined separately, not included in the program
c. It means that the programs are not dependent on the physical attributes of data
d. Both A and C

Ans: D

Q29. A data warehouse is which of the following?

a. Can be updated by end users.
b. Contains numerous naming conventions and formats.
c. Organized around important subject areas.
d. Contains only current data.

Ans: C

Q30. Load and index is which of the following?

a. A process to reject data from the data warehouse and to create the necessary indexes
b. A process to load the data in the data warehouse and to create the necessary indexes
c. A process to upgrade the quality of data after it is moved into a data warehouse
d. A process to upgrade the quality of data before it is moved into a data warehouse

Ans: B

Unit – III

Q1. Which of the following statements is incorrect about hierarchical clustering?

a. Hierarchical clustering is also known as HCA
b. The choice of an appropriate metric can influence the shape of the clusters
c. In general, the splits and merges both are determined in a greedy manner
d. All of the above

Ans: A

Q2. Which one of the following statements about k-means clustering is incorrect?

a. The goal of k-means clustering is to partition n observations into k clusters
b. K-means clustering can be defined as a method of quantization
c. Nearest neighbor is the same as k-means
d. All of the above

Ans: C

Q3. Which one of the following can be considered the final output of hierarchical
clustering?

a. A tree that displays how close things are to each other
b. Assignment of each point to clusters
c. Finalize estimation of cluster centroids
d. None of the above

Ans: A

Q4. Which one of the clustering techniques needs the merging approach?

a. Partitioned
b. Naïve Bayes
c. Hierarchical
d. Both A and C

Ans: C

Q5. Self-organizing maps can also be considered an instance of _________ learning.

a. Supervised learning
b. Unsupervised learning
c. Missing data imputation
d. Both A & C

Ans: B

Q6. The following statement can be considered an example of _________

Suppose one wants to predict the number of newborns according to the size of the stork
population by performing supervised learning

a. Structural equation modeling
b. Clustering
c. Regression
d. Classification

Ans: C

Q7. In the example predicting the number of newborns, the final number of total newborns can be
considered as the _________

a. Features
b. Observation
c. Attribute
d. Outcome

Ans: D

Q8. Which of the following statements is true about classification?

a. It is a measure of accuracy
b. It is a subdivision of a set
c. It is the task of assigning a classification
d. None of the above

Ans: B

Q9. Which of the following can be considered the classification or mapping of a set or class to
some predefined groups or classes?

a. Data set
b. Data Characterization
c. Data Sub Structure
d. Data Discrimination

Ans: D

Q10. The analysis performed to uncover interesting statistical correlations between
associated attribute-value pairs is known as _______.

a. Mining of association
b. Mining of correlation
c. Mining of clusters
d. All of the above

Ans: B

Q11. Which one of the following analyzes data objects that do not comply
with the general behavior (or the model) of the available data?

a. Evaluation Analysis
b. Outlier Analysis
c. Classification
d. Prediction

Ans: B

Q12. Which one of the following correctly defines the term cluster?

a. Group of similar objects that differ significantly from other objects
b. Symbolic representation of facts or ideas from which information can potentially be
extracted
c. Operations on a database to transform or simplify data in order to prepare it for a
machine-learning algorithm
d. All of the above

Ans: A

Q13. Which one of the following refers to the binary attribute?

a. This takes only two values. In general, these values will be 0 and 1, and they can be
coded as one bit
b. The natural environment of a certain species
c. Systems that can be used without knowledge of internal operations
d. All of the above

Ans: A

Q14. Which one of the following correctly refers to the task of classification?

a. A measure of the accuracy of the classification of a concept that is given by a certain
theory
b. The task of assigning a classification to a set of examples
c. A subdivision of a set of examples into a number of classes
d. None of the above

Ans: B

Q15. Which of the following correctly defines the term "Hybrid"?

a. Approach to the design of learning algorithms that is structured along the lines of the
theory of evolution.
b. Decision support systems that contain an information base filled with the knowledge of
an expert formulated in terms of if-then rules.
c. Combining different types of method or information
d. None of these

Ans: C

Q16. The Euclidean distance measure can also be defined as ___________

a. The process of finding a solution for a problem simply by enumerating all possible
solutions according to some predefined order and then testing them
b. The distance between two points as calculated using the Pythagorean theorem
c. A stage of the KDD process in which new data is added to the existing selection.
d. All of the above

Ans: B
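
Note on Q16: the Pythagorean definition of Euclidean distance (option b) can be sketched as below. This is a minimal illustration; the function name euclidean_distance is an assumption made here, not something defined by the question bank.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two points with the same number of coordinates."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    # Square the difference along each axis, sum, and take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# The legs 3 and 4 of a right triangle give a hypotenuse of 5.
print(euclidean_distance((0, 0), (3, 4)))
```

The same formula extends to any number of dimensions, which is why it is a common default distance in clustering methods such as k-means.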

Q17. Which one of the following correctly refers to the class under study in data characterization?

a. Final class
b. Study class
c. Target class
d. Both A and C

Ans: C

Q18. Which of the following refers to a sequence of patterns that occurs frequently?

a. Frequent sub-sequence
b. Frequent sub-structure
c. Frequent sub-items
d. All of the above

Ans: A

Q19. Which one of the following models regularities or trends for objects whose behavior
changes over time?

a. Prediction
b. Evolution analysis
c. Classification
d. Both A and B

Ans: B

Q20. A telecommunication company wants to segment its customers
into distinct groups in order to send appropriate subscription offers. This is
an example of

A. Supervised learning
B. Data extraction
C. Serration
D. Unsupervised learning

Ans: D

Q21. Self-organizing maps are an example of

A. Unsupervised learning
B. Supervised learning
C. Reinforcement learning
D. Missing data imputation

Ans: A

Q22. Background knowledge refers to

A. An additional acquaintance used by a learning algorithm to facilitate the
learning process
B. A neural network that makes use of a hidden layer
C. It is a form of automatic learning.
D. None of these

Ans: A

Q23. Case-based learning is

A. A class of learning algorithm that tries to find an optimum classification of a set
of examples using the probabilistic theory.
B. Any mechanism employed by a learning system to constrain the search
space of hypothesis
C. An approach to the design of learning algorithms that are inspired by the fact
that when people encounter new situations, they often explain them by reference
to familiar experiences, adapting the explanations to fit the new situation.
D. None of these

Ans: C

Q24. Classification is

A. A subdivision of a set of examples into a number of classes
B. A measure of the accuracy of the classification of a concept that is given by a certain theory
C. The task of assigning a classification to a set of examples
D. None of these

Ans: A

Q25. Classification accuracy is

A. A subdivision of a set of examples into a number of classes
B. A measure of the accuracy of the classification of a concept that is given by a certain theory
C. The task of assigning a classification to a set of examples
D. None of these

Ans: B

Q26. A cluster is

A. Group of similar objects that differ significantly from other objects
B. Operations on a database to transform or simplify data in order to prepare it
for a machine-learning algorithm
C. Symbolic representation of facts or ideas from which information can
potentially be extracted
D. None of these

Ans: A

Q27. A definition of a concept is _____ if it classifies any examples as coming
within the concept

A. Complete
B. Consistent
C. Constant
D. None of these

Ans: B

Q28. ............................ is a summarization of the general characteristics or features of a target class of
data.

A) Data Characterization
B) Data Classification
C) Data discrimination
D) Data selection

Ans: A

Q29. ............................. is a comparison of the general features of the target class data objects
against the general features of objects from one or multiple contrasting classes.

A) Data Characterization
B) Data Classification
C) Data discrimination
D) Data selection

Ans: C

Q30. ............................. is the process of finding a model that describes and distinguishes data classes
or concepts.

A) Data Characterization
B) Data Classification
C) Data discrimination
D) Data selection

Ans: B

Unit – IV

Q1. Which one of the following can be considered as the correct application of the data mining?

a. Fraud detection
b. Corporate Analysis & Risk management
c. Management and market analysis
d. All of the above

Ans: D

Q2. Data mining can also be applied to other forms such as ................

i) Data streams
ii) Sequence data
iii) Networked data
iv) Text data

A) i, ii, iii
B) ii, iii, iv
C) i, iii, iv
D) All i, ii, iii, iv

Ans: D

Q3. ___________ is the application of data mining techniques to discover patterns from the Web.

A. Text Mining
B. Multimedia Mining
C. Web Mining
D. Link Mining

Ans: C

Q4. Which of the following data mining techniques is used for optimization?

a. Artificial neural networks
b. If-then rule induction
c. Genetic algorithms
d. Decision trees

Ans: C

Q5. Clickstream data is used for which of the following?

a. To track the user activity on the web page
b. To study customer buying patterns
c. To provide feedback on website design
d. All the above

Ans: D

Q6. Which of the following is a private network used to access data through the web?

a. Internet
b. Extranet
c. Intranet
d. None of the above

Ans: C

Q7. Web-enabling the data warehouse uses which of the following as the information delivery mechanism?

a. Web technology
b. Grid computing
c. Artificial intelligence
d. None of these

Ans: A

Q8. A web house is what kind of network?

a. Distributed system
b. Client and server only
c. Parallel system
d. None of the above

Ans: A

Q9. Which of the following is an open-source data mining tool?

a. Clementine
b. Intelligent Miner
c. Weka3
d. Enterprise Miner

Ans: C

Q10. Which of the following is an open-source ETL tool?

a. Clover
b. SAS Data Integrator
c. Cognos Decision Stream
d. Microsoft DTS

Ans: A

Q11. The data set {brown, black, blue, green, red} is an example of:

a. Continuous attribute

b. Ordinal attribute

c. Numeric attribute

d. Nominal attribute

Ans: D

Q12. Data Visualization in mining cannot be done using:

a. Photos

b. Graphs

c. Charts

d. Information Graphics

Ans: A

Q13. Dimensionality reduction reduces the data set size by removing _________:

a. composite attributes
b. derived attributes

c. relevant attributes

d. irrelevant attributes

Ans: D

Q14. Identify the example of sequence data:

a. weather forecast

b. data matrix

c. market basket data

d. genomic data

Ans: D

Q15. To detect fraudulent usage of credit cards, which of the following data mining tasks should be used?

a. Outlier analysis

b. prediction

c. association analysis

d. feature selection

Ans: A

Q16. Which of the following is NOT an example of an ordinal attribute?

a. Zip codes

b. Ordered numbers

c. Movie ratings

d. Military ranks

Ans: A

Q17. Which data mining task can be used for predicting wind velocities as a function of temperature,
humidity, air pressure, etc.?

a. Cluster Analysis

b. Regression

c. Classification

d. Sequential pattern discovery

Ans: B

Q18. In an asymmetric attribute

a. No value is considered important over other values

b. All values are equals

c. Only non-zero value is important

d. Range of values is important

Ans: C

Q19. Which statement is not TRUE regarding a data mining task?

a. Clustering is a descriptive data mining task

b. Classification is a predictive data mining task

c. Regression is a descriptive data mining task

d. Deviation detection is a predictive data mining task


Ans: C

Q20. Identify the example of a nominal attribute:

a. Temperature

b. Salary

c. Mass

d. Gender

Ans: D

Q21. Nominal and ordinal attributes can be collectively referred to as_________ attributes

a. perfect

b. qualitative

c. consistent

d. optimized

Ans: B

Q22. Which of the following is not a data mining task?

a. Feature Subset Detection

b. Association Rule Discovery

c. Regression

d. Sequential Pattern Discovery

Ans: A

Q23. Which of the following is an Entity identification problem?

a. One person with different email address

b. One person’s name written in different way

c. Title for person

d. One person with multiple phone numbers

Ans: B

Q24. In binning, we first sort the data and partition it into (equal-frequency) bins. After that,
which of the following is not a valid smoothing step?

a. smooth by bin boundaries

b. smooth by bin median

c. smooth by bin means

d. smooth by bin values

Ans: D
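
Note on Q24: the three valid smoothing steps (options a to c) can be sketched as follows. This is a minimal sketch with assumptions: the price list, the bin size of 3, and all function names are illustrative, and ties in boundary smoothing are sent to the lower boundary purely by choice here.

```python
from statistics import mean, median


def equal_frequency_bins(values, bin_size):
    """Sort the data and split it into consecutive bins of bin_size values."""
    ordered = sorted(values)
    return [ordered[i:i + bin_size] for i in range(0, len(ordered), bin_size)]


def smooth_by_means(bins):
    """Replace every value in a bin by the bin mean."""
    return [[mean(b)] * len(b) for b in bins]


def smooth_by_medians(bins):
    """Replace every value in a bin by the bin median."""
    return [[median(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace every value by the closer of the bin's minimum and maximum.

    Ties go to the lower boundary (an arbitrary choice in this sketch).
    """
    return [[b[0] if v - b[0] <= b[-1] - v else b[-1] for v in b] for b in bins]


# Illustrative data only.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
bins = equal_frequency_bins(prices, 3)

print(bins)
print(smooth_by_means(bins))
print(smooth_by_medians(bins))
print(smooth_by_boundaries(bins))
```

There is no comparable "smooth by bin values" step, since the raw values are exactly what smoothing is meant to replace.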

Q25. Correlation analysis is used for

a. handling missing values

b. identifying redundant attributes

c. handling different data formats

d. eliminating noise

Ans: B
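
Note on Q25: one common way to spot a redundant attribute is to compute the Pearson correlation coefficient between two numeric attributes. The sketch below is illustrative only; the temperature data and the 0.9 cut-off are assumptions, not values taken from the question bank, and the function does not handle constant attributes (zero variance).

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical attributes: the same temperature stored in two units.
celsius = [0, 10, 20, 30]
fahrenheit = [32, 50, 68, 86]

r = pearson_correlation(celsius, fahrenheit)
REDUNDANCY_THRESHOLD = 0.9  # assumed cut-off for this illustration
print(r, abs(r) >= REDUNDANCY_THRESHOLD)
```

A coefficient close to +1 or -1 suggests that one attribute can be derived from the other, so keeping both adds little information.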

Q26. Which of the following is NOT a data quality related issue?

a. Missing values

b. Outlier records

c. Duplicate records

d. Attribute value range

Ans: D

Q27. Which of the following is not a Data discretization Method?

a. Histogram analysis

b. Cluster Analysis

c. Data compression

d. Binning

Ans: C

Q28. Which of the following data mining task is known as Market Basket Analysis?

a. Association Analysis

b. Regression

c. Classification

d. Outlier Analysis

Ans: A
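
Note on Q28: market basket analysis typically scores a rule such as "bread implies milk" by its support and confidence. The sketch below assumes a tiny, made-up list of transactions and hypothetical helper names; it shows the two measures only and is not a full frequent-itemset algorithm such as Apriori.

```python
# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by support of antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(rule_support, rule_confidence)
```

Here bread and milk appear together in 2 of 4 baskets, and 2 of the 3 baskets containing bread also contain milk.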

Q29. Data mining can also be applied to which of the following forms of data?

a) Data streams & Sequence data
b) Networked data
c) Text & Spatial data
d) All of these

Ans: D

Q30. Firms that are engaged in sentiment mining are analyzing data collected from?
A. social media sites.
B. in-depth interviews.
C. focus groups.
D. experiments.

Ans: A
