
1. Data Mining is [ ]
A) A subject-oriented, integrated, time-variant, non-volatile collection of data in support of
management
B) The stage of selecting the right data for a KDD process
C) The actual discovery phase of a knowledge discovery process
D) None of these
2. Principal Component Analysis can be used for [ ]
A) Data Integration B) Data Cleaning C) Data Discretization D) Data Reduction
3. Smoothing Techniques are [ ]
A) Binning B) Aggregation C) Attribute Creation D) All of the above
4. Extracting knowledge from large amounts of data is called _________ [ ]
A) Warehousing B) Data Mining C) Database D) Cluster
5. _______ is a summarization of the general characteristics or features of a target class of
data. [ ]
A) Concept hierarchies B) Classification C) Characterization D) Association analysis
6. The leaf nodes in a Decision Tree represent [ ]
A) Attributes B) Class Labels C) Both A and B D) None of the above
7. Association rules are discarded as uninteresting if they do not satisfy [ ]
A) Minimum Support threshold B) Minimum Confidence threshold
C) Both A and B D) None of the above
8. Treating incorrect or missing data is called ___________ [ ]
A) Selection B) Cleaning C) Transformation D) Interpretation
9. ______ analyzes data objects without consulting a known class label. [ ]
A) Classification B) Clustering C) Association analysis D) Characterization
10. Normalization Techniques are [ ]
A) Min-Max B) Z-Score C) Decimal Scaling D) All of the above
11. ____________ maps data into predefined groups. [ ]
A) Regression B) Time series analysis C) Prediction D) Classification
12. OLAP stands for [ ]
A) Online Academic Planning B) Online Analytical Processing
C) Offline Analytical Processing D) Offline Agricultural Planning
13. Removing noise is called ___________ [ ]
A) Selection B) Cleaning C) Transformation D) Interpretation
14. The leaf nodes of a decision tree represent ____________ [ ]
A) Attributes B) Noisy Values C) Attribute Values D) Class Labels
15. The problem of finding hidden structure in unlabeled data is called [ ]
A) Unsupervised learning B) Supervised learning C) Reinforcement learning D) None

16. _____ can be used to reduce the data by collecting and replacing low-level concepts with
higher-level concepts. [ ]
A) Concept hierarchies B) Classification C) Characterization D) Association analysis
17. Measures for pattern interestingness are [ ]
A) Confidence B) Support C) Both A and B D) None of the above
18. Support(A=>B) = _______________________ [ ]
A) P(A∪B) B) P(A) C) P(B) D) P(B|A)
19. Market Basket Analysis is an example for [ ]
A) Classification B) Clustering C) Outlier Analysis D) Frequent pattern Analysis
20. Which of the following is a Classification algorithm? [ ]
A) Apriori B) Decision Tree C) FP-Growth D) All of the above

21. A _____________ is a repository of information collected from multiple sources, stored
under a unified schema.
22. Data objects that do not comply with the general behavior or model of the data are called
__________

23. ________________ technique can be used for Dimensionality Reduction.


24. Lift(A,B) = ______________
25. The two steps of the Apriori algorithm are ____________ and ___________
26. A ___________________ is a flowchart-like tree structure, where each internal node
denotes a test on an attribute, each branch represents an outcome of the test, and each leaf
node holds a class label.
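The flowchart-like structure described in item 26 can be sketched in Python as a nested dictionary; the attributes ("outlook", "humidity"), branch outcomes, and class labels below are invented purely for illustration.

```python
# Hypothetical hand-built decision tree mirroring item 26:
# internal nodes test an attribute, branches are outcomes of the test,
# and leaf nodes hold class labels.
tree = {
    "attribute": "outlook",                       # internal node: test on an attribute
    "branches": {                                 # each branch: one outcome of the test
        "sunny": {"attribute": "humidity",
                  "branches": {"high": {"label": "no"},      # leaf: class label
                               "normal": {"label": "yes"}}},
        "overcast": {"label": "yes"},
        "rainy": {"label": "no"},
    },
}

def classify(node, record):
    """Walk from the root to a leaf, following the branch chosen by each test."""
    while "label" not in node:
        node = node["branches"][record[node["attribute"]]]
    return node["label"]

print(classify(tree, {"outlook": "sunny", "humidity": "normal"}))  # prints "yes"
```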
27. Confidence(A=>B) = P(B|A) = _____________
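The three rule measures asked about in items 18, 24, and 27 can be checked on a toy transaction set. The sketch below is a minimal Python illustration; the transactions and the rule A => B are made up for the example.

```python
# Hypothetical toy market-basket data: each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def support(itemset):
    """support(X) = fraction of transactions that contain every item in X."""
    return sum(itemset <= t for t in transactions) / len(transactions)

A, B = {"bread"}, {"milk"}
sup_ab = support(A | B)                    # Support(A=>B) = P(A ∪ B)
conf = sup_ab / support(A)                 # Confidence(A=>B) = P(B|A)
lift = sup_ab / (support(A) * support(B))  # Lift(A,B) = P(A∪B) / (P(A)·P(B))

print(sup_ab, conf, lift)  # prints 0.6 0.75 0.9375
```

A lift below 1, as here, indicates the two itemsets are negatively correlated rather than genuinely associated.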
28. The two steps of the Apriori algorithm are ____________ and ___________
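Items 25 and 28 refer to the two steps Apriori repeats at each level: joining frequent (k-1)-itemsets into candidate k-itemsets, then pruning candidates that cannot be frequent. A minimal sketch, assuming a tiny made-up transaction set and support threshold:

```python
from itertools import combinations

# Hypothetical transactions and minimum support count (assumptions for illustration).
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b", "c"}]
min_count = 2

def apriori(transactions, min_count):
    """Return all frequent itemsets by alternating the join and prune steps."""
    items = sorted({i for t in transactions for i in t})
    count = lambda s: sum(s <= t for t in transactions)
    freq = [frozenset([i]) for i in items if count(frozenset([i])) >= min_count]
    all_freq, k = list(freq), 2
    while freq:
        # Join step: merge frequent (k-1)-itemsets to form candidate k-itemsets.
        candidates = {a | b for a in freq for b in freq if len(a | b) == k}
        # Prune step: discard candidates with an infrequent (k-1)-subset
        # (the Apriori property), then keep those meeting the support threshold.
        candidates = {c for c in candidates
                      if all(frozenset(s) in freq for s in combinations(c, k - 1))}
        freq = [c for c in candidates if count(c) >= min_count]
        all_freq.extend(freq)
        k += 1
    return all_freq

frequent = apriori(transactions, min_count)
```

On this toy data every single item and every pair is frequent, but {a, b, c} is pruned by the support check, so six frequent itemsets are returned.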
29. The number of elements in a sequence is called the ___________ of the sequence
30. The ___________ of a classifier on a given test set is the percentage of test set tuples that
are correctly classified by the classifier.
31. ____________________ involves scaling all values for a given attribute so that they fall
within a small specified range, such as -1.0 to 1.0, or 0.0 to 1.0.
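Item 31 describes min-max normalization (one of the techniques listed in item 10). A small Python sketch of the rescaling, with the sample attribute values invented for illustration:

```python
# Hypothetical attribute values to be normalized.
values = [200.0, 300.0, 400.0, 600.0, 1000.0]

def min_max(v, new_min=0.0, new_max=1.0):
    """Min-max normalization: linearly map values into [new_min, new_max]."""
    lo, hi = min(v), max(v)
    return [(x - lo) / (hi - lo) * (new_max - new_min) + new_min for x in v]

scaled = min_max(values)
print(scaled)  # prints [0.0, 0.125, 0.25, 0.5, 1.0]
```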
32. An itemset X is known as a __________ itemset in a dataset S if there exists no proper
super-itemset Y such that Y has the same support count as X in S.
33. _______________ database stores sequences of ordered events
34. ________________ algorithm does not involve candidate generation to find frequent
items.
35. _________ is a random error or variance in a measured variable.
36. An itemset X is a _________ itemset in a data set S if there exists no proper super-itemset
Y such that Y has the same support count as X in S.
37. ___________________ involves finding the “best” line to fit two attributes (or variables),
so that one attribute can be used to predict the other.
38. ____________ and _______________ are the two steps involved in classification.
39. __________________ technique predicts a continuous-valued function.
