Document 4
Ans: Historical
Ans: Metadata
Ans: Data
4. ___ and ___ are the key to emerging Business Intelligence technologies.
6. Online Analytical Processing (OLAP) is a technology that is used to create ___ software.
Ans: Multiple
Ans: True
9. ___ Optimization techniques are based on the concepts of genetic combination, mutation, and
natural selection.
11. A data warehouse refers to a database that is maintained separately from an organization’s
operational databases. (True/False)
Ans: True
Ans: True
13. ___ system is customer-oriented and is used for transaction and query processing by clerks,
clients, and information technology professionals.
Ans: OLTP
15. In ___ schema some dimension tables are normalized, thereby further splitting the data into
additional tables.
Ans: Snowflake
16. The ___ data model is commonly used in the design of relational databases.
Ans: Entity-relationship
17. Data warehouses and OLAP tools are based on ___ data model.
Ans: Multidimensional
18. The ___ exposes the information being captured, stored, and managed by operational systems.
21. The ___ software gives the user the opportunity to look at the data from a variety of different
dimensions.
A. Converting data into knowledge and making it available throughout the organization
B. Analytical software and solutions for gathering, consolidating, analyzing and providing access to
information in a way that is supposed to let the users of an enterprise make better business decisions.
C. Both A & B
23. Based on the overall requirements of business intelligence, the ___ layer is required to extract,
cleanse and transform data into load files for the information warehouse.
Ans: True
Ans: Noise
II. BI can help companies share selected strategic information with business partners.
III. “BI 2.0” is used to describe the acquisition, provision and analysis of “real-time” data.
Ans: D.
27. ___ routines attempt to fill in missing values, smooth out noise while identifying outliers, and
correct inconsistencies in the data.
28. ___ is used to refer to systems and technologies that provide the business with the means for
decision-makers to extract personalized meaningful information about their business and industry.
29. In ___ each value in a bin is replaced by the mean value of the bin.
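The replacement described in question 29 can be illustrated with a minimal sketch. It assumes equal-frequency bins formed from the sorted values; the function name, the bin size, and the sample data are illustrative assumptions, not part of the question.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size items,
    and replace every value by the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed


# Illustrative data: nine values split into three bins of three.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))
# -> [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```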
30. ___ regression involves finding the “best” line to fit two variables so that one variable can be used to
predict the other.
Ans: Linear
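As a rough illustration of finding the “best” line for two variables, the sketch below uses ordinary least squares, which is one common reading of “best”. The function name and the sample data are assumptions made for the example.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)

# Use the fitted line to predict y for a new x.
predicted = slope * 5 + intercept
print(slope, intercept, predicted)
```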
31. ___ works to remove the noise from the data that includes techniques like binning, clustering, and
regression.
Ans: Smoothing
33. The ___ technique uses encoding mechanisms to reduce the data set size.
B. Numerosity reduction
C. Data compression
D. Dimension reduction
35. ___ hierarchies can be used to reduce the data by collecting and replacing low-level concepts by
higher-level concepts.
Ans: Concept
36. The ___ rule can be used to segment numeric data into relatively uniform, “natural” intervals.
Ans: 3-4-5
Ans: DBMS
38. A Database Management System (DBMS) supports query languages. (True/False)
Ans: True
39. The ___ itemsets are all sets of items (itemsets) whose support is greater than the user-specified
minimum support, σ.
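The support test in question 39 can be sketched with a brute-force enumeration. This is only a small illustration, not an efficient mining algorithm; the function name, the transactions, and the threshold are assumptions made for the example.

```python
from itertools import combinations


def itemsets_above_min_support(transactions, min_support):
    """Return {itemset: support} for every itemset whose support
    (fraction of transactions containing it) exceeds min_support."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            candidate_set = set(candidate)
            support = sum(1 for t in transactions if candidate_set <= t) / n
            if support > min_support:
                result[candidate] = support
    return result


# Illustrative transaction database.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
for itemset, support in itemsets_above_min_support(transactions, 0.4).items():
    print(itemset, support)
```

Enumerating every subset grows exponentially with the number of items, which is why level-wise algorithms with a pruning step are used in practice.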
41. ___ techniques are used to detect relationships or associations between specific values of
categorical variables in large data sets.
43. Using a decision tree, only categorical variables would be modelled. (True/False).
Ans: False
Ans: False
46. For a given transaction database T, a ___ is an expression of the form X => Y, where X and Y are
subsets of A, and X => Y holds with confidence τ if τ% of the transactions in T that support X also support Y.
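The confidence condition in question 46 can be computed directly from the transaction database. The sketch below is a minimal illustration; the function names and the sample transactions are assumptions, not taken from the question.

```python
def support_count(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def confidence(transactions, x, y):
    """Fraction of the transactions supporting X that also support Y."""
    x_count = support_count(transactions, x)
    if x_count == 0:
        raise ValueError("no transaction supports X")
    return support_count(transactions, x | y) / x_count


# Illustrative transaction database.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
# Two of the three transactions containing bread also contain milk.
print(confidence(transactions, {"bread"}, {"milk"}))
```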
47. The ___ rule describes associations between quantitative items or attributes.
48. The ___ step eliminates the extensions of (k-1)-itemsets, which are not found to be frequent, from
being considered for counting support.
Ans: Pruning
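A minimal sketch of the pruning step from question 48 follows, assuming the usual level-wise setting in which a k-item candidate is kept only if every one of its (k-1)-item subsets is frequent. The function name and the sample itemsets are illustrative.

```python
from itertools import combinations


def prune(candidates, frequent_prev):
    """Drop any k-item candidate that has a (k-1)-item subset
    missing from frequent_prev, so its support is never counted."""
    kept = []
    for candidate in candidates:
        k = len(candidate)
        subsets = (frozenset(s) for s in combinations(candidate, k - 1))
        if all(subset in frequent_prev for subset in subsets):
            kept.append(candidate)
    return kept


# Illustrative frequent 2-itemsets and candidate 3-itemsets.
frequent_2 = {
    frozenset({"A", "B"}),
    frozenset({"A", "C"}),
    frozenset({"B", "C"}),
    frozenset({"B", "D"}),
}
candidates_3 = [frozenset({"A", "B", "C"}), frozenset({"A", "B", "D"})]

# {A, B, D} is pruned because its subset {A, D} is not frequent.
print(prune(candidates_3, frequent_2))
```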
49. In the first phase of the Partition algorithm, the algorithm logically divides the database into a
number of ___.
51. ___ algorithm works like a train running over the data, with stops at intervals M between
transactions. When the train reaches the end of the transaction file, it completes one pass.
Ans: Two
54. Data mining systems should provide capabilities to mine association rules at multiple levels of
abstraction and traverse easily among different abstraction spaces (True/False).
Ans: True
55. Which one of the following is an alternative search strategy for mining multiple-level associations with
reduced support?
b) Equidepth binning
d) Equilength binning
57. Association rules that involve two or more dimension or predicates can be referred to as ___.
58. An algorithm that performs a series of “walks” through itemset space is called a ___.
Ans: variance
61. The process of grouping a set of physical or abstract objects into classes of similar objects is called
___.
Ans: Cluster
Ans: Segmentation
a. Segmentation
b. Compression
Ans: True
Ans: By example
66. The weight and height of an individual fall into the ___ kind of variables.
Ans: Continuous
67. In the K-means algorithm for partitioning, each cluster is represented by the ___ of objects in the
cluster.
Ans: Means
68. K-means clustering requires prior knowledge of the number of clusters as its
input. (True/False)
Ans: True
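Questions 67 and 68 can be illustrated together: the number of clusters is fixed up front by the initial centroids, and each cluster is then represented by the mean of its objects. The sketch below is a bare-bones version with squared Euclidean distance; the function name, the starting centroids, and the sample points are assumptions made for the example.

```python
def k_means(points, centroids, iterations=10):
    """Assign each point to its nearest centroid, then move each centroid
    to the mean of its cluster; stop early once the centroids settle."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                mean = tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Illustrative data: two visibly separate groups, so k = 2.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, [(1, 1), (8, 8)])
print(centroids)
print(clusters)
```

Real implementations usually pick the initial centroids at random and rerun several times, since the result depends on the starting positions.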
Ans: Clustering
70. ___ software provides a set of partitional clustering algorithms that treat the clustering problem as
an optimization process.
Ans: CLUTO
Ans: Two
72. ___ can be viewed as the construction and use of a model to assess the class of an unlabeled sample,
or to assess the value or value ranges of an attribute that a given sample is likely to have.
Ans: Prediction
73. ___ of data removes or reduces noise (by applying smoothing techniques) and handles
missing values.
Ans: Pre-processing
74. ___ method refers to the ability to construct the model efficiently given a large amount of data.
Ans: Scalability
Ans: This is a flowchart-like tree structure, where each internal node denotes a test on an
attribute, each branch represents an outcome of the test, and leaf nodes represent classes or class
distributions.
76. The basic algorithm for decision tree induction is a ___ algorithm.
Ans: greedy
77. The ___ measure is used to select the test attribute at each node in the tree.
79. ___ are simple text files that are automatically generated every time someone accesses a website.
82. Which of the following techniques is concerned with user navigation and access?
a. Structured data
b. Un-structured data
d. Binary data
84. ___ Web mining involves the development of sophisticated artificial intelligence systems.
85. The ___ approaches to Web mining have generally focused on techniques for integrating and
organizing the heterogeneous and semi-structured data on the Web into more structured and high-level
collections of resources.
Ans: database
86. Association rules involving multimedia objects can be mined in ___ and ___ databases.
88. Which of the following are measures used to evaluate text retrieval?
a. Precision
b. Recall
c. F-score
d. a, b, c
Ans: d. a, b, c
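The three measures in question 88 can be computed from the set of retrieved documents and the set of relevant documents. The sketch below uses the balanced F-score (harmonic mean of precision and recall); the function name and the document identifiers are illustrative assumptions.

```python
def precision_recall_f(retrieved, relevant):
    """Return (precision, recall, F-score) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Precision: share of retrieved documents that are relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: share of relevant documents that were retrieved.
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Illustrative query result: 4 documents retrieved, 3 relevant overall, 2 in common.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d3", "d5"]
print(precision_recall_f(retrieved, relevant))
```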
Ans: Semi-structured
90. Which of the following is the first step in text retrieval systems?
a. Stemming
c. Tokenization
Ans: c. Tokenization
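As a rough illustration of the tokenization step in question 90, the sketch below lowercases the text and splits it into runs of letters and digits. The regular expression is a deliberately simple assumption; production tokenizers handle punctuation, hyphens, and other languages more carefully.

```python
import re


def tokenize(text):
    """Lowercase the text and return its alphanumeric tokens in order."""
    return re.findall(r"[a-z0-9]+", text.lower())


print(tokenize("Data mining, the Web!"))
# -> ['data', 'mining', 'the', 'web']
```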
a. A
b. The
c. of
d. a, b, c
Ans: d. a, b, c
93. Insurance and direct mail are two industries that rely on ___ to make profitable business decisions.
94. To aid decision-making, analysts construct ___ models using warehouse data to predict the
outcomes of a variety of decision alternatives.
Ans: predictive
95. A ___ profile is a model that predicts the future purchasing behaviour of an individual customer,
given historical transaction data for both the individual and for the larger population of all of a particular
company’s customers.
Ans: predictive
96. Data mining can be used to help predict future patient behaviour and to improve treatment
programs (True/False).
Ans: True
98. Data mining in the telecommunication industry helps to understand the business involved and identify
telecommunication patterns (True/False).
Ans: True
100. ___ is proving to be a critical link between theory, simulation, and experiment.
101. IDS are based on ___ that are developed by the manual encoding of expert knowledge.
a) Efficiency
b) Quality of data
c) Marketing
103. To improve accuracy, data mining programs are used to analyze audit data and extract features
that can distinguish normal activities from intrusions. (True/False)
Ans: True
104. Data mining-based IDSs (especially anomaly detection systems) have higher false-positive rates
than traditional handcrafted signature-based methods. (True/False)
Ans: True
105. ___ is a new class of intrusion detection algorithms that do not rely on labelled data.
106. ___ algorithm uses the frequency distribution of each feature’s values to proportionally generate a
sufficient amount of anomalies.
107. OLAP typically includes the following kinds of analyses: simple, comparison, trend, ___ and ___.
108. Patient Rule Induction Method (PRIM) and Weighted Item Sets (WIS) are types of ___ technique.
Ans: OLAP
110. ___ method is useful for finding patterns or associations between attributes.
Ans: WIS