Data Mining and Warehousing Quizzes Compilation - Answer Key

Far Eastern University
Institute of Accounts, Business and Finance

Business Administration
Data Mining and Warehousing
Final Term – Quiz 1
Name: ___________________________________________________ Student Number: ______________________
Section / Time: _____________________________________________ Date: ________________________________
Instruction - Part 1: Write the word TRUE in the box provided if the statement is CORRECT; otherwise, write FALSE. Erasures are NOT
allowed; the final answer must be properly reflected in the box provided.
1. The graph-based association is used in various applications, such as social network analysis, recommendation systems, and fraud analysis.
Answer: FALSE - Detection

2. The Apriori algorithm is based on the concept that if an item set is frequent, then all of its subsets must also be frequent.
Answer: TRUE

3. Classification is a technique in data mining that involves categorizing or classifying data objects into predefined classes, categories, or groups based on their features or variables.
Answer: FALSE - Attributes

4. It is an important task in data mining because it enables organizations to make data-driven decisions.
Answer: TRUE

5. Regression analysis is a statistical methodology that is most often used for numeric prediction.
Answer: TRUE

6. Classification is a form of data mining that extracts models describing important data classes.
Answer: Analysis

7. Association can provide valuable insights into seller’s behavior and preferences.
Answer: FALSE - Consumer

8. Itemset correlation a collection of one or more items that frequently co-occur together is called an itemset.
Answer: FALSE - Association

9. Sequential association is used to identify patterns that occur in a general sequence or order.
Answer: FALSE - Specific

10. An itemset is a collection of one or more items.
Answer: TRUE
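
Item 2 above states the Apriori property: every subset of a frequent item set is also frequent. The following is a minimal sketch of that property only, not the full Apriori algorithm. The transactions are made up for illustration, and the helper names `support` and `all_subsets_frequent` are hypothetical.

```python
from itertools import combinations

# Toy transactions, invented for this illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def all_subsets_frequent(itemset, transactions, min_support):
    """Check that every non-empty proper subset meets `min_support`."""
    items = sorted(itemset)
    return all(
        support(subset, transactions) >= min_support
        for size in range(1, len(items))
        for subset in combinations(items, size)
    )


candidate = {"milk", "diapers", "beer"}
print(support(candidate, transactions))
print(all_subsets_frequent(candidate, transactions, min_support=0.4))
```

Because support can only stay the same or drop when an item is added, an algorithm built on this property can discard any candidate that has an infrequent subset without counting it.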
Instruction - Part 2: Write your answer (capital letter). Erasures are NOT allowed; the final answer must be properly reflected in the box
provided.
11. Eclat then performs a depth-first search on a ___, representing the dataset's frequent itemsets.
A. Neuron-like structure
B. Tree-like structure
C. Frequent-pattern like structure
D. Data-like structure
E. None of the above

12. The algorithm is efficient regarding both memory usage and runtime, especially for sparse datasets.
A. Equivalence Clustering Class and Bottom-up Lattice Traversal
B. FP-Growth Algorithm
C. Apriori Algorithm
D. None of the above

13. ___ refers to the statistical relationship between two or more variables, where the variation in one variable is associated with the variation in another variable.
A. Association
B. FP-Growth Algorithm
C. Apriori Algorithm
D. Correlation Analysis
E. None of the above

14. This is an example of:
A. Direct Correlation Analysis
B. Indirect Correlation Analysis
C. No Correlation Analysis
D. None of the above

15. By understanding the relationships between different variables, we can gain valuable insights into complex systems and make informed decisions based on ___.
A. Informed decisions
B. Data set decisions
C. Data-driven analysis
D. Data-driven decisions
E. None of the above

16. Pearson correlation measures the linear relationship between two ___ variables.
A. Discrete
B. Numeric
C. Continuous
D. Categorical
E. None of the above

17. The FP-Growth algorithm builds a compact representation of the dataset called a frequent pattern tree (FP-tree), which is used to mine ___.
A. Frequent pattern item sets
B. Infrequent item sets
C. Frequent item sets
D. Frequent item data sets
E. None of the above

18. ___ is a frequent itemset mining algorithm based on the vertical data format.
A. Equivalence Clustering Class and Bottom-up Lattice Traversal
B. Equivalence Classification Clustering and Bottom-down Lattice Traversal
C. Equivalence Class Clustering and Bottom-down Lattice Traversal
D. Equivalence Class Clustering and Bottom-up Lattice Traversal
E. None of the above

19. Correlation can be positive, negative, or zero, depending on the direction and strength of the relationship between the variables.
A. Attributes
B. Pattern
C. Data sets
D. Variables
E. None of the above

20. The FP-Growth algorithm can handle datasets with both ___.
A. Discrete and continuous attributes.
B. Discrete and continuous variables.
C. Discrete and numeric attributes.
D. Continuous and numeric variables
E. None of the above
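
Items 11 and 18 describe Eclat as a depth-first search over frequent itemsets stored in a vertical data format. A minimal sketch of that idea follows, assuming transactions are given as Python sets and the minimum support is an absolute count. The function name `eclat` and the toy data are illustrative assumptions, not taken from the quiz.

```python
from collections import defaultdict


def eclat(transactions, min_count):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    # Vertical format: map each item to the ids of the transactions containing it.
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # `candidates` holds (item, tidset) pairs that can extend `prefix`.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[frozenset(itemset)] = len(tids)
            # Intersect tidsets with the remaining items to grow the itemset.
            next_candidates = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_count:
                    next_candidates.append((other, shared))
            if next_candidates:
                extend(itemset, next_candidates)  # depth-first step

    start = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_count),
        key=lambda pair: pair[0],
    )
    extend(set(), start)
    return frequent


# Toy data, invented for this illustration.
baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
for itemset, count in eclat(baskets, min_count=3).items():
    print(sorted(itemset), count)
```

The support of a larger itemset is obtained by intersecting transaction-id sets rather than rescanning the dataset, which is the point of the vertical layout.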
Instruction - Part 3: Identification. Identify what is being asked in each of the following questions. If the question does not provide the correct definition,
write NA. Erasures are NOT allowed.
21. This is one of the benefits of correlation analysis wherein it can be used to identify correlations between different process variables and identify
potential sources of quality problems.
Answer: Quality control
22. Spearman Rank Correlation measures the degree of association between the ranks of two attributes.
Answer: NA
23. This is one of the benefits of correlation analysis wherein, by quantifying the degree and direction of the relationship, we can gain insights into how
changes in one variable are likely to affect the other.
Answer: Identifying Relationships
24. ___ in data analysis are two important techniques that can help uncover relationships and patterns in large datasets.
Answer: NA
25. ___ measures the linear relationship between two continuous variables.
Answer: Pearson correlation
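
Items 22 and 25 contrast Pearson correlation (linear relationship between continuous variables) with Spearman rank correlation (association between ranks). The sketch below computes both from first principles on made-up data. The function names `pearson`, `ranks`, and `spearman` are illustrative, and Spearman is computed here as Pearson applied to average ranks, which is one common formulation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up data: y grows monotonically with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print(pearson(x, y))
print(spearman(x, y))
```

On this monotonic but non-linear data the rank-based coefficient is higher than the Pearson coefficient, which illustrates why the two measures are treated as separate items.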
Quiz 5 and 6
Cluster Analysis
11. The quality of a clustering method depends on the similarity measure used by the method, its implementation, and its ability to discover all of the hidden patterns.
Answer: FALSE - some or

12. Weights should be associated with different variables based on applications and data semantics.
Answer: TRUE

13. Other popular partitioning methods in data mining include K-medoids, Fuzzy K-means, and Hierarchical K-means.
Answer: FALSE - Fuzzy-C

14. K-means partitions a dataset into K clusters, where K is a user-defined parameter.
Answer: TRUE

15. The refinement process involves calculating the mean of the data points assigned to each cluster and updating the cluster centroids' coordinates accordingly.
Answer: TRUE

16. Partitioning methods are irrelatively easy to implement and can handle large datasets.
Answer: FALSE - relatively

17. Clustering for data understanding and applications includes city-management.
Answer: FALSE - city-planning

18. There is always a separate “quality” function that measures the “goodness” of a cluster.
Answer: FALSE - usually

19. Clustering for data understanding and applications includes economic research.
Answer: FALSE - science

20. The most widely used partitioning method is the K-means algorithm, which randomly assigns data points to clusters and iteratively refines the clusters' medoids until convergence.
Answer: FALSE - centroids

21. Partitioning methods are also efficient in identifying natural clusters within data and can be used for various applications, such as customer segmentation, image segmentation, and anomaly detection.
Answer: FALSE - Effective

22. A good clustering method will produce high quality clusters that are cohesive within clusters and distinctive among clusters.
Answer: FALSE - between

23. The choice of the algorithm depends on the specific clustering problem and the dataset characteristics.
Answer: TRUE

24. Summarization, compression, finding k-means and outlier detection are considered as a preprocessing tool in clustering.
Answer: FALSE - k-nearest neighbors

25. A cluster analysis is a collection of data objects whether similar or dissimilar.
Answer: FALSE - Cluster

16 - 17: What is the two-step process in cluster analysis?
16 - finding similarities between data according to the characteristics found in the data.
17 - grouping similar data objects into clusters

18 - 20. Discuss the graph.
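
Items 14, 15, and 20 describe K-means as assigning points to K clusters and repeatedly moving each centroid to the mean of its assigned points until nothing changes. A minimal sketch of that loop follows. For reproducibility it assumes the initial centroids are passed in rather than chosen at random, and the function name `kmeans` and the sample points are illustrative.

```python
def kmeans(points, centroids, max_iter=100):
    """Basic K-means on tuples of floats; `centroids` are the initial guesses."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, clusters


# Made-up 2-D points forming two visible groups; K = 2.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.0), (8.0, 10.0)]
final_centroids, final_clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(final_centroids)
print(final_clusters)
```

The number of clusters is fixed by the length of the initial centroid list, which corresponds to the user-defined K in item 14.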
Outlier Analysis
21. Outliers can be caused by various factors, such as data entry errors, unexpected events, etc., and their detection can lead to valuable insights and
improve the accuracy of models.
Answer: TRUE
22. Outliers are also often referred to as abbreviations.
Answer: FALSE – aberrations
23. An outlier is the process of identifying and examining data points that significantly differ from the rest of the dataset.
Answer: Outlier analysis in data mining
24. Outlier can arise from various sources, such as measurement errors or data collection methods, and it can negatively affect the accuracy and
reliability of data analysis.
Answer: FALSE - Noise
25. Detecting and removing outliers can improve the accuracy and reliability of data mining.
Answer: FALSE - analysis
26. One of the benefits of outlier analysis in data mining is it can lead to better-informed decisions.
Answer: TRUE
27. Removing outliers or developing models that can handle them appropriately can improve data performance.
Answer: FALSE - model
28. Global outliers are typically detected using statistical methods focusing on the entire dataset's extreme values.
Answer: TRUE
29. If the data is continuous, statistical methods such as z-scores can be used, while for categorical data, methods such as the chi-squared test can be
used.
Answer: TRUE
30. An outlier analysis can be defined as a data point that deviates significantly from the normal pattern or behavior of the data.
Answer: FALSE – outlier only
31. One of the benefits of outlier analysis in data mining is the identification of data quantity issues wherein outliers can be caused by data
collection, processing, or measurement errors, which can indicate data quality issues.
Answer: FALSE - quality
32. Studying outliers can provide valuable insights and lead to mysteries.
Answer: FALSE - Discoveries
33. Collective outliers are typically detected using clustering analysis or other methods that group similar data points.
Answer: FALSE – algorithms
34. Contextual outliers are typically detected using domain knowledge or contextual information irrelevant to the dataset.
Answer: FALSE - Relevant

35. A wide range of techniques can be used for outlier analysis in data mining, such as statistical methods, clustering algorithms, and machine
learning models.
Answer: TRUE
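
Items 28 and 29 mention z-scores as a statistical method for detecting global outliers in continuous data. The sketch below flags values whose z-score exceeds a cutoff. The function name `zscore_outliers`, the sample readings, and the cutoff of 2.5 are assumptions chosen for illustration; the quiz does not prescribe a threshold.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score is greater than `threshold`."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing can deviate
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Made-up readings with one extreme value.
readings = [10, 12, 11, 13, 12, 11, 10, 12, 11, 95]
print(zscore_outliers(readings, threshold=2.5))
```

The cutoff is a free parameter. A lower value is used here because the sample is small, and in a small sample a single extreme value inflates the standard deviation and limits how large its own z-score can be.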
