Data Mining and Warehousing Quizzes Compilation - Answer Key
Answer: False – Attributes
4. It is an important task in data mining because it enables organizations to make data-driven decisions.
9. Sequential association is used to identify patterns that occur in a general sequence or order.
Answer: False – Specific
Instruction - Part 2: Write your answer (capital letter). Erasures are NOT allowed; the final answer must be properly reflected in the box provided therein.
11. Eclat then performs a depth-first search on a ___, representing the dataset's frequent itemsets.
A. Neuron-like structure
B. Tree-like structure
C. Frequent-pattern like structure
D. Data-like structure
E. None of the above
12. The algorithm is efficient regarding both memory usage and runtime, especially for sparse datasets.
A. Equivalence Clustering Class and Bottom-up Lattice Traversal
B. FP-Growth Algorithm
C. Apriori Algorithm
D. None of the above
13. ___ refers to the statistical relationship between two or more variables, where the variation in one variable is associated with the variation in another variable.
A. Association
B. FP-Growth Algorithm
C. Apriori Algorithm
D. Correlation Analysis
E. None of the above
14. This is an example of:
A. Direct Correlation Analysis
B. Indirect Correlation Analysis
C. No Correlation Analysis
D. None of the above
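Item 11 describes Eclat's depth-first search over the dataset's frequent itemsets. Eclat works on the vertical data format: each item is mapped to its tidset (the IDs of the transactions that contain it), and the support of a larger itemset is the size of the intersection of tidsets. The following is a minimal sketch of that idea, not a reference implementation; it assumes transactions are given as Python sets and that min_support is an absolute count, and the function name eclat is hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def eclat(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Return every frequent itemset with its support count (minimal Eclat sketch)."""
    # Vertical format: item -> set of transaction IDs (tidset) that contain the item.
    tidsets: Dict[str, Set[int]] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    frequent: Dict[FrozenSet[str], int] = {}

    def search(prefix: FrozenSet[str], candidates: List[Tuple[str, Set[int]]]) -> None:
        # Depth-first: extend the current prefix with each candidate item in turn.
        for i, (item, tids) in enumerate(candidates):
            if len(tids) < min_support:
                continue  # infrequent, so no superset of it can be frequent either
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Support of each extension = size of the tidset intersection.
            extensions = [(other, tids & other_tids) for other, other_tids in candidates[i + 1:]]
            search(itemset, extensions)

    search(frozenset(), sorted(tidsets.items()))
    return frequent
```

For example, with the transactions [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}] and min_support=2, the sketch reports {bread}: 3, {milk}: 3, {butter}: 2, {bread, milk}: 2, and {bread, butter}: 2. Only set intersections are needed after the first pass over the data, which is one reason the vertical approach can be memory- and time-efficient on sparse datasets.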
15. By understanding the relationships between different variables, we can gain valuable insights into complex systems and make informed decisions based on ___.
A. Informed decisions
B. Data set decisions
C. Data-driven analysis
D. Data-driven decisions
E. None of the above
16. Pearson correlation measures the linear relationship between two ___ variables.
A. Discrete
B. Numeric
C. Continuous
D. Categorical
E. None of the above
17. The FP-Growth algorithm builds a compact representation of the dataset called a frequent pattern tree (FP-tree), which is used to mine ___.
A. Frequent pattern item sets
B. Infrequent item sets
C. Frequent item sets
D. Frequent item data sets
E. None of the above
18. ___ is a frequent itemset mining algorithm based on the vertical data format.
A. Equivalence Clustering Class and Bottom-up Lattice Traversal
B. Equivalence Classification Clustering and Bottom-down Lattice Traversal
C. Equivalence Class Clustering and Bottom-down Lattice Traversal
D. Equivalence Class Clustering and Bottom-up Lattice Traversal
E. None of the above
19. Correlation can be positive, negative, or zero, depending on the direction and strength of the relationship between the variables.
A. Attributes
B. Pattern
C. Data sets
D. Variable
E. None of the above
20. The FP-Growth algorithm can handle datasets with both ___.
A. Discrete and continuous attributes
B. Discrete and continuous variables
C. Discrete and numeric attributes
D. Continuous and numeric variables
E. None of the above
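Items 13, 16, and 19 concern correlation and the Pearson coefficient. As an illustration, the sketch below computes Pearson's r = sum((x - mean_x) * (y - mean_y)) / sqrt(sum((x - mean_x)^2) * sum((y - mean_y)^2)) in plain Python; the function name pearson is hypothetical. The sign of r gives the direction of the relationship (positive or negative), its magnitude gives the strength of the linear relationship, and a value of zero means there is no linear relationship.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient r between two equal-length numeric samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of the same length (n >= 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Co-deviation and squared-deviation sums; the 1/n factors cancel in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)
```

For example, pearson([1, 2, 3, 4], [2, 4, 6, 8]) returns 1.0 (perfect positive correlation) and pearson([1, 2, 3, 4], [8, 6, 4, 2]) returns -1.0 (perfect negative correlation).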
Instruction – Part 3: Identification. Identify what is being asked in the following questions. If the question does not provide the correct definition, write NA. Erasures are NOT allowed.
21. This is one of the benefits of correlation analysis wherein it can be used to identify correlations between different process variables and identify potential sources of quality problems.
22. Spearman Rank Correlation measures the degree of association between the ranks of two attributes.
Answer: NA
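Spearman's rank correlation (item 22) is computed on the ranks of the values rather than on the raw values, so it captures any monotonic relationship, not only a linear one. The sketch below uses the shortcut formula rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the difference between each pair of ranks. That formula is exact only when there are no tied values, so this sketch rejects ties instead of averaging tied ranks; the names ranks and spearman are hypothetical.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[int]:
    """Rank values from 1 (smallest) to n; assumes there are no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho via the no-ties shortcut formula."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of the same length (n >= 2)")
    if len(set(x)) < n or len(set(y)) < n:
        raise ValueError("the shortcut formula assumes no tied values")
    rx, ry = ranks(x), ranks(y)
    # Sum of squared differences between paired ranks.
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))
```

For example, spearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]) returns 1.0 even though the relationship is not linear, because the ranks of the two variables agree perfectly.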
23. This is one of the benefits of correlation analysis wherein, by quantifying the degree and direction of the relationship, we can gain insights into how changes in one variable are likely to affect the other.
24. ___ in data analysis are two important techniques that can help uncover relationships and patterns in large datasets.
Answer: NA
25. ___ measures the linear relationship between two continuous variables.
Quiz 5 and 6
Cluster Analysis
Instruction - Part 1: Write the word TRUE in the box provided therein if the statement is CORRECT; otherwise, write FALSE. Erasures are NOT allowed; the final answer must be properly reflected in the box provided therein.
11. The quality of a clustering method depends on the similarity measure used by the method, its implementation, and its ability to discover all of the hidden patterns.
Answer: FALSE – some or all
12. Weights should be associated with different variables based on applications and data semantics.
Answer: TRUE
13. Other popular partitioning methods in data mining include K-medoids, Fuzzy K-means, and Hierarchical K-means.
Answer: FALSE – Fuzzy C-means
14. K-means partitions a dataset into K clusters, where K is a user-defined parameter.
Answer: TRUE
15. The refinement process involves calculating the mean of the data points assigned to each cluster and updating the cluster centroids' coordinates accordingly.
18. There is always a separate “quality” function that measures the “goodness” of a cluster.
21. Partitioning methods are also efficient in identifying natural clusters within data and can be used for various applications, such as customer segmentation, image segmentation, and anomaly detection.
Answer: FALSE – Effective
22. A good clustering method will produce high quality clusters that are cohesive within clusters and distinctive among clusters.
Answer: FALSE – between
23. The choice of the algorithm depends on the specific clustering problem and the dataset characteristics.
Answer: TRUE
24. Summarization, compression, finding k-means and outlier detection are considered as a preprocessing tool in clustering.
Answer: FALSE – k-nearest neighbors
25. A cluster analysis is a collection of data objects whether similar or dissimilar.
18 – 20. Discuss the graph.
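Items 14 and 15 describe K-means: K is chosen by the user, and the refinement step moves each centroid to the mean of the points assigned to it. The following is a minimal Lloyd-style sketch under assumptions made here for illustration: points are tuples of floats, distance is squared Euclidean, the initial centroids are K points drawn at random, and the loop stops when no centroid moves or after a fixed iteration cap. The function name kmeans is hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iterations: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Minimal K-means sketch: returns (centroids, cluster label for each point)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Initialise the centroids with k points drawn at random from the data.
    centroids: List[Point] = [tuple(p) for p in rng.sample(list(points), k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Refinement step: move each centroid to the mean of its assigned points.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster keeps its old centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels
```

On two well-separated groups such as [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)] with k=2, the first three points end up sharing one label and the last three the other. Because the outcome can depend on the initial centroids, practical implementations usually run several random restarts and keep the best result.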
Outlier Analysis
Instruction - Part 2: Write the word TRUE in the box provided therein if the statement is CORRECT; otherwise, write FALSE. Erasures are NOT allowed; the final answer must be properly reflected in the box provided therein.
21. Outliers can be caused by various factors, such as data entry errors, unexpected events, etc., and their detection can lead to valuable insights and improve the accuracy of models.
Answer: TRUE
23. An outlier is the process of identifying and examining data points that significantly differ from the rest of the dataset.
24. Outliers can arise from various sources, such as measurement errors or data collection methods, and they can negatively affect the accuracy and reliability of data analysis.
25. Detecting and removing outliers can improve the accuracy and reliability of data mining.
26. One of the benefits of outlier analysis in data mining is it can lead to better-informed decisions.
Answer: TRUE
27. Removing outliers or developing models that can handle them appropriately can improve data performance.
28. Global outliers are typically detected using statistical methods focusing on the entire dataset's extreme values.
Answer: TRUE
29. If the data is continuous, statistical methods such as z-scores can be used, while for categorical data, methods such as the chi-squared test can be used.
Answer: TRUE
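Items 28 and 29 mention detecting global outliers in continuous data with z-scores. A minimal sketch: each value's z-score is (value - mean) / standard deviation, and values whose absolute z-score exceeds a threshold are flagged. The threshold of 3.0 is a common rule of thumb assumed here rather than something the quiz specifies, and the function name zscore_outliers is hypothetical.

```python
import statistics
from typing import List, Sequence


def zscore_outliers(values: Sequence[float], threshold: float = 3.0) -> List[int]:
    """Return the indices of values whose absolute z-score exceeds the threshold."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical: nothing deviates from the rest
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]
```

For example, with nineteen readings between 9 and 12 followed by a single reading of 40, only the index of the 40 (19) is returned. An extreme value also inflates the mean and standard deviation it is measured against, so in very small samples a single outlier may fall just under the threshold.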
30. An outlier analysis can be defined as a data point that deviates significantly from the normal pattern or behavior of the data.
31. One of the benefits of outlier analysis in data mining is the identification of data quantity issues wherein outliers can be caused by data collection, processing, or measurement errors, which can indicate data quality issues.
32. Studying outliers can provide valuable insights and lead to mysteries.
33. Collective outliers are typically detected using clustering analysis or other methods that group similar data points.
34. Contextual outliers are typically detected using domain knowledge or contextual information irrelevant to the dataset.
Answer: TRUE