
Far Eastern University

Institute of Accounts, Business and Finance


Business Administration
Data Mining and Warehousing
Final Term – Quiz 1
Name: ___________________________________________________ Student Number: ______________________
Section / Time: _____________________________________________ Date: ________________________________
Instruction - Part 1: Write the word TRUE in the box provided if the statement is CORRECT; otherwise, write FALSE. Erasures are NOT
allowed; the final answer must be properly reflected in the box provided.
1 6
2 7
3 8
4 9
5 10

1. The graph-based association is used in various applications, such as social network analysis, recommendation systems, and fraud analysis.

Answer: False – Detection

2. The Apriori algorithm is based on the concept that if an item set is frequent, then all of its subsets must also be frequent.

Answer: True

3. Classification is a technique in data mining that involves categorizing or classifying data objects into predefined classes, categories, or groups based on their features or variables.

Answer: False – Attributes

4. It is an important task in data mining because it enables organizations to make data-driven decisions.

Answer: TRUE

5. Regression analysis is a statistical methodology that is most often used for numeric prediction.

Answer: TRUE

6. Classification is a form of data mining that extracts models describing important data classes.

Answer: Analysis

7. Association can provide valuable insights into seller’s behavior and preferences.

Answer: False - Consumer

8. Itemset correlation a collection of one or more items that frequently co-occur together is called an itemset.

Answer: False – Association

9. Sequential association is used to identify patterns that occur in a general sequence or order.

Answer: False – Specific

10. An itemset is a collection of one or more items.

Answer: TRUE
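
The Apriori and itemset statements above can be checked on a small example. The following is a minimal sketch only, assuming a made-up list of transactions; the helper names support and subsets_are_frequent are hypothetical and do not come from the quiz.

```python
from itertools import combinations

# Hypothetical toy transactions (an assumption for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def subsets_are_frequent(itemset, transactions, min_support):
    """Apriori property check: every non-empty proper subset meets min_support."""
    items = sorted(itemset)
    return all(
        support(sub, transactions) >= min_support
        for size in range(1, len(items))
        for sub in combinations(items, size)
    )

pair = {"bread", "milk"}
print(support(pair, transactions))                     # 3 of 5 transactions
print(subsets_are_frequent(pair, transactions, 0.6))   # True
```

Because a subset appears in every transaction that contains its superset, the support of a subset can never be lower than the support of the full itemset.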

Instruction - Part 2: Write your answer as a capital letter. Erasures are NOT allowed; the final answer must be properly reflected in the box
provided.
11 16
12 17
13 18
14 19
15 20

11. Eclat then performs a depth-first search on a ___, representing the dataset's frequent itemsets.
A. Neuron-like structure
B. Tree-like structure
C. Frequent-pattern like structure
D. Data-like structure
E. None of the above

12. The algorithm is efficient regarding both memory usage and runtime, especially for sparse datasets.
A. Equivalence Clustering Class and Bottom-up Lattice Traversal
B. FP-Growth Algorithm
C. Apriori Algorithm
D. None of the above

13. ___ refers to the statistical relationship between two or more variables, where the variation in one variable is associated with the variation in another variable.
A. Association
B. FP-Growth Algorithm
C. Apriori Algorithm
D. Correlation Analysis
E. None of the above

14. This is an example of:
A. Direct Correlation Analysis
B. Indirect Correlation Analysis
C. No Correlation Analysis
D. None of the above

15. By understanding the relationships between different variables, we can gain valuable insights into complex systems and make informed decisions based on ___.
A. Informed decisions
B. Data set decisions
C. Data-driven analysis
D. Data-driven decisions
E. None of the above

16. Pearson correlation measures the linear relationship between two ___ variables.
A. Discrete
B. Numeric
C. Continuous
D. Categorical
E. None of the above

17. The FP-Growth algorithm builds a compact representation of the dataset called a frequent pattern tree (FP-tree), which is used to mine ___.
A. Frequent pattern item sets
B. Infrequent item sets
C. Frequent item sets
D. Frequent item data sets
E. None of the above

18. ___ is a frequent itemset mining algorithm based on the vertical data format.
A. Equivalence Clustering Class and Bottom-up Lattice Traversal
B. Equivalence Classification Clustering and Bottom-down Lattice Traversal
C. Equivalence Class Clustering and Bottom-down Lattice Traversal
D. Equivalence Class Clustering and Bottom-up Lattice Traversal
E. None of the above

19. Correlation can be positive, negative, or zero, depending on the direction and strength of the relationship between the variables.
A. Attributes
B. Pattern
C. Data sets
D. Variables
E. None of the above

20. The FP-Growth algorithm can handle datasets with both ___.
A. Discrete and continuous attributes.
B. Discrete and continuous variables.
C. Discrete and numeric attributes.
D. Continuous and numeric variables
E. None of the above
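
The Eclat items above mention the vertical data format and a depth-first search. A minimal sketch of both ideas follows, assuming a made-up four-transaction dataset and a minimum support count of 2; the function names to_vertical and eclat are hypothetical, and this is an illustration rather than a reference implementation.

```python
def to_vertical(transactions):
    """Convert horizontal transactions into item -> set of transaction ids."""
    vertical = {}
    for tid, items in enumerate(transactions):
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical

def eclat(prefix, candidates, min_count, out):
    """Depth-first search: extend `prefix` one item at a time, intersecting tid sets."""
    for i, (item, tids) in enumerate(candidates):
        if len(tids) < min_count:
            continue  # infrequent, so no extension of it can be frequent
        itemset = prefix + (item,)
        out[itemset] = len(tids)
        # Candidates for the next level share the current prefix.
        extensions = [(other, tids & other_tids)
                      for other, other_tids in candidates[i + 1:]]
        eclat(itemset, extensions, min_count, out)
    return out

# Hypothetical toy data (an assumption for illustration only).
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
vertical = to_vertical(transactions)
result = eclat((), sorted(vertical.items()), 2, {})
for itemset, count in result.items():
    print(itemset, count)
```

In this sketch the support count of an itemset is simply the size of the intersection of its items' transaction-id sets, which is why the vertical layout avoids rescanning the transactions.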

Instruction – Part 3: Identification. Identify what is being asked in each of the following items. If the item does not provide the correct definition,
write NA. Erasures are NOT allowed.

21
22
23
24
25
21. This is one of the benefits of correlation analysis wherein it can be used to identify correlations between different process variables and identify
potential sources of quality problems.

Answer: Quality control

22. Spearman Rank Correlation measures the degree of association between the ranks of two attributes.

Answer: NA

23. This is one of the benefits of correlation analysis wherein, by quantifying the degree and direction of the relationship, we can gain insights into how
changes in one variable are likely to affect the other.

Answer: Identifying Relationships

24. ___ in data analysis are two important techniques that can help uncover relationships and patterns in large datasets.

Answer: NA

25. ___ measures the linear relationship between two continuous variables.

Answer: Pearson correlation
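
The Pearson and Spearman items above can be compared on a small example. The sketch below is illustrative only: it uses plain Python, the sample values are made up, and the helper names pearson, ranks, and spearman are hypothetical. Spearman is computed here as the Pearson correlation of the ranks, with tied values given their average rank.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation applied to the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: y grows with x, but not linearly (y = x squared).
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(round(pearson(x, y), 4))   # below 1: the relationship is not linear
print(round(spearman(x, y), 4))  # 1.0: the ordering of x and y is identical
```

The contrast shows why Pearson is described as a measure of linear relationship, while Spearman only looks at whether the rankings agree.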

Quiz 5 and 6
Cluster Analysis
Instruction - Part 1: Write the word TRUE in the box provided if the statement is CORRECT; otherwise, write FALSE. Erasures are NOT
allowed; the final answer must be properly reflected in the box provided.
1 6 11
2 7 12
3 8 13
4 9 14
5 10 15
1. The quality of a clustering method depends on the similarity measure used by the method, its implementation, and its ability to discover all of the hidden patterns.

Answer: FALSE - some or

2. Weights should be associated with different variables based on applications and data semantics.

Answer: TRUE

3. Other popular partitioning methods in data mining include K-medoids, Fuzzy K-means, and Hierarchical K-means.

Answer: FALSE – Fuzzy-C

4. K-means partitions a dataset into K clusters, where K is a user-defined parameter.

Answer: TRUE

5. The refinement process involves calculating the mean of the data points assigned to each cluster and updating the cluster centroids' coordinates accordingly.

Answer: TRUE

6. Partitioning methods are irrelatively easy to implement and can handle large datasets.

Answer: FALSE - relatively

7. Clustering for data understanding and applications includes city-management.

Answer: FALSE – city-planning

8. There is always a separate “quality” function that measures the “goodness” of a cluster.

Answer: FALSE - usually

9. Clustering for data understanding and applications includes economic research.

Answer: FALSE - science

10. The most widely used partitioning method is the K-means algorithm, which randomly assigns data points to clusters and iteratively refines the clusters' medoids until convergence.

Answer: FALSE - centroids

11. Partitioning methods are also efficient in identifying natural clusters within data and can be used for various applications, such as customer segmentation, image segmentation, and anomaly detection.

Answer: FALSE – Effective

12. A good clustering method will produce high quality clusters that is cohesive within clusters and distinctive among clusters.

Answer: FALSE - between

13. The choice of the algorithm depends on the specific clustering problem and the dataset characteristics.

Answer: TRUE

14. Summarization, compression, finding k-means and outlier detection are considered as a preprocessing tool in clustering.

Answer: FALSE – k-nearest neighbors

15. A cluster analysis is a collection of data objects whether similar or dissimilar.

Answer: FALSE – Cluster

16 – 17: What is the two-step process in cluster analysis?

16 - finding similarities between data according to the characteristics found in the data.

17 - grouping similar data objects into clusters

18 – 20. Discuss the graph.
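
The K-means items above describe assigning points to clusters and refining the centroids until convergence. A minimal sketch of that loop follows. It is illustrative only: the points are made up, the function name kmeans is hypothetical, and the starting centroids are drawn by sampling K of the data points, which is one common initialization and an assumption here rather than something the quiz specifies.

```python
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Basic K-means on tuples of floats; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: start from k sampled points
    assignments = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_assignments = [
            min(range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))
            for point in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Refinement step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments

# Hypothetical data: two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, assignments = kmeans(points, k=2)
print(assignments)
print(centroids)
```

K is passed in by the caller, matching the description of K as a user-defined parameter, and the centroids (cluster means) are what get refined on each pass.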

Outlier Analysis

Instruction - Part 2: Write the word TRUE in the box provided if the statement is CORRECT; otherwise, write FALSE. Erasures are NOT
allowed; the final answer must be properly reflected in the box provided.
21 26 31
22 27 32
23 28 33
24 29 34
25 30 35

21. Outliers can be caused by various factors, such as data entry errors, unexpected events, etc., and their detection can lead to valuable insights and
improve the accuracy of models.

Answer: TRUE

22. Outliers are also often referred to as abbreviations.

Answer: FALSE – aberrations

23. An outlier is the process of identifying and examining data points that significantly differ from the rest of the dataset.

Answer: Outlier analysis in data mining

24. Outlier can arise from various sources, such as measurement errors or data collection methods, and it can negatively affect the accuracy and
reliability of data analysis.

Answer: FALSE - Noise

25. Detecting and removing outliers can improve the accuracy and reliability of data mining.

Answer: FALSE - analysis

26. One of the benefits of outlier analysis in data mining is it can lead to better-informed decisions.

Answer: TRUE

27. Removing outliers or developing models that can handle them appropriately can improve data performance.

Answer: FALSE - model

28. Global outliers are typically detected using statistical methods focusing on the entire dataset's extreme values.

Answer: TRUE

29. If the data is continuous, statistical methods such as z-scores can be used, while for categorical data, methods such as the chi-squared test can be
used.

Answer: TRUE

30. An outlier analysis can be defined as a data point that deviates significantly from the normal pattern or behavior of the data.

Answer: FALSE – outlier only

31. One of the benefits of outlier analysis in data mining is the identification of data quantity issues wherein outliers can be caused by data
collection, processing, or measurement errors, which can indicate data quality issues.

Answer: FALSE - quality

32. Studying outliers can provide valuable insights and lead to mysteries.

Answer: FALSE - Discoveries

33. Collective outliers are typically detected using clustering analysis or other methods that group similar data points.

Answer: FALSE – algorithms

34. Contextual outliers are typically detected using domain knowledge or contextual information irrelevant to the dataset.

Answer: FALSE - Relevant


35. A wide range of techniques can be used for outlier analysis in data mining, such as statistical methods, clustering algorithms, and machine
learning models.

Answer: TRUE
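
The statistical approach mentioned above for continuous data, z-scores, can be sketched as follows. This is illustrative only: the readings are made up, the function name zscore_outliers is hypothetical, and the cutoff of 2.5 standard deviations is an assumption chosen for this small sample rather than a value given in the quiz.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds `threshold`."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical readings with one value far from the rest.
readings = [10, 12, 11, 13, 12, 11, 10, 12, 11, 95]
print(zscore_outliers(readings, threshold=2.5))  # [95]
```

With only ten values, a single extreme point cannot reach a z-score of 3 under the population standard deviation, which is why the example passes a lower cutoff; the threshold is a judgment call that depends on the data.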
