a) Classification
b) Regression
c) Clustering
d) Data storage
a) Descriptive analysis
d) Data visualization
a) Sending emails
c) Recommender systems
d) Internet browsing
5. Which term refers to the process of transforming raw data into a suitable format for analysis?
a) Data preprocessing
b) Data visualization
c) Data warehousing
d) Data integration
6. What is the primary challenge in handling big data in data mining?
a) Classification groups data into predefined categories, while clustering groups data based on
similarity.
b) Classification groups data based on similarity, while clustering uses predefined categories.
c) Classification is used for regression analysis, while clustering is used for classification.
10. What does the term "data warehousing" refer to in the context of data mining?
26. What is the difference between structured and unstructured data in data mining?
c) Structured data is easy to process, while unstructured data lacks a predefined structure.
d) Structured data is always big data, while unstructured data is small data.
27. Which of the following is a common visualization technique used in data mining?
a) Scatter plots
b) Histograms
c) Decision trees
a) A model that performs well on training data but poorly on unseen data
b) A model that performs equally well on both training and testing data
1. What is the primary objective of determining the optimal number of clusters in K-Means
clustering?
2. Which evaluation metric is commonly used to assess the quality of K-Means clustering with
different numbers of clusters?
a) F-score
b) Accuracy
c) Silhouette Score
3. In the context of K-Means clustering, what does the "Elbow Method" help determine?
4. What does the "Within-Cluster Sum of Squares (WCSS)" represent in K-Means clustering?
6. What is one limitation of the Elbow Method for selecting the optimal number of clusters?
9. Which factor can influence the choice of the optimal number of clusters in K-Means clustering?
10. In the Silhouette Score metric, what does a high score indicate?
13. When applying the Silhouette Score, what is the ideal outcome for cluster quality assessment?
14. What is an advantage of using the Silhouette Score over the Elbow Method for determining the
optimal number of clusters?
c) The Silhouette Score considers both cluster cohesion and separation.
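Study note: the cohesion and separation terms mentioned in option c) can be made concrete with a small sketch. It assumes Euclidean distance and the usual per-point definition s = (b - a) / max(a, b), where a is the mean distance to the other members of the point's own cluster (cohesion) and b is the mean distance to the nearest other cluster (separation). The function name and the sample points are illustrative only and do not come from the quiz.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance assumed)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: cohesion, mean distance to the other members of the same cluster
        # (the distance from p to itself is 0, so it does not affect the sum)
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: separation, mean distance to the nearest other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


# Hypothetical sample data: two tight pairs that are far apart.
sample_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good_labels = [0, 0, 1, 1]  # each pair kept together
bad_labels = [0, 1, 0, 1]   # each pair split across clusters

good_score = silhouette_score(sample_points, good_labels)
bad_score = silhouette_score(sample_points, bad_labels)
```

With these sample points the sensible labelling scores close to 1 and the split labelling scores below 0, which is the behaviour questions 10 and 13 ask about.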
15. Which step is NOT typically involved in the process of determining the optimal number of clusters
in K-Means clustering?
d) Choosing the cluster number with the highest evaluation metric value
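Study note: the Elbow Method and WCSS questions above can be illustrated with a minimal sketch. It assumes a plain Lloyd-style K-Means with Euclidean distance and, purely to keep the run deterministic, uses the first k points as initial centroids (real implementations usually initialise randomly or with k-means++). The function names and the sample data are illustrative assumptions, not part of the quiz.

```python
import math


def kmeans(points, k, iters=100):
    """Plain Lloyd's algorithm. Assumption: the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group
        # (an empty group keeps its previous centroid).
        new_centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


def wcss(points, centroids):
    """Within-Cluster Sum of Squares: squared distance of each point to its nearest centroid."""
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)


# Hypothetical sample data: three well-separated groups of three points each.
sample = [(0, 0), (10, 10), (20, 0), (0, 1), (1, 0),
          (10, 11), (11, 10), (20, 1), (21, 0)]

# WCSS for k = 1..4; the "elbow" is where the decrease flattens out.
curve = {k: wcss(sample, kmeans(sample, k)) for k in range(1, 5)}
```

On this sample the WCSS falls sharply up to k = 3 and barely changes afterwards, so the elbow sits at three clusters. On real data the bend is often less distinct, which is the limitation question 6 refers to.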
Chapter 7: Clustering
a) Volume
b) Density
c) Euclidean distance
d) Frequency
3. What is the primary difference between hierarchical and partitional clustering?
a) Hierarchical clustering produces a hierarchy of clusters, while partitional clustering assigns
each data point to a single cluster.
b) Partitional clustering produces a hierarchy of clusters, while hierarchical clustering assigns each
data point to a single cluster.
c) Hierarchical clustering uses a single-level hierarchy, while partitional clustering uses multiple
levels.
4. Which clustering algorithm is known for its ability to handle large datasets efficiently?
a) K-means
b) Hierarchical clustering
c) DBSCAN
d) Spectral clustering
5. In clustering, what does the term "centroid" refer to?
6. Which of the following clustering algorithms is density-based and can find clusters of arbitrary
shapes?
a) K-means
b) Hierarchical clustering
c) DBSCAN
d) Spectral clustering
11. What is the "silhouette score" used for in clustering evaluation?
a) K-means
b) Hierarchical clustering
c) DBSCAN
d) Spectral clustering