1. What is Data Mining?
2. What is Supervised and Unsupervised Learning?
3. What are the different tasks of Data Mining?
4. Discuss the life cycle of data mining projects.
5. Explain the process of KDD.
6. What is Prediction?
7. What are the different fields where data mining is used? Explain any one field in detail.
8. Name some data mining tools. Which of them are best suited to data analysis?
9. What do you understand by data aggregation and data generalization?
10.What do you understand by a model in Data Mining?
11.What is the difference between univariate, bivariate, and multivariate analysis?
12.What is Visualization?
13.What is Data Preprocessing? What preprocessing steps do you know?
14.What is the difference between Data Processing and Data Mining?
15.What is Data Binning?
16.What's the difference between Feature Engineering and Feature Selection?
17.What's the difference between Covariance and Correlation?
18.What is Cross-Validation and why is it important in supervised learning?
19.Explain what you understand by Dimensionality Reduction.
31.What is data normalization? Can you use different normalization methods on different features?
32.Explain the two types of data reduction algorithms.
33.When would you use Equal Frequency Binning and when would you use Equal Width Binning?
36.How would you deal with Outliers in your dataset?
37.How would you use a Confusion Matrix for determining a model performance?
38.When would you use the Chi-Square test?
39.How would you handle Missing Data?
40.How could I (statistically) find features that are more important than others?
41.Differentiate between data mining and data warehousing.
42.What are cubes?
43.What are the differences between OLAP and OLTP? [IMP]
44.What is the difference between variance and covariance?
45.Why should we use data warehousing and how can you extract data for analysis?
46.What are the different storage models available in OLAP?
56.What are the parameters by which you can evaluate a classifier? Explain.
57.What are the methods to validate a classifier?
58.What are the techniques to select features?
59.What is data normalization? How can you normalize data? Explain with an example.
60.What is Bayes classification? Explain with an example.
70.While performing K-Means Clustering, how do you determine the value of K?
71.What is the DBSCAN Algorithm?
72.Explain the DBSCAN Algorithm step by step.
73.Which is the most widely used Density-based Clustering Algorithm?
74.What is Density-based Clustering?
75.Explain the Input parameters given to the DBSCAN Algorithm.
76.What are density reachability and density connectivity?
77.Explain the following terms related to the DBSCAN Algorithm:
• Direct Density Reachable
• Density Reachable
• Density Connected
78.What are the advantages of the DBSCAN density-based Clustering Algorithm?
79.What are the disadvantages of the DBSCAN density-based Clustering Algorithm?
86.What is Classification?
87.What are ‘Training set’ and ‘Test set’?
102.Define the terms: frequent itemsets, patterns, and market basket analysis.
103.Illustrate market basket analysis.
104.What is meant by association rule mining? Explain the process of association rule mining, using an example.
105.Explain the steps in the Apriori algorithm used for frequent pattern mining with the help of an example.
106.Write the Apriori algorithm for frequent pattern mining.
107.Explain the procedure for generating strong association rules from frequent itemsets.
108.Explain the different algorithms for improving the efficiency of the Apriori algorithm.
109.Explain a method for finding frequent patterns without generating frequent itemsets.
110.Write and explain the FP-growth algorithm.
111.Explain the method of using vertical data format for generating frequent itemsets.
112.What is a closed frequent itemset? Explain the approaches to mining closed frequent itemsets.
113.Explain the mining of multilevel association rules using the top-down approach, with a suitable example.
114.Explain the variations of the top-down approach in multilevel association rule mining.
115.Explain multidimensional association rules using suitable examples.
116.What is a categorical attribute? What is a quantitative attribute?
117.Explain the approaches for categorizing the techniques for the mining of quantitative attributes for multidimensional association rules.
120.Removing words like “and”, “is”, “a”, “an”, “the” from a sentence is called ________.
121.The process of deriving high quality information from text is referred to as ________.
122.The various aspects of text mining are ____________.
123.________ is fundamentally defining unstructured data to structured data and applying text.
124.A structured and annotated text dataset that you can import directly into your program to apply text mining operations is statistically referred to as _______.
125.Bag of words is referred to as ________.
126.Machine learning algorithms cannot work with raw text directly; the text must be
converted into numbers. Specifically, vectors of numbers. This is called _________.
127.For a very large corpus, the length of the vector might be thousands or millions of positions, and each document may contain very few of the known words in the vocabulary. This results in a vector with lots of zero scores, called a ________.
128.Creating a vocabulary of two-word pairs is, in turn, called a _________ model.