2 What is the curse of dimensionality?
3 Why is dimensionality reduction important? Support your answer with an example.
4 Cluster the given data using the E-M algorithm
○ 12, 14, 9, 6, 4, 15
Assume learning constant = 1 and initial hypotheses 12 and 14.
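A minimal sketch of one E-M iteration on this data, assuming a mixture of two unit-variance Gaussians with equal priors (the variance and priors are assumptions, since the question does not state them):

```python
import math

def em_step(data, mu, var=1.0):
    """One E-step and M-step for a 1-D mixture of two Gaussians
    with equal priors and fixed variance (an assumption)."""
    def gauss(x, m):
        return math.exp(-(x - m) ** 2 / (2 * var))
    # E-step: responsibility E[z_ij] of each cluster j for each point i
    resp = []
    for x in data:
        p = [gauss(x, m) for m in mu]
        s = sum(p)
        resp.append([pj / s for pj in p])
    # M-step: new hypothesis = responsibility-weighted mean per cluster
    new_mu = [
        sum(r[j] * x for r, x in zip(resp, data)) / sum(r[j] for r in resp)
        for j in range(len(mu))
    ]
    return resp, new_mu

data = [12, 14, 9, 6, 4, 15]
resp, mu = em_step(data, [12.0, 14.0])
print(mu)  # updated hypotheses after one iteration
```

Iterating `em_step` until the means stop moving gives the final cluster hypotheses.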
5 Specify the PCA algorithm with an explanation of each step.
(PCA notes page 11)
○ Preprocessing
○ Calculate sigma (the covariance matrix)
○ Calculate eigenvectors with SVD
○ Take the first k vectors from U (Ureduce = U(:, 1:k);)
○ Calculate z (z = Ureduce' * x;)
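The Octave-style steps above can be sketched in Python with NumPy (a sketch; X holds one example per row, and variable names follow the notes):

```python
import numpy as np

def pca(X, k):
    """PCA via SVD, following the steps listed above."""
    # Preprocessing: mean-normalize each feature
    X = X - X.mean(axis=0)
    # Sigma: the covariance matrix of the data
    m = X.shape[0]
    Sigma = (X.T @ X) / m
    # Eigenvectors via SVD; columns of U are the principal directions
    U, S, Vt = np.linalg.svd(Sigma)
    # Take the first k columns of U (Ureduce = U(:, 1:k) in the notes)
    Ureduce = U[:, :k]
    # Project: z = Ureduce' * x for each example
    Z = X @ Ureduce
    return Z, Ureduce

X = np.array([[2.0, 0.0], [0.0, 1.0], [4.0, 1.0], [2.0, 2.0]])
Z, U = pca(X, 1)
print(Z.shape)  # (4, 1)
```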
6 What is SVD? Compare SVD with PCA.
7 The following is the output of the E-step in the EM algorithm (2 clusters):
A] Identify the values of E[z12], E[z22], E[z32], E[z42], E[z52]
B] Find the new hypothesis
8 What is the idea behind Classification and Regression Trees (CART)? What is the advantage of using CART?
➢ Idea:
➢ Advantages
12 What are ensembles in ML? What is the reason for using ensembles?
• A diverse set of models gives better decisions than a single model.
• This diversification in machine learning is achieved by a technique called ensemble learning.
• Ensemble: a machine learning technique that combines several base models in order to produce one optimal predictive model.
• Reasons for using ensembles:
○ A single weak learner may fail to converge.
○ A model might perform well on some data and less accurately on other data.
13 What is bagging? Explain the working of the bagging technique.
• Bagging is also called bootstrap aggregation.
• Bootstrapping: creating subsets of observations from the original dataset, with replacement.
○ Implementation steps of bagging:
■ Step 1: Multiple subsets, each with an equal number of tuples, are created from the original dataset by selecting observations with replacement.
■ Step 2: A base model is created on each of these subsets.
■ Step 3: Each model is learned in parallel on its own training set, independently of the others.
■ Step 4: The final predictions are determined by combining the predictions from all the models.
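The four steps above can be sketched in plain Python. The base learner here is a made-up one-feature threshold stump, chosen only to keep the example small:

```python
import random
from statistics import mean

def train_stump(subset):
    """Toy base model: threshold halfway between the two class means."""
    xs0 = [x for x, y in subset if y == 0]
    xs1 = [x for x, y in subset if y == 1]
    if not xs0:
        return lambda x: 1  # degenerate bootstrap: only class 1 sampled
    if not xs1:
        return lambda x: 0
    thr = (mean(xs0) + mean(xs1)) / 2
    return lambda x, thr=thr: 1 if x > thr else 0

def bagging_fit(data, n_models=25, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Step 1: bootstrap subset, equal size, sampled with replacement
        subset = [rng.choice(data) for _ in data]
        # Steps 2-3: fit one base model per subset, independently
        models.append(train_stump(subset))
    return models

def bagging_predict(models, x):
    # Step 4: combine predictions by majority vote
    votes = sum(m(x) for m in models)
    return 1 if votes > len(models) / 2 else 0

data = [(1, 0), (2, 0), (3, 0), (7, 1), (8, 1), (9, 1)]
models = bagging_fit(data)
print(bagging_predict(models, 8), bagging_predict(models, 2))  # 1 0
```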
14 Explain the working of the random forest algorithm.
• The Random Forest classifier or regressor is a bagging technique.
• A forest comprises trees; similarly, a random forest classifier is a collection of different decision trees.
○ Implementation of random forest:
■ Step 1: Select random samples from the given dataset.
■ Step 2: Construct a decision tree for each sample and get a prediction result from each decision tree.
■ Step 3: Select the prediction result with the most votes as the final prediction.
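The three steps can be sketched in plain Python. Each "tree" here is a made-up depth-1 stump that splits on one randomly chosen feature, which also illustrates the per-tree feature randomness that distinguishes a random forest from plain bagging:

```python
import random
from statistics import mean

def train_tree(sample, rng):
    """Toy tree: split on a random feature at the midpoint of class means."""
    f = rng.randrange(2)  # randomly chosen feature index
    xs0 = [x[f] for x, y in sample if y == 0]
    xs1 = [x[f] for x, y in sample if y == 1]
    if not xs0:
        return lambda x: 1  # degenerate bootstrap sample
    if not xs1:
        return lambda x: 0
    thr = (mean(xs0) + mean(xs1)) / 2
    hi = 1 if mean(xs1) > mean(xs0) else 0
    return lambda x, f=f, thr=thr, hi=hi: hi if x[f] > thr else 1 - hi

def forest_fit(data, n_trees=31, seed=0):
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        # Step 1: random bootstrap sample of the dataset
        sample = [rng.choice(data) for _ in data]
        # Step 2: construct one decision tree per sample
        trees.append(train_tree(sample, rng))
    return trees

def forest_predict(trees, x):
    # Step 3: the class with the most votes is the final prediction
    votes = sum(t(x) for t in trees)
    return 1 if votes > len(trees) / 2 else 0

data = [((1, 1), 0), ((2, 1), 0), ((1, 2), 0),
        ((8, 9), 1), ((9, 8), 1), ((9, 9), 1)]
trees = forest_fit(data)
print(forest_predict(trees, (9, 9)), forest_predict(trees, (1, 1)))  # 1 0
```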
18 Explain K-fold cross-validation with an example.
K-Fold Cross validation:
• Step 1. Randomly split the entire dataset into k folds/subsets.
• Step 2. In each iteration, train the model using (k – 1) folds of the dataset and validate/test the model using the remaining fold.
• Step 3. Calculate the accuracy for this iteration.
• Step 4. Repeat this process until each of the k folds has served as the validation/test set.
• Step 5. Take the average of all k such accuracies to get the final validation accuracy.
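Steps 1-5 can be sketched as follows (the toy "model" that just predicts the majority training label is an illustration, not part of the notes):

```python
def k_fold_accuracy(data, k, train_fn, acc_fn):
    """Split into k folds, rotate the held-out fold, average the accuracies."""
    folds = [data[i::k] for i in range(k)]  # Step 1 (assumes pre-shuffled data)
    scores = []
    for i in range(k):
        test = folds[i]  # Step 2: fold i is the validation/test set
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        model = train_fn(train)
        scores.append(acc_fn(model, test))  # Step 3 (Step 4: loop over all folds)
    return sum(scores) / k  # Step 5: average accuracy

# Toy usage: the "model" is simply the majority class of the training labels.
data = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
train_fn = lambda train: max(set(train), key=train.count)
acc_fn = lambda m, test: sum(1 for y in test if y == m) / len(test)
print(k_fold_accuracy(data, 5, train_fn, acc_fn))  # 0.8
```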
20 Define support vectors. What is the importance of support vectors?
21 Why are SVMs called Maximum Margin Separators?
• To separate the two classes of data points, there are many possible hyperplanes that could be chosen.
• Our objective is to find the plane with the maximum margin, i.e., the maximum distance between data points of both classes.
• Maximizing the margin distance provides some reinforcement so that future data points can be classified with more confidence.
• This is why SVMs are also called Maximum Margin Separators.
22 What are hyperplanes?
- Hyperplanes are decision boundaries that help classify the data points.
- The dimension of the hyperplane depends upon the number of
features.
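The maximum-margin idea above can be checked numerically: the distance from a point x to the hyperplane w·x + b = 0 is |w·x + b| / ‖w‖, and the margin is the smallest such distance over the data points. A sketch with made-up values:

```python
import math

def distance(w, b, x):
    """Distance from point x to the hyperplane w . x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    return abs(dot + b) / math.sqrt(sum(wi * wi for wi in w))

def margin(w, b, points):
    """The margin is the distance to the closest data point."""
    return min(distance(w, b, x) for x in points)

w, b = (1.0, 1.0), -4.0      # the hyperplane x1 + x2 = 4
points = [(1, 1), (0, 2), (4, 3), (3, 4)]
print(margin(w, b, points))  # the closest points lie sqrt(2) away
```

The SVM picks, among all separating hyperplanes, the (w, b) that maximizes this quantity.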
22 Describe the quadratic programming problem in SVM.
23 What is the kernel trick? List various kernel functions used in SVM.
24 Numerical on SVM.
25 What are linearly separable and non-linearly separable data? Support your answer with a diagram.
26 Write a short note on XGBoost.
(Not in the PPT; source: GeeksforGeeks)
XGBoost is an implementation of gradient-boosted decision trees. XGBoost models frequently dominate Kaggle competitions.
In this algorithm, decision trees are created sequentially. Weights play an important role in XGBoost: weights are assigned to all the independent variables, which are then fed into the decision tree that predicts results.
The weights of variables predicted wrongly by the tree are increased, and these variables are then fed to the second decision tree.
These individual classifiers/predictors are then ensembled to give a stronger and more precise model. XGBoost can work on regression, classification, ranking, and user-defined prediction problems.
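The sequential "fix what the previous trees got wrong" idea can be sketched as a tiny gradient-boosting loop in plain Python. This illustrates boosting in general, not the actual XGBoost library, which adds regularization, sparsity handling, and second-order gradients:

```python
from statistics import mean

def fit_stump(xs, residuals):
    """Toy depth-1 regression tree: best single threshold by squared error."""
    best = None
    for thr in xs:
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        if not left or not right:
            continue
        lm, rm = mean(left), mean(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, thr, lm, rm)
    _, thr, lm, rm = best
    return lambda x: lm if x <= thr else rm

def boost(xs, ys, rounds=20, lr=0.3):
    """Each new stump is fit to the residuals the previous ones left behind."""
    pred = [0.0] * len(xs)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, pred)]  # what is still wrong
        s = fit_stump(xs, residuals)
        stumps.append(s)
        pred = [p + lr * s(x) for p, x in zip(pred, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)

xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, 1, 5, 5, 5]
model = boost(xs, ys)
print(round(model(2), 1), round(model(5), 1))  # 1.0 5.0
```

Each round shrinks the remaining error by a constant factor, which is why the sequential ensemble ends up far more precise than any single stump.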
27 Write a short note on support vector regressors.