
<<ALL DOUBTS and CLARIFICATION needed Topics from ML>>

======================================================

What's the difference between using SVR and linear regression?

Decision tree regression:


Information Entropy - how the algorithm will split the data

Random forest is part of 'ensemble learning'

What is the difference between using a single tree in decision tree regression and
500+ trees in random forest regression?

How to interpret adjusted R-squared.

In the Adjusted R2 formula, what does p represent? -> (p means the number of
independent variables or predictors)
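
A rough sketch of the formula this note refers to, assuming the usual definition
with n = number of observations and p = number of predictors (the function name is
only for illustration):

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Same plain R^2, but more predictors -> lower adjusted R^2
few_predictors = adjusted_r2(0.90, n=50, p=2)
many_predictors = adjusted_r2(0.90, n=50, p=10)
```

So adjusted R2 only goes up when a new predictor improves R2 by more than the
penalty for adding it.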

Shouldn't we bother to do the backward elimination process in Python using sklearn?

Churn modeling - classifying customers who stay and who leave.

Maximum likelihood of the logistic regression curve for finding the best fit

Confusion matrix interpretation in classification metrics:

According to sklearn.metrics:

                  Predicted Class 0      Predicted Class 1
Actual Class 0    True Negative (TN)     False Positive (FP)
Actual Class 1    False Negative (FN)    True Positive (TP)
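
A small sketch of the same layout (rows = actual class, columns = predicted class),
written in plain Python so it runs without sklearn. The helper name and the sample
labels are made up for illustration:

```python
def confusion_counts(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels 0/1."""
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        # Row index = actual class, column index = predicted class
        matrix[actual][predicted] += 1
    return matrix


y_true = [0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 0, 1, 0, 1, 1]
(tn, fp), (fn, tp) = confusion_counts(y_true, y_pred)
```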

Sigmoid function used in logistic regression and its intuition

KNN - Euclidean distance, other types of distance metrics used.

Does a KNN model memorise all the input data points and use them to classify a new
data point?
Ans:-
Yes
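
To make the "yes" concrete, here is a minimal sketch of KNN with Euclidean distance.
The class TinyKNN is hypothetical (not sklearn's implementation); the point is that
fit() only stores the data and all the work happens at prediction time:

```python
import math
from collections import Counter


class TinyKNN:
    """Hypothetical minimal k-nearest-neighbours classifier (illustration only)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" is just memorising the data points and their labels
        self.X, self.y = list(X), list(y)
        return self

    def predict_one(self, point):
        # Euclidean distance from the new point to every stored point
        distances = [(math.dist(point, x), label) for x, label in zip(self.X, self.y)]
        nearest = sorted(distances, key=lambda pair: pair[0])[: self.k]
        # Majority vote among the k closest stored points
        return Counter(label for _, label in nearest).most_common(1)[0][0]


model = TinyKNN(k=3).fit(
    [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)],
    [0, 0, 0, 1, 1, 1],
)
```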

How can naive Bayes classify a new data point without comparing it with the
original data points after the model has been trained?
Ans:-
It doesn't store all the data; instead, it estimates the distribution of the data
for each class (for Gaussian naive Bayes, a normal distribution per feature, i.e. a
multivariate normal distribution with independent features).
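
A rough sketch of that idea, assuming the Gaussian variant of naive Bayes. The
function names are made up for illustration. After fitting, only a prior, the means
and the variances per class are kept:

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Summarise each class by its prior and per-feature mean/variance."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    summary = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Small constant avoids division by zero for constant features
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        summary[label] = (n / len(X), means, variances)
    return summary


def predict_gaussian_nb(summary, point):
    """Pick the class with the highest log prior + log likelihood."""
    def log_posterior(prior, means, variances):
        score = math.log(prior)
        for v, m, var in zip(point, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(summary, key=lambda label: log_posterior(*summary[label]))


X = [(1.0, 1.2), (1.1, 0.9), (0.9, 1.0), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
y = [0, 0, 0, 1, 1, 1]
summary = fit_gaussian_nb(X, y)  # the training rows are not needed after this
```

Prediction uses only `summary`, which is the contrast with KNN, where every stored
point is consulted.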

Study the 'DecisionTreeClassifier' class parameter "criterion": its default value
is 'gini', but we used 'entropy'. Where do we use Gini then? Also, what is 'Gini
impurity'?
Ans:-
Both Gini and entropy are measures of disorder, but Gini is cheaper to compute than
entropy because it needs no logarithm.
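
A minimal sketch of the two measures, using the standard textbook formulas (the
helper names are only for illustration):

```python
import math
from collections import Counter


def class_proportions(labels):
    counts = Counter(labels)
    return [count / len(labels) for count in counts.values()]


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2)."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    return sum(-p * math.log2(p) for p in class_proportions(labels))


pure = [1, 1, 1, 1]    # only one class -> no disorder
mixed = [0, 0, 1, 1]   # 50/50 split -> maximum disorder for two classes
```

Both are 0 for a pure node and largest for an even split; for two classes the
maximum is 0.5 for Gini and 1 bit for entropy.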

"Receiver Operating Characteristic" (ROC) in the context of performance metrics of
a classification model - what is it?

Forward-looking variable problems, post-factum variables (in a too-good-to-be-true,
highly accurate model).

Regularization parameter lambda or the penalty parameter C (these are
hyperparameters): how to deal with them?

Hierarchical clustering (HC) -> working of divisive HC (higher to lower level).

In HC, one of the clustering techniques is the "method of minimum variance" (or
'ward'). How does it work?

Association rule learning (Apriori model): why are the rules one-directional? For
example, rule 1: (burger --> frenchFries). Why not also (frenchFries --> burger)?
Isn't there no first preference, only equal liking for both as a collective set
of items?
Ans:-
Because in Apriori we calculate Confidence(M1-->M2), which is the probability that
a person watches movie 2 given that they watched movie 1. This is the conditional
probability P(M2|M1) = P(M2&M1)/P(M1), the same quantity that appears in Bayes'
theorem. Clearly this relation is not symmetric.
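
A small sketch of why the direction matters, using made-up transactions and
hypothetical helper names (this is not the API of any Apriori library):

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence(A --> B) = support(A and B) / support(A)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


transactions = [
    {"burger", "frenchFries"},
    {"burger", "frenchFries"},
    {"burger"},
    {"burger"},
    {"soda"},
]

burger_to_fries = confidence({"burger"}, {"frenchFries"}, transactions)
fries_to_burger = confidence({"frenchFries"}, {"burger"}, transactions)
```

Here everyone who bought fries also bought a burger, but only half of the burger
buyers bought fries, so the two rules get different confidence values. Support, by
contrast, is the same for both directions.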

Apriori vs Eclat: what's the difference in the implementation procedure?
(I think Apriori is a superset of Eclat, as it first finds the support of all the
frequent sets, as Eclat does.)

Should we implement the same code as in Apriori, which checks additional metrics
like confidence and lift, for identifying frequent item sets in the Eclat model?
(Since there is no efficient Python library for Eclat other than the Apriori one,
we have no choice.)

What is an A/B test?

Upper confidence bound algorithm: how does the confidence interval shrink after
each new trial, and by how much?

Thompson sampling: what is the beta distribution?
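
A rough sketch of the interval width, assuming the common UCB1-style radius
sqrt(c * ln(n) / N_i), where n is the total number of rounds so far and N_i is the
number of times option i was selected. The constant c = 2 is an assumption; other
versions of the algorithm use a different constant:

```python
import math


def ucb_radius(total_rounds, times_selected, c=2.0):
    """Half-width of the confidence interval: sqrt(c * ln(n) / N_i)."""
    return math.sqrt(c * math.log(total_rounds) / times_selected)


# Radius for one option as it gets selected again and again
# (both n and N_i grow by 1 each time)
radii = [ucb_radius(n, n_i) for n, n_i in [(100, 10), (101, 11), (102, 12)]]
```

Under this formula the radius of the selected option shrinks roughly like
1/sqrt(N_i), while the radius of options that were not selected grows slowly because
ln(n) keeps rising.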

NLP sparse matrix, stemming in preprocessing

How to use CAP curves to assess the accuracy of classification models

Accuracy, Precision, Recall, F1 score in classification models - study these
metrics
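
As a starting point, a minimal sketch of the standard definitions in terms of the
four confusion matrix counts (the function name and the example counts are made up):

```python
def classification_metrics(tn, fp, fn, tp):
    """Accuracy, precision, recall and F1 from binary confusion matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Precision: of everything predicted positive, how much really was positive
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything really positive, how much was found
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


m = classification_metrics(tn=50, fp=10, fn=5, tp=35)
```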

Try C5.0 and maximum entropy classification models for NLP

What is the significance of the activation function in an ANN?

Softmax function - used in multiclass logistic regression.

PCA vs LDA - how do these algorithms reduce the dimensionality of data with many
columns to a preferred number of columns?

Types of 'loss functions' used in all regression and classification algorithms

Ensemble learning - 1. bagging and 2. boosting: what are the types of each?

Yes Regression problems in ML.

---------------------------------

What is the ANOVA test?

What is the Tukey method (1977)?

point-biserial correlation test

what is FactorPlot in detail and all its possible use cases

what is BarPlot between categorical and numerical variables

Learning curves plots for ML Model learning

VotingClassifier() - an Ensemble model
