ML Notes and Doubts

<<ALL DOUBTS and CLARIFICATION needed Topics from ML>>
======================================================
What is the difference between using SVR and linear regression?
Decision tree regression: information entropy - how does the algorithm split the data?
Random forest is part of 'ensemble learning'
What is the difference between using a single tree in decision tree regression and
500+ trees in random forest regression?
How to interpret adjusted R squared?
In the adjusted R2 formula, what does p represent? -> (p is the number of independent
variables, i.e. predictors)
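
The role of p can be seen by writing the formula out as a small function. This is a minimal sketch assuming the usual definition, adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1), where n is the number of observations. The function name and the sample numbers are illustrative only.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for n observations and p predictors (assumed standard formula)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Same plain R^2, but a different number of predictors (made-up numbers).
few_predictors = adjusted_r2(0.90, n=50, p=2)
many_predictors = adjusted_r2(0.90, n=50, p=10)
print(few_predictors, many_predictors)
```

With the plain R2 held fixed, a larger p gives a lower adjusted value, which is why adding a predictor only pays off if it raises R2 enough.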
Shouldn't we bother with the backward elimination process in Python when using sklearn?
Churn modeling - classifying customers who stay and who leave.
Maximum likelihood for logistic regression - how it finds the best-fit curve.
Confusion matrix interpretation in classification metrics,
according to sklearn.metrics:

                  Predicted Class 0      Predicted Class 1
Actual Class 0    True Negative (TN)     False Positive (FP)
Actual Class 1    False Negative (FN)    True Positive (TP)
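
The layout above (rows = actual class, columns = predicted class) can be reproduced by hand to check the reading of each cell. This is a pure-Python sketch rather than a call into sklearn. The helper name and the label lists are made up for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary 0/1 labels.

    Rows are indexed by the actual class, columns by the predicted class.
    """
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix


# Illustrative labels, not real data.
y_true = [0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 0, 1, 0, 1, 1]

(tn, fp), (fn, tp) = confusion_counts(y_true, y_pred)
print("TN", tn, "FP", fp, "FN", fn, "TP", tp)
```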
Sigmoid function used in logistic regression and its intuition
KNN - Euclidean distance, and other types of distance metrics used.
Does a KNN model memorise all the input data points and use them to classify a new
data point?
Ans:-
Yes.
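
A tiny sketch makes the "memorises" part concrete: fitting only stores the data, and all the distance work happens at prediction time. The class name, the choice of k and the sample points are hypothetical. Euclidean distance is used here, but another metric could be swapped in.

```python
import math
from collections import Counter


class TinyKNN:
    """Minimal k-nearest-neighbours classifier for illustration."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" is just storing every point and its label.
        self.X = list(X)
        self.y = list(y)
        return self

    def predict_one(self, point):
        # Compare the new point with every stored point (Euclidean distance).
        distances = sorted(
            (math.dist(point, x), label) for x, label in zip(self.X, self.y)
        )
        nearest_labels = [label for _, label in distances[: self.k]]
        # Majority vote among the k nearest stored points.
        return Counter(nearest_labels).most_common(1)[0][0]


X = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
y = ["a", "a", "a", "b", "b", "b"]
knn = TinyKNN(k=3).fit(X, y)
print(knn.predict_one((1.5, 1.5)), knn.predict_one((6.5, 6.5)))
```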
How can naive Bayes classify a new data point without comparing it with the original
data points once the model is trained?
Ans:-
It does not store all the data; instead it estimates the distribution of the data for
each class (a multivariate normal distribution with independent features, in the
Gaussian case).
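
The answer above can be sketched in plain Python, assuming the Gaussian variant of naive Bayes: training keeps only a prior, per-feature means and per-feature variances for each class, and prediction uses those summaries alone. The function names, the "stay"/"leave" labels and the numbers are illustrative assumptions.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Return {label: (prior, means, variances)}; the training rows are not kept."""
    grouped = defaultdict(list)
    for row, label in zip(X, y):
        grouped[label].append(row)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Small constant avoids a zero variance for constant features.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict(model, point):
    """Pick the class with the highest log posterior under independent normals."""
    def log_posterior(params):
        prior, means, variances = params
        score = math.log(prior)
        for v, m, var in zip(point, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score

    return max(model, key=lambda label: log_posterior(model[label]))


X = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (5.0, 6.0), (5.2, 5.8), (4.8, 6.2)]
y = ["stay", "stay", "stay", "leave", "leave", "leave"]
model = fit_gaussian_nb(X, y)
print(predict(model, (1.1, 2.0)), predict(model, (5.1, 6.1)))
```

Note that `model` holds two small summaries, one per class, no matter how many rows were used to fit it.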
Study the 'DecisionTreeClassifier' class parameter "criterion": its default value is
'gini' but we used 'entropy'. Where do we use gini then? Also, what is 'gini impurity'?
Ans:-
Both gini and entropy are measures of disorder, but gini is easier to compute than
entropy.
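
Both measures can be computed from the class proportions in a node. The sketch below assumes the standard definitions (gini = 1 - sum of p squared, entropy = -sum of p * log2 p). The function names are illustrative. Gini needs only multiplications, while entropy needs a logarithm per class, which is the sense in which gini is cheaper.

```python
import math


def gini(proportions):
    """Gini impurity: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in proportions)


def entropy(proportions):
    """Shannon entropy in bits: -sum(p_i * log2(p_i)), skipping empty classes."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)


mixed = [0.5, 0.5]   # maximally disordered two-class node
pure = [1.0, 0.0]    # perfectly ordered node
print(gini(mixed), entropy(mixed))
print(gini(pure), entropy(pure))
```

Both measures are zero for a pure node and largest for an evenly mixed one.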
"Receiver Operating Characteristic" (ROC) in the context of performance metrics of a
classification model - what is it?
Forward-looking variable problems, post-factum variables (in a too-good-to-be-true,
highly accurate model).
Regularization parameter lambda or the penalty parameter C (these are
hyperparameters) - how to deal with them?
Hierarchical clustering (HC) -> working of divisive HC (higher to lower level).
In HC, one of the clustering techniques is the "method of minimum variance" (or 'ward').
How does it work?
Association rule learning (Apriori model): why are the rules one-directional? For
example, rule1: (burger --> frenchFries). Why not (frenchFries --> burger)? Isn't
there no first preference, only equal liking for both as a collective set of items?
Ans:-
Because in Apriori we calculate Confidence(M1 --> M2), which is the probability that
a person watches movie2 provided he watched movie1. It is the conditional probability
used in Bayes' theorem: P(M2|M1) = P(M2 & M1) / P(M1). Clearly this relation is not
symmetrical.
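
The asymmetry is easy to check on a toy basket list. The sketch below assumes the usual definitions, support(S) = fraction of transactions containing S and confidence(A --> B) = support(A and B) / support(A). The transactions and function names are made up for illustration.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence(A --> B) = support(A and B) / support(A)."""
    return support(antecedent | consequent, transactions) / support(
        antecedent, transactions
    )


# Illustrative baskets: everyone who buys fries also buys a burger,
# but only half of the burger buyers take fries.
transactions = [
    {"burger", "fries"},
    {"burger", "fries"},
    {"burger"},
    {"burger"},
    {"cola"},
]

burger_to_fries = confidence({"burger"}, {"fries"}, transactions)
fries_to_burger = confidence({"fries"}, {"burger"}, transactions)
print(burger_to_fries, fries_to_burger)
```

The numerator is the same in both directions. Only the denominator, the support of the antecedent, changes, so the two rules can have very different confidence.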
Apriori vs Eclat: what's the difference in the implementation procedure?
(I think Apriori is a superset of Eclat, as it first finds the support of all the
frequent sets, as Eclat does.)
Should we implement the same code as in Apriori, which checks additional metrics like
confidence and lift, to identify frequent item sets in the Eclat model?
(There is no efficient Python library for Eclat other than apriori, so we have no
choice.)
What is an A/B test?
Upper confidence bound algorithm: how does the confidence interval shrink after each
new trial, and by how much?
Thompson sampling: what is the beta distribution?
NLP sparse matrix, stemming in preprocessing
How to use CAP curves to assess the accuracy of classification models.
Accuracy, Precision, Recall, F1 score in classification models - study these metrics.
Try C5.0 and maximum entropy classification models for NLP.
What is the significance of the activation function in an ANN?
Softmax function - used in multiclass logistic regression.
PCA vs LDA - how do these algorithms reduce the dimensionality of data with many
columns to a preferred number of columns?
Types of 'loss functions' used in regression and classification algorithms.
Ensemble learning - 1. bagging and 2. boosting: what are the types in each?
Yes Regression problems in ML.
---------------------------------
What is the ANOVA test?
What is the Tukey method (1977)?
point-biserial correlation test
What is FactorPlot, in detail, and all its possible use cases?
What is BarPlot between categorical and numerical variables?
Learning curve plots for ML model learning
VotingClassifier() - an Ensemble model
