
MACHINE LEARNING

ASSIGNMENT
1) What is classification, and what are the different families of classification
algorithms? Explain with examples.
2) What is clustering, and what are the different types of clustering
algorithms? Provide examples of applications of clustering in different
domains.
3) What are the different types of machine learning, and how do they differ
from each other? Provide examples of algorithms for each type.
4) What is a Naive Bayes classifier, and how does it differ from other
classifiers? Provide an example of a problem that can be solved using a
Naive Bayes classifier.
5) What is a K-Nearest Neighbors classifier, and how does it work? Provide
an example of a problem that can be solved using a K-Nearest Neighbors
classifier.

 The K-Nearest Neighbors (KNN) classifier is a simple and intuitive machine learning algorithm used for classification tasks. It is based on the principle of
similarity: objects that are similar are likely to belong to the same class. KNN
makes predictions by comparing the new instance to the labeled instances in
the training data and selecting the class that is most common among its k
nearest neighbors.
 How K-Nearest Neighbors (KNN) Classifier Works:
 Training Phase:
o During the training phase, KNN simply stores the training instances in
memory without any explicit model training. The training instances
consist of feature vectors and their corresponding class labels.
 Prediction Phase:
o When given a new, unseen instance to classify, KNN calculates the
distance between the new instance and all instances in the training
data.
o KNN then selects the k nearest neighbors to the new instance based on
their distances. The value of k is a hyperparameter chosen by the user.
o For classification tasks, KNN assigns the class label that is most
common among the k nearest neighbors to the new instance. This can
be determined by a majority voting scheme.
o For regression tasks, KNN calculates the average (or weighted
average) of the target values of the k nearest neighbors and assigns it
as the predicted value for the new instance.
 Example Problem:
 An example problem that can be solved using a KNN classifier is the
classification of handwritten digits in images. Suppose we have a dataset of
images of handwritten digits (0 to 9), where each image is represented as a
feature vector of pixel values and is associated with its corresponding digit
label.
 Features:
o Pixel values of the handwritten digit images, where each pixel
corresponds to a feature.
 Labels:
o Digit labels (0 to 9) indicating the identity of the handwritten digit in
each image.
 Given a new, unseen image of a handwritten digit, the task is to classify it into
one of the 10 possible digit classes (0 to 9) using KNN. The KNN algorithm
would compare the new image to the labeled images in the training data,
calculate the distances to the nearest neighbors, and assign the most common
digit label among its k nearest neighbors as the predicted label for the new
image.
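Below is a minimal sketch of this digit-classification example in Python, assuming scikit-learn is available; the dataset loader, the 75/25 split, and k = 5 are illustrative choices rather than the only possible ones.

# Minimal KNN digit classification sketch (assumes scikit-learn is installed).
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Each 8x8 digit image is flattened into a 64-pixel feature vector.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

knn = KNeighborsClassifier(n_neighbors=5)  # k is a user-chosen hyperparameter
knn.fit(X_train, y_train)                  # "training" only stores the labeled instances
predictions = knn.predict(X_test)          # majority vote among the 5 nearest neighbors
print("Accuracy:", accuracy_score(y_test, predictions))

In practice, k is usually tuned (for example with cross-validation), since a very small k is sensitive to noise while a very large k oversmooths the decision boundary.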
6) What is a Support Vector Machine (SVM), and how does it work?
Provide an example of a problem that can be solved using an SVM.

A Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression tasks. It is particularly effective in high-dimensional
spaces and when the number of features exceeds the number of samples. SVM aims to
find the optimal hyperplane that best separates the classes in the feature space by
maximizing the margin between the classes.

How SVM Works:


Linear Separation:
Given a set of labeled training data, SVM tries to find the hyperplane that separates
the classes with the maximum margin. This hyperplane is defined by the support
vectors, which are the data points closest to the hyperplane.

Maximizing Margin:
The margin is the distance between the support vectors and the hyperplane. SVM
selects the hyperplane that maximizes this margin, as it provides the greatest margin
of safety against misclassification.

Optimization:
SVM solves an optimization problem to find the optimal hyperplane. It aims to
minimize the classification error while maximizing the margin. This optimization is
typically performed using techniques such as quadratic programming or gradient
descent.
Kernel Trick:
SVM can handle non-linearly separable data by mapping the input features into a
higher-dimensional space using a kernel function. In this higher-dimensional space,
the classes may become linearly separable, allowing SVM to find a hyperplane.
Common kernel functions include polynomial kernels, radial basis function (RBF)
kernels, and sigmoid kernels.

Example Problem:
An example problem that can be solved using an SVM is classifying emails as spam
or non-spam based on their content (text data). The features in this problem could be
various characteristics of the emails, such as the frequency of certain words or
phrases.

Features:

Word frequency: Number of times specific words or phrases appear in the email.
Email length: Number of words or characters in the email.
Presence of attachments: Boolean indicator of whether the email contains
attachments.
Labels:

Spam (1) or non-spam (0) label assigned to each email.
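A minimal sketch of this spam-classification setup in Python, assuming scikit-learn; the four example emails and the choice of a TF-IDF bag-of-words representation with a linear kernel are purely illustrative.

# Toy SVM spam classifier sketch (assumes scikit-learn is installed).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

emails = [
    "Win a free prize now, click here",            # spam
    "Meeting rescheduled to 3 pm tomorrow",         # non-spam
    "Cheap loans, limited time offer",              # spam
    "Please review the attached project report",    # non-spam
]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = non-spam

vectorizer = TfidfVectorizer()          # word frequencies become numeric features
X = vectorizer.fit_transform(emails)

clf = SVC(kernel="linear")              # linear kernels suit high-dimensional sparse text
clf.fit(X, labels)
print(clf.predict(vectorizer.transform(["Free offer, click now"])))  # likely [1] (spam)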


7) What is linear regression, and how is it used in machine learning?
Provide an example of a problem that can be solved using linear
regression.
AAKASH
1. Give any two methods to avoid overfitting. IMP [X3]
 Two common methods to avoid overfitting in machine learning are:
 Cross-Validation: Cross-validation is a technique used to assess how well a
predictive model generalizes to an independent data set. It involves
partitioning the original data set into subsets, typically called the training set
and the validation set. The model is trained on the training set and evaluated
on the validation set. This process is repeated multiple times with different
partitions of the data, and the performance metrics are averaged across all
runs. By using cross-validation, we can get a more accurate estimate of the
model's performance and prevent overfitting by ensuring that the model's
performance is consistent across different subsets of the data.
 Regularization: Regularization is a technique used to prevent overfitting by
adding a penalty term to the loss function during the training process. This
penalty term discourages overly complex models by penalizing large
coefficients or parameters. Common regularization techniques include L1
regularization (Lasso), which adds the absolute values of the coefficients to
the loss function, and L2 regularization (Ridge), which adds the squared
values of the coefficients. By controlling the strength of the regularization
term, we can control the trade-off between model complexity and training
error, thereby preventing overfitting.
 Both cross-validation and regularization are effective methods for avoiding
overfitting and improving the generalization performance of machine learning
models.
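A brief sketch of both techniques in Python, assuming scikit-learn; the diabetes dataset, the 5-fold split, and the alpha values are illustrative.

# Cross-validation and regularization sketch (assumes scikit-learn is installed).
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge, Lasso
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)

# Cross-validation: average the score over 5 different train/validation splits.
ridge = Ridge(alpha=1.0)   # L2 regularization: penalizes the squared coefficients
print("Mean CV score (Ridge):", cross_val_score(ridge, X, y, cv=5).mean())

lasso = Lasso(alpha=0.1)   # L1 regularization: penalizes absolute coefficients, can zero some out
print("Mean CV score (Lasso):", cross_val_score(lasso, X, y, cv=5).mean())

A larger alpha means stronger regularization, trading a tighter fit on the training data for better generalization.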
2. What is the difference between supervised and unsupervised learning? IMP [X3]
3. Discuss Bagging and Boosting. IMP [X3]
 Bagging and boosting are both ensemble learning techniques used to improve
the performance of machine learning models by combining the predictions of
multiple base learners. While they share the goal of reducing prediction
variance and improving generalization, they differ in their approaches and
methodologies. Let's discuss each technique:
 Bagging (Bootstrap Aggregating):
o Bagging involves training multiple base learners independently on
different subsets of the training data, which are sampled with
replacement (bootstrap samples). Each base learner produces its own
predictions, and the final prediction is typically made by averaging (for
regression) or voting (for classification) over the predictions of all base
learners. The key components of bagging are as follows:
o Bootstrap Sampling: Randomly selecting subsets of the training data
with replacement. This results in each base learner being trained on a
slightly different subset of the data.
o Base Learners: Using a base learning algorithm (e.g., decision trees) to
train multiple models independently on the bootstrap samples.
o Combining Predictions: Combining the predictions of all base learners
to make the final prediction. For regression problems, this usually
involves averaging the predictions, while for classification problems, it
involves majority voting.
o Bagging helps to reduce overfitting by introducing diversity among the
base learners, as each model is trained on a different subset of the data.
It often results in improved generalization performance, especially
when the base learners are unstable (high variance).

 Boosting:
o Boosting is an iterative ensemble learning technique where base
learners are trained sequentially, and each subsequent learner focuses
on the examples that were misclassified by the previous ones. The key
components of boosting are as follows:
o Sequential Training: Base learners are trained sequentially, and each
subsequent learner pays more attention to the examples that were
misclassified by the previous ones.
o Weighted Data: During training, each example is assigned a weight
based on its difficulty in the training process. Misclassified examples
are given higher weights to ensure that subsequent base learners focus
more on correcting these mistakes.
o Combining Predictions: Unlike bagging, where predictions are
combined using simple averaging or voting, boosting assigns weights
to each base learner's prediction based on its performance during
training. Final predictions are made by combining these weighted
predictions.
o Boosting aims to reduce bias and variance by iteratively focusing on
the examples that are difficult to classify. Popular boosting algorithms
include AdaBoost (Adaptive Boosting) and Gradient Boosting
Machines (GBM), which differ in their specific methodologies for
assigning weights to base learners and combining predictions.
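A side-by-side sketch of the two techniques in Python, assuming scikit-learn; the synthetic dataset and the choice of 50 decision-tree base learners are illustrative.

# Bagging vs. boosting sketch with decision trees as base learners.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Bagging: 50 trees trained independently on bootstrap samples, combined by voting.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)

# Boosting: 50 shallow trees trained sequentially, each reweighting hard examples.
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)

print("Bagging  mean CV accuracy:", cross_val_score(bagging, X, y, cv=5).mean())
print("Boosting mean CV accuracy:", cross_val_score(boosting, X, y, cv=5).mean())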
4. Explain the decision tree algorithm in detail. [X2] IMP
 Decision Tree is a Supervised learning technique that can be used for both
classification and Regression problems, but mostly it is preferred for solving
Classification problems. It is a tree-structured classifier, where internal nodes
represent the features of a dataset, branches represent the decision rules and
each leaf node represents the outcome.
 In a decision tree, there are two types of nodes: decision nodes and leaf nodes. Decision nodes are used to make decisions and have multiple branches, whereas leaf nodes are the outputs of those decisions and do not contain any further branches.
 The decisions or the test are performed on the basis of features of the given
dataset.
 It is a graphical representation for getting all the possible solutions to a
problem/decision based on given conditions.
 It is called a decision tree because, similar to a tree, it starts with the root node,
which expands on further branches and constructs a tree-like structure.
 In order to build a tree, we use the CART algorithm, which stands for
Classification and Regression Tree algorithm.
 A decision tree simply asks a question, and based on the answer (Yes/No), it further splits into subtrees.

 Decision Tree Terminologies


o Root Node: The root node is where the decision tree starts. It represents the entire dataset, which further gets divided into two or more homogeneous sets.
o Leaf Node: Leaf nodes are the final output nodes; the tree cannot be split further after a leaf node is reached.
o Splitting: Splitting is the process of dividing a decision node/root node into sub-nodes according to the given conditions.
o Branch/Sub-tree: A subtree formed by splitting a node of the tree.
o Pruning: Pruning is the process of removing unwanted branches from the tree.
o Parent/Child node: A node that splits into sub-nodes is called the parent node of those sub-nodes, and the sub-nodes are called its child nodes.
 Example: Suppose a candidate has a job offer and wants to decide whether to accept it or not. To solve this problem, the decision tree starts with the root node (the Salary attribute, chosen by an attribute selection measure). The root node splits into the next decision node (distance from the office) and one leaf node based on the corresponding labels. The next decision node further splits into one decision node (cab facility) and one leaf node. Finally, that decision node splits into two leaf nodes (Accept offer and Decline offer).
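A small sketch of this job-offer example using scikit-learn's CART implementation; the 0/1 encoding of the three attributes and the labels below are invented purely for illustration.

# Toy decision tree for the job-offer example (assumes scikit-learn is installed).
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [salary_above_threshold, office_is_near, cab_facility], each encoded 0/1.
X = [
    [1, 1, 1], [1, 1, 0], [1, 0, 1], [1, 0, 0],
    [0, 1, 1], [0, 1, 0], [0, 0, 1], [0, 0, 0],
]
y = [1, 1, 1, 0, 0, 0, 0, 0]  # 1 = accept offer, 0 = decline offer

tree = DecisionTreeClassifier(criterion="gini")  # CART splits on Gini impurity by default
tree.fit(X, y)
print(export_text(tree, feature_names=["salary", "distance", "cab"]))  # text view of the learned splits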

5. Write the AdaBoost algorithm. [x2] IMP

 AdaBoost, short for Adaptive Boosting, is an ensemble learning technique used in machine learning for classification and regression problems. The main idea behind AdaBoost is to iteratively train weak classifiers on the training dataset, with each successive classifier giving more weight to the data points that were misclassified by the previous ones. The final AdaBoost model is obtained by combining all the weak classifiers used during training, with each weak classifier weighted according to its accuracy: the weak model with the highest accuracy is given the highest weight, while the model with the lowest accuracy is given a lower weight.
 AdaBoost Algorithm:
 Initialize Sample Weights:
o Assign equal weights to each training sample initially.
 For Each Iteration (t):
o Train Weak Learner: Train a weak learner (e.g., decision tree) on the
training data. The weak learner aims to perform slightly better than
random chance.
o Compute Error: Calculate the weighted error of the weak learner. This
error is the sum of the weights of the misclassified samples divided by
the total weight of all samples.
o Compute Learner Weight: Compute the weight of the weak learner in the final ensemble. The weight is determined by the error rate: a lower error rate results in a higher weight.
o Update Sample Weights: Increase the weights of the misclassified
samples and decrease the weights of the correctly classified samples.
This emphasizes the importance of the misclassified samples in the
next iteration.
 Combine Weak Learners:
o Combine the weak learners into a strong classifier using weighted
majority voting. The weight of each weak learner is used as a weight in
the final prediction.
 Final Prediction:
o Make predictions using the combined model. Each weak learner's
prediction is weighted according to its weight, and the final prediction
is determined by the sum of these weighted predictions.
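The steps above can be sketched from scratch in Python as follows, using decision stumps from scikit-learn as the weak learners. The function names and the exponential weight update shown here are one common formulation (labels assumed to be -1/+1), not the only possible one.

# From-scratch AdaBoost sketch (assumes scikit-learn and NumPy; labels in {-1, +1}).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=10):
    X, y = np.asarray(X), np.asarray(y)
    n = len(y)
    w = np.full(n, 1.0 / n)                        # 1) equal sample weights
    ensemble = []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)           # 2) train weak learner on weighted data
        pred = stump.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w)  # 3) weighted error rate
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)      # 4) learner weight: lower error, higher weight
        w *= np.exp(-alpha * y * pred)             # 5) up-weight misclassified samples
        w /= w.sum()
        ensemble.append((alpha, stump))
    return ensemble

def adaboost_predict(ensemble, X):
    # Weighted majority vote: sign of the weighted sum of weak predictions.
    return np.sign(sum(alpha * stump.predict(X) for alpha, stump in ensemble))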
6. What is Logistic Regression? Write a short note on: Logistic Regression [X2]

 Logistic regression is a statistical model used for binary classification tasks, where the output variable (target) takes on only two possible values, typically
represented as 0 and 1. It is widely used in various fields, including healthcare,
finance, marketing, and social sciences. Despite its name, logistic regression is
a classification algorithm, not a regression algorithm.
 Key Concepts:
o Binary Classification: Logistic regression is used when the outcome
variable is categorical with two possible outcomes, such as yes/no,
pass/fail, or spam/not spam.
o Sigmoid Function: In logistic regression, the relationship between the
independent variables and the probability of the outcome is modeled
using the sigmoid (logistic) function. The sigmoid function maps any
real-valued number to the range [0, 1], making it suitable for modeling
probabilities.
o Log Odds: Logistic regression models the log-odds (logarithm of the odds ratio) of the probability of the positive class. Mathematically, it can be represented as log(p / (1 - p)) = β0 + β1x1 + β2x2 + ... + βnxn, where p is the probability of the positive class and the βi are the model coefficients.
o Maximum Likelihood Estimation: The parameters of the logistic regression model are estimated using maximum likelihood estimation
(MLE), which maximizes the likelihood of observing the actual
outcomes given the model predictions.

 Training and Prediction:


o Training: During training, the logistic regression model learns the
optimal coefficients that best fit the training data. This is typically done
using optimization algorithms such as gradient descent or Newton's
method to minimize the cost function, which measures the difference
between the predicted probabilities and the actual outcomes.
o Prediction: Once trained, the logistic regression model can make
predictions on new data by computing the probability of the positive
class using the learned coefficients and the sigmoid function. If the
predicted probability is above a certain threshold (usually 0.5), the
instance is classified as belonging to the positive class; otherwise, it is
classified as belonging to the negative class.

 Advantages and Applications:


o Interpretability: Logistic regression provides interpretable results, as
the coefficients can be directly interpreted as the effect of each
independent variable on the log-odds of the outcome.
o Efficiency: Logistic regression is computationally efficient and can
handle large datasets with relatively low memory and processing
requirements.
o Applications: Logistic regression is widely used in various fields,
including healthcare (e.g., predicting disease risk), finance (e.g., credit
scoring), marketing (e.g., customer churn prediction), and social
sciences (e.g., predicting voting behavior).
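A minimal sketch in Python of the sigmoid mapping and a fitted logistic regression, assuming scikit-learn; the breast-cancer dataset and the 0.5 decision threshold are illustrative.

# Logistic regression sketch (assumes scikit-learn and NumPy are installed).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # maps any real number into (0, 1)

X, y = load_breast_cancer(return_X_y=True)   # binary labels
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=5000)    # coefficients found by maximizing the likelihood
model.fit(X_train, y_train)

probs = model.predict_proba(X_test)[:, 1]    # P(positive class | features)
preds = (probs >= 0.5).astype(int)           # threshold the probability at 0.5
print("Test accuracy:", (preds == y_test).mean())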
7. Describe the limitations of the Perceptron model. [X2]
 The perceptron model, while historically important in the development of
neural networks and machine learning, has several limitations that restrict its
applicability and effectiveness in certain scenarios. Some of the key
limitations of the perceptron model include:
o Binary Classification Only: The perceptron model is designed for
binary classification tasks, where the output is either 0 or 1. It cannot
be directly applied to multi-class classification problems without
modifications or extensions, such as the one-vs-all strategy.
o No Convergence Guarantee for Non-Separable Data: While the perceptron algorithm is guaranteed to converge if the data is linearly separable, there is no such guarantee if the data is not linearly separable. In such cases, the algorithm may not converge or may settle on a suboptimal solution.
o Lack of Hidden Layers: The perceptron model consists of a single
layer of neurons with direct connections to the input features. It lacks
hidden layers, which limits its ability to learn hierarchical
representations of the data. This can be a significant drawback when
dealing with complex, high-dimensional datasets.
o Inability to Capture Non-Linear Relationships: Due to its linear nature, the perceptron model cannot capture non-linear relationships between input features and output labels (see the sketch after this list). This restricts its performance on tasks where non-linear relationships are prevalent, such as image recognition or natural language processing.
o Not Robust to Noise: The perceptron model is sensitive to noisy data
and outliers, as it tries to find a decision boundary that perfectly
separates the classes. Even a small amount of noise or outliers can
significantly impact the learned decision boundary and degrade the
model's performance.
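As a small illustration of the non-linearity limitation, the sketch below (assuming scikit-learn) trains a single perceptron on XOR, a dataset that is not linearly separable, so no choice of weights can classify all four points correctly.

# Perceptron limitation sketch: XOR is not linearly separable.
from sklearn.linear_model import Perceptron

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]                      # XOR labels

clf = Perceptron(max_iter=1000, tol=None)
clf.fit(X, y)
print(clf.score(X, y))                # stays below 1.0: no linear boundary separates XOR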
8. Explain generative probabilistic classification. [X2]
 Generative probabilistic classification is a machine learning approach that
models the joint probability distribution of the features and the class labels in a
dataset. This approach learns how the data is generated and then uses this
knowledge to make predictions about the class labels of new instances.
 Steps in Generative Probabilistic Classification:
1) Modeling Class-Conditional Distributions:
a. The first step in generative probabilistic classification is to model the
probability distributions of the features given each class label. This
involves estimating the likelihood of observing different feature values
given each class.
2) Modeling Prior Probabilities of Classes:
a. In addition to modeling the class-conditional distributions, generative
classifiers also estimate the prior probabilities of each class label. This
represents the likelihood of each class occurring in the dataset.
3) Applying Bayes' Theorem:
a. Once the class-conditional distributions and prior probabilities are estimated, Bayes' theorem is used to compute the posterior probability of each class given the features:
b. P(C | x) = P(x | C) P(C) / P(x), where P(C) is the class prior, P(x | C) is the class-conditional likelihood, and P(x) is the evidence.
4) Making Predictions:
a. Once the posterior probabilities are computed for each class, the class
with the highest posterior probability is selected as the predicted class
label for the given set of features.
 Types of Generative Models:
 Naive Bayes Classifier:
o Naive Bayes is one of the most commonly used generative probabilistic
classifiers. It assumes that the features are conditionally independent given
the class label, which simplifies the modeling process.
 Gaussian Naive Bayes:
o Gaussian Naive Bayes is a variant of the Naive Bayes classifier that
assumes that the class-conditional distributions of the features given the
class labels are Gaussian (normal) distributions.
 Linear Discriminant Analysis (LDA):
o LDA is another generative model that assumes that the class-conditional
distributions of the features given the class labels are multivariate
Gaussian distributions with a shared covariance matrix.
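A from-scratch sketch of the steps above in Python, using Gaussian class-conditional densities with independent features (i.e., a Gaussian Naive Bayes model); the function names and the small variance floor are illustrative choices.

# Generative classification sketch: Gaussian class-conditionals + priors + Bayes' rule.
import numpy as np

def fit_generative(X, y):
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = {
            "prior": len(Xc) / len(X),       # P(C): how often the class occurs
            "mean": Xc.mean(axis=0),          # per-feature Gaussian mean for P(x | C)
            "var": Xc.var(axis=0) + 1e-9,     # per-feature variance (floored for stability)
        }
    return params

def predict_one(params, x):
    x = np.asarray(x, dtype=float)
    posteriors = {}
    for c, p in params.items():
        # log P(x | C) under independent Gaussians, plus the log prior log P(C)
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * p["var"]) + (x - p["mean"]) ** 2 / p["var"])
        posteriors[c] = np.log(p["prior"]) + log_lik
    return max(posteriors, key=posteriors.get)   # class with the highest posterior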
9. What are the goals of machine learning? [X2]
 The goals of machine learning can be broadly categorized into two main
objectives:
 Prediction:
o Prediction is one of the primary goals of machine learning, where the
aim is to develop models that can accurately predict future outcomes or
behaviors based on historical data.
o This includes tasks such as regression, classification, and time series
forecasting, where the model learns patterns and relationships from
data to make predictions about unseen instances.
 Understanding and Interpretation:
o Another important goal of machine learning is to gain insights and
understanding from data, helping humans comprehend complex
phenomena or make informed decisions.
o This involves developing models that not only provide accurate
predictions but also offer explanations or interpretations of how and
why certain predictions are made.
o Interpretability is crucial in domains like healthcare, finance, and law,
where decisions based on machine learning models need to be
transparent and explainable.
10.What is learning? Discuss any four Learning Techniques. [x2]
 Learning, in the context of machine learning, refers to the process of training a
model to recognize patterns or relationships in data and make predictions or
decisions based on those patterns. It involves the model adjusting its
parameters or internal representations through exposure to examples (training
data), with the goal of improving its performance on unseen data.
 Four Learning Techniques:
 Supervised Learning:
o Supervised learning is a type of machine learning where the model is
trained on labeled data, meaning each input is paired with an output
label. The goal is to learn a mapping from inputs to outputs, such that
the model can accurately predict the output for new inputs.
o Common algorithms include:
o Linear Regression: Predicts a continuous output variable based on
input features.
o Logistic Regression: Predicts the probability of an input belonging to a
certain class.
o Support Vector Machines (SVM): Finds the optimal hyperplane that
separates different classes in feature space.
 Unsupervised Learning:
o Unsupervised learning involves training a model on unlabeled data,
where the goal is to uncover hidden patterns or structures in the data
without explicit guidance.
o Common techniques include:
o Clustering: Groups similar data points together based on their
characteristics, such as K-means clustering and hierarchical clustering.
 Reinforcement Learning:
o Reinforcement learning is a type of learning where an agent learns to
interact with an environment by performing actions and receiving
rewards or penalties in return. The goal is to learn a policy that
maximizes cumulative rewards over time.
o Key components include:
o Agent: The learner or decision-maker that interacts with the
environment.
o Environment: The external system or world that the agent interacts
with.
o Rewards: Feedback signals provided by the environment to indicate
the desirability of actions taken by the agent.
o Policy: Strategy or behavior that the agent uses to select actions in
different states.
 Semi-Supervised Learning:
o Semi-supervised learning combines elements of supervised and
unsupervised learning, where the model is trained on a combination of
labeled and unlabeled data.
o This approach leverages the abundance of unlabeled data and a smaller
amount of labeled data to improve model performance.
o Techniques include:
o Self-training: Initially, the model is trained on labeled data. Then, it
uses its predictions on unlabeled data to generate pseudo-labels, which
are used to retrain the model iteratively.
o Co-training: The model is trained on different subsets of features or
views of the data. Each subset of features contributes to the learning
process independently.
11.What do you understand by Reinforcement Learning?

 Reinforcement Learning (RL) is the science of decision making: learning the optimal behavior in an environment so as to obtain the maximum reward. In RL, the data is accumulated by the learning system itself through trial and error; it is not supplied as a fixed input dataset, as it would be in supervised or unsupervised machine learning.
 Reinforcement learning uses algorithms that learn from outcomes and decide
which action to take next. After each action, the algorithm receives feedback
that helps it determine whether the choice it made was correct, neutral or
incorrect. It is a good technique to use for automated systems that have to
make a lot of small decisions without human guidance.
 Reinforcement learning is an autonomous, self-teaching system that
essentially learns by trial and error. It performs actions with the aim of
maximizing rewards, or in other words, it is learning by doing in order to
achieve the best outcomes.

12.What is overfitting? What do you understand by overfitting of data?
 Overfitting is a common problem in machine learning where a model learns
the training data too well, capturing noise or random fluctuations in the data
instead of the underlying patterns or relationships. This results in a model that
performs well on the training data but fails to generalize to unseen data,
leading to poor performance on new, unseen examples.
 Overfitting occurs when a model becomes too complex or flexible, effectively
memorizing the training data rather than learning the true underlying structure.
As a result, the model may exhibit high variance, meaning it is sensitive to
small changes in the training data and produces vastly different predictions
when applied to different datasets.
 In essence, overfitting is like "fitting the noise" in the data rather than "fitting
the signal," causing the model to perform poorly on new, unseen examples. It
is essential to detect and mitigate overfitting by employing techniques such as
regularization, cross-validation, early stopping, and using simpler models with
fewer parameters. These methods help strike a balance between model
complexity and generalization performance, ensuring that the model can
effectively learn from the data without overfitting.
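A brief sketch of overfitting in Python, assuming scikit-learn and NumPy; the noisy sine data and the polynomial degrees 3 and 12 are illustrative.

# Overfitting demonstration: a too-flexible model fits training noise.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)

def noisy_sine(n):
    x = np.sort(rng.uniform(0, 1, n)).reshape(-1, 1)
    return x, np.sin(2 * np.pi * x).ravel() + rng.normal(scale=0.2, size=n)

X_train, y_train = noisy_sine(15)     # small, noisy training set
X_test, y_test = noisy_sine(100)      # held-out data from the same process

for degree in (3, 12):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(f"degree {degree}: train R^2 = {model.score(X_train, y_train):.2f}, "
          f"test R^2 = {model.score(X_test, y_test):.2f}")
# The high-degree model scores near 1 on the training data but typically much
# worse on the held-out data: it has fit the noise rather than the signal.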
13.What do you understand by POMDP?
 A POMDP is an extension of the Markov decision process (MDP) framework
that incorporates partial observability. In a POMDP, the agent does not have
full access to the state of the environment but instead receives noisy or
incomplete observations. The agent must maintain a belief state, which is a
probability distribution over the possible states of the environment, and make
decisions based on this belief state and the observed evidence. POMDPs are
used in situations where the agent has limited sensing capabilities or when the
environment is stochastic and uncertain.
14.Explain the concept of Hidden Markov model?
 A Hidden Markov Model is a probabilistic graphical model used for modeling
sequences of observable events or data. It consists of a set of hidden states,
each associated with a probability distribution over observable outcomes. The
model assumes that the hidden states form a Markov chain, meaning the
probability of transitioning from one state to another depends only on the
current state. HMMs are commonly used in speech recognition, natural
language processing, bioinformatics, and other sequential data analysis tasks.
15.Explain: Markov Decision Process (MDP)
 A Markov Decision Process is a mathematical framework used for modeling
decision-making problems in situations where outcomes are influenced by
random factors. It consists of a set of states, a set of actions, transition
probabilities between states, and rewards associated with state-action pairs.
The key assumption of an MDP is the Markov property, which states that the
future state of the system depends only on the current state and action,
independent of the history of previous states and actions. MDPs are widely
used in reinforcement learning and optimal control problems.
16.Explain: Bellman’s Equation
 Bellman's equation is a fundamental equation in the field of dynamic
programming and reinforcement learning. It expresses the value of a state or
state-action pair in an MDP recursively in terms of the immediate reward and
the value of the next state or state-action pair. The Bellman equation provides
a way to calculate the optimal value function, which represents the expected
cumulative reward an agent can achieve starting from a particular state or
state-action pair and following an optimal policy thereafter. Bellman's
equation is central to many algorithms for solving MDPs, such as value
iteration and policy iteration.
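In standard MDP notation (states s, actions a, reward R, transition probabilities P, and discount factor γ), the Bellman optimality equations can be written as:

V*(s) = max_a [ R(s, a) + γ · Σ_{s'} P(s' | s, a) · V*(s') ]

Q*(s, a) = R(s, a) + γ · Σ_{s'} P(s' | s, a) · max_{a'} Q*(s', a')

Value iteration repeatedly applies the first equation as an update rule until the value function converges.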

17.Explain: Value function approximation algorithm
18.Explain in detail Q Learning
19.What do you mean by Linear Quadratic Regulation (LQR)?
20.Explain Spectral Clustering in detail.
21.What is nearest Neighbor?
22.What is the Naive Bayes theorem? How is it useful in machine learning?
 The Naive Bayes theorem refers to Bayes' theorem, a fundamental result in probability theory, particularly in the field of Bayesian statistics. It provides a way to calculate the conditional probability of an event given prior knowledge or evidence. In machine learning, Bayes' theorem is utilized by the Naive Bayes classifier, a popular classification algorithm known for its simplicity and effectiveness in various applications.
 How Naive Bayes Classifier Works:
 The Naive Bayes classifier is based on the assumption of conditional
independence between the features given the class label. Despite this
simplifying assumption, Naive Bayes classifiers have been found to perform
well in practice, particularly in text classification and spam filtering tasks.

 Prediction Phase:
o When given a new instance to classify, the classifier calculates the
posterior probability of each class label given the features using the
Naive Bayes theorem.
o The class label with the highest posterior probability is assigned as the
predicted label for the new instance.
 Advantages of Naive Bayes Classifier:
o Simplicity: Naive Bayes classifiers are simple and easy to implement.
They have few parameters to tune and are computationally efficient.
o Scalability: Naive Bayes classifiers can handle large datasets with high
dimensionality efficiently, making them suitable for big data
applications.
o Robustness to Irrelevant Features: Naive Bayes classifiers are robust to
irrelevant features and noise in the data due to the conditional
independence assumption.
 Applications of Naive Bayes Classifier:
o Text Classification: Naive Bayes classifiers are widely used for text
classification tasks such as spam detection, sentiment analysis, and
document categorization.
o Medical Diagnosis: Naive Bayes classifiers can be applied to medical
diagnosis tasks, such as predicting the presence or absence of a disease
based on patient symptoms.
o Recommendation Systems: Naive Bayes classifiers can be used in
recommendation systems to predict user preferences or recommend
products based on user behavior.
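A minimal sketch of the text-classification use case in Python, assuming scikit-learn; the four example messages and the bag-of-words/multinomial model are illustrative.

# Toy Naive Bayes spam filter (assumes scikit-learn is installed).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = [
    "win cash prize now",               # spam
    "lunch at noon tomorrow",            # not spam
    "limited offer claim your prize",    # spam
    "project meeting moved to friday",   # not spam
]
labels = ["spam", "ham", "spam", "ham"]

vectorizer = CountVectorizer()                 # word counts as features
X = vectorizer.fit_transform(messages)

clf = MultinomialNB()                          # treats words as conditionally independent given the class
clf.fit(X, labels)
print(clf.predict(vectorizer.transform(["claim your cash prize now"])))  # expected: ['spam']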
23.What do you understand by noise in data? What could be the implications for the results if noise is not treated properly?
24.When should we use classification over regression? Explain using an example.
 The choice between classification and regression depends on the nature of the
problem and the type of output variable being predicted. Here are some
guidelines for when to use classification over regression:
 Nature of the Output Variable:
o If the output variable is categorical or qualitative in nature (e.g., classes
or labels), classification is typically more appropriate. For example,
predicting whether an email is spam or not spam, classifying images
into different categories, or identifying whether a patient has a certain
disease.
 Discrete Output Space:
o Classification is suitable when the output space is discrete and consists
of a finite number of distinct classes or categories. Each instance is
assigned to one of these predefined classes based on its features.
 Interpretability of Results:
o In many cases, classification models provide more interpretable results
than regression models. Class labels are often easier to understand and
communicate, making classification outputs more intuitive for
decision-making.
 Imbalanced Data:
o If the dataset is highly imbalanced, with one class significantly
outnumbering the others, classification algorithms are often more
effective in handling such scenarios. They can focus on accurately
identifying instances of the minority class, which may be of particular
interest in certain applications such as fraud detection or medical
diagnosis.
 Error Analysis:
o Classification models are often evaluated using metrics such as
accuracy, precision, recall, and F1-score, which provide insights into
the performance of the model in terms of correctly classifying
instances into their respective classes. These metrics are well-suited for
evaluating classification tasks.
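The evaluation metrics mentioned above can be computed directly, as in the short sketch below (assuming scikit-learn); the two label arrays are illustrative.

# Classification metrics sketch (assumes scikit-learn is installed).
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # actual class labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # model predictions

print("Accuracy :", accuracy_score(y_true, y_pred))    # fraction of correct predictions
print("Precision:", precision_score(y_true, y_pred))   # TP / (TP + FP)
print("Recall   :", recall_score(y_true, y_pred))      # TP / (TP + FN)
print("F1-score :", f1_score(y_true, y_pred))          # harmonic mean of precision and recall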
25.Define the terms: Precision, Recall, F1-score, and Accuracy.
26.Define LDA and any two of its limitations.
27.Explain Bayesian estimation and maximum likelihood estimation in
generative learning.
28.Write a short note on: Support Vector Machine (SVM)
29.Given a dataset and a set of machine learning algorithms, how do you choose an appropriate algorithm?
