▪ Ordinal data:
▪ Values are also named, but they can be arranged in a sequence of
increasing or decreasing order, so it is possible to identify which
value is better or greater than another.
▪ Example:
Customer satisfaction: ‘very happy’, ‘happy’,
‘unhappy’, etc.
Grades: A, B, C, etc.
Hardness of metal: ‘very hard’, ‘hard’, ‘soft’, etc.
▪ Operations:
o Can perform: counting, mode, median, quartiles
o Cannot perform: mean
o Quantitative/Numerical:
Represents information about the quantity of an object. It can be measured
on a scale of measurement.
▪ Interval:
▪ Numeric data for which not only the order but also the exact
difference between values is known.
▪ Example:
o Celsius temperature: the difference between 12°C and 18°C
is a measurable 6°C, just as between 15.5°C and 21.5°C.
o Date, time, etc.
▪ Operations:
o Can perform: addition, subtraction, mean, median,
mode, standard deviation.
o Cannot perform: ratio (there is no ‘true zero’ value such as ‘no
temperature’; values can be both positive and negative)
▪ Ratio:
▪ Numeric data for which the order is known, the exact difference
between values is known, and an absolute zero value exists.
(Only positive values, no negative values.)
▪ Example: Marks, salary, weight, age, height, etc.
▪ Operations
o Can perform: addition, subtraction, mean, median,
mode, standard deviation, ratio.
6. What are the Techniques Provided in Data Preprocessing? Explain in brief.
• Data Preprocessing:
o Dimensionality reduction
o Feature subset selection
• Dimensionality reduction:
o Dimensionality reduction is the transformation of data from a high-
dimensional space into a low-dimensional space so that the low-
dimensional representation retains some meaningful properties of the
original data.
o Biology and social-media analysis projects produce high-dimensional
data sets; data sets with 20,000 or more features are common.
o Needs:
▪ High-dimensional data sets need a high amount of
computational space and time.
▪ Not all features are useful; some even degrade the performance
of the algorithm.
▪ Most ML algorithms perform better if the dimensionality of the
data set, i.e. the number of features, is reduced.
▪ Helps in reducing irrelevance and redundancy in features.
▪ A model is easier to understand when fewer features are
involved in the learning activity.
o Methods:
▪ PCA: Principal Component Analysis
▪ SVD: Singular Value Decomposition
▪ LDA: Linear Discriminant Analysis
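A minimal sketch of one of these methods, PCA, using scikit-learn (the synthetic data and the choice of two components are assumptions for illustration):

```python
# Minimal PCA sketch (assumes scikit-learn is available; data is synthetic).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))           # 100 instances, 20 features

pca = PCA(n_components=2)                # keep only 2 principal components
X_reduced = pca.fit_transform(X)         # project onto the top-2 components

print(X_reduced.shape)                   # (100, 2)
print(pca.explained_variance_ratio_)     # variance retained by each component
```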
• Feature Subset Selection:
o Feature (subset) selection tries to find the optimal subset of the entire
feature set that significantly reduces computational cost without any
major impact on learning accuracy.
o Used for both supervised and unsupervised learning.
o Only features that are irrelevant or redundant are selected for
elimination.
o Irrelevant feature: a feature that plays an insignificant role (or contributes
almost no information) in classifying or grouping together a set of data
instances. All irrelevant features are eliminated while selecting the final
feature subset.
o Redundant feature: a feature is potentially redundant when the information
it contributes is more or less the same as that of one or more other
features. From a group of potentially redundant features, a small number
can be selected without any negative impact on the learned model's
accuracy.
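A minimal scikit-learn sketch of feature subset selection (the synthetic data and k = 5 are assumptions for illustration):

```python
# Feature subset selection: keep the k highest-scoring features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)

selector = SelectKBest(score_func=f_classif, k=5)  # keep the 5 best features
X_selected = selector.fit_transform(X, y)

print(X_selected.shape)                     # (200, 5)
print(selector.get_support(indices=True))   # indices of the retained features
```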
Chapter-3: Modelling and Evaluation
1. Elaborate the cross validation in training a model. Or explain the process of K-fold-cross-
validation method.
2. Distinguish lazy vs eager learner with an example.
• Error rate: the proportion of incorrectly classified instances, i.e.
(FP + FN) / (TP + FP + FN + TN).
• Kappa value: Kappa is a measure of how well the model's predictions agree with
the actual classifications beyond what would be expected by chance.
• Sensitivity (recall): the proportion of actual positives correctly identified,
TP / (TP + FN).
• Precision: the proportion of predicted positives that are actually positive,
TP / (TP + FP).
• F-measure: the harmonic mean of precision and recall,
2 × (Precision × Recall) / (Precision + Recall).
8. What is model accuracy in reference to classification? Also Explain the performance
parameters Precision, Recall and F-measure with its formula and example.
9. List the methods for Model evaluation. Explain each. How we can improve the
performance of model.
• The types of model evaluation are:
i. Selecting a model:
▪ Model selection is the task of selecting a statistical model from a set of
candidate models, given data.
▪ The process of assigning a model and fitting a specific model to a data
set is called model training.
▪ All the models have some predictive error given the statistical noise in
the data, the incompleteness of the data sample and the limitations of
each different model type.
▪ The best approach to model selection requires “sufficient” data which
may be nearly infinite depending on the complexity of the problem.
ii. Predictive model:
▪ It is also called predictive analytics. It is a mathematical process that
seeks to predict future events or outcomes by analysing patterns that are
likely to forecast future results.
▪ It has a clear focus on what to learn and how to learn it.
▪ It involves the supervised learning functions used for the prediction of
the target value.
▪ The methods that fall under this mining category are classification,
time series analysis and regression.
▪ It may also be used to predict numerical values of the target feature
based on the predictor features. Popular regression models are linear
regression and logistic regression.
iii. Descriptive model:
▪ It is used for tasks that would benefit from the insight gained from
summarizing data in new and interesting ways.
▪ The process of training a descriptive model is called unsupervised
learning.
▪ It is the conventional form of Business Intelligence and data analysis; it
seeks to provide a depiction or “summary view” of facts and figures in an
understandable format.
▪ It helps organizations to understand what happened in the past and to
understand the relationship between product and customer.
▪ It also helps to describe and present data in such format which can be
easily understood by a wide variety of business readers.
▪ The descriptive modelling task called pattern discovery is used to
identify useful associations within data.
▪ Pattern discovery is often used for market basket analysis on retailers'
transactional purchase data.
10.Consider the following confusion matrix of the win/loss prediction of cricket match.
Calculate model accuracy and error rate, sensitivity, precision, F-measure and kappa value
for the same.
                   Actual Win   Actual Loss
Predicted Win          82            7
Predicted Loss          3            8
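A worked sketch of the calculation, treating ‘Win’ as the positive class (so TP = 82, FP = 7, FN = 3, TN = 8):

```python
# Worked computation for the confusion matrix above (Win = positive class).
TP, FP, FN, TN = 82, 7, 3, 8
total = TP + FP + FN + TN                       # 100

accuracy    = (TP + TN) / total                 # 0.90
error_rate  = 1 - accuracy                      # 0.10
sensitivity = TP / (TP + FN)                    # 82/85 ≈ 0.9647 (recall)
precision   = TP / (TP + FP)                    # 82/89 ≈ 0.9213
f_measure   = 2 * precision * sensitivity / (precision + sensitivity)  # ≈ 0.9425

# Kappa: observed agreement vs. agreement expected by chance.
p_o = accuracy                                  # 0.90
p_e = ((TP + FP) * (TP + FN) + (FN + TN) * (FP + TN)) / total**2  # 0.773
kappa = (p_o - p_e) / (1 - p_e)                 # ≈ 0.5595
```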
Chapter-4: Basics of Feature Engineering
6. Explain with an example, the main underlying concept of feature extraction. What are
the most popular algorithms of feature extraction? Briefly explain any one.
• Feature extraction:
o In feature extraction, new features are created from a combination of
original features. Some of the commonly used operators for combining
the original features include
▪ For Boolean features: Conjunctions, Disjunctions, Negation, etc.
▪ For nominal features: Cartesian product, M of N, etc.
▪ For numerical features: Min, Max, Addition, Subtraction,
Multiplication, Division, Average, Equivalence, Inequality, etc.
• Let’s discuss the most popular feature extraction algorithms used in machine
learning:
o Principal Component Analysis (PCA): explain in brief from ans 3
o Singular Value Decomposition (SVD): explain in brief from ans 2
o Linear Discriminant Analysis (LDA):
▪ Linear discriminant analysis (LDA) is another commonly used
feature extraction technique, like PCA or SVD. The objective of
LDA is similar in the sense that it intends to transform a data set
into a lower-dimensional feature space. However, unlike PCA, the
focus of LDA is not to capture the data set's variability. Instead,
LDA focuses on class separability, i.e. separating the features
based on class separability, so as to avoid overfitting of the
machine learning model.
▪ Unlike PCA, which calculates eigenvalues of the covariance matrix of
the data set, LDA calculates eigenvalues and eigenvectors using the
intra-class and inter-class scatter matrices. Below are the steps to be
followed:
a. Calculate the mean vectors for the individual classes.
b. Calculate the intra-class and inter-class scatter matrices.
c. Calculate the eigenvalues and eigenvectors of S_W^(−1) S_B,
where S_W is the intra-class scatter matrix and S_B is the
inter-class scatter matrix.
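A minimal NumPy sketch of these steps (the function name and the single-component limit are assumptions for illustration; it presumes S_W is invertible):

```python
# LDA directions from intra-class (S_W) and inter-class (S_B) scatter matrices.
import numpy as np

def lda_directions(X, y, n_components=1):
    classes = np.unique(y)
    mean_overall = X.mean(axis=0)
    n_features = X.shape[1]
    S_W = np.zeros((n_features, n_features))   # intra-class scatter
    S_B = np.zeros((n_features, n_features))   # inter-class scatter
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)                    # step (a): class means
        S_W += (Xc - mean_c).T @ (Xc - mean_c)      # step (b): intra-class
        d = (mean_c - mean_overall).reshape(-1, 1)
        S_B += len(Xc) * (d @ d.T)                  # step (b): inter-class
    # Step (c): eigenvectors of S_W^-1 S_B, sorted by eigenvalue.
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs[:, order[:n_components]].real
```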
where, p(A, B) is the joint probability of A and B and can also be denoted as p(A ∩ B)
• Similarly,
4. If 3% of the electronic units manufactured by a company are defective, find the
probability that in a sample of 200 units fewer than 2 are defective.
5. In a communication system each data packet consists of 1000 bits. Due to the noise,
each bit may be received in error with probability 0.1. It is assumed bit errors occur
independently. Find the probability that there are more than 120 errors in a certain data
packet.
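A sketch of the standard textbook approximations for these two problems, using SciPy (a Poisson approximation for Q4 and a normal approximation for Q5; the library choice is an assumption):

```python
# Q4: n = 200, p = 0.03  ->  Poisson approximation with lambda = n*p = 6.
# Q5: n = 1000, p = 0.1  ->  normal approximation with mu = 100, sigma = sqrt(90).
import math
from scipy.stats import poisson, norm

# Q4: P(X < 2) = P(X <= 1) under Poisson(6) = e^-6 * (1 + 6) ≈ 0.0174.
lam = 200 * 0.03
p_less_than_2 = poisson.cdf(1, lam)

# Q5: P(X > 120) where X ~ Binomial(1000, 0.1), via the normal approximation.
mu, sigma = 1000 * 0.1, math.sqrt(1000 * 0.1 * 0.9)
p_more_than_120 = norm.sf((120 - mu) / sigma)       # ≈ 0.0175

print(p_less_than_2, p_more_than_120)
```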
• We can approximate a definite integral of a function f over an interval by
averaging samples of f at uniform random points within the interval.
• Then continue as in ans 5.
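A minimal NumPy sketch of this Monte Carlo estimate (the integrand and the interval are assumptions for illustration):

```python
# Monte Carlo integration: (b - a) * average of f at uniform random points.
import numpy as np

def mc_integrate(f, a, b, n_samples=100_000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(a, b, size=n_samples)   # uniform random points in [a, b]
    return (b - a) * f(x).mean()            # scaled sample average

# Example: integral of x^2 over [0, 1] is 1/3.
print(mc_integrate(lambda x: x**2, 0.0, 1.0))   # ≈ 0.333
```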
Chapter-7: Supervised Learning
1. What are the strengths and weaknesses of SVM? Or What are the factors
determining the effectiveness of SVM?
• Strengths of SVM:
o SVM can be used for both classification and regression.
o It is robust, i.e. not much impacted by data with noise or outliers.
o The prediction results using this model are very promising.
• Weaknesses of SVM:
o SVM is applicable only for binary classification, i.e., when there
are only two classes in the problem domain.
o The SVM model is very complex – almost like a black box when it
deals with a high-dimensional data set. Hence, it is very difficult
and close to impossible to understand the model in such cases.
o It is slow for a large dataset, i.e., a data set with either a large
number of features or a large number of instances.
o It is quite memory-intensive.
• Application of SVM:
o SVM is most effective when it is used for binary classification, i.e.
for solving a machine learning problem with two classes. One
common problem on which SVM can be applied is in the field of
bioinformatics – more specifically, in detecting cancer and other
genetic disorders. It can also be used in detecting the image of a
face by binary classification of images into face and non-face
components. More such applications can be described.
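A minimal scikit-learn sketch of SVM binary classification (the synthetic data, train/test split and RBF kernel are assumptions for illustration):

```python
# Binary classification with an SVM (assumes scikit-learn; data is synthetic).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", C=1.0)          # RBF kernel; C controls margin softness
clf.fit(X_train, y_train)

print(clf.score(X_test, y_test))        # accuracy on held-out data
```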
3. Explain decision tree approach with suitable example. Or Explain Decision tree
algorithm.
4. Explain KNN algorithm with suitable example. Or write a note on KNN.
• K-Nearest Neighbour is one of the simplest Machine Learning algorithms
based on Supervised Learning technique.
• K-NN algorithm assumes similarity between the new case/data and the
available cases, and puts the new case into the category that is most similar
to the available categories.
• K-NN algorithm stores all the available data and classifies a new data
point based on similarity. This means that when new data appears, it can
easily be classified into a well-suited category using the K-NN algorithm.
• K-NN algorithm can be used for regression as well as classification,
but it is mostly used for classification problems.
• It is also called a lazy learner algorithm because it does not learn from the
training set immediately; instead, it stores the dataset and performs an
action on it at the time of classification.
• Step to perform:
o Step-1: Select the number K of the neighbours
o Step-2: Calculate the Euclidean distance of K number of
neighbours
o Step-3: Take the K nearest neighbours as per the calculated
Euclidean distance.
o Step-4: Among these k neighbours, count the number of the data
points in each category.
o Step-5: Assign the new data points to that category for which the
number of the neighbour is maximum.
o Step-6: Our model is ready.
• We have a new entry but it doesn't have a class yet. To know its class, we
have to calculate the distance from the new entry to other entries in the
data set using the Euclidean distance formula.
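A minimal NumPy sketch of these steps (the toy data, k = 3, and the function name are assumptions for illustration):

```python
# k-NN classification via Euclidean distance and majority vote.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Steps 2-3: Euclidean distances to every stored point; take the k nearest.
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    # Steps 4-5: majority vote among the k neighbours.
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [6.0, 5.0], [7.0, 7.0]])
y_train = np.array([0, 0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([2.5, 2.5])))   # -> 0
```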
5. Discuss the error rate and validation error in the kNN algorithm.
• Error rate in kNN:
o The error rate in the kNN algorithm refers to the proportion of
incorrectly classified instances in the dataset. When using kNN for
classification, the algorithm assigns a class label to a new data
point based on the majority class among its k nearest neighbors.
The error rate is calculated by comparing the predicted class labels
with the actual labels for the test dataset.
o Error Rate = (Number of Incorrect Predictions) / (Total Number of
Predictions)
o A lower error rate indicates better accuracy, suggesting that the
model is effectively classifying instances.
• Validation Error in kNN:
o Validation error, often estimated using techniques like cross-
validation, refers to the error rate on an independent dataset or a
subset of the original dataset that was not used during the training
phase. It assesses the generalization performance of the model and
helps to estimate how well the kNN algorithm will perform on
unseen data.
o Validation error is crucial to avoid overfitting, which occurs when
the model learns the training data too well but fails to generalize to
new, unseen data. Cross-validation techniques, such as k-fold
cross-validation, help in estimating the validation error by
partitioning the dataset into multiple subsets (folds). The model is
trained on a portion of the data and tested on the remaining unseen
portions, and this process is repeated multiple times to calculate an
average validation error.
o By evaluating the validation error, practitioners can select
appropriate hyperparameters for the kNN algorithm, such as the
value of k (number of nearest neighbours), and determine the
model's ability to generalize to new data.
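A minimal scikit-learn sketch of estimating validation error with k-fold cross-validation and using it to choose k for kNN (the synthetic data and the grid of k values are assumptions for illustration):

```python
# Choosing k for k-NN by 5-fold cross-validation (assumes scikit-learn).
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

for k in (1, 3, 5, 7, 9):
    model = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(model, X, y, cv=5)   # 5-fold CV accuracy
    print(k, 1 - scores.mean())                   # average validation error
```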
6. What is supervised learning? Draw and explain classification steps in detail.
7. Define linear regression. Also explain Sum of squares with its formula.
8. Explain sum of squares due to error in multiple linear regression with example.
1. How does the apriori principle help in reducing the calculation overhead for a market
basket analysis? Explain with an example.
• The Apriori principle is a crucial concept in association rule mining, specifically
for market basket analysis. It helps in reducing the computational overhead by
focusing on frequent itemsets rather than examining all possible item
combinations. This principle relies on the observation that if an itemset is
infrequent, then all of its supersets (larger itemsets containing it) are also
infrequent.
• Let's illustrate this with an example:
o Suppose you have a dataset representing transactions in a grocery store:
o Transaction 1: Bread, Milk, Diapers
o Transaction 2: Bread, Beer, Eggs
o Transaction 3: Milk, Beer, Diapers, Cornflakes
o Transaction 4: Bread, Milk, Beer, Diapers
o Transaction 5: Bread, Milk, Diapers, Cornflakes
• Let's apply the Apriori algorithm with a minimum support of 2:
o Generate single items and calculate support:
▪ Bread: 4, Milk: 4, Diapers: 4, Beer: 3, Eggs: 1, Cornflakes: 2
▪ Frequent items (support >= 2): Bread, Milk, Diapers, Beer,
Cornflakes
o Generate pairs among the frequent items and calculate support:
▪ {Bread, Milk}: 3, {Bread, Diapers}: 3, {Bread, Beer}: 2, {Bread,
Cornflakes}: 1, {Milk, Diapers}: 4, {Milk, Beer}: 2, {Milk,
Cornflakes}: 2, {Diapers, Beer}: 2, {Diapers, Cornflakes}: 2,
{Beer, Cornflakes}: 1
▪ Frequent itemsets (support >= 2): {Bread, Milk}, {Bread, Diapers},
{Bread, Beer}, {Milk, Diapers}, {Milk, Beer}, {Milk, Cornflakes},
{Diapers, Beer}, {Diapers, Cornflakes}
o Generate triples from the frequent pairs: {Bread, Milk, Diapers}: 3,
{Milk, Diapers, Beer}: 2 and {Milk, Diapers, Cornflakes}: 2 are frequent,
while any candidate containing an infrequent itemset (e.g. anything with
Eggs, or the pair {Bread, Cornflakes}) is pruned without ever being
counted.
• In this example, the Apriori principle helps by reducing the number of itemsets to
consider for association rules. Instead of checking all possible combinations, it
eliminates infrequent itemsets early in the process, thus reducing computational
overhead significantly.
• This process continues until no new frequent itemsets can be found or until
reaching a specified itemset size or support threshold. The remaining frequent
itemsets are then used to derive association rules that reveal relationships between
items frequently purchased together.
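A small pure-Python sketch of the level-wise frequent-itemset generation above (the variable names are assumptions; min support is the absolute count of 2 used in the example):

```python
# Level-wise frequent-itemset mining with Apriori pruning (pure Python).
transactions = [
    {"Bread", "Milk", "Diapers"},
    {"Bread", "Beer", "Eggs"},
    {"Milk", "Beer", "Diapers", "Cornflakes"},
    {"Bread", "Milk", "Beer", "Diapers"},
    {"Bread", "Milk", "Diapers", "Cornflakes"},
]
MIN_SUPPORT = 2  # absolute count, as in the example above

def support(itemset):
    return sum(itemset <= t for t in transactions)  # transactions containing it

# Level 1: frequent single items (infrequent ones, e.g. Eggs, are dropped here,
# so no larger candidate containing them is ever generated or counted).
items = {i for t in transactions for i in t}
frequent = [frozenset([i]) for i in items if support(frozenset([i])) >= MIN_SUPPORT]

k = 2
while frequent:
    print(k - 1, sorted(tuple(sorted(s)) for s in frequent))
    # Candidates of size k are built only from frequent smaller itemsets.
    candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
    frequent = [c for c in candidates if support(c) >= MIN_SUPPORT]
    k += 1
```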
3. Explain how the Market Basket Analysis uses the concepts of association analysis.
• Market Basket Analysis is a technique which identifies the strength of association
between pairs of products purchased together and identifies patterns of
co-occurrence. Co-occurrence is when two or more things take place together.
• It takes data at the transaction level, which lists all items bought by a customer in
a single purchase.
• The technique determines relationships between the products purchased together;
these relationships are used to build profiles containing “if-then” rules of the
items purchased.
• The rules can be written as: if {A} then {B}.
• Association rules are “if-then” statements that help to show the probability of
relationships between data items within data sets of various types.
• Association rules are used to find correlations and co-occurrences between
data sets.
• They are ideally used to explain patterns in data from seemingly independent
information repositories, such as relational databases and transactional databases.
• The act of using association rules is sometimes referred to as “association rule
mining” or “mining associations”.
• Application of association rules:
o Market Basket Analysis
o Medical diagnosis
o Census Data
o Logistic regression
o Fraud detection on the Web
4. Explain the Apriori algorithm for association rule learning with an example.
• Apriori is a seminal algorithm proposed by R. Agrawal and R. Srikant in 1994 for
mining frequent itemsets for Boolean association rules [AS94b]. The name of the
algorithm is based on the fact that it uses prior knowledge of frequent itemset
properties.
• Support:
o The rule A→B holds in the transaction set D with support s, where s is the
percentage of transactions in D that contain A ∪ B (i.e., the union of sets
A and B, or both A and B).
o support(A→B) = P(A ∪ B) = (number of transactions containing A ∪ B) /
(total number of transactions in D)
• Confidence:
o The rule A→B has confidence c in the transaction set D, where c is the
percentage of transactions in D containing A that also contain B.
o confidence(A→B) = P(B | A) = support(A ∪ B) / support(A)
• Example: using the grocery transactions from the market basket example above,
for the rule {Milk} → {Diapers}: support = 4/5 = 80% (Milk and Diapers
appear together in 4 of the 5 transactions) and confidence = 4/4 = 100%
(every transaction containing Milk also contains Diapers).
• The K-means clustering algorithm proceeds as follows:
o Step 1: Choose the number of clusters k
▪ The first step in k-means is to pick the number of clusters, k.
o Step 2: Select k random points as centroids
▪ Next, we randomly select the centroid for each cluster. Let's say we
want to have 2 clusters, so k is equal to 2 here. We then randomly
select the centroids.
o Step 3: Assign all the points to the closest cluster centroid
o Step 4: Recompute the centroids of newly formed clusters
▪ Now, once we have assigned all of the points to either cluster, the
next step is to compute the centroids of the newly formed clusters.
o Step 5: Repeat steps 3 and 4 until the centroids stop changing
• Stopping Criteria for K-Means Clustering
o There are essentially three stopping criteria that can be adopted to stop the
K-means algorithm:
▪ Centroids of newly formed clusters do not change
▪ Points remain in the same cluster
▪ Maximum number of iterations is reached
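A minimal NumPy sketch of these steps (the toy data, k = 2, seed, and iteration cap are assumptions for illustration):

```python
# k-means: assign points to the nearest centroid, recompute centroids, repeat.
import numpy as np

def kmeans(X, k=2, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # Step 2
    for _ in range(max_iter):                                  # Steps 3-5
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)                          # nearest centroid
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):              # stopping criterion
            break
        centroids = new_centroids
    return labels, centroids

X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
              [8.0, 8.0], [9.0, 8.5], [8.5, 9.0]])
labels, centroids = kmeans(X, k=2)
print(labels)        # the two obvious clusters
```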
7. Describe the concept of single link and complete link in the context of hierarchical
clustering.
8. Describe the main difference in the approach of k-means and k-medoids algorithms with
a neat diagram.
Chapter-9: Neural Network
1. Show the Step, ReLU and sigmoid activation functions with its equations and sketch
2. Briefly explain Perceptron and Mention its limitation.
• Perceptron is a Machine Learning algorithm for supervised learning of various binary
classification tasks. Further, Perceptron is also understood as an Artificial Neuron or
neural network unit that helps to detect certain input data computations in business
intelligence.
• Perceptron model is also treated as one of the best and simplest types of Artificial
Neural networks. However, it is a supervised learning algorithm of binary classifiers.
Hence, we can consider it as a single-layer neural network with four main
parameters, i.e., input values, weights and Bias, net sum, and an activation function.
• Input nodes or input layer: This is the primary component of Perceptron which
accepts the initial data into the system for further processing. Each input node
contains a real numerical value.
• Weight and bias: Weight parameter represents the strength of the connection between
units. This is another most important parameter of Perceptron components. Weight is
directly proportional to the strength of the associated input neuron in deciding the
output. Further, Bias can be considered as the line of intercept in a linear equation.
• Activation function: These are the final and important components that help to
determine whether the neuron will fire or not. Activation Function can be considered
primarily as a step function.
• Limitation: a perceptron can only classify linearly separable patterns; a single-layer
perceptron cannot learn a non-linearly separable function such as XOR.
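A minimal NumPy sketch of a perceptron with the four components named above (the AND-gate data, learning rate and epoch count are assumptions for illustration):

```python
# Single perceptron: weighted sum of inputs + bias, passed through a step activation.
import numpy as np

def step(z):
    return np.where(z >= 0, 1, 0)         # activation: fire (1) or not (0)

def train_perceptron(X, y, lr=0.1, epochs=20):
    w = np.zeros(X.shape[1])              # weights
    b = 0.0                               # bias
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            y_pred = step(w @ xi + b)     # net sum -> activation
            w += lr * (yi - y_pred) * xi  # perceptron learning rule
            b += lr * (yi - y_pred)
    return w, b

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])                # AND gate (linearly separable)
w, b = train_perceptron(X, y)
print(step(X @ w + b))                    # [0 0 0 1]
```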
3. What is the difference between Machine Learning and Deep Learning?
• The objective of the perceptron is to classify a set of inputs into two classes, c1 and
c2.
• This can be done using a very simple decision rule – assign the inputs to c1 if the
output of the perceptron, i.e. yout, is +1 and to c2 if yout is −1.
• So for an n-dimensional signal space, i.e. a space for ‘n’ input signals, the simplest
form of perceptron will have two decision regions, resembling the two classes,
separated by a linear decision boundary (a hyperplane).
• ReLU (rectified linear unit) function:
o ReLU is the most popularly used activation function in the areas of
convolutional neural networks and deep learning. It is of the form
f(x) = max(0, x).
o This means that f(x) is zero when x is less than zero and f(x) is equal to x
when x is above or equal to zero. Figure below depicts the curve for a ReLU
activation function.
• Sigmoid function:
o A binary sigmoid function is of the form f(x) = 1 / (1 + e^(−kx)), where k is
the steepness parameter. The slope at the origin is k/4. As the value of k
becomes very large, the sigmoid function becomes a threshold function.
o Bipolar sigmoid function: A bipolar sigmoid function is of the form
f(x) = (1 − e^(−kx)) / (1 + e^(−kx)).
• Note that the activation functions discussed above (other than the bipolar sigmoid)
have values ranging between 0 and 1. However, in some cases, it is desirable to have
values ranging from −1 to +1. In that case, there is a need to reframe the activation
function.
• For example, in the case of the step function, the revised (bipolar) definition would
be: f(x) = +1 if x ≥ 0, and f(x) = −1 if x < 0.
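A minimal NumPy sketch of the activation functions discussed above (the choice of k = 1 and the sample inputs are assumptions for illustration):

```python
# Common activation functions from this chapter.
import numpy as np

def step(x):                       # binary step: 1 if x >= 0 else 0
    return np.where(x >= 0, 1, 0)

def relu(x):                       # f(x) = max(0, x)
    return np.maximum(0, x)

def sigmoid(x, k=1.0):             # f(x) = 1 / (1 + e^(-kx)), range (0, 1)
    return 1.0 / (1.0 + np.exp(-k * x))

def bipolar_sigmoid(x, k=1.0):     # range (-1, +1)
    return (1.0 - np.exp(-k * x)) / (1.0 + np.exp(-k * x))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (step, relu, sigmoid, bipolar_sigmoid):
    print(f.__name__, np.round(f(x), 3))
```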
10.Explain in detail, the backpropagation algorithm. What are the limitations of this
algorithm?
• Definition same as ans 4
• Limitations:
o Backpropagation can be sensitive to noisy data and irregularities.
o Its performance is highly reliant on the input data.
o It needs excessive time for training.
o It needs a matrix-based approach rather than a mini-batch approach.
• Advantages:
o It is simple, fast and easy to program.
o It has no parameters to tune apart from the number of inputs.
o No prior knowledge about the network is needed.
o It is flexible.
o It is a standard approach and works efficiently.
o It does not require the user to learn special functions.
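A minimal NumPy sketch of backpropagation for a small 2-4-1 network trained on XOR (the architecture, learning rate, seed and epoch count are assumptions for illustration; convergence may vary with initialization):

```python
# Backpropagation on a tiny 2-4-1 network (sigmoid activations, squared error).
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)       # XOR targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)         # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)         # hidden -> output
lr = 0.5

for _ in range(10_000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: propagate the error gradient layer by layer.
    d_out = (out - y) * out * (1 - out)               # output-layer delta
    d_h = (d_out @ W2.T) * h * (1 - h)                # hidden-layer delta
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(out.round(3).ravel())   # close to [0, 1, 1, 0] once trained
```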