▪ Ordinal data:
▪ Values are also named, but they can be arranged in a sequence of
increasing or decreasing order, so it is possible to identify which
value is better or greater than another.
▪ Example:
Customer satisfaction: ‘very happy’, ‘happy’,
‘unhappy’, etc.
Grades: A, B, C, etc.
Hardness of metal: ‘very hard’, ‘hard’, ‘soft’, etc.
▪ Operations:
o Can perform: counting, mode, median, quartiles
o Cannot perform: mean
o Quantitative/Numerical:
Represents information about the quantity of an object. It can be measured
on a scale of measurement.
▪ Interval:
▪ Numeric data for which not only the order but also the exact
difference between values is known.
▪ Example:
o Celsius temperature: the difference between 12°C and 18°C
is a measurable 6°C, just as between 15.5°C and 21.5°C.
o Date, time, etc.
▪ Operations:
o Can perform: addition, subtraction, mean, median,
mode, standard deviation.
o Cannot perform: ratio (there is no ‘true zero’ value such as ‘no
temperature’; values can be both positive and negative)
▪ Ratio:
▪ Numeric data for which the order is known, the exact difference
between values is known, and an absolute zero value exists.
(Only positive values, no negative values.)
▪ Example: Marks, salary, weight, age, height, etc.
▪ Operations
o Can perform: addition, subtraction, mean, median,
mode, standard deviation, ratio.
6. What are the Techniques Provided in Data Preprocessing? Explain in brief.
• Data Preprocessing:
o Dimensionality reduction
o Feature subset selection
• Dimensionality reduction:
o Dimensionality reduction is the transformation of data from a high-
dimensional space into a low-dimensional space so that the low-
dimensional representation retains some meaningful properties of the
original data.
o Biology and social-media analysis projects produce high-dimensional
data sets; data sets with 20,000 or more features are common.
o Needs:
▪ High-dimensional data sets need a high amount of
computational space and time.
▪ Not all features are useful; some even degrade the performance
of the algorithm.
▪ Most ML algorithms perform better if the dimensionality of the
data set, i.e. the number of features, is reduced.
▪ Helps in reducing irrelevance and redundancy in features.
▪ A model is easier to understand when fewer features are
involved in the learning activity.
o Methods:
▪ PCA: Principal Component Analysis
▪ SVD: Singular Value Decomposition
▪ LDA: Linear Discriminant Analysis
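A minimal sketch of one of these methods, PCA, using scikit-learn (the synthetic data and the choice of two components are assumptions for illustration):

```python
# Minimal PCA sketch (assumes scikit-learn is available; data is synthetic).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))           # 100 instances, 20 features

pca = PCA(n_components=2)                # keep only 2 principal components
X_reduced = pca.fit_transform(X)         # project onto the top-2 components

print(X_reduced.shape)                   # (100, 2)
print(pca.explained_variance_ratio_)     # variance retained by each component
```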
• Feature Subset Selection:
o Feature (subset) selection tries to find the optimal subset of the entire
feature set that significantly reduces computational cost without any
major impact on learning accuracy.
o Used for both supervised and unsupervised learning.
o Only features that are irrelevant or redundant are selected for
elimination.
o Irrelevant feature: a feature that plays an insignificant role (or contributes
almost no information) in classifying or grouping together a set of data
instances. All irrelevant features are eliminated while selecting the final
feature subset.
o Redundant feature: a feature is potentially redundant when the information
it contributes is more or less the same as that of one or more other
features. From a group of potentially redundant features, a small number
can be selected without any negative impact on the learned model's
accuracy.
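A minimal scikit-learn sketch of feature subset selection (the synthetic data and k = 5 are assumptions for illustration):

```python
# Feature subset selection: keep the k highest-scoring features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)

selector = SelectKBest(score_func=f_classif, k=5)  # keep the 5 best features
X_selected = selector.fit_transform(X, y)

print(X_selected.shape)                     # (200, 5)
print(selector.get_support(indices=True))   # indices of the retained features
```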
Chapter-3: Modelling and Evaluation
1. Elaborate the cross validation in training a model. Or explain the process of K-fold-cross-
validation method.
2. Distinguish lazy vs eager learner with an example.
• Error rate: the proportion of incorrectly classified instances, i.e.
(FP + FN) / (TP + FP + FN + TN).
• Kappa value: Kappa is a measure of how well the model's predictions agree with
the actual classifications beyond what would be expected by chance.
• Sensitivity (recall): the proportion of actual positives correctly identified,
TP / (TP + FN).
• Precision: the proportion of predicted positives that are actually positive,
TP / (TP + FP).
• F-measure: the harmonic mean of precision and recall,
2 × (Precision × Recall) / (Precision + Recall).
8. What is model accuracy in reference to classification? Also Explain the performance
parameters Precision, Recall and F-measure with its formula and example.
9. List the methods for Model evaluation. Explain each. How we can improve the
performance of model.
• The types of model evaluation are:
i. Selecting a model:
▪ Model selection is the task of selecting a statistical model from a set of
candidate models, given data.
▪ The process of assigning a model and fitting a specific model to a data
set is called model training.
▪ All the models have some predictive error given the statistical noise in
the data, the incompleteness of the data sample and the limitations of
each different model type.
▪ The best approach to model selection requires “sufficient” data which
may be nearly infinite depending on the complexity of the problem.
ii. Predictive model:
▪ It is also called predictive analytics. It is a mathematical process that
seeks to predict future events or outcomes by analysing patterns that are
likely to forecast future results.
▪ It has a clear focus on what to learn and how to learn it.
▪ It involves the supervised learning functions used for the prediction of
the target value.
▪ The methods that fall under this mining category are classification,
time series analysis and regression.
▪ It may also be used to predict numerical values of the target feature
based on the predictor features. Popular regression models are linear
regression and logistic regression.
iii. Descriptive model:
▪ It is used for tasks that would benefit from the insight gained from
summarizing data in new and interesting ways.
▪ The process of training a descriptive model is called unsupervised
learning.
▪ It is the conventional form of Business Intelligence and data analysis; it
seeks to provide a depiction or “summary view” of facts and figures in an
understandable format.
▪ It helps organizations to understand what happened in the past and to
understand the relationship between product and customer.
▪ It also helps to describe and present data in such format which can be
easily understood by a wide variety of business readers.
▪ The descriptive modelling task called pattern discovery is used to
identify useful associations within data.
▪ Pattern discovery is often used for market basket analysis on retailers'
transactional purchase data.
10.Consider the following confusion matrix of the win/loss prediction of cricket match.
Calculate model accuracy and error rate, sensitivity, precision, F-measure and kappa value
for the same.
                   Actual Win   Actual Loss
Predicted Win          82            7
Predicted Loss          3            8
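A worked sketch of the calculation, treating ‘Win’ as the positive class (so TP = 82, FP = 7, FN = 3, TN = 8):

```python
# Worked computation for the confusion matrix above (Win = positive class).
TP, FP, FN, TN = 82, 7, 3, 8
total = TP + FP + FN + TN                       # 100

accuracy    = (TP + TN) / total                 # 0.90
error_rate  = 1 - accuracy                      # 0.10
sensitivity = TP / (TP + FN)                    # 82/85 ≈ 0.9647 (recall)
precision   = TP / (TP + FP)                    # 82/89 ≈ 0.9213
f_measure   = 2 * precision * sensitivity / (precision + sensitivity)  # ≈ 0.9425

# Kappa: observed agreement vs. agreement expected by chance.
p_o = accuracy                                  # 0.90
p_e = ((TP + FP) * (TP + FN) + (FN + TN) * (FP + TN)) / total**2  # 0.773
kappa = (p_o - p_e) / (1 - p_e)                 # ≈ 0.5595
```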
Chapter-4: Basics of Feature Engineering
6. Explain with an example, the main underlying concept of feature extraction. What are
the most popular algorithms of feature extraction? Briefly explain any one.
• Feature extraction:
o In feature extraction, new features are created from a combination of
original features. Some of the commonly used operators for combining
the original features include
▪ For Boolean features: Conjunctions, Disjunctions, Negation, etc.
▪ For nominal features: Cartesian product, M of N, etc.
▪ For numerical features: Min, Max, Addition, Subtraction,
Multiplication, Division, Average, Equivalence, Inequality, etc.
• Let’s discuss the most popular feature extraction algorithms used in machine
learning:
o Principal Component Analysis (PCA): explain in brief from ans 3
o Singular Value Decomposition (SVD): explain in brief from ans 2
o Linear Discriminant Analysis (LDA):
▪ Linear discriminant analysis (LDA) is another commonly used
feature extraction technique, like PCA or SVD. The objective of
LDA is similar in the sense that it intends to transform a data set
into a lower-dimensional feature space. However, unlike PCA, the
focus of LDA is not to capture the data set's variability. Instead,
LDA focuses on class separability, i.e. separating the features
based on class separability, so as to avoid overfitting of the
machine learning model.
▪ Unlike PCA, which calculates eigenvalues of the covariance matrix of
the data set, LDA calculates eigenvalues and eigenvectors using the
intra-class and inter-class scatter matrices. Below are the steps to be
followed:
a. Calculate the mean vectors for the individual classes.
b. Calculate the intra-class and inter-class scatter matrices.
c. Calculate the eigenvalues and eigenvectors of S_W^(−1) S_B,
where S_W is the intra-class scatter matrix and S_B is the
inter-class scatter matrix.
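A minimal NumPy sketch of these steps (the function name and the single-component limit are assumptions for illustration; it presumes S_W is invertible):

```python
# LDA directions from intra-class (S_W) and inter-class (S_B) scatter matrices.
import numpy as np

def lda_directions(X, y, n_components=1):
    classes = np.unique(y)
    mean_overall = X.mean(axis=0)
    n_features = X.shape[1]
    S_W = np.zeros((n_features, n_features))   # intra-class scatter
    S_B = np.zeros((n_features, n_features))   # inter-class scatter
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)                    # step (a): class means
        S_W += (Xc - mean_c).T @ (Xc - mean_c)      # step (b): intra-class
        d = (mean_c - mean_overall).reshape(-1, 1)
        S_B += len(Xc) * (d @ d.T)                  # step (b): inter-class
    # Step (c): eigenvectors of S_W^-1 S_B, sorted by eigenvalue.
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs[:, order[:n_components]].real
```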
where, p(A, B) is the joint probability of A and B and can also be denoted as p(A ∩ B)
• Similarly,
4. If 3% of the electronic units manufactured by a company are defective, find the
probability that in a sample of 200 units fewer than 2 are defective.
5. In a communication system each data packet consists of 1000 bits. Due to the noise,
each bit may be received in error with probability 0.1. It is assumed bit errors occur
independently. Find the probability that there are more than 120 errors in a certain data
packet.
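A sketch of the standard textbook approximations for these two problems, using SciPy (a Poisson approximation for Q4 and a normal approximation for Q5; the library choice is an assumption):

```python
# Q4: n = 200, p = 0.03  ->  Poisson approximation with lambda = n*p = 6.
# Q5: n = 1000, p = 0.1  ->  normal approximation with mu = 100, sigma = sqrt(90).
import math
from scipy.stats import poisson, norm

# Q4: P(X < 2) = P(X <= 1) under Poisson(6) = e^-6 * (1 + 6) ≈ 0.0174.
lam = 200 * 0.03
p_less_than_2 = poisson.cdf(1, lam)

# Q5: P(X > 120) where X ~ Binomial(1000, 0.1), via the normal approximation.
mu, sigma = 1000 * 0.1, math.sqrt(1000 * 0.1 * 0.9)
p_more_than_120 = norm.sf((120 - mu) / sigma)       # ≈ 0.0175

print(p_less_than_2, p_more_than_120)
```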
• We can approximate a definite integral of a function f over an interval by
averaging samples of f at uniform random points within the interval.
• Then continue as in ans 5.
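A minimal NumPy sketch of this Monte Carlo estimate (the integrand and the interval are assumptions for illustration):

```python
# Monte Carlo integration: (b - a) * average of f at uniform random points.
import numpy as np

def mc_integrate(f, a, b, n_samples=100_000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(a, b, size=n_samples)   # uniform random points in [a, b]
    return (b - a) * f(x).mean()            # scaled sample average

# Example: integral of x^2 over [0, 1] is 1/3.
print(mc_integrate(lambda x: x**2, 0.0, 1.0))   # ≈ 0.333
```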
Chapter-7: Supervised Learning
1. What are the strengths and weaknesses of SVM? Or What are the factors
determining the effectiveness of SVM?
• Strengths of SVM:
o SVM can be used for both classification and regression.
o It is robust, i.e. not much impacted by data with noise or outliers.
o The prediction results using this model are very promising.
• Weaknesses of SVM:
o SVM is applicable only for binary classification, i.e., when there
are only two classes in the problem domain.
o The SVM model is very complex – almost like a black box when it
deals with a high-dimensional data set. Hence, it is very difficult
and close to impossible to understand the model in such cases.
o It is slow for a large dataset, i.e., a data set with either a large
number of features or a large number of instances.
o It is quite memory-intensive.
• Application of SVM:
o SVM is most effective when it is used for binary classification, i.e.
for solving a machine learning problem with two classes. One
common problem on which SVM can be applied is in the field of
bioinformatics – more specifically, in detecting cancer and other
genetic disorders. It can also be used in detecting the image of a
face by binary classification of images into face and non-face
components. More such applications can be described.
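A minimal scikit-learn sketch of SVM binary classification (the synthetic data, train/test split and RBF kernel are assumptions for illustration):

```python
# Binary classification with an SVM (assumes scikit-learn; data is synthetic).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", C=1.0)          # RBF kernel; C controls margin softness
clf.fit(X_train, y_train)

print(clf.score(X_test, y_test))        # accuracy on held-out data
```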
3. Explain decision tree approach with suitable example. Or Explain Decision tree
algorithm.
4. Explain KNN algorithm with suitable example. Or write a note on KNN.
• K-Nearest Neighbour is one of the simplest Machine Learning algorithms
based on Supervised Learning technique.
• K-NN algorithm assumes similarity between the new case/data and the
available cases, and puts the new case into the category that is most similar
to the available categories.
• K-NN algorithm stores all the available data and classifies a new data
point based on similarity. This means that when new data appears, it can
easily be classified into a well-suited category using the K-NN algorithm.
• K-NN algorithm can be used for regression as well as classification,
but it is mostly used for classification problems.
• It is also called a lazy learner algorithm because it does not learn from the
training set immediately; instead, it stores the dataset and performs an
action on it at the time of classification.
• Step to perform:
o Step-1: Select the number K of the neighbours
o Step-2: Calculate the Euclidean distance of K number of
neighbours
o Step-3: Take the K nearest neighbours as per the calculated
Euclidean distance.
o Step-4: Among these k neighbours, count the number of the data
points in each category.
o Step-5: Assign the new data points to that category for which the
number of the neighbour is maximum.
o Step-6: Our model is ready.
• We have a new entry but it doesn't have a class yet. To know its class, we
have to calculate the distance from the new entry to other entries in the
data set using the Euclidean distance formula.
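A minimal NumPy sketch of these steps (the toy data, k = 3, and the function name are assumptions for illustration):

```python
# k-NN classification via Euclidean distance and majority vote.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Steps 2-3: Euclidean distances to every stored point; take the k nearest.
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    # Steps 4-5: majority vote among the k neighbours.
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [6.0, 5.0], [7.0, 7.0]])
y_train = np.array([0, 0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([2.5, 2.5])))   # -> 0
```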
5. Discuss the error rate and validation error in the kNN algorithm.
• Error rate in kNN:
o The error rate in the kNN algorithm refers to the proportion of
incorrectly classified instances in the dataset. When using kNN for
classification, the algorithm assigns a class label to a new data
point based on the majority class among its k nearest neighbors.
The error rate is calculated by comparing the predicted class labels
with the actual labels for the test dataset.
o Error Rate = (Number of Incorrect Predictions) / (Total Number of
Predictions)
o A lower error rate indicates better accuracy, suggesting that the
model is effectively classifying instances.
• Validation Error in kNN:
o Validation error, often estimated using techniques like cross-
validation, refers to the error rate on an independent dataset or a
subset of the original dataset that was not used during the training
phase. It assesses the generalization performance of the model and
helps to estimate how well the kNN algorithm will perform on
unseen data.
o Validation error is crucial to avoid overfitting, which occurs when
the model learns the training data too well but fails to generalize to
new, unseen data. Cross-validation techniques, such as k-fold
cross-validation, help in estimating the validation error by
partitioning the dataset into multiple subsets (folds). The model is
trained on a portion of the data and tested on the remaining unseen
portions, and this process is repeated multiple times to calculate an
average validation error.
o By evaluating the validation error, practitioners can select
appropriate hyperparameters for the kNN algorithm, such as the
value of k (number of nearest neighbours), and determine the
model's ability to generalize to new data.
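A minimal scikit-learn sketch of estimating validation error with k-fold cross-validation and using it to choose k for kNN (the synthetic data and the grid of k values are assumptions for illustration):

```python
# Choosing k for k-NN by 5-fold cross-validation (assumes scikit-learn).
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

for k in (1, 3, 5, 7, 9):
    model = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(model, X, y, cv=5)   # 5-fold CV accuracy
    print(k, 1 - scores.mean())                   # average validation error
```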
6. What is supervised learning? Draw and explain classification steps in detail.
7. Define linear regression. Also explain Sum of squares with its formula.
8. Explain sum of squares due to error in multiple linear regression with example.
1. How does the apriori principle help in reducing the calculation overhead for a market
basket analysis? Explain with an example.
• The Apriori principle is a crucial concept in association rule mining, specifically
for market basket analysis. It helps in reducing the computational overhead by
focusing on frequent itemsets rather than examining all possible item
combinations. This principle relies on the observation that if an itemset is
infrequent, then all of its supersets (larger itemsets containing it) are also
infrequent.
• Let's illustrate this with an example:
o Suppose you have a dataset representing transactions in a grocery store:
o Transaction 1: Bread, Milk, Diapers
o Transaction 2: Bread, Beer, Eggs
o Transaction 3: Milk, Beer, Diapers, Cornflakes
o Transaction 4: Bread, Milk, Beer, Diapers
o Transaction 5: Bread, Milk, Diapers, Cornflakes
• Let's apply the Apriori algorithm with a minimum support of 2:
o Generate single items and calculate support:
▪ Bread: 4, Milk: 4, Diapers: 4, Beer: 3, Eggs: 1, Cornflakes: 2
▪ Frequent items (support >= 2): Bread, Milk, Diapers, Beer,
Cornflakes
o Generate pairs among the frequent items and calculate support:
▪ {Bread, Milk}: 3, {Bread, Diapers}: 3, {Bread, Beer}: 2, {Bread,
Cornflakes}: 1, {Milk, Diapers}: 4, {Milk, Beer}: 2, {Milk,
Cornflakes}: 2, {Diapers, Beer}: 2, {Diapers, Cornflakes}: 2,
{Beer, Cornflakes}: 1
▪ Frequent itemsets (support >= 2): {Bread, Milk}, {Bread, Diapers},
{Bread, Beer}, {Milk, Diapers}, {Milk, Beer}, {Milk, Cornflakes},
{Diapers, Beer}, {Diapers, Cornflakes}
o Generate triples from the frequent pairs: {Bread, Milk, Diapers}: 3,
{Milk, Diapers, Beer}: 2 and {Milk, Diapers, Cornflakes}: 2 are frequent,
while any candidate containing an infrequent itemset (e.g. anything with
Eggs, or the pair {Bread, Cornflakes}) is pruned without ever being
counted.
• In this example, the Apriori principle helps by reducing the number of itemsets to
consider for association rules. Instead of checking all possible combinations, it
eliminates infrequent itemsets early in the process, thus reducing computational
overhead significantly.
• This process continues until no new frequent itemsets can be found or until
reaching a specified itemset size or support threshold. The remaining frequent
itemsets are then used to derive association rules that reveal relationships between
items frequently purchased together.
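A small pure-Python sketch of the level-wise frequent-itemset generation above (the variable names are assumptions; min support is the absolute count of 2 used in the example):

```python
# Level-wise frequent-itemset mining with Apriori pruning (pure Python).
transactions = [
    {"Bread", "Milk", "Diapers"},
    {"Bread", "Beer", "Eggs"},
    {"Milk", "Beer", "Diapers", "Cornflakes"},
    {"Bread", "Milk", "Beer", "Diapers"},
    {"Bread", "Milk", "Diapers", "Cornflakes"},
]
MIN_SUPPORT = 2  # absolute count, as in the example above

def support(itemset):
    return sum(itemset <= t for t in transactions)  # transactions containing it

# Level 1: frequent single items (infrequent ones, e.g. Eggs, are dropped here,
# so no larger candidate containing them is ever generated or counted).
items = {i for t in transactions for i in t}
frequent = [frozenset([i]) for i in items if support(frozenset([i])) >= MIN_SUPPORT]

k = 2
while frequent:
    print(k - 1, sorted(tuple(sorted(s)) for s in frequent))
    # Candidates of size k are built only from frequent smaller itemsets.
    candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
    frequent = [c for c in candidates if support(c) >= MIN_SUPPORT]
    k += 1
```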
3. Explain how the Market Basket Analysis uses the concepts of association analysis.
• Market Basket Analysis is a technique which identifies the strength of association
between pairs of products purchased together and identifies patterns of
co-occurrence. Co-occurrence is when two or more things take place together.
• It takes data at the transaction level, which lists all items bought by a customer in
a single purchase.
• The technique determines relationships between the products purchased together;
these relationships are used to build profiles containing “if-then” rules of the
items purchased.
• The rules can be written as: if {A} then {B}.
• Association rules are “if-then” statements that help to show the probability of
relationships between data items within data sets of various types.
• Association rules are used to find correlations and co-occurrences between
data sets.
• They are ideally used to explain patterns in data from seemingly independent
information repositories, such as relational databases and transactional databases.
• The act of using association rules is sometimes referred to as “association rule
mining” or “mining associations”.
• Application of association rules:
o Market Basket Analysis
o Medical diagnosis
o Census Data
o Logistic regression
o Fraud detection on the Web
4. Explain the Apriori algorithm for association rule learning with an example.
• Apriori is a seminal algorithm proposed by R. Agrawal and R. Srikant in 1994 for
mining frequent itemsets for Boolean association rules [AS94b]. The name of the
algorithm is based on the fact that it uses prior knowledge of frequent itemset
properties.
• Support:
o The rule A→B holds in the transaction set D with support s, where s is the
percentage of transactions in D that contain A ∪ B (i.e., the union of sets
A and B, or both A and B).
o support(A→B) = P(A ∪ B) = (number of transactions containing A ∪ B) /
(total number of transactions in D)
• Confidence:
o The rule A→B has confidence c in the transaction set D, where c is the
percentage of transactions in D containing A that also contain B.
o confidence(A→B) = P(B | A) = support(A ∪ B) / support(A)
• Example: using the grocery transactions from the market basket example above,
for the rule {Milk} → {Diapers}: support = 4/5 = 80% (Milk and Diapers
appear together in 4 of the 5 transactions) and confidence = 4/4 = 100%
(every transaction containing Milk also contains Diapers).
• The K-means clustering algorithm proceeds as follows:
o Step 1: Choose the number of clusters k
▪ The first step in k-means is to pick the number of clusters, k.
o Step 2: Select k random points as centroids
▪ Next, we randomly select the centroid for each cluster. Let's say we
want to have 2 clusters, so k is equal to 2 here. We then randomly
select the centroids.
o Step 3: Assign all the points to the closest cluster centroid
o Step 4: Recompute the centroids of newly formed clusters
▪ Now, once we have assigned all of the points to either cluster, the
next step is to compute the centroids of the newly formed clusters.
o Step 5: Repeat steps 3 and 4 until the centroids stop changing
• Stopping Criteria for K-Means Clustering
o There are essentially three stopping criteria that can be adopted to stop the
K-means algorithm:
▪ Centroids of newly formed clusters do not change
▪ Points remain in the same cluster
▪ Maximum number of iterations is reached
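A minimal NumPy sketch of these steps (the toy data, k = 2, seed, and iteration cap are assumptions for illustration):

```python
# k-means: assign points to the nearest centroid, recompute centroids, repeat.
import numpy as np

def kmeans(X, k=2, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # Step 2
    for _ in range(max_iter):                                  # Steps 3-5
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)                          # nearest centroid
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):              # stopping criterion
            break
        centroids = new_centroids
    return labels, centroids

X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
              [8.0, 8.0], [9.0, 8.5], [8.5, 9.0]])
labels, centroids = kmeans(X, k=2)
print(labels)        # the two obvious clusters
```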
7. Describe the concept of single link and complete link in the context of hierarchical
clustering.
8. Describe the main difference in the approach of k-means and k-medoids algorithms with
a neat diagram.
Chapter-9: Neural Network
1. Show the Step, ReLU and sigmoid activation functions with its equations and sketch
2. Briefly explain Perceptron and Mention its limitation.
• Perceptron is a Machine Learning algorithm for supervised learning of various binary
classification tasks. Further, Perceptron is also understood as an Artificial Neuron or
neural network unit that helps to detect certain input data computations in business
intelligence.
• Perceptron model is also treated as one of the best and simplest types of Artificial
Neural networks. However, it is a supervised learning algorithm of binary classifiers.
Hence, we can consider it as a single-layer neural network with four main
parameters, i.e., input values, weights and Bias, net sum, and an activation function.
• Input nodes or input layer: This is the primary component of Perceptron which
accepts the initial data into the system for further processing. Each input node
contains a real numerical value.
• Weight and bias: Weight parameter represents the strength of the connection between
units. This is another most important parameter of Perceptron components. Weight is
directly proportional to the strength of the associated input neuron in deciding the
output. Further, Bias can be considered as the line of intercept in a linear equation.
• Activation function: These are the final and important components that help to
determine whether the neuron will fire or not. Activation Function can be considered
primarily as a step function.
• Limitation: a perceptron can only classify linearly separable patterns; a single-layer
perceptron cannot learn a non-linearly separable function such as XOR.
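A minimal NumPy sketch of a perceptron with the four components named above (the AND-gate data, learning rate and epoch count are assumptions for illustration):

```python
# Single perceptron: weighted sum of inputs + bias, passed through a step activation.
import numpy as np

def step(z):
    return np.where(z >= 0, 1, 0)         # activation: fire (1) or not (0)

def train_perceptron(X, y, lr=0.1, epochs=20):
    w = np.zeros(X.shape[1])              # weights
    b = 0.0                               # bias
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            y_pred = step(w @ xi + b)     # net sum -> activation
            w += lr * (yi - y_pred) * xi  # perceptron learning rule
            b += lr * (yi - y_pred)
    return w, b

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])                # AND gate (linearly separable)
w, b = train_perceptron(X, y)
print(step(X @ w + b))                    # [0 0 0 1]
```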
3. What is the difference between Machine Learning and Deep Learning?
• The objective of the perceptron is to classify a set of inputs into two classes, c1 and
c2.
• This can be done using a very simple decision rule – assign the inputs to c1 if the
output of the perceptron, i.e. yout, is +1 and to c2 if yout is −1.
• So for an n-dimensional signal space, i.e. a space for ‘n’ input signals, the simplest
form of perceptron will have two decision regions, resembling the two classes,
separated by a linear decision boundary (a hyperplane).
• ReLU (rectified linear unit) function:
o ReLU is the most popularly used activation function in the areas of
convolutional neural networks and deep learning. It is of the form
f(x) = max(0, x).
o This means that f(x) is zero when x is less than zero and f(x) is equal to x
when x is above or equal to zero. Figure below depicts the curve for a ReLU
activation function.
• Sigmoid function:
o A binary sigmoid function is of the form f(x) = 1 / (1 + e^(−kx)), where k is
the steepness parameter. The slope at the origin is k/4. As the value of k
becomes very large, the sigmoid function becomes a threshold function.
o Bipolar sigmoid function: A bipolar sigmoid function is of the form
f(x) = (1 − e^(−kx)) / (1 + e^(−kx)).
• Note that the activation functions discussed above (other than the bipolar sigmoid)
have values ranging between 0 and 1. However, in some cases, it is desirable to have
values ranging from −1 to +1. In that case, there is a need to reframe the activation
function.
• For example, in the case of the step function, the revised (bipolar) definition would
be: f(x) = +1 if x ≥ 0, and f(x) = −1 if x < 0.
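A minimal NumPy sketch of the activation functions discussed above (the choice of k = 1 and the sample inputs are assumptions for illustration):

```python
# Common activation functions from this chapter.
import numpy as np

def step(x):                       # binary step: 1 if x >= 0 else 0
    return np.where(x >= 0, 1, 0)

def relu(x):                       # f(x) = max(0, x)
    return np.maximum(0, x)

def sigmoid(x, k=1.0):             # f(x) = 1 / (1 + e^(-kx)), range (0, 1)
    return 1.0 / (1.0 + np.exp(-k * x))

def bipolar_sigmoid(x, k=1.0):     # range (-1, +1)
    return (1.0 - np.exp(-k * x)) / (1.0 + np.exp(-k * x))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (step, relu, sigmoid, bipolar_sigmoid):
    print(f.__name__, np.round(f(x), 3))
```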
10.Explain in detail, the backpropagation algorithm. What are the limitations of this
algorithm?
• Definition same as ans 4
• Limitations:
o Backpropagation can be sensitive to noisy data and irregularities.
o Its performance is highly reliant on the input data.
o It needs excessive time for training.
o It needs a matrix-based approach rather than a mini-batch approach.
• Advantages:
o It is simple, fast and easy to program.
o It has no parameters to tune apart from the number of inputs.
o No prior knowledge about the network is needed.
o It is flexible.
o It is a standard approach and works efficiently.
o It does not require the user to learn special functions.
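A minimal NumPy sketch of backpropagation for a small 2-4-1 network trained on XOR (the architecture, learning rate, seed and epoch count are assumptions for illustration; convergence may vary with initialization):

```python
# Backpropagation on a tiny 2-4-1 network (sigmoid activations, squared error).
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)       # XOR targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)         # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)         # hidden -> output
lr = 0.5

for _ in range(10_000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: propagate the error gradient layer by layer.
    d_out = (out - y) * out * (1 - out)               # output-layer delta
    d_h = (d_out @ W2.T) * h * (1 - h)                # hidden-layer delta
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(out.round(3).ravel())   # close to [0, 1, 1, 0] once trained
```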