Answer Key
Part A
Part B
11. (a) Outline the uninformed search strategies such as breadth-first search and depth-first search
with examples.
Uninformed search strategies are algorithms used in computer science and artificial
intelligence to explore and traverse state spaces, such as those found in search problems
and graph structures. These strategies do not use any specific information about the
problem domain and rely on systematic exploration. Two common uninformed search
strategies are breadth-first search (BFS) and depth-first search (DFS).
Property | Breadth-First Search (BFS) | Depth-First Search (DFS)
Description | Expands the shallowest unexpanded node first, exploring the state space level by level, so it finds the shallowest goal. | Expands the deepest unexpanded node first, following one path as far as possible before backtracking.
Data Structure | Queue (FIFO) for the frontier | Stack (LIFO) for the frontier
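As a rough illustration, here is a minimal Python sketch of both strategies on a small example graph; the graph, node names, and goal test are assumptions made purely for demonstration.

from collections import deque

# Example graph assumed for illustration: adjacency lists keyed by node name.
graph = {
    'A': ['B', 'C'],
    'B': ['D', 'E'],
    'C': ['F'],
    'D': [], 'E': ['F'], 'F': []
}

def bfs(start, goal):
    """Breadth-first search: expands the shallowest node first (FIFO queue)."""
    frontier = deque([[start]])          # queue of paths
    visited = {start}
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                frontier.append(path + [neighbour])
    return None

def dfs(start, goal):
    """Depth-first search: expands the deepest node first (LIFO stack)."""
    frontier = [[start]]                 # stack of paths
    visited = {start}
    while frontier:
        path = frontier.pop()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                frontier.append(path + [neighbour])
    return None

print(bfs('A', 'F'))   # e.g. ['A', 'C', 'F']
print(dfs('A', 'F'))   # order of expansion depends on the stack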
(b) State the constraint satisfaction problem. Outline local search for constraint satisfaction
problem with an example.
A constraint satisfaction problem (CSP) consists of a set of variables, a domain of possible
values for each variable, and a set of constraints that restrict which combinations of values
the variables may take together. A solution is a complete assignment of values to the
variables that satisfies every constraint. Local search for CSPs starts from a complete
assignment and repeatedly moves to a neighbouring assignment (usually by changing the
value of one variable) so as to reduce the number of violated constraints.
Example with variables A, B, and C:
Initial Assignment: A=1, B=2, C=3
Initialization: Start with the initial assignment A=1, B=2, C=3.
Evaluation: Calculate the number of violated constraints. In this case, all constraints are
satisfied, so the cost is 0.
Neighbor Generation: Generate neighboring assignments. One possible neighbor is A=1,
B=2, C=1 (changing the value of C).
Selection of Next Assignment: Since the neighbor A=1, B=2, C=1 has a cost of 1 (it violates
the third constraint), we keep the current assignment A=1, B=2, C=3, whose cost is 0, as the
next assignment.
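This keep-the-better-assignment loop can be written as a short hill-climbing routine. The Python sketch below is a minimal illustration only; the domain {1, 2, 3} and the three pairwise not-equal constraints (A ≠ B, B ≠ C, A ≠ C) are assumptions, since the constraints themselves are not listed in the answer above.

import random

# Assumed example CSP: three variables over domain {1, 2, 3}
# with pairwise "not equal" constraints (an assumption for illustration).
variables = ['A', 'B', 'C']
domain = [1, 2, 3]
constraints = [('A', 'B'), ('B', 'C'), ('A', 'C')]   # each pair must differ

def cost(assignment):
    """Number of violated constraints for a complete assignment."""
    return sum(1 for u, v in constraints if assignment[u] == assignment[v])

def local_search(max_steps=100):
    # Initialization: start from some complete assignment.
    assignment = {v: random.choice(domain) for v in variables}
    for _ in range(max_steps):
        if cost(assignment) == 0:          # all constraints satisfied
            return assignment
        # Neighbor generation: change one variable's value.
        var = random.choice(variables)
        best_value = min(domain, key=lambda val: cost({**assignment, var: val}))
        # Selection: keep the change only if it does not increase the cost.
        if cost({**assignment, var: best_value}) <= cost(assignment):
            assignment[var] = best_value
    return assignment

print(local_search())   # e.g. {'A': 1, 'B': 2, 'C': 3}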
12. (a) (i) Elaborate on unconditional probability and conditional probability with an example.
Unconditional Probability (Marginal Probability):
Unconditional probability, also known as marginal probability, refers to the probability of
an event occurring without considering any specific conditions or additional information. It
is the basic probability associated with an event. Mathematically, the unconditional
probability of an event A is denoted as P(A).
Example of Unconditional Probability:
Imagine rolling a fair six-sided die. The unconditional probability of getting any specific
outcome, say rolling a 3, is 1/6 because there are six equally likely outcomes (1, 2, 3, 4, 5,
6), and only one of them is a 3.
Conditional Probability:
Conditional probability refers to the probability of an event occurring given that another
event has already occurred or under certain conditions. It quantifies the likelihood of an
event A happening when we have some information about event B. Mathematically, the
conditional probability of A given B is denoted as P(A|B) and is calculated as:
P(A|B) = P(A and B) / P(B)
Where:
P(A|B) is the conditional probability of A given B.
P(A and B) is the probability of both A and B occurring.
P(B) is the unconditional probability of B.
Example of Conditional Probability:
Let's consider the example of drawing cards from a standard deck of 52 playing cards:
Suppose you draw one card from the deck, and you want to calculate the probability of
drawing a red card (hearts or diamonds) given that you've already drawn a face card (king,
queen, or jack).
P(A) = Probability of drawing a red card (hearts or diamonds).
P(B) = Probability of drawing a face card.
The conditional probability P(A|B) would be the probability of drawing a red card when
you know you've already drawn a face card. Let's say you've already drawn a face card,
and there are 26 red cards and 12 face cards in the deck.
P(A and B) = Probability of drawing both a red card and a face card = 6/52 (there are 6 red
face cards in the deck).
P(B) = Probability of drawing a face card = 12/52.
So, the conditional probability of drawing a red card given that you've already drawn a face
card is P(A|B) = (6/52) / (12/52) = 6/12 = 1/2, or 50%.
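This calculation can also be verified by simply counting cards, as in the short Python sketch below.

from itertools import product

ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
suits = ['hearts', 'diamonds', 'clubs', 'spades']
deck = list(product(ranks, suits))                               # 52 cards

face = [c for c in deck if c[0] in ('J', 'Q', 'K')]              # event B: 12 cards
red_face = [c for c in face if c[1] in ('hearts', 'diamonds')]   # A and B: 6 cards

p_b = len(face) / len(deck)             # P(B) = 12/52
p_a_and_b = len(red_face) / len(deck)   # P(A and B) = 6/52
print(p_a_and_b / p_b)                  # P(A|B) = 0.5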
Inference in Bayesian networks refers to the process of using the network to answer
probabilistic queries about variables of interest. It involves estimating the probability
distribution of one or more variables in the network given observed evidence or
conditions.
Inference by Enumeration:
Inference by enumeration is a fundamental method for performing probabilistic inference
in Bayesian networks. It involves systematically considering all possible combinations of
values for the unobserved variables (variables of interest) and calculating the conditional
probabilities based on the network's structure and conditional probability tables (CPTs).
Here are the steps for performing inference by enumeration:
Initialize: Start with the variables you want to perform inference on, known as the query
variables (Q), and the observed evidence variables (E) with their values.
Create an Empty Table: Create an empty probability table to store the results of the
inference.
Iterate Over Values of Query Variables: For each possible combination of values for the
query variables (Q), do the following:
a. Set Evidence Variables: Set the values of the observed evidence variables (E) to their
observed values.
b. Calculate Joint Probability: Calculate the joint probability of the current combination of
values for Q and E by considering the network's structure and CPTs.
c. Update the Table: Update the probability table with the joint probability for the current
combination of values.
Normalize: After considering all possible combinations of values for the query variables,
normalize the probability table by dividing each entry by the sum of all entries. This
ensures that the probabilities sum to 1.
Obtain Results: The resulting probability table represents the conditional probability
distribution of the query variables given the observed evidence.
Example for Inference by Enumeration:
Consider a Bayesian network for a medical diagnosis scenario:
Variables:
Disease (D): Presence or absence of a disease (e.g., flu).
Symptom (S): Presence or absence of a specific symptom (e.g., cough).
Test (T): The result of a diagnostic test for the disease.
CPTs (as previously defined):
P(D)
P(S|D)
P(S|¬D)
P(T|D)
P(T|¬D)
Suppose we want to perform inference to find the probability of having the disease (D)
given that the symptom (S) is observed (S = true) and the test (T) is positive (T = true).
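A minimal Python sketch of this enumeration query is given below. The CPT numbers used here are illustrative assumptions only, since the answer refers to CPTs defined elsewhere; the enumeration and normalisation steps are the point of the example.

# Sketch of inference by enumeration for the Disease/Symptom/Test network.
# The CPT values below are invented for illustration; the original answer
# refers to CPTs "as previously defined", which are not reproduced here.
p_d = 0.01                                    # assumed P(D = true)
p_s_given = {True: 0.70, False: 0.10}         # assumed P(S = true | D)
p_t_given = {True: 0.90, False: 0.05}         # assumed P(T = true | D)

def joint(d, s, t):
    """Joint probability P(D=d, S=s, T=t) using the network factorisation."""
    pd = p_d if d else 1 - p_d
    ps = p_s_given[d] if s else 1 - p_s_given[d]
    pt = p_t_given[d] if t else 1 - p_t_given[d]
    return pd * ps * pt

# Query: P(D | S = true, T = true). Enumerate over the query variable D
# with the evidence fixed, then normalise so the entries sum to 1.
unnormalised = {d: joint(d, True, True) for d in (True, False)}
total = sum(unnormalised.values())
posterior = {d: p / total for d, p in unnormalised.items()}
print(posterior)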
13. (a) Elaborate on logistic regression with an example. Explain the process of
computing coefficients.
Logistic Regression is a statistical method used for analyzing a dataset in which there are
one or more independent variables that determine an outcome. It is primarily used for
binary classification problems, where the outcome variable is categorical and has only two
classes, such as Yes/No, True/False, or 0/1. Logistic Regression estimates the probability
that a given input belongs to a particular class.
Logistic Regression Process:
Data Collection: Collect the data that includes both the independent variable (GPA) and
the dependent variable (Admission status).
Data Preprocessing: Clean and preprocess the data. This may involve handling missing
values, outliers, and scaling the GPA scores if necessary.
1. Model Selection: Choose Logistic Regression as the modeling technique since we are
dealing with binary classification.
2. Hypothesis Function:
In Logistic Regression, the logistic (sigmoid) function is used to model the
probability that a student will be admitted:
P(Admitted = 1 | GPA) = 1 / (1 + e^-(b0 + b1 * GPA))
3. Compute Coefficients (b0 and b1):
To compute the coefficients, we use a method called Maximum Likelihood
Estimation (MLE). The goal is to find values for b0 and b1 that maximize the
likelihood of observing the given data.
This involves an iterative optimization algorithm, such as gradient descent. The
algorithm starts with initial values for b0 and b1 and updates them in such a way
that the likelihood of observing the data increases. This process continues until
convergence.
4. Model Training:
Train the logistic regression model using the computed coefficients on the training
data.
5. Evaluation:
Evaluate the model's performance using various metrics like accuracy, precision,
recall, F1-score, and ROC-AUC to assess how well it predicts student admissions.
6. Deployment:
Once satisfied with the model's performance, you can deploy it to make predictions
on new, unseen data.
Thus logistic regression is a valuable tool for binary classification problems, and its
coefficients are computed using Maximum Likelihood Estimation, which aims to
maximize the likelihood of observing the given data by iteratively adjusting the
coefficients.
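As a rough illustration of steps 2 and 3, the following Python sketch fits b0 and b1 by gradient descent on the log-loss (equivalent to maximising the likelihood); the GPA and admission numbers are invented purely for demonstration.

import numpy as np

# Illustrative (invented) data: GPA scores and admission outcomes (1 = admitted).
gpa = np.array([2.5, 2.8, 3.0, 3.2, 3.5, 3.7, 3.9, 4.0])
admitted = np.array([0, 0, 0, 1, 0, 1, 1, 1])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Maximum likelihood estimation via iterative gradient updates.
b0, b1 = 0.0, 0.0
learning_rate = 0.1
for _ in range(10000):
    p = sigmoid(b0 + b1 * gpa)          # predicted probability of admission
    error = admitted - p                # gradient of the log-likelihood
    b0 += learning_rate * error.mean()
    b1 += learning_rate * (error * gpa).mean()

print(b0, b1)                           # fitted coefficients
print(sigmoid(b0 + b1 * 3.6))           # predicted admission probability for GPA 3.6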
(b) What is a classification tree? Explain the steps to construct a classification tree. List and
explain the different procedures used.
A classification tree, also known as a decision tree, is a machine learning algorithm used
for both classification and regression tasks. It is a graphical representation of a decision-
making process that recursively divides a dataset into smaller subsets based on the values
of input features.
Steps to construct a classification tree:
1. Data Preparation:
Collect and prepare your dataset, ensuring it is clean and well-structured.
Divide the dataset into two parts: one for training the model and one for
testing or validating the model's performance.
2. Select a Root Node:
Choose the feature that best splits the dataset at the root node, typically
using a splitting criterion such as information gain (entropy) or Gini impurity.
3. Split the Dataset:
Based on the selected feature, split the dataset into subsets. Each subset
corresponds to a particular value or range of values of the selected feature.
4. Recursive Splitting:
Repeat steps 2 and 3 for each subset created in the previous step.
Continue this process recursively until a stopping criterion is met. Common
stopping criteria include:
Maximum tree depth: Limiting the depth of the tree to prevent
overfitting.
Minimum samples per leaf: Ensuring that each leaf node contains a
minimum number of samples.
Minimum impurity reduction: Stop splitting if the impurity reduction is
below a certain threshold.
5. Pruning (Optional):
Pruning is a technique used to reduce the complexity of the tree and
prevent overfitting. It involves removing branches that do not
significantly improve the tree's performance on validation data.
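The sketch below illustrates these ideas in Python: a Gini impurity function (one of the common splitting procedures) plus a scikit-learn classification tree whose max_depth and min_samples_leaf arguments act as the stopping criteria listed above. The small dataset is invented for illustration only.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def gini(labels):
    """Gini impurity of a set of class labels: one common splitting criterion."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

# Tiny invented dataset: two features per sample, binary class labels.
X = np.array([[2.7, 1.0], [1.5, 2.3], [3.6, 0.5], [3.3, 2.9],
              [1.1, 1.8], [0.9, 3.1], [3.9, 1.2], [2.2, 2.6]])
y = np.array([0, 1, 0, 0, 1, 1, 0, 1])

print(gini(y))   # impurity of the full set, before any split

# Library version: depth and leaf-size limits act as stopping criteria.
tree = DecisionTreeClassifier(criterion='gini', max_depth=2, min_samples_leaf=1)
tree.fit(X, y)
print(tree.predict([[3.0, 1.0]]))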
14. (a) (i) What is bagging and boosting? Give an example.
1. Bagging:
Bagging (Bootstrap Aggregating) trains several base models independently, each on a
different bootstrap sample of the training data, and combines their predictions.
Example: Random Forest. Suppose you train 100 decision trees, each on a different
bootstrap sample of the data. When you want to make a prediction for a new person, you
collect predictions from all 100 trees and take a majority vote to determine the final
prediction.
2. Boosting:
Boosting, unlike bagging, gives more weight to the instances that were previously
misclassified by the base models. It works iteratively, where each base model is
trained sequentially, and at each step, the focus is on the data points that were
incorrectly classified by the previous models.
Example: AdaBoost (Adaptive Boosting)
In AdaBoost, you start by assigning equal weights to all training instances.
Train a weak learner (e.g., a decision stump, which is a simple decision tree
with one level) on the data.
Increase the weight of misclassified instances, so the next weak learner pays
more attention to them.
Train the next weak learner, and again, increase the weight of misclassified
instances.
This process continues for a predetermined number of iterations.
Finally, combine the predictions of all weak learners with weighted voting to
make the final prediction.
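For a rough illustration, the scikit-learn sketch below trains a random forest (bagging) and an AdaBoost ensemble (boosting) on synthetic data; the dataset and parameter choices are assumptions made only for demonstration.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier

# Synthetic data purely for illustration.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Bagging example: a random forest of 100 trees, combined by majority vote.
bagging_model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Boosting example: AdaBoost trains its weak learners sequentially, each one
# focusing on the examples the previous ones misclassified.
boosting_model = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)

print(bagging_model.predict(X[:5]), boosting_model.predict(X[:5]))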
(ii) Outline the steps in the AdaBoost algorithm with an example.
AdaBoost (Adaptive Boosting) is an ensemble learning method used for classification and
regression tasks. It aims to improve the accuracy of weak classifiers by combining them into a
strong classifier.
Steps in the AdaBoost algorithm:
1. Initialize Sample Weights: Assign equal weights to all training examples.
2. For each iteration (t):
a. Train a Weak Classifier: Select a weak classifier (usually a decision tree with
limited depth, also called a "stump") that minimizes the weighted error on the
current training set. The error is weighted by the sample weights.
b. Calculate the Weak Classifier's Weight: Calculate the weight of the weak
classifier's vote in the final decision. This weight depends on the classifier's accuracy
in the weighted dataset.
c. Update Sample Weights: Increase the weights of the misclassified examples,
making them more important for the next iteration. The idea is to focus on the
examples that are difficult to classify correctly.
d. Normalize Sample Weights: Normalize the updated sample weights so that they
sum up to 1. This step ensures that the sample weights remain a probability
distribution.
3. Final Classifier Creation: Combine the individual weak classifiers, giving each one a
vote weighted by its performance, and take the weighted majority as the final prediction.
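A compact NumPy sketch of these steps, using single-threshold decision stumps as the weak learners, is shown below; the one-dimensional dataset and the number of boosting rounds are assumptions for illustration.

import numpy as np

# Tiny invented 1-D dataset; labels are +1 / -1 as is conventional for AdaBoost.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1, 1, -1, -1, 1, -1])

n = len(x)
weights = np.full(n, 1.0 / n)      # step 1: equal sample weights
learners = []                      # list of (threshold, polarity, alpha)

for _ in range(5):                 # step 2: a fixed number of boosting rounds
    # 2a. pick the stump (threshold + polarity) with the lowest weighted error
    best = None
    for thr in x:
        for polarity in (1, -1):
            pred = np.where(polarity * (x - thr) >= 0, 1, -1)
            err = weights[pred != y].sum()
            if best is None or err < best[0]:
                best = (err, thr, polarity, pred)
    err, thr, polarity, pred = best
    err = max(err, 1e-10)
    # 2b. weight of this weak classifier's vote
    alpha = 0.5 * np.log((1 - err) / err)
    learners.append((thr, polarity, alpha))
    # 2c + 2d. re-weight the samples and normalise
    weights *= np.exp(-alpha * y * pred)
    weights /= weights.sum()

# Step 3: final classifier = sign of the weighted sum of weak-learner votes.
def predict(xs):
    votes = sum(a * np.where(p * (xs - t) >= 0, 1, -1) for t, p, a in learners)
    return np.sign(votes)

print(predict(x))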
(b) Elaborate on the steps in expectation-maximization algorithm.
1. Initialization:
Initialize the neural network's weights and biases randomly or using some
predefined values. These initial values play a significant role in the
convergence of the network.
2. Forward Propagation:
Perform a forward pass through the neural network by feeding an input data
point into the network.
3. Compute the Loss:
Compare the predicted output with the actual target or ground truth value.
4. Backward Propagation:
Start by computing the gradient of the loss with respect to the output layer's
activations, typically using the chain rule of calculus, then propagate the gradients
backward through the network to obtain the gradients for every weight and bias, and
update the parameters (for example, by gradient descent).
5. Repeat:
Iterate through steps 2 to 4 for a specified number of epochs or until the loss
converges to a satisfactory level.
6. Evaluation:
After training, assess the model's performance on a separate validation or
test dataset to ensure it generalizes well to unseen data.
Importance of Backpropagation in Designing Neural Networks:
The following factors make backpropagation important:
1. Learning
2. Flexibility
3. Automation
4. Scalability
5. Generalization
6. Adaptability
Example:
Here's a simple sketch of a deep feedforward neural network:
Input Layer        Hidden Layer 1        Hidden Layer 2        Output Layer
  [ ] ----------------> [ ] ----------------> [ ] ----------------> [ ]
                      (Hidden)              (Hidden)
In this sketch:
The square brackets represent nodes/neurons in each layer.
The arrows indicate connections between neurons, where each connection has a
weight associated with it.
The layers are labeled as "Input Layer," "Hidden Layers," and "Output Layer."
The input layer receives data, which is then passed through the hidden layers, and finally,
the output layer produces the network's predictions or results.
Each neuron in the hidden layers performs calculations, including a weighted sum of its
inputs and the application of an activation function (e.g., sigmoid, ReLU) to produce its
output. This process continues through the network until the final output is generated.
Training a deep feedforward network involves adjusting the weights of the connections to
minimize the difference between the predicted outputs and the actual targets through
techniques like backpropagation and gradient descent.
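A minimal NumPy sketch of this training loop (forward pass, loss, backward pass via the chain rule, gradient-descent update) for a tiny one-hidden-layer network is given below; the XOR-style data, layer sizes, learning rate, and epoch count are assumptions for illustration only.

import numpy as np

# Toy data (XOR pattern) assumed purely for illustration.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # step 1: initialise weights/biases
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(5000):
    # Step 2: forward propagation
    h = sigmoid(X @ W1 + b1)                 # hidden layer activations
    out = sigmoid(h @ W2 + b2)               # network output
    # Step 3: compute the loss (mean squared error here)
    loss = np.mean((out - y) ** 2)
    if epoch % 1000 == 0:
        print(f"epoch {epoch}: loss {loss:.4f}")
    # Step 4: backward propagation (chain rule), then a gradient-descent update
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out / len(X); b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * X.T @ d_h / len(X);   b1 -= lr * d_h.mean(axis=0)

# Step 6: evaluate on the (same, toy) data
print(np.round(out.ravel(), 2))   # should approach [0, 1, 1, 0]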
Part C
16. (a) The values of x and their corresponding values of y are shown in the
table below.
X | 1 | 2 | 3 | 4 | 5 | 6 | 7
Y |   | 3 | 4 | 5 | 6 | 8 | 10
(i) Find the least squares regression line y = ax + b.
Step 1: Calculate the mean (average) of X and Y.
Mean of X (X̄) = (1 + 2 + 3 + 4 + 5 + 6 + 7) / 7 = 28 / 7 = 4
Mean of Y (Ȳ) = (3 + 4 + 5 + 6 + 8 + 10) / 6 = 36 / 6 = 6
Step 2: Calculate the deviations from the mean for both X and Y.
Deviation from X̄ (ΔX) = X - X̄ Deviation from Ȳ (ΔY) = Y - Ȳ
Now, calculate ΔX and ΔY for each data point:
X:  1   2   3   4   5   6   7
Y:  3   4   5   6   8   10
ΔX: -3  -2  -1   0   1   2   3
ΔY: -3  -2  -1   0   2   4
Step 3: Calculate the product of ΔX and ΔY, as well as the squared values of ΔX.
ΔXΔY: 3   4   1   0   2   8   12
ΔX²:  9   4   1   0   1   4   9
Step 4: Calculate 'a' using the formula:
a = Σ(ΔXΔY) / Σ(ΔX^2)
a = (3 + 4 + 1 + 0 + 2 + 8 + 12) / (9 + 4 + 1 + 0 + 1 + 4 + 9) = 30 / 28 ≈ 1.0714
Step 5: Calculate 'b' using the formula:
b = Ȳ - (a * X̄)
b = 6 - (1.0714 × 4) ≈ 6 - 4.2857 ≈ 1.7143
So, the least squares regression line is approximately:
y ≈ 1.0714x + 1.7143
(ii) Estimate the value of 'y' when 'x' = 10:
y ≈ 1.0714 × 10 + 1.7143 ≈ 10.7143 + 1.7143 ≈ 12.4286
So, when x = 10, the estimated value of y is approximately 12.4286.
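The same least-squares formulas can be checked with a few lines of Python. Because the data table above appears to have lost one y-value in transcription, the arrays below are placeholder values for demonstration only.

import numpy as np

def least_squares_line(x, y):
    """Return the slope a and intercept b of the least-squares line y = ax + b."""
    x_mean, y_mean = x.mean(), y.mean()
    a = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    b = y_mean - a * x_mean
    return a, b

# Placeholder data for demonstration only (not the exam's exact numbers).
x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([2, 3, 4, 5, 6, 8, 10], dtype=float)

a, b = least_squares_line(x, y)
print(a, b)                 # fitted coefficients
print(a * 10 + b)           # estimate of y at x = 10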
(b) Consider five points {x1, x2, x3, x4, x5} with the following coordinates as a two-
dimensional sample for clustering:
x1 = (0.5, 1.75), x2 = (1, 2), x3 = (1.75, 0.25), x4 = (4, 1),
x5 = (6, 3)
Illustrate the k-means algorithm on the above data set. The required number of
clusters is two, and initially, clusters are formed from random distribution of samples:
C1 = {x1, x2} and C2 = {x3, x5}.
Step 1: Initialize the cluster centroids for each cluster randomly. In this case, you have
already provided the initial clusters C1 and C2.
C1: {x1, x2} C2: {x3, x5}
So, the initial cluster centroids would be the mean of the points in each cluster.
Initial centroid for C1 (μ1): ((0.5 + 1) / 2, (1.75 + 2) / 2) = (0.75, 1.875) Initial centroid for
C2 (μ2): ((1.75 + 6) / 2, (0.25 + 3) / 2) = (3.875, 1.625)
Step 2: Assign each point to the nearest centroid based on Euclidean distance.
x1 = (0.5, 1.75): distance to μ1 ≈ 0.28, distance to μ2 ≈ 3.38, so assign x1 to C1.
x2 = (1, 2): distance to μ1 ≈ 0.28, distance to μ2 ≈ 2.90, so assign x2 to C1.
x3 = (1.75, 0.25): distance to μ1 ≈ 1.91, distance to μ2 ≈ 2.53, so assign x3 to C1.
x4 = (4, 1): distance to μ1 ≈ 3.37, distance to μ2 ≈ 0.64, so assign x4 to C2.
x5 = (6, 3): distance to μ1 ≈ 5.37, distance to μ2 ≈ 2.53, so assign x5 to C2.
Updated clusters: C1: {x1, x2, x3} C2: {x4, x5}
Step 3: Recalculate the centroids for each cluster.
New centroid for C1 (μ1): ((0.5 + 1 + 1.75) / 3, (1.75 + 2 + 0.25) / 3) = (1.083, 1.333)
New centroid for C2 (μ2): ((4 + 6) / 2, (1 + 3) / 2) = (5, 2)
Step 4: Repeat steps 2 and 3 until convergence, i.e. until the assignment of points to
clusters no longer changes.
With the new centroids, every point is still closest to its own cluster's centroid:
x1, x2, and x3 remain closest to μ1, so they stay in C1.
x4 and x5 remain closest to μ2, so they stay in C2.
The assignments have not changed, so the algorithm has converged.
Final clusters: C1: {x1, x2, x3} C2: {x4, x5}
Final centroids: μ1 ≈ (1.083, 1.333) μ2 = (5, 2)
The k-means algorithm has converged, and the data points have been clustered into two
groups based on their proximity to the centroids.
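The assign/recompute loop described above can be reproduced with a short NumPy sketch; the initial centroids below follow the initial clusters C1 = {x1, x2} and C2 = {x3, x5} used in the answer, and the iteration cap is an arbitrary safeguard.

import numpy as np

# The five sample points from the question.
points = np.array([[0.5, 1.75], [1.0, 2.0], [1.75, 0.25], [4.0, 1.0], [6.0, 3.0]])

# Initial centroids from the initial clusters C1 = {x1, x2} and C2 = {x3, x5}.
centroids = np.array([points[[0, 1]].mean(axis=0), points[[2, 4]].mean(axis=0)])

for _ in range(10):                         # iterate until assignments stabilise
    # Assignment step: index of the nearest centroid for every point.
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    # Update step: recompute each centroid as the mean of its assigned points.
    new_centroids = np.array([points[labels == k].mean(axis=0) for k in range(2)])
    if np.allclose(new_centroids, centroids):   # convergence check
        break
    centroids = new_centroids

print(labels)      # cluster index of x1..x5
print(centroids)   # final centroids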