
JCT COLLEGE OF ENGINEERING AND TECHNOLOGY

PICHANUR, COIMBATORE – 641105

CS3491- Artificial Intelligence and Machine Learning (Apr/May 2023)

Answer Key
Part A

1. Define artificial intelligence.


The ability of a computer or a robot controlled by a computer to do tasks that are usually
done by humans because they require human intelligence and discernment.
2. What is adversarial search?
Adversarial search is a method applied to a situation where you are planning while another
actor prepares against you.
3. Define uncertainty.
The lack of certainty, a state of limited knowledge where it is impossible to exactly describe
the existing state, a future outcome, or more than one possible outcome.
4. State Bayes rule.
The Bayes theorem helps determine the likelihood that one event will occur, given that another
event has already happened. The mathematical formulation of the Bayes theorem is
P(X | Y) = P(Y | X) · P(X) / P(Y)
5. Outline the difference between supervised and unsupervised learning.
A supervised learning model learns from labelled data to predict an output. An unsupervised
learning model finds hidden patterns in unlabelled data.
6. What is a random forest?
A random forest is an ensemble learning method that builds many decision trees on random
subsets of the data and features and combines their outputs (by majority vote for classification
or by averaging for regression).
7. Define ensemble learning.
A general meta approach to machine learning that seeks better predictive performance by
combining the predictions from multiple models.
8. What is the significance of Gaussian mixture model?
They do not require knowing in advance which subpopulation a data point belongs to; the model
learns the subpopulations automatically from the data.
9. Draw the architecture of multilayer perceptron.
It consists of an input layer, one or more hidden layers, and an output layer of fully connected
neurons, where the outputs of each layer feed the next layer.
10. Name any two activation functions.
 Linear (identity) activation function.
 Sigmoid activation function (a common non-linear activation).

Part B
11. (a) Outline the uninformed search strategies like breadth-first search and depth-first search
with examples.
Uninformed search strategies are algorithms used in computer science and artificial
intelligence to explore and traverse state spaces, such as those found in search problems
and graph structures. These strategies do not use any specific information about the
problem domain and rely on systematic exploration. Two common uninformed search
strategies are breadth-first search (BFS) and depth-first search (DFS).

Breadth-First Search (BFS):

 Description: BFS explores the state space level by level, expanding every node at the
current depth before moving on to nodes at the next depth.
 Data Structure: It uses a FIFO queue to hold the frontier of unexpanded nodes.
 Property: BFS is complete and finds the shallowest goal node (optimal when all step costs
are equal), but its memory requirement grows exponentially with depth.
 Example: Finding the route with the fewest hops between two stations on a metro map.

Depth-First Search (DFS):

 Description: DFS expands the deepest unexpanded node first, following one branch all the
way down before backtracking.
 Data Structure: It uses a LIFO stack (or recursion) to hold the frontier.
 Property: DFS needs only memory linear in the depth of the search, but it is neither
complete (it can follow an infinite branch) nor optimal.
 Example: Solving a maze by following one corridor until a dead end is reached and then
backtracking.
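The following is a minimal Python sketch of both strategies on a small example graph; the graph, node names, and start/goal nodes are assumptions chosen purely for illustration.

```python
from collections import deque

# Illustrative graph given as an adjacency list (hypothetical example).
graph = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F"],
    "D": [], "E": ["F"], "F": [],
}

def bfs(start, goal):
    """Breadth-first search: expand nodes level by level using a FIFO queue."""
    frontier = deque([[start]])          # queue of paths
    visited = {start}
    while frontier:
        path = frontier.popleft()        # FIFO: oldest path first
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                frontier.append(path + [neighbour])
    return None

def dfs(start, goal):
    """Depth-first search: expand the deepest node first using a LIFO stack."""
    frontier = [[start]]                 # stack of paths
    visited = set()
    while frontier:
        path = frontier.pop()            # LIFO: newest path first
        node = path[-1]
        if node == goal:
            return path
        if node not in visited:
            visited.add(node)
            for neighbour in graph[node]:
                frontier.append(path + [neighbour])
    return None

print("BFS path:", bfs("A", "F"))        # shortest path in edges
print("DFS path:", dfs("A", "F"))        # some path, not necessarily shortest
```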

(b) State the constraint satisfaction problem. Outline local search for constraint satisfaction
problem with an example.

A Constraint Satisfaction Problem (CSP) is a formal representation of a problem where a set
of variables must be assigned values from their respective domains, subject to constraints
that define allowable combinations of values. The goal is to find an assignment of values to
variables that satisfies all constraints.

Local Search for Constraint Satisfaction Problems:


Local search is a metaheuristic technique used to solve CSPs by iteratively exploring the
solution space and gradually improving the current solution until a satisfying assignment is
found or a stopping criterion is met.
Initialization: Start with an initial assignment of values to variables. This assignment can be
generated randomly or by using some heuristics.
Evaluation: Calculate a cost or fitness value associated with the current assignment. In CSPs,
this typically measures the number of violated constraints (constraints not satisfied).
Neighbor Generation: Generate neighboring assignments by making small changes to the
current assignment. These changes can involve swapping values between variables or
changing the value of a single variable.
Selection of Next Assignment: Choose a neighboring assignment based on some criteria.
Common criteria include selecting the assignment that minimizes the number of violated
constraints or that improves the cost or fitness function.
Example:
Let's consider a simple CSP example:
Variables: A, B, C
Domains: {1, 2, 3}
Constraints:
A ≠ B
B ≠ C
A + B ≤ C

Initial Assignment:
A=1, B=2, C=3
Initialization: Start with the initial assignment A=1, B=2, C=3.
Evaluation: Calculate the number of violated constraints. In this case, all constraints are
satisfied, so the cost is 0.
Neighbor Generation: Generate neighboring assignments. One possible neighbor is A=1,
B=2, C=1 (changing the value of C).
Selection of Next Assignment: The neighbour A=1, B=2, C=1 has cost 1 (it violates the third
constraint), so we keep A=1, B=2, C=3 (the current assignment) as the next assignment; since
its cost is already 0, the search terminates with a satisfying assignment.
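Below is a minimal, min-conflicts-style Python sketch of local search for the example CSP above; the helper names and the greedy neighbour-selection rule are illustrative choices, not a prescribed implementation.

```python
import random

variables = ["A", "B", "C"]
domain = [1, 2, 3]

# The three constraints of the example, each as a predicate over an assignment.
constraints = [
    lambda a: a["A"] != a["B"],
    lambda a: a["B"] != a["C"],
    lambda a: a["A"] + a["B"] <= a["C"],
]

def cost(assignment):
    """Number of violated constraints (0 means a satisfying assignment)."""
    return sum(not c(assignment) for c in constraints)

def local_search(max_steps=100):
    # Initialization: the assignment from the example (could also be random).
    current = {"A": 1, "B": 2, "C": 3}
    for _ in range(max_steps):
        if cost(current) == 0:
            return current                      # all constraints satisfied
        # Neighbor generation: change the value of one randomly chosen variable.
        var = random.choice(variables)
        best_value = min(domain, key=lambda v: cost({**current, var: v}))
        current[var] = best_value               # greedy selection of next assignment
    return current                              # best assignment found so far

print(local_search())   # e.g. {'A': 1, 'B': 2, 'C': 3}, which has cost 0
```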

12. (a) (i) Elaborate on unconditional probability and conditional probability with an example.
Unconditional Probability (Marginal Probability):
Unconditional probability, also known as marginal probability, refers to the probability of
an event occurring without considering any specific conditions or additional information. It
is the basic probability associated with an event. Mathematically, the unconditional
probability of an event A is denoted as P(A).
Example of Unconditional Probability:
Imagine rolling a fair six-sided die. The unconditional probability of getting any specific
outcome, say rolling a 3, is 1/6 because there are six equally likely outcomes (1, 2, 3, 4, 5,
6), and only one of them is a 3.
Conditional Probability:
Conditional probability refers to the probability of an event occurring given that another
event has already occurred or under certain conditions. It quantifies the likelihood of an
event A happening when we have some information about event B. Mathematically, the
conditional probability of A given B is denoted as P(A|B) and is calculated as:

P(A|B) = P(A and B) / P(B)

Where:
P(A|B) is the conditional probability of A given B.
P(A and B) is the probability of both A and B occurring.
P(B) is the unconditional probability of B.
Example of Conditional Probability:
Let's consider the example of drawing cards from a standard deck of 52 playing cards:
Suppose you draw one card from the deck, and you want to calculate the probability of
drawing a red card (hearts or diamonds) given that you've already drawn a face card (king,
queen, or jack).
P(A) = Probability of drawing a red card (hearts or diamonds).
P(B) = Probability of drawing a face card.
The conditional probability P(A|B) would be the probability of drawing a red card when
you know you've already drawn a face card. Let's say you've already drawn a face card,
and there are 26 red cards and 12 face cards in the deck.
P(A and B) = Probability of drawing both a red card and a face card = 6/52 (there are 6 red
face cards in the deck).

P(B) = Probability of drawing a face card = 12/52.


Now, you can calculate the conditional probability:

P(A|B) = P(A and B) / P(B) = (6/52) / (12/52) = 6/12 = 1/2

So, the conditional probability of drawing a red card given that you've already drawn a face
card is 1/2, or 50%.
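The same calculation can be checked with a few lines of Python using exact fractions:

```python
from fractions import Fraction

p_a_and_b = Fraction(6, 52)    # red face cards: 6 out of 52
p_b = Fraction(12, 52)         # face cards: 12 out of 52

p_a_given_b = p_a_and_b / p_b  # P(A|B) = P(A and B) / P(B)
print(p_a_given_b)             # 1/2
```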

(ii) What is a Bayesian network? Explain the steps followed to construct a Bayesian network with an example.
Bayesian Network:
A Bayesian network, also known as a belief network or a probabilistic graphical model, is a
graphical representation of probabilistic relationships among a set of variables. It uses a
directed acyclic graph (DAG) to model these relationships, along with conditional
probability tables (CPTs) associated with each node to quantify the probabilistic
dependencies.
Steps to Construct a Bayesian Network:
Constructing a Bayesian network involves several steps, including defining variables,
specifying dependencies, and assigning conditional probabilities. Here are the key steps:
Identify Variables:
Begin by identifying the variables of interest in your problem domain. These variables
represent the factors that you believe are related to each other in a probabilistic manner.
Define the Structure:
Create a directed acyclic graph (DAG) that represents the causal or probabilistic
relationships between the variables
Specify Conditional Probability Tables (CPTs):
For each node (variable) in the graph, specify a conditional probability table (CPT). The CPT
quantifies how the variable depends on its parents in the graph. It defines the probability
distribution of the variable given the values of its parents.
Determine Probabilistic Dependencies:
Based on domain knowledge, data, or expert input, determine the probabilistic
dependencies between variables. These dependencies are expressed through the CPTs.
Verify the Model:
Carefully review the constructed Bayesian network to ensure that it accurately represents
the probabilistic relationships and dependencies in the problem domain. Verify that the
graph structure and CPTs align with your understanding of the problem.
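As a small illustration of these steps, a two-node network Disease → Test can be written down directly as a structure plus CPTs in plain Python; the variable names and probability values below are assumed purely for illustration.

```python
# Hypothetical two-node Bayesian network: Disease -> Test.
# Structure (DAG) given as the parents of each node.
parents = {"Disease": [], "Test": ["Disease"]}

# CPTs: probability of each variable being True given its parents' values
# (all numbers are assumed purely for illustration).
P_disease = 0.01                         # P(Disease = true)
P_test_given_disease = {True: 0.95,      # P(Test = true | Disease = true)
                        False: 0.05}     # P(Test = true | Disease = false)

# Joint probability of one full assignment, using the chain rule of the network:
# P(D = true, T = true) = P(D = true) * P(T = true | D = true)
joint = P_disease * P_test_given_disease[True]
print(joint)   # 0.0095
```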

(b) What do you mean by inference in Bayesian networks? Outline inference by enumeration with an example.

Inference in Bayesian Networks:



Inference in Bayesian networks refers to the process of using the network to answer
probabilistic queries about variables of interest. It involves estimating the probability
distribution of one or more variables in the network given observed evidence or
conditions.
Inference by Enumeration:
Inference by enumeration is a fundamental method for performing probabilistic inference
in Bayesian networks. It involves systematically considering all possible combinations of
values for the unobserved variables (variables of interest) and calculating the conditional
probabilities based on the network's structure and conditional probability tables (CPTs).
Here are the steps for performing inference by enumeration:
Initialize: Start with the variables you want to perform inference on, known as the query
variables (Q), and the observed evidence variables (E) with their values.
Create an Empty Table: Create an empty probability table to store the results of the
inference.
Iterate Over Values of Query Variables: For each possible combination of values for the
query variables (Q), do the following:
a. Set Evidence Variables: Set the values of the observed evidence variables (E) to their
observed values.
b. Calculate Joint Probability: Calculate the joint probability of the current combination of
values for Q and E by considering the network's structure and CPTs.
c. Update the Table: Update the probability table with the joint probability for the current
combination of values.
Normalize: After considering all possible combinations of values for the query variables,
normalize the probability table by dividing each entry by the sum of all entries. This
ensures that the probabilities sum to 1.
Obtain Results: The resulting probability table represents the conditional probability
distribution of the query variables given the observed evidence.
Example for Inference by Enumeration:
Consider a Bayesian network for a medical diagnosis scenario:
Variables:
Disease (D): Presence or absence of a disease (e.g., flu).
Symptom (S): Presence or absence of a specific symptom (e.g., cough).
Test (T): The result of a diagnostic test for the disease.
CPTs (conditional probability tables) needed:
P(D)
P(S|D)
P(S|¬D)
P(T|D)
P(T|¬D)
Suppose we want to perform inference to find the probability of having the disease (D)
given that the symptom (S) is observed (S = true) and the test (T) is positive (T = true). By
enumeration, P(D | S = true, T = true) is proportional to P(D) · P(S = true | D) · P(T = true | D),
evaluated for D = true and D = false and then normalized so that the two values sum to 1.
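A short Python sketch of this enumeration for the query P(D | S = true, T = true) is shown below; the CPT numbers are assumed for illustration only.

```python
# Assumed CPT values (for illustration only).
P_D = {True: 0.10, False: 0.90}             # P(D)
P_S_given_D = {True: 0.80, False: 0.20}     # P(S = true | D)
P_T_given_D = {True: 0.90, False: 0.05}     # P(T = true | D)

# Enumerate both values of the query variable D with the evidence S = true, T = true.
unnormalised = {}
for d in (True, False):
    # Joint probability P(D = d, S = true, T = true) from the network's CPTs.
    unnormalised[d] = P_D[d] * P_S_given_D[d] * P_T_given_D[d]

# Normalise so that the two probabilities sum to 1.
total = sum(unnormalised.values())
posterior = {d: p / total for d, p in unnormalised.items()}

print(posterior[True])   # P(D = true | S = true, T = true) ≈ 0.889 with these numbers
```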

13. (a) Elaborate on logistic regression with an example. Explain the process of
computing coefficients.
Logistic Regression is a statistical method used for analyzing a dataset in which there are
one or more independent variables that determine an outcome. It is primarily used for
binary classification problems, where the outcome variable is categorical and has only two
classes, such as Yes/No, True/False, or 0/1. Logistic Regression estimates the probability
that a given input belongs to a particular class.
Logistic Regression Process (example: predicting whether a student is admitted to a course
based on GPA):
Data Collection: Collect the data that includes both the independent variable (GPA) and
the dependent variable (Admission status).
Data Preprocessing: Clean and preprocess the data. This may involve handling missing
values, outliers, and scaling the GPA scores if necessary.
1. Model Selection: Choose Logistic Regression as the modeling technique since we are
dealing with binary classification.
2. Hypothesis Function:
In Logistic Regression, the logistic function (sigmoid function) is used to model the
probability that a student will be admitted:
P(admitted | GPA) = 1 / (1 + e^-(b0 + b1·GPA))
3. Compute Coefficients (b0 and b1):
To compute the coefficients, we use a method called Maximum Likelihood
Estimation (MLE). The goal is to find values for b0 and b1 that maximize the
likelihood of observing the given data.
This involves an iterative optimization algorithm, such as gradient descent. The
algorithm starts with initial values for b0 and b1 and updates them in such a way
that the likelihood of observing the data increases. This process continues until
convergence.
4. Model Training:
Train the logistic regression model using the computed coefficients on the training
data.
5. Evaluation:
Evaluate the model's performance using various metrics like accuracy, precision,
recall, F1-score, and ROC-AUC to assess how well it predicts student admissions.
6. Deployment:
Once satisfied with the model's performance, you can deploy it to make predictions
on new, unseen data.
Thus logistic regression is a valuable tool for binary classification problems, and its
coefficients are computed using Maximum Likelihood Estimation, which aims to
maximize the likelihood of observing the given data by iteratively adjusting the
coefficients.
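A minimal Python sketch of this process is shown below, fitting b0 and b1 by gradient ascent on the log-likelihood; the GPA scores and admission labels are hypothetical data used only for illustration.

```python
import numpy as np

# Hypothetical GPA scores and admission outcomes (1 = admitted), for illustration only.
gpa = np.array([2.5, 2.8, 3.0, 3.2, 3.4, 3.6, 3.8, 4.0])
admitted = np.array([0, 0, 1, 0, 0, 1, 1, 1])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Compute the coefficients b0, b1 by maximising the likelihood with gradient ascent
# (equivalently, minimising the negative log-likelihood with gradient descent).
b0, b1 = 0.0, 0.0
lr = 0.01
for _ in range(20000):
    p = sigmoid(b0 + b1 * gpa)          # predicted probability of admission
    error = admitted - p                # gradient of the log-likelihood
    b0 += lr * error.sum()
    b1 += lr * (error * gpa).sum()

print("b0 =", round(b0, 3), "b1 =", round(b1, 3))
print("P(admit | GPA = 3.5) =", round(sigmoid(b0 + b1 * 3.5), 3))
```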

(b) What is a classification tree? Explain the steps to construct classification tree. List and
explain about the different procedures used.

A classification tree, also known as a decision tree, is a machine learning algorithm used
for both classification and regression tasks. It is a graphical representation of a decision-
making process that recursively divides a dataset into smaller subsets based on the values
of input features.
Steps to construct a classification tree:
1. Data Preparation:
 Collect and prepare your dataset, ensuring it is clean and well-structured.
 Divide the dataset into two parts: one for training the model and one for
testing or validating the model's performance.
2. Select a Root Node:
 Choose the feature that best splits the dataset at the root node, using a splitting
procedure such as information gain (entropy) or Gini impurity.
3. Split the Dataset:
 Based on the selected feature, split the dataset into subsets. Each subset
corresponds to a particular value or range of values of the selected feature.
4. Recursive Splitting:
 Repeat steps 2 and 3 for each subset created in the previous step.
 Continue this process recursively until a stopping criterion is met. Common
stopping criteria include:
 Maximum tree depth: Limiting the depth of the tree to prevent
overfitting.
 Minimum samples per leaf: Ensuring that each leaf node contains a
minimum number of samples.
 Minimum impurity reduction: Stop splitting if the impurity reduction is
below a certain threshold.
5. Pruning (Optional):
 Pruning is a technique used to reduce the complexity of the tree and
prevent overfitting. It involves removing branches that do not
significantly improve the tree's performance on validation data.
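A short sketch using scikit-learn (assuming it is installed) is shown below; the age/income data is hypothetical, and the tree parameters mirror the stopping criteria listed above.

```python
# A minimal scikit-learn sketch; the toy data below is hypothetical and only
# illustrates the construction steps described above.
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.model_selection import train_test_split

# Step 1: data preparation - features [age, income] and a binary class label.
X = [[25, 30000], [32, 42000], [45, 80000], [51, 95000],
     [23, 28000], [40, 60000], [60, 120000], [35, 52000]]
y = [0, 0, 1, 1, 0, 1, 1, 0]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Steps 2-4: root selection, splitting and recursive splitting are handled by the
# library; the stopping criteria below mirror the ones listed above.
tree = DecisionTreeClassifier(
    criterion="gini",          # impurity measure used to choose splits
    max_depth=3,               # maximum tree depth
    min_samples_leaf=1,        # minimum samples per leaf
    min_impurity_decrease=0.0, # minimum impurity reduction for a split
    random_state=0,
)
tree.fit(X_train, y_train)

print(export_text(tree, feature_names=["age", "income"]))
print("test accuracy:", tree.score(X_test, y_test))
```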
14. (a) (i) What is bagging and boosting? Give example.

Bagging (Bootstrap Aggregating):


Bagging involves creating multiple copies of the original dataset through a process called
bootstrapping. Bootstrapping randomly samples the dataset with replacement, resulting in
several subsets of the data.
Example: Random Forest
 Random Forest is a popular bagging algorithm.
 Suppose you want to predict whether a person will buy a product based on
their age and income.
 You create 100 bootstrap samples from your dataset, each containing a
subset of the data.
 Train a decision tree on each of these samples.

 When you want to make a prediction for a new person, you collect
predictions from all 100 trees and take a majority vote to determine the final
prediction.
2. Boosting:
Boosting, unlike bagging, gives more weight to the instances that were previously
misclassified by the base models. It works iteratively, where each base model is
trained sequentially, and at each step, the focus is on the data points that were
incorrectly classified by the previous models.
Example: AdaBoost (Adaptive Boosting)
 In AdaBoost, you start by assigning equal weights to all training instances.
 Train a weak learner (e.g., a decision stump, which is a simple decision tree
with one level) on the data.
 Increase the weight of misclassified instances, so the next weak learner pays
more attention to them.
 Train the next weak learner, and again, increase the weight of misclassified
instances.
 This process continues for a predetermined number of iterations.
 Finally, combine the predictions of all weak learners with weighted voting to
make the final prediction.
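A brief scikit-learn sketch of both approaches (assuming scikit-learn is available) is shown below; the age/income data is hypothetical and mirrors the buy-a-product example above.

```python
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier

X = [[25, 30000], [32, 42000], [45, 80000], [51, 95000],
     [23, 28000], [40, 60000], [60, 120000], [35, 52000]]
y = [0, 0, 1, 1, 0, 1, 1, 0]          # 1 = buys the product (hypothetical labels)

# Bagging: a random forest of 100 trees, each trained on a bootstrap sample;
# the final prediction is a majority vote over the trees.
bagging = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Boosting: AdaBoost trains weak learners sequentially, re-weighting the
# examples that previous learners misclassified.
boosting = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)

new_person = [[30, 45000]]
print("random forest:", bagging.predict(new_person))
print("adaboost:     ", boosting.predict(new_person))
```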
ii)Outline the steps in the AdaBoost algorithm with an example.
AdaBoost (Adaptive Boosting) is an ensemble learning method used for classification and
regression tasks. It aims to improve the accuracy of weak classifiers by combining them into a
strong classifier.
Steps in the AdaBoost algorithm:
1. Initialize Sample Weights: Assign equal weights to all training examples.
2. For each iteration (t):
a. Train a Weak Classifier: Select a weak classifier (usually a decision tree with
limited depth, also called a "stump") that minimizes the weighted error on the
current training set. The error is weighted by the sample weights.
b. Calculate the Weak Classifier's Weight: Calculate the weight of the weak
classifier's vote in the final decision. This weight depends on the classifier's accuracy
in the weighted dataset.
c. Update Sample Weights: Increase the weights of the misclassified examples,
making them more important for the next iteration. The idea is to focus on the
examples that are difficult to classify correctly.
d. Normalize Sample Weights: Normalize the updated sample weights so that they
sum up to 1. This step ensures that the sample weights remain a probability
distribution.
3. Final Classifier Creation: a. Combine the individual weak classifiers by assigning a
weight to each of them based on their performance.
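The weight update at the heart of these steps can be illustrated with one hand-computed iteration in NumPy; the labels and weak-learner predictions below are hypothetical.

```python
import numpy as np

# One AdaBoost iteration on hypothetical labels/predictions (labels in {-1, +1}).
y = np.array([+1, +1, -1, -1, +1])          # true labels
h = np.array([+1, -1, -1, -1, -1])          # weak learner's predictions
w = np.full(len(y), 1.0 / len(y))           # step 1: equal sample weights

# Step 2a/2b: weighted error and the weak learner's vote weight (alpha).
err = np.sum(w * (h != y))
alpha = 0.5 * np.log((1 - err) / err)

# Step 2c/2d: increase weights of misclassified examples, then normalise.
w = w * np.exp(-alpha * y * h)
w = w / w.sum()

print("error:", err, "alpha:", round(alpha, 3))
print("updated weights:", np.round(w, 3))
```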
(b) Elaborate on the steps in expectation-maximization algorithm.

The Expectation-Maximization (EM) algorithm is an iterative optimization technique used to
estimate the parameters of statistical models, particularly in the context of unsupervised machine
learning, such as clustering and Gaussian mixture models. EM alternates between two main steps:
the Expectation (E-step) and the Maximization (M-step). These steps are repeated until
convergence is achieved. Here's a detailed explanation of each step:
1. Initialization:
 Initialize the model's parameters randomly or using some heuristic method.
The choice of initial parameters can impact the algorithm's convergence, so
it's essential to initialize them thoughtfully.
2. Expectation (E-step):
 In this step, you calculate the expected values (posterior probabilities) of the
latent variables, given the current estimates of the model parameters.
 Compute the posterior probabilities or membership probabilities for each
data point belonging to each component (in the case of Gaussian mixture
models) or cluster. These probabilities represent the likelihood of a data
point being generated by each component.
3. Maximization (M-step):
 In this step, you update the model's parameters to maximize the expected
complete-data log-likelihood obtained in the E-step. This involves finding the
parameter values that make the data most probable under the current
model.
4. Convergence Check:
 If the convergence criteria are met, terminate the algorithm. Otherwise,
return to the E-step.
5. Iteration:
 Repeat the E-step and M-step iteratively until the algorithm converges. Each
iteration typically brings the model parameters closer to their optimal values,
leading to a better fit of the model to the data.
6. Output:
 Once the EM algorithm converges, the final estimates of the model
parameters are obtained.
 These parameter estimates can then be used for various purposes, such as
clustering data points (in the case of Gaussian mixture models) or estimating
missing values in a dataset (in the context of data imputation).
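A compact NumPy sketch of EM for a two-component, one-dimensional Gaussian mixture is shown below; the synthetic data and the fixed number of iterations are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 1-D data drawn from two Gaussians (for illustration only).
data = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(5.0, 1.0, 200)])

# 1. Initialization: rough starting values for weights, means and variances.
pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

def gaussian(x, mean, variance):
    return np.exp(-(x - mean) ** 2 / (2 * variance)) / np.sqrt(2 * np.pi * variance)

for _ in range(100):
    # 2. E-step: posterior probability (responsibility) of each component per point.
    resp = np.vstack([pi[k] * gaussian(data, mu[k], var[k]) for k in range(2)])
    resp = resp / resp.sum(axis=0)

    # 3. M-step: re-estimate weights, means and variances from the responsibilities.
    Nk = resp.sum(axis=1)
    pi = Nk / len(data)
    mu = (resp * data).sum(axis=1) / Nk
    var = (resp * (data - mu[:, None]) ** 2).sum(axis=1) / Nk

print("weights:", np.round(pi, 2))
print("means:  ", np.round(mu, 2))
```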
15. a) Explain the steps in the back propagation learning algorithm. What is the importance
of it in designing neural networks?
Backpropagation is a crucial algorithm for training artificial neural networks, and it is the
foundation of many machine learning and deep learning techniques.
Steps involved in the backpropagation learning algorithm:
1. Initialization:

 Initialize the neural network's weights and biases randomly or using some
predefined values. These initial values play a significant role in the
convergence of the network.
2. Forward Propagation:
 Perform a forward pass through the neural network by feeding an input data
point into the network.
3. Compute the Loss:
 Compare the predicted output with the actual target or ground truth value.
4. Backward Propagation:
 Start by computing the gradient of the loss with respect to the output layer's
activations. This is typically done using the chain rule of calculus.
5. Repeat:
 Iterate through steps 2 to 4 for a specified number of epochs or until the loss
converges to a satisfactory level.
6. Evaluation:
 After training, assess the model's performance on a separate validation or
test dataset to ensure it generalizes well to unseen data.
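The steps above can be illustrated with a small NumPy sketch that trains a one-hidden-layer network on the XOR problem; the architecture, learning rate, and loss function are illustrative choices, not the only possible ones.

```python
import numpy as np

# Tiny illustrative task (XOR) and a 2-4-1 network trained with backpropagation.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)     # step 1: initialise weights
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for _ in range(5000):
    # step 2: forward propagation
    h = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)

    # step 3: compute the loss (mean squared error)
    loss = np.mean((y_hat - y) ** 2)

    # step 4: backward propagation (chain rule, layer by layer)
    d_out = 2 * (y_hat - y) / len(X) * y_hat * (1 - y_hat)
    dW2, db2 = h.T @ d_out, d_out.sum(axis=0)
    d_hidden = (d_out @ W2.T) * h * (1 - h)
    dW1, db1 = X.T @ d_hidden, d_hidden.sum(axis=0)

    # gradient descent update
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print("final loss:", round(float(loss), 4))
print("predictions:", np.round(y_hat.ravel(), 2))
```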
Importance of Backpropagation in Designing Neural Networks:
Backpropagation is important for the following reasons:
1. Learning
2. Flexibility.
3. Automation
4. Scalability
5. Generalization.
6. Adaptability

b)Explain a deep feedforward network with a neat sketch.


Conceptual Explanation:
A deep feedforward network consists of multiple layers of interconnected artificial
neurons, organized into three main types of layers:
1. Input Layer: The input layer is responsible for receiving the initial data or features.
Each node (circle) in this layer represents a feature or input variable.
2. Hidden Layers: Between the input and output layers, there can be one or more
hidden layers. These layers are called "hidden" because they are not directly
connected to the input or output of the network.
3. Output Layer: The output layer produces the final results or predictions of the network's
task. The number of nodes in this layer depends on the nature of the problem

Example:
Here's a simple sketch of a deep feedforward neural network:
Input Layer          Hidden Layers              Output Layer
 (Layer 0)        (Layer 1)     (Layer 2)
   [ ] ----------> [ ] ----------> [ ] ----------> [ ]

In this sketch:
 The square brackets represent nodes/neurons in each layer.
 The arrows indicate connections between neurons, where each connection has a
weight associated with it.
 The layers are labeled as "Input Layer," "Hidden Layers," and "Output Layer."
The input layer receives data, which is then passed through the hidden layers, and finally,
the output layer produces the network's predictions or results.
Each neuron in the hidden layers performs calculations, including a weighted sum of its
inputs and the application of an activation function (e.g., sigmoid, ReLU) to produce its
output. This process continues through the network until the final output is generated.
Training a deep feedforward network involves adjusting the weights of the connections to
minimize the difference between the predicted outputs and the actual targets through
techniques like backpropagation and gradient descent.
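A minimal NumPy sketch of the forward pass through such a network (with two hidden layers and randomly initialised placeholder weights) is shown below.

```python
import numpy as np

# Forward pass through a small deep feedforward network with two hidden layers
# (weights and the input vector are random placeholders for illustration).
rng = np.random.default_rng(1)

def relu(z):
    return np.maximum(0.0, z)

x = rng.normal(size=3)                          # input layer: 3 features

W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)   # hidden layer 1: 4 neurons
W2, b2 = rng.normal(size=(4, 4)), np.zeros(4)   # hidden layer 2: 4 neurons
W3, b3 = rng.normal(size=(4, 2)), np.zeros(2)   # output layer: 2 neurons

h1 = relu(x @ W1 + b1)        # weighted sum + activation, hidden layer 1
h2 = relu(h1 @ W2 + b2)       # weighted sum + activation, hidden layer 2
output = h2 @ W3 + b3         # output layer (activation depends on the task)

print("network output:", np.round(output, 3))
```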
Part C
16. (a) The values of x and their corresponding values of y are shown in the
table below.

X: 1, 2, 3, 4, 5, 6, 7
Y: 3, 4, 5, 6, 8, 10
i)Find the least square regression line y = ax + b
Step 1: Calculate the mean (average) of X and Y.
Mean of X (X̄) = (1 + 2 + 3 + 4 + 5 + 6 + 7) / 7 = 28 / 7 = 4
Mean of Y (Ȳ) = (3 + 4 + 5 + 6 + 8 + 10) / 6 = 36 / 6 = 6
Step 2: Calculate the deviations from the mean for both X and Y.
Deviation from X̄ (ΔX) = X - X̄ Deviation from Ȳ (ΔY) = Y - Ȳ
Now, calculate ΔX and ΔY for each data point:
X: 1 2 3 4 5 6 7
Y: 3 4 5 6 8 10
ΔX: (-3) (-2) (-1) (0) (1) (2) (3)
ΔY: (-3) (-2) (-1) (0) (2) (4)
Step 3: Calculate the product of ΔX and ΔY, as well as the squared values of ΔX.
ΔXΔY: (3) (4) (1) (0) (2) (8) (12)
ΔX^2: 9 4 1 0 1 4 9
Step 4: Calculate 'a' using the formula:
a = Σ(ΔXΔY) / Σ(ΔX^2)
a = (3 + 4 + 1 + 0 + 2 + 8 + 12) / (9 + 4 + 1 + 0 + 1 + 4 + 9) a = 30 / 28 ≈ 1.0714
Step 5: Calculate 'b' using the formula:
b = Ȳ - (a * X̄)
b = 6 - (1.0714 * 4) b ≈ 6 - 4.2856 ≈ 1.7143
So, the least squares regression line is approximately:
y ≈ 1.0714x + 1.7143
(ii) Estimate the value of 'y' when 'x' = 10:
y ≈ 1.0714 * 10 + 1.7143 y ≈ 10.7143 + 1.7143 y ≈ 12.4286
So, when x = 10, the estimated value of y is approximately 12.4286.
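The formulas used in these steps can be wrapped in a small Python function; the sample values in the usage lines are illustrative and not necessarily the exam data.

```python
def least_squares_line(xs, ys):
    """Fit y = a*x + b using a = sum(dx*dy) / sum(dx^2), b = mean(y) - a*mean(x)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    a = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
        sum((x - x_bar) ** 2 for x in xs)
    b = y_bar - a * x_bar
    return a, b

# Usage on a hypothetical complete data set (for illustration only).
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [2, 3, 4, 5, 6, 8, 10]
a, b = least_squares_line(xs, ys)
print("y = %.4fx + %.4f" % (a, b))
print("estimate at x = 10:", round(a * 10 + b, 4))
```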

(b) Consider five points {x1, x2, x3, x4, x5} with the following coordinates as a two-dimensional
sample for clustering:
x1 = (0.5, 1.75), x2 = (1, 2), x3 = (1.75, 0.25), x4 = (4, 1), x5 = (6, 3)
Illustrate the k-means algorithm on the above data set. The required number of
clusters is two, and initially, clusters are formed from a random distribution of samples:
C1 = {x1, x2, x4} and C2 = {x3, x5}.

Step 1: Initialize the cluster centroids. Here the initial clusters C1 and C2 are already given,
so each initial centroid is the mean of the points in its cluster.
C1: {x1, x2, x4} C2: {x3, x5}
Initial centroid for C1 (μ1): ((0.5 + 1 + 4) / 3, (1.75 + 2 + 1) / 3) ≈ (1.833, 1.583)
Initial centroid for C2 (μ2): ((1.75 + 6) / 2, (0.25 + 3) / 2) = (3.875, 1.625)
Step 2: Assign each point to the nearest centroid based on Euclidean distance.
x1 = (0.5, 1.75): distance to μ1 ≈ 1.34, to μ2 ≈ 3.38 → assign to C1.
x2 = (1, 2): distance to μ1 ≈ 0.93, to μ2 ≈ 2.90 → assign to C1.
x3 = (1.75, 0.25): distance to μ1 ≈ 1.34, to μ2 ≈ 2.53 → assign to C1.
x4 = (4, 1): distance to μ1 ≈ 2.24, to μ2 ≈ 0.64 → assign to C2.
x5 = (6, 3): distance to μ1 ≈ 4.40, to μ2 ≈ 2.53 → assign to C2.
Updated clusters: C1: {x1, x2, x3} C2: {x4, x5}
Step 3: Recalculate the centroids for each cluster.
New centroid for C1 (μ1): ((0.5 + 1 + 1.75) / 3, (1.75 + 2 + 0.25) / 3) ≈ (1.083, 1.333)
New centroid for C2 (μ2): ((4 + 6) / 2, (1 + 3) / 2) = (5, 2)
Step 4: Repeat steps 2 and 3 until convergence, i.e., until the centroids (or the assignments
of points to clusters) no longer change.
Re-assigning the points to the new centroids:
 x1, x2 and x3 remain closest to μ1, so they stay in C1.
 x4 and x5 remain closest to μ2, so they stay in C2.
The assignments have not changed, so the centroids do not change either. Therefore, the
algorithm has converged.
Final clusters: C1: {x1, x2, x3} C2: {x4, x5}
Final centroids: μ1 ≈ (1.083, 1.333) μ2 = (5, 2)
The k-means algorithm has converged, and the data points have been clustered into two
groups based on their proximity to the centroids.
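The same computation can be verified with a short NumPy sketch of k-means on the five points; the loop bound of 10 iterations is an arbitrary safety limit.

```python
import numpy as np

# The five sample points from the question.
points = np.array([[0.5, 1.75], [1.0, 2.0], [1.75, 0.25], [4.0, 1.0], [6.0, 3.0]])
# Initial centroids: means of the initial clusters C1 and C2.
centroids = np.array([points[[0, 1, 3]].mean(axis=0),   # C1 = {x1, x2, x4}
                      points[[2, 4]].mean(axis=0)])     # C2 = {x3, x5}

for _ in range(10):                                     # k-means iterations
    # Assignment step: index of the nearest centroid for every point.
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    # Update step: recompute each centroid as the mean of its assigned points.
    new_centroids = np.array([points[labels == k].mean(axis=0) for k in range(2)])
    if np.allclose(new_centroids, centroids):           # convergence check
        break
    centroids = new_centroids

print("labels:   ", labels)          # 0 -> C1, 1 -> C2
print("centroids:", np.round(centroids, 3))
```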
