
ACADEMICS

Document ID 2023-24/RPSIT/AI&DS/LM/02 Document Name Lab Manual

Subject Name Machine Learning Laboratory Subject Code AD3461


Department Artificial Intelligence & Data Science Year / Sem II / IV

Ex. No.: 1
For a given set of training data examples stored in a .CSV file, implement and demonstrate the
Candidate-Elimination algorithm to output a description of the set of all hypotheses
consistent with the training examples.

Aim:
To implement and demonstrate the Candidate-Elimination algorithm to output a
description of the set of all hypotheses consistent with the training examples stored in a .CSV file.

Algorithm:

Input: Dataset (features and target class)


Output: Specific hypothesis (specific_h), General hypotheses (general_h)

1. Load Dataset from 'Ex1_data.csv'


2. Separate 'concepts' (features) and 'target' (class labels) from Dataset
3. Initialize specific_h with the first instance of concepts
4. Initialize general_h with a list containing a hypothesis as general as possible (all "?")
5. For each instance and its corresponding target in the Dataset:
a. If target is "yes" (positive example):
i. For each feature in specific_h:
- If the feature value does not match the instance's feature value, set it to "?"
ii. Update each hypothesis in general_h:
- Set the feature to "?" if it does not match the instance
iii. Prune general_h:
- Keep only those hypotheses that are as general as or more general than specific_h
b. If target is "no" (negative example):
i. Initialize a temporary list general_h_new
ii. For each hypothesis in general_h:
- For each feature in the hypothesis:
• If the feature is "?", generate new hypotheses for each possible value except the
instance's value
• If the feature specifies a value different from the instance, copy the hypothesis to
general_h_new
iii. Update general_h to be general_h_new
iv. Prune general_h:
- Keep only those hypotheses that are at least as general as specific_h
6. Final Pruning of general_h:
a. Remove completely general hypotheses (all "?") from general_h, as they don't
contribute to distinguishing instances

7. Return specific_h and general_h as the final specific and general hypotheses consistent with the
training data

End Algorithm
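
The steps above can be sketched in Python as follows. This is a minimal sketch, not the manual's own program: it uses the simplified bookkeeping common in lab versions of this exercise (one candidate general hypothesis per attribute) instead of a full version-space search, and it assumes the first training row is a positive example. The function name candidate_elimination and the inlined CSV_TEXT (a copy of the Ex1_data.csv rows listed in this exercise) are illustrative choices.

```python
import csv
import io

# Rows of Ex1_data.csv, inlined so the sketch runs without a file on disk.
CSV_TEXT = """Sky,Temp,Humidity,Wind,Water,Forecast,EnjoySport
sunny,warm,normal,strong,warm,same,yes
sunny,warm,high,strong,warm,same,yes
rainy,cold,high,strong,warm,change,no
sunny,warm,high,strong,cool,change,yes
"""


def candidate_elimination(concepts, targets):
    """Simplified Candidate-Elimination (assumes concepts[0] is positive)."""
    n = len(concepts[0])
    specific_h = list(concepts[0])
    # One candidate general hypothesis per attribute, all starting as "?".
    general_h = [["?"] * n for _ in range(n)]
    for instance, label in zip(concepts, targets):
        if label == "yes":
            # Positive example: generalize every attribute that disagrees.
            for i in range(n):
                if instance[i] != specific_h[i]:
                    specific_h[i] = "?"
                    general_h[i][i] = "?"
        else:
            # Negative example: specialize on attributes where the instance
            # differs from specific_h, so the instance is excluded.
            for i in range(n):
                if instance[i] != specific_h[i]:
                    general_h[i][i] = specific_h[i]
                else:
                    general_h[i][i] = "?"
    # Final pruning: drop hypotheses that are "?" everywhere.
    general_h = [h for h in general_h if any(v != "?" for v in h)]
    return specific_h, general_h


rows = list(csv.reader(io.StringIO(CSV_TEXT)))[1:]  # skip the header row
concepts = [r[:-1] for r in rows]
targets = [r[-1] for r in rows]
specific_h, general_h = candidate_elimination(concepts, targets)
print("Final Specific Hypothesis:", specific_h)
print("Final General Hypotheses:", general_h)
```

To read from the actual file, replace io.StringIO(CSV_TEXT) with an opened 'Ex1_data.csv' handle.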

Programme:
Data Set Used (Ex1_data.csv):

Sky,Temp,Humidity,Wind,Water,Forecast,EnjoySport
sunny,warm,normal,strong,warm,same,yes
sunny,warm,high,strong,warm,same,yes
rainy,cold,high,strong,warm,change,no
sunny,warm,high,strong,cool,change,yes

Output :

Final Specific Hypothesis: ['sunny' 'warm' '?' 'strong' '?' '?']


Final General Hypotheses: [['sunny', '?', '?', '?', '?', '?'], ['?', 'warm', '?', '?', '?', '?']]
Ex. No.: 2

Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an
appropriate data set for building the decision tree and apply this knowledge to classify a new
sample.
Aim:
To demonstrate the working of the decision tree based ID3 algorithm with an appropriate data set
for building the decision tree and to apply the knowledge to classify a new sample.
Algorithm:

Input: Dataset (features and target class)


Output: Decision tree

1. Load Dataset from 'Ex2_data.csv'

- Dataset, Features = Load_CSV('Ex2_data.csv')

2. Initialize the root of the tree


- Node root = new Node

3. Function Build_Tree(Data, Features)


a. If all instances in Data have the same target value:
- Create a leaf node with this target value
b. Else:
i. Compute Gain for each feature in Data
ii. Choose the Feature with the highest Gain as Split_Feature
iii. Create a new Node with Split_Feature
iv. For each possible value of Split_Feature:
- Create Subtables of Data where Split_Feature equals this value
- Recursively call Build_Tree on each Subtable with remaining Features
- Add the result as a child of Node

4. Recursively build the decision tree


- DecisionTree = Build_Tree(Dataset, Features)

5. Function Classify(Node, Instance, Features)


a. If Node is a leaf:
- Return the answer of Node
b. Else:
i. Find the child of Node that corresponds to the value of Node's attribute in Instance
ii. Recursively call Classify with this child, Instance, and Features

6. Classify new instances


- For each Instance in testdata:
- Classification = Classify(DecisionTree, Instance, Features)
- Print "The label for test instance: ", Classification

7. Procedure Load_CSV(filename)
Open filename as CSV
Read lines into dataset
Extract headers
Return dataset and headers

8. Procedure Subtables(data, col, delete)


For each unique value in column col:
Create a subset of data where column col equals this value
If delete is true, remove column col from this subset
Store this subset in a dictionary keyed by the unique value

9. Procedure Entropy(S)
Calculate and return the entropy of the set S

10. Procedure Compute_Gain(data, col)


For each subtable created by splitting data on col:
Calculate the entropy and weight it by the subtable's share of the rows
Compute the information gain as the entropy of data minus the weighted sum of the subtable entropies
Return the information gain

11. Procedure Print_Tree(node, level)


If node is a leaf, print its label
Else, print the node's attribute and recursively print each child indented
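
The procedures above can be sketched compactly in Python. This is a minimal sketch under stated assumptions, not the manual's program: rows are held as dictionaries, a leaf is a plain label string, an internal node is a dictionary with "attribute" and "children" keys, and the names entropy, subtables, compute_gain, build_tree and classify are illustrative. The training rows are the Ex2_data.csv rows listed in this exercise, inlined so the block is self-contained.

```python
import csv
import io
import math
from collections import Counter

CSV_TEXT = """Outlook,Temperature,Humidity,Wind,Answer
sunny,hot,high,weak,no
sunny,hot,high,strong,no
overcast,hot,high,weak,yes
rain,mild,high,weak,yes
rain,cool,normal,weak,yes
rain,cool,normal,strong,no
overcast,cool,normal,strong,yes
sunny,mild,high,weak,no
sunny,cool,normal,weak,yes
rain,mild,normal,weak,yes
sunny,mild,normal,strong,yes
overcast,mild,high,strong,yes
overcast,hot,normal,weak,yes
rain,mild,high,strong,no
"""
TARGET = "Answer"


def entropy(rows):
    """Shannon entropy (base 2) of the target labels in rows."""
    counts = Counter(r[TARGET] for r in rows)
    return -sum((c / len(rows)) * math.log2(c / len(rows)) for c in counts.values())


def subtables(rows, feature):
    """Group rows by their value for feature."""
    parts = {}
    for r in rows:
        parts.setdefault(r[feature], []).append(r)
    return parts


def compute_gain(rows, feature):
    """Entropy before the split minus the weighted entropy after it."""
    after = sum(len(p) / len(rows) * entropy(p) for p in subtables(rows, feature).values())
    return entropy(rows) - after


def build_tree(rows, features):
    labels = [r[TARGET] for r in rows]
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]  # leaf: (majority) label
    best = max(features, key=lambda f: compute_gain(rows, f))
    remaining = [f for f in features if f != best]
    children = {v: build_tree(p, remaining) for v, p in subtables(rows, best).items()}
    return {"attribute": best, "children": children}


def classify(node, instance):
    # Raises KeyError for an attribute value never seen during training.
    while isinstance(node, dict):
        node = node["children"][instance[node["attribute"]]]
    return node


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
features = [name for name in rows[0] if name != TARGET]
tree = build_tree(rows, features)
test_instance = {"Outlook": "rain", "Temperature": "cool", "Humidity": "normal", "Wind": "strong"}
print("The label for test instance:", classify(tree, test_instance))
```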

Programme:
Data Set Used:

Ex2_data.csv

Outlook,Temperature,Humidity,Wind,Answer
sunny,hot,high,weak,no
sunny,hot,high,strong,no
overcast,hot,high,weak,yes
rain,mild,high,weak,yes
rain,cool,normal,weak,yes
rain,cool,normal,strong,no
overcast,cool,normal,strong,yes
sunny,mild,high,weak,no
sunny,cool,normal,weak,yes
rain,mild,normal,weak,yes
sunny,mild,normal,strong,yes
overcast,mild,high,strong,yes
overcast,hot,normal,weak,yes
rain,mild,high,strong,no

Ex2_data_test.csv

Outlook,Temperature,Humidity,Wind
rain,cool,normal,strong
sunny,mild,normal,strong

Output :

The decision tree for the dataset using ID3 algorithm is

Outlook
  sunny
    Humidity
      normal
        yes
      high
        no
  rain
    Wind
      weak
        yes
      strong
        no
  overcast
    yes
The test instance: ['rain', 'cool', 'normal', 'strong']
The label for test instance: no
The test instance: ['sunny', 'mild', 'normal', 'strong']
The label for test instance: yes
Ex. No.: 3

Build an Artificial Neural Network by implementing the Backpropagation algorithm and test
the same using appropriate data sets.

Aim:
To build an Artificial Neural Network by implementing the Backpropagation algorithm and to
test the same using appropriate data sets.

Algorithm:


Programme:

Output:
Ex. No.: 4

Write a program to implement the naïve Bayesian classifier for a sample training data set
stored as a .CSV file and compute the accuracy with a few test data sets.

Aim:
To implement the naïve Bayesian classifier for a sample training data set stored as a .CSV file and
to compute the accuracy with a few test data sets.

Algorithm:
1. Load the CSV file and preprocess the data:
a. Read data from the CSV file.
b. Convert each string in the data to a floating point number.
2. Split the preprocessed data into a training set and a testing set:
a. Determine the size of the training set based on the given ratio.
b. Shuffle the data randomly.
c. Split the data into training and testing sets accordingly.
3. Separate the training data by class:
a. Create a dictionary to hold data for each class.
b. Sort data into the dictionary based on the class.
4. Summarize the dataset:
a. Calculate the mean and standard deviation for each attribute in the training set.
b. Store the summaries for each class.
5. Calculate probabilities:
a. Calculate the probability of a given data point belonging to each class.
6. Make predictions:
a. For each data point in the testing set, calculate class probabilities and choose the class with
the highest probability.
7. Evaluate the classifier:
a. Compare the predictions against the actual labels in the testing set to calculate accuracy.
b. Print the confusion matrix and classification report.
8. Execute the main process:
a. Call the functions in order to perform the classification.
b. Print out the accuracy and metrics.
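
Steps 3 to 7 can be sketched in plain Python as below. This is a minimal sketch, not the manual's program. Assumptions: the toy rows are hypothetical values made up for illustration (they are not the elided rows of the data set), the train/test split is fixed instead of shuffled so the run is reproducible, class priors are included in the score, log-densities are used to avoid underflow, and every attribute is assumed to have non-zero standard deviation within each class. All function names are illustrative.

```python
import math
from collections import defaultdict

# Hypothetical toy rows: (attribute 1, attribute 2, class label).
train = [(1.0, 2.1, 0), (1.2, 1.9, 0), (0.9, 2.3, 0), (1.1, 2.0, 0),
         (5.0, 6.1, 1), (5.2, 5.8, 1), (4.9, 6.2, 1), (5.1, 6.0, 1)]
test = [(1.05, 2.05, 0), (5.05, 6.0, 1), (0.95, 2.2, 0), (5.15, 5.9, 1)]


def separate_by_class(rows):
    """Step 3: map each class label to the list of its attribute vectors."""
    groups = defaultdict(list)
    for *features, label in rows:
        groups[label].append(features)
    return groups


def mean(xs):
    return sum(xs) / len(xs)


def stdev(xs):
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def summarize(rows):
    """Step 4: per class, store the prior and (mean, stdev) of each attribute."""
    model = {}
    for label, feats in separate_by_class(rows).items():
        stats = [(mean(col), stdev(col)) for col in zip(*feats)]
        model[label] = (len(feats) / len(rows), stats)
    return model


def log_gaussian(x, mu, sigma):
    """Log of the normal density, computed directly to avoid underflow."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def predict(model, features):
    """Steps 5 and 6: pick the class with the highest log-posterior score."""
    scores = {}
    for label, (prior, stats) in model.items():
        scores[label] = math.log(prior) + sum(
            log_gaussian(x, mu, sigma) for x, (mu, sigma) in zip(features, stats))
    return max(scores, key=scores.get)


model = summarize(train)
predictions = [predict(model, row[:-1]) for row in test]
# Step 7: fraction of test rows whose predicted label equals the true label.
accuracy = sum(p == row[-1] for p, row in zip(predictions, test)) / len(test)
print("Predictions:", predictions)
print("Accuracy:", accuracy)
```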
Programme:
Data set:

1.5,2.2,0
1.6,2.1,0
1.7,2.3,0
1.8,2.2,0
............
.............

Output:
Ex. No.: 5

Implement naïve Bayesian Classifier model to classify a set of documents and measure the
accuracy, precision, and recall.

Aim:
To implement the naïve Bayesian classifier model to classify a set of documents and to measure the
accuracy, precision, and recall.

Algorithm:

1. Import the necessary libraries for data handling, machine learning, and evaluation.
2. Load the dataset containing messages and their corresponding labels.
3. Convert the text labels to numerical form (encode 'pos' as 1 and 'neg' as 0).
4. Assign the messages to X and their encoded labels to y.
5. Split X and y into training and testing sets.
6. Create a CountVectorizer object to convert text messages into a matrix of token counts.
7. Fit the vectorizer to the training data and transform the training messages into a document-term
matrix (DTM).
8. Transform the testing messages into a DTM using the same vectorizer.
9. Initialize the MultinomialNB classifier.
10. Train the classifier using the training DTM and the corresponding labels.
11. Use the trained classifier to predict the labels of the testing data.
12. Evaluate the classifier by comparing the predicted labels to the true labels from the testing set.
13. Print the accuracy, confusion matrix, precision, and recall to assess the classifier's performance.
14. End the algorithm.
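
The evaluation in steps 12 and 13 can be illustrated without any library. The sketch below is an assumption-laden illustration, not the manual's program: the true and predicted labels are hypothetical values chosen only to exercise the formulas (they are not the output of a trained MultinomialNB), and the helper names encode, confusion_counts and evaluate are made up for this example. Labels are encoded as in step 3 ('pos' as 1, 'neg' as 0).

```python
def encode(label):
    """Step 3 encoding: 'pos' -> 1, 'neg' -> 0."""
    return 1 if label == "pos" else 0


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def evaluate(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Precision: of the messages predicted 'pos', how many really are.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Recall: of the messages that really are 'pos', how many were found.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # Rows are true classes (0, 1); columns are predicted classes (0, 1).
        "confusion_matrix": [[tn, fp], [fn, tp]],
    }


# Hypothetical labels for eight test messages.
true_labels = ["pos", "pos", "pos", "neg", "neg", "neg", "pos", "neg"]
pred_labels = ["pos", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]
y_true = [encode(label) for label in true_labels]
y_pred = [encode(label) for label in pred_labels]
metrics = evaluate(y_true, y_pred)
for name, value in metrics.items():
    print(name, value)
```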
Program:

Data Set:

I love this sandwich,pos
This is an amazing place,pos
I feel very good about these beers,pos
This is my best work,pos
What an awesome view,pos
I do not like this restaurant,neg
I am tired of this stuff,neg
I can't deal with this,neg
He is my sworn enemy,neg
My boss is horrible,neg
This is an awesome place,pos
I do not like the taste of this juice,neg
I love to dance,pos
I am sick and tired of this place,neg
What a great holiday,pos
That is a bad locality to stay,neg
We will have good fun tomorrow,pos
I went to my enemy's house today,neg
Output:
Ex. No.: 6
Write a program to construct a Bayesian network to diagnose CORONA infection using
standard WHO Data Set.

Aim:
To construct a Bayesian network to diagnose CORONA infection using the standard WHO data set.

Algorithm:

1. Import required libraries: numpy, pandas, pgmpy.
2. Load the dataset:
a) Read the CSV file containing the data into a pandas DataFrame.
b) Replace missing values represented by '?' with NaN.
3. Display initial data:
a) Print the first few instances of the dataset.
b) Print the attributes and their data types from the DataFrame.
4. Define the structure of the Bayesian Network:
a) Specify the nodes and edges between them representing the conditional dependencies (for example,
'Fever' and 'Cough' leading to 'InfectionStatus').
5. Learn the Conditional Probability Distributions (CPDs) using Maximum Likelihood Estimation:
a) Fit the Bayesian Network model to the data using Maximum Likelihood Estimator.
6. Perform inference using the Bayesian Network:
a) Initialize the Variable Elimination process on the model.
7. Compute the probability of infection given evidence:
a) Query the Bayesian Network for the probability of 'InfectionStatus' given evidence (such as
'Fever' and 'Cough').
b) Print the results of the query to output the probability of infection.
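
To make the CPD-learning and query steps concrete, the sketch below works through them by hand for the example structure 'Fever' and 'Cough' leading to 'InfectionStatus'. It is only an illustration under several assumptions: it does not use pgmpy, it answers queries by brute-force enumeration of the joint distribution instead of Variable Elimination, the ten records are hypothetical toy values (not WHO data), and it assumes every Fever/Cough combination occurs at least once so the counts can be divided. The names fit_cpds and query_infection are made up for this example.

```python
from collections import Counter
from itertools import product

# Hypothetical toy records: (Fever, Cough, InfectionStatus), each 0 or 1.
records = [(1, 1, 1), (1, 1, 1), (1, 1, 0), (1, 0, 1), (1, 0, 0),
           (0, 1, 0), (0, 1, 1), (0, 0, 0), (0, 0, 0), (0, 0, 0)]


def fit_cpds(rows):
    """Maximum likelihood estimates: relative frequencies from the data."""
    n = len(rows)
    fever = Counter(f for f, _, _ in rows)
    cough = Counter(c for _, c, _ in rows)
    parents = Counter((f, c) for f, c, _ in rows)
    joint = Counter(rows)
    return {
        "Fever": {v: fever[v] / n for v in (0, 1)},
        "Cough": {v: cough[v] / n for v in (0, 1)},
        # P(InfectionStatus = i | Fever = f, Cough = c), keyed by (f, c, i).
        "InfectionStatus": {(f, c, i): joint[(f, c, i)] / parents[(f, c)]
                            for f, c, i in product((0, 1), repeat=3)},
    }


def query_infection(cpds, evidence):
    """P(InfectionStatus | evidence) by summing the joint over unobserved parents."""
    scores = {0: 0.0, 1: 0.0}
    for f, c, i in product((0, 1), repeat=3):
        assignment = {"Fever": f, "Cough": c}
        if any(assignment[name] != value for name, value in evidence.items()):
            continue  # inconsistent with the observed evidence
        scores[i] += (cpds["Fever"][f] * cpds["Cough"][c]
                      * cpds["InfectionStatus"][(f, c, i)])
    total = sum(scores.values())
    return {i: s / total for i, s in scores.items()}


cpds = fit_cpds(records)
print(query_infection(cpds, {"Fever": 1}))
print(query_infection(cpds, {"Fever": 1, "Cough": 1}))
```

When both parents are observed, the query reduces to a single row of the InfectionStatus CPD; with only 'Fever' observed, 'Cough' is summed out using its marginal.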
Program:

Output:
Ex. No.: 7
Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for
clustering using the k-Means algorithm. Compare the results of these two algorithms.

Aim:
To apply EM algorithm to cluster a set of data stored in a .CSV file, to use the same data set
for clustering using the k-Means algorithm and to compare the results of these two
algorithms.

Algorithm:

1. Begin by importing necessary libraries for data handling, clustering, and plotting.
2. Load the dataset 'Ex7_data.csv' using pandas into a DataFrame iris_csv.
3. Prepare the data for clustering:
a) Remove the 'Target' column from iris_csv and assign to X.
b) Map string labels in 'Target' to integers using target_mapping and assign to y.
4. Implement K-Means clustering:
a) Create a KMeans object with 3 clusters and fit it to the data X.
5. Plot the real classifications of the dataset:
a) Initialize a subplot.
b) Plot 'Petal_Length' vs 'Petal_Width' using the actual labels y.
6. Plot the K-Means cluster assignments:
a) Initialize a second subplot.
b) Plot 'Petal_Length' vs 'Petal_Width' using the labels from K-Means.
7. Standardize the data:
a) Scale the features in X using StandardScaler and assign to xs.
8. Implement Gaussian Mixture Model (GMM) clustering:
a) Create a GMM object with 3 components and fit it to the standardized data xs.
b) Predict the labels using GMM and assign to y_gmm.
9. Plot the GMM cluster assignments:
a) Initialize a new figure.
b) Plot 'Petal_Length' vs 'Petal_Width' using the labels from GMM.
10. Evaluate the clustering models:
a) Print accuracy scores and confusion matrices for K-Means and GMM using true labels y
and predicted labels from the models.
11. Display the plots.
12. End.
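
The difference between the two clustering methods can be seen in a small one-dimensional sketch. This is an illustration under stated assumptions, not the manual's program: it uses plain Python instead of scikit-learn, two clusters instead of three, hypothetical toy values instead of the 'Ex7_data.csv' data, a deterministic initialization (smallest and largest value), and a fixed number of EM iterations. k-Means assigns each point to exactly one center, while EM for a Gaussian mixture keeps soft responsibilities and only takes the most probable component at the end.

```python
import math

# Hypothetical one-dimensional toy data with two well-separated groups.
xs = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]


def kmeans_1d(xs, iters=100):
    centers = [min(xs), max(xs)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(2), key=lambda k: abs(x - centers[k])) for x in xs]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for k in range(2):
            members = [x for x, lab in zip(xs, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels, centers


def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(xs, iters=50):
    n = len(xs)
    means = [min(xs), max(xs)]
    grand_mean = sum(xs) / n
    variances = [sum((x - grand_mean) ** 2 for x in xs) / n] * 2
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var, 1e-6)  # floor to avoid a collapsed component
            weights[k] = nk / n
    labels = [max(range(2), key=lambda k: r[k]) for r in resp]
    return labels, means


km_labels, km_centers = kmeans_1d(xs)
gmm_labels, gmm_means = em_gmm_1d(xs)
agreement = sum(a == b for a, b in zip(km_labels, gmm_labels)) / len(xs)
print("k-Means labels:", km_labels, "centers:", km_centers)
print("EM (GMM) labels:", gmm_labels, "means:", gmm_means)
print("Agreement between the two labelings:", agreement)
```

Cluster numbers are arbitrary in both methods, so when comparing against true class labels (as in step 10) the cluster indices generally need to be matched to the classes first.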
Programme:
Output :
Ex. No.: 8

Write a program to implement k-Nearest Neighbour algorithm to classify the iris data
set. Print both correct and wrong predictions.

Aim:
To implement k-Nearest Neighbour algorithm to classify the iris data set and to print
both correct and wrong predictions.

Algorithm:
Programme:

Output:
Ex. No.: 9
Implement the non-parametric Locally Weighted Regression algorithm in order to
fit data points. Select an appropriate data set for your experiment and draw graphs.

Aim:
To implement the non-parametric Locally Weighted Regression algorithm in order to fit
data points by selecting an appropriate data set for the experiment and to draw graphs.

Algorithm:

1. Import required libraries (numpy for calculations, matplotlib for plotting).


2. Define a function local_regression to perform LWR:
a) Augment the query point x0 and input dataset X with a bias term.
b) Calculate weights for each instance in X using the Gaussian kernel centered at x0.
c) Compute the weighted least squares solution (beta) using the normal equation.
d) Return the predicted value for x0 using the calculated coefficients.
3. Define a function plot_lwr to visualize the regression:
a) Create a range domain spanning the scope of X.
b) Calculate predictions over domain using LWR for visualization.
c) Plot the original dataset X and Y.
d) Overlay the plot with LWR predictions.
e) Display the plot with appropriate legends and axis labels.
4. Generate a synthetic dataset X and corresponding targets Y with added noise.
5. Apply the plot_lwr function for different values of the bandwidth parameter tau.
6. End.
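
Step 2 can be sketched for a single input feature without numpy. This is a minimal sketch under assumptions, not the manual's program: with one feature plus a bias term, the weighted normal equation is a 2x2 system, which is solved here in closed form; the data is a noise-free sine curve chosen for reproducibility (the manual's data set adds noise); and plotting is omitted. The name local_regression follows the algorithm above, while its exact signature is an assumption.

```python
import math


def local_regression(x0, xs, ys, tau):
    """Predict y at x0 with a locally weighted straight-line fit."""
    # Gaussian kernel weights centered at the query point x0.
    weights = [math.exp(-((x - x0) ** 2) / (2 * tau ** 2)) for x in xs]
    # Entries of X^T W X and X^T W y for the design matrix rows [1, x].
    s0 = sum(weights)
    s1 = sum(w * x for w, x in zip(weights, xs))
    s2 = sum(w * x * x for w, x in zip(weights, xs))
    t0 = sum(w * y for w, y in zip(weights, ys))
    t1 = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    # Solve the 2x2 normal equations for beta = (intercept, slope).
    det = s0 * s2 - s1 * s1
    beta0 = (s2 * t0 - s1 * t1) / det
    beta1 = (s0 * t1 - s1 * t0) / det
    return beta0 + beta1 * x0


# Noise-free sine samples on [0, 10], used only for illustration.
xs = [i * 0.5 for i in range(21)]
ys = [math.sin(x) for x in xs]
for tau in (0.3, 1.0, 3.0):
    print("tau =", tau, "prediction at x0 = 5.0:", local_regression(5.0, xs, ys, tau))
```

A small tau makes the fit follow nearby points closely, while a large tau approaches an ordinary straight-line fit over the whole data set.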
Programme:

Output:
