ML - LAB - FILE Pankaj

MACHINE LEARNING USING PYTHON
CAP4013L
School of Engineering & Sciences
Department of Computer Sciences and Engineering
Practical File
Submitted By
Student Name Pankaj Kumar
Enrolment Number 220160307085
Programme Master of Computer Application
Department Computer Science and Engineering
Session/Semester 2022-2024/Third Semester
Submitted To
Faculty Name Dr. Apeksha Mittal
INDEX
S.no Aim of Experiment Date Sign

1. Download a dataset from Kaggle (.csv format, at least 1000 rows and 20 columns) and write a program in the Python programming language to perform the following operations (19-OCT-2023):
i) Read the dataset file in Python IDE.
ii) Display the dataset.
iii) Display the shape of the dataset.
iv) Display the datatypes of the attributes of the dataset.
v) Find out the mean, median and mode of all the numeric columns.
vi) Describe the entire dataset in terms of count, min, max, standard deviation, variance, etc.
2. Write a program in Python to implement Linear Regression. (30-OCT-2023)
3. Write a program in Python to implement Binary Logistic Regression on a dataset downloaded from Kaggle. (03-NOV-2023)
4. Write a program in Python to implement Naïve Bayes on the iris dataset. Study the confusion matrix. (08-NOV-2023)
5. Write a program in Python to implement the Naïve Bayes algorithm on a dataset from Kaggle. Also print the Confusion Matrix, Accuracy, Precision and Recall. (16-NOV-2023)
6. Write a program in Python to implement Support Vector Machine on the iris dataset. (21-NOV-2023)
1. Download a dataset from Kaggle (.csv format, at least 1000 rows and 20 columns) and write a program in the Python programming language to perform the following operations:
i) Read the dataset file in Python IDE.
ii) Display the dataset.
iii) Display the shape of the dataset.
iv) Display the datatypes of the attributes of the dataset.
v) Find out the mean, median and mode of all the numeric columns.
vi) Describe the entire dataset in terms of count, min, max, standard deviation, variance, etc.
# Import necessary libraries

import pandas as pd
i) Read the dataset file in Python IDE
# Replace 'match.csv' with the actual path to the downloaded file
file_path = 'match.csv'
df = pd.read_csv(file_path)
ii) Display the dataset
print("Dataset:")
print(df)
iii) Display the shape of the dataset
print("\nShape of the dataset:")
print(df.shape)
iv) Display the datatypes of the attributes of the dataset
print("\nDatatypes of the attributes:")
print(df.dtypes)
v) Find out the mean, median, and mode of all the numeric columns
# Restrict to numeric columns; mean() and median() raise an error on text columns in recent pandas versions
numeric_df = df.select_dtypes(include='number')
print("\nMean of numeric columns:")
print(numeric_df.mean())
print("\nMedian of numeric columns:")
print(numeric_df.median())
print("\nMode of numeric columns:")
print(numeric_df.mode().iloc[0])  # first mode when a column has several
vi) Describe the entire dataset in terms of count, min, max, standard deviation, variance, etc.
print("\nSummary statistics of the dataset:")
print(df.describe())
# describe() does not report variance, so print it separately
print("\nVariance of numeric columns:")
print(numeric_df.var())
2. Write a program in Python to implement Linear Regression
import numpy as np
import matplotlib.pyplot as plt
# Generate some random data for demonstration purposes
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
# Visualize the data
plt.scatter(X, y)
plt.xlabel('X')
plt.ylabel('y')
plt.title('Generated Data for Linear Regression')
plt.show()
# Linear Regression using the Normal Equation: theta = (X^T X)^(-1) X^T y
X_b = np.c_[np.ones((100, 1)), X]  # Add bias term (a column of ones) to X
theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)
# Print the calculated parameters

print("Intercept (theta_0):", theta_best[0][0])
print("Slope (theta_1):", theta_best[1][0])
# Make predictions on new data

X_new = np.array([[0], [2]])
X_new_b = np.c_[np.ones((2, 1)), X_new]
y_predict = X_new_b.dot(theta_best)
# Plot the linear regression line

plt.plot(X_new, y_predict, "r-")
plt.scatter(X, y)
plt.xlabel('X')
plt.ylabel('y')
plt.title('Linear Regression Fit')
plt.show()
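The Normal Equation above forms an explicit matrix inverse. As a cross-check, the same parameters can be estimated with NumPy's least-squares solver. The following is a minimal sketch that regenerates the same synthetic data; the variable names theta_normal and theta_lstsq are introduced here for illustration and are not part of the original program.

```python
import numpy as np

# Same synthetic data as in the experiment: y = 4 + 3x + Gaussian noise
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
X_b = np.c_[np.ones((100, 1)), X]  # bias column followed by the feature

# Normal Equation, as used in the experiment
theta_normal = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)

# Least-squares solver: solves the same problem without an explicit inverse
theta_lstsq, residuals, rank, singular_values = np.linalg.lstsq(X_b, y, rcond=None)

print("Normal Equation:", theta_normal.ravel())
print("np.linalg.lstsq:", theta_lstsq.ravel())
print("Estimates agree:", np.allclose(theta_normal, theta_lstsq))
```

Both estimates should land close to the intercept 4 and slope 3 used to generate the data, with the difference caused by the added noise.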
3. Write a program in Python to implement Binary Logistic Regression on a dataset downloaded from Kaggle
This experiment uses the Titanic dataset from Kaggle.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score
# Load the Titanic dataset (replace 'Titanic.csv' with the actual file path)
df = pd.read_csv('Titanic.csv')
# Preprocess the data: keep only the relevant columns and drop rows with missing values
df = df[['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Survived']].dropna()
# Convert categorical variables to numerical using one-hot encoding

df = pd.get_dummies(df, columns=['Sex'], drop_first=True)
# Separate features and target variable

X = df.drop('Survived', axis=1)
y = df['Survived']
# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Standardize the features (optional but often recommended)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Create and train the Logistic Regression model

logreg_model = LogisticRegression()
logreg_model.fit(X_train, y_train)
# Make predictions on the test set

y_pred = logreg_model.predict(X_test)
# Evaluate the model

accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
# Print the results

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("Confusion Matrix:")
print(conf_matrix)
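4. Write a program in Python to implement Naïve Bayes on the iris dataset. Study the confusion matrix
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.model_selection import train_test_split
from sklearn import datasets
# Load the Iris dataset

iris = datasets.load_iris()
X = iris.data
y = iris.target
# Split the dataset into training and testing sets
# (test_size and random_state are assumed here, matching Experiment 3)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create a Naive Bayes model (Gaussian Naive Bayes for continuous features)
nb_model = GaussianNB()
# Train the model

nb_model.fit(X_train, y_train)
# Make predictions on the test set

y_pred = nb_model.predict(X_test)
# Evaluate the model

accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
# Display the results

print("Accuracy:", accuracy)
print("Confusion Matrix:")
print(conf_matrix)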
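One way to study the confusion matrix is to read per-class precision and recall directly from it. In scikit-learn's convention, rows are the true classes and columns are the predicted classes, so the diagonal holds the correct predictions. The sketch below is self-contained; the stratified 80/20 split and the variable names are assumptions made for illustration, not settings taken from the original program.

```python
import numpy as np
from sklearn import datasets
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()

# Assumed split: 20% test data, stratified so each class has 10 test samples
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)

nb_model = GaussianNB().fit(X_train, y_train)
conf_matrix = confusion_matrix(y_test, nb_model.predict(X_test))
print(conf_matrix)

# Rows = true classes, columns = predicted classes
correct = np.diag(conf_matrix)                  # correct predictions per class
recall = correct / conf_matrix.sum(axis=1)      # share of each true class that was found
precision = correct / conf_matrix.sum(axis=0)   # share of each predicted class that was right

for name, p, r in zip(iris.target_names, precision, recall):
    print(f"{name:<12} precision={p:.2f} recall={r:.2f}")
print("Overall accuracy:", correct.sum() / conf_matrix.sum())
```

Off-diagonal entries show which pairs of species the model confuses: entry (i, j) counts samples of true class i that were predicted as class j.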
5. Write a program in Python to implement the Naïve Bayes algorithm on a dataset from Kaggle. Also print the Confusion Matrix, Accuracy, Precision and Recall.

# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score
# Load the Titanic dataset (download it from Kaggle, or load it with the seaborn library)
# For example, using seaborn:
# import seaborn as sns
# df = sns.load_dataset('titanic')
# (the seaborn copy uses lowercase column names such as 'pclass' and 'survived')
# Assuming you have a 'Titanic.csv' file

df = pd.read_csv('Titanic.csv')
# Preprocess the data: keep only the relevant columns and drop rows with missing values
df = df[['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Survived']].dropna()
# Convert categorical variables to numerical using one-hot encoding

df = pd.get_dummies(df, columns=['Sex'], drop_first=True)
# Separate features and target variable

X = df.drop('Survived', axis=1)
y = df['Survived']
# Split the dataset into training and testing sets
# (test_size and random_state are assumed here, matching Experiment 3)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and train the Naive Bayes model (Gaussian Naive Bayes for numerical features)
naive_bayes = GaussianNB()
naive_bayes.fit(X_train, y_train)
# Make predictions on the test set

y_pred = naive_bayes.predict(X_test)
# Evaluate the model

conf_matrix = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
# Print the results

print("Confusion Matrix:")
print(conf_matrix)
print("\nAccuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
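For a binary problem such as survival prediction, accuracy, precision and recall all follow from the four cells of the confusion matrix. The sketch below uses a small set of hypothetical labels (not Titanic results) so the arithmetic can be checked by hand; the names y_true and y_hat are introduced only for this illustration.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels for illustration only (1 = survived, 0 = did not survive)
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
y_hat = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

# For labels {0, 1}, ravel() returns the cells in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)  # all correct / all samples
precision = tp / (tp + fp)                  # predicted positives that were right
recall = tp / (tp + fn)                     # actual positives that were found

print("TN, FP, FN, TP:", tn, fp, fn, tp)
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
```

With these labels the counts are TN = 4, FP = 1, FN = 2, TP = 3, giving accuracy 0.7, precision 0.75 and recall 0.6, which are the values accuracy_score, precision_score and recall_score would report for the same inputs.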
6. Write a program in Python to implement Support Vector Machine on the iris dataset.

# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix
# Load the Iris dataset

iris = datasets.load_iris()
X = iris.data
y = iris.target
# Split the dataset into training and testing sets
# (test_size and random_state are assumed here, matching Experiment 3)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and train the Support Vector Machine model

svm_model = SVC(kernel='linear') # You can try different kernels like 'rbf', 'poly', etc.
svm_model.fit(X_train, y_train)
# Make predictions on the test set

y_pred = svm_model.predict(X_test)
# Evaluate the model

accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
# Print the results

print("Accuracy:", accuracy)
print("Confusion Matrix:")
print(conf_matrix)
# Visualization (2D plot for simplicity, considering only the first two features)
# svm_model expects all four features, so a separate model is fitted on two features for the plot
svm_2d = SVC(kernel='linear')
svm_2d.fit(X[:, :2], y)
plt.figure(figsize=(8, 6))
# Plot the decision boundary

h = .02 # Step size in the mesh
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Z = svm_2d.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.8)
# Plot the points

scatter = plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.title('Support Vector Machine on Iris Dataset')
plt.legend(*scatter.legend_elements(), title='Classes')
plt.show()
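The comment on the SVC line suggests trying other kernels. A minimal sketch of such a comparison is given below; it is self-contained, and the 80/20 split, the random_state value and the results dictionary are assumptions introduced for illustration. All other SVC settings are left at their scikit-learn defaults.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()

# Assumed split: 80% training data, 20% test data
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Fit one model per kernel and record its accuracy on the test set
results = {}
for kernel in ("linear", "rbf", "poly"):
    model = SVC(kernel=kernel).fit(X_train, y_train)
    results[kernel] = accuracy_score(y_test, model.predict(X_test))
    print(f"{kernel:<6} test accuracy: {results[kernel]:.3f}")
```

A single split with 30 test samples gives only a rough comparison; cross-validation, for example with sklearn.model_selection.cross_val_score, would give a more reliable estimate for each kernel.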
