
MACHINE LEARNING USING PYTHON

CAP4013L
School of Engineering & Sciences

Department of Computer Sciences and Engineering

Practical File

Submitted By
Student Name Pankaj Kumar
Enrolment Number 220160307085
Programme Master of Computer Application
Department Computer Science and Engineering
Session/Semester 2022-2024/Third Semester

Submitted To
Faculty Name Dr. Apeksha Mittal
INDEX

S.no  Aim of Experiment                                              Date         Sign

1.    Download a dataset from Kaggle (.csv format, at least 1000     19-OCT-2023
      rows and 20 columns) and write a program in Python to
      perform the following operations:
      i) Read the dataset file in Python IDE.
      ii) Display the dataset.
      iii) Display the shape of the dataset.
      iv) Display the datatypes of the attributes of the dataset.
      v) Find out the mean, median and mode of all the numeric
         columns.
      vi) Describe the entire dataset in terms of count, min, max,
          standard deviation, variance, etc.

2.    Write a program in Python to implement Linear Regression.      30-OCT-2023

3.    Write a program in Python to implement Binary Logistic         03-NOV-2023
      Regression on a dataset downloaded from Kaggle.

4.    Write a program in Python to implement Naïve Bayes on the      08-NOV-2023
      iris dataset. Study the confusion matrix.

5.    Write a program in Python to implement the Naïve Bayes         16-NOV-2023
      Algorithm on a dataset from Kaggle. Also print the
      Confusion Matrix, Accuracy, Precision and Recall.

6.    Write a program in Python to implement Support Vector          21-NOV-2023
      Machine on the iris dataset.

1. Download a dataset from Kaggle (.csv format, at least 1000 rows and 20
columns) and write a program in Python to perform the following operations:
i) Read the dataset file in Python IDE.
ii) Display the dataset
iii) Display the shape of the dataset.
iv) Display the datatypes of the attributes of the dataset.
v) Find out the mean, median and mode of all the numeric columns.
vi) Describe the entire dataset in terms of count, min, max, standard deviation, variance

# Import necessary libraries
import pandas as pd

# i) Read the dataset file in Python IDE
# Replace 'match.csv' with the actual file path
file_path = 'match.csv'
df = pd.read_csv(file_path)

# ii) Display the dataset
print("Dataset:")
print(df)

# iii) Display the shape of the dataset
print("\nShape of the dataset:")
print(df.shape)

# iv) Display the datatypes of the attributes of the dataset
print("\nDatatypes of the attributes:")
print(df.dtypes)

# v) Find out the mean, median, and mode of all the numeric columns
# numeric_only=True skips text columns, which would otherwise raise a TypeError in pandas 2.x
print("\nMean of numeric columns:")
print(df.mean(numeric_only=True))
print("\nMedian of numeric columns:")
print(df.median(numeric_only=True))
print("\nMode of numeric columns:")
print(df.mode(numeric_only=True).iloc[0])

# vi) Describe the entire dataset in terms of count, min, max, standard deviation, variance, etc.
print("\nSummary statistics of the dataset:")
print(df.describe())
# describe() does not report variance, so it is printed separately
print("\nVariance of numeric columns:")
print(df.var(numeric_only=True))
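
As a hedged illustration of step vi, the sketch below collects the requested statistics, including variance (which `describe()` does not report), into a single table. The toy DataFrame and its column names (`runs`, `wickets`, `venue`) are invented for illustration and are not taken from the Kaggle file.

```python
import pandas as pd

# Hypothetical toy data standing in for the Kaggle file
toy = pd.DataFrame({
    "runs": [10, 20, 30, 40],
    "wickets": [1, 2, 2, 3],
    "venue": ["A", "B", "A", "C"],
})

# Keep only the numeric columns so that text columns do not raise errors
numeric = toy.select_dtypes(include="number")

# One table holding the statistics named in the aim
summary = numeric.agg(["count", "min", "max", "mean", "median", "std", "var"])
print(summary)
```

The same `agg` call could be applied to `df.select_dtypes(include="number")` for the real dataset.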
2. Write a program in python to implement Linear Regression
import numpy as np
import matplotlib.pyplot as plt
# Generate some random data for demonstration purposes
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
# Visualize the data
plt.scatter(X, y)
plt.xlabel('X')
plt.ylabel('y')
plt.title('Generated Data for Linear Regression')
plt.show()
# Linear Regression implementation using NumPy
X_b = np.c_[np.ones((100, 1)), X] # Add bias term to X
theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)

# Print the calculated parameters


print("Intercept (theta_0):", theta_best[0][0])
print("Slope (theta_1):", theta_best[1][0])

# Make predictions on new data


X_new = np.array([[0], [2]])
X_new_b = np.c_[np.ones((2, 1)), X_new]
y_predict = X_new_b.dot(theta_best)

# Plot the linear regression line


plt.plot(X_new, y_predict, "r-")
plt.scatter(X, y)
plt.xlabel('X')
plt.ylabel('y')
plt.title('Linear Regression Fit')
plt.show()
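
The closed-form solution used above is the normal equation, theta = (X^T X)^(-1) X^T y. As a cross-check, the sketch below compares it with NumPy's least-squares solver on freshly generated data of the same shape. The seed and variable names here are illustrative assumptions, not part of the original program.

```python
import numpy as np

# Synthetic data following the same recipe: y = 4 + 3x + noise
rng = np.random.default_rng(0)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X + rng.standard_normal((100, 1))

# Add the bias column of ones
X_b = np.c_[np.ones((100, 1)), X]

# Normal equation: explicit inverse of X^T X
theta_normal = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y

# Least-squares solver: avoids forming the inverse explicitly
theta_lstsq, residuals, rank, singular_values = np.linalg.lstsq(X_b, y, rcond=None)

print("Normal equation:", theta_normal.ravel())
print("lstsq:          ", theta_lstsq.ravel())
```

Both approaches should agree to within floating-point error here; `lstsq` is generally the safer choice when the columns of X are nearly collinear.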
3. Write a program in Python to implement Binary Logistic Regression on a dataset downloaded from Kaggle
This experiment uses the Titanic dataset from Kaggle.
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (accuracy_score, confusion_matrix, precision_score,
                             recall_score)

# Load the Titanic dataset (replace 'Titanic.csv' with the actual file path)
df = pd.read_csv('Titanic.csv')

# Preprocess the data (handle missing values, encode categorical variables, etc.)
# For simplicity, let's drop some irrelevant columns
df = df[['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Survived']].dropna()

# Convert categorical variables to numerical using one-hot encoding


df = pd.get_dummies(df, columns=['Sex'], drop_first=True)

# Separate features and target variable


X = df.drop('Survived', axis=1)
y = df['Survived']

# Split the dataset into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features (optional but often recommended)


scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Create and train the Logistic Regression model


logreg_model = LogisticRegression()
logreg_model.fit(X_train, y_train)

# Make predictions on the test set


y_pred = logreg_model.predict(X_test)

# Evaluate the model


accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)

# Print the results


print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("Confusion Matrix:")
print(conf_matrix)
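
To show what the fitted model does internally, the sketch below recomputes the predicted probabilities by hand: a linear score z = w·x + b is passed through the sigmoid 1 / (1 + e^(-z)), and the label is 1 when the probability is at least 0.5. It uses synthetic data from `make_classification` rather than the Titanic file, so the data and variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data (not the Titanic dataset)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
model = LogisticRegression().fit(X, y)

# Linear score for every sample: z = w . x + b
z = X @ model.coef_.ravel() + model.intercept_[0]

# Sigmoid turns the score into the probability of class 1
p_manual = 1.0 / (1.0 + np.exp(-z))
p_sklearn = model.predict_proba(X)[:, 1]

# Thresholding at 0.5 reproduces the predicted labels
labels_manual = (p_manual >= 0.5).astype(int)

print("Max probability difference:", np.abs(p_manual - p_sklearn).max())
print("Labels match:", np.array_equal(labels_manual, model.predict(X)))
```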

4. Write a program in Python to implement Naïve Bayes on the iris dataset. Study the Confusion Matrix
# Import necessary libraries
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn import datasets

# Load the Iris dataset


iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create a Naive Bayes model (Gaussian Naive Bayes for continuous features)
nb_model = GaussianNB()

# Train the model


nb_model.fit(X_train, y_train)

# Make predictions on the test set


y_pred = nb_model.predict(X_test)

# Evaluate the model


accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)

# Display the results


print("Accuracy:", accuracy)
print("Confusion Matrix:")
print(conf_matrix)
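
One possible way to study the confusion matrix is sketched below. In scikit-learn's layout, rows are the true classes and columns are the predicted classes, so the diagonal holds the correct predictions. The sketch repeats the same split and model as above so that it runs on its own; the per-class precision and recall names are illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)
y_pred = GaussianNB().fit(X_train, y_train).predict(X_test)

# Rows = true class, columns = predicted class
cm = confusion_matrix(y_test, y_pred)

# Recall: correct predictions divided by the row total (all true members of the class)
recall_per_class = np.diag(cm) / cm.sum(axis=1)
# Precision: correct predictions divided by the column total (all predictions of the class)
precision_per_class = np.diag(cm) / cm.sum(axis=0)

for name, p, r in zip(iris.target_names, precision_per_class, recall_per_class):
    print(f"{name:12s} precision={p:.3f} recall={r:.3f}")

# Overall accuracy is the diagonal sum divided by the number of test samples
print("Accuracy from matrix:", np.trace(cm) / cm.sum())
```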

5. Write a program in Python to implement the Naïve Bayes Algorithm on a dataset from Kaggle. Also print the Confusion Matrix, Accuracy, Precision and Recall.

# Import necessary libraries


import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import (confusion_matrix, accuracy_score, precision_score,
                             recall_score)

# Load the Titanic dataset (download it from Kaggle, or load it with the seaborn library)
# For example, using seaborn (note that seaborn's copy uses lowercase column
# names such as 'survived', so the column selection below would need adjusting):
# import seaborn as sns
# df = sns.load_dataset('titanic')

# Assuming you have a 'Titanic.csv' file
df = pd.read_csv('Titanic.csv')

# Preprocess the data (you may need to handle missing values, encode categorical variables, etc.)
# For simplicity, let's drop some irrelevant columns
df = df[['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Survived']].dropna()

# Convert categorical variables to numerical using one-hot encoding


df = pd.get_dummies(df, columns=['Sex'], drop_first=True)

# Separate features and target variable


X = df.drop('Survived', axis=1)
y = df['Survived']

# Split the dataset into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and train the Naive Bayes model (Gaussian Naive Bayes for numerical features)
naive_bayes = GaussianNB()
naive_bayes.fit(X_train, y_train)

# Make predictions on the test set


y_pred = naive_bayes.predict(X_test)

# Evaluate the model


conf_matrix = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)

# Print the results


print("Confusion Matrix:")
print(conf_matrix)
print("\nAccuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
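
For a binary problem, the three scores can be derived directly from the four cells of the confusion matrix. The sketch below uses a small hand-made pair of label lists (hypothetical, not Titanic predictions) to show the arithmetic and to check it against scikit-learn's own functions.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

# Hypothetical true labels and predictions, chosen only for illustration
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_hat = [1, 0, 0, 1, 0, 1, 1, 0]

# For labels {0, 1}, ravel() returns the cells in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

accuracy_manual = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are correct
precision_manual = tp / (tp + fp)                  # share of predicted positives that are correct
recall_manual = tp / (tp + fn)                     # share of actual positives that are found

print("Accuracy:", accuracy_manual, accuracy_score(y_true, y_hat))
print("Precision:", precision_manual, precision_score(y_true, y_hat))
print("Recall:", recall_manual, recall_score(y_true, y_hat))
```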

6. Write a program in Python to implement Support Vector Machine on the iris dataset.

# Import necessary libraries


import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix

# Load the Iris dataset


iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the Support Vector Machine model


svm_model = SVC(kernel='linear') # You can try different kernels like 'rbf', 'poly', etc.
svm_model.fit(X_train, y_train)

# Make predictions on the test set


y_pred = svm_model.predict(X_test)

# Evaluate the model


accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)

# Print the results


print("Accuracy:", accuracy)
print("Confusion Matrix:")
print(conf_matrix)

# Visualization (2D plot for simplicity, considering only the first two features).
# svm_model above was trained on all four features, so it cannot predict on a
# two-column mesh; a separate model is fitted on the first two features for the plot.
X_2d = X[:, :2]
svm_2d = SVC(kernel='linear')
svm_2d.fit(X_2d, y)

plt.figure(figsize=(8, 6))

# Plot the decision boundary
h = .02 # Step size in the mesh
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Z = svm_2d.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.8)

# Plot the points


scatter = plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.title('Support Vector Machine on Iris Dataset')
plt.legend(*scatter.legend_elements(), title='Classes')

plt.show()
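
The comment in the program suggests trying other kernels. A minimal sketch of such a comparison follows, using 5-fold cross-validation on all four iris features with default SVC settings. The choice of three kernels and of 5 folds is an assumption made for illustration.

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

iris = datasets.load_iris()

# Mean cross-validated accuracy for each kernel
scores = {}
for kernel in ("linear", "rbf", "poly"):
    cv_scores = cross_val_score(SVC(kernel=kernel), iris.data, iris.target, cv=5)
    scores[kernel] = cv_scores.mean()
    print(f"{kernel:7s} mean accuracy = {scores[kernel]:.3f}")
```

Cross-validation is used here instead of a single train/test split because the iris dataset has only 150 samples, so one 30-sample test set gives a noisy estimate.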
