
TABLE OF CONTENTS

Ex.No.  Date  Title of the Experiments                                                     Page No.  Marks  Signature

1             Linear Regression with a Real Dataset
2             Binary Classification Model
3             Classification with Nearest Neighbors
4             Analyze Deltas Between Training Set and Validation Set Results to Determine Whether the Model Is Overfitting
5             Implement the K-Means Algorithm Using the Given Dataset
6             Implement the Naïve Bayes Classifier Using the Given Dataset
7             Project: Brain Tumour Detection Using Machine Learning Algorithm
VISION
To create an environment conducive to producing competent Computer Science Engineers through quality, research-oriented education, and to equip them for the needs of industry and society.

MISSION
The Department strives to contribute to the expansion of knowledge in the discipline of Computer Science and
Engineering by

 Adopting an efficient teaching-learning process in concurrence with increasing industrial demands.

 Ensuring technical proficiency, and facilitating the pursuit of higher studies and Research & Development activities.

 Developing problem-solving and analytical skills with a thorough understanding of basic sciences and Computer Science Engineering.

 Infusing managerial and entrepreneurial skills to develop ethical, socially responsible and competitive professionals.

Program Specific Outcomes:

1. Design and develop software and firmware solutions using the latest Computer Science tools and
technologies to address societal problems.

2. Apply acquired knowledge to engage enthusiastically in the software development, software testing, storage,
computing and business intelligence sectors.

3. Excel in their careers by using their technical expertise in the latest technologies and by continuously
updating their knowledge in Computer Science and Engineering.

Programme Outcomes:
Engineering Graduates will be able to:

1. Engineering knowledge: Apply the knowledge of mathematics, science, engineering fundamentals,
and an engineering specialization to the solution of complex engineering problems.

2. Problem analysis: Identify, formulate, review research literature, and analyze complex engineering
problems reaching substantiated conclusions using first principles of mathematics, natural sciences, and
engineering sciences.

3. Design/development of solutions: Design solutions for complex engineering problems and design
system components or processes that meet the specified needs with appropriate consideration for the
public health and safety, and the cultural, societal, and environmental considerations.
4. Conduct investigations of complex problems: Use research-based knowledge and research methods
including design of experiments, analysis and interpretation of data, and synthesis of the information to
provide valid conclusions.

5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern
engineering and IT tools including prediction and modeling to complex engineering activities with an
understanding of the limitations.

6. The engineer and society: Apply reasoning informed by the contextual knowledge to assess societal,
health, safety, legal and cultural issues and the consequent responsibilities relevant to the professional
engineering practice.

7. Environment and sustainability: Understand the impact of the professional engineering solutions in
societal and environmental contexts, and demonstrate the knowledge of, and need for sustainable
development.

8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of
the engineering practice.

9. Individual and team work: Function effectively as an individual, and as a member or leader in diverse
teams, and in multidisciplinary settings.

10. Communication: Communicate effectively on complex engineering activities with the engineering
community and with society at large, such as, being able to comprehend and write effective reports and
design documentation, make effective presentations, and give and receive clear instructions.

11. Project management and finance: Demonstrate knowledge and understanding of the engineering and
management principles and apply these to one’s own work, as a member and leader in a team, to manage
projects and in multidisciplinary environments.

12. Life-long learning: Recognize the need for, and have the preparation and ability to engage in
independent and life-long learning in the broadest context of technological change.

Course Outcomes
CO1: Understand and outline problems for each type of machine learning
CO2: Design a Decision tree and Random forest for an application
CO3: Implement Probabilistic Discriminative and Generative algorithms for an application and analyze the
results.
CO4: Use a tool to implement typical Clustering algorithms for different types of applications.
CO5: Design and implement an HMM for a Sequence Model type of application and identify applications suitable
for different types of Machine Learning with suitable justification.
Ex.No : 1 Linear Regression with a Real Dataset
Date:

AIM:
To implement linear regression with a real dataset (https://www.kaggle.com/harrywang/housing),
experiment with different features in building the model, and tune the model's hyperparameters.
ALGORITHM:
1. Start the program.
2. Import Library and Dataset
3. Data Preprocessing
a. Encoding
i. Label encoding
ii. One hot encoding
iii. Dummy variable trap
4. Use matplotlib to define functions for plotting the data. Depending on the data,
not all plots will be made.
5. To validate the model, check the assumptions of the linear regression model.
The assumption checks for this model are as follows:

1. The actual vs. predicted plot is curved, so the linearity assumption fails.
2. The residual mean is zero, but the residual error plot is right-skewed.
3. The Q-Q plot shows that values tend to increase for log values greater than 1.5.
4. The plot exhibits heteroscedasticity; the error increases after a certain point.
5. The variance inflation factor is less than 5, so there is no multicollinearity.
6. End the program.
Program:

from mpl_toolkits.mplot3d import Axes3D
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt  # plotting
import numpy as np               # linear algebra
import os                        # accessing directory structure
import pandas as pd              # data processing, CSV file I/O (e.g. pd.read_csv)

print(os.listdir('../input'))

nRowsRead = 1000  # specify 'None' if want to read whole file
df1 = pd.read_csv('../input/anscombe.csv', delimiter=',', nrows=nRowsRead)
df1.dataframeName = 'anscombe.csv'
nRow, nCol = df1.shape
print(f'There are {nRow} rows and {nCol} columns')
df1.head(5)

# plotPerColumnDistribution, plotCorrelationMatrix and plotScatterMatrix are
# the plotting helpers defined in step 4 of the algorithm (definitions not shown here)
plotPerColumnDistribution(df1, 10, 5)
plotCorrelationMatrix(df1, 8)
plotScatterMatrix(df1, 6, 15)

nRowsRead = 1000  # specify 'None' if want to read whole file
# housing.csv has 20641 rows in reality, but we are only loading/previewing the first 1000 rows
df2 = pd.read_csv('../input/housing.csv', delimiter=',', nrows=nRowsRead)
df2.dataframeName = 'housing.csv'
nRow, nCol = df2.shape
print(f'There are {nRow} rows and {nCol} columns')
df2.head(5)
plotPerColumnDistribution(df2, 10, 5)
plotCorrelationMatrix(df2, 8)
plotScatterMatrix(df2, 20, 10)
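
The listing above only loads and previews the data. As a hedged sketch of the model fitting and the residual check from step 5 of the algorithm (not part of the original record; the column names median_income and median_house_value are assumptions based on the Kaggle housing dataset), the regression itself could be fitted as follows:

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

features = df2[['median_income']]        # assumed feature column
target = df2['median_house_value']       # assumed target column
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=42)

reg = LinearRegression()
reg.fit(X_train, y_train)
print('Coefficient:', reg.coef_, 'Intercept:', reg.intercept_)
print('Test MSE:', mean_squared_error(y_test, reg.predict(X_test)))

# Residual plot for the assumption checks described in step 5
residuals = y_test - reg.predict(X_test)
plt.scatter(reg.predict(X_test), residuals)
plt.axhline(0, color='k', linestyle='--')
plt.xlabel('Predicted value')
plt.ylabel('Residual')
plt.show()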
Output:
RESULT:
Thus the Python program was implemented and verified successfully.
Ex.No : 2 Binary classification model
Date:
AIM:
To implement a binary classification model, that is, one that answers a binary question such as "Are houses in
this neighborhood above a certain price?" (data used from Exercise 1). Modify the classification threshold and
determine how that modification influences the model. Experiment with different classification metrics to
determine the model's effectiveness.
ALGORITHM:
1. Start the program.
2. Import Library and Dataset
3. Data Preprocessing
a. Encoding
i. Label encoding
ii. One hot encoding
iii. Dummy variable trap
4. Normalize the value
a. Create a binary label
b. Represent features in feature columns
c. Define functions in feature and train a model
d. Define a plotting function
5. Invoke the creating, training and plotting function
6. Evaluate the model against the test set
7. Experiment with the classification threshold
8. Summarize model performance
9. End the program.
Program:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.metrics import roc_curve
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV

plt.style.use('ggplot')
df = pd.read_csv('../input/diabetes.csv')
df.head()
df.shape

X = df.drop('Outcome', axis=1).values
y = df['Outcome'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42, stratify=y)

neighbors = np.arange(1, 9)
train_accuracy = np.empty(len(neighbors))
test_accuracy = np.empty(len(neighbors))

for i, k in enumerate(neighbors):
    # Setup a knn classifier with k neighbors
    knn = KNeighborsClassifier(n_neighbors=k)
    # Fit the model
    knn.fit(X_train, y_train)
    # Compute accuracy on the training set
    train_accuracy[i] = knn.score(X_train, y_train)
    # Compute accuracy on the test set
    test_accuracy[i] = knn.score(X_test, y_test)

plt.title('k-NN Varying number of neighbors')
plt.plot(neighbors, test_accuracy, label='Testing Accuracy')
plt.plot(neighbors, train_accuracy, label='Training accuracy')
plt.legend()
plt.xlabel('Number of neighbors')
plt.ylabel('Accuracy')
plt.show()

knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, y_train)
knn.score(X_test, y_test)

y_pred = knn.predict(X_test)
confusion_matrix(y_test, y_pred)
pd.crosstab(y_test, y_pred, rownames=['True'], colnames=['Predicted'], margins=True)
print(classification_report(y_test, y_pred))

y_pred_proba = knn.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)
plt.plot([0, 1], [0, 1], 'k--')
plt.plot(fpr, tpr, label='Knn')
plt.xlabel('fpr')
plt.ylabel('tpr')
plt.title('Knn (n_neighbors=7) ROC curve')
plt.show()
roc_auc_score(y_test, y_pred_proba)

# param_grid must be defined before it is passed to GridSearchCV
param_grid = {'n_neighbors': np.arange(1, 50)}
knn = KNeighborsClassifier()
knn_cv = GridSearchCV(knn, param_grid, cv=5)
knn_cv.fit(X, y)
knn_cv.best_score_
knn_cv.best_params_
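
As a hedged sketch of step 7 of the algorithm (experimenting with the classification threshold; this loop is not part of the original listing), the predicted probabilities can be thresholded at values other than the default 0.5 and the effect on the confusion matrix observed:

# Vary the classification threshold instead of using the default 0.5
for threshold in [0.3, 0.5, 0.7]:
    y_pred_t = (y_pred_proba >= threshold).astype(int)
    print('threshold =', threshold)
    print(confusion_matrix(y_test, y_pred_t))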
Output:

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=7, p=2,
           weights='uniform')

0.7305194805194806

array([[165, 36],
[ 47, 60]])

Considering the confusion matrix above:

True negative = 165

False positive = 36

True positive = 60

False negative = 47
             precision    recall  f1-score   support

          0       0.78      0.82      0.80       201
          1       0.62      0.56      0.59       107

avg / total       0.73      0.73      0.73       308

GridSearchCV(cv=5, error_score='raise',
estimator=KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=1, n_neighbors=5, p=2,
weights='uniform'),
fit_params=None, iid=True, n_jobs=1,
param_grid={'n_neighbors': array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49])},
pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
scoring=None, verbose=0)

{'n_neighbors': 14}
RESULT:
Thus the Python program was implemented and verified successfully.
Ex.No : 3 Classification with Nearest Neighbors

Date:
AIM:
To implement a program for classification with nearest neighbors. In this question, you will use
scikit-learn's KNN classifier to classify real vs. fake news headlines. The aim of this question is for you to read
the scikit-learn API and get comfortable with training/validation splits.
ALGORITHM:
1. Start the program.
2. Import Library and Dataset
3. Data Preprocessing
a. Encoding
i. Label encoding
ii. One hot encoding
iii. Dummy variable trap
4. Normalize the value
a. Create a binary label
b. Represent features in feature columns
c. Define functions in feature and train a model
d. Define a plotting function
5. Generating model for K=5
6. Model evaluation for K=5
7. Summarize model performance
8. End the program.
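
Note that the aim above describes classifying real vs. fake news headlines, while the listing below demonstrates KNN on the diabetes and wine datasets. The following minimal sketch (an assumption, not part of the original record; the file names real_headlines.txt and fake_headlines.txt are placeholders) shows how the headline task itself could be set up with scikit-learn's KNN and a training/validation split:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Placeholder file names: one headline per line, assumed for illustration
with open('real_headlines.txt') as f:
    real = f.read().splitlines()
with open('fake_headlines.txt') as f:
    fake = f.read().splitlines()

X_text = real + fake
y = [1] * len(real) + [0] * len(fake)   # 1 = real, 0 = fake

# Bag-of-words features for the headlines
X = CountVectorizer().fit_transform(X_text)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=42)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print('Validation accuracy:', knn.score(X_val, y_val))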
Program:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import preprocessing, datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.metrics import roc_curve
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV

plt.style.use('ggplot')
df = pd.read_csv('../input/diabetes.csv')
df.head()
df.shape

X = df.drop('Outcome', axis=1).values
y = df['Outcome'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42, stratify=y)

neighbors = np.arange(1, 9)
train_accuracy = np.empty(len(neighbors))
test_accuracy = np.empty(len(neighbors))

for i, k in enumerate(neighbors):
    # Setup a knn classifier with k neighbors
    knn = KNeighborsClassifier(n_neighbors=k)
    # Fit the model
    knn.fit(X_train, y_train)
    # Compute accuracy on the training set
    train_accuracy[i] = knn.score(X_train, y_train)
    # Compute accuracy on the test set
    test_accuracy[i] = knn.score(X_test, y_test)

plt.title('k-NN Varying number of neighbors')
plt.plot(neighbors, test_accuracy, label='Testing Accuracy')
plt.plot(neighbors, train_accuracy, label='Training accuracy')
plt.legend()
plt.xlabel('Number of neighbors')
plt.ylabel('Accuracy')
plt.show()

knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, y_train)
knn.score(X_test, y_test)
y_pred = knn.predict(X_test)
confusion_matrix(y_test, y_pred)
pd.crosstab(y_test, y_pred, rownames=['True'], colnames=['Predicted'], margins=True)

# The weather/temp/play lists are not shown in the original record; these sample
# values (from the classic play-tennis example) are assumed for illustration.
weather = ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast',
           'Sunny', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']
temp = ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool', 'Mild', 'Cool',
        'Mild', 'Mild', 'Mild', 'Hot', 'Mild']
play = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes',
        'Yes', 'Yes', 'Yes', 'No']

le = preprocessing.LabelEncoder()
# Converting string labels into numbers
weather_encoded = le.fit_transform(weather)
print(weather_encoded)
temp_encoded = le.fit_transform(temp)
label = le.fit_transform(play)
features = list(zip(weather_encoded, temp_encoded))

# Assumed: the classifier is not instantiated in the original listing
model = KNeighborsClassifier(n_neighbors=3)
model.fit(features, label)

# Predict Output
predicted = model.predict([[0, 2]])  # 0:Overcast, 2:Mild
print(predicted)

wine = datasets.load_wine()
print(wine.data[0:5])
print(wine.target)
print(wine.data.shape)
print(wine.target.shape)

# 70% training and 30% test
X_train, X_test, y_train, y_test = train_test_split(wine.data, wine.target, test_size=0.3)

knn = KNeighborsClassifier(n_neighbors=5)
# Train the model using the training sets
knn.fit(X_train, y_train)
# Predict the response for the test dataset
y_pred = knn.predict(X_test)
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))

# Create KNN Classifier with k=7
knn = KNeighborsClassifier(n_neighbors=7)
# Train the model using the training sets
knn.fit(X_train, y_train)
# Predict the response for the test dataset
y_pred = knn.predict(X_test)
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
Output:

[2 2 0 1 1 1 0 2 2 1 2 0 0 1]

[[1.42300000e+01 1.71000000e+00 2.43000000e+00 1.56000000e+01
  1.27000000e+02 2.80000000e+00 3.06000000e+00 2.80000000e-01
  2.29000000e+00 5.64000000e+00 1.04000000e+00 3.92000000e+00
  1.06500000e+03]
 [1.32000000e+01 1.78000000e+00 2.14000000e+00 1.12000000e+01
  1.00000000e+02 2.65000000e+00 2.76000000e+00 2.60000000e-01
  1.28000000e+00 4.38000000e+00 1.05000000e+00 3.40000000e+00
  1.05000000e+03]
 [1.31600000e+01 2.36000000e+00 2.67000000e+00 1.86000000e+01
  1.01000000e+02 2.80000000e+00 3.24000000e+00 3.00000000e-01
  2.81000000e+00 5.68000000e+00 1.03000000e+00 3.17000000e+00
  1.18500000e+03]
 [1.43700000e+01 1.95000000e+00 2.50000000e+00 1.68000000e+01
  1.13000000e+02 3.85000000e+00 3.49000000e+00 2.40000000e-01
  2.18000000e+00 7.80000000e+00 8.60000000e-01 3.45000000e+00
  1.48000000e+03]
 [1.32400000e+01 2.59000000e+00 2.87000000e+00 2.10000000e+01
  1.18000000e+02 2.80000000e+00 2.69000000e+00 3.90000000e-01
  1.82000000e+00 4.32000000e+00 1.04000000e+00 2.93000000e+00
  7.35000000e+02]]

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2]

Accuracy: 0.685185185185

Accuracy: 0.777777777778
RESULT:
Thus the Python program was implemented and verified successfully.
EXP NO: 4   ANALYZE DELTAS BETWEEN TRAINING SET AND VALIDATION SET RESULTS TO DETERMINE WHETHER THE MODEL IS OVERFITTING

DATE:

PROBLEM STATEMENT:
Analyze deltas between training set and validation set results. Test the trained model with a
test set to determine whether your trained model is overfitting. Detect and fix a common training
problem.

AIM:
To analyze deltas between training set and validation set results to determine whether the
trained model is overfitting.

ALGORITHM:

STEP 1: Start the algorithm and implement the program.

STEP 2: Open the Jupyter notebook and activate the environment.

STEP 3: Create a new environment and rename it.

STEP 4: Install the necessary packages and libraries in the created environment on the Jupyter notebook.

STEP 5: Train the data and split the training dataset.

STEP 6: Analyze deltas between the training set and validation set results.

STEP 7: Test the trained model with test data to determine whether the trained model is overfitting.
INPUT DATASET:

F1       F2      F3        F4      F5       F6        F7      F8       F9      F10  Label

-25.673  14.478  -3.2051   2.1827  0.98786  -1.4679   52.952  -21.948  7.7545  F11  1
-24.826  14.41   -3.4241   2.0805  0.39366  -0.7101   50.521  -21.963  6.3084  F12  0
-24.358  14.414  -1.143    2.1846  0.59173  -0.7365   57.01   -21.958  10.883  F13  0
-26.391  14.474  -0.81757  2.0026  0.39366  -0.53185  53.348  -21.956  7.1289  F14  0
-25.698  14.527  0.50269   2.0685  0.39366  -1.9843   53.181  -21.962  8.6682  F15  1
-24.966  14.389  0.57771   2.111   0.98786  0.51482   54.273  -21.96   5.4255  F16  1
-26.271  14.586  -3.7381   2.1169  0.19559  0.19739   53.699  -21.968  2.6379  F17  0
-24.651  14.275  -1.202    2.1401  0.39366  1.393     51.461  -21.96   5.7199  F18  1
-24.594  14.493  -4.2662   2.0673  0.39366  -3.1282   52.738  -21.959  4.5568  F19  0
PROGRAM:
import pandas as pd
from matplotlib import pyplot
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

dataset = pd.read_csv("D:\\Overfit.csv")
print(dataset)

X, y = make_classification(n_samples=10000, n_features=20, n_informative=5, n_redundant=15,
                           random_state=1)
print(X.shape, y.shape)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

train_scores, test_scores = [], []   # accuracy per candidate depth
values = [i for i in range(1, 21)]   # candidate max_depth values

for i in values:
    model = DecisionTreeClassifier(max_depth=i)
    model.fit(X_train, y_train)
    # Accuracy on the training set
    train_yhat = model.predict(X_train)
    train_acc = accuracy_score(y_train, train_yhat)
    train_scores.append(train_acc)
    # Accuracy on the test set
    test_yhat = model.predict(X_test)
    test_acc = accuracy_score(y_test, test_yhat)
    test_scores.append(test_acc)
    print('>%d, train: %.3f, test: %.3f' % (i, train_acc, test_acc))

pyplot.plot(values, train_scores, '-o', label='Train')
pyplot.plot(values, test_scores, '-o', label='Test')
pyplot.legend()
pyplot.show()
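
Since the experiment's focus is the delta between the training and test results, a small addition (an assumed sketch, not in the original listing) can print that delta per depth; the depth at which the delta keeps widening while the test accuracy stops improving marks the onset of overfitting:

# Delta between training and test accuracy for each candidate depth
for depth, (tr, te) in zip(values, zip(train_scores, test_scores)):
    print('max_depth=%d delta=%.3f' % (depth, tr - te))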
OUTPUT:
>1, train: 0.769, test: 0.761
>2, train: 0.808, test: 0.804
>3, train: 0.879, test: 0.878
>4, train: 0.902, test: 0.896
>5, train: 0.915, test: 0.903
>6, train: 0.929, test: 0.918
>7, train: 0.942, test: 0.921
>8, train: 0.951, test: 0.924
>9, train: 0.959, test: 0.926
>10, train: 0.968, test: 0.923
RESULT:
Thus the Python program was implemented and verified successfully.
EXP NO: 5   IMPLEMENT THE K-MEANS ALGORITHM USING THE GIVEN DATASET

DATE:

PROBLEM STATEMENT:

Implement the K-Means algorithm using the given dataset.

AIM:

To implement a program for the K-Means algorithm using the given dataset.
ALGORITHM:

Step-1: Select the value of K, to decide the number of clusters to be formed.

Step-2: Select K random points which will act as centroids.

Step-3: Assign each data point, based on its distance from the randomly selected points
(centroids), to the nearest centroid; this forms the predefined clusters.

Step-4: Place a new centroid in each cluster.

Step-5: Repeat Step 3, reassigning each data point to the new closest centroid of its
cluster.

Step-6: If any reassignment occurred, go to Step 4; else go to Step 7.

Step-7: FINISH
INPUT DATASET:

Kingdom  DNAtype  SpeciesID  Ncodons  SpeciesName                              UUU      UUC      UUA       UUG

vrl      0        100217     1995     Epizootic haematopoietic necrosis virus  0.01654  0.01203  5.00E-04  0.00351
vrl      0        100220     1474     Bohle iridovirus                         0.02714  0.01357  0.00068   0.00678
vrl      0        100755     4862     Sweet potato leaf curl virus             0.01974  0.0218   0.01357   0.01543
vrl      0        100880     1915     Northern cereal mosaic virus             0.01775  0.02245  0.01619   0.00992
vrl      0        100887     22831    Soil-borne cereal mosaic virus           0.02816  0.01371  0.00767   0.03679
vrl      0        101029     5274     Human adenovirus type 7d                 0.02579  0.02218  0.01479   0.01024
vrl      0        101688     3042     Apple latent spherical virus             0.04635  0.01545  0.02005   0.024
vrl      0        101764     2801     Aconitum latent virus                    0.02285  0.02678  0.01214   0.02321
vrl      0        101947     2897     Pseudorabies virus Ea                    0.01105  0.02106  0.00035   0.00104

CUU CUC CUA CUG AUU AUC AUA AUG GUU

0.01203 0.03208 0.001 0.0401 0.00551 0.02005 0.00752 0.02506 0.01103

0.00407 0.02849 0.00204 0.0441 0.01153 0.0251 0.00882 0.03324 0.00814

0.00782 0.01111 0.01028 0.01193 0.02283 0.01604 0.01316 0.0218 0.01625

0.01567 0.01358 0.0094 0.01723 0.02402 0.02245 0.02507 0.02924 0.02089

0.0138 0.00548 0.00473 0.02076 0.02716 0.00867 0.0131 0.02773 0.02803

0.02294 0.00758 0.01782 0.01403 0.02636 0.01327 0.01896 0.02579 0.01877

0.02761 0.01611 0.01052 0.00493 0.02597 0.00888 0.01512 0.02268 0.02893

0.01714 0.02213 0.00893 0.01928 0.01785 0.02356 0.01107 0.01999 0.01714

0.00035 0.04142 0.00069 0.05868 0.00104 0.01933 0.00173 0.01691 0


GUC GUA GUG GCU GCC GCA GCG CCU CCC
0.0411 0.00902 0.03308 0.01003 0.05013 0.01554 0.01103 0.02356 0.03208
0.04071 0.00814 0.03256 0.01085 0.04885 0.01221 0.01357 0.00678 0.02714
0.01872 0.01213 0.0107 0.02406 0.01234 0.0144 0.00514 0.01604 0.0146
0.02141 0.01723 0.01932 0.02141 0.00679 0.02245 0.00522 0.01358 0.00418
0.00508 0.0092 0.02965 0.02878 0.00574 0.01572 0.01577 0.01007 0.00508
0.01346 0.00721 0.01782 0.02067 0.02313 0.01214 0.00265 0.01232 0.01953
0.00789 0.01151 0.01611 0.04011 0.01151 0.01118 0.00427 0.02794 0.01085
0.01142 0.00893 0.02499 0.03106 0.01357 0.01571 0.01321 0.01392 0.01107
0.04004 0.00069 0.04522 0.00276 0.06973 0.00069 0.05212 0.00173 0.04764

CCA CCG UGG GGU GGC GGA GGG UCU UCC


0.01203 0.00501 0.01003 0.01203 0.03158 0.01905 0.02456 0.01353 0.02155
0.01221 0.00407 0.01425 0.01221 0.01967 0.02239 0.01289 0.02103 0.01493
0.02098 0.0107 0.01728 0.01851 0.00864 0.01172 0.01892 0.01933 0.01419
0.0141 0.00574 0.01201 0.00992 0.00366 0.02402 0.02663 0.02872 0.00992
0.00604 0.00679 0.01205 0.03127 0.00775 0.00959 0.00797 0.02006 0.00359
0.02105 0.00322 0.01232 0.01631 0.01327 0.0256 0.01119 0.0201 0.01763
0.01216 0.00131 0.01216 0.02137 0.01052 0.01512 0.00986 0.03846 0.01085
0.01178 0.005 0.01142 0.01964 0.01714 0.01214 0.01464 0.01499 0.00536
0.00173 0.04177 0.01139 0.00242 0.04108 0.00621 0.02969 0.00449 0.01622

UCA UCG AGU AGC ACU ACC ACA ACG UAU


0.00251 0.00652 0.0015 0.01554 0.00501 0.02105 0.00902 0.01053 0.00501
0.00407 0.00475 0.00068 0.02035 0.0095 0.02782 0.01425 0.00611 0.00475
0.01296 0.00967 0.01337 0.01337 0.01851 0.01131 0.01419 0.0109 0.02612
0.0235 0.00522 0.01619 0.00836 0.02037 0.01358 0.02089 0.00731 0.02141
0.00933 0.01191 0.01616 0.00788 0.02593 0.00854 0.012 0.02098 0.02089
0.00986 0.0036 0.00929 0.01138 0.03527 0.03015 0.02844 0.00284 0.01555
0.01249 0.00066 0.01085 0.00427 0.02728 0.01381 0.01315 0.00427 0.02465
0.00821 0.005 0.01678 0.01464 0.01535 0.01142 0.01607 0.01071 0.01142
0.00035 0.02313 0.00035 0.02071 0.00104 0.02589 0 0.03176 0.00173

UAC CAA CAG AAU AAC UGU UGC CAU CAC


0.02256 0.00301 0.03108 0.00401 0.02607 0.00251 0.01153 0.00501 0.02356
0.02917 0.00407 0.02374 0.00882 0.02917 0.00271 0.01628 0.00204 0.01967
0.01275 0.01522 0.02365 0.02962 0.01789 0.01625 0.01234 0.01604 0.01687
0.00888 0.01567 0.01253 0.02298 0.01358 0.00992 0.00888 0.00783 0.00679
0.01367 0.01502 0.01809 0.02738 0.01796 0.01082 0.00705 0.01174 0.00858
0.02825 0.01953 0.01251 0.03906 0.03546 0.01138 0.00948 0.00683 0.01043
0.01151 0.02465 0.01085 0.03452 0.01348 0.01545 0.0069 0.01512 0.00756
0.02321 0.01856 0.01107 0.0282 0.01535 0.01142 0.01821 0.01071 0.01464
0.02485 0.00138 0.01622 0.00138 0.01519 0.00035 0.01968 0.00173 0.01933
AAG CGU CGC CGA CGG AGA AGG GAU

0.0386 0.00401 0.00702 0.00401 0.00451 0.01303 0.03559 0.01003

0.03392 0.00136 0.00678 0.00136 0.00136 0.01696 0.03596 0.01221

0.03949 0.00864 0.00596 0.00926 0.00596 0.01974 0.02489 0.03126

0.04282 0.00627 0.00261 0.00261 0.00366 0.0141 0.01671 0.0376

0.03964 0.0095 0.00429 0.00578 0.00604 0.01494 0.01734 0.04148

0.00986 0.00398 0.00853 0.00322 0.00303 0.01593 0.00171 0.02427

0.02696 0.00888 0.00625 0.0069 0.00329 0.01315 0.00822 0.04011

0.02713 0.01499 0.01607 0.00714 0.00678 0.0125 0.01107 0.03534

0.00897 0.00207 0.0535 0.00311 0.02658 0.00207 0.00311 0.00414

GAC GAA GAG UAA UAG UGA

0.04612 0.01203 0.04361 0.00251 5.00E-04 0

0.04545 0.0156 0.0441 0.00271 0.00068 0

0.02036 0.02242 0.02468 0.00391 0 0.00144

0.01932 0.03029 0.03446 0.00261 0.00157 0

0.02483 0.03359 0.03679 0 0.00044 0.00131

0.02503 0.02825 0.0127 0.00133 0.00038 0.00209

0.01183 0.02663 0.02663 0.00033 0.00033 0

0.01571 0.03642 0.02785 0.00107 0.00036 0.00071

0.04556 0.00449 0.04867 0.00138 0.00035 0.00138


PROGRAM:
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

from sklearn.cluster import KMeans

data = pd.read_csv("D:\\ML EXP\\codon_usage.csv")
data

x = data.iloc[:, 1:3]  # first index for rows, second for columns
x

kmeans = KMeans(3)
kmeans.fit(x)

identified_clusters = kmeans.fit_predict(x)
identified_clusters
# displayed: array([1, 1, 0, 0, 0, 2])

data_with_clusters = data.copy()
data_with_clusters['Clusters'] = identified_clusters

# 'Longitude' and 'Latitude' are the column names from the tutorial this snippet
# is based on; substitute two feature columns of the codon dataset here.
plt.scatter(data_with_clusters['Longitude'], data_with_clusters['Latitude'],
            c=data_with_clusters['Clusters'], cmap='rainbow')
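
Step 1 of the algorithm asks for a value of K. A common way to choose it (a hedged sketch, not part of the original listing) is the elbow method: plot the within-cluster sum of squares (inertia) against K and pick the value at the bend of the curve.

# Elbow method: inertia for a range of candidate K values
wcss = []
for k in range(1, 10):
    km = KMeans(n_clusters=k, random_state=42)
    km.fit(x)
    wcss.append(km.inertia_)
plt.plot(range(1, 10), wcss, marker='o')
plt.xlabel('Number of clusters K')
plt.ylabel('WCSS (inertia)')
plt.show()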
OUTPUT:
RESULT:
Thus the Python program was implemented and verified successfully.
EXP NO: 6   IMPLEMENT THE NAÏVE BAYES CLASSIFIER USING THE GIVEN DATASET

DATE:
PROBLEM STATEMENT:

Implement the Naïve Bayes classifier using the given dataset.

AIM:

To implement the Naïve Bayes classifier using the given dataset for predicting the
results.

ALGORITHM:

STEP 1: Start the algorithm and implement the program.

STEP 2: Open the Jupyter notebook and activate the environment.

STEP 3: Create a new environment and rename it.

STEP 4: Install the necessary packages and libraries in the created environment on the
Jupyter notebook.

STEP 5: Data pre-processing step.

STEP 6: Fit Naive Bayes to the training set.

STEP 7: Predict the test result.

STEP 8: Test the accuracy of the result (creation of the confusion matrix).

STEP 9: Visualize the test set result (a sketch follows the program below).

STEP 10: Finish


INPUT DATASET:

User ID   Gender  Age  EstimatedSalary  Purchased

15624510  Male    19   19000            0
15810944  Male    35   20000            0
15668575  Female  26   43000            0
15603246  Female  27   57000            0
15804002  Male    19   76000            0
15728773  Male    27   58000            0
15598044  Female  27   84000            0
15694829  Female  32   150000           1
15600575  Male    25   33000            0
15727311  Female  35   65000            0
15570769  Female  26   80000            0
15606274  Female  26   52000            0
15746139  Male    20   86000            0
15704987  Male    32   18000            0
15628972  Male    18   82000            0
15697686  Male    29   80000            0
15733883  Male    47   25000            1
15617482  Male    45   26000            1
15704583  Male    46   28000            1


Program
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

dataset = pd.read_csv('dataset.csv')
x = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)

from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(x_train, y_train)

y_pred = classifier.predict(x_test)

from sklearn.metrics import confusion_matrix, accuracy_score
ac = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

from matplotlib.colors import ListedColormap
x_set, y_set = x_train, y_train
X1, X2 = nm.meshgrid(nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
                     nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))
mtp.contourf(X1, X2, classifier.predict(nm.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('purple', 'green')))
mtp.xlim(X1.min(), X1.max())
mtp.ylim(X2.min(), X2.max())
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c=ListedColormap(('purple', 'green'))(i), label=j)
mtp.title('Naive Bayes (Training set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()
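
For Step 9 of the algorithm (visualizing the test set result, which the listing above does only for the training set), a mirrored plot can be produced by swapping in the test data; a sketch under that assumption:

# Same decision-boundary plot, on the test set instead of the training set
x_set, y_set = x_test, y_test
X1, X2 = nm.meshgrid(nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
                     nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))
mtp.contourf(X1, X2, classifier.predict(nm.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('purple', 'green')))
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c=ListedColormap(('purple', 'green'))(i), label=j)
mtp.title('Naive Bayes (Test set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()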
OUTPUT:

confusion matrix (cm) –
ac = 0.9125
RESULT:
Thus the Python program was implemented and verified successfully.
Ex.No : 7   PROJECT
Date:       Brain Tumour Detection Using Machine Learning Algorithm
Abstract

The development of aberrant brain cells, some of which may turn cancerous, is known as a brain tumour.
Magnetic Resonance Imaging (MRI) scans are the most common technique for finding brain tumours.
Information about the aberrant tissue growth in the brain is discernible from the MRI scans. In numerous
research papers, machine learning and deep learning algorithms are used to detect brain tumours. When
these algorithms are applied to MRI images, predicting a brain tumour takes very little time, and the better
accuracy makes it easier to treat patients. These predictions allow the radiologist to make speedy decisions.
In the proposed work, a self-defined Artificial Neural Network (ANN) and a Convolutional Neural Network
(CNN) are used to detect the presence of a brain tumour, and their performance is analyzed.

Keywords: Convolutional Neural Network, Machine Learning, Brain tumor, Algorithms

Introduction

The brain is one of the most crucial parts of the human body, since it regulates the operation of all other
organs and aids in decision-making. It is essentially the central nervous system's command post and is in
charge of carrying out the body's regular voluntary and involuntary functions. A tumour is an uncontrolled
proliferation of undesirable tissue that forms a fibrous mesh inside the brain. A brain tumour is identified in
roughly 3,540 children under the age of 15 each year. To effectively prevent and treat the condition, it is
crucial to have a thorough grasp of brain tumours and their stages. ANN and CNN are used here in the
classification of normal and tumour brains.

An ANN (Artificial Neural Network) works like the human brain's nervous system: a digital computer with a
large number of interconnections forms a neural network that is trained using simple processing units applied
to the training set, storing experiential knowledge. It has different layers of neurons which are connected
together. The neural network acquires knowledge by applying a learning process to the data set. There is one
input and one output layer, whereas there may be any number of hidden layers. In the learning process, a
weight and bias are added to the neurons of each layer depending upon the input features and the previous
layers (for hidden layers and output layers). A model is trained based on the activation function applied to the
input features and to the hidden layers, where most of the learning happens, to achieve the expected output.

Existing Methodology
Brain tumours are detected using image processing techniques. Various algorithms have been used
to arrive at the best results. Among them, a probabilistic neural network has been used for higher
productivity along with SVM and KNN techniques. Segmentation plays a major role in detecting
brain tumours.
Pre- Processing

Pre-processing techniques aim to enhance the image without changing its information
content. The primary cause of image flaws is low image quality.

Segmentation

Region growing is a simple region-based image segmentation method. It is also
classified as a pixel-based image segmentation method since it involves the selection of
initial seed points. This approach to segmentation examines the neighbouring pixels of
the initial seed points and determines whether those neighbours should be added to the
region. The process is then iterated, in the same manner as general data clustering
algorithms. A general outline of the region growing algorithm is sketched below.
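
As a simple illustration of the idea (a sketch added for clarity, not code from the original report), a basic 4-neighbour region growing from a single seed pixel can be written as:

import numpy as np
from collections import deque

def region_grow(img, seed, threshold=10):
    """Grow a region from `seed` by adding neighbours whose intensity is
    within `threshold` of the seed intensity (4-connectivity)."""
    h, w = img.shape
    seed_val = int(img[seed])
    mask = np.zeros((h, w), dtype=bool)
    queue = deque([seed])
    mask[seed] = True
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and not mask[nr, nc] \
               and abs(int(img[nr, nc]) - seed_val) <= threshold:
                mask[nr, nc] = True
                queue.append((nr, nc))
    return mask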
Convolutional Neural Network

Convolutional Neural Networks (CNNs) are easier to train and less liable to overfitting.
As mentioned earlier in the report, we use a patch-based segmentation approach, with the
convolutional architecture and implementation carried out using CAFFE. CNNs are a
continuation of the multi-layer perceptron (MLP). In the MLP, a unit performs a simple
computation by taking the weighted sum of all other units that serve as its input, and the
network is organized into layers, each unit taking input from the previous layer. The
essence of CNNs is the convolution. The main trick by which convolutional neural
networks avoid the problem of too many parameters is shared, sparse connections: each
unit is not connected to every other unit in the previous layer.
Proposed Methodology

The two techniques ANN and CNN are applied on the brain tumor dataset and their
performance on classifying the image is analyzed. Steps followed in applying ANN on the
brain tumor dataset are
1. Import the needed packages
2. Import the data folder
3. Read the images, provide the labels for the image (Set Image having Brain Tumor as
1 and image not having brain tumor as 0) and store them in the Data Frame.
4. Resize each image to 256x256 while reading the images one by one.
5. Normalize the image
6. Split the data set into train, validation and test sets
7. Create the model
8. Compile the model
9. Apply the model on the train set.
10. Evaluate the model by applying it on the test set.
The ANN model used here has seven layers. The first layer is the flatten layer, which
converts the 256x256x3 images into a single-dimensional array. The next five layers are
dense layers with relu as the activation function and 128, 256, 512, 256 and 128 neurons
respectively. These five layers act as the hidden layers, and the last dense layer, with
sigmoid as its activation function, is the output layer, with 1 neuron representing the
two classes.
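
A minimal Keras sketch consistent with this description (an illustration under the stated architecture, assuming TensorFlow/Keras; not the exact project code):

from tensorflow import keras

# Seven-layer ANN as described: flatten, five relu dense layers, sigmoid output
ann = keras.Sequential([
    keras.layers.Flatten(input_shape=(256, 256, 3)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(256, activation='relu'),
    keras.layers.Dense(512, activation='relu'),
    keras.layers.Dense(256, activation='relu'),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid'),
])
ann.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])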
The model is compiled with the adam optimization technique and the binary cross-entropy
loss function. The model is generated and trained by providing the training images and the
validation images. Once the model is trained, it is tested using the test image set. Next, the
same dataset is given to the CNN technique. The steps followed in applying CNN on the
brain tumor dataset are:
1. Import the needed packages
2. Import the data folder (Yes and No)
3. Set the class labels for the images (1 for Brain Tumor and 0 for No Brain Tumor)
4. Convert the images into shape (256x256)
5. Normalize the image
6. Split the images into the train, validation and test set images.
7. Create the sequential model.
8. Compile the model.
9. Apply it on the train dataset (use the validation set to evaluate the training performance).
10. Evaluate the model using the test images.
11. Plot the graph comparing the training and validation accuracy.
12. Draw the confusion matrix for the actual output against the predicted output.

The CNN sequential model is generated by implementing different layers. The input
image is reshaped into 256x256. The convolve layer is applied on the input image with
relu as the activation function and padding set to 'same', which means the output image
has the same size as the input image; the numbers of filters are 32, 32, 64, 128 and 256
for the different convolve layers. Max pooling is applied with a 2x2 window size, and the
dropout function is called with 20% dropout. The flatten method is applied to convert the
features into a one-dimensional array. The fully connected layer is created by calling the
dense method with 256 units and relu as the activation function. The output layer has 1
unit to represent the two classes and sigmoid as its activation function. The architecture
of the CNN model is shown in the figure below. The implementation is done in the Python
language and executed in Google Colab. The model is trained for 200 epochs with the
training and validation datasets. The history of execution is stored and plotted to
understand the models generated.

[Figure: Architecture of the CNN model. Model = Sequential(); five Convolve(3x3, relu, padding=same) layers with 32, 32, 64, 128 and 256 filters, each followed by Max Pooling(2x2) and Dropout(0.2); then Flatten(), a fully connected layer and the output layer; optimizer = "adam".]

Figure. Architecture of CNN model
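
A hedged Keras sketch of this architecture (layer order and hyper-parameters taken from the text above, assuming TensorFlow/Keras; the binary cross-entropy loss follows from the single sigmoid output; an illustration, not the exact project code):

from tensorflow import keras

cnn = keras.Sequential()
cnn.add(keras.layers.Input(shape=(256, 256, 3)))
# Five convolve blocks with 32, 32, 64, 128 and 256 filters, each followed
# by 2x2 max pooling and 20% dropout, as described in the text
for filters in (32, 32, 64, 128, 256):
    cnn.add(keras.layers.Conv2D(filters, (3, 3), activation='relu', padding='same'))
    cnn.add(keras.layers.MaxPooling2D((2, 2)))
    cnn.add(keras.layers.Dropout(0.2))
cnn.add(keras.layers.Flatten())
cnn.add(keras.layers.Dense(256, activation='relu'))      # fully connected layer
cnn.add(keras.layers.Dense(1, activation='sigmoid'))     # output layer, two classes
cnn.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])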

DataSet
The dataset is taken from a GitHub repository and contains MRI images of brain
tumours. The figure shows a sample normal image and a sample brain tumour image. Out
of 1672 training images, 877 are tumour images and 795 are non-tumour images. The 186
validation images comprise 92 tumour and 94 non-tumour images. Among the 207 testing
images, 116 are tumour images and 91 are non-tumour images.
Experimental Result Analysis

Comparing training/validation accuracy and loss of ANN model

Comparing training/validation accuracy and loss of CNN model

Conclusion:
CNN is considered one of the best techniques for analyzing image
datasets. The CNN makes its prediction by reducing the size of the image without
losing the information needed for making predictions. The ANN model generated here
produces 65.21% testing accuracy, and this can be increased by providing
more image data. The same can be done by applying image augmentation
techniques and then analyzing the performance of the ANN and CNN. The model
developed here was arrived at by trial and error. In future, optimization techniques
can be applied to decide the number of layers and filters to be used in a model. As
of now, for the given dataset, CNN proves to be the better technique for predicting
the presence of a brain tumour.
