
AI LAB

SUBMITTED TO:
MR. AIZAZ

SUBMITTED BY:
M. HAIDER AKHTAR
(2020-CS-657)
MARIA HASSAN
(2020-CS-693)

UNIVERSITY OF ENGINEERING AND TECHNOLOGY, NEW CAMPUS


REPORT

INTRODUCTION
Autocorrect is a word processing feature that identifies misspelled words, uses
algorithms to determine the words most likely to have been intended, and edits the
text accordingly. When a word that is not in the dictionary is typed, software will
typically underline it in red. When the user enters a misspelling, our software shows
a list of suggested correct spellings.

OBJECTIVE
Our main objective in this project is to create a tool or application, using the Python
language, that helps users correct their spelling. As the name autocorrect suggests,
the functionality speaks for itself: as the user types a misspelled word, it is corrected
within seconds. Our application lets the user do this in real time.

DESCRIPTION
Autocorrect is an application of AI that we use every day. It identifies mistakes, uses
algorithms to determine the correct words, and edits the text accordingly. Autocorrect
works similarly to an auto-suggestion keyboard by completing the words you want to
type; however, this time it corrects misspellings as you type. It makes our lives easier
by taking care of spelling mistakes. A good example is the Microsoft Word editor
with its autocorrect feature.
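
To illustrate the idea concretely, the following is a minimal sketch (not the code
submitted for this lab) of how spelling suggestions can be generated with Python's
standard difflib module; the word-list file words.txt is an assumed placeholder.

import difflib

# Assumed word list: one correctly spelled word per line (placeholder file name).
with open("words.txt") as f:
    dictionary = [line.strip().lower() for line in f if line.strip()]

def suggest(word, n=5):
    # Return up to n dictionary words that are closest to the misspelled input.
    return difflib.get_close_matches(word.lower(), dictionary, n=n, cutoff=0.6)

print(suggest("speling"))  # e.g. ['spelling', ...] depending on the word list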

FUNCTIONALITY
 K NEAREST NEIGHBOR:
Code:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.neighbors import KNeighborsClassifier
import os

os.chdir(r'C:\Users\2020c\Downloads')

data = pd.read_csv("diabetes.csv")
# print(data.head())
h = np.array(data['Glucose'])
w = np.array(data['BloodPressure'])
label = np.array(data['Outcome'])

# Plot (uncomment to visualise the two features coloured by outcome)
# sns.relplot(x=h, y=w, hue=label, s=100, data=data)
# plt.show()

# Pair the two feature columns into (glucose, blood pressure) tuples
features = list(zip(h, w))
# Build the model with k = 3 neighbours
model = KNeighborsClassifier(n_neighbors=3)
# Train the model using the training set
model.fit(features, label)
# Predict the outcome for glucose = 120, blood pressure = 69
predicted = model.predict([[120, 69]])
print(predicted)
Screenshot:

EXPLANATION:
The KNN algorithm uses a number k to decide which class a new item belongs to.
When a new item is added to the dataset, the algorithm calculates its distance to the
existing points, takes the k nearest neighbours, and assigns the item to whichever
class holds the majority among those neighbours. In the example above, the value of
k is 3 and the model predicts whether a person is diabetic or not; in other words, we
estimate whether the person has diabetes given the measured values. With 120 for
glucose and 69 for blood pressure, the model classifies the person as non-diabetic.
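
To make the voting step concrete, the following sketch (our own illustration, not part
of the submitted script) repeats the prediction for the same query point by hand: it
measures the Euclidean distance from [120, 69] to every row, keeps the three nearest,
and takes the majority Outcome among them.

import numpy as np
import pandas as pd
from collections import Counter

data = pd.read_csv("diabetes.csv")
X = data[['Glucose', 'BloodPressure']].to_numpy()
y = data['Outcome'].to_numpy()

query = np.array([120, 69])
# Euclidean distance from the query point to every sample in the dataset.
distances = np.sqrt(((X - query) ** 2).sum(axis=1))
# Indices of the 3 nearest neighbours.
nearest = np.argsort(distances)[:3]
# Majority vote among their labels decides the predicted class.
vote = Counter(y[nearest]).most_common(1)[0][0]
print("3 nearest labels:", y[nearest], "-> predicted class:", vote)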

PERFORMANCE METRICS FOR KNN:
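The metric values themselves were captured as screenshots; as a sketch of how such
figures can be produced for the model above, the snippet below evaluates KNN on a
held-out split with sklearn.metrics. The 70/30 split and random_state are our
assumptions, not taken from the original run.

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import pandas as pd

data = pd.read_csv("diabetes.csv")
X = data[['Glucose', 'BloodPressure']]
y = data['Outcome']

# Assumed 70/30 split; the report does not state the split that was used.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, pred))
print("Confusion matrix:\n", confusion_matrix(y_test, pred))
print(classification_report(y_test, pred))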


Result:
10-fold cross validation for KNN:
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import cross_val_predict
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier
from numpy import mean
from numpy import absolute
from numpy import sqrt
import pandas as pd
import numpy as np

data = pd.read_csv("./diabetes.csv")
X = np.array(data['BMI']).reshape(-1, 1)  # single feature, reshaped to a column
y = np.array(data['Outcome'])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

cv = KFold(n_splits=10, random_state=1, shuffle=True)

# Build KNN classifier with k = 3 neighbours
model = KNeighborsClassifier(n_neighbors=3)
mymodel = model.fit(X_train, y_train)

# Use 10-fold cross validation to evaluate the model on the training set
scores = cross_val_score(mymodel, X_train, y_train, cv=cv)  # n_jobs=-1
print("10-Fold Validation for KNN:")
print(np.mean(scores))

# Cross-validated predictions and scores on the held-out test set
pred = cross_val_predict(model, X_test, y_test)
score_test = cross_val_score(model, X_test, y_test, cv=10)
print('Score:')
print(score_test)
print("Mean Score:")
print(np.mean(score_test))

# Mean and root of the absolute cross-validation scores (labelled MSE/RMSE below)
print("MSE:")
print(mean(absolute(scores)))
print("RMSE:")
print(sqrt(mean(absolute(scores))))

Result:

 K MEANS CLUSTER ALGORITHM:


Code:

import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
import os

os.chdir(r'C:\Users\2020c\Downloads')

data = pd.read_csv("diabetes.csv")
print(data.head())

f1 = np.array(data['Age'])
f2 = np.array(data['BMI'])

# Plot the two features
sns.relplot(x=f1, y=f2, data=data)
plt.show()

# Finding the optimal number of clusters using the elbow method
features = list(zip(f1, f2))
wcss_list = []  # Initializing the list for the values of WCSS

# Using a for loop for iterations from 1 to 10
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', random_state=12)  # init='random'
    kmeans.fit(features)
    wcss_list.append(kmeans.inertia_)  # Sum of squared distances of samples to their closest cluster center

print(wcss_list)
plt.plot(range(1, 11), wcss_list)
plt.title('The Elbow Method Graph')
plt.xlabel('Number of clusters (k)')
plt.ylabel('wcss_list')
plt.show()

# Train the model
kmeans = KMeans(n_clusters=3, init='k-means++', random_state=16)
kmeans.fit(features)
print(kmeans.labels_)
print(kmeans.inertia_)  # within-cluster sum of squares
print(kmeans.n_iter_)
print(kmeans.cluster_centers_)

# Plot the clusters
sns.relplot(x=f1, y=f2, s=100, data=data, style=kmeans.labels_)
plt.show()
Screenshot:

Explanation:
K-means is a clustering algorithm, one of the simplest and most popular
unsupervised machine learning (ML) algorithms used by data scientists. In this
example the elbow graph suggests that the number of clusters should be 3, since the
bend starts at 3. The algorithm recalculates the cluster centroids as mean values over
a number of iterations.
However, the elbow method does not always give the optimal answer; it only
provides a heuristic to consider. If we take k = 3 in this example we get an accuracy
of 0.35, but if we use k = 2 we get an accuracy of 0.71. That is why we used k = 2
even though the elbow method suggested k = 3.
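
The report does not show how those accuracy values were computed; one plausible
way, sketched below and not taken from the original code, is to map each K-means
cluster to the majority Outcome label inside it and score the mapped labels against
the true ones, so the numbers here may differ from 0.35 and 0.71.

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score

data = pd.read_csv("diabetes.csv")
features = data[['Age', 'BMI']].to_numpy()
true_labels = data['Outcome'].to_numpy()

def clustering_accuracy(k):
    km = KMeans(n_clusters=k, init='k-means++', random_state=16, n_init=10)
    clusters = km.fit_predict(features)
    # Map each cluster id to the majority true label found inside that cluster.
    mapped = np.zeros_like(clusters)
    for c in range(k):
        mask = clusters == c
        mapped[mask] = np.bincount(true_labels[mask]).argmax()
    return accuracy_score(true_labels, mapped)

print("k=2 accuracy:", clustering_accuracy(2))
print("k=3 accuracy:", clustering_accuracy(3))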

PERFORMANCE METRICS FOR K MEANS CLUSTER:
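As with KNN, the metric screenshots are not reproduced here. For clustering, a
common check, sketched below on the same Age/BMI features as an assumption
rather than the original code, is the silhouette score alongside the within-cluster sum
of squares already printed by kmeans.inertia_.

import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

data = pd.read_csv("diabetes.csv")
features = data[['Age', 'BMI']].to_numpy()

for k in (2, 3):
    km = KMeans(n_clusters=k, init='k-means++', random_state=16, n_init=10)
    labels = km.fit_predict(features)
    # Silhouette ranges from -1 to 1; higher means better separated clusters.
    print(f"k={k}: inertia={km.inertia_:.1f}, silhouette={silhouette_score(features, labels):.3f}")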
