AI Lab
SUBMITTED TO:
MR. AIZAZ
SUBMITTED BY:
M. HAIDER AKHTAR
(2020-CS-657)
MARIA HASSAN
(2020-CS-693)
INTRODUCTION
Autocorrect is a word-processing feature that identifies misspelled words, uses
algorithms to find the words most likely to have been intended, and edits the text
accordingly. When a typed word is not in the dictionary, software typically
underlines it in red. When the user enters a misspelling, our software shows a
list of suggested correct spellings.
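The dictionary lookup and suggestion list described above can be sketched in a few lines. This is a minimal illustration rather than the project's implementation: the word list DICTIONARY and the helper suggest are hypothetical names, and the similarity matching relies on get_close_matches from Python's standard difflib module.

```python
from difflib import get_close_matches

# Hypothetical mini-dictionary; a real tool would load a full word list.
DICTIONARY = ["spelling", "correct", "python", "language", "software", "dictionary"]


def suggest(word, dictionary=DICTIONARY, max_suggestions=3):
    """Return [] if the word is in the dictionary, else the closest dictionary words."""
    word = word.lower()
    if word in dictionary:
        return []  # spelled correctly, nothing to suggest
    # cutoff=0.6 is difflib's default similarity threshold
    return get_close_matches(word, dictionary, n=max_suggestions, cutoff=0.6)


print(suggest("pyhton"))  # misspelled word: suggestions are returned
print(suggest("python"))  # known word: no suggestions
```

A word found in the dictionary returns an empty list, which corresponds to the case where nothing is underlined.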
OBJECTIVE
The main objective of this project is to create a tool, written in Python, that helps
users correct their spelling. As the name suggests, autocorrect fixes misspellings
as the user types, within seconds. Our application lets users do this in real time.
DESCRIPTION
Autocorrect is an application of AI that we use every day. It identifies mistakes, uses
algorithms to determine the correct words, and edits the text accordingly. Autocorrect
works much like an auto-suggestion keyboard, which completes the words you want to
type; the difference is that autocorrect fixes misspellings as you type. It makes our
lives easier by taking care of spelling mistakes. A good example is the autocorrect
feature of the Microsoft Word editor.
FUNCTIONALITY
K NEAREST NEIGHBOR:
Code:
import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.neighbors import KNeighborsClassifier

# Folder that contains diabetes.csv
os.chdir(r'C:\Users\2020c\Downloads')
data = pd.read_csv("diabetes.csv")
# print(data.head())

# Two input features and the class label (Outcome)
glucose = np.array(data['Glucose'])
blood_pressure = np.array(data['BloodPressure'])
label = np.array(data['Outcome'])

# Optional scatter plot of the two features, coloured by outcome
# sns.relplot(x=glucose, y=blood_pressure, hue=label, s=100, data=data)
# plt.show()

# Pair the features up as (glucose, blood pressure) samples
features = list(zip(glucose, blood_pressure))

# Create the model with k = 3 neighbours
model = KNeighborsClassifier(n_neighbors=3)
# Train the model on the whole dataset
model.fit(features, label)
# Predict the outcome for glucose = 120, blood pressure = 69
predicted = model.predict([[120, 69]])
print(predicted)
Screenshot:
EXPLANATION:
The KNN algorithm uses a number k to decide the class to which a new item belongs.
When a new item is presented, the algorithm calculates its distance to every point in
the dataset, picks the k nearest neighbours, and assigns the item to the class that
holds the most of those neighbours. In the example above, k is 3 and the model
predicts whether a person is diabetic or not, given their glucose and blood pressure
readings. For the values we took, 120 for glucose and 69 for blood pressure, the
model classifies the person as non-diabetic.
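To make the voting step concrete, the following is a minimal from-scratch sketch of the same idea, assuming Euclidean distance and a simple majority vote. The function knn_predict and the six sample readings are made up for illustration. They are not taken from diabetes.csv, and this is not how scikit-learn implements KNeighborsClassifier internally.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest points and count their labels
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up (glucose, blood pressure) readings with 0/1 labels
points = [(85, 66), (89, 66), (116, 74), (183, 64), (168, 74), (197, 70)]
labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(points, labels, (120, 69)))
```

With these sample points, the three nearest neighbours of (120, 69) all carry label 0, so the query is assigned to class 0.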
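import numpy as np
import pandas as pd
from sklearn.model_selection import (KFold, cross_val_predict,
                                     cross_val_score, train_test_split)
from sklearn.neighbors import KNeighborsClassifier

data = pd.read_csv("./diabetes.csv")
# One feature (BMI) as a column vector; labels stay one-dimensional
X = np.array(data['BMI']).reshape(-1, 1)
y = np.array(data['Outcome'])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
cv = KFold(n_splits=10, random_state=1, shuffle=True)
# Build the KNN classifier
model = KNeighborsClassifier(n_neighbors=3)
mymodel = model.fit(X_train, y_train)
# Use 10-fold CV to evaluate the model on the training split
scores = cross_val_score(mymodel, X_train, y_train, cv=cv)
print("10-Fold Validation for KNN:")
print(np.mean(scores))
# Out-of-fold predictions and accuracy scores on the held-out split
pred = cross_val_predict(model, X_test, y_test, cv=cv)
score_test = cross_val_score(model, X_test, y_test, cv=cv)
print('Score:')
print(score_test)
print("Mean Score:")
print(np.mean(score_test))
# Mean squared error; for 0/1 labels this equals the misclassification rate
mse_scores = cross_val_score(model, X_train, y_train, cv=cv,
                             scoring='neg_mean_squared_error')
mse = np.mean(np.absolute(mse_scores))
print("MSE:")
print(mse)
# Root mean squared error
print("RMSE:")
print(np.sqrt(mse))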
Result:
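import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

os.chdir(r'C:\Users\2020c\Downloads')
data = pd.read_csv("diabetes.csv")
print(data.head())
f1 = np.array(data['Age'])
f2 = np.array(data['BMI'])
# The plotting call is missing from the original; a scatter plot is assumed
plt.scatter(f1, f2)
plt.show()
features = list(zip(f1, f2))
# Elbow method: the original loop is missing; k = 1..10 is an assumed range
wcss_list = []
for k in range(1, 11):
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
    kmeans.fit(features)
    wcss_list.append(kmeans.inertia_)
print(wcss_list)
plt.plot(range(1, 11), wcss_list)
plt.xlabel('Number of clusters(k)')
plt.ylabel('wcss_list')
plt.show()
# Final model with k = 2, the value chosen in the explanation
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(features)
print(kmeans.labels_)
print(kmeans.n_iter_)
print(kmeans.cluster_centers_)
# Assumed plot: the points coloured by their assigned cluster
plt.scatter(f1, f2, c=kmeans.labels_)
plt.show()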
screenshot:
Explanation:
K-Means Clustering Algorithm
K-means is a clustering algorithm and one of the simplest and most popular
unsupervised machine learning (ML) algorithms. In this example the graph suggests
that the number of clusters should be 3, as the bend (elbow) in the curve starts at 3.
The algorithm repeatedly assigns each point to its nearest centroid and recomputes
each of the k centroids as the mean of its assigned points, until the assignments stop
changing.
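The quantity plotted in an elbow graph is the within-cluster sum of squares (WCSS). The sketch below shows, under simplifying assumptions, how it could be computed with a plain-Python k-means. The functions kmeans and wcss and the six (age, BMI) pairs are hypothetical, and the centroids are seeded with the first k points instead of the k-means++ initialisation that scikit-learn uses by default.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain k-means (Lloyd's algorithm); the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]  # an empty cluster keeps its centroid
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids


def wcss(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)


# Made-up (age, BMI) pairs forming three loose groups
points = [(22, 21.0), (45, 30.5), (63, 36.0), (24, 22.5), (47, 31.0), (61, 35.0)]

wcss_list = [wcss(points, kmeans(points, k)) for k in range(1, 5)]
print(wcss_list)
```

On this toy data the WCSS falls sharply up to k=3 and only slightly afterwards, which is the kind of bend the elbow method looks for.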
However, the elbow method does not always give the optimal answer; it gives a
heuristic value to consider. If we take k=3 in this example we get an accuracy of 0.35,
but if we use k=2 we get an accuracy of 0.71. That is why we used k=2 even though
the elbow method suggested k=3.
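The report does not state how these accuracy values were computed. K-means numbers its clusters arbitrarily, so one possible approach, shown here only as an assumption, is to map each cluster to the most common true label among its members before comparing with the Outcome column. The function cluster_accuracy and the sample values are hypothetical.

```python
from collections import Counter


def cluster_accuracy(cluster_ids, true_labels):
    """Map each cluster to its most common true label, then score the match."""
    members = {}
    for cluster, label in zip(cluster_ids, true_labels):
        members.setdefault(cluster, []).append(label)
    # Majority label inside each cluster
    majority = {c: Counter(lbls).most_common(1)[0][0] for c, lbls in members.items()}
    hits = sum(majority[c] == y for c, y in zip(cluster_ids, true_labels))
    return hits / len(true_labels)


# Made-up cluster assignments and outcomes, for illustration only
cluster_ids = [0, 0, 1, 1, 1, 0, 1, 0]
outcomes = [1, 1, 0, 0, 1, 1, 0, 0]
print(cluster_accuracy(cluster_ids, outcomes))
```

Comparing raw cluster numbers directly with the labels would instead depend on which number each cluster happened to receive, so a score obtained that way can change between runs.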