
Department of Computer Engineering

Academic Term: Jan-April 2022


Rubrics for Mini Project

Class: B.E. Computer
Subject Name: NLP

Semester: VIII
Subject Code: DLO8012

Practical No:

Title: Mini project code and Output

Date of Performance: 15-04-2022

Roll No: 8762, 8764, 8767

Name of the Student: Shorn Correia, Dunstan D’souza, Mathew Zachariah

Rubric for Mini Project

| Indicator | Very Poor | Poor | Average | Good | Excellent |
|---|---|---|---|---|---|
| Timeline: Maintains project deadline (2) | Project not done (0) | More than two sessions late (0.5) | Two sessions late (1) | One session late (1.5) | Early or on time (2) |
| Completeness: Complete all parts of project (2) | N/A | < 40% complete (0.5) | ~ 60% complete (1) | ~ 80% complete (1.5) | 100% complete (2) |
| Application design (4) | Design aspects are not used (0) | Poorly designed (1) | Project with limited functionalities (2) | Working project with good design (3) | Working project with good design and advanced techniques are used (4) |
| Presentation (2) | Not submitted report (0) | Poorly written and poorly kept report (0.5) | Report with major mistakes (1) | Report with less than 3-4 mistakes (1.5) | Well written accurate report (2) |

Total marks: Signature of Teacher:
Code:

import pandas as pd
import numpy as np
import re
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.simplefilter("ignore")

# Loading the dataset
data = pd.read_csv("/Language Detection.csv")
# value count for each language
data["Language"].value_counts()
# separating the independent and dependent features
X = data["Text"]
y = data["Language"]

# converting categorical variables to numerical
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y = le.fit_transform(y)
# creating a list for appending the preprocessed text
data_list = []

# iterating through all the text
for text in X:
    # removing the symbols, newlines and numbers
    # (\n restored: a bare "n" in the class would strip every letter n)
    text = re.sub(r'[!@#$(),\n"%^*?:;~`0-9]', ' ', text)
    # removing square brackets (escaped so each bracket matches on its own)
    text = re.sub(r'[\[\]]', ' ', text)
    # converting the text to lower case
    text = text.lower()
    # appending to data_list
    data_list.append(text)

# creating bag of words using CountVectorizer
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer()
X = cv.fit_transform(data_list).toarray()

# train test splitting
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.20)

# model creation and prediction
from sklearn.naive_bayes import MultinomialNB
model = MultinomialNB()
model.fit(x_train, y_train)

# prediction
y_pred = model.predict(x_test)

# model evaluation
from sklearn.metrics import accuracy_score, confusion_matrix
ac = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

# visualising the confusion matrix
plt.figure(figsize=(15,10))
sns.heatmap(cm, annot = True)
plt.show()

# function for predicting language
def predict(text):
    x = cv.transform([text]).toarray()
    lang = model.predict(x)
    lang = le.inverse_transform(lang)
    print("The language is in", lang[0])

# English
predict("Vidhya provides a community based knowledge portal for Analytics and Data Science professionals")
# French
predict("fournit un portail de connaissances basé sur la communauté pour les professionnels de l'analyse et de la science des données")
# Arabic
predict("توفر بوابة معرفية قائمة على المجتمع لمحترفي التحليلات وعلوم البيانات")
# Spanish
predict("proporciona un portal de conocimiento basado en la comunidad para profesionales de Analytics y Data Science.")
# Malayalam
predict("അനലിറ്റിക്സ്, ഡാറ്റാ സയൻസ് പ്രൊഫഷണലുകൾക്കായി കമ്മ്യൂണിറ്റി അധിഷ്ഠിത വിജ്ഞാന പോർട്ടൽ അനലിറ്റിക്സ് വിദ്യ നൽകുന്നു")
# German
predict("Berlin ist die Hauptstadt von Deutschland..")
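
The predict function in the listing vectorizes the raw input string, while the training text first goes through the regex cleanup. One way to keep the two paths identical is to hand the cleanup to CountVectorizer as its preprocessor and chain it with the classifier in a scikit-learn pipeline. The sketch below is a minimal illustration under assumptions, not the project's code: the names clean_text and predict_language are hypothetical, the two substitutions are merged into one pattern, and the nine-sentence corpus is a toy stand-in for the Text and Language columns of the dataset.

```python
import re

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def clean_text(text):
    """Replace symbols, digits, newlines and square brackets with spaces, then lowercase."""
    text = re.sub(r'[!@#$(),\n"%^*?:;~`0-9\[\]]', " ", text)
    return text.lower()


# Hypothetical toy corpus (assumption): stands in for data["Text"] / data["Language"].
texts = [
    "the cat sits on the mat",
    "where is the train station",
    "this is a good book",
    "le chat est sur le tapis",
    "où est la gare",
    "c'est un bon livre",
    "el gato está en la alfombra",
    "dónde está la estación de tren",
    "este es un buen libro",
]
labels = ["English"] * 3 + ["French"] * 3 + ["Spanish"] * 3

# CountVectorizer calls clean_text on every document, both when fitting
# and when predicting, so new text is cleaned the same way as training text.
pipeline = make_pipeline(CountVectorizer(preprocessor=clean_text), MultinomialNB())
pipeline.fit(texts, labels)


def predict_language(text):
    """Return the predicted language label for a single string."""
    return pipeline.predict([text])[0]


print(predict_language("the book is on the mat"))
print(predict_language("le livre est sur le tapis"))
```

Because MultinomialNB accepts string labels directly, this sketch does not need the LabelEncoder step. The pipeline also keeps the count matrix sparse instead of calling toarray(), which may matter for memory on the full dataset.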

Output:
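
The listing computes the accuracy and the confusion matrix but only plots the matrix, so a text summary may be useful alongside the heatmap. The sketch below shows one way to print such a summary with scikit-learn. The label lists y_true and y_hat are hypothetical placeholders, not results from this project; in the listing they would correspond to le.inverse_transform(y_test) and le.inverse_transform(y_pred).

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hypothetical decoded labels (assumption): placeholders for the project's
# actual test labels and predictions, chosen only to illustrate the calls.
y_true = ["English", "English", "French", "French", "Spanish", "Spanish"]
y_hat = ["English", "English", "French", "Spanish", "Spanish", "Spanish"]

class_names = ["English", "French", "Spanish"]

# Fraction of samples whose predicted label matches the true label.
ac = accuracy_score(y_true, y_hat)

# Rows are true classes, columns are predicted classes, in class_names order.
cm = confusion_matrix(y_true, y_hat, labels=class_names)

print(f"Accuracy: {ac:.3f}")
print(cm)

# Per-class precision, recall and F1 score.
print(classification_report(y_true, y_hat, zero_division=0))
```

Passing the labels argument fixes the row and column order of the matrix, which makes the printed values easier to match against the heatmap axes.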

Google Colab:
https://colab.research.google.com/drive/1TLzff491CkMB_mblEJ5He3H0LY6SaYq?usp=sharing
