Rubrics For Mini Project

Department of Computer Engineering
Academic Term: Jan-April 2022

Class: B.E. Computer
Subject Name: NLP
Semester: VIII
Subject Code: DLO8012
Practical No:
Title: Mini project code and Output
Date of Performance: 15-04-2022
Roll No: 8762, 8764, 8767
Name of the Student: Shorn Correia, Dunstan D’souza, Mathew Zachariah

Rubric for Mini Project
| Indicator | Very Poor | Poor | Average | Good | Excellent |
|---|---|---|---|---|---|
| Timeline: Maintains project deadline (2) | Project not done (0) | More than two sessions late (0.5) | Two sessions late (1) | One session late (1.5) | Early or on time (2) |
| Completeness: Complete all parts of project (2) | N/A | < 40% complete (0.5) | ~ 60% complete (1) | ~ 80% complete (1.5) | 100% complete (2) |
| Application design (4) | Design aspects are not used (0) | Poorly designed (1) | Project with limited functionalities (2) | Working project with good design (3) | Working project with good design and advanced techniques are used (4) |
| Presentation (2) | Not submitted report (0) | Poorly written and poorly kept report (0.5) | Report with major mistakes (1) | Report with less than 3-4 mistakes (1.5) | Well written accurate report (2) |
Total marks: Signature of Teacher:
Code:
import pandas as pd
import numpy as np
import re
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.simplefilter("ignore")

# Loading the dataset
data = pd.read_csv("/Language Detection.csv")

# value count for each language
data["Language"].value_counts()

# separating the independent and dependent features
X = data["Text"]
y = data["Language"]

# converting categorical variables to numerical
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y = le.fit_transform(y)

# creating a list for appending the preprocessed text
data_list = []

# iterating through all the text
for text in X:
    # removing the symbols, newlines and numbers
    text = re.sub(r'[!@#$(),\n"%^*?:;~`0-9]', ' ', text)
    # removing square brackets
    text = re.sub(r'[\[\]]', ' ', text)
    # converting the text to lower case
    text = text.lower()
    # appending to data_list
    data_list.append(text)

# creating bag of words using CountVectorizer
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer()
X = cv.fit_transform(data_list).toarray()

# train test splitting
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.20)

# model creation and prediction
from sklearn.naive_bayes import MultinomialNB
model = MultinomialNB()
model.fit(x_train, y_train)

# prediction
y_pred = model.predict(x_test)

# model evaluation
from sklearn.metrics import accuracy_score, confusion_matrix
ac = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

# visualising the confusion matrix
plt.figure(figsize=(15, 10))
sns.heatmap(cm, annot=True)
plt.show()

# function for predicting language
def predict(text):
    x = cv.transform([text]).toarray()
    lang = model.predict(x)
    lang = le.inverse_transform(lang)
    print("The language is in", lang[0])

# English
predict("Vidhya provides a community based knowledge portal for Analytics and Data Science professionals")
# French
predict("fournit un portail de connaissances basé sur la communauté pour les professionnels de l'analyse et de la science des données")
# Arabic
predict("توفر بوابة معرفية قائمة على المجتمع لمحترفي التحليلات وعلوم البيانات")
# Spanish
predict("proporciona un portal de conocimiento basado en la comunidad para profesionales de Analytics y Data Science.")
# Malayalam
predict("അനലിറ്റിക്സ്, ഡാറ്റാ സയൻസ് പ്രൊഫഷണലുകൾക്കായി കമ്മ്യൂണിറ്റി അധിഷ്ഠിത വിജ്ഞാന പോർട്ടൽ അനലിറ്റിക്സ് വിദ്യ നൽകുന്നു")
# German
predict("Berlin ist die Hauptstadt von Deutschland..")

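The cleaning step in the listing can be tried on its own. The following is a minimal sketch, assuming the intent of the two re.sub calls is to replace punctuation, newlines, digits and square brackets with spaces before lower-casing. The helper name clean_text is hypothetical and does not appear in the project code.

```python
import re


def clean_text(text: str) -> str:
    """Hypothetical helper mirroring the cleaning loop of the listing."""
    # Replace symbols, newlines and digits with a space.
    text = re.sub(r'[!@#$(),\n"%^*?:;~`0-9]', ' ', text)
    # Replace square brackets with a space.
    text = re.sub(r'[\[\]]', ' ', text)
    # Lower-case the result.
    return text.lower()


# Removed characters become spaces, so splitting on whitespace
# leaves only the words.
print(clean_text("Hello, World! [2022]").split())
```

Calling the same helper on the input of predict before cv.transform would keep training-time and prediction-time preprocessing identical. This is a suggestion, not something the listing does.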
Output:
Google Colab:
https://colab.research.google.com/drive/1TLzff491CkMB_mblEJ5He3H0LY6SaYq?usp=sharing
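
To show what the bag-of-words plus multinomial Naive Bayes combination in the code computes, here is a minimal from-scratch sketch in plain Python. It is not scikit-learn's implementation and it does not use the project dataset. The toy sentences, the function names train_nb and predict_nb, and the smoothing value alpha=1.0 are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Toy corpus invented for this sketch (NOT the Language Detection.csv data).
TRAIN = [
    ("the cat sits on the mat", "English"),
    ("where is the train station", "English"),
    ("le chat est sur le tapis", "French"),
    ("où est la gare", "French"),
    ("el gato está en la alfombra", "Spanish"),
    ("dónde está la estación de tren", "Spanish"),
]


def tokenize(text):
    # Lower-case and keep runs of two or more word characters,
    # similar in spirit to CountVectorizer's default token pattern.
    return re.findall(r"\w\w+", text.lower())


def train_nb(samples, alpha=1.0):
    word_counts = defaultdict(Counter)  # per-class word frequencies
    class_counts = Counter()            # number of documents per class
    vocab = set()
    for text, label in samples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        class_counts[label] += 1
        vocab.update(tokens)
    return word_counts, class_counts, vocab, alpha


def predict_nb(model, text):
    word_counts, class_counts, vocab, alpha = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # log prior of the class
        score = math.log(class_counts[label] / total_docs)
        # Laplace-smoothed denominator for this class
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_nb(TRAIN)
print(predict_nb(model, "the cat is here"))
print(predict_nb(model, "le chat"))
print(predict_nb(model, "el gato"))
```

Each class score is the log prior plus the sum of smoothed log word likelihoods, and the class with the highest score is returned. The same idea sits behind the MultinomialNB model fitted on CountVectorizer counts in the listing.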
