
National University of Computer & Emerging Sciences, Peshawar Campus

Student Name: _____________________ Roll No: __________________


Program: BS (CS) Examination: Mid Exam
Semester: Summer 2020 Total Marks: 60 Weightage: 30
Time Allowed: 2 hours Date: 10th August 2020
Course: Natural Language Processing Instructor: Taimoor Khan

NOTE: Attempt all questions.

Question 1: Answer the following briefly. [6 x 5 = 30 Marks]


a) Why is a TF-IDF-based text representation more effective than a frequency-based representation?
b) In what scenario can a binary representation be a better option?
c) If you are classifying documents, would it be a descriptive, prescriptive or theoretical analysis, and why?
d) Differentiate between sociolinguistics and psycholinguistics.
e) Explain duality and recursiveness in natural language with the help of an example.
f) Why are natural languages considered to be the most complex systems?
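For parts (a) and (b), the three representations can be contrasted on a small corpus. The Python sketch below is illustrative only: the corpus is hypothetical (not part of the exam), and it uses one common TF-IDF variant, raw term count times log(N / df), where N is the number of documents and df is the number of documents containing the term. Other weighting schemes exist.

```python
import math

# Hypothetical toy corpus (not from the exam); each document is a list of tokens.
DOCS = [
    ["the", "match", "the", "team"],
    ["the", "election", "the", "vote"],
    ["the", "team", "won", "the", "match"],
]


def vocabulary(docs):
    """Sorted list of distinct terms across all documents."""
    return sorted({term for doc in docs for term in doc})


def frequency_vector(doc, vocab):
    """Raw count of each vocabulary term in the document."""
    return [doc.count(term) for term in vocab]


def binary_vector(doc, vocab):
    """1 if the term occurs in the document at all, else 0."""
    return [1 if term in doc else 0 for term in vocab]


def tfidf_vector(doc, vocab, docs):
    """Raw count times log(N / df): one common TF-IDF variant among several."""
    n = len(docs)
    weights = []
    for term in vocab:
        df = sum(1 for d in docs if term in d)  # number of documents containing the term
        weights.append(doc.count(term) * math.log(n / df))
    return weights


if __name__ == "__main__":
    vocab = vocabulary(DOCS)
    doc = DOCS[0]
    print(vocab)
    print("frequency:", frequency_vector(doc, vocab))
    print("binary:   ", binary_vector(doc, vocab))
    print("tf-idf:   ", [round(w, 3) for w in tfidf_vector(doc, vocab, DOCS)])
```

In the first document, "the" has the largest raw count (2) but a TF-IDF weight of 0, because it occurs in every document and log(3 / 3) = 0; "match" and "team" keep positive weights because they separate the sports documents from the election one. The binary vector records only presence, which can be enough when how often a word repeats carries little extra signal.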

Question 2: Write pseudocode for the k-nearest neighbors (KNN) classifier. [10 Marks]
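One possible answer, written as runnable Python rather than pseudocode. It is a sketch under stated assumptions: Euclidean distance, an unweighted majority vote among the k nearest neighbors, and hypothetical 2-D training data in the usage example. The helper names `euclidean` and `knn_classify` are illustrative.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(train, query, k=3):
    """Predict a label for `query` from `train`, a list of (features, label) pairs."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training examples")
    # Step 1: distance from the query to every training example.
    scored = [(euclidean(features, query), label) for features, label in train]
    # Step 2: sort by distance and keep the k nearest neighbors.
    scored.sort(key=lambda pair: pair[0])
    nearest = scored[:k]
    # Step 3: majority vote among the neighbors' labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical 2-D training data (not from the exam).
    train = [((1, 1), "sports"), ((1, 2), "sports"), ((2, 1), "sports"),
             ((5, 5), "politics"), ((6, 5), "politics")]
    print(knn_classify(train, (1.5, 1.5), k=3))
    print(knn_classify(train, (5.5, 5.0), k=3))
```

KNN has no training step beyond storing the examples; all the work happens at prediction time. Choosing an odd k for two classes avoids tied votes.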

Question 3: Convert the following dataset into a frequency-based representation. [10 Marks]
<s> Cricket Test Pakistan </s>
<s> England Broad Wickets </s>
<s> Test Wickets Cricket </s>
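A sketch of the conversion in Python, assuming that `<s>` and `</s>` are sentence-boundary markers to be dropped, that tokens are split on whitespace with case preserved, and that vocabulary columns follow the order in which terms first appear. These are assumptions; the question does not specify them.

```python
# The three sentences from Question 3.
SENTENCES = [
    "<s> Cricket Test Pakistan </s>",
    "<s> England Broad Wickets</s>",
    "<s> Test Wickets Cricket </s>",
]


def tokenize(sentence):
    """Drop the <s> and </s> boundary markers and split on whitespace."""
    return sentence.replace("<s>", " ").replace("</s>", " ").split()


def build_vocabulary(docs):
    """Collect distinct terms in order of first appearance."""
    vocab = []
    for doc in docs:
        for term in doc:
            if term not in vocab:
                vocab.append(term)
    return vocab


def frequency_matrix(sentences):
    """Return (vocabulary, one count vector per sentence)."""
    docs = [tokenize(s) for s in sentences]
    vocab = build_vocabulary(docs)
    return vocab, [[doc.count(term) for term in vocab] for doc in docs]


if __name__ == "__main__":
    vocab, rows = frequency_matrix(SENTENCES)
    print(vocab)
    for row in rows:
        print(row)
```

Under these assumptions the vocabulary is Cricket, Test, Pakistan, England, Broad, Wickets, and the three count vectors are [1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1] and [1, 1, 0, 0, 0, 1].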

Question 4: Perform K-means clustering with k = 2 on the following dataset of 5 documents. Perform at least 2 iterations. [10 Marks]

Document   X   Y
D1         3   1
D2         4   2
D3         6   4
D4         5   2
D5         2   3
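A sketch of Question 4 in Python. The question does not fix the initial centroids, so this sketch assumes D5 (2, 3) and D3 (6, 4) as the starting centroids and uses squared Euclidean distance; a different starting choice can give a different result. It runs a fixed number of iterations, as the question asks, rather than stopping on convergence. With this choice, iteration 1 gives clusters {D1, D2, D5} and {D3, D4} with centroids (3, 2) and (5.5, 3), and iteration 2 leaves the assignment unchanged, so the algorithm has converged.

```python
# Question 4 dataset: document -> (X, Y)
DOCS = {"D1": (3, 1), "D2": (4, 2), "D3": (6, 4), "D4": (5, 2), "D5": (2, 3)}


def squared_distance(p, q):
    """Squared Euclidean distance; ranks points the same as Euclidean without a sqrt."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(docs, centroids):
    """Map each document to the index of its nearest centroid (ties go to the lower index)."""
    clusters = {}
    for name, point in docs.items():
        distances = [squared_distance(point, c) for c in centroids]
        clusters[name] = distances.index(min(distances))
    return clusters


def update(docs, clusters, centroids):
    """Move each centroid to the mean of its members; keep it in place if its cluster is empty."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [docs[name] for name, c in clusters.items() if c == j]
        if not members:
            new_centroids.append(old)
            continue
        new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
    return new_centroids


def kmeans(docs, initial_centroids, iterations=2):
    """Alternate assignment and update steps for a fixed number of iterations."""
    centroids = [tuple(float(v) for v in c) for c in initial_centroids]
    clusters = {}
    for it in range(1, iterations + 1):
        clusters = assign(docs, centroids)
        centroids = update(docs, clusters, centroids)
        print(f"Iteration {it}: clusters={clusters} centroids={centroids}")
    return clusters, centroids


if __name__ == "__main__":
    # Hypothetical starting centroids: D5 and D3 (not specified in the question).
    kmeans(DOCS, [DOCS["D5"], DOCS["D3"]], iterations=2)
```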
