
Annexure ‘CD – 01’

FORMAT FOR COURSE CURRICULUM

Course Title: Introduction to Natural Language Processing
Course Code: CSE447
Credit Units: 3
Course Level: UG

L    T    P/S    SW/FW    TOTAL CREDIT UNITS
3    -    -      -        3
Course Objectives:
To understand natural language processing and learn how to apply basic algorithms in this field. To become acquainted with the algorithmic description of the
main language levels (morphology, syntax, semantics, and pragmatics) and with corpora, the data resources of natural language.

Pre-requisites:
1. Courses in Data Structures and Algorithms should have been completed.
2. Exposure to Linguistics is useful, though not mandatory.

Course Learning Outcomes:


1. Understanding of natural language processing and the ability to apply basic algorithms in this field.
2. Understanding of the algorithmic description of the main language levels: morphology, syntax, semantics, and pragmatics.
3. Understanding of the basics of knowledge representation, inference, and their relations.
4. Understanding of lexical knowledge networks, word sense disambiguation, and Wordnet theory.
5. Understanding of the basics of sentiment analysis, machine translation, and cross-lingual information retrieval.

Course Contents/Syllabus:
Weightage (%)
Module I Sound 20%
Descriptors/Topics
Biology of Speech Processing; Place and Manner of Articulation; Word Boundary Detection; Argmax-based computations; HMM and
Speech Recognition.

Module II Words and Word Forms 20%


Descriptors/Topics
Morphology fundamentals; Morphological Diversity of Indian Languages; Morphology Paradigms; Finite State
Machine Based Morphology; Automatic Morphology Learning; Shallow Parsing; Named Entities;
Maximum Entropy Models; Random Fields.

Module III Structures 20%


Descriptors/Topics
Theories of Parsing, Parsing Algorithms; Robust and Scalable Parsing on Noisy Text as in Web documents;
Hybrid of Rule Based and Probabilistic Parsing; Scope Ambiguity and Attachment Ambiguity resolution.

Module IV Meaning 20%


Descriptors/Topics
Lexical Knowledge Networks, Wordnet Theory; Indian Language Wordnets and Multilingual Dictionaries;
Semantic Roles; Word Sense Disambiguation; WSD and Multilinguality; Metaphors; Coreferences.

Module V Web 2.0 Applications 20%


Descriptors/Topics
Sentiment Analysis; Text Entailment; Robust and Scalable Machine Translation; Question Answering in
Multilingual Setting; Cross Lingual Information Retrieval (CLIR).

Pedagogy for Course Delivery:

The class will be taught using a remote teaching methodology. Students’ learning and assessment will be based on the four-quadrant approach and the
flipped-classroom method. E-content will also be provided to students to support learning. The course will be taught in a theory- and practical-based mode.
The instructor will discuss computational problems with the students for a better understanding of the concepts.

Assessment/ Examination Scheme:

Theory L/T (%)    Lab/Practical/Studio (%)    End Term Examination
100%              -                           100

Theory Assessment (L&T):

                          Continuous Assessment/Internal Assessment (40%)    End Term Examination (60%)
Components (Drop down)    Attendance    Class Test    HA    Quiz             EE
Weightage (%)             5             15            10    10               60

Text Reading:
1. Allen, James, Natural Language Understanding, Second Edition, Benjamin/Cummings, 1995.
2. Charniak, Eugene, Statistical Language Learning, MIT Press, 1993.

References:
1. Jurafsky, Dan and Martin, James, Speech and Language Processing, Second Edition, Prentice Hall, 2008.
2. Manning, Christopher and Schütze, Hinrich, Foundations of Statistical Natural Language Processing, MIT Press, 1999.

Any other Study Material: https://nptel.ac.in/syllabus/106101007/
