Computational Linguistics

What is NLP?
NLP stands for Natural Language Processing, which is a subfield of artificial intelligence
(AI) that focuses on the interaction between computers and human language. NLP aims to enable
computers to understand, interpret, and generate human language in a way that is both
meaningful and useful.
Key tasks and objectives within NLP

1. Text Understanding: This involves tasks like text classification (categorizing text into
predefined categories), sentiment analysis (determining the sentiment or emotional tone
of a text), and entity recognition (identifying and categorizing named entities such as
names, dates, and locations in text).
2. Language Generation: NLP can be used to generate human-like text, which is useful in
applications like chatbots, language translation, and content generation.
3. Machine Translation: NLP plays a crucial role in machine translation systems that
automatically translate text from one language to another, such as Google Translate.
4. Speech Recognition: While technically a separate field, speech recognition often
overlaps with NLP. It involves converting spoken language into written text, making it
possible for voice assistants like Siri or Alexa to understand and respond to spoken
commands.
5. Question Answering: NLP systems can be designed to answer questions posed in natural
language, such as those used in search engines or virtual assistants.
6. Information Retrieval: NLP helps improve the accuracy of search engines by
understanding the intent behind user queries and returning relevant results.
7. Language Modeling: This involves building probabilistic models of language to predict
the likelihood of a word or phrase given its context. Such models are fundamental in
many NLP applications, including text generation and machine translation.
8. Sentiment Analysis: NLP can be used to determine the sentiment (positive, negative, or
neutral) expressed in a piece of text, which is valuable for understanding public opinion
and customer feedback.
NLP relies on various techniques, including machine learning, deep learning, and linguistic rules,
to analyze and process natural language. It has a wide range of applications, from chatbots and
virtual assistants to language translation and content recommendation systems, and it continues
to advance rapidly, making human-computer interaction more natural and effective.
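To make the idea of language modeling (task 7 above) concrete, here is a minimal sketch of a bigram model that estimates the probability of a word given the previous word. The three-sentence corpus and the function names are illustrative assumptions, and real systems use far larger corpora and smoothing.

```python
from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> are assumed start and end markers
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, curr in zip(tokens, tokens[1:]):
            counts[prev][curr] += 1
    return counts

def bigram_probability(counts, prev, curr):
    """Maximum-likelihood estimate of P(curr | prev); 0.0 if prev was never seen."""
    total = sum(counts[prev].values())
    return counts[prev][curr] / total if total else 0.0

# Toy corpus, made up for illustration only
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]
model = train_bigram_model(corpus)
print(bigram_probability(model, "the", "cat"))  # 2 of the 6 words seen after "the" are "cat"
print(bigram_probability(model, "sat", "on"))   # "sat" is always followed by "on" here
```

Unseen word pairs get probability zero in this sketch, which is why practical models add smoothing.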
Computational Linguistics
Computational linguistics (CL) is an interdisciplinary field that combines principles from
linguistics and computer science to develop algorithms and models for the automatic
processing of human language. It is closely related to Natural Language Processing (NLP),
and the two terms are often used interchangeably, although CL traditionally emphasizes the
study of language itself while NLP emphasizes engineering applications. CL focuses on the
application of computational techniques to the analysis, understanding, and generation of
natural language.
Development of the Field

• 1950s: Machine translation began in this phase, and Russian was translated
into English by early systems.
• 1960s-70s: Through ontologies (structured representations of concepts and the
relations between them), different chatbots were made possible.
• 1980s: Parsing, the technique of decomposing a sentence into its components, was
developed in this phase.
• 1990s: Machine learning was introduced in this phase of the development of the
field. Machine learning is the ability of a machine to learn patterns from data
rather than follow only hand-written rules.
• 2000s: Unsupervised learning in computational linguistics refers to a category of
machine learning techniques where a model is trained on a dataset without
explicit supervision or labeled examples. In other words, the model is not
provided with predefined categories or target outputs; instead, it learns patterns,
structures, and relationships within the data on its own. Unsupervised learning is
particularly valuable in natural language processing (NLP) and computational
linguistics, where large amounts of unlabeled text are available.
• 2010-Present: It is the era of deep learning in the field of CL. Deep learning, in
its most simplified sense, is machine learning with neural networks that have many
layers.
Tasks of NLP
Extraction
In the context of Natural Language Processing (NLP), "extraction" typically
refers to the process of extracting specific pieces of information or structured data from
unstructured text.
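As a rough illustration of extraction, the sketch below pulls email addresses and ISO-style dates out of free text with regular expressions. The patterns, the sample sentence, and the `extract_fields` helper are assumptions made for this example, and real extraction systems handle many more formats.

```python
import re

# Simplified patterns, assumed for illustration; they do not cover every valid form
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")  # dates written as YYYY-MM-DD only

def extract_fields(text):
    """Turn unstructured text into a small structured record."""
    return {
        "emails": EMAIL_PATTERN.findall(text),
        "dates": DATE_PATTERN.findall(text),
    }

sample = "Contact sara@example.com before 2024-03-15 or write to ali@example.org."
record = extract_fields(sample)
print(record)
```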
Noise Reduction
Noise reduction is the process of removing unwanted tags and symbols (for example, markup or
stray special characters) from raw text, such as data extracted from social media sites.
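A minimal sketch of noise reduction for a social media post is shown here. Which items count as noise is an assumption of this example: it drops HTML-style tags, URLs, mentions, hashtags, and any character outside a small ASCII set, which would also remove non-English letters.

```python
import re

def remove_noise(text):
    """Strip tags, links, and symbols; the order of the steps matters."""
    text = re.sub(r"<[^>]+>", " ", text)              # HTML-style tags
    text = re.sub(r"https?://\S+", " ", text)          # URLs
    text = re.sub(r"[@#]\w+", " ", text)               # mentions and hashtags
    text = re.sub(r"[^A-Za-z0-9\s.,!?']", " ", text)   # leftover symbols (ASCII-only assumption)
    return re.sub(r"\s+", " ", text).strip()           # collapse the gaps left behind

raw = "<p>Loved the new phone!!! :) #happy @shop_x https://example.com/p?id=1</p>"
cleaned = remove_noise(raw)
print(cleaned)
```

Whether hashtags and emoticons are noise depends on the task; for sentiment analysis they may carry useful signal and could be kept.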
Normalization
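Normalization converts text into a consistent, standard form, for example by lowercasing and
unifying spelling variants, so that unstructured data becomes partially structured and easier
to process.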
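One possible normalization pipeline is sketched below, using only the Python standard library. The specific steps (Unicode NFKC normalization, lowercasing, shortening runs of repeated characters, collapsing whitespace) are assumptions chosen for illustration, not a fixed standard.

```python
import re
import unicodedata

def normalize(text):
    """Bring text into one consistent surface form."""
    text = unicodedata.normalize("NFKC", text)   # unify equivalent Unicode forms
    text = text.lower()                          # case folding
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)   # runs of 3+ identical characters become 2
    return re.sub(r"\s+", " ", text).strip()     # collapse whitespace

normalized = normalize("  The  MOVIE was Soooo   GOOD  ")
print(normalized)
```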
Tokenization
Tokenization is a fundamental preprocessing step in Natural Language Processing (NLP)
that involves breaking down a text into individual units or "tokens." These tokens are typically
words, but they can also be subword units or individual characters, depending on the
specific tokenization method used. Tokenization is important because it allows NLP models to
work with discrete units of text, making it easier to process and analyze language data.
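The difference between word-level and character-level tokens can be sketched with a small regular-expression tokenizer. The pattern and function names are assumptions for this example; library tokenizers apply more elaborate rules.

```python
import re

def word_tokenize(text):
    """Split into words (keeping an internal apostrophe) and single punctuation marks."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)

def char_tokenize(text):
    """Split into individual non-space characters."""
    return [ch for ch in text if not ch.isspace()]

sentence = "Don't panic, it's only NLP!"
word_tokens = word_tokenize(sentence)
char_tokens = char_tokenize("NLP!")
print(word_tokens)
print(char_tokens)
```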
Stemming
Stemming is a text normalization technique in Natural Language Processing (NLP) that
aims to reduce words to their root or base form, known as the "stem." The stem is a
morphological core shared by a set of related words, which helps in grouping words with similar
meanings together. Stemming is particularly useful in tasks like information retrieval, text
indexing, and document clustering. However, it's important to note that stemming is a heuristic,
rule-based process and may not always produce valid words; for example, a stemmer may reduce
"studies" to "studi".
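The heuristic nature of stemming can be seen in a toy suffix-stripping stemmer. This is a deliberately simplified sketch, not the Porter algorithm or any library's implementation; the suffix list and the minimum stem length of three characters are assumptions.

```python
# Suffixes are tried in this order; the list is an illustrative assumption
SUFFIXES = ("ing", "ed", "es", "ly", "s")

def simple_stem(word):
    """Strip the first matching suffix if at least three characters remain."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

words = ["connected", "connecting", "connects", "studies", "bus"]
stems = [simple_stem(w) for w in words]
print(stems)
```

The first three words collapse to the same stem, which is the grouping effect described above, while "studies" becomes a string that is not a dictionary word.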
Lemmatization
Lemmatization is a text normalization technique in Natural Language Processing (NLP)
that aims to reduce words to their base or dictionary form, known as the "lemma." Unlike
stemming, which often involves heuristic and rule-based processes to strip off suffixes from
words, lemmatization takes into account the word's context and uses linguistic rules to produce
valid lemmas.
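A minimal sketch of how lemmatization depends on context: the lookup below is keyed on both the word and its part of speech, so the same surface form can map to different lemmas. The tiny table and the tag names are assumptions for illustration; real lemmatizers rely on full dictionaries and morphological analysis.

```python
# Hypothetical lookup table keyed on (word, part of speech)
LEMMA_TABLE = {
    ("better", "ADJ"): "good",
    ("was", "VERB"): "be",
    ("mice", "NOUN"): "mouse",
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
    ("running", "VERB"): "run",
}

def lemmatize(word, pos):
    """Return the dictionary form, falling back to the lowercased word itself."""
    return LEMMA_TABLE.get((word.lower(), pos), word.lower())

print(lemmatize("saw", "VERB"))   # the verb reading of "saw"
print(lemmatize("saw", "NOUN"))   # the noun reading of "saw"
print(lemmatize("Mice", "NOUN"))
```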
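Named Entity Recognition
Named Entity Recognition (NER) is a fundamental task in Natural Language Processing
(NLP) that involves identifying and classifying named entities in text into predefined categories.
Named entities are words or phrases that refer to specific real-world items such as people,
organizations, locations, and dates; the set of categories can be defined by the user.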
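As a rough sketch of the idea, the example below tags entities by looking them up in a small user-defined list (a gazetteer) and by matching four-digit years. The entity lists, labels, and the `find_entities` helper are assumptions for illustration; practical NER systems are usually statistical or neural and use context rather than fixed lists.

```python
import re

# User-defined categories and example names (illustrative assumptions)
GAZETTEER = {
    "PERSON": ["Alan Turing"],
    "LOCATION": ["London"],
    "ORGANIZATION": ["University of Manchester"],
}
YEAR_PATTERN = re.compile(r"\b\d{4}\b")  # treats any four-digit number as a DATE

def find_entities(text):
    """Return (entity, label) pairs in the order they appear in the text."""
    found = []
    for label, names in GAZETTEER.items():
        for name in names:
            for match in re.finditer(re.escape(name), text):
                found.append((match.start(), name, label))
    for match in YEAR_PATTERN.finditer(text):
        found.append((match.start(), match.group(), "DATE"))
    return [(name, label) for _, name, label in sorted(found)]

text = "Alan Turing was born in London and joined the University of Manchester in 1948."
entities = find_entities(text)
print(entities)
```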
Parsing
Parsing in Natural Language Processing (NLP) refers to the process of analyzing the
grammatical structure of a sentence to determine the relationships between words and their
syntactic roles within the sentence. It involves breaking down a sentence into its constituent parts
and representing them in a structured format, often in the form of a parse tree or dependency
tree. Parsing is essential for understanding the syntax of natural language, which is crucial for
many NLP tasks, such as machine translation, question answering, and text generation.
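To show what a parse tree looks like, here is a minimal recursive-descent parser for a toy grammar with three rules: S -> NP VP, NP -> Det N, VP -> V NP. The grammar, the lexicon, and the tuple-based tree format are assumptions for this sketch; real parsers handle ambiguity and much larger grammars.

```python
# Toy lexicon mapping words to categories (illustrative assumption)
LEXICON = {"the": "Det", "a": "Det", "dog": "N", "cat": "N", "chased": "V", "saw": "V"}

def parse_sentence(tokens):
    """Parse tokens with S -> NP VP, NP -> Det N, VP -> V NP; return a nested tuple tree."""
    pos = 0

    def expect(tag):
        # Consume one token if its category matches, otherwise fail
        nonlocal pos
        if pos < len(tokens) and LEXICON.get(tokens[pos]) == tag:
            word = tokens[pos]
            pos += 1
            return (tag, word)
        raise ValueError(f"expected {tag} at position {pos}")

    def noun_phrase():
        return ("NP", expect("Det"), expect("N"))

    def verb_phrase():
        return ("VP", expect("V"), noun_phrase())

    tree = ("S", noun_phrase(), verb_phrase())
    if pos != len(tokens):
        raise ValueError("unexpected extra tokens")
    return tree

tree = parse_sentence("the dog chased a cat".split())
print(tree)
```

A sentence that does not fit the grammar, such as one starting with a noun, raises a `ValueError` in this sketch instead of returning a tree.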
Part-of-Speech Tagging
Part-of-Speech (POS) tagging, also known as grammatical tagging or word-category
disambiguation, is a fundamental task in Natural Language Processing (NLP). It involves
assigning grammatical labels or tags to each word in a sentence, indicating its syntactic and
grammatical role in the sentence. POS tagging is essential for various NLP applications,
including parsing, information retrieval, and machine translation.
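A very simple tagger can be sketched as a dictionary lookup with suffix-based fallbacks. The lexicon, the fallback rules, and the tag names are assumptions for illustration. Note that this sketch tags each word in isolation, so it does not perform the context-based disambiguation that real POS taggers provide.

```python
# Hypothetical word-to-tag lexicon
TAG_LEXICON = {
    "the": "DET", "a": "DET", "quick": "ADJ", "lazy": "ADJ",
    "dog": "NOUN", "fox": "NOUN", "jumps": "VERB", "over": "ADP",
}

def pos_tag(tokens):
    """Assign one tag per token: lexicon first, then suffix heuristics, then NOUN."""
    tagged = []
    for token in tokens:
        word = token.lower()
        if word in TAG_LEXICON:
            tag = TAG_LEXICON[word]
        elif word.endswith("ly"):
            tag = "ADV"
        elif word.endswith("ing") or word.endswith("ed"):
            tag = "VERB"
        else:
            tag = "NOUN"  # default guess for unknown words
        tagged.append((token, tag))
    return tagged

tagged = pos_tag("The quick fox quickly jumps over the lazy dog".split())
print(tagged)
```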
