
Computational linguistics

Seminar 1
1. What is computational linguistics?
Computational linguistics is the scientific and engineering discipline concerned
with understanding written and spoken language from a computational
perspective.
What is the purpose of computational linguistics?
It seeks to develop systems that facilitate human-computer interaction, and
to automate a range of practical linguistic tasks. These tasks include machine
translation, text summarization, speech recognition and generation,
information extraction and retrieval, and sentiment analysis of text.
The theoretical goals of computational linguistics include the formulation of
grammatical and semantic frameworks for characterizing languages in ways
enabling computationally tractable implementations of syntactic and semantic
analysis; the discovery of processing techniques and learning principles that
exploit both the structural and distributional (statistical) properties of
language; and the development of cognitively and neuroscientifically plausible
computational models of how language processing and learning might occur in
the brain.
The practical goals of the field are broad and varied. Some of the most
prominent are: efficient text retrieval on some desired topic; effective machine
translation (MT); question answering (QA), ranging from simple factual
questions to ones requiring inference and descriptive or discursive answers
(perhaps with justifications); text summarization; analysis of texts or spoken
language for topic, sentiment, or other psychological attributes; dialogue
agents for accomplishing particular tasks (purchases, technical
troubleshooting, trip planning, schedule maintenance, medical advising, etc.); and
ultimately, creation of computational systems with human-like competency in
dialogue, in acquiring language, and in gaining knowledge from text.

2. Formal foundations
Computational linguistics is the scientific study of human language from a
computational point of view. Computational linguists build computer models
of different types of linguistic phenomena. The field grew out of
computer-oriented studies of language and, as an interdisciplinary field, has
a history of nearly half a century. Its ultimate goal is to explain the basic
techniques used to create computer models for the generation and
understanding of natural language. Its practical importance is considerable:
computational linguists create tools and resources for tasks such as machine
translation, speech recognition, speech synthesis, information extraction
from text, grammar checking, and text mining. In short, computational
linguistics is the study of computer systems for understanding and
generating natural language. The tools used in computational linguistics draw
on artificial intelligence, i.e. algorithms, data structures, formal models for
representing knowledge, models of inference processes, and so on.
Formal Language Theory
• regular languages and regular expressions;
• languages vs. computational machinery;
• finite state automata;
• regular relations and finite state transducers;
• context-free grammars and languages;
• the Chomsky hierarchy;
• weak and strong generative capacity;
• mildly context-sensitive languages.
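As a minimal illustration of two topics from the list above (regular
expressions and finite state automata), the sketch below recognizes the same
toy language in both ways. The language itself ("b", then at least two "a"s,
then "!"), the state numbering and the function names are assumptions chosen
for illustration only; they do not come from the seminar material.

```python
import re

# Transition table of a deterministic finite state automaton (DFA).
# Keys are (current state, input symbol); values are the next state.
# The toy language and the state numbers are illustrative assumptions.
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,  # loop: any number of additional "a" symbols
    (3, "!"): 4,
}
START_STATE = 0
ACCEPTING_STATES = {4}


def dfa_accepts(text: str) -> bool:
    """Return True if the DFA is in an accepting state after reading text."""
    state = START_STATE
    for symbol in text:
        key = (state, symbol)
        if key not in TRANSITIONS:
            return False  # no transition defined for this symbol: reject
        state = TRANSITIONS[key]
    return state in ACCEPTING_STATES


# The same language written as a regular expression.
PATTERN = re.compile(r"baa+!")


def regex_accepts(text: str) -> bool:
    """Return True if the whole string matches the regular expression."""
    return PATTERN.fullmatch(text) is not None
```

Both functions accept exactly the same strings, which reflects the general
result that regular expressions and finite state automata describe the same
class of languages (the regular languages).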
The central task of a future-oriented computational linguistics is the
development of cognitive machines which humans can freely talk with in their
respective natural language. In the long run, this task will ensure the
development of a functional theory of language, an objective method of
verification, and a wide range of practical applications. Natural communication
requires not only verbal processing, but also non-verbal perception and action.
3. Current Methods
1. Maximum Entropy Models: Maximum entropy modeling (MaxEnt) uses
techniques developed in machine learning that allow empirical data
to be used to estimate the probability of an outcome under given
conditions. The formulation cited here (Dudík et al. 2007) comes from
species distribution modeling, where MaxEnt works with presence-only
data by generating random test points; in language processing the same
principle is applied to classification tasks such as part-of-speech
tagging.
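2. Memory-Based Learning: A memory-based learning system is an extended
memory management system that decomposes the input space, either
statically or dynamically, into subregions for the purpose of storing and
retrieving functional information. In language processing, this typically
means storing the training examples and classifying a new item by its
similarity to the stored ones.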
3. Decision Trees: A decision tree is one way to display an algorithm that
contains only conditional control statements.
4. Unsupervised Learning and Grammar Induction
5. Artificial Neural Networks
6. Linguistic Annotation: Also referred to as corpus annotation, linguistic
annotation is the process of tagging language data in text or audio
recordings. Annotators are tasked with identifying and flagging
grammatical, semantic or phonetic elements in the text or audio data.
7. Evaluation of NLP (Natural Language Processing) Systems: Natural language
processing (NLP) is the branch of computer science, and more specifically
of artificial intelligence (AI), concerned with giving computers the
ability to understand text and spoken words in much the same way human
beings can. Evaluation measures how well such systems perform these tasks.
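To make one of the methods listed above more concrete, here is a minimal
sketch of memory-based learning (item 2) as it is commonly understood in
language processing: the learner simply stores labelled examples and
classifies a new item by its nearest stored neighbours. The class name, the
feature layout (previous tag, word suffix, capitalization flag), the overlap
distance and the tiny data set are all assumptions made up for illustration;
they are not taken from the seminar material or from any particular toolkit.

```python
from collections import Counter


def overlap_distance(a, b):
    """Count the feature positions at which two instances differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


class MemoryBasedClassifier:
    """Hypothetical k-nearest-neighbour learner over symbolic features."""

    def __init__(self, k=1):
        self.k = k
        self.memory = []  # list of (features, label) pairs

    def train(self, instances):
        # "Training" only stores the examples; nothing is abstracted away.
        self.memory.extend(instances)

    def classify(self, features):
        if not self.memory:
            raise ValueError("no stored examples to compare against")
        # Rank stored examples by distance to the new item (stable sort).
        ranked = sorted(
            self.memory,
            key=lambda item: overlap_distance(item[0], features),
        )
        nearest_labels = [label for _, label in ranked[: self.k]]
        # Majority vote among the k nearest stored examples.
        return Counter(nearest_labels).most_common(1)[0][0]


# Invented toy data: (previous tag, word suffix, capitalized?) -> tag.
EXAMPLES = [
    (("DET", "og", False), "NOUN"),
    (("DET", "at", False), "NOUN"),
    (("NOUN", "ks", False), "VERB"),
    (("NOUN", "ps", False), "VERB"),
    (("VERB", "ly", False), "ADV"),
]

classifier = MemoryBasedClassifier(k=1)
classifier.train(EXAMPLES)
print(classifier.classify(("DET", "rd", False)))  # prints NOUN
```

The overlap distance treats every feature as equally important; practical
systems usually weight features, but that refinement is left out of this
sketch.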
The methods employed in theoretical and practical research in computational
linguistics have often drawn upon theories and findings in theoretical
linguistics, philosophical logic, cognitive science (especially
psycholinguistics), and of course computer science.
A major shift in nearly all aspects of natural language processing began in the
late 1980s and was virtually complete by the end of 1995: this was the shift
to corpus-based, statistical approaches (signalled for instance by the
appearance of two special issues on the subject by the quarterly
Computational Linguistics in 1993). The new paradigm was enabled by the
increasing availability and burgeoning volume of machine-readable text and
speech data, and was driven forward by the growing awareness of the
importance of the distributional properties of language, the development of
powerful new statistically based learning techniques, and the hope that
these techniques would overcome the scalability problems that had beset
computational linguistics (and more broadly AI) since its beginnings.
4. Common CL applications
Computational linguistics is used in tools like instant machine translation,
speech recognition systems, text-to-speech synthesizers, interactive voice
response systems, search engines, text editors and language instruction
materials.
1. Machine Translation
2. Speech Recognition
3. Text-to-Speech
4. Natural Language Generation
5. Human-Computer Dialogs
6. Information Retrieval
7. Computational Modeling
