UNIT – II: N-gram Language Models: N-Grams, Evaluating Language Model, Sampling sentences from a language model,
Sequence Labeling for Parts of Speech and Named Entities: Part-of-Speech Tagging, Named Entities and Named Entity
Tagging.
UNIT – III: Naive Bayes and Sentiment Classification: Naive Bayes Classifiers, Training the Naive Bayes Classifier,
Optimizing for Sentiment Analysis, Naive Bayes as a Language Model, Evaluation: Precision, Recall, F-measure, Test sets
and Cross-validation
UNIT – IV: Word Senses and WordNet: Word Senses, Relations Between Senses, WordNet: A Database of Lexical
Relations, Word Sense Disambiguation, WSD Algorithm: Contextual Embeddings
UNIT – V: Question Answering: Information Retrieval, IR-based Factoid Question Answering, IR-based QA: Datasets,
Entity Linking, Knowledge-based Question Answering, Using Language Models to do QA, Classic QA Models.
UNIT – VI: Chatbots & Dialogue Systems: Properties of Human Conversation, Chatbots, GUS: Simple Frame-based
Dialogue Systems, The Dialogue-State Architecture, Evaluating Dialogue Systems, Dialogue System Design, Automatic
Speech Recognition and Text-to-Speech: The Automatic Speech Recognition Task, Feature Extraction for ASR: Log Mel
Spectrum, Speech Recognition Architecture
NLP Lab Syllabus
● WEEK 9: Write a program to generate N-gram model
● WEEK 10: Write a program to perform sentiment classification
● WEEK 11: Write a program to perform text classification and evaluate the model.
● WEEK 12 & 13: Implementation of real-world case study - IR based Question
Answering system (or) chat bot application
● WEEK 14: Internal Lab exam
Text Books & References
TEXT BOOKS:
1. Speech and Language Processing, Dan Jurafsky and James H. Martin (Stanford.edu), 3rd Edition, Pearson
Publications
2. Natural Language Processing with Python, Analyzing Text with the Natural Language Toolkit, Steven Bird,
Ewan Klein, and Edward Loper
REFERENCES:
1. Practical Natural Language Processing: A Comprehensive Guide to Building Real-World NLP Systems,
Sowmya Vajjala, Bodhisattwa Majumder, Anuj Gupta, Harshit Surana
2. Foundations of Statistical Natural Language Processing, Christopher Manning and Hinrich Schütze
3. Natural Language Processing in Action, Understanding, Analysing, and Generating Text with Python, Hobson
Lane, Cole Howard, Hannes Max Hapke
4. The Handbook of Computational Linguistics and Natural Language Processing, (Blackwell Handbooks in
Linguistics) 1st Edition
Natural Language Processing (NLP)
The main aim of NLP is to read, understand, and derive meaning from human
language in a useful way.
Note:
1. Most of the NLP techniques depend on machine learning to obtain
meaning from the human languages.
Regular Expressions
A regular expression is an algebraic notation for characterizing a set of strings. To search for
“Buttercup”, we type /Buttercup/. The expression /Buttercup/ matches any string containing
the substring Buttercup.
Note: Regular expressions are case sensitive, lower case /s/ is distinct from uppercase /S/
(/s/ matches a lower case s but not an uppercase S).
This means that the pattern /woodchucks/ will not match the string Woodchucks.
Basic Regular Expression Patterns
We can solve this problem with the use of the square braces [ and ]. The string of characters
inside the braces specifies a disjunction of characters to match.
The pattern /[wW]oodchuck/ matches strings containing either woodchuck or Woodchuck.
Basic Regular Expression Patterns
In cases where there is a well-defined sequence associated with a set of characters, the
brackets can be used with the dash (-) to specify any one character in a range.
Basic Regular Expression Patterns
The square braces can also be used to specify what a single character cannot be, by use of
the caret ˆ.
If the caret ˆ is the first symbol after the open square brace [, the resulting pattern is negated.
Example: The pattern /[ˆa]/ matches any single character (including special characters)
except a. This is only true when the caret is the first symbol after the open square brace.
Basic Regular Expression Patterns
Example:
1. /a*/ means “any string of zero or more a’s”. This will match a, aaaaaa, or even the empty string.
2. /aa*/ means “one a followed by zero or more a’s”.
3. /[ab]*/ means “zero or more a’s or b’s”. This will match strings like aaaa, ababab, or bbbb.
4. /[0-9][0-9]*/ matches an integer (one or more digits).
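The bracket and Kleene-star patterns above can be tried directly with Python's re module (a quick sketch; re uses raw strings rather than the slash-delimited notation):

```python
import re

# Trying the patterns above with Python's re module.
w_match = bool(re.search(r"[wW]oodchuck", "Woodchucks"))   # /[wW]/: either w or W
digit   = bool(re.search(r"[0-9]", "plenty of 7 to 5"))    # any single digit
negated = bool(re.search(r"[^a]", "aaab"))                 # any character except a (the b)
kleene  = bool(re.fullmatch(r"aa*", "aaaa"))               # one a then zero or more a's
ab_star = bool(re.fullmatch(r"[ab]*", "ababab"))           # zero or more a's or b's
integer = bool(re.fullmatch(r"[0-9][0-9]*", "1896"))       # a whole integer
print(w_match, digit, negated, kleene, ab_star, integer)
```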
Basic Regular Expression Patterns
Kleene +: means “one or more occurrences of the immediately preceding character or regular expression”.
Example: /[0-9]+/ is the normal way to specify “a sequence of digits”.
Anchors: Anchors are special characters that anchor regular expressions to particular places in a
string. The most common anchors are the caret ˆ (start of line), the dollar sign $ (end of line),
\b (word boundary), and \B (non-word boundary).
Example:
/\b99\b/ will match the string 99 in “There are 99 bottles on the wall”, but not 99 in "There are 299
bottles on the wall”
But it will match 99 in $99 (since 99 follows a dollar sign ($), which is not a digit, underscore, or letter)
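The word-boundary behavior described above can be checked with Python's re module; \b matches at any transition between a word character (letter, digit, or underscore) and a non-word character:

```python
import re

# \b99\b: 99 surrounded by word boundaries
found_plain  = re.search(r"\b99\b", "There are 99 bottles on the wall")   # matches
found_inside = re.search(r"\b99\b", "There are 299 bottles on the wall")  # no match: 2 is a word char
found_dollar = re.search(r"\b99\b", "the price is $99")                   # matches: $ is not a word char
print(bool(found_plain), bool(found_inside), bool(found_dollar))
```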
Disjunction, Grouping, and Precedence
In some cases, we might want to search for either the string cat or the string dog. We can’t
use the square brackets for this (/[catdog]/ would match any single character from the set
c, a, t, d, o, g), so we need a new operator: the disjunction operator, also called the pipe symbol |.
Example:
1. The pattern /cat|dog/ matches either the string cat or the string dog.
2. How can we specify both happy and happiest? /happ(y|iest)/
3. Regular expressions for Column 1 Column 2 Column 3 /(Column [0-9]+ *)*/
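The disjunction and grouping patterns above, tried as a quick sketch with Python's re module:

```python
import re

pet   = re.search(r"cat|dog", "the dog barked")                          # /cat|dog/
happy = re.fullmatch(r"happ(y|iest)", "happiest")                        # /happ(y|iest)/
cols  = re.fullmatch(r"(Column [0-9]+ *)*", "Column 1 Column 2 Column 3")
print(pet.group(), bool(happy), bool(cols))
```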
Disjunction, Grouping, and Precedence
Operator precedence hierarchy
Substitution
An important use of regular expressions is substitution: s/regexp/pattern/ replaces strings
matching regexp with pattern.
Example: s/colour/color/ replaces British colour with American color.
Substitution, Capture Groups
/the (.*)er they (.*), the \1er we \2/ - will match the faster they ran, the
faster we ran but not the faster they ran, the faster we ate.
Similarly, the third capture group is stored in \3, the fourth is \4, and so on.
non-capturing group: specified by putting ?: after the open paren, in the form (?: pattern ).
/(?:some|a few) (people|cats) like some \1/ - will match some cats like
some cats but not some cats like some a few.
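The capture-group and backreference examples above, checked with Python's re module (\1 and \2 refer back to the first and second capture groups; (?: ) does not capture):

```python
import re

pattern = r"the (.*)er they (.*), the \1er we \2"
ok  = re.search(pattern, "the faster they ran, the faster we ran")   # matches
bad = re.search(pattern, "the faster they ran, the faster we ate")   # \2 mismatch: no match
nc  = re.search(r"(?:some|a few) (people|cats) like some \1",
                "some cats like some cats")                          # \1 is (people|cats)
print(bool(ok), bool(bad), bool(nc))
```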
WORDS
We need to decide what counts as a word. Let’s start by looking at one particular
corpus, a computer-readable collection of text or speech.
Brown Corpus - a million-word collection of samples from 500 written English texts
from different genres (newspaper, fiction, non-fiction, academic, etc.), assembled
at Brown University in 1963–64.
Example: He stepped out into the hall, was delighted to encounter a water brother.
This sentence has 13 words if we don’t count punctuation marks as words, 15 if we do.
Whether we treat period (“.”), comma (“,”), and so on as words depends on the task.
WORDS
The Switchboard corpus of American English telephone conversations has 3
million words. Corpora of spoken language don’t have punctuation, but they
introduce other complications with regard to defining words.
An utterance is the spoken correlate of a sentence. A filler is a sound or word that
participants in a conversation use to signal that they are pausing to think but are
not finished speaking.
Example: I do uh main- mainly business data processing
The utterance has two kinds of disfluencies: the broken-off word main- is called a
fragment, and words like uh and um are called fillers or filled pauses.
Should we consider these to be words?
Are inflected forms like cats and cat the same word? These two wordforms have the same
lemma. The wordform is the full inflected or derived form of the word.
How many words are there in English? To answer this question we need to distinguish
two ways of talking about words.
Types are the number of distinct words in a corpus; if the set of words in the vocabulary is V,
the number of types is the vocabulary size |V|. Tokens are the total number N of running words.
If we ignore punctuation, the following Brown sentence has 16 tokens and 14 types:
WORDS
Example: They picnicked by the pool, then lay back on the grass and looked at the
stars. This sentence has 16 tokens and 14 types.
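Counting tokens and types for this sentence is a one-liner sketch in Python (punctuation ignored, as above):

```python
# Tokens are running words; types are the distinct words among them.
sentence = ("They picnicked by the pool then lay back on the grass "
            "and looked at the stars")
tokens = sentence.split()
types = set(tokens)
print(len(tokens), len(types))   # 16 tokens, 14 types
```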
The relationship between the number of types |V| and the number of tokens N is called
Herdan’s Law or Heaps’ Law: |V| = k N^β, where k and β are positive constants, and
0 < β < 1. The value of β depends on the corpus size and the genre (it ranges from .67 to .75).
● Another measure of the number of words in the language is the
number of lemmas instead of wordform types.
● The 1989 edition of the Oxford English Dictionary had 615,000 entries
CORPORA
Corpora - Corpus - a computer-readable collection of text or speech
NLP algorithms work well only if we have corpora for the languages we want to support.
Code switching: It’s also quite common for speakers or writers to use multiple
languages in a single communicative act, a phenomenon called code switching
dost tha or ra- hega ... dont wory ... but dherya rakhe
[“he was and will remain a friend ... don’t worry ... but have faith”]
CORPORA
Any particular piece of text that we study is produced by one or more writers or
speakers, in a specific dialect of specific language, at a specific time, in a specific
place.
NLP algorithms are most useful when they apply across many languages.
Many algorithms have been developed for languages such as Chinese, Spanish, Japanese,
and German, but we should not limit our tools to these few languages.
CORPORA
Most languages have multiple varieties often spoken in different regions or by
different social groups.
African American Language (AAL) - a Twitter post might use features often used by
speakers of AAL, such as iont (I don’t) and talmbout (talking about); these influence
word segmentation.
CORPORA
The other dimension of variation is the genre. The text that our algorithms must
process might
● Come from newswire, fiction or non-fiction books, scientific articles,
Wikipedia, or religious texts.
● Come from spoken genres like telephone conversations, business meetings,
medical interviews, or transcripts of television shows or movies.
● Come from work situations like doctors’ notes, legal text.
● Text also reflects the demographic characteristics of the writer (or speaker):
their age, gender, race, socioeconomic class can all influence the linguistic
properties of the text we are processing.
● The time matters too, language changes over time, and for some languages
we have good corpora of texts from different historical periods.
CORPORA
Because language is so situated, when developing computational models for
language processing from a corpus, it’s important to consider
● who produced the language
● in what context
● for what purpose
The best way is for the corpus creator to build a datasheet or data statement for
each corpus.
CORPORA
A datasheet specifies properties of a dataset like:
Motivation: Why was the corpus collected, by whom, and who funded it?
Language variety: What language (including dialect/region) was the corpus in?
Speaker demographics: What was the age or gender of the authors of the text?
Collection process: How big is the data? If it is a subsample how was it sampled?
Was the data collected with consent? How was the data pre-processed, and what
metadata is available?
CORPORA
Annotation process: What are the annotations, what are the demographics of the
annotators, how were they trained, how was the data annotated?
UNIX Tools for Tokenization or Normalization
The command tr -sc 'A-Za-z' '\n' < sh.txt translates every sequence of non-alphabetic
characters in the input file (here sh.txt, the text of Shakespeare) into a newline,
printing one word per line.
Output:
THE
SONNETS
by
William
Shakespeare
From
fairest
creatures
We
UNIX Tools for Tokenization or Normalization
we can sort the lines and pass them to uniq -c, which will collapse and count them:
tr -sc 'A-Za-z' '\n' < sh.txt | sort | uniq -c
Output:
1945 A
72 AARON
19 ABBESS
25 Aaron
6 Abate
1 Abates
5 Abbess
6 Abbey
3 Abbot
UNIX Tools for Tokenization or Normalization
we can collapse all the upper case to lower case with a second tr:
tr -sc 'A-Za-z' '\n' < sh.txt | tr A-Z a-z | sort | uniq -c
Output:
14725 a
97 aaron
1 abaissiez
10 abandon
2 abandoned
2 abase
1 abash
14 abate
3 abated
3 abatement
UNIX Tools for Tokenization or Normalization
Now we can sort again to find the frequent words. The -n option means to sort
numerically rather than alphabetically, and the -r option means to sort in reverse
order (highest-to-lowest):
tr -sc 'A-Za-z' '\n' < sh.txt | tr A-Z a-z | sort | uniq -c | sort -n -r
Output:
27378 the
26084 and
22538 i
19771 to
17481 of
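The same frequency count can be sketched in Python with collections.Counter (shown here on a toy string standing in for the Shakespeare file):

```python
import re
from collections import Counter

# A Python sketch of the tr | tr | sort | uniq -c | sort -n -r pipeline above.
text = "The sonnets by William Shakespeare From fairest creatures the the"
words = re.findall(r"[a-z]+", text.lower())   # tr -sc A-Za-z '\n' | tr A-Z a-z
counts = Counter(words)                       # sort | uniq -c
for word, freq in counts.most_common(3):      # sort -n -r, top 3
    print(freq, word)
```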
Word Tokenization
Tokenization - It is the task of segmenting running text into words.
The UNIX command sequence removed all the numbers and punctuation, but we will often
need to keep these in our tokenization:
● Commas are a useful piece of information for parsers
● Periods help indicate sentence boundaries
● Keep the punctuation that occurs word internally (Ph.D, AT&T, m.p.h)
● Special characters and numbers will need to be kept in prices ($99.45) and dates
● URLs and Twitter hashtags, and Email addresses.
● Number expressions introduce other complications - in English, commas are used inside
numbers, as in 555,500.50 - hence tokenization requirements differ across
languages.
Word Tokenization
A tokenizer can also be used to expand clitic contractions that are marked by
apostrophes. ( converting what’re to the two tokens what are, we’re to we are).
This standard separates out clitics (doesn’t becomes does plus n’t), keeps
hyphenated words together, and separates out all punctuation.
Input: "The San Francisco-based restaurant," they said, "doesn’t charge $10".
Output: " The San Francisco-based restaurant , " they said , " does n’t charge $ 10
".
Word Tokenization
A Python trace of regular expression tokenization in the NLTK
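The NLTK trace itself is not reproduced here; the following is a small regular-expression tokenizer in the same spirit (this exact pattern is an illustrative assumption, not NLTK's own):

```python
import re

# Alternatives are tried in order: prices/numbers first, then words
# (keeping word-internal hyphens and apostrophes), then punctuation.
pattern = r"""
    \$?\d+(?:\.\d+)?   # prices and numbers, e.g. $10 or 12.40
  | \w+(?:[-']\w+)*    # words, keeping internal hyphens and apostrophes
  | [.,;"?():!-]       # punctuation as separate tokens
"""

def tokenize(text):
    return re.findall(pattern, text, re.VERBOSE)

print(tokenize("The San Francisco-based restaurant costs $10."))
```

Note how the hyphenated word stays together while the price and final period come out as their own tokens.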
Word Tokenization
Word tokenization is more complex in languages like written Chinese, Japanese,
and Thai, which do not use spaces to mark potential word boundaries.
In Chinese, it often works well simply to ignore words altogether and use characters
as the basic elements, treating a sentence as a sequence of characters.
● Sentence tokenization and word tokenization are both processes
commonly used in natural language processing (NLP) for breaking
down text into smaller, more manageable units. Here's how they differ:
1. Sentence Tokenization:
1. Definition: Sentence tokenization involves splitting a text into individual sentences.
2. Purpose: It is used to divide text into meaningful segments, making it easier to process
sentences individually.
3. Example: Given the input "Hello, how are you? I'm doing well.", sentence tokenization
would produce two sentences: "Hello, how are you?" and "I'm doing well."
4. Method: Sentence tokenization typically relies on punctuation (e.g., periods, question
marks, exclamation marks) to identify sentence boundaries, although more advanced
techniques may also consider contextual information.
2. Word Tokenization:
1. Definition: Word tokenization involves splitting a text into individual words or tokens.
2. Purpose: It is used to break down text into its basic units for further analysis and processing.
3. Example: Given the input "The quick brown fox jumps over the lazy dog.", word tokenization
would produce ten tokens: ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy",
"dog", "."].
4. Method: Word tokenization typically relies on whitespace and punctuation to identify word
boundaries. However, it may also consider language-specific rules and patterns, such as
contractions and hyphenated words.
● In summary, sentence tokenization divides text into sentences, while word
tokenization divides text into individual words or tokens.
● Both processes are fundamental in NLP and are often used as preprocessing
steps for various tasks such as text classification, sentiment analysis, and
machine translation
Byte Pair Encoding (BPE) Tokenization
This method is used for tokenization.
In general NLP algorithms often learn some facts about language from one corpus (a
training corpus) and then use these facts to make decisions about a separate test
corpus and its language.
Suppose our training corpus contains words like low, new, and newer, but not lower.
If the word lower appears in our test corpus, our system will not know what to do
with it.
Byte Pair Encoding Tokenization
To deal with this unknown word problem, modern tokenizers often automatically
induce sets of tokens that include tokens smaller than word.
In modern tokenization schemes, most tokens are words, but some tokens are
frequently occurring morphemes or other subwords like -er. Any unseen word
like lower can thus be represented by some sequence of known subword units,
such as low and er, or even as a sequence of individual letters if necessary.
Byte Pair Encoding Tokenization
Most tokenization schemes have two parts: a token learner and a token segmenter.
The token learner takes a raw training corpus and induces a vocabulary, a set of
tokens.
The token segmenter takes a raw test sentence and segments it into the tokens in the
vocabulary.
Three algorithms are widely used:
1. Byte-pair encoding
2. Unigram language modeling
3. WordPiece
Byte Pair Encoding Tokenization
The BPE token learner begins with a vocabulary that is the set of all individual
characters.
It then examines the training corpus, chooses the two symbols that are most
frequently adjacent (say ‘A’, ‘B’), adds a new merged symbol ‘AB’ to the vocabulary,
and replaces every adjacent ’A’ ’B’ in the corpus with the new ‘AB’.
It continues to count and merge, creating new longer and longer character strings,
until k merges have been done, creating k novel tokens (k is a parameter of the
algorithm).
The resulting vocabulary consists of the original set of characters plus k new symbols.
Byte Pair Encoding Tokenization
The input corpus is first white-space-separated to give a set of strings, each
corresponding to the characters of a word, plus a special end-of-word symbol _,
and its counts
Byte Pair Encoding Tokenization
The BPE algorithm first counts all pairs of adjacent symbols: the most frequent is
the pair e r, because it occurs in newer (frequency of 6) and wider (frequency of 3)
for a total of 9 occurrences.
BPE - Algorithm
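The token learner can be sketched in a few lines of Python, run here on the low/lowest/newer/wider/new example corpus above (a minimal sketch, not a production tokenizer):

```python
from collections import Counter

# Example corpus: low (5), lowest (2), newer (6), wider (3), new (2),
# each written as space-separated symbols plus the end-of-word symbol _.
vocab = {
    "l o w _": 5,
    "l o w e s t _": 2,
    "n e w e r _": 6,
    "w i d e r _": 3,
    "n e w _": 2,
}

def pair_counts(vocab):
    """Count all pairs of adjacent symbols, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every adjacent occurrence of the pair with the merged symbol."""
    a, b = pair
    merged = {}
    for word, freq in vocab.items():
        symbols = word.split()
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == (a, b):
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[" ".join(out)] = freq
    return merged

def bpe_learn(vocab, k):
    """Perform k merges, returning the learned merge list and final vocab."""
    merges = []
    for _ in range(k):
        pairs = pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)   # most frequent adjacent pair
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab

merges, final_vocab = bpe_learn(vocab, 3)
print(merges)   # [('e', 'r'), ('er', '_'), ('n', 'e')]
```

The first merge is e r (count 9, from newer and wider), then er _ (count 9), then n e (count 8), matching the walk-through above.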
Byte Pair Encoding Tokenization
Once we’ve learned our vocabulary, the token segmenter is used to tokenize a test
sentence.
The token segmenter just runs, on the test data, the merges we have learned from the
training data, greedily, in the order we learned them. (Thus the frequencies in the
test data don’t play a role, just the frequencies in the training data.)
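A segmenter sketch that applies learned merges in training order; the merge list below is what the learner would produce on the low/lowest/newer/wider/new corpus (an assumption for illustration):

```python
def bpe_segment(word, merges):
    """Greedily apply the learned merges, in training order, to one word."""
    symbols = list(word)          # start from individual characters
    for a, b in merges:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == (a, b):
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols

# Merges as learned from the example corpus (illustrative).
merges = [("e", "r"), ("er", "_"), ("n", "e"), ("ne", "w"), ("l", "o"), ("lo", "w")]
print(bpe_segment("lower_", merges))   # ['low', 'er_']
```

The unseen word lower is segmented into the known subwords low and er_, exactly the behavior the unknown-word discussion above motivates.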
Word Normalization, Lemmatization and Stemming
Normalization is the task of putting words/tokens in a standard format, choosing a
single normal form for words with multiple forms like USA and US or uh-huh and
uhhuh.
Case folding (mapping everything to lower case) is another kind of normalization, and is
helpful for tasks like information retrieval or speech recognition. For sentiment analysis
and other text classification tasks, information extraction, and machine translation, by
contrast, case can be quite helpful and case folding is generally not done.
Lemmatization
Lemmatization is the task of determining that two words have the same root, despite their surface differences.
Example:
The lemmatized form of a sentence like “He is reading detective stories” would
thus be “He be read detective story”.
How Lemmatization done?
The most sophisticated methods for lemmatization involve complete morphological
parsing of the word.
Morphology: It is the study of the way words are built up from smaller meaning-
bearing units called morphemes.
A morphological parser parses a Spanish word like amaren (‘if in the future they would
love’) into the morpheme amar ‘to love’ and the morphological features 3PL and future
subjunctive.
Stemming:
Stemming is a process of removing affixes from words to obtain their root forms, known
as stems.
Stemming algorithms apply heuristic rules to chop off prefixes or suffixes from words.
It is a simpler and faster process compared to lemmatization.
Stemmed words may not always be actual words.
For example, "happiness" may be stemmed to "happi", which is not a valid English word.
Stemming is useful in tasks where speed and simplicity are prioritized over accuracy, such as
information retrieval or indexing in search engines.
● In linguistics, an affix is a morpheme that is attached to a word or stem to create a new word or modify its
meaning or grammatical function. Affixes can be prefixes, suffixes, infixes, or circumfixes:
1. Prefix: An affix that is attached to the beginning of a word. For example, in the word "unhappy", "un-" is a
prefix.
2. Suffix: An affix that is attached to the end of a word. For example, in the word "happily", "-ly" is a suffix.
3. Infix: An affix that is inserted into the middle of a word. This is less common in English but is found in some
languages. For example, in Tagalog, a language spoken in the Philippines, the infix "-um-" is inserted into verbs
to indicate past tense, as in "ganda" (beautiful) becomes "gumanda" (became beautiful).
4. Circumfix: An affix that consists of two parts, one attached to the beginning of a word and the other attached to
the end. The word is modified by both parts. This is also less common in English but can be found in some
languages. For example, in German, the verb "gehen" (to go) can be transformed into the past participle
"gegangen" by adding the circumfix "ge-" at the beginning and "-en" at the end.
● Affixes play a significant role in morphology, the study of the structure of words and how they are formed. They
can change the meaning, part of speech, or grammatical function of words.
● Morphological analysis is a linguistic process that involves breaking down words into
their smallest meaningful units, called morphemes, and studying how these
morphemes contribute to the overall structure and meaning of words. Morphemes are
the smallest units of meaning in language.
● Here's how morphological analysis of words works:
1. Identification of Morphemes: Linguists analyze words to identify the morphemes within them. Morphemes can be either free
morphemes, which can stand alone as words (e.g., "cat", "run"), or bound morphemes, which must be attached to other
morphemes to form words (e.g., "-s" for plural, "-ing" for progressive tense).
2. Classification of Morphemes: Once identified, morphemes are classified based on their grammatical and semantic functions.
For example, morphemes may indicate tense, plurality, possession, or serve as prefixes, suffixes, or roots.
3. Study of Morphological Processes: Morphological analysis also involves studying how morphemes combine to form words
through processes such as affixation (adding prefixes, suffixes, or infixes), compounding (combining two or more words),
derivation (forming new words from existing ones), or inflection (altering the form of a word to indicate grammatical features
like tense, number, or gender).
4. Analysis of Word Structure: Morphological analysis examines the internal structure of words, including the order and
arrangement of morphemes within them. This helps understand how words are formed and how their meanings are constructed.
● Morphological analysis is essential in understanding language structure, word formation, and the relationship between form and
meaning in words. It provides insights into the rules and patterns governing word formation in a language, contributing to
linguistic research, language teaching, and natural language processing tasks such as machine translation, text analysis, and
speech recognition.
Lemmatization:
● Lemmatization, on the other hand, involves reducing words to their base or dictionary form, known as the
lemma.
Lemmatization uses a vocabulary and morphological analysis of words to accurately derive their
lemma.
It considers the context of the word and applies linguistic rules to transform words to their dictionary forms.
Lemmatization produces valid words, ensuring linguistic correctness.
It's a more complex and computationally intensive process compared to stemming.
Lemmatization is often preferred in tasks where accuracy is crucial, such as natural language understanding,
sentiment analysis, or machine translation.
● In summary, while stemming and lemmatization both aim to reduce words to their base forms, stemming is a
simpler and faster process that may result in non-words, while lemmatization is more accurate and considers the
linguistic context to produce valid dictionary forms of words.
● Example :
● Consider the word "running":
1. Stemming:
When stemming "running", a stemming algorithm might simply remove the suffix "-ing" to
obtain the stem "run". This is a heuristic approach that doesn't consider the linguistic
context deeply.
Stemming result: "running" → "run"
2. Lemmatization:
When lemmatizing "running", the process understands that "running" is a form of the base
verb "run". It consults a dictionary or a set of linguistic rules to derive the lemma.
Lemmatization result: "running" → "run"
● Another example with the word "mice":
1. Stemming:
A stemming algorithm applies only surface suffix rules, without linguistic context, so it has no way to relate
"mice" to "mouse"; depending on the rule set it may leave the word unchanged or chop it to a non-word.
Stemming result: "mice" → "mice" (or a truncated non-word)
2. Lemmatization:
Lemmatization, however, recognizes that "mice" is the plural form of "mouse" and converts it to the singular form.
Lemmatization result: "mice" → "mouse"
● These examples demonstrate how stemming and lemmatization can produce different results based on their respective algorithms and
linguistic analysis. Lemmatization aims for accuracy by considering the context and linguistic rules, while stemming operates more crudely
by chopping off affixes.
The Porter Stemmer
Lemmatization algorithms can be complex. For this reason we sometimes make
use of a simpler but cruder method, which mainly consists of chopping off
word-final affixes. Consider the following passage and its output from the Porter stemmer:
This was not the map we found in Billy Bones’s chest, but an accurate copy,
complete in all things-names and heights and soundings-with the single exception
of the red crosses and the written notes.
Stemmed output:
Thi wa not the map we found in Billi Bone s chest but an accur copi complet in all
thing name and height and sound with the singl except of the red cross and the
written note
The Porter Stemmer
The algorithm is based on a series of rewrite rules run in series, as a cascade, in
which the output of each pass is fed as input to the next pass.
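A toy suffix-stripping stemmer in the same spirit, a cascade of end-of-word rewrite rules; the rule list here is an illustrative assumption, not the full Porter rule set:

```python
# A few Porter-style step-1 rules, tried in order; first matching suffix wins.
RULES = [
    ("sses", "ss"),      # caresses   -> caress
    ("ies", "i"),        # ponies     -> poni
    ("ational", "ate"),  # relational -> relate
    ("ing", ""),         # jumping    -> jump
    ("ed", ""),          # helped     -> help
]

def crude_stem(word):
    for suffix, replacement in RULES:
        if word.endswith(suffix):
            return word[: -len(suffix)] + replacement
    return word

print(crude_stem("caresses"), crude_stem("relational"))   # caress relate
```

Even this tiny rule set shows why stems need not be real words: ponies stems to the non-word poni.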
Sentence Segmentation
The most useful cues for segmenting a text into sentences are punctuation, like
periods, question marks, and exclamation points.
Minimum Edit Distance
Much of NLP is concerned with measuring how similar two strings are.
For example: in spelling correction, the user typed some erroneous string - let’s
say graffe - and we want to know what the user meant. (The most similar word is
giraffe, which differs by one letter.)
Coreference is the task of deciding whether two strings refer to the same entity:
Minimum Edit Distance: Minimum Edit Distance between two strings is defined
as the minimum number of editing operations (operations like insertion, deletion,
substitution) needed to transform one string into another.
This algorithm uses dynamic programming for finding minimum edit distance.
Given two strings, the source string X of length n, and target string Y of length m,
we’ll define D[i, j] as the edit distance between X[1..i] and Y[1.. j].
In the base case, with a source substring of length i but an empty target string,
going from i characters to 0 requires i deletes. With a target substring of length j
but an empty source going from 0 characters to j characters requires j inserts.
Minimum Edit Distance Algorithm
Insertion and deletion each cost 1 and substitution costs 2. The recurrence is then:
D[i, j] = min( D[i-1, j] + 1,                              (deletion)
               D[i, j-1] + 1,                              (insertion)
               D[i-1, j-1] + (2 if X[i] ≠ Y[j] else 0) )   (substitution)
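The dynamic-programming table can be filled in directly from the base cases and recurrence above (a minimal sketch using the costs stated: insertion 1, deletion 1, substitution 2):

```python
def min_edit_distance(source, target, ins=1, dele=1, sub=2):
    """Edit distance between source and target via dynamic programming."""
    n, m = len(source), len(target)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):            # base case: i deletions
        D[i][0] = D[i - 1][0] + dele
    for j in range(1, m + 1):            # base case: j insertions
        D[0][j] = D[0][j - 1] + ins
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub_cost = 0 if source[i - 1] == target[j - 1] else sub
            D[i][j] = min(D[i - 1][j] + dele,           # deletion
                          D[i][j - 1] + ins,            # insertion
                          D[i - 1][j - 1] + sub_cost)   # substitution / copy
    return D[n][m]

print(min_edit_distance("intention", "execution"))   # 8
print(min_edit_distance("graffe", "giraffe"))        # 1
```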
Minimum Edit Distance Algorithm
● Minimum Edit Distance (Dynamic Programming) for converting one string to another string (youtube.com)
● Edit Distance between 2 Strings | The Levenshtein Distance Algorithm + Code (YouTube)
Example: Minimum edit distance between “execution” and “intention”
N-grams Language Model
Models that assign probabilities to sequences of words are called Language Models or
LMs.
The simplest model that assigns probabilities to sentences and sequences of words is the n-gram.
S.No  Type of n-gram  Generated n-grams (for the sentence “I reside in Bengaluru”)
1     Unigram         [“I”, ”reside”, ”in”, “Bengaluru”]
2     Bigram          [“I reside”, ”reside in”, ”in Bengaluru”]
3     Trigram         [“I reside in”, ”reside in Bengaluru”]
n     n-gram          sequences of n consecutive words
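The rows of the table above can be generated with a short sketch:

```python
def ngrams(tokens, n):
    """All n-grams of a token list, joined with spaces as in the table above."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "I reside in Bengaluru".split()
print(ngrams(tokens, 1))   # unigrams
print(ngrams(tokens, 2))   # bigrams
print(ngrams(tokens, 3))   # trigrams
```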
N-Grams
P(w|h) - the probability of a word w given some history h.
Suppose the history h is “its water is so transparent that” and we want to know the
probability that the next word is the:
P(the | its water is so transparent that)
One way is using relative frequency counts: take a very large corpus, count the
number of times we see its water is so transparent that, and count the number of
times this is followed by the:
P(the | its water is so transparent that) = C(its water is so transparent that the) / C(its water is so transparent that)
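The relative-frequency estimate can be sketched on a toy corpus (the corpus below is an invented stand-in for a real large corpus):

```python
def seq_count(tokens, seq):
    """How many times the token sequence seq occurs in tokens."""
    return sum(tokens[i:i + len(seq)] == seq
               for i in range(len(tokens) - len(seq) + 1))

def relative_frequency(tokens, history, word):
    """P(word | history) estimated as C(history + word) / C(history)."""
    h = seq_count(tokens, history)
    return seq_count(tokens, history + [word]) / h if h else 0.0

corpus = ("its water is so transparent that the fish are visible "
          "its water is so transparent that we can see the bottom").split()
p = relative_frequency(corpus, "its water is so transparent that".split(), "the")
print(p)   # the history occurs twice, once followed by "the": 0.5
```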
It works fine for many cases. But it turns out that even the web isn’t big enough to
give us good estimates in most cases. This is because language is creative, new
sentences are created all the time, and we won’t always be able to count entire
sentences.
N-Grams
Example: even a well-formed phrase like “Walden Pond’s water is so transparent
that the” may have a count of zero on the web.
Similarly, if we wanted to know the joint probability of an entire sequence of words
like its water is so transparent, counting directly fails for the same reason.
N-Grams
The general equation for this n-gram approximation to the conditional probability of
the next word in a sequence is:
P(w_n | w_1:n-1) ≈ P(w_n | w_(n-N+1):(n-1))