Deep Learning and Linguistic Representation
Chapman & Hall/CRC Machine Learning & Pattern Recognition
Shalom Lappin
First edition published 2021
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
The right of Shalom Lappin to be identified as author of this work has been asserted by him in accor-
dance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.
Reasonable efforts have been made to publish reliable data and information, but the author and pub-
lisher cannot assume responsibility for the validity of all materials or the consequences of their use.
The authors and publishers have attempted to trace the copyright holders of all material reproduced
in this publication and apologize to copyright holders if permission to publish in this form has not
been obtained. If any copyright material has not been acknowledged please write and let us know so
we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information stor-
age or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, access www.copyright.
com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA
01923, 978-750-8400. For works that are not available on CCC please contact mpkbookspermissions@tandf.co.uk
Trademark notice: Product or corporate names may be trademarks or registered trademarks and are
used only for identification and explanation without intent to infringe.
Preface
Over the past 15 years deep learning has produced a revolution in artificial intelligence. In natural language processing it has created robust, large-coverage systems that achieve impressive results across a wide range of applications that were resistant to more traditional machine learning methods, and to symbolic approaches. Deep neural networks (DNNs) have become dominant throughout many domains of AI in general, and in NLP in particular, by virtue of their success as engineering techniques.
Recently, a growing number of computational linguists and cognitive
scientists have been asking what deep learning might teach us about the
nature of human linguistic knowledge. Unlike the early connectionists of
the 1980s, these researchers have generally avoided making claims about
analogies between deep neural networks and the operations of the brain.
Instead, they have considered the implications of these models for the
cognitive foundations of natural language, in nuanced and indirect ways.
In particular, they are interested in the types of syntactic structure that
DNNs identify, and the semantic relations that they can recognise. They
are concerned with the manner in which DNNs represent this informa-
tion, and the training procedures through which they obtain it.
This line of research suggests that it is worth exploring points of
similarity and divergence between the ways in which DNNs and humans
encode linguistic information. The extent to which DNNs approach (and,
in some cases, surpass) human performance on linguistically interesting
NLP tasks, through efficient learning, gives some indication of the capac-
ity of largely domain general computational learning devices for language
learning. An obvious question is whether humans could, in principle, ac-
quire this knowledge through similar sorts of learning processes.
This book draws together work on deep learning applied to natural
language processing that I have done, together with colleagues, over the
past eight years. It focusses on the question of what current methods
of machine learning can contribute to our understanding of the way in
which humans acquire and represent knowledge of the syntactic and
Shashi Kumar, the LaTeX support person, has given me much-needed assistance throughout the production process.
My family, particularly my children and grandchildren, are the source
of joy and wider sense of purpose needed to complete this, and many
other projects. While they share the consensus that I am a hopeless nerd,
they assure me that the scientific issues that I discuss here are worth-
while. They frequently ask thoughtful questions that help to advance my
thinking on these issues. They remain permanently surprised that someone so obviously out of it could work in such a cool field. Having them is indeed one of the many blessings I enjoy. Above all,
my wife Elena is a constant source of love and encouragement. Without
her none of this would be possible.
The book was written in the shadow of the Covid-19 pandemic. This
terrible event has brought at least three phenomena clearly into view.
First, it has underlined the imperative of taking the results of scientific
research seriously, and applying them to public policy decisions. Leaders
who dismiss well-motivated medical advice, and respond to the crisis
through denial and propaganda, are inflicting needless suffering on their
people. By contrast, governments that allow themselves to be guided by
well-supported scientific work have been able to mitigate the damage
that the crisis is causing.
Second, the crisis has provided a case study in the damage that
large scale campaigns of disinformation, and defamation, can cause to
the health and the well-being of large numbers of people. Unfortunately,
digital technology, some of it involving NLP applications, has provided
the primary devices through which these campaigns are conducted. Com-
puter scientists working on these technologies have a responsibility to
address the misuse of their work for socially destructive purposes. In
many cases, this same technology can be applied to filter disinformation
and hate propaganda. It is also necessary to insist that the agencies for
which we do this work be held accountable for the way in which they
use it.
Third, the pandemic has laid bare the devastating effects of extreme
economic and social inequality, with the poor and ethnically excluded
bearing the brunt of its effects. Nowhere has this inequality been more
apparent than in the digital technology industry. The enterprises of this
industry sustain much of the innovative work being done in deep learn-
ing. They also instantiate the sharp disparities of wealth, class, and
opportunity that the pandemic has forced into glaring relief. The engi-
neering and scientific advances that machine learning is generating hold
out the promise of major social and environmental benefit. In order for
this promise to be realised, it is necessary to address the acute deficit in
democratic accountability, and in equitable economic arrangements that
the digital technology industry has helped to create.
Scientists working in this domain can no longer afford to treat these
problems as irrelevant to their research. The survival and stability of
the societies that sustain this research depend on finding reasonable
solutions to them.
NLP has blossomed into a wonderfully vigorous field of research.
Deep learning is still in its infancy, and it is likely that the architec-
tures of its systems will change radically in the near future. By using it
to achieve perspective on human cognition, we stand to gain important
insight into linguistic knowledge. In pursuing this work it is essential
that we pay close attention to the social consequences of our scientific
research.
Shalom Lappin
London
October, 2020
CHAPTER 1
Introduction: Deep
Learning in Natural
Language Processing
which confirms that this effect is a genuine property of the data, rather
than regression to the mean.
Lau et al. (2020) expand the set of neural language models to
include unidirectional and bidirectional transformers. They find that
bidirectional, but not unidirectional, transformers approach a plausible
estimated upper bound on individual human prediction of sentence ac-
ceptability, across context types. This result raises interesting questions
concerning the role of directionality in human sentence processing.
In Chapter 5 I discuss whether DNNs, particularly those described
in previous chapters, offer cognitively plausible models of linguistic rep-
resentation and language acquisition. I suggest that if linguistic theories
provide accurate explanations of linguistic knowledge, then NLP systems
that incorporate their insights should perform better than those that
do not, and I explore whether these theories, specifically those of for-
mal syntax, have, in fact, made significant contributions to solving NLP
tasks. Answering this question involves looking at more recent DNNs
enriched with syntactic structure. I also compare DNNs with grammars,
as models of linguistic knowledge. I respond to criticisms that Sprouse, Yankama, Indurkhya, Fong, and Berwick (2018) raise against Lau, Clark, and Lappin's (2017) work on neural language models for the sentence acceptability task, to support the view that syntactic knowledge is probabilistic rather than binary in nature. Finally, I consider three well-known
cases from the history of linguistics and cognitive science in which the-
orists reject an entire class of models as unsuitable for encoding human
linguistic knowledge, on the basis of the limitations of a particular mem-
ber of the class. The success of more sophisticated models in the class
has subsequently shown these inferences to be unsound. They represent
influential cases of overreach, in which convincing criticism of a fairly
simple computational model is used to dismiss all models of a given
type, without considering straightforward improvements that avoid the
limitations of the simpler system.
I conclude Chapter 5 with a discussion of the application of deep
learning to distributional semantics. I first briefly consider the type the-
oretic model that Coecke, Sadrzadeh, and Clark (2010) and Grefenstette,
Sadrzadeh, Clark, Coecke, and Pulman (2011) develop to construct com-
positional interpretations for phrases and sentences from distributional
vectors, on the basis of the syntactic structure specified by a pregroup
grammar. This view poses a number of conceptual and empirical prob-
lems. I then suggest an alternative approach on which semantic interpre-
tation in a deep learning context is an instance of sequence to sequence
NLP tasks, we stand to gain useful insights into possible ways of encoding
and representing natural language as part of the language learning pro-
cess. We will also deepen our understanding of the relative contributions
of domain-general induction procedures on the one hand, and language-specific learning biases on the other, to the success and efficiency of this process.
(ii) one or more hidden layers, in which units (neurons) compute the
weights for components of the data, and
softmax(z)_i = exp(z_i) / Σ_j exp(z_j)
This function applies the exponential function to each input value, and normalises it by dividing it by the sum of the exponentials of all the inputs, to ensure that the output values sum to 1. The softmax function is widely used in the output layer of a DNN to generate a probability distribution for a classifier, or for a probability model.
Words are represented in a DNN by vectors of real numbers. Each
element of the vector expresses a distributional feature of the word.
These features are the dimensions of the vectors, and they encode its
co-occurrence patterns with other words in a training corpus. Word em-
beddings are generally compressed into low dimensional vectors (200–300
dimensions) that express similarity and proximity relations among the
words in the vocabulary of a DNN model. These models frequently use
large pre-trained word embeddings, like word2vec (Mikolov, Kombrink,
Deoras, Burget, & Černocký, 2011) and GloVe (Pennington, Socher, &
Manning, 2014), compiled from millions of words of text.
In supervised learning a DNN is trained on data annotated with
the features that it is learning to predict. For example, if the DNN is
learning to identify the objects that appear in graphic images, then its
training data may consist of large numbers of labelled images of the
objects that it is intended to recognise in photographs. In unsupervised
learning the training data are not labelled.5 A generative neural language
model may be trained on large quantities of raw text. It will generate the
most likely word in a sequence, given the previous words, on the basis
of the probability distribution over words, and sequences of words, that
it estimates from the unlabelled training corpus.
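The idea of estimating a next-word distribution from raw text can be illustrated with a counts-based bigram model; this is a much simpler stand-in for a neural language model, and the corpus here is a toy example of my own, but the prediction step is the same in spirit:

```python
from collections import Counter, defaultdict

# Raw, unlabelled training text (toy corpus for illustration).
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count bigrams: how often each word follows each preceding word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def most_likely_next(word):
    """Return the most probable next word and its estimated probability."""
    counts = follows[word]
    total = sum(counts.values())
    best, count = counts.most_common(1)[0]
    return best, count / total

print(most_likely_next("sat"))  # in this corpus "sat" is always followed by "on"
```

A neural language model replaces the raw counts with a learned function of the preceding context, which lets it generalise to sequences it has never seen, but it is trained on unlabelled text in just this way.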
[Figure: a recurrent unit A unrolled over time, mapping inputs x_0, x_1, x_2, ..., x_t to hidden states h_0, h_1, h_2, ..., h_t.]
(i) The forgetting gate determines which part of the information re-
ceived from preceding units is discarded;
(ii) the input gate updates the retained information with the features
of a new element of the input sequence; and
(iii) the output gate defines the vector which is passed to the next unit
in the network.
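The three gates can be sketched as a single LSTM time step in NumPy. This is a minimal illustration of the gating arithmetic, not a trained model: the weights are random placeholders, and the dimensions are deliberately tiny.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy dimensions (real models are far larger).
n_in, n_hid = 4, 3

# One weight matrix per gate, plus one for the candidate cell state.
# Each acts on the concatenation of the previous hidden state and input.
W = {g: rng.standard_normal((n_hid, n_hid + n_in)) * 0.1
     for g in ("forget", "input", "output", "candidate")}
b = {g: np.zeros(n_hid) for g in W}

def lstm_step(x, h_prev, c_prev):
    """One LSTM time step, following the three gates described above."""
    z = np.concatenate([h_prev, x])
    f = sigmoid(W["forget"] @ z + b["forget"])      # (i) what to discard
    i = sigmoid(W["input"] @ z + b["input"])        # (ii) what to add
    o = sigmoid(W["output"] @ z + b["output"])      # (iii) what to emit
    c_tilde = np.tanh(W["candidate"] @ z + b["candidate"])
    c = f * c_prev + i * c_tilde                    # updated cell state
    h = o * np.tanh(c)                              # vector passed to next unit
    return h, c

h, c = np.zeros(n_hid), np.zeros(n_hid)
for t in range(5):                                  # run over a short sequence
    x = rng.standard_normal(n_in)
    h, c = lstm_step(x, h, c)
print(h.shape, c.shape)
```

The cell state c carries information along the sequence largely unchanged unless the forget gate discards it, which is what allows LSTMs to retain long-distance dependencies better than simple recurrent units.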
feature map from this data. A pooling layer compresses the map by re-
ducing its dimensions, and rendering it invariant to small changes in
input (noise filtering). Successive convolutional + pooling layers con-
struct progressively higher level representations from the feature maps
received from preceding levels of the network. The output feature map
is passed to one or more fully interconnected layers, which transform the
map into a feature vector. A softmax function maps this vector into a
probability distribution over the states of a category variable. Fig 1.5
illustrates the structure of a CNN.7
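The convolution-plus-pooling pipeline can be illustrated in one dimension. The filter and signal below are toy values of my own (this is not Fig. 1.5), but the two operations are the ones described above: a filter slides over the input to produce a feature map, and pooling compresses that map while smoothing over small local changes.

```python
import numpy as np

def conv1d(x, kernel):
    """Valid 1-D convolution (strictly, cross-correlation, as in most DNNs)."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel)
                     for i in range(len(x) - k + 1)])

def max_pool(x, size=2):
    """Compress the map by keeping the maximum of each window of `size` values."""
    return np.array([x[i:i + size].max()
                     for i in range(0, len(x) - size + 1, size)])

signal = np.array([0.0, 1.0, 2.0, 1.0, 0.0, 1.0, 3.0, 1.0])
kernel = np.array([1.0, -1.0])     # a simple difference (edge-detecting) filter

fmap = conv1d(signal, kernel)      # feature map: local differences in the input
pooled = max_pool(fmap)            # compressed, noise-tolerant representation
print(fmap)
print(pooled)
```

Stacking such layers, as the text describes, yields progressively higher-level feature maps, with a fully connected layer and softmax producing the final probability distribution.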
In 1590 William Short, the same who ten years later bought
Rose Field, purchased of John Vavasour two messuages, two gardens
and four acres of land, with appurtenances, in St. Giles.[490] The
precise position of the property is not stated, but from evidence
which will be referred to, it is known that it lay to the west of Drury
Lane, and comprised The Greyhound inn in Broad Street, with land
to the south lying on both sides of what is now Short’s Gardens.
A portion of this property he leased,[491] in 1623–
4, to Esmé Stuart, Earl of March (afterwards Duke of
Lennox), for a term of 51 years as from Michaelmas,
1617. It is possible to ascertain within a little the
boundaries of this part of the Short estate. In a deed[492]
dated 10th January, 1614–5, relating to Elm Field, the
land lying between Castle Street and Long Acre, the
northern boundary is stated to be "certain closes called
by the name of Marshlands alias Marshlins, and a
garden sometime in the tenure of William Short or his
assignes"; and in a later deed,[493] dated 2nd February,
1632–3, relating to a portion of the same field, the
northern boundary, said to be 249 feet distant from
Long Acre, is referred to as "a way or back lane of 20 feet adjoining
the garden wall of the Right Honble. the Duchess of Lenox."
[Marginal heading: Esmé Stuart, Seigneur D'Aubigny, Duke of Lennox.]
The distance of the “back lane” from Long Acre corresponds
exactly with that of the present Castle Street, and it is therefore clear
that this was the southern boundary. The property afterwards came
into the possession of the Brownlow family, and an examination of
the leases which were granted in the early part of the 18th century,
shows that it reached as far as Drury Lane on the east and Short’s
gardens on the north. On the west it stretched as far as Marshland.
[494]
Whether the house leased to the Earl of March was one of the
two (the other being The Greyhound) purchased by Short in 1590, or
a house quite recently built, there is no evidence to show.
The Earl, in February, 1623–4, succeeded to the dukedom of
Lennox, and on 30th July of the same year he died. His widow[495]
continued to reside at the house. Letters from her, headed “Drury
Lane,” and dating from 1625 to 1629, are extant,[496] and she also, in
1628, joined with other “inhabitants adjoining the house of the
Countess of Castlehaven, in Drury Lane,” in a petition to the Privy
Council.[497] There is, therefore, ample evidence that she actually
resided at the house.
In 1632 she married James Hamilton, second Earl of
Abercorn, and died on 17th September, 1637, leaving to her husband,
in trust for their son James, “all that my capitall house, scituate in
Drury Lane.”[498]
The Earl sold the remainder of the lease[499] to the Duchess’s
cousin, Adrian Scroope, who apparently let the house, as the Subsidy
Roll for 1646 shows the “Earl of Downe” as occupying the premises.
[500]
In 1647 Sir Gervase Scroope, Adrian’s son, sold the lease to Sir
John Brownlow,[499] who certainly acquired the freehold also, though
no record of the transaction has come to light. Finding the house too
large[501] Sir John divided it in two, and in 1662 Lady Allington[502]
was paying a rent of £50 for the smaller of the two residences.[499] Sir
John died in November, 1679. By his will[503] (signed 10th April,
1673) he left to his wife all the plate, jewels, etc. “which shall be in
her closett within or neare our bedd chamber at London in my house
at Drury Lane ... and the household stuffe in the said house, except
all that shall then be in my chamber where the most part of my
bookes and boxes of my evidences are usually kept, and except all
those in the same house that shall then be in the chamber where I
use to dresse myselfe, both which chambers have lights towardes the
garden.” He also left to his wife “that part of my house in Drury Lane
which is now in my own possession for her life if she continue my
widowe,” together with “that house or part of my house wherein the
Lady Allington did heretofore live, ... by which houses I meane yards,
gardens and all grounds therewith used”; and moreover the furniture
“of two roomes in my house in Drury Lane where I use to dresse
myself, and where my evidences and bookes are usually kept.”
The estate afterwards came into the hands of Sir
John Brownlow, son of his nephew, Sir Richard
Brownlow, who at once took steps to develop the
property, letting plots on building lease for a term of
years expiring in 1728. Except in one case, information is
not to hand as to the date on which these leases were
granted, but in that instance it is stated to be 21st May,
1682,[504] a date which may be regarded as
approximately that of the beginning of the development of the
interior part of the estate by building,[505] though at least a part of the
frontages to Drury Lane and Castle Street had been built on before
1658 (see Plate 3).
At the same time (circ. 1682) apparently Lennox House was,
either wholly or in part, demolished. A deed of 1722[506] relates to the
assignment of two leases of a parcel of ground “lately belonging to
the capital messuage or tenement of Sir John Brownlow then in part
demolished, scituate in Drury Lane, in St. Giles, sometime called
Lenox House.” The description is obviously borrowed from the
original leases, since reference is also made to “a new street there
then to be built, intended to be called Belton Street,” which street
was certainly in existence in 1683.[507] What is apparently Lennox
House is shown in Morden and Lea’s Map of 1682 as occupying a
position in the central portion of the estate, with a wide approach
from Drury Lane, and this is to a certain extent confirmed by the
tradition that the first Lying-In Hospital in Brownlow Street
(occupying the site of the present No. 30) was a portion of the
original building. It is remarkable, however, that no hint of a house
in this position is given either in Hollar’s Plan of 1658 (Plate 3) or in
Faithorne’s Map of the same date (Plate 4).
The name of Brownlow Street was in 1877 altered to Betterton
Street.
XLVII.–XLVIII.—Nos. 24 and 32,
BETTERTON STREET.
General description and date of
structure.
No. 24, Betterton Street, dating from the 18th century, must at
one time have been a fine residence, but there is now nothing in it to
record. The doorcase is illustrated on Plate 35.
No. 32 also dates from the 18th century. Attached to these
premises is a boldly recessed carved wooden doorcase of interesting
design, illustrated on Plate 36. The interior of the house contains a
wood and compo chimney piece of some interest in the front room of
the ground floor, and one of white marble, relieved with a little
carving and red stone inlay, in the corresponding room on the floor
above.
Condition of repair.
The houses are in fair repair.
Biographical notes.
The sewer ratebook for 1718 shows “John Bannister” in occupation of
No. 32. This was probably John Bannister, the younger, “who came from an
old St. Giles’s family, his father having been a musician, composer and
violinist, and his grandfather one of the parish waits. He himself was in the
royal band during the reigns of Charles II., James II., William and Mary,
and Anne, and played first violin at Drury Lane theatre, when Italian operas
were first introduced into England.”[508]
In the Council’s collection are:—
No. 24, Betterton Street—General exterior (photograph).
[509]No. 24, Betterton Street—Entrance doorway (measured
drawing).
[509]No. 32, Betterton Street—Entrance doorway (photograph).