
Bootstrapping for Syntactic Categories

John Hewitt, Jordan Kodner, Mitch Marcus, Charles Yang


Departments of Linguistics and Computer and Information Science, University of Pennsylvania

Problem

• Children learn syntactic categories very early and use them well (Valian 1986, Yang 2013).
• Whether categories are innate (Chomsky 1965) or emergent (Chomsky 1955), they must be acquired from language-specific distributional information.
• Popular methods in cognitive modeling (e.g., Redington et al. 1996) and unsupervised NLP (e.g., Haghighi & Klein 2006) are computationally expensive yet still perform poorly.
• Simple and empirically motivated methods (Mintz 2002) do not scale well (Chemla et al. 2009).
• Needed: an effective model with better connections to language development.

Bootstrapping for Categories

• Semantic bootstrapping: prototypical exemplars for syntactic categories, e.g., (dog, N), (ran, V), (red, A) (Grimshaw 1979, Pinker 1984).
• Strongly supported by analysis of child-directed input (Rondal & Cession 1990).
• Distributional learning: young children can use distributional regularities, including word order and morphology, to identify the categories of novel words (Brown 1957, Naigles 1990, Mintz 2002, Shi & Melançon 2010).
• Proposal: starting from a small set of labeled words (seeds), iteratively construct classifiers on the basis of distributional distance from the seeds (see the sketch after this list).
• The seed set increases and the classifiers go from concrete to abstract.

[Diagram: seeds feed classifiers; for a new word x in a frame AxB, the classifiers test whether x ∈ C, and if YES the pair (x, C) is added to the seed set.]
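
Below is a minimal sketch of this bootstrapping loop in Python. The names bootstrap, distance, and THRESHOLD are illustrative assumptions rather than details from the poster; any of the frame-based metrics described under Method could serve as the distance function.

# Illustrative sketch of the iterative bootstrapping loop (not the authors' code).
# Assumptions: distance(word, category, labeled, corpus) is some frame-based
# distributional distance, and THRESHOLD is a hypothetical acceptance cutoff.

THRESHOLD = 0.5  # hypothetical cutoff for accepting a new (word, category) pair

def bootstrap(seeds, vocabulary, corpus, distance):
    """Grow a word-to-category lexicon from a small seed set.

    seeds      -- dict mapping seed words to categories, e.g. {"dog": "N"}
    vocabulary -- iterable of unlabeled words to classify
    corpus     -- data used by the distance function (e.g., frame counts)
    distance   -- function (word, category, labeled, corpus) -> float
    """
    labeled = dict(seeds)
    changed = True
    while changed:                        # iterate until no new word is labeled
        changed = False
        for word in vocabulary:
            if word in labeled:
                continue
            # score the word against every category induced so far
            scores = {c: distance(word, c, labeled, corpus)
                      for c in set(labeled.values())}
            best = min(scores, key=scores.get)
            if scores[best] < THRESHOLD:  # concrete evidence first, abstract later
                labeled[word] = best      # the new pair becomes a seed itself
                changed = True
    return labeled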


Method

From word frames to category frames (Reeder et al. 2013, Schuler et al. in press)

[Diagram: words X1, X2, X3 each occur in lexical frames A1_B1, A1_B2, A2_B1, A2_B3, A3_B2, A3_B3; the shared contexts merge them into a single class X1,2,3 with the category frame A_B.]

• Lexical and category frames: an occurrence of N in the_N_is yields the frames (the_is, D_is, the_V, D_V) ⟹ N (see the sketch after this list).
• Classifiers map frequent frames to categories (A_B ⟹ C); scoring is by types, and both lexical and category frames are used.
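
A minimal sketch of the frame generalization step, assuming a dictionary of already-labeled seed words; the function name frames_for and its interface are illustrative, not taken from the poster.

# Illustrative sketch of lexical-to-category frame generalization
# (the the_N_is example): once "the" is labeled D and "is" is labeled V,
# the lexical frame the_is also yields the frames D_is, the_V, and D_V.

from itertools import product

def frames_for(left, right, labeled):
    """Return all lexical/category frames for a token between left and right.

    labeled -- dict from word to category, e.g. {"the": "D", "is": "V"}
    """
    left_options = [left] + ([labeled[left]] if left in labeled else [])
    right_options = [right] + ([labeled[right]] if right in labeled else [])
    return {(l, r) for l, r in product(left_options, right_options)}

# e.g. frames_for("the", "is", {"the": "D", "is": "V"})
# -> {("the", "is"), ("D", "is"), ("the", "V"), ("D", "V")}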


• Two frame-based distance metrics (the KL metric is sketched after this list):
  - Identity: w ∈ C, AwB, AxB ⟹ x ∈ C (a new word x shares a frame A_B with a seed w of category C)
  - KL-clusters: KL_AB(w, x) ⟹ x ∈ argmin_C KL(x, C) (x joins the category whose seeds minimize the divergence)
• Online learning with offline clusterers: developmental support, yet still efficient (quadratic time).
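
A minimal sketch of a KL-style distance between two words' frame distributions; the additive smoothing constant and the symmetrization are assumptions, since the poster does not spell out these details.

# Illustrative sketch of a KL-style distance over frame counts (not the
# authors' exact formulation).

import math
from collections import Counter

def kl(p_counts, q_counts, smoothing=1e-6):
    """KL(P || Q) over the shared support of frames, with additive smoothing."""
    support = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + smoothing * len(support)
    q_total = sum(q_counts.values()) + smoothing * len(support)
    total = 0.0
    for frame in support:
        p = (p_counts.get(frame, 0) + smoothing) / p_total
        q = (q_counts.get(frame, 0) + smoothing) / q_total
        total += p * math.log(p / q)
    return total

def frame_distance(x_frames, w_frames):
    """Symmetrized divergence between two words' frame Counters."""
    return kl(Counter(x_frames), Counter(w_frames)) + kl(Counter(w_frames), Counter(x_frames))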

Results

All results are measured by 1-to-1 accuracy and compared with a baseline (initial seeds only); a sketch of the metric follows.
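
A minimal sketch of how 1-to-1 accuracy could be computed, pairing each induced category with at most one gold category; the greedy matching used here is an assumption, since the poster does not specify the matching procedure.

# Illustrative sketch of 1-to-1 accuracy: each induced category is mapped to
# at most one gold category (greedy matching; the exact procedure is assumed).

from collections import Counter

def one_to_one_accuracy(predicted, gold):
    """predicted, gold -- dicts mapping each word to a category label."""
    overlap = Counter((predicted[w], gold[w]) for w in gold if w in predicted)
    used_pred, used_gold, correct = set(), set(), 0
    # greedily pair the most-overlapping (predicted, gold) label pairs
    for (p, g), count in overlap.most_common():
        if p not in used_pred and g not in used_gold:
            used_pred.add(p)
            used_gold.add(g)
            correct += count
    return correct / len(gold)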


Exp 1: Child-directed English
• 86 salient seed words from the Chicago early language corpus (e.g., this, not the, was selected as a determiner seed)
• 220,000 words mapped to 7 categories
• 25-fold cross-validation
• Accuracy 62.44% (baseline: 20.5%)

Exp 2: WSJ/Chinese Treebanks
• Haghighi & Klein (2006): large word vectors
• Top 3 words per category (WSJ: 45, Chinese: 33); the current model uses only text

       HK2006   Current   CRF
WSJ    68.8%    55.6%     41.3%
CTB    39.0%    46.7%     34.4%

Exp 3: KL-clusters
• A range of languages (top 1000 words only)
• Hierarchical clustering (Parkes et al. 1998)

             # seeds   Baseline   Accuracy
CHILDES      100       10.0%      70.4%
WSJ          89        8.9%       76.2%
CTB          85        8.5%       73.1%
German       35        3.5%       52.9%
Indonesian   34        3.4%       71.8%
Spanish      34        3.4%       62.8%

Conclusion

• Cognitive and perceptual cues (à la semantic bootstrapping) can be effectively combined with distributional linguistic cues (à la lexical and category frames).
• The results are robust with respect to seed selection (few seeds, ambiguity allowed).
• The online learning algorithm leads to developmental predictions and can incorporate other mechanisms of word learning (e.g., Lederer et al. 1999, Stevens et al. 2016).
• Future work: incorporation of structural information (e.g., morphology, non-local contexts, abstract syntax).

Selected References

Chemla et al. (2009) Developmental Science. Haghighi & Klein (2006) EMNLP. Lederer et al. (1999) Cognition. Mintz (2002) Cognition. Naigles (1990) Language Acquisition. Parkes, Malek, & Marcus (1998) ACL. Redington, Chater, & Finch (1996) Cognitive Science. Reeder, Newport, & Aslin (2013) Cognitive Psychology. Rondal & Cession (1990) J. Child Language. Schuler et al. (in press) Language Learning and Development. Shi & Melançon (2010) Infancy. Stevens et al. (2016) Cognitive Science. Valian (1986) Cognitive Psychology. Yang (2013) PNAS.

Acknowledgements

Partially funded by the DARPA LORELEI program under Agreement No. HR0011-15-2-0023.
