
Systems That Learn

The MIT Press Series in Learning, Development, and Conceptual


Change
Lila Gleitman, Susan Carey, Elissa Newport, and Elizabeth Spelke,
editors

Names for Things: A Study in Human Learning, by John Macnamara, 1982


Conceptual Change in Childhood, by Susan Carey, 1985
"Gavagai!" or the Future History of the Animal Language Controversy,
by David Prcmack, 1985
Systems That Learn: An Introduction to Learning Theory for Cognitive and
Computer Scientists, by Daniel N. Osherson, Michael Stob, and Scott
Weinstein, 1986
Systems That Learn
An Introduction to Learning Theory for Cognitive and
Computer Scientists

Daniel N. Osherson
Michael Stob
Scott Weinstein

A Bradford Book
The MIT Press
Cambridge, Massachusetts
London, England
Second printing, 1985

© 1986 by The Massachusetts Institute of Technology

All rights reserved. No part of this book may be reproduced in any form by any electronic or
mechanical means (including photocopying, recording, or information storage and retrieval)
without permission in writing from the publisher.

This book was set in Times New Roman by Asco Trade Typesetting Ltd., Hong Kong,
and printed and bound by Halliday Lithograph in the United States of America

Library of Congress Cataloging-in-Publication Data

Osherson, Daniel N.
Systems that learn.

(The MIT Press series in learning, development, and conceptual change)


"A Bradford book."
Bibliography: p.
Includes indexes.
1. Learning-Mathematical aspects. 2. Learning, Psychology of. 3. Human
information processing-Mathematical models. I. Stob, Michael. II. Weinstein, Scott.
III. Title. IV. Series.
BF318.O84 1986 153.1'5'015113 85-19759
ISBN 0-262-15030-1
ISBN 0-262-65024-X (Paperback)
Contents

Series Foreword xi
Preface xiii
Acknowledgments xv

How to Use This Book xvii


Introduction

I IDENTIFICATION 5

1 Fundamentals of Learning Theory 7


1.1 Learning Paradigms 7
1.2 Background Material 8
1.2.1 Functions and Recursive Functions 8
1.2.2 Recursively Enumerable Sets 10
1.3 Identification: Basic Concepts 11
1.3.1 Languages 12
1.3.2 Hypotheses 13
1.3.3 Environments 13
1.3.4 Learners 14
1.4 Identification: Criterion of Success 17
1.4.1 Identifying Texts 17
1.4.2 Identifying Languages 18
1.4.3 Identifying Collections of Languages 19
1.5 Identification as a Limiting Process 22
1.5.1 Epistemology of Convergence 22
*1.5.2 Self-Monitoring Learning Functions 23

2 Central Theorems on Identification 25


2.1 Locking Sequences 25
2.2 Some Unidentifiable Collections of Languages 27
2.3 A Comprehensive, Identifiable Collection of Languages 29
2.4 Identifiable Collections Characterized 30
2.5 Identifiability of Single-Valued Languages 31

3 Learning Theory and Natural Language 34


3.1 Comparative Grammar 34
3.2 Learning Theory and Linguistic Development 36
3.2.1 How Many Grammars for the Young Child? 36
3.2.2 Are the Child's Conjectures a Function of Linguistic
Input? 37
3.2.3 What Is a Natural Language? 38
3.2.4 Idealization 40

II IDENTIFICATION GENERALIZED 43

4 Strategies 45
4.1 Strategies as Sets of Learning Functions 45
4.2 Computational Constraints 47
4.2.1 Computability 47
4.2.2 Time Bounds 50
4.2.3 On the Interest of Nonrecursive Learning Functions 53
4.3 Constraints on Potential Conjectures 53
4.3.1 Totality 53
4.3.2 Nontriviality 54
4.3.3 Consistency 56
4.3.4 Prudence and r.e. Boundedness 59
4.3.5 Accountability 61
*4.3.6 Simplicity 63
4.4 Constraints on the Information Available to a Learning
Function 66
4.4.1 Memory-Limitation 66
*4.4.2 Set-Driven Learning Functions 73
4.5 Constraints on the Relation between Conjectures 74
4.5.1 Conservatism 75
4.5.2 Gradualism 77
4.5.3 Induction by Enumeration 78
*4.5.4 Caution 79
*4.5.5 Decisiveness 80

4.6 Constraints on Convergence 82


4.6.1 Reliability 82
4.6.2 Confidence 86
4.6.3 Order Independence 88
*4.7 Local and Nonlocal Strategies 92

5 Environments 96
5.1 Order and Content in Natural Environments 96
5.2 Texts with Blanks 96
5.3 Evidential Relations 98
5.4 Texts with Imperfect Content 100
5.4.1 Noisy Text 100
5.4.2 Incomplete Text 103
*5.4.3 Imperfect Text 105
5.5 Constraints on Order 106
5.5.1 Ascending Text 106
5.5.2 Recursive Text 107
*5.5.3 Nonrecursive Text 109
*5.5.4 Fat Text 110
5.6 Informants 113
5.6.1 Informants and Characteristic Functions 113
5.6.2 Identification on Informant 115
*5.6.3 Memory-Limited Identification on Informant 116
5.7 A Note on "Reactive" Environments 118

6 Criteria of Learning 119


6.1 Convergence Generalized 119
6.1.1 Convergence Criteria 119
6.1.2 Identification Relativized 120
6.2 Finite Difference, Intensional Identification 123
6.2.1 FINT-Identification on Text 123
6.2.2 FINT-Identification on Imperfect Text 125
6.2.3 FINT-Identification in REsvt 126
6.3 Extensional Identification 129

6.3.1 EXT-Identification in RE 130


6.3.2 EXT-Identification in REsvt 132

*6.3.3 Finite Difference, Extensional Identification 133


*6.4 Bounded Extensional Identification 134
6.4.1 BEXT-Identification in RE 135
6.4.2 BEXT-Identification in REsvt 137
6.4.3 Bounded Finite Difference Extensional Identification 138
*6.5 Finite Difference Identification 139
6.5.1 FD-Identification in RE 139
6.5.2 FD-Identification in REsvt 140
6.5.3 Bounded Finite Difference Identification 141
*6.6 Simple Identification 143
6.7 Summary 144
6.8 Characteristic Index Identification 145
6.8.1 CI-Convergence 145
6.8.2 CI-Identification on Text and on Informant 147
*6.8.3 Variants of CI-Identification 149

7 Exact Learning 152


7.1 Paradigms of Exact Learning 152
*7.2 A Characterization of [𝒮, text, INT]∞ 155
7.3 Earlier Paradigms Considered in the Context of Exact
Learning 157
7.3.1 Strategies and Exact Learning 157
*7.3.2 Environments and Exact Learning 159
7.3.3 Convergence Criteria and Exact Learning 160
*7.4 Very Exact Learning 160
7.5 Exact Learning in Generalized Identification Paradigms 162

III OTHER PARADIGMS OF LEARNING 165

8 Efficient Learning 167


8.1 Text-Efficiency 167
8.2 Text-Efficient Identification 170

8.2.1 Text-Efficient Identification in the Context of% 170


8.2.2 Text-Efficient Identification and Rational Strategies 171
8.2.3 Text-Efficient Identification in the Context of%"' 172
8.2.4 Text-Efficiency and Induction by Enumeration 174
*8.2.5 Text-Efficiency and Simple Identification 175
8.3 Efficient Identification 176

9 Sufficient Input for Learning 178


9.1 Locking Sequences as Sufficient Input 178
9.2 Recursive Enumerability of LSφ 179
*9.3 Predictability in Other Learning Paradigms 180

*10 Topological Perspective on Learning 182


10.1 Identification and the Baire Space 182
10.2 Continuity of Learning Functions 183
10.3 Another Proof of Proposition 2.1A 184
10.4 Locking Texts 185
10.5 Measure One Learning 186
10.5.1 Measures on Classes of Texts 186
10.5.2 Measure One Identifiability 187
10.5.3 Uniform Measures 188
10.6 Probabilistic Learning 190

Bibliography 195
List of Symbols 198
Name Index 201
Subject Index 203
Series Foreword

This series in learning, development, and conceptual change will include


state-of-the-art reference works, seminal book-length monographs, and
texts on the development of concepts and mental structures. It will span
learning in all domains of knowledge, from syntax to geometry to the social
world, and will be concerned with all phases of development, from infancy
through adulthood.
The series intends to engage such fundamental questions as
The nature and limits of learning and maturation: the influence of the
environment, of initial structures, and of maturational changes in the
nervous system on human development; learnability theory; the problem
of induction; domain specific constraints on development.
The nature of conceptual change: conceptual organization and conceptual
change in child development, in the acquisition of expertise, and in the
history of science.

Lila Gleitman
Susan Carey
Elissa Newport
Elizabeth Spelke
Preface

It is a familiar observation that an organism's genotype may be conceived


as a function that maps potential environments into potential phenotypes.
Relativizing this conception to cognitive science allows human intellectual
endowment to be construed as a particular function mapping early experi-
ence into mature cognitive competence. The function might be called
"human nature relative to cognition." Learning theory is a mathematical
tool for the study of this function. This book attempts to acquaint the
reader with the use of this tool.
Less cryptically, learning theory is the study of systems that map evi-
dence into hypotheses. Of special interest are the circumstances under which
these hypotheses stabilize to an accurate representation of the environment
from which the evidence is drawn. Such stability and accuracy are conceived
as the hallmarks of learning. Within learning theory, the concepts "evidence,"
"stabilization," "accuracy," and so on, give way to precise definitions.
As developed in this book, learning theory is a collection of theorems
about certain kinds of number-theoretic functions. We have discussed the
application of such theorems to cognitive science and epistemology in a
variety of places (e.g., Osherson, Stob, and Weinstein, 1984, 1985, 1985a;
Osherson and Weinstein, 1982a, 1984, 1985). In contrast, the present work
centers on the mathematical development of learning theory rather than
on empirical hypotheses about human learning. As an aid to intuition,
however, we have attempted to concretize the formal developments in this
book through extended discussion of first language acquisition.
We have not tried to survey the immense field of machine inductive
inference. Rather, we have selected for presentation just those results that
seem to us to clarify questions relevant to human intellectual development.
Several otherwise fascinating topics in machine learning have thus been
left aside. Our choices no doubt reflect tacit theoretical commitments not
universally shared. An excellent review of many topics passed over here is
provided by Angluin and Smith (1982). Our own previously published
work in the technical development of learning theory (e.g., Osherson and
Weinstein, 1982, 1982a; Osherson, Stob, and Weinstein, 1982, 1982a, 1985)
is entirely integrated herein.
Our concern in the present work for the mathematical development of
learning theory has resulted in rigorous exposition. Less formal introduc-
tions to the central concepts and topics of learning theory are available in
Osherson and Weinstein (1984) and Osherson, Stob, and Weinstein (1984).
We would be pleased to receive from our readers comments and correc-
tions, as well as word of new results.
Acknowledgments

Our principal intellectual debts are to the works of E. Mark Gold and
Noam Chomsky. Gold (1967) established the formal framework within
which learning theory has developed. Chomsky's writings have revealed
the intimate connection between the projection problem and human
intelligence. In addition, we have been greatly influenced by the research of
Blum and Blum (1975), Angluin (1980), Case and Smith (1983), and Wexler
and Culicover (1980). Numerous conversations with Lila Gleitman and
with Steven Pinker have helped us to appreciate the bearing of learning
theory on empirical studies of first language acquisition, and conversely
the bearing of first language acquisition studies on learning theory. We
thank them for their patient explanations.
Preparation of the manuscript was facilitated by a grant to Osherson
from the Fyssen Foundation for 1983-84 and by National Science Founda-
tion Grants MCS 80-02937 and 82-00032 to Stob. We thank these agencies
for their support.
How to Use This Book

Mathematical prerequisites for this text include elementary set theory


and an intuitive understanding of the concept "computable function."
Lewis and Papadimitriou (1981) provide an excellent introduction to this
material. Acquaintance with the elementary portion of recursion theory is
also advisable. We recommend Machtey and Young (1978).
Starred material in the text is of more advanced character and may
be omitted without loss of continuity. We have relegated considerable
exposition to the exercises, which should be at least attempted.
Definitions, examples, lemmas, propositions, open questions, and exer-
cises are numbered independently within the section or subsection in which
they appear. Thus proposition 4.4.1B refers to the second proposition of
section 4.4.1; it appears before lemma 4.4.1A, the first lemma of the same
section. Symbol, subject, and name indexes may be found at the end of the
book.
We use standard set-theoretic notation and recursion-theoretic nota-
tion drawn from Rogers (1967) throughout. Note that ⊂ denotes proper
inclusion, whereas ⊆ denotes (possibly improper) inclusion.
Systems That Learn
Introduction

Let us play a game.


We have selected a set of numbers, and you must guess the set that we
have in mind. The set consists of every positive integer with a sole exception.
Thus the set might be {2, 3, 4, 5, ...} or {1, 3, 4, 5, ...} or {1, 2, 4, 5, ...}, etc.
We will give you an unlimited number of clues about the set, and you are
to guess after each clue. We will never tell you whether you are right.
First clue: The set contains the number 1.
Please guess the set we have in mind. Would you like to guess the set
{2, 3, 4, 5, ...}? (That would be unwise.)
Second clue: The set contains the number 3.
Please make another guess. How about {1, 2, 3, 4, 5, 6, 8, 9, 10, ...}, or does
that seem arbitrary to you?
Third clue: The set contains the number 4.
Go ahead and guess.
Fourth clue: The set contains the number 2.
Does the fourth clue surprise you? Guess again.
Fifth clue: The set contains the number 6.
Guess.
Sixth clue: The set contains the number 7.
Guess.
Seventh clue: The set contains the number 8.
Guess.
We interrupt the game at this point because we would like to ask you
some questions about it.
First question: Are you confident about your seventh guess? Give an
example of an eighth clue that would lead you to repeat your last guess.
Give an example of an eighth clue that would lead you to change your
guess.
Second question: Let us say that a "guessing rule" is a list of instructions
for converting the clues received up to a given point into a guess about the
set we have in mind. Were your guesses chosen according to some guessing
rule, and if so, which one?
Third question: What should count as winning the game? Consider the
following criterion: You win just in case at least one of your guesses is right.
This criterion makes winning the game too easy. Say why.
Fourth question: We advocate the following criterion: You win just in
case you eventually make the right guess and subsequently never change
your mind regardless of the new clues you receive. In this case let us say

that you "win in the limit." Is it possible to win the game in the limit even
though you make one hundred wrong guesses? Is there any number of
wrong guesses that is logically incompatible with winning the game in the
limit?
Fifth question: Suppose that all the clues we give you are of the form:
The set contains the number n. Suppose furthermore that for every positive
integer i, we eventually give you a clue of this form if and only if i is in
fact contained in the set we have in mind. (So for every number i in our set,
you are eventually told that the set contains i; also you receive no false
information about the set.) Do not suppose anything about the order in
which you will get all these clues. We will order them any way we please.
(Recall how we surprised you with the fourth clue.) Now let us call a
guessing rule "winning" just in case the following is true. If you use the rule
to choose your guesses, then no matter which of the sets we have in mind,
you are guaranteed to win the game in the limit. Specify a winning guessing
rule for our game.
Sixth question: We make the game harder. This time we are allowed to
select any of the sets that are legal in the original game, but we may also
select the set {1, 2, 3, 4, 5, 6, ...} of all positive integers. The rules about clues
are the same as given in question 5. Play this new game with a friend, and
then think about the following question. Is there a winning guessing rule
for the new game?
Seventh question: Let us make the last game easier. The choice of sets is
the same as in the last game, but we now agree to order our clues in a
certain way. For all positive integers i and j, if both i and j are included in
the set we have in mind, and if i is less than j, then you will receive the clue
"The set contains i" before you receive the clue "The set contains j." Can
you specify a winning guessing rule for this version of the game?
Eighth question: Here is another variant. We select a set from the original
collection (thus the set {1, 2, 3, 4, 5, ...} of all positive integers is no longer
allowed). Clues can be given in any order we please. You get only one guess.
You may wait to see as many clues as you like, but your first guess is
definitive. Play this game with a friend. Then show that no matter what
rule you use to make your guess, you are not guaranteed to be right. Think
about what happens if you are allowed two guesses in the game.
The games we have been playing resemble the process of scientific
discovery. Nature plays our role, selecting a certain pattern that is imposed
on the world. The scientist plays your role, examining an endless series

of clues about this pattern. In response to the clues, the scientist emits
guesses. Nature never says whether the guesses are correct. Scientific
success consists of eventually offering a correct guess and never deviating
from it thereafter. Language acquisition by children can also be construed
in terms of our game. The child's parents have a certain language in mind
(the one they speak). They provide clues in the form of sentences. The child
converts these clues into guesses about the parents' language. Acquisition
is successful just in case the child eventually sticks with a correct guess.
The similarity of our game to these and other settings makes it worthy
of more careful study. We would like to know which versions of the
game are winnable and by what kinds of guessing rules. Research on these
questions was begun in the 1960s by Putnam (1975), Solomonoff (1964), and
Gold (1967). These initial investigations have given rise to a large literature
in computer science, linguistics, philosophy, and psychology. This body
of theoretical and applied results is generally known as learning theory
because many kinds of learning (e.g., language acquisition) can be con-
strued as successful performance in one of our games.
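The original game can be made concrete with a short simulation. Below is one natural guessing rule, sketched in Python (the function names are ours): conjecture the set of all positive integers minus the least number not yet mentioned. Readers who want to answer the fifth question themselves should do so before studying the sketch.

```python
def guess(clues):
    """Conjecture the set of all positive integers minus the least
    positive integer not yet mentioned among the clues."""
    seen = set(clues)
    n = 1
    while n in seen:
        n += 1
    return n  # we represent each guessed set by its single missing element

def play(order):
    """Feed the clues one at a time and record the guess after each."""
    clues, guesses = [], []
    for clue in order:
        clues.append(clue)
        guesses.append(guess(clues))
    return guesses
```

If the set we have in mind omits 3 and the clues arrive in the scrambled order 1, 5, 2, 4, 6, 7, the rule's guesses are 2, 2, 3, 3, 3, 3: once every number below the true exception has appeared, the conjecture locks onto the right answer and never changes, whatever clues follow.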
In this book we attempt to develop learning theory in systematic fashion,
presupposing only basic notions from set theory and the theory of compu-
tation. Throughout our exposition, definitions and theorems are illustrated
by consideration of language acquisition. However, no serious application
of the theory is described.
The book is divided into three parts. Part I advances a fundamental
model of learning due essentially to Gold (1967). Basic notation, termi-
nology, and theorems are there presented, to be relied on in all subsequent
discussion. In part II these initial definitions are generalized and varied in
dozens of ways, giving rise to a multitude of learning models and theorems.
We attempt to impose some order on these results through a system of
notation and classification. Part III explores diverse issues in learning
theory that do not fit neatly into the classification offered in part II.
I IDENTIFICATION

We begin our presentation of learning theory by introducing its most basic


ideas. The concept of a "learning paradigm" is the first topic of chapter 1;
a particularly important learning paradigm is then presented in detail.
Chapter 2 provides basic theorems about the learning paradigm described
in chapter 1. The final chapter of this part is devoted to the interpretative
problems that arise in applying learning theory to linguistic development
in children.
1 Fundamentals of Learning Theory

1.1 Learning Paradigms

Learning typically involves


1. a learner,
2. a thing to be learned,
3. an environment in which the thing to be learned is exhibited to the
learner,
4. the hypotheses that occur to the learner about the thing to be learned on
the basis of the environment.

Learning is said to be successful in a given environment if the learner's


hypotheses about the thing to be learned eventually become stable and
accurate. To fix our subject matter, let us agree to call something "learning"
just in case it can be described in roughly these terms.
Language acquisition by children is an example of learning in the in-
tended sense. Children are the learners; a natural language is the thing to be
learned; the corpus of sentences available to the child is the relevant
environment; grammars serve as hypotheses. Language acquisition is com-
plete when the child's shifting hypotheses about the ambient language
stabilize to an accurate grammar.
By a (learning) paradigm we mean any precise rendition of the basic
concepts of learning just introduced. Thus a paradigm provides definitions
corresponding to 1 through 4 and advances a specific criterion of successful
learning. The latter requires, at minimum, definition of the notions of
stability and accuracy as used earlier. Alternative learning paradigms thus
offer alternative conceptions of learners, environments, hypotheses, and so
forth. Learning theory is the study of learning paradigms.
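The four components of a paradigm can be given rough types. The following Python sketch is entirely our own schematization (the book's formal versions come in section 1.3): a learner maps finite evidence sequences to hypotheses, and we can record the hypothesis it holds after each finite chunk of an environment.

```python
from typing import Callable, Iterator, Sequence

Evidence = int                    # a datum exhibited by the environment
Hypothesis = int                  # e.g., a grammar, coded as a number
Environment = Iterator[Evidence]  # an endless stream of evidence
Learner = Callable[[Sequence[Evidence]], Hypothesis]

def guesses(learner: Learner, env: Environment, steps: int) -> list[Hypothesis]:
    """The learner's hypothesis after each finite initial segment of env."""
    seen: list[Evidence] = []
    out: list[Hypothesis] = []
    for _, e in zip(range(steps), env):
        seen.append(e)
        out.append(learner(seen))
    return out
```

Success, on this picture, is a property of the whole infinite sequence of hypotheses (eventual stability and accuracy), which no finite prefix can certify by itself.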
In 1967 E. M. Gold introduced a paradigm that has proved to be
fundamental to learning theory. This paradigm is called identification. All
the other paradigms to be discussed in this book may be conceived as
generalizations of identification. The present chapter defines the identification
paradigm, thereby laying the foundation for all subsequent developments.
We proceed as follows. Section 1.2 provides essential background
concepts and terminology. Section 1.3 is devoted to the construal of items 1
through 4 within the identification paradigm. The relevant criterion of
successful learning is given in section 1.4. Section 1.5 discusses an essential
feature of identification and related paradigms.

1.2 Background Material

1.2.1 Functions and Recursive Functions

We let N be the set {0, 1, 2, ...} of natural numbers. The set of all functions
(partial or total) from N to N is denoted ℱ. Following standard mathematical
practice, members of ℱ will be construed as sets of ordered pairs of
numbers satisfying the "single-valuedness" condition. Single-valuedness
specifies that no two pairs of numbers with the same first coordinate may
occur in the same function. There are nondenumerably many functions in
ℱ. We let the symbols φ, ψ, θ, φ′, ..., represent possibly partial functions in
ℱ. The symbols f, g, h, f′, ..., are reserved for total functions in ℱ. If φ ∈ ℱ
is defined on x ∈ N, we sometimes write φ(x)↓. Otherwise, we write φ(x)↑.
It will often be useful to construe individual numbers as "tuples" of
numbers. This is achieved as follows. For each n ∈ N we select some computable
isomorphism between Nⁿ (i.e., the n-fold Cartesian product on N) and
N. For x₁, x₂, ..., xₙ ∈ N, ⟨x₁, x₂, ..., xₙ⟩ denotes the image under this
function of the (ordered) n-tuple (x₁, x₂, ..., xₙ). In using this notation, the
reader should keep the following facts in mind (illustrating with n = 2):

1. For all x, y ∈ N, ⟨x, y⟩ is a number, but (x, y) is an ordered pair of
numbers.
2. There is an effective procedure for finding ⟨x, y⟩ on the basis of x, y ∈ N.
3. There is an effective procedure for finding both x and y on the basis of
⟨x, y⟩ ∈ N.

For A, B ⊆ N, we let A × B = {⟨x, y⟩ | x ∈ A and y ∈ B}. Note that A × B
is a set of numbers, not a set of ordered pairs as in the usual definition of
A × B, the Cartesian product of A and B. We also introduce "projection
functions" π₁ and π₂ with the property that for all x ∈ N, ⟨π₁(x), π₂(x)⟩ = x.
Thus π₁ "picks out" the first coordinate of the pair coded by x; π₂ picks
out the second coordinate.
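The text leaves the coding unspecified; the classical Cantor pairing function is one computable isomorphism of the required kind. A sketch in Python (the names pair and proj are ours, not the book's):

```python
import math

def pair(x: int, y: int) -> int:
    """Cantor coding <x, y>: a computable bijection from N x N onto N."""
    return (x + y) * (x + y + 1) // 2 + y

def proj(z: int) -> tuple[int, int]:
    """Effectively recover (pi1(z), pi2(z)) from the single number z."""
    w = (math.isqrt(8 * z + 1) - 1) // 2   # largest w with w*(w+1)//2 <= z
    t = w * (w + 1) // 2
    y = z - t
    return w - y, y
```

Both directions are effective, as facts 2 and 3 require, and ⟨π₁(x), π₂(x)⟩ = x holds for every x.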
The set of recursive functions (partial or total) from N to N is denoted
ℱrec. ℱrec is a denumerable subset of ℱ. The members of ℱrec may be
thought of as those functions that are calculable by machine. "Machines"
can be understood as Turing machines, LISP programs, or any other
canonical means of computation. For concreteness we shall occasionally
invoke Turing machines to explain various definitions and results; however,
any other programming system would serve equally well.
We wish now to assign code numbers to the partial recursive functions.

This is achieved by listing the members of ℱrec and using ordinal positions
in the list as code numbers. To be useful, however, this listing of ℱrec must
meet certain conditions, specifically:

DEFINITION 1.2.1A An acceptable indexing of ℱrec is an enumeration φ₀,
φ₁, φ₂, ..., of (all of) ℱrec that meets the following conditions:

i. For some ψ ∈ ℱrec, ψ(⟨i, x⟩) = φᵢ(x) for all i, x ∈ N.

ii. For some total s ∈ ℱrec,

φ_{s(⟨i, x₁, ..., xₘ⟩)}(⟨y₁, ..., yₙ⟩) = φᵢ(⟨x₁, ..., xₘ, y₁, ..., yₙ⟩)

for all i, m, x₁, ..., xₘ, y₁, ..., yₙ ∈ N.


Part ii of the definition allows us to parameterize the first m arguments with
respect to the ith partial recursive function.
Relative to a given acceptable indexing φ₀, φ₁, ..., φⱼ, ..., φᵢ is referred to
as the partial recursive function of index i. Intuitively i may be thought of as
the code for a program that computes φᵢ. Indeed, one acceptable indexing
of ℱrec results from enumerating all Turing machines (or other canonical
computing agents) in lexicographical order and taking φᵢ to be the function
computed by the ith Turing machine in the enumeration. This indexing
orders Turing machines by their size (measured in number of symbols),
resolving ties by recourse to some arbitrary alphabetization of Turing
machine notation. We assume that Turing machines are specified by a finite
string of symbols drawn from a fixed, finite alphabet. The reader may
safely adopt this size interpretation of indexes, since none of the results in
this book depends on which acceptable indexing of ℱrec is selected. This
invariance is a consequence of the following result.

LEMMA 1.2.1A (Rogers, 1958) Let φ₀, φ₁, ..., and ψ₀, ψ₁, ..., be any two
acceptable indexings of ℱrec. Then there is a one-one, onto, total f ∈ ℱrec
such that φₓ = ψ_f(x) for all x ∈ N.
Proof See Machtey and Young (1978, theorem 3.4.7). □

Thus any two acceptable indexings of the partial recursive functions are
identical up to a recursive isomorphism. We now fix on some specific
acceptable indexing of ℱrec (of the reader's choice). Indexes are henceforth
interpreted accordingly.
The following simple result will be useful in subsequent developments.

LEMMA 1.2.1B For all i ∈ N, the set {j | φⱼ = φᵢ} is infinite.



Proof See Machtey and Young (1978, proposition 3.4.5). □

Lemma 1.2.1B reflects the fact that any computable function can be
programmed in infinitely many ways (e.g., by inserting redundant
instructions into a given program). If φᵢ = φⱼ, we often say that i and j are
equivalent.
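The padding idea behind lemma 1.2.1B can be mimicked in any programming system. A toy illustration in Python (entirely our own; a real indexing enumerates Turing machines, not source strings):

```python
def program(i: int) -> str:
    """A toy 'program': the source of a function computing x -> x + i."""
    return f"lambda x: x + {i}"

def padded(source: str, k: int) -> str:
    """Insert k redundant instructions: a syntactically distinct program
    that computes exactly the same function."""
    return source + " + 0" * k

# Infinitely many distinct programs, one function: pad as much as you like.
original = program(2)
variants = [padded(original, k) for k in range(5)]
```

Each variant is a different string, hence would receive a different index in a lexicographic enumeration, yet all compute the same function: eval(padded(program(2), 3))(5) and eval(program(2))(5) both yield 7.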

1.2.2 Recursively Enumerable Sets

For all i ∈ N the domain of φᵢ is denoted: Wᵢ. As a consequence of lemma
1.2.1B, for all i₀ ∈ N the set {j | Wⱼ = Wᵢ₀} is infinite. A set S ⊆ N is called
recursively enumerable (or r.e.) just in case there is i ∈ N such that S = Wᵢ; in
this case i is said to be an index for S (there are infinitely many indexes for
each r.e. set). Intuitively a set S is r.e. just in case there is a mechanical
procedure P (called a positive test) such that for all x ∈ N, P eventually halts
on input x if and only if x ∈ S; indeed, the program coded by i serves this
purpose for Wᵢ. Construed another way, the r.e. sets are those that can be
"generated" mechanically, such as by a grammar.
The class of all r.e. sets is denoted: RE. Thus RE = {Wᵢ | i ∈ N}.
Three special kinds of r.e. sets will often be of interest. These are presented
in definitions 1.2.2A through 1.2.2E. S ∈ RE is called finite just in case S
has only finitely many members; it is called infinite otherwise.

DEFINITION 1.2.2A The collection of all finite sets is denoted: REfin.

S ∈ RE is called recursive just in case its complement, N − S, is also a
member of RE.

DEFINITION 1.2.2B The collection of all recursive sets is denoted: RErec.


It can be shown that REfin ⊂ RErec ⊂ RE. RErec may also be characterized
as follows.

DEFINITION 1.2.2C f ∈ ℱ is said to be the characteristic function for S ⊆ N
just in case for all x ∈ N,

f(x) = 0, if x ∈ S,
     = 1, if x ∉ S.

It is not difficult to prove that S ⊆ N is recursive if and only if its characteristic
function is recursive. Intuitively S ∈ RErec just in case there is a mechanical
procedure (called a test) that eventually responds "yes" to any input
drawn from S and eventually responds "no" to any other input (thus a test,
unlike a positive test, is required to respond to every input).

Turning to the third special kind of r.e. set, recall from section 1.2.1 that
each n ∈ N represents a unique ordered pair of numbers, namely the pair (i, j)
such that ⟨i, j⟩ = n. Accordingly:

DEFINITION 1.2.2D

i. S ⊆ N is said to represent the set {(x, y) | ⟨x, y⟩ ∈ S} of ordered pairs.
ii. S ⊆ N is called single valued just in case S represents a function.
iii. A single-valued set is said to be total just in case the function it represents
is total.

Equivalently, S is single valued just in case for all x, y, z ∈ N, if ⟨x, y⟩ ∈ S and
⟨x, z⟩ ∈ S, then y = z. Plainly a single-valued set S represents the function φ
defined by the condition that for all x, y ∈ N, φ(x) = y if and only if
⟨x, y⟩ ∈ S. A single-valued set S is total just in case for all x ∈ N there is y ∈ N
such that ⟨x, y⟩ ∈ S.
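For finite sets, the single-valuedness condition of definition 1.2.2D is directly checkable once a pairing function is fixed. A sketch using the Cantor coding (all names are ours; the book fixes no particular coding):

```python
import math

def encode(x: int, y: int) -> int:
    """Cantor code <x, y> (one standard choice of pairing function)."""
    return (x + y) * (x + y + 1) // 2 + y

def decode(z: int) -> tuple[int, int]:
    """Recover the ordered pair (x, y) represented by the code z."""
    w = (math.isqrt(8 * z + 1) - 1) // 2
    y = z - w * (w + 1) // 2
    return w - y, y

def is_single_valued(S) -> bool:
    """True iff the finite set S of codes represents a function: no first
    coordinate is paired with two distinct second coordinates."""
    values = {}
    for z in S:
        x, y = decode(z)
        if x in values and values[x] != y:
            return False
        values[x] = y
    return True
```

For instance, {⟨0, 1⟩, ⟨1, 2⟩} is single valued, while {⟨0, 1⟩, ⟨0, 2⟩} is not.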

DEFINITION 1.2.2E The collection of all single-valued, total, r.e. sets is
denoted: REsvt.

Exercises

1.2.2A Let S ⊆ N be single valued, and suppose that S represents φ ∈ ℱ. Show that
a. φ ∈ ℱrec if and only if S ∈ RE.
b. φ is total recursive if and only if S is total and r.e.
c. if S ∈ RE and S is total, then S is recursive.
1.2.2B
a. Prove: Let f ∈ ℱ be the characteristic function for S ⊆ N. Then S ∈ RErec if and
only if some T ∈ REsvt represents f.
b. Show that there is a total recursive function f such that for all i ∈ N, if Wᵢ ∈ REsvt,
then φ_f(i) is the characteristic function for Wᵢ.
c. Prove: REsvt ⊂ RErec.

1.3 Identification: Basic Concepts

We now consider items 1 through 4 of section 1.1 as they are construed


within the paradigm of identification. We begin with 2.

1.3.1 Languages

Identification is intended as a model of language acquisition by children,


so languages are the things to be learned. In the model languages are
conceived in a manner familiar from the theory of formal grammar (see
Hopcroft and Ullman, 1979, ch. 1), where a sentence is taken to be a finite
string of symbols drawn from some fixed, finite alphabet. A language is then
construed as a subset of the set of all possible sentences. This definition
embraces rich conceptions of sentences, for which derivational histories,
meanings, and even bits of context are parts of sentences. Since finite
derivations of almost any nature can be collapsed into strings of symbols
drawn from a suitably extended vocabulary, it is sufficiently general to
construe a language as the set of such strings. Simplifying matters even
further, it is useful to conceive of the strings of a language (collapsed
derivations or otherwise) as single natural numbers; this is appropriate in
light of simple coding techniques for mapping strings univocally into
natural numbers (for discussion, see Rogers, 1967, sec. 1.10). In this way a
language is conceived as a set of natural numbers.
But not just any subset of N counts as a language within the identification
paradigm. Since natural languages are considered to have grammars, and
since grammars are intertranslatable with Turing machines, we restrict
attention to the recursively enumerable subsets of N, that is, to RE.
Henceforth in this book the term "language" is reserved for the r.e. sets. We
use the symbols L, L', ... , to denote languages.
In sum, within the identification paradigm what is learned are languages,
where languages are taken to be r.e. subsets of N (equivalently, members of
RE).
It is interesting in this context to consider the significance of single-
valued languages. Some linguistic theories envision the relation between
underlying and superficial representations of a sentence as a species of
functional dependence, different natural languages implementing different
functions of this kind (e.g., Wexler and Culicover, 1980). It is assumed
moreover that contextual clues give the child access to the underlying
representation of a sentence as well as to its superficial structure. On this
view a sentence is understood as an ordered pair of representations, under-
lying and superficial, and competence for a language consists in knowing
the function that maps one representation onto the other (variants of this
basic idea are possible). All of this suggests that empirically faithful models

of linguistic development construe natural languages as certain kinds of
single-valued sets.
Single-valued languages are also the appropriate means of representing
various learning situations distinct from language acquisition. For
example, a scientist faced with an unknown functional dependency can be
conceived as attempting to master a single-valued language selected arbit-
rarily from a set of theoretical possibilities.
For these reasons we shall often devote special attention to single-valued
languages, treating them separately from arbitrary r.e. sets.

1.3.2 Hypotheses

With languages construed as r.e. subsets of N, it is natural to identify the
learner's conjectures (item 4) with associated Turing machines. In turn, we
may appeal to our acceptable indexing of the partial recursive functions
(section 1.2.1) and identify Turing machines with indexes for r.e. sets (i.e.,
with N itself). Thus within the identification paradigm the number j is the
hypothesis associated with the language W_j (and with the language W_i, if
W_i = W_j).
1.3.3 Environments

We turn now to item 3.

DEFINITION 1.3.3A A text is an infinite sequence t_0, t_1, ..., of natural
numbers. The set of numbers appearing in a text t is denoted: rng(t). A text is
said to be for a set S ⊆ N just in case rng(t) = S. The set of all possible texts
is denoted: 𝒯.

Example 1.3.3A

t = 0, 0, 2, 2, 4, 4, 6, 6, ... is a text. Since rng(t) = {0, 2, 4, 6, ...}, t is a text for the
language consisting of the even numbers.

Let t ∈ 𝒯 be for L ∈ RE. Then every member of L appears somewhere in t
(repetitions allowed), and no nonmember of L appears in t. There are non-
denumerably many texts for a language with at least two elements. There
is only one text for a singleton language (i.e., a language consisting of only
one element). There are no texts for the empty language.

Within the identification paradigm an environment for a language L is
construed as a text for L. We let the symbols r, s, t, r', ..., represent texts.
From the point of view of language acquisition, texts may be understood
as follows. We imagine that the sentences of a language are presented to the
child in an arbitrary order, repetitions allowed, with no ungrammatical
intrusions. Negative information is withheld-that is, ungrammatical
strings, so marked, are not presented. Each sentence of the language eventu-
ally appears in the available corpus, but no restriction is placed on the order
of their arrival. Sentences are presented forever; no sentence ends the series.
The foregoing picture of the child's linguistic environment is motivated
by recent studies of language acquisition. Brown and Hanlon (1970), for
example, give reason to believe that negative information is not systemati-
cally available to the language learner. Studies by Newport, Gleitman, and
Gleitman (1977) underline the relative insensitivity of the acquisition pro-
cess to variations in the order in which language is addressed to children.
And Lenneberg (1967) describes clinical cases revealing that a child's own
linguistic productions are not essential to his or her mastery of an incoming
language.
The following asymmetrical property of texts is worth pointing out. Let
t ∈ 𝒯 and n ∈ N be given. If n ∈ rng(t), then examination of some initial
segment of t suffices to verify the presence of n in t once and for all. On the
other hand, no finite examination of t can definitively verify the absence of
n from t (since n may turn up in t after the finite examination).
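This asymmetry is easy to see in a small computation. The sketch below is
our own illustration (the function names are invented for the example); it
models a text as a function from N to N and checks membership claims
against finite initial segments only.

```python
# Membership in rng(t) is verifiable from a finite initial segment;
# absence from rng(t) is not. We model a text as a function from N to N.

def t(n):
    return (n // 2) * 2        # the text 0, 0, 2, 2, 4, 4, ...

def occurs_by(n, bound):
    """Does n occur among the first `bound` members of t?"""
    return n in (t(i) for i in range(bound))

# The initial segment of length 6 already verifies that 4 appears in t:
assert occurs_by(4, 6)
# But failing to find 3 in any finite segment proves nothing about rng(t):
assert not occurs_by(3, 10_000)
```

However long the scanned segment, a negative answer remains defeasible;
only the positive answers are conclusive.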

1.3.4 Learners

We turn, finally, to item 1. Consider a child learning a natural language. At


any given moment the child has been exposed to only finitely many sen-
tences. Yet he or she is typically willing to conjecture grammars for infinite
languages. Within the identification paradigm the disposition to convert
finite evidence into hypotheses about potentially infinite languages is the
essential feature of a learner. More generally, the relation between finite
evidential states and infinite languages is at the heart of inductive inference
and learning theory.
Formally, let t ∈ 𝒯 and n ∈ N be given. Then the nth member of t is
denoted: t_n. The sequence determined by the first n members of t is denoted:
t̄_n. The sequence t̄_n is called the finite sequence of length n in t. Note that for
any text t, t̄_0 is the unique sequence of length zero, namely the empty
sequence, which we identify with the empty set ∅. The set of all finite

sequences of any length in any text is denoted: SEQ. SEQ may be thought of
as the set of all possible evidential states (e.g., the set of all possible finite
corpora of sentences that could be available to a child). We let the symbols
σ, τ, χ, σ', ..., represent finite sequences.
Now let σ ∈ SEQ. The length of σ is denoted: lh(σ). The (unordered) set of
sentences that constitute σ is denoted: rng(σ). We do not distinguish num-
bers from finite sequences of length 1. As a consequence of the foregoing
conventions, note that σ ∈ SEQ is in t ∈ 𝒯 if and only if σ = t̄_lh(σ).

Example 1.3.4A

a. Let t = 0, 0, 2, 2, 4, 4, .... Then t_0 = t_1 = 0, t_7 = 6, t̄_2 = (0, 0) (but t_2 = 2), and
t̄_4 = (0, 0, 2, 2) (but t_4 = 4). Moreover lh(t̄_7) = 7, lh(t̄_4) = 4, lh(t_4) = 1, rng(t̄_4) =
{0, 2}, and t_0 = t̄_1 = 0. τ = (2, 2, 4) is not in t because t does not begin with τ.
b. Let σ = (5, 2, 2, 6, 8). Then lh(σ) = 5, and rng(σ) = {5, 2, 6, 8}.
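The conventions of this subsection can be checked mechanically. The
following Python sketch (an illustration of ours, not part of the text)
encodes the text of part a, writing initial_segment(t, n) for t̄_n, and
reproduces its computations.

```python
# The text t = 0, 0, 2, 2, 4, 4, ... of example 1.3.4A, with the nth
# member t_n rendered as t(n) and the initial segment t-bar_n as
# initial_segment(t, n).

def t(n):
    return (n // 2) * 2

def initial_segment(t, n):
    return tuple(t(i) for i in range(n))

assert t(0) == t(1) == 0 and t(7) == 6
assert initial_segment(t, 2) == (0, 0)
assert initial_segment(t, 4) == (0, 0, 2, 2)
assert len(initial_segment(t, 7)) == 7           # lh(t-bar_7) = 7
assert set(initial_segment(t, 4)) == {0, 2}      # rng(t-bar_4) = {0, 2}

sigma = (5, 2, 2, 6, 8)                          # part b
assert len(sigma) == 5 and set(sigma) == {5, 2, 6, 8}
```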

With evidential states now construed as finite sequences and conjectures


construed as natural numbers (section 1.3.2), learners may be conceived as
functions from one to the other, that is, as functions from SEQ to N. Put
differently, learning may be viewed as the process of converting evidence
into theories (successful learning has yet to be defined). However, rather
than taking learners to be functions from SEQ to N, it will facilitate later
developments to code evidential states as natural numbers. Thus we choose
some fixed, computable isomorphism between SEQ and N and interpret, as
needed, the number n as some unique member of SEQ. None of our results
depend on which computable isomorphism between SEQ and N is chosen
for this purpose. Officially then a learning function is a member of!F (i.e., a
function from N to N) where the domain of the function is to be thought of
as the set of all possible evidential states and the range as the set of all
possible hypotheses. A learning function may be partial or total, recursive
or nonrecursive. A "learner" is any system that embodies a learning func-
tion. Learning theory thus applies to learners indirectly via the learning
functions they implement.
To talk about learning functions, we need a notation for the mapping
that codes SEQ as N. It will reduce clutter to allow finite sequences to
symbolize their own code numbers. Thus "σ" represents ambiguously a
finite sequence of numbers as well as the single number coding it. No harm
will come of this equivocation. According to our notational conventions,
for φ ∈ ℱ, t ∈ 𝒯, and n ∈ N, the term "φ(t̄_n)" denotes the result of applying
φ to the code number of the finite sequence constituting the first n members
of t.
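The text fixes no particular computable isomorphism between SEQ and N,
and none of its results depend on the choice. For concreteness, here is one
standard choice (our selection, not the book's), built by iterating the Cantor
pairing function:

```python
# Coding SEQ as N: the empty sequence is coded by 0, and a sequence
# sigma followed by x is coded by pair(code(sigma), x) + 1. Since pair
# is a bijection between N x N and N, the correspondence between SEQ
# and N is computable in both directions, one-to-one, and onto.

def pair(x, y):
    return (x + y) * (x + y + 1) // 2 + y

def unpair(n):
    w = 0
    while (w + 1) * (w + 2) // 2 <= n:
        w += 1                       # largest w with w(w+1)/2 <= n
    y = n - w * (w + 1) // 2
    return w - y, y

def encode(sigma):
    n = 0
    for x in sigma:
        n = pair(n, x) + 1
    return n

def decode(n):
    sigma = []
    while n > 0:
        n, x = unpair(n - 1)
        sigma.append(x)
    return tuple(reversed(sigma))

assert encode(()) == 0
assert decode(encode((5, 2, 2, 6, 8))) == (5, 2, 2, 6, 8)
```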
Let φ ∈ ℱ and σ ∈ SEQ be given. In conformity with the convention
governing ↓ and ↑ (section 1.2.1), if φ is defined on σ, we write: φ(σ)↓.
Otherwise, we write: φ(σ)↑. Intuitively φ(σ)↑ signifies that the learner imple-
menting φ advances no hypothesis when faced with the evidence σ. This
omission might result from the complexity of σ (relative to the learner's
cognitive capacity), or it may arise for other reasons. If φ(σ)↓, we often say
that φ conjectures W_φ(σ) on σ.

Example 1.3.4B

We provide some sample learning functions f, g, h, φ, ψ ∈ ℱ by describing their
behavior on SEQ. For all σ ∈ SEQ:
a. f(σ) = the least index for the language rng(σ). Informally, f behaves as if its cur-
rent evidential state includes all the sentences it will ever see. Consequently it con-
jectures a grammar for the finite language made up of the elements received to date.
Being parsimonious, f's conjectured grammars are as small as possible (relative to
the acceptable indexing of section 1.2.1). We shall have occasion to refer to this
function many times in later chapters.
b. g(σ) = 5. The function g has fixed ideas about the language it is observing.
c. h(σ) = the smallest i such that rng(σ) ⊆ W_i. Here h conjectures the first language
(relative to our acceptable indexing) that accounts for all the data it has received.
d. Let E be the set of even numbers.

φ(σ) = { the least index for rng(σ),  if rng(σ) ⊆ E,
       { ↑,                           otherwise.

φ is partial.
e. ψ(σ)↑ for all σ ∈ SEQ. Although ψ is the empty partial function, it counts as a
learning function.
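Several of these learners can be sketched concretely. Since the "least index"
of a language in an acceptable indexing is not something we can exhibit, the
sketch below (ours) lets a conjecture stand for the conjectured language
itself and models an undefined value by None; h is omitted, since it
quantifies over the entire indexing W_0, W_1, ....

```python
# Stand-ins for the learners of example 1.3.4B. A conjecture is
# represented by the conjectured (finite) language rather than by its
# least index; None models divergence.

def f(sigma):
    """Conjecture the finite language of all data seen so far (part a)."""
    return frozenset(sigma)

def g(sigma):
    """Fixed ideas: always conjecture the hypothesis 5 (part b)."""
    return 5

def phi(sigma):
    """Defined only while every datum is even (part d)."""
    if all(x % 2 == 0 for x in sigma):
        return frozenset(sigma)
    return None

def psi(sigma):
    """The empty learning function (part e): undefined everywhere."""
    return None

assert f((2, 4, 4, 6)) == frozenset({2, 4, 6})
assert g((0, 1, 2)) == 5
assert phi((2, 4)) == frozenset({2, 4}) and phi((2, 3)) is None
assert psi(()) is None
```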

Exercises

1.3.4A Let L be a nonempty language,and let t be a text for L. Let I be the learning
function of part a of example 1.3.48.

a. Show that LeREfi n if and only if for all but finitely many ne N, rng(tn) =
mg(t.+,).
b. Show that LE RE". if and only if l(l.) = l(t.+1) for all but finitely many n E N.
c.Suppose that there is n E N such that for infinitely many m E N,f(t.) ~ I(t.). Show
that H-!(i"l = L.

*1.3.4B A text t is called ascending if t_n ≤ t_{n+1} for all n ∈ N; t is called strictly
ascending if t_n < t_{n+1} for all n ∈ N.
a. Let L be a finite language of at least two members. How many ascending texts are
there for L?
b. Let L be an infinite language. How many strictly ascending texts are there for L?
These kinds of texts are treated again in section 5.5.1.

1.4 Identification: Criterion of Success

Languages, hypotheses, environments, and learners are the dramatis per-
sonae of learning theory. In section 1.3 we presented their construal within
the identification paradigm. We now turn to the associated criterion of
successful learning. Within the current paradigm successful learning is said
to result in "identification"; its definition proceeds in stages.

1.4.1 Identifying Texts

DEFINITION 1.4.1A Let φ ∈ ℱ and t ∈ 𝒯 be given.

i. φ is said to be defined on t just in case φ(t̄_n)↓ for all n ∈ N.
ii. Let i ∈ N. φ is said to converge on t to i just in case (a) φ is defined on t and
(b) for all but finitely many n ∈ N, φ(t̄_n) = i.
iii. φ is said to identify t just in case there is i ∈ N such that (a) φ converges on
t to i and (b) rng(t) = W_i.

Clause ii of the definition may also be put this way: φ converges on t to i just
in case φ is defined on t, and there is n ∈ N such that φ(t̄_m) = i for all m > n.
The intuition behind definition 1.4.1A is as follows. A text t is fed to a
learner ℓ one number at a time. With each new input ℓ is faced with a new
finite sequence of numbers. ℓ is defined on t if ℓ offers hypotheses on all of
these finite sequences. If ℓ is undefined somewhere in t, then ℓ is "stuck" at
that point, lost in endless thought about the current evidence, unable to
accept more data. ℓ converges on t to an index i just in case ℓ does not get
stuck in t, and after some finite number of inputs ℓ conjectures i thereafter.
To identify t, ℓ must converge to an index for rng(t).
Let φ ∈ ℱ identify t ∈ 𝒯. Note that definition 1.4.1A places no finite
bound on the number of times that φ "changes its mind" on t. In other
words, the set {n ∈ N | φ(t̄_n) ≠ φ(t̄_{n+1})} may be of any finite size; it may not,
however, be infinite. Similarly the smallest n_0 ∈ N such that rng(t) = W_φ(t̄_{n_0})
may be any finite number. It is also permitted that for some n, W_φ(t̄_n) =
rng(t) but W_φ(t̄_{n+1}) ≠ rng(t). In other words, φ may abandon correct conjec-
tures, although φ must eventually stick with some correct conjecture.

Example 1.4.1A

a. Let t be the text 2, 4, 6, 6, 6, 6, .... Let f ∈ ℱ be as described in part a of example
1.3.4B. On t̄_0, f conjectures the least index for ∅; on t̄_1, f conjectures the least index
for {2}; on t̄_2, f conjectures the least index for {2, 4}; on t̄_n for n ≥ 3, f conjectures the
least index for {2, 4, 6}. Thus f converges on t to this latter index. Since rng(t) =
{2, 4, 6}, f identifies t.
b. Let t be the text 0, 1, 2, 3, 4, 5, .... Let f and g be as described in parts a and b of
example 1.3.4B. f is defined on t but does not converge on t; g converges on t, and
g identifies t if and only if W_5 = N.
c. Let t be the text 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, .... Let φ ∈ ℱ be as described in part d of
example 1.3.4B. φ is defined on t̄_n for n ≤ 3; it is undefined thereafter. φ is thus not
defined on t.
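Part a can be simulated directly. In the sketch below (ours, with conjectures
again standing for conjectured languages rather than least indices), f's
guesses on t̄_0, t̄_1, ... stabilize from n = 3 on; of course, the simulation
exhibits stability on a finite portion of t only, a point taken up in section 1.5.

```python
# Simulating the learner f of example 1.3.4B(a) on the text
# t = 2, 4, 6, 6, 6, ... of part a above.

def f(sigma):
    return frozenset(sigma)            # stands in for a least index

def t(n):
    return 2 * (n + 1) if n < 3 else 6

conjectures = [f(tuple(t(i) for i in range(n))) for n in range(8)]
# Guesses: {}, {2}, {2,4}, {2,4,6}, {2,4,6}, ... -- stable from n = 3 on.
assert conjectures[0] == frozenset()
assert conjectures[3:] == [frozenset({2, 4, 6})] * 5
```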

Exercise

1.4.1A Let t be the text 0, 1, 2, 3, 4, 5, .... Let h be as described in part c of example
1.3.4B. Does h identify t?

1.4.2 Identifying Languages

Children are able to learn their language on the basis of many orderings of
its sentences. Since definition 1.4.1A pertains to individual texts, it does not
represent this feature of language acquisition. The next definition remedies
this defect.

DEFINITION 1.4.2A Let φ ∈ ℱ and L ∈ RE be given. φ is said to identify L
just in case φ identifies every text for L.

As a special case of the definition every learning function identifies the
empty language, for which there are no texts.
Let φ ∈ ℱ identify L ∈ RE, and let s and t be different texts for L. It is
consistent with definition 1.4.2A that φ converge on s and t to different
indexes for L. Likewise φ might require more inputs from s than from t
before emitting an index for L.

Example 1.4.2A

a. Let f ∈ ℱ be as described in part a of example 1.3.4B. Let L = {2, 4, 6}. Given any
text t for L, there is some n_0 ∈ N such that L = rng(t̄_m) for all m ≥ n_0. Hence, for all
m ≥ n_0, f(t̄_m) = f(t̄_{m+1}) and W_f(t̄_m) = rng(t). Hence f identifies any such t. Hence f
identifies L.
b. Let g ∈ ℱ be as described in part b of example 1.3.4B. g identifies a language L if
and only if 5 is an index for L.
c. Let n_0 be an index for L = {0, 1}. Let h ∈ ℱ be defined as follows: for all σ ∈ SEQ,

h(σ) = { n_0,    if σ does not end in 1,
       { lh(σ),  otherwise.

h identifies every text for L in which 1 occurs only finitely often; no other texts are
identified. h does not identify L.

1A.3 Identifying Collections of Languages

Children are able to learn any arbitrarily selected language drawn from a
large class; that is, their acquisition mechanism is not prewired for just a
single language. Definition 1.4.2A does not reflect this fact. We are thus led
to extend the notion of identification to collections of languages.

DEFINITION 1.4.3A Let φ ∈ ℱ be given, and let ℒ ⊆ RE be a collection of
languages. φ is said to identify ℒ just in case φ identifies every L ∈ ℒ. ℒ is
said to be identifiable just in case some φ ∈ ℱ identifies ℒ.

We let ℒ, ℒ', ..., represent collections of languages. As a special case of the
definition, the empty collection of languages is identifiable.
Every singleton collection {L} of languages is trivially identifiable. To see
this, let n_0 be an index for L, and define f ∈ ℱ as follows. For all σ ∈ SEQ,
f(σ) = n_0. Then f identifies L, and hence f identifies {L} (compare part b of
example 1.4.2A). In contrast, questions about the identifiability of collec-
tions of more than one language are often nontrivial, for many such
questions receive negative answers (as will be seen in chapter 2). Such is the
consequence of requiring a single learning function to determine which of
several languages is inscribed in a given text.
The foregoing example also serves to highlight the liberal attitude that we
have adopted about learning. The constant function f defined above iden-
tifies {L} but exhibits not the slightest "intelligence" thereby (like the man
who announces an imminent earthquake every morning). Within the

identification paradigm it may thus be seen that learning presupposes
neither rationality nor warranted belief but merely stable and true conjec-
tures in the sense provided by the last three definitions. Does this liberality
render identification irrelevant to human learning? The answer depends on
both the domain in question and the specific criterion of rationality at hand.
To take a pertinent example, normal linguistic development seems not to
culminate in warranted belief in any interesting sense, since natural lan-
guages exhibit a variety of syntactic regularities that are profoundly under-
determined by the linguistic evidence available to the child (see Chomsky
1980, 1980a, for discussion). Indeed, one might extend this argument (as
does Chomsky 1980) to every nontrivial example of human learning, that is,
involving a rich set of deductively interconnected beliefs to be discovered by
(and not simply told to) the learner. In any such case of inductive inference,
hypothesis selection is subject to drastic underdetermination by available
data, and thus selected hypotheses, however true, have little warrant. We
admit, however, that all of this is controversial (for an opposing point of
view, see Putnam 1980), and even the notion of belief in these contexts
stands in need of clarification (see section 3.2.4). In any case we shall soon
consider paradigms that incorporate rationality requirements in one or
another sense (see in particular sections 4.3.3, 4.3.4, 4.5.1, and 4.6.1).
To return to the identification paradigm, the following propositions
provide examples of identifiable collections of languages.

PROPOSITION 1.4.3A RE_fin is identifiable.

Proof Let f ∈ ℱ be the function defined in part a of example 1.3.4B. By
consulting part a of example 1.4.2A, it is easy to see that f identifies every
finite language. □

PROPOSITION 1.4.3B Let ℒ = {N - {x} | x ∈ N}. Then ℒ is identifiable.

Proof We define g ∈ ℱ which identifies ℒ as follows. Given any σ ∈ SEQ,
let x_σ be the least x ∈ N such that x ∉ rng(σ). Now define g(σ) = the least
index for N - {x_σ}. It is clear that g identifies every L ∈ ℒ, for given x_0 ∈ N
and any text t for N - {x_0}, there is an n such that rng(t̄_n) ⊇ {0, 1, ...,
x_0 - 1}. Then for all m ≥ n, g(t̄_m) = the least index for N - {x_0}. □
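The learner g in this proof is simple enough to sketch. In the following
Python rendition (ours), the conjecture "the least index for N - {x}" is
represented by the excluded number x itself.

```python
def g(sigma):
    """Conjecture N - {x} for the least x not occurring in sigma."""
    x = 0
    while x in sigma:
        x += 1
    return x            # stands in for the least index of N - {x}

# On any text for N - {3}, g's conjecture is stable once 0, 1, 2 have
# all appeared, and no further data from the language can disturb it:
prefix = (5, 0, 2, 1, 7, 9)
assert g(prefix) == 3
assert all(g(prefix + tau) == 3 for tau in [(4,), (8, 0), (100, 2, 6)])
```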

PROPOSITION 1.4.3C RE_svt is identifiable.

Proof The key property of RE_svt is this. Suppose that L and L' are
members of RE_svt and that L ≠ L'. Then there are x, y, y' ∈ N such that
⟨x, y⟩ ∈ L, ⟨x, y'⟩ ∈ L', and y ≠ y'. Thus, if t is a text for L, there is an n ∈ N
such that by looking at t̄_n we know that t is not a text for L'.
Now we define h ∈ ℱ which identifies RE_svt as follows. For all σ ∈ SEQ, let

h(σ) = { the least i such that W_i ∈ RE_svt and rng(σ) ⊆ W_i,  if such an i exists,
       { 0,                                                    otherwise.

Informally, h guesses the first language in RE_svt that is consistent with σ.
By our preceding remarks, given a text t for L ∈ RE_svt, h will eventually
conjecture the least index for L, having verified that t is not a text for any
L' with a smaller index. □
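The learner h of this proof cannot be exhibited in full (it consults the whole
acceptable indexing), but its "guess the first consistent language" behavior
can be illustrated on a toy, hypothetical enumeration of single-valued total
languages, each presented as a total function from x to y:

```python
# A toy analogue of h: the list F plays the role of an enumeration of
# single-valued total languages, the i-th language being
# {<x, y> : y = F[i](x)}. (This enumeration is our invention, chosen
# only to illustrate the idea of the proof.)

F = [lambda x: 0,        # "index" 0: all pairs <x, 0>
     lambda x: x,        # "index" 1: all pairs <x, x>
     lambda x: x + 1]    # "index" 2: all pairs <x, x + 1>

def h(sigma):
    """Least index whose language contains every pair in sigma (0 if none)."""
    for i, Fi in enumerate(F):
        if all(Fi(x) == y for (x, y) in sigma):
            return i
    return 0

# Feeding h data from language 1: the pair <0, 0> is also consistent
# with language 0, so h guesses 0 until <1, 1> refutes that guess.
assert h(()) == 0
assert h(((0, 0),)) == 0
assert h(((0, 0), (1, 1))) == 1
```

As in the proof, single-valuedness guarantees that data from the target
language eventually refute every smaller, incorrect candidate.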

Exercises

1.4.3A Let t ∈ 𝒯 and total f ∈ ℱ be given.

a. Show that if f converges on t, then {f(t̄_n) | n ∈ N} is finite. Show that the converse
is false.
b. Show that if f identifies t, then W_f(t̄_n) = rng(t) for all but finitely many n ∈ N. Show
that the converse is false.

1.4.3B Let ℒ = {N} ∪ {E | E is a finite set of even numbers}. Specify a learning
function that identifies ℒ.
1.4.3C Prove: Every finite collection of languages is identifiable. (Hint: Keep in
mind that a finite collection of languages is not the same thing as a collection of
finite languages.)
1.4.3D Let L ∈ RE be given. Specify φ ∈ ℱ that identifies {L ∪ D | D finite}.
1.4.3E Let {S_i | i ∈ N} be any infinite collection of nonempty, mutually disjoint
members of RE_rec. Let ℒ = {N - S_i | i ∈ N}. Specify a learning function that iden-
tifies ℒ.
1.4.3F Given ℒ, ℒ' ⊆ RE, let ℒ × ℒ' be {L × L' | L ∈ ℒ and L' ∈ ℒ'}. Prove: If
ℒ, ℒ' ⊆ RE are each identifiable, then ℒ × ℒ' is identifiable.
1.4.3G
a. Prove: ℒ ⊆ RE is identifiable if and only if some total f ∈ ℱ identifies ℒ.
b. Let t ∈ 𝒯 and φ ∈ ℱ be given. We say that φ almost identifies t just in case there
exists an i ∈ N such that (a) W_i = rng(t) and (b) φ(t̄_n) = i for all but finitely many
n ∈ N. (Thus φ can almost identify t without being defined on t.) φ almost identifies
ℒ ⊆ RE just in case φ almost identifies every text for every language in ℒ. ℒ is said
to be almost identifiable just in case some φ ∈ ℱ almost identifies ℒ. Prove: ℒ ⊆ RE
is almost identifiable if and only if ℒ is identifiable.

1.4.3H φ ∈ ℱ is said to P percent identify ℒ ⊆ RE just in case for every L ∈ ℒ and
every text t for L, φ is defined on t, and there is i ∈ N such that (a) W_i = L and (b) there
is n ∈ N such that for all m > n, φ(t̄_j) = i for P percent of {j | m ≤ j ≤ m + 99}.
ℒ ⊆ RE is said to be P percent identifiable just in case some φ ∈ ℱ P percent
identifies ℒ. Prove:
a. if P > 50, then ℒ ⊆ RE is P percent identifiable if and only if ℒ is identifiable.
*b. if P ≤ 50, then there is ℒ ⊆ RE such that ℒ is P percent identifiable but ℒ is
not identifiable.
1.4.3I φ ∈ ℱ is said to identify ℒ ⊆ RE laconically just in case for every L ∈ ℒ and
every text t for L there is n ∈ N such that (a) W_φ(t̄_n) = L and (b) for all m > n, φ(t̄_m)↑.
Prove: ℒ ⊆ RE is identifiable if and only if ℒ is identifiable laconically.
1.4.3J The property of RE_svt used in the proof of proposition 1.4.3C is that if L,
L' ∈ RE_svt, L ≠ L', and t is a text for L, then there is an n ∈ N such that t̄_n is enough to
determine that t is not a text for L'. Show that there are identifiable infinite
collections of languages without this property.
1.4.3K Let φ ∈ ℱ be given. We define ℒ(φ) to be {L ∈ RE | φ identifies L}.
a. Let ψ ∈ ℱ be such that for all σ ∈ SEQ, ψ(σ) = the least n ∈ rng(σ). Characterize
ℒ(ψ).
b. Show by example that for φ, ψ ∈ ℱ, ℒ(φ) = ℒ(ψ) does not imply φ = ψ.

1.5 Identification as a Limiting Process

1.5.1 Epistemology of Convergence


Let φ ∈ ℱ identify t ∈ 𝒯, and let n ∈ N be given. We say that φ begins to
converge on t at moment n just in case n is the least integer such that (1)
W_φ(t̄_n) = rng(t) and (2) for all m > n, φ(t̄_m) = φ(t̄_n). Now let f ∈ ℱ be as
defined in part a of example 1.3.4B, and let t be a text for a finite language.
Then f identifies t. What information about t is required in order to
determine the moment at which f begins to converge on t? It is easy to see
that no finite initial segment t̄_n of t provides sufficient information to
guarantee that f's conjectures on t have stabilized once and for all. Simply,
no such t̄_n excludes the possibility that t_m ∉ rng(t̄_n) for some m ≥ n, in
which case f(t̄_{m+1}) ≠ f(t̄_n). Thus, although f in fact begins to converge on t
at some definite moment n, no finite examination of t provides indefeasible
grounds for determining n. (Compare the last paragraph of section 1.3.3.)
More generally, identification is said to be a "limiting process" in the
sense that it concerns the behavior of a learning function on an infinite
subset of its domain. For this reason Gold (1967) refers to identification as
"identification in the limit." Because of the limiting nature of identification,
the behavior of a given learning function φ on a given text t cannot in
general be predicted from φ's behavior on any finite portion of t. The
underdetermination at issue here does not arise from the disadvantages
connected with the "external" observation of a learning function at work.
To make this clear, the next subsection discusses learning functions that
announce their own convergence and may thus be considered to observe
their own operation.

Exercise

1.5.1A Let ℒ ⊆ RE be identifiable, let {σ_0, ..., σ_n} be a finite subset of SEQ, and
let {i_0, ..., i_n} be a finite subset of N. Show that there is φ ∈ ℱ such that (a) φ
identifies ℒ and (b) φ(σ_0) = i_0, ..., φ(σ_n) = i_n.

*1.5.2 Self-Monitoring Learning Functions


DEFINITION 1.5.2A (after Freivald and Wiehagen 1979) Let e_0 be an index
for the empty set. A function φ ∈ ℱ is called self-monitoring just in case
for all texts t, if φ identifies t, then (a) there exists a unique n ∈ N such that
φ(t̄_n) = e_0, and (b) for i > n, φ(t̄_i) = φ(t̄_{n+1}).

Intuitively, a learner ℓ is self-monitoring just in case it signals its own
successful convergence, where the (otherwise useless) index e_0 serves as the
signal. Note that once ℓ announces e_0, ℓ's next conjecture is definitive for t.
ℓ might be pictured as examining its own conjectures prior to emitting them.
If and when ℓ realizes that it has successfully determined the contents of t, ℓ
signals this fact by announcing e_0 on the current input, reverting thereafter
to the correct hypothesis. The following proposition is suggested by our
earlier remarks.

PROPOSITION 1.5.2A No self-monitoring learning function identifies RE_fin.

Proof Let φ ∈ ℱ be self-monitoring, and suppose for a contradiction that
φ identifies RE_fin. Let t be a text for some L ∈ RE_fin. Since φ identifies t,
there is n ∈ N such that φ(t̄_n) = e_0, and for all m > n, φ(t̄_m) = φ(t̄_{n+1}).
Let x_0 be a fixed integer such that x_0 ∉ L. Let t' be the text such that (a)
t̄'_{n+1} = t̄_{n+1} and (b) for all m > n, t'_m = x_0. Since φ(t̄'_n) = φ(t̄_n) = e_0,
we must have φ(t̄'_m) = φ(t̄'_{n+1}) = φ(t̄_{n+1}) for all m > n. But φ(t̄_{n+1}) is an
index for L, whereas t' is a text for L ∪ {x_0}. Hence φ does not identify
L ∪ {x_0}, and so φ does not identify RE_fin. □

Proposition 1.5.2A shows that identifiability does not entail identifi-
ability by a self-monitoring learning function. Informally, a learner may iden-
tify a text without it ever being possible for her to know that she has done so.

Exercises

1.5.2A Let L, L' ∈ RE be such that L ⊂ L'. Show that no self-monitoring learning
function identifies {L, L'}.
1.5.2B Let ℒ be the collection of languages of proposition 1.4.3B. Show that no
self-monitoring learning function identifies ℒ.
*1.5.2C Call a collection ℒ of languages easily distinguishable just in case for all
L ∈ ℒ there exists a finite subset S of L such that for all L' ∈ ℒ, if L' ≠ L, then S ⊈ L'.
a. Specify an identifiable collection of languages that is not easily distinguishable.
b. Prove: Let ℒ ⊆ RE be given. Then some self-monitoring φ ∈ ℱ identifies ℒ if and
only if ℒ is easily distinguishable.
1.5.2D φ ∈ ℱ is said to be a 1-learner just in case for all t ∈ 𝒯 there exists no more
than one m ∈ N such that φ(t̄_m) ≠ φ(t̄_{m+1}). That is, a 1-learner is limited to no more
than one "mind change" per text.
a. Prove: If ℒ ⊆ RE is identifiable by a self-monitoring learning function, then ℒ is
identifiable by a 1-learner.
b. Show that the converse to part a is false.
1.5.2E Let i ∈ N. φ ∈ ℱ is said to be an i-learner just in case for all texts t there exist
no more than i numbers m such that φ(t̄_m) ≠ φ(t̄_{m+1}). That is, an i-learner is limited
to no more than i "mind changes" per text.
a. For j ∈ N define ℒ_j = {{0}, {0, 1}, {0, 1, 2}, ..., {0, 1, 2, ..., j}}. Prove: For all j ∈ N,
ℒ_j is identifiable by an i-learner if and only if i ≥ j. (Hint: Suppose that φ ∈ ℱ is an
i-learner and that i < j. Consider texts of the form 0, 0, ..., 0, 1, 1, ..., 1, ..., j, j, ..., j,
j, .... What happens as the repetitions get longer and longer?)
b. For i ∈ N, let F_i be the class of i-learners. Let F = ∪_i F_i. Show that no φ ∈ F
identifies RE_fin. Show that no φ ∈ F identifies {N - {x} | x ∈ N}.
1.5.2F Let e be an index for ∅. φ ∈ ℱ is said to be a one-shot learner just in case for
every text t the cardinality of {φ(t̄_n) | φ(t̄_n) ≠ e} is at most 1. Let ℒ ⊆ RE be given. Show
that some one-shot learner identifies ℒ if and only if some self-monitoring learning
function identifies ℒ.
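The mind-change counts of exercise 1.5.2E can be probed empirically. The
sketch below (ours) implements a natural learner for ℒ_j, namely: conjecture
{0, ..., m} where m is the largest number seen so far, and counts its mind
changes; on a text revealing 0, then 1, ..., then j, it changes its mind exactly
j times. This only illustrates the bound; the hint's repetition argument is
what shows that no i-learner with i < j can identify ℒ_j.

```python
# Counting "mind changes" (in the sense of exercise 1.5.2E) for a
# learner on L_j = {{0}, {0,1}, ..., {0,...,j}}. The learner conjectures
# {0, ..., max seen so far}, changing its mind only on a new maximum.

def learner(sigma):
    return frozenset(range(max(sigma) + 1)) if sigma else frozenset({0})

def mind_changes(prefix):
    guesses = [learner(prefix[:n]) for n in range(len(prefix) + 1)]
    return sum(1 for a, b in zip(guesses, guesses[1:]) if a != b)

# A text revealing 0, then 1, ..., then j = 4 forces exactly 4 changes:
text = tuple(k for k in range(5) for _ in range(3))   # 0,0,0,1,1,1,...,4,4,4
assert mind_changes(text) == 4
assert mind_changes((0, 0, 0)) == 0
```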
2 Central Theorems on Identification

Within the paradigm of identification the learnability of a collection of
languages amounts to its identifiability. Propositions 1.4.3A through
1.4.3C provide examples of learnable collections. In this chapter we give
examples of unlearnable collections.

2.1 Locking Sequences

Many of the theorems in this book rest on the next result. To state and
prove it, we introduce some more notation. For σ, τ ∈ SEQ, let σ ∧ τ be the
result of concatenating τ onto the end of σ; thus (2, 8, 2) ∧ (4, 1, 9, 3) =
(2, 8, 2, 4, 1, 9, 3). Next, for σ, τ ∈ SEQ we write "σ ⊆ τ" if σ is an initial
segment of τ, and "σ ⊂ τ" if σ is a proper initial segment of τ; thus
(8, 8, 5) ⊂ (8, 8, 5, 3, 9).
Finally, let finite sequences σ^0, σ^1, σ^2, ... be given such that (1) for every
i, j ∈ N either σ^i ⊆ σ^j or σ^j ⊆ σ^i and (2) for every n ∈ N there is m ∈ N
such that lh(σ^m) ≥ n. Then there is a unique text t such that for all n ∈ N,
σ^n = t̄_{lh(σ^n)}; this text is denoted: ∪_n σ^n.

PROPOSITION 2.1A (Blum and Blum 1975) Let φ ∈ ℱ identify L ∈ RE. Then
there is σ ∈ SEQ such that (i) rng(σ) ⊆ L, (ii) W_φ(σ) = L, and (iii) for all
τ ∈ SEQ, if rng(τ) ⊆ L, then φ(σ ∧ τ) = φ(σ).

Proof (We follow Blum and Blum.) Assume that the proposition is false,
that is, that there is no σ ∈ SEQ satisfying (i), (ii), and (iii). This implies the
following condition:

(*) For every χ ∈ SEQ such that rng(χ) ⊆ L and W_φ(χ) = L, there is
some τ ∈ SEQ such that rng(τ) ⊆ L and φ(χ ∧ τ) ≠ φ(χ).

We show that (*) implies the existence of a text t for L which φ does not
identify, contrary to the hypothesis that φ identifies L. Let s = s_0, s_1, s_2, ...,
be a text for L. We construct t in stages 0, 1, 2, ..., at each stage n specifying a
sequence σ^n which is in t.

Stage 0 Let σ^0 ∈ SEQ be such that rng(σ^0) ⊆ L and W_φ(σ^0) = L; σ^0 must
exist since φ identifies L.
Stage n + 1 Given σ^n, there are two cases. If W_φ(σ^n) ≠ L, let σ^{n+1} = σ^n ∧ s_n.
Otherwise, by (*), let τ ∈ SEQ be such that rng(τ) ⊆ L and φ(σ^n ∧ τ) ≠ φ(σ^n).
Let σ^{n+1} = σ^n ∧ τ ∧ s_n.

We observe that σ^i ⊂ σ^{i+1} for all i ∈ N. Let t = ∪_n σ^n. t is a text for L since
s_n is added to t at stage n + 1 and no nonmembers of L are ever added to t.
Finally, φ does not converge on t to an index for L, since for every n either
W_φ(σ^n) ≠ L or φ(σ^n ∧ τ) ≠ φ(σ^n). □

Intuitively, if φ ∈ ℱ identifies L ∈ RE, then proposition 2.1A guarantees
the existence of a finite sequence σ that "locks" φ onto a conjecture for L in
the following sense: no presentation from L can dislodge φ from φ(σ). This
suggests the following definition.

DEFINITION 2.1A Let L ∈ RE, φ ∈ ℱ, and σ ∈ SEQ be given. σ is called a
locking sequence for L and φ just in case (i) rng(σ) ⊆ L, (ii) W_φ(σ) = L, and (iii)
for all τ ∈ SEQ, if rng(τ) ⊆ L, then φ(σ ∧ τ) = φ(σ).

Thus proposition 2.1A can be put this way: if φ ∈ ℱ identifies L ∈ RE, then
there is a locking sequence for φ and L.
As a corollary to the proof of proposition 2.1A, we have the following.

COROLLARY 2.1A Let φ ∈ ℱ identify L ∈ RE. Let σ ∈ SEQ be such that
rng(σ) ⊆ L. Then there is τ ∈ SEQ such that σ ∧ τ is a locking sequence for φ
and L.

Proof Just as in the proof of proposition 2.1A, if the corollary fails, we
could construct a text t for L which φ fails to identify. Central to this
construction is the following condition, analogous to (*), which holds if the
corollary fails:

(**) For every χ ⊇ σ, χ ∈ SEQ, such that rng(χ) ⊆ L and W_φ(χ) = L, there is
some τ ∈ SEQ such that rng(τ) ⊆ L and φ(χ ∧ τ) ≠ φ(χ).

The construction of t proceeds exactly as in the proof of proposition 2.1A,
except that at stage 0 we also require σ^0 ⊇ σ. □

Note that proposition 2.lA does not characterize cp's behavior on ele-
ments drawn from I. In particular, if r E SEQ is such that rng(r) i LE RE,
then even if 0" E SEQ is a locking sequence for cp Eff and L, <p(a A r) may well
differ from <p(0").
Central Theorems on Identification 27

Example 2.1A

a. Let f ∈ F be as described in part a of example 1.3.4B. Let L = {2, 4, 6}. Then one
locking sequence for f and L is (2, 4, 6); another is (6, 4, 2, 6). Indeed, it is easy to see
that for all σ ∈ SEQ, σ is a locking sequence for f and rng(σ).
b. Let g ∈ F be as described in the proof of proposition 1.4.3B. Let L =
{0, 2, 3, 4, ...}. Then (22, 8, 4, 0) is a locking sequence for g and L. Indeed, any
σ ∈ SEQ such that 0 ∈ rng(σ) and 1 ∉ rng(σ) is a locking sequence for g and L.
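Part a can be simulated concretely. The sketch below is a hypothetical construction, not from the text: it models a learner whose conjecture depends only on rng(σ) (conjectures are coded as frozensets rather than r.e. indexes, so clause (ii) of definition 2.1A is not represented) and checks the stability clause (iii) over all τ drawn from L of bounded length, a finite approximation of the unbounded condition.

```python
from itertools import product

L = {2, 4, 6}

def f(sigma):
    # A learner in the spirit of example 2.1A(a): its conjecture
    # depends only on rng(sigma), coded here as a frozenset.
    return frozenset(sigma)

def is_locking_up_to(learner, sigma, lang, max_len):
    # Check clause (i), then clause (iii) of definition 2.1A for every
    # tau drawn from lang of length <= max_len -- a finite
    # approximation of the real, unbounded condition.
    if not set(sigma) <= lang:
        return False
    for n in range(max_len + 1):
        for tau in product(sorted(lang), repeat=n):
            if learner(sigma + tau) != learner(sigma):
                return False
    return True

# (2, 4, 6) and (6, 4, 2, 6) both lock f onto its conjecture for L...
assert is_locking_up_to(f, (2, 4, 6), L, 3)
assert is_locking_up_to(f, (6, 4, 2, 6), L, 3)
# ...but (2, 4) does not: appending tau = (6,) changes the conjecture.
assert not is_locking_up_to(f, (2, 4), L, 3)
```

Since f depends only on the range of its input, any σ with rng(σ) = L passes the bounded check, mirroring the remark that every σ is a locking sequence for f and rng(σ).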

Exercises

2.1A  Let σ be a locking sequence for φ ∈ F and L ∈ RE. Let τ ∈ SEQ be such that
rng(τ) ⊆ L. Show that σ ∧ τ is a locking sequence for φ and L. Distinguish this result
from corollary 2.1A.
2.1B  Refute the converse to proposition 2.1A. In other words, exhibit φ ∈ F,
L ∈ RE, and σ ∈ SEQ such that σ is a locking sequence for φ and L, but φ does not
identify L.
2.1C  Let φ ∈ F identify L ∈ RE. Let t be a text for L. t is called a locking text for φ
and L just in case there exists n ∈ N such that t̄_n is a locking sequence for φ and L.
Provide a counterexample to the following conjecture: If φ ∈ F identifies L ∈ RE,
then every text for L is a locking text for φ and L.

2.2 Some Unidentifiable Collections of Languages

Proposition 2.1A may now be used to show that certain simple collections
of languages are unidentifiable.

PROPOSITION 2.2A

i. (Gold 1967) RE_fin ∪ {N} is not identifiable.
ii. Let ℒ = {N − {x} | x ∈ N}. Then ℒ ∪ {N} is not identifiable.

Proof

i. Suppose for a contradiction that φ ∈ F identifies RE_fin ∪ {N}, and let σ be
a locking sequence for φ and N. Note that rng(σ) ∈ RE_fin. Clearly there is a
text t for rng(σ) such that t̄_lh(σ) = σ. But then φ does not identify rng(σ) since
φ converges on t to an index for N.
ii. Again, suppose that σ is a locking sequence for φ ∈ F and N, where φ
identifies ℒ ∪ {N}. Choose x ∉ rng(σ). Then, on any text t for N − {x} such
that t̄_lh(σ) = σ, φ converges to an index for N and not one for N − {x}. □

Proposition 2.2A should be compared with propositions 1.4.3A and 1.4.3B.



The following fact is evident and often very useful.

LEMMA 2.2A  Suppose that ℒ ⊆ RE is not identifiable. Then no superset
of ℒ is identifiable.

From lemma 2.2A and proposition 2.2A we have corollary 2.2A.

COROLLARY 2.2A RE is not identifiable.

Corollary 2.2A should be compared with proposition 1.4.3C.


Since the collections of languages invoked in proposition 2.2A consist
entirely of recursive languages, we also have corollary 2.2B.

COROLLARY 2.2B  RE_rec is not identifiable.

Exercises

2.2A  (Gold 1967) Let L be an arbitrary infinite language. Show that RE_fin ∪ {L} is
not identifiable.
2.2B  Let ℒ ⊆ RE be such that for every σ ∈ SEQ there is L ∈ ℒ such that
rng(σ) ⊆ L and L ≠ N. Show that ℒ ∪ {N} is not identifiable. (This abstracts the
content of proposition 2.2A.)
2.2C
a. Let i_0 ∈ N be given. Define ℒ = {N − D | D ⊆ N has exactly i_0 members}. Show
that ℒ is identifiable.
b. Let i_0, j_0 ∈ N be such that i_0 ≠ j_0. Define ℒ = {N − D | D ⊆ N has either exactly
i_0 members or exactly j_0 members}. Prove that ℒ is not identifiable.
2.2D  Exhibit φ, ψ ∈ F such that ℒ(φ) ∪ ℒ(ψ) is not identifiable. (For notation, see
exercise 1.4.3K.) This shows that the identifiable subsets of RE are not closed under
union.

2.2E
a. Let ℒ ⊆ RE be an identifiable collection of infinite languages. Show that there is
some infinite L ∉ ℒ such that ℒ ∪ {L} is identifiable. (Hint: First use proposition
2.1A to argue that if L_0 ∈ ℒ, then there is an x_0 ∈ L_0 such that if L = L_0 − {x_0},
then L is not a member of ℒ. Next define a function ψ ∈ F that identifies ℒ ∪ {L}
by modifying the output of a function φ ∈ F that identifies ℒ.)
b. ℒ ⊆ RE is called saturated just in case ℒ is identifiable and no proper superset
of ℒ is identifiable. Prove: ℒ ⊆ RE is saturated if and only if ℒ = RE_fin.
*2.2F  𝒮 ⊆ F is said to team identify ℒ ⊆ RE just in case for every L ∈ ℒ there is
φ ∈ 𝒮 such that φ identifies L. Show that no finite subset of F team identifies RE.
(See also exercise 6.2.11.)

2.3 A Comprehensive, Identifiable Collection of Languages

The collections of languages exhibited in proposition 2.2A are so simple as
to encourage the belief that only impoverished subsets of RE are identifi-
able. The next proposition shows this belief to be mistaken. To state it, a
definition is required.

DEFINITION 2.3A  Let L, L′ ∈ RE be such that (L − L′) ∪ (L′ − L) is finite.
Then L and L′ are said to be finite variants (of each other).

That is, finite variants differ by only finitely many members. Thus
E ∪ {3, 5, 7} and E − {2, 4, 6, 8} are finite variants, where E is the set of even
numbers. Note that any pair of finite languages are finite variants.

PROPOSITION 2.3A  (Wiehagen 1978) There is ℒ ⊆ RE such that (i) for
every L ∈ RE there is L′ ∈ ℒ such that L and L′ are finite variants and (ii) ℒ
is identifiable.

Proposition 2.3A asserts the existence of an identifiable collection that is
"nearly all" of RE. To prove the proposition, two important lemmata are
required.

LEMMA 2.3A  (Recursion Theorem) Let total f ∈ F^rec be given. Then there
exists n ∈ N such that φ_n = φ_f(n) (and so W_f(n) = W_n).
Proof  See Rogers (1967, sec. 11.2, theorem 1). □

DEFINITION 2.3B

i. L ∈ RE is said to be self-describing just in case the smallest x ∈ L is such that
L = W_x.
ii. The collection {L ∈ RE | L is self-describing} is denoted: RE_sd.

LEMMA 2.3B  For every L ∈ RE there is L′ ∈ RE_sd such that L and L′ are
finite variants.

Proof  Fix L ∈ RE. Define a recursive function f by the condition that for
all n ∈ N, W_f(n) = (L ∪ {n}) ∩ {n, n + 1, n + 2, ...}. That such an f exists is a
consequence of the definition of acceptable indexing, definition 1.2.1A(ii). To
see this, if L = W_i0, there is j_0 ∈ N such that for all n, x ∈ N:

φ_j0(⟨n, x⟩) = φ_i0(x), if x > n,
             = 1,       if x = n,
             = ↑,       if x < n.

Now by definition 1.2.1A(ii) there is a function g such that φ_g(⟨j0, n⟩)(x) =
φ_j0(⟨n, x⟩). By setting f(n) = g(⟨j_0, n⟩) for all n ∈ N, W_f(n) has the desired
properties.
Now by the recursion theorem there is n ∈ N such that W_n = W_f(n) =
(L ∪ {n}) ∩ {n, n + 1, n + 2, ...}. Clearly W_n is self-describing and is a
finite variant of L. □

Proof of Proposition 2.3A  By lemma 2.3B it suffices to show that RE_sd is
identifiable. But this is trivial. Define f ∈ F such that for all σ ∈ SEQ, f(σ)
is the smallest number in rng(σ). Then f identifies RE_sd. □
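The learner in this proof is simple enough to run. In the sketch below, a toy and not the book's formalism, an indexing is modeled as a Python dict mapping indexes to finite sets, one language is made self-describing, and the min-of-range learner is shown locking onto the correct index.

```python
# Toy indexing: W[i] plays the role of the language with index i.
# Language 5 is "self-describing": its smallest element, 5, indexes it.
W = {5: {5, 7, 9, 11}}

def f(sigma):
    # The learner from the proof of proposition 2.3A:
    # conjecture the smallest number seen so far.
    return min(sigma)

# A finite prefix of a text for the language W[5].
text = [9, 7, 5, 11, 5, 9, 7, 11]

conjectures = [f(text[:n + 1]) for n in range(len(text))]
# Once 5 appears, the conjecture locks onto the correct index forever.
assert conjectures == [9, 7, 5, 5, 5, 5, 5, 5]
assert W[conjectures[-1]] == {5, 7, 9, 11}
```

Any prefix containing the least element of the language is, in effect, a locking sequence for this learner, which is why identification of RE_sd is "trivial."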

Exercises

2.3A  Show that for no L ∈ RE does RE_sd include every finite variant of L.
2.3B  Specify ℒ ⊆ RE such that (a) for all L, L′ ∈ ℒ, if L ≠ L′, then L and L′ are not
finite variants, and (b) ℒ is not identifiable.

2.4 Identifiable Collections Characterized

The next proposition provides a necessary and sufficient condition for the
identifiability of a collection of languages.

PROPOSITION 2.4A  (Angluin 1980) ℒ ⊆ RE is identifiable if and only if for
all L ∈ ℒ there is a finite subset D of L such that for all L′ ∈ ℒ, if D ⊆ L′, then
L′ ⊄ L.

Proof  First suppose that ℒ ⊆ RE is identifiable, and let φ ∈ F witness
this. By proposition 2.1A, for each L ∈ ℒ choose a locking sequence σ_L for φ
and L. Since for each L ∈ ℒ, rng(σ_L) is a finite subset of L, it suffices to prove
that for all L′ ∈ ℒ, if rng(σ_L) ⊆ L′, then L′ ⊄ L. Suppose otherwise for some
L, L′ ∈ ℒ, and let t be a text for L′ such that t̄_lh(σ_L) = σ_L. Then φ converges on
t to an index for L ≠ L′ = rng(t). Thus φ fails to identify L′, contradicting our
choice of φ.
For the other direction, suppose that for every L ∈ ℒ there is a finite
D_L ⊆ L such that D_L ⊆ L′ and L′ ∈ ℒ implies L′ ⊄ L. We define f ∈ F as
follows. For all σ ∈ SEQ,

f(σ) = the least i such that i is an index for some L ∈ ℒ
       with D_L ⊆ rng(σ) ⊆ L, if such an i exists,
     = 0, otherwise.

To see that f identifies ℒ, fix L ∈ ℒ, and let t be a text for L. Let i be the least
index for L. Then there is an n ∈ N such that

1. rng(t̄_n) ⊇ D_L,
2. if j < i, W_j ∈ ℒ, and L ⊈ W_j, then rng(t̄_n) ⊈ W_j.

We claim that f(t̄_m) = i for all m ≥ n. By 1 and the fact that t is a text for
L, f will conjecture i on t̄_m unless there is j < i such that W_j = L′ ∈ ℒ and
D_L′ ⊆ rng(t̄_m) ⊆ L′. If rng(t̄_m) ⊇ D_L′, then L ⊇ D_L′, so by the condition on
D_L′, L ⊄ L′; moreover L ≠ L′, since i is the least index for L and j < i, so
L ⊈ L′. But then by 2, rng(t̄_m) ⊈ L′. Thus on t̄_m, f will not conjecture
j for any j < i. □
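The learner f defined in this direction of the proof can be illustrated on a small, hypothetical indexed family in which each language serves as its own finite tell-tale set D_L:

```python
# A toy instance of the learner f from the proof of proposition 2.4A.
# Languages is an indexed family; D[i] is a finite "tell-tale" subset of
# Languages[i]: no member of the family that includes D[i] is a proper
# subset of Languages[i]. (The family and tell-tales are illustrative.)
Languages = [{0}, {0, 1}, {0, 1, 2}]
D = [{0}, {0, 1}, {0, 1, 2}]  # here each language is its own tell-tale

def f(sigma):
    # Conjecture the least index i with D[i] <= rng(sigma) <= L_i,
    # and a dummy conjecture 0 when no index qualifies.
    content = set(sigma)
    for i, L in enumerate(Languages):
        if D[i] <= content <= L:
            return i
    return 0

text = [0, 1, 1, 0, 2, 2, 1]  # a prefix of a text for {0, 1, 2}
conjectures = [f(text[:n + 1]) for n in range(len(text))]
assert conjectures == [0, 1, 1, 1, 2, 2, 2]  # converges to index 2
```

The early conjectures 0 and 1 are exactly the overgeneralization the tell-tale condition controls: once the text exhibits all of D[2], no smaller index can satisfy the inclusion test again.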

Exercises

2.4A  Specify a collection of finite sets meeting the conditions of proposition 2.4A
with respect to

a. RE_fin,
b. {N − {x} | x ∈ N}.
2.4B
a. Use proposition 2.4A to provide alternative proofs of propositions 1.4.3A, 1.4.3B,
and 1.4.3C.
b. Use proposition 2.4A to provide an alternative proof of proposition 2.2A.

2.5 Identifiability of Single-Valued Languages

Every language may be paired with a structurally identical single-valued


language in the following way.

DEFINITION 2.5A  We let S be the function from RE to RE defined as
follows. For all L ∈ RE, S(L) = {⟨x, 0⟩ | x ∈ L}. For ℒ ⊆ RE, we define S(ℒ)
to be {S(L) | L ∈ ℒ}.

Example 2.5A

a. Let L be the finite language {2, 4, 6}. Then S(L) is the finite, single-valued language
{⟨2, 0⟩, ⟨4, 0⟩, ⟨6, 0⟩}.
b. S(N) is the set of numbers ⟨x, y⟩ such that y = 0. Note that S(N) is total, whereas
for all other L ∈ RE, S(L) is not total.

PROPOSITION 2.5A  ℒ ⊆ RE is identifiable if and only if S(ℒ) is
identifiable.

Proof  Given σ ∈ SEQ, say σ = (x_0, ..., x_n), define S(σ) = (⟨x_0, 0⟩, ...,
⟨x_n, 0⟩). Similarly, if σ = (⟨x_0, y_0⟩, ..., ⟨x_n, y_n⟩), define P(σ) = (x_0, ..., x_n).
Let g, h ∈ F be such that for all i ∈ N, W_g(i) = S(W_i) and W_h(i) = P(W_i).
Now suppose that ℒ ⊆ RE is identified by φ ∈ F. Let ψ ∈ F be such that
for all σ ∈ SEQ,

ψ(σ) = g(φ(P(σ))).

It is clear that ψ identifies S(ℒ).
Similarly, if ψ ∈ F identifies S(ℒ), let φ ∈ F be such that for all σ ∈ SEQ,

φ(σ) = h(ψ(S(σ))).

Then φ identifies ℒ. □

The technique used in the foregoing proof is important. It might be called
"internal simulation." For instance, in the first part, ψ works by simulating
the action of φ on a text constructed from the text given to ψ.
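The sequence transforms S and P, and the simulation ψ(σ) = g(φ(P(σ))), can be sketched as follows. The Cantor pairing function stands in for the pairing ⟨·, ·⟩, and the index-translation function g is omitted; both choices are illustrative assumptions, not the text's.

```python
def pair(x, y):
    # Cantor pairing: an illustrative realization of <x, y>.
    return (x + y) * (x + y + 1) // 2 + y

def unpair(z):
    # Invert the Cantor pairing function.
    w = int(((8 * z + 1) ** 0.5 - 1) // 2)
    y = z - w * (w + 1) // 2
    return w - y, y

def S(sigma):
    # Map a sequence (x0, ..., xn) to (<x0,0>, ..., <xn,0>).
    return tuple(pair(x, 0) for x in sigma)

def P(sigma):
    # Project (<x0,y0>, ..., <xn,yn>) back to (x0, ..., xn).
    return tuple(unpair(z)[0] for z in sigma)

def make_psi(phi):
    # "Internal simulation": psi strips the coding off its input
    # sequence and runs phi on the result (index translation omitted).
    return lambda sigma: phi(P(sigma))

assert P(S((2, 4, 6))) == (2, 4, 6)
```

The point of the construction is that ψ never inspects its text directly; it merely preprocesses each finite sequence and defers to φ.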

COROLLARY 2.5A The collection of all single-valued languages is not


identifiable.

Corollary 2.5A should be compared with proposition 1.4.3C.


Proposition 2.5A (along with the method of its proof) shows that the
collection of single-valued languages presents nothing new from the point
of view of identification. In contrast, proposition 1.4.3C shows that the
collection of total, single-valued languages has learning-theoretic prop-
erties that distinguish it from RE. For this reason, when considering
single-valued languages, we shall generally restrict attention to RE_svt, the
collection of total, single-valued r.e. sets.
What makes RE_svt identifiable? Recall from section 1.3.3 that texts do
not, in general, allow the learner to infer directly the nonoccurrence of
sentences. In contrast, if t is a text for an unspecified language in RE_svt, then
for every x ∈ N there is an n ∈ N such that examination of t̄_n is sufficient to
determine whether or not x ∈ rng(t). To see this, suppose that x = ⟨i, j⟩.
Then some number y occurs in t such that y = ⟨i, k⟩ (since rng(t) is total). As
soon as ⟨i, k⟩ appears in t, the question "⟨i, j⟩ ∈ rng(t)?" can be answered,
for ⟨i, j⟩ ∈ rng(t) just in case j = k (since rng(t) is single-valued). If j ≠ k,
the presence of ⟨i, k⟩ in t may be thought of as "indirect negative evidence"
for ⟨i, j⟩ in t, in the sense discussed by Pinker (1984). In sum, texts for total,
single-valued languages offer information about both the presence and the
absence of sentences. The learning function h of proposition 1.4.3C exploits
this special property of RE_svt.
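The membership test just described can be made explicit. The sketch below assumes the Cantor pairing function as ⟨·, ·⟩ (an illustrative choice, not fixed by the text) and decides membership of ⟨i, j⟩ from a text prefix, returning None while no ⟨i, k⟩ has yet appeared:

```python
def pair(x, y):
    # Cantor pairing: an illustrative realization of <x, y>.
    return (x + y) * (x + y + 1) // 2 + y

def unpair(z):
    # Invert the Cantor pairing function.
    w = int(((8 * z + 1) ** 0.5 - 1) // 2)
    y = z - w * (w + 1) // 2
    return w - y, y

def member(prefix, i, j):
    # Decide whether <i, j> is in rng(t) from a prefix of a text t for
    # a total, single-valued language. None means "no <i, k> seen yet";
    # totality guarantees some <i, k> eventually appears.
    for z in prefix:
        a, k = unpair(z)
        if a == i:
            return k == j  # single-valued: <i, j> in rng(t) iff j == k
    return None

# A text prefix for the single-valued total language {<x, x+1> : x in N}:
prefix = [pair(x, x + 1) for x in (3, 0, 1, 2)]

assert member(prefix, 1, 2) is True    # <1, 2> occurs in the prefix
assert member(prefix, 1, 5) is False   # ruled out by the presence of <1, 2>
assert member(prefix, 9, 10) is None   # no <9, k> has appeared yet
```

The False case is precisely the "indirect negative evidence" of the paragraph above: the occurrence of ⟨1, 2⟩ excludes ⟨1, 5⟩ without any explicit nonoccurrence information.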

Exercises

*2.5A  Is there a price for self-knowledge? We restrict attention to recursive
learning functions. Call φ ∈ F^rec Socratic just in case φ identifies the language
L_φ = {⟨x, y⟩ | φ(x) = y}. (Since φ ∈ F^rec, L_φ ∈ RE.)

a. Specify a collection ℒ of single-valued languages such that some φ ∈ F^rec
identifies ℒ, but no φ ∈ F^rec identifies ℒ ∪ {L_φ}. (Hint: See exercise 2.2E.) Conclude
that (recursive) Socratic learning functions are barred from identifying certain
identifiable collections.
b. Prove: Let ℒ ⊆ RE_svt be given. Then some φ ∈ F^rec identifies ℒ if and only if
some Socratic φ ∈ F^rec identifies ℒ. (Hint: Use the recursion theorem, lemma 2.3A.)
Philosophize about all this.
3 Learning Theory and Natural Language

We interrupt the formal development of learning theory in order to moti-
vate the technicalities that follow. Specifically, we attempt to locate learning-
theoretic considerations in the context of theories of the human language
faculty. Toward this end, section 3.1 presents the perspective that animates
this book; it derives from Chomsky (1975) and Wexler and Culicover (1980,
ch. 2). Section 3.2 examines several issues that complicate the use of learning
theory in linguistics.

3.1 Comparative Grammar

Comparative grammar is the attempt to characterize the class of (biologi-


cally possible) natural languages through formal specification of their
grammars; a theory of comparative grammar is such a specification of some
definite collection of languages. Contemporary theories of comparative
grammar begin with Chomsky (e.g., 1957, 1965), but there are several
different proposals currently under investigation.
Theories of comparative grammar stand in an intimate relation to
theories of linguistic development. If anything is certain about natural
language, it is this: children can master any natural language in a few years'
time on the basis of rather casual and unsystematic exposure to it. This
fundamental property of natural language can be formulated as a necessary
condition on theories of comparative grammar: such a theory is true only if
it embraces a collection of languages that is learnable by children.
For this necessary condition to be useful, however, it must be possible to
determine whether given collections of languages are learnable by children.
How can this information be acquired? Direct experimental approaches are
ruled out for obvious reasons. Investigation of existing natural languages is
indispensable, since such languages have already been shown to be learn-
able by children; as revealed by recent studies, much knowledge can be
gained by examining even a modest number of languages. We might hope
for additional information about learnable languages from the study of
children acquiring a first language. Indeed, many relevant findings have
emerged from child language research. For example, the child's linguistic
environment appears to be devoid of explicit information about the non-
sentences of her language (see section 1.3.3). As another example, the rules
in a child's immature grammar are not simply a subset of the rules of the
adult grammar but appear instead to incorporate distinctive rules that will
be abandoned later.

However, such findings do not directly condition theories of comparative


grammar. They do not by themselves reveal whether some particular class
of languages is accessible to children, nor whether some other particular
class lies beyond the limits of child learning. Learning theory may be
conceived as an attempt to provide the inferential link between the results of
acquisitional studies and theories of comparative grammar. It undertakes
to translate empirical findings about language acquisition into information
about the kinds of languages accessible to young children. Such informa-
tion in turn can be used to evaluate theories of comparative grammar.
To fulfill its inferential role, learning theory provides precise construals
of concepts generally left informal in studies of child language, notably the
four concepts of Section 1.1 as well as the criterion of successful acquisition
to which children are thought to conform. Each such specification con-
stitutes a distinctive learning paradigm, as discussed in Section 1.1. The
scientifically interesting paradigms are those that best represent the circum-
stances of actual linguistic development in children. The deductive conse-
quences of such paradigms yield information about the class of possible
natural languages. Such information in turn imposes constraints on
theories of comparative grammar.
To illustrate, the identification paradigm represents languages as r.e. sets
and environments as texts; children are credited with the ability to identify
any text for any natural language. If normal linguistic development is
correctly construed as a species of identification, then proposition 2.2A
yields nonvacuous constraints on theories of comparative grammar; no
such theory, for example, could admit as natural some infinite and all finite
languages.
Unfortunately identification is far from adequate as a representation of
normal linguistic development. Children's linguistic environments, for
example, are probably not arbitrary texts for the target language: on the one
hand, texts do not allow for the grammatical omissions and ungrammatical
intrusions that likely characterize real environments; on the other hand,
many texts constitute bizarre orderings of sentences, orderings that are
unlikely to participate in normal language acquisition. In addition the
identification paradigm provides no information about the special charac-
ter of the child's learning function. To claim that this latter function is some
member of F is to say essentially nothing at all. Even the criterion of
successful learning is open to question because linguistic development does
not always culminate in the perfectly accurate, perfectly stable grammar
envisioned in the definition of identification.
The defects in the identification paradigm can be remedied only in light of
detailed information about children's linguistic development. For the most
part, the needed information seems not to be currently available. Conse-
quently we shall not propose a specific model of language acquisition.
Rather, the chapters that follow survey a variety of learning paradigms of
varying relevance to comparative grammar. The survey, it may be hoped,
will suggest questions about linguistic development whose answers can be
converted into useful constraints on theories of comparative grammar.
Our survey of learning paradigms occupies parts II and III of this book.
Before turning to it, we discuss some potential difficulties associated with
the research program just described.

3.2 Learning Theory and Linguistic Development

3.2.1 How Many Grammars for the Young Child?

If φ ∈ F identifies t ∈ 𝒯, then φ is defined on t (see definition 1.4.1A); thus
φ(t̄_n)↓ for all n ∈ N. This feature of identification will be carried forward
through almost all of the paradigms to be studied in this book. Yet it is easy
to imagine that newborn infants do not form grammars in response to the
first sentence they hear (perhaps: "It's a boy!"); similarly, bona fide gram-
mars might be lacking during early stages of linguistic production. The
empirical interest oflearning theory might seem to be compromised by this
possibility.
To respond to this problem, we may adopt a new convention concerning
indexes. According to the new convention all indexes are increased by 1,
leaving the number 0 without an associated grammar. Zero may then be
used to represent any output that does not constitute a grammar. Then for
n EN, <p(n) = 0 implies <p(n)!, as before. Plainly, 2? <;:::: RE is identifiable if
and only if 2' is identifiable under the new convention. The result is that
identification of a text t need not be compromised by the failure to conjec-
ture a grammar at early stages of t.
In similar fashion it is possible to envision the following possibility.
Children may respond to linguistic input not with one grammar but with a
finite array of grammars, each associated with some (rational) subjective
probability. To represent this possibility, the numbers put out by learning
functions can be interpreted not as r.e. indexes but as codes for such finite
arrays, since finite arrays of the sort envisioned are readily coded as single
natural numbers. On the other hand, we might simply choose as the child's
"official" conjecture at a given moment the grammar assigned highest
subjective probability at that moment.
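One way to realize such a coding, offered here only as an illustrative sketch (the text does not fix any particular coding), iterates a pairing function over an array of (grammar index, rational probability) pairs and also extracts the "official" conjecture:

```python
from fractions import Fraction

def pair(x, y):
    # Cantor pairing: a bijection between N x N and N.
    return (x + y) * (x + y + 1) // 2 + y

def code_array(array):
    # Code [(grammar_index, Fraction p), ...] as one natural number;
    # the +1 offset reserves 0 for the empty array.
    code = 0
    for g, p in reversed(array):
        entry = pair(g, pair(p.numerator, p.denominator))
        code = pair(entry, code) + 1
    return code

def official_conjecture(array):
    # The "official" conjecture: the grammar assigned highest
    # subjective probability.
    return max(array, key=lambda gp: gp[1])[0]

array = [(12, Fraction(1, 4)), (7, Fraction(1, 2)), (31, Fraction(1, 4))]
assert official_conjecture(array) == 7
assert code_array([]) == 0
```

Since the pairing is a bijection, the coding is invertible, so nothing is lost by treating the learner's output as a single natural number.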
Consider next children growing up in multilingual environments. Such
children simultaneously master more than one language and hence convert
their current linguistic input into more than one, noncompeting gram-
matical hypothesis. To represent this situation, we must assume that inputs
from different languages are segregated by the child prior to grammatical
analysis (perhaps by superficial characteristics of the wave form or the
speaker). Linguistic development may then be conceived as the simulta-
neous application of the same learning function to texts for different
languages.
Clearly the general framework of learning theory can be adapted to a
wide variety of empirical demands of the kind just considered. Consequently,
in the sequel we shall not pause to refine our models in these directions;
specifically, we shall continue to treat conjectures straightforwardly as
(single) r.e. indexes.

3.2.2 Are the Child's Conjectures a Function of Linguistic Input?

As discussed in section 1.3.4, learning functions are conceived as mappings


from finite linguistic corpora (represented as members of SEQ) into gram-
matical hypotheses. It is possible, however, that children's linguistic conjec-
tures depend on more than their linguistic input; that is, the same finite
corpus might lead to different conjectures by the same child depending on
such nonlinguistic inputs as the physical affection afforded the child that
day or the amount of incident sunlight. Put another way, children may not
implement any function from finite linguistic corpora into grammatical
hypotheses; rather, the domain of the function that produces children's
linguistic conjectures might include nonlinguistic elements.
This issue must not be confused with the problem of individual dif-
ferences. It is possible that different children implement distinct learning
functions, but the present question concerns the nature of a single child's
function. We shall in fact proceed on the assumption that children are more
or less identically endowed with respect to first language acquisition.
The present issue is also independent of the possibility that the child's
learning function undergoes maturational change. To see this, let ψ ∈ F be
considered the maturational successor to φ ∈ F, and let ψ begin its opera-
tion at the nth moment of childhood. Then the child may be thought of as
implementing the single function θ ∈ F such that for all σ ∈ SEQ,

θ(σ) = φ(σ), if lh(σ) < n,
     = ψ(σ), otherwise.

θ is a function of linguistic input if φ and ψ are such functions. This schema
may be refined in several ways, and any number of maturational changes
may be envisioned.
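The schema for θ can be written out directly. In the sketch below, φ and ψ are arbitrary stand-in learners (the "mature" one borrows the min-of-range learner of section 2.3 purely for illustration):

```python
def make_theta(phi, psi, n):
    # The maturational schema: theta behaves as phi on inputs of
    # length < n and as its successor psi afterward.
    def theta(sigma):
        return phi(sigma) if len(sigma) < n else psi(sigma)
    return theta

phi = lambda sigma: 0           # the "infant" learner: a fixed conjecture
psi = lambda sigma: min(sigma)  # the "mature" learner, for illustration

theta = make_theta(phi, psi, n=3)
text = [9, 7, 5, 11]
conjectures = [theta(text[:k + 1]) for k in range(len(text))]
assert conjectures == [0, 0, 5, 5]
```

Because θ consults only its input sequence, the maturational change is absorbed into a single learning function, as the text observes.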
Finally, the problem of nonlinguistic inputs to the learning function is
not the same as the problem of utterance context. As noted in section 1.3.1,
any finite aspect of context may be built into the representation of a
sentence. What is at issue here, in contrast, are inputs that play no evident
communicative role, such as the child's diet or interaction with pets.
The possibility that the child's grammatical hypotheses are a function of
more than just linguistic input can be accommodated in a straightforward
way. Specifically, the interpretation of SEQ can be extended to allow both
sentences and other kinds of inputs to figure in the finite sequences pre-
sented to learning functions. Such extension would require a compensatory
change in the definition of successful learning since convergence on a text t
to rng(t) would no longer be appropriate; rather, success would consist in
convergence to the linguistic subset of rng(t).
In practice, such amended definitions seem unmotivated since there is no
available information about the role of nonlinguistic inputs in children's
grammatical hypotheses, if indeed there is any such role. As a consequence
learning theory has developed under the assumption (usually tacit) that the
only inputs worth worrying about are linguistic. We shall follow suit.
3.2.3 What Is a Natural Language?
Comparative grammar aims at an illuminating characterization of the class
of natural languages. But what independent characterization of this latter
class gives content to particular theories of comparative grammar? The
question may be put this way: What is a natural language, other than that
which is characterized by a true theory of comparative grammar?
Inevitably considerations of learnability enter into any "pretheoretical"
specification of the natural languages. Even if we revert to the partly
ostensive definition "The natural languages are English, Japanese, Russian,
and other languages like those," the italicized expression must bear on the
ease of language acquisition if the resulting concept is to have much interest
for linguistics. The following formula thus suggests itself:

A highly expressive linguistic system is natural just in case it can be easily


acquired by normal human infants in the linguistic environments typically
afforded the young.

The role of the qualification "highly expressive" in the foregoing formula is


discussed in section 7.1, so we do not consider the matter here. Rather, we
examine the remaining concepts, beginning with "normal human infant."
What content can be given to the concept of a normal infant that does not
render the preceding formula a tautology? Plainly it is no help to qualify a
child as "normal" just in case he or she is capable of acquiring natural
language (easily and in a typical environment). It is equally useless to
appeal to majority criteria such as: a language is natural just in case a
majority of the world's actual children can acquire it (easily, etc.). The
reason is that the world's actual children might all have accidental pro-
perties (e.g., the same subtle infection), rendering them inappropriate as
the intended standard. What was wanted were normal children, not the
possibly unlucky sample actually at hand.
It is tempting here to invoke neurological considerations by stipulating
that a child is normal just in case his or her brain meets certain neurophysi-
ological conditions laid down by some successful (and future) neurophysi-
ological theory. The difficulty with this suggestion is that the choice of such
neurological conditions must depend partly on information about the
normal linguistic capacities of the newborn, for a brain cannot be judged
normal if it is incapable of performing the tasks normally assigned to it. And
of course invocation of normal capacities leads back to our starting point.
Quite similar problems arise if we attempt to identify normal children with
those children implementing the "human" learning function (or a "normal"
learning function).
Consider next the concept "typical linguistic environment." Majoritarian
construals of this idea are ruled out for reasons similar to before. Rather,
"typical" must be read as "normal" or "natural." It is of course unhelpful to
stipulate that an environment is natural just in case it allows (normal)
children to acquire (easily) a natural language. Nor is it admissible to
characterize the natural languages as those acquirable (easily, etc.) in some
environment or other, for in that case the notion of natural language will
vary with our ability to imagine increasingly exotic environments (e.g.,
environments that modify the brain in "abnormal" ways). We leave it to the
reader to formulate parallel concerns with respect to the concept of "easy
acquisition."
None of this discussion is intended to suggest that comparative grammar
suffers from unique conceptual problems foreign to other sciences. As in
other sciences, we must hope for gradual and simultaneous clarification of
all the concepts in play. Thus examination of central cases of natural
language will constrain our conjectures about the human learning function,
which can then be expected to sharpen questions about environments,
criteria of successful learning, and, eventually, natural language itself. As in
other sciences a natural language will eventually be construed in the terms
offered by the most interesting linguistic theory. Within this perspective
learning theory may be understood as the study ofthe deductive constraints
that bind together the various concepts discussed earlier. These concepts
are thus in no worse shape than comparable concepts in other emerging
sciences. Our discussion is intended to show only that they are not in much
better shape either.

3.2.4 Idealization

Texts are infinitely long, and convergence takes forever. These features of
identification will be generalized to all the paradigms discussed in this
book. However, language acquisition is a finite affair, so learning theory (at
least as developed here) might seem from the outset to have little bearing on
linguistic development and comparative grammar.
Two replies to this objection may be considered. First, although conver-
gence is an infinite process, the onset of convergence occurs only finitely far
into an identified text. What is termed "language acquisition" may be taken
to be the acquisition of a grammar that is accurate and stable in the face of
new inputs from the linguistic environment; such a state is reached at the
onset of convergence, not at the end. Moreover, although it is true that
identification places no bound on the time to convergence, we shall later
consider paradigms that do begin to approximate the constraints on time
and space under which the acquisition of natural language actually takes
place. Further development of the theory in this direction may be possible
as more information about children becomes available.
This first reply notwithstanding, convergence involves grammatical sta-
bility over infinitely many inputs, and such ideal behavior may seem
removed from the reality of linguistic development. We therefore reply,
second, that learning theory is best interpreted as relevant to the design of a
language acquisition system, not to the resources (either spatial or
temporal) made available to the system that implements that design.
Analogously, a computer implementing a standard multiplication algorithm is
limited to a finite class of calculations whereas the algorithm itself is
designed to determine products of arbitrary size. In this light, consider the
learning function φ of the three-year-old child. However mortal the child, φ
is timeless and eternal, forever three years old in design. Various questions
can be raised about φ, for example: What class of languages does φ identify?
If comparative grammar is cast as the study of the design of the human
language faculty-as abstracted from various features of its implementation
-then such questions are central to linguistic theory.
Evidently, the foregoing argument presupposes that a design-implementation
distinction can be motivated in the case of human cognitive capacity. Now
Kripke (1982), in an exegesis of Wittgenstein (1953), has offered apparently
persuasive arguments against the coherence of the predicate "nervous system
(or mind) ... represents rule ...." If the latter predicate is indeed
incoherent, then not much can be made of the program-hardware distinction
invoked above.
We decline the present opportunity to examine Kripke's argument in
detail. The issue, after all, is quite general since it bears on all repre-
sentational theories in cognitive science, in the sense of Fodor (1976). We
note only that Kripke's challenge must eventually be faced if cognitive
science, and learning theory in particular, are to rest on firm conceptual
foundations.
Having done no more than raise some of the conceptual and philosoph-
ical complexities surrounding the application of learning theory to the
study of natural language, we now return to formal development of the
theory itself.
II IDENTIFICATION GENERALIZED

This part is devoted to a family of learning paradigms that results from
modifying the definitions proper to identification. Chapter 4 considers
alternative construals of "learner" that are narrower than the class ℱ of
all number-theoretic functions. Chapter 5 concerns the environments in
which learning takes place. Chapter 6 examines various construals of
"stability" and "accuracy" in the context of alternative criteria of successful
learning. Functions that learn neither too much nor too little are the topic
of chapter 7.
The family of models introduced in this part may be designated
generalized identification paradigms.
4 Strategies

4.1 Strategies as Sets of Learning Functions

To say that children implement a learning function is not to say much; a
vast array of possibilities remains. Greater informativeness in this regard
consists in locating human learners in proper subsets of ℱ.

DEFINITION 4.1A Subsets of ℱ are called (learning) strategies.

Strategies can be understood as empirical hypotheses about the limitations
on learning imposed by human nature. As such, the narrower a strategy, the
more interesting it is as a hypothesis.
Strategies may also be conceived as alternative interpretations of the
concept learner (see section 1.1). We leave intact for now the interpretations
of language, environment, and hypothesis proper to the identification
paradigm; similarly, identification (section 1.4) is the criterion of learning
relevant to the present chapter. Each strategy 𝒮 thus constitutes a distinct
learning paradigm. The identification paradigm results when 𝒮 = ℱ.

DEFINITION 4.1B Let 𝒮 ⊆ ℱ be given.

i. The class {ℒ ⊆ RE | some φ ∈ 𝒮 identifies ℒ} is denoted: [𝒮].
ii. The class {ℒ ⊆ RE_svt | some φ ∈ 𝒮 identifies ℒ} is denoted: [𝒮]_svt.

Thus [𝒮] is the family of all collections ℒ of languages such that some
learning function in the strategy 𝒮 identifies ℒ. [𝒮]_svt is just
[𝒮] ∩ 𝒫(RE_svt), that is, the family of all collections ℒ of total,
single-valued languages such that some learning function in the strategy 𝒮
identifies ℒ.
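The definitions can be animated by a small computational sketch (an editorial illustration, not part of the formal development: grammars are modeled as explicit finite sets rather than r.e. indices, so that correctness of a conjecture is mechanically checkable). The singleton set containing the learner below is itself a strategy in the sense of definition 4.1A.

```python
# Hypothetical toy model: a learner maps finite evidence sequences to
# "grammars", modeled here as frozensets instead of r.e. indices.

def finite_language_learner(sigma):
    """Conjecture a grammar for exactly the data seen so far."""
    return frozenset(sigma)

def converges_on(learner, text, language, horizon):
    """True if the learner's conjectures on prefixes of `text` stabilize,
    before `horizon`, to a correct grammar for `language`."""
    guesses = [learner(text[:n]) for n in range(1, horizon + 1)]
    final = guesses[-1]
    if final != frozenset(language):
        return False
    # stable: some tail of the guess sequence is constant
    return any(all(g == final for g in guesses[k:]) for k in range(horizon))

# A text for the finite language {2, 3, 5}: list its elements, then repeat.
L = {2, 3, 5}
t = [2, 3, 5] + [5] * 20
assert converges_on(finite_language_learner, t, L, len(t))
# The same learner does not converge correctly on a text for another language.
assert not converges_on(finite_language_learner, [2, 3] + [3] * 10, L, 12)
```

This learner identifies every finite language and no infinite one, mirroring example 4.1A, part c.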

Example 4.1A

a. [ℱ] is the family of all identifiable collections of languages. By proposition
2.2A(i), RE_fin ∪ {N} ∉ [ℱ]. Thus RE ∉ [ℱ]. Let ℒ be any finite collection of
languages. By exercise 1.4.3C, ℒ ∈ [ℱ].
b. [ℱ]_svt is the family of all identifiable collections of total, single-valued languages.
[ℱ]_svt = 𝒫(RE_svt), since RE_svt ∈ [ℱ]_svt by proposition 1.4.3C and every subset of an
identifiable collection of languages is identifiable. Let h be as in the proof of
proposition 1.4.3C. Then [ℱ]_svt = [{h}]_svt.
c. Let f ∈ ℱ be as defined in part a of example 1.3.4B. Then [{f}] = 𝒫(RE_fin).
d. The strategy of self-monitoring learning functions, {φ ∈ ℱ | φ is self-monitoring},
was discussed in section 1.5.2. By proposition 1.5.2A, the family of collections it
identifies is strictly included in [ℱ].

In this chapter we consider the inclusion relations between [𝒮] and [𝒮′]
as 𝒮 and 𝒮′ vary over learning strategies. Informally, we say that 𝒮
restricts 𝒮′ just in case [𝒮 ∩ 𝒮′] ⊂ [𝒮′]. If [𝒮] ⊂ [ℱ], then 𝒮 is said to
be restrictive. Similar terminology applies to [𝒮]_svt. One last notational
convention will be helpful.

DEFINITION 4.1C Let P be a property of learning functions. Then the set
{φ ∈ ℱ | P is true of φ} is denoted: ℱ^P.

Thus the set of recursive learning functions is denoted "ℱ^recursive," which we
will continue to write as "ℱ^rec."
All the strategies to be examined may be viewed as constraints of one
kind or another on the behavior of learning functions. Five kinds of
constraints are considered, corresponding to the five sections that follow.
Before turning to these constraints, we conclude this section with a general
fact about strategies.

PROPOSITION 4.1A Let 𝒮 be a denumerable subset of ℱ. Then [𝒮] ⊂ [ℱ].

Proof For each i ∈ N and each X ⊆ N, define L_{i,X} = {⟨i,x⟩ | x ∈ X}.
Now, if Q ⊆ N, define a collection of languages ℒ_Q by

ℒ_Q = {L_{i,N} | i ∈ Q} ∪ {L_{i,D} | i ∉ Q and D finite}.

Obviously, for every Q, ℒ_Q ∈ [ℱ].

Claim No φ ∈ ℱ identifies both ℒ_Q and ℒ_{Q′} for Q ≠ Q′.

Proof of claim Suppose that φ identifies ℒ_Q and i ∈ Q − Q′. Then, since φ
identifies ℒ_Q, φ identifies L_{i,N}. Let σ be a locking sequence for φ and L_{i,N}.
Then there is a finite set D such that rng(σ) ⊆ L_{i,D}. But then σ can be
extended to a text t for L_{i,D}. Since L_{i,D} is a subset of L_{i,N}, φ converges to an
index for L_{i,N} on t. Thus φ does not identify L_{i,D}. Since L_{i,D} ∈ ℒ_{Q′}, φ does
not identify ℒ_{Q′}.

It is easy to see that the claim implies the result of the proposition, since
there are nondenumerably many Q ⊆ N and each φ identifies at most one
of the classes ℒ_Q. □
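The cylinder languages L_{i,X} of the proof can be made concrete with the standard Cantor pairing function (an illustrative sketch; the function names and the explicit description of conjectures are ours, not the text's). For a fixed Q, the first datum of any text reveals i, after which the learner below behaves as the proof requires.

```python
def pair(i, x):
    """Cantor pairing: a recursive isomorphism N x N -> N, written <i, x>."""
    return (i + x) * (i + x + 1) // 2 + x

def unpair(n):
    """Inverse of pair: recover (i, x) from n."""
    s = 0
    while (s + 1) * (s + 2) // 2 <= n:
        s += 1
    x = n - s * (s + 1) // 2
    return s - x, x

def learner_for_Q(Q):
    """A learner identifying the collection L_Q of the proof: conjecture
    L_{i,N} when i is in Q, and L_{i,rng(sigma)} otherwise.  Conjectures
    are rendered as descriptive tuples rather than r.e. indices."""
    def phi(sigma):
        i, _ = unpair(sigma[0])
        if i in Q:
            return ('L', i, 'N')
        return ('L', i, frozenset(unpair(n)[1] for n in sigma))
    return phi

phi = learner_for_Q({2})
assert phi([pair(2, 0), pair(2, 7)]) == ('L', 2, 'N')
assert phi([pair(5, 1), pair(5, 4)]) == ('L', 5, frozenset({1, 4}))
```

Since each ℒ_Q requires its own such learner, and there are nondenumerably many Q, no countable strategy covers all the ℒ_Q; that is the heart of the proposition.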

Exercises

4.1A Let 𝒮 and 𝒮′ be learning strategies such that 𝒮 ⊆ 𝒮′.

a. Prove that [𝒮] ⊆ [𝒮′].
b. Show by example that [𝒮] = [𝒮′] is possible.

4.1B Evaluate the validity of the following claims. For learning strategies 𝒮, 𝒮′:

a. [𝒮 ∪ 𝒮′] = [𝒮] ∪ [𝒮′].
b. [𝒮 ∩ 𝒮′] ⊆ [𝒮] ∩ [𝒮′].
c. [𝒮] ∩ [𝒮′] ⊆ [𝒮 ∩ 𝒮′].
d. [ℱ − 𝒮] = [ℱ] − [𝒮].

4.1C Let φ ∈ ℱ and 𝒮 ⊆ ℱ be given.

a. What is the relation between ℒ(φ) and [{φ}]?
b. Prove that [𝒮] = {ℒ | ℒ ⊆ ℒ(φ) for some φ ∈ 𝒮}.

4.2 Computational Constraints

In this section we consider two attempts to specify learning strategies that


approximate human computational limitations.

4.2.1 Computability

One of the most popular hypotheses in cognitive science is that human
ratiocination can be simulated by computer. It is natural, then, to speculate
that children's learning functions are effectively calculable. The
corresponding strategy is ℱ^rec, the set of all partial and total recursive
functions (see section 1.2.1).
Since ℱ^rec constitutes a small fraction of ℱ, the computability strategy is
a nontrivial hypothesis about human learners. From the fact that ℱ^rec ⊂
ℱ, however, we cannot immediately conclude that ℱ^rec is restrictive (see
exercise 4.1A). For this latter result it suffices to observe that ℱ^rec is a
denumerable subset of ℱ, from which proposition 4.1A directly yields the
following.

PROPOSITION 4.2.1A [ℱ^rec] ⊂ [ℱ].

It will facilitate later developments to exhibit a specific collection of
languages that falls in [ℱ] − [ℱ^rec]. We proceed via a definition and three
lemmata.

DEFINITION 4.2.1A The set {x ∈ N | φ_x(x)↓} is denoted: K.

LEMMA 4.2.1A K ∈ RE, but K̄ ∉ RE.

Proof See Rogers (1967, sec. 5.2, theorem VI). □

LEMMA 4.2.1B {K ∪ {x} | x ∈ N} ∈ [ℱ].

Proof This follows from exercise 1.4.3D. □

LEMMA 4.2.1C {K ∪ {x} | x ∈ N} ∉ [ℱ^rec].


Proof Suppose, on the contrary, that some φ ∈ ℱ^rec identifies
{K ∪ {x} | x ∈ N}. Fix φ, and let σ be a locking sequence for φ and K. We will
show that K̄ is r.e., contradicting lemma 4.2.1A.
Let k_0, k_1, ..., be some fixed enumeration of K, and for every x define a
text t^x for K ∪ {x} by t^x = σ⌢x⌢k_0, k_1, .... Since σ is a locking sequence
for φ and K, φ(t̄^x_lh(σ)) = φ(σ) is an index for K for every x (writing t̄^x_n
for the initial segment of t^x of length n). Now, if x ∉ K, t^x is a text for
K ∪ {x}, which is not the same language as K. Thus, if x ∉ K, there
is an n > lh(σ) such that φ(t̄^x_n) is not an index for K, and hence φ(t̄^x_n) ≠
φ(t̄^x_lh(σ)). But, if x ∈ K, t^x is a text for K, and hence, since t̄^x_lh(σ) is a
locking sequence for K, φ(t̄^x_n) = φ(σ) for all n > lh(σ). Thus we have shown
that

(*) x ∈ K̄ if and only if there is n > lh(σ) such that φ(t̄^x_n) ≠ φ(σ).

Now it is easy to see from (*) that K̄ is r.e. To see this, note that t̄^x_n can be
constructed effectively from x and n and that the function

ψ(x) = least n > lh(σ) such that φ(t̄^x_n) ≠ φ(σ)

is therefore partial recursive with domain K̄. □


A fundamental result for RE_svt is stated in proposition 4.2.1B.

PROPOSITION 4.2.1B (Gold 1967) RE_svt ∉ [ℱ^rec]_svt.

Proof (from Gold 1967) Suppose that φ ∈ ℱ^rec identifies RE_svt. We will
construct an L ∈ RE_svt and a text t for L such that φ changes its mind
infinitely often on t. This means that φ does not identify L, so the hypothesis
that φ identifies RE_svt must be false. We will construct t in stages so that the
initial segment of t constructed by the end of stage s, σ^s, is equal to ⟨0,x_0⟩,
⟨1,x_1⟩, ..., ⟨n,x_n⟩ for some n. (We will also have that each x_i is equal to 0
or 1.) We rely on the following claim.
Strategies 49

Claim Given σ = ⟨0,x_0⟩, ⟨1,x_1⟩, ..., ⟨n,x_n⟩, there are numbers j and k
such that if τ = σ⌢⟨n+1,0⟩, ..., ⟨n+j,0⟩ and τ′ = τ⌢⟨n+j+1,1⟩,
..., ⟨n+j+k,1⟩, then φ(τ) ≠ φ(τ′).

Proof of claim The following is a text for a language L_0 ∈ RE_svt:
σ⌢⟨n+1,0⟩, ⟨n+2,0⟩, .... Thus there is a j such that if τ = σ⌢⟨n+1,0⟩,
..., ⟨n+j,0⟩, φ(τ) is an index for L_0. But the following is a text for another
language L_1 ∈ RE_svt: τ⌢⟨n+j+1,1⟩, ⟨n+j+2,1⟩, .... Therefore
there must be a number k such that if

τ′ = τ⌢⟨n+j+1,1⟩, ..., ⟨n+j+k,1⟩, φ(τ′) is an index for L_1.

Since L_0 ≠ L_1, φ(τ) ≠ φ(τ′), and j and k are our desired integers.

Now we construct t in stages.

Stage 0 σ^0 = ⟨0,0⟩.
Stage s + 1 Given σ^s, let j and k be as in the claim, using σ^s for σ. Define
σ^{s+1} to be the resultant τ′.

It is clear that t = ⋃_s σ^s is a text for some L ∈ RE_svt. However, φ does not
converge on t, since φ changes its value at least once at each stage s ∈ N. □
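Gold's stage construction can be watched in action against one particular computable learner (a hypothetical stand-in for φ, chosen for illustration; the theorem itself quantifies over every recursive learner, whereas the adversary below exploits only this learner's specific convergence behavior).

```python
def learner(sigma):
    """Toy learner for single-valued texts presented in order (n, x_n):
    conjecture "the observed bits, then the last bit repeated forever,"
    reported in canonical form (non-repeating prefix, eventual bit)."""
    bits = [x for _, x in sigma]
    b = bits[-1]
    core = bits[:]
    while core and core[-1] == b:
        core.pop()
    return (tuple(core), b)

def gold_adversary(learner, stages):
    """At each stage, extend the text so the learner's conjecture changes.
    For this learner a single datum flipping the eventual bit suffices."""
    sigma = [(0, 0)]
    history = [learner(sigma)]
    for _ in range(stages):
        b = history[-1][1]
        sigma.append((len(sigma), 1 - b))
        history.append(learner(sigma))
    return history

h = gold_adversary(learner, 10)
assert all(h[s] != h[s + 1] for s in range(10))  # one mind change per stage
```

In the proof proper, the numbers j and k first pad with the current value and then flip; for this particular learner, padding is absorbed into the conjectured tail, so j = 0 and k = 1 work at every stage.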

COROLLARY 4.2.1A [ℱ^rec]_svt ⊂ [ℱ]_svt.

Proof See proposition 1.4.3C. □

Exercises

4.2.1A
a. Prove that {K ∪ {x} | x ∈ K} ∈ [ℱ^rec].
b. Let L ∈ RE be recursive. Prove that {L ∪ D | D ⊆ N and D finite} ∈ [ℱ^rec].
c. Prove that {N} ∪ {D | D finite and D ⊆ K} ∪ {D | D finite and D ⊆ K̄} ∈ [ℱ^rec].
d. Prove that {N − {x} | x ∈ K} ∪ {N − {x,y} | x ≠ y and x, y ∈ K̄} ∈ [ℱ^rec].
e. Prove that {N − {x} | x ∈ K} ∪ {N − {x,y} | x ≠ y and x, y ∈ K} ∈ [ℱ^rec].
Compare exercise 2.2C.

4.2.1B For ℒ, ℒ′ ⊆ RE, define ℒ × ℒ′ as in exercise 1.4.3F. Prove that if
ℒ ∈ [ℱ^rec] and ℒ′ ∈ [ℱ^rec], then ℒ × ℒ′ ∈ [ℱ^rec].

*4.2.1C Prove: Let ℒ ∈ [ℱ^rec]_svt. Then there is φ ∈ ℱ^rec such that (a) φ identifies
ℒ, and (b) for all L ∈ ℒ, there is i ∈ N such that for all texts t for L, φ converges on t to
i. (Hint: Fix φ′ ∈ ℱ^rec which identifies ℒ. Define φ ∈ ℱ^rec which uses φ′ to compute
its guesses. φ rearranges the incoming text and feeds the rearranged text to φ′.)
Compare section 4.6.3.

4.2.1D Let ℒ ∈ [ℱ^rec]_svt be given. Show that there is ℒ′ ∈ [ℱ^rec]_svt such that
ℒ ⊂ ℒ′. (Hint: Let φ ∈ ℱ^rec identify ℒ ⊆ RE_svt. Use the proof of proposition
4.2.1B to construct L ∉ ℒ and φ′ ∈ ℱ^rec which identifies ℒ ∪ {L}.) Compare this
result to exercise 2.2E.

4.2.1E Prove that RE_sd ∈ [ℱ^rec]. (For RE_sd, see definition 2.3B.)

4.2.1F For 𝒮 ⊆ ℱ, let [𝒮]_rec = [𝒮] ∩ 𝒫(RE_rec). Prove that [ℱ^rec]_rec ⊂ [ℱ]_rec.
(Hint: Use corollary 4.2.1A and exercise 1.2.2B.)

4.2.1G Let RE_finK = {L ∈ RE_fin | L ∩ K ≠ ∅}. Prove that {K} ∪ RE_finK ∉ [ℱ^rec].

4.2.1H Let RE_seg = {{0, 1, 2, ..., n} | n ∈ N}. RE_seg thus consists of the initial
segments of N. Prove:

a. Let ℒ ∈ [ℱ] be given. Then ℒ ∪ RE_seg ∈ [ℱ] if and only if N ∉ ℒ.
b. Let ℒ ∈ [ℱ^rec] be given. Then ℒ ∪ RE_seg ∈ [ℱ^rec] if and only if N ∉ ℒ.

4.2.1I Let n ∈ N be given. A total recursive function f is called almost everywhere n
just in case for all but finitely many i ∈ N, f(i) = n. Let ℒ = {L | for some total
recursive function f and for some n ∈ N, f is almost everywhere n and L represents
f}. Show that some φ ∈ ℱ^rec identifies ℒ. (Compare proposition 4.5.3B.)

*4.2.1J Prove that {{⟨0,x⟩} ∪ {⟨1,y⟩} ∪ {⟨2,z⟩} ∪ {3} × K | at least two-thirds
of {x, y, z} are indexes for K} ∈ [ℱ^rec].

4.2.2 Time Bounds


Children do not effect computations of arbitrary complexity, so we are led
to examine computationally limited subsets of ℱ^rec. The following
definition is central to this enterprise.

DEFINITION 4.2.2A (Blum 1967) A listing Φ_0, Φ_1, ..., of partial recursive
functions is called a computational complexity measure (relative to our fixed
acceptable indexing of ℱ^rec) just in case it satisfies the following two
conditions:

i. For all i, x ∈ N, φ_i(x)↓ if and only if Φ_i(x)↓.
ii. The set {⟨i,x,y⟩ | Φ_i(x) ≤ y} is recursive.

To exemplify this definition, suppose that ℱ^rec is indexed by associated
Turing machines (see section 1.2.1). Then Φ_i may be thought of as the
function that counts the steps required in running the ith Turing machine;
specifically, for i, x, y ∈ N, Φ_i(x) = y just in case the ith Turing machine halts
in exactly y steps when started with input x. Condition i of the definition

requires that Φ_i(x) be undefined just in case the ith Turing machine never
halts on x. Condition ii requires that it be possible to determine effectively
whether the ith Turing machine halts on x within y steps. Both requirements
are satisfied by the suggested interpretation of Φ_i. Moreover, it
appears that any reasonable measure of the resources required for a
computation must also conform to these conditions.
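The two conditions can be sketched in a toy indexing (assumed purely for illustration: each "machine" is given outright by its output function and its step-cost function, with cost None on inputs where the machine diverges, so condition i holds by construction).

```python
# Toy machine i is a pair (output_i, steps_i); steps_i(x) is None exactly
# when machine i fails to halt on x.
MACHINES = [
    (lambda x: x + 1, lambda x: x + 3),                          # always halts
    (lambda x: 2 * x, lambda x: x + 1 if x % 2 == 0 else None),  # halts on evens
]

def phi(i, x, budget):
    """Run machine i on x for at most `budget` steps; None if no convergence."""
    steps = MACHINES[i][1](x)
    if steps is None or steps > budget:
        return None
    return MACHINES[i][0](x)

def bounded_halts(i, x, y):
    """Condition ii: decide Phi_i(x) <= y without any unbounded search."""
    steps = MACHINES[i][1](x)
    return steps is not None and steps <= y

assert bounded_halts(0, 5, 8) and not bounded_halts(0, 5, 7)  # Phi_0(5) = 8
assert not bounded_halts(1, 3, 10 ** 6)  # machine 1 diverges on odd input
assert phi(1, 4, 5) == 8                 # converges: Phi_1(4) = 5 <= 5
```

Genuine measures (Turing-machine step counts, space usage) satisfy the same two conditions; as the text notes, nothing in the development depends on which measure is chosen.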
As with acceptable indexings, none of our results depends on the choice of
computational complexity measure. Indeed, any two computational
complexity measures can be shown, in a satisfying sense, to yield similar
estimates of the resources required for a computation (see Machtey and
Young 1978, theorem 5.2.4). Let a fixed computational complexity measure
now be selected; reference to the functions Φ_0, Φ_1, ..., should henceforth be
understood accordingly.
These preliminaries allow us to define the following class of strategies.

DEFINITION 4.2.2B Let h ∈ ℱ^rec be total. ψ ∈ ℱ^rec is said to run in h-time
just in case ψ is total and there is i ∈ N such that (i) φ_i = ψ, and (ii)
Φ_i(x) ≤ h(x) for all but finitely many x ∈ N. The subset of ℱ^rec that runs in
h-time is denoted ℱ^h-time.

Note that for any total h ∈ ℱ^rec, ℱ^h-time consists exclusively of total
recursive functions.
Intuitively, a learning function in ℱ^h-time can be programmed to respond
to finite sequences σ within h(σ) steps of operation (recall from section 1.3.4
that "σ" in "h(σ)" denotes the number that codes σ). The strategy ℱ^h-time
corresponds to the hypothesis that children deploy limited resources in
formulating grammars on the basis of finite corpora. The limitation is given
by h.
Does ℱ^h-time restrict ℱ^rec regardless of the choice of total recursive
function h? The following result suggests an affirmative answer.

LEMMA 4.2.2A (Blum 1967a) For every total h ∈ ℱ^rec there is recursive
L ∈ RE such that no characteristic function for L runs in h-time.

Proof See Machtey and Young (1978, proposition 5.2.9). □

Contrary to expectation, however, the next proposition shows that for
some total h ∈ ℱ^rec, ℱ^h-time does not restrict ℱ^rec.

PROPOSITION 4.2.2A There is total h ∈ ℱ^rec such that [ℱ^h-time] = [ℱ^rec].

The proof of proposition 4.2.2A will be facilitated by a lemma and a
definition. The lemma is also of independent interest.

LEMMA 4.2.2B There is total f ∈ ℱ^rec such that for all i ∈ N, (i) φ_{f(i)} is
total recursive, and (ii) for all L ∈ RE, if φ_i identifies L, then φ_{f(i)}
identifies L.

Proof of the lemma Given i, we would like to define φ_{f(i)} so that φ_{f(i)}
identifies at least as many languages as φ_i but φ_{f(i)} is total. Thus we would
like φ_{f(i)}(σ) to simulate φ_i(σ) but not to wait forever if φ_i(σ) doesn't
converge. Therefore on input σ we will only allow φ_{f(i)} to wait lh(σ) many
steps for φ_i to converge. Now, φ_i(σ) may not converge in lh(σ) many steps
for any σ, but, if φ_i(σ) converges, there is a k such that φ_i(σ) converges in k
steps. Thus, in defining φ_{f(i)}(σ), we will allow the simulation of φ_i to "fall
back on the text," that is, to compute only φ_i(θ) for some initial segment θ
of σ. Precisely, define

φ_{f(i)}(σ) = φ_i(θ), where θ is the longest initial segment of σ such that
Φ_i(θ) ≤ lh(σ), if such a θ exists;
φ_{f(i)}(σ) = 0, otherwise.

φ_{f(i)} is a total recursive function for every i. The condition defining θ can
be checked recursively, since we have bounded the waiting time by lh(σ).
To see that φ_{f(i)} identifies any language L that φ_i identifies, let t be a text
for such an L. Then there is an n ∈ N and an index j for L such that for all
m ≥ n, φ_i(t̄_m) = j. Let s = Φ_i(t̄_n). Then by the definition of φ_{f(i)}, if
m > max(s, n), φ_{f(i)}(t̄_m) = φ_i(t̄_k) for some k ≥ n. Thus φ_{f(i)} converges
on t to j. □
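The "fall back on the text" trick admits a direct sketch (hypothetical setting, with made-up step costs: a partial learner is modeled as a function returning either None, for divergence, or a pair (conjecture, cost of the computation)).

```python
def partial_learner(sigma):
    """A made-up partial learner: diverges whenever 13 occurs in sigma;
    otherwise returns (conjecture, step cost)."""
    if 13 in sigma:
        return None
    return (frozenset(sigma), len(sigma))

def totalize(partial, default=frozenset()):
    """Total version: simulate `partial` on initial segments of sigma,
    allowing len(sigma) steps, and answer as it does on the longest
    segment whose simulation finishes in time (cf. phi_f(i) above)."""
    def total(sigma):
        for k in range(len(sigma), -1, -1):
            out = partial(sigma[:k])
            if out is not None and out[1] <= len(sigma):
                return out[0]
        return default
    return total

total = totalize(partial_learner)
assert total((1, 2, 2, 1)) == frozenset({1, 2})  # agrees where defined
assert total((1, 13, 2)) == frozenset({1})       # falls back on the text
```

As in the lemma, if the partial learner converges with bounded cost on every prefix of a text, the total version eventually reproduces its limiting conjecture, with at most a finite lag.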

Proof of proposition 4.2.2A Let f be as in the statement of lemma 4.2.2B.
Define

h(x) = max{Φ_{f(i)}(j) | i, j ≤ x}.

h is a total recursive function, since each function Φ_{f(i)} is total.
Now suppose that ℒ ∈ [ℱ^rec]. Let φ_i ∈ ℱ^rec identify ℒ. Then by the
lemma, φ_{f(i)} identifies ℒ. But by the definition of h, for all j ≥ i,
Φ_{f(i)}(j) ≤ h(j). Thus φ_{f(i)} runs in h-time. This implies that
ℒ ∈ [ℱ^h-time]. □

Exercise

4.2.2A Let h ∈ ℱ^rec be total. Let ℒ_h = {L ∈ RE_svt | L represents a function in
ℱ^h-time}. Show that for some total g ∈ ℱ^rec, ℒ_h ∈ [ℱ^g-time].

4.2.3 On the Interest of Nonrecursive Learning Functions

Why study strategies that are not subsets of ℱ^rec? For those convinced that
human intellectual capacities are computer simulable, nonrecursive learning
functions might seem to be of scant empirical interest. Many of the
strategies we consider are in fact subsets of ℱ^rec.
Nonrecursive learning functions will continue, however, to figure
prominently in our discussion. The reason for this is not simply the lack of
persuasive argumentation in favor of the view that human mentality is
machine simulable. More important, consideration of nonrecursive learning
functions often clarifies the respective roles of computational and
information-theoretic factors in nonlearnability phenomena. To see what
is at issue, compare the collections ℒ = {N} ∪ RE_fin and ℒ′ = {K ∪
{x} | x ∈ N}. By proposition 2.2A(i) and lemma 4.2.1C, respectively, no φ ∈
ℱ^rec identifies either collection. However, the reasons for the
unidentifiability differ in the two cases. On the one hand, ℒ′ presents a
recursive learning function with an insurmountable computational problem,
whereas the computational structure of ℒ is trivial. On the other hand, ℒ
presents the learner with an insurmountable informational problem; that is,
no σ ∈ SEQ allows the finite and infinite cases to be distinguished (cf.
proposition 2.4A). In contrast, no such informational problem exists for ℒ′;
the available information simply cannot be put to use by a recursive
learning function.
The results to be presented concerning nonrecursive learning functions
may all be interpreted from this information-theoretic point of view.

4.3 Constraints on Potential Conjectures

Let ℒ ⊆ RE be identifiable, and let σ ∈ SEQ and i ∈ N be given. From
exercise 1.5.1A we see that some φ ∈ ℱ such that φ(σ) = i identifies ℒ. Put
differently, from the premise that φ ∈ ℱ identifies ℒ ⊆ RE, no information
may be deduced about φ(σ) for any σ ∈ SEQ, except that φ(σ)↓ if σ is drawn
from a language in ℒ. In this section we consider the effects on
identification of constraining in various ways the learner's potential
response to evidential states.
4.3.1 Totality
The most elementary constraint on a conjecture is that it exist. The
corresponding strategy is the set of total learning functions, denoted
ℱ^total. From part a of exercise 1.4.3G we have proposition 4.3.1A.

PROPOSITION 4.3.1A [ℱ^total] = [ℱ].

Similarly, directly from lemma 4.2.2B we obtain proposition 4.3.1B.

PROPOSITION 4.3.1B [ℱ^rec] = [ℱ^rec ∩ ℱ^total].

Thus totality restricts neither ℱ nor ℱ^rec.

4.3.2 Nontriviality

Linguists rightly emphasize the infinite quality of natural languages. No
natural language, it appears, includes a longest sentence. If this universal
feature of natural language corresponds to an innate constraint on
children's linguistic hypotheses, then children would be barred from
conjecturing a grammar for a finite language. Such a constraint on potential
conjectures amounts to a strategy.

DEFINITION 4.3.2A φ ∈ ℱ is called nontrivial just in case for all σ ∈ SEQ,
W_{φ(σ)} is infinite.

Thus the strategy of nontriviality contains just those φ ∈ ℱ such that φ
never conjectures an index for a finite language. Note that nontrivial
learners are total. The learning function g defined in the proof of
proposition 1.4.3B is nontrivial.
Obviously, nontriviality is restrictive: finite languages cannot be
identified without conjecturing indexes for them. Of more interest is the
relation of nontriviality to the identification of infinite languages. The next
proposition shows that nontriviality imposes limits on the recursive learning
functions in this respect; that is, some collections of infinite languages are
identifiable by recursive learning function but not by nontrivial, recursive
learning function.

PROPOSITION 4.3.2A There is ℒ ⊆ RE such that (i) every L ∈ ℒ is infinite,
and (ii) ℒ ∈ [ℱ^rec] − [ℱ^rec ∩ ℱ^nontrivial].
To prove the proposition, a definition and lemma are helpful.

DEFINITION 4.3.2B ℒ ⊆ RE is said to be r.e. indexable just in case there is
S ∈ RE such that ℒ = {W_i | i ∈ S}; in this case S is said to be an r.e. index
set for ℒ.

Thus ℒ ⊆ RE is r.e. indexable just in case there is an r.e. set S such that for
all L ∈ RE, L ∈ ℒ if and only if L = W_i for some i ∈ S (S is not required to
contain every index for L).

LEMMA 4.3.2A RE − RE_fin is not r.e. indexable.

Proof of the lemma Let S be an r.e. set of indexes for infinite sets. Let e_0,
e_1, ..., be a recursive enumeration of S. We show how to enumerate an
infinite r.e. set A such that no index for A is in S. We enumerate A in stages.

Stage 0: Enumerate W_{e_0} until an x_0 appears in W_{e_0}. Enumerate 0, 1, ...,
x_0 − 1 into A.
Stage s + 1: Enumerate W_{e_{s+1}} until an x_{s+1} appears in W_{e_{s+1}} with
x_{s+1} > x_s + 1. Such an x_{s+1} exists, since W_{e_{s+1}} is infinite.
Enumerate x_s + 1, ..., x_{s+1} − 1 into A.

A is infinite, since at least one integer, x_s + 1, is enumerated into A at each
stage s + 1. A ≠ W_{e_s} for each s, since x_s ∈ W_{e_s} but x_s ∉ A. □
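The stage construction can be simulated when the sets W_{e_s} are supplied as generators (an assumption for illustration only; a slightly uniform variant is used in which every stage, including stage 0, demands a witness above the previous threshold, so each stage contributes at least one element to A).

```python
import itertools

def build_A(enumerators, stages):
    """Diagonalize against the given infinite sets: at stage s, find a
    witness x_s in the s-th set with x_s > x_{s-1} + 1, and enumerate the
    numbers strictly between into A; x_s itself is kept out of A forever."""
    A, prev, witnesses = [], -1, []
    for s in range(stages):
        x = next(v for v in enumerators[s] if v > prev + 1)
        A.extend(range(prev + 1, x))
        witnesses.append(x)
        prev = x
    return A, witnesses

evens = (2 * n for n in itertools.count())
squares = (n * n for n in itertools.count(1))
more_evens = (2 * n for n in itertools.count())
A, w = build_A([evens, squares, more_evens], 3)
assert A == [0, 1, 3, 5] and w == [2, 4, 6]
assert all(x not in A for x in w)  # A differs from each listed set at x_s
```

Run with more stages and more enumerators, A grows without bound while avoiding every listed set, exactly as in the lemma.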

Proof of proposition 4.3.2A Recall that we have fixed a recursive
isomorphism between N² and N, the image of a pair (x,y) being denoted by
⟨x,y⟩. Recall also that π_1 and π_2 are the recursive component functions
defined by π_1(⟨x,y⟩) = x and π_2(⟨x,y⟩) = y (see section 1.2.1).
Define for each i ∈ N, L_i = {⟨i,x⟩ | x ∈ W_i}. Let ℒ = {L_i | W_i is infinite}.
Obviously every language in ℒ is infinite. To show that ℒ ∈ [ℱ^rec], define
h(σ) = π_1(σ_0) for every σ ∈ SEQ, and choose f ∈ ℱ^rec such that for all
i ∈ N, f(i) is an index for {i} × W_i. Then f ∘ h identifies ℒ; indeed, f ∘ h
identifies L_i for every i ∈ N.
Suppose, however, that φ ∈ ℱ^rec ∩ ℱ^nontrivial. We show that φ does not
identify ℒ. Let

S = {i | there is a sequence σ such that φ(σ) = i}.

For any recursive function φ, S defined in this way is r.e. Since φ is
nontrivial, S contains only indexes for infinite sets.

Claim There is a recursive function g such that for every i ∈ N,

1. W_i infinite implies W_{g(i)} infinite,
2. W_i ∈ ℒ implies W_{g(i)} = {π_2(⟨x,y⟩) | ⟨x,y⟩ ∈ W_i}.

Proof of claim Given i, define W_{g(i)} by

W_{g(i)} = {π_2(⟨x,y⟩) | ⟨x,y⟩ ∈ W_i}, if ⟨x,y⟩ ∈ W_i and ⟨x′,y′⟩ ∈ W_i
imply x = x′;
W_{g(i)} = N, otherwise.

Informally, we enumerate in W_{g(i)} the second components of elements of W_i
until we have seen two elements with different first components. In this case
we then switch to enumerating every integer into W_{g(i)}. The function g
obviously has properties 1 and 2.

Now, given the claim, we complete the proof of the proposition as follows.
Suppose for a contradiction that φ identifies ℒ. Let g(S) = {g(i) | i ∈ S}.
Since φ is nontrivial, property 1 of g implies that g(S) contains only indexes
for infinite sets. Since φ identifies ℒ, and since for every infinite W_i, L_i ∈ ℒ,
for each such W_i there is a j ∈ S such that W_j = L_i. Then by (2) of the claim,
g(j) is an index for W_i. Thus g(S) is an r.e. set containing indexes for all and
only the infinite r.e. sets, contradicting lemma 4.3.2A. □

In section 4.3.5, proposition 4.3.2A will be exhibited as a corollary of a


more general result.

Exercises

4.3.2A Let ℒ be a collection of infinite languages. Prove that ℒ is identifiable if
and only if some nontrivial φ ∈ ℱ identifies ℒ. Compare this result to proposition
4.3.2A.

4.3.2B Let 𝒮 ⊆ ℱ be such that some ℒ ∈ [𝒮] is infinite. Show that not every
ℒ′ ∈ [𝒮] is r.e. indexable.

*4.3.2C Let ℒ be as defined in the proof of proposition 4.3.2A. The function f ∘ h
defined therein is such that ℒ ⊂ ℒ(f ∘ h). Show that there is φ ∈ ℱ^rec such that
ℒ = ℒ(φ).

*4.3.2D φ ∈ ℱ is called nonexcessive just in case for all σ ∈ SEQ, W_{φ(σ)} ≠ N.
Prove: For all ℒ ⊆ RE, if N ∉ ℒ, then ℒ ∈ [ℱ^rec ∩ ℱ^nonexcessive] if and only if
ℒ ∈ [ℱ^rec].

4.3.2E (John Canny) φ ∈ ℱ is said to be weakly nontrivial just in case for all
infinite L ∈ ℒ(φ), W_{φ(t̄_n)} is infinite for all n ∈ N and all texts t for L.
Nontriviality implies weak nontriviality. Show that for some collection ℒ ⊆ RE of
infinite languages, ℒ ∈ [ℱ^rec] − [ℱ^rec ∩ ℱ^weakly nontrivial].

4.3.3 Consistency

We next consider a natural constraint on conjectures.

DEFINITION 4.3.3A (Angluin 1980) φ ∈ ℱ is said to be consistent just in case
for all σ ∈ SEQ, rng(σ) ⊆ W_{φ(σ)}.

That is, the conjectures of a consistent learner always generate the data seen
so far. Note that consistent learning functions are total.

Example 4.3.3A

a. The function f defined in part a of example 1.3.4B is consistent. Hence
RE_fin ∈ [ℱ^consistent].
b. The function g defined in the proof of proposition 1.4.3B is consistent.
c. The function h defined in part c of example 1.3.4B is consistent.
d. The function f defined in the proof of proposition 2.3A is not consistent. To see
this, let i_0 be an index for ∅, and let σ ∈ SEQ be such that i_0 ∈ rng(σ) and i_0 is
least in rng(σ). Then f(σ) = i_0. Since rng(σ) ⊄ W_{i_0} = ∅, f is not consistent.
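Consistency is easy to exhibit in a toy model (a sketch under the assumption that grammars are explicit finite sets, so that "rng(σ) ⊆ W_{φ(σ)}" can be checked directly; the learner is a stand-in reminiscent of example 4.3.3A, part a).

```python
import itertools

def consistent_learner(sigma):
    """Conjecture a grammar generating exactly the data seen so far."""
    return frozenset(sigma)

def is_consistent_at(learner, sigma):
    """rng(sigma) is a subset of the language the conjecture generates."""
    return set(sigma) <= set(learner(sigma))

# Spot-check consistency on all short sequences over {0, 1, 2}.
assert all(is_consistent_at(consistent_learner, s)
           for n in range(4)
           for s in itertools.product([0, 1, 2], repeat=n))
```

A learner of this kind never emits a conjecture falsified by the data in hand, which is precisely the rationality that the definition codifies.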

Consistency has the ring of rationality: Why emit a conjecture that is
falsified by the data in hand? It thus comes as no surprise that consistency is
not restrictive. The proof of this fact resembles the solution to exercise
4.3.2A. We now demonstrate the less evident fact that consistency restricts
ℱ^rec.

PROPOSITION 4.3.3A Let consistent φ ∈ ℱ^rec identify ℒ ⊆ RE. Then
ℒ ⊆ RE_rec.

Proof Let L ∈ ℒ. By the locking sequence lemma (proposition 2.1A), there
is a sequence σ such that rng(σ) ⊆ L, W_{φ(σ)} = L, and if τ ∈ SEQ is such
that rng(τ) ⊆ L, then φ(σ⌢τ) = φ(σ).
If x ∈ L, φ(σ⌢x) = φ(σ), since σ is a locking sequence for L. On the other
hand, if x ∉ L, φ(σ⌢x) is not an index for L, since φ is consistent; hence
φ(σ⌢x) ≠ φ(σ). Thus x ∈ L if and only if φ(σ⌢x) = φ(σ). This constitutes
an effective test for membership in L, since φ is total. □

There are certainly ℒ ⊆ RE such that (1) ℒ ∈ [ℱ^rec], and (2) ℒ includes
nonrecursive languages. One such collection is {K}! Hence proposition
4.3.3A yields the following corollary.

COROLLARY 4.3.3A [ℱ^rec ∩ ℱ^consistent] ⊂ [ℱ^rec].

Proposition 4.3.3A suggests the following question: If attention is limited to
the recursive languages, does consistency still restrict ℱ^rec? The next
proposition provides an affirmative answer.

PROPOSITION 4.3.3B There is ℒ ⊆ RE_rec such that ℒ ∈ [ℱ^rec] −
[ℱ^rec ∩ ℱ^consistent].

The proof of proposition 4.3.3B uses the following lemma, which is
interesting in its own right.

LEMMA 4.3.3A Let h(j,k) be a total recursive function, and let functions
f_j ∈ ℱ^rec be defined by f_j(k) = h(j,k) for all k. Then there is a recursive
set S such that f_j is not the characteristic function of S for any j.

Proof Define S by k ∈ S if and only if h(k,k) = 0. Obviously S is recursive.
No f_j is the characteristic function of S; for if f_j were, then f_j(j) = 1 if
and only if j ∈ S if and only if h(j,j) = 0 if and only if f_j(j) = 0, which is
impossible. □
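The diagonal argument is short enough to run (a sketch; the particular total function h below is an arbitrary choice for illustration, and the argument goes through for any other).

```python
def make_chi_S(h):
    """Characteristic function of S = {k : h(k, k) = 0}."""
    return lambda k: 1 if h(k, k) == 0 else 0

def h(j, k):
    """An arbitrary total function standing in for the lemma's h."""
    return (j * k + j + 1) % 3

chi_S = make_chi_S(h)
# No row f_j = h(j, .) can be the characteristic function of S, because
# every row already disagrees with chi_S at the diagonal point j:
for j in range(100):
    assert h(j, j) != chi_S(j)
```

The disagreement at j is exactly the step in the proof: f_j(j) = 0 would put j into S, forcing χ_S(j) = 1, while f_j(j) ≠ 0 keeps j out of S, forcing χ_S(j) = 0.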

Recall the definition of r.e. indexable (definition 4.3.2B). Although the
recursive sets are r.e. indexable (exercise 4.3.3D), lemma 4.3.3A says that the
recursive sets are not r.e. indexable as recursive sets. In other words, there is
no r.e. set of indexes of characteristic functions containing at least one index
for a characteristic function of each recursive set.

Proof of proposition 4.3.3B As in the proof of proposition 4.3.2A, define
L_i = {⟨i,x⟩ | x ∈ W_i}, and let ℒ = {L_i | W_i is recursive}. ℒ ∈ [ℱ^rec]; in
fact, as noted in the proof of proposition 4.3.2A, {L_i | i ∈ N} ∈ [ℱ^rec].
Suppose, however, that g ∈ ℱ^rec is a consistent function that identifies ℒ.
Define a function h as follows:

h(⟨σ,i⟩,k) = 1, if g(σ⌢⟨i,k⟩) = g(σ);
h(⟨σ,i⟩,k) = 0, otherwise.

It is obvious that h is a total recursive function, since g must be total. Thus h
satisfies the hypothesis of lemma 4.3.3A, so there is a recursive set S such
that no function f_{⟨σ,i⟩}(k) = h(⟨σ,i⟩,k) is a characteristic function of S.
But let i′ be an index for S, and let σ′ be a locking sequence for L_{i′} and g.
Then k ∈ W_{i′} implies that g(σ′⌢⟨i′,k⟩) = g(σ′), which implies that
h(⟨σ′,i′⟩,k) = 1. And if k ∉ W_{i′}, then g(σ′⌢⟨i′,k⟩) ≠ g(σ′), since g is
consistent, so that h(⟨σ′,i′⟩,k) = 0. But this implies that f_{⟨σ′,i′⟩}(k) =
h(⟨σ′,i′⟩,k) is the characteristic function of S, contradicting the choice
of S. □

Proposition 4.3.3B may be strengthened to the following fact about total,
single-valued languages.

PROPOSITION 4.3.3C (Wiehagen 1976) [ℱ^rec ∩ ℱ^consistent]_svt ⊂ [ℱ^rec]_svt.

Proof See exercise 4.3.3B. □

We note that children are not thought to be consistent learners, because
their early grammars do not appear to generate the sentences addressed to
them.

Exercises

4.3.3A φ ∈ ℱ is said to be conditionally consistent just in case for all σ ∈ SEQ, if
φ(σ)↓, then rng(σ) ⊆ W_{φ(σ)}.

a. Refute the following variant of proposition 4.3.3A: Let conditionally consistent
φ ∈ ℱ^rec identify ℒ ⊆ RE. Then ℒ ⊆ RE_rec.
b. Prove the following variant of corollary 4.3.3A: [ℱ^rec ∩ ℱ^conditionally consistent]
⊂ [ℱ^rec].
c. Prove the following variant of proposition 4.3.3B: there is ℒ ⊆ RE_rec such that
ℒ ∈ [ℱ^rec] − [ℱ^rec ∩ ℱ^conditionally consistent]. (Hint: Add N to the collection ℒ
defined in the proof of proposition 4.3.3B.)

4.3.3B Prove proposition 4.3.3C using the proof of proposition 4.3.3B as a model.

4.3.3C Let ℒ ∈ [ℱ^rec ∩ ℱ^consistent]_svt be given. Show that for any L ∈ RE_svt,
ℒ ∪ {L} ∈ [ℱ^rec ∩ ℱ^consistent]_svt.

*4.3.3D Show that RE_rec is r.e. indexable. (Hint: See Rogers 1967, exercise 5-6,
p. 73.)

4.3.3E (Ehud Shapiro 1981) Let ℒ ⊆ RE and total h ∈ ℱ^rec be given, and suppose
that ℒ ∈ [ℱ^h-time ∩ ℱ^consistent]. Show that there is a total g ∈ ℱ^rec such that
for all L ∈ ℒ, some characteristic function for L runs in g-time.

4.3.4 Prudence and r.e. Boundedness

Suppose that φ ∈ ℱ is defined on σ ∈ SEQ. Call φ(σ) a "wild guess" (with
respect to φ) if φ does not identify W_φ(σ). In this section we consider learning
functions that do not make wild guesses.

DEFINITION 4.3.4A φ ∈ ℱ is called prudent just in case for all σ ∈ SEQ, if
φ(σ)↓ then φ identifies W_φ(σ).

In other words, prudent learners only conjecture grammars for languages
they are prepared to learn. The function f defined in part a of example
1.3.4B and the function g defined in proposition 1.4.3B are prudent.
Children acquiring language may well be prudent learners, especially if
60 Identification Generalized

"prestorage" models of linguistic development are correct. A prestorage
model posits an internal list of candidate grammars that coincides exactly
with the natural languages. Language acquisition amounts to the selection
of a grammar from this list in response to linguistic input. Such a prestorage
learner is prudent inasmuch as his or her hypotheses are limited to gram-
mars from the list, that is, to grammars corresponding to natural (i.e.,
learnable) languages. In particular, note that the prudence hypothesis
implies that every incorrect grammar projected by the child in the course of
language acquisition corresponds to a natural language.
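A prestorage learner of this kind is easy to sketch. In the toy model below (an illustration only, not the book's formal construction; `CANDIDATES` and `prestorage_learner` are names of our own choosing, and the "grammars" are simply finite languages given as sets), every conjecture is drawn from a fixed stored list, so the learner is prudent by construction: it never conjectures a language it is not prepared to learn.

```python
# A toy "prestorage" learner: its only possible conjectures are drawn from a
# fixed finite list of candidate languages (here, finite languages as sets),
# so every grammar it ever emits names a language it can learn: it is prudent.
CANDIDATES = [frozenset({0}), frozenset({0, 1}), frozenset({0, 1, 2})]

def prestorage_learner(sigma):
    """Conjecture the first stored candidate that generates all data seen."""
    data = set(sigma)
    for lang in CANDIDATES:
        if data <= lang:
            return lang
    return None  # no conjecture: the learner is undefined on this sequence
```

On any text for one of the stored languages, the learner eventually locks onto that candidate and never leaves it.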
It is easy to show that prudence is not restrictive. The effect of prudence
on the recursive learning functions is a more difficult matter. We begin by
considering an issue of a superficially different character.
The "complexity" of a learning strategy 𝒮 can be reckoned in alternative
ways, but one natural, bipartite classification may be described as follows.
From exercise 4.3.2B we know that if some ℒ ∈ [𝒮] is infinite, then not
every member of [𝒮] is r.e. indexable. However, even in this case it remains
possible that every collection in [𝒮] can be extended to an r.e. indexable
collection of languages that is also in [𝒮]. The next definition provides a
name for strategies with this property.

DEFINITION 4.3.4B 𝒮 ⊆ ℱ is called r.e. bounded just in case for every
ℒ ∈ [𝒮] there is ℒ′ ∈ [𝒮] such that (i) ℒ ⊆ ℒ′, and (ii) ℒ′ is r.e. indexable.

Thus r.e. bounded strategies give rise to simple collections of languages in a
satisfying sense.
We now return to the effect of prudence on ℱ^rec.

PROPOSITION 4.3.4A (Mark Fulk) [ℱ^rec ∩ ℱ^prudent] = [ℱ^rec].

Proposition 4.3.4A is a consequence of the following two lemmata, whose
proofs are deferred to section 4.6.3.

LEMMA 4.3.4A If ℱ^rec is r.e. bounded, then [ℱ^rec ∩ ℱ^prudent] = [ℱ^rec].

LEMMA 4.3.4B (Mark Fulk) ℱ^rec is r.e. bounded.

Exercises

4.3.4A Show that the function f defined in the proof of proposition 2.3A is not
prudent.

*4.3.4B Specify prudent φ ∈ ℱ^rec that identifies {K ∪ {x} | x ∈ K̄}.


4.3.4C Exhibit 𝒮 ⊆ ℱ such that (a) 𝒮 is infinite, and (b) 𝒮 is not r.e. bounded.

4.3.4D Show that for every φ ∈ ℱ^rec ∩ ℱ^prudent, ℒ(φ) is r.e. indexable. Conclude
that ℱ^rec ∩ ℱ^prudent is r.e. bounded.

4.3.4E Let 𝒮 and 𝒮′ be r.e. bounded strategies.

a. Show that 𝒮 ∪ 𝒮′ is r.e. bounded.
b. Show by counterexample that 𝒮 ∩ 𝒮′ need not be r.e. bounded.

4.3.5 Accountability

Proper scientific practice requires the testability of proposed hypotheses.
In the current context this demand may be formulated in terms of the
"accountability" of scientists, as suggested by the following definition.

DEFINITION 4.3.5A φ ∈ ℱ is accountable just in case for all σ ∈ SEQ,
W_φ(σ) − rng(σ) ≠ ∅.

Thus the hypotheses of accountable learners are always subject to further
confirmation.
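The definition is simple to check mechanically on any one finite sequence. The sketch below is purely illustrative (real conjectures are indices of r.e. sets; here a conjecture is modeled directly as a finite set, and the function names are ours): a learner that always conjectures "everything seen so far, plus one unseen point" is accountable by design.

```python
# Accountability: W_phi(sigma) - rng(sigma) must be nonempty, so the
# hypothesis always outruns the data and remains open to confirmation.
def accountable_learner(sigma):
    """Conjecture the data seen so far plus one point not yet observed."""
    data = set(sigma)
    extra = (max(data) + 1) if data else 0
    return data | {extra}

def is_accountable_on(learner, sigma):
    """Check definition 4.3.5A on one finite sequence."""
    return bool(learner(sigma) - set(sigma))
```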
It is easy to see that finite languages cannot be identified by accountable
learners. Similarly, for ℒ ⊆ RE_inf it is obvious that ℒ ∈ [ℱ^accountable] if and
only if ℒ ∈ [ℱ]. In contrast, the following proposition reveals that the
interaction of ℱ^accountable and ℱ^rec is less intuitive.

PROPOSITION 4.3.5A There is ℒ ⊆ RE such that (i) every L ∈ ℒ is infinite,
and (ii) ℒ ∈ [ℱ^rec] − [ℱ^rec ∩ ℱ^accountable].

Thus the identification by machine of certain collections of infinite lan-
guages requires the occasional conjecture of hypotheses that go no further
than the data at hand. The proof of the proposition relies on the following
definition and lemma.

DEFINITION 4.3.5B
i. The set {f ∈ ℱ^rec ∩ ℱ^total | φ_{f(0)} = f} is denoted SD.
ii. RE_SD = {L ∈ RE_svt | for some f ∈ SD, L represents f}.
LEMMA 4.3.5A RE_SD ∉ [ℱ^rec ∩ ℱ^accountable].

Proof Suppose θ ∈ ℱ^rec ∩ ℱ^accountable. We define, uniformly in i, a text t^i
for a language L_i ∈ RE_svt which θ fails to identify. An application of the
recursion theorem will then suffice to yield an L ∈ RE_SD that θ fails to
identify.

Construction of t^i We construct t^i in stages. Let k be an index for θ.

Stage 0. σ⁰ = ⟨0, i⟩.
Stage n + 1. Let ⟨m, s⟩ be the least number such that m ∈ W_{θ(σⁿ),s} − rng(σⁿ)
and Φ_k(σⁿ) < s. Such a number exists since θ ∈ ℱ^accountable.
If π₁(m) < lh(σⁿ), let σⁿ⁺¹ = σⁿ ⌢ ⟨lh(σⁿ), 0⟩.
If π₁(m) ≥ lh(σⁿ), let σⁿ⁺¹ = σⁿ ⌢ ⟨lh(σⁿ), 0⟩ ⌢ ⋯ ⌢ ⟨π₁(m), 1 ∸ π₂(m)⟩.
(n ∸ m = max{0, n − m}.) Let t^i = ⋃ₙ σⁿ.

Let L_i = rng(t^i). It is clear that L_i ∈ RE_svt and that θ fails to identify t^i,
since for each n either W_{θ(σⁿ)} ∉ RE_svt or W_{θ(σⁿ)} ⊉ rng(σⁿ⁺¹). Now let g be a total
recursive function such that W_{g(i)} = L_i. By the recursion theorem, pick j
such that W_{g(j)} = W_j. Then L_j is an element of RE_SD. □

It is plain that RE_SD ⊆ RE_svt ⊆ RE_inf and that RE_SD ∈ [ℱ^rec]. Proposi-
tion 4.3.5A thus follows immediately from the lemma. It may be seen
similarly that proposition 4.3.2A is a direct corollary of proposition 4.3.5A,
since ℱ^nontrivial ⊆ ℱ^accountable.
An analog of nontriviality relevant to RE_svt may be defined as follows.

DEFINITION 4.3.5C (Case and Ngo-Manguelle 1979) φ ∈ ℱ is called Pop-
perian just in case for all σ ∈ SEQ, if φ(σ)↓ then W_φ(σ) ∈ RE_svt.

Thus the conjectures of a Popperian learning function are limited to indexes
for total, single-valued languages. The function h in the proof of proposition
1.4.3C is Popperian.
An index for a member S of RE_svt can be mechanically converted into
an index for the characteristic function of S (see part b of exercise 1.2.2B).
As a consequence it is easy to test the accuracy of such an index against
the data provided by a finite sequence. Such testability motivates the ter-
minology "Popperian," since Popper (e.g., 1972) has long insisted on this
aspect of scientific practice (for discussion, see Case and Ngo-Manguelle
1979).
Plainly, in the context of RE_svt, ℱ^Popperian is not restrictive. In contrast,
since ℱ^Popperian ⊆ ℱ^accountable, lemma 4.3.5A implies the following.

PROPOSITION 4.3.5B (Case and Ngo-Manguelle 1979) [ℱ^rec ∩ ℱ^Popperian]_svt ⊂ [ℱ^rec]_svt.

Exercises

*4.3.5A L ∈ RE is called total just in case for all x ∈ N there is y ∈ N such that
⟨x, y⟩ ∈ L (compare definition 1.2.2D). Note that a total language need not represent
a function (since it need not be single valued). φ ∈ ℱ is called total minded just in case
for all σ ∈ SEQ, if φ(σ)↓ then W_φ(σ) is total. Prove: There is ℒ ⊆ RE such that (a)
every L ∈ ℒ is total, and (b) ℒ ∈ [ℱ^rec] − [ℱ^rec ∩ ℱ^total minded]. (Hint: Rely on
Rogers 1967, theorem 5-XVI, the single-valuedness theorem.)

4.3.5B (Putnam 1975) Supply a short proof that RE_svt ∉ [ℱ^rec ∩ ℱ^Popperian].

4.3.5C Define: L ∈ RE_char just in case L ∈ RE_svt and for each n ∈ L either π₂(n) = 0
or π₂(n) = 1. (RE_char thus consists of the sets representing recursive characteristic
functions.) Prove the following strengthening of proposition 4.3.5A: there is
ℒ ⊆ RE_char such that ℒ ∈ [ℱ^rec] − [ℱ^rec ∩ ℱ^accountable].

*4.3.6 Simplicity

Let L ∈ RE, and let S = {x | W_x = L}, the set of indexes for L. By lemma
1.2.1B, S is infinite. Intuitively the indexes in S correspond to grammars of
increasing size and complexity. It is a plausible hypothesis that children do
not conjecture grammars that are arbitrarily more complex than simpler
alternatives for the same language (in view of the space requirements for
storing complex grammars). In this subsection we consider learning func-
tions that are limited to simple conjectures.
To begin, the notion of grammatical complexity must be precisely ren-
dered. For this purpose we identify the complexity of a grammar with its
size, and we formalize the notion of size as follows.

DEFINITION 4.3.6A (Blum 1967a) Total m ∈ ℱ^rec is said to be a size measure
(relative to our fixed acceptable indexing of ℱ^rec) just in case m meets the
following conditions.

i. For all i ∈ N, there are only finitely many j ∈ N such that m(j) = i.
ii. The set {⟨i, j⟩ | for all k ≥ j, m(k) ≠ i} is recursive.

To grasp the definition, suppose that ℱ^rec is indexed by associated Turing
machines (TM) as in section 1.2.1. Then one size measure m_TM maps each
index i into the number of symbols used to specify the ith Turing machine.
This number is to be thought of as the size of i. m_TM can be shown to be total
recursive. This size measure meets condition i of the definition, since for
each i ∈ N there are only finitely many Turing machines that can be specified
using precisely i symbols. Condition ii is satisfied, since there exists an
effective procedure for finding, given any i ∈ N, the largest index of a Turing
machine of size i. For another example, the simplest size measure is given
by the identity function m(x) = x. Conditions i and ii of the definition are
easily seen to be satisfied. It would seem that any reasonable measure of
the size of a computational agent also conforms to these conditions.
As with our choice of computational complexity measure (section 4.2.2),
none of our results depend on the choice of size measure. Indeed, any two
such measures can be shown, in a satisfying sense, to yield similar estimates
of size (see Blum 1967a, sec. 1). Let a fixed size measure m now be selected.
Reference to size should henceforth be interpreted accordingly.
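For a concrete toy instance of definition 4.3.6A, one can enumerate "programs" as binary strings and measure size by string length. The sketch below is only an illustrative stand-in for m_TM (the function names are ours): condition i holds because only 2^i strings have length i, and condition ii holds because the largest index of size i, namely 2^(i+1) − 2, is computable.

```python
# A toy size measure: index j names the j-th binary string (under the
# standard enumeration via j+1 in binary with the leading 1 dropped),
# and m(j) is that string's length.
def m(j):
    """Size of index j = length of the j-th binary string (j = 0, 1, 2, ...)."""
    return (j + 1).bit_length() - 1

def largest_index_of_size(i):
    """Witness for condition (ii): no index beyond this value has size i."""
    return 2 ** (i + 1) - 2
```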

DEFINITION 4.3.6B We define the function M : RE → N as follows. For all
L ∈ RE, M(L) is the unique i ∈ N such that

i. there is k ∈ N such that W_k = L and m(k) = i,
ii. for all j ∈ N, if W_j = L, then m(j) ≥ i.

Intuitively, for L ∈ RE, "M(L)" denotes the size of the smallest Turing
machine for L. No index of size smaller than M(L) is an index for L.

DEFINITION 4.3.6C Let total f ∈ ℱ^rec be given. i ∈ N is said to be f-simple
just in case m(i) ≤ f(M(W_i)).

In other words, i is f-simple just in case the size of i is no more than "f of" the
size of the smallest possible grammar for W_i. Thus, if f(x) = 2x for all x ∈ N,
then i is f-simple just in case no index for W_i is less than half the size of i.
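In a finite toy setting the definition reads off directly. In the sketch below (all names and data are hypothetical illustrations, not the book's formal objects), each "index" carries a size and names a language; M(L) is the least size of any index for L, and i is f-simple when its size stays within f of that minimum.

```python
# f-simplicity in miniature: four toy indices with explicit sizes and languages.
SIZE = {0: 1, 1: 2, 2: 5, 3: 3}            # m(i) for each toy index i
LANG = {0: "A", 1: "B", 2: "A", 3: "B"}    # the language each index names

def M(language):
    """Size of the smallest index for the language."""
    return min(SIZE[i] for i in SIZE if LANG[i] == language)

def f_simple(i, f):
    """Is index i within f of the smallest grammar for its language?"""
    return SIZE[i] <= f(M(LANG[i]))
```

With f(x) = 2x, index 2 (size 5, for a language with a size-1 index) is not f-simple, while indices 0 and 3 are.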
With these preliminaries in hand, we may now define strategies that limit
the complexity of a learner's conjectures.

DEFINITION 4.3.6D

i. Let total f ∈ ℱ^rec be given. φ ∈ ℱ is said to be f-simpleminded just in case
for all σ ∈ SEQ, if φ(σ)↓, then φ(σ) is f-simple.
ii. If φ ∈ ℱ is f-simpleminded for some total f ∈ ℱ^rec, then φ is said to be
simpleminded.

Put differently, an f-simpleminded learning function never conjectures
indexes that are f-bigger than necessary. Thus, if f(x) = 2x for all x ∈ N,
then no conjecture of an f-simpleminded learner is more than twice the size
of the smallest equivalent grammar.

Example 4.3.6A

a. Suppose that m is the size measure defined by m(x) = x for all x ∈ N. Let total
h ∈ ℱ^rec be such that h(x) ≥ x. Then both the function f of part a of example 1.3.4B and
the function g of proposition 1.4.3B are h-simpleminded.
b. Irrespective of the chosen size measure, the function g of part b of example 1.3.4B is
simpleminded.

Provided that total h ∈ ℱ^rec is such that h(x) ≥ x for all x ∈ N,
ℱ^h-simpleminded is not restrictive. However, for any total h ∈ ℱ^rec,
ℱ^h-simpleminded severely restricts ℱ^rec. To show this, we rely on the following
remarkable result.

LEMMA 4.3.6A (Blum 1967a) Let L ∈ RE be infinite, and let total g ∈ ℱ^rec
be given. Then there is i ∈ L such that m(i) > g(M(W_i)).

Proof The lemma is a direct consequence of theorem 1 of Blum (1967a). □

The lemma asserts that every infinite r.e. set of indexes contains at least one
index that is g-bigger than necessary, for any choice of total g ∈ ℱ^rec.

PROPOSITION 4.3.6A Let ℒ ∈ [ℱ^rec ∩ ℱ^simpleminded]. Then ℒ contains only
finitely many languages.

Proof Let φ ∈ ℱ^rec ∩ ℱ^simpleminded identify ℒ. Let S = rng(φ); S is
r.e. because it is the range of a recursive function. Since φ is g-simpleminded
for some g, lemma 4.3.6A implies that S is finite: otherwise, for some σ
we would have m(φ(σ)) > g(M(W_φ(σ))), contradicting the definition of
g-simpleminded. If S is finite, ℒ must obviously be finite as well, because φ
cannot identify a language for which it never produces a conjecture. □

Thus, if children implement recursive, simpleminded learning functions,
and if they can only learn languages for which they can produce grammars,
then there are only finitely many natural languages.

COROLLARY 4.3.6A [ℱ^rec ∩ ℱ^simpleminded] ⊂ [ℱ^rec].

Exercises

4.3.6A Prove the following strengthening of proposition 4.3.6A: [ℱ^rec ∩
ℱ^simpleminded] is the class of all finite collections of languages.

4.3.6B φ ∈ ℱ is called loquacious just in case {φ(σ) | σ ∈ SEQ} is infinite. Prove:
There exists total h ∈ ℱ^rec such that for all total f ∈ ℱ^rec and loquacious φ ∈ ℱ^rec
there exist σ ∈ SEQ and i ∈ N such that

a. φ_{φ(σ)} = φ_i,
b. f(m(i)) < m(φ(σ)),
c. for all but finitely many ⟨x, s⟩ ∈ N, if Φ_{φ(σ)}(x) ≤ s, then Φ_i(x) ≤ h(⟨x, s⟩).

(In other words, the longer program φ(σ) is not much faster than the shorter
program i.) (Hint: Use theorem 2 of Blum 1967a.) This result extends proposition
4.3.6A.

4.4 Constraints on the Information Available to a Learning Function

Each initial segment t̄_n of a text t provides partial information about the
identity of rng(t). The information embodied in t̄_n may be factored into two
components: (1) rng(t̄_n), that is, the subset of rng(t) available to the learner
by the nth moment, and (2) the order in which rng(t̄_n) occurs in t̄_n. Human
learners operate under processing constraints that limit their access to both
kinds of information. In this section we examine two strategies that reflect
this limitation.

4.4.1 Memory Limitation

It seems evident that children have limited memory for the sentences
presented to them. Once processed, sentences are likely to be quickly
erased from the child's memory. Here we shall consider learning functions
that undergo similar information loss.

DEFINITION 4.4.1A Let σ ∈ SEQ be given.

i. The result of removing the last member of σ is denoted σ⁻. If lh(σ) = 0,
then σ⁻ = σ.
ii. For n ∈ N the result of removing all but the last n members of σ is denoted
σ⁻ⁿ. If lh(σ) < n, then σ⁻ⁿ = σ.

Thus, if σ = 3, 3, 8, 1, 9, then σ⁻ = 3, 3, 8, 1 and σ⁻² = 1, 9.
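The two operations of this definition are easy to realize concretely. In the sketch below, tuples stand in for members of SEQ, and `minus` and `window` are names of our own choosing for σ⁻ and σ⁻ⁿ; the example above is reproduced exactly.

```python
# sigma^- and sigma^{-n} of definition 4.4.1A, on tuples.
def minus(sigma):
    """sigma^- : remove the last member; the empty sequence is unchanged."""
    return sigma[:-1] if sigma else sigma

def window(sigma, n):
    """sigma^{-n} : keep only the last n members; if lh(sigma) < n, sigma itself."""
    return sigma[len(sigma) - n:] if len(sigma) >= n else sigma
```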


DEFINITION 4.4.1B (Wexler and Culicover 1980, sec. 3.2) For all n ∈ N,
φ ∈ ℱ is said to be n-memory limited just in case for all σ, τ ∈ SEQ, if
σ⁻ⁿ = τ⁻ⁿ and φ(σ⁻) = φ(τ⁻), then φ(σ) = φ(τ). If φ ∈ ℱ is n-memory
limited for some n ∈ N, then φ is said to be memory limited.

In other words, φ is n-memory limited just in case φ(σ) depends on no more
than φ(σ⁻) (φ's last conjecture) and σ⁻ⁿ (the n latest members of σ).
Intuitively a child is memory limited if his or her conjectures arise from the
interaction of recent input sentences with the latest grammar that he or she
has formulated and stored. This latter grammar of course provides partial
information about all the data seen to date.

Example 4.4.1A

a. The function h defined in the proof of proposition 4.3.2A is 1-memory limited.
b. Neither the function f defined in part a of example 1.3.4B nor the function g
defined in the proof of proposition 1.4.3B is memory limited.
c. The function g of part b of example 1.3.4B is 0-memory limited.

Does some memory-limited φ ∈ ℱ identify RE_fin? Let φ ∈ ℱ be 2-
memory limited, and consider the text t = 4, 5, 5, 5, 5, 6, 6, 6, 6, ...
for the language {4, 5, 6}. It appears that by the time φ reaches the
first 6 in t, the initial 4 will have been forgotten, rendering conver-
gence to rng(t) impossible. Since a similar problem arises for any
"memory window," it appears that memory limitation excludes identifi-
cation of RE_fin.
However, this reasoning is incorrect. Memory limitation can often be
surmounted by retrieving past data from the current conjecture. The fol-
lowing proposition will make this clear.
PROPOSITION 4.4.1A RE_fin ∈ [ℱ^rec ∩ ℱ^1-memory limited].

Proof Let S be a recursive set of indexes of r.e. sets containing exactly one
index for each finite set and such that, given a finite set D, we can effectively
find e(D) ∈ S such that e(D) is an index for D. The existence of such a set and
function e is an easy exercise.
Now define f ∈ ℱ^rec by f(σ) = e(rng(σ)) for all σ ∈ SEQ. Informally, f
chooses a canonical index for the range of σ. Now, if f(σ⁻) = f(τ⁻), then
rng(σ⁻) = rng(τ⁻), and if also σ⁻¹ = τ⁻¹, then rng(σ) = rng(τ), so that f(σ) =
f(τ). Thus f ∈ ℱ^1-memory limited. □
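The idea behind the proof, retrieving past data from the current conjecture, can be sketched concretely. Below, a frozenset plays the role of the canonical index e(rng(σ)); the names `f` and `f_step` are ours, and the sketch is an illustration rather than the formal construction. The conjecture on σ is determined by the previous conjecture together with the single most recent datum, which is exactly 1-memory limitation.

```python
# The learner of proposition 4.4.1A in miniature: the conjecture IS a
# canonical representation of the finite language seen so far.
def f(sigma):
    """Canonical conjecture for the finite language rng(sigma)."""
    return frozenset(sigma)

def f_step(prev_conjecture, last_datum):
    """Recover f(sigma) from f(sigma^-) and sigma's last member alone."""
    return prev_conjecture | {last_datum}
```

On the troublesome text 4, 5, 5, 5, 6, ... of the discussion above, the initial 4 is never forgotten, because it persists inside every subsequent conjecture.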

This last result notwithstanding, memory limitation is restrictive.

PROPOSITION 4.4.1B [ℱ^memory limited] ⊂ [ℱ].

Proof Let ℒ consist of the language L = {⟨0, x⟩ | x ∈ N} along with, for
each j ∈ N, the languages L_j = {⟨0, x⟩ | x ∈ N} ∪ {⟨1, j⟩} and L′_j =
{⟨0, x⟩ | x ≠ j} ∪ {⟨1, j⟩}. It is easy to see that ℒ ∈ [ℱ]. (In fact ℒ ∈
[ℱ^rec].) But suppose that ℒ ∈ [ℱ^memory limited]; for instance, suppose that
some φ ∈ ℱ^1-memory limited identifies ℒ. (The case where φ ∈ ℱ^n-memory limited is
similar.) Intuitively, when φ first sees ⟨1, j⟩ for some j, φ cannot remember
whether it saw ⟨0, j⟩ or not and so cannot distinguish between L_j and L′_j.
Formally, let σ be a locking sequence for φ and L. Let σ′ = σ ⌢ ⟨1, j₀⟩
for some j₀ such that ⟨0, j₀⟩ ∉ rng(σ). Let σ″ = σ ⌢ ⟨0, j₀⟩ ⌢ ⟨1, j₀⟩. Now
φ(σ′) = φ(σ″), since φ(σ) = φ(σ ⌢ ⟨0, j₀⟩) and φ is 1-memory limited.
But now let t₁ = σ′ ⌢ ⟨0, 0⟩ ⌢ ⟨0, 1⟩ ⌢ ⋯ ⌢ ⟨0, i⟩ ⌢ ⋯, the enumeration
running through all i ≠ j₀, and let t₂ = σ″ ⌢ ⟨0, 0⟩ ⌢ ⟨0, 1⟩ ⌢ ⋯ ⌢ ⟨0, i⟩ ⌢ ⋯,
likewise for i ≠ j₀. Then t₁ is a text for L′_{j₀} and t₂ is a text for L_{j₀}, but
φ converges on t₁ and t₂ to the very same index because of memory
limitation. Thus φ cannot identify both L_{j₀} and L′_{j₀}. □

The proof of proposition 4.4.1B hinges on a collection of languages all of
whose members are finite variants of each other. Exercise 4.4.1F shows that
this feature of its proof is not essential.
To simplify the statement of later propositions, it is useful to record here
the following result.

LEMMA 4.4.1A
i. [ℱ^1-memory limited] = [ℱ^memory limited].
ii. [ℱ^rec ∩ ℱ^1-memory limited] = [ℱ^rec ∩ ℱ^memory limited].

The proof of this lemma turns on the following technical result (cf. lemma
1.2.1B).

LEMMA 4.4.1B There is a recursive function p such that p is one to one and
for every x and y, φ_x = φ_{p(x,y)}.

A proof of this lemma may be found in Machtey and Young (1978). Such a
function p is called a padding function, for to produce p(x, y) from x, we take
the instructions for computing φ_x and "pad" them with extra instructions,
thereby producing infinitely many distinct programs for computing the same
function.

Proof of lemma 4.4.1A

i. Obviously, [ℱ^1-memory limited] ⊆ [ℱ^memory limited]. Suppose on the other
hand that ℒ ∈ [ℱ^memory limited]; say ℒ is identified by the n-memory limited
function φ. We construct ψ which is 1-memory limited and identifies ℒ. Let
p be the padding function provided by lemma 4.4.1B. Given any x ∈ N,
define x⁽ⁿ⁾ to be the sequence of n x's. Now given σ ∈ SEQ, define

σ̄ = σ₀⁽ⁿ⁾ ⌢ σ₁⁽ⁿ⁾ ⌢ ⋯ ⌢ σ_{lh(σ)∸1}⁽ⁿ⁾.

Now define ψ(σ) = p(φ(σ̄), σ₀). (Intuitively we simulate φ on texts for which
n-memory limitation is of no advantage over 1-memory limitation, due to
the repetitions.) ψ evidently identifies ℒ, since for any text t for L ∈ ℒ, t̄ is
also a text for L. To see that ψ is 1-memory limited, suppose that ψ(σ⁻) =
ψ(τ⁻) and σ⁻¹ = τ⁻¹. Since ψ(σ⁻) = ψ(τ⁻) and p is one to one, φ(σ̄⁻) = φ(τ̄⁻)
and σ₀ = τ₀. Let x = σ⁻¹ = τ⁻¹. We have then that σ̄ = σ̄⁻ ⌢ x ⌢ x⁽ⁿ⁻¹⁾ and
τ̄ = τ̄⁻ ⌢ x ⌢ x⁽ⁿ⁻¹⁾. Since φ(σ̄⁻) = φ(τ̄⁻), φ(σ̄⁻ ⌢ x) = φ(τ̄⁻ ⌢ x) by the
n-memory limitation of φ. Thus φ(σ̄) = φ(τ̄) by the n-memory limitation of
φ. Thus

ψ(σ) = p(φ(σ̄), σ₀) = p(φ(τ̄), τ₀) = ψ(τ).

ii. The transformation of φ to ψ in the proof of (i) produces a recursive ψ if φ
is recursive. □

Proposition 4.4.1B shows that memory limitation restricts ℱ. We now
show that memory limitation and ℱ^rec restrict each other.

PROPOSITION 4.4.1C [ℱ^rec ∩ ℱ^memory limited] ⊂ [ℱ^rec] ∩ [ℱ^memory limited].
Proof Let A be a fixed r.e. nonrecursive set, and define L =
{⟨0, x⟩ | x ∈ A}, L_n = L ∪ {⟨1, n⟩}, and L′_n = L ∪ {⟨0, n⟩, ⟨1, n⟩}. Let ℒ =
{L, L_n, L′_n | n ∈ N}. It is easy to see that ℒ ∈ [ℱ^rec]. (Informally, conjecture
L until some pair ⟨1, n⟩ appears in the text. Then conjecture L_n forever
unless ⟨0, n⟩ appears or has already appeared in the text. In that case
conjecture L′_n.) Also ℒ ∈ [ℱ^memory limited]. (Again informally, conjecture L
until either ⟨1, n⟩ appears in the text for some n, in which case behave as just
described, or until ⟨0, n⟩ appears in the text for some n ∉ A. In this case
conjecture L′_n forever. This procedure is 1-memory limited but not effective,
since it asks whether n ∈ A for a nonrecursive set A.)
Finally, we claim that ℒ ∉ [ℱ^rec ∩ ℱ^memory limited]. For suppose that φ is
1-memory limited, recursive, and φ identifies ℒ. Let σ be a locking se-
quence for φ and L. This implies that for every n ∈ A, φ(σ ⌢ ⟨0, n⟩) = φ(σ).
Therefore for some m ∉ A, φ(σ ⌢ ⟨0, m⟩) = φ(σ), else Ā is recursively enumer-
able, implying that A is recursive. Fixing such an m, let s be an enumeration
of L, and define two texts, t and t′, by t = σ ⌢ ⟨1, m⟩ ⌢ s and t′ =
σ ⌢ ⟨0, m⟩ ⌢ ⟨1, m⟩ ⌢ s. By 1-memory limitation and the property of m,
φ(σ ⌢ ⟨0, m⟩ ⌢ ⟨1, m⟩) = φ(σ ⌢ ⟨1, m⟩), and so again by 1-memory limita-
tion, φ(t̄′_{n+1}) = φ(t̄_n) for all n ≥ lh(σ) + 1. But t′ is a text for L′_m and t for L_m,
and L_m ≠ L′_m. Thus φ does not identify both L_m and L′_m. □

The interaction of memory limitation and computability may be refined
yet further.

PROPOSITION 4.4.1D For every total h ∈ ℱ^rec, [ℱ^h-time ∩ ℱ^memory limited]
⊂ [ℱ^rec ∩ ℱ^memory limited].

The proof of proposition 4.4.1D is facilitated by the following definition.

DEFINITION 4.4.1C

i. For i, n ∈ N, we define φ_{i,n} ∈ ℱ^rec as follows. For all x ∈ N,

φ_{i,n}(x) = φ_i(x) if Φ_i(x) ≤ n; φ_{i,n}(x)↑ otherwise.

ii. We define W_{i,n} to be the domain of φ_{i,n}.

Thus φ_{i,n}(x) may be thought of as the result of running the ith Turing
machine for n steps starting with input x. If the machine halts within n steps,
then φ_{i,n}(x) = φ_i(x); if the machine does not halt within n steps, then φ_{i,n}(x)
is undefined. Definition 4.2.2A implies that the set {⟨i, n, x⟩ | φ_{i,n}(x)↓} is
recursive.
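The step-bounded functions φ_{i,n} can be mimicked concretely. In the sketch below (an illustration only, with Python generators standing in for Turing machines, `None` standing in for "undefined," and `run_for` and `slow_double` as hypothetical names), a program performs one `yield` per simulated step; the result is defined only when the program halts within the allotted number of calls, counting the halting move itself as a step.

```python
# phi_{i,n} in miniature: run a generator-style "program" for at most n steps.
def run_for(prog, x, n):
    g = prog(x)
    try:
        for _ in range(n):
            next(g)          # one simulated step
    except StopIteration as halt:
        return halt.value    # halted within n steps: phi_{i,n}(x) = phi_i(x)
    return None              # still running: phi_{i,n}(x) is undefined

def slow_double(x):
    """A toy program that spends x steps yielding before returning 2*x."""
    for _ in range(x):
        yield
    return 2 * x
```

Increasing the step bound can only turn "undefined" into the true value, never change a defined value, mirroring W_{i,n} ⊆ W_{i,n+1}.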

Proof of proposition 4.4.1D The collection ℒ of languages, which we will
show to be in [ℱ^rec ∩ ℱ^memory limited] but not in [ℱ^memory limited ∩ ℱ^h-time],
will be of the form ℒ_R = {R ∪ F | F finite}, where R is a fixed recursive set to
be chosen later. It is easy to see that each such class is identifiable by a
recursive, 1-memory-limited function, so it remains to choose R such that
ℒ_R ∉ [ℱ^h-time ∩ ℱ^memory limited]. Fix h, and define a recursive function f by

f(i, σ, x) = 1 if φ_{i,h(τ)}(τ) = φ_{i,h(σ)}(σ), where τ = σ ⌢ x ⌢ σ₀⁽ˣ⁾;
f(i, σ, x) = 0 otherwise.

(Note that equality in the first clause means that both computations con-
verge and are equal.) f is evidently total and recursive.

Fix R recursive, and suppose that φ_i′ ∈ ℱ^h-time ∩ ℱ^n-memory limited is such
that φ_i′ identifies ℒ_R. Let σ′ be a locking sequence for R and φ_i′ such that,
in addition, φ_{i′,h(σ′)}(σ′) converges.

Claim For all but finitely many x, x ∈ R if and only if f(i′, σ′, x) = 1.

Proof of claim If x ∈ R, then with τ_x = σ′ ⌢ x ⌢ σ′₀⁽ˣ⁾ we have φ_i′(τ_x) = φ_i′(σ′). Now for
all but finitely many x, φ_{i′,h(τ_x)}(τ_x) converges. Thus for all but finitely many
x ∈ R, φ_{i′,h(τ_x)}(τ_x) = φ_{i′,h(σ′)}(σ′), and therefore f(i′, σ′, x) = 1. On the other
hand, suppose that f(i′, σ′, x) = 1. Then φ_i′(σ′) = φ_i′(σ′ ⌢ x ⌢ σ′₀⁽ˣ⁾), and this
common value is an index for R. Since σ′ is a locking sequence for R, we also
have that φ_i′(σ′ ⌢ σ′₀⁽ˣ⁾) = φ_i′(σ′). Let t be any text for R. Since φ_i′(σ′ ⌢ σ′₀⁽ˣ⁾) =
φ_i′(σ′ ⌢ x ⌢ σ′₀⁽ˣ⁾) and φ_i′ is n-memory limited, φ_i′(σ′ ⌢ σ′₀⁽ˣ⁾ ⌢ t̄_m) = φ_i′(σ′ ⌢
x ⌢ σ′₀⁽ˣ⁾ ⌢ t̄_m) for every m. But since the former must be an index for R, so is
the latter. Thus x ∈ R, else φ_i′ does not identify R ∪ {x} on the text
σ′ ⌢ x ⌢ σ′₀⁽ˣ⁾ ⌢ t.
The theorem will now be proved if we can show that there is a recursive
set R such that for all i and σ there are infinitely many x such that x ∈ R if
and only if f(i, σ, x) = 0. This follows easily by a direct diagonalization
argument (f is a total recursive function) or by an argument that depends
on lemma 4.3.3A. We leave the details to the reader. □

Proposition 4.4.1D should be compared with proposition 4.2.2A.
Finally, we show that memory limitation restricts the identification of
total, single-valued languages. Indeed, the next proposition provides more
information than this (and implies proposition 4.4.1B).

PROPOSITION 4.4.1E There is ℒ ⊆ RE_svt such that ℒ ∈ [ℱ^rec]_svt −
[ℱ^memory limited]_svt.

Proof Consider the following collection of total recursive functions:

C = {f | f is the characteristic function of a finite set
or f is the characteristic function of N}.

If ℒ is the collection of languages in RE_svt that represent precisely the
functions in C, it is easy to see that ℒ ∈ [ℱ^rec]_svt. Suppose, however, that
φ ∈ ℱ^memory limited identifies ℒ; we may suppose by lemma 4.4.1A that φ is
1-memory limited. Let σ be a locking sequence for φ and (the language
representing) the characteristic function of N. Let D = {x | ⟨x, 1⟩ ∈ rng(σ)},
and let σ′ be a sequence such that τ = σ ⌢ σ′ is a locking sequence for φ and the
characteristic function of D. (The existence of such a σ′ uses corollary 2.1A
to the Blum and Blum locking-sequence lemma.) Let n be an integer such
that neither ⟨n, 0⟩ nor ⟨n, 1⟩ is in rng(τ). Now φ(σ ⌢ ⟨n, 1⟩) = φ(σ),
since σ is a locking sequence for φ and the characteristic function of N, and
so φ(σ ⌢ ⟨n, 1⟩ ⌢ σ′) = φ(σ ⌢ σ′) by the 1-memory limitation of φ.
But then let t be a text that begins with σ ⌢ ⟨n, 1⟩ ⌢ σ′ and continues
with an enumeration of the characteristic function of D except for the pair
⟨n, 0⟩. By 1-memory limitedness and the locking-sequence property of τ,
φ must converge on t to an index for the characteristic function
of D. However, t is a text for the characteristic function of D ∪ {n} and not D,
contradicting the fact that φ identifies the characteristic function of D ∪ {n}. □
COROLLARY 4.4.1A [ℱ^memory limited]_svt ⊂ [ℱ]_svt.

Proposition 4.4.1E implies proposition 4.4.1B. Corollary 4.4.1A should be
compared to proposition 1.4.3C.

Exercises

4.4.1A Specify 1-memory-limited, recursive learning functions that identify the
following collections of languages.

a. {N − {x} | x ∈ N}.
b. RE_sd (see definition 2.3B).
c. {K ∪ {x} | x ∈ K̄}.

4.4.1B Let n ∈ N be given, and let φ ∈ ℱ^n-memory limited identify L ∈ RE. Must there be
a locking sequence σ for φ and L such that lh(σ) ≤ n?

4.4.1C Prove that [ℱ^memory limited] ∩ [ℱ^h-time] = [ℱ^memory limited ∩ ℱ^h-time] for all
total h ∈ ℱ^rec.

*4.4.1D Let a function F : SEQ → SEQ be given. φ ∈ ℱ is called F-biased just in
case for all σ, τ ∈ SEQ, if F(σ) = F(τ) and φ(σ⁻) = φ(τ⁻), then φ(σ) = φ(τ). To illus-
trate, let H : SEQ → SEQ be such that for all σ ∈ SEQ, H(σ) = σ⁻⁵. Then φ ∈ ℱ is
H-biased if and only if φ is 5-memory limited.

a. For n ∈ N, let G_n : SEQ → SEQ be defined as follows. For all σ ∈ SEQ, G_n(σ) is the
sequence that results from removing from σ all numbers greater than n. Thus
G₆(3, 7, 8, 2) = (3, 2). Prove: Let n ∈ N be given. If ℒ ∈ [ℱ^{G_n-biased}], then ℒ is finite.
b. (Gisela Schäfer) For n ∈ N, let H_n : SEQ → SEQ be defined as follows. For
all σ ∈ SEQ, H_n(σ) is the result of deleting all but the last n different elements of σ.
[Thus H₃(8, 9, 4, 6, 6, 2) = (4, 6, 2).] Prove that for all n ≥ 1, [ℱ^rec ∩ ℱ^{H_n-biased}] =
[ℱ^rec ∩ ℱ^1-memory limited].

*4.4.1E (John Canny) σ is said to be a subsequence of τ just in case rng(σ) ⊆ rng(τ).
For i ∈ N, m : SEQ → SEQ is said to be an i-memory function just in case for all
σ ∈ SEQ, (a) m(σ) is a subsequence of σ, (b) lh(m(σ)) ≤ i, and (c) rng(m(σ)) −
rng(m(σ⁻)) ⊆ {σ_{lh(σ)∸1}}. φ ∈ ℱ is said to be i-memory bounded just in case there is
some i-memory function m such that for all σ ∈ SEQ, φ(σ) depends on no more than
σ_{lh(σ)∸1}, φ(σ⁻), and m(σ); that is, just in case for all σ, τ ∈ SEQ, if σ_{lh(σ)∸1} = τ_{lh(τ)∸1},
φ(σ⁻) = φ(τ⁻), and m(σ) = m(τ), then φ(σ) = φ(τ). Thus a memory-bounded learner
chooses his or her current conjecture in light of a short-term memory buffer of finite
capacity that evolves through time. The concept of i-memory bounded generalizes
that of 1-memory limited inasmuch as φ ∈ ℱ is 1-memory limited if and only if φ is
0-memory bounded.
Prove: For all i ∈ N, [ℱ^i-memory bounded] ⊂ [ℱ].

Open question 4.4.1A [ℱ^memory limited] ⊂ [ℱ^1-memory bounded]?

4.4.1F Exhibit ℒ ⊆ RE such that (a) for all L, L′ ∈ ℒ, if L ≠ L′ then L and L′ are
not finite variants, and (b) ℒ ∈ [ℱ] − [ℱ^memory limited].

*4.4.2 Set-Driven Learning Functions

We next consider learning functions that are insensitive to the order in
which data arrive.

DEFINITION 4.4.2A (Wexler and Culicover 1980, sec. 2.2) φ ∈ ℱ is said to
be set driven just in case for all σ, τ ∈ SEQ, if rng(σ) = rng(τ), then φ(σ) =
φ(τ).

Example 4.4.2A

a. The function f defined in part a of example 1.3.4B and the function g defined in the
proof of proposition 1.4.3B are set driven.
b. The function h defined in part c of example 1.4.2A is not set driven.
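Set-drivenness can be enforced mechanically by routing the data through a canonical rearrangement of its range before the learner sees it. A toy sketch (conjectures again modeled as finite sets; `set_driven`, `base`, and `phi` are illustrative names of our own):

```python
# A wrapper that makes any learner set driven: its output depends only on
# rng(sigma), never on the order or multiplicity of the data.
def set_driven(learner):
    """Feed the learner a canonical rearrangement of the range of sigma."""
    return lambda sigma: learner(tuple(sorted(set(sigma))))

base = lambda sigma: frozenset(sigma)   # any learner of finite languages
phi = set_driven(base)
```

Two sequences with the same range now always receive the same conjecture, which is exactly definition 4.4.2A.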

Identification of a language L requires identification of every text for L,
and these texts constitute every possible ordering of L. This consideration
encourages the belief that the internal order of a finite sequence plays little
role in identifiability. The conjecture is correct with respect to ℱ (see
exercise 4.4.2A). However, the next proposition shows that it is wrong with
respect to ℱ^rec.

PROPOSITION 4.4.2A (Gisela Schäfer) [ℱ^rec ∩ ℱ^set driven] ⊂ [ℱ^rec].

Proof For each j define L_j = {⟨j, x⟩ | x ∈ N}. Given j and n, define σ^{j,n} =
⟨j, 0⟩ ⌢ ⟨j, 1⟩ ⌢ ⋯ ⌢ ⟨j, n⟩. Now define

L′_j = rng(σ^{j,n}), if there are n, s, i such that φ_j(σ^{j,n}) = i and
W_{i,s} ⊃ rng(σ^{j,n}), and ⟨n, s⟩ is the least such pair;
L′_j = {⟨j, 0⟩}, otherwise.

Let ℒ = {L_j, L′_j | j ∈ N}. It is easy to see that ℒ ∈ [ℱ^rec]. Suppose, how-
ever, that ℒ ∈ [ℱ^rec ∩ ℱ^set driven], and suppose that φ_j is a set-driven
recursive function identifying ℒ. Now if φ_j identifies ℒ, φ_j identifies the text t =
⟨j, 0⟩ ⌢ ⟨j, 1⟩ ⌢ ⋯ ⌢ ⟨j, n⟩ ⌢ ⋯ for L_j. Thus there must be an n ∈ N and an
index i for L_j such that φ_j(σ^{j,n}) = i. In particular, there must be an n and s
such that φ_j(σ^{j,n}) = i and W_{i,s} ⊃ rng(σ^{j,n}). But then φ_j does not identify
rng(σ^{j,n}), since on the following text t′ = σ^{j,n} ⌢ ⟨j, n⟩ ⌢ ⟨j, n⟩ ⌢ ⟨j, n⟩ ⌢ ⋯
φ_j must conjecture W_i in the limit, φ_j being set driven. Thus φ_j does not
identify ℒ. □

Thus set-drivenness restricts ℱ^rec (but see in this connection exercise
4.4.2C).
Although children are not likely to be set driven, they may well ignore
certain aspects of sentence order in the corpora they analyze.

Exercises

4.4.2A Prove that [ℱ^set driven] = [ℱ].

4.4.2B Let φ ∈ ℱ^set driven identify RE_fin. Show that for all σ ∈ SEQ, σ is a locking
sequence for φ and rng(σ).

4.4.2C Prove that if ℒ contains only infinite languages, then ℒ ∈ [ℱ^rec] if and
only if ℒ ∈ [ℱ^rec ∩ ℱ^set driven].

4.5 Constraints on the Relation between Conjectures

The successive conjectures emitted by an arbitrary learning function need


stand in no particular relation to each other. In this section we consider five
constraints on this relation.
Strategies 75

4.5.1 Conservatism

DEFINITION 4.5.1A (Angluin 1980) φ ∈ ℱ is said to be conservative just in
case for all σ ∈ SEQ, if rng(σ) ⊆ W_{φ(σ⁻)}, then φ(σ) = φ(σ⁻).

Thus a conservative learner never abandons a locally successful conjecture,
that is, a conjecture that generates all the data seen to date.
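The definition can be read as a wrapper on an arbitrary guessing rule: keep the last conjecture whenever it still generates all the data. A minimal Python sketch, under the hypothetical convention that conjectures are finite sets standing in for the languages they index:

```python
def make_conservative(guess):
    """Wrap a guessing rule so the resulting learner is conservative:
    it keeps its previous conjecture whenever that conjecture still
    covers all data seen so far."""
    def learner(sigma):
        sigma = tuple(sigma)
        if not sigma:
            return guess(sigma)
        prev = learner(sigma[:-1])
        if set(sigma) <= prev:   # rng(sigma) is contained in the old conjecture,
            return prev          # ... so a conservative learner keeps it
        return guess(sigma)
    return learner

# The wrapped rule guesses the data plus one extra point, so we can
# watch the wrapper refuse to abandon a still-successful conjecture.
learner = make_conservative(lambda s: set(s) | {max(s, default=0) + 1})
```

For example, after seeing (1, 2) the learner conjectures {1, 2, 3}; the later input 3 is already covered, so the conjecture is retained.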

Example 4.5.1A

a. The function h defined in part c of example 1.3.4B is conservative.
b. Both the function f defined in part a of example 1.3.4B and the function g defined
in the proof of proposition 1.4.3B are conservative.
c. The function f defined in the proof of proposition 2.3A is not conservative.

Conservatism is not restrictive.

PROPOSITION 4.5.1A [ℱ] = [ℱ^conservative].
Proof This proof depends on the characterization of classes ℒ ∈ [ℱ]
given in proposition 2.4A. Recall that if ℒ ∈ [ℱ], then for every L ∈ ℒ there
is a finite set D_L ⊆ L such that if D_L ⊆ L' and L' ∈ ℒ, then L' ⊄ L. Now
given such an ℒ, define f by

f(σ) = f(σ⁻), if W_{f(σ⁻)} ⊇ rng(σ);
     = least i such that W_i = L for some L ∈ ℒ with L ⊇ rng(σ) ⊇ D_L,
       if such an i exists and W_{f(σ⁻)} ⊉ rng(σ);
     = least index for rng(σ), otherwise.

By the first clause of the definition, f is conservative. Note further that for
all γ ∈ SEQ, W_{f(γ)} ⊇ rng(γ) (f is consistent); this fact together with the
first clause of the definition implies that f never returns to a conjectured
language once it abandons a conjecture of that language.

To show that f identifies ℒ, suppose that L ∈ ℒ and t is a text for L. If f(t_n)
is an index for L for any n, then f(t_m) = f(t_n) for all m ≥ n. Further, there is
an n' such that D_L ⊆ rng(t_{n'}) ⊆ L. Thus f will adopt the conjecture of the
least index for L on t_m for some m ≥ n' unless there is an index i for a
language L' ≠ L such that f converges on t to i. Suppose for a contradiction
that such an i exists. Then L' ⊇ rng(t) = L, since W_{f(γ)} ⊇ rng(γ) for all γ. Let
n be least such that f(t_n) = i; f(t_n) was defined by either the second or the
third clause in the definition of f. If f(t_n) was defined by the third clause, L' =
rng(t_n), so that L = rng(t) ⊇ rng(t_n) = L' ⊇ L; thus L = L', contradicting
the assumption that L ≠ L'. Suppose, on the other hand, that f(t_n) is defined
by the second clause of the definition of f, so that D_{L'} ⊆ rng(t_n) ⊆ L'. Thus
L ⊇ D_{L'}, which by the property of D_{L'} implies L ⊄ L'. But this contradicts
L' ⊇ rng(t) = L. □
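For intuition, the f of this proof can be mimicked over finite data when the languages and their tell-tale sets D_L are supplied explicitly. The rendering below is only a toy (conjectures are the sets themselves rather than least indexes, and the family and tell-tales shown are hypothetical):

```python
def telltale_learner(languages, telltales):
    """Sketch of the proof's f: keep the previous guess while it covers
    the data; otherwise take the first language L with D_L <= data <= L;
    otherwise guess the data itself."""
    def f(sigma):
        sigma = tuple(sigma)
        if not sigma:
            return set()
        data = set(sigma)
        prev = f(sigma[:-1])
        if data <= prev:                      # first clause: conservative
            return prev
        for L, D in zip(languages, telltales):
            if D <= data <= L:                # second clause: tell-tale seen
                return set(L)
        return data                           # third clause: the data itself
    return f

Ls = [{0}, {0, 1}, {0, 1, 2}]
Ds = [{0}, {1}, {2}]    # hypothetical tell-tales for this family
f = telltale_learner(Ls, Ds)
```

The tell-tales guarantee that no proper sublanguage in the family contains them, which is what blocks overgeneralization in the second clause.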

On the other hand, conservatism does restrict ℱ^rec.

PROPOSITION 4.5.1B (Angluin 1980) [ℱ^rec ∩ ℱ^conservative] ⊂ [ℱ^rec].

Proof This argument is essentially the same as that of proposition 4.4.2A.
Consider the class ℒ of languages defined there, and suppose that φ_j ∈ ℱ^rec
is a conservative function that identifies ℒ. As we argued in proposition
4.4.2A, φ_j must identify the text t = ⟨j, 0⟩, ⟨j, 1⟩, ⟨j, 2⟩, …, for L_j; thus
there is a least pair ⟨n, s⟩ such that φ_j(σ^{j,n}) = i and W_{i,s} ⊇ rng(σ^{j,n}). Then L'_j =
rng(σ^{j,n}) is not identified by φ_j, since on the text ⟨j, 0⟩, ⟨j, 1⟩, …, ⟨j, n⟩,
⟨j, n⟩, …, φ_j must continue to output i by the conservativeness of φ_j. □
There is a parallelism between consistency and conservatism. Both
strategies embody palpably rational policies for learning, and both constitute
canonical methods of learning in the sense that neither is restrictive; yet
both strategies restrict ℱ^rec. Mechanical learners evidently pay a price for
rationality.

Evidence that children are not conservative learners may be found in
Mazurkewich and White (1984).

Exercises

4.5.1A Prove:
a. [ℱ] = [ℱ^consistent ∩ ℱ^conservative ∩ ℱ^prudent].
b. [ℱ^rec ∩ ℱ^consistent] ⊄ [ℱ^rec ∩ ℱ^conservative].
c. [ℱ^rec ∩ ℱ^conservative] ⊄ [ℱ^rec ∩ ℱ^consistent].

4.5.1B
a. Let φ ∈ ℱ^consistent ∩ ℱ^conservative be given. Show that for all σ ∈ SEQ, σ is a locking
sequence for φ and W_{φ(σ)}.
b. Let φ ∈ ℱ^conservative identify text t. Show that there is no n ∈ N such that
W_{φ(t_n)} ⊋ rng(t). (Thus conservative learners never "overgeneralize" on languages
they identify.)

*4.5.1C Prove that [ℱ^memory limited] = [ℱ^memory limited ∩ ℱ^consistent ∩ ℱ^conservative].

4.5.2 Gradualism

A single sentence probably cannot effect a drastic change in a child's
grammar. We consider a corresponding strategy here. As a special case of
notation introduced in section 2.1, for σ ∈ SEQ and n ∈ N, σ ∧ n is the result of
concatenating n onto the end of σ; thus (6, 2, 4) ∧ 3 is (6, 2, 4, 3).

DEFINITION 4.5.2A φ ∈ ℱ is said to be gradualist just in case for all
σ ∈ SEQ, {φ(σ ∧ n) | n ∈ N} is finite.

Thus, if φ ∈ ℱ is gradualist, then the effect of any single input on φ's latest
conjecture is bounded. An argument similar to that for lemma 4.2.2B
shows that gradualism is not restrictive.

PROPOSITION 4.5.2A [ℱ^gradualist] = [ℱ].
Proof We will give an informal argument to show that if φ ∈ ℱ identifies
ℒ, there is a φ' ∈ ℱ such that φ' identifies ℒ and, for all σ ∈ SEQ,
{φ'(σ ∧ n) | n ∈ N} has size at most 3, by showing that φ' can be constructed
from φ so that it never changes its conjecture in response to a new input by
more than 1. The argument is a fall-behind-on-the-text argument, as in
lemma 4.2.2B. What φ' does on a text t is to simulate φ. Whenever φ
changes its conjecture, say by n, φ' then uses the next n arguments of t to
change its conjecture by ones. If φ converges on t, so will φ', although φ' will
start converging much later on the text. □
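The fall-behind idea can be simulated on the stream of conjectures that some φ emits along a text: the new stream moves by at most one per input yet stabilizes wherever the old one does. This is a simplification of the proof's construction, not a literal rendering:

```python
def gradualize(conjectures):
    """Trail a given conjecture stream, moving at most one step per input.
    If the stream stabilizes at i, the trailing stream stabilizes at i too,
    only later (the 'fall behind' trick, informally)."""
    out, cur = [], 0
    for target in conjectures:
        # move at most one step toward the simulated learner's latest value
        if cur < target:
            cur += 1
        elif cur > target:
            cur -= 1
        out.append(cur)
    return out
```

Adjacent outputs never differ by more than 1, so the trailing learner is gradualist in the sense of definition 4.5.2A.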

Since all the procedures invoked in the preceding proof can be carried out
mechanically, we have the following corollary.

COROLLARY 4.5.2A [ℱ^rec ∩ ℱ^gradualist] = [ℱ^rec].

The next proposition shows that gradualism restricts memory limitation.

PROPOSITION 4.5.2B [ℱ^gradualist ∩ ℱ^memory limited] ⊂ [ℱ^memory limited].

Proof Let L_m be the two-element language {1, m}, and let ℒ =
{L_m | m ∈ N}. Obviously ℒ can be identified by a 1-memory-limited
function. Suppose, however, that φ ∈ ℱ^gradualist ∩ ℱ^memory limited; suppose
for simplicity that φ is 1-memory limited. Consider the texts t^m = 1, m, 1, 1,
1, … . Since φ is gradualist and φ(t^m_1) = φ(t^n_1) for all m, n ∈ N, there are m
and n, m ≠ n, such that φ(t^m_2) = φ(t^n_2). Then, since φ is 1-memory limited
and t^m and t^n agree at every position after the second, φ converges to the
same index on t^m and t^n. But then φ does not identify both L_m and L_n. □

Exercise

4.5.2A φ ∈ ℱ is said to be n-gradualist just in case for all σ ∈ SEQ,
|{φ(σ ∧ x) | x ∈ N}| ≤ n. Note that as a corollary to the proof of proposition
4.5.2A, [ℱ^3-gradualist] = [ℱ] and [ℱ^3-gradualist ∩ ℱ^rec] = [ℱ^rec].
Let n, m ∈ N be given. Let ℒ ⊆ RE be as defined in the proof of proposition 4.5.2B.
Prove: If φ ∈ ℱ^m-gradualist ∩ ℱ^n-memory limited, then φ can identify no more than (2m)^{n+1}
languages in ℒ.

4.5.3 Induction by Enumeration

One strategy for generating conjectures is to choose the first grammar in
some list of grammars that is consistent with the data seen so far.

DEFINITION 4.5.3A (Gold 1967) φ ∈ ℱ is said to be an enumerator just in
case there is a total f ∈ ℱ such that for all σ ∈ SEQ, φ(σ) = f(i), where i is the
least number such that rng(σ) ⊆ W_{f(i)}; in this case f is called the enumerating
function for φ.

The function h defined in part c of example 1.3.4B uses induction by
enumeration; the enumerating function for h is the identity function.
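Over a finite, explicitly listed family of languages, induction by enumeration is directly programmable. The list below is a hypothetical stand-in for an enumerating function ranging over r.e. indexes:

```python
def enumerator(hypotheses):
    """Induction by enumeration: conjecture the least i whose language
    contains all the data seen so far."""
    def learner(sigma):
        data = set(sigma)
        for i, W in enumerate(hypotheses):
            if data <= W:
                return i
        return None   # unreachable if some hypothesis covers the data
    return learner

Ls = [{0}, {0, 1}, {0, 1, 2}]
learn = enumerator(Ls)
```

On the data (0, 2) the learner skips the first two hypotheses, since neither contains 2, and conjectures index 2.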
Induction by enumeration constrains the succession of hypotheses
emitted by a learner. This constraint is restrictive, but not for RE_svt.

PROPOSITION 4.5.3A
i. [ℱ^enumerator] ⊂ [ℱ].
ii. RE_svt ∈ [ℱ^enumerator]_svt.

Proof
i. Let L_n = {x | x ≥ n}, and let ℒ = {L_n | n ∈ N}. ℒ can certainly be identified;
in fact, there is a recursive function that identifies ℒ. Suppose, however,
that φ is an enumerator with enumerating function f. Were φ to identify ℒ,
rng(f) would have to contain indexes for each L_n. Thus there would be i < j
such that W_{f(i)} ⊃ W_{f(j)}, where j is the least number with f(j) an index for
W_{f(j)}. But then φ must, on any text for W_{f(j)}, conjecture f(k) for some
k ≤ i and so does not identify W_{f(j)}.
ii. The function h of proposition 1.4.3B that identifies RE_svt is an enumerator
with enumerating function f(x) = x. □

COROLLARY 4.5.3A [ℱ^enumerator ∩ ℱ^rec] ⊂ [ℱ^rec].

Examination of the proof of proposition 4.5.3A(ii) leads naturally to the
following result.

PROPOSITION 4.5.3B (Gold 1967) Let ℒ ⊆ RE_svt be r.e. indexable. Then
ℒ ∈ [ℱ^rec ∩ ℱ^enumerator]_svt.

Exercises

4.5.3A Prove that for some h ∈ ℱ^rec, [ℱ^h-time ∩ ℱ^consistent ∩ ℱ^conservative ∩
ℱ^prudent] ⊄ [ℱ^enumerator].

4.5.3B A total f ∈ ℱ is called strict just in case i ≠ j implies W_{f(i)} ≠ W_{f(j)}, for all i,
j ∈ N. φ ∈ ℱ^enumerator is called strict just in case φ's enumerating function is strict.
Prove that [ℱ^strict enumerator] = [ℱ^enumerator].

4.5.3C Prove proposition 4.5.3B.

*4.5.4 Caution

Conservative learners do not overgeneralize on languages they do in fact
identify, since once a conservative learner overgeneralizes it is trapped in
that conjecture (see part b of exercise 4.5.1B). However, a conservative
learner may well overgeneralize on a language it does not identify. We now
examine learning functions that behave as if they never overgeneralize.

DEFINITION 4.5.4A φ ∈ ℱ is called cautious just in case for all σ, τ ∈ SEQ,
W_{φ(σ ∧ τ)} is not a proper subset of W_{φ(σ)}.

Thus a cautious learner never conjectures a language that will be "cut back"
to a smaller language by a later conjecture. Both the function f defined in
example 1.3.4B (part a) and the function g defined in the proof of
proposition 1.4.3B are cautious.

Caution is an admirable policy. A text presents no information allowing
the learner to realize that it has overgeneralized; consequently the need to
cut back a conjectured language could result only from a prior miscalculation.
These considerations suggest that caution is not restrictive.

PROPOSITION 4.5.4A [ℱ^cautious] = [ℱ].

Proof The function f defined in the proof of proposition 4.5.1A is cautious.
For if W_{f(σ ∧ τ)} ≠ W_{f(σ)}, then W_{f(σ ∧ τ)} ⊇ rng(σ ∧ τ), but W_{f(σ)} ⊉ rng(σ ∧ τ),
since conjectures are abandoned by f only if they fail to include the input.
Thus W_{f(σ)} ⊅ W_{f(σ ∧ τ)}. □

As in the cases of consistency and conservatism, the calculations required
for a cautious learning policy sometimes exceed the capacities of computable
functions.

PROPOSITION 4.5.4B [ℱ^rec ∩ ℱ^cautious] ⊂ [ℱ^rec].

Proof Again, the class ℒ of languages in the proof of proposition 4.4.2A is
the desired example of a class ℒ ∈ [ℱ^rec] that cannot be identified by any
φ ∈ ℱ^rec ∩ ℱ^cautious. For if φ_j identifies t = ⟨j, 0⟩, ⟨j, 1⟩, …, then as before φ_j
must conjecture some i such that φ_j(σ^{j,n}) = i and W_{i,s} ⊇ rng(σ^{j,n}), where L'_j =
rng(σ^{j,n}) ∈ ℒ. But then φ_j, on the text t' = ⟨j, 0⟩, ⟨j, 1⟩, …, ⟨j, n⟩, ⟨j, n⟩, …, can
never later conjecture any L ⊂ W_i. However, L'_j ⊂ W_i. □

Exercises

4.5.4A Prove that [ℱ^rec ∩ ℱ^conservative ∩ ℱ^cautious] = [ℱ^rec ∩ ℱ^conservative].

4.5.4B Prove that [ℱ^rec ∩ ℱ^cautious] ⊄ [ℱ^rec ∩ ℱ^conservative].
(Hint: See the proof of proposition 4.5.1B.)

*4.5.5 Decisiveness

Let φ ∈ ℱ be a strict enumerator in the sense of exercise 4.5.3B. Then φ
never returns to a conjectured language once it is abandoned. The next
definition isolates those learning functions whose successive conjectures meet
this condition.

DEFINITION 4.5.5A φ ∈ ℱ is called decisive just in case for all σ ∈ SEQ, if
W_{φ(σ⁻)} ≠ W_{φ(σ)}, then there is no τ ∈ SEQ such that W_{φ(σ ∧ τ)} = W_{φ(σ⁻)}.

Both the function f defined in example 1.3.4B (part a) and the function g
defined in the proof of proposition 1.4.3B are decisive.

Like caution, decisiveness appears to be a sensible strategy. It is not
restrictive.

PROPOSITION 4.5.5A [ℱ^decisive] = [ℱ].

Proof The function f defined in the proof of proposition 4.5.1A is decisive,
as was remarked in the proof immediately following the definition of f.
There is a general fact of note here: consistent, conservative learners are
also decisive (see exercise 4.5.5D). □

The next result shows that decisiveness does not restrict ℱ^rec in the
context of RE_svt.

PROPOSITION 4.5.5B (Gisela Schäfer) [ℱ^rec ∩ ℱ^decisive]_svt = [ℱ^rec]_svt.

Proof (due to Gisela Schäfer) Suppose that θ ∈ ℱ^rec identifies
ℒ ⊆ RE_svt. We will define ψ ∈ ℱ^rec ∩ ℱ^decisive which identifies ℒ. It is easy
to see that we need only define ψ on sequences σ of the form σ =
⟨(0, x_0), (1, x_1), …, (n, x_n)⟩ (cf. exercise 4.2.1C). We may also suppose
that the conjectures of θ are indexes of partial recursive functions rather
than indexes for r.e. sets. This is because for j ∈ N we can effectively compute
an index i such that if W_j represents a partial recursive function, W_j
represents φ_i. Define for each i, φ_i[n] = ⟨(0, φ_i(0)), …, (n, φ_i(n))⟩. Suppose
then that σ = ⟨(0, x_0), …, (n, x_n)⟩ and that θ(σ) = i. Let k = 1 +
max{m | θ(σ_m) ≠ i}. Define a recursive h by

φ_{h(i,k)}(x) = x_x, if x ≤ k;
             = φ_i(x), if x > k, φ_i(y) = x_y for all y ≤ k, and θ(φ_i[x]) = i;
             = divergent, otherwise.

Now define ψ(σ) = h(i, k).

Informally, if θ appears to be converging to i after the first k elements of
input, then φ_{h(i,k)} is defined to agree with the input through k elements and,
provided that φ_i agrees with the first k input elements, φ_{h(i,k)} is also defined
to agree with φ_i through the longest initial segment such that φ_i is defined
and θ appears to converge to i on that initial segment of φ_i.

It is clear that ψ(σ⁻) ≠ ψ(σ) if and only if θ(σ⁻) ≠ θ(σ). Further, if θ
converges on an increasing text t for φ_i to i and φ_i is total, then ψ converges
on t to h(i, k), where k = 1 + max{m | θ(t_m) ≠ i}. Also in this case h(i, k) is an
index for φ_i. Thus ψ identifies at least as many total functions as does θ.

To show that ψ is decisive, suppose that σ = ⟨(0, x_0), …, (n, x_n)⟩ and
that ψ(σ) ≠ ψ(σ⁻). Suppose that ψ(σ) = h(i, k) and ψ(σ⁻) = h(i', k'). Note
that k' < n. There are two cases.

Case 1. Suppose that n is not in the domain of φ_{h(i',k')}. Then ψ(σ ∧ τ) is not
an index for φ_{ψ(σ⁻)} for any τ, since the domain of φ_{ψ(σ ∧ τ)} ⊇ {0, 1, 2, …, n} for
all τ by the first clause in the definition of h(i, k).

Case 2. n is in the domain of φ_{h(i',k')}. Then θ(φ_{h(i',k')}[n]) = i', and so, in
particular, σ ≠ φ_{h(i',k')}[n]. But φ_{ψ(σ ∧ τ)} extends σ for all τ ∈ SEQ, again by the
first clause in the definition of h(i, k). Thus in this case also, ψ(σ ∧ τ) is not an
index for φ_{ψ(σ⁻)}. □

Whether decisiveness restricts ℱ^rec in the general case is not known.

Open question 4.5.5A [ℱ^rec ∩ ℱ^decisive] = [ℱ^rec]?

Exercises

4.5.5A φ ∈ ℱ is called weakly decisive just in case for all σ ∈ SEQ, if φ(σ⁻) ≠ φ(σ),
then there is no τ ∈ SEQ such that φ(σ ∧ τ) = φ(σ⁻); that is, weakly decisive learning
functions never repeat a conjecture once it is abandoned. Prove that [ℱ^rec ∩
ℱ^weakly decisive] = [ℱ^rec].

4.5.5B Prove that [ℱ^enumerator] ⊂ [ℱ^decisive].

4.5.5C Prove that [ℱ^rec ∩ ℱ^conservative] ⊂ [ℱ^rec ∩ ℱ^decisive].

4.5.5D Prove that ℱ^consistent ∩ ℱ^conservative ⊆ ℱ^decisive.

4.6 Constraints on Convergence

If φ ∈ ℱ identifies ℒ ⊆ RE, then φ must converge to some index for rng(t) on
every text t for L ∈ ℒ. φ may or may not converge to the same index on
different texts for the same language in ℒ, and φ may or may not converge
on texts for languages outside of ℒ. In this section we consider three
constraints on convergence that limit the freedom of learning functions in
these ways.

4.6.1 Reliability

A learner that occasionally converges to an incorrect language may be
termed "unreliable."

DEFINITION 4.6.1A (Minicozzi, cited in Blum and Blum 1975) φ ∈ ℱ is
called reliable just in case (i) φ is total, and (ii) for all t ∈ 𝒯, if φ converges on t,
then φ identifies t.

Example 4.6.1A

a. The function f in example 1.3.4B (part a) is reliable, for f identifies every text for a
finite language and fails to converge on any text for an infinite language.
b. The function f defined in the proof of proposition 2.3A is not reliable, for if n is an
index for N, then f converges on the text n, n, n, …, but fails to identify it.

Reliability is a useful property of learning functions. A reliable learner
never fails to signal the inaccuracy of a previous conjecture. To explain, let
f ∈ ℱ be reliable, let t be a text for some language, and suppose that for some
i, n ∈ N, f(t_n) = i. If W_i ≠ rng(t), that is, if i is incorrect, then for some m > n,
f(t_m) ≠ i (otherwise, f converges on t to the incorrect index i, contradicting
f's reliability). The new index f(t_m) signals the incorrectness of i. It might
thus be hoped that every identifiable collection of languages is identified by
a reliable learning function. It might also be conjectured that children
implement reliable learning functions, on the assumption that any text for a
nonnatural language would lead a child to search ceaselessly for a successful
grammar, ever elusive. In view of these considerations it is interesting to
learn that reliability is a debilitating constraint on learning functions.

PROPOSITION 4.6.1A Let φ ∈ ℱ^reliable identify L ∈ RE. Then L is finite.

Proof This is a straightforward locking-sequence argument. Suppose that
φ ∈ ℱ^reliable identifies L; let σ be a locking sequence for φ and L. Then, if
t = σ ∧ σ ∧ σ ∧ ⋯, φ converges on t to an index for L. Thus L =
rng(t) = rng(σ), which is finite. □

COROLLARY 4.6.1A [ℱ^reliable] ⊂ [ℱ].

The next definition relativizes reliability to RE_svt.

DEFINITION 4.6.1B (Minicozzi, cited in Blum and Blum 1975) φ ∈ ℱ is
called reliable-svt just in case (i) φ is total, and (ii) for all texts t for any
L ∈ RE_svt, if φ converges on t, then φ identifies t.

Since RE_svt is identifiable (proposition 1.4.3C), RE_svt is identified by a
learning function that is (somewhat vacuously) reliable-svt. The interaction
of ℱ^reliable-svt and ℱ^rec is more interesting, as revealed by the following
results.

PROPOSITION 4.6.1B (Minicozzi, cited in Blum and Blum 1975) Let ℒ,
ℒ' ∈ [ℱ^rec ∩ ℱ^reliable-svt]_svt be given. Then ℒ ∪ ℒ' ∈ [ℱ^rec ∩ ℱ^reliable-svt]_svt.

That is, [ℱ^rec ∩ ℱ^reliable-svt]_svt is closed under finite union (cf. exercise 2.2D).

Proof of proposition 4.6.1B We give an informal proof and invite the
reader to formalize it. Suppose that ψ and ψ' ∈ ℱ^rec ∩ ℱ^reliable-svt identify ℒ
and ℒ' in RE_svt, respectively. Then we define φ ∈ ℱ^rec ∩ ℱ^reliable-svt as
follows. On text t, φ outputs the conjectures of ψ until ψ changes its mind.
Then φ outputs the conjectures of ψ' until it changes its mind, and so forth.
If t is a text for a language L' ∈ ℒ' which ψ does not identify, then φ will
always abandon its ψ-like conjectures for the eventually stable ψ' conjectures.
Similarly for L ∈ ℒ. Should t be a text for a language that is not in
ℒ ∪ ℒ', then φ will change its mind infinitely often. □
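The alternating simulation in this proof is easy to animate: follow the active learner and hand control to the other one at each mind change. Below is a sketch over toy total learners (the two learners shown are assumptions of the illustration, not the ψ and ψ' of the proof):

```python
def combine(psi0, psi1, text):
    """Follow the active learner; at each of its mind changes, hand
    control to the other learner. Returns the combined conjecture stream
    along the initial segments of `text`."""
    learners = (psi0, psi1)
    active, last, out = 0, None, []
    for n in range(1, len(text) + 1):
        cur = learners[active](text[:n])
        if last is not None and cur != last:
            active = 1 - active            # mind change: switch learners
            cur = learners[active](text[:n])
        last = cur
        out.append(cur)
    return out

# psi0 never converges; psi1 is stable, so the combination settles on 99.
stream = combine(lambda s: len(s), lambda s: 99, tuple(range(10)))
```

If neither simulated learner converges on the text, the combined stream keeps switching and so changes its mind infinitely often, matching the reliability argument.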

Now recall definition 1.2.2D of "S ⊆ N represents T ⊆ N²."

DEFINITION 4.6.1C

i. φ ∈ ℱ is said to be almost everywhere zero just in case φ(x) = 0 for all but
finitely many x ∈ N. The collection {L ∈ RE_svt | L represents a function that is
almost everywhere zero} is denoted RE_aez.
ii. φ ∈ ℱ^rec is called self-indexing just in case the smallest x ∈ N such that
φ(x) = 1 is an index for φ. The collection {L ∈ RE_svt | L represents a
self-indexing function} is denoted RE_si.

PROPOSITION 4.6.1C (Blum and Blum 1975)

i. RE_si ∈ [ℱ^rec]_svt.
ii. RE_aez ∈ [ℱ^rec ∩ ℱ^reliable-svt]_svt.
iii. RE_si ∪ RE_aez ∉ [ℱ^rec]_svt.

Proof For i and ii, the obvious methods for identifying RE_si and RE_aez
work.

For iii, suppose that ψ identifies RE_aez. We will define a recursive function
f by the recursion theorem such that f is self-indexing and, if L represents f,
then ψ does not identify L. To apply the recursion theorem, we define a total
recursive function h by the following algorithm:

φ_{h(i)}(x) = 0, if x < i;
φ_{h(i)}(i) = 1.

If x > i and φ_{h(i)}(y) has been defined for all y < x, define φ_{h(i)}(x) as follows.
For every integer n define σ_n = ⟨(0, 0), (1, 0), …, (i − 1, 0), (i, 1), (i + 1,
φ_{h(i)}(i + 1)), …, (x − 1, φ_{h(i)}(x − 1)), (x, 0), …, (x + n, 0)⟩. Enumerate
simultaneously W_{ψ(σ_0)}, W_{ψ(σ_1)}, …, W_{ψ(σ_n)} for increasing n until a pair (x +
M + 1, 0) appears in W_{ψ(σ_M)} for some M. Then define φ_{h(i)}(x) = ⋯ =
φ_{h(i)}(x + M) = 0 and φ_{h(i)}(x + M + 1) = 1. Such an M will exist, since the
sequences σ_0, σ_1, σ_2, …, are initial segments of a text for a function in
RE_aez. Thus φ_{h(i)} is total for every i.

Let i' be such that φ_{h(i')} = φ_{i'}. By the definition of φ_{h(i')}, φ_{i'} = φ_{h(i')} is
self-indexing. But there are infinitely many x such that for the corresponding
M and σ_M, ψ(σ_M) is an index for a set that does not represent φ_{i'}, since
φ_{h(i')}(x + M + 1) = φ_{i'}(x + M + 1) = 1, but (x + M + 1, 0) ∈ W_{ψ(σ_M)}.
These σ_M are initial segments of the same text t for φ_{i'}, so ψ does not
identify t, a text for a language in RE_si. □
Thus [ℱ^rec]_svt is not closed under finite union (cf. exercise 2.2D). The
following corollaries are immediate from the two preceding propositions.

COROLLARY 4.6.1B RE_si ∉ [ℱ^rec ∩ ℱ^reliable-svt]_svt.

COROLLARY 4.6.1C [ℱ^rec ∩ ℱ^reliable-svt]_svt ⊂ [ℱ^rec]_svt.

Exercises

4.6.1A φ ∈ ℱ is called weakly reliable just in case for all texts t for any L ∈ RE, if φ
converges on t, then φ identifies t. (Thus weakly reliable learning functions need not
be total.) Prove the following strengthened version of proposition 4.6.1A: let
φ ∈ ℱ^weakly reliable identify L ∈ RE. Then L is finite.

4.6.1B (Blum and Blum 1975) Let a total f ∈ ℱ^rec be given. Suppose that for all j ∈ N,
φ_{f(j)} ∈ ℱ^rec ∩ ℱ^reliable-svt. Show that ⋃_{j∈N} [φ_{f(j)}]_svt ∈ [ℱ^rec ∩ ℱ^reliable-svt]_svt. This
result generalizes proposition 4.6.1B.

*4.6.1C φ ∈ ℱ is called finite-difference reliable just in case for all texts t for any
L ∈ RE, if φ converges on t, then φ converges to an index for a finite variant of rng(t)
(see definition 2.3A). Thus φ ∈ ℱ is finite-difference reliable just in case φ never
converges to a conjecture that is "infinitely wrong." Reliability is a special case of
finite-difference reliability. Prove the following strengthened version of proposition
4.6.1A: let finite-difference reliable φ ∈ ℱ identify L ∈ RE. Then L is finite.

4.6.2 Confidence

A learner that converges on every text may be termed "confident."

DEFINITION 4.6.2A φ ∈ ℱ is called confident just in case for all t ∈ 𝒯, φ
converges on t.

Thus confidence is the mirror image of reliability.

Example 4.6.2A

a. The function f defined in the proof of proposition 2.3A is confident.
b. Neither the function f defined in example 1.3.4B (part a) nor the function g
defined in the proof of proposition 1.4.3B is confident, since neither converges on any
text for N.

Children are confident learners if they eventually settle for some
approximation to their input, even on texts for nonnatural languages.
PROPOSITION 4.6.2A [ℱ^confident] ⊂ [ℱ].

Proof RE_fin ∈ [ℱ]. Suppose that φ ∈ ℱ identifies RE_fin. We construct
sequences σ⁰, σ¹, …, such that φ does not converge on σ⁰ ∧ σ¹ ∧ ⋯,
demonstrating that φ ∉ ℱ^confident. Let σ⁰ be the shortest sequence of zeros
such that φ(σ⁰) is an index for {0}. Since φ identifies {0}, σ⁰ exists. Now let
σ¹ be the shortest sequence of ones such that φ(σ⁰ ∧ σ¹) is an index for
{0, 1}. Given σ^{n−1}, let σⁿ be the shortest sequence of n's such that
φ(σ⁰ ∧ ⋯ ∧ σⁿ) is an index for {0, 1, …, n}. Obviously φ does not converge
on σ⁰ ∧ ⋯ ∧ σⁿ ∧ ⋯. □
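The diagonalization can be run concretely against a toy identifier of the finite languages, here one that simply conjectures the content of its input (an assumption of this illustration, not the φ of the proof):

```python
def diagonal_text(phi, stages):
    """Build the text sigma^0 ^ sigma^1 ^ ... of the proof: at stage n,
    append copies of n until phi conjectures {0, ..., n}."""
    text = []
    for n in range(stages):
        target = frozenset(range(n + 1))
        while phi(text) != target:
            text.append(n)
    return text

# A toy identifier of the finite languages: conjecture the content seen.
content_learner = lambda sigma: frozenset(sigma)
t = diagonal_text(content_learner, 4)
```

Along the resulting text the learner's conjecture grows at every stage, so it never converges on the infinite continuation of this construction.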

The next proposition shows that confidence and ℱ^rec restrict each other.
First, we prove a lemma.

LEMMA 4.6.2A Let φ ∈ ℱ^confident be given. Then for every L ∈ RE, there is
σ ∈ SEQ such that (i) rng(σ) ⊆ L, and (ii) for all τ ∈ SEQ such that rng(τ) ⊆ L,
φ(σ ∧ τ) = φ(σ).

Proof This is much like the proof of proposition 2.1A, the locking-sequence
lemma. If such a σ did not exist, we could construct a text t for L on
which φ does not converge, contradicting its confidence. □
PROPOSITION 4.6.2B [ℱ^rec ∩ ℱ^confident] ⊂ [ℱ^rec] ∩ [ℱ^confident].

Proof ℒ = {K ∪ {x} | x ∈ K̄} is the needed collection. We have noted
before that ℒ ∈ [ℱ^rec] (see exercise 4.2.1A). The following defines
f ∈ ℱ^confident which identifies ℒ:

f(σ) = index for K, if rng(σ) ⊆ K;
     = index for K ∪ {x}, if x is the least element of rng(σ) − K.

Suppose, however, that φ ∈ ℱ^rec ∩ ℱ^confident and identifies ℒ. By lemma
4.6.2A there is a sequence σ such that rng(σ) ⊆ K and, if rng(τ) ⊆ K, φ(σ ∧ τ) =
φ(σ). But then, much as in the proof of lemma 4.2.1C, we now have a way
of enumerating K̄, for x ∈ K̄ if and only if there is a sequence τ such that
rng(τ) ⊆ K and φ(σ ∧ x ∧ τ) ≠ φ(σ). □

Exercises

4.6.2A Recall the definition of ℒ × ℒ' from exercise 1.4.3F. Prove: Let
ℒ ∈ [ℱ^confident] and ℒ' ∈ [ℱ^confident] be given. Then ℒ × ℒ' ∈ [ℱ^confident].

4.6.2B ℒ ⊆ RE is called a w.o. chain just in case ℒ is well ordered by inclusion.
a. Exhibit an infinite w.o. chain in [ℱ^rec].
b. Prove: If ℒ ⊆ RE is an infinite w.o. chain, then ℒ ∉ [ℱ^confident].
c. φ ∈ ℱ is called conjecture bounded (or cb) just in case for every t ∈ 𝒯, {φ(t_m) | m ∈ N}
is finite. Thus φ ∈ ℱ^cb just in case no text leads φ to produce conjectures of arbitrary
size. Prove: Let ℒ be an infinite w.o. chain; then ℒ ∉ [ℱ^cb].
d. Prove that [ℱ^confident] = [ℱ^cb].

4.6.2C
a. ℒ ⊆ RE is called maximal just in case ℒ ∈ [ℱ] and there is L ∈ RE such that
ℒ ∪ {L} ∉ [ℱ]. To illustrate, proposition 2.2A(ii) shows that the collection {N −
{x} | x ∈ N} is maximal. (Compare the definition of "saturated" given in exercise
2.2E.) Prove that if ℒ ⊆ RE is maximal, then ℒ ∉ [ℱ^confident]. Obtain proposition
4.6.2A as a corollary to this result.
b. Prove: Let ℒ ∈ [ℱ^confident] and ℒ' ∈ [ℱ^confident] be given. Then
ℒ ∪ ℒ' ∈ [ℱ^confident]. Obtain part a of the present exercise as a corollary to this
result.

4.6.3 Order Independence

As a final constraint on convergence we consider learning functions that
converge to the same index on every text for a language they identify.

DEFINITION 4.6.3A (Blum and Blum 1975) φ ∈ ℱ is called order independent
just in case for all L ∈ RE, if φ identifies L, then there is i ∈ N such that
for all texts t for L, φ converges on t to i.

Thus an order-independent learning function is relatively insensitive to the
choice of text for a language it identifies: any such text eventually leads it to
the same index (even though such behavior is not required by the definition
of identification). Note that different texts for the same identified language
may cause an order-independent learning function to consume different
amounts of input before convergence begins (just as for order-dependent
learning functions).

Example 4.6.3A

a. Both the function f defined in example 1.3.4B, part b, and the function g defined in
the proof of proposition 1.4.3B are order independent.
b. The function f defined in the proof of proposition 2.3A is order independent.
c. The function f in the proof of proposition 4.4.1A is order independent.

It is easy to see that order independence is not restrictive. The relation of
order independence to ℱ^rec is a more delicate matter; the following
consideration suggests that it is restrictive. An order-independent learning
policy seems to require the ability to determine the equivalence of distinct
indexes. But the equivalence question cannot in general be answered by a
computational process; indeed, the set {(i, j) | W_i = W_j} is not even r.e. (see
Rogers 1967, sec. 5.2). Contrary to expectation, however, order independence
turns out not to restrict ℱ^rec.

PROPOSITION 4.6.3A (Blum and Blum 1975) [ℱ^rec ∩ ℱ^order independent] =
[ℱ^rec].

The proof of proposition 4.6.3A depends on a very important construction
due to Blum and Blum. We will first give this construction and then
derive from it a corollary concerning classes ℒ ∈ [ℱ^rec]. We will then use
this corollary to establish proposition 4.6.3A. Following that, we will
establish lemmas 4.3.4A and 4.3.4B and thereby Fulk's proposition that
prudence does not restrict recursive learning functions (proposition 4.3.4A).

Construction (the locking-sequence-hunting construction) Suppose that
φ ∈ ℱ^rec identifies ℒ. Now by lemma 4.2.2B there is a total recursive
function f such that f identifies ℒ. By proposition 2.1A, for every L ∈ ℒ
there is a locking sequence σ for f and L. We will construct g ∈ ℱ^rec so that
on any text t for L, g converges to f(σ), where σ is the least locking sequence
for f and L. (Recall that sequences are identified with natural numbers, so
that the terminology "least locking sequence" is appropriate.)

Given τ, let σ be the least sequence such that σ ≤ τ and

1. rng(σ) ⊆ rng(τ),
2. for all γ ≤ τ such that
   a. rng(γ) ⊆ rng(τ),
   b. σ ⊆ γ,
   c. lh(γ) ≤ lh(τ),
   f(σ) = f(γ).

σ exists, since τ itself satisfies 1 and 2. Define g(τ) = f(σ). g is recursive, since
only finitely many sequences need be checked to define g(τ).
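Over a toy, everywhere-defined f, the finite search defining g(τ) can be programmed directly. In this sketch sequences are ordered by length and then lexicographically, a stand-in for the book's coding of sequences by numbers, and the bound γ ≤ τ is simplified away:

```python
from itertools import product

def g(f, tau):
    """Return f(sigma) for the least sigma built from rng(tau), of length
    at most lh(tau), such that f is constant on all extensions gamma of
    sigma with rng(gamma) within rng(tau) and lh(gamma) <= lh(tau).
    tau itself always qualifies, so the search terminates."""
    content, n = sorted(set(tau)), len(tau)
    candidates = [()] + [s for k in range(1, n + 1)
                         for s in product(content, repeat=k)]
    for sigma in candidates:
        extensions = (sigma + e for k in range(n - len(sigma) + 1)
                      for e in product(content, repeat=k))
        if all(f(ext) == f(sigma) for ext in extensions):
            return f(sigma)
```

With a constant f the empty sequence already "locks," so g returns f's value immediately; with a data-sensitive f the search must move to longer candidates.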
Claim If f identifies L and t is a text for L, then g converges on t to f(σ),
where σ is the least locking sequence for f and L.

Proof of claim Let σ be the least locking sequence for f and L, and let n be
such that rng(σ) ⊆ rng(t_n) and σ ≤ t_n. Since σ is a locking sequence for L
and t is a text for L, it is clear that for every m ≥ n, σ satisfies both 1 and 2 for
τ = t_m. Thus for each m ≥ n, g(t_m) = f(σ) unless there is σ' < σ such that σ'
also satisfies 1 and 2 for τ = t_m. Since no such σ' can be a locking sequence
for f and L, there must be a γ such that γ ⊇ σ', rng(γ) ⊆ L, and f(γ) ≠ f(σ').
(Otherwise, either σ' would be a locking sequence for f and L, or f would
converge on some text for L to an index, f(σ'), for a language other than
L.) But then if m is such that rng(γ) ⊆ rng(t_m) and γ ≤ t_m, σ' cannot satisfy 1
and 2 with τ = t_m. Thus, for almost all m, g cannot conjecture f(σ') for any
σ' < σ. □

COROLLARY 4.6.3A For every ℒ ∈ [ℱ^rec], there is a g ∈ ℱ^rec that identifies
ℒ such that for all L ∈ RE, g identifies L if and only if there is a locking
sequence for g and L. Furthermore, if g identifies L, g converges to g(σ),
where σ is the least locking sequence for g and L.

To prove the corollary, we need to modify the construction slightly.

Proof of corollary 4.6.3A Let p be a recursive padding function as
supplied by lemma 4.4.1B. In the construction, g(τ) was defined to be equal
to f(σ) for some σ. Modify the construction so that g(τ) = p(f(σ), σ).

Now if g identifies L ∈ RE, there is a locking sequence for g and L; this is
just proposition 2.1A. So suppose conversely that L ∈ RE and that σ is a
locking sequence for g and L. This means that there is a σ' ≤ σ such that
g(σ) = p(f(σ'), σ'), and furthermore for all γ such that σ ⊆ γ and rng(γ) ⊆ L,
g(γ) = p(f(σ'), σ'). Now suppose that t is any text for L. Let n be such that
rng(σ) ⊆ rng(t_n) and σ ≤ t_n. Then g(t_n) = p(f(σ'), σ'), since σ' satisfies 1 and 2
for τ = t_n; and if σ'' < σ' satisfied 1 and 2, then σ'' would satisfy 1 and 2 for
τ = σ also, contradicting g(σ) = p(f(σ'), σ'). Thus on t, g converges to
p(f(σ'), σ'). □

Proof of proposition 4.6.3A Let ℒ ∈ [ℱ^rec] and g be as in the proof
of the corollary. Then if g identifies L, g converges on every text t for L
to g(σ), where σ is the least locking sequence for g and L. Thus g is order
independent. □

We may now return to lemmas 4.3.4A and 4.3.4B, whose proofs were
deferred to this section.

LEMMA 4.3.4A [ℱ^rec] is r.e. bounded if and only if [ℱ^rec] =
[ℱ^rec ∩ ℱ^prudent].

Proof Suppose first that [ℱ^rec] = [ℱ^rec ∩ ℱ^prudent]. Let ℒ ∈ [ℱ^rec]. Let
φ ∈ ℱ^rec ∩ ℱ^prudent identify ℒ. Then rng(φ) is an r.e. set S, since φ is
recursive. Since φ is prudent, φ identifies ℒ' = {W_i | i ∈ S}. ℒ' is r.e. indexable
and so witnesses that [ℱ^rec] is r.e. bounded.

Suppose, on the other hand, that [ℱ^rec] is r.e. bounded. Let ℒ ∈ [ℱ^rec], and
let ℒ' ⊇ ℒ be such that some φ ∈ ℱ^rec identifies ℒ' and ℒ' is r.e. indexable,
say by the r.e. set S. By proposition 4.6.3A, let g be a total order-independent
recursive function that identifies ℒ'' ⊇ ℒ'. We show how to construct a
prudent f that identifies ℒ' (and hence ℒ). Let s_0, s_1, …, be a recursive
enumeration of S. If lh(σ) = 1, define f(σ) = s_0. If n = lh(σ) > 1, for j ≤ n,
let σ^j be a sequence constructed from the elements that have been
enumerated in W_{s_j} by stage n, in order of enumeration. Then define

f(σ) = s_i, if i ≤ n is least such that g(σ^i) = g(σ);
     = s_0, if there is no such i.

Since g is order independent and identifies ℒ', f will converge on any text t
for L ∈ ℒ' to s_i for the least i such that s_i is an index for L. Since every index
in S is an index for a language in ℒ' and f outputs only indexes from S, f is
prudent. □

LEMMA 4.3.4B (Mark Fulk) [ℱ^rec] is r.e. bounded.

Proof (Mark Fulk) Let ℒ be in [ℱ^rec]. By corollary 4.6.3A, ℒ is identifiable by some g such that g identifies L if and only if there is a locking sequence for g and L. We now give an r.e. indexable collection ℒ' such that ℒ' ∈ [ℱ^rec] and ℒ' ⊇ ℒ, thereby exhibiting that [ℱ^rec] is r.e. bounded. There are two cases.

Case 1. Suppose that g identifies N.

Define a recursive function f by

W_{f(σ)} = ∅, if rng(σ) ⊈ W_{g(σ)},
W_{f(σ)} = W_{g(σ)}, if σ is a locking sequence for g and W_{g(σ)},
W_{f(σ)} = N, otherwise.

To see that f defined in this way is recursive, we argue informally. Given σ, to enumerate W_{f(σ)}, compute g(σ) and enumerate nothing in W_{f(σ)} until a stage s such that W_{g(σ),s} contains all of rng(σ). Then begin enumerating all of W_{g(σ)} into W_{f(σ)} until there is a sequence γ such that γ ⊇ σ, rng(γ) ⊆ W_{g(σ)}, and g(γ) ≠ g(σ). If such a γ exists, then begin enumerating all of N into W_{f(σ)}.
Since f is recursive, S = {f(σ) | σ ∈ SEQ} is r.e. On the other hand, since ∅ and N are identified by g and since g identifies every L such that there is a locking sequence for g and L, g identifies every language with an index in S.
Case 2. g does not identify N.

Define a recursive function f by

W_{f(σ)} = ∅, if rng(σ) ⊈ W_{g(σ)},
W_{f(σ)} = W_{g(σ)}, if σ is a locking sequence for g and W_{g(σ)},
W_{f(σ)} = {0, 1, ..., y}, where y is the maximum element enumerated in W_{f(σ)} when it is discovered that σ is not a locking sequence for W_{g(σ)}.

Again, it is easy to give an informal description of the algorithm for enumerating W_{f(σ)} given σ.

The set S = {f(σ) | σ ∈ SEQ} is an r.e. index set for some collection of languages, ℒ'. Let r be a total recursive function such that for every σ ∈ SEQ, W_{r(σ)} = rng(σ). To see that ℒ' ∈ [ℱ^rec], define h ∈ ℱ^rec as follows:

h(σ) = r(σ), if rng(σ) is an initial segment of N,
h(σ) = g(σ), otherwise.

Since g does not identify N, h identifies all that g does together with all initial segments of N (cf. exercise 4.2.1H). □

Exercises

4.6.3A Show that [ℱ^rec ∩ ℱ^order independent ∩ 𝒮] = [ℱ^rec ∩ 𝒮] as 𝒮 varies over the following strategies:
a. Nontriviality
b. Prudence
c. Consistency
d. Memory limitation
e. Confidence

*4.6.3B (Gisela Schafer) φ ∈ ℱ is said to be partly set driven just in case for all σ, τ ∈ SEQ, if lh(σ) = lh(τ) and rng(σ) = rng(τ), then φ(σ) = φ(τ). Prove that [ℱ^rec ∩ ℱ^partly set driven] = [ℱ^rec].

4.6.3C t, t' ∈ 𝒯 are said to be cousins just in case
a. rng(t) = rng(t'),
b. there are n, m ∈ N such that t_{j+m} = t'_{j+n} for all j ∈ N.
φ ∈ ℱ is called monotonic just in case for all t, t' ∈ 𝒯, if t and t' are cousins and φ identifies t, then φ identifies t'.
Prove that [ℱ^monotonic] = [ℱ]. (Hint: See exercise 4.4.2A.)

Open question 4.6.3A [ℱ^rec ∩ ℱ^monotonic] = [ℱ^rec]?

*4.7 Local and Nonlocal Strategies

There are an overwhelming number of potential learning strategies. As a consequence, classificatory schemes are needed to suggest general properties of large classes of strategies. Two classificatory principles have already been advanced in the discussions of countable strategies (section 4.1) and r.e. bounded strategies (definition 4.3.4B). In addition, exercise 4.7C defines a classification of subsets of ℱ^rec. The classification provided by the titles of sections 4.2 through 4.6 might also serve as the beginning of a classificatory scheme, if it could be rendered formally precise. In the present section we offer yet another classificatory principle.
Compare the strategies of consistency (definition 4.3.3A) and confidence (definition 4.6.2A). Intuitively, membership in consistency can be determined by examining a function's behavior in many "small" situations; specifically, it is sufficient to determine whether rng(σ) ⊆ W_{φ(σ)} for every σ ∈ SEQ. Since SEQ is infinite, there are infinitely many situations of this nature to check; nonetheless, each such situation is "small" because each σ ∈ SEQ is finite. In contrast, this kind of checking is useless for determining membership in confidence. Rather, determination of confidence requires examining a function's behavior on entire texts in order to verify convergence. In this sense consistency, but not confidence, may be termed a "local" strategy. Looked at from another perspective, a learner can "decide" to embody a given local strategy by pursuing a policy bearing on small situations. In contrast, to embody a nonlocal strategy, the learner must arrange his or her local behavior in such a way as to conform to a more global criterion.
We now make this precise.

DEFINITION 4.7A

i. The set {φ ∈ ℱ | the domain of φ is finite} is denoted: ℱ^fin.
ii. Let learning strategy 𝒮 be given. 𝒮 is called local just in case there is a subset F of ℱ^fin such that for all φ ∈ ℱ, φ ∈ 𝒮 if and only if {ψ ∈ ℱ^fin | ψ ⊆ φ} ⊆ F.

The subset F of ℱ^fin in definition 4.7A(ii) represents the set of local examinations that enforce membership in 𝒮.
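The locality condition can be illustrated with a small toy computation (a sketch, not part of the text: finite learning functions are modeled as Python dicts from finite sequences to conjectured indices, and a finite family W stands in for the r.e. sets W_i; all names here are invented, and membership is decidable only because the toy languages are finite).

```python
# A toy indexed family: W[i] stands in for the r.e. set W_i (finite here,
# so that membership is decidable, unlike the general case).
W = {0: {1, 2}, 1: {1, 2, 3}, 2: set(range(10))}

def passes_local_exam(sigma, conjecture):
    """One 'small' situation: does the conjecture cover rng(sigma)?"""
    return set(sigma) <= W[conjecture]

def lies_in_F(phi):
    """phi (a dict: finite sequence -> index) belongs to the set F of local
    examinations for consistency iff each conjecture covers the content of
    the sequence it responds to."""
    return all(passes_local_exam(sigma, i) for sigma, i in phi.items())

phi_good = {(1,): 0, (1, 2): 0, (1, 2, 3): 1}
phi_bad = {(1,): 0, (3,): 0}   # conjectures W_0 on a sequence with content {3}
assert lies_in_F(phi_good)
assert not lies_in_F(phi_bad)
```

Consistency is local precisely because, as above, each finite restriction can be examined on its own; no infinite object (a whole text) ever needs to be inspected.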

LEMMA 4.7A
i. There are 2^ℵ₀ many local strategies.
ii. There are 2^(2^ℵ₀) many strategies that are not local.

Proof
i. There are ℵ₀ many functions in ℱ^fin. Thus there are 2^ℵ₀ many subsets F of ℱ^fin. The local strategies are in one-to-one correspondence with such subsets.
ii. There are 2^ℵ₀ many functions in ℱ, so 2^(2^ℵ₀) many subsets of ℱ. Thus there are 2^(2^ℵ₀) − 2^ℵ₀ = 2^(2^ℵ₀) many nonlocal strategies. □

Thus "most" learning strategies are not local.

PROPOSITION 4.7A The following learning strategies are local:

i. Nontriviality
ii. Consistency
iii. 1-memory limitation
iv. Conservatism

Proof These are very easy. We will just say how to choose the F in the definition of locality.
i. F = ℱ^fin ∩ ℱ^nontrivial.
ii. F = {φ | domain of φ is finite and if φ(σ) converges, then rng(σ) ⊆ W_{φ(σ)}}.
iii. F = ℱ^fin ∩ ℱ^1-memory limited.
iv. F = ℱ^fin ∩ ℱ^conservative. □

PROPOSITION 4.7B The following learning strategies are not local:

i. Computability
ii. Prudence
iii. Reliability
iv. Confidence
v. Order independence

Proof The key property of each of these strategies 𝒮 is that for every φ ∈ ℱ^fin, there is a φ' ∈ 𝒮 such that φ ⊆ φ'.
Thus for each of the strategies listed, were it to be local, the set F would equal all of ℱ^fin. But this would imply that 𝒮 = ℱ, which we know to be false for all of the strategies listed. □

Exercises

4.7A Show that memory limitation is not a local strategy.


4.7B Classify the remaining learning strategies discussed in sections 4.2 through
4.6 in terms of locality.
4.7C Let 𝒮 ⊆ ℱ^rec be given. 𝒮 is said to be r.e. indexable just in case 𝒮 = {φ_j | j ∈ W_i} for some i ∈ N. 𝒮 is said to have an r.e. core just in case there is 𝒮' ⊆ 𝒮 such that 𝒮' is r.e. indexable and [𝒮'] = [𝒮]. A strategy without r.e. core may be considered intrinsically complex, in a certain sense.
a. Show that ℱ^rec ∩ ℱ^total has an r.e. core. (Hint: See proposition 4.2.2A.) Conclude that there are non-r.e. indexable strategies with r.e. cores.
b. Show that not every 𝒮 ⊆ ℱ^rec has an r.e. core. (Hint: Consider ℱ^rec ∩ ℱ^nontrivial; see section 4.3.2.)
5 Environments

5.1 Order and Content in Natural Environments

The identification paradigm offers a specific hypothesis about natural linguistic environments. According to this hypothesis children are typically exposed to an arbitrary ordering of the target language, an ordering that includes every grammatical string of the language and is free of ungrammatical intrusion. Alternatively, the identification paradigm may be construed as advancing a claim about the class of linguistic environments that are sufficient for (natural) language acquisition. According to this claim children can acquire a natural language L in any environment that constitutes a text for L; if desired, the claim may be strengthened by the assertion that no other environment is sufficient in this sense. As noted in section 3.2.3, such construals of a learning paradigm require an independent definition of natural language, on pain of circularity; one possibility is to qualify a language as natural just in case it can be learned on any text at all.
Taken either way, the class of all texts for a language is a questionable representation of the possible environments for that language. On the one hand, children likely face ungrammatical intrusions into the ambient language as well as the omission therefrom of grammatical sentences; minor perturbations of this kind would not be expected to influence the outcome of linguistic development. On the other hand, it is unlikely that children face an entirely arbitrary order of input sentences. In this chapter we examine several construals of environment that begin to respond to these difficulties.
As a preliminary it will be useful to expand somewhat our conception of text. This is the topic of the next section.

5.2 Texts with Blanks

It is tempting to conceive of the length of a finite sequence as a temporal interval measured in discrete, standardized units; a sequence of length n would thus represent the linguistic experience available to a child over n such temporal units. Unfortunately the existence of long sentences complicates this conception, since many such sentences will not "fit" into a single temporal unit (no matter what size unit is specified). There are several ways to resolve this problem, but we will not pause to examine the matter. Rather, our present concern is that any temporal construal of sequences must allow for "pauses": those moments when no sentences are presented to the child. We shall now accommodate such pauses by incorporating blanks into texts.

DEFINITION 5.2A

i. We let # be a special "blank" symbol (in particular, # ∉ N).
ii. A text# is any infinite sequence drawn from N ∪ {#}.
iii. The set of numbers appearing in a text# t is denoted: rng(t) (thus # ∉ rng(t)).
iv. A text# t is said to be for L ∈ RE just in case rng(t) = L.

Intuitively a text# for L results from inserting any number of blanks into a text for L.
Our notation for texts may be carried over to texts#. Thus for n ∈ N and text# t, t_n is the nth member of t (number or blank), and t̄_n is the sequence determined by the first n members of t. The set of finite sequences of any length drawn from any text# is denoted: SEQ#. For σ ∈ SEQ#, the (unordered) set of numbers in σ is denoted: rng(σ) (thus # ∉ rng(σ)). As in section 1.3.4, we fix upon a computable isomorphism between SEQ# and N, and denote the code number of σ ∈ SEQ# by ⌜σ⌝.
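As a concrete illustration (a sketch under an assumed encoding, not the book's formalism: blanks are represented by a distinguished non-numeric value, and BLANK, rng, initial_segment are invented names):

```python
BLANK = None  # stands in for the blank symbol '#'

def rng(seq):
    """The unordered set of numbers in a sequence, ignoring blanks."""
    return {x for x in seq if x is not BLANK}

def initial_segment(t, n):
    """t-bar_n: the sequence determined by the first n members of t."""
    return tuple(t[k] for k in range(n))

sigma = (3, BLANK, 5, 3, BLANK)
assert rng(sigma) == {3, 5}
assert initial_segment(sigma, 2) == (3, BLANK)
```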
The notions of convergence and identification are adapted to texts# in straightforward fashion. To be official:
DEFINITION 5.2B Let φ ∈ ℱ be given, and let t be a text#.

i. φ is defined on t just in case φ(t̄_n)↓ for all n ∈ N.
ii. φ converges on t to i ∈ N just in case φ is defined on t and, for all but finitely many n ∈ N, φ(t̄_n) = i.
iii. φ identifies# t just in case there is i ∈ N such that φ converges on t to i and rng(t) = W_i.
iv. φ identifies# L ∈ RE just in case φ identifies# every text# for L.
v. φ identifies# ℒ ⊆ RE just in case φ identifies# every L ∈ ℒ; in this case ℒ is said to be identifiable#.
The following proposition is evident.

PROPOSITION 5.2A

i. ℒ ⊆ RE is identifiable# if and only if ℒ is identifiable.
ii. Some φ ∈ ℱ^rec identifies# ℒ ⊆ RE if and only if some φ ∈ ℱ^rec identifies ℒ.

Moreover it is easy to verify that none of the propositions proved to this point need be altered if "identify#" is substituted for "identify" therein.
We shall hereafter restrict attention to texts#, leaving the class of texts, as such, to one side. Several simplifications will thereby be realized.
To reduce notational clutter, we henceforth abbreviate "text#" and "identify#" to "text" and "identify," respectively. All such terminology should now be understood in the context of texts with blanks.

Exercises

5.2A
a. Since the empty set is recursively enumerable, ∅ ∈ RE. On the revised conception of text, what text is for ∅? Is ∅ ∈ ℒ(φ) for all φ ∈ ℱ? (For notation, see exercise 1.4.3K.)
b. Prove: For all φ ∈ ℱ^rec, there is ψ ∈ ℱ^rec such that ℒ(ψ) = {∅} ∪ ℒ(φ).

5.2B On the revised conception of text, how many texts are there for the language {2}?

5.3 Evidential Relations

We now present one means of generalizing the concept "text t is for language L." This generalization gives rise to all the paradigms studied in this chapter.

DEFINITION 5.3A
i. The class of all texts (as understood from section 5.2) is denoted: 𝒯.
ii. 𝒯 × RE (the Cartesian product of 𝒯 and RE) is the set of all ordered pairs (t, L) such that t is a text and L is a language. A subset of 𝒯 × RE is called an evidential relation.
iii. The evidential relation {(t, L) | rng(t) = L} is denoted: text.

An evidential relation is a means of specifying the environments that count as "for" a given language. Specifically, let ℰ be an evidential relation, and let L ∈ RE; then {t | (t, L) ∈ ℰ} is the set of environments for L, relative to ℰ. The evidential relation proper to the identification paradigm is text.

DEFINITION 5.3B Let evidential relation ℰ be given.

i. φ ∈ ℱ is said to identify L ∈ RE on ℰ just in case for all t ∈ {t | (t, L) ∈ ℰ}, φ converges on t to an index for L.
ii. φ ∈ ℱ is said to identify ℒ ⊆ RE on ℰ just in case φ identifies every L ∈ ℒ on ℰ. In this case ℒ is said to be identifiable on ℰ.

Definition 5.3B generalizes the definitions of section 1.4.

Example 5.3A

a. ℒ ⊆ RE is identifiable on text if and only if ℒ is identifiable.
b. Let ℰ₁ be the evidential relation {(t, L) | L ⊇ rng(t) ≠ ∅}. Then φ ∈ ℱ identifies L ∈ RE on ℰ₁ just in case φ converges to an index for L on every text whose range is nonempty and included in L. Plainly, if ℒ ⊆ RE is identifiable on ℰ₁, then ℒ is identifiable. The converse is false.
c. Let L, L' ∈ RE be such that L ≠ L'. Let evidential relation ℰ₁ be such that {t | (t, L) ∈ ℰ₁} ∩ {t | (t, L') ∈ ℰ₁} ≠ ∅. Then no φ ∈ ℱ identifies {L, L'} on ℰ₁. For, let t₀ be such that (t₀, L) ∈ ℰ₁ and (t₀, L') ∈ ℰ₁. Since L ≠ L', no φ ∈ ℱ converges on t₀ to an index for both languages.

Each choice of evidential relation and strategy yields a distinct learning


paradigm, with definition 5.3B providing the appropriate criterion of
successful learning.

DEFINITION 5.3C Let strategy 𝒮 ⊆ ℱ and evidential relation ℰ be given.

i. The class {ℒ ⊆ RE | some φ ∈ 𝒮 identifies ℒ on ℰ} is denoted: [𝒮, ℰ].
ii. The class {ℒ ⊆ RE_svt | some φ ∈ 𝒮 identifies ℒ on ℰ} is denoted: [𝒮, ℰ]_svt.

Thus [𝒮, ℰ] is the family of all collections ℒ of languages such that some learning function in the strategy 𝒮 identifies ℒ on ℰ. [𝒮, ℰ]_svt is just [𝒮, ℰ] ∩ 𝒫(RE_svt). For 𝒮 ⊆ ℱ, [𝒮, text] = [𝒮], where [𝒮] is interpreted according to definition 4.1B. Similarly [𝒮, text]_svt = [𝒮]_svt. In this chapter we consider the inclusion relations among collections of the form [𝒮, ℰ], where 𝒮 is a strategy, and ℰ an evidential relation.

Exercises

5.3A Let ℰ, ℰ' be evidential relations such that for all L ∈ RE, {t | (t, L) ∈ ℰ} ⊆ {t | (t, L) ∈ ℰ'}. Let 𝒮 ⊆ ℱ be given.
a. Show that [𝒮, ℰ'] ⊆ [𝒮, ℰ].
b. Show by example that [𝒮, ℰ'] = [𝒮, ℰ] is possible.

5.3B Let evidential relation ℰ and φ ∈ ℱ be given. The collection {L ∈ RE | φ identifies L on ℰ} is denoted: ℒ_ℰ(φ) (cf. exercise 1.4.3K). Show that for all 𝒮 ⊆ ℱ, [𝒮, ℰ] = {ℒ ⊆ ℒ_ℰ(φ) | φ ∈ 𝒮} (cf. exercise 4.1C).

5.4 Texts with Imperfect Content

In this section we consider evidential relations that distort the content of the ambient language.

5.4.1 Noisy Text

DEFINITION 5.4.1A The evidential relation {(t, L) | for some finite D ⊆ N, rng(t) = L ∪ D} is called noisy text. If (t, L) ∈ noisy text, then t is called a noisy text for L.

Thus a noisy text t for a language L can be pictured as a text for L into which any number of intrusions from a finite set have been inserted. Note that any single such intrusion may occur infinitely often in t.
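The picture just described can be sketched as a generator (an illustration with invented names, not from the text; texts are modeled as infinite Python iterators): interleaving a text for L with elements of a finite nonempty set D yields a noisy text whose range is L ∪ D, with each intrusion recurring infinitely often.

```python
from itertools import count, cycle, islice

def noisy_text(text, D):
    """Interleave a text for L with intrusions drawn from the finite,
    nonempty set D; every intrusion then occurs infinitely often."""
    intrusions = cycle(sorted(D))
    for x in text:
        yield x
        yield next(intrusions)

# First members of a noisy text for the even numbers with D = {7, 9}:
t = noisy_text((2 * n for n in count()), {7, 9})
assert list(islice(t, 6)) == [0, 7, 2, 9, 4, 7]
```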

Example 5.4.1A

a. Since the empty set is finite, every text for a language L counts as a noisy text for L. Consequently [𝒮, noisy text] ⊆ [𝒮, text], for any strategy 𝒮.
b. Let L, L' ∈ RE be finite variants such that L ⊂ L'. Then every noisy text for L' is a noisy text for L, but not conversely.
c. Let L, L' ∈ RE be finite variants, L ≠ L'. Then {t | (t, L) ∈ noisy text} ∩ {t | (t, L') ∈ noisy text} ≠ ∅. In conjunction with part c of example 5.3A, the foregoing implies that {L, L'} ∉ [ℱ, noisy text], and hence that [ℱ, noisy text] ⊂ [ℱ, text].

The following proposition sharpens the observation of example 5.4.1A, part c. For its proof we define σ ⌢ t, for σ ∈ SEQ and t a text, to be the text s such that σ is an initial segment of s and, for every n ∈ N, s_{lh(σ)+n} = t_n.
PROPOSITION 5.4.1A Let L, L' ∈ RE be such that L ≠ L' and {L, L'} ∈ [ℱ, noisy text]. Then both L − L' and L' − L are infinite.

Proof Suppose to the contrary that φ identifies both L and L' on noisy text and that L' − L is finite. Let D = L' − L. We use exercise 5.4.1B. By that exercise, let σ ∈ SEQ be such that (1) rng(σ) ⊆ L ∪ D, (2) W_{φ(σ)} = L, and (3) for all τ ∈ SEQ, if rng(τ) ⊆ L ∪ D, then φ(σ ⌢ τ) = φ(σ). Then, if t is any text for L', rng(t) ⊆ L ∪ D, so that on σ ⌢ t, φ converges to an index for L, namely φ(σ). But σ ⌢ t is a text for L' ∪ rng(σ) and so is a noisy text for L', contradicting the assumption that φ identifies L' on noisy text. □

In fact, proposition 5.4.1A also provides a sufficient condition for membership in [ℱ, noisy text].

PROPOSITION 5.4.1B Let ℒ ⊆ RE be such that whenever L, L' ∈ ℒ and L ≠ L', then both L − L' and L' − L are infinite. Then ℒ ∈ [ℱ, noisy text].

Proof Let L_0, L_1, ..., be a listing of the languages in ℒ such that each language in ℒ appears in the list infinitely often. We define g, which identifies ℒ on noisy text, as follows. Define a function f on SEQ by

f(σ) = 0, if lh(σ) = 0;
f(σ ⌢ x) = f(σ), if rng(σ ⌢ x) − rng(σ') ⊆ L_{f(σ)}, where σ' is the shortest initial segment of σ with f(σ') = f(σ);
f(σ ⌢ x) = f(σ) + 1, otherwise.

Define g(σ) = least index for L_{f(σ)}. Informally, on text t, g conjectures L_0 until an x appears in t such that x ∉ L_0. Then g conjectures L_1, and so on.
If t is a noisy text for L, there is an n such that rng(t) − rng(t̄_n) ⊆ L. Thus for m ≥ n, if g conjectures L on t̄_m, g will converge to an index for L. Since L appears infinitely often in the list L_0, L_1, ..., g will conjecture L on t̄_m for some m ≥ n unless there is an i and n' ≥ n such that f(t̄_m) = i for all m ≥ n' and L_i ≠ L. But then L − L_i is infinite by the hypothesis, so there is an m > n' such that rng(t̄_m) − rng(t̄_{n'}) ⊈ L_i. This, however, implies that f(t̄_m) ≠ f(t̄_{n'}), and this is a contradiction. □
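The behavior of f and g on a finite prefix can be simulated directly (a sketch under strong assumptions, with invented names: membership in each L_i is decidable here because the toy languages are finite, the listing is given as a function of i, and conjectures are reported as positions in the listing rather than as least indices).

```python
def f_values(prefix, listing):
    """Compute f on each initial segment of the prefix: advance from L_i to
    L_{i+1} when an element seen since the last advance falls outside L_i."""
    i, changed_at, out = 0, 0, []
    for n in range(1, len(prefix) + 1):
        new = set(prefix[:n]) - set(prefix[:changed_at])
        if not new <= listing(i):
            i += 1
            changed_at = n
        out.append(i)
    return out

E = set(range(0, 100, 2))                   # toy stand-in for the evens
O = set(range(1, 100, 2))                   # and for the odds
listing = lambda i: E if i % 2 == 0 else O  # each appears infinitely often

prefix = (1, 3, 5, 2, 7, 9, 11)             # noisy text for the odds
assert f_values(prefix, listing) == [1, 1, 1, 2, 3, 3, 3]
```

The final value 3 again picks out the odds in the listing: the single intrusion 2 forces two advances, after which it is absorbed as noise and the conjecture stabilizes.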
The next proposition highlights the disruptive effects of noisy text on recursive learning functions.

PROPOSITION 5.4.1C There is ℒ ⊆ RE such that (i) every L ∈ ℒ is infinite, (ii) for all L, L' ∈ ℒ, if L ≠ L', then L ∩ L' = ∅, and (iii) ℒ ∈ [ℱ^rec, text] − [ℱ^rec, noisy text].

Proof For each m, n ∈ N, define L_{n,m} = {⟨n, x⟩ | x ≠ m}. By proposition 5.4.1A, no function identifies both L_{n,m} and L_{n,m'} for m ≠ m' on noisy text.
Now let h be any permutation of N, and define ℒ_h = {L_{n,h(n)} | n ∈ N}. It is easy to see that for any h, ℒ_h is identifiable by a recursive learning function. But if h ≠ h', and if φ identifies ℒ_h on noisy text, φ does not identify ℒ_{h'} on noisy text. Since there are only ℵ₀ many recursive functions and 2^ℵ₀ many permutations of N, there must be a permutation h such that no recursive learning function identifies ℒ_h on noisy text. □

Exercises

5.4.1A
a. Prove: Let S ∈ RE be recursive. Then {{⟨i, y⟩ | i ∈ N and y ∈ S ∪ {x}} | x ∈ N} ∈ [ℱ^rec, noisy text].
b. Prove: {{⟨i, y⟩ | i ∈ N − {x} and y ∈ N} | x ∈ N} ∈ [ℱ^rec, noisy text].

5.4.1B
a. Prove the following generalization of proposition 2.1A. Let φ ∈ ℱ identify L ∈ RE on noisy text. Then for every finite D ⊆ N, there is σ ∈ SEQ such that rng(σ) ⊆ L ∪ D, W_{φ(σ)} = L, and for all τ ∈ SEQ, if rng(τ) ⊆ L ∪ D, then φ(σ ⌢ τ) = φ(σ).
b. Let φ ∈ ℱ identify L ∈ RE on noisy text. Show that for every σ ∈ SEQ there is some τ ∈ SEQ such that rng(τ) ⊆ L, W_{φ(σ ⌢ τ)} = L, and for every χ ∈ SEQ, if rng(χ) ⊆ L, then φ(σ ⌢ τ ⌢ χ) = φ(σ ⌢ τ).
c. Prove: Let S ∈ RE be given. Show that {{⟨i, y⟩ | i ∈ N and y ∈ S ∪ D} | D finite} ∈ [ℱ, noisy text]. Compare this result with part b of exercise 5.4.1A.

5.4.1C Let ℒ ⊆ RE contain at least two languages. Prove:
a. ℒ ∉ [ℱ^confident, noisy text].
*b. ℒ ∉ [ℱ^decisive, noisy text].
c. ℒ ∉ [ℱ^order independent ∩ ℱ^memory limited, noisy text].

5.4.1D For m ∈ N, the evidential relation {(t, L) | for some set D of no more than m elements, rng(t) = L ∪ D} is called m-noisy text. Prove:
a. Let n < m. Then [ℱ, m-noisy text] ⊂ [ℱ, n-noisy text].
b. [ℱ, noisy text] ⊂ ⋂_{m∈N} [ℱ, m-noisy text].

5.4.1E The evidential relation {(t, L) | t is a noisy text for L and {n | t_n ∉ L} is finite} is called intrusion text. Prove:
a. [ℱ, intrusion text] = [ℱ, noisy text].
b. [ℱ^rec, intrusion text] = [ℱ^rec, noisy text].

5.4.1F ℒ ⊆ RE is said to be saturated on noisy text just in case ℒ ∈ [ℱ, noisy text] and for all ℒ' ⊆ RE such that ℒ ⊂ ℒ', ℒ' ∉ [ℱ, noisy text]. Note that RE_fin is not saturated on noisy text. Show that infinitely many ℒ ⊆ RE are saturated on noisy text. Compare this result to exercise 2.2E.

5.4.2 Incomplete Text

Noisy text accommodates ungrammatical intrusions into the language presented to the child. Natural environments may also omit sentences from the ambient language, and it is possible that the child's learning function can identify a natural language despite the systematic omission from its environment of any finite set of its sentences. This conjecture implies that the structure of a natural language L is "spread out" over it and that L includes no finite set of "key" sentences for aspects of its grammatical organization.

DEFINITION 5.4.2A The evidential relation {(t, L) | for some finite D ⊆ N, rng(t) = L − D} is called incomplete text. If (t, L) ∈ incomplete text, then t is called an incomplete text for L.

Thus an incomplete text t for a language L can be pictured as a text for L from which all occurrences of a given finite set of sentences have been removed.

Example 5.4.2A

a. Let t ∈ 𝒯 be the blank text (for all n ∈ N, t_n = #). Then t is an incomplete text for L ∈ RE if and only if L is finite.
b. More generally, let L, L' ∈ RE be finite variants such that L ⊂ L'. Then any text for L is an incomplete text for L', but not conversely.
c. Since the empty set is finite, every text for a language L counts as an incomplete text for L.
d. As for noisy text, it is easy to see that if L, L' ∈ RE are finite variants such that L ≠ L', then {L, L'} ∉ [ℱ, incomplete text]. Consequently [ℱ, incomplete text] ⊂ [ℱ, text].

The following proposition suggests that incompletion has less impact than noise on the information content of a text.

PROPOSITION 5.4.2A [ℱ, noisy text] ⊂ [ℱ, incomplete text].

Proof To see that [ℱ, incomplete text] ⊇ [ℱ, noisy text], notice that the function g in proposition 5.4.1B identifies each L ∈ ℒ on incomplete text.
To see that [ℱ, incomplete text] ≠ [ℱ, noisy text], let E be the set of even integers, and let ℒ = {N, E}. ℒ is easily identifiable on incomplete text, but it is not identifiable on noisy text by proposition 5.4.1A. □
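The easy direction of this separation can be sketched concretely (symbolic conjectures and invented names, not the book's formalism): an incomplete text omits only finitely much, so any incomplete text for N still contains odd numbers, while a text drawn from E never does.

```python
BLANK = None  # stands in for '#'

def n_or_e(prefix):
    """Conjecture N once any odd number has appeared; otherwise conjecture E."""
    nums = {x for x in prefix if x is not BLANK}
    return 'N' if any(x % 2 == 1 for x in nums) else 'E'

assert n_or_e((0, 2, BLANK, 4)) == 'E'
assert n_or_e((0, 2, 3)) == 'N'
```

On any incomplete text for N some odd number survives the finite omission, so the conjecture converges to N; on any incomplete text for E it stays at E from the start.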
The asymmetric relation between noisy and incomplete text is related to the asymmetrical character of texts noted at the end of section 1.3.3. In particular, noisy texts allow the intrusion of pseudolocking sequences, which does not occur in incomplete text.
The next proposition parallels proposition 5.4.1C.

PROPOSITION 5.4.2B There is ℒ ⊆ RE such that (i) every L ∈ ℒ is infinite, (ii) for all L, L' ∈ ℒ, if L ≠ L', then L ∩ L' = ∅, and (iii) ℒ ∈ [ℱ^rec, text] − [ℱ^rec, incomplete text].

Proof This proof is entirely analogous to that of proposition 5.4.1C. If h and h' are permutations of N, the collections ℒ_h and ℒ_{h'} defined in the proof of that proposition cannot both be learned by the same recursive learning function on incomplete text. Thus there are only ℵ₀ many collections ℒ_h ∈ [ℱ^rec, incomplete text]. □

Exercises

5.4.2A
a. Let ℒ = {{⟨i, y⟩ | y ∈ N and i ∈ N − {x}} | x ∈ N}. Specify φ ∈ ℱ such that φ identifies ℒ on incomplete text.
b. Let S ∈ RE be given. Specify φ ∈ ℱ such that φ identifies {{⟨i, y⟩ | y ∈ S ∪ D and i ∈ N} | D finite} on incomplete text. Compare this result to part c of exercise 5.4.1B.

5.4.2B Prove the following generalization of proposition 2.1A. Let φ ∈ ℱ identify L ∈ RE on incomplete text. Then for every finite D ⊆ N there is σ ∈ SEQ such that rng(σ) ⊆ L − D, W_{φ(σ)} = L, and for all τ ∈ SEQ, if rng(τ) ⊆ L − D, then φ(σ ⌢ τ) = φ(σ).

5.4.2C For m ∈ N the evidential relation {(t, L) | for some set D of no more than m elements, rng(t) = L − D} is called m-incomplete text. Prove:
a. Let n < m. Then [ℱ, m-incomplete text] ⊂ [ℱ, n-incomplete text].
b. Let ℒ ⊆ RE be such that ℒ ∈ [ℱ, m-incomplete text] for all m ∈ N. Then ℒ ∈ [ℱ, incomplete text]. Compare exercise 5.4.1D.

5.4.2D ℒ ⊆ RE is called maximal on incomplete text just in case ℒ ∈ [ℱ, incomplete text], and there is L ∈ RE such that ℒ ∪ {L} ∉ [ℱ, incomplete text] (cf. exercise 4.6.2C). Show that there are ℒ ⊆ RE such that ℒ is maximal on incomplete text.

*5.4.3 Imperfect Text

We consider next the combined effect of intrusion and omission.

DEFINITION 5.4.3A The evidential relation {(t, L) | rng(t) is a finite variant of L} is called imperfect text. If (t, L) ∈ imperfect text, then t is called an imperfect text for L.

PROPOSITION 5.4.3A [ℱ, imperfect text] = [ℱ, noisy text].

Proof The function g in the proof of proposition 5.4.1B also identifies ℒ on imperfect text. □

The next proposition should be compared to propositions 5.4.1C and 5.4.2B.

PROPOSITION 5.4.3B There is ℒ ⊆ RE_svt such that (i) for L, L' ∈ ℒ, if L ≠ L', then L ∩ L' = ∅, and (ii) ℒ ∈ [ℱ^rec, text]_svt − [ℱ^rec, imperfect text]_svt.

Proof The proof is similar to those of propositions 5.4.1C and 5.4.2B. Fix a permutation h of N, and define f_n ∈ ℱ^rec by

f_n(⟨n, h(n)⟩) = 1,
f_n(⟨n, x⟩) = 0, if x ≠ h(n),
f_n(⟨m, x⟩) = h(m) + 2, if m ≠ n.

Define ℒ_h = {L_n ∈ RE_svt | n ∈ N and L_n represents f_n}. It is clear that if n ≠ m, L_n ∩ L_m = ∅ and that ℒ_h ∈ [ℱ^rec, text]. But just as in the proofs of propositions 5.4.1C and 5.4.2B, it is easy to see that if h ≠ h', then no φ ∈ ℱ can identify ℒ_h and ℒ_{h'} on imperfect text. The result follows. □

We note finally that noise, incompletion, and imperfection are finite distortions of the content of texts. It is not obvious at present how to formulate empirically motivated evidential relations that embrace texts with an infinite number of intrusions or omissions.

Open question 5.4.3A [ℱ^rec, imperfect text] = [ℱ^rec, noisy text]?

Exercises

5.4.3A Prove the following generalization of proposition 2.1A. Let φ ∈ ℱ identify L ∈ RE on imperfect text. Then for every finite variant L' of L there is σ ∈ SEQ such that rng(σ) ⊆ L', W_{φ(σ)} = L, and for all τ ∈ SEQ, if rng(τ) ⊆ L', then φ(σ ⌢ τ) = φ(σ).

5.4.3B ℒ ⊆ RE is said to be saturated on imperfect text just in case (a) ℒ ∈ [ℱ, imperfect text], and (b) for all ℒ' ⊆ RE, if ℒ ⊂ ℒ', then ℒ' ∉ [ℱ, imperfect text] (cf. exercise 2.2E). Show that there are ℒ ⊆ RE such that ℒ is saturated on imperfect text.

5.5 Constraints on Order

Each of the evidential relations discussed in the last section enlarged the set of texts counted as "for" a given language. The present section concerns evidential relations that have the reverse effect. The new evidential relations result from constraining the order in which a language may be presented to a learner.

5.5.1 Ascending Text

It is sometimes claimed that children typically encounter simple sentences before complex ones and that this rough complexity ordering is essential to language acquisition. Sentential complexity in this context is measured by length, degree of embedding and inflection, and so forth. To begin to examine this hypothesis from the learning-theoretic point of view, we consider texts whose content is arranged in ascending order. (The hypothesis is treated from the empirical point of view by Newport, Gleitman, and Gleitman, 1977.)

DEFINITION 5.5.1A
i. t ∈ 𝒯 is said to be ascending just in case for all n, m ∈ N, if t_n, t_m ∈ N and n ≤ m, then t_n ≤ t_m.
ii. The evidential relation {(t, L) | rng(t) = L and t is ascending} is called ascending text. If (t, L) ∈ ascending text, then t is called an ascending text for L.

Ascending text facilitates identification, but not to the point of trivialization, as the next proposition shows.

PROPOSITION 5.5.1A

i. [ℱ, text] ⊂ [ℱ, ascending text].
ii. [ℱ^rec, text] ⊂ [ℱ^rec, ascending text].
iii. {N} ∪ RE_fin ∉ [ℱ, ascending text].

Proof
i, ii. Let L_n = N − {n}, and let ℒ = {N} ∪ {L_n | n ∈ N}. By proposition 2.2A, ℒ ∉ [ℱ, text]. However, ℒ ∈ [ℱ^rec, ascending text]. The function that witnesses this merely conjectures N unless a gap has appeared in the ascending sequence seen so far. In this case it makes the appropriate conjecture.
iii. To prove this, we note that there is an analog of the notion of locking sequence for ascending texts. Namely, suppose that φ identifies L on ascending text. Then there is a sequence σ such that σ is ascending, rng(σ) ⊆ L, W_{φ(σ)} = L, and whenever τ is such that σ ⌢ τ can be extended to an ascending text for L, W_{φ(σ ⌢ τ)} = L; in fact φ(σ ⌢ τ) = φ(σ). The proof of this is similar to that of proposition 2.1A and exercises 5.4.1B and 5.4.3A. Now iii is easy, for let σ be such a locking sequence for N and φ. Then φ does not identify rng(σ) on ascending text. □

The preceding results should be compared to proposition 2.2A.
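The witnessing function for parts i and ii can be sketched as follows (a toy illustration with invented names; conjectures are reported symbolically rather than as indices): on an ascending text, conjecture N until a gap appears among the numbers seen so far, then conjecture L_n for the gap element n.

```python
BLANK = None  # stands in for '#'

def gap_learner(prefix):
    """On an ascending prefix, conjecture N unless a gap has appeared."""
    nums = sorted({x for x in prefix if x is not BLANK})
    for expected, x in enumerate(nums):
        if x != expected:          # the number 'expected' is missing
            return ('N minus', expected)
    return ('N',)

assert gap_learner((0, 1, BLANK, 2)) == ('N',)
assert gap_learner((0, 1, 3, 4)) == ('N minus', 2)
```

Because the text is ascending, the gap, once visible, never closes, so the conjecture changes at most once and then stays correct.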

Exercises

5.5.1A t ∈ 𝒯 is said to be strictly ascending just in case for all n, m ∈ N, if t_n, t_m ∈ N and n < m, then t_n < t_m. The evidential relation {(t, L) | rng(t) = L and t is strictly ascending} is called strictly ascending text. Prove:
a. [ℱ, strictly ascending text] = [ℱ, ascending text].
b. [ℱ^rec, strictly ascending text] = [ℱ^rec, ascending text].

5.5.1B
a. Let L ∈ RE be finite and nonempty. What are the cardinalities of {t ∈ 𝒯 | t is ascending and for L} and {t ∈ 𝒯 | t is strictly ascending and for L}?
b. Let L ∈ RE be infinite. What are the cardinalities of {t ∈ 𝒯 | t is ascending and for L} and {t ∈ 𝒯 | t is strictly ascending and for L}?

5.5.2 Recursive Text

DEFINITION 5.5.2A
i. A text t is said to be recursive just in case {⌜t̄_n⌝ | n ∈ N} is a recursive set.
ii. The evidential relation {(t, L) | rng(t) = L and t is recursive} is called recursive text. If (t, L) ∈ recursive text, then t is called a recursive text for L.

Put another way, a text t is recursive just in case there is a decision procedure for the question "Does σ = t̄_{lh(σ)}?" Intuitively a text is recursive just in case some machine generates it.
If children's caretakers are machine simulable and are sheltered from random environmental influence, they might be limited to the production of recursive texts. Would such a limitation affect in principle the class of learnable languages? The next proposition suggests an affirmative answer to this question.

PROPOSITION 5.5.2A RE ∈ [ℱ, recursive text].

Proof Since there are only countably many recursive functions, there are
only countably many recursive texts. Thus we can list the recursive texts t⁰,
t¹, .... Now we construct φ ∈ ℱ which identifies RE as follows. Given σ, let i
be least such that σ is an initial segment of tⁱ. Then let φ(σ) = the least index
for rng(tⁱ). (Since tⁱ is a recursive text, rng(tⁱ) is an r.e. set.) It is clear that φ
identifies RE on recursive text since, given a recursive text t for L ∈ RE, we
have t = tⁱ for some least i, and there is an n such that for all j < i, t̄_n is not
an initial segment of tʲ; hence φ(t̄_m) is an index for rng(t) = L for all m ≥ n. □

On the other hand, recursive text has no effect on identifiability by
recursive learning functions, as indicated by the next proposition.

PROPOSITION 5.5.2B (Blum and Blum 1975)

[ℱrec, recursive text] = [ℱrec, text].

Proof Of course it is clear that [ℱrec, text] ⊆ [ℱrec, recursive text]. Given
ℒ ∈ [ℱrec, recursive text] and ψ ∈ ℱrec which identifies ℒ on recursive text,
we now claim that there is a φ ∈ ℱrec which identifies ℒ on arbitrary text.
To see this, first notice that if ψ identifies L on recursive text, then there is
a locking sequence σ for ψ and L. This is because the construction of
proposition 2.1A can be made effective: namely, if there is no locking
sequence for ψ and L, then there is a recursive text t for L on which ψ does
not converge. Now φ may be constructed as in the proof of proposition
4.6.3A, the locking-sequence-hunting construction. By corollary 4.6.3A,
given ψ ∈ ℱrec, there is a φ ∈ ℱrec such that for every L ∈ ℒ, φ converges on
any text for L to i = ψ(σ), where σ is the least locking sequence for ψ and L.
Thus φ identifies ℒ on arbitrary text. □
Environments 109

Exercises

5.5.2A Let ℰ ⊆ 𝒯 × RE be such that (a) for all L ∈ RE, {t | (t, L) ∈ ℰ} is denumer-
able, and (b) for all L, L' ∈ RE, if L ≠ L', then {t | (t, L) ∈ ℰ} ∩ {t | (t, L') ∈ ℰ} = ∅.
Prove that RE ∈ [ℱ, ℰ]. This result generalizes proposition 5.5.2A.
*5.5.2B (Gold 1967) The evidential relation {(t, L) | rng(t) = L and {⟨n, t_n⟩ | n ∈ N}
is primitive recursive} is called primitive recursive text. Show that RE ∈ [ℱrec,
primitive recursive text]. What is the appropriate generalization of this result?
5.5.2C Show directly (without the use of proposition 5.5.2B) that RE_svt ∉
[ℱrec, recursive text]. (Hint: Modify the construction in the proof of proposition
4.2.1B.)

*5.5.3 Nonrecursive Text

DEFINITION 5.5.3A The evidential relation {(t, L) | rng(t) = L and t is not
recursive} is called nonrecursive text. If (t, L) ∈ nonrecursive text, then t is
called a nonrecursive text for L.
The sequence of utterances actually produced by children's caretakers
depends heavily on external environmental events. Such environmental
influences might seem guaranteed to introduce a random component into
naturally occurring texts. Such texts would be nonrecursive, perhaps
strongly so. It is natural to inquire whether limitation to nonrecursive text
facilitates identification. The next proposition provides a negative answer
to this question.
PROPOSITION 5.5.3A
i. [ℱ, nonrecursive text] = [ℱ, text].
ii. (Wiehagen 1977) [ℱrec, nonrecursive text] = [ℱrec, text].
Proof For the proof of i, suppose that ℒ ∈ [ℱ, nonrecursive text] and that
this is witnessed by φ. We will suppose that ∅ ∉ ℒ; the other case is easily
handled. We will establish that ℒ ∈ [ℱ, text] just as in proposition
5.5.2B, namely, we will show that for every L ∈ ℒ there is a locking
sequence σ for φ and L. Suppose otherwise. We will derive a contradiction
by showing that there are uncountably many texts t for L that φ does not
identify, and hence that there are nonrecursive texts that φ does not identify.
First note that the nonexistence of a locking sequence for φ and L implies
that for every σ, rng(σ) ⊆ L implies that there are sequences τ and τ' such that
rng(τ) ⊆ L, rng(τ') ⊆ L, τ and τ' extend σ, φ(τ) ≠ φ(σ) or W_{φ(τ)} ≠ L, φ(τ') ≠
φ(σ) or W_{φ(τ')} ≠ L, and, finally, that τ and τ' are incompatible. To see this,
let n be any fixed element of L. Since σ ∧ n and σ ∧ # are not locking
sequences for L, they can be extended to τ and τ', respectively, by elements
of L ∪ {#} such that φ(τ) ≠ φ(σ) or W_{φ(τ)} ≠ L, and φ(τ') ≠ φ(σ) or
W_{φ(τ')} ≠ L. τ and τ' have the desired properties. Now we simply apply this
splitting property iteratively to get uncountably many texts for L. Applying
the principle with σ = ∅ yields τ⁰, τ¹ which are incompatible and for which
φ(τⁱ) ≠ φ(∅) or W_{φ(τⁱ)} ≠ L, i = 0, 1. Let s₀, s₁, ... be an enumeration of L.
Applying the splitting property to both τ⁰ ∧ s₀ and τ¹ ∧ s₀ yields τ⁰⁰, τ⁰¹, τ¹⁰,
τ¹¹, all incompatible and such that φ(τⁱʲ) ≠ φ(τⁱ) or W_{φ(τⁱʲ)} ≠ L. Continuing
this process gives uncountably many texts for L which φ does not identify,
namely one for each infinite sequence of 0's and 1's. For instance,
τ⁰ ∪ τ⁰⁰ ∪ τ⁰⁰⁰ ∪ ... is one such text.
The proof of ii is virtually identical and is left for the reader. □

Exercises

5.5.3A Call an evidential relation ℰ big just in case (a) ℰ ⊆ {(t, L) | rng(t) = L},
and (b) {(t, L) | rng(t) = L} − ℰ is denumerable. Thus a big evidential relation is
"nearly" {(t, L) | rng(t) = L}, that is, nearly text.
Let ℰ be a big evidential relation. Prove:
a. [ℱ, ℰ] = [ℱ, text].
b. [ℱrec, ℰ] = [ℱrec, text].

This result generalizes proposition 5.5.3A.

*5.5.4 Fat Text


It may be that in the long run every sentence of a given natural language will
be uttered indefinitely often. What effect would this have on learning?

DEFINITION 5.5.4A
i. t ∈ 𝒯 is called fat just in case for all i ∈ rng(t), {n | t_n = i} is infinite.
ii. The evidential relation {(t, L) | rng(t) = L and t is fat} is called fat text. If
(t, L) ∈ fat text, then t is called a fat text for L.
Thus t is a fat text for L just in case t is a text for L such that every member of
L occurs infinitely often in t.

PROPOSITION 5.5.4A

i. [ℱ, fat text] = [ℱ, text].
ii. [ℱrec, fat text] = [ℱrec, text].

Proof These are both internal simulation proofs. We do i. Obviously
[ℱ, text] ⊆ [ℱ, fat text]. So suppose ℒ ∈ [ℱ, fat text] is given and that
φ ∈ ℱ identifies ℒ on fat text. Then ψ ∈ ℱ is defined to identify ℒ on text by
expanding each input text t to ψ into a fat text t' and then simulating φ on t'.
Specifically, given σ ∈ SEQ, say, σ = x₀, x₁, ..., x_n, let σ' = x₀, x₁, x₀, x₂, x₁,
x₀, ..., x_n, x_{n−1}, ..., x₀. Define ψ(σ) = φ(σ').
Notice that the construction of σ' from σ is effective. This suffices to prove
ii. □
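The expansion σ ↦ σ' used in the proof can be sketched as follows (an illustration of ours, with Python lists standing for finite sequences). Note that the expansion of a sequence is an initial segment of the expansion of any extension of that sequence, so the expansions of the initial segments of a text converge to a fat text with the same range.

```python
# The expansion sigma -> sigma' from the proof:
#   x0, x1 x0, x2 x1 x0, ..., xn ... x0

def fatten(sigma):
    out = []
    for k in range(len(sigma)):
        out.extend(sigma[k::-1])   # append the block x_k, x_{k-1}, ..., x_0
    return out
```

Since ψ(σ) = φ(σ') and σ' is computable from σ, ψ is recursive whenever φ is, which is all that part ii requires.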

Fat text is more interesting in the context of memory limitation. The next
proposition shows that the former entirely compensates for the latter.

PROPOSITION 5.5.4B [ℱmemory-limited, fat text] = [ℱ, text].


Proof [ℱmemory-limited, fat text] ⊆ [ℱ, text] by proposition 5.5.4A. So sup-
pose that ℒ ∈ [ℱ, text]. We will define φ ∈ ℱmemory-limited that identifies ℒ
on fat text. We will use proposition 2.4A: namely, for every L ∈ ℒ there is a
finite set D_L such that if D_L ⊆ L' and L' ∈ ℒ, then L' ⊄ L.
Let f be a recursive function such that for every i, m ∈ N and D a finite set,
f(i, D, m) is an index for W_i, and such that f is one to one. Such an f exists by
the fact that finite sets can be effectively coded by integers and by the s-m-n
theorem.
Roughly, on σ, φ will conjecture f(i, D, m), where W_i = L ∈ ℒ, D_L ⊆
D ⊆ rng(σ), and m counts the number of times we have changed con-
jectured languages. Let i₀ be any index such that W_{i₀} ∉ ℒ. Define φ(∅) =
f(i₀, ∅, 0). Given φ(σ) = f(i, D, m), define φ(σ ∧ n) as follows:

Case 1. If W_i = L ∈ ℒ and D_L ⊆ D ∪ {n} ⊆ L, then

φ(σ ∧ n) = f(i, D, m), if m < n,
φ(σ ∧ n) = f(i, D ∪ {n}, m), if n ≤ m.

Case 2. If W_i ∉ ℒ or D ∪ {n} ⊄ W_i, but there is an L ∈ ℒ such that
D_L ⊆ D ∪ {n} ⊆ L, then

φ(σ ∧ n) = f(j, D ∪ {n}, m + 1),

where j is the least index for such an L.



Case 3. Otherwise, φ(σ ∧ n) = f(i₀, D ∪ {n}, m + 1).

(Notice that if t is any text, then as φ is given more and more of t, the
components m and D of φ's conjectures can only change by increasing.)
It is easy to see that φ is 1-memory limited, since φ(σ ∧ n) depends only on
φ(σ) and n and not on σ. Suppose now that L ∈ ℒ and that t is a fat text
for L. We will show that φ converges to f(j, D, m) for some j, D, m such that
j is the least index for L and D ⊇ D_L.
Suppose first that φ converges on t. Then, since cases 2 and 3 both result
in a change of conjecture, φ must eventually be in case 1 forever on t and
so must converge to some f(i, D, m) such that W_i = L', D_{L'} ⊆ D, and
D ∪ {n} ⊆ L' for every n that appears on t after the point at which φ begins
to converge. Since t is a fat text for L, this implies that D_{L'} ⊆ L ⊆ L'. But by
the property of D_{L'} this implies that L = L', and so φ converges to an index
for L.
Suppose then that φ does not converge on t. This implies that on t case 2
or 3 happens infinitely often, so that φ makes conjectures with arbitrarily
large m. Suppose that x ∈ L and n is such that φ(t̄_n) = f(i, D, m), m ≥ x, and
t_n = x. Such an n must exist, since t is a fat text for L. Then φ(t̄_{n+1}) =
f(i', D ∪ {x}, m') for some i', m'. Thus every x ∈ L is eventually added to the
sets D of φ's conjectures; that is, there is an n₀ such that if n > n₀, φ(t̄_n) =
f(i, D, m) for some D ⊇ D_L. This implies that for all n > n₀, φ(t̄_n) will be
defined either by case 1 or by case 2 applied to a language L' of index ≤ j,
since L will satisfy the condition of case 2 for all such n. However, for each
language L' of index < j, φ will eventually abandon L', since otherwise φ
would converge to an index for L', and we argued that this does not happen.
Thus φ will eventually conjecture L on t̄_{n₁} for some n₁, and then φ will be in
case 1 for all t̄_n such that n ≥ n₁. φ will then change its conjecture at most
finitely often after t̄_{n₁} and will converge to f(j, D, m) for some D, m. □
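To see the f(i, D, m) machinery in action, here is a toy instance of ours (not from the text): the class ℒ = {L_k | k ∈ N} with L_k = {0, ..., k}, taking D_{L_k} = {k}. The state (k, D, m) plays the role of the conjecture f(k, D, m), with k = None standing in for the initial index i₀ outside the class. The update reads only the previous conjecture and the new datum, so the learner is 1-memory limited. (Case 3 never arises for this class, since every finite D ∪ {n} is included in some L_j.)

```python
# Toy instance of the f(i, D, m) construction: L_k = {0, ..., k}, D_{L_k} = {k}.
# Conjectures are carried as states (k, D, m); the class and the choice of
# D_L are our own illustration.

def update(state, n):
    """Next conjecture from the current conjecture and the new datum only."""
    k, D, m = state
    Dn = D | {n}
    # Case 1: the current L_k still fits, i.e. {k} <= D u {n} <= L_k.
    if k is not None and k in Dn and max(Dn) <= k:
        return (k, D, m) if m < n else (k, Dn, m)
    # Case 2: switch to the least fitting language; here that is L_max(Dn).
    return (max(Dn), Dn, m + 1)

state = (None, frozenset(), 0)
for n in [0, 1, 2] * 5:          # a long prefix of a fat text for {0, 1, 2}
    state = update(state, n)
```

On the fat text above the conjectures stabilize on the state for L_2 with D ⊇ D_{L_2} = {2}, as the proof predicts.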

Exercises

5.5.4A t ∈ 𝒯 is called lean just in case for all n, m ∈ N, if t_n, t_m ∈ N and n ≠ m, then
t_n ≠ t_m. Thus lean texts never repeat a number. The evidential relation {(t, L) | rng(t) =
L and t is lean} is called lean text. Prove:
a. [ℱ, lean text] = [ℱ, text].
b. [ℱrec, lean text] = [ℱrec, text].

5.5.4B Let i ≤ j. t ∈ 𝒯 is called mixed(i, j) just in case for all k ∈ N, {t_k, t_{k+1}, ..., t_{k+j}}
contains at least i + 1 different numbers. The evidential relation {(t, L) | rng(t) = L
and t is mixed(i, j)} is called mixed(i, j) text. Mixed text generalizes lean text.
Suppose that ℒ ⊆ RE contains only infinite languages, and let i ≤ j be given.
Prove:
a. ℒ ∈ [ℱ, mixed(i, j) text] if and only if ℒ ∈ [ℱ, text].
b. ℒ ∈ [ℱrec, mixed(i, j) text] if and only if ℒ ∈ [ℱrec, text].

5.6 Informants

5.6.1 Informants and Characteristic Functions

In section 1.3.3 we noted that arbitrary texts for a language L do not provide
the learner with direct information about L̄, the complement of L. This
feature of texts is motivated by empirical studies suggesting the absence
from children's environments of overt information about ungrammatical
strings (see the references cited in section 1.3.3). In other learning situations,
however, the foregoing property of texts is less justified. In mastering the
extensions of certain concepts, for example, the child may expect explicit
correction for false attributions; other examples may be drawn from
scientific settings. We are thus led to consider environments for a language L
that provide equivalent information about L̄.

DEFINITION 5.6.1A (Gold 1967) Let L ∈ RE and t ∈ 𝒯 be given.

i. t is said to be an informant for L just in case rng(t) = {⟨x, y⟩ | x ∈ L and
y = 0, or x ∉ L and y = 1}. If t is an informant for some L ∈ RE, t is said to
be an informant.
ii. The evidential relation {(t, L) | t is an informant for L} is called informant.

Thus informants are special kinds of texts, but an informant for a language
L is not normally a text for L.

Example 5.6.1A
a. Let t ∈ 𝒯 be such that for all n ∈ N, t_n = ⟨n, 0⟩ if n is even, and t_n = ⟨n, 1⟩ if n is
odd. Then t is an informant for the set of even numbers.
b. Let t ∈ 𝒯 be such that rng(t) = {⟨i, 0⟩ | i ∈ N}. Then t is an informant for N.
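Part a of the example is easily realized as a program; the representation of the pair ⟨n, y⟩ as a Python tuple is our encoding.

```python
# The informant of part a: t_n = <n, 0> if n is even, <n, 1> if n is odd.

def informant_evens(n):
    return (n, 0) if n % 2 == 0 else (n, 1)

prefix = [informant_evens(n) for n in range(6)]
positive_data = {x for (x, y) in prefix if y == 0}
```

Unlike a text for the evens, this sequence also records nonmembership: the pairs with second coordinate 1 name numbers outside the language.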

Informants for a language L stand in an intimate relation to the charac-
teristic function for L (definition 1.2.2C). Recalling definition 1.2.2D, we
have the following lemma.

LEMMA 5.6.1A Let L ∈ RE and t ∈ 𝒯 be given. Then t is an informant for L
if and only if rng(t) represents the characteristic function for L.

A noteworthy property of informants is that their range need not be r.e.
Consider for example an informant t for K. rng(t) represents the character-
istic function for K and so cannot itself be r.e. (otherwise, it would be easy to
prove that K̄ ∈ RE, contradicting lemma 4.2.1A). In contrast, the reader
may verify that all the other evidential relations ℰ introduced in this
chapter are such that if (t, L) ∈ ℰ (and hence L ∈ RE), then rng(t) ∈ RE.
With this consideration in mind, suppose that children's caretakers are
machine simulable and that natural languages are r.e. but not recursive.
Then it would be impossible for caretakers to present children with an
informant for their language (unless the caretaker's environment supplied a
suitable "oracle"). Similar remarks apply to ascending text (section 5.5.1).
Finally, we note the following feature of identification on informant. Let
φ ∈ ℱ identify L ∈ RE on informant, and let t ∈ 𝒯 be an informant for L.
Then φ converges on t to an index for L. Consequently φ does not converge
on t to an index for rng(t), since rng(t) ≠ L. Thus the index i to which φ
converges provides less information about L than is available in t, since i
corresponds to a mere "positive test" for L, whereas rng(t) embodies a "test"
for L. (For test and positive test, see section 1.2.2.) In section 6.8 we consider
learning functions that converge to indexes for characteristic functions.

Example 5.6.1B

Let φ ∈ ℱ be defined as follows. For all σ ∈ SEQ, φ(σ) is the smallest index for
{x | ⟨x, 0⟩ ∈ rng(σ)}. Then φ identifies RE_fin on informant.

Exercises

5.6.1A Show that if t ∈ 𝒯 is an informant for L ∈ RE, then L is recursive if and only
if rng(t) ∈ RE_svt.
5.6.1B Prove the following:
a. {N} ∪ RE_fin ∈ [ℱrec, informant].
b. {N} ∪ {N − {x} | x ∈ N} ∈ [ℱrec, informant].
5.6.1C Prove: [ℱrec, text] ⊆ [ℱrec, informant].

5.6.2 Identification on Informant

Unlike texts, informants present enough information to identify RE.

PROPOSITION 5.6.2A (Gold 1967) RE ∈ [ℱ, informant].

Proof If σ ∈ SEQ, we say that σ is consistent with W_i if ⟨x, 0⟩ ∈ rng(σ)
implies that x ∈ W_i and ⟨x, 1⟩ ∈ rng(σ) implies that x ∉ W_i. Then we define φ
by

φ(σ) = the least i such that σ is consistent with W_i, and 0 if such an i does
not exist.

Obviously, on an informant t for W_i, φ converges to the least index for W_i. □
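The identification-by-enumeration idea in the proof can be sketched by restricting attention to a finite list of decidable languages, so that the search for the least consistent index is computable. The full construction searches all of W_0, W_1, ...; the particular list below is our own simplification.

```python
# Identification by enumeration, cut down to a finite list of decidable
# languages (membership tests) so that the search terminates.

langs = [lambda x: x % 2 == 0,     # "W_0": the even numbers
         lambda x: x % 3 == 0,     # "W_1": the multiples of 3
         lambda x: True]           # "W_2": N

def consistent(sigma, member):
    return all(member(x) if y == 0 else not member(x) for (x, y) in sigma)

def phi(sigma):
    """Conjecture the least index consistent with the informant prefix."""
    for i, member in enumerate(langs):
        if consistent(sigma, member):
            return i
    return 0   # arbitrary default when nothing on the list is consistent
```

Once the informant has refuted every wrong language of smaller index, the conjecture locks onto the least correct index, which is the convergence behavior claimed in the proof.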

COROLLARY 5.6.2A [ℱ, text] ⊂ [ℱ, informant].

Some of the additional information available in informants is utilizable
by recursive learning functions.

PROPOSITION 5.6.2B (Gold 1967) [ℱrec, text] ⊂ [ℱrec, informant].

Proof See exercises 5.6.1A and 5.6.1B. □

However, informants do not allow recursive learning functions to identify
RE.

PROPOSITION 5.6.2C (Gold 1967) RE ∉ [ℱrec, informant].

Proof This is simply a reformulation of the proof of proposition 4.2.1B.
We leave the details to the reader. □

COROLLARY 5.6.2B [ℱrec, informant] ⊂ [ℱ, informant].

Exercises

5.6.2A Let SEQ* = {t̄_n | n ∈ N and t is an informant}. Prove the following variant of
proposition 2.1A. Let φ ∈ ℱ identify L ∈ RE on informant. Then there is σ ∈ SEQ*
with the following properties: (a) {x | ⟨x, 0⟩ ∈ rng(σ)} ⊆ L; (b) W_{φ(σ)} = L; and (c) for
all τ ∈ SEQ* such that {x | ⟨x, 0⟩ ∈ rng(τ)} ⊆ L, φ(σ ∧ τ) = φ(σ).
5.6.2B Prove: [ℱrec, informant]_svt = [ℱrec, text]_svt.
5.6.2C t ∈ 𝒯 is called an imperfect informant for L ∈ RE just in case t is an informant
for a finite variant of L. The evidential relation {(t, L) | t is an imperfect informant for
L} is called imperfect informant. Prove: There is ℒ ⊆ RE such that (a) every L ∈ ℒ is
infinite, (b) for every L, L' ∈ ℒ, if L ≠ L', then L ∩ L' = ∅, and (c) ℒ ∈ [ℱrec, text] −
[ℱrec, imperfect informant]. (Hint: See the proof of proposition 5.4.3B.)
5.6.2D Recall the evidential relation ascending text from definition 5.5.1A. Prove:
a. [ℱ, ascending text] ⊂ [ℱ, informant].
b. [ℱrec, ascending text] ⊂ [ℱrec, informant].
*5.6.2E An oracle for a language L is an agent that correctly answers questions of
the form "x ∈ L?" in finite time. Conceive of a learner l as a device that queries an
oracle for an unknown language L an infinite number of times, producing a
conjectured index after each query is answered. l is said to identify L on oracle just in
case (a) l never fails to produce a conjecture after each answered query, (b) for some
i ∈ N, all but finitely many of l's conjectures are i, and (c) L = W_i. Identification of
collections of languages on oracle is defined straightforwardly. (All of this is drawn
from Gold 1967; variants are possible.) For 𝒮 ⊆ ℱ, let the class of collections ℒ of
languages such that some φ ∈ 𝒮 identifies ℒ on oracle be denoted [𝒮, oracle].
Prove:
a. [ℱ, oracle] = [ℱ, informant].
b. [ℱrec, oracle] = [ℱrec, informant].
*5.6.2F Consider a learning paradigm intermediate between oracles (in the sense
of exercise 5.6.2E) and text. In this case the learner is presented with an arbitrary text
for a language L but is allowed, in addition, to pose any finite number of questions of
the form "x ∈ L?" each to be answered appropriately in finite time. Identification
may be defined straightforwardly for this situation. The corresponding class of
identifiable collections of languages is denoted "[𝒮, text with finite oracle]," for
𝒮 ⊆ ℱ. Prove:

a. [ℱ, text] = [ℱ, text with finite oracle].
b. [ℱrec, text] = [ℱrec, text with finite oracle].

*5.6.3 Memory-Limited Identification on Informant

This subsection considers paradigms that result from pairing informants
with learning strategies other than ℱ and ℱrec. We illustrate with memory
limitation (section 4.4.1), leaving other strategies for the exercises.

PROPOSITION 5.6.3A [ℱmemory-limited, text] ⊂ [ℱmemory-limited, informant].

Proof We will exhibit a collection ℒ of languages such that ℒ ∈
[ℱmemory-limited, informant] − [ℱmemory-limited, text]. The collection is that
of proposition 4.4.1B. Namely, ℒ = {L, L_j, L_j' | j ∈ N}, where L = {⟨0, x⟩ |
x ∈ N}, L_j = {⟨0, x⟩ | x ∈ N} ∪ {⟨1, j⟩}, and L_j' = {⟨0, x⟩ | x ≠ j} ∪ {⟨1, j⟩}.
Proposition 4.4.1B demonstrated that ℒ ∉ [ℱ1-memory-limited, text]. However,
it is easy to identify ℒ on informant. Conjecture L until either ⟨⟨0, j⟩, 1⟩
or ⟨⟨1, j⟩, 0⟩ appears in the informant; then conjecture L_j', L_j, respectively.
If the current conjecture is L_j, continue to conjecture L_j unless ⟨⟨0, j⟩, 1⟩
appears in the informant, in which case conjecture L_j' forever after. □
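The informant strategy just described can be rendered as a transition function whose next conjecture depends only on the current conjecture and the incoming datum, which is what memory limitation requires. The tags "L", ("Lj", j), ("Lj1", j) standing for L, L_j, L_j' are our encoding.

```python
# The strategy of the proof as a transition function: next conjecture from
# (current conjecture, new datum) only, as memory limitation demands.

def step(conj, datum):
    pair, bit = datum
    if conj == "L":
        if bit == 1 and pair[0] == 0:       # <0,j> absent: target is L_j'
            return ("Lj1", pair[1])
        if bit == 0 and pair[0] == 1:       # <1,j> present: target is L_j or L_j'
            return ("Lj", pair[1])
        return "L"
    if conj[0] == "Lj" and datum == ((0, conj[1]), 1):
        return ("Lj1", conj[1])             # revise L_j to L_j' forever after
    return conj

conj = "L"
for d in [((0, 0), 0), ((1, 5), 0), ((0, 5), 1)]:   # informant data for L_5'
    conj = step(conj, d)
```

On any informant for a language in the class, only finitely many data can trigger a transition, so the conjectures stabilize as the proof requires.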
As a corollary to the proof of the preceding proposition:
COROLLARY 5.6.3A [ℱrec ∩ ℱmemory-limited, text] ⊂ [ℱrec ∩ ℱmemory-limited,
informant].
The next proposition and corollary show that memory limitation is
restrictive on informant.
PROPOSITION 5.6.3B [ℱrec, informant] ⊄ [ℱmemory-limited, informant].

Proof This proof is essentially the same as that of proposition 4.4.1F. The
relevant collection is ℒ = {N} ∪ {D | D finite}. ℒ is easily identified on
informant by a recursive function φ: φ(σ) is an index for N unless there is a
pair ⟨x, 1⟩ in rng(σ) for some x, in which case φ(σ) is an index for
{x | ⟨x, 0⟩ ∈ rng(σ)}. Suppose now that ℒ ∈ [ℱmemory-limited, informant],
witnessed by φ. Then by exercise 5.6.2A there is a locking sequence for φ and
each L ∈ ℒ. Let σ be such a locking sequence for N, let D = {x | ⟨x, 0⟩ ∈
rng(σ)}, and let σ' be such that τ = σ ∧ σ' is a locking sequence for D. Now
choose n ∉ {x | ⟨x, i⟩ ∈ rng(τ) for some i}. Then φ(σ ∧ σ') = φ(σ ∧ ⟨n, 0⟩ ∧ σ')
and is an index for D. However, if we now complete σ ∧ ⟨n, 0⟩ ∧ σ' to an
informant for D ∪ {n} with pairs ⟨m, 1⟩, m ∉ D ∪ {n}, we must have that φ
converges on this informant to an index for D, by memory limitation and
the property of τ, contradicting the fact that φ identifies D ∪ {n} on
informant. □
COROLLARY 5.6.3B [ℱmemory-limited, informant] ⊂ [ℱ, informant].
An "effective" version of the preceding corollary may also be proved.
PROPOSITION 5.6.3C [ℱrec ∩ ℱmemory-limited, informant] ⊂ [ℱrec,
informant].
Proof The proof of this proposition is left to the reader. □

Exercises

5.6.3A Let A ∈ RE be nonrecursive. Let ℒ_A = {{⟨0, i⟩ | i ∈ K} ∪ {⟨1, i⟩ | i ∈ A ∪ D} |
D finite}. Show that ℒ_A ∈ [ℱrec, informant] − [ℱrec ∩ ℱmemory-limited, informant].
(Hint: The learner may use its informant as an "oracle" for K, allowing membership
in A to be decided; see Rogers 1967, secs. 9.1-9.4.)

5.6.3B Prove that there is ℒ ⊆ RE such that (a) every L ∈ ℒ is infinite, and (b)
ℒ ∈ [ℱrec, informant] − [ℱrec ∩ ℱnontrivial, informant]. (Hint: See the proof of pro-
position 4.3.2A.)
5.6.3C Prove that [ℱrec ∩ ℱconservative, informant] ⊂ [ℱrec, informant]. (Hint: See
the proof of proposition 4.5.1B.)
5.6.3D Prove that [ℱrec ∩ ℱcautious, informant] ⊂ [ℱrec, informant]. (Hint: See
the proof of proposition 4.5.4B.)
5.6.3E Prove that [ℱreliable, text] ⊂ [ℱrec ∩ ℱreliable, informant] ⊂ [ℱrec,
informant]. (Hint: See section 4.6.1.)
5.6.3F Prove that [ℱrec, informant] = [ℱrec ∩ ℱorder-independent, informant]. (Hint:
Don't use the construction given in the proof of proposition 4.6.3A, even though
this construction can be successfully modified for the present case; a simpler
construction is available.)

5.7 A Note on "Reactive" Environments

The order in which sentences are addressed to children depends partly on


children's verbal and nonverbal response to earlier sentences. To illustrate,
the caretaker is likely to repeat or paraphrase his or her previous sentence if
the child gives evidence of noncomprehension. An environment that de-
velops through time as a partial function of the learner's prior activity may
be termed "reactive." It is presently unknown to what extent natural en-
vironments are reactive and also whether nonreactive environments (such
as those offered by television) are sufficient for normal linguistic development.
Whatever the empirical status of reactive environments, it is important to
recognize that none of the evidential relations defined in this chapter exhibit
the slightest degree of reactivity. Nor is it clear to us how one would
construct paradigms that offer plausible hypotheses about naturally
occurring reactivity. This omission constitutes a significant theoretical gap
in the development of learning theory to the present time.
6 Criteria of Learning

Identification of a text t requires the learner's conjectures to stabilize on
some one grammar for rng(t). Such stabilization formally represents both
the veridicality of the learner's cognitive state and its enduring nature.
Veridicality and stabilization are the hallmarks of learning, and identifi-
cation provides a compelling construal of each.
Although most theorists agree that identification is a sufficient condition
for learning, many deny its necessity. It is argued that natural examples of
learning (language acquisition included) instantiate only weaker concep-
tions of veridicality and stabilization. Accordingly, more liberal construals
of both concepts have been offered. Such proposals amount to alternative
criteria of successful learning. In this chapter we examine some of these
alternative criteria. They may all be construed as generalizations of identifi-
cation, in a sense now to be explained.

6.1 Convergence Generalized

6.1.1 Convergence Criteria

DEFINITION 6.1.1A Let φ ∈ ℱ, t ∈ 𝒯, and S ⊆ N be given. φ is said to end in
S on t just in case (i) φ is defined on t, and (ii) φ(t̄_m) ∈ S for all but finitely
many m ∈ N.
Thus φ ends in S on t just in case φ(t̄_m)↓ for all m ∈ N, and there is n ∈ N such
that φ(t̄_m) ∈ S for all m ≥ n. More intuitively, φ ends in S on t just in case φ
eventually produces on t an unbroken, infinite sequence of conjectures
drawn from S.

Example 6.1.1A

a. Let E be the set of even numbers. Then φ ∈ ℱ ends in E on t ∈ 𝒯 just in case φ is
defined on t and φ(t̄_m) is even for all but finitely many m ∈ N.
b. Let g be the function defined in the proof of proposition 1.4.3B, let t = 1, 2, 3, 4, ...,
and let n₀ be the smallest index for N − {0}. Then g ends in {n₀} on t. More
generally, φ ∈ ℱ converges on t ∈ 𝒯 to n ∈ N if and only if φ ends in {n} on t.
c. If φ ∈ ℱ ends in S ⊆ N on t ∈ 𝒯, then φ ends in S' ⊆ N on t for all S' ⊇ S.

We observe that RE × 𝒫(N) (the Cartesian product of RE and the power
set of N) is the set of all pairs (L, S) such that L is a language and S is a subset
of N.

DEFINITION 6.1.1B

i. A subset of RE × 𝒫(N) is called a convergence criterion.
ii. Let 𝒞 be a convergence criterion, and let φ ∈ ℱ, t ∈ 𝒯, and L ∈ RE be
given. φ is said to 𝒞-converge on t to L just in case there is S ⊆ N such that
(L, S) ∈ 𝒞 and φ ends in S on t.

Thus to 𝒞-converge on t to L, φ must eventually produce on t an infinite,
unbroken sequence of indexes drawn from S ⊆ N, where (L, S) ∈ 𝒞. Intui-
tively, φ 𝒞-converges on t to L just in case φ's limiting behavior on t
conforms to the standard of veridicality and stability embodied in 𝒞.

Example 6.1.1B

a. The convergence criterion proper to the identification paradigm is 𝒞 =
{(L, {n}) | W_n = L}, as the reader may verify.
b. Let 𝒞' = {(L, {n}) | L and W_n are finite variants}. Then φ ∈ ℱ 𝒞'-converges on
t ∈ 𝒯 to L ∈ RE just in case φ converges on t to an index for a finite variant of L. 𝒞'
will be studied in section 6.2.
c. Let 𝒞'' = {(L, S_L) | S_L = {i | W_i = L}}. 𝒞'' pairs every language with the set of its
indexes. φ ∈ ℱ 𝒞''-converges on t ∈ 𝒯 to L ∈ RE just in case φ ends on t in the set of
indexes for L. 𝒞'' will be studied in section 6.3.

6.1.2 Identification Relativized


The next definition interrelates convergence criteria, evidential relations,
and successful learning.

DEFINITION 6.1.2A Let convergence criterion 𝒞 and evidential relation ℰ
be given.

i. φ ∈ ℱ is said to 𝒞-identify L ∈ RE on ℰ just in case for all texts t such that
(t, L) ∈ ℰ, φ 𝒞-converges on t to L.
ii. φ ∈ ℱ is said to 𝒞-identify ℒ ⊆ RE on ℰ just in case for all L ∈ ℒ, φ
𝒞-identifies L on ℰ.

Thus to 𝒞-identify L on ℰ, φ must 𝒞-converge to L on every text that ℰ
stipulates as "for" L.

Example 6.1.2A

a. Let 𝒞 be as in part a of example 6.1.1B. Then φ ∈ ℱ 𝒞-identifies L ∈ RE on text just
in case φ 𝒞-converges to L on every text for L, that is, just in case φ converges to an
index for rng(t) on every text t for L. Thus φ 𝒞-identifies L on text if and only if φ
identifies L in the sense of definition 1.4.2A.
b. Let 𝒞' be as in part b of example 6.1.1B. Then φ ∈ ℱ 𝒞'-identifies L ∈ RE on noisy
text just in case φ 𝒞'-converges to L on every noisy text for L, that is, just in case φ
converges to an index for a finite variant of L on every noisy text for L.
c. Let 𝒞'' be as in part c of example 6.1.1B. Then φ ∈ ℱ 𝒞''-identifies L ∈ RE on
incomplete text just in case φ 𝒞''-converges to L on every incomplete text for L,
that is, just in case φ ends in the set of indexes for L on every incomplete text for L.

DEFINITION 6.1.2B Let learning strategy 𝒮, evidential relation ℰ, and
convergence criterion 𝒞 be given.

i. The class {ℒ ⊆ RE | some φ ∈ 𝒮 𝒞-identifies ℒ on ℰ} is denoted:
[𝒮, ℰ, 𝒞].
ii. The class {ℒ ⊆ RE_svt | some φ ∈ 𝒮 𝒞-identifies ℒ on ℰ} is denoted:
[𝒮, ℰ, 𝒞]_svt.

Finally, we provide a name for {(L, {n}) | W_n = L}, the convergence crite-
rion proper to the identification paradigm. The intuitive significance of the
name will become apparent as we consider alternative convergence criteria
in the sections that follow.

DEFINITION 6.1.2C The convergence criterion {(L, {n}) | W_n = L} is called
intensional, abbreviated to: INT.

Example 6.1.2B

a. [ℱ, text, INT] is the family of all identifiable collections of languages (in the sense
of definition 1.4.3A).
b. Let 𝒞' be as in part b of example 6.1.1B. Then [ℱrec, noisy text, 𝒞'] is the family of
all collections ℒ of languages such that for some recursive learning function φ, and
for every noisy text t for a language L in ℒ, φ converges on t to an index for a finite
variant of L.
c. Let 𝒞'' be as in part c of example 6.1.1B. Then [ℱconsistent, incomplete text, 𝒞''] is
the family of all collections ℒ of languages such that for some consistent learning
function φ, and for every incomplete text t for a language L in ℒ, φ ends on t in the
set of indexes for L.

It can be seen that each choice of strategy, evidential relation, and
convergence criterion yields a distinct learning paradigm. The criterion of
successful learning proper to any such paradigm emerges from the interac-
tion of its associated evidential relation and convergence criterion. To-
gether, these items determine whether a given function is to be credited with
the ability to learn a given language. This is achieved by specifying what
kind of behavior on which set of texts is to count as successful performance
with respect to a given language.
In this chapter we consider the inclusion relations among classes of the
form [𝒮, ℰ, 𝒞], where 𝒮 is a strategy, ℰ an evidential relation, and 𝒞 a
convergence criterion. The strategy of principal interest is ℱrec. After
considering a variety of criteria in sections 6.2 through 6.6, a partial
summary is provided in section 6.7. Section 6.8 introduces a variant of
learning paradigms in which the learner conjectures tests rather than
positive tests in response to his environment.

Exercises

6.1.2A Let convergence criteria 𝒞, 𝒞' be such that 𝒞 ⊆ 𝒞'.

a. Show that for all strategies 𝒮 and evidential relations ℰ, [𝒮, ℰ, 𝒞] ⊆ [𝒮, ℰ, 𝒞'].
Show by example that equality is possible even if 𝒞 ⊂ 𝒞'.
b. For any evidential relation ℰ, convergence criterion 𝒞, and φ ∈ ℱ, we define
ℒ_{ℰ,𝒞}(φ) to be {L ∈ RE | φ 𝒞-identifies L on ℰ} (cf. exercise 5.3B). Show that
ℒ_{ℰ,𝒞}(φ) ⊆ ℒ_{ℰ,𝒞'}(φ).
c. Give examples of φ, ψ ∈ ℱ and convergence criteria 𝒞, 𝒞' such that 𝒞 ⊂ 𝒞' and
ℒ_{text,𝒞}(φ) ⊂ ℒ_{text,𝒞'}(φ), and ℒ_{text,𝒞}(ψ) = ℒ_{text,𝒞'}(ψ).

6.1.2B Specify a convergence criterion 𝒞 such that (a) 𝒞 ⊂ INT and (b)
[ℱrec, text, 𝒞] = [ℱrec, text, INT].

6.1.2C Prove: Let evidential relation ℰ and convergence criterion 𝒞 be given. Then
[ℱrec ∩ ℱtotal, ℰ, 𝒞] = [ℱrec, ℰ, 𝒞] (cf. proposition 4.2.1B).

6.1.2D Prove the following generalization of proposition 2.1A. Let convergence
criterion 𝒞 be given. Suppose that φ ∈ ℱ 𝒞-identifies L ∈ RE on text. Then there is
σ ∈ SEQ and S ⊆ N such that (a) (L, S) ∈ 𝒞, (b) rng(σ) ⊆ L, and (c) for all τ ∈ SEQ, if
rng(τ) ⊆ L, then φ(σ ∧ τ) ∈ S. (In the foregoing situation σ is called a 𝒞-locking
sequence for φ and L on text.)
6.1.2E Let convergence criterion 𝒞 be given. Show that [ℱrec, text, 𝒞] ⊆
[ℱrec, informant, 𝒞]. (For informants, see definition 5.6.1A.)

6.2 Finite Difference, Intensional Identification

It may be that children do not internalize a grammar for the ambient
language L, but rather a grammar for some language "near to" L. This
possibility suggests the following convergence criterion.

DEFINITION 6.2A The convergence criterion {(L, {n}) | L and W_n are finite
variants} is called finite difference, intensional, abbreviated to: FINT.

Thus φ ∈ ℱ FINT-converges on t ∈ 𝒯 to L ∈ RE just in case φ converges on
t to an index for a finite variant of L. As a consequence, φ FINT-identifies L
on text just in case for all texts t for L, φ converges on t to an index for
a finite variant of rng(t). Intuitively, to FINT-identify L on text, φ must
produce on every text for L an infinite, unbroken sequence of identical
indexes, all of them for the same "near miss." Thus FINT-identification on
text compromises the accuracy required of a learner but not the stability.
Note that if φ ∈ ℱ FINT-identifies L ∈ RE on text, then φ may converge on
different texts for L to indexes for distinct finite variants of L.

6.2.1 FINT-Identification on Text


The following proposition shows FINT-identification on text to be a
strictly more liberal paradigm than identification simpliciter.

PROPOSITION 6.2.1A

i. [ℱ, text, INT] ⊂ [ℱ, text, FINT].
ii. [ℱ^rec, text, INT] ⊂ [ℱ^rec, text, FINT].

Proof Note that INT ⊆ FINT. Hence by exercise 6.1.2A it suffices to show
that the inclusions are proper. Let ℒ = {N − D | D finite}. By proposition
2.2A and lemma 2.2A, ℒ ∉ [ℱ, text, INT] ⊇ [ℱ^rec, text, INT]. Let n be an
index for N. Define h ∈ ℱ^rec by h(τ) = n for all τ ∈ SEQ. Then h FINT-
identifies ℒ on text. Thus ℒ ∈ [ℱ^rec, text, FINT] ⊆ [ℱ, text, FINT],
which establishes i and ii. □
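Outside the text's formalism, the learner h of the preceding proof can be run as a toy simulation. In the Python sketch below (an illustration only, not part of the book's apparatus) grammars are modeled as membership predicates, and the helper `finite_variant_witness` — a hypothetical name introduced here — lists the disagreements below a bound; for a genuine finite variant this set stabilizes as the bound grows.

```python
# Toy sketch of the learner h in the proof of proposition 6.2.1A.
# Assumption: a "grammar" is a membership predicate on N, and the
# finite-variant relation is probed only below a finite bound.

def h(sigma):
    """Ignore the evidence sigma and always conjecture a grammar for N."""
    return lambda x: True

def finite_variant_witness(lang_a, lang_b, bound):
    """Elements below `bound` on which the two languages disagree."""
    return {x for x in range(bound) if lang_a(x) != lang_b(x)}

D = {3, 7}                                      # finite set removed from N
target = lambda x: x not in D                   # the cofinite language N - D
prefix = [x for x in range(20) if target(x)]    # initial segment of a text

conjecture = h(prefix)
print(finite_variant_witness(conjecture, target, 1000))  # -> {3, 7}
```

However large the bound, the disagreement stays the finite set D, which is exactly why the constant conjecture FINT-identifies every member of {N − D | D finite}.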

On the other hand, FINT does not allow the identification of RE.

PROPOSITION 6.2.1B {N} ∪ RE_fin ∉ [ℱ, text, FINT].

Proof Suppose, to the contrary, that φ ∈ ℱ FINT-identifies {N} ∪ RE_fin
on text. Let σ be a FINT-locking sequence for φ and N on text, in the sense
of exercise 6.1.2D. Define a text, t, for rng(σ) ∪ {0} ∈ RE_fin by t̄_{lh(σ)} = σ and
t_n = 0 for n ≥ lh(σ). Then φ does not FINT-converge on t to rng(σ) ∪ {0}.
Hence φ fails to FINT-identify RE_fin on text. □

COROLLARY 6.2.1A RE ∉ [ℱ, text, FINT].

The next proposition shows that ℱ^rec is restrictive with respect to FINT
and text. It is a corollary of proposition 6.5.1C (see also exercise 6.2.1H).

PROPOSITION 6.2.1C [ℱ^rec, text, FINT] ⊂ [ℱ, text, FINT].

Exercises

6.2.1A ℒ, ℒ′ ⊆ RE are said to be finite analogues just in case (a) for all L ∈ ℒ there
is L′ ∈ ℒ′ such that L and L′ are finite variants, and (b) for all L′ ∈ ℒ′ there is L ∈ ℒ
such that L and L′ are finite variants. Now let ℒ, ℒ′ ⊆ RE be finite analogues and
suppose that φ ∈ ℱ is such that ℒ ⊆ ℒ_{text,INT}(φ).
a. Does it follow that ℒ′ ⊆ ℒ_{text,INT}(φ)?
b. Does it follow that ℒ′ ∈ [ℱ, text, INT]?

6.2.1B Prove:
a. [ℱ^confident, text, INT] ⊂ [ℱ^confident, text, FINT].
b. Let ℒ ⊆ RE be a w.o. chain (in the sense of exercise 4.6.2B) such that for infinitely
many L, L′ ∈ ℒ, L and L′ are not finite variants. Then ℒ ∉ [ℱ^confident, text, FINT].
Conclude that [ℱ^rec, text, INT] ⊄ [ℱ^confident, text, FINT].

6.2.1C Prove that [ℱ, text, INT] ⊄ [ℱ^memory limited, text, FINT]. (Hint: See the
proof of proposition 4.4.1C.)

6.2.1D Prove:
a. [ℱ^rec, text, INT] ⊄ [ℱ^rec ∩ ℱ^consistent, text, FINT].
*b. [ℱ^rec, text, INT] ⊄ [ℱ^rec ∩ ℱ^cautious, text, FINT].

*6.2.1E
a. ℒ ⊆ RE is called finite-difference saturated just in case ℒ ∈ [ℱ, text, FINT] and
for all ℒ′ ⊆ RE, if ℒ ⊂ ℒ′, then ℒ′ ∉ [ℱ, text, FINT]. Exhibit a finite-difference
saturated collection of languages (cf. exercise 2.2E).
b. ℒ ⊆ RE is called finite-difference maximal just in case ℒ ∈ [ℱ, text, FINT] and
for some L ∈ RE, ℒ ∪ {L} ∉ [ℱ, text, FINT]. Exhibit a finite-difference maximal
collection of languages different from the collection exhibited in part a (cf. exercise
4.6.2C).

6.2.1F For n ∈ N the criterion {(L, {i}) | (L − W_i) ∪ (W_i − L) has no more than n
members} is denoted FINT(n). Prove:
a. If n < m, then [ℱ, text, FINT(n)] ⊂ [ℱ, text, FINT(m)].
b. ∪_{n∈N} [ℱ, text, FINT(n)] ⊂ [ℱ, text, FINT].
*c. (Case and Smith 1983) Let n < m. Then [ℱ^rec ∩ ℱ^Popperian, text, FINT(m)]_svt −
[ℱ^rec, text, FINT(n)]_svt ≠ ∅.
*d. (Case and Smith 1983) [ℱ^rec ∩ ℱ^Popperian, text, FINT]_svt − ∪_{n∈N} [ℱ^rec, text,
FINT(n)]_svt ≠ ∅.

*6.2.1G φ ∈ ℱ is called FINT-order independent just in case for all L ∈ RE, if φ
FINT-identifies L on text, then for all texts t, t′ for L, φ converges on both t and t′ to
the same index (cf. definition 4.6.3A). Prove the following variant of proposition
4.6.3A: [ℱ^rec ∩ ℱ^FINT-order independent, text, FINT] = [ℱ^rec, text, FINT].

6.2.1H Prove: Let 𝒮 be a denumerable subset of ℱ. Then [𝒮, text, FINT] ⊂
[ℱ, text, FINT]. (Hint: See the proof of proposition 4.1A.) Note that this result
implies proposition 6.2.1C.

*6.2.1I 𝒮 ⊆ ℱ is said to team FINT-identify ℒ ⊆ RE just in case for every L ∈ ℒ
there is φ ∈ 𝒮 such that φ FINT-identifies L on text. Show that no finite subset of ℱ
team FINT-identifies RE. (For penetrating results on team identification, see Smith
1981.)

6.2.2 FINT-Identification on Imperfect Text

Does the margin of error tolerated in FINT-convergence compensate for
the textual imperfections examined in section 5.4? In other words, if some
φ ∈ ℱ identifies ℒ ⊆ RE on text, does some φ′ ∈ ℱ FINT-identify ℒ on
imperfect text? The results of the present subsection show that this conjec-
ture is false.

PROPOStTtON 6.2.2A

i. [S', text, INT] '*' [S', noisy text, FINT].


ii. [S'''', text, INT] '*' [S"", noisy text, FINT].

Proof Let!i' = {N} U {D - {O}ID finite}. It is clear that !i'E [$'''', text,
INT]. But suppose that!i' E [S', noisy text, FINT]. Observe that every text
for a language in {N} U RE n" is a noisy text for some language in !i'. But
from this it follows that {N} U RE n" E [S', text, FINT] contrary to the proof
of proposition 6.2.IB. 0

PROPOSITION 6.2.2B

i. [ℱ, text, INT] ⊄ [ℱ, incomplete text, FINT].
ii. [ℱ^rec, text, INT] ⊄ [ℱ^rec, incomplete text, FINT].

Proof The collection ℒ in the proof of proposition 6.2.2A witnesses the
noninclusions of this proposition as well. □

Exercises

6.2.2A Prove that [ℱ^rec, text, INT] ⊄ [ℱ, noisy text, FINT] ∪ [ℱ, incomplete
text, FINT].

6.2.2B Prove the following generalization of proposition 2.1A. Let φ ∈ ℱ FINT-
identify L ∈ RE on imperfect text. Then there are L′, L″ ∈ RE and σ ∈ SEQ such that
(a) L, L′, L″ are finite variants of each other, (b) rng(σ) ⊆ L′, (c) W_{φ(σ)} = L″, and
(d) for all τ ∈ SEQ, if rng(τ) ⊆ L′, then φ(σ ∧ τ) = φ(σ).

6.2.2C Prove:
a. Let L, L′ ∈ RE be finite variants. Then {L, L′} ∈ [ℱ, noisy text, FINT] ∩
[ℱ, incomplete text, FINT].
b. Let E be the set of even numbers. Then {N, E} ∉ [ℱ, noisy text, FINT].
c. ℒ ∈ [ℱ, noisy text, FINT] if and only if for all L, L′ ∈ ℒ, either L and L′ are finite
variants or both of L − L′ and L′ − L are infinite. (Hint: Adapt the proof of
proposition 5.4.1B.)
d. [ℱ, noisy text, FINT] ⊂ [ℱ, incomplete text, FINT].

6.2.2D Prove: [ℱ, imperfect text, FINT] = [ℱ, noisy text, FINT]. (Hint: See pro-
position 5.4.3A.)

6.2.3 FINT-Identification in RE_svt

Suppose that φ ∈ ℱ FINT-identifies L ∈ RE_svt on text, and let t be a text for
L. It does not follow that φ converges on t to an index for a total, single-
valued language. Since φ is allowed a finite margin of error, φ may well
converge on t to an index for a language that represents a properly partial
function. This is a useful fact to bear in mind when thinking about the results
of the present subsection (cf. exercise 6.2.3B).
In view of proposition 1.4.3C, [ℱ, text, INT]_svt = [ℱ, text, FINT]_svt. In
contrast, the next proposition shows that INT-identification on text and
FINT-identification on text can be distinguished in the context of ℱ^rec.

The following definition is required for its proof.

DEFINITION 6.2.3A A partial recursive function ψ is called almost self-naming
just in case ψ(0)↓ and ψ and φ_{ψ(0)} are finite variants. The collection
{L ∈ RE_svt | L represents an almost self-naming function} is denoted RE_asn.

PROPOSITION 6.2.3A (Case and Smith 1983) [ℱ^rec, text, INT]_svt ⊂
[ℱ^rec, text, FINT]_svt.

Proof We follow the proof given by Case and Smith. The reader should
compare the construction here to the proof of proposition 4.6.1C(iii),
which gives a similar application of the recursion theorem in a simpler
setting.
We claim that RE_asn ∈ [ℱ^rec, text, FINT]_svt − [ℱ^rec, text, INT]_svt. It is
clear that RE_asn ∈ [ℱ^rec, text, FINT]. Hence it suffices to show that RE_asn ∉
[ℱ^rec, text, INT]_svt.
Suppose, to the contrary, that ψ ∈ ℱ^rec intensionally identifies RE_asn on
text. By lemma 4.2.2B we may assume that ψ is total. We define a total
recursive function f by the recursion theorem such that f is almost self-
naming and if L represents f, then ψ fails to intensionally identify L. To
apply the recursion theorem, we construct a total recursive function h by the
following algorithm. The algorithm defines φ_{h(i)} in stages indexed by s. In
the description of the algorithm, φ^s_{h(i)} denotes the finite piece of φ_{h(i)}
constructed through stage s; a^s denotes a number we are attempting to
withhold from the domain of φ_{h(i)} at stage s; and x^s denotes the least
number n such that n is not in the domain of φ^s_{h(i)} and n ≠ a^s. Recall from
the proof of proposition 4.6.1C that φ[n] = ⟨(0, φ(0)), ..., (n, φ(n))⟩, where it
is understood that φ(m)↓ for every m ≤ n. We also think of ψ as conjecturing
indexes for partial recursive functions rather than the languages representing
them.

Construction

Stage 0: φ^0_{h(i)}(0) = i; a^0 = 1.

Stage s + 1: Suppose φ^s_{h(i)} (a finite function) and a^s have been defined. We
define φ^{s+1}_{h(i)} by the following three cases.

Case 1. There is a σ such that φ^s_{h(i)}[a^s − 1] ⊆ σ ⊆ (φ^s_{h(i)} ∪ {(a^s, 0)})[x^s − 1]
and ψ(φ^s_{h(i)}[a^s − 1]) ≠ ψ(σ). Then let φ^{s+1}_{h(i)} = φ^s_{h(i)} ∪ {(a^s, 0)}, and let
a^{s+1} = x^s.
Case 2. The hypothesis of case 1 fails and φ_{ψ(φ^s_{h(i)}[a^s−1])}(a^s)↓. Then let
φ^{s+1}_{h(i)} = φ^s_{h(i)} ∪ {(a^s, 1 − φ_{ψ(φ^s_{h(i)}[a^s−1])}(a^s))}, and let a^{s+1} = x^s.
Case 3. If neither the hypothesis of case 1 nor the hypothesis of case 2 holds,
then let φ^{s+1}_{h(i)} = φ^s_{h(i)} ∪ {(x^s, 0)} and let a^{s+1} = a^s.

Now, by the recursion theorem, let i be such that φ_{h(i)} = φ_i. In the
construction of φ_{h(i)}, either (a) for every s there is an s′ > s such that
a^{s′} ≠ a^s, or (b) there is an s such that for every s′ > s, a^{s′} = a^s. In each case
we define a total recursive function f such that if L represents f, then
L ∈ RE_asn and ψ fails to identify L.
In case (a) holds, φ_{h(i)} is total. Let f = φ_{h(i)} and let L represent f. Clearly
L ∈ RE_asn. Let t be the text for L such that t̄_{n+1} = f[n] for every n. Then, for
infinitely many s, φ^{s+1}_{h(i)} is defined by case 1 or case 2 of the construction.
Hence either ψ changes its conjectures infinitely often on t or for infinitely
many n, φ_{ψ(t̄_n)} ≠ f. In either case ψ fails to identify L.
In case (b) holds, φ^{s′}_{h(i)} is defined by case 3 at stage s′ of the construction
for every s′ > s. Hence φ_{h(i)}(n)↓ for all n ≠ a^s. Let f = φ_{h(i)} ∪ {(a^s, 0)}, and
let L represent f. Again it is clear that L ∈ RE_asn. Let t be the text for L such
that t̄_{n+1} = f[n] for all n. Since φ^{s′}_{h(i)} is defined by case 3 of the construction
for all s′ > s, ψ converges on t to an index j such that φ_j(a^s)↑. But then ψ
fails to identify L. □

The next proposition shows that even in the context of RE_svt, ℱ^rec is
restrictive for FINT-identification on text.

PROPOSITION 6.2.3B RE_svt ∉ [ℱ^rec, text, FINT]_svt.

Proof The proof of proposition 4.2.1B suffices to prove this result as well,
with only minor modification. Note that L_0 and L_1 specified there are
members of RE_svt and are not finite variants of one another. □

COROLLARY 6.2.3A [ℱ^rec, text, FINT]_svt ⊂ [ℱ, text, FINT]_svt.

Exercises

6.2.3A (Case and Smith 1983) Prove: [ℱ^rec ∩ ℱ^Popperian, text, FINT]_svt ⊄
[ℱ^rec, text, INT]_svt. (Hint: See the proof of proposition 6.2.3A.)

6.2.3B The convergence criterion {(L, {n}) | L ∈ RE_svt and W_n ∈ RE_svt and L and
W_n are finite variants} is called functional finite difference, abbreviated to FFD. Thus
φ FFD-identifies L ∈ RE_svt on text just in case for every text t for L, φ converges on
t to an index for a language L′ ∈ RE_svt such that the functions represented by L and
L′ differ at only finitely many arguments. (If L ∉ RE_svt, then L cannot be FFD-
identified.) Prove: [ℱ^rec, text, INT]_svt = [ℱ^rec, text, FFD]_svt. Compare this result
to proposition 6.2.3A.

6.2.3C (Case and Smith 1983) For n ∈ N, let the convergence criterion FINT(n) be
defined as in exercise 6.2.1F. Prove the following strengthenings of proposition
6.2.3A:
a. Let n < m. Then [ℱ^rec, text, FINT(n)]_svt ⊂ [ℱ^rec, text, FINT(m)]_svt.
b. ∪_{n∈N} [ℱ^rec, text, FINT(n)]_svt ⊂ [ℱ^rec, text, FINT]_svt.

6.3 Extensional Identification

FINT-identification liberalizes the accuracy requirement of identification.
We now examine one method of liberalizing its stability requirement. From
lemma 1.2.1B we know that for all i ∈ N, {j | W_j = W_i} is infinite; that is, no
language is generated by only finitely many grammars. Now consider a
learner who fails to converge on a text t to a grammar for rng(t) but does
eventually conjecture an uninterrupted, infinite sequence of grammars, all
of them for rng(t) and no two of them the same. After some finite exposure to
t, such a learner would never lack a grammar for rng(t). In this sense the
learner may be said to converge to rng(t) itself (the language "in extension"),
rather than to any one grammar for rng(t) (the language "in intension"). In
particular, it is possible that normal linguistic development constitutes
convergence to the ambient language but to no particular grammar.
A sequence of equivalent but ever-shifting conjectures need not betray
perversity. Such shifting might arise as the result of continual refinements
for the sake of efficiency. For example, the learner might discover how to
lower the processing time of a subset of sentences already accepted, albeit
inefficiently, by her latest conjecture. The new processing strategy might
require modification of her grammar without changing the language ac-
cepted. Alternatively, the learner may, from time to time, happen on an
unfamiliar figure of speech s from a grammatically reliable source. Rather
than check whether her current grammar G accepts s, the learner might
incorporate within G a special-purpose modification that ensures s's ac-
ceptance. If G already accepts s, the modified grammar will be equivalent to
it. This situation could arise indefinitely often for a sufficiently cautious
learner. (Compare the function g in the proof of proposition 6.3.1B.)

We are thus led to the following definition.

DEFINITION 6.3A The convergence criterion {(L, {i | W_i = L}) | L ∈ RE} is
called extensional, abbreviated to EXT.

Thus φ ∈ ℱ EXT-converges on t ∈ 𝒯 to L ∈ RE just in case φ ends in
{i | W_i = L} on t. As a consequence φ EXT-identifies L on text just in case
for all texts t for L, φ ends on t in the set of indexes for rng(t). Put another
way, to EXT-identify L on text, φ must produce on every text for L an
infinite, unbroken sequence of indexes for L.

6.3.1 EXT-Identification in RE

The following proposition is easy to prove.

PROPOSITION 6.3.1A [ℱ, text, INT] = [ℱ, text, EXT] ⊂ [ℱ, text, FINT].

As a counterpoint to proposition 6.3.1A, we now show that INT and EXT
may be distinguished on text in the context of ℱ^rec.

PROPOSITION 6.3.1B [ℱ^rec, text, INT] ⊂ [ℱ^rec, text, EXT].

Proof It suffices to exhibit an ℒ ∈ [ℱ^rec, text, EXT] − [ℱ^rec, text, INT].
Let ℒ = {K ∪ D | D finite}. By lemma 4.2.1C, ℒ ∉ [ℱ^rec, text, INT]. Let
g ∈ ℱ^rec be such that W_{g(τ)} = K ∪ rng(τ). (Such a g may be defined using
part ii of definition 1.2.1A.) Then g EXT-identifies ℒ on text: Let t be a text
for some K ∪ D, D finite. Then for some n, D ⊆ rng(t̄_n), and for every m ≥ n,
g(t̄_m) is an index for K ∪ D. (Note that for m′ > m ≥ n, g(t̄_m) and g(t̄_{m′})
will not, in general, be the same index for K ∪ D.) □
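The behavior of g — conjectured grammars that keep changing while the generated language stabilizes — can be mimicked in a toy setting. In the sketch below (an illustration only, not the book's construction) a decidable set of even numbers stands in for K, since genuine grammars for K ∪ rng(σ) cannot be evaluated; conjectures are source strings carrying a stage number, so every conjecture is a syntactically distinct "grammar."

```python
# Toy analogue of the learner g in proposition 6.3.1B.
# Assumption: the decidable set of evens plays the role of K, because K
# itself is undecidable and its grammars cannot be run to completion.

K_STANDIN = frozenset(range(0, 50, 2))

def g(sigma):
    """Return a 'grammar' (a source string) for K_STANDIN ∪ rng(sigma).
    The embedded stage number makes every conjecture a distinct grammar."""
    stage = len(sigma)
    content = sorted(set(sigma))
    return (f"(lambda stage: lambda x: x in {sorted(K_STANDIN)!r} "
            f"or x in {content!r})({stage})")

def language(grammar, bound):
    """The finite part below `bound` of the language a grammar generates."""
    accept = eval(grammar)  # toy code: grammars are evaluable source strings
    return {x for x in range(bound) if accept(x)}

text = [2, 4, 3, 2, 3]          # a text for K_STANDIN ∪ {3}
grammars = [g(text[:n + 1]) for n in range(len(text))]

# All five conjectures are syntactically distinct ...
print(len(set(grammars)))                                        # -> 5
# ... but once 3 has appeared, they all generate the same language.
print(language(grammars[2], 100) == language(grammars[4], 100))  # -> True
```

This is precisely extensional convergence without intensional convergence: the sequence of indexes never stabilizes, while the sequence of languages does.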
The next proposition states that ℱ^rec restricts EXT-identification on
text. It will be exhibited as a corollary of proposition 6.5.1C (see also
exercise 6.3.1D).

PROPOSITION 6.3.1C [ℱ^rec, text, EXT] ⊂ [ℱ, text, EXT].


What is the relation between EXT- and FINT-identification on text
by recursive learning functions? The answer is provided by the next
proposition.

PROPOSITION 6.3.1D
i. [ℱ^rec, text, FINT] ⊄ [ℱ^rec, text, EXT].
ii. [ℱ^rec, text, EXT] ⊄ [ℱ^rec, text, FINT].

Proof

i. Let ℒ = {N − D | D finite}. Then ℒ ∈ [ℱ^rec, text, FINT] by the proof of
proposition 6.2.1A, and ℒ ∉ [ℱ, text, EXT] ⊇ [ℱ^rec, text, EXT] by propo-
sition 6.3.1A and the proof of proposition 6.2.1A.
ii. Let ℒ = {N × (K ∪ {x}) | x ∈ N}. Suppose φ ∈ ℱ^rec FINT-identifies ℒ
on text. Then there is a FINT-locking sequence σ for φ and N × K on text in
the sense of exercise 6.1.2D. For every x, let s^x be a computably generable
text for N × (K ∪ {x}). If x ∉ K, then there is an n such that φ(σ) ≠ φ(σ ∧ s̄^x_n),
since in this case N × K and N × (K ∪ {x}) are not finite variants. On the
other hand, if x ∈ K, then for every n, φ(σ ∧ s̄^x_n) = φ(σ), since σ is a FINT-
locking sequence for φ and N × K. But this yields a positive test for
membership in K̄, contradicting the fact that K̄ is not recursively
enumerable.
On the other hand, ℒ ∈ [ℱ^rec, text, EXT] (cf. the proof of proposition
6.3.1B). □

In section 4.3.3 we saw that consistency restricts ℱ^rec. In contrast, the
consistent subset of ℱ^rec does not limit EXT-identification on text.

PROPOSITION 6.3.1E [ℱ^rec ∩ ℱ^consistent, text, EXT] = [ℱ^rec, text, EXT].

Proof See exercise 6.3.1F. □

Exercises

6.3.1A
a. Prove: [ℱ^rec ∩ ℱ^conservative, text, EXT] ⊂ [ℱ^rec, text, EXT].
b. φ ∈ ℱ is said to be extensionally conservative just in case for all σ ∈ SEQ, if
rng(σ) ⊆ W_{φ(σ⁻)}, then W_{φ(σ)} = W_{φ(σ⁻)} (σ⁻ is explained in definition 4.4.1A). Thus
an extensionally conservative learner never abandons a language that generates all
the data seen to date (although a specific grammar may be abandoned at any time).
Conservatism is a special case of extensional conservatism. Prove: [ℱ^rec ∩
ℱ^extensionally conservative, text, EXT] ⊄ [ℱ^rec, text, INT].

6.3.1B φ ∈ ℱ is said to be extensionally confident just in case for all t ∈ 𝒯 there is
L ∈ RE such that φ ends in {i | W_i = L} on t. Confidence is a special case of exten-
sional confidence. Prove:
a. Let ℒ ⊆ RE be an infinite w.o. chain (see exercise 4.6.2B for the definition of a
w.o. chain). Then ℒ ∉ [ℱ^extensionally confident, text, EXT]. (Hence the collection ℒ
of all finite languages is not a member of this latter class, since ℒ contains an infinite
chain.)

b. Let ℒ, ℒ′ ∈ [ℱ^extensionally confident, text, EXT]. Then ℒ ∪ ℒ′ ∈
[ℱ^extensionally confident, text, EXT].
c. Let ℒ, ℒ′ ∈ [ℱ^rec ∩ ℱ^extensionally confident, text, EXT]. Then ℒ ∪ ℒ′ ∈
[ℱ^rec ∩ ℱ^extensionally confident, text, EXT].

*6.3.1C Let s, t ∈ 𝒯 be given. s is said to be final in t just in case there is n ∈ N such
that s_m = t_{m+n} for all m ∈ N. Intuitively, s is final in t just in case t has s as an
infinite "tail." Let φ ∈ ℱ be defined on t ∈ 𝒯. The infinite sequence of conjectures
produced by φ on t is denoted φ[t]. Formally, φ[t] is the unique s ∈ 𝒯 such that
s_n = φ(t̄_n) for all n ∈ N. Finally, φ ∈ ℱ is said to be extensionally order independent
just in case for all L ∈ RE, if φ extensionally identifies L, then there is s ∈ 𝒯 such that
for all texts t for L, s is final in φ[t]. It can be seen that order independence is a
special case of extensional order independence. Prove:
[ℱ^rec ∩ ℱ^extensionally order independent, text, EXT] ⊂ [ℱ^rec, text, EXT].

6.3.1D Prove: Let 𝒮 be a denumerable subset of ℱ. Then [𝒮, text, EXT] ⊂
[ℱ, text, EXT]. (Hint: See the proof of proposition 4.1A.) Note that proposition
6.3.1C follows from this result.

*6.3.1E (Case and Lynes 1982) Prove that there is ℒ ⊆ RE_rec such that
ℒ ∈ [ℱ^rec, text, EXT] − [ℱ^rec, text, FINT]. (For RE_rec, see definition 1.2.2B.) The
foregoing result strengthens proposition 6.3.1D(ii). Its proof is nontrivial.

6.3.1F Prove proposition 6.3.1E.

6.3.1G Let RE_fin∪K be as defined in exercise 4.2.1H. Show that
{K} ∪ RE_fin∪K ∈ [ℱ^rec, text, EXT] − [ℱ^rec, text, INT].

6.3.1H Prove: There is ℒ ⊆ RE such that (a) every L ∈ ℒ is infinite, and (b)
ℒ ∈ [ℱ^rec, text, EXT] − [ℱ^rec ∩ ℱ^accountable, text, EXT]. (Hint: See the proof of
proposition 4.3.5A.)

6.3.2 EXT-Identification in RE_svt

Proposition 1.4.3C is enough to show that RE_svt ∈ [ℱ, text, EXT]_svt. On
the other hand:

PROPOSITION 6.3.2A (Case and Smith 1983) RE_svt ∉ [ℱ^rec, text, EXT]_svt.

Proof The proof we give parallels the proof of proposition 4.2.1B. Sup-
pose φ EXT-identifies RE_svt on text. Call a text t orderly just in case there is
a total function f such that t̄_{n+1} = f[n] for every n, and call a sequence
σ ∈ SEQ orderly just in case it is in an orderly text. (See the proof of
proposition 6.2.3A for the notation f[n].) For any orderly sequence σ, let s^σ
be the orderly text s such that σ is in s and for every n ≥ lh(σ), s_n = (n, 0),
and let t^σ be the orderly text t such that σ is in t and for every n ≥ lh(σ),
t_n = (n, 1). Each of the texts s^σ and t^σ is for a language in RE_svt.
Consequently for every σ there is an n > lh(σ) and an m such that
(n, 0) ∈ W_{φ(s̄^σ_n),m}, and there is an n > lh(σ) and an m such that
(n, 1) ∈ W_{φ(t̄^σ_n),m}. Let p(σ) be the first coordinate of the smallest pair (n, m)
with this property with respect to s^σ, and let q(σ) be likewise with respect to
t^σ. We now define an orderly text t for a language in RE_svt which φ fails to
EXT-identify.
Let σ^0 = ∅. For n even let σ^{n+1} = s̄^{σ^n}_{p(σ^n)}. For n odd let σ^{n+1} = t̄^{σ^n}_{q(σ^n)}.
Let t = ∪_n σ^n. It is clear that t is an orderly text for a language in RE_svt.
In addition, for every n > 0, t_{lh(σ^n)} ∉ W_{φ(σ^n)}, which shows that φ fails to
EXT-identify rng(t). □

COROLLARY 6.3.2A [ℱ^rec, text, EXT]_svt ⊂ [ℱ, text, EXT]_svt.

For the proof of the following proposition the reader may consult Case
and Smith (1983, theorem 3.1). Note the contrast to proposition 6.3.1D(i).

PROPOSITION 6.3.2B (Case and Smith 1983) [ℱ^rec, text, FINT]_svt ⊂
[ℱ^rec, text, EXT]_svt.

Exercise

6.3.2A (John Steel, cited in Case and Smith 1983) Provide a simple proof for
the following weakening of proposition 6.3.2B: [ℱ^rec, text, FINT]_svt ⊆
[ℱ^rec, text, EXT]_svt. (Hint: The errors of a FINT learner on a text for L ∈ RE_svt
can be discovered and patched.)

*6.3.3 Finite Difference, Extensional Identification

The liberalizations captured by finite difference and by extensional identi-
fication may be combined as follows.

DEFINITION 6.3.3A The convergence criterion {(L, {i | W_i = L′}) | L′ is a
finite variant of L} is called finite difference, extensional, abbreviated to
FEXT.

Thus φ ∈ ℱ FEXT-identifies L ∈ RE on text just in case for all texts t for L,
φ ends on t in the set of indexes for some, one finite variant of L. Intuitively, to

FEXT-identify L on text, φ must produce on every text for L an infinite,
unbroken sequence of indexes, all of them for the same "near miss."
The following results for FEXT-identification are immediate corollaries
of proposition 6.3.1D.

PROPOSITION 6.3.3A

i. [ℱ^rec, text, EXT] ⊂ [ℱ^rec, text, FEXT].
ii. [ℱ^rec, text, FINT] ⊂ [ℱ^rec, text, FEXT].

Exercise

6.3.3A Prove: [ℱ, text, FEXT] = [ℱ, text, FINT].

*6.4 Bounded Extensional Identification

φ ∈ ℱ can EXT-converge to L ∈ RE on t ∈ 𝒯 in two different ways. On the
one hand, φ may conjecture on t an infinite number of distinct indexes for L,
just as the function g in the proof of proposition 6.3.1B conjectures infinitely
many distinct indexes for K on any text for K. On the other hand, φ may
EXT-converge to L on t by cycling on t within some finite set S of indexes for
L (if S has just one member, then this kind of extensional convergence
reduces to intensional convergence). Given the resource constraints on
human cognitive activity, this second, bounded form of EXT-convergence
seems the more plausible model of linguistic development, for the un-
bounded form requires the learner to manipulate grammars of ever-
increasing size. We are thus led to the following definition.

DEFINITION 6.4A The convergence criterion {(L, S_L) | S_L is a nonempty,
finite set of indexes for L} is called bounded extensional, abbreviated to
BEXT.

Thus φ ∈ ℱ BEXT-identifies L ∈ RE on text just in case for all texts t for L,
φ ends on t in {i | W_i = L and i < n} for some n ∈ N such that n is at least as
big as the least index for L.

6.4.1 BEXT-Identification in RE

It is evident that [ℱ, text, BEXT] = [ℱ, text, INT] (= [ℱ, text, EXT]).
The next proposition provides information about the classification of
[ℱ^rec, text, BEXT].

PROPOSITION 6.4.1A

i. [ℱ^rec, text, FINT] ⊄ [ℱ^rec, text, BEXT].
ii. [ℱ^rec, text, BEXT] ⊄ [ℱ^rec, text, FINT].
iii. [ℱ^rec, text, BEXT] ⊂ [ℱ^rec, text, EXT].

Note that proposition 6.4.1A(ii) implies proposition 6.3.1D(ii). We will
need the following definition and lemma for the proof of part ii of this
proposition.

DEFINITION 6.4.1A

i. For L ∈ RE, par(L) = card(L ∩ ({0} × N)).
ii. L is called parity self-describing just in case par(L) is odd and W_i = L, or
par(L) is even and W_j = L, or par(L) is infinite and W_i = W_j = L, where i is
the least n such that (1, n) ∈ L and j is the least n such that (2, n) ∈ L. RE_psd
is the collection {L ∈ RE | L is parity self-describing}.

LEMMA 6.4.1A For any total recursive functions f and g, there exist i and j
such that W_i = W_{f(⟨i,j⟩)} and W_j = W_{g(⟨i,j⟩)}.

Proof See Rogers (1967, sec. 11.4, theorem X(a)). □

We will refer to lemma 6.4.1A as the double recursion theorem.

Proof of proposition 6.4.1A

i. This is an immediate corollary of proposition 6.3.1D(i).
ii. We claim that (a) RE_psd ∈ [ℱ^rec, text, BEXT] and (b) RE_psd ∉
[ℱ^rec, text, FINT].

To show (a), let φ ∈ ℱ^rec be defined so that if par(rng(σ)) is odd, then
φ(σ) = the least i such that (1, i) ∈ rng(σ), and if par(rng(σ)) is even, then
φ(σ) = the least i such that (2, i) ∈ rng(σ). (If in either case such an i does
not exist, let φ(σ) = 0.) It is clear that φ BEXT-identifies RE_psd on text.
To show (b), suppose to the contrary that ψ ∈ ℱ^rec FINT-identifies RE_psd
on text. ψ may be assumed to be total. We will define, using the double
recursion theorem, two languages L_E, L_O ∈ RE_psd at least one of which ψ
fails to FINT-identify. In order to apply the double recursion theorem, we
will define two total recursive functions f and g. The definitions of f and g
rely on the following construction.

Construction Given i, j ∈ N, the following algorithm constructs three in-
finite sequences of finite sequences {ρ^n | n ∈ N}, {σ^n | n ∈ N}, and
{τ^n | n ∈ N}.

Stage 0
σ^0 = ⟨(1, i), (2, j)⟩.
τ^0 = ⟨(1, i), (2, j), (0, 0)⟩.
ρ^0 = ∅.

Stage n + 1
Case 1. ψ(σ^n) ≠ ψ(σ^n ∧ ρ^n ∧ #). Then let σ^{n+1} = σ^n ∧ ρ^n ∧ #, τ^{n+1} = τ^n,
and ρ^{n+1} = ∅.
Case 2. ψ(σ^n) = ψ(σ^n ∧ ρ^n ∧ #) = ψ(σ^n ∧ τ^n). Then let σ^{n+1} = σ^n,
τ^{n+1} = τ^n ∧ (3, n), and ρ^{n+1} = ρ^n ∧ #.
Case 3. ψ(σ^n) = ψ(σ^n ∧ ρ^n ∧ #) ≠ ψ(σ^n ∧ τ^n). Then let σ^{n+1} = σ^n ∧ τ^n ∧
(0, 2n + 1), τ^{n+1} = τ^n ∧ ⟨(0, 2n + 1), (0, 2n + 2)⟩, and ρ^{n+1} = ∅.

We now define f and g as follows: let W_{f(⟨i,j⟩)} = ∪ {rng(τ^n) | n ∈ N} and
W_{g(⟨i,j⟩)} = ∪ {rng(σ^n) | n ∈ N}. By the double recursion theorem we may
pick i and j such that W_i = W_{f(⟨i,j⟩)} and W_j = W_{g(⟨i,j⟩)}. Let L_O = W_i and
L_E = W_j. It is clear from the construction that L_O, L_E ∈ RE_psd. We now
argue that ψ fails to FINT-identify at least one of L_O and L_E. There are two
cases:

Case a. σ^n is defined infinitely often by case 1 or case 3 of the construction.
In either case t = ∪σ^n is a text for L_E on which ψ changes its conjecture
infinitely often. Hence ψ fails to FINT-converge to L_E on t.
Case b. σ^n is not defined infinitely often by either case 1 or case 3 of the
construction. Then σ^n is defined cofinitely often by case 2 of the construc-
tion. In this case there is an n such that for every m ≥ n, σ^m = σ^n. Let s be
the infinite sequence such that for every i, s_i = #, and let t = ∪τ^m. Then
σ^n ∧ s is a text for L_E and σ^n ∧ t is a text for L_O, and on each of these two
texts ψ converges to the same index. But in this case L_O and L_E are not
finite variants. Hence ψ fails to FINT-identify at least one of L_O and L_E.
iii. Let ℒ = {K ∪ {x} | x ∈ N}. ℒ ∈ [ℱ^rec, text, EXT] (cf. the proof of propo-
sition 6.3.1B). We show, on the other hand, that ℒ ∉ [ℱ^rec, text, BEXT].
Our argument parallels the proof of lemma 4.2.1C. Suppose, to the contrary,
that φ ∈ ℱ^rec BEXT-identifies ℒ on text. Let σ be a BEXT-locking sequence
for φ and K on text in the sense of exercise 6.1.2D. Then there is an n such
that for every τ, if rng(τ) ⊆ K, then φ(σ ∧ τ) < n. For every x, let t^x be a
computably generable text for K ∪ {x}. Since φ BEXT-identifies ℒ on text,
for all but finitely many x ∉ K, there is an m such that φ(σ ∧ t̄^x_m) ≥ n,
whereas for every x ∈ K and for every m, φ(σ ∧ t̄^x_m) < n. This yields a
positive test for membership in K̄ − D, for some finite set D, and hence
exhibits K̄ as recursively enumerable. □
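The parity learner φ from part (a) of the preceding proof is simple enough to execute. In the Python sketch below (an illustration taking liberties with the formalism: pairs appear as tuples rather than coded numbers, and '#' marks a pause in the evidential sequence) the learner computes par on the content seen so far and conjectures the least candidate carrying the matching tag.

```python
# Toy rendition of the BEXT learner φ in part (a) of the proof of
# proposition 6.4.1A(ii). Pairs are Python tuples; '#' is a pause.

def par(content):
    """Number of pairs of the form (0, n) in the content."""
    return sum(1 for (first, _second) in content if first == 0)

def phi(sigma):
    """If par is odd, conjecture the least i with (1, i) seen so far;
    if par is even, the least i with (2, i); default to 0."""
    content = {item for item in sigma if item != '#'}
    tag = 1 if par(content) % 2 == 1 else 2
    candidates = [i for (first, i) in content if first == tag]
    return min(candidates) if candidates else 0

sigma = ['#', (1, 8), (2, 5), (0, 0)]   # par = 1, odd: use the (1, i) pairs
print(phi(sigma))                        # -> 8
print(phi(sigma + [(0, 4)]))             # par now even -> 5
```

On a text for a parity self-describing language, the conjecture oscillates among at most the finitely many self-naming indexes that ever appear, which is bounded extensional convergence rather than intensional convergence.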

COROLLARY 6.4.1A

i. [ℱ^rec, text, INT] ⊂ [ℱ^rec, text, BEXT].
ii. [ℱ^rec, text, BEXT] ⊂ [ℱ^rec, text, FEXT].

Proof Part i follows from propositions 6.4.1A(ii) and 6.2.1A(ii). Part ii
follows from propositions 6.4.1A(i) and 6.3.3A(ii). □

Exercises
6.4.1A Prove:
a. [ℱ^rec ∩ ℱ^consistent, text, BEXT] ⊂ [ℱ^rec, text, BEXT]. Compare this result
with proposition 6.3.1E.
b. There is ℒ ⊆ RE such that every L ∈ ℒ is recursive and ℒ ∈ [ℱ^rec, text, INT] −
[ℱ^rec ∩ ℱ^consistent, text, BEXT]. Compare this result with proposition 4.3.3B.

6.4.1B Give an alternative proof of proposition 6.4.1A(ii) by showing that
ℒ = {{(0, j)} ∪ {(x, y) | 0 < x and y ∈ K_j} | j ∈ N} ∪ {{(0, j)} ∪ {(x, y) | 0 < x and
y ∈ N} | j ∈ N} ∈ [ℱ^rec, text, BEXT] − [ℱ^rec, text, FINT]. (Hint: To show that
ℒ ∉ [ℱ^rec, text, FINT], show that otherwise X = {i | W_i = N} would be Σ^0_2,
contradicting the Π^0_2-completeness of X. For Σ^0_2 and Π^0_2, see chapter 1.)

6.4.2 BEXT-Identification in RE_svt

In contrast to corollary 6.4.1A(i) we have the following proposition.

PROPOSITION 6.4.2A (Barzdin and Podnieks 1973, cited in Case and Smith
1983) [ℱ^rec, text, BEXT]_svt = [ℱ^rec, text, INT]_svt.

Proof It only needs to be shown that [ℱ^rec, text, BEXT]_svt ⊆
[ℱ^rec, text, INT]_svt. So suppose ψ ∈ ℱ^rec BEXT-identifies ℒ ⊆ RE_svt on
text. We construct a θ ∈ ℱ^rec which INT-identifies ℒ on text. We describe
the computation of θ on σ ∈ SEQ as follows. Let i be an index for ψ, and let
A = {φ_{i,lh(σ)}(τ) | τ ⊆ σ}. Let B = {n ∈ A | for every m and for every
j < lh(σ), if m ∈ W_{n,lh(σ)} and π₁(m) = π₁(σ_j), then π₂(m) = π₂(σ_j)}. B
consists of ψ's conjectures on σ which are not contradicted by data from σ
within running time bounded by lh(σ). Now θ's conjecture on σ is an index
for the recursive function given by the following computation. For each
x ∈ N, simultaneously compute φ_n(x) for n ∈ B, and give as output the result
of the earliest terminating computation (in case of ties give the smallest
result, and in case all these computations diverge, diverge). For every t ∈ 𝒯,
if rng(t) is BEXT-identified by ψ, then θ's behavior on t depends on only
finitely many indexes, so θ INT-identifies rng(t). □
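The amalgamation that defines θ's conjectures — dovetail all surviving indexes and output the earliest result, smallest on ties — can be sketched outside the formalism. In the toy below (an illustrative assumption, not the book's construction), a "program" is a pair (delay, function): its value on x becomes visible once the step budget reaches the delay, standing in for the clocked computations φ_{n,s}.

```python
# Sketch of the amalgamation used for θ in the proof of proposition 6.4.2A.
# Assumption: programs are (delay, function) pairs; the delay is a crude
# stand-in for the number of steps a real computation takes to halt.

def run(program, x, steps):
    delay, f = program
    return f(x) if steps >= delay else None   # None models "not yet halted"

def amalgamate(B, max_steps=1000):
    """Dovetail the programs in B; on each x return the earliest output,
    breaking ties by the smallest result, diverging if none halts in time."""
    def theta(x):
        for s in range(max_steps + 1):        # increase the step budget
            outputs = [v for p in B if (v := run(p, x, s)) is not None]
            if outputs:
                return min(outputs)           # ties -> smallest result
        return None                           # all members "diverge"
    return theta

slow = (5, lambda x: x * x)       # a correct but slow index
fast = (1, lambda x: x * x)       # an equivalent, faster index
dead = (10**9, lambda x: 0)       # a conjecture that never halts in time

theta = amalgamate([slow, fast, dead])
print(theta(4))   # -> 16 (fast answers first; dead never interferes)
```

Because B is eventually a fixed finite set of indexes for the one target function, the amalgam computes that function no matter which member answers first — which is why a single index for θ's conjecture suffices for intensional convergence.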

Propositions 6.4.2A and 6.3.2B yield the following corollary.

COROLLARY 6.4.2A [ℱ^rec, text, BEXT]_svt ⊂ [ℱ^rec, text, EXT]_svt.

Exercise

6.4.2A Let convergence criterion 𝒞 be given. φ ∈ ℱ is called 𝒞-confident just in
case for all t ∈ 𝒯, there is L ∈ RE such that φ 𝒞-converges on t to L. 𝒞-confidence
generalizes confidence (section 4.6.2; see also exercise 6.3.1B). Prove: Let ℒ,
ℒ′ ∈ [ℱ^rec ∩ ℱ^BEXT-confident, text, BEXT]. Then ℒ ∪ ℒ′ ∈ [ℱ^rec ∩
ℱ^BEXT-confident, text, BEXT].

6.4.3 Bounded Finite Difference Extensional Identification

The liberalizations captured by bounded extensional identification and
finite difference intensional identification may be combined as follows.

DEFINITION 6.4.3A The convergence criterion {(L, S_L) | S_L is a nonempty,
finite set of indexes for some one finite variant of L} is called bounded finite
difference extensional, abbreviated to BFEXT.

Thus φ ∈ ℱ BFEXT-identifies L ∈ RE on text just in case φ produces on
every text for L an infinite, unbroken sequence of indexes, all of them for the
same finite variant of L, and all of them below some fixed bound.
The following result is a corollary of proposition 6.5.3A.

PROPOSITION 6.4.3A [ℱ^rec, text, EXT] ⊄ [ℱ^rec, text, BFEXT].


Criteria of Learning 139

Exercise
6.4.3A Prove the following direct consequences of definitions and previous results:
a. [ℱ^rec, text, BEXT] ⊂ [ℱ^rec, text, BFEXT].
b. [ℱ^rec, text, FINT] ⊂ [ℱ^rec, text, BFEXT].
c. [ℱ^rec, text, BFEXT] ⊆ [ℱ^rec, text, FEXT].
d. [ℱ^rec, text, BFEXT] ≠ [ℱ^rec, text, EXT].
e. [ℱ, text, BFEXT] = [ℱ, text, FINT].

*6.5 Finite Difference Identification

A very liberal kind of identification may be defined as follows.

DEFINITION 6.5A The convergence criterion {(L, {i | W_i is a finite variant of
L}) | L ∈ RE} is called finite difference, abbreviated to: FD.

Thus φ ∈ ℱ FD-identifies L ∈ RE on text just in case φ produces on every
text for L an infinite, unbroken sequence of indexes, all of them for finite
variants of L.

6.5.1 FD-Identification in RE
The principal results for FD-identification are as follows.

PROPOSITION 6.5.1A [ℱ, text, FD] = [ℱ, text, FINT].

Proof The proof of this proposition is left for the reader. □

As a corollary to propositions 6.5.1A and 6.2.1B we have the following
statement.

COROLLARY 6.5.1A Let 𝒞 be any convergence criterion defined so far in
this chapter. Then {N} ∪ RE_fin ∉ [ℱ, text, 𝒞].

For a proof of the following proposition, the reader may consult Osherson
and Weinstein (1982, proposition 5).

PROPOSITION 6.5.1B [ℱ^rec, text, FEXT] ⊂ [ℱ^rec, text, FD].

PROPOSITION 6.5.1C Let 𝒮 be a denumerable subset of ℱ. Then
[ℱ, text, INT] ⊄ [𝒮, text, FD].

Proof The proof of proposition 4.1A establishes this result as well. Note
that the proof of the claim there actually demonstrates that no φ ∈ ℱ FD-
identifies both ℒ_Q and ℒ_Q' on text, for Q ≠ Q'. □

COROLLARY 6.5.1B Let 𝒞 be any convergence criterion defined so far in
this chapter. Then [ℱ^rec, text, 𝒞] ⊂ [ℱ, text, 𝒞].

In particular, propositions 4.2.1A, 6.2.1C, and 6.3.1C are special cases of
corollary 6.5.1B.
The liberality of FD is brought out by its interaction with ℱ^rec and
recursive text, as revealed by the next proposition.

PROPOSITION 6.5.1D (Case and Lynes 1982) RE ∈ [ℱ^rec, recursive text,
FD].

Proof See exercise 6.5.2B. □

The foregoing proposition should be compared to exercise 5.5.2B.

Exercise

6.5.1A Let ℒ = {N} ∪ {(N − {i}) × N | i ∈ N}. Show that ℒ ∉ [ℱ, text, FD]. Con-
clude that for all the convergence criteria 𝒞 defined so far in this chapter,
ℒ ∉ [ℱ, text, 𝒞].

6.5.2 FD-Identification in RE_svt

FD is such a liberal convergence criterion that all of RE_svt can be identified
on text by a single recursive learning function. This is the content of the next
proposition, a proof of which may be found in Case and Smith (1983,
theorem 3.10).

PROPOSITION 6.5.2A (Harrington 1978, cited by Case and Smith 1983)
RE_svt ∈ [ℱ^rec, text, FD]_svt.

COROLLARY 6.5.2A [ℱ^rec, text, FD]_svt = [ℱ, text, FD]_svt.

COROLLARY 6.5.2B

i. [ℱ^rec, text, EXT]_svt ⊂ [ℱ^rec, text, FD]_svt.
ii. [ℱ^rec, text, FINT]_svt ⊂ [ℱ^rec, text, FD]_svt.

Corollary 6.5.2A should be compared to corollary 6.5.1B.

Exercises

6.5.2A (Case and Smith 1983) For n ∈ N, the convergence criterion {(L, {i | (W_i − L) ∪
(L − W_i) has no more than n elements}) | L ∈ RE} is denoted: FD(n). Prove:
a. Let n < m. Then [ℱ^rec, text, FD(n)]_svt ⊂ [ℱ^rec, text, FD(m)]_svt.
b. ⋃_{n ∈ N} [ℱ^rec, text, FD(n)]_svt ⊂ [ℱ^rec, text, FD]_svt.
6.5.2B Derive proposition 6.5.1D from proposition 6.5.2A. (Hint: Use an internal
simulation argument.)
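For intuition, the FD(n) condition of exercise 6.5.2A is just a bound on the symmetric difference between a conjectured language and the target. With finite sets standing in for W_i and L, it can be checked directly; the helper below is a toy illustration only, not part of the formal development.

```python
def fd_n_ok(W, L, n):
    # FD(n): card((W - L) ∪ (L - W)) <= n, with W and L finite sets
    # standing in for the conjectured and target languages.
    return len(W.symmetric_difference(L)) <= n
```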

6.5.3 Bounded Finite Difference Identification

A bounded version of FD may be defined as follows.

DEFINITION 6.5.3A The convergence criterion {(L, S_L) | S_L is a nonempty,
finite set of indexes for finite variants of L} is called bounded finite difference,
abbreviated to: BFD.

Thus φ ∈ ℱ BFD-identifies L ∈ RE on text just in case φ produces on every
text for L an infinite, unbroken sequence of indexes, all of them for finite
variants of L and all of them below some fixed bound.
The principal results for BFD-identification are as follows.

PROPOSITION 6.5.3A

i. [ℱ^rec, text, BFD] ⊄ [ℱ^rec, text, EXT].
ii. [ℱ^rec, text, EXT] ⊄ [ℱ^rec, text, BFD].

Proof Part i of the proposition is an immediate corollary of previous
results. The proof of part ii is left as an exercise for the reader. □

Note that proposition 6.5.3A(ii) implies proposition 6.4.3A.

PROPOSITION 6.5.3B [ℱ^rec, text, BFD] ⊂ [ℱ^rec, text, FEXT].


Proof We show that [:7''''. text. BFD] S; [:7"". text. FEXT]. The strict-
ness of the inclusion follows immediately from proposition 6.5.3A(ii).

Suppose that θ ∈ ℱ^rec BFD-identifies ℒ on text. We construct from θ a
ψ ∈ ℱ^rec which FEXT-identifies ℒ on text. For every σ ∈ SEQ, let ψ(σ) be an
index for W_θ(σ) ∪ ⋃{W_{θ(τ),lh(τ)} ∩ {0, ..., lh(τ)} | τ ⊆ σ}. Such an index can be
calculated effectively from θ(σ).
Suppose that t ∈ 𝒯 is a text for some L ∈ ℒ. We show that ψ FEXT-
converges on t to L. Let X = {i | for infinitely many n, θ(t̄_n) = i}. Since θ
BFD-converges on t to L, X is finite, and for each i ∈ X, W_i is a finite
variant of L. We may then choose n large enough so that for every m ≥ n,
θ(t̄_m) ∈ X, and for every i, j ∈ X, there is an m < n such that i = θ(t̄_m) and
W_i − W_j ⊆ W_{i,m} ∩ {0, ..., m}. But then for every m ≥ n, W_ψ(t̄_m) = W_ψ(t̄_n),
and W_ψ(t̄_n) is a finite variant of L, that is, ψ FEXT-converges on t to L. □

The following fact follows directly from definitions 6.4.3A and 6.5.3A.

PROPOSITION 6.5.3C [ℱ^rec, text, BFEXT] ⊆ [ℱ^rec, text, BFD].

Whether the inclusion in proposition 6.5.3C is proper is presently
unknown.

Open question 6.5.3A [ℱ^rec, text, BFD] = [ℱ^rec, text, BFEXT]?


Finally, with respect to RE_svt we have the following.

PROPOSITION 6.5.3D (Case and Smith 1983) [ℱ^rec, text, BFD]_svt =
[ℱ^rec, text, FINT]_svt.

Proof It only needs to be shown that [ℱ^rec, text, BFD]_svt ⊆ [ℱ^rec, text,
FINT]_svt. So suppose ψ ∈ ℱ^rec BFD-identifies ℒ ⊆ RE_svt on text. We
construct a θ ∈ ℱ^rec which FINT-identifies ℒ on text. The construction
is similar to that used in the proof of proposition 6.4.2A. We describe the
computation of θ on σ ∈ SEQ as follows. Let i be an index for ψ, and let
A = {φ_{i,lh(σ)}(τ) | τ ⊆ σ}. For each j ∈ A, let r(j, σ) = card({n < lh(σ) | j =
φ_{i,lh(σ)}(σ̄_n)}), and let d(j, σ) = card({m ∈ W_{j,lh(σ)} | there is an n < lh(σ) such
that π₁(m) = π₁(σ_n) and π₂(m) ≠ π₂(σ_n)}). r(j, σ) is the number of times
j is conjectured by ψ on σ, and d(j, σ) is the number of disagreements
registered by j with data from σ in running time bounded by lh(σ). Let
B = {j ∈ A | d(j, σ) < r(j, σ)}.
Now θ's conjecture on σ is an index for the recursive function given by
the following computation. For each x ∈ N, simultaneously compute φ_j(x),
j ∈ B, and give as output the result of the earliest terminating computation

(in case of ties give the smallest result, and in case all these computations
diverge, diverge). We leave it to the reader to verify that if t is a text for some
L ∈ ℒ, then θ FINT-converges on t to L. □
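The filtering of A down to B — keeping a conjecture j only when its disagreement count d(j, σ) stays below its conjecture count r(j, σ) — can be mimicked on finite data. In the sketch below, a toy illustration only, `enum(j, steps)` is a hypothetical stand-in for the step-bounded enumeration W_{j,steps}, with languages coded as sets of pairs as in the single-valued-total setting.

```python
def plausible_conjectures(conjectures, data, enum):
    # r(j): number of times j occurs among the conjectures made so far.
    # d(j): number of pairs enumerated by j that disagree with the
    #       single-valued data (same first coordinate, different second).
    graph = dict(data)            # the data as a partial function x -> y
    s = len(data)                 # running-time bound, playing lh(sigma)
    B = []
    for j in set(conjectures):
        r = conjectures.count(j)
        d = sum(1 for (x, y) in enum(j, s)
                if x in graph and graph[x] != y)
        if d < r:                 # keep j only if d(j) < r(j)
            B.append(j)
    return sorted(B)
```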

Exercises

6.5.3A Prove the following:
a. [ℱ, text, BFD] = [ℱ, text, FINT].
b. [ℱ^rec, text, BFD] ⊂ [ℱ, text, BFD].
6.5.3B Prove proposition 6.5.3A(ii). (Hint: Combine the ideas used in the proofs of
propositions 6.3.1D(ii) and 6.4.1A(iii).)

*6.6 Simple Identification

Children are unlikely to converge to grammars of arbitrary size and com-
plexity since the manipulation of such grammars demands excessive compu-
tational resources. We are thus led to examine a criterion of learning that
favors convergence to small or "simple" grammars. To formulate this
criterion, we rely on terminology introduced in section 4.3.6.

DEFINITION 6.6A (Chen 1982) Let total f ∈ ℱ^rec be given. The convergence
criterion {(L, {i}) | W_i = L and i is f-simple} is denoted: SIM(f).

It is evident that for any total f ∈ ℱ^rec such that f(x) ≥ x,
[ℱ, text, SIM(f)] = [ℱ, text, INT] and that RE_svt ∈ [ℱ, text, INT]_svt.
The situation is different with respect to the recursive learning functions, as
the following result shows. The reader may consult Chen (1982, theorem
4.3) for a proof.

PROPOSITION 6.6A (Chen 1982) For all total f ∈ ℱ^rec, [ℱ^rec, text,
SIM(f)]_svt ⊂ [ℱ^rec, text, INT]_svt.

COROLLARY 6.6A For all total f ∈ ℱ^rec, [ℱ^rec, text, SIM(f)] ⊂ [ℱ^rec,
text, INT].

We next compare the strategy of simplemindedness (section 4.3.6) with


the convergence criterion SIM(f).

PROPOSITION 6.6B Let f ∈ ℱ^rec be such that f(x) ≥ x for all x ∈ N. Then
[ℱ^rec ∩ ℱ^simpleminded, text, INT] ⊂ [ℱ^rec, text, SIM(f)].

Proof It suffices to show the inclusion is strict, the inclusion itself being
a consequence of the definitions. By proposition 4.3.6A every ℒ ∈
[ℱ^rec ∩ ℱ^simpleminded, text, INT] is finite. Hence it suffices to show that
there is an infinite collection ℒ ∈ [ℱ^rec, text, SIM(f)], for every total
f ∈ ℱ^rec. For each such f, we construct a ψ ∈ ℱ^rec which SIM(f)-identifies
RE_fin on text. ψ(σ) is the index i ≤ lh(σ) with the following properties (if
there is no such i, let ψ(σ) be 0): (1) W_{i,lh(σ)} = rng(σ), and (2) if j ≤ lh(σ) and
W_{j,lh(σ)} = rng(σ), then f(i) ≤ f(j), and if f(i) = f(j), then i < j (i.e., i is the
f-smallest index less than lh(σ) for rng(σ) in running time bounded by lh(σ)).
It is left to the reader to verify that ψ SIM(f)-identifies RE_fin on text. □
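The selection of an f-smallest index within the time bound lh(σ) is a finite search and can be rendered directly. In this sketch — an illustration only — `enum(i, steps)` is a hypothetical stand-in for the step-bounded enumeration W_{i,steps}, and f is assumed total.

```python
def sim_conjecture(sigma, f, enum):
    # Among indexes i <= lh(sigma) whose step-bounded enumeration
    # equals rng(sigma), pick the f-smallest, breaking f-ties by the
    # smaller index; return 0 if no index qualifies.
    s = len(sigma)
    content = set(sigma)
    candidates = [i for i in range(s + 1) if enum(i, s) == content]
    if not candidates:
        return 0
    return min(candidates, key=lambda i: (f(i), i))
```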

Exercises
6.6A (Chen 1982) Prove the following strengthening of proposition 6.6A:
⋃_{total f ∈ ℱ^rec} [ℱ^rec, text, SIM(f)]_svt ⊂ [ℱ^rec, text, INT]_svt.
6.6B (Chen 1982) Let total f ∈ ℱ^rec be given. The convergence criterion {(L, {i}) | W_i
is a finite variant of L and i is f-simple} is denoted: FINTSIM(f). Prove:
⋃_{total f ∈ ℱ^rec} [ℱ^rec, text, FINTSIM(f)]_svt = [ℱ^rec, text, FINT]_svt.

6.7 Summary

Figures 6.7A and 6.7B summarize the major results of sections 6.2 through
6.6. To interpret SIM(f), let total f ∈ ℱ^rec be given.

Exercise

6.7A For each convergence criterion 𝒞 appearing in figure 6.7A add [ℱ, text, 𝒞]
to the figure.

[Figure 6.7A: a diagram of the inclusions, for recursive learning functions on text, among [ℱ^rec, text, SIM(f)], [ℱ^rec, text, INT], [ℱ^rec, text, BEXT], [ℱ^rec, text, FINT], [ℱ^rec, text, BFEXT], [ℱ^rec, text, EXT], [ℱ^rec, text, BFD], [ℱ^rec, text, FEXT], [ℱ^rec, text, FD], [ℱ^rec, recursive text, FD], and 𝒫(RE); inclusions marked "?" are those whose properness is open.]

Figure 6.7A

6.8 Characteristic Index Identification

6.8.1 Cl-Convergence

Normal linguistic development may culminate in a test for the ambient


language and not merely in a positive test. This hypothesis is suggested by
speakers' apparent ability to classify arbitrary strings as either grammatical
or ungrammatical in their language (cf. Putnam 1961). Assuming that
human linguistic competence is machine simulable, such ability entails that
natural languages are recursive sets of sentences, not merely recursively
enumerable. In this section we consider the convergence criterion that
corresponds to this hypothesis.

[Figure 6.7B: a diagram of the inclusions, for identification of RE_svt, among [ℱ^rec, text, SIM(f)]_svt, [ℱ^rec, text, INT]_svt, [ℱ^rec, text, BEXT]_svt, [ℱ^rec, text, FINT]_svt, [ℱ^rec, text, BFD]_svt, [ℱ^rec, text, EXT]_svt, [ℱ^rec, text, FD]_svt, [ℱ, text, INT]_svt, and 𝒫(RE_svt).]

Figure 6.7B

DEFINITION 6.8.1A

i. i ∈ N is said to be a characteristic index for L ∈ RE just in case φ_i is the
characteristic function for L.
ii. If i ∈ N is a characteristic index for some L ∈ RE, then i is said to be a
characteristic index.

Plainly only recursive languages have characteristic indexes. As can be seen
from lemma 1.2.1B, for each recursive language there are infinitely many
characteristic indexes.

DEFINITION 6.8.1B (Gold 1967) The convergence criterion {(L, {i}) | i is a
characteristic index for L} is called characteristic index, abbreviated to: CI.

Thus cP E:F Cl-identifies LE RE on text just in case for all texts t for L, cp

converges on t to an index for the characteristic function of rng(t). Obvi-
ously only recursive languages can be CI-identified on ℰ, for any nonempty
evidential relation ℰ. Observe too that CI-convergence is not a special case
of INT-convergence, since if i ∈ N is a characteristic index for L ∈ RE, then
W_i ≠ L unless L = N. Finally, note that φ ∈ ℱ may CI-converge to L ∈ RE
on t ∈ 𝒯 even though for finitely many n ∈ N, φ(t̄_n) is not a characteristic
index.

Exercises

6.8.1A
a. Prove [ℱ, text, CI] = [ℱ, text, INT] ∩ 𝒫(RE_rec).
b. Prove [ℱ, informant, CI] = [ℱ, informant, INT] ∩ 𝒫(RE_rec) = 𝒫(RE_rec).
6.8.1B
a. Prove [ℱ^rec, text, CI] ⊆ [ℱ^rec, text, INT] ∩ 𝒫(RE_rec).
b. Prove [ℱ^rec, informant, CI] ⊆ [ℱ^rec, informant, INT] ∩ 𝒫(RE_rec).

6.8.2 CI-Identification on Text and on Informant

CI-identification by arbitrary learning functions is considered in exercise
6.8.1A. In this subsection we focus on ℱ^rec.

PROPOSITION 6.8.2A (Freivald and Wiehagen 1979) RE_sd ∩ RE_rec ∉
[ℱ^rec, informant, CI].

In other words, no recursive learning function CI-identifies the class of
recursive, self-describing languages on informant.

Proof Given ψ ∈ ℱ^rec, we exhibit L ∈ RE_sd ∩ RE_rec such that ψ does not CI-
identify L on informant. The construction of L should be compared to that
used in the proof of proposition 4.6.1C.
In preparation for an application of the recursion theorem, we define a
total recursive function f such that for all i ∈ N, W_f(i) has the following
properties:
1. i is the least element of W_f(i).
2. W_f(i) is recursive.
3. ψ fails to CI-identify W_f(i) on informant.

Given i ∈ N, we construct W_f(i) by stages. W^s_f(i) is the finite piece of W_f(i)
enumerated through stage s, and σ^s is the finite initial segment of an
ascending informant for W_f(i) constructed through stage s.

Construction

Stage 0:
σ^0 = ⟨(0,1), ..., (i−1,1), (i,0)⟩.
W^0_f(i) = {i}.

Stage s + 1: Let τ be the least σ ∈ SEQ such that σ ≤ s and for some n,
σ = ⟨(lh(σ^s),1), ..., (lh(σ^s)+n,1)⟩, and φ_{ψ(σ^s * σ),s}(lh(σ^s)+n+1) = 1, if
there is such a σ; let τ = ∅, otherwise.
Let σ^{s+1} = σ^s * τ * ⟨(lh(σ^s)+n+1, 0)⟩, if τ ≠ ∅; = σ^s, otherwise.
Let W^{s+1}_f(i) = W^s_f(i) ∪ {lh(σ^s)+n+1}, if τ ≠ ∅; = W^s_f(i), otherwise.

We let W_f(i) = ⋃{W^s_f(i) | s ∈ N}. It is clear that W_f(i) satisfies condition 1.
Condition 2 is ensured by the fact that W_f(i) is enumerated by the construc-
tion in increasing order. To verify condition 3, we argue by cases.

Case 1. W_f(i) is infinite. Then t = ⋃ σ^s is an informant for W_f(i) and for
infinitely many n, ψ(t̄_n) is not a characteristic index for W_f(i).
Case 2. W_f(i) is finite. Then σ = ⋃ σ^s is a finite sequence. Let t = σ * u,
where u_n = (lh(σ)+n, 1) for every n ∈ N. Then t is an informant for W_f(i)
and for infinitely many n, ψ(t̄_n) is not a characteristic index for W_f(i).

By the recursion theorem, let i be such that W_i = W_f(i). Then W_i ∈
RE_sd ∩ RE_rec, and ψ fails to CI-identify W_i. □
Using exercise 6.1.2E, we obtain the following corollary.

COROLLARY 6.8.2A RE_sd ∩ RE_rec ∉ [ℱ^rec, text, CI].

On the other hand, from exercise 4.2.1E, RE_sd ∈ [ℱ^rec, text, INT]; hence
RE_sd ∩ RE_rec ∈ [ℱ^rec, text, INT], and RE_sd ∩ RE_rec ∈ [ℱ^rec, informant,
INT]. Along with exercise 6.8.1B these facts yield the following.

COROLLARY 6.8.2B [ℱ^rec, text, CI] ⊂ [ℱ^rec, text, INT]_rec.

COROLLARY 6.8.2C (Freivald and Wiehagen 1979) [ℱ^rec, informant, CI] ⊂
[ℱ^rec, informant, INT]_rec.

Exercises
6.8.2A What is the relation between [ℱ^rec, text, INT]_rec and [ℱ^rec,
informant, CI]?
6.8.2B Prove: [ℱ^rec ∩ ℱ^memory-limited, text, CI] ⊂ [ℱ^rec, text, CI]. (Hint: See the
proof of proposition 4.4.1F.)
6.8.2C Prove: [ℱ^rec ∩ ℱ^conservative, text, CI] ⊂ [ℱ^rec, text, CI]. (Hint: See the proof
of proposition 4.5.1B.)
*6.8.2D Prove: [ℱ^rec ∩ ℱ^Popperian, text, CI] ⊂ [ℱ^rec, text, CI]. (Hint: Consider
{{(0, j)} ∪ ({1} × W_i) | j is a characteristic index for W_i}.)
*6.8.2E (Gold 1967) Let RE_pr be the set of primitive recursive languages. Show
that RE_pr ∈ [ℱ^rec, informant, CI].
6.8.2F Let the evidential relation imperfect informant be defined as in exercise
5.6.2C. Show that there is ℒ ⊆ RE such that (a) every L ∈ ℒ is infinite, (b) for every
L, L' ∈ ℒ, if L ≠ L', then L ∩ L' = ∅, and (c) ℒ ∈ [ℱ^rec, informant, CI] −
[ℱ^rec, imperfect informant, CI].

*6.8.3 Variants of CI-Identification

The definition of CI-identification may be modified in several ways in order
to accommodate imperfect stability or imperfect accuracy. For this purpose
the following definition is central.

DEFINITION 6.8.3A Let L ∈ RE_rec and j, n ∈ N be given.

i. j is said to be an n-finite difference characteristic index for L just in case
for all but at most n many x ∈ N, if x ∈ L, then φ_j(x) = 0, and if x ∉ L, then
φ_j(x) = 1.
ii. j is said to be a finite difference characteristic index for L just in case j is
an m-finite difference characteristic index for L for some m ∈ N.

Note that a finite difference characteristic index need not be a characteristic
index for a finite variant of L, for the latter index, but not the former, needs
to correspond to a total function.
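Restricted to a finite test universe, clause i of definition 6.8.3A amounts to counting the points where a candidate function departs from the membership convention used here (members map to 0, non-members to 1). The following toy checker just illustrates that count; it is not part of the formal development.

```python
def is_n_fd_characteristic(chi, L, n, universe):
    # chi errs at x if it departs from the convention of definition
    # 6.8.3A: chi(x) == 0 for x in L, chi(x) == 1 otherwise.
    errors = sum(1 for x in universe
                 if chi(x) != (0 if x in L else 1))
    return errors <= n            # at most n exceptional points allowed
```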

DEFINITION 6.8.3B (Case and Lynes 1982)

i. The convergence criterion {(L, {n}) | n is a finite difference characteristic
index for L} is called finite difference intensional characteristic index, ab-
breviated to: FINTCI.
ii. For n ∈ N, the convergence criterion {(L, {m}) | m is an n-finite difference
characteristic index for L} is called n-finite difference intensional character-
istic index, abbreviated to: FINT(n)CI.

Thus FINTCI and FINT(n)CI may be thought of as counterparts to FINT
(definition 6.2A) and FINT(n) (exercise 6.2.1F).

PROPOSITION 6.8.3A (Case and Lynes 1982, theorem 2) [ℱ^rec, informant,
FINT(n+1)CI] ⊄ [ℱ^rec, informant, FINT(n)].

As Case and Lynes (1982) point out, the interpretation of this proposition is
rather remarkable. It implies the existence of a collection of recursive
languages for which decision procedures with n + 1 mistakes can be syn-
thesized in the limit (on informant) but for which positive tests with only n
mistakes cannot be so synthesized. The proof is omitted.

COROLLARY 6.8.3A [ℱ^rec, informant, FINTCI] ⊄ ⋃_{n ∈ N} [ℱ^rec, informant,
FINT(n)].

Naturally [ℱ^rec, informant, FINTCI] ⊆ [ℱ^rec, informant, FINT], and
similarly for FINT(n)CI and FINT(n).
The next definition provides the CI counterparts of EXT (definition 6.3A)
and FD(n) (exercise 6.5.2A).
DEFINITION 6.8.3C (Case and Lynes 1982)
i. The convergence criterion {(L, S_L) | every x ∈ S_L is a characteristic index
for L} is called extensional characteristic index, abbreviated to: EXTCI.
ii. For n ∈ N, the convergence criterion {(L, S_L) | every x ∈ S_L is an n-finite
difference characteristic index for L} is called n-finite difference character-
istic index, abbreviated to: FD(n)CI.

Note that EXTCI = FD(0)CI.


PROPOSITION 6.8.3B (Case and Lynes 1982) [ℱ^rec, text, INT] ⊄
⋃_{n ∈ N} [ℱ^rec, informant, FD(n)CI].

Proof We show that RE_sd ∩ RE_rec ∉ ⋃{[ℱ^rec, informant, FD(m)CI] | m ∈
N}. The proof is almost identical to the proof of 6.8.2A. In particular,
at stage s + 1 modify the condition on σ to require φ_{ψ(σ^s * σ),s}(lh(σ^s) +
n + 1) = 1, ..., φ_{ψ(σ^s * σ),s}(lh(σ^s) + n + m + 1) = 1. The argument then pro-
ceeds exactly as before but with the conclusion that ψ does not FD(m)CI-
identify W_f(i) on informant. □

It may be observed that the proof of proposition 6.8.3B yields all the results
of section 6.8.2 as corollaries.

PROPOSITION 6.8.3C (Case and Lynes 1982, theorem 4) [ℱ^rec, informant,
EXTCI] ⊄ [ℱ^rec, informant, FINT].

Following Case and Lynes (1982), we may interpret the proposition as
follows. There are classes of recursive languages for which an infinite,
uninterrupted sequence of completely correct decision procedures can be
synthesized in the limit (on informant) but for which single positive tests
with a finite number of mistakes cannot be so synthesized. We omit the
proof.
Finally, the next proposition shows that for some collections of recursive
languages, an infinite, uninterrupted sequence of decision procedures each
with up to n + 1 mistakes can be synthesized in the limit (on informant),
but an infinite, uninterrupted sequence of positive tests each with up to n
mistakes cannot be so synthesized. Here again the proof is omitted.

PROPOSITION 6.8.3D (Case and Lynes 1982, theorem 5) For every n ∈ N,
[ℱ^rec, informant, FD(n+1)CI] ⊄ [ℱ^rec, informant, FD(n)].
7 Exact Learning

7.1 Paradigms of Exact Learning

The converse of the dictum that natural languages are learnable by children
(via casual exposure, etc.) is that nonnatural languages are not learnable.
Put differently, the natural languages are generally taken to be the largest
collection of child-learnable languages. We are thus led to consider learning
paradigms in which learning functions are required to respond successfully
to all languages in a given collection and to respond unsuccessfully to all
other languages. For this purpose the following definition is central.

DEFINITION 7.1A Let learning strategy 𝒮, evidential relation ℰ, and con-
vergence criterion 𝒞 be given.

i. φ ∈ ℱ is said to 𝒞-identify ℒ ⊆ RE on ℰ exactly just in case (a) φ
𝒞-identifies ℒ on ℰ and (b) for no L ∉ ℒ does φ 𝒞-identify L on ℰ.
ii. The class {ℒ ⊆ RE | some φ ∈ 𝒮 𝒞-identifies ℒ on ℰ exactly} is denoted:
[𝒮, ℰ, 𝒞]^ex.

The following difference between exact and inexact learning should be
kept in mind. Let evidential relation ℰ and convergence criterion 𝒞 be
given. If φ ∈ ℱ 𝒞-identifies ℒ ⊆ RE on ℰ, then φ 𝒞-identifies every proper
subset of ℒ on ℰ, and φ may identify proper supersets of ℒ on ℰ as well. In
contrast, if φ 𝒞-identifies ℒ on ℰ exactly, then φ identifies no collection
ℒ' ≠ ℒ on ℰ exactly.
It is also worthwhile to note the following. Suppose that φ ∈ ℱ INT-
identifies ℒ ⊆ RE on text exactly, and choose L ∈ RE such that L ∉ ℒ. Then
φ fails to converge to an index for L on at least one text for L. φ may,
however, identify other texts for L. The reader should generalize this
remark to the case: φ ∈ ℱ 𝒞-identifies ℒ ⊆ RE on ℰ exactly, for arbitrary
convergence criterion 𝒞 and evidential relation ℰ. Observe, finally, that if
children INT-identify the collection of all natural languages on text exactly,
then for every nonnatural language there is some text on which children fail
to converge to a correct index. But there may well be (benign) texts for
nonnatural languages that allow children to converge to an appropriate
index.
Exact Learning 153

Example 7.1A

a. Let f ∈ ℱ be as described in part a of example 1.3.4B. Then f INT-identifies RE_fin
on text exactly. This is because (i) proposition 1.4.3A shows that f INT-identifies
RE_fin on text, and (ii) f INT-identifies no proper superset of RE_fin on text since f's
conjectures are limited to indexes for finite sets.
b. Let h ∈ ℱ be as described in the proof of proposition 1.4.3B. Then h INT-identifies
RE_svt on text exactly.
c. From the proof of proposition 2.3A it is easy to see that RE_sd ∈ [ℱ^rec, text, INT]^ex.
d. Let ℒ = {K ∪ D | D finite}. Then ℒ ∈ [ℱ^rec, text, EXT]^ex. To see this, let g ∈ ℱ^rec
be as defined in the proof of proposition 6.3.1B. Then g EXT-identifies ℒ on text,
and g EXT-identifies no L ∉ ℒ on text since g's conjectures are limited to indexes for
finite extensions of K.
e. Let ℒ = {N − D | D finite}. Then ℒ ∈ [ℱ^rec, noisy text, FINT]^ex. To see this, let
h ∈ ℱ^rec be as defined in the proof of proposition 6.2.1A. Then h FINT-identifies ℒ
on any evidential relation, and h can FINT-identify no proper superset of ℒ, since
h's conjectures are limited to a single index for N. In contrast, h does not FINT-
identify {N} ∪ {N − {x} | x ∈ N} exactly.

Suppose that for some evidential relation ℰ and convergence criterion 𝒞
children 𝒞-identify the class of natural languages on ℰ. Then we may
require of a theory of human linguistic competence that the collection ℒ of
languages it embraces be a member of [ℱ, ℰ, 𝒞]. Should we also require
that ℒ ∈ [ℱ, ℰ, 𝒞]^ex? Although the first paragraph of the present section
suggests an affirmative response, the matter is clouded by the following
consideration.
Natural languages are not only learnable, they are also highly expressive
in the sense that very many thoughts can be communicated within any one
of them. Let us therefore stipulate that a language be counted as natural just
in case it is both learnable and highly expressive. Now consider the im-
poverished language consisting of the single expression "Go" with its usual
meaning. The Go-language is not highly expressive. On the other hand, the
Go-language may well be learnable by children through casual exposure. If
so, then not every learnable language is natural, and hence the natural
languages are a proper subset of the class of learnable languages. This
entails that a theory of natural language can be legitimately evaluated
against the standard of identifiability but not against the standard of exact
identifiability.
It may be possible to disarm the foregoing objection to exact learning as
follows. There is evidence that children exposed to inexpressive languages
(e.g., pidgins), as well as children denied access to any ambient language (e.g.,
deaf children in certain circumstances), invent linguistic devices of consider-
able complexity and communicative potential (see Sankoff and Brown
1976; Feldman, Goldin-Meadow, and Gleitman 1978). These findings sug-
gest that children may not be capable of learning profoundly inexpressive
languages. If this is true, then the natural languages coincide exactly with
the learnable languages, and exact identifiability is the appropriate stan-
dard for the evaluation of theories of comparative grammar.
Finally, suppose that certain inexpressive languages turn out to be
learnable after all. In this case it is possible that comparative grammar can
be investigated more successfully if such languages are admitted as natural,
perhaps as special cases of natural languages. Exact learning would then,
once again, be the appropriate standard of learnability.
In this chapter we consider the inclusion relations among [𝒮, ℰ, 𝒞] and
[𝒮, ℰ, 𝒞]^ex as 𝒮, ℰ, and 𝒞 vary over learning strategies, evidential relations,
and convergence criteria, respectively. Section 7.3 includes a survey of such
relations for a variety of strategies, evidential relations, and convergence
criteria, while section 7.4 discusses the complex relations among (and
interest of) generalized identification paradigms in the context of exact
learning. Section 7.5 introduces a strengthening of the notion of exact
identification and establishes its relation to exact identification. Most of
the results of the chapter are consequences of a characterization of those
collections that are exactly identifiable by recursive functions, which is
established in the next section.

Exercises

7.1A Let L_i = {(i, n) | n ∈ W_i}, and let ℒ = {L_i | W_i ∉ RE_fin}. Show that ℒ ∈
[ℱ^rec, text, INT]^ex. This collection figures in the proof of proposition 4.3.2A.

7.1B Prove: Let ℒ ⊆ RE, learning strategy 𝒮, evidential relation ℰ, and conver-
gence criterion 𝒞 be given, and suppose that ℒ ∈ [𝒮, ℰ, 𝒞]. Then there is ℒ' ⊆ RE
such that ℒ ⊆ ℒ' and ℒ' ∈ [𝒮, ℰ, 𝒞]^ex.
7.1C Let learning strategy 𝒮, evidential relation ℰ, and convergence criterion 𝒞
be given.
a. Show that ℒ ∈ [𝒮, ℰ, 𝒞]^ex if and only if there is a φ ∈ 𝒮 such that ℒ = ℒ_{ℰ,𝒞}(φ)
(cf. exercise 6.1.2A(ii) for ℒ_{ℰ,𝒞}(φ)).
b. Prove: [𝒮, ℰ, 𝒞]^ex ⊆ [𝒮, ℰ, 𝒞].
c. Prove: card([𝒮, ℰ, 𝒞]^ex) ≤ card(𝒮). Conclude that [ℱ^rec, ℰ, 𝒞]^ex is at most
countably infinite.

*7.2 A Characterization of [ℱ^rec, text, INT]^ex

Under what conditions can a collection of languages be exactly identified
by a recursive learning function? The following proposition answers this
question completely. In order to state and prove this proposition and the
next, we need to introduce some notions from the theory of analytic subsets
of N. The reader may consult Rogers (1967, chs. 14, 16) or Shoenfield (1967,
chs. 6, 7) for additional background.

DEFINITION 7.2A

i. A set X ⊆ N is Π¹₁ just in case there is a recursive set Y ⊆ N such that
for every n, n ∈ X if and only if for every t ∈ 𝒯, there is an m ∈ N such that
(t̄_m, n) ∈ Y.
ii. A set X ⊆ N is Σ¹₁ just in case N − X is Π¹₁.
iii. A set X ⊆ N is Π¹₁-complete just in case X is Π¹₁ and for every Y ⊆ N,
if Y is Π¹₁, then there is an f ∈ ℱ^total ∩ ℱ^rec such that for every n ∈ N, n ∈ Y
if and only if f(n) ∈ X. Such an f is said to reduce Y to X.
iv. ℒ ⊆ RE is Π¹₁ (Σ¹₁) indexable just in case there is a Π¹₁ (Σ¹₁) set X such
that ℒ = {W_i | i ∈ X}.
v. A set X ⊆ SEQ is a ⊆-chain just in case X is linearly ordered by ⊆.
vi. A set X ⊆ SEQ is a tree just in case for every σ, τ ∈ SEQ, if σ ∈ X and
τ ⊆ σ, then τ ∈ X, and for every t ∈ 𝒯, there is an n such that t̄_n ∉ X.
vii. RE_tree = {X ∈ RE | X is a tree}; re_tree = {i ∈ N | W_i ∈ RE_tree}.
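Clause v of the definition, with ⊆ read as the initial-segment relation on SEQ, is easy to test on a finite set of sequences. A minimal illustrative sketch (sequences modeled as Python tuples), not part of the formal development:

```python
def is_chain(seqs):
    # X is a ⊆-chain iff any two members are comparable under the
    # initial-segment ordering (definition 7.2A.v).
    def initial_segment(a, b):
        return len(a) <= len(b) and b[:len(a)] == a
    return all(initial_segment(a, b) or initial_segment(b, a)
               for a in seqs for b in seqs)
```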

The following facts will be appealed to in the proofs of the next two
propositions.

1. re_tree is Π¹₁-complete. In consequence RE_tree is Π¹₁ indexable.
2. If ℒ is Π¹₁ indexable, then {i | W_i ∈ ℒ} is Π¹₁.

PROPOSITION 7.2A ℒ ∈ [ℱ^rec, text, INT]^ex if and only if ℒ ∈ [ℱ^rec, text,
INT] and ℒ is Π¹₁ indexable.

Proof Suppose ℒ ∈ [ℱ^rec, text, INT] and ℒ is Π¹₁ indexable. Let Y =
{i | W_i ∈ ℒ}. By facts 1 and 2 there is a total recursive function f that
reduces Y to re_tree; that is, for every n, n ∈ Y if and only if f(n) ∈ re_tree.
Given any σ ∈ SEQ, let X_σ = {m < lh(σ) | σ_m ≠ #}, and let Z_{σ,n} =
X_σ ∩ W_{f(n),lh(σ)}.
Given ψ ∈ ℱ^rec which identifies ℒ on text, we construct φ ∈ ℱ^rec which
identifies ℒ on text exactly. By propositions 4.3.1B and 4.6.3A we may

assume without loss of generality that ψ is total and order independent.
We proceed to define φ(σ) for every σ ∈ SEQ. We first treat the case in
which rng(σ) = ∅. Let e be an index for ∅ and e₀ be an index for {0}. If
∅ ∈ ℒ, then φ(σ) = e, and if ∅ ∉ ℒ, then φ(σ) = e₀. For any σ ∈ SEQ, if
rng(σ) ≠ ∅, then

φ(σ) = e, if Z_{σ,ψ(σ)} is a ⊆-chain and card(Z_{σ,ψ(σ)}) > card(Z_{σ⁻,ψ(σ⁻)});
φ(σ) = ψ(σ), otherwise.

We claim that φ identifies ℒ on text exactly. First, φ identifies ℒ on text,
for let t ∈ 𝒯 be a text for some L ∈ ℒ. Then ψ converges on t to some a ∈ Y.
But then f(a) ∈ re_tree, and hence for large enough m, for every n > m, either
Z_{t̄_n,ψ(t̄_n)} is not a ⊆-chain or card(Z_{t̄_n,ψ(t̄_n)}) = card(Z_{t̄_m,ψ(t̄_m)}). Consequently, φ
converges on t to a.
On the other hand, suppose L ∉ ℒ and, for reductio, suppose that φ
identifies L on text. Then ψ must identify L on text. Since ψ is order
independent, there is an i such that for every text t for L, ψ converges on
t to i. Since W_i = L ∉ ℒ, W_f(i) is not a tree. Hence there is an infinite
set X ⊆ W_f(i) such that X is a ⊆-chain. Let t be a text for L such that
⋃{X_{t̄_n} | n ∈ N} = X. Then by the definition of φ, φ(t̄_n) = e for infinitely
many n ∈ N. Hence φ fails to identify t, and therefore φ fails to identify L on
text.
Turning to the converse, suppose that ℒ ∈ [ℱ^rec, text, INT]^ex. It follows
immediately that ℒ ∈ [ℱ^rec, text, INT]. Using the definition of [ℱ^rec, text,
INT]^ex, a straightforward, if tedious, Tarski-Kuratowski computation
verifies that ℒ is Π¹₁ indexable. The reader unfamiliar with such computa-
tions should consult Rogers (1967, chs. 14, 16) or Shoenfield (1967, chs. 6,
7). □

It should be noted that the same sort of computation as referred to in the
preceding proof may be used to verify that if ℒ ∈ [ℱ^rec, ℰ, 𝒞]^ex, then ℒ is
Π¹₁ indexable, for each of the evidential relations and convergence criteria
considered in section 7.3.
The following proposition shows that certain computationally complex
collections of languages are exactly identifiable on text by recursive learn-
ers. It will be deployed below to distinguish [ℱ^rec, recursive text, INT]^ex
from [ℱ^rec, text, INT]^ex, and to show that the notion of very exact learning
introduced in section 7.4 is strictly stronger than exact learning with respect
to the paradigm [ℱ^rec, text, INT].
Exact Learning 157

PROPOSITION 7.2B There is an ℒ ∈ [ℱ_rec, text, INT]^ex such that ℒ is not Σ¹₁ indexable.

Proof Let ℒ = RE_svt ∩ RE_rec. It is clear that ℒ ∈ [ℱ_rec, text, INT] (cf. the proof of proposition 2.3A). A simple Tarski-Kuratowski computation suffices to verify that ℒ is Π¹₁ indexable. It then follows from proposition 7.2A that ℒ ∈ [ℱ_rec, text, INT]^ex.
That ℒ is not Σ¹₁ indexable follows from the boundedness theorem for Σ¹₁ sets and the fact that every r.e. set is a finite variant of some member of RE_svt (cf. lemma 2.3B). The reader may provide more detail for this argument by consulting Shoenfield (1967, ch. 7). □

Exercises

7.2A Let 𝒱 ⊆ 𝒫(RE) be a class of collections of languages, and let ℒ ⊆ RE be given. ℒ is called saturated with respect to 𝒱 just in case ℒ ∈ 𝒱 and for every proper superset ℒ′ of ℒ, ℒ′ ∉ 𝒱. ℒ is called maximal with respect to 𝒱 just in case ℒ ∈ 𝒱 and for some L ∈ RE, ℒ ∪ {L} ∉ 𝒱 (cf. exercises 2.2E and 4.6.2C).
a. Prove: ℒ ⊆ RE is saturated with respect to [ℱ_rec, text, INT]^ex if and only if ℒ = RE_fin.
b. Prove: If ℒ ⊆ RE is maximal with respect to [ℱ_rec, text, INT]^ex, then ℒ is maximal with respect to [ℱ_rec, text, INT].
c. Show that the converse to b is false.
7.2B Prove that if ℒ ∈ [ℱ_rec, text, INT]^ex and ℒ′ ∈ [ℱ_rec, text, INT]^ex, then ℒ × ℒ′ ∈ [ℱ_rec, text, INT]^ex.

7.3 Earlier Paradigms Considered in the Context of Exact Learning

In this section we consider exact versions of several of the paradigms introduced in chapters 4, 5, and 6. Formulation of the propositions asserted in this section is independent of material introduced in section 7.2. However, proofs of these propositions rely on propositions 7.2A and 7.2B, along with defined notions figuring therein.

7.3.1 Strategies and Exact Learning

The following proposition should be compared with proposition 4.3.1B.

PROPOSITION 7.3.1A [ℱ_rec ∩ ℱ_total, text, INT]^ex = [ℱ_rec, text, INT]^ex.
158 Identification Generalized

Proof The proposition is an immediate consequence of proposition 7.2A. Note that the φ constructed in the proof of that proposition is total. □

The next proposition should be compared to proposition 4.3.4A.

PROPOSITION 7.3.1B [ℱ_rec ∩ ℱ_prudent, text, INT]^ex ⊂ [ℱ_rec, text, INT]^ex.

Proof The inclusion is obvious. Strictness follows from proposition 7.2B and the r.e. indexability of any ℒ ∈ [ℱ_rec ∩ ℱ_prudent, text, INT]^ex. □

The following proposition should be compared to proposition 4.5.1B.

PROPOSITION 7.3.1C [ℱ_rec ∩ ℱ_conservative, text, INT]^ex ⊂ [ℱ_rec, text, INT]^ex.

Proof The proposition follows immediately from the proofs of propositions 4.5.1B and 7.2A. □

The following proposition should be compared to proposition 4.6.3A.

PROPOSITION 7.3.1D [ℱ_rec ∩ ℱ_order-independent, text, INT]^ex = [ℱ_rec, text, INT]^ex.

Proof The proposition follows from proposition 4.6.3A and the fact that the φ constructed in the proof of proposition 7.2A is order independent. □

Exercises

*7.3.1A Prove: [ℱ_rec, text, INT]^ex ⊄ [ℱ_rec ∩ ℱ_set-driven, text, INT]. (Hint: See the proof of proposition 4.4.2A.)
*7.3.1B Show that if texts with #'s are excluded from 𝒯 then proposition 7.3.1A fails. (Hint: Construct a collection ℒ of singleton languages with ℒ ∈ [ℱ_rec, text, INT]^ex − [ℱ_rec ∩ ℱ_total, text, INT]^ex. Discover where the proof of proposition 7.2A fails for singleton languages in the absence of texts with #'s.)
7.3.1C Let ℒ ⊆ RE be given. Show that ℒ ∈ [ℱ_rec ∩ ℱ_prudent, text, INT]^ex if and only if ℒ is r.e. indexable and ℒ ∈ [ℱ_rec, text, INT].
7.3.1D
a. Prove: Let ℒ ∈ [ℱ_rec ∩ ℱ_set-driven ∩ ℱ_conservative, text, INT]^ex. Then ℒ is r.e. indexable.
b. Prove: There is ℒ ⊆ RE such that ℒ is not r.e. indexable and ℒ ∈ [ℱ_rec ∩ ℱ_conservative, text, INT]^ex.
c. Conclude from a and b that [ℱ_rec ∩ ℱ_set-driven ∩ ℱ_conservative, text, INT]^ex ⊂ [ℱ_rec ∩ ℱ_conservative, text, INT]^ex.

*7.3.2 Environments and Exact Learning

In this section we reconsider a few of the principal evidential relations studied in chapter 5.
Let φ ∈ ℱ_rec INT-identify ℒ ⊆ RE on noisy text exactly. Are we guaranteed the existence of ψ ∈ ℱ_rec that INT-identifies ℒ on text exactly? The "exactly" qualifier renders the matter nonobvious. Perhaps there are not enough texts to prevent ψ from identifying some L ∉ ℒ (noisy texts might be needed for this purpose). The next proposition settles the matter.

PROPOSITION 7.3.2A
i. [ℱ_rec, noisy text, INT]^ex ⊂ [ℱ_rec, text, INT]^ex.
ii. [ℱ_rec, incomplete text, INT]^ex ⊂ [ℱ_rec, text, INT]^ex.
iii. [ℱ_rec, imperfect text, INT]^ex ⊂ [ℱ_rec, text, INT]^ex.

Proof For each of the evidential relations ℰ mentioned in i, ii, and iii we have: if ℒ ∈ [ℱ_rec, ℰ, INT]^ex, then ℒ ∈ [ℱ_rec, text, INT] and ℒ is Π¹₁ indexable (see the remark following the proof of proposition 7.2A). Hence each of the inclusions follows by proposition 7.2A.
The strictness of the inclusions in i, ii, and iii may be inferred from proposition 7.2A and the Π¹₁ indexability of the collections of languages constructed in propositions 5.4.1A and 5.4.2A. □

We turn now from considering texts with imperfections to looking at recursive text in the setting of exact identification. The following result should be compared to proposition 5.5.2B(i).

PROPOSITION 7.3.2B [ℱ_rec, recursive text, INT]^ex ⊂ [ℱ_rec, text, INT]^ex.

To prove this proposition, we need to introduce the notion of an arithmetical subset of N. The interested reader may consult Rogers (1967, ch. 14) for further information about arithmetical sets.

DEFINITION 7.3.2A A set X ⊆ N is arithmetical just in case there is a recursive set Y such that for every n, n ∈ X if and only if Q₁n₁ ∈ N … Qₖnₖ ∈ N (n₁, …, nₖ, n) ∈ Y, where each Qᵢ is either the quantifier "for all" or the quantifier "there exists."

The following facts about arithmetical sets will be needed in the proof of proposition 7.3.2B:

i. If X is arithmetical, then X is Π¹₁.
ii. If X is not Σ¹₁, then X is not arithmetical.

Proof of proposition 7.3.2B We claim that if ℒ ∈ [ℱ_rec, recursive text, INT]^ex, then {i | Wᵢ ∈ ℒ} is arithmetical. This follows by a Tarski-Kuratowski computation from the definition of [ℱ_rec, recursive text, INT]^ex.
The inclusion now follows immediately from fact i and propositions 5.5.2B(i) and 7.2A, while its strictness follows from fact ii and proposition 7.2B. □

Exercises

7.3.2A Prove the following variants of proposition 7.3.2A.
a. [ℱ_rec, noisy text, EXT]^ex ⊂ [ℱ_rec, text, EXT]^ex.
b. [ℱ_rec, incomplete text, FINT]^ex ⊂ [ℱ_rec, text, FINT]^ex.

7.3.3 Convergence Criteria and Exact Learning

We turn next to a sample result bearing on alternative convergence criteria in the context of exact learning. It should be compared to the relevant portion of figure 6.7A.

PROPOSITION 7.3.3A
i. [ℱ_rec, text, INT]^ex ⊆ [ℱ_rec, text, FINT]^ex.
ii. [ℱ_rec, text, INT]^ex ⊆ [ℱ_rec, text, EXT]^ex.
iii. [ℱ_rec, text, FINT]^ex ⊄ [ℱ_rec, text, EXT]^ex.
iv. [ℱ_rec, text, EXT]^ex ⊄ [ℱ_rec, text, FINT]^ex.

Proof Left to the reader. □

*7.4 Very Exact Learning

It may be that natural languages have such special properties that no text for a nonnatural language leads children to a correct and stable grammar. The next definition allows us to formulate one version of this hypothesis.

DEFINITION 7.4A

i. φ ∈ ℱ is said to identify ℒ ⊆ RE very exactly just in case for all t ∈ 𝒯, φ identifies t if and only if t is for some L ∈ ℒ.
ii. For 𝒮 ⊆ ℱ, the class {ℒ ⊆ RE | some φ ∈ 𝒮 identifies ℒ very exactly} is denoted: [𝒮, text, INT]^vex.

It is evident that [𝒮, text, INT]^vex ⊆ [𝒮, text, INT]^ex.
The following lemmata, together with the results of section 7.2, show that very exactness is a more stringent requirement than exactness for recursive learners.

LEMMA 7.4A Let φ ∈ ℱ identify ℒ ⊆ RE very exactly. Then, for all L ∈ RE, L ∈ ℒ if and only if there is a locking sequence for φ and L.

Proof Sufficiency follows immediately from definition 7.4A. Necessity is a consequence of proposition 2.1A. □

DEFINITION 7.4B ℒ is arithmetically indexable if and only if {i | Wᵢ ∈ ℒ} is arithmetical. (See definition 7.3.2A for "arithmetical.")

LEMMA 7.4B If ℒ ∈ [ℱ_rec, text, INT]^vex, then ℒ is arithmetically indexable.

Proof Suppose ℒ ∈ [ℱ_rec, text, INT]^vex. Then, by the preceding lemma, there is φ ∈ ℱ_rec such that Wᵢ ∈ ℒ if and only if there is a locking sequence for φ and Wᵢ. But then {i | Wᵢ ∈ ℒ} is arithmetical. This may be verified by examination of the definition of "locking sequence" (definition 2.1A) and a simple Tarski-Kuratowski computation. □

PROPOSITION 7.4A [ℱ_rec, text, INT]^vex ⊂ [ℱ_rec, text, INT]^ex.

Proof The inclusion is an immediate consequence of proposition 7.2A and lemma 7.4B. Its strictness follows directly from proposition 7.2B, fact ii of section 7.3.2, and lemma 7.4B. □

In contrast to proposition 7.4A, we have, as a direct consequence of proposition 4.3.4A, that every collection identifiable by a recursive learner is contained in some collection that may be identified very exactly by some recursive learner (see exercise 7.4B).

PROPOSITION 7.4B If ℒ ∈ [ℱ_rec, text, INT], then there is ℒ′ ⊆ RE such that ℒ ⊆ ℒ′ and ℒ′ ∈ [ℱ_rec, text, INT]^vex.

Proposition 7.4B should be compared to exercise 7.1B.


For arbitrary learning strategy 𝒮 and convergence criterion 𝒞, the class [𝒮, text, 𝒞]^vex is defined in the obvious way. The natural interpretation of [𝒮, ℰ, 𝒞]^vex for arbitrary evidential relation ℰ is almost as straightforward. We leave the details to the reader. Issues parallel to those discussed in the present section arise with respect to any such class.

Exercises

7.4A
a. For n ∈ N, let Sₙ = {0, 1, …, n}. Prove: {N − Sₙ | n ∈ N} ∈ [ℱ_rec, text, INT]^vex.
b. Prove: {K ∪ D | D finite} ∈ [ℱ_rec, text, EXT]^vex.
7.4B Prove: [ℱ_rec ∩ ℱ_prudent, text, INT]^ex = [ℱ_rec ∩ ℱ_prudent, text, INT]^vex. Deduce proposition 7.4B as a corollary.

7.5 Exact Learning in Generalized Identification Paradigms

Having introduced the distinction between exact and nonexact learning, we may now characterize the class of generalized identification paradigms in the following way. Each such paradigm is the result of specifying (1) a learning strategy, (2) an evidential relation, (3) a convergence criterion, and (4) a choice among exact, very exact, and nonexact learning. In addition, attention may be restricted to various subsets of RE, for example, RE_svt, the total, single-valued r.e. sets. Each generalized identification paradigm determines an associated family of learnable collections of languages, denoted by the bracket notation developed in chapters 4 through 7. Exercise 7.5A provides an example of a learning paradigm that lies outside the family of generalized identification paradigms.
Evidently a vast number of generalized identification paradigms may be defined, giving rise to an equally vast number of questions about inclusion relations, and so forth. In the face of such a multitude of potential research topics, one must rely on empirical considerations (especially facts about normal language acquisition) to motivate study of particular paradigms.
In the case of linguistic development, we seek a strategy 𝒮 that includes the child's learning function, an evidential relation ℰ that represents typical (or "sufficient") linguistic environments, and a convergence criterion 𝒞 that represents the child's actual linguistic achievement, such that the class of natural languages falls into [𝒮, ℰ, 𝒞]^ex. The narrower this class, the more interesting are the hypotheses about 𝒮, ℰ, and 𝒞. The existence of empirically interesting 𝒮, ℰ, and 𝒞 of the nature just described cannot be assumed a priori. It is possible that normal language acquisition cannot be usefully described within the constraints offered by generalized identification paradigms. The investigation of alternative kinds of learning paradigms is thus motivated on both formal and empirical grounds.

Exercise

7.5A φ ∈ ℱ is said to partially identify t ∈ 𝒯 just in case (a) there is exactly one i ∈ N such that for infinitely many j ∈ N, φ(t̄ⱼ) = i, and (b) Wᵢ = rng(t). φ ∈ ℱ is said to partially identify ℒ ⊆ RE just in case φ partially identifies every text for every L ∈ ℒ. [ℱ, partially identify] is defined to be {ℒ ⊆ RE | some φ ∈ ℱ partially identifies ℒ}. Note that partial identification is not a generalized identification paradigm. Prove that RE ∈ [ℱ, partially identify].
III OTHER PARADIGMS OF LEARNING

This part discusses several models of learning that lie outside the framework of generalized identification paradigms. Chapter 8 presents a criterion of successful learning that cannot be construed as a convergence criterion in the sense of section 6.1; chapter 9 discusses an environmental issue that is not easily viewed through the lens of evidential relations (section 5.3); and chapter 10 reformulates identification in the language of topology.
Throughout this part "identification" is to be understood as INT-identification on text, the learning paradigm defined in section 1.4. Thus, to say that φ ∈ ℱ identifies ℒ ⊆ RE is to say, in the expanded vocabulary of part II, that φ INT-identifies ℒ on text. Similarly, the expression "convergence" is to be understood as in definition 1.4.1A(ii).
8 Efficient Learning

Useful learning must not take too much time. This vague admonition can be resolved into two demands: first, the learner must not examine too many inputs before settling for good on a correct hypothesis and, second, the learner must not spend too long examining each input. Recursive learning functions satisfying the second demand were discussed in section 4.2.2. Learning functions satisfying the first demand are introduced in section 8.1 and studied in section 8.2. The effect of imposing both demands on learning functions is taken up in section 8.3.

8.1 Text-Efficiency

Let φ ∈ ℱ converge on t ∈ 𝒯 to i ∈ N. Then, for some n ∈ N, φ(t̄ₘ) = i for all m ≥ n. The least such n is called the convergence point of φ on t. The following definition provides a notation for this concept.

DEFINITION 8.1A
i. ℱ × 𝒯 is the set of all pairs consisting of a learning function and a text.
ii. For all φ ∈ ℱ, t ∈ 𝒯, the partial function CONV: ℱ × 𝒯 → N is defined as follows.

Case 1. If φ is not defined on t, then CONV(φ, t)↑.

Case 2. If φ is defined on t, then CONV(φ, t) = the least n ∈ N such that for all m ≥ n, φ(t̄ₘ) = φ(t̄ₙ).

Note that in case 2, if no such n exists, then CONV(φ, t)↑. Of course, CONV(φ, t)↓ does not imply that φ identifies t.

Example 8.1A
a. Let g ∈ ℱ be defined as in the proof of proposition 1.4.3B. Let t be 3, 0, 4, 1, 5, 6, 7, 8, 9, …, n, n + 1, …. Then CONV(g, t) = 4.
b. Let f be as defined in example 1.3.4B, part a. Let t be 2, 3, 3, 3, 4, 2, 2, 2, 2, …, 2, …. Then CONV(f, t) = 5. Let s be any text for N. Then CONV(f, s)↑ even though f is defined on s.
c. Let φ ∈ ℱ converge on t ∈ 𝒯. Then t̄_CONV(φ,t) is the finite sequence starting from which φ begins to converge on t. Informally, t̄_CONV(φ,t) is the last sequence in t on which φ changes its mind.
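The convergence point can be made concrete with a small sketch. CONV proper is defined on infinite texts; the function below (names are hypothetical, not the book's) computes the analogous last mind-change point along a finite prefix, using a toy learner in the spirit of example 8.1A, part b.

```python
def apparent_conv(learner, prefix):
    """Last mind-change point of `learner` along a finite prefix.

    CONV is defined on infinite texts; on a finite prefix we can only
    report the least n such that the conjecture is constant from n
    onward *within the prefix* (a stand-in for the true CONV)."""
    guesses = [learner(tuple(prefix[:m])) for m in range(len(prefix) + 1)]
    n = len(guesses) - 1
    while n > 0 and guesses[n - 1] == guesses[n]:
        n -= 1
    return n

# Toy learner: conjecture (a code for) the largest element seen so far.
learner = lambda seq: max(seq) if seq else 0

print(apparent_conv(learner, [2, 3, 3, 3, 4, 2, 2, 2, 2]))  # 5
```

On the prefix 2, 3, 3, 3, 4, 2, 2, 2, 2 this learner last changes its mind on seeing the 4 in fifth position, matching the convergence point 5 of the book's example.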
168 Other Paradigms of Learning

The following notation will also be useful in our discussion of text efficiency (and elsewhere).

DEFINITION 8.1B Let L ⊆ N and ℒ ⊆ 𝒫(N) be given.

i. The set of texts for L is denoted: 𝒯_L.
ii. ⋃_{L∈ℒ} 𝒯_L is denoted: 𝒯_ℒ.

Thus 𝒯_ℒ = {t ∈ 𝒯 | for some L ∈ ℒ, t is for L}.

The notion of convergence point suggests the following criterion for the efficient use of text. Let φ ∈ ℱ and ℒ ⊆ RE be given. φ is said to identify ℒ fast just in case (1) φ identifies ℒ, and (2) for all ψ ∈ ℱ, if ψ identifies ℒ then CONV(φ, t) ≤ CONV(ψ, t) for all t ∈ 𝒯_ℒ. In other words, φ identifies ℒ fast just in case φ identifies ℒ, and no other learning function that also identifies ℒ converges on any text for any language in ℒ sooner than φ converges on that text. Despite its natural character, however, fast identification is a concept of limited interest, for there are simple, identifiable ℒ ⊆ RE such that no φ ∈ ℱ identifies ℒ fast (see exercise 8.1B). The next definition avoids this problem by weakening the requirements for efficient use of text.
DEFINITION 8.1C (Gold 1967) Let φ, ψ ∈ ℱ and ℒ ⊆ RE be given.

i. ψ is said to identify ℒ strictly faster than φ just in case
a. both φ and ψ identify ℒ,
b. CONV(ψ, t) ≤ CONV(φ, t) for all t ∈ 𝒯_ℒ,
c. CONV(ψ, s) < CONV(φ, s) for some s ∈ 𝒯_ℒ.
ii. φ is said to identify ℒ text efficiently just in case
a. φ identifies ℒ,
b. no θ ∈ ℱ identifies ℒ strictly faster than φ.

In this case ℒ is said to be identifiable text efficiently.

Text efficiency has a natural order-theoretic interpretation; see exercise 8.1F.
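Clauses (b) and (c) of definition 8.1C quantify over all texts for ℒ, so they cannot be checked mechanically in general. For finite languages, though, one can at least probe the definition over a sampled family of texts (each ordering, padded with repetition as a stand-in for the infinite tail). The sketch below, with hypothetical names throughout, compares two learners for the collection {{1,2}, {1,3}} of exercise 8.1B: phi waits for a second distinct element, psi guesses {1,2} immediately.

```python
from itertools import permutations

def conv_point(learner, text):
    """Last mind-change point of `learner` on a (finite sample of a) text."""
    guesses = [learner(tuple(text[:m])) for m in range(len(text) + 1)]
    n = len(guesses) - 1
    while n > 0 and guesses[n - 1] == guesses[n]:
        n -= 1
    return n

def texts_for(lang, pad=6):
    """Sample texts for a finite language: each ordering of its elements,
    padded with repetitions of the last element."""
    for perm in permutations(sorted(lang)):
        yield list(perm) + [perm[-1]] * pad

def strictly_faster(psi, phi, langs):
    """Check clauses (b) and (c) of definition 8.1C over the sampled texts."""
    some_strict, all_leq = False, True
    for lang in langs:
        for t in texts_for(lang):
            c_psi, c_phi = conv_point(psi, t), conv_point(phi, t)
            all_leq &= c_psi <= c_phi
            some_strict |= c_psi < c_phi
    return all_leq and some_strict

# Conjectures here are the languages themselves rather than indices.
phi = lambda s: frozenset(s) if len(set(s)) == 2 else None
psi = lambda s: frozenset(s) if len(set(s)) == 2 else frozenset({1, 2})
print(strictly_faster(psi, phi, [{1, 2}, {1, 3}]))  # True
```

That psi comes out strictly faster than phi shows (on the sampled texts) why phi is not text efficient; it does not establish that psi is text efficient, since only finitely many texts are examined.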

Example 8.1B

Let f be as defined in example 1.3.4B, part a. Then f identifies RE_fin text efficiently. To see this, suppose that θ ∈ ℱ identifies RE_fin and that CONV(θ, t) < CONV(f, t) for some t ∈ 𝒯_{RE_fin}. Then θ must begin to converge on t prior to seeing all of rng(t). Formally: rng(t̄_CONV(θ,t)) ⊂ W_{θ(t̄_CONV(θ,t))}. Now consider the text s = σ⌢σ⌢σ⌢σ⌢…, where σ = t̄_CONV(θ,t). It is easy to see that CONV(f, s) < CONV(θ, s). Thus θ does not identify RE_fin strictly faster than f.

DEFINITION 8.1D Let strategy 𝒮 ⊆ ℱ be given.

i. The class {ℒ ⊆ RE | some φ ∈ 𝒮 identifies ℒ text efficiently} is denoted: [𝒮]^t.e..
ii. The class {ℒ ⊆ RE_svt | some φ ∈ 𝒮 identifies ℒ text efficiently} is denoted: [𝒮]^t.e._svt.

Thus example 8.1B shows that RE_fin ∈ [ℱ_rec]^t.e.. We shall also make use in this chapter of the unadorned bracket notation from section 4.1. Thus [𝒮] denotes the class of all collections of languages that are identifiable (not necessarily text efficiently) by a learning function in 𝒮.

Exercises

8.1A Let φ ∈ ℱ_rec identify {K ∪ {x} | x ∉ K} (cf. exercise 4.2.1A, part a). Show that CONV(φ, t)↑ for some text t for K.

8.1B Let ℒ = {{1,2}, {1,3}}.
a. Show that no φ ∈ ℱ identifies ℒ fast.
b. Generalize part a to the following. Let L, L′ ∈ RE be such that L ≠ L′ and L ∩ L′ ≠ ∅. Then no φ ∈ ℱ identifies {L, L′} fast.

8.1C Let φ ∈ ℱ identify ℒ ⊆ RE. Prove: φ identifies ℒ text efficiently if and only if for all ψ ∈ ℱ that identify ℒ, if CONV(ψ, t) < CONV(φ, t) for some t ∈ 𝒯_ℒ, then CONV(φ, s) < CONV(ψ, s) for some s ∈ 𝒯_ℒ.

8.1D Prove: {N − {x} | x ∈ N} ∈ [ℱ_rec]^t.e..

8.1E φ ∈ ℱ is said to identify ℒ text efficiently with respect to ℱ_rec just in case (a) φ identifies ℒ, and (b) no θ ∈ ℱ_rec identifies ℒ strictly faster than φ.
Prove that φ ∈ ℱ_rec identifies ℒ text efficiently with respect to ℱ_rec if and only if φ identifies ℒ text efficiently. (Hint: Left to right: Let θ ∈ ℱ identify ℒ strictly faster than φ, and let t ∈ 𝒯_ℒ be such that CONV(θ, t) < CONV(φ, t). Then we may construct ψ ∈ ℱ that "memorizes" θ's behavior on t, and otherwise behaves like φ. It may then be proved that ψ ∈ ℱ_rec and that ψ identifies ℒ strictly faster than φ.)

*8.1F We rely on standard terminology concerning partial orders (e.g., see Malitz 1979, sec. 1.8). Let identifiable ℒ ⊆ RE be given, and let <_ℒ ⊆ ℱ × ℱ be such that for all f, g ∈ ℱ, f <_ℒ g just in case f identifies ℒ strictly faster than g.

a. Show that <_ℒ is a partial order on ℱ (be sure not to overlook functions failing to identify ℒ).
b. Show that f ∈ ℱ identifies ℒ text efficiently if and only if f is a minimal element with respect to <_ℒ.
c. Show that f ∈ ℱ identifies ℒ fast if and only if f is a least element with respect to <_ℒ.

8.2 Text-Efficient Identification

Under what conditions is text-efficient identification possible? The results of the present section address this question.

8.2.1 Text-Efficient Identification in the Context of ℱ

If a collection of languages is identifiable, then it is identifiable text efficiently. This is the content of the next proposition.

PROPOSITION 8.2.1A [ℱ] = [ℱ]^t.e..

The proof of proposition 8.2.1A will be facilitated by a definition and a lemma.
DEFINITION 8.2.1A Suppose that ℒ ⊆ RE and σ ∈ SEQ. Then
i. ℒ_σ = {L ∈ ℒ | rng(σ) ⊆ L},
ii. ℒ_σ^min = {L ∈ ℒ_σ | for every L′ ∈ ℒ_σ, L′ ⊄ L}.

LEMMA 8.2.1A Suppose that ℒ ∈ [ℱ] and that t is a text for some L ∈ ℒ. Then

i. L ∈ ℒ_{t̄ₙ} for all n.
ii. there is an m such that for all n ≥ m, L ∈ ℒ_{t̄ₙ}^min.
iii. for every L′ ≠ L such that L′ ∈ ℒ, there is an m such that for all n ≥ m, L′ ∉ ℒ_{t̄ₙ}^min.

Proof The proof of i is obvious. For ii, recall that by proposition 2.4A, for every L ∈ ℒ there is a finite set D_L ⊆ L such that if D_L ⊆ L′ and L′ ∈ ℒ, then L′ ⊄ L. Thus if m is such that rng(t̄ₘ) ⊇ D_L, then L ∈ ℒ_{t̄ₘ}^min, and thus L ∈ ℒ_{t̄ₙ}^min for all n ≥ m. For iii, suppose that L′ ∈ ℒ_{t̄ₙ}^min for arbitrarily large n. Then L′ ⊇ rng(t̄ₙ) for arbitrarily large n. Thus L′ ⊇ L. Since L ∈ ℒ_{t̄ₙ} for all n, L ⊆ L′, and L′ is minimal in ℒ_{t̄ₙ}, this means that L′ = L. □



Proof of proposition 8.2.1A Suppose that ℒ ∈ [ℱ]. For every σ ∈ SEQ, define σ̂ ⊇ σ as follows. If ℒ_σ = ∅, let σ̂ = σ. Otherwise, let σ̂ be the least sequence among the shortest sequences extending σ such that ℒ_σ̂^min ≠ ∅. Such a σ̂ exists by lemma 8.2.1A(ii), since if ℒ_σ ≠ ∅, σ begins a text for at least one L ∈ ℒ.
Now we define f ∈ ℱ which text-efficiently identifies ℒ as follows:

   f(σ) = 0, if ℒ_σ = ∅;
   f(σ) = f(σ⁻), if lh(σ) > 1 and W_{f(σ⁻)} ∈ ℒ_σ̂^min;
   f(σ) = the least i such that Wᵢ ∈ ℒ_σ̂^min, otherwise.

To see that f identifies ℒ, let t be a text for L ∈ ℒ. By lemma 8.2.1A(ii), for all sufficiently large n, (t̄ₙ)̂ = t̄ₙ. By lemma 8.2.1A(ii) and (iii) and the choice of the least i in the third clause in the definition of f, f(t̄ₙ) is the least index for L for all sufficiently large n.
To show that f identifies ℒ text efficiently, we use exercise 8.1C. Suppose then that ψ identifies ℒ and that CONV(ψ, t) < CONV(f, t) for some text t for some L ∈ ℒ. Let n = CONV(f, t) − 1. Then ψ(t̄ₙ) is an index for L, but f(t̄ₙ) is an index for some L′ ∈ ℒ, L′ ≠ L. If σ = t̄ₙ, let s be any text for L′ which begins with σ̂.

Claim On s, f converges to an index for L′ and CONV(f, s) ≤ n.

Proof of claim Suppose that σ ⊆ γ ⊆ σ̂. Then γ̂ = σ̂. Therefore the second clause in the definition of f guarantees that f(γ) = f(σ) for all such γ. Now for all m such that s̄ₘ ⊇ σ̂, we have that L′ ∈ ℒ_{s̄ₘ}^min, since s is a text for L′ and L′ is already in ℒ_σ̂^min. Thus again the second clause of the definition of f ensures that f(s̄ₘ) = f(σ) for all such s̄ₘ. Thus CONV(f, s) ≤ lh(σ) = n.

Now s is a text for L′ and CONV(f, s) ≤ n. But CONV(ψ, s) > n, since ψ(s̄ₙ) = ψ(t̄ₙ) is an index for L ≠ L′. By exercise 8.1C, f identifies ℒ text efficiently. □
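The learner f of the proof conjectures (the least index of) a ⊆-minimal language consistent with the data. For a finite collection given extensionally, that core idea can be sketched as follows; the σ̂ look-ahead of the full construction is omitted, conjectures are languages rather than indices, and all names are hypothetical.

```python
def minimal_consistent(collection, data):
    """Languages in `collection` that contain rng(data) and are
    subset-minimal among those (the set called L_sigma^min in
    definition 8.2.1A, for finite languages given extensionally)."""
    consistent = [L for L in collection if set(data) <= L]
    return [L for L in consistent
            if not any(M < L for M in consistent)]  # M < L: proper subset

def f(collection, sigma):
    """Toy analogue of the learner f of proposition 8.2.1A: conjecture
    the first subset-minimal language consistent with sigma."""
    mins = minimal_consistent(collection, sigma)
    return min(mins, key=sorted) if mins else None

L = [frozenset({0}), frozenset({0, 1}), frozenset({0, 1, 2})]
print([sorted(f(L, d)) for d in ([0], [0, 1], [0, 1, 2])])
# [[0], [0, 1], [0, 1, 2]]
```

On a text for any member of this chain the conjecture climbs through the minimal consistent languages and stabilizes once all of the target language has appeared.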

8.2.2 Text-Efficient Identification and Rational Strategies

The text-efficient learner f of proposition 8.2.1A exemplifies none of the "rational" strategies: consistency, prudence, conservatism, or decisiveness. However, modifications in the definition of f reveal the compatibility of various combinations of these strategies with text efficiency; see exercise 8.2.2C. The following proposition shows that conservatism, consistency, and prudence together guarantee text efficiency. For φ ∈ ℱ, let ℒ(φ) be as defined in exercise 1.4.3K.

PROPOSITION 8.2.2A Let φ ∈ ℱ_prudent ∩ ℱ_conservative ∩ ℱ_consistent. Then φ identifies ℒ(φ) text efficiently.

Proof Suppose that ψ identifies ℒ(φ) and that CONV(ψ, t) < CONV(φ, t) for some text t ∈ 𝒯_ℒ(φ). Let n = CONV(φ, t) − 1. Then if t is a text for L, ψ(t̄ₙ) is an index for L but φ(t̄ₙ) is not (since φ(t̄ₙ) ≠ φ(t̄ₙ₊₁), φ(t̄ₙ₊₁) is an index for L, and φ is conservative).
Let L′ = W_{φ(t̄ₙ)}, and let s be any text for L′ beginning with t̄ₙ. Such a text s exists since φ is consistent. Then since φ is prudent, φ identifies L′ = rng(s) and, since φ is conservative, φ(s̄ₘ) = φ(s̄ₙ) for all m ≥ n. Thus CONV(φ, s) ≤ n, but CONV(ψ, s) > n since ψ(s̄ₙ) = ψ(t̄ₙ) is an index for L. □

Proposition 8.2.2A highlights the rational appeal of consistency, prudence, and conservatism. Is this kind of rationality necessary for text efficiency? An affirmative answer to this question amounts to the converse of proposition 8.2.2A. Exercise 8.2.2B shows that this converse is false.

Exercises

8.2.2A Show by counterexample that no two of consistency, prudence, and conservatism imply text efficiency.

8.2.2B Let φ ∈ ℱ identify ℒ ⊆ RE text efficiently. Show by example that φ need not be consistent, prudent, or conservative.

8.2.2C Prove:
a. [ℱ] = [ℱ_prudent]^t.e..
b. [ℱ] = [ℱ_consistent]^t.e..
(Hint: Modify f in the proof of proposition 8.2.1A appropriately.)
c. If φ ∈ ℱ_prudent ∩ ℱ_consistent, then there is ℒ′ ⊇ ℒ(φ) such that ℒ′ ∈ [ℱ_prudent ∩ ℱ_consistent]^t.e.. Show that the converse is false.

8.2.2D For each i ∈ N, let Lᵢ = {0, i, i + 1, …}. Prove that {Lᵢ | i ∈ N} ∉ [ℱ_conservative]^t.e. ∪ [ℱ_decisive]^t.e. (= [ℱ_conservative ∪ ℱ_decisive]^t.e.).

8.2.3 Text-Efficient Identification in the Context of ℱ_rec

In contrast to proposition 8.2.1A, the next proposition shows that text efficiency is restrictive relative to the class of recursive learning functions.

PROPOSITION 8.2.3A [ℱ_rec]^t.e. ⊂ [ℱ_rec].

Proof The desired collection is ℒ = {{i + 1} | i ∈ K} ∪ {{0, i + 1} | i ∉ K}. Obviously ℒ ∈ [ℱ_rec]. However, suppose that φ text-efficiently identifies ℒ. Then for all i ∈ N,

   (*)  W_{φ(⟨i+1⟩)} = {i + 1}, if i ∈ K; W_{φ(⟨i+1⟩)} = {0, i + 1}, if i ∉ K.

For otherwise, let i₀ be such that (*) doesn't hold. We define a function ψ ∈ ℱ such that for all texts t ∈ 𝒯_ℒ, CONV(ψ, t) ≤ CONV(φ, t) and such that for some L ∈ ℒ and text t₀ for L, CONV(ψ, t₀) < CONV(φ, t₀). Define ψ as follows:

   ψ(σ) = φ(σ), if i₀ + 1 ∉ rng(σ);
   ψ(σ) = the least index for {i₀ + 1}, if i₀ + 1 ∈ rng(σ) and i₀ ∈ K;
   ψ(σ) = the least index for {0, i₀ + 1}, if i₀ + 1 ∈ rng(σ) and i₀ ∉ K.

On the other hand, no recursive function satisfies (*), since such a function would exhibit K̄ as recursively enumerable. □

A simple modification of the foregoing proof yields the next corollary.

COROLLARY 8.2.3A [ℱ_rec]^t.e._svt ⊂ {ℒ ⊆ RE_svt | ℒ ∈ [ℱ_rec]}.

Proof For the collection that witnesses this, we simply use the characteristic functions of the languages L in the proof of the proposition. The proof is then entirely parallel to that of the proposition. □

Let us reconsider order independence in the present context (see definition 4.6.3A).

PROPOSITION 8.2.3B [ℱ_rec ∩ ℱ_order-independent]^t.e. ⊂ [ℱ_rec]^t.e..

Proof Let ℒ = {K} ∪ {{i} | i ∉ K}. We claim that ℒ ∈ [ℱ_rec]^t.e. but that ℒ ∉ [ℱ_rec ∩ ℱ_order-independent]^t.e.. To see the former, first define a recursive function f by

   φ_{f(x)}(y) = 0, if y = x;
   φ_{f(x)}(y) = 0, if φₓ(x)↓ and φ_y(y)↓;
   φ_{f(x)}(y)↑, if φₓ(x)↑ or φ_y(y)↑.

If x ∉ K, W_{f(x)} = {x}; if x ∈ K, W_{f(x)} = K. Now define g ∈ ℱ_rec by g(σ) =

f(σ₀). g obviously identifies ℒ and is text efficient because it begins to converge immediately on every text for every L ∈ ℒ. Of course, g is not order independent.
Suppose, on the other hand, that φ ∈ ℱ_rec ∩ ℱ_order-independent identifies ℒ text efficiently. Fix n ∈ K. We claim that x ∈ K if and only if φ(⟨x⟩) = φ(⟨n⟩), showing that K is recursive. To see this, notice that φ must begin to converge immediately to the appropriate language on input ⟨x⟩ for any x, since otherwise φ is not text efficient; that is, g would then be strictly faster than φ on ℒ. And if x ∈ K, φ must converge to φ(⟨n⟩) since φ is order independent. □

Exercises

8.2.3A Prove that if ℒ, ℒ′ ∈ [ℱ_rec]^t.e., then ℒ × ℒ′ ∈ [ℱ_rec]^t.e..

8.2.3B ℒ ⊆ RE is called maximal with respect to [ℱ_rec]^t.e. just in case (a) ℒ ∈ [ℱ_rec]^t.e., and (b) there is L ∈ RE such that ℒ ∪ {L} ∈ [ℱ_rec] but ℒ ∪ {L} ∉ [ℱ_rec]^t.e.. Show that there exist collections of languages that are maximal with respect to [ℱ_rec]^t.e..

*8.2.3C Recall the convergence criterion EXT from definition 6.3A. Define the partial function CONV_ext: ℱ × 𝒯 → N as follows. For all φ ∈ ℱ, t ∈ 𝒯,
a. if φ is not defined on t, then CONV_ext(φ, t)↑.
b. if φ is defined on t, then CONV_ext(φ, t) = the least n ∈ N such that for all m ≥ n, W_{φ(t̄ₘ)} = W_{φ(t̄ₙ)}.

φ ∈ ℱ is said to EXT-identify ℒ ⊆ RE text efficiently just in case φ EXT-identifies ℒ on text and, for all ψ ∈ ℱ that EXT-identify ℒ on text: if CONV_ext(ψ, t) < CONV_ext(φ, t) for some t ∈ 𝒯_ℒ, then CONV_ext(φ, s) < CONV_ext(ψ, s) for some s ∈ 𝒯_ℒ. Prove that some φ EXT-identifies {K ∪ D | D finite} text efficiently.

8.2.3D Exhibit ℒ ⊆ RE such that ℒ ∈ [ℱ_rec]^t.e. but for some ℒ′ ⊆ ℒ, ℒ′ ∉ [ℱ_rec]^t.e..

8.2.4 Text-Efficiency and Induction by Enumeration

Recall the definition of ℱ_enumerator from section 4.5.3.

PROPOSITION 8.2.4A (Gold 1967)

i. [ℱ_enumerator] ⊂ [ℱ]^t.e..
ii. [ℱ_rec ∩ ℱ_enumerator] ⊂ [ℱ_rec]^t.e..

Proof Part i follows from propositions 4.5.3A and 8.2.1A, which together say that [ℱ_enumerator] ⊂ [ℱ] = [ℱ]^t.e..
As for ii, the proof of the inclusion is due to Gold. Suppose that ℒ ∈ [ℱ_rec ∩ ℱ_enumerator]. We claim that the enumerator φ that identifies ℒ is itself text efficient. For such a φ is consistent, prudent, and conservative (at least on texts for languages in ℒ). Thus φ is text efficient by proposition 8.2.2A. That the inclusion is proper is witnessed by the collection ℒ = {Lₙ | n ∈ N}, where Lₙ = {n, n + 1, …}. This collection is identifiable by a conservative, consistent, prudent, recursive function but not by an enumerator (as established in the proof of proposition 4.5.3A). □
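Gold's identification by enumeration, invoked in part ii, can be sketched when membership is decidable: fix an enumeration of the collection and always conjecture the first language consistent with the data seen so far. The resulting learner is consistent, prudent, and conservative on texts for languages in the collection, which is what the proof exploits. Names below are hypothetical, and languages are finite sets so that consistency checking terminates.

```python
def enumerator(langs):
    """Identification by enumeration: against the fixed enumeration
    `langs` (playing the role of L_0, L_1, ...), conjecture the index
    of the first language consistent with the data seen so far."""
    def learner(sigma):
        seen = set(sigma)
        for i, L in enumerate(langs):
            if seen <= L:       # L is consistent with the data
                return i
        return None             # no language in the enumeration fits
    return learner

langs = [{0}, {0, 1}, {0, 1, 2}]
phi = enumerator(langs)
print([phi(s) for s in ([], [0], [0, 1], [0, 1, 2, 1])])  # [0, 0, 1, 2]
```

The learner never abandons a conjecture that remains consistent (conservatism) and only conjectures members of the enumeration (prudence); its conjecture always contains the data (consistency).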

Propositions 4.5.3B and 8.2.4A(ii) yield the following.

PROPOSITION 8.2.4B If ℒ ⊆ RE_svt is r.e. indexable, then ℒ ∈ [ℱ_rec]^t.e._svt.

Open question 8.2.4A If ℒ ∈ [ℱ_rec]^t.e._svt, then is ℒ r.e. indexable?

Exercise

8.2.4A Refute the following strengthening of proposition 8.2.4B: If ℒ ⊆ RE is r.e. indexable, then ℒ ∈ [ℱ_rec]^t.e.. (Hint: See the proof of proposition 4.5.3A(i).)

*8.2.5 Text-Efficiency and Simple Identification

Recall the convergence criterion SIM(f) from definition 6.6A.

DEFINITION 8.2.5A Let total f ∈ ℱ_rec be given.

i. φ ∈ ℱ is said to SIM(f)-identify ℒ ⊆ RE text efficiently just in case φ SIM(f)-identifies ℒ on text and φ identifies ℒ text efficiently.
ii. The class {ℒ ⊆ RE | some φ ∈ ℱ_rec SIM(f)-identifies ℒ text efficiently} is denoted: [ℱ_rec, text, SIM(f)]^t.e..

The next proposition shows that the requirements of simplicity and text efficiency are more stringent taken together than taken separately.

PROPOSITION 8.2.5A For every total f ∈ ℱ_rec such that f(x) ≥ x, [ℱ_rec, text, SIM(f)]^t.e. ⊂ [ℱ_rec, text, SIM(f)] ∩ [ℱ_rec]^t.e..

Proof The inclusion is obvious. The collection that witnesses that the inclusion is proper is ℒ = {{i} | i ∈ N}. It is easy to see that ℒ ∈ [ℱ_rec]^t.e.. To

see that ℒ ∈ [ℱ_rec, text, SIM(f)], define φ ∈ ℱ_rec as follows: φ(σ) = the index i ≤ lh(σ) such that m(i) is minimal among all m(j), j ≤ lh(σ), for which W_{j,lh(σ)} = {σ₀}, if such an i exists, and 0 otherwise. It is evident that φ converges on any text for {n} to the index i for {n} such that m(i) is minimal, and so φ is f-simple for any f such that f(x) ≥ x.
Suppose then that ℒ ∈ [ℱ_rec, text, SIM(f)]^t.e., and let ψ be the function that witnesses this. It is obvious that ψ(⟨n⟩) is an index for {n} and CONV(ψ, t) = 1 for any text t, else ψ is not text efficient. For if a counterexample n′ exists, we could define ψ′ by

   ψ′(σ) = i′, where m(i′) = M({n′}), if σ₀ = n′;
   ψ′(σ) = ψ(σ), if σ₀ ≠ n′.

However, the set {ψ(⟨n⟩) | n ∈ N} is an infinite r.e. set, and so by lemma 4.3.6A there is an n such that m(ψ(⟨n⟩)) > f(M(W_{ψ(⟨n⟩)})). This contradicts the fact that ψ identifies ℒ according to the SIM(f) convergence criterion. □

Exercise

8.2.5A Let strategy 𝒮, evidential relation ℰ, and convergence criterion 𝒞 be given. Frame an appropriate definition of [𝒮, ℰ, 𝒞]^t.e., that is, of the class of collections of languages that can be text-efficiently identified within the learning paradigm defined by 𝒮, ℰ, and 𝒞.

8.3 Efficient Identification

In this section we consider text-efficient learners that, in addition, react
rapidly to new inputs. Recall definition 4.2.2B of ℱ^h-time, and let total
h ∈ ℱ^rec be given. Then ℒ ∈ [ℱ^h-time]^te if and only if some φ ∈ ℱ^h-time
identifies ℒ text-efficiently. Such an ℒ is "efficiently" identifiable relative to
the time bound h.

Efficient learning is a more stringent requirement than text-efficiency
alone.

PROPOSITION 8.3A ⋃_{h ∈ ℱ^rec} [ℱ^h-time]^te ⊂ [ℱ^rec]^te.

Efficient Learning 177

Proof For each i ∈ N, define L_i = {⟨i, x⟩ | φ_i(x) = 0}. Now define ℒ =
{L_i | φ_i total} ∪ {L_i ∪ {⟨i, j⟩} | φ_i total, φ_i(j) ≠ 0}. ℒ ∈ [ℱ^rec]^te: the obvious
procedure for identifying ℒ is text-efficient. (Conjecture L_i if each pair
⟨x, y⟩ ∈ rng(σ) is of the form x = i, φ_i(y) = 0. If exactly one pair ⟨x, y⟩ is of
the form x = i, φ_i(y) ≠ 0, conjecture L_i ∪ {⟨x, y⟩}. Notice that this function
is not total, since it waits for φ_i(y) to converge before deciding what to
conjecture.)

Now suppose that φ ∈ ℱ identifies ℒ text-efficiently.

Claim Let φ_i be total, and let σ be a locking sequence for L_i and φ. Then
⟨i, x⟩ ∈ L_i if and only if φ(σ ⌢ ⟨i, x⟩) = φ(σ).

Proof of claim Since σ is a locking sequence for φ and L_i, ⟨i, x⟩ ∈ L_i implies
that φ(σ ⌢ ⟨i, x⟩) = φ(σ). Suppose for the other direction that ⟨i, x⟩ ∉ L_i but
that φ(σ ⌢ ⟨i, x⟩) = φ(σ). Then it is easy to construct a counterexample ψ to
the text-efficiency of φ. Define

ψ(τ) = an index for L_i ∪ {⟨i, x⟩}, if τ ⊇ σ ⌢ ⟨i, x⟩,
ψ(τ) = φ(τ), otherwise.

Then CONV(ψ, t) ≤ CONV(φ, t) for all t, and CONV(ψ, t) < CONV(φ, t) for
any text t for L_i ∪ {⟨i, x⟩} beginning with σ ⌢ ⟨i, x⟩.

Now suppose that φ ∈ ℱ^h-time. Then φ is total recursive. Define

f(⟨σ, i⟩, k) = 0, if φ(σ ⌢ ⟨i, k⟩) = φ(σ),
f(⟨σ, i⟩, k) = 1, otherwise.

Then f is total recursive, so that by lemma 4.3.3A there is a recursive set
S such that for no ⟨σ, i⟩ is g(k) = f(⟨σ, i⟩, k) the characteristic function of
S. But this contradicts the claim, since S = L_j for some total φ_j. (Note that
this proof uses the hypothesis that φ ∈ ℱ^h-time only to guarantee that φ is
total recursive.) □

Proposition 8.3A should be compared to proposition 4.2.2A.


9 Sufficient Input for Learning

How much input from the environment is required for learning? In this
chapter we examine the problem.

9.1 Locking Sequences as Sufficient Input

Let φ ∈ ℱ, L ∈ RE, and σ ∈ SEQ be given. A reasonable construal of the idea
that σ is sufficient input for φ to learn L is this: σ is drawn from L, φ
conjectures an index i for L on σ, and no further input from L can cause φ to
abandon i. In turn, examination of definition 2.1A reveals that in the
foregoing circumstances σ is a locking sequence for φ and L. We shall
therefore identify sufficient inputs with locking sequences.
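The locking-sequence property can be made concrete computationally. Below is a minimal sketch (Python, not part of the original text): a toy learner that conjectures the finite set of numbers seen so far, and a bounded check of the defining property. Since the property quantifies over all finite τ drawn from L, a finite check can refute but never certify it; the learner and bound `max_len` are illustrative assumptions.

```python
from itertools import product

def learner(sigma):
    """Toy learner: conjecture (an index for) the finite set of numbers seen so far."""
    return frozenset(x for x in sigma if x != '#')

def is_locking_up_to(phi, sigma, L, max_len=3):
    """Refutation-only check that sigma is a locking sequence for phi and L:
    phi(sigma ^ tau) == phi(sigma) for every tau drawn from L with lh(tau) <= max_len.
    A True answer is evidence, not proof, since tau ranges over all finite sequences."""
    base = phi(sigma)
    for n in range(1, max_len + 1):
        for tau in product(sorted(L), repeat=n):
            if phi(tuple(sigma) + tau) != base:
                return False
    return True

L = {1, 2, 3}
assert is_locking_up_to(learner, (1, 2, 3), L)       # all of L exhibited: locking
assert not is_locking_up_to(learner, (1, 2), L)      # input 3 still changes the conjecture
```

For this particular learner a sequence is locking exactly when it exhibits every member of L, which matches the intuition that σ is "sufficient input."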

DEFINITION 9.1A Let φ ∈ ℱ be given. The set {σ ∈ SEQ | for some L ∈ RE, σ
is a locking sequence for φ and L} is denoted: LS_φ.

Thus σ ∈ LS_φ just in case σ is a locking sequence for φ and W_φ(σ), hence just
in case σ is sufficient input for φ to learn W_φ(σ) in the sense just discussed.
By proposition 2.1A, if φ ∈ ℱ identifies L ∈ RE, then LS_φ contains some σ
such that L = W_φ(σ). In the present context proposition 2.1A is equivalent to
the claim that if φ ∈ ℱ learns L ∈ RE, then there is some sufficient input for φ
to learn L. On the other hand, exercise 2.1B shows that there can be a
locking sequence for φ ∈ ℱ and L ∈ RE even if φ does not identify L. In the
present context this fact may be reformulated as follows: the existence of
some input sufficient for φ ∈ ℱ to learn L ∈ RE does not guarantee that φ
will also learn L in the absence of this input.

Example 9.1A

Let f ∈ ℱ be as described in example 1.3.4B, part a. Then example 2.1A, part a,
shows that LS_f = SEQ.

Exercises

9.1A Let g ∈ ℱ be as described in the proof of proposition 1.4.3B. Show that
LS_g = SEQ.
9.1B Prove: Let φ ∈ ℱ^consistent ∩ ℱ^conservative be given. Then LS_φ = SEQ.
9.1C φ ∈ ℱ is called avid just in case LS_φ = SEQ. Prove: [ℱ^avid] = [ℱ], where for
𝒮 ⊆ ℱ, [𝒮] is to be interpreted as in definition 4.1B. (Hint: Use the construction in
the proof of proposition 4.5.1A, and rely on exercise 9.1B.)
Sufficient Input for Learning 179

9.2 Recursive Enumerability of LS_φ

Let φ ∈ ℱ be given. A successful "psychological" theory of φ should charac-
terize the environmental inputs sufficient for φ to learn; that is, such a
theory should characterize LS_φ. One way for such a characterization to be
perspicuous would be to provide a means of effectively enumerating LS_φ.
Are perspicuous psychological theories in this sense always possible?
Recall from section 1.3.4 that each σ ∈ SEQ is assumed to be associated
with a unique natural number via some fixed, computable isomorphism
between SEQ and N. Accordingly, we say that a subset Σ of SEQ is r.e. just
in case the set of code numbers associated with Σ is r.e.

DEFINITION 9.2A φ ∈ ℱ is called predictable just in case LS_φ is r.e.

It is easy to verify that ℱ^predictable ⊂ ℱ; that is, there are learning func-
tions whose associated set of locking sequences is not r.e. (see exercise 9.2A).
For such learning functions, no perspicuous theory of sufficient input is
possible in the sense discussed earlier. On the other hand, ℱ^predictable is
sufficient for all inferential purposes, as revealed by the following
proposition.

PROPOSITION 9.2A [ℱ^predictable] = [ℱ].

Proof Since SEQ is r.e., the proposition follows easily from exercise
9.1C. □

In contrast to proposition 9.2A, the following result shows that there are
collections ℒ of languages such that (1) some recursive learning function
identifies ℒ, but (2) no recursive learning function whose sufficient inputs
are r.e. identifies ℒ.

PROPOSITION 9.2B [ℱ^rec ∩ ℱ^predictable] ⊂ [ℱ^rec].

Proof For each i ∈ N and each set X ⊆ N, define L_{i,X} = {⟨0, i⟩} ∪
{⟨1, x⟩ | x ∈ X}. Define ℒ = {L_{i,N} | i ∈ K̄} ∪ {L_{i,D} | i ∈ K and D finite}. It is easy
to see that ℒ ∈ [ℱ^rec]. Suppose that ℒ ∈ [ℱ^rec ∩ ℱ^predictable] and that φ
witnesses this. Then LS_φ is r.e. since φ is predictable. But now we claim
that i ∈ K̄ if and only if there is a σ ∈ LS_φ such that {⟨0, i⟩} ⊆ rng(σ) ⊆ L_{i,N}
and W_φ(σ) ⊇ rng(σ). If i ∈ K̄, such a σ must exist, since any locking sequence
must have this property. However, if i ∈ K, no such σ can exist, since
otherwise rng(σ) = L_{i,D} for some finite set D. Yet φ fails to identify this
language on the text for L_{i,D} which begins with σ, since φ is locked into an
incorrect conjecture by σ. The claim shows that were such a φ to exist, K̄
would be recursively enumerable. □
COROLLARY 9.2A ℱ^rec ∩ ℱ^predictable ⊂ ℱ^rec.

The corollary shows that there are recursive learning functions for which no
perspicuous theory of sufficient input is possible in the sense discussed
before.

Exercises

9.2A Prove that ℱ^predictable ⊂ ℱ.

9.2C Recall the definition of ℱ^avid from exercise 9.1C. Prove: [ℱ^rec ∩ ℱ^avid] ⊂
[ℱ^rec].
9.2D Let φ ∈ ℱ be given, and let SI_φ = {σ ∈ SEQ | for all τ ∈ SEQ, if rng(τ) ⊆ W_φ(σ),
then φ(σ ⌢ τ) = φ(σ)}. SI_φ is another conception of sufficient input; it does not
require a sufficient input to be drawn from the learned language. Clearly LS_φ ⊆ SI_φ.
a. Specify φ ∈ ℱ such that SI_φ ≠ LS_φ. (Thus the SI_φ version of sufficient input is
strictly more liberal than the LS_φ version.)
b. Call φ ∈ ℱ predictable′ just in case SI_φ is r.e. Prove the following variant of
proposition 9.2B: [ℱ^rec ∩ ℱ^predictable′] ⊂ [ℱ^rec].

*9.3 Predictability in Other Learning Paradigms

The results of section 9.2 are pertinent to INT-identification on text. We
may generalize our concern about sufficient input by considering LS_φ and
its analogs in the context of other learning paradigms. We provide a sample
result.
Recall definitions 6.1.2B and 6.3A.

DEFINITION 9.3A
i. σ ∈ SEQ is called an EXT-locking sequence for φ ∈ ℱ and L ∈ RE just
in case rng(σ) ⊆ L, W_φ(σ) = L, and, for all τ ∈ SEQ, if rng(τ) ⊆ L, then
W_φ(σ ⌢ τ) = L.
ii. Let φ ∈ ℱ be given. The set {σ ∈ SEQ | for some L ∈ RE, σ is an EXT-
locking sequence for φ and L} is denoted: LS_φ^EXT.
iii. φ ∈ ℱ is called EXT-predictable just in case LS_φ^EXT is r.e.

Example 9.3A

Let g ∈ ℱ^rec be defined as in the proof of proposition 6.3.1B. Then LS_g^EXT = SEQ.

PROPOSITION 9.3A [ℱ^rec ∩ ℱ^EXT-predictable, text, EXT] ⊂ [ℱ^rec, text, EXT].

Proof The collection of languages ℒ used in the proof of proposition 9.2B
works here by exactly the same proof. □

We leave it to the reader to frame appropriate definitions extending the
foregoing concepts to arbitrary, generalized identification paradigms (in
the sense of section 7.6).

Exercise

9.3A Prove: [ℱ^rec ∩ ℱ^EXT-predictable, imperfect text, EXT] ⊂ [ℱ^rec, imperfect text,
EXT].
*10 Topological Perspective on Learning

In this chapter we analyze learning from a topological point of view.
Acquaintance with elementary concepts of topology is presupposed.

10.1 Identification and the Baire Space

A natural topology may be imposed on the class 𝒯 of all texts in the
following way.

DEFINITION 10.1A (Levy 1979, sec. VII.2)

i. Let σ ∈ SEQ be given. The set {t ∈ 𝒯 | σ = t̄_lh(σ)} is denoted: B_σ.
ii. The topology on 𝒯 generated by taking {B_σ | σ ∈ SEQ} to be basic open
sets is called the Baire topology, abbreviated: 𝒯.
iii. For L ∈ RE, 𝒯_𝒫(L) denotes the class of texts for subsets of L, endowed
with the subspace topology inherited from 𝒯.
iv. The basic open set of 𝒯_𝒫(L) determined by σ ∈ SEQ is denoted: B_σ^L.

Thus B_σ consists of all texts that begin with σ, and B_σ^L consists of all texts for
subsets of L that begin with σ.
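Membership in a basic open set depends on a finite prefix only, which is what makes these sets a natural base for reasoning about learners. A small illustration (Python, not part of the original text; the cyclic text generator is our own concrete choice of a text for a finite language):

```python
from itertools import islice

def begins_with(t, sigma):
    """t is in the basic open set B_sigma iff the first lh(sigma) entries of t
    equal sigma; only a finite prefix of the (infinite) text t is consulted."""
    return tuple(islice(t, len(sigma))) == tuple(sigma)

def text_for(L):
    """A concrete text for a finite language L: cycle through its elements forever."""
    while True:
        yield from sorted(L)

assert begins_with(text_for({1, 2}), (1, 2, 1))      # 1,2,1,2,... extends (1,2,1)
assert not begins_with(text_for({1, 2}), (2,))       # that text starts with 1
```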
The following lemma is left to the reader.

LEMMA 10.1A For all L ∈ RE, 𝒯_𝒫(L) is a Hausdorff space.

COROLLARY 10.1A Let L ∈ RE and t ∈ 𝒯_𝒫(L) be given. Then {t} is closed in
𝒯_𝒫(L).

With each learning function φ we now associate a function F_φ from 𝒯 to
𝒯. Intuitively F_φ(t) is the infinite sequence of conjectures produced by φ in
response to t (if φ is defined on t).

DEFINITION 10.1B Let φ ∈ ℱ be given. The function F_φ: 𝒯 → 𝒯 is defined
as follows. For all t ∈ 𝒯,

a. if φ is not defined on t, then F_φ(t)↑;
b. otherwise, F_φ(t) is the unique s ∈ 𝒯 such that for all n ∈ N, s_n = φ(t̄_n).

To characterize identification in terms of the functions F_φ, the following
concept is needed.

DEFINITION 10.1C Let t ∈ 𝒯 and i ∈ N be given.

i. t is said to be stabilized on i just in case there is n ∈ N such that for all m ≥ n,
t_m = i.
Topological Perspective on Learning 183

ii. t is said to be stabilized just in case t is stabilized on some i ∈ N.

The following facts about stabilization are evident.

LEMMA 10.1B The set {t ∈ 𝒯 | t is stabilized} is countable.

LEMMA 10.1C Let φ ∈ ℱ and t ∈ 𝒯_L be given. Then φ identifies t if and
only if there is an index i for L such that F_φ(t) is stabilized on i.
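The conjecture stream F_φ(t) can be inspected on finite prefixes. The sketch below (Python, not part of the original text; the toy learner is an assumption) computes the first lh(prefix) values of F_φ(t) and reports where the observed stream settles. Genuine stabilization concerns the infinite tail, so a finite window can only be suggestive.

```python
def phi(sigma):
    """Toy total learner: conjecture the set of numbers seen so far."""
    return frozenset(x for x in sigma if x != '#')

def conjectures(phi, prefix):
    """The first lh(prefix) values of F_phi(t), computed from a finite prefix of t."""
    return [phi(prefix[:n]) for n in range(1, len(prefix) + 1)]

def settle_point(stream):
    """Least position from which the observed stream is constant (within the window).
    Only evidence of stabilization, which is a property of the infinite tail."""
    n = len(stream) - 1
    while n > 0 and stream[n - 1] == stream[-1]:
        n -= 1
    return n

stream = conjectures(phi, (1, 2, 1, 2, 1, 2))
assert stream[-1] == frozenset({1, 2})
assert settle_point(stream) == 1   # constant from the second conjecture on
```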

Exercises

10.1A Define the function d: 𝒯 × 𝒯 → ℝ (ℝ is the set of real numbers) as follows.
For all s, t ∈ 𝒯, d(s, t) = 2^-n, where n is least such that s_n ≠ t_n, and d(s, t) = 0 if
s = t. Show that 𝒯 is a complete metric space with respect to d.
10.1B
a. Show that 𝒯 has a countable basis.
b. Show that 𝒯 is a regular space.
c. Use the Urysohn metrization theorem to provide an alternative proof that 𝒯 is
metrizable.
10.1C Show that 𝒯 is the product topology on N^ω, where N is endowed with the
discrete topology.

10.2 Continuity of Learning Functions

As noted in section 1.3.4, learning functions may be construed as mappings
from finite evidential states to theories about infinite environments. Operat-
ing exclusively on finite inputs, learning functions thus have a "local"
character exploited in various theorems concerning nonlearnability. The
local nature of learning functions φ can be better appreciated by consider-
ing the associated functions F_φ. The next proposition exhibits these latter
functions as continuous in 𝒯. For ease of exposition, we restrict our
attention to total learning functions in this and the next two sections.

PROPOSITION 10.2A For all φ ∈ ℱ^total, F_φ is continuous.

Proof Suppose that φ ∈ ℱ^total. We need to show that, given a basic open
set B_τ, F_φ^-1(B_τ) = {t | F_φ(t) ∈ B_τ} is an open set. But F_φ^-1(B_τ) =
{t | for all n < lh(τ), φ(t̄_n) = τ_n}. Thus F_φ^-1(B_τ) = ⋃ {B_γ | γ ∈ SEQ and
φ(γ̄_n) = τ_n for all n < lh(τ)}. Thus F_φ^-1(B_τ) is a union of open sets and so
is open. □

Exercise

10.2A Exhibit a continuous function f on 𝒯 such that for every φ ∈ ℱ, f ≠ F_φ.
(Hint: Let f be such that for all t ∈ 𝒯, f(t) is the result of removing t_0 from t.)

10.3 Another Proof of Proposition 2.1A

Let L ∈ RE and σ ∈ SEQ be such that rng(σ) ⊆ L. Note that for any τ ∈ SEQ
such that rng(τ) ⊆ L, B_{σ ⌢ τ}^L ⊆ B_σ^L. With this in mind it can be seen that
proposition 2.1A amounts to the following result.

PROPOSITION 10.3A Let φ ∈ ℱ^total identify L ∈ RE. Then there is some open
set B_σ^L of 𝒯_𝒫(L), some i ∈ N, and some t ∈ 𝒯 such that (i) t is stabilized on i,
(ii) W_i = L, and (iii) F_φ[B_σ^L] = {t}.

The proof of the proposition hinges on the following lemma.

LEMMA 10.3A 𝒯_L is comeager in 𝒯_𝒫(L).

Proof For each n ∈ L, 𝒯_𝒫(L−{n}) is nowhere dense in 𝒯_𝒫(L). This follows from
the fact that for each σ with rng(σ) ⊆ L, B_σ^L ⊇ B_{σ ⌢ n}^L, which is disjoint from
𝒯_𝒫(L−{n}). Hence 𝒯_𝒫(L) − 𝒯_L = ⋃_{n ∈ L} 𝒯_𝒫(L−{n}), which is a countable union
of nowhere dense sets. □

Proof of proposition 10.3A Since φ identifies L, for every t ∈ 𝒯_L, F_φ(t) is
stabilized on some i which is an index for L (lemma 10.1C). Since {t ∈ 𝒯 | t is
stabilized} is countable (lemma 10.1B), the range of F_φ on 𝒯_L is countable.
Thus 𝒯_L ⊆ ⋃ {F_φ^-1({t}) | t is stabilized}. Therefore 𝒯_L is contained in a
countable union of closed sets ({t} is closed in 𝒯, so F_φ^-1({t}) is closed by the
continuity of F_φ). However, 𝒯_L is comeager in a complete metric space by the
lemma, so at least one of these closed sets F_φ^-1({t}) must contain a basic open
set B_σ^L by the Baire category theorem. This t and σ satisfy (i) and (iii); (ii)
follows since φ identifies L. □

Indeed, the original proof due to Blum and Blum (1975) of proposition
2.1A can be viewed as a special case of standard proofs of the Baire
category theorem (e.g., Levy 1979, theorem VI.3.6).

Note that the proof of proposition 10.3A does not show that σ is a locking
sequence for φ and L, since it is possible that t_lh(σ) ≠ t_{lh(σ)+1}. However, the
proof does show that for some n and for every τ with rng(τ) ⊆ L and
lh(τ) ≥ n, σ ⌢ τ is a locking sequence for φ and L.
In the present context we may also provide an alternative proof of
corollary 2.1A, which amounts to the following proposition.

PROPOSITION 10.3B Let φ ∈ ℱ^total identify L ∈ RE, and let σ ∈ SEQ be such
that rng(σ) ⊆ L. Then there is some open set B_{σ ⌢ τ}^L of 𝒯_𝒫(L), some i ∈ N,
and some t ∈ 𝒯 such that (i) t is stabilized on i, (ii) W_i = L, and (iii)
F_φ[B_{σ ⌢ τ}^L] = {t}.

Proof This follows from the Baire category theorem just as in proposition
10.3A, with B_σ^L substituted for 𝒯_𝒫(L). □

Finally, we observe that the developments of the present section can be
adapted for the proof of many variants of proposition 2.1A stated in
preceding exercises (e.g., exercise 6.2.2B).

10.4 Locking Texts

DEF"INITION lO.4A Let q> e!!F identify Le RE. Let t be a text for L. t is called
a locking text for 'I' and L just in case there exists n E N such that t, is a
locking sequence for 'I' and L.

Locking texts were first discussed in exercise 2.1C. The following propo-
sition highlights the role of locking texts in determining the behavior of
learning functions.
PROPOSITION 10AA Let 'I' E SO'"'" identify LE RE, and let tfr E SO'"'" be such
that for all locking texts t for 'I' and L, F.(t) = F.(t). Then for all t E ff"'(L)'
F.(t) = F.(t).
In particular, '" identifies L in the preceding situation.

Proof By proposition 10.3B, the locking texts t for 'I' and L are dense in
fTL . Thus FIp and Flp are continuous functions that agree on a dense subset
of a complete metric space. Therefore they agree on all of ff"'(L)' 0
'.

Exercise
10.4A Prove: Let IjI, tpE§"IOlal be such that for all recursive texts t, F...(t) = F.,(t).
Then for all texts t, F",(t) = F.,(t).

10.5 Measure One Learning

10.5.1 Measures on Classes of Texts

In some environments each potential element of a language is associated
with a fixed probability of occurrence, invariant through time. Such envi-
ronments may be thought of as infinite sequences of stochastically indepen-
dent events, the probability of a given element e appearing in the (n+1)st
position being independent of the contents of positions 0 through n. It
should be noted that children's linguistic environments do not typically
exhibit stochastic independence in the foregoing sense. Thus the probability
that sentence p occurs at time n+1 in a natural environment can be driven
lower and lower by issuing threats at time n to those who might utter p
immediately thereafter. On the other hand, to the extent that independence
holds, the absence of a sentence s from a given child's environment may be
construed as indirect evidence that s does not belong to the ambient
language (see Pinker 1984 for discussion). In this sense stochastic environ-
ments are potentially rich in information, a richness to be exploited in
results that follow. Note that the assumption of stochastic independence is
quite plausible in certain scientific contexts.

To study such environments, we rely on concepts and terminology drawn
from measure theory (see, e.g., Levy 1979, pp. 239ff.). To begin, each L ∈ RE
is associated with a probability measure, m_L, on N ∪ {#} such that for all
x ∈ N ∪ {#}, m_L({x}) > 0 if and only if x ∈ L ∪ {#}. Next, each m_L is used to
determine a unique, complete probability measure M_L on 𝒯 by stipulating
that for all σ ∈ SEQ, M_L(B_σ) = ∏_{j < lh(σ)} m_L({σ_j}). Intuitively, for measurable
S ⊆ 𝒯_L, M_L(S) is the probability that an arbitrarily selected text for L is
drawn from S.

We now assume the existence of a fixed collection ℳ = {M_L | L ∈ RE} of
measures corresponding to the topologies {𝒯_L | L ∈ RE}; until section
10.5.3, talk of measurable sets and so forth should be interpreted in the
context of ℳ. The following lemmata are easy to establish.

LEMMA 10.5.1A Let L, L′ ∈ RE be such that L ≠ L′. Then M_L(𝒯_L′) = 0.

LEMMA 10.5.1B Let φ ∈ ℱ and L ∈ RE be given. Then M_L({t ∈ 𝒯_L | φ iden-
tifies t}) is defined.
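The product formula M_L(B_σ) = ∏_{j < lh(σ)} m_L({σ_j}) is easy to compute once a concrete m_L is fixed. The sketch below (Python, not part of the original text) uses an m_L of our own choosing for a finite language: '#' gets mass 1/2 and the members of L split the remainder evenly, which satisfies the positivity requirement m_L({x}) > 0 iff x ∈ L ∪ {#}.

```python
from fractions import Fraction

def m_L(L):
    """A concrete m_L for finite L: '#' gets 1/2; members of L split the rest evenly.
    m_L({x}) > 0 exactly for x in L ∪ {#}, as the text requires."""
    share = Fraction(1, 2 * len(L))
    probs = {x: share for x in L}
    probs['#'] = Fraction(1, 2)
    return probs

def M_L_of_cylinder(probs, sigma):
    """M_L(B_sigma) = product over j < lh(sigma) of m_L({sigma_j}):
    positions of the text are stochastically independent."""
    p = Fraction(1)
    for x in sigma:
        p *= probs.get(x, Fraction(0))
    return p

probs = m_L({1, 2})
assert sum(probs.values()) == 1
assert M_L_of_cylinder(probs, (1, '#', 2)) == Fraction(1, 32)   # 1/4 * 1/2 * 1/4
assert M_L_of_cylinder(probs, (3,)) == 0                        # 3 outside L ∪ {#}
```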

Exercise

10.5.1A Recall the definition of fat text (definition 5.5.4A). Prove: Let L ∈ RE be
given. Then M_L({t ∈ 𝒯_L | t is fat}) = 1.

10.5.2 Measure One Identifiability

In the stochastic context just discussed, the concept of identification seems
needlessly restrictive. Rather than requiring identification of every text for a
given language L, it seems enough to require identification of any subset of
𝒯_L of sufficient probability. We are thus led to the following definition.

DEFINITION 10.5.2A (Wexler and Culicover 1980, ch. 3) Let φ ∈ ℱ,
L ∈ RE, and ℒ ⊆ RE be given.

i. φ is said to measure one identify L just in case M_L({t ∈ 𝒯_L | φ identifies
t}) = 1.
ii. φ is said to measure one identify ℒ just in case φ measure one identifies
every L ∈ ℒ.
iii. ℒ is said to be measure one identifiable just in case some φ ∈ ℱ measure
one identifies ℒ.

Intuitively, φ ∈ ℱ measure one identifies L ∈ RE just in case the probability
that φ identifies an arbitrary text for L is unity.
Measure one identification of a language differs from ordinary identifi-
cation only by a set of measure zero. The next proposition reveals the
significance of this small difference; it generalizes results due to Horning
(1969).

PROPOSITION 10.5.2A RE is measure one identifiable.

Proof We define f ∈ ℱ such that for all L ∈ RE, f measure one identifies L.
Let h be a function such that W_h(0), W_h(1), ..., is a listing of all the r.e. sets, and
let M_0, M_1, ..., be an enumeration of their associated measures. If n ∈ N,
σ ∈ SEQ, and W is an r.e. set, we say that σ agrees with W through n just in
case for all x < n, x ∈ W if and only if x ∈ rng(σ). Intuitively, if t is a text, as m
gets large we want to define f(t̄_m) = h(i) if and only if t̄_m agrees with W_h(i)
through some large number n, with n increasing as m does. Yet we want n to
be small relative to m so that most texts t for W_h(i) have the property that t̄_m
agrees with W_h(i) through n. To make the definition of f precise, define for
every i, n, m ∈ N, A_{i,n,m} = {t | t is for W_h(i) and t̄_m does not agree with W_h(i)
through n}. It is easy to see that M_i(A_{i,n,m}) is defined and that for every i,
n ∈ N, lim_{m→∞} M_i(A_{i,n,m}) = 0.
Define a function d by

d(n) = least m such that M_i(A_{i,n,m}) < 2^-n for all i ≤ n.

Notice that Σ_{n ∈ N} M_i(A_{i,n,d(n)}) is finite for all i ∈ N. Now let X_i = {t | t ∈ A_{i,n,d(n)}
for infinitely many n} = ⋂_{k ∈ N} ⋃_{n > k} A_{i,n,d(n)}. Then by the Borel-Cantelli
lemma, M_i(X_i) = 0 for all i ∈ N.
Now given a text t, define f on t as follows. For given m ∈ N, let j be the
least value h(i) ≤ m such that t̄_m agrees with W_h(i) through n, where n is the
greatest integer such that d(n) ≤ m if such exists, and 0 otherwise. Let f(t̄_m)
equal the least index for W_j. With this definition of f it is clear that f converges
to the least index for W_h(i) on all texts t for W_h(i) which are not in X_i. □
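The pivotal predicate of the proof, "σ agrees with W through n," is easy to state in code. The sketch below (Python, not part of the original text) represents W as a finite set for illustration; in the proof W is r.e., and the predicate only becomes decidable under the extra effectiveness assumptions of section 10.5.3.

```python
def agrees_through(sigma, W, n):
    """sigma agrees with W through n: for all x < n, x ∈ W iff x occurs in sigma.
    W is a finite set here for illustration; in the proof it is an r.e. set."""
    seen = {x for x in sigma if x != '#'}
    return all((x in W) == (x in seen) for x in range(n))

assert agrees_through((0, 2, '#'), {0, 2, 5}, 3)        # 0, 2 present; 1 absent on both sides
assert not agrees_through((0, 2), {0, 1, 2}, 3)         # 1 < 3 is in W but was never seen
assert agrees_through((), set(), 10)                    # vacuous agreement
```

Note the asymmetry built into the proof: agreement through n only inspects numbers below n, so a long prefix can agree with W through a small n even while exhibiting much more of W.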

Proposition 10.5.2A should be compared with corollary 1.2A.

COROLLARY 10.5.2A For some φ ∈ ℱ the set {t ∈ 𝒯 | φ identifies t} is dense
in 𝒯.

The liberality inherent in measure one identification entirely compen-
sates for memory limitation. This is the content of the following
proposition.

PROPOSITION 10.5.2B Let ℒ ⊆ RE be identifiable. Then some
φ ∈ ℱ^memory-limited measure one identifies ℒ.

Proof This follows immediately from exercise 10.5.1A and proposition
5.5.4B. □

10.5.3 Uniform Measures

To adapt the foregoing developments to recursive learning functions, we
rely on the following definition.

DEFINITION 10.5.3A

i. h ∈ ℱ is said to be [characteristically] extensionally one-one just in case for
all i, j ∈ N, if i ≠ j, then W_h(i) ≠ W_h(j) [φ_h(i) ≠ φ_h(j)].
ii. Let ℳ = {M_L | L ∈ RE} be a collection of measures on RE. ℳ is said to
uniformly measure ℒ ⊆ RE just in case:
a. for some total extensionally one-one h ∈ ℱ^rec, ℒ = {W_h(i) | i ∈ N}, and
b. the predicate "M_{W_h(i)}(B_σ) = p" (where i ∈ N, σ ∈ SEQ, and p is rational)
is decidable.
iii. ℒ ⊆ RE is said to be uniformly measurable just in case ℒ is uniformly
measured by some collection of measures.

Intuitively, ℳ uniformly measures ℒ just in case the probability that an
arbitrary text for L ∈ ℒ begins with σ ∈ SEQ can be effectively computed
from L and σ. Note that condition iia of the definition implies the re-
quirement that ℒ be nonempty and r.e. indexable. The following lemma
provides a necessary and sufficient condition for the uniform measurability
of a collection of languages.

LEMMA 10.5.3A ℒ ⊆ RE is uniformly measurable if and only if there is a
total, characteristically extensionally one-one g ∈ ℱ^rec such that for every
i ∈ N, g(i) is a characteristic index and ℒ = {{x ∈ N | φ_g(i)(x) = 0} | i ∈ N}.

Proof Suppose first that ℒ is uniformly measurable, and let h ∈ ℱ^rec be
such that ℒ = {W_h(i) | i ∈ N}. To define g so that φ_g(i) is the characteristic
function of W_h(i), it is necessary to decide effectively the question: Is n ∈ W_h(i)?
But n ∈ W_h(i) if and only if M_{W_h(i)}(B_(n)) > 0, and this can be effectively
answered by iib of the definition of uniformly measurable.
For the other direction, suppose ℒ = {{x ∈ N | φ_g(i)(x) = 0} | i ∈ N}. Obvi-
ously iia in the definition of uniformly measurable is satisfied by W_h(i) =
{x | φ_g(i)(x) = 0}. To define a measure M_{W_h(i)}, we define m_{W_h(i)}({x}) =
2^-n_x, where x is the n_x-th member of W_h(i). The predicate "m_{W_h(i)}({x}) = p"
is effectively decidable, since we can effectively find the nth member of W_h(i)
using φ_g(i). Then the predicate "M_{W_h(i)}(B_σ) = p" is effectively decidable from i,
σ, and p using m_{W_h(i)} and the definition of M_{W_h(i)}. □
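The construction in the second half of the proof is directly executable once a characteristic function is given. The sketch below (Python, not part of the original text) follows the lemma's convention that x ∈ W iff the characteristic function returns 0, and assigns the n-th member of W (in increasing order) probability 2^-n; the treatment of '#' and of finite W is ignored in this sketch.

```python
from fractions import Fraction

def m_from_characteristic(chi, x):
    """The measure used in the proof of lemma 10.5.3A: the n-th member of W
    (counting from 1, in increasing order) gets probability 2^-n.
    chi is a total 0/1 characteristic function; x ∈ W iff chi(x) == 0.
    Decidability of chi is what makes 'm_W({x}) = p' decidable."""
    if chi(x) != 0:
        return Fraction(0)
    n = sum(1 for y in range(x + 1) if chi(y) == 0)   # x is the n-th member of W
    return Fraction(1, 2 ** n)

evens = lambda y: 0 if y % 2 == 0 else 1              # W = the even numbers
assert m_from_characteristic(evens, 4) == Fraction(1, 8)   # 4 is the 3rd even number
assert m_from_characteristic(evens, 3) == Fraction(0)      # 3 is not in W
```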
From the lemma it follows that the uniformly measurable collections of
languages constitute only a proper subset of 𝒫(RE_rec); in particular, RE_rec is
not itself uniformly measurable. On the other hand, it is easy to see that
RE_fin ∪ {N} and {N} ∪ {N − {x} | x ∈ N} are uniformly measurable.
To state the connection between uniform measurability and identifi-
cation, we need to generalize definition 10.5.2A slightly.

DEFINITION 10.5.3B Let φ ∈ ℱ and ℒ ⊆ RE be given, and let ℳ =
{M_L | L ∈ RE} be a collection of measures on RE. φ is said to measure one
identify ℒ with respect to ℳ just in case for all L ∈ ℒ, M_L({t ∈ 𝒯_L | φ
identifies t}) = 1.

PROPOSITION 10.5.3A Let ℳ uniformly measure ℒ ⊆ RE. Then some
φ ∈ ℱ^rec measure one identifies ℒ with respect to ℳ.

Proof It is straightforward to verify that this is the effectivization of
proposition 10.5.2A. In examining that proof, we see that "σ agrees with
W_h(i) through n" is effectively decidable using φ_g(i). And the function d of
that proof is computable using M_{W_h(i)}. Thus the function f defined from d
in the proof is recursive. □

COROLLARY 10.5.3A For some collection ℳ of measures on RE, (i) there is
φ ∈ ℱ^rec such that φ measure one identifies {N} ∪ RE_fin with respect to ℳ,
and (ii) there is φ ∈ ℱ^rec such that φ measure one identifies {N} ∪
{N − {x} | x ∈ N} with respect to ℳ.

Corollary 10.5.3A should be compared with proposition 2.2A.

10.6 Probabilistic Learning

It may be that the human brain is able to generate arbitrarily long se-
quences of random events and to employ such sequences in its internal
calculations. The following definitions provide one formalization of this
idea.

DEFINITION 10.6A

i. c ∈ 𝒯 is called a coin just in case rng(c) ⊆ {0, 1}.
ii. The set of coins is denoted: 𝒯_{0,1}.
iii. σ ∈ SEQ is said to be a coin-sequence just in case rng(σ) ⊆ {0, 1}.

Thus a coin is an infinite sequence of 0's and 1's, to be conceived as the
output of a random binary generator.

DEFINITION 10.6B Let φ ∈ ℱ, L ∈ RE, and c ∈ 𝒯_{0,1} be given. φ is said to
c-identify L just in case λσ. φ(⟨c̄_lh(σ), σ⟩) identifies L.

Restated without the λ-notation, φ c-identifies L just in case for every t ∈ 𝒯_L,
(1) φ(⟨c̄_i, t̄_i⟩)↓ for all i ∈ N, and (2) for some j ∈ N such that W_j = L,
φ(⟨c̄_i, t̄_i⟩) = j for all but finitely many i ∈ N. Intuitively, to c-identify L, φ is
allowed to "flip a coin" once before each conjecture emitted; note that the
same coin is to serve for all texts in 𝒯_L.
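Fixing a single coin c turns a coin-consulting learner into an ordinary deterministic one, which is exactly the move made in the proof of proposition 10.6A below. A minimal sketch (Python, not part of the original text; the seeded pseudo-random stream stands in for a genuine coin, and the toy φ is an assumption):

```python
import random

def coin(seed, n):
    """First n values of one fixed coin: a deterministic 0/1 stream.
    The same seed always yields the same stream, so prefixes are consistent."""
    rng = random.Random(seed)
    return tuple(rng.randrange(2) for _ in range(n))

def derandomize(phi, seed):
    """The deterministic learner sigma -> phi(c-bar_{lh(sigma)}, sigma) induced
    by phi and the fixed coin; the same coin serves for every text."""
    return lambda sigma: phi(coin(seed, len(sigma)), sigma)

# A toy phi that happens to ignore its coin and conjectures the content seen so far:
phi = lambda c, sigma: frozenset(x for x in sigma if x != '#')
psi = derandomize(phi, seed=0)
assert psi((1, 2, '#', 1)) == frozenset({1, 2})
assert coin(0, 3) == coin(0, 5)[:3]   # one coin, consistent prefixes
```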
To proceed, we let M* be the natural probability measure on 𝒯_{0,1}.
Specifically, for each coin-sequence σ, we let B*_σ be the set of all c ∈ 𝒯_{0,1}
such that c̄_lh(σ) = σ. Then M* is taken to be the unique, complete proba-
bility measure such that M*(B*_σ) = 2^-lh(σ) for all coin-sequences σ. Intui-
tively, for a measurable collection C of coins, M*(C) is the probability that a
binary random generator produces in the limit a string drawn from C. The
proof of the following lemma is left to the reader.

LEMMA 10.6A Let φ ∈ ℱ and L ∈ RE be given. Then M*({c ∈ 𝒯_{0,1} | φ
c-identifies L}) is defined.
DEFINITION 10.6C Let φ ∈ ℱ, L ∈ RE, ℒ ⊆ RE, and p ∈ [0, 1] be given.

i. φ is said to identify L with probability p just in case M*({c ∈ 𝒯_{0,1} | φ
c-identifies L}) ≥ p.
ii. φ is said to identify ℒ with probability p just in case for all L ∈ ℒ, φ
identifies L with probability p.
iii. ℒ is said to be identifiable with probability p just in case some φ ∈ ℱ
identifies ℒ with probability p.

The foregoing definitions incorporate probabilistic considerations differ-
ently than does section 10.5. In the latter paradigm learners were conceived
deterministically, whereas environments were thought to harbor stochastic
processes; in the present paradigm the reverse is true. These alternative
conceptions give rise to learnability results of different characters. We
illustrate with a result to be contrasted with proposition 10.5.2A.

PROPOSITION 10.6A Let ℒ ⊆ RE be identifiable with probability 1. Then
ℒ is identifiable.

Proof Let φ ∈ ℱ identify ℒ with probability 1. We will find one coin c
such that φ c-identifies each L ∈ ℒ. Given such a c, ψ(σ) = φ(⟨c̄_lh(σ), σ⟩) is a
function that identifies every L ∈ ℒ. Recall that ℒ is a countable set. Then
⋂_{L ∈ ℒ} {c ∈ 𝒯_{0,1} | φ c-identifies L} is a countable intersection of sets of
measure 1. Such an intersection itself has measure one, and hence is non-
empty. Any c in this intersection is a c such that φ c-identifies each L ∈ ℒ. □

If attention is restricted to ℱ^rec and RE_svt, the foregoing result may be
strengthened in the following, surprising way.

PROPOSITION 10.6B (Wiehagen, Freivalds, and Kinber 1984) Let φ ∈ ℱ^rec
identify ℒ ⊆ RE_svt with probability p > 0.5. Then some ψ ∈ ℱ^rec identifies
ℒ.

The proof of proposition 10.6B proceeds in two steps. In the first step we
introduce a new criterion of learning.

DEFINITION 10.6D (Case and Smith 1983) The convergence criterion
{(L, {i}) | i is the index of a finite set of indexes at least one of which is for L} is
called: OEX.

Thus φ ∈ ℱ OEX-identifies L ∈ RE on text just in case on any text t for L, φ
converges on t to an index for a finite set of indexes that includes some index
for L.

LEMMA 10.6B If φ ∈ ℱ^rec identifies ℒ ⊆ RE with probability > 0.5, then
ℒ ∈ [ℱ^rec, text, OEX].

Proof Let γ^0, γ^1, ..., be an effective listing of all coin-sequences, and let
m(γ^i) = 2^-lh(γ^i) (= M*(B*_{γ^i})). Given γ^i such that lh(γ^i) ≤ lh(σ), we say that φ
appears to be converging at σ with γ^i just in case for all γ ⊇ γ^i, if
lh(γ) ≤ lh(σ), then φ(⟨γ, σ̄_lh(γ)⟩) = φ(⟨γ^i, σ̄_lh(γ^i)⟩). Now let D_σ = {i | φ appears to
be converging with γ^i at σ}, and define i_σ = the least i such that
Σ_{j ≤ i, j ∈ D_σ} m(γ^j) > 0.5. Such an i must exist, since for every coin-sequence
γ of length lh(σ), φ appears to be converging with γ at σ. Let C_σ = {i ≤ i_σ | i ∈ D_σ}.
Now we define ψ ∈ ℱ^rec which OEX-identifies ℒ as follows: W_ψ(σ) = {j | there
is an i ∈ C_σ such that φ(⟨γ^i, σ̄_lh(γ^i)⟩) = j}. W_ψ(σ) is just the collection of indexes
of languages to which the coins beginning with γ^i, i ∈ C_σ, appear to be
converging.

Suppose that L ∈ ℒ and that t is a text for L. Then there is a set of coins c
of measure > 0.5 such that φ c-converges on t. Thus the sets C_{t̄_n} have a limit
as n approaches infinity. (This requires compactness of the measure space
on coins.) Thus C = lim_n C_{t̄_n} has the following properties: if i ∈ C and γ ⊇ γ^i,
then φ(⟨γ, t̄_lh(γ)⟩) = φ(⟨γ^i, t̄_lh(γ^i)⟩), and Σ_{i ∈ C} m(γ^i) > 0.5. Further, ψ converges
to an index for {j | φ(⟨γ^i, t̄_lh(γ^i)⟩) = j for some i ∈ C}. Since the set of coins c on
which φ c-identifies t has measure > 0.5, there must be a coin c and an i ∈ C
such that γ^i is an initial segment of c and φ c-identifies t. Thus ψ OEX-
identifies t. □

LEMMA 10.6C (Case and Smith 1983) If ℒ ∈ [ℱ^rec, text, OEX]^svt, then
ℒ ∈ [ℱ^rec, text, INT]^svt.

Proof Suppose that φ OEX-identifies ℒ. Then for each σ, φ(σ) is an index
for some finite set F_σ of indexes of r.e. sets. If i ∈ F_σ, we say that i is consistent
with t at stage s if ⟨x, z⟩ ∈ W_{i,s} and ⟨x, y⟩ ∈ rng(t̄_s) implies y = z. Define χ so
that W_χ(σ) = ⋃ {W_i | i ∈ F_σ and i is consistent with σ at lh(σ)}. To see that χ
INT-identifies ℒ, suppose that L ∈ ℒ. Let t be a text for L. Let n be such that
for all m ≥ n, φ(t̄_m) = φ(t̄_n). Then φ(t̄_n) is an index for some finite set F of
indexes, one of which is for L. Now if j ∈ F is not an index for L, either j is not
consistent with t or W_j ⊆ L. In the former case, for each such j there is an
m > n such that χ will never use W_j after m. Thus on t, χ stabilizes to an index
for ⋃ {W_i | i ∈ F and W_i consistent with t}. Since i ∈ F, W_i consistent with t
implies that W_i is a subset of L, and since there is an i ∈ F such that W_i = L, χ
stabilizes to an index for L. □

Proposition 10.6B now follows from lemmata 10.6A and 10.6B.
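In outline, χ's strategy in lemma 10.6B is a simple filter: retain exactly those candidate hypotheses whose enumerated pairs never contradict the single-valued data seen so far, and conjecture their union. A minimal sketch of this filter (our own illustration, not code from the text; hypotheses are modeled here as finite function-graphs, i.e., dicts standing for stages W_{i,s}):

```python
def consistent(hypothesis, data):
    """True iff the finite graph `hypothesis` (a dict x -> y) never assigns
    an observed argument a value different from the one the data record;
    this mirrors: (x, z) in W_{i,s} and (x, y) in rng(sigma) implies y = z."""
    return all(hypothesis.get(x, y) == y for (x, y) in data)

def chi(candidates, data):
    """Union, as a set of pairs, of all candidate graphs consistent with
    the data: W_chi(sigma) = U { W_i : i in F_sigma, i consistent }."""
    union = set()
    for h in candidates:
        if consistent(h, data):
            union |= set(h.items())
    return union
```

For instance, with candidates {0: 1, 1: 2} and {0: 9} and observed data [(0, 1)], the second candidate contradicts the data and is discarded, so the conjecture is the graph {(0, 1), (1, 2)}.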


Many mathematically and technologically interesting questions arise in
connection with probabilistic learning. In particular, the present paradigm
may be investigated in conjunction with various choices of strategy, envi-
ronment, convergence criterion, and so forth. Since the role of randomly
generated events in human cognition is not at present documented, we leave
the formulation of these issues to the interested reader. Penetrating results
on some of these topics may be found in Wiehagen, Freivald, and Kinber
(1984) and Pitt (1984).

Exercise

10.6A Let finite D ⊆ N be given. c ∈ 𝒯_D is called a D-coin. Let M_D be the
natural measure on 𝒯_D, and modify definition 10.6C accordingly. Prove the
corresponding versions of propositions 10.6A and 10.6B.
Bibliography

Angluin, D. 1980. Inductive inference of formal languages from positive data. Information and
Control 45: 117-135.
Angluin, D., and Smith, C. 1982. A survey of inductive inference: theory and methods.
Technical report 250. Department of Computer Science, Yale University, New Haven,
October.
Barzdin, J., and Podnieks, K. 1973. The theory of inductive inference. In Proceedings of the
Mathematical Foundations of Computer Science, pp. 9-15.
Blum, M. 1967. A machine independent theory of the complexity of the recursive functions.
Journal of the Association for Computing Machinery 14(2): 322-336.
Blum, M. 1967a. On the size of machines. Information and Control 11(3): 257-265.
Blum, L., and Blum, M. 1975. Toward a mathematical theory of inductive inference. Infor-
mation and Control 28: 125-155.
Brown, R., and Hanlon, C. 1970. Derivational complexity and the order of acquisition of child
speech. In Cognition and the Development of Language, J. Hayes (ed.). New York: Wiley.
Case, J., and Lynes, C. 1982. Proceedings ICALP 82, Aarhus, Denmark, July 1982, Lecture
Notes in Computer Science. New York: Springer-Verlag.
Case, J., and Ngo-Manguelle, S. 1979. Refinements of inductive inference by Popperian
machines. Technical report. Department of Computer Science, SUNY, Buffalo.
Case, J., and Smith, C. 1983. Comparison of identification criteria for machine inductive
inference. Theoretical Computer Science 25: 193-220.
Chen, K.-J. 1982. Tradeoffs in the inductive inference of nearly minimal size programs.
Information and Control 52: 68-86.
Chomsky, N. 1957. Syntactic Structures. The Hague: Mouton & Co.
Chomsky, N. 1965. Aspects of the Theory of Syntax. Cambridge, Mass.: MIT Press.
Chomsky, N. 1975. Reflections on Language. New York: Random House.
Chomsky, N. 1980. Rules and Representations. New York: Columbia University Press.
Chomsky, N. 1980a. Initial states and steady states. In Language and Learning, M. Piattelli-
Palmarini (ed.). Cambridge, Mass.: Harvard University Press, pp. 107-130.
Feldman, H., Goldin-Meadow, S., and Gleitman, L. 1978. Beyond Herodotus: the creation of
language by linguistically deprived deaf children. In Action, Symbol and Gesture: The
Emergence of Language, A. Lock (ed.). New York: Academic Press.
Freivald, R., and Wiehagen, R. 1979. Inductive inference with additional information.
Elektronische Informationsverarbeitung und Kybernetik 15: 179-185.
Fodor, J. 1976. The Language of Thought. Cambridge, Mass.: Harvard University Press.
Gold, E. M. 1967. Language identification in the limit. Information and Control 10: 447-474.
Hopcroft, J., and Ullman, J. 1979. Introduction to Automata Theory, Languages, and Compu-
tation. Reading, Mass.: Addison-Wesley.
Horning, J. 1969. A Study of Grammatical Inference. Ph.D. dissertation. Computer Science
Department, Stanford University, Stanford.
Kripke, S. 1982. Wittgenstein on Rules and Private Language: An Elementary Exposition.
Cambridge, Mass.: Harvard University Press.
Lenneberg, E. 1967. Biological Foundations of Language. New York: Wiley.
Levy, A. 1979. Basic Set Theory. New York: Springer-Verlag.

Lewis, H., and Papadimitriou, C. 1981. Elements of the Theory of Computation. Englewood
Cliffs, N.J.: Prentice-Hall.
Machtey, M., and Young, P. 1978. An Introduction to the General Theory of Algorithms. New
York: North Holland.
Malitz, J. 1979. Introduction to Mathematical Logic. New York: Springer-Verlag.
Mazurkewich, I., and White, L. 1984. The acquisition of dative-alternation: unlearning
overgeneralizations. Cognition 16(3): 261-283.
Newport, E., Gleitman, H., and Gleitman, L. 1977. Mother, I'd rather do it myself: some effects
and noneffects of maternal speech style. In Talking to Children, C. Snow and C. Ferguson (eds.).
Cambridge: Cambridge University Press.
Osherson, D., and Weinstein, S. 1982. Criteria of language learning. Information and Control
52(2): 123-138.
Osherson, D., and Weinstein, S. 1982a. A note on formal learning theory. Cognition 11: 77-88.
Osherson, D., and Weinstein, S. 1984. Formal learning theory. In Handbook of Cognitive
Neuroscience, M. Gazzaniga (ed.). New York: Plenum.
Osherson, D., and Weinstein, S. 1984a. Models of language acquisition. In The Biology of
Learning, P. Marler and H. Terrace (eds.). New York: Springer-Verlag.
Osherson, D., and Weinstein, S. 1984b. Learning theory and neural reduction: a comment. In
Neonate Cognition, J. Mehler and R. Fox (eds.). Hillsdale, N.J.: Erlbaum.
Osherson, D., and Weinstein, S. 1985. Structure identification. Manuscript.
Osherson, D., Stob, M., and Weinstein, S. 1982. Ideal learning machines. Cognitive Science
6(2): 277-290.
Osherson, D., Stob, M., and Weinstein, S. 1982a. Learning strategies. Information and Control
53(1,2): 32-51.
Osherson, D., Stob, M., and Weinstein, S. 1984. Learning theory and natural language.
Cognition 17(1): 1-28.
Osherson, D., Stob, M., and Weinstein, S. 1985. Analysis of a learning paradigm. In Learning
and Conceptual Change, A. Marras (ed.). Hingham, Mass.: Reidel.
Osherson, D., Stob, M., and Weinstein, S. 1985a. Social learning and collective choice, to
appear.
Piattelli-Palmarini, M. (ed.). 1980. Language and Learning. Cambridge, Mass.: Harvard Uni-
versity Press.
Pinker, S. 1984. Language Learnability and Language Development. Cambridge, Mass.:
Harvard University Press.
Pitt, L. 1984. A characterization of probabilistic inference. Department of Computer Science,
Yale University, New Haven.
Popper, K. 1972. Objective Knowledge. Oxford: Oxford University Press.
Putnam, H. 1961. Some issues in the theory of grammar. In The Structure of Language and Its
Mathematical Aspect, R. Jakobson (ed.). Providence: American Mathematical Society.
Putnam, H. 1975. Probability and confirmation. In Mathematics, Matter, and Method.
Cambridge: Cambridge University Press.
Putnam, H. 1980. What is innate and why: comments on the debate. In Language and Learning,
M. Piattelli-Palmarini (ed.). Cambridge, Mass.: Harvard University Press.
Rogers, H. 1958. Gödel numberings of partial recursive functions. Journal of Symbolic Logic
23: 331-341.

Rogers, H. 1967. Theory of Recursive Functions and Effective Computability. New York:
McGraw-Hill.
Sankoff, G., and Brown, P. 1976. The origins of syntax in discourse: a case study of Tok Pisin
relatives. Language 52: 631-666.
Shapiro, E. 1981. Inductive inference of theories from facts. Research report 192. Department
of Computer Science, Yale University, New Haven.
Shoenfield, J. 1967. Mathematical Logic. Reading, Mass.: Addison-Wesley.
Smith, C. H. 1981. The power of parallelism for automatic program synthesis. In Proceedings of
the Twenty-Second Symposium on the Foundations of Computing. IEEE, pp. 283-295.
Solomonoff, R. J. 1964. A formal theory of inductive inference. Information and Control
7: 1-22, 224-254.
Wexler, K., and Culicover, P. 1980. Formal Principles of Language Acquisition. Cambridge,
Mass.: MIT Press.
Wiehagen, R. 1976. Limes-Erkennung rekursiver Funktionen durch spezielle Strategien.
Elektronische Informationsverarbeitung und Kybernetik 12: 93-99.
Wiehagen, R. 1977. Identification of formal languages. In Lecture Notes in Computer Science
(New York: Springer-Verlag), 53: 571-579.
Wiehagen, R. 1978. Characterization problems in the theory of inductive inference. In Pro-
ceedings of the Fifth Colloquium on Automata, Languages, and Programming. New York:
Springer-Verlag, pp. 494-508.
Wiehagen, R., Freivald, R., and Kinber, E. 1984. On the power of probabilistic strategies in
inductive inference. Theoretical Computer Science 28: 111-133.
Wittgenstein, L. 1953. Philosophical Investigations. New York: Macmillan.
List of Symbols

N Section 1.2.1, p. 8
§ Section 1.2.1, p. 8
φ(x)↓, φ(x)↑ Section 1.2.1, p. 8
<x,y) Section 1.2.1, p. 8
AxB Section 1.2.1, p. 8
π₁(x), π₂(x) Section 1.2.1, p. 8
Section 1.2.1, p. 8
φ_i Section 1.2.1, p. 9
W_i Section 1.2.2, p. 10
RE Section 1.2.2, p. 10
REn n Definition 1.2.2A, p. 10
RE rec Definition 1.2.2B, p. 10
REm Definition 1.2.2E, p. 11
rng(t) Definition 1.3.3A, p. 13
y Definition 1.3.3A, Definition 5.3A, pp. 13,98
rng(σ) Section 1.3.4, Definition 5.2A, pp. 15, 97
lh(σ) Section 1.3.4, p. 15
SEQ Section 1.3.4, p. 15
Section 1.3.4, p. 14
Section 1.3.4, p. 14
Exercise 1.4.3K, p. 22
Section 2.1, p. 25
Section 2.1, p. 25
Section 2.1, p. 25
REid Definition 2.3B, p. 29
[9'l, [9']... Definition 4.1D, p. 45
fFP Definition 4.1C, p. 46
K Definition 4.2.1A, p. 48
REn ,, ]! Exercise 4.2.1G, p. 50
RE.... Exercise 4.2.1H, p. 50
Φ_i Definition 4.2.2A, p. 50
SO Definition 4.3.5B, p. 61
REsD Definition 4.3.5B, p. 61
REc b., Exercise 4.3.5C, p. 63
M(L) Definition 4.3.6B, p. 64
Definition 4.4.lA, p. 66
Definition 4.4.1A, p. 66
Definition 4.4.1C, p. 70

w..• Definition 4.4.1C, p. 70


RE_aez Definition 4.6.1C, p. 84
RE_si Definition 4.6.1C, p. 84
ℱ_fin Definition 4.7A, p. 93
#, text # Definition 5.2A, p. 97
SEQ# Section 5.2, p. 97
[9',1], [9',1].", Definition 5.3C, p. 99
IE I(CP) Exercise 5.3B, p. 100
(1 " t, Section 5.4.1, p. 100
SEQ* Exercise 5.6.2A, p. 115
[9',I',tjJ, [9',I',et']rn Definition 6.1.28, p. 121
INT Definition 6.l.2C, p. 121
.2',.,,(CP) Exercise 6.1.2A, p. 122
FINT Definition 6.2A, p. 123
FINT(n) Exercise 6.2.lF, p. 125
RE_asn Definition 6.2.3A, p. 127
FFD Exercise 6.2.3B, p. 128
EXT Definition 6.3A, p. 130
FEXT Definition 6.3.3A, p. 133
BEXT Definition 6.4A, p. 134
par(L) Definition 6.4.1A, p. 135
REpsd Definition 6.4.1A, p. 135
BFEXT Definition 6.4.3A, p. 138
FD Definition 6.5A, p. 139
FD(n) Exercise 6.5.2A, p. 141
BFD Definition 6.5.3A, p. 141
SIM(f) Definition 6.6A, p. 143
FINTSIM(f) Exercise 6.6B, p. 144
CI Definition 6.8.1B, p. 146
FINTCI Definition 6.8.3B, p. 149
FINT(n)CI Definition 6.8.3B, p. 150
EXTCI Definition 6.8.3C, p. 150
FD(n)CI Definition 6.8.3C, p. 150
[9",I!,~r Definition 7.1A, p. 152
nL~l Definition 7.2A, p. 155
Π₁-complete Definition 7.2A, p. 155
Π₁-indexable Definition 7.2A, p. 155
Σ₁-indexable Definition 7.2A, p. 155

c-chain Definition 7.2A, p. 155


REtree Definition 7.2A, p. 155
Definition 7.2A, p. 155
[.'I', text, INT]''' Definition 7.4A, p. 160
CONV(φ, t) Definition 8.1A, p. 167
!YL. !T!!, Definition 8.1B, p. 168
[.'1']"" Definition 8.1D, p. 169
[Y]~~~· Definition 8.1D, p. 169
~.!f(1min Definition 8.2.1A, p. 170
CONVu t Exercise 8.2.3C, p. 174
[ℱ^rec, text, SIM(f)] Definition 8.2.5A, p. 175
r.'l', S. 'Cl.... Exercise 8.2.5A, p. 176
LS. Definition 9.IA, p. 178
LS~u Definition 9.3A, p. 181
B<1. B;; Definition 10.lA, p. 182
𝒯, 𝒯_L Definition 10.1A, p. 182
F. Definition IO.IB, p. 182
mL,ML Section 10.5.1, p. 186
ℳ Section 10.5.1, p. 186
𝒯_{0,1} Definition 10.6A, p. 190
M* Section 10.6, p. 190
B: Section 10.6, p. 191
Name Index

Angluin, D., xi, xiii, 30, 56, 75, 76
Barzdin, J., 137
Blum, L., xiii, 25, 72, 83-85, 88, 108, 184
Blum, M., xiii, 25, 50, 51, 63-66, 72, 83-85, 88, 108, 184
Brown, P., 154
Brown, R., 14
Canny, J., 56, 72
Case, J., xiii, 62, 125, 127, 128, 132, 133, 137, 140-142, 149, 150, 151, 192
Chen, K., 143, 144
Chomsky, N., xiii, 20, 34
Culicover, P., xiii, 12, 34, 66, 73, 187
Feldman, H., 154
Fodor, J., 41
Freivald, R., 23, 147, 148, 191, 193
Fulk, M., 60, 89, 91
Gleitman, H., 14, 106
Gleitman, L., xiii, 14, 106, 154
Gold, E. M., xiii, 3, 7, 23, 27, 28, 48, 78, 79, 109, 113, 115, 116, 146, 149, 168, 174
Goldin-Meadow, S., 154
Hanlon, C., 14
Harrington, L., 140
Hopcroft, J., 12
Horning, J., 187
Kinber, E., 191, 193
Kripke, S., 41
Lenneberg, E., 14
Levy, A., 182, 184, 186
Lewis, H., xv
Lynes, C., 132, 140, 149, 150, 151
Machtey, M., xv, 9, 51, 68
Malitz, J., 169
Mazurkewich, I., 76
Minicozzi, E., 83, 84
Newport, E., 14, 106
Ngo-Manguelle, S., 62
Osherson, D., xi, xiii, 139
Papadimitriou, C., xv
Pinker, S., xiii, 33, 186
Pitt, L., 193
Podnieks, K., 137
Popper, K., 62
Putnam, H., 3, 20, 63, 145
Rogers, H., xv, 9, 12, 29, 48, 59, 63, 88, 117, 135, 155, 156, 159
Sankoff, G., 154
Schafer, G., 72, 73, 81, 92
Shapiro, E., 59
Shoenfield, J., 155-157
Smith, C., xi, xiii, 125, 127, 128, 132, 133, 137, 140-142, 192
Solomonoff, R., 3
Steel, J., 133
Stob, M., xi, xiii
Ullman, J., 12
Weinstein, S., xi, 139
Wexler, K., xiii, 12, 34, 66, 73, 187
White, L., 76
Wiehagen, R., 23, 29, 58, 109, 147, 148, 191, 193
Wittgenstein, L., 41
Young, P., xv, 9, 51, 68
Subject Index

acceptable indexing, definition 1.2.1A, 9 efficient identification, section 8.3, 176
accountable, definition 4.3.5A, 61 end in S on t, definition 6.1.1A, 119
accuracy, section 1.1, 7 enumerator, definition 4.5.3A, 78
almost self-naming, definition 6.2.3A, 127 enumerating function for φ,
almost everywhere zero, definition 4.6.1C(i), definition 4.5.3A, 78
84 equivalence of indexes, section 1.2.1, 10
almost everywhere n, exercise 4.2.1I, 50 evidential relation, definition 5.3A, 98
arithmetical, definition 7.3.2A, 159 environment, sections 1.1, 1.3.3, 13
arithmetically indexable, definition 7.5B, extensional, definition 6.3A, 130
161 extensionally confident, exercise 6.3.1B, 131
ascending text, definition 5.5.1A, 106 EXT-identify text efficiently, exercise 8.2.3C,
avid, exercise 9.1C, 178 174
EXT-locking sequence, definition 9.3A, 180
Baire topology, definition 10.1A, 182 extensional characteristic index, definition
big evidential relation, exercise 5.5.3A, 110 6.8.3C, 150
bounded extensional, definition 6.4A, 134 extensionally conservative, exercise 6.3.1A,
bounded finite difference, definition 6.5.3A, 131
141 extensionally one-one, definition 10.5.3A,
bounded finite difference extensional, 188
definition 6.4.3A, 138 extensionally order independent, exercise
6.3.1C, 132
cautious, definition 4.5.4A, 79 EXT-predictable, definition 9.3A, 181
𝒞-converge on t to L, definition 6.1.1B, 120
characteristically extensionally one-one, fat text, definition 5.5.4A, exercise 10.5.1A,
definition 10.5.3A, 188 110, 187
characteristic index, definition 6.8.1B, 146 F-biased, exercise 4.4.1D, 72
characteristic index for L, definition 6.8.1A, final in t, exercise 6.3.1C, 132
146 FINT, definition 6.2A, 123
characteristic function, definition 1.2.2C, 10 finite analogs, exercise 6.2.1A, 124
𝒞-confident, exercise 6.4.2A, 138 finite difference, definition 6.5A, 139
𝒞-identify L on ℰ, definition 6.1.2A, 120 finite difference extensional, definition
𝒞-identify L on ℰ exactly, definition 7.1A, 6.3.3A, 133
152 finite difference intensional, definition 6.2A,
𝒞-identify, definition 10.6B, 190 123
𝒞-locking sequence for φ and L on t, finite difference maximal, exercise 6.2.1E,
exercise 6.1.2D, 123 part b, 124
coding of finite sequences, section 1.3.4, 15 FINT(n), exercise 6.2.1F, 125
coin, definition 10.6A, 190 finite difference intensional characteristic
coin sequence, definition 10.6A, 190 index, definition 6.8.3B, 149
comparative grammar, section 3.1, 34 finite difference characteristic index for L,
computational complexity measure, definition 6.8.3A, 149
definition 4.2.2A, 50 finite difference reliable, exercise 4.6.1C, 85
conditional consistency, exercise 4.3.3A, 59 finite difference saturated, exercise 6.2.1E,
confident, definition 4.6.2A, 86 part a, 124
conjecture bounded, exercise 4.6.2B(iii), 87 finite variant, definition 2.3A, 29
conservative, definition 4.5.1A, 75 FINT order independent, exercise 6.2.1G,
convergence criterion, definition 6.1.1B, 120 125
consistency, definition 4.3.3A, 56 f-simple, definition 4.3.6C, 64
converges on t, definition 1.4.1A, 17 functional finite difference, exercise 6.2.3B,
convergence point, section 8.1, 167 129
cousins, exercise 4.6.3C, 92
generalized identification paradigm,
decisive, definition 4.5.5A, 80 section 7.5, 43, 162
defined on, definition 1.4.1A, 17 gradualist, definition 4.5.2A, 77

h-time, definition 4.2.2B, 51 n-finite difference characteristic index for L,
definition 6.8.3A, 149
identify, definitions 1.4.1A, 1.4.3A, 17 n-finite difference intensional characteristic
identify #, definition 5.2B, 97 index, definition 6.8.3B, 149
identify fast, section 8.1, 168 n-gradualist, exercise 4.5.2A, 77
identify strictly faster than, definition 8.1C, n-memory limited, definition 4.4.1B, 66-67
168 noisy text, definition 5.4.1A, 100
identify text efficiently, definition 8.1C, 168 nonexcessive, exercise 4.3.2D, 56
identify text efficiently with respect to ℱ, nonrecursive text, definition 5.5.3A, 109
exercise 8.1E, 169 nontriviality, definition 4.3.2A, 54
identify very exactly, definition 7.7A, 160
identify on S, definition 5.3B, 98 one-shot learner, exercise 1.5.2F, 24
identify with probability p, definition 10.6C, oracle, exercise 5.6.2E, 116
191 order independent, definition 4.3.6A, 88
imperfect informant, exercise 5.6.2C, 115
imperfect text, definition 5.4.3A, 105 padding function, section 4.4.1, 68
incomplete text, definition 5.4.2A, 103 paradigm, section 1.1, 7
index, sections 1.2.1, 1.2.2, 9, 10 parity self-describing, definition 6.4.1A, 135
informant, definition 5.6.1A, 113 partly set driven, exercise 4.6.3B, 92
INT, definition 6.1.2C, 121 popperian, definition 4.3.5C, 62
intensional, definition 6.1.2C, 121 positive test, section 1.2.2, 10
intrusion text, exercise 5.4.1E, 102 probability measure, section 10.5.1, 186
i-learner, exercise 1.5.2E, 24 predictable, definition 9.2A, 179
predictable', exercise 9.2D, 180
learning function, section 1.3.4, 15 primitive recursive text, exercise 5.5.2B, 109
learn text, exercise 5.5.4A, 112 prudence, definition 4.3.4A, 59
learning theory, section 1.1, 7
limiting process, section 1.5.1, 22-23 r.e. bounded, definition 4.3.4B, 60
local, definition 4.7B, 93 r.e. core, exercise 4.7C, 94
loquacious, exercise 4.3.6B, 66 recursion theorem, lemma 2.3A, 29
locking sequence, definition 2.1A, 26 recursive, section 1.2.2, 10
locking sequence hunting construction, recursively enumerable (r.e.), section 1.2.2,
section 4.6.3, 89 10
locking text, definition 10.4A, exercise 2.1C, recursive text, definition 5.5.2A, 107
27, 185 r.e. indexable, definition 4.3.2B, 54
r.e. index set, definition 4.3.2B, 54
maximal, exercise 4.6.2C, 87 reliable, definition 4.6.1A, 83
maximal on incomplete text, exercise reliable_ℳ, definition 4.6.1B, 83
5.4.2D, 104 represent, definition 1.2.2D, 11
maximal with respect to [ℱ^rec], exercise restricts, section 4.1, 46
8.2.3B, 174 restrictive, section 4.1, 46
measure one identify, definition 10.5.2A,
187 saturated, exercise 2.2E, 28
measure one identify with respect to ℳ, saturated on imperfect text, exercise 5.4.3B,
definition 10.5.3B, 189 106
memory bounded, exercise 4.4.1E, 72-73 saturated on noisy text, exercise 5.4.1F,
memory limited, definition 4.4.1B, 66-67 102
m-incomplete text, exercise 5.4.2C, 104 self-describing, definition 2.3B, 29
mind change, exercise 1.5.2D, 24 self-indexing, definition 4.6.1C(ii), 84
mixed text, exercise 5.5.4B, 113 self-monitoring, definition 1.5.2A, 23
monotonic, exercise 4.6.3C, 92 set-driven, definition 4.4.2A, 73
m-noisy text, exercise 5.4.1D, 102 SD, definition 4.3.5B, 61
SIM(f)-identify text efficiently, definition
n-finite difference characteristic index, 8.2.5A, 175
definition 6.8.3C, 150 simpleminded, definition 4.3.6D, 64

single valued, definition 1.2.2D, 11
size measure, definition 4.3.6A, 63
Socratic, exercise 2.5A, 33
stability, section 1.1, 7
stabilized on i, definition 10.1C, 182
strategy, definition 4.1A, 45
strict enumerator, exercise 4.5.3B, 79
strictly ascending text, exercise 5.5.1A, 107

Tarski-Kuratowski computation, section 7.2, 156
team identify, exercise 2.2F, 28
team FINT-identify, exercise 6.2.1I, 125
text, definition 1.3.3A, 13
text #, definition 5.2A, 97
text with finite oracle, exercise 5.6.2F, 116
total, definition 1.2.2D, 11
total minded, exercise 4.3.5A, 63
tree, definition 7.2A, 155
Turing machine, section 1.2.1, 8

uniformly measure, definition 10.5.3A, 188

weakly decisive, exercise 4.5.5A, 82
weakly nontrivial, exercise 4.3.2E, 56
weakly reliable, exercise 4.6.1A, 85
w.o. chain, exercise 4.6.2B, 87
Bradford Books
Natalie Abrams and Michael D. Buckner, editors. MEDICAL ETHICS.

Peter Achinstein and Owen Hannaway, editors. OBSERVATION, EXPERIMENT, AND
HYPOTHESIS IN MODERN PHYSICAL SCIENCE.

Jon Barwise and John Perry. SITUATIONS AND ATTITUDES.

Ned J. Block, editor. IMAGERY.

Steven Boer and William G. Lycan. KNOWING WHO.

Myles Brand. INTENDING AND ACTING.

Robert N. Brandon and Richard M. Burian, editors. GENES, ORGANISMS, POPULATIONS.

Paul M. Churchland. MATTER AND CONSCIOUSNESS.

Robert Cummins. THE NATURE OF PSYCHOLOGICAL EXPLANATION.

Daniel C. Dennett. BRAINSTORMS.

Daniel C. Dennett. ELBOW ROOM.

Fred I. Dretske. KNOWLEDGE AND THE FLOW OF INFORMATION.

Hubert L. Dreyfus, editor, in collaboration with Harrison Hall. HUSSERL,
INTENTIONALITY, AND COGNITIVE SCIENCE.

K. Anders Ericsson and Herbert A. Simon. PROTOCOL ANALYSIS.

Owen J. Flanagan, Jr. THE SCIENCE OF THE MIND.

Jerry A. Fodor. REPRESENTATIONS.

Jerry A. Fodor. THE MODULARITY OF MIND.

Morris Halle and George N. Clements. PROBLEM BOOK IN PHONOLOGY.

Gilbert Harman. CHANGE IN VIEW: PRINCIPLES OF REASONING.

John Haugeland, editor. MIND DESIGN.

Norbert Hornstein. LOGIC AS GRAMMAR.

William G. Lycan. LOGICAL FORM IN NATURAL LANGUAGE.

Earl R. MacCormac. A COGNITIVE THEORY OF METAPHOR.

John Macnamara. NAMES FOR THINGS.

Charles E. Marks. COMMISSUROTOMY, CONSCIOUSNESS, AND UNITY OF MIND.

Izchak Miller. HUSSERL, PERCEPTION, AND TEMPORAL AWARENESS.


Daniel N. Osherson, Michael Stob, and Scott Weinstein. SYSTEMS THAT LEARN: AN
INTRODUCTION TO LEARNING THEORY FOR COGNITIVE AND COMPUTER SCIENTISTS.

Zenon W. Pylyshyn. COMPUTATION AND COGNITION.

W. V. Quine. THE TIME OF MY LIFE.

Irvin Rock. THE LOGIC OF PERCEPTION.

George D. Romanos. QUINE AND ANALYTIC PHILOSOPHY.

George Santayana. PERSONS AND PLACES.

Roger N. Shepard and Lynn A. Cooper. MENTAL IMAGES AND THEIR
TRANSFORMATIONS.

Elliott Sober, editor. CONCEPTUAL ISSUES IN EVOLUTIONARY BIOLOGY.

Elliott Sober. THE NATURE OF SELECTION.

Robert C. Stalnaker. INQUIRY.


Stephen P. Stich. FROM FOLK PSYCHOLOGY TO COGNITIVE SCIENCE.

Joseph M. Tonkonogy. VASCULAR APHASIA.

Hao Wang. BEYOND ANALYTIC PHILOSOPHY.
