Daniel N. Osherson
Michael Stob
Scott Weinstein
A Bradford Book
The MIT Press
Cambridge, Massachusetts
London, England
Second printing, 1985
All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.
This book was set in Times New Roman by Asco Trade Typesetting Ltd., Hong Kong,
and printed and bound by Halliday Lithograph in the United States of America
Osherson, Daniel N.
Systems that learn.
0262150301
Contents
Series Foreword xi
Preface xiii
Acknowledgments xv
I IDENTIFICATION 5
II IDENTIFICATION GENERALIZED 43
4 Strategies 45
4.1 Strategies as Sets of Learning Functions 45
4.2 Computational Constraints 47
4.2.1 Computability 47
4.2.2 Time Bounds 50
4.2.3 On the Interest of Nonrecursive Learning Functions 53
4.3 Constraints on Potential Conjectures 53
4.3.1 Totality 53
4.3.2 Nontriviality 54
4.3.3 Consistency 56
4.3.4 Prudence and r.e. Boundedness 59
4.3.5 Accountability 61
*4.3.6 Simplicity 63
4.4 Constraints on the Information Available to a Learning
Function 66
4.4.1 Memory-Limitation 66
*4.4.2 Set-Driven Learning Functions 73
4.5 Constraints on the Relation between Conjectures 74
4.5.1 Conservatism 75
4.5.2 Gradualism 77
4.5.3 Induction by Enumeration 78
*4.5.4 Caution 79
*4.5.5 Decisiveness 80
5 Environments 96
5.1 Order and Content in Natural Environments 96
5.2 Texts with Blanks 96
5.3 Evidential Relations 98
5.4 Texts with Imperfect Content 100
5.4.1 Noisy Text 100
5.4.2 Incomplete Text 103
*5.4.3 Imperfect Text 105
5.5 Constraints on Order 106
5.5.1 Ascending Text 106
5.5.2 Recursive Text 107
*5.5.3 Nonrecursive Text 109
*5.5.4 Fat Text 110
5.6 Informants 113
5.6.1 Informants and Characteristic Functions 113
5.6.2 Identification on Informant 115
*5.6.3 Memory-Limited Identification on Informant 116
5.7 A Note on "Reactive" Environments 118
Bibliography 195
List of Symbols 198
Name Index 201
Subject Index 203
Series Foreword
Lila Gleitman
Susan Carey
Elissa Newport
Elizabeth Spelke
Preface
Our principal intellectual debts are to the works of E. Mark Gold and
Noam Chomsky. Gold (1967) established the formal framework within
which learning theory has developed. Chomsky's writings have revealed
the intimate connection between the projection problem and human
intelligence. In addition, we have been greatly influenced by the research of
Blum and Blum (1975), Angluin (1980), Case and Smith (1983), and Wexler
and Culicover (1980). Numerous conversations with Lila Gleitman and
with Steven Pinker have helped us to appreciate the bearing of learning
theory on empirical studies of first language acquisition, and conversely
the bearing of first language acquisition studies on learning theory. We
thank them for their patient explanations.
Preparation of the manuscript was facilitated by a grant to Osherson
from the Fyssen Foundation for 1983-84 and by National Science Foundation Grants MCS 80-02937 and 82-00032 to Stob. We thank these agencies
for their support.
How to Use This Book
that you "win in the limit." Is it possible to win the game in the limit even
though you make one hundred wrong guesses? Is there any number of
wrong guesses that is logically incompatible with winning the game in the
limit?
Fifth question: Suppose that all the clues we give you are of the form:
The set contains the number n. Suppose furthermore that for every positive
integer i, we eventually give you a clue of this form if and only if i is in
fact contained in the set we have in mind. (So for every number i in our set,
you are eventually told that the set contains i; also you receive no false
information about the set.) Do not suppose anything about the order in
which you will get all these clues. We will order them any way we please.
(Recall how we surprised you with the fourth clue.) Now let us call a
guessing rule "winning" just in case the following is true. If you use the rule
to choose your guesses, then no matter which of the sets we have in mind,
you are guaranteed to win the game in the limit. Specify a winning guessing
rule for our game.
Sixth question: We make the game harder. This time we are allowed to
select any of the sets that are legal in the original game, but we may also
select the set {1, 2, 3, 4, 5, 6, ...} of all positive integers. The rules about clues
are the same as given in question 5. Play this new game with a friend, and
then think about the following question. Is there a winning guessing rule
for the new game?
Seventh question: Let us make the last game easier. The choice of sets is
the same as in the last game, but we now agree to order our clues in a
certain way. For all positive integers i and j, if both i and j are included in the set we have in mind, and if i is less than j, then you will receive the clue "The set contains i" before you receive the clue "The set contains j." Can you specify a winning guessing rule for this version of the game?
Eighth question: Here is another variant. We select a set from the original
collection (thus the set {1, 2, 3, 4, 5, ...} of all positive integers is no longer
allowed). Clues can be given in any order we please. You get only one guess.
You may wait to see as many clues as you like, but your first guess is
definitive. Play this game with a friend. Then show that no matter what
rule you use to make your guess, you are not guaranteed to be right. Think
about what happens if you are allowed two guesses in the game.
The games we have been playing resemble the process of scientific
discovery. Nature plays our role, selecting a certain pattern that is imposed
on the world. The scientist plays your role, examining an endless series
Introduction 3
of clues about this pattern. In response to the clues, the scientist emits
guesses. Nature never says whether the guesses are correct. Scientific
success consists of eventually offering a correct guess and never deviating
from it thereafter. Language acquisition by children can also be construed
in terms of our game. The child's parents have a certain language in mind
(the one they speak). They provide clues in the form of sentences. The child
converts these clues into guesses about the parents' language. Acquisition
is successful just in case the child eventually sticks with a correct guess.
The similarity of our game to these and other settings makes it worthy
of more careful study. We would like to know which versions of the
game are winnable and by what kinds of guessing rules. Research on these questions was begun in the 1960s by Putnam (1975), Solomonoff (1964), and
Gold (1967). These initial investigations have given rise to a large literature
in computer science, linguistics, philosophy, and psychology. This body
of theoretical and applied results is generally known as learning theory
because many kinds of learning (e.g., language acquisition) can be con-
strued as successful performance in one of our games.
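The game itself can be simulated in a few lines. The following is only an illustrative sketch, and it assumes (since the original collection of legal sets is not restated here) that the targets are finite sets of positive integers: the rule of the fifth question conjectures exactly the set of clues seen so far, and it wins in the limit because the guess stops changing once every member has appeared.

```python
def guess_rule(clues_so_far):
    """Guessing rule: conjecture exactly the set of clues seen so far."""
    return set(clues_so_far)

def play_in_the_limit(clue_order):
    """Feed clues one at a time and record the successive guesses."""
    seen, guesses = [], []
    for clue in clue_order:
        seen.append(clue)
        guesses.append(guess_rule(seen))
    return guesses

# The target set, with clues given in an arbitrary order we please.
target = {2, 7, 3, 11}
guesses = play_in_the_limit([7, 11, 7, 2, 3, 7])

# After every member has appeared, the guess never changes again.
assert guesses[-1] == target
assert all(g == target for g in guesses[4:])
```

The sixth question shows why this rule fails once the set of all positive integers is also allowed: no finite batch of clues ever distinguishes a large finite set from the infinite one.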
In this book we attempt to develop learning theory in systematic fashion,
presupposing only basic notions from set theory and the theory of compu-
tation. Throughout our exposition, definitions and theorems are illustrated
by consideration of language acquisition. However, no serious application
of the theory is described.
The book is divided into three parts. Part I advances a fundamental
model of learning due essentially to Gold (1967). Basic notation, termi-
nology, and theorems are there presented, to be relied on in all subsequent
discussion. In part II these initial definitions are generalized and varied in
dozens of ways, giving rise to a multitude of learning models and theorems.
We attempt to impose some order on these results through a system of
notation and classification. Part III explores diverse issues in learning
theory that do not fit neatly into the classification offered in part II.
I IDENTIFICATION
We let N be the set {0, 1, 2, ...} of natural numbers. The set of all functions (partial or total) from N to N is denoted: ℱ. Following standard mathematical practice, members of ℱ will be construed as sets of ordered pairs of numbers satisfying the "single-valuedness" condition. Single-valuedness specifies that no two pairs of numbers with the same first coordinates may occur in the same function. There are nondenumerably many functions in ℱ. We let the symbols φ, ψ, θ, φ′, ..., represent possibly partial functions in ℱ. The symbols f, g, h, f′, ..., are reserved for total functions in ℱ. If φ ∈ ℱ is defined on x ∈ N, we sometimes write φ(x)↓. Otherwise, we write φ(x)↑.
It will often be useful to construe individual numbers as "tuples" of numbers. This is achieved as follows. For each n ∈ N we select some computable isomorphism between Nⁿ (i.e., the n-fold Cartesian product on N) and N. For x₁, x₂, ..., xₙ ∈ N, ⟨x₁, x₂, ..., xₙ⟩ denotes the image under this function of the (ordered) n-tuple (x₁, x₂, ..., xₙ). In using this notation, the reader should keep the following facts in mind (illustrating with n = 2):
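One standard computable isomorphism of this kind for n = 2 is the Cantor pairing function. The text deliberately leaves the choice of isomorphism open, so the following sketch is only one illustrative possibility:

```python
def pair(x, y):
    """Cantor pairing: a computable bijection from N x N onto N."""
    return (x + y) * (x + y + 1) // 2 + y

def unpair(n):
    """Inverse of pair: recover the unique (x, y) with pair(x, y) == n."""
    # Find the diagonal w on which n lies, i.e. the largest w
    # with w*(w+1)//2 <= n.
    w = int(((8 * n + 1) ** 0.5 - 1) / 2)
    y = n - w * (w + 1) // 2
    x = w - y
    return x, y

# Every number codes exactly one ordered pair, and conversely.
assert all(unpair(pair(x, y)) == (x, y)
           for x in range(50) for y in range(50))
```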
This is achieved by listing the members of ℱ^rec and using ordinal positions in the list as code numbers. To be useful, however, this listing of ℱ^rec must meet certain conditions, specifically:
DEFINITION 1.2.1A An acceptable indexing of ℱ^rec is an enumeration φ₀, φ₁, φ₂, ..., of (all of) ℱ^rec that meets the following conditions:
LEMMA 1.2.1A (Rogers, 1958) Let φ₀, φ₁, ..., and ψ₀, ψ₁, ..., be any two acceptable indexings of ℱ^rec. Then there is a one-one, onto, total f ∈ ℱ^rec such that φₓ = ψ_{f(x)} for all x ∈ N.
Proof See Machtey and Young (1978, theorem 3.4.7). □
Thus any two acceptable indexings of the partial recursive functions are
identical up to a recursive isomorphism. We now fix on some specific
acceptable indexing of ℱ^rec (of the reader's choice). Indexes are henceforth
interpreted accordingly.
The following simple result will be useful in subsequent developments.
Lemma 1.2.1B reflects the fact that any computable function can be programmed in infinitely many ways (e.g., by inserting redundant instructions into a given program). If φᵢ = φⱼ, we often say that i and j are equivalent.
f(x) = 0, if x ∈ S;
f(x) = 1, if x ∉ S.
It is not difficult to prove that S ⊆ N is recursive if and only if its characteristic function is recursive. Intuitively S ∈ RE_rec just in case there is a mechanical procedure (called a test) that eventually responds "yes" to any input drawn from S and eventually responds "no" to any other input (thus a test, unlike a positive test, is required to respond to every input).
Fundamentals of Learning Theory 11
Turning to the third special kind of r.e. set, recall from section 1.2.1 that each n ∈ N represents a unique ordered pair of numbers, namely the pair (i, j) such that ⟨i, j⟩ = n. Accordingly:
DEFINITION 1.2.20
Equivalently S is single valued just in case for all x, y, z ∈ N, if ⟨x, y⟩ ∈ S and ⟨x, z⟩ ∈ S, then y = z. Plainly a single-valued set S represents the function φ defined by the condition that for all x, y ∈ N, φ(x) = y if and only if ⟨x, y⟩ ∈ S. A single-valued set S is total just in case for all x ∈ N there is y ∈ N such that ⟨x, y⟩ ∈ S.
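In code, a function can be represented by the single-valued set of codes ⟨x, y⟩ of its graph. The sketch below assumes Cantor pairing as the coding ⟨·,·⟩ (the text leaves the isomorphism unspecified):

```python
def pair(x, y):
    # Cantor pairing, one computable coding of N x N into N (assumed).
    return (x + y) * (x + y + 1) // 2 + y

def represent(f, domain):
    """The single-valued set S = { <x, f(x)> : x in domain }."""
    return {pair(x, f(x)) for x in domain}

def is_single_valued(S, bound=200):
    """Check single-valuedness over codes with x, y < bound:
    no first coordinate may carry two different second coordinates."""
    decoded = [(x, y) for x in range(bound) for y in range(bound)
               if pair(x, y) in S]
    xs = [x for x, _ in decoded]
    return len(xs) == len(set(xs))

S = represent(lambda x: x * x, range(10))
assert is_single_valued(S)
# Adding a second value for x = 3 destroys single-valuedness.
assert not is_single_valued(S | {pair(3, 7)})
```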
Exercises
1.2.2A Let S ⊆ N be single valued, and suppose that S represents φ ∈ ℱ. Show that
a. φ ∈ ℱ^rec if and only if S ∈ RE.
b. φ is total recursive if and only if S is total and r.e.
c. if S ∈ RE and S is total, then S is recursive.
1.2.2B
a. Prove: Let f ∈ ℱ be the characteristic function for S ⊆ N. Then S ∈ RE_rec if and only if some T ∈ RE_svt represents f.
b. Show that there is a total recursive function f such that for all i ∈ N, if W_i ∈ RE_svt, then φ_{f(i)} is the characteristic function for W_i.
c. Prove: RE_svt ⊂ RE_rec.
1.3.1 Languages
1.3.2 Hypotheses
Example 1.3.3A
1.3.4 Learners
sequences of any length in any text is denoted: SEQ. SEQ may be thought of
as the set of all possible evidential states (e.g., the set of all possible finite
corpora of sentences that could be available to a child). We let the symbols
σ, τ, χ, σ′, ..., represent finite sequences.
Now let σ ∈ SEQ. The length of σ is denoted: lh(σ). The (unordered) set of sentences that constitute σ is denoted: rng(σ). We do not distinguish numbers from finite sequences of length 1. As a consequence of the foregoing conventions, note that σ ∈ SEQ is in t ∈ 𝒯 if and only if σ = t̄_{lh(σ)}.
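These conventions translate directly into code. A sketch, modeling finite sequences as Python tuples and a text as a list standing in for an infinite sequence:

```python
def lh(sigma):
    """Length of a finite sequence."""
    return len(sigma)

def rng(sigma):
    """The (unordered) set of sentences occurring in sigma."""
    return set(sigma)

def initial_segment(t, n):
    """The first n entries of the text t, written t-bar-n in the text."""
    return tuple(t[:n])

t = [5, 0, 5, 2, 0, 2]          # a finite stand-in for a text
sigma = initial_segment(t, 4)
assert lh(sigma) == 4
assert rng(sigma) == {5, 0, 2}
# sigma is "in" t: it equals the initial segment of t of its own length.
assert sigma == initial_segment(t, lh(sigma))
```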
Example 1.3.4A
Example 1.3.4B
Exercises
1.3.4A Let L be a nonempty language, and let t be a text for L. Let f be the learning function of part a of example 1.3.4B.
a. Show that L ∈ RE_fin if and only if for all but finitely many n ∈ N, rng(t̄_n) = rng(t̄_{n+1}).
b. Show that L ∈ RE_fin if and only if f(t̄_n) = f(t̄_{n+1}) for all but finitely many n ∈ N.
c. Suppose that there is n ∈ N such that for infinitely many m ∈ N, f(t̄_m) = f(t̄_n). Show that W_{f(t̄_n)} = L.
*1.3.4B A text t is called ascending if t_n ≤ t_{n+1} for all n ∈ N; t is called strictly ascending if t_n < t_{n+1} for all n ∈ N.
a. Let L be a finite language of at least two members. How many ascending texts are there for L?
b. Let L be an infinite language. How many strictly ascending texts are there for L?
These kinds of texts are treated again in section 5.5.1.
Clause ii of the definition may also be put this way: φ converges on t to i just in case φ is defined on t, and there is n ∈ N such that φ(t̄_m) = i for all m > n.
The intuition behind definition 1.4.1A is as follows. A text t is fed to a learner l one number at a time. With each new input l is faced with a new finite sequence of numbers. l is defined on t if l offers hypotheses on all of these finite sequences. If l is undefined somewhere in t, then l is "stuck" at that point, lost in endless thought about the current evidence, unable to accept more data. l converges on t to an index i just in case l does not get stuck in t, and after some finite number of inputs l conjectures i thereafter. To identify t, l must converge to an index for rng(t).
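On a finite prefix of a text one can at least watch a learner's conjectures and their "mind changes." A sketch, using a toy learner that conjectures the set of sentences seen so far (a stand-in for an index of that set, which is what a real learning function would emit):

```python
def learner(sigma):
    """Toy learner: conjecture the content of the evidence seen so far.
    (A genuine learning function would emit an index for this set.)"""
    return frozenset(sigma)

def conjectures_on(t):
    """The learner's conjecture on each initial segment of t."""
    return [learner(t[:n]) for n in range(1, len(t) + 1)]

t = [4, 1, 4, 4, 1, 1, 4]                  # a finite prefix of a text
guesses = conjectures_on(t)

# Mind changes: points where consecutive conjectures differ.
changes = [n for n in range(len(guesses) - 1) if guesses[n] != guesses[n + 1]]
assert changes == [0]                      # one change: after {4} comes {4, 1}
assert guesses[-1] == frozenset(t)         # thereafter the guess is stable
```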
Let φ ∈ ℱ identify t ∈ 𝒯. Note that definition 1.4.1A places no finite bound on the number of times that φ "changes its mind" on t. In other words, the set {n ∈ N | φ(t̄_n) ≠ φ(t̄_{n+1})} may be any finite size; it may not,
Example 1.4.1A
Exercise
1.4.1A Let t be the text 0, 1, 2, 3, 4, 5, .... Let h be as described in part c of example 1.3.4B. Does h identify t?
Children are able to learn their language on the basis of many orderings of
its sentences. Since definition 1.4.1A pertains to individual texts, it does not represent this feature of language acquisition. The next definition remedies
this defect.
Example 1.4.2A
Children are able to learn any arbitrarily selected language drawn from a
large class; that is, their acquisition mechanism is not prewired for just a
single language. Definition 1.4.2A does not reflect this fact. We are thus led
to extend the notion of identification to collections of languages.
Proof The key property of RE_svt is this. Suppose that L and L′ are members of RE_svt and that L ≠ L′. Then there are x, y, y′ ∈ N such that
⟨x, y⟩ ∈ L, ⟨x, y′⟩ ∈ L′, and y ≠ y′. Thus, if t is a text for L, there is an n ∈ N such that by looking at t̄_n we know that t is not a text for L′.
Now we define h ∈ ℱ which identifies RE_svt as follows. For all σ ∈ SEQ, let
Exercises
a. Show that if f converges on t, then {f(t̄_n) | n ∈ N} is finite. Show that the converse is false.
b. Show that if f identifies t, then W_{f(t̄_n)} = rng(t) for all but finitely many n ∈ N. Show that the converse is false.
1.4.3B Let ℒ = {N} ∪ {E | E is a finite set of even numbers}. Specify a learning function that identifies ℒ.
1.4.3C Prove: Every finite collection of languages is identifiable. (Hint: Keep in mind that a finite collection of languages is not the same thing as a collection of finite languages.)
1.4.3D Let L ∈ RE be given. Specify φ ∈ ℱ that identifies {L ∪ D | D finite}.
1.4.3E Let {S_i | i ∈ N} be any infinite collection of nonempty, mutually disjoint members of RE_rec. Let ℒ = {N − S_i | i ∈ N}. Specify a learning function that identifies ℒ.
1.4.3F Given ℒ, ℒ′ ⊆ RE, let ℒ × ℒ′ be {L × L′ | L ∈ ℒ and L′ ∈ ℒ′}. Prove: If ℒ, ℒ′ ⊆ RE are each identifiable, then ℒ × ℒ′ is identifiable.
1.4.3G
a. Prove: ℒ ⊆ RE is identifiable if and only if some total f ∈ ℱ identifies ℒ.
b. Let t ∈ 𝒯 and φ ∈ ℱ be given. We say that φ almost identifies t just in case there exists an i ∈ N such that (a) W_i = rng(t) and (b) φ(t̄_n) = i for all but finitely many n ∈ N. (Thus φ can almost identify t without being defined on t.) φ almost identifies ℒ ⊆ RE just in case φ almost identifies every text for every language in ℒ. ℒ is said to be almost identifiable just in case some φ ∈ ℱ almost identifies ℒ. Prove: ℒ ⊆ RE is almost identifiable if and only if ℒ is identifiable.
1.4.3H φ ∈ ℱ is said to P percent identify ℒ ⊆ RE just in case for every L ∈ ℒ and every text t for L, φ is defined on t, and there is i ∈ N such that (a) W_i = L and (b) there is n ∈ N such that for all m > n, φ(t̄_j) = i for P percent of {j | m ≤ j ≤ m + 99}. ℒ ⊆ RE is said to be P percent identifiable just in case some φ ∈ ℱ P percent identifies ℒ. Prove
a. if P > 50, then ℒ ⊆ RE is P percent identifiable if and only if ℒ is identifiable.
*b. if P ≤ 50, then there is ℒ ⊆ RE such that ℒ is P percent identifiable but ℒ is not identifiable.
1.4.3I φ ∈ ℱ is said to identify ℒ ⊆ RE laconically just in case for every L ∈ ℒ and every text t for L there is n ∈ N such that (a) W_{φ(t̄_n)} = L and (b) for all m > n, φ(t̄_m)↑. Prove: ℒ ⊆ RE is identifiable if and only if ℒ is identifiable laconically.
1.4.3J The property of RE_svt used in the proof of proposition 1.4.3C is that if L, L′ ∈ RE_svt, L ≠ L′, and t is a text for L, then there is an n ∈ N such that t̄_n is enough to determine that t is not a text for L′. Show that there are identifiable infinite collections of languages without this property.
1.4.3K Let φ ∈ ℱ be given. We define ℒ(φ) to be {L ∈ RE | φ identifies L}.
a. Let ψ ∈ ℱ be such that for all σ ∈ SEQ, ψ(σ) = the least n ∈ rng(σ). Characterize ℒ(ψ).
b. Show by example that for φ, ψ ∈ ℱ, ℒ(φ) = ℒ(ψ) does not imply φ = ψ.
subset of its domain. For this reason Gold (1967) refers to identification as
"identification in the limit." Because of the limiting nature of identification,
the behavior of a given learning function cp on a given text t cannot in
general be predicted from cp's behavior on any finite portion of t. The
underdetermination at issue here does not arise from the disadvantages
connected with the "external" observation of a learning function at work.
To make this clear, the next subsection discusses learning functions that
announce their own convergence and may thus be considered to observe
their own operation.
Exercise
e₀, we must have ψ(t̄′_m) = ψ(t̄′_{n+1}) = ψ(t̄_{n+1}) for all m ≥ n. But ψ(t̄_{n+1}) is an index for L, whereas t′ is a text for L ∪ {x₀}. Hence ψ does not identify L ∪ {x₀}, and so ψ does not identify RE_fin. □
Exercises
1.5.2A Let L, L′ ∈ RE be such that L ⊂ L′. Show that no self-monitoring learning function identifies {L, L′}.
1.5.2B Let ℒ be the collection of languages of proposition 1.4.3B. Show that no self-monitoring learning function identifies ℒ.
*1.5.2C Call a collection ℒ of languages easily distinguishable just in case for all L ∈ ℒ there exists a finite subset S of L such that for all L′ ∈ ℒ, if L′ ≠ L, then S ⊈ L′.
a. Specify an identifiable collection of languages that is not easily distinguishable.
b. Prove: Let ℒ ⊆ RE be given. Then some self-monitoring φ ∈ ℱ identifies ℒ if and only if ℒ is easily distinguishable.
1.5.2D φ ∈ ℱ is said to be a 1-learner just in case for all t ∈ 𝒯 there exists no more than one m ∈ N such that φ(t̄_m) ≠ φ(t̄_{m+1}). That is, a 1-learner is limited to no more than one "mind change" per text.
a. Prove: If ℒ ⊆ RE is identifiable by a self-monitoring learning function, then ℒ is identifiable by a 1-learner.
b. Show that the converse to part a is false.
1.5.2E Let i ∈ N. φ ∈ ℱ is said to be an i-learner just in case for all texts t there exist no more than i numbers m such that φ(t̄_m) ≠ φ(t̄_{m+1}). That is, an i-learner is limited to no more than i "mind changes" per text.
a. For j ∈ N define ℒ_j = {{0}, {0, 1}, {0, 1, 2}, ..., {0, 1, 2, ..., j}}. Prove: For all j ∈ N, ℒ_j is identifiable by an i-learner if and only if i ≥ j. (Hint: Suppose that φ ∈ ℱ is an i-learner and that i < j. Consider texts of the form 0, 0, ..., 0, 1, 1, ..., 1, ..., j, j, ..., j, .... What happens as the repetitions get longer and longer?)
b. For i ∈ N, let F_i be the class of i-learners. Let F = ∪_i F_i. Show that no φ ∈ F identifies RE_fin. Show that no φ ∈ F identifies {N − {x} | x ∈ N}.
1.5.2F Let e be an index for ∅. φ ∈ ℱ is said to be a one-shot learner just in case for every text t the set {φ(t̄_n) | φ(t̄_n) ≠ e} has at most one member. Let ℒ ⊆ RE be given. Show that some one-shot learner identifies ℒ if and only if some self-monitoring learning function identifies ℒ.
2 Central Theorems on Identification
Many of the theorems in this book rest on the next result. To state and prove it, we introduce some more notation. For σ, τ ∈ SEQ, let σ ⌢ τ be the result of concatenating τ onto the end of σ; thus (2, 8, 2) ⌢ (4, 1, 9, 3) = (2, 8, 2, 4, 1, 9, 3). Next, for σ, τ ∈ SEQ we write "σ ⊆ τ" if σ is an initial segment of τ, and "σ ⊂ τ" if σ is a proper initial segment of τ; thus (8, 8, 5) ⊂ (8, 8, 5, 3, 9).
Finally, let finite sequences σ⁰, σ¹, σ², ..., be given such that (1) for every i, j ∈ N either σⁱ ⊆ σʲ or σʲ ⊆ σⁱ and (2) for every n ∈ N, there is m ∈ N such that lh(σᵐ) ≥ n. Then there is a unique text t such that for all n ∈ N, σⁿ = t̄_{lh(σⁿ)}; this text is denoted: ∪_n σⁿ.
PROPOSITION 2.1A (Blum and Blum 1975) Let φ ∈ ℱ identify L ∈ RE. Then there is σ ∈ SEQ such that (i) rng(σ) ⊆ L, (ii) W_{φ(σ)} = L, and (iii) for all τ ∈ SEQ, if rng(τ) ⊆ L, then φ(σ ⌢ τ) = φ(σ).
Proof (We follow Blum and Blum.) Assume that the proposition is false; that is, that there is no σ ∈ SEQ satisfying (i), (ii), and (iii). This implies the following condition:
(*) For every χ ∈ SEQ such that rng(χ) ⊆ L and W_{φ(χ)} = L, there is some τ ∈ SEQ such that rng(τ) ⊆ L and φ(χ ⌢ τ) ≠ φ(χ).
We show that (*) implies the existence of a text t for L which φ does not identify, contrary to the hypothesis that φ identifies L. Let s = s₀, s₁, s₂, ..., be a text for L. We construct t in stages 0, 1, 2, ..., at each stage n specifying a sequence σⁿ which is in t.
Thus proposition 2.1A can be put this way: if φ ∈ ℱ identifies L ∈ RE, then there is a locking sequence for φ and L.
As a corollary to the proof of proposition 2.1A, we have the following.
COROLLARY 2.1A Let φ ∈ ℱ identify L ∈ RE. Let σ ∈ SEQ be such that rng(σ) ⊆ L. Then there is τ ∈ SEQ such that σ ⌢ τ is a locking sequence for φ and L.
(**) For every χ ⊇ σ with χ ∈ SEQ such that rng(χ) ⊆ L and W_{φ(χ)} = L, there is some τ ∈ SEQ such that rng(τ) ⊆ L and φ(χ ⌢ τ) ≠ φ(χ).
Note that proposition 2.1A does not characterize φ's behavior on elements drawn from the complement of L. In particular, if τ ∈ SEQ is such that rng(τ) ⊈ L ∈ RE, then even if σ ∈ SEQ is a locking sequence for φ ∈ ℱ and L, φ(σ ⌢ τ) may well differ from φ(σ).
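For simple learners and finite languages, the locking-sequence clauses can be checked by brute force over short extensions. A sketch, necessarily heuristic since clause iii quantifies over all of SEQ; the learner here conjectures the content of its evidence (a stand-in for an index of that set):

```python
from itertools import product

def learner(sigma):
    """Toy learner: conjecture the content of sigma."""
    return frozenset(sigma)

def looks_locking(sigma, L, max_len=3):
    """Test clauses (i)-(iii) of proposition 2.1A for this learner,
    checking (iii) only over extensions tau of length <= max_len."""
    if not set(sigma) <= L:                      # (i) rng(sigma) in L
        return False
    if learner(sigma) != frozenset(L):           # (ii) conjecture names L
        return False
    for k in range(1, max_len + 1):              # (iii) stability under tau
        for tau in product(sorted(L), repeat=k):
            if learner(tuple(sigma) + tau) != learner(sigma):
                return False
    return True

L = {1, 2, 3}
assert looks_locking((1, 2, 3), L)    # all of L seen: conjecture is locked
assert not looks_locking((1, 2), L)   # clause ii fails: conjecture is {1, 2}
```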
Example 2.1A
Exercises
2.1A Let σ be a locking sequence for φ ∈ ℱ and L ∈ RE. Let τ ∈ SEQ be such that rng(τ) ⊆ L. Show that σ ⌢ τ is a locking sequence for φ and L. Distinguish this result from corollary 2.1A.
2.1B Refute the converse to proposition 2.1A. In other words, exhibit φ ∈ ℱ, L ∈ RE, and σ ∈ SEQ such that σ is a locking sequence for φ and L, but φ does not identify L.
2.1C Let φ ∈ ℱ identify L ∈ RE. Let t be a text for L. t is called a locking text for φ and L just in case there exists n ∈ N such that t̄_n is a locking sequence for φ and L. Provide a counterexample to the following conjecture: If φ ∈ ℱ identifies L ∈ RE, then every text for L is a locking text for φ and L.
Proposition 2.1A may now be used to show that certain simple collections
of languages are unidentifiable.
PROPOSITION 2.2A
Proof
i. Suppose for a contradiction that φ ∈ ℱ identifies RE_fin ∪ {N}, and let σ be a locking sequence for φ and N. Note that rng(σ) ∈ RE_fin. Clearly there is a text t for rng(σ) such that t̄_{lh(σ)} = σ. But then φ does not identify rng(σ) since φ converges on t to an index for N.
ii. Again, suppose that σ is a locking sequence for φ ∈ ℱ and N, where φ identifies ℒ ∪ {N}. Choose x ∉ rng(σ). Then, on any text t for N − {x} such that t̄_{lh(σ)} = σ, φ converges to an index for N and not one for N − {x}. □
Exercises
2.2A (Gold 1967) Let L be an arbitrary infinite language. Show that RE_fin ∪ {L} is not identifiable.
2.2B Let ℒ ⊆ RE be such that for every σ ∈ SEQ there is L ∈ ℒ such that rng(σ) ⊆ L and L ≠ N. Show that ℒ ∪ {N} is not identifiable. (This abstracts the content of proposition 2.2A.)
2.2C
a. Let i₀ ∈ N be given. Define ℒ = {N − D | D ⊆ N has exactly i₀ members}. Show that ℒ is identifiable.
b. Let i₀, j₀ ∈ N be such that i₀ ≠ j₀. Define ℒ = {N − D | D ⊆ N has either exactly i₀ members or exactly j₀ members}. Prove that ℒ is not identifiable.
2.2D Exhibit φ, ψ ∈ ℱ such that ℒ(φ) ∪ ℒ(ψ) is not identifiable. (For notation, see exercise 1.4.3K.) This shows that the identifiable subsets of RE are not closed under union.
2.2E
a. Let ℒ ⊆ RE be an identifiable collection of infinite languages. Show that there is some infinite L ∉ ℒ such that ℒ ∪ {L} is identifiable. (Hint: First use proposition 2.1A to argue that if L₀ ∈ ℒ, then there is an x₀ ∈ L₀ such that if L = L₀ − {x₀}, then L is not a member of ℒ. Next define a function ψ ∈ ℱ that identifies ℒ ∪ {L} by modifying the output of a function φ ∈ ℱ that identifies ℒ.)
b. ℒ ⊆ RE is called saturated just in case ℒ is identifiable and no proper superset of ℒ is identifiable. Prove: ℒ ⊆ RE is saturated if and only if ℒ = RE_fin.
*2.2F 𝒮 ⊆ ℱ is said to team identify ℒ ⊆ RE just in case for every L ∈ ℒ there is φ ∈ 𝒮 such that φ identifies L. Show that no finite subset of ℱ team identifies RE. (See also exercise 6.2.11.)
That is, finite variants differ by only finitely many members. Thus E ∪ {3, 5, 7} and E − {2, 4, 6, 8} are finite variants, where E is the set of even numbers. Note that any pair of finite languages are finite variants.
PROPOSITION 2.3A (Wiehagen 1978) There is ℒ ⊆ RE such that (i) for every L ∈ RE there is L′ ∈ ℒ such that L and L′ are finite variants and (ii) ℒ is identifiable.
LEMMA 2.3A (Recursion Theorem) Let total f ∈ ℱ^rec be given. Then there exists n ∈ N such that φ_n = φ_{f(n)} (and so W_{f(n)} = W_n).
Proof See Rogers (1967, sec. 11.2, theorem I). □
DEFINITION 2.3B
LEMMA 2.3B For every L ∈ RE there is L′ ∈ RE_sd such that L and L′ are finite variants.
Proof Fix L ∈ RE. Define a recursive function f by the condition that for all n ∈ N, W_{f(n)} = (L ∪ {n}) ∩ {n, n + 1, n + 2, ...}. That such an f exists is a consequence of the definition of acceptable indexing, definition 1.2.1A(ii). To see this, if L = W_{i₀}, there is j₀ ∈ N such that for all n, x ∈ N:

φ_{j₀}(⟨n, x⟩) = φ_{i₀}(x), if x > n;
φ_{j₀}(⟨n, x⟩) = 1, if x = n;
φ_{j₀}(⟨n, x⟩)↑, if x < n.
Exercises
2.3A Show that for no L ∈ RE does RE_sd include every finite variant of L.
2.3B Specify ℒ ⊆ RE such that (a) for all L, L′ ∈ ℒ, if L ≠ L′, then L and L′ are not finite variants, and (b) ℒ is not identifiable.
The next proposition provides a necessary and sufficient condition for the
identifiability of a collection of languages.
PROPOSITION 2.4A (Angluin 1980) ℒ ⊆ RE is identifiable if and only if for all L ∈ ℒ there is a finite subset D of L such that for all L′ ∈ ℒ, if D ⊆ L′, then L′ ⊄ L.
Proof First suppose that ℒ ⊆ RE is identifiable, and let φ ∈ ℱ witness this. By proposition 2.1A, for each L ∈ ℒ choose a locking sequence σ_L for φ and L. Since for each L ∈ ℒ, rng(σ_L) is a finite subset of L, it suffices to prove that for all L′ ∈ ℒ, if rng(σ_L) ⊆ L′, then L′ ⊄ L. Suppose otherwise for some L, L′ ∈ ℒ, and let t be a text for L′ such that t̄_{lh(σ_L)} = σ_L. Then φ converges on t to an index for L ≠ L′ = rng(t). Thus φ fails to identify L′, contradicting our choice of φ.
For the other direction, suppose that for every L ∈ ℒ there is a finite D_L ⊆ L such that D_L ⊆ L′ and L′ ∈ ℒ implies L′ ⊄ L. We define f ∈ ℱ as follows. For all σ ∈ SEQ,
To see that f identifies ℒ, fix L ∈ ℒ, and let t be a text for L. Let i be the least index for L. Then there is an n ∈ N such that
1. rng(t̄_n) ⊇ D_L;
2. if j < i, W_j ∈ ℒ, and L ⊈ W_j, then rng(t̄_n) ⊈ W_j.
We claim that f(t̄_m) = i for all m ≥ n. By 1 and the fact that t is a text for L, f will conjecture i on t̄_m unless there is j < i such that W_j = L′ ∈ ℒ and D_{L′} ⊆ rng(t̄_m) ⊆ L′. If rng(t̄_m) ⊇ D_{L′}, then L ⊇ D_{L′}, so by the condition on D_{L′}, L ⊄ L′; since j < i and i is the least index for L, L′ ≠ L, and hence L ⊈ L′. But then by 2, rng(t̄_m) ⊈ W_j. Thus on t̄_m, f will not conjecture j for any j < i. □
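For a finite collection of finite languages, the tell-tale sets D_L and a learner in the spirit of the proof can be computed outright. A sketch only: the conjecture is the language itself, standing in for its least index, and the order of the family list stands in for the ordering of indices.

```python
from itertools import chain, combinations

def tell_tale(L, family):
    """A smallest finite D within L such that no L' in the family with
    D within L' is a proper subset of L.  (For finite L such a D always
    exists: L itself works.)"""
    elems = sorted(L)
    for D in chain.from_iterable(combinations(elems, r)
                                 for r in range(len(elems) + 1)):
        D = set(D)
        if all(not (D <= Lp and Lp < L) for Lp in family):
            return D
    return set(L)

def f(sigma, family):
    """Learner in the spirit of the proof: conjecture the first language
    whose tell-tale is contained in the data and which contains the data."""
    data = set(sigma)
    for L in family:
        if tell_tale(L, family) <= data <= L:
            return L
    return None

family = [{0}, {0, 1}, {0, 1, 2}]       # a chain, so tell-tales matter
assert f((0, 1, 0, 1), family) == {0, 1}
assert f((0, 0, 0), family) == {0}
```

On a chain like this one, the tell-tale of {0, 1} is {1}, which prevents the learner from being lured back to the smaller language {0} once a 1 has appeared.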
Exercises
2.4A Specify a collection of finite sets meeting the conditions of proposition 2.4A with respect to
a. RE_fin.
b. {N − {x} | x ∈ N}.
2.4B
a. Use proposition 2.4A to provide alternative proofs of propositions 1.4.3A, 1.4.3B,
and 1.4.3C.
b. Use proposition 2.4A to provide an alternative proof of proposition 2.2A.
Example 2.5A
a. Let L be the finite language {2, 4, 6}. Then S(L) is the finite, single-valued language {⟨2, 0⟩, ⟨4, 0⟩, ⟨6, 0⟩}.
b. S(N) is the set of numbers ⟨x, y⟩ such that y = 0. Note that S(N) is total, whereas for all other L ∈ RE, S(L) is not total.
Proof Given σ ∈ SEQ, say σ = (x₀, ..., xₙ), define S(σ) = (⟨x₀, 0⟩, ..., ⟨xₙ, 0⟩). Similarly, if σ = (⟨x₀, y₀⟩, ..., ⟨xₙ, yₙ⟩), define P(σ) = (x₀, ..., xₙ). Let g, h ∈ ℱ be such that for all i ∈ N, W_{g(i)} = S(W_i) and W_{h(i)} = P(W_i).
Now suppose that ℒ ⊆ RE is identified by φ ∈ ℱ. Let ψ ∈ ℱ be such that for all σ ∈ SEQ,
ψ(σ) = g(φ(P(σ))).
It is clear that ψ identifies S(ℒ).
Similarly, if ψ ∈ ℱ identifies S(ℒ), let φ ∈ ℱ be such that for all σ ∈ SEQ,
φ(σ) = h(ψ(S(σ))).
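The maps S and P on finite sequences are easy to realize. A sketch, assuming Cantor pairing for the coding ⟨·,·⟩ (the text leaves the coding unspecified), which confirms that P inverts S on sequences:

```python
def pair(x, y):
    # Cantor pairing as the coding <x, y>; an assumed choice.
    return (x + y) * (x + y + 1) // 2 + y

def unpair(n):
    # Inverse of pair.
    w = int(((8 * n + 1) ** 0.5 - 1) / 2)
    y = n - w * (w + 1) // 2
    return w - y, y

def S(sigma):
    """Map (x0, ..., xn) to (<x0, 0>, ..., <xn, 0>)."""
    return tuple(pair(x, 0) for x in sigma)

def P(sigma):
    """Map (<x0, y0>, ..., <xn, yn>) to (x0, ..., xn)."""
    return tuple(unpair(n)[0] for n in sigma)

sigma = (3, 1, 4, 1, 5)
assert P(S(sigma)) == sigma
# A learner for the S(L)-languages can then be built from one for L in
# the manner of the proof: strip the codes with P, consult the given
# learner, and recode its conjecture.
```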
Exercises
functions can be interpreted not as r.e. indexes but as codes for such finite
arrays, since finite arrays of the sort envisioned are readily coded as single
natural numbers. On the other hand, we might simply choose as the child's
"official" conjecture at a given moment the grammar assigned highest
subjective probability at that moment.
Consider next children growing up in multilingual environments. Such
children simultaneously master more than one language and hence convert
their current linguistic input into more than one, noncompeting gram-
matical hypothesis. To represent this situation, we must assume that inputs
from different languages are segregated by the child prior to grammatical
analysis (perhaps by superficial characteristics of the wave form or the
speaker). Linguistic development may then be conceived as the simulta-
neous application of the same learning function to texts for different
languages.
Clearly the general framework of learning theory can be adapted to a wide variety of empirical demands of the kind just considered. Consequently in the sequel we shall not pause to refine our models in these directions; specifically, we shall continue to treat conjectures straightforwardly as (single) r.e. indexes.
considered the maturational successor to φ ∈ ℱ, and let ψ begin its operation at the nth moment of childhood. Then the child may be thought of as implementing the single function θ ∈ ℱ such that for all σ ∈ SEQ,
3.2.4 Idealization
Texts are infinitely long, and convergence takes forever. These features of
identification will be generalized to all the paradigms discussed in this
book. However, language acquisition is a finite affair, so learning theory (at
least as developed here) might seem from the outset to have little bearing on
linguistic development and comparative grammar.
Two replies to this objection may be considered. First, although convergence is an infinite process, the onset of convergence occurs only finitely far into an identified text. What is termed "language acquisition" may be taken to be the acquisition of a grammar that is accurate and stable in the face of new inputs from the linguistic environment; such a state is reached at the onset of convergence, not at the end. Moreover, although it is true that identification places no bound on the time to convergence, we shall later consider paradigms that do begin to approximate the constraints on time and space under which the acquisition of natural language actually takes place. Further development of the theory in this direction may be possible as more information about children becomes available.
This first reply notwithstanding, convergence involves grammatical stability over infinitely many inputs, and such ideal behavior may seem removed from the reality of linguistic development. We therefore reply, second, that learning theory is best interpreted as relevant to the design of a
Thus [𝒮] is the family of all collections ℒ of languages such that some learning function in the strategy 𝒮 identifies ℒ. [𝒮]_svt is just [𝒮] ∩ 𝒫(RE_svt), that is, the family of all collections ℒ of total, single-valued languages such that some learning function in the strategy 𝒮 identifies ℒ.
Example 4.1A
In this chapter we consider the inclusion relations between [𝒮] and [𝒮′] as 𝒮 and 𝒮′ vary over learning strategies. Informally, we say that 𝒮 restricts 𝒮′ just in case [𝒮 ∩ 𝒮′] ⊂ [𝒮′]. If [𝒮] ⊂ [ℱ], then 𝒮 is said to be restrictive. Similar terminology applies to [𝒮]_svt. One last notational convention will be helpful.
DEFINITION 4.1C Let P be a property of learning functions. Then the set {φ ∈ ℱ | P is true of φ} is denoted: ℱ^P.

Thus the set of recursive learning functions is denoted "ℱ^recursive," which we will continue to write as "ℱ^rec."
All the strategies to be examined may be viewed as constraints of one
kind or another on the behavior of learning functions. Five kinds of
constraints are considered, corresponding to the five sections that follow.
Before turning to these constraints, we conclude this section with a general fact about strategies.

PROPOSITION 4.1A Let 𝒮 be a denumerable subset of ℱ. Then [𝒮] ⊂ [ℱ].
Exercises
4.2.1 Computability
initial segment of t constructed by the end of stage s, σ^s, is equal to ⟨0, x₀⟩, ⟨1, x₁⟩, …, ⟨n, xₙ⟩ for some n. (We will also have that each xᵢ is equal to 0 or 1.) We rely on the following claim.
Claim Given σ = ⟨0, x₀⟩, ⟨1, x₁⟩, …, ⟨n, xₙ⟩, there are numbers j and k such that if τ = σ ⌢ ⟨n + 1, 0⟩, …, ⟨n + j, 0⟩ and τ′ = τ ⌢ ⟨n + j + 1, 1⟩, …, ⟨n + j + k, 1⟩, then φ(τ) ≠ φ(τ′).

Stage 0 σ⁰ = ⟨0, 0⟩.

Stage s + 1 Given σ^s, let j and k be as in the claim, using σ^s for σ. Define σ^{s+1} to be the resultant τ′.

It is clear that t = ⋃_s σ^s is a text for some L ∈ RE_svt. However, φ does not converge on t, since φ changes its value at least once for each s ∈ N. □
Exercises
4.2.1A
a. Prove that {K ∪ {x} | x ∈ K} ∈ [ℱ^rec].
b. Let L ∈ RE be recursive. Prove that {L ∪ D | D ⊆ N and D finite} ∈ [ℱ^rec].
c. Prove that {N} ∪ {D | D finite and D ⊆ K} ∪ {D | D finite and D ⊆ K̄} ∈ [ℱ^rec].
d. Prove that {N − {x} | x ∈ K} ∪ {N − {x, y} | x ≠ y and x, y ∈ K̄} ∈ [ℱ^rec].
e. Prove that {N − {x} | x ∈ K} ∪ {N − {x, y} | x ≠ y and x, y ∈ K} ∈ [ℱ^rec].
Compare exercise 2.2C.
4.2.1B For ℒ′, ℒ″ ⊆ RE, define ℒ′ × ℒ″ as in exercise 1.4.3F. Prove that if ℒ′ ∈ [ℱ^rec] and ℒ″ ∈ [ℱ^rec], then ℒ′ × ℒ″ ∈ [ℱ^rec].

*4.2.1C Prove: Let ℒ ∈ [ℱ^rec]_svt. Then there is φ ∈ ℱ^rec such that (a) φ identifies ℒ, and (b) for all L ∈ ℒ, there is i ∈ N such that for all texts t for L, φ converges on t to i. (Hint: Fix φ′ ∈ ℱ^rec which identifies ℒ. Define φ ∈ ℱ^rec which uses φ′ to compute its guesses. φ rearranges the incoming text and feeds the rearranged text to φ′.) Compare section 4.6.3.
4.2.1D Let ℒ ∈ [ℱ^rec]_svt be given. Show that there is ℒ′ ∈ [ℱ^rec]_svt such that ℒ ⊂ ℒ′. (Hint: Let φ ∈ ℱ^rec identify ℒ ⊆ RE_svt. Use the proof of proposition 4.2.1A to construct L ∉ ℒ and φ′ ∈ ℱ^rec which identifies ℒ ∪ {L}.) Compare this result to exercise 2.2E.

4.2.1F For 𝒮 ⊆ ℱ, let [𝒮]_rec = [𝒮] ∩ 𝒫(RE_rec). Prove that [ℱ^rec]_rec ⊂ [ℱ]_rec. (Hint: Use corollary 4.2.1A and exercise 1.2.2B.)

4.2.1H Let RE_init = {{0, 1, 2, …, n} | n ∈ N}. RE_init thus consists of the initial segments of N. Prove:
DEFINITION 4.2.2A (Blum 1967) A listing Φ₀, Φ₁, …, of partial recursive functions is called a computational complexity measure (relative to our fixed acceptable indexing of 𝒫^rec) just in case it satisfies the following two conditions:

i. For all i, x ∈ N, φᵢ(x)↓ if and only if Φᵢ(x)↓.
ii. The set {⟨i, x, y⟩ | Φᵢ(x) ≤ y} is recursive.
To exemplify this definition, suppose that 𝒫^rec is indexed by associated Turing machines (see section 1.2.1). Then Φᵢ may be thought of as the function that counts the steps required in running the ith Turing machine; specifically, for i, x, y ∈ N, Φᵢ(x) = y just in case the ith Turing machine halts in exactly y steps when started with input x. Condition i of the definition
requires that Φᵢ(x) be undefined just in case the ith Turing machine never halts on x. Condition ii requires that it be possible to determine effectively whether the ith Turing machine halts on x within y steps. Both requirements are satisfied by the suggested interpretation of Φᵢ. Moreover, it appears that any reasonable measure of the resources required for a computation must also conform to these conditions.
As with acceptable indexings, none of our results depend on the choice of computational complexity measure. Indeed, any two computational complexity measures can be shown, in a satisfying sense, to yield similar estimates of the resources required for a computation (see Machtey and Young 1978, theorem 5.2.4). Let a fixed computational complexity measure now be selected; reference to the functions Φ₀, Φ₁, …, should henceforth be understood accordingly.
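The two conditions of definition 4.2.2A can be made concrete in a toy model, which is our own illustration rather than part of the formal development. Here "machines" are Python generators that yield once per step, Φᵢ(x) is the number of yields before halting, and the bounded simulation in steps_within is exactly the decidability demanded by condition ii:

```python
def double(x):
    # halts after exactly x steps, returning 2x
    for _ in range(x):
        yield
    return 2 * x

def diverge_on_odd(x):
    # never halts on odd input; takes one step on even input
    while x % 2 == 1:
        yield
    yield
    return x

PROGRAMS = [double, diverge_on_odd]   # a toy (finite) indexing

def steps_within(i, x, y):
    # decide whether Phi_i(x) <= y by running program i on input x
    # for at most y steps (condition ii: this relation is recursive)
    g = PROGRAMS[i](x)
    for _ in range(y + 1):
        try:
            next(g)
        except StopIteration as halt:
            return True, halt.value
    return False, None
```

Condition i also holds in the toy model: the step count is defined exactly when the program halts.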
These preliminaries allow us to define the following class of strategies.
DEFINITION 4.2.2B Let h ∈ ℱ^rec be total. ψ ∈ ℱ^rec is said to run in h-time just in case ψ is total and there is i ∈ N such that (i) φᵢ = ψ, and (ii) Φᵢ(x) ≤ h(x) for all but finitely many x ∈ N. The subset of ℱ^rec that runs in h-time is denoted ℱ^{h-time}.

Note that for any total h ∈ ℱ^rec, ℱ^{h-time} consists exclusively of total recursive functions.
Intuitively, a learning function in ℱ^{h-time} can be programmed to respond to finite sequence σ within h(σ) steps of operation (recall from section 1.3.4 that "σ" in "h(σ)" denotes the number that codes σ). The strategy ℱ^{h-time} corresponds to the hypothesis that children deploy limited resources in formulating grammars on the basis of finite corpora. The limitation is given by h.
Does ℱ^{h-time} restrict ℱ^rec regardless of the choice of total recursive function h? The following result suggests an affirmative answer.
LEMMA 4.2.2A (Blum 1967a) For every total h ∈ ℱ^rec there is recursive L ∈ RE such that no characteristic function for L runs in h-time.

PROPOSITION 4.2.2A There is total h ∈ ℱ^rec such that [ℱ^{h-time}] = [ℱ^rec].
LEMMA 4.2.2B There is total f ∈ ℱ^rec such that for all i ∈ N, (i) φ_{f(i)} is total recursive, and (ii) for all L ∈ RE, if φᵢ identifies L, then φ_{f(i)} identifies L.

Proof of the lemma Given i, we would like to define φ_{f(i)} so that φ_{f(i)} identifies at least as many languages as φᵢ but φ_{f(i)} is total. Thus we would like φ_{f(i)}(σ) to simulate φᵢ(σ) but not to wait forever if φᵢ(σ) doesn't converge. Therefore on input σ we will only allow φ_{f(i)} to wait lh(σ) many steps for φᵢ to converge. Now, φᵢ(σ) may not converge in lh(σ) many steps for any σ, but, if φᵢ(σ) converges, there is a k such that φᵢ(σ) converges in k steps. Thus, in defining φ_{f(i)}(σ), we will allow the simulation of φᵢ to "fall back on the text", that is, to compute only φᵢ(σ̄) for some initial segment σ̄ of σ. Precisely, define

φ_{f(i)}(σ) = φᵢ(σ̄), where σ̄ is the longest initial segment of σ such that Φᵢ(σ̄) ≤ lh(σ), if such exists; φ_{f(i)}(σ) = 0 otherwise.

φ_{f(i)} is a total recursive function for every i. The condition defining σ̄ can be checked recursively, since we have bounded the waiting time by lh(σ).
To see that φ_{f(i)} identifies any language L that φᵢ identifies, let t be a text for such an L. Then there is an n ∈ N and an index j for L such that for all m ≥ n, φᵢ(t̄_m) = j. Let s = Φᵢ(t̄_n). Then by the definition of φ_{f(i)}, if m > s, n, φ_{f(i)}(t̄_m) = φᵢ(t̄_k) for some k ≥ n. Thus φ_{f(i)} converges on t to j. □
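The "fall back on the text" device can be sketched concretely. Below, a partial learner is modeled as returning either None (divergence) or a pair (steps, guess); totalize grants a budget of lh(σ) steps and retreats to the longest affordable initial segment. The modeling conventions (the pair encoding of running time, the fallback conjecture 0, the sample learner) are our assumptions, not the book's:

```python
def totalize(phi):
    # phi(sigma) -> None (divergence) or (steps, guess)
    def phi_total(sigma):
        budget = len(sigma)                      # lh(sigma) steps allowed
        for n in range(len(sigma), -1, -1):      # longest segment first
            result = phi(tuple(sigma[:n]))
            if result is not None and result[0] <= budget:
                return result[1]                 # phi's guess on sigma[:n]
        return 0                                 # arbitrary fallback value
    return phi_total

def phi(sigma):
    # a sample partial learner: "diverges" on odd-length input,
    # otherwise takes len(sigma) steps and guesses sum(sigma)
    if len(sigma) % 2 == 1:
        return None
    return (len(sigma), sum(sigma))
```

On a text where the original learner converges, the totalized learner converges to the same index, only later, exactly as in the proof.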
Exercise
Why study strategies that are not subsets of ℱ^rec? For those convinced that human intellectual capacities are computer simulable, nonrecursive learning functions might seem to be of scant empirical interest. Many of the strategies we consider are in fact subsets of ℱ^rec.
Nonrecursive learning functions will continue, however, to figure prominently in our discussion. The reason for this is not simply the lack of persuasive argumentation in favor of the view that human mentality is machine simulable. More important, consideration of nonrecursive learning functions often clarifies the respective roles of computational and information-theoretic factors in nonlearnability phenomena. To see what is at issue, compare the collections ℒ = {N} ∪ RE_fin and ℒ′ = {K ∪ {x} | x ∈ N}. By proposition 2.2A(i) and lemma 4.2.1B, respectively, no φ ∈ ℱ^rec identifies either collection. However, the reasons for the unidentifiability differ in the two cases. On the one hand, ℒ′ presents a recursive learning function with an insurmountable computational problem, whereas the computational structure of ℒ is trivial. On the other hand, ℒ presents the learner with an insurmountable informational problem: no σ ∈ SEQ allows the finite and infinite cases to be distinguished (cf. proposition 2.4A). In contrast, no such informational problem exists for ℒ′; the available information simply cannot be put to use by a recursive learning function.
The results to be presented concerning nonrecursive learning functions may all be interpreted from this information-theoretic point of view.
Let ℒ ⊆ RE be identifiable, and let σ ∈ SEQ and i ∈ N be given. From exercise 1.5.1A we see that some φ ∈ ℱ such that φ(σ) = i identifies ℒ. Put differently, from the premise that φ ∈ ℱ identifies ℒ ⊆ RE, no information may be deduced about φ(σ) for any σ ∈ SEQ, except that φ(σ)↓ if σ is drawn from a language in ℒ. In this section we consider the effects on identification of constraining in various ways the learner's potential response to evidential states.
4.3.1 Totality
The most elementary constraint on a conjecture is that it exist. The corresponding strategy is the set of total learning functions, denoted ℱ^total. From part a of exercise 1.4.3G we have proposition 4.3.1A.
4.3.2 Nontriviality
DEFINITION 4.3.2A φ ∈ ℱ is called nontrivial just in case for all σ ∈ SEQ, W_φ(σ) is infinite.

Thus the strategy of nontriviality contains just those φ ∈ ℱ such that φ never conjectures an index for a finite language. Note that nontrivial learners are total. The learning function g defined in the proof of proposition 1.4.3B is nontrivial.
Obviously nontriviality is restrictive: finite languages cannot be identi-
fied without conjecturing indexes for them. Of more interest is the relation
of nontriviality to the identification of infinite languages. The next propo-
sition shows that nontriviality imposes limits on the recursive learning
functions in this respect; that is, some collections of infinite languages are
identifiable by recursive learning function but not by nontrivial, recursive
learning function.
Thus ℒ ⊆ RE is r.e. indexable just in case there is an r.e. set S such that for all L ∈ RE, L ∈ ℒ if and only if L = Wᵢ for some i ∈ S (S is not required to contain every index for L).
Proof of the lemma Let S be an r.e. set of indexes for infinite sets. Let e₀, e₁, …, be a recursive enumeration of S. We show how to enumerate an infinite r.e. set A such that no index for A is in S. We enumerate A in stages.
Now, given the claim, we complete the proof of the proposition as follows. Suppose for a contradiction that φ identifies ℒ. Let g(S) = {g(i) | i ∈ S}. Since φ is nontrivial, property 1 of g implies that g(S) contains only indexes for infinite sets. Since φ identifies ℒ, and since every infinite Wᵢ is in ℒ, for each such Wᵢ there is a j ∈ S such that Wⱼ = Wᵢ. Then by (2) of the claim, g(j) is an index for Wᵢ. Thus g(S) is an r.e. set containing indexes for all and only the infinite r.e. sets, contradicting lemma 4.3.2A. □
Exercises
4.3.2A Let ℒ be a collection of infinite languages. Prove that ℒ is identifiable if and only if some nontrivial φ ∈ ℱ identifies ℒ. Compare this result to proposition 4.3.2A.

4.3.2B Let 𝒮 ⊆ ℱ be such that some ℒ ∈ [𝒮] is infinite. Show that not every ℒ ∈ [𝒮] is r.e. indexable.

*4.3.2C Let ℒ be as defined in the proof of proposition 4.3.2A. The function f ∘ h defined therein is such that ℒ ⊂ ℒ(f ∘ h). Show that there is φ ∈ ℱ^rec such that ℒ = ℒ(φ).

*4.3.2D φ ∈ ℱ is called nonexcessive just in case for all σ ∈ SEQ, W_φ(σ) ≠ N. Prove: For all ℒ ⊆ RE, if N ∉ ℒ, then ℒ ∈ [ℱ^rec ∩ ℱ^nonexcessive] if and only if ℒ ∈ [ℱ^rec].

4.3.2E (John Canny) φ ∈ ℱ is said to be weakly nontrivial just in case for all infinite L ∈ ℒ(φ), W_φ(t̄_n) is infinite for all n ∈ N and all texts t for L. Nontriviality implies weak nontriviality. Show that for some collection ℒ ⊆ RE of infinite languages, ℒ ∈ [ℱ^rec] − [ℱ^rec ∩ ℱ^{weakly nontrivial}].
4.3.3 Consistency
That is, the conjectures of a consistent learner always generate the data seen so far. Note that consistent learning functions are total.
Example 4.3.3A
There are certainly ℒ ⊆ RE such that (1) ℒ ∈ [ℱ^rec], and (2) ℒ includes nonrecursive languages. One such collection is {K}! Hence proposition 4.3.3A yields the following corollary.

COROLLARY 4.3.3A [ℱ^rec ∩ ℱ^consistent] ⊂ [ℱ^rec].
LEMMA 4.3.3A Let h(j, k) be a total recursive function, and let functions fⱼ ∈ ℱ^rec be defined by fⱼ(k) = h(j, k) for all k. Then there is a recursive set S such that fⱼ is not the characteristic function of S for any j.
Recall the definition of r.e. indexable (definition 4.3.2B). Although the recursive sets are r.e. indexable (exercise 4.3.3D), lemma 4.3.3A says that the recursive sets are not r.e. indexable as recursive sets. In other words, there is no r.e. set of indexes of characteristic functions containing at least one index for a characteristic function of each recursive set.
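The diagonal argument behind lemma 4.3.3A is short enough to sketch in code. Given a total h whose rows fⱼ(k) = h(j, k) are the candidate characteristic functions, the set S below is arranged to differ from row k at argument k. The sample h is our own; only the diagonal construction is the lemma's:

```python
def diagonal_chi(h):
    # characteristic function of a set S with chi_S(k) != f_k(k) = h(k, k)
    def chi_S(k):
        return 1 if h(k, k) == 0 else 0
    return chi_S

# a sample total recursive h; its rows are f_j(k) = (j + k) % 2
h = lambda j, k: (j + k) % 2
chi = diagonal_chi(h)
```

Since chi_S disagrees with fⱼ at j for every j, no row of h is the characteristic function of S, yet chi_S is plainly recursive whenever h is.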
Exercises
4.3.3A φ ∈ ℱ is said to be conditionally consistent just in case for all σ ∈ SEQ, if φ(σ)↓, then rng(σ) ⊆ W_φ(σ).
a. Refute the following variant of proposition 4.3.3A: Let conditionally consistent φ ∈ ℱ^rec identify ℒ ⊆ RE. Then ℒ ⊆ RE_rec.
b. Prove the following variant of corollary 4.3.3A: [ℱ^rec ∩ ℱ^{conditionally consistent}] ⊂ [ℱ^rec].
c. Prove the following variant of proposition 4.3.3B: there is ℒ ⊆ RE_rec such that ℒ ∈ [ℱ^rec] − [ℱ^rec ∩ ℱ^{conditionally consistent}]. (Hint: Add N to the collection ℒ defined in the proof of proposition 4.3.3B.)
4.3.3B Prove proposition 4.3.3C using the proof of proposition 4.3.3B as a model.
4.3.3C Let ℒ ∈ [ℱ^rec ∩ ℱ^consistent]_svt be given. Show that for any L ∈ RE_svt, ℒ ∪ {L} ∈ [ℱ^rec ∩ ℱ^consistent]_svt.

*4.3.3D Show that RE_rec is r.e. indexable. (Hint: See Rogers 1967, exercise 5-6, p. 73.)

4.3.3E (Ehud Shapiro 1981) Let ℒ ⊆ RE and total h ∈ ℱ^rec be given, and suppose that ℒ ∈ [ℱ^{h-time} ∩ ℱ^consistent]. Show that there is a total g ∈ ℱ^rec such that for all L ∈ ℒ, some characteristic function for L runs in g-time.
Suppose that φ ∈ ℱ is defined on σ ∈ SEQ. Call φ(σ) a "wild guess" (with respect to φ) if φ does not identify W_φ(σ). In this section we consider learning functions that do not make wild guesses.

DEFINITION 4.3.4A φ ∈ ℱ is called prudent just in case for all σ ∈ SEQ, if φ(σ)↓ then φ identifies W_φ(σ).
Exercises
4.3.4A Show that the function f defined in the proof of proposition 2.3A is not
prudent.
4.3.4D Show that for every φ ∈ ℱ^rec ∩ ℱ^prudent, ℒ(φ) is r.e. indexable. Conclude that ℱ^rec ∩ ℱ^prudent is r.e. bounded.
4.3.5 Accountability
PROPOSITION 4.3.5A There is ℒ ⊆ RE such that (i) every L ∈ ℒ is infinite, and (ii) ℒ ∈ [ℱ^rec] − [ℱ^rec ∩ ℱ^accountable].

Thus the conjectures of a Popperian learning function are limited to indexes for total, single-valued languages. The function h in the proof of proposition 1.4.3C is Popperian.
An index for a member S of RE_svt can be mechanically converted into an index for the characteristic function of S (see part b of exercise 1.2.2B). As a consequence it is easy to test the accuracy of such an index against the data provided by a finite sequence. Such testability motivates the terminology "Popperian," since Popper (e.g., 1972) has long insisted on this aspect of scientific practice (for discussion, see Case and Ngo-Manguelle 1979).
Plainly, in the context of RE_svt, ℱ^Popperian is not restrictive. In contrast, since ℱ^Popperian ⊂ ℱ^accountable, lemma 4.3.5A implies the following.
Exercises
*4.3.5A L ∈ RE is called total just in case for all x ∈ N there is y ∈ N such that ⟨x, y⟩ ∈ L (compare definition 1.2.2D). Note that a total language need not represent a function (since it need not be single valued). φ ∈ ℱ is called total minded just in case for all σ ∈ SEQ, if φ(σ)↓ then W_φ(σ) is total. Prove: There is ℒ ⊆ RE such that (a) every L ∈ ℒ is total, and (b) ℒ ∈ [ℱ^rec] − [ℱ^rec ∩ ℱ^{total minded}]. (Hint: Rely on Rogers 1967, theorem 5-XVI: the single-valuedness theorem.)

4.3.5B (Putnam 1975) Supply a short proof that RE_svt ∉ [ℱ^rec ∩ ℱ^Popperian].

4.3.5C Define: L ∈ RE_char just in case L ∈ RE_svt and for each n ∈ L either π₂(n) = 0 or π₂(n) = 1. (RE_char thus consists of the sets representing recursive characteristic functions.) Prove the following strengthening of proposition 4.3.5A: there is ℒ ⊆ RE_char such that
*4.3.6 Simplicity
Let L ∈ RE, and let S = {x | W_x = L}, the set of indexes for L. By lemma 1.2.1B, S is infinite. Intuitively, the indexes in S correspond to grammars of increasing size and complexity. It is a plausible hypothesis that children do not conjecture grammars that are arbitrarily more complex than simpler alternatives for the same language (in view of the space requirements for storing complex grammars). In this subsection we consider learning functions that are limited to simple conjectures.
To begin, the notion of grammatical complexity must be precisely rendered. For this purpose we identify the complexity of a grammar with its size, and we formalize the notion of size as follows.
i. For all i ∈ N, there are only finitely many j ∈ N such that m(j) = i.
ii. The set {⟨i, j⟩ | for all k ≥ j, m(k) ≠ i} is recursive.
i ∈ N there are only finitely many Turing machines that can be specified using precisely i symbols. Condition ii is satisfied, since there exists an effective procedure for finding, given any i ∈ N, the largest index of a Turing machine of size i. For another example, the simplest size measure is given by the identity function m(x) = x. Conditions i and ii of the definition are easily seen to be satisfied. It would seem that any reasonable measure of the size of a computational agent also conforms to these conditions.
As with our choice of computational complexity measure (section 4.2.2),
none of our results depend on the choice of size measure. Indeed, any two
such measures can be shown, in a satisfying sense, to yield similar estimates
of size (see Blum 1967a, sec. I). Let a fixed size measure m now be selected.
Reference to size should henceforth be interpreted accordingly.
Intuitively, for L ∈ RE, "M(L)" denotes the size of the smallest Turing machine for L. No index of size smaller than M(L) is an index for L.
In other words, i is j-simple just in case the size of i is no more than "j of" the size of the smallest possible grammar for Wᵢ. Thus, if j(x) = 2x for all x ∈ N, then i is j-simple just in case no index for Wᵢ is less than half the size of i.
With these preliminaries in hand, we may now define strategies that limit
the complexity of a learner's conjectures.
DEFINITION 4.3.6D
Example 4.3.6A
a. Suppose that m is the size measure defined by m(x) = x for all x ∈ N. Let total h ∈ ℱ^rec be such that h(x) ≥ x. Then both the function f of part a of example 1.3.4B and the function g of proposition 1.4.3B are h-simpleminded.
b. Irrespective of the chosen size measure, the function g of part b of example 1.3.4B is simpleminded.
Provided that total h ∈ ℱ^rec is such that h(x) ≥ x for all x ∈ N, ℱ^{h-simpleminded} is not restrictive. However, for any total h ∈ ℱ^rec, ℱ^{h-simpleminded} does restrict ℱ^rec; the proof turns on the following remarkable result.
LEMMA 4.3.6A (Blum 1967a) Let L ∈ RE be infinite, and let total g ∈ ℱ^rec be given. Then there is i ∈ L such that m(i) > g(M(Wᵢ)).

The lemma asserts that every infinite r.e. set of indexes contains at least one index that is g-bigger than necessary, for any choice of total g ∈ ℱ^rec.

PROPOSITION 4.3.6A Let ℒ ∈ [ℱ^rec ∩ ℱ^simpleminded]. Then ℒ contains only finitely many languages.
Exercises
4.3.6A Prove the following strengthening of proposition 4.3.6A: [ℱ^rec ∩ ℱ^simpleminded] is the class of all finite collections of languages.
a. φ_{φ(σ)} = φᵢ,
b. f(m(i)) < m(φ(σ)),
c. for all but finitely many ⟨x, s⟩ ∈ N, if Φ_{φ(σ)}(x) ≤ s, then Φᵢ(x) ≤ h(⟨x, s⟩).
(In other words, the longer program φ(σ) is not much faster than the shorter program i.) (Hint: Use theorem 2 of Blum 1967.) This result extends proposition 4.3.6A.
It seems evident that children have limited memory for the sentences
presented to them. Once processed, sentences are likely to be quickly
erased from the child's memory. Here we shall consider learning functions
that undergo similar information loss.
DEFINITION 4.4.1A Let σ ∈ SEQ be given.
i. The result of removing the last member of σ is denoted: σ⁻. If lh(σ) = 0, then σ⁻ = σ.
ii. For n ∈ N, the result of removing all but the last n members of σ is denoted: σ⁻ⁿ. If lh(σ) < n, then σ⁻ⁿ = σ.
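In code, with sequences as Python tuples, the two operations of definition 4.4.1A come to this (the function names are ours):

```python
def remove_last(sigma):
    # sigma^- : drop the last member; the empty sequence is left unchanged
    return sigma[:-1] if sigma else sigma

def last_members(sigma, n):
    # sigma^-n : keep only the last n members; if lh(sigma) < n, keep sigma
    if len(sigma) < n:
        return sigma
    return sigma[len(sigma) - n:]
```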
Example 4.4.1A
Proof Let S be a recursive set of indexes of r.e. sets containing exactly one index for each finite set and such that, given a finite set D, we can effectively find e(D) ∈ S such that e(D) is an index for D. The existence of such a set and function e is an easy exercise.
Now define f ∈ ℱ^rec by f(σ) = e(rng(σ)) for all σ ∈ SEQ. Informally, f chooses a canonical index for the range of σ. Now, if f(σ⁻) = f(τ⁻), then rng(σ⁻) = rng(τ⁻), and if also σ⁻¹ = τ⁻¹, then rng(σ) = rng(τ), so that f(σ) = f(τ). Thus f ∈ ℱ^{1-memory limited}. □
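The learner f(σ) = e(rng(σ)) of this proof can be made concrete with a toy canonical coding e: code a finite set of naturals as the integer whose binary digits mark its members. The point is that f(σ) is recoverable from f(σ⁻) and the last member of σ alone, which is what 1-memory limitation requires. The bitmask coding is our stand-in for the effective function e of the proof:

```python
def e(finite_set):
    # toy canonical index of a finite set of naturals: its bitmask
    code = 0
    for x in finite_set:
        code |= 1 << x
    return code

def f(sigma):
    # f(sigma) = e(rng(sigma)): a canonical index for the content seen
    return e(set(sigma))

def f_step(prev_code, last):
    # the same value, computed from f(sigma^-) and sigma's last member only
    return prev_code | (1 << last)
```

Because the canonical code is decodable, the previous conjecture carries the whole content of the evidence so far, so nothing is lost by forgetting the sequence itself.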
Proof Let ℒ consist of the language L = {⟨0, x⟩ | x ∈ N} along with, for each j ∈ N, the languages Lⱼ = {⟨0, x⟩ | x ∈ N} ∪ {⟨1, j⟩} and Lⱼ′ = {⟨0, x⟩ | x ≠ j} ∪ {⟨1, j⟩}. It is easy to see that ℒ ∈ [ℱ]. (In fact ℒ ∈ [ℱ^rec].) But suppose that ℒ ∈ [ℱ^{memory limited}]; for instance, suppose that some φ ∈ ℱ^{1-memory limited} identifies ℒ. (The case where φ ∈ ℱ^{n-memory limited} is similar.) Intuitively, when φ first sees ⟨1, j⟩ for some j, φ cannot remember whether it saw ⟨0, j⟩ or not and so cannot distinguish between Lⱼ and Lⱼ′.
Formally, let σ be a locking sequence for φ and L. Let σ′ = σ ⌢ ⟨1, j₀⟩ for some j₀ such that ⟨0, j₀⟩ ∉ rng(σ). Let σ″ = σ ⌢ ⟨0, j₀⟩ ⌢ ⟨1, j₀⟩. Now φ(σ′) = φ(σ″), since φ(σ) = φ(σ ⌢ ⟨0, j₀⟩) and φ is 1-memory limited.
But now let t₁ = σ′ ⌢ ⟨0, 0⟩ ⌢ ⟨0, 1⟩ ⌢ … ⌢ ⟨0, i⟩ ⌢ …, for all i ≠ j₀, and let t₂ = σ″ ⌢ ⟨0, 0⟩ ⌢ ⟨0, 1⟩ ⌢ … ⌢ ⟨0, i⟩ ⌢ …, for i ≠ j₀. Then t₁ is a text for Lⱼ₀′, and t₂ is a text for Lⱼ₀, but φ converges on t₁ and t₂ to the very same index because of memory limitation. Thus φ cannot identify both Lⱼ₀ and Lⱼ₀′. □
LEMMA 4.4.1A
i. [ℱ^{1-memory limited}] = [ℱ^{memory limited}].
ii. [ℱ^rec ∩ ℱ^{1-memory limited}] = [ℱ^rec ∩ ℱ^{memory limited}].
The proof of this lemma turns on the following technical result (cf. lemma 1.2.1B).

LEMMA 4.4.1B There is a recursive function p such that p is one to one and for every x and y, φₓ = φ_{p(x,y)}.
A proof of this lemma may be found in Machtey and Young (1978). Such a function p is called a padding function, for to produce p(x, y) from x, we take the instructions for computing φₓ and "pad" them with extra instructions to produce infinitely many distinct programs for computing the same function.
Now define ψ(σ) = p(φ(σ̃), σ₀). (Intuitively, we simulate φ on texts for which n-memory limitation is of no advantage over 1-memory limitation, due to the repetitions.) ψ evidently identifies ℒ, since for any text t for L ∈ ℒ, t̃ is also a text for L. To see that ψ is 1-memory limited, suppose that ψ(σ⁻) = ψ(τ⁻) and σ⁻¹ = τ⁻¹. Since ψ(σ⁻) = ψ(τ⁻), p(φ(σ̃⁻), σ₀) = p(φ(τ̃⁻), τ₀), so
ii. The transformation of φ to ψ in the proof of (i) produces a recursive ψ if φ is recursive. □
quence for φ and L. This implies that for every n ∈ A, φ(σ ⌢ ⟨0, n⟩) = φ(σ). Therefore for some m ∈ Ā, φ(σ ⌢ ⟨0, m⟩) = φ(σ), else Ā is recursively enumerable, implying that A is recursive. Fixing such an m, let s be an enumeration of L, and define two texts, t and t′, by t = σ ⌢ ⟨1, m⟩ ⌢ s and t′ = σ ⌢ ⟨0, m⟩ ⌢ ⟨1, m⟩ ⌢ s. By 1-memory limitation and the property of m, φ(σ ⌢ ⟨0, m⟩ ⌢ ⟨1, m⟩) = φ(σ ⌢ ⟨1, m⟩), and so again by 1-memory limitation, φ(t̄′_{n+1}) = φ(t̄_n) for all n ≥ lh(σ) + 1. But t′ is a text for Lₘ′ and t for Lₘ, and Lₘ ≠ Lₘ′. Thus φ does not identify both Lₘ and Lₘ′. □
DEFINITION 4.4.1C
(Note that equality in the first clause means that both computations converge and are equal.) f is evidently total and recursive.
Fix R recursive, and suppose that φᵢ′ ∈ ℱ^{h-time} ∩ ℱ^{n-memory limited} is such that φᵢ′ identifies ℒ_R. Let σ′ be a locking sequence for R and φᵢ′ such that, in addition, the computation of φᵢ′(σ′) converges within h(σ′) steps.

Claim For all but finitely many x, x ∈ R if and only if f(i′, σ′, x) = 1.
Exercises
a. {N − {x} | x ∈ N}.
b. RE'd (see definition 2.3B).
c. {K ∪ {x} | x ∈ K}.

4.4.1B Let n ∈ N be given, and let φ ∈ ℱ^{n-memory limited} identify L ∈ RE. Must there be a locking sequence σ for φ and L such that lh(σ) ≤ n?

4.4.1C Prove that [ℱ^{memory limited}] ∩ [ℱ^{h-time}] = [ℱ^{memory limited} ∩ ℱ^{h-time}] for all total h ∈ ℱ^rec.
*4.4.1D Let a function F: SEQ → SEQ be given. φ ∈ ℱ is called F-biased just in case for all σ, τ ∈ SEQ, if F(σ) = F(τ) and φ(σ⁻) = φ(τ⁻), then φ(σ) = φ(τ). To illustrate, let H: SEQ → SEQ be such that for all σ ∈ SEQ, H(σ) = σ⁻⁵. Then φ ∈ ℱ is H-biased if and only if φ is 5-memory limited.
a. For n ∈ N, let Gₙ: SEQ → SEQ be defined as follows. For all σ ∈ SEQ, Gₙ(σ) is the sequence that results from removing from σ all numbers greater than n. Thus G₆(3, 7, 8, 2) = (3, 2). Prove: Let n ∈ N be given. If ℒ ∈ [ℱ^{Gₙ-biased}], then ℒ is finite.
b. (Gisela Schäfer) For n ∈ N, let Hₙ: SEQ → SEQ be defined as follows. For all σ ∈ SEQ, Hₙ(σ) is the result of deleting all but the last n different elements of σ. [Thus H₃(8, 9, 4, 6, 6, 2) = (4, 6, 2).] Prove that for all n ≥ 1, [ℱ^rec ∩ ℱ^{Hₙ-biased}] = [ℱ^rec ∩ ℱ^{1-memory limited}].
4.4.1F Exhibit ℒ ⊆ RE such that (a) for all L, L′ ∈ ℒ, if L ≠ L′ then L and L′ are not finite variants, and (b) ℒ ∈ [ℱ] − [ℱ^{memory limited}].
DEFINITION 4.4.2A (Wexler and Culicover 1980, sec. 2.2) φ ∈ ℱ is said to be set driven just in case for all σ, τ ∈ SEQ, if rng(σ) = rng(τ), then φ(σ) = φ(τ).
Example 4.4.2A
a. The function f defined in part a of example 1.3.4B and the function g defined in the proof of proposition 1.4.3B are set driven.
b. The function h defined in part c of example 1.4.2A is not set driven.
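A quick illustration of the definition: the learner below, aimed at the initial segments {0, …, n} of N, conjectures the maximum of the content seen, so its output depends only on rng(σ) and it is set driven by construction. The brute-force check over reorderings, and all names here, are our own:

```python
from itertools import permutations

def guess(sigma):
    # conjecture (an index for) {0, ..., max(rng(sigma))}; rng-dependent only
    return max(sigma) if sigma else 0

def set_driven_over(sigma):
    # definition 4.4.2A restricted to reorderings of sigma:
    # equal range must force an equal conjecture
    return all(guess(tau) == guess(sigma) for tau in permutations(sigma))
```

By contrast, a learner sensitive to the order or multiplicity of the data (for example, one that counts repetitions) would fail this check.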
Let ℒ = {Lⱼ, Lⱼ′ | j ∈ N}. It is easy to see that ℒ ∈ [ℱ^rec]. Suppose, however, that ℒ ∈ [ℱ^rec ∩ ℱ^{set driven}], and suppose that φⱼ is a set-driven recursive function identifying ℒ. Then φⱼ identifies the text t = ⟨j, 0⟩ ⌢ ⟨j, 1⟩ ⌢ … ⌢ ⟨j, n⟩ ⌢ …. Thus there must be an n ∈ N and an index i for Lⱼ such that φⱼ(t̄_n) = i. In particular, there must be an n and s such that φⱼ(t̄_n) = i and W_{i,s} ⊇ rng(t̄_n). But then φⱼ does not identify rng(t̄_n), since on the text t′ = t̄_n ⌢ ⟨j, n⟩ ⌢ ⟨j, n⟩ ⌢ ⟨j, n⟩ ⌢ …, φⱼ must conjecture Wᵢ in the limit, since φⱼ is set driven. Thus φⱼ does not identify ℒ. □
Exercises
4.4.2A Prove that [ℱ^{set driven}] = [ℱ].

4.4.2B Let φ ∈ ℱ^{set driven} identify RE_fin. Show that for all σ ∈ SEQ, σ is a locking sequence for φ and rng(σ).

4.4.2C Prove that if ℒ contains only infinite languages, then ℒ ∈ [ℱ^rec] if and only if ℒ ∈ [ℱ^rec ∩ ℱ^{set driven}].
4.5.1 Conservatism
Example 4.5.1A
f(σ) =
  f(σ⁻), if W_{f(σ⁻)} ⊇ rng(σ);
  the least i such that Wᵢ = L and L ⊇ rng(σ) ⊇ D_L, if such an L ∈ ℒ exists and W_{f(σ⁻)} ⊉ rng(σ);
  the least index for rng(σ), otherwise.
By the first clause of the definition, f is conservative. Note further that for all γ ∈ SEQ, W_{f(γ)} ⊇ rng(γ) (f is consistent); this fact together with the first clause of the definition implies that f never returns to a conjectured language once it abandons a conjecture of that language.
To show that f identifies ℒ, suppose that L ∈ ℒ and t is a text for L. If f(t̄_n) is an index for L for any n, then f(t̄_m) = f(t̄_n) for all m ≥ n. Further, there is an n′ such that D_L ⊆ rng(t̄_{n′}) ⊆ L. Thus f will adopt the conjecture of the least index for L on t̄_m for some m ≥ n′ unless there is an index i for a language L′ ≠ L such that f converges on t to i. Suppose for a contradiction that such an i exists. Then L′ ⊇ rng(t) = L, since W_{f(γ)} ⊇ rng(γ) for all γ. Let n be least such that f(t̄_n) = i; f(t̄_n) was defined by either the second or the third clause in the definition of f. If f(t̄_n) was defined by the third clause, L′ = rng(t̄_n), so that L = rng(t) ⊇ rng(t̄_n) = L′ ⊇ L; thus L = L′, contradicting the assumption that L ≠ L′. Suppose, on the other hand, that f(t̄_n) is defined by the second clause of the definition of f, so that D_{L′} ⊆ rng(t̄_n) ⊆ L′. Thus L ⊇ D_{L′}, which by the property of D_{L′} implies L ⊄ L′. But this contradicts L′ ⊇ rng(t) = L. □
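The three clauses of the example's learner can be emulated on a finite toy family. Here each language is finite and serves as its own distinguished subset D_L, which trivializes the general construction but preserves its shape; the particular family and index scheme are our assumptions:

```python
LANGS = {10: {0, 1}, 11: {0, 1, 2}, 12: {0, 2, 4}}   # index -> language
D = dict(LANGS)                                      # D_L = L itself here

def W(i):
    # the language named by a conjecture
    return i if isinstance(i, frozenset) else LANGS[i]

def f(sigma):
    if not sigma:
        return frozenset()              # initial conjecture: empty language
    content = set(sigma)
    prev = f(sigma[:-1])
    if content <= W(prev):
        return prev                     # clause 1: conservative
    fits = sorted(i for i, L in LANGS.items() if D[i] <= content <= L)
    if fits:
        return fits[0]                  # clause 2: least fitting index
    return frozenset(content)           # clause 3: an index for rng(sigma)
```

On a text for {0, 1, 2} the conjectures pass from an ad hoc index for the content to 10 and then settle on 11, and no conjecture covering the current data is ever abandoned.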
Exercises
4.5.1A Prove:
a. [ℱ] = [ℱ^consistent ∩ ℱ^conservative ∩ ℱ^prudent].
b. [ℱ^rec ∩ ℱ^consistent] ⊄ [ℱ^rec ∩ ℱ^conservative].
c. [ℱ^rec ∩ ℱ^conservative] ⊄ [ℱ^rec ∩ ℱ^consistent].
4.5.1B
a. Let φ ∈ ℱ^consistent ∩ ℱ^conservative be given. Show that for all σ ∈ SEQ, σ is a locking sequence for φ and W_φ(σ).
b. Let φ ∈ ℱ^conservative identify text t. Show that there is no n ∈ N such that W_φ(t̄_n) ⊋ rng(t). (Thus conservative learners never "overgeneralize" on languages they identify.)

*4.5.1C Prove that [ℱ^{memory limited}] = [ℱ^{memory limited} ∩ ℱ^consistent ∩ ℱ^conservative].
4.5.2 Gradualism
Thus, if φ ∈ ℱ is gradualist, then the effect of any single input on φ's latest conjecture is bounded. An argument similar to that for lemma 4.2.2B shows that gradualism is not restrictive.

PROPOSITION 4.5.2A [ℱ^gradualist] = [ℱ].
Proof We will give an informal argument to show that if φ ∈ ℱ identifies ℒ, there is a φ′ ∈ ℱ such that φ′ identifies ℒ and for all σ ∈ SEQ, {φ′(σ ⌢ n) | n ∈ N} has size at most 3, by showing that φ′ can be constructed from φ so that it never changes its conjecture in response to a new input by more than 1. The argument is a fall-behind-on-the-text argument, as in lemma 4.2.2B. What φ′ does on a text t is to simulate φ. Whenever φ changes its conjecture, say by n, φ′ then uses the next n arguments of t to change its conjecture by ones. If φ converges on t, so will φ′, although φ′ will start converging much later on the text. □
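The unit-step device of this proof is easy to state in code: given the stream of conjectures a learner emits along a text (taken here to be natural numbers compared arithmetically, which is our simplifying assumption), produce a trailing stream that never moves by more than 1 per input:

```python
def gradualize(conjectures):
    # follow the given conjecture stream, moving at most 1 per input
    out = []
    current = None
    for target in conjectures:
        if current is None:
            current = target      # the opening conjecture is unconstrained
        elif current < target:
            current += 1          # catch up by unit steps
        elif current > target:
            current -= 1
        out.append(current)
    return out
```

If the original stream converges to j, the gradualized stream converges to j as well, just later on the text, mirroring the fall-behind argument.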
Since all the procedures invoked in the preceding proof can be carried out
mechanically, we have the following corollary.
COROLLARY 4.5.2A [ℱ^rec ∩ ℱ^gradualist] = [ℱ^rec].

The next proposition shows that gradualism restricts memory limitation.

PROPOSITION 4.5.2B [ℱ^gradualist ∩ ℱ^{memory limited}] ⊂ [ℱ^{memory limited}].
Exercise
4.5.2A φ ∈ ℱ is said to be n-gradualist just in case for all σ ∈ SEQ, |{φ(σ ⌢ x) | x ∈ N}| ≤ n. Note that as a corollary to the proof of proposition 4.5.2A, [ℱ^{3-gradualist}] = [ℱ] and [ℱ^{3-gradualist} ∩ ℱ^rec] = [ℱ^rec].
Let n, m ∈ N be given. Let ℒ ⊆ RE be as defined in the proof of proposition 4.5.2B. Prove: If φ ∈ ℱ^{m-gradualist} ∩ ℱ^{n-memory limited}, then φ can identify no more than (2m)^(n+1) languages in ℒ.
Proof
i. Let L_n = {x | x ≤ n}, and let ℒ = {L_n | n ∈ N}. ℒ can certainly be identified; in fact there is a recursive function that identifies ℒ. Suppose, however, that φ is an enumerator with enumerating function f. Were φ to identify ℒ, rng(f) would have to contain indexes for each L_n. Thus there would be i < j such that W_{f(i)} ⊋ W_{f(j)} and f(i) and f(j) are the least indexes in rng(f) for W_{f(i)} and W_{f(j)}. But then φ must, on any text for W_{f(j)}, conjecture W_{f(k)} for some k ≤ i and so does not identify W_{f(j)}.
ii. The function h of proposition 1.4.3B that identifies RE_svt is an enumerator with enumerating function f(x) = x. □
PROPOSITION 4.5.3B (Gold 1967) Let ℒ ⊆ RE_svt be r.e. indexable. Then ℒ ∈ [ℱ^rec ∩ ℱ^enumerator]_svt.
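Over a finite stand-in for an r.e. index sequence, an enumerator can be sketched as follows (languages are modeled as Python sets and conjectures as positions in the enumeration; the consistency test is simplified to containment of the data, an assumption of this sketch):

```python
def make_enumerator(hypotheses):
    """Induction by enumeration: on evidence sigma, conjecture the first
    hypothesis in the fixed enumeration that is consistent with the data
    seen so far (here: contains every datum; '#' marks a blank)."""
    def learner(sigma):
        data = {x for x in sigma if x != '#'}
        for i, language in enumerate(hypotheses):
            if data <= language:
                return i
        return None
    return learner
```

With hypotheses = [{0, 1, 2}, {0}], every text for {0} is also consistent with the earlier superset {0, 1, 2}, so the enumerator stays at index 0 and never identifies {0}; this mirrors the obstruction used in part i of the preceding proof.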
Exercises
4.5.3A Prove that for some h ∈ ℱ^rec, [ℱ^h-time ∩ ℱ^consistent ∩ ℱ^conservative ∩ ℱ^prudent] ⊈ [ℱ^enumerator].
4.5.3B Total f ∈ ℱ is called strict just in case i ≠ j implies W_{f(i)} ≠ W_{f(j)}, for all i, j ∈ N. φ ∈ ℱ^enumerator is called strict just in case φ's enumerating function is strict. Prove that [ℱ^strict enumerator] = [ℱ^enumerator].
*4.5.4 Caution
DEFINITION 4.5.4A φ ∈ ℱ is called cautious just in case for all σ, τ ∈ SEQ, W_{φ(σ ⌢ τ)} is not a proper subset of W_{φ(σ)}.
Thus a cautious learner never conjectures a language that will be "cut back"
to a smaller language by a later conjecture. Both the function f defined in
Example 1.3.4B (part a) and the function g defined in the proof of propo-
sition 1.4.3B are cautious.
Caution is an admirable policy. A text presents no information allowing
the learner to realize that it has overgeneralized; consequently the need to
cut back a conjectured language could result only from a prior miscalcu-
lation. These considerations suggest that caution is not restrictive.
PROPOSITION 4.5.4A [ℱ^cautious] = [ℱ].
80 Identification Generalized
Exercises
4.5.4A Prove that [ℱ^rec ∩ ℱ^conservative ∩ ℱ^cautious] = [ℱ^rec ∩ ℱ^conservative].
4.5.4B Prove that [ℱ^rec ∩ ℱ^cautious] ⊈ [ℱ^rec ∩ ℱ^conservative].
(Hint: See the proof of proposition 4.5.1B.)
*4.5.5 Decisiveness
Let φ ∈ ℱ be a strict enumerator in the sense of exercise 4.5.3B. Then φ never returns to a conjectured language once it is abandoned. The next definition isolates those learning functions whose successive conjectures meet this condition.
DEFINITION 4.5.5A φ ∈ ℱ is called decisive just in case for all σ ∈ SEQ, if W_{φ(σ⁻)} ≠ W_{φ(σ)}, then there is no τ ∈ SEQ such that W_{φ(σ ⌢ τ)} = W_{φ(σ⁻)}.
Both the function f defined in example 1.3.4B (part a) and the function g defined in the proof of proposition 1.4.3B are decisive.
Like caution, decisiveness appears to be a sensible strategy. It is not restrictive.
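Over finite approximations, both properties reduce to checks on a learner's sequence of conjectured languages; a sketch (conjectures are modeled directly as Python sets rather than as r.e. indices, an assumption of this sketch):

```python
def violates_caution(conjectures):
    """Caution fails if some later conjecture is a proper subset of an
    earlier one, i.e., a conjectured language was 'cut back'."""
    return any(late < early
               for i, early in enumerate(conjectures)
               for late in conjectures[i + 1:])

def violates_decisiveness(conjectures):
    """Decisiveness fails if a language is re-conjectured after a
    different language was conjectured in between."""
    n = len(conjectures)
    return any(conjectures[i] != conjectures[j] and conjectures[i] == conjectures[k]
               for i in range(n)
               for j in range(i + 1, n)
               for k in range(j + 1, n))
```

A decisive learner thus never revisits an abandoned language, while a cautious one never shrinks a conjecture; the two conditions are independent of each other.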
The next result shows that decisiveness does not restrict ℱ^rec in the context of RE_svt.
φ_{h(i,k)}(x) =
  x_x,       if x ≤ k,
  φ_i(x),    if x > k, φ_i(y) = x_y for all y ≤ k, and O(φ_i[x]) = i,
  diverges,  otherwise.
Case 1. Suppose that n is not in the domain of φ_{h(i,k)}. Then ψ(σ ⌢ τ) is not an index for φ_{h(i,k)} for any τ, since the domain of φ_{ψ(σ ⌢ τ)} ⊇ {0, 1, 2, ..., n} for all τ by the first clause in the definition of h(i, k).
Case 2. n is in the domain of φ_{h(i,k)}. Then O(φ_{h(i,k)}[n]) = i, and so, in particular, σ ⊄ φ_{h(i,k)}[n]. But φ_{ψ(σ ⌢ τ)} extends σ for all τ ∈ SEQ, again by the first clause in the definition of h(i, k). Thus in this case also, ψ(σ ⌢ τ) is not an index for φ_{h(i,k)}. □
Exercises
4.5.5A φ ∈ ℱ is called weakly decisive just in case for all σ ∈ SEQ, if φ(σ⁻) ≠ φ(σ), then there is no τ ∈ SEQ such that φ(σ ⌢ τ) = φ(σ⁻); that is, weakly decisive learning functions never repeat a conjecture once it is abandoned. Prove that [ℱ^rec ∩ ℱ^weakly decisive] = [ℱ^rec].
If φ ∈ ℱ identifies ℒ ⊆ RE, then φ must converge to some index for rng(t) on every text t for L ∈ ℒ. φ may or may not converge to the same index on different texts for the same language in ℒ, and φ may or may not converge on texts for languages outside of ℒ. In this section we consider three constraints on convergence that limit the freedom of learning functions in these ways.
4.6.1 Reliability
A learner that occasionally converges to an incorrect language may be
termed "unreliable."
Example 4.6.1A
a. The function f in example 1.3.4B (part a) is reliable, for f identifies every text for a finite language and fails to converge on any text for an infinite language.
b. The function f defined in the proof of proposition 2.3A is not reliable, for if n is an index for N, then f converges on the text n, n, n, ..., but fails to identify it.
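The learner of part a can be written out directly; conjecturing exactly the content observed so far converges precisely on texts with finite content (conjectures shown as sets rather than indices, an illustrative simplification):

```python
def rng_conjecture(sigma):
    """Conjecture the set of numbers seen so far; '#' marks a blank.
    The conjecture stabilizes iff the text's content is finite, which is
    what makes this learner reliable: it never converges to a wrong set."""
    return frozenset(x for x in sigma if x != '#')
```

On a text for a finite language the conjecture stabilizes once every member has appeared; on a text for an infinite language the conjecture grows forever, so the learner never converges at all.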
That is, [ℱ^rec ∩ ℱ^reliable-svt]_svt is closed under finite union (cf. exercise 2.2D).
i. φ ∈ ℱ is said to be almost everywhere zero just in case φ(x) = 0 for all but finitely many x ∈ N. The collection {L ∈ RE_svt | L represents a function that is almost everywhere zero} is denoted: RE_aez.
ii. φ ∈ ℱ^rec is called self-indexing just in case the smallest x ∈ N such that φ(x) = 1 is an index for φ. The collection {L ∈ RE_svt | L represents a self-indexing function} is denoted: RE_si.
i. RE_si ∈ [ℱ^rec]_svt.
ii. RE_aez ∈ [ℱ^rec ∩ ℱ^reliable-svt]_svt.
iii. RE_si ∪ RE_aez ∉ [ℱ^rec]_svt.
Proof For i and ii, the obvious methods for identifying RE_si and RE_aez work.
For iii, suppose that ψ identifies RE_aez. We will define a recursive function f by the recursion theorem such that f is self-indexing and if L represents f, then ψ does not identify L. To apply the recursion theorem, we define a total recursive function g as follows.
For all x < i, φ_{g(i)}(x) = 0, and φ_{g(i)}(i) = 1.
If x > i, and φ_{g(i)}(y) has been defined for all y < x, define φ_{g(i)}(x) as follows. For every integer n define σ_n = ⟨(0,0), (1,0), ..., (i−1,0), (i,1), (i+1, φ_{g(i)}(i+1)), ..., (x−1, φ_{g(i)}(x−1)), (x,0), ..., (x+n, 0)⟩. Enumerate simultaneously W_{ψ(σ_0)}, W_{ψ(σ_1)}, ..., W_{ψ(σ_n)} for increasing n until a pair (x + M + 1, 0) appears in W_{ψ(σ_M)} for some M. Then define φ_{g(i)}(x) = ⋯ = φ_{g(i)}(x + M) = 0 and φ_{g(i)}(x + M + 1) = 1. Such an n will exist, since the sequences σ_0, σ_1, σ_2, ..., are initial segments of a text for a function in RE_aez. Thus φ_{g(i)} is total for every i.
Let i' be such that φ_{g(i')} = φ_{i'}. By the definition of φ_{g(i')}, φ_{i'} = φ_{g(i')} is self-indexing. But there are infinitely many x such that for the corresponding M and σ_M, ψ(σ_M) is an index for a set that does not represent φ_{i'}, since φ_{g(i')}(x + M + 1) = φ_{i'}(x + M + 1) = 1, but (x + M + 1, 0) ∈ W_{ψ(σ_M)}. These σ_M are initial segments of the same text t for φ_{i'}, so ψ does not converge on a text for φ_{i'}. □
Thus [ℱ^rec]_svt is not closed under finite union (cf. exercise 2.2D). The following corollaries are immediate from the two preceding propositions.
COROLLARY 4.6.1B RE_si ∉ [ℱ^rec ∩ ℱ^reliable-svt]_svt.
Exercises
4.6.1A φ ∈ ℱ is called weakly reliable just in case for all texts t for any L ∈ RE, if φ converges on t, then φ identifies t. (Thus weakly reliable learning functions need not be total.) Prove the following strengthened version of proposition 4.6.1A: let φ ∈ ℱ^weakly reliable identify L ∈ RE. Then L is finite.
4.6.1B (Blum and Blum 1975) Let total f ∈ ℱ^rec be given. Suppose that for all j ∈ N, φ_{f(j)} ∈ ℱ^rec ∩ ℱ^reliable-svt. Show that ⋃_{j∈N} ℒ_svt(φ_{f(j)}) ∈ [ℱ^rec ∩ ℱ^reliable-svt]_svt. This result generalizes proposition 4.6.1B.
*4.6.1C φ ∈ ℱ is called finite-difference reliable just in case for all texts t for any L ∈ RE, if φ converges on t, then φ converges to a finite variant of rng(t) (see definition 2.3A). Thus φ ∈ ℱ is finite-difference reliable just in case φ never converges to a conjecture that is "infinitely wrong." Reliability is a special case of finite-difference reliability.
4.6.2 Confidence
A learner that converges on every text may be termed "confident."
DEFINITION 4.6.2A φ ∈ ℱ is called confident just in case for all t ∈ 𝒯, φ converges on t.
Example 4.6.2A
Children are confident learners if they eventually settle for some approximation to the input, even on texts for nonnatural languages.
PROPOSITION 4.6.2A [ℱ^confident] ⊂ [ℱ].
… {0, 1}. Given σ^{n−1}, let σ^n be the shortest sequence of n's such that φ(σ^0 ⌢ ⋯ ⌢ σ^n) is an index for {0, 1, ..., n}. Obviously φ does not converge on σ^0 ⌢ ⋯ ⌢ σ^n ⌢ ⋯. □
The next proposition shows that confidence and ℱ^rec restrict each other. First, we prove a lemma.
LEMMA 4.6.2A Let φ ∈ ℱ^confident be given. Then for every L ∈ RE, there is σ ∈ SEQ such that (i) rng(σ) ⊆ L, and (ii) for all τ ∈ SEQ such that rng(τ) ⊆ L, φ(σ ⌢ τ) = φ(σ).
Proof This is much like the proof of proposition 2.1A, the locking sequence lemma. If such a σ did not exist, we could construct a text t for L on which φ does not converge, contradicting its confidence. □
PROPOSITION 4.6.2B [ℱ^rec ∩ ℱ^confident] ⊂ [ℱ^rec] ∩ [ℱ^confident].
Proof ℒ = {K ∪ {x} | x ∉ K} is the needed collection. We have noted before that ℒ ∈ [ℱ^rec] (see exercise 4.2.1A). The following defines f ∈ ℱ^confident which identifies ℒ:

f(σ) = index for K, if rng(σ) ⊆ K,
       index for K ∪ {x}, if x is the least element of rng(σ) − K.

Suppose, however, that φ ∈ ℱ^rec ∩ ℱ^confident identifies ℒ. By lemma 4.6.2A there is a sequence σ such that rng(σ) ⊆ K and such that whenever rng(τ) ⊆ K, φ(σ ⌢ τ) = φ(σ). But then, much as in the proof of lemma 4.2.1C, we now have a way of enumerating the complement of K, for x ∉ K if and only if there is a sequence τ such that rng(τ) ⊆ K and φ(σ ⌢ x ⌢ τ) ≠ φ(σ). □
Exercises
4.6.2A Recall the definition of ℒ × ℒ' from exercise 1.4.3F. Prove: Let ℒ ∈ [ℱ^confident] and ℒ' ∈ [ℱ^confident] be given. Then ℒ × ℒ' ∈ [ℱ^confident].
4.6.2B ℒ ⊆ RE is called a w.o. chain just in case ℒ is well ordered by inclusion.
a. Exhibit an infinite w.o. chain in [ℱ^rec].
b. Prove: If ℒ ⊆ RE is an infinite w.o. chain, then ℒ ∉ [ℱ^confident].
c. φ ∈ ℱ is called conjecture bounded (or cb) just in case for every t ∈ 𝒯, {φ(t̄_m) | m ∈ N} is finite. Thus φ ∈ ℱ^cb just in case no text leads φ to produce conjectures of arbitrary size. Prove: Let ℒ be an infinite w.o. chain; then ℒ ∉ [ℱ^cb].
d. Prove that [ℱ^confident] = [ℱ^cb].
4.6.2C
Example 4.6.3A
a. Both the function f defined in example 1.3.4B, part b, and the function g defined in the proof of proposition 1.4.3B are order independent.
b. The function f defined in the proof of proposition 2.3A is order independent.
c. The function f in the proof of proposition 4.4.1A is order independent.
PROPOSITION 4.6.3A (Blum and Blum 1975) [ℱ^rec ∩ ℱ^order independent] = [ℱ^rec].
a. rng(γ) ⊆ rng(τ),
b. σ ⊆ γ,
c. lh(γ) ≤ lh(τ),
f(σ) = f(γ).
σ exists since τ itself satisfies 1 and 2. Define g(τ) = f(σ). g is recursive since only finitely many sequences need be checked to define g(τ).
Claim If f identifies L and t is a text for L, then g converges on t to f(σ), where σ is the least locking sequence for f and L.
Proof of claim Let σ be the least locking sequence for f and L, and let n be such that rng(σ) ⊆ rng(t̄_n) and σ ≤ t̄_n. Since σ is a locking sequence for L and t is a text for L, it is clear that for every m ≥ n, σ satisfies both 1 and 2 for τ = t̄_m. Thus for each m ≥ n, g(t̄_m) = f(σ) unless there is σ' < σ such that σ' also satisfies 1 and 2 for τ = t̄_m. Since no such σ' can be a locking sequence for f and L, there must be a γ such that γ ⊇ σ', rng(γ) ⊆ L, and f(γ) ≠ f(σ'). (Otherwise, either σ' would be a locking sequence for f and L, or f would converge on some text for L to an index, f(σ'), for a language other than L.) But then if m is such that rng(γ) ⊆ rng(t̄_m) and γ ≤ t̄_m, σ' cannot satisfy 1 and 2 with τ = t̄_m. Thus, for almost all m, g cannot conjecture f(σ') for any σ' < σ. □
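For tiny alphabets the construction of g can be simulated by brute force; a sketch in Python (an exponential search, illustrative only; ordering candidates by length and then lexicographically stands in for the book's coding of SEQ):

```python
from itertools import product

def make_order_independent(f):
    """Sketch of the construction above: g(tau) = f(sigma) for the least
    sigma such that (1) rng(sigma) is contained in rng(tau), (2) lh(sigma)
    <= lh(tau), and f is stable on every extension gamma of sigma that
    also satisfies 1 and 2. tau itself always passes the stability test,
    so the search terminates."""
    def g(tau):
        tau = list(tau)
        symbols = sorted(set(tau))
        n = len(tau)
        for m in range(n + 1):
            for cand in product(symbols, repeat=m):
                sigma = list(cand)
                stable = all(f(sigma + list(tail)) == f(sigma)
                             for j in range(n - m + 1)
                             for tail in product(symbols, repeat=j))
                if stable:
                    return f(sigma)
    return g
```

Because the chosen sigma depends only on f's behavior on sequences drawn from rng(tau), reorderings of the same data lead g to the same stabilized value.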
COROLLARY 4.6.3A For every ℒ ∈ [ℱ^rec], there is a g ∈ ℱ^rec that identifies ℒ such that for all L ∈ RE, g identifies L if and only if there is a locking sequence for g and L.
We may now return to lemmata 4.3.4A and 4.3.4B whose proofs were
deferred to this section.
Since g is order independent and identifies ℒ', f will converge on any text t for L ∈ ℒ' to s_i for the least i such that s_i is an index for L. Since every index in S is an index for a language in ℒ' and f outputs only indexes from S, f is prudent. □
h(σ) =
  …, if rng(σ) ≠ W_{g(σ)},
  …, if σ is a locking sequence for g and W_{g(σ)},
  …, otherwise.
Since g does not identify N, h identifies all that g does, together with all initial segments of N (cf. exercise 4.2.1H). □
Exercises
4.6.3A Show that [ℱ^rec ∩ ℱ^order independent ∩ 𝒮] = [ℱ^rec ∩ 𝒮] as 𝒮 varies over
the following strategies:
a. Nontriviality
b. Prudence
c. Consistency
d. Memory limitation
e. Confidence
*4.6.3B (Gisela Schafer) φ ∈ ℱ is said to be partly set driven just in case for all σ, τ ∈ SEQ, if lh(σ) = lh(τ) and rng(σ) = rng(τ), then φ(σ) = φ(τ). Prove that [ℱ^rec ∩ ℱ^partly set driven] = [ℱ^rec].
DEFINITION 4.7A
LEMMA 4.7A
i. There are 2^ℵ₀ many local strategies.
ii. There are 2^{2^ℵ₀} many strategies that are not local.
Proof
i. There are ℵ₀ many functions in ℱ^fin. Thus there are 2^ℵ₀ many subsets F of ℱ^fin. The local strategies are in one-to-one correspondence with such subsets.
ii. There are 2^ℵ₀ many functions in ℱ, so there are 2^{2^ℵ₀} many subsets of ℱ. Thus there are 2^{2^ℵ₀} − 2^ℵ₀ = 2^{2^ℵ₀} many nonlocal strategies. □
i. Nontriviality
ii. Consistency
iii. 1-memory limitation
iv. Conservatism
Proof These are very easy. We will just say how to choose the F in the definition of locality.
i. F = ℱ^fin ∩ ℱ^nontrivial.
ii. F = {φ | the domain of φ is finite and if φ(σ) converges, then rng(σ) ⊆ W_{φ(σ)}}.
iii. F = ℱ^fin ∩ ℱ^1-memory limited.
iv. F = ℱ^fin ∩ ℱ^conservative. □
Exercises
such that 𝒮' is r.e. indexable and [𝒮'] = [𝒮]. A strategy without an r.e. core may be considered intrinsically complex, in a certain sense.
a. Show that ℱ^rec ∩ ℱ^total has an r.e. core. (Hint: See proposition 4.2.2A.) Conclude that there are non-r.e.-indexable strategies with r.e. cores.
b. Show that not every 𝒮 ⊆ ℱ^rec has an r.e. core. (Hint: Consider ℱ^rec ∩ ℱ^nontrivial; see section 4.3.2.)
5 Environments
DEFINITION S.2A
Intuitively a text# for L results from inserting any number of blanks into a text for L.
Our notation for texts may be carried over to texts#. Thus for n ∈ N and text# t, t_n is the nth member of t (number or blank), and t̄_n is the sequence determined by the first n members of t. The set of finite sequences of any length drawn from any text# is denoted: SEQ#. For σ ∈ SEQ#, the (unordered) set of numbers in σ is denoted: rng(σ) (thus # ∉ rng(σ)). As in section 1.3.4, we fix upon a computable isomorphism between SEQ# and N, and denote the code number of σ ∈ SEQ# by ⌜σ⌝.
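In Python these conventions might be rendered as follows (BLANK standing in for #; a sketch, not notation from the text):

```python
from itertools import islice

BLANK = '#'

def rng(sigma):
    """The unordered set of numbers in sigma; blanks are not part of rng."""
    return {x for x in sigma if x != BLANK}

def initial_segment(t, n):
    """The sequence determined by the first n members of a text# t,
    which may be any (possibly infinite) iterable."""
    return list(islice(t, n))
```

Under this convention the blank text, all of whose members are BLANK, has empty rng, which is what lets the empty language acquire texts on the revised conception.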
The notions of convergence and identification are adapted to texts# in straightforward fashion. To be official:
DEFINITION 5.2B Let φ ∈ ℱ be given, and let t be a text#.
Exercises
5.2A
a. Since the empty set is recursively enumerable, ∅ ∈ RE. On the revised conception of text, what text is a text for ∅? Is ∅ ∈ ℒ(φ) for all φ ∈ ℱ? (For notation, see exercise 1.4.3K.)
b. Prove: For all φ ∈ ℱ^rec, there is ψ ∈ ℱ^rec such that ℒ(ψ) = {∅} ∪ ℒ(φ).
5.2D On the revised conception of text, how many texts are there for the language {2}?
DEFINITION 5.3A
i. The class of all texts (as understood from section 5.2) is denoted: 𝒯.
ii. 𝒯 × RE (the Cartesian product of 𝒯 and RE) is the set of all ordered pairs (t, L) such that t is a text and L is a language. A subset of 𝒯 × RE is called an evidential relation.
iii. The evidential relation {(t, L) | rng(t) = L} is denoted: text.
ii. φ ∈ ℱ is said to identify ℒ ⊆ RE on ℰ just in case φ identifies every L ∈ ℒ on ℰ. In this case ℒ is said to be identifiable on ℰ.
Example 5.3A
Thus [𝒮, ℰ] is the family of all collections ℒ of languages such that some learning function in the strategy 𝒮 identifies ℒ on ℰ. [𝒮, ℰ]_svt is just [𝒮, ℰ] ∩ 𝒫(RE_svt). For 𝒮 ⊆ ℱ, [𝒮, text] = [𝒮], where [𝒮] is interpreted according to definition 4.1B. Similarly, [𝒮, text]_svt = [𝒮]_svt. In this chapter we consider the inclusion relations among collections of the form [𝒮, ℰ], where 𝒮 is a strategy and ℰ an evidential relation.
Exercises
5.3A Let ℰ, ℰ' be evidential relations such that for all L ∈ RE, {t | (t, L) ∈ ℰ} ⊆ {t | (t, L) ∈ ℰ'}. Let 𝒮 ⊆ ℱ be given.
In this section we consider evidential relations that distort the content of the
ambient language.
Thus a noisy text t for a language L can be pictured as a text for L into which
any number of intrusions from a finite set have been inserted. Note that any
single such intrusion may occur infinitely often in t.
Example 5.4.1A
a. Since the empty set is finite, every text for a language L counts as a noisy text for L. Consequently [𝒮, noisy text] ⊆ [𝒮, text], for any strategy 𝒮.
b. Let L, L' ∈ RE be finite variants such that L ⊂ L'. Then every noisy text for L' is a noisy text for L, but not conversely.
c. Let L, L' ∈ RE be finite variants, L ≠ L'. Then {t | (t, L) ∈ noisy text} ∩ {t | (t, L') ∈ noisy text} ≠ ∅. In conjunction with part c of example 5.3A, the foregoing implies that {L, L'} ∉ [𝒮, noisy text], and hence that [𝒮, noisy text] ⊂ [𝒮, text].
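The overlap in part c can be made concrete: inserting the finite difference of two finite variants into a text yields a single stream that qualifies as a noisy text for both languages (a sketch over finite prefixes; interleave_noise is an illustrative name):

```python
def interleave_noise(text, noise):
    """Insert each element of the finite set `noise` after every member of
    `text`, cycling so that each intrusion occurs again and again."""
    noise = sorted(noise)
    out = []
    for i, x in enumerate(text):
        out.append(x)
        if noise:
            out.append(noise[i % len(noise)])
    return out

# With L the evens below 10 and L' = L with 1 added, the stream below has
# range L together with {1}: a noisy text for L (D = {1}) and a text for L'.
prefix = interleave_noise([0, 2, 4, 6, 8], {1})
```

Extended to an infinite text, the construction keeps every intrusion recurring forever, matching the remark that a single intrusion may occur infinitely often.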
Proof Suppose to the contrary that φ identifies both L and L' on noisy text and that L' − L is finite. Let D = L' − L. We use exercise 5.4.1B. By that exercise, let σ ∈ SEQ be such that (1) rng(σ) ⊆ L ∪ D, (2) W_{φ(σ)} = L, and (3) for all τ ∈ SEQ, if rng(τ) ⊆ L ∪ D, then φ(σ ⌢ τ) = φ(σ). Then, if t is any text for L', rng(t) ⊆ L ∪ D, so that on σ ⌢ t, φ converges to an index for L, namely φ(σ). But σ ⌢ t is a text for L' ∪ rng(σ) and so is a noisy text for L', contradicting that φ identifies L' on noisy text. □
PROPOSITION 5.4.1B Let ℒ ⊆ RE be such that whenever L, L' ∈ ℒ and L ≠ L', then both L − L' and L' − L are infinite. Then ℒ ∈ [ℱ, noisy text].
Proof Let L_0, L_1, ..., be a listing of the languages in ℒ such that each language in ℒ appears in the list infinitely often. We define g, which identifies ℒ on noisy text, as follows. Define a function f by
PROPOSITION 5.4.1C There is ℒ ⊆ RE such that (i) every L ∈ ℒ is infinite, (ii) for all L, L' ∈ ℒ, if L ≠ L', then L ∩ L' = ∅, and (iii) ℒ ∈ [ℱ^rec, text] − [ℱ^rec, noisy text].
easy to see that for any h, ℒ_h is identifiable by a recursive learning function. But if h ≠ h', and if φ identifies ℒ_h on noisy text, φ does not identify ℒ_{h'} on noisy text. Since there are only ℵ₀ many recursive functions and 2^ℵ₀ many permutations of N, there must be a permutation h such that no recursive learning function identifies ℒ_h on noisy text. □
Exercises
5.4.1A
5.4.1D For m ∈ N, the evidential relation {(t, L) | for some set D of no more than m elements, rng(t) = L ∪ D} is called m-noisy text. Prove:
a. Let n < m. Then [ℱ, m-noisy text] ⊂ [ℱ, n-noisy text].
b. [ℱ, noisy text] ⊂ ⋂_{m∈N} [ℱ, m-noisy text].
5.4.1E The evidential relation {(t, L) | t is a noisy text for L and {n | t_n ∉ L} is finite} is called intrusion text. Prove:
a. [ℱ, intrusion text] = [ℱ, noisy text].
b. [ℱ^rec, intrusion text] = [ℱ^rec, noisy text].
5.4.1F ℒ ⊆ RE is said to be saturated on noisy text just in case ℒ ∈ [ℱ, noisy text] and for all ℒ' ⊆ RE such that ℒ ⊂ ℒ', ℒ' ∉ [ℱ, noisy text]. Note that RE_fin is not saturated on noisy text. Show that infinitely many ℒ ⊆ RE are saturated on noisy text. Compare this result to exercise 2.2E.
Environments 103
Example 5.4.2A
a. Let t ∈ 𝒯 be the blank text (for all n ∈ N, t_n = #). Then t is an incomplete text for L ∈ RE if and only if L is finite.
b. More generally, let L, L' ∈ RE be finite variants such that L ⊂ L'. Then any text for L is an incomplete text for L', but not conversely.
c. Since the empty set is finite, every text for a language L counts as an incomplete text for L.
d. As for noisy text, it is easy to see that if L, L' ∈ RE are finite variants such that L ≠ L', then {L, L'} ∉ [𝒮, incomplete text]. Consequently [𝒮, incomplete text] ⊂ [𝒮, text].
'*
To see that [§, incomplete text] [§, noisy text], let E be the set of
even integers, and let 2 = {N, E}. 2 is easily identifiable on incomplete
text, but it is not identifiable on noisy text by proposition 5A.lA. 0
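The learner for ℒ = {N, E} in the argument above can be sketched directly (conjectures returned as the tags 'N' and 'E' rather than as indices, an illustrative simplification):

```python
def n_or_e_learner(sigma):
    """Conjecture 'E' (the evens) until an odd number appears, then 'N'.
    On incomplete text this still succeeds: deleting a finite set from N
    leaves infinitely many odd numbers, while any incomplete text for E
    shows only even numbers."""
    odd_seen = any(x != '#' and x % 2 == 1 for x in sigma)
    return 'N' if odd_seen else 'E'
```

The same learner fails on noisy text, since a noisy text for E may contain odd intrusions, which is the asymmetry the proof exploits.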
The asymmetric relation between noisy and incomplete text is related to
the asymmetrical character of texts noted at the end of section 1.3.3. In
particular, noisy texts allow the intrusion of pseudolocking sequences,
which does not occur in incomplete text.
The next proposition parallels proposition 5.4.1C.
PROPOSITION 5.4.2B There is ℒ ⊆ RE such that (i) every L ∈ ℒ is infinite, (ii) for all L, L' ∈ ℒ, if L ≠ L', then L ∩ L' = ∅, and (iii) ℒ ∈ [ℱ^rec, text] − [ℱ^rec, incomplete text].
Proof This proof is entirely analogous to that of proposition 5.4.1C. If h and h' are permutations of N, the collections ℒ_h and ℒ_{h'} defined in the proof of that proposition cannot both be learned by the same recursive learning function on incomplete text. Thus there are only ℵ₀ many collections ℒ_h ∈ [ℱ^rec, incomplete text]. □
Exercises
5.4.2A
a. Let f£ = {{ (i, Y>I yE Nand i EN - {x}} [x EN}. Specify qJ E:F such that qJ
identifies f£ on incomplete text.
b. Let SERE be given. Specify qJEff such that qJ identifies {{(i,Y>lyE SUD and
iE N} IDfinite} on incomplete text. Compare this result to part c of exercises 5A.IB.
5.4.2B Prove the following generalization of proposition 2.1A. Let qJ E:F identify
LE RE on incomplete text. Then for every finite DeN there is I:TE SEQ such that
rngto) S L - D, Wq>(~) = L, and for all "t ESEQ, if mgte) S L - D, then qJ(I:T A r) =
cp(I:T).
5.4.2C For mEN the evidential relation {(t, L)lfor some set D of no more than m
elements, rog(t) = L - D} is called m-incomplete text. Prove:
a. Let n < m. Then [S",m-incomplete text] C [ff,n-incomplete text].
b. Let f£ S RE be such that f£Erff,m-incomplete text] for all mEN. Then,
f£ E [:F, incomplete text]. Compare exercise 5.4.1D.
5.4.2D ℒ ⊆ RE is called maximal on incomplete text just in case ℒ ∈ [ℱ, incomplete text] and for every L ∈ RE − ℒ, ℒ ∪ {L} ∉ [ℱ, incomplete text] (cf. exercise 4.6.2C). Show that there are ℒ ⊆ RE such that ℒ is maximal on incomplete text.
PROPOSITION 5.4.3B There is ℒ ⊆ RE_svt such that (i) for L, L' ∈ ℒ, if L ≠ L', then L ∩ L' = ∅, and (ii) ℒ ∈ [ℱ^rec, text]_svt − [ℱ^rec, imperfect text]_svt.
Proof The proof is similar to those of propositions 5.4.1C and 5.4.2B. Fix a permutation h of N, and define f_n ∈ ℱ^rec by
f_n(⟨n, h(n)⟩) = 1, …
Define ℒ_h = {L_n ∈ RE_svt | n ∈ N and L_n represents f_n}. It is clear that if n ≠ m, then L_n ∩ L_m = ∅ and that ℒ_h ∈ [ℱ^rec, text]. But just as in the proofs of propositions 5.4.1C and 5.4.2B, it is easy to see that if h ≠ h', then no φ ∈ ℱ^rec can identify both ℒ_h and ℒ_{h'} on imperfect text. The result follows. □
Exercises
5.4.3A Prove the following generalization of proposition 2.1A. Let φ ∈ ℱ identify L ∈ RE on imperfect text. Then for every finite variant L' of L there is σ ∈ SEQ such that rng(σ) ⊆ L', W_{φ(σ)} = L, and for all τ ∈ SEQ, if rng(τ) ⊆ L', then φ(σ ⌢ τ) = φ(σ).
Each of the evidential relations discussed in the last section enlarged the set of texts counted as "for" a given language. The present section concerns evidential relations that have the reverse effect. The new evidential relations result from constraining the order in which a language may be presented to a learner.
DEFINITION 5.5.1 A
i. t ∈ 𝒯 is said to be ascending just in case for all n, m ∈ N, if t_n, t_m ∈ N and n ≤ m, then t_n ≤ t_m.
ii. The evidential relation {(t, L) | rng(t) = L and t is ascending} is called ascending text. If (t, L) ∈ ascending text, then t is called an ascending text for L.
PROPOSITION 5.5.1A
Proof
i, ii. Let L_n = N − {n}, and let ℒ = {N} ∪ {L_n | n ∈ N}. By proposition 2.2A, ℒ ∉ [ℱ, text]. However, ℒ ∈ [ℱ^rec, ascending text]. The function that witnesses this merely conjectures N unless a gap has appeared in the ascending sequence seen so far. In this case it makes the appropriate conjecture.
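The witnessing function just described might be sketched as follows (the tags 'N' and ('N minus', n) stand in for indices; gap_learner is an illustrative name):

```python
def gap_learner(sigma):
    """On an ascending text, conjecture N unless a gap has appeared:
    once some number above n has occurred without n itself occurring,
    the text can only be for N - {n}."""
    nums = [x for x in sigma if x != '#']
    if nums:
        for n in range(max(nums)):
            if n not in nums:
                return ('N minus', n)
    return 'N'
```

On an ascending text for N no gap ever appears below the maximum seen, so the learner holds to 'N'; on an ascending text for N − {n}, the gap at n is revealed as soon as n + 1 appears.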
iii. To prove this, we note that there is an analog of the notion of locking sequence for ascending texts. Namely, suppose that φ identifies L on ascending text. Then there is a sequence σ such that σ is ascending, rng(σ) ⊆ L, W_{φ(σ)} = L, and whenever τ is such that σ ⌢ τ can be extended to an ascending text for L, W_{φ(σ ⌢ τ)} = L (in fact φ(σ ⌢ τ) = φ(σ)). The proof of this is similar to that of proposition 2.1A and exercises 5.4.1B and 5.4.3A. Now iii is easy, for let σ be such a locking sequence for N and φ. Then φ does not identify rng(σ) on ascending text. □
The preceding results should be compared to proposition 2.2A.
Exercises
5.5.1A t ∈ 𝒯 is said to be strictly ascending just in case for all n, m ∈ N, if t_n, t_m ∈ N and n < m, then t_n < t_m. The evidential relation {(t, L) | rng(t) = L and t is strictly ascending} is called strictly ascending text. Prove:
a. [ℱ, strictly ascending text] = [ℱ, ascending text].
b. [ℱ^rec, strictly ascending text] = [ℱ^rec, ascending text].
5.5.1B
a. Let L ∈ RE be finite and nonempty. What are the cardinalities of {t ∈ 𝒯 | t ascending and for L} and {t ∈ 𝒯 | t strictly ascending and for L}?
b. Let L ∈ RE be infinite. What are the cardinalities of {t ∈ 𝒯 | t ascending and for L} and {t ∈ 𝒯 | t strictly ascending and for L}?
DEFINITION 5.5.2A
i. A text t is said to be recursive just in case {⟨n, t_n⟩ | n ∈ N} is a recursive set.
ii. The evidential relation {(t, L) | rng(t) = L and t is recursive} is called recursive text. If (t, L) ∈ recursive text, then t is called a recursive text for L.
Proof Of course it is clear that [ℱ^rec, text] ⊆ [ℱ^rec, recursive text]. Given ℒ ∈ [ℱ^rec, recursive text] and ψ ∈ ℱ^rec which identifies ℒ on recursive text, we now claim that there is a φ ∈ ℱ^rec which identifies ℒ on arbitrary text.
To see this, first notice that if ψ identifies L on recursive text, then there is a locking sequence σ for ψ and L. This is because the construction of proposition 2.1A can be made effective. Namely, if there is no locking sequence σ for ψ and L, then there is a recursive text t for L on which ψ does not converge. Now φ may be constructed as in the proof of proposition 4.6.3A, the locking-sequence-hunting construction. By corollary 4.6.3A, given ψ ∈ ℱ^rec, there is a φ ∈ ℱ^rec such that for every L ∈ ℒ, φ converges on any text for L to i = ψ(σ), where σ is the least locking sequence for ψ and L. Thus φ identifies ℒ on arbitrary text. □
Exercises
5.5.2A Let ℰ ⊆ 𝒯 × RE be such that (a) for all L ∈ RE, {t | (t, L) ∈ ℰ} is denumerable, and (b) for all L, L' ∈ RE, if L ≠ L', then {t | (t, L) ∈ ℰ} ∩ {t | (t, L') ∈ ℰ} = ∅. Prove that RE ∈ [ℱ, ℰ]. This result generalizes proposition 5.5.2A.
*5.5.2B (Gold 1967) The evidential relation {(t, L) | rng(t) = L and {⟨n, t_n⟩ | n ∈ N} is primitive recursive} is called primitive recursive text. Show that RE ∈ [ℱ^rec, primitive recursive text]. What is the appropriate generalization of this result?
5.5.2C Show directly (without the use of proposition 5.5.2B) that RE_svt ∉ [ℱ^rec, recursive text]. (Hint: Modify the construction in the proof of proposition 4.2.1B.)
DEFINITION 5.5.3A The evidential relation {(t, L) | rng(t) = L and t is not recursive} is called nonrecursive text. If (t, L) ∈ nonrecursive text, then t is called a nonrecursive text for L.
The sequence of utterances actually produced by children's caretakers
depends heavily on external environmental events. Such environmental
influences might seem guaranteed to introduce a random component into
naturally occurring texts. Such texts would be nonrecursive, perhaps
strongly so. It is natural to inquire whether limitation to nonrecursive text
facilitates identification. The next proposition provides a negative answer
to this question.
PROPOSITION 5.5.3A
i. [ℱ, nonrecursive text] = [ℱ, text].
ii. (Wiehagen 1977) [ℱ^rec, nonrecursive text] = [ℱ^rec, text].
Proof For the proof of i, suppose that ℒ ∈ [ℱ, nonrecursive text] and that this is witnessed by φ. We will suppose that ∅ ∉ ℒ; the other case is easily handled. We will establish that ℒ ∈ [ℱ, text] just as in proposition 5.5.2B; namely, we will show that for every L ∈ ℒ there is a locking sequence σ for φ and L. Suppose otherwise. We will derive a contradiction by showing that there are uncountably many texts t that φ does not identify and hence that there are nonrecursive texts that φ does not identify. First note that the nonexistence of a locking sequence for φ and L implies that for every σ with rng(σ) ⊆ L there are sequences τ and τ' such that rng(τ) ⊆ L, rng(τ') ⊆ L, τ and τ' extend σ, φ(τ) ≠ φ(σ) or W_{φ(τ)} ≠ L, φ(τ') ≠ φ(σ) or W_{φ(τ')} ≠ L, and, finally, τ and τ' are incompatible. To see this, let n be any fixed element of L. Since σ ⌢ n and σ ⌢ # are not locking sequences for L, they can be extended to τ and τ', respectively, by elements of L ∪ {#} such that φ(τ) ≠ φ(σ) or W_{φ(τ)} ≠ L, and φ(τ') ≠ φ(σ) or W_{φ(τ')} ≠ L. τ and τ' have the desired properties. Now we simply apply this splitting property iteratively to get uncountably many texts for L. Applying the principle with σ = ∅ yields τ⁰ and τ¹, which are incompatible and for which φ(τⁱ) ≠ φ(∅) or W_{φ(τⁱ)} ≠ L, i = 0, 1. Let s_0, s_1, ..., be an enumeration of L. Applying the splitting property to both τ⁰ ⌢ s_0 and τ¹ ⌢ s_0 yields τ⁰⁰, τ⁰¹, τ¹⁰, τ¹¹, all incompatible and such that φ(τ^{ij}) ≠ φ(τⁱ) or W_{φ(τ^{ij})} ≠ L. Continuing this process gives uncountably many texts for L which φ does not identify, namely one for each infinite sequence of 0's and 1's. For instance, τ⁰ ∪ τ⁰⁰ ∪ τ⁰⁰⁰ ∪ ⋯ is one such text.
The proof of ii is virtually identical and is left for the reader. □
Exercises
5.5.3A Call an evidential relation ℰ big just in case (a) ℰ ⊆ {(t, L) | rng(t) = L}, and (b) {(t, L) | rng(t) = L} − ℰ is denumerable. Thus a big evidential relation is "nearly" {(t, L) | rng(t) = L}, that is, nearly text.
Let ℰ be a big evidential relation. Prove:
a. [ℱ, ℰ] = [ℱ, text].
b. [ℱ^rec, ℰ] = [ℱ^rec, text].
DEFINITION 5.5.4A
i. t ∈ 𝒯 is called fat just in case for all i ∈ rng(t), {n | t_n = i} is infinite.
ii. The evidential relation {(t, L) | rng(t) = L and t is fat} is called fat text. If (t, L) ∈ fat text, then t is called a fat text for L.
Thus t is a fat text for L just in case t is a text for L such that every member of L occurs infinitely often in t.
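For a finite language, a fat text is produced by cycling through the elements forever; for an infinite language one would instead dovetail through ever longer initial portions. A sketch of the finite case (fat_prefix is an illustrative name):

```python
def fat_prefix(language, n):
    """The first n members of a fat text for a finite nonempty language:
    cycling guarantees that every element recurs again and again, so in
    the limit each occurs infinitely often."""
    elems = sorted(language)
    return [elems[i % len(elems)] for i in range(n)]
```

Fat text matters for memory-limited learners: since every datum recurs, information that a learner's bounded memory discards is always presented again.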
PROPOSITION 5.5.4A
Fat text is more interesting in the context of memory limitation. The next
proposition shows that the former entirely compensates for the latter.
Case 3. Otherwise, …
(Notice that if t is any text, then as φ is given more and more of t, the components m and D of the conjectures of φ can only change by increasing.) It is easy to see that φ is 1-memory limited, since φ(σ ⌢ n) depends only on φ(σ) and n, and not on σ. Suppose now that L ∈ ℒ and that t is a fat text for L. We will show that φ converges to f(j, D, m) for some j, D, m such that j is the least index for L and D ⊇ D_L.
Suppose first that φ converges on t. Then, since cases 2 and 3 both result in a change of conjecture, φ must eventually be in case 1 forever on t and so must converge to some f(i, D, m) such that W_i = L', D_{L'} ⊆ D, and D ∪ {n} ⊆ L' for every n that appears on t after the point at which φ begins to converge. Since t is a fat text for L, this implies that D_{L'} ⊆ L ⊆ L'. But by the property of D_{L'}, this implies that L' = L, and so φ converges to an index for L.
Suppose then that φ does not converge on t. This implies that on t case 2 or case 3 happens infinitely often, so that φ makes conjectures with arbitrarily large m. Suppose that x ∈ L and n is such that φ(t̄_n) = f(i, D, m), m ≥ x, and t_n = x. Such an n must exist, since t is a fat text for L. Then φ(t̄_{n+1}) = f(i', D ∪ {x}, m') for some i', m'. Thus every x ∈ L is eventually added to the sets D of φ's conjectures; that is, there is an n_0 such that if n > n_0, then φ(t̄_n) = f(i, D, m) for some D ⊇ D_L. This implies that for all n > n_0, φ(t̄_n) will be defined either by case 1 or by case 2 applied to a language L' of index ≤ j, since L will satisfy the condition of case 2 for all such n. However, for each language L' of index < j, φ will eventually abandon L', since otherwise φ would converge to L', and we argued that this does not happen. Thus φ will eventually conjecture L on t̄_{n_1} for some n_1, and then φ will be in case 1 for all t̄_n such that n ≥ n_1. φ will then change its conjecture at most finitely often after t̄_{n_1} and will converge to f(j, D, m) for some D, m. □
Exercises
5.5.4A t ∈ 𝒯 is called lean just in case for all n, m ∈ N, if t_n, t_m ∈ N and n ≠ m, then
t_n ≠ t_m. Thus lean texts never repeat a number. The evidential relation {(t, L) | rng(t) =
L and t is lean} is called lean text. Prove:
a. [ℱ, lean text] = [ℱ, text].
b. [ℱ^rec, lean text] = [ℱ^rec, text].
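A sketch of the natural construction behind part a, under the simplifying assumption that L is infinite (for finite L a lean text would need pause symbols, omitted here; `make_lean` is an illustrative name):

```python
from itertools import count, islice

def make_lean(text):
    """Delete repetitions from a text; for an infinite language the
    result is a lean text for the same language."""
    seen = set()
    for x in text:
        if x not in seen:
            seen.add(x)
            yield x

t = (n % 5 + n // 5 for n in count())   # a highly repetitive text for N
assert len(set(islice(make_lean(t), 10))) == 10   # no repeats
```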
5.5.4B Let i ≤ j. t ∈ 𝒯 is called mixed(i, j) just in case for all k ∈ N, {t_k, t_{k+1}, ..., t_{k+j}}
contains at least i + 1 different numbers. The evidential relation {(t, L) | rng(t) = L
and t is mixed(i, j)} is called mixed(i, j) text. Mixed text generalizes lean text.
Suppose that ℒ ⊆ RE contains only infinite languages, and let i ≤ j be given.
Prove:
a. ℒ ∈ [ℱ, mixed(i, j) text] if and only if ℒ ∈ [ℱ, text].
b. ℒ ∈ [ℱ^rec, mixed(i, j) text] if and only if ℒ ∈ [ℱ^rec, text].
5.6 Informants
In section 1.3.3 we noted that arbitrary texts for a language L do not provide
the learner with direct information about L̄. This feature of texts is motivated
by empirical studies suggesting the absence from children's environments
of overt information about ungrammatical strings (see the references
cited in section 1.3.3). In other learning situations, however, the foregoing
property of texts is less justified. In mastering the extensions of certain
concepts, for example, the child may expect explicit correction for false
attributions; other examples may be drawn from scientific settings. We are
thus led to consider environments for a language L that provide such
information about L̄.
Thus informants are special kinds of texts, but an informant for a language
L is not normally a text for L.
Example 5.6.1A
a. Let t ∈ 𝒯 be such that for all n ∈ N, t_n = ⟨n, 0⟩ if n is even, and t_n = ⟨n, 1⟩ if n is
odd. Then t is an informant for the set of even numbers.
b. Let t ∈ 𝒯 be such that rng(t) = {⟨i, 0⟩ | i ∈ N}. Then t is an informant for N.
114 Identification Generalized
LEMMA 5.6.1A Let L ∈ RE and t ∈ 𝒯 be given. Then t is an informant for L
if and only if rng(t) represents the characteristic function of L.
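The lemma can be illustrated with a sketch (not from the book), encoding the pair ⟨x, b⟩ as a tuple (x, b), with b = 0 for members and b = 1 for non-members, following the convention of example 5.6.1A:

```python
from itertools import count

def informant(chi):
    """Yield an informant for the language decided by chi:
    (x, 0) when x is in L, (x, 1) when it is not."""
    for x in count():
        yield (x, 0) if chi(x) else (x, 1)

def member(pairs, x):
    """Recover membership of x by reading the informant until x is
    mentioned (every x is mentioned eventually)."""
    for (y, b) in pairs:
        if y == x:
            return b == 0

evens = informant(lambda x: x % 2 == 0)
assert member(evens, 10) is True
```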
Example 5.6.1B
Let φ ∈ ℱ be defined as follows. For all σ ∈ SEQ, φ(σ) is the smallest index for
{x | ⟨x, 0⟩ ∈ rng(σ)}. Then φ identifies RE_fin on informant.
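The idea of the example in miniature (a sketch; we return the conjectured finite set itself rather than its smallest index, since indexes presuppose a fixed acceptable numbering):

```python
def conjecture(sigma):
    """Guess the language consisting of the 0-marked data seen so far."""
    return frozenset(x for (x, b) in sigma if b == 0)

# Initial segments of an informant for the finite language {1, 3}:
inf = [(0, 1), (1, 0), (2, 1), (3, 0), (4, 1), (5, 1)]
guesses = [conjecture(inf[:n]) for n in range(1, len(inf) + 1)]
assert guesses[-1] == frozenset({1, 3})   # converged to the target
```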
Exercises
5.6.1A Show that if t ∈ 𝒯 is an informant for L ∈ RE, then L is recursive if and only
if rng(t) ∈ RE_svt.
5.6.1B Prove the following:
a. {N} ∪ RE_fin ∈ [ℱ^rec, informant].
b. {N} ∪ {N − {x} | x ∈ N} ∈ [ℱ^rec, informant].
5.6.1C Prove: [ℱ^rec, text] ⊂ [ℱ^rec, informant].
Proof If σ ∈ SEQ, we say that σ is consistent with W_j just in case ⟨x, 0⟩ ∈ rng(σ)
implies that x ∈ W_j and ⟨x, 1⟩ ∈ rng(σ) implies that x ∉ W_j. Then we define φ
by
Obviously, on any informant t for W_j, φ converges to the least index for W_j. □
Exercises
5.6.2A Let SEQ* = {t̄_n | n ∈ N and t is an informant}. Prove the following variant of
proposition 2.1A. Let φ ∈ ℱ identify L ∈ RE on informant. Then there is σ ∈ SEQ*
with the following properties: (a) {x | ⟨x, 0⟩ ∈ rng(σ)} ⊆ L; (b) W_{φ(σ)} = L; and (c) for
all τ ∈ SEQ* such that {x | ⟨x, 0⟩ ∈ rng(τ)} ⊆ L, φ(σ ∧ τ) = φ(σ).
5.6.2B Prove: [ℱ^rec, informant]_svt = [ℱ^rec, text]_svt.
5.6.2C t ∈ 𝒯 is called an imperfect informant for L ∈ RE just in case t is an informant
for a finite variant of L. The evidential relation {(t, L) | t is an imperfect informant for
L} is called imperfect informant. Prove: There is ℒ ⊆ RE such that (a) every L ∈ ℒ is
infinite, (b) for every L, L′ ∈ ℒ, if L ≠ L′, then L ∩ L′ = ∅, and (c) ℒ ∈ [ℱ^rec, text] −
[ℱ^rec, imperfect informant]. (Hint: See the proof of proposition 5.4.3B.)
5.6.2D Recall the evidential relation ascending text from definition 5.5.1A. Prove:
a. [ℱ, ascending text] ⊂ [ℱ, informant].
b. [ℱ^rec, ascending text] ⊂ [ℱ^rec, informant].
*5.6.2E An oracle for a language L is an agent that correctly answers questions of
the form "x ∈ L?" in finite time. Conceive of a learner l as a device that queries an
oracle for an unknown language L an infinite number of times, producing a
conjectured index after each query is answered. l is said to identify L on oracle just in
case (a) l never fails to produce a conjecture after each answered query, (b) for some
i ∈ N, all but finitely many of l's conjectures are i, and (c) L = W_i. Identification of
collections of languages on oracle is defined straightforwardly. (All of this is drawn
from Gold 1967; variants are possible.) For 𝒢 ⊆ ℱ let the class of collections ℒ of
languages such that some φ ∈ 𝒢 identifies ℒ on oracle be denoted [𝒢, oracle].
Prove:
a. [ℱ, oracle] = [ℱ, informant].
b. [ℱ^rec, oracle] = [ℱ^rec, informant].
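One direction of part a can be sketched as follows (illustrative names, not the book's construction): since an informant eventually mentions every number, a learner can answer its own oracle queries by reading the informant far enough.

```python
def oracle_from_pairs(pairs_seen):
    """Build an oracle for L from informant data already read."""
    table = dict(pairs_seen)            # x -> 0 (in L) or 1 (not in L)
    def oracle(x):
        return table[x] == 0            # answers "x in L?"
    return oracle

ask = oracle_from_pairs([(0, 0), (1, 1), (2, 0)])
assert ask(2) is True and ask(1) is False
```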
*5.6.2F Consider a learning paradigm intermediate between oracles (in the sense
of exercise 5.6.2E) and text. In this case the learner is presented with an arbitrary text
for a language L but is allowed, in addition, to pose any finite number of questions of
the form "x ∈ L?" each to be answered appropriately in finite time. Identification
may be defined straightforwardly for this situation. The corresponding class of
identifiable collections of languages is denoted "[𝒢, text with finite oracle]," for
𝒢 ⊆ ℱ. Prove:
Proof This proof is essentially the same as that of proposition 4.4.1F. The
relevant collection is ℒ = {N} ∪ {D | D finite}. ℒ is easily identified on
informant by a recursive function φ: φ(σ) is an index for N unless there is a pair ⟨x, 1⟩ in
rng(σ) for some x, in which case φ conjectures {x | ⟨x, 0⟩ ∈ rng(σ)}. Suppose
that ℒ ∈ [ℱ^memory-limited, informant]. Then by exercise 5.6.2A there is a
locking sequence for each L ∈ ℒ and φ. Let σ be such a locking sequence
for N, let D = {x | ⟨x, 0⟩ ∈ rng(σ)}, and let σ′ be such that τ = σ ∧ σ₀ ∧ σ′ is a
locking sequence for D. Now choose n ∉ {x | ⟨x, i⟩ ∈ rng(τ) for some i}. Then
φ(σ ∧ σ₀ ∧ σ′) = φ(σ ∧ ⟨n, 0⟩ ∧ σ₀ ∧ σ′), and this is an index for D. However, if we
now complete σ ∧ ⟨n, 0⟩ ∧ σ₀ ∧ σ′ to an informant for D ∪ {n} with pairs
⟨m, 1⟩, m ∉ D ∪ {n}, we must have that φ converges on this informant to an
index for D, by memory limitation and the property of τ. □
COROLLARY 5.6.3B [ℱ^memory-limited, informant] ⊂ [ℱ, informant].
An "effective" version of the preceding corollary may also be proved.
PROPOSITION 5.6.3C [ℱ^rec ∩ ℱ^memory-limited, informant] ⊂ [ℱ^rec, informant].
Proof The proof of this proposition is left to the reader. □
Exercises
5.6.3B Prove that there is ℒ ⊆ RE such that (a) every L ∈ ℒ is infinite, and (b)
ℒ ∈ [ℱ^rec, informant] − [ℱ^rec ∩ ℱ^nontrivial, informant]. (Hint: See the proof of
proposition 4.3.2A.)
5.6.3C Prove that [ℱ^rec ∩ ℱ^conservative, informant] ⊂ [ℱ^rec, informant]. (Hint: See
the proof of proposition 4.5.1B.)
5.6.3D Prove that [ℱ^rec ∩ ℱ^cautious, informant] ⊂ [ℱ^rec, informant]. (Hint: See
the proof of proposition 4.5.4B.)
5.6.3E Prove that [ℱ^reliable, text] ⊂ [ℱ^rec ∩ ℱ^reliable, informant] ⊂ [ℱ^rec,
informant]. (Hint: See section 4.6.1.)
5.6.3F Prove that [ℱ^rec, informant] = [ℱ^rec ∩ ℱ^order-independent, informant]. (Hint:
Don't use the construction given in the proof of proposition 4.6.3A; although that
construction can be successfully modified for the present case, a simpler
construction is available.)
DEFINITION 6.1.1A Let φ ∈ ℱ, t ∈ 𝒯, and S ⊆ N be given. φ is said to end in
S on t just in case (i) φ is defined on t, and (ii) φ(t̄_m) ∈ S for all but finitely
many m ∈ N.
Thus φ ends in S on t just in case φ(t̄_m)↓ for all m ∈ N, and there is n ∈ N such
that φ(t̄_m) ∈ S for all m ≥ n. More intuitively, φ ends in S on t just in case φ
eventually produces on t an unbroken, infinite sequence of conjectures
drawn from S.
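Over a finite prefix of conjectures one can check the corresponding finite approximation of "ends in S" (only an approximation: the definition quantifies over all but finitely many of infinitely many conjectures). A minimal sketch:

```python
def ends_in_on_prefix(conjectures, S):
    """True iff, within this prefix, all conjectures from some point on
    lie in S (a necessary finite-horizon test for "ends in S")."""
    tail_start = 0
    for k, c in enumerate(conjectures):
        if c not in S:
            tail_start = k + 1          # the S-tail must start later
    return tail_start < len(conjectures)

assert ends_in_on_prefix([1, 4, 2, 2, 2], {2}) is True
```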
Example 6.1.1A
a. Let E be the set of even numbers. Then φ ∈ ℱ ends in E on t ∈ 𝒯 just in case φ is
defined on t and φ(t̄_m) is even for all but finitely many m ∈ N.
b. Let g be the function defined in the proof of proposition 1.4.3B, let t = 1, 2, 3, 4, ...,
and let n₀ be the smallest index for N − {0}. Then g ends in {n₀} on t. More
generally, φ ∈ ℱ converges on t ∈ 𝒯 to n ∈ N if and only if φ ends in {n} on t.
c. If φ ∈ ℱ ends in S ⊆ N on t ∈ 𝒯, then φ ends in S′ ⊆ N on t for all S′ ⊇ S.
DEFINITION 6.1.1B
Example 6.1.1B
Example 6.1.2A
a. Let 𝒞 be as in part a of example 6.1.1B. Then φ ∈ ℱ 𝒞-identifies L ∈ RE on text just
in case φ 𝒞-converges to L on every text for L, that is, just in case φ converges to an
index for rng(t) on every text t for L. Thus φ 𝒞-identifies L on text if and only if φ
identifies L in the sense of definition 1.4.2A.
b. Let 𝒞′ be as in part b of example 6.1.1B. Then φ ∈ ℱ 𝒞′-identifies L ∈ RE on noisy
text just in case φ 𝒞′-converges to L on every noisy text for L, that is, just in case φ
converges to an index for a finite variant of L on every noisy text for L.
c. Let 𝒞″ be as in part c of example 6.1.1B. Then φ ∈ ℱ 𝒞″-identifies L ∈ RE on
incomplete text just in case φ 𝒞″-converges to L on every incomplete text for L,
that is, just in case φ ends in the set of indexes for L on every incomplete text for L.
i. The class {ℒ ⊆ RE | some φ ∈ 𝒢 𝒞-identifies ℒ on ℰ} is denoted:
[𝒢, ℰ, 𝒞].
ii. The class {ℒ ⊆ RE_svt | some φ ∈ 𝒢 𝒞-identifies ℒ on ℰ} is denoted:
[𝒢, ℰ, 𝒞]_svt.
Finally, we provide a name for {(L, {n}) | W_n = L}, the convergence criterion
proper to the identification paradigm. The intuitive significance of the
name will become apparent as we consider alternative convergence criteria
in the sections that follow.
Example 6.1.2B
a. [ℱ, text, INT] is the family of all identifiable collections of languages (in the sense
of definition 1.4.3A).
b. Let 𝒞′ be as in part b of example 6.1.1B. Then [ℱ^rec, noisy text, 𝒞′] is the family of
all collections ℒ of languages such that for some recursive learning function φ, and
for every noisy text t for a language L in ℒ, φ converges on t to an index for a finite
variant of L.
c. Let 𝒞″ be as in part c of example 6.1.1B. Then [ℱ^consistent, incomplete text, 𝒞″] is
the family of all collections ℒ of languages such that for some consistent learning
function φ, and for every incomplete text t for a language L in ℒ, φ ends on t in the
set of indexes for L.
Exercises
6.1.2B Specify a convergence criterion 𝒞 such that (a) 𝒞 ⊂ INT and (b)
[ℱ^rec, text, 𝒞] = [ℱ^rec, text, INT].
6.1.2C Prove: Let evidential relation ℰ and convergence criterion 𝒞 be given. Then
[ℱ^rec ∩ ℱ^total, ℰ, 𝒞] = [ℱ^rec, ℰ, 𝒞] (cf. proposition 4.2.1B).
rng(τ) ⊆ L, then φ(σ ∧ τ) ∈ S. (In the foregoing situation σ is called a 𝒞-locking
sequence for φ and L on text.)
6.1.2E Let convergence criterion 𝒞 be given. Show that [ℱ^rec, text, 𝒞] ⊆
[ℱ^rec, informant, 𝒞]. (For informants, see definition 5.6.1A.)
DEFINITION 6.2A The convergence criterion {(L, {n}) | L and W_n are finite
variants} is called finite difference, intensional, abbreviated to: FINT.
PROPOSITION 6.2.1A
Proof Note that INT ⊆ FINT. Hence by exercise 6.1.2A it suffices to show
that the inclusions are proper. Let ℒ = {N − D | D finite}. By proposition
2.2A and lemma 2.2A, ℒ ∉ [ℱ, text, INT] ⊇ [ℱ^rec, text, INT]. Let n be an
index for N. Define h ∈ ℱ^rec by h(τ) = n for all τ ∈ SEQ. Then h FINT-
identifies ℒ on text. Thus ℒ ∈ [ℱ^rec, text, FINT] ⊆ [ℱ, text, FINT],
which establishes i and ii. □
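The trick in the proof is that a single fixed conjecture can be finitely wrong on every language in ℒ. A sketch (the returned index is a hypothetical placeholder; we check the finite-difference claim on a finite window of N):

```python
def h(tau):
    """The constant learner of the proof: always the same conjecture,
    here the placeholder 0 standing for a fixed index n with W_n = N."""
    return 0

# Within any finite window, N and N - D differ exactly on D:
window = set(range(100))
target = window - {3, 7}                 # N - D for D = {3, 7}
assert window.symmetric_difference(target) == {3, 7}   # finite error
assert h([(1,)]) == h([])                # h never changes its mind
```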
On the other hand, FINT does not allow the identification of RE.
The next proposition shows that ℱ^rec is restrictive with respect to FINT
and text. It is a corollary of proposition 6.5.1C (see also exercise 6.2.1H).
Exercises
6.2.1A ℒ, ℒ′ ⊆ RE are said to be finite analogues just in case (a) for all L ∈ ℒ there
is L′ ∈ ℒ′ such that L and L′ are finite variants, and (b) for all L′ ∈ ℒ′ there is L ∈ ℒ
such that L and L′ are finite variants. Now let ℒ, ℒ′ ⊆ RE be finite analogues and
suppose that φ ∈ ℱ is such that ℒ ⊆ ℒ_{text, INT}(φ).
a. Does it follow that ℒ′ ⊆ ℒ_{text, INT}(φ)?
b. Does it follow that ℒ′ ∈ [ℱ, text, INT]?
6.2.1B Prove:
a. [ℱ^confident, text, INT] ⊂ [ℱ^confident, text, FINT].
b. Let ℒ ⊆ RE be a w.o. chain (in the sense of exercise 4.6.2B) such that for infinitely
many L, L′ ∈ ℒ, L and L′ are not finite variants. Then ℒ ∉ [ℱ^confident, text, FINT].
Conclude that [ℱ^rec, text, INT] ⊄ [ℱ^confident, text, FINT].
6.2.1D Prove:
a. [ℱ^rec, text, INT] ⊄ [ℱ^rec ∩ ℱ^consistent, text, FINT].
*b. [ℱ^rec, text, INT] ⊄ [ℱ^rec ∩ ℱ^cautious, text, FINT].
*6.2.1E
a. ℒ ⊆ RE is called finite-difference saturated just in case ℒ ∈ [ℱ, text, FINT], and
for all ℒ′ ⊆ RE, if ℒ ⊂ ℒ′, then ℒ′ ∉ [ℱ, text, FINT]. Exhibit a finite-difference
saturated collection of languages (cf. exercise 2.2E).
b. ℒ ⊆ RE is called finite-difference maximal just in case ℒ ∈ [ℱ, text, FINT], and
Criteria of Learning 125
for some L ∈ RE, ℒ ∪ {L} ∉ [ℱ, text, FINT]. Exhibit a finite-difference maximal
collection of languages different from the collection exhibited in part a (cf. exercise
4.6.2C).
6.2.1F For n ∈ N the criterion {(L, {i}) | (L − W_i) ∪ (W_i − L) has no more than n
members} is denoted FINT(n). Prove:
a. If n < m, then [ℱ, text, FINT(n)] ⊂ [ℱ, text, FINT(m)].
b. ⋃_{n∈N} [ℱ, text, FINT(n)] ⊂ [ℱ, text, FINT].
*c. (Case and Smith 1983) Let n < m. Then [ℱ^rec ∩ ℱ^Popperian, text, FINT(m)]_svt −
[ℱ^rec, text, FINT(n)]_svt ≠ ∅.
*d. (Case and Smith 1983) [ℱ^rec ∩ ℱ^Popperian, text, FINT]_svt − ⋃_{n∈N} [ℱ^rec, text,
FINT(n)]_svt ≠ ∅.
*6.2.1G φ ∈ ℱ is called FINT-order independent just in case for all L ∈ RE, if φ
FINT-identifies L on text, then for all texts t, t′ for L, φ converges on both t and t′ to
the same index (cf. definition 4.6.3A). Prove the following variant of proposition
4.6.3A: [ℱ^rec ∩ ℱ^FINT-order-independent, text, FINT] = [ℱ^rec, text, FINT].
PROPOSITION 6.2.2A
Proof Let ℒ = {N} ∪ {D − {0} | D finite}. It is clear that ℒ ∈ [ℱ^rec, text,
INT]. But suppose that ℒ ∈ [ℱ, noisy text, FINT]. Observe that every text
for a language in {N} ∪ RE_fin is a noisy text for some language in ℒ. But
from this it follows that {N} ∪ RE_fin ∈ [ℱ, text, FINT], contrary to the proof
of proposition 6.2.1B. □
PROPOSITION 6.2.2B
Exercises
6.2.2D Prove: [ℱ, imperfect text, FINT] = [ℱ, noisy text, FINT]. (Hint: See
proposition 5.4.3A.)
Suppose that φ ∈ ℱ FINT-identifies L ∈ RE_svt on text, and let t be a text for
L. It does not follow that φ converges on t to an index for a total, single-
valued language. Since φ is allowed a finite margin of error, φ may well
converge on t to an index for a language that represents a properly partial function. This
is a useful fact to bear in mind when thinking about the results of the present
subsection (cf. exercise 6.2.3B).
In view of proposition 1.4.3C, [ℱ, text, INT]_svt = [ℱ, text, FINT]_svt. In
contrast, the next proposition shows that INT-identification on text and
FINT-identification on text can be distinguished in the context of ℱ^rec.
DEFINITION 6.2.3A A partial recursive function ψ is called almost self-naming just in case ψ(0)↓
and ψ and φ_{ψ(0)} are finite variants. The collection {L ∈ RE_svt | L represents an
almost self-naming function} is denoted RE_asn.
Proof We follow the proof given by Case and Smith. The reader should
compare the construction here to the proof of proposition 4.6.1C(iii),
which gives a similar application of the recursion theorem in a simpler
setting.
We claim that RE_asn ∈ [ℱ^rec, text, FINT]_svt − [ℱ^rec, text, INT]_svt. It is
clear that RE_asn ∈ [ℱ^rec, text, FINT]. Hence it suffices to show that RE_asn ∉
[ℱ^rec, text, INT]_svt.
Suppose, to the contrary, that ψ ∈ ℱ^rec intensionally identifies RE_asn on
text. By lemma 4.2.2B we may assume that ψ is total. We define a total
recursive function f by the recursion theorem such that f is almost self-
naming and, if L represents f, then ψ fails to intensionally identify L. To
apply the recursion theorem, we construct a total recursive function h by the
following algorithm. The algorithm defines φ_{h(i)} in stages indexed by s. In
the description of the algorithm, φ^s_{h(i)} denotes the finite piece of φ_{h(i)} cons-
tructed through stage s; a^s denotes a number we are attempting to withhold
from the domain of φ_{h(i)} at stage s, and x^s denotes the least number n such that n is not in
the domain of φ^s_{h(i)} and n ≠ a^s. Recall from the proof of proposition 4.6.1C
that φ[n] = ⟨⟨0, φ(0)⟩, ..., ⟨n, φ(n)⟩⟩, where it is understood that φ(m)↓ for
every m ≤ n. We also think of ψ as conjecturing indexes for partial recursive
functions rather than the languages representing them.
Construction
Stage 0: φ⁰_{h(i)}(0) = i; a⁰ = 1.
Stage s + 1: Suppose φ^s_{h(i)} (a finite function) and a^s have been defined. We
define φ^{s+1}_{h(i)} by the following three cases.
Case 1. There is a σ such that φ^s_{h(i)}[a^s − 1] ⊆ σ ⊆ (φ^s_{h(i)} ∪ {⟨a^s, 0⟩})[x^s −
1] and ψ(φ^s_{h(i)}[a^s − 1]) ≠ ψ(σ). Then let φ^{s+1}_{h(i)} = φ^s_{h(i)} ∪ {⟨a^s, 0⟩}, and let
a^{s+1} = x^s.
Case 2. The hypothesis of case 1 fails and φ_{ψ(φ^s_{h(i)}[a^s−1]), s}(a^s)↓. Then let φ^{s+1}_{h(i)} =
φ^s_{h(i)} ∪ {⟨a^s, 1 − φ_{ψ(φ^s_{h(i)}[a^s−1])}(a^s)⟩}, and let a^{s+1} = x^s.
Case 3. If neither the hypothesis of case 1 holds nor the hypothesis of case
2 holds, then let φ^{s+1}_{h(i)} = φ^s_{h(i)} ∪ {⟨x^s, 0⟩} and let a^{s+1} = a^s.
Now, by the recursion theorem, let i be such that φ_{h(i)} = φ_i. In the
construction of φ_{h(i)}, either (a) for every s there is an s′ > s such that a^{s′} ≠ a^s,
or (b) there is an s such that for every s′ > s, a^{s′} = a^s. In each case we define
a total recursive function f such that if L represents f, then L ∈ RE_asn and
ψ fails to identify L.
In case (a) holds, φ_{h(i)} is total. Let f = φ_{h(i)} and let L represent f. Clearly
L ∈ RE_asn. Let t be the text for L such that t̄_{n+1} = f[n] for every n. Then, for
infinitely many s, φ_{h(i)} is defined by case 1 or case 2 of the construction.
Hence either ψ changes its conjectures infinitely often on t or, for infinitely
many n, φ_{ψ(t̄_n)} ≠ f. In either case ψ fails to identify L.
In case (b) holds, φ_{h(i)} is defined by case 3 at stage s′ of the construction for
every s′ > s. Hence φ_{h(i)}(n)↓ for all n ≠ a^s. Let f = φ_{h(i)} ∪ {⟨a^s, 0⟩}, and let L
represent f. Again it is clear that L ∈ RE_asn. Let t be the text for L such that
t̄_{n+1} = f[n] for all n. Since φ_{h(i)} is defined by case 3 of the construction for
all s′ > s, ψ converges on t to an index j such that φ_j(a^s)↑. But then ψ fails
to identify L. □
The next proposition shows that even in the context of RE_svt, ℱ^rec is
restrictive for FINT-identification on text.
PROPOSITION 6.2.3B RE_svt ∉ [ℱ^rec, text, FINT]_svt.
Proof The proof of proposition 4.2.1B suffices to prove this result as well,
with only minor modification. Note that the L₀ and L₁ specified there are
members of RE_svt and are not finite variants of one another. □
COROLLARY 6.2.3A [ℱ^rec, text, FINT]_svt ⊂ [ℱ, text, FINT]_svt.
Exercises
L′ differ at only finitely many arguments. (If L ∉ RE_svt, then L cannot be FFD-
identified.) Prove: [ℱ^rec, text, INT]_svt = [ℱ^rec, text, FFD]_svt. Compare this result to
proposition 6.2.3A.
6.2.3C (Case and Smith 1983) For n ∈ N, let the convergence criterion FINT(n) be
defined as in exercise 6.2.1F. Prove the following strengthenings of proposition
6.2.3A:
a. Let n < m. Then [ℱ^rec, text, FINT(n)]_svt ⊂ [ℱ^rec, text, FINT(m)]_svt.
b. ⋃_{n∈N} [ℱ^rec, text, FINT(n)]_svt ⊂ [ℱ^rec, text, FINT]_svt.
6.3.1 EXT-Identification in RE
Proof It suffices to exhibit an ℒ ∈ [ℱ^rec, text, EXT] − [ℱ^rec, text, INT].
Let ℒ = {K ∪ D | D finite}. By lemma 4.2.1C, ℒ ∉ [ℱ^rec, text, INT]. Let
g ∈ ℱ^rec be such that W_{g(τ)} = K ∪ rng(τ). (Such a g may be defined using part
ii of definition 1.2.1A.) Then g EXT-identifies ℒ on text: Let t be a text for
some K ∪ D, D finite. Then for some n, D ⊆ rng(t̄_n), and for every m ≥ n, g(t̄_m)
is an index for K ∪ D. (Note that for m′ > m ≥ n, g(t̄_m) and g(t̄_{m′}) will not, in
general, be the same index for K ∪ D.) □
The next proposition states that ℱ^rec restricts EXT-identification on
text. It will be exhibited as a corollary of proposition 6.5.1C (see also
exercise 6.3.1D).
PROPOSITION 6.3.1D
i. [ℱ^rec, text, FINT] ⊄ [ℱ^rec, text, EXT].
ii. [ℱ^rec, text, EXT] ⊄ [ℱ^rec, text, FINT].
Proof
Exercises
6.3.1A
a. Prove: [ℱ^rec ∩ ℱ^conservative, text, EXT] ⊂ [ℱ^rec, text, EXT].
b. φ ∈ ℱ is said to be extensionally conservative just in case for all σ ∈ SEQ, if
rng(σ) ⊆ W_{φ(σ⁻)}, then W_{φ(σ)} = W_{φ(σ⁻)} (σ⁻ is explained in definition 4.4.1A). Thus an
extensionally conservative learner never abandons a language that generates all the
data seen to date (although a specific grammar may be abandoned at any time).
Conservatism is a special case of extensional conservatism. Prove: [ℱ^rec ∩
ℱ^extensionally-conservative, text, EXT] ⊄ [ℱ^rec, text, INT].
6.3.1B φ ∈ ℱ is said to be extensionally confident just in case for all t ∈ 𝒯 there is
L ∈ RE such that φ ends in {i | W_i = L} on t. Confidence is a special case of exten-
sional confidence. Prove:
a. Let ℒ ⊆ RE be an infinite w.o. chain (see exercise 4.6.2B for the definition of a w.o.
chain). Then ℒ ∉ [ℱ^extensionally-confident, text, EXT]. (Hence the collection of all
finite languages is not a member of this latter class, since it contains an infinite
chain.)
132 Identification Generalized
text, EXT].
c. Let ℒ, ℒ′ ∈ [ℱ^rec ∩ ℱ^extensionally-confident, text, EXT]. Then ℒ ∪ ℒ′ ∈ [ℱ^rec ∩
ℱ^extensionally-confident, text, EXT].
*6.3.1C Let s, t ∈ 𝒯 be given. s is said to be final in t just in case there is n ∈ N such
that s_m = t_{m+n} for all m ∈ N. Intuitively, s is final in t just in case t has s as an infinite
"tail." Let φ ∈ ℱ be defined on t ∈ 𝒯. The infinite sequence of conjectures produced
by φ on t is denoted φ[t]. Formally, φ[t] is the unique s ∈ 𝒯 such that s_n = φ(t̄_n) for
all n ∈ N. Finally, φ ∈ ℱ is said to be extensionally order independent just in case for
all L ∈ RE, if φ extensionally identifies L, then there is s ∈ 𝒯 such that for all texts
t for L, s is final in φ[t]. It can be seen that order independence is a special case
of extensional order independence. Prove: [ℱ^rec ∩ ℱ^extensionally-order-independent, text,
EXT] ⊂ [ℱ^rec, text, EXT].
6.3.1D Prove: Let 𝒢 be a denumerable subset of ℱ. Then [𝒢, text, EXT] ⊂
[ℱ, text, EXT]. (Hint: See the proof of proposition 4.1A.) Note that proposition
6.3.1C follows from this result.
*6.3.1E (Case and Lynes 1982) Prove that there is ℒ ⊆ RE_rec such that
ℒ ∈ [ℱ^rec, text, EXT] − [ℱ^rec, text, FINT]. (For RE_rec, see definition 1.2.2B.) The
foregoing result strengthens proposition 6.3.1D(ii). Its proof is nontrivial.
6.3.1F Prove proposition 6.3.1E.
6.3.1G Let RE_fin^K be as defined in exercise 4.2.1H. Show that
{K} ∪ RE_fin^K ∈ [ℱ^rec, text, EXT] − [ℱ^rec, text, INT].
6.3.1H Prove: There is ℒ ⊆ RE such that (a) every L ∈ ℒ is infinite, and (b)
ℒ ∈ [ℱ^rec, text, EXT] − [ℱ^rec ∩ ℱ^accountable, text, EXT]. (Hint: See the proof of
proposition 4.3.5A.)
Proposition 1.4.3C is enough to show that RE_svt ∈ [ℱ, text, EXT]_svt. On the
other hand:
PROPOSITION 6.3.2A (Case and Smith 1983) RE_svt ∉ [ℱ^rec, text, EXT]_svt.
Proof The proof we give parallels the proof of proposition 4.2.1B. Sup-
pose φ EXT-identifies RE_svt on text. Call a text t orderly just in case there is
an f ∈ ℱ^rec such that t̄_{n+1} = f[n] for every n, and call a sequence σ ∈ SEQ
orderly just in case it is in an orderly text. (See the proof of proposition
6.2.3A for f[n].) For any orderly sequence σ, let s^σ be the orderly text s such
that σ is in s and for every n ≥ lh(σ), s_n = ⟨n, 0⟩, and let t^σ be the orderly text
t such that σ is in t and for every n ≥ lh(σ), t_n = ⟨n, 1⟩. Each of the texts s^σ
and t^σ is for a language in RE_svt. Consequently for every σ there is an
n > lh(σ) and an m such that ⟨n, 0⟩ ∈ W_{φ(s̄^σ_m), m}, and there is an n > lh(σ) and an
m such that ⟨n, 1⟩ ∈ W_{φ(t̄^σ_m), m}. Let p(σ) be the first coordinate of the smallest
pair ⟨n, m⟩ with this property with respect to s^σ, and let q(σ) be likewise with
respect to t^σ. We now define an orderly text t for a language in RE_svt which φ
fails to EXT-identify.
Let σ⁰ = ∅. For n even let σ^{n+1} = s̄^{σⁿ}_{p(σⁿ)}. For n odd let σ^{n+1} = t̄^{σⁿ}_{q(σⁿ)}. Let
t = ⋃ₙ σⁿ. It is clear that t is an orderly text for a language in RE_svt.
In addition, for every n > 0, t_{lh(σⁿ)} ∉ W_{φ(σⁿ)}, which shows that φ fails to
EXT-identify rng(t). □
For the proof of the following proposition the reader may consult Case
and Smith (1983, theorem 3.1). Note the contrast to proposition 6.3.1D(i).
Exercise
6.3.2A (John Steel, cited in Case and Smith 1983) Provide a simple proof for
the following weakening of proposition 6.3.2B: [ℱ^rec, text, FINT]_svt ⊆
[ℱ^rec, text, EXT]_svt. (Hint: The errors of a FINT learner on a text for L ∈ RE_svt
can be discovered and patched.)
Thus φ ∈ ℱ FEXT-identifies L ∈ RE on text just in case for all texts t for L, φ
ends on t in the set of indexes for some one finite variant of L. Intuitively, to
PROPOSITION 6.3.3A
Exercise
Thus φ ∈ ℱ BEXT-identifies L ∈ RE on text just in case for all texts t for L, φ
ends on t in {i | W_i = L and i ≤ n} for some n ∈ N such that n is at least as big
as the least index for L.
6.4.1 BEXT-Identification in RE
It is evident that [ℱ, text, BEXT] = [ℱ, text, INT] (= [ℱ, text, EXT]).
The next proposition provides information about the classification of
[ℱ^rec, text, BEXT].
PROPOSITION 6.4.1A
DEFINITION 6.4.1A
LEMMA 6.4.1A For any total recursive functions f and g, there exist i and j
such that W_i = W_{f(i, j)} and W_j = W_{g(i, j)}.
Proof See Rogers (1967, sec. 11.4, theorem X(a)). □
will define two total recursive functions f and g. The definitions of f and g
rely on the following construction.
Stage 0
σ⁰ = ⟨⟨1, i⟩, ⟨2, j⟩⟩.
t⁰ = ⟨⟨1, i⟩, ⟨2, j⟩, ⟨0, 0⟩⟩.
τ⁰ = ∅.
Stage n + 1
Case 1. ψ(σⁿ) ≠ ψ(σⁿ ∧ τⁿ ∧ #). Then let σ^{n+1} = σⁿ ∧ τⁿ ∧ #, t^{n+1} = tⁿ,
and τ^{n+1} = ∅.
Case 2. ψ(σⁿ) = ψ(σⁿ ∧ τⁿ ∧ #) = ψ(σⁿ ∧ tⁿ). Then let σ^{n+1} = σⁿ, t^{n+1} =
tⁿ ∧ ⟨3, n⟩, and τ^{n+1} = τⁿ ∧ #.
Case 3. ψ(σⁿ) = ψ(σⁿ ∧ τⁿ ∧ #) ≠ ψ(σⁿ ∧ tⁿ). Then let σ^{n+1} = σⁿ ∧ tⁿ ∧
⟨0, 2n + 1⟩, t^{n+1} = tⁿ ∧ ⟨⟨0, 2n + 1⟩, ⟨0, 2n + 2⟩⟩, and τ^{n+1} = ∅.
Case a. σⁿ is defined infinitely often by case 1 or case 3 of the construction.
In either case t = ⋃ σⁿ is a text for L_f on which ψ changes its conjecture
infinitely often. Hence ψ fails to FINT-converge to L_f on t.
Case b. σⁿ is not defined infinitely often by either case 1 or case 3 of the
construction. Then σⁿ is defined cofinitely often by case 2 of the construc-
tion. In this case there is an n such that for every m ≥ n, σᵐ = σⁿ. Let s ∈ 𝒯
be such that for every i, s_i = #, and let t = ⋃ tᵐ. Then σⁿ ∧ s is a text for L_f
and σⁿ ∧ t is a text for L_g, and on each of these two texts ψ converges to the
same index. But in this case L_g and L_f are not finite variants. Hence ψ fails
to FINT-identify at least one of L_g and L_f.
iii. Let ℒ = {K ∪ {x} | x ∈ N}. ℒ ∈ [ℱ^rec, text, EXT] (cf. the proof of propo-
sition 6.3.1B). We show, on the other hand, that ℒ ∉ [ℱ^rec, text, BEXT].
Our argument parallels the proof of lemma 4.2.1C. Suppose, to the contrary,
COROLLARY 6.4.1A
Exercises
6.4.1A Prove:
a. [ℱ^rec ∩ ℱ^consistent, text, BEXT] ⊂ [ℱ^rec, text, BEXT]. Compare this result with
proposition 6.3.1E.
b. There is ℒ ⊆ RE such that every L ∈ ℒ is recursive and ℒ ∈ [ℱ^rec, text, INT] −
[ℱ^rec ∩ ℱ^consistent, text, BEXT]. Compare this result with proposition 4.3.3B.
6.4.1B Give an alternative proof of proposition 6.4.1A(ii) by showing that
ℒ = {{⟨0, j⟩} ∪ {⟨x, y⟩ | 0 < x and y ∈ K_j} | j ∈ N} ∪ {{⟨0, j⟩} ∪ {⟨x, y⟩ | 0 < x and
y ∈ N} | j ∈ N} ∈ [ℱ^rec, text, BEXT] − [ℱ^rec, text, FINT]. (Hint: To show that
ℒ ∉ [ℱ^rec, text, FINT], show that otherwise X = {i | W_i = N} would be Σ⁰₂,
contradicting the Π⁰₂-completeness of X. For Σ⁰₂ and Π⁰₂, see chapter 1.)
PROPOSITION 6.4.2A (Barzdin and Podnieks 1973, cited in Case and Smith
1983) [ℱ^rec, text, BEXT]_svt = [ℱ^rec, text, INT]_svt.
Proof We describe the computation of θ on σ ∈ SEQ as follows. Let i be an index for ψ, and let
A = {φ_{i, lh(σ)}(τ) | τ ⊆ σ}. Let B = {n ∈ A | for every m and for every j < lh(σ),
if m ∈ W_{n, lh(σ)} and π₁(m) = π₁(σ_j), then π₂(m) = π₂(σ_j)}. B consists of ψ's
conjectures on σ which are not contradicted by data from σ within running
time bounded by lh(σ). Now θ's conjecture on σ is an index for the recursive
function given by the following computation. For each x ∈ N simultaneously
compute φ_n(x) for n ∈ B, and give as output the result of the earliest
terminating computation (in case of ties give the smallest result, and in case
all these computations diverge, diverge). For every t ∈ 𝒯, if rng(t) is BEXT-
identified by ψ, then θ's behavior on t depends on only finitely many
indexes, so θ INT-identifies rng(t). □
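The amalgamation at the heart of the proof can be sketched as follows (a toy model, not the book's formal construction: a "program" here is a function returning a (steps, value) pair when it halts and None when it diverges; real indexes would use step-bounded simulation in an acceptable numbering).

```python
def amalgamate(programs):
    """One function that, on each x, runs all candidate programs in
    parallel and returns the earliest terminating result, breaking
    ties by the smaller value; diverges (None) if all diverge."""
    def f(x):
        best = None                      # lexicographic (steps, value)
        for p in programs:
            r = p(x)
            if r is not None and (best is None or r < best):
                best = r
        return None if best is None else best[1]
    return f

p1 = lambda x: (5, x + 1)                        # halts in 5 steps
p2 = lambda x: (3, x * 2) if x > 0 else None     # diverges at 0
f = amalgamate([p1, p2])
assert f(0) == 1 and f(4) == 8
```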
Exercise
6.4.3A Prove the following direct consequences of definitions and previous results:
a. [ℱ^rec, text, BEXT] ⊂ [ℱ^rec, text, BFEXT].
b. [ℱ^rec, text, FINT] ⊂ [ℱ^rec, text, BFEXT].
c. [ℱ^rec, text, BFEXT] ⊆ [ℱ^rec, text, FEXT].
d. [ℱ^rec, text, BFEXT] ⊄ [ℱ^rec, text, EXT].
e. [ℱ, text, BFEXT] = [ℱ, text, FINT].
6.5.1 FD-Identification in RE
The principal results for FD-identification are as follows.
For a proof of the following proposition, the reader may consult Osherson
and Weinstein (1982, proposition 5).
Proof The proof of proposition 4.1A establishes this result as well. Note
that the proof of the claim there actually demonstrates that no φ ∈ ℱ FD-
identifies both ℒ_Q and ℒ_{Q′} on text, for Q ≠ Q′. □
Exercise
COROLLARY 6.5.2B
Exercises
6.5.2A (Case and Smith 1983) For n ∈ N, the convergence criterion {(L, {i | (W_i − L) ∪
(L − W_i) has no more than n elements}) | L ∈ RE} is denoted: FD(n). Prove:
a. Let n < m. Then [ℱ^rec, text, FD(n)]_svt ⊂ [ℱ^rec, text, FD(m)]_svt.
b. ⋃_{n∈N} [ℱ^rec, text, FD(n)]_svt ⊂ [ℱ^rec, text, FD]_svt.
6.5.2B Derive proposition 6.5.1D from proposition 6.5.2A. (Hint: Use an internal
simulation argument.)
Thus φ ∈ ℱ BFD-identifies L ∈ RE on text just in case φ produces on every
text for L an infinite, unbroken sequence of indexes, all of them for finite
variants of L and all of them below some fixed bound.
The principal results for BFD-identification are as follows.
PROPOSITION 6.5.3A
The following fact follows directly from definitions 6.4.3A and 6.5.3A.
PROPOSITION 6.5.3C [ℱ^rec, text, BFEXT] ⊆ [ℱ^rec, text, BFD].
Whether the inclusion in proposition 6.5.3C is proper is presently
unknown.
Proof It only needs to be shown that [ℱ^rec, text, BFD]_svt ⊆ [ℱ^rec, text,
FINT]_svt. So suppose ψ ∈ ℱ^rec BFD-identifies ℒ ⊆ RE_svt on text. We
construct a θ ∈ ℱ^rec which FINT-identifies ℒ on text. The construction
is similar to that used in the proof of proposition 6.4.2A. We describe the
computation of θ on σ ∈ SEQ as follows. Let i be an index for ψ, and let
A = {φ_{i, lh(σ)}(τ) | τ ⊆ σ}. For each j ∈ A, let r(j, σ) = card({n < lh(σ) | j =
φ_{i, lh(σ)}(σ̄_n)}), and let d(j, σ) = card({m ∈ W_{j, lh(σ)} | there is an n < lh(σ) such
that π₁(m) = π₁(σ_n) and π₂(m) ≠ π₂(σ_n)}). r(j, σ) is the number of times
j is conjectured by ψ on σ, and d(j, σ) is the number of disagreements
registered by j with data from σ in running time bounded by lh(σ). Let
B = {j ∈ A | d(j, σ) < r(j, σ)}.
Now θ's conjecture on σ is an index for the recursive function given by
the following computation. For each x ∈ N, simultaneously compute φ_j(x),
j ∈ B, and give as output the result of the earliest terminating computation
(in case of ties give the smallest result, and in case all these computations
diverge, diverge). We leave it to the reader to verify that if t is a text for some
L ∈ ℒ, then θ FINT-converges on t to L. □
Exercises
PROPOSITION 6.6B Let f ∈ ℱ^rec be such that f(x) ≥ x for all x ∈ N. Then
[ℱ^rec ∩ ℱ^simpleminded, text, INT] ⊂ [ℱ^rec, text, SIM(f)].
Proof It suffices to show the inclusion is strict, the inclusion itself being a consequence of the definitions. By proposition 4.3.6A every ℒ ∈ [ℱ^rec ∩ ℱ^simpleminded, text, INT] is finite. Hence it suffices to show that there is an infinite collection ℒ ∈ [ℱ^rec, text, SIM(f)], for every total f ∈ ℱ^rec. For each such f, we construct a ψ ∈ ℱ^rec which SIM(f)-identifies RE_fin on text. ψ(σ) is the index i ≤ lh(σ) with the following properties (if there is no such i, let ψ(σ) be 0): (1) W_{i,lh(σ)} = rng(σ), and (2) if j ≤ lh(σ) and W_{j,lh(σ)} = rng(σ), then f(i) ≤ f(j), and if f(i) = f(j), then i ≤ j (i.e., i is the f-smallest index less than lh(σ) for rng(σ) in running time bounded by lh(σ)). It is left to the reader to verify that ψ SIM(f)-identifies RE_fin on text. □
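The ψ just constructed can be illustrated concretely. In the sketch below (an illustration only: the table `W[i][s]` is a toy stand-in for the step-bounded enumerations W_{i,s}, and `f` is any total function; none of these names occur in the text), the learner conjectures the f-smallest index, bounded by lh(σ), whose lh(σ)-step enumeration equals rng(σ), and 0 if there is none.

```python
def sim_learner(f, W):
    """Sketch of the psi in the proof above.  W[i][s] is a stand-in for
    W_{i,s}, the finite portion of the i-th r.e. set enumerated within s
    steps (here just a precomputed table of frozensets).  On sigma,
    conjecture the i <= lh(sigma) with W[i][lh(sigma)] = rng(sigma) that
    is f-smallest (ties broken by index); conjecture 0 if none exists."""
    def psi(sigma):
        s = len(sigma)
        data = frozenset(sigma)
        bound = min(s, len(W) - 1)
        candidates = [i for i in range(bound + 1) if W[i][s] == data]
        if not candidates:
            return 0
        return min(candidates, key=lambda i: (f(i), i))
    return psi
```

The tie-breaking pair (f(i), i) realizes clause (2) of the construction: among equally f-simple indices, the numerically least is preferred.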
Exercises
6.6A (Chen 1982) Prove the following strengthening of proposition 6.6A:
⋃_{total f ∈ ℱ^rec} [ℱ^rec, text, SIM(f)]_svt ⊂ [ℱ^rec, text, INT]_svt.
6.6B (Chen 1982) Let total f ∈ ℱ^rec be given. The convergence criterion {(L, {i | W_i is a finite variant of L and i is f-simple}) | L ∈ RE} is denoted: FINTSIM(f). Prove:
⋃_{total f ∈ ℱ^rec} [ℱ^rec, text, FINTSIM(f)]_svt = [ℱ^rec, text, FINT]_svt.
6.7 Summary
Figures 6.7A and 6.7B summarize the major results of sections 6.2 through
6.6. To interpret SIM(f), let total f ∈ ℱ^rec be given.
Exercise
6.7A For each convergence criterion 𝒞 appearing in figure 6.7A, add [ℱ, text, 𝒞] to the figure.
[Figure 6.7A: a chain of inclusions, read downward, with "?" marking an inclusion whose status is open:
[ℱ^rec, text, BFEXT] ⊆ [ℱ^rec, text, FD] ⊆ [ℱ^rec, recursive text, FD] ⊆ 𝒫(RE).]
Figure 6.7A
6.8.1 CI-Convergence
[Figure 6.7B: chains of inclusions, read downward:
[ℱ^rec, text, SIM(f)]_svt ⊆ [ℱ^rec, text, INT]_svt ⊆ [ℱ^rec, text, FINT]_svt;
[ℱ^rec, text, BFD]_svt ⊆ [ℱ^rec, text, EXT]_svt ⊆ [ℱ^rec, text, FD]_svt;
[ℱ, text, INT]_svt; 𝒫(RE_svt).]
Figure 6.7B
DEFINITION 6.8.1A
Thus φ ∈ ℱ CI-identifies L ∈ RE on text just in case for all texts t for L, φ
Exercises
6.8.1A
a. Prove [ℱ, text, CI] = [ℱ, text, INT] ∩ 𝒫(RE_rec).
b. Prove [ℱ, informant, CI] = [ℱ, informant, INT] ∩ 𝒫(RE_rec) = 𝒫(RE_rec).
6.8.1B
a. Prove [ℱ^rec, text, CI] ⊆ [ℱ^rec, text, INT] ∩ 𝒫(RE_rec).
b. Prove [ℱ^rec, informant, CI] ⊆ [ℱ^rec, informant, INT] ∩ 𝒫(RE_rec).
Proof Given ψ ∈ ℱ^rec, we exhibit L ∈ RE_sd ∩ RE_rec such that ψ does not CI-identify L on informant. The construction of L should be compared to that used in the proof of proposition 4.6.1C.
In preparation for an application of the recursion theorem, we define a total recursive function f such that for all i ∈ N, W_{f(i)} has the following properties:
1. i is the least element of W_{f(i)}.
2. W_{f(i)} is recursive.
3. ψ fails to CI-identify W_{f(i)} on informant.
148 Identification Generalized
Construction
Stage 0:
σ⁰ = ⟨⟨0, 1⟩, …, ⟨i − 1, 1⟩, ⟨i, 0⟩⟩.
W_{f(i)}⁰ = {i}.
Stage s + 1: Let τ be the least σ ∈ SEQ such that σ ≤ s and, for some n, σ = ⟨⟨lh(σˢ), 1⟩, …, ⟨lh(σˢ) + n, 1⟩⟩ and φ_{ψ(σˢ⌢σ),s}(lh(σˢ) + n + 1) = 1, if there is such a σ; let τ = ∅ otherwise.
Let σˢ⁺¹ = σˢ ⌢ τ ⌢ ⟨lh(σˢ) + n + 1, 0⟩ if τ ≠ ∅; = σˢ otherwise.
Let W_{f(i)}ˢ⁺¹ = W_{f(i)}ˢ ∪ {lh(σˢ) + lh(τ) − 1}.
By the recursion theorem, let i be such that W_i = W_{f(i)}. Then W_i ∈ RE_sd ∩ RE_rec and ψ fails to CI-identify W_i. □
Using exercise 6.1.2E, we obtain the following corollary.
Exercises
6.8.2A What is the relation between [ℱ^rec, text, INT] and [ℱ^rec, informant, CI]?
6.8.2B Prove: [ℱ^rec ∩ ℱ^memory-limited, text, CI] ⊂ [ℱ^rec, text, CI]. (Hint: See the proof of proposition 4.4.1F.)
6.8.2C Prove: [ℱ^rec ∩ ℱ^conservative, text, CI] ⊂ [ℱ^rec, text, CI]. (Hint: See the proof of proposition 4.5.1B.)
*6.8.2D Prove: [ℱ^rec ∩ ℱ^Popperian, text, CI] ⊂ [ℱ^rec, text, CI]. (Hint: Consider {{⟨0, j⟩} ∪ ({1} × W_j) | j is a characteristic index for W_j}.)
*6.8.2E (Gold 1967) Let RE_prim be the set of primitive recursive languages. Show that RE_prim ∈ [ℱ^rec, informant, CI].
6.8.2F Let the evidential relation imperfect informant be defined as in exercise 5.6.2C. Show that there is ℒ ⊆ RE such that (a) every L ∈ ℒ is infinite, (b) for every L, L′ ∈ ℒ, if L ≠ L′, then L ∩ L′ = ∅, and (c) ℒ ∈ [ℱ^rec, informant, CI] − [ℱ^rec, imperfect informant, CI].
As Case and Lynes (1982) point out, the interpretation of this proposition is rather remarkable. It implies the existence of a collection of recursive languages for which decision procedures with n + 1 mistakes can be synthesized in the limit (on informant) but for which positive tests with only n mistakes cannot be so synthesized. The proof is omitted.
COROLLARY 6.8.3A [ℱ^rec, informant, FINTCI] ≠ ⋃_{n∈N} [ℱ^rec, informant, FINT(n)].
Naturally [ℱ^rec, informant, FINTCI] ⊆ [ℱ^rec, informant, FINT], and similarly for FINT(n)CI and FINT(n).
The next definition provides the CI counterparts of EXT (definition 6.3A)
and FD(n) (exercise 6.5.2A).
DEFINITION 6.8.3C (Case and Lynes 1982)
i. The convergence criterion {(L, S_L) | every x ∈ S_L is a characteristic index for L} is called extensional characteristic index, abbreviated to: EXTCI.
ii. For n ∈ N, the convergence criterion {(L, S_L) | every x ∈ S_L is an n-finite difference characteristic index for L} is called n-finite difference characteristic index, abbreviated to: FD(n)CI.
It may be observed that the proof of proposition 6.8.3B yields all the results
of section 6.8.2 as corollaries.
PROPOSITION 6.8.3D (Case and Lynes 1982, theorem 5) For every n ∈ N, [ℱ^rec, informant, FD(n + 1)CI] ≠ [ℱ^rec, informant, FD(n)].
7 Exact Learning
The converse of the dictum that natural languages are learnable by children (via casual exposure, etc.) is that nonnatural languages are not learnable. Put differently, the natural languages are generally taken to be the largest collection of child-learnable languages. We are thus led to consider learning
paradigms in which learning functions are required to respond successfully
to all languages in a given collection and to respond unsuccessfully to all
other languages. For this purpose the following definition is central.
DEFINITION 7.1A Let learning strategy 𝒮, evidential relation ℰ, and convergence criterion 𝒞 be given.
Example 7.1A
Suppose that for some evidential relation ℰ and convergence criterion 𝒞, children 𝒞-identify the class of natural languages on ℰ. Then we may require of a theory of human linguistic competence that the collection ℒ of languages it embraces be a member of [ℱ, ℰ, 𝒞]. Should we also require that ℒ ∈ [ℱ, ℰ, 𝒞]^ex? Although the first paragraph of the present section suggests an affirmative response, the matter is clouded by the following consideration.
Natural languages are not only learnable, they are also highly expressive
in the sense that very many thoughts can be communicated within any one of them. Let us therefore stipulate that a language be counted as natural just in case it is both learnable and highly expressive. Now consider the impoverished language consisting of the single expression "Go" with its usual
meaning. The Go-language is not highly expressive. On the other hand, the
Go-language may well be learnable by children through casual exposure. If
so, then not every learnable language is natural, and hence the natural
languages are a proper subset of the class of learnable languages. This
entails that a theory of natural language can be legitimately evaluated
against the standard of identifiability but not against the standard of exact
identifiability.
It may be possible to disarm the foregoing objection to exact learning as
follows. There is evidence that children exposed to inexpressive languages
(e.g., pidgins), as well as children denied access to any ambient language (e.g.,
Exercises
7.1A Let L_i = {⟨i, n⟩ | n ∈ W_i}, and let ℒ = {L_i | W_i ∉ RE_fin}. Show that ℒ ∈ [ℱ^rec, text, INT]^ex. This collection figures in the proof of proposition 4.3.2A.
7.1B Prove: Let ℒ ⊆ RE, learning strategy 𝒮, evidential relation ℰ, and convergence criterion 𝒞 be given, and suppose that ℒ ∈ [𝒮, ℰ, 𝒞]. Then there is ℒ′ ⊆ RE such that ℒ ⊆ ℒ′ and ℒ′ ∈ [𝒮, ℰ, 𝒞]^ex.
7.1C Let learning strategy 𝒮, evidential relation ℰ, and convergence criterion 𝒞 be given.
a. Show that ℒ ∈ [𝒮, ℰ, 𝒞]^ex if and only if there is a φ ∈ 𝒮 such that ℒ = ℒ_{ℰ,𝒞}(φ) (cf. exercise 6.1.2A(ii) for ℒ_{ℰ,𝒞}(φ)).
b. Prove: [𝒮, ℰ, 𝒞]^ex ⊆ [𝒮, ℰ, 𝒞].
c. Prove: card([𝒮, ℰ, 𝒞]^ex) ≤ card(𝒮). Conclude that [ℱ^rec, ℰ, 𝒞]^ex is at most countably infinite.
Exact Learning 155
DEFINITION 7.2A
The following facts will be appealed to in the proofs of the next two
propositions.
PROPOSITION 7.2A ℒ ∈ [ℱ^rec, text, INT]^ex if and only if ℒ ∈ [ℱ^rec, text, INT] and ℒ is Π¹₁ indexable.
assume without loss of generality that ψ is total and order independent. We proceed to define φ(σ) for every σ ∈ SEQ. We first treat the case in which rng(σ) = ∅. Let e be an index for ∅ and e₀ be an index for {0}. If ∅ ∈ ℒ, then φ(σ) = e, and if ∅ ∉ ℒ, then φ(σ) = e₀. For any σ ∈ SEQ, if rng(σ) ≠ ∅, then
We claim that φ identifies ℒ on text exactly. First, φ identifies ℒ on text, for let t ∈ 𝒯 be a text for some L ∈ ℒ. Then ψ converges on t to some a ∈ N. But then W_{f(a)} is a recursive tree, and hence for large enough m, for every n > m, either Z_{t̄_n,φ(t̄_n)} is not a ⊆-chain or card(Z_{t̄_n,φ(t̄_n)}) = card(Z_{t̄_m,φ(t̄_m)}). Consequently, φ converges on t to a.
On the other hand, suppose L ∉ ℒ and, for reductio, suppose that φ identifies L on text. Then ψ must identify L on text. Since ψ is order independent, there is an i such that for every text t for L, ψ converges on t to i. Since W_i = L ∉ ℒ, W_{f(i)} is not a tree. Hence there is an infinite set X ⊆ W_{f(i)} such that X is a ⊆-chain. Let t be a text for L such that ⋃{X_n | n ∈ X} = X. Then by the definition of φ, φ(t̄_n) = e for infinitely many n ∈ N. Hence φ fails to identify t, and therefore φ fails to identify L on text.
Turning to the converse, suppose that ℒ ∈ [ℱ^rec, text, INT]^ex. It follows immediately that ℒ ∈ [ℱ^rec, text, INT]. Using the definition of [ℱ^rec, text, INT]^ex, a straightforward, if tedious, Tarski-Kuratowski computation verifies that ℒ is Π¹₁ indexable. The reader unfamiliar with such computations should consult Rogers (1967, chs. 14, 16) or Shoenfield (1967, chs. 6, 7). □
PROPOSITION 7.2B There is an ℒ ∈ [ℱ^rec, text, INT]^ex such that ℒ is not Σ¹₁ indexable.
Proof Let ℒ = RE_sd ∩ RE_rec. It is clear that ℒ ∈ [ℱ^rec, text, INT] (cf. the proof of proposition 2.3A). A simple Tarski-Kuratowski computation suffices to verify that ℒ is Π¹₁ indexable. It then follows from proposition 7.2A that ℒ ∈ [ℱ^rec, text, INT]^ex.
That ℒ is not Σ¹₁ indexable follows from the boundedness theorem for Σ¹₁ sets and the fact that every r.e. set is a finite variant of some member of RE_sd (cf. lemma 2.3B). The reader may provide more detail for this argument by consulting Shoenfield (1967, ch. 7). □
Exercises
7.2A Let 𝒲 ⊆ 𝒫(RE) be a class of collections of languages, and let ℒ ⊆ RE be given. ℒ is called saturated with respect to 𝒲 just in case ℒ ∈ 𝒲 and, for every proper superset ℒ′ of ℒ, ℒ′ ∉ 𝒲. ℒ is called maximal with respect to 𝒲 just in case ℒ ∈ 𝒲 and, for every L ∈ RE − ℒ, ℒ ∪ {L} ∉ 𝒲 (cf. exercises 2.2E and 4.6.2C).
a. Prove: ℒ ⊆ RE is saturated with respect to [ℱ^rec, text, INT]^ex if and only if ℒ = RE_fin.
b. Prove: If ℒ ⊆ RE is maximal with respect to [ℱ^rec, text, INT]^ex, then ℒ is maximal with respect to [ℱ^rec, text, INT].
c. Show that the converse to b is false.
7.2B Prove that if ℒ ∈ [ℱ^rec, text, INT]^ex and ℒ′ ∈ [ℱ^rec, text, INT]^ex, then ℒ × ℒ′ ∈ [ℱ^rec, text, INT]^ex.
Proof The proposition follows from proposition 4.6.3A and the fact that the φ constructed in the proof of proposition 7.2A is order independent. □
Exercises
*7.3.1A Prove: [ℱ^rec, text, INT]^ex ⊄ [ℱ^rec ∩ ℱ^set-driven, text, INT]. (Hint: See the proof of proposition 4.4.2A.)
*7.3.1B Show that if texts with #'s are excluded from 𝒯, then 7.3.1A fails. (Hint: Construct a collection ℒ of singleton languages with ℒ ∈ [ℱ^rec, text, INT]^ex − [ℱ^rec ∩ ℱ^total, text, INT]^ex. Discover where the proof of proposition 7.2A fails for singleton languages in the absence of texts with #'s.)
7.3.1C Let ℒ ⊆ RE be given. Show that ℒ ∈ [ℱ^rec ∩ ℱ^prudent, text, INT]^ex if and only if ℒ is r.e. indexable and ℒ ∈ [ℱ^rec, text, INT].
7.3.1D
a. Prove: Let ℒ ∈ [ℱ^rec ∩ ℱ^set-driven ∩ ℱ^conservative, text, INT]^ex. Then ℒ is r.e. indexable.
b. Prove: There is ℒ ⊆ RE such that ℒ is not r.e. indexable and ℒ ∈ [ℱ^rec ∩ ℱ^conservative, text, INT]^ex.
c. Conclude from a and b that [ℱ^rec ∩ ℱ^set-driven ∩ ℱ^conservative, text, INT]^ex ⊂ [ℱ^rec ∩ ℱ^conservative, text, INT]^ex.
PROPOSITION 7.3.2A
i. [ℱ^rec, noisy text, INT]^ex ⊂ [ℱ^rec, text, INT]^ex.
ii. [ℱ^rec, incomplete text, INT]^ex ⊂ [ℱ^rec, text, INT]^ex.
iii. [ℱ^rec, imperfect text, INT]^ex ⊂ [ℱ^rec, text, INT]^ex.
Proof For each of the evidential relations ℰ mentioned in i, ii, and iii we have: if ℒ ∈ [ℱ^rec, ℰ, INT]^ex, then ℒ ∈ [ℱ^rec, text, INT] and ℒ is Π¹₁ indexable (see the remark following the proof of proposition 7.2A). Hence each of the inclusions follows by proposition 7.2A.
The strictness of the inclusions in i, ii, and iii may be inferred from proposition 7.2A and the Π¹₁ indexability of the collections of languages constructed in propositions 5.4.1A and 5.4.2A. □
PROPOSITION 7.3.2B [ℱ^rec, recursive text, INT]^ex ⊂ [ℱ^rec, text, INT]^ex.
The following facts about arithmetical sets will be needed in the proof of
proposition 7.3.2B:
Exercises
PROPOSITION 7.3.3A
i. [ℱ^rec, text, INT]^ex ⊆ [ℱ^rec, text, FINT]^ex.
ii. [ℱ^rec, text, INT]^ex ⊆ [ℱ^rec, text, EXT]^ex.
iii. [ℱ^rec, text, FINT]^ex ⊄ [ℱ^rec, text, EXT]^ex.
iv. [ℱ^rec, text, EXT]^ex ⊄ [ℱ^rec, text, FINT]^ex.
It may be that natural languages have such special properties that no text
for a nonnatural language leads children to a correct and stable grammar.
The next definition allows us to formulate one version of this hypothesis.
DEFINITION 7.4A
i. φ ∈ ℱ is said to identify ℒ ⊆ RE very exactly just in case for all t ∈ 𝒯, φ identifies t if and only if t is for some L ∈ ℒ.
ii. For 𝒮 ⊆ ℱ, the class {ℒ ⊆ RE | some φ ∈ 𝒮 identifies ℒ very exactly} is denoted: [𝒮, text, INT]^vex.
LEMMA 7.4A Let φ ∈ ℱ identify ℒ ⊆ RE very exactly. Then, for all L ∈ RE, L ∈ ℒ if and only if there is a locking sequence for φ and L.
Proof Suppose ℒ ∈ [ℱ^rec, text, INT]^vex. Then, by the preceding lemma, there is φ ∈ ℱ^rec such that W_i ∈ ℒ if and only if there is a locking sequence for φ and W_i. But then {i | W_i ∈ ℒ} is arithmetical. This may be verified by examination of the definition of "locking sequence" (definition 2.1A) and a simple Tarski-Kuratowski computation. □
PROPOSITION 7.4B If ℒ ∈ [ℱ^rec, text, INT], then there is ℒ′ ⊆ RE such that ℒ ⊆ ℒ′ and ℒ′ ∈ [ℱ^rec, text, INT]^vex.
Exercises
7.4A
a. For n ∈ N, let S_n = {0, 1, …, n}. Prove: {N − S_n | n ∈ N} ∈ [ℱ^rec, text, INT]^vex.
b. Prove: {K ∪ D | D finite} ∈ [ℱ^rec, text, EXT]^vex.
7.4B Prove: [ℱ^rec ∩ ℱ^prudent, text, INT]^ex = [ℱ^rec ∩ ℱ^prudent, text, INT]^vex. Deduce proposition 7.4B as a corollary.
Exercise
7.5A φ ∈ ℱ is said to partially identify t ∈ 𝒯 just in case (a) there is exactly one i ∈ N such that for infinitely many j ∈ N, φ(t̄_j) = i, and (b) W_i = rng(t). φ ∈ ℱ is said to partially identify ℒ ⊆ RE just in case φ partially identifies every text for every L ∈ ℒ. [ℱ, partially identify] is defined to be {ℒ ⊆ RE | some φ ∈ ℱ partially identifies ℒ}. Note that partial identification is not a generalized identification paradigm. Prove that RE ∈ [ℱ, partially identify].
III OTHER PARADIGMS OF LEARNING
This part discusses several models of learning that lie outside the frame-
work of generalized identification paradigms. Chapter 8 presents a criterion
of successful learning that cannot be construed as a convergence criterion in
the sense of section 6.1; chapter 9 discusses an environmental issue that is
not easily viewed through the lens of evidential relations (section 5.3); and
chapter 10 reformulates identification in the language of topology.
Throughout this part "identification" is to be understood as INT-identification on text, the learning paradigm defined in section 1.4. Thus, to say that φ ∈ ℱ identifies ℒ ⊆ RE is to say, in the expanded vocabulary of part II, that φ INT-identifies ℒ on text. Similarly the expression "convergence" is to be understood as in definition 1.4.1A(ii).
8 Efficient Learning
Useful learning must not take too much time. This vague admonition can
be resolved into two demands: first, the learner must not examine too many
inputs before settling for good on a correct hypothesis and, second, the
learner must not spend too long examining each input. Recursive learning
functions satisfying the second demand were discussed in section 4.2.2.
Learning functions satisfying the first demand are introduced in section 8.1
and studied in section 8.2. The effect of imposing both demands on learning
functions is taken up in section 8.3.
8.1 Text-Efficiency
Let φ ∈ ℱ converge on t ∈ 𝒯 to i ∈ N. Then, for some n ∈ N, φ(t̄_m) = i for all m ≥ n. The least such n is called the convergence point of φ on t. The following definition provides a notation for this concept.
DEFINITION 8.1A
i. ℱ × 𝒯 is the set of all pairs consisting of a learning function and a text.
ii. For all φ ∈ ℱ, t ∈ 𝒯, the partial function CONV: ℱ × 𝒯 → N is defined as follows.
Example 8.1A
a. Let g ∈ ℱ be defined as in the proof of proposition 1.4.3B. Let t be 3, 0, 4, 1, 5, 6, 7, 8, 9, …, n, n + 1, …. Then CONV(g, t) = 4.
b. Let f be as defined in example 1.3.4B, part a. Let t be 2, 3, 3, 3, 4, 2, 2, 2, 2, …, 2, …. Then CONV(f, t) = 5. Let s be any text for N. Then CONV(f, s)↑ even though f is defined on s.
c. Let φ ∈ ℱ converge on t ∈ 𝒯. Then t̄_CONV(φ,t) is the finite sequence starting from which φ begins to converge on t. Informally, t̄_CONV(φ,t) is the last sequence in t on which φ changes its mind.
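The convergence point can be animated on finite data, with a caveat: CONV(φ, t) is defined on a whole infinite text, so a finite prefix can only exhibit a candidate value, correct just in case φ never again changes its mind. The function and the sample learner below are illustrative, not from the text.

```python
def conv(phi, prefix):
    """Candidate convergence point of phi on a finite text prefix: the
    least n such that phi conjectures the same value on every t_m with
    n <= m < len(prefix).  This certifies CONV(phi, t) only if phi
    never changes its mind on the rest of the text."""
    conjectures = [phi(prefix[:m + 1]) for m in range(len(prefix))]
    n = len(conjectures) - 1
    while n > 0 and conjectures[n - 1] == conjectures[-1]:
        n -= 1
    return n
```

For instance, a learner that conjectures the largest datum seen so far reaches its candidate convergence point on 3, 0, 4, 1, 5, 5, … at position 4, where 5 first appears.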
168 Other Paradigms of Learning
The notion of convergence point suggests the following criterion for the efficient use of text. Let φ ∈ ℱ and ℒ ⊆ RE be given. φ is said to identify ℒ fast just in case (1) φ identifies ℒ, and (2) for all ψ ∈ ℱ, if ψ identifies ℒ, then CONV(φ, t) ≤ CONV(ψ, t) for all t ∈ 𝒯_ℒ. In other words, φ identifies ℒ fast just in case φ identifies ℒ, and no other learning function that also identifies ℒ converges on any text for any language in ℒ sooner than φ converges on that text. Despite its natural character, however, fast identification is a concept of limited interest, for there are simple, identifiable ℒ ⊆ RE such that no φ ∈ ℱ identifies ℒ fast (see exercise 8.1B). The next definition avoids this problem by weakening the requirements for efficient use of text.
DEFINITION 8.1C (Gold 1967) Let φ, ψ ∈ ℱ and ℒ ⊆ RE be given.
i. ψ is said to identify ℒ strictly faster than φ just in case
a. both φ and ψ identify ℒ,
b. CONV(ψ, t) ≤ CONV(φ, t) for all t ∈ 𝒯_ℒ,
c. CONV(ψ, s) < CONV(φ, s) for some s ∈ 𝒯_ℒ.
ii. φ is said to identify ℒ text efficiently just in case
a. φ identifies ℒ,
b. no θ ∈ ℱ identifies ℒ strictly faster than φ.
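Clauses i(b) and i(c) can be checked on a finite sample of text prefixes, as in the following sketch (a finite-sample heuristic only: the definition quantifies over all texts for languages in ℒ, and clause i(a), that both functions identify the collection, is not checked; all names are illustrative).

```python
def strictly_faster(psi, phi, prefixes):
    """Finite-sample test of clauses i(b) and i(c) of definition 8.1C:
    psi converges no later than phi on every sampled text prefix and
    strictly earlier on at least one."""
    def conv(f, prefix):
        # candidate convergence point of f on a finite prefix
        conjectures = [f(prefix[:m + 1]) for m in range(len(prefix))]
        n = len(conjectures) - 1
        while n > 0 and conjectures[n - 1] == conjectures[-1]:
            n -= 1
        return n
    pairs = [(conv(psi, t), conv(phi, t)) for t in prefixes]
    return all(a <= b for a, b in pairs) and any(a < b for a, b in pairs)
```

A learner that delays all conjectures by a fixed number of inputs is, on such a sample, strictly slower than the undelayed learner, never the reverse.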
Example 8.1B
Let f be as defined in example 1.3.4B, part a. Then f identifies RE_fin text efficiently. To see this, suppose that θ ∈ ℱ identifies RE_fin and that CONV(θ, t) < CONV(f, t) for some t ∈ 𝒯_{RE_fin}. Then θ must begin to converge on t prior to seeing all of
Efficient Learning 169
Thus example 8.1B shows that RE_fin ∈ [ℱ^rec]^t.e.. We shall also make use in this chapter of the unadorned bracket notation from section 4.1. Thus [𝒮] denotes the class of all collections of languages that are identifiable (not necessarily text efficiently) by a learning function in 𝒮.
Exercises
8.1A Let φ ∈ ℱ^rec identify {K ∪ {x} | x ∉ K} (cf. exercise 4.2.1A, part a). Show that CONV(φ, t)↑ for some text t for K.
a. Show that ≤_ℒ is a partial order on ℱ (be sure not to overlook functions failing to identify ℒ).
b. Show that f ∈ ℱ identifies ℒ text efficiently if and only if f is a minimal element with respect to ≤_ℒ.
c. Show that f ∈ ℱ identifies ℒ fast if and only if f is a least element with respect to ≤_ℒ.
i. L ∈ ℒ_{t̄_n} for all n.
ii. There is an m such that for all n ≥ m, L ∈ ℒ^min_{t̄_n}.
iii. For every L′ ≠ L such that L′ ∈ ℒ, there is an m such that for all n ≥ m, L′ ∉ ℒ^min_{t̄_n}.
Proof The proof of i is obvious. For ii, recall that by proposition 2.4A, for every L ∈ ℒ there is a finite set D_L ⊆ L such that if D_L ⊆ L′ and L′ ∈ ℒ, then L′ ⊄ L. Thus if m is such that rng(t̄_m) ⊇ D_L, then L ∈ ℒ^min_{t̄_n} for all n ≥ m. For iii, suppose that L′ ∈ ℒ^min_{t̄_n} for arbitrarily large n. Then L′ ⊇ rng(t̄_n) for arbitrarily large n. Thus L′ ⊇ L. Since L ∈ ℒ_{t̄_n} for all n and
f(σ) =
  0, if ℒ^min_σ = ∅;
  f(σ⁻), if lh(σ) > 1 and W_{f(σ⁻)} ∈ ℒ^min_σ;
  the least j such that W_j ∈ ℒ^min_σ, otherwise.
To see that f identifies ℒ, let t be a text for L ∈ ℒ. By lemma 8.2.1A(ii), for all sufficiently large n, L ∈ ℒ^min_{t̄_n}. By lemma 8.2.1A(ii) and (iii) and the choice of the least j in the third clause in the definition of f, f(t̄_n) is the least index for L for all sufficiently large n.
To show that f identifies ℒ text efficiently, we use exercise 8.1C. Suppose then that ψ identifies ℒ and that CONV(ψ, t) < CONV(f, t) for some text t for some L ∈ ℒ. Let n = CONV(f, t) − 1. Then ψ(t̄_n) is an index for L, but f(t̄_n) is an index for some L′ ∈ ℒ, L′ ≠ L. If σ = t̄_n, let s be any text for L′ which begins with σ.
Now s is a text for L′ and CONV(f, s) = n. But CONV(ψ, s) > n, since ψ(s̄_n) = ψ(t̄_n) is an index for L ≠ L′. By exercise 8.1C, f identifies ℒ text efficiently. □
Proof Suppose that ψ identifies ℒ(φ) and that CONV(ψ, t) < CONV(φ, t) for some text t ∈ 𝒯_{ℒ(φ)}. Let n = CONV(φ, t) − 1. Then if t is a text for L, ψ(t̄_n) is an index for L but φ(t̄_n) is not (since φ(t̄_n) ≠ φ(t̄_{n+1}), φ(t̄_{n+1}) is an index for L, and φ is conservative).
Let L′ = W_{φ(t̄_n)}, and let s be any text for L′ beginning with t̄_n. Such a text s exists since φ is consistent. Then since φ is prudent, φ identifies rng(s) and, since φ is conservative, φ(s̄_m) = φ(s̄_n) for all m ≥ n. Thus CONV(φ, s) ≤ n, but CONV(ψ, s) > n since ψ(s̄_n) = ψ(t̄_n) is an index for L. □
Exercises
8.2.2D For each i ∈ N, let L_i = {0, i, i + 1, …}. Prove that {L_i | i ∈ N} ∉ [ℱ^conservative]^t.e. ∪ [ℱ^decisive]^t.e. (= [ℱ^conservative ∪ ℱ^decisive]^t.e.).
(0) W_{ψ(i+1)} = {i + 1}, if i ∉ K; W_{ψ(i+1)} = {0, i + 1}, if i ∈ K.
For otherwise, let i₀ be such that (0) doesn't hold. We define a function ψ ∈ ℱ such that for all texts t ∈ 𝒯_ℒ, CONV(ψ, t) ≤ CONV(φ, t), and such that for some L ∈ ℒ and text t₀ for L, CONV(ψ, t₀) < CONV(φ, t₀). Define ψ as follows:
On the other hand, no recursive function satisfies (*), since such a function would exhibit K̄ as recursively enumerable. □
Proof For the collection that witnesses this we simply use the characteristic functions of the languages L in the proof of the proposition. The proof is then entirely parallel to that of the proposition. □
Proof Let ℒ = {K} ∪ {{i} | i ∉ K}. We claim that ℒ ∈ [ℱ^rec]^t.e. but that ℒ ∉ [ℱ^rec ∩ ℱ^order independent]^t.e.. To see the former, first define a recursive function f by
φ_{f(x)}(y) =
  0, if y = x;
  0, if φ_x(x)↓ and φ_y(y)↓;
  ↑, if φ_x(x)↑ or φ_y(y)↑.
If x ∉ K, W_{f(x)} = {x}; if x ∈ K, W_{f(x)} = K. Now define g ∈ ℱ^rec by g(σ) =
Exercises
*8.2.3C Recall the convergence criterion EXT from definition 6.3A. Define the partial function CONV_ext: ℱ × 𝒯 → N as follows. For all φ ∈ ℱ, t ∈ 𝒯,
a. if φ is not defined on t, then CONV_ext(φ, t)↑.
b. if φ is defined on t, then CONV_ext(φ, t) = the least n ∈ N such that for all m ≥ n, W_{φ(t̄_m)} = W_{φ(t̄_n)}.
Proof Part i follows from propositions 4.5.3A and 8.2.1A, which together say that [ℱ^enumerator] ⊂ [ℱ] = [ℱ]^t.e..
As for ii, the proof of the inclusion is due to Gold. Suppose that ℒ ∈ [ℱ^rec ∩ ℱ^enumerator]. We claim that the enumerator φ that identifies ℒ is itself text efficient. For such a φ is consistent, prudent, and conservative (at least on texts for languages in ℒ). Thus φ is text efficient by proposition 8.2.2A. That the inclusion is proper is witnessed by the collection ℒ = {L_n | n ∈ N}, where L_n = {n, n + 1, …}. This collection is identifiable by a conservative, consistent, prudent, recursive function but not by an enumerator (as established in the proof of proposition 4.5.3A). □
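Gold's identification by enumeration, invoked in the proof above, can be sketched with finite sets standing in for r.e. indices (a toy model: a genuine enumerator works relative to a uniformly recursive family of grammars, which the list below merely imitates).

```python
def enumerator(hypotheses):
    """Identification by enumeration: on data sigma, conjecture the least
    index in a fixed enumeration whose language contains the data seen so
    far.  Languages are modelled as finite sets; the resulting learner is
    consistent (its conjecture covers the data) and conservative (it
    abandons a conjecture only when the data refute it)."""
    def phi(sigma):
        data = set(sigma)
        for i, language in enumerate(hypotheses):
            if data <= language:
                return i
        return None  # no consistent hypothesis in the list
    return phi
```

Since the least consistent index can only increase as data accumulate, the learner changes its mind only when refuted and always covers the data seen so far, which is why, as the proof notes, enumerators are consistent and conservative.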
Exercise
The next proposition shows that the requirements of simplicity and text-
efficiency are more stringent taken together than taken separately.
PROPOSITION 8.2.5A For every total f ∈ ℱ^rec such that f(x) ≥ x, [ℱ^rec, text, SIM(f)]^t.e. ⊂ [ℱ^rec, text, SIM(f)] ∩ [ℱ^rec]^t.e..
Proof The inclusion is obvious. The collection that witnesses that the inclusion is proper is ℒ = {{i} | i ∈ N}. It is easy to see that ℒ ∈ [ℱ^rec]^t.e.. To
However, the set {ψ(n) | n ∈ N} is an infinite r.e. set, and so by lemma 4.3.6A, there is an n such that m(ψ(n)) > f(M(W_{ψ(n)})). This contradicts the fact that ψ identifies ℒ according to the SIM(f) convergence criterion. □
Exercise
8.2.5A Let strategy 𝒮, evidential relation ℰ, and convergence criterion 𝒞 be given. Frame an appropriate definition of [𝒮, ℰ, 𝒞]^t.e., that is, of the class of collections of languages that can be text-efficiently identified within the learning paradigm defined by 𝒮, ℰ, 𝒞.
Claim Let φ_i be total, and let σ be a locking sequence for L_i and φ. Then ⟨i, x⟩ ∈ L_i if and only if φ(σ ⌢ ⟨i, x⟩) = φ(σ).
How much input from the environment is required for learning? In this
chapter we examine the problem.
Let φ ∈ ℱ, L ∈ RE, and σ ∈ SEQ be given. A reasonable construal of the idea that σ is sufficient input for φ to learn L is this: σ is drawn from L, φ conjectures an index i for L on σ, and no further input from L can cause φ to abandon i. In turn, examination of definition 2.1A reveals that in the foregoing circumstances σ is a locking sequence for φ and L. We shall therefore identify sufficient inputs with locking sequences.
DEFINITION 9.1A Let φ ∈ ℱ be given. The set {σ ∈ SEQ | for some L ∈ RE, σ is a locking sequence for φ and L} is denoted: LS_φ.
Thus σ ∈ LS_φ just in case σ is a locking sequence for φ and W_{φ(σ)}; hence just in case σ is a sufficient input for φ to learn W_{φ(σ)} in the sense just discussed. By proposition 2.1A, if φ ∈ ℱ identifies L ∈ RE, then LS_φ contains some σ such that L = W_{φ(σ)}. In the present context proposition 2.1A is equivalent to the claim that if φ ∈ ℱ learns L ∈ RE, then there is some sufficient input for φ to learn L. On the other hand, exercise 2.1B shows that there can be a locking sequence for φ ∈ ℱ and L ∈ RE even if φ does not identify L. In the present context this fact may be reformulated as follows: the existence of some input sufficient for φ ∈ ℱ to learn L ∈ RE does not guarantee that φ will also learn L in the absence of this input.
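For a total learner and a finite language, the mind-change clause of the locking-sequence definition can be probed mechanically out to a bounded extension length. The sketch below is a heuristic check only: definition 2.1A quantifies over all finite τ drawn from L, and its clause W_{φ(σ)} = L is undecidable and is not tested; the names are illustrative.

```python
from itertools import product

def locks_up_to(phi, sigma, L, depth):
    """Test the no-mind-change clause of a locking sequence for phi and
    the finite language L: phi(sigma ^ tau) = phi(sigma) for every tau
    drawn from L of length at most `depth`."""
    if not set(sigma) <= L:
        return False          # sigma must be drawn from L
    base = phi(list(sigma))
    for n in range(1, depth + 1):
        for tau in product(sorted(L), repeat=n):
            if phi(list(sigma) + list(tau)) != base:
                return False
    return True
```

A passing check is of course no proof, in keeping with the remark above that finite evidence never certifies convergence; a failing check, however, definitively refutes the locking property.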
Example 9.1A
Let f ∈ ℱ be as described in example 1.3.4B, part a. Then example 2.1A, part a, shows that LS_f = SEQ.
Exercises
9.1A Let g ∈ ℱ be as described in the proof of proposition 1.4.3B. Show that LS_g = SEQ.
9.1B Prove: Let φ ∈ ℱ^consistent ∩ ℱ^conservative be given. Then LS_φ = SEQ.
9.1C φ ∈ ℱ is called avid just in case LS_φ = SEQ. Prove [ℱ^avid] = [ℱ], where for 𝒮 ⊆ ℱ, [𝒮] is to be interpreted as in definition 4.1B. (Hint: Use the construction in the proof of proposition 4.5.1A, and rely on exercise 9.1B.)
Sufficient Input for Learning 179
Let φ ∈ ℱ be given. A successful "psychological" theory of φ should characterize the environmental inputs sufficient for φ to learn; that is, such a theory should characterize LS_φ. One way for such a characterization to be perspicuous would be to provide a means of effectively enumerating LS_φ. Are perspicuous psychological theories in this sense always possible?
Recall from section 1.3.4 that each σ ∈ SEQ is assumed to be associated with a unique natural number via some fixed, computable isomorphism between SEQ and N. Accordingly we say that a subset Σ of SEQ is r.e. just in case the set of code numbers associated with Σ is r.e.
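The fixed isomorphism between SEQ and N is never exhibited in the text; one standard choice, sketched below, folds the Cantor pairing function across the sequence (an illustration of one such coding, not necessarily the one the authors intend).

```python
def pair(x, y):
    """Cantor pairing function, a bijection N x N -> N."""
    return (x + y) * (x + y + 1) // 2 + y

def code(sigma):
    """Code a finite sequence of naturals as a single natural number:
    code(()) = 0 and code(sigma ^ <x>) = pair(code(sigma), x) + 1.
    The shift by 1 keeps the empty sequence's code unique, and the
    resulting map is a computable bijection between SEQ and N."""
    n = 0
    for x in sigma:
        n = pair(n, x) + 1
    return n
```

Decoding any n ≥ 1 proceeds by inverting the pairing function on n − 1, so the map is computable in both directions, as the assumed isomorphism requires.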
DEFINITION 9.2A φ ∈ ℱ is called predictable just in case LS_φ is r.e.
It is easy to verify that ℱ^predictable ⊂ ℱ; that is, there are learning functions whose associated set of locking sequences is not r.e. (see exercise 9.2A). For such learning functions, no perspicuous theory of sufficient input is possible in the sense discussed earlier. On the other hand, ℱ^predictable is sufficient for all inferential purposes, as revealed by the following proposition.
PROPOSITION 9.2A [ℱ^predictable] = [ℱ].
Proof Since SEQ is r.e., the proposition follows easily from exercise 9.1C. □
In contrast to proposition 9.2A, the following result shows that there are collections ℒ of languages such that (1) some recursive learning function identifies ℒ, but (2) no recursive learning function whose sufficient inputs are r.e. identifies ℒ.
witnesses this. Then LS_φ is r.e. since φ is predictable. But now we claim that i ∈ K̄ if and only if there is a σ ∈ LS_φ such that {⟨0, i⟩} ⊆ rng(σ) ⊆ L_{i,N} and W_{φ(σ)} ⊇ rng(σ). If i ∈ K̄, such a σ must exist, since any locking sequence must have this property. However, if i ∈ K, no such σ can exist, since otherwise rng(σ) = L_{i,D} for some finite set D. Yet φ fails to identify this
language on the text for L_{i,D} which begins with σ, since φ is locked into an incorrect conjecture by σ. The claim shows that were such a φ to exist, K̄ would be recursively enumerable. □
COROLLARY 9.2A [ℱ^rec ∩ ℱ^predictable] ⊂ [ℱ^rec].
The corollary shows that there are recursive learning functions for which no perspicuous theory of sufficient input is possible in the sense discussed before.
Exercises
9.2A Prove that ℱ^predictable ⊂ ℱ.
9.2C Recall the definition of ℱ^avid from exercise 9.1C. Prove: [ℱ^rec ∩ ℱ^avid] ⊂ [ℱ^rec].
9.2D Let φ ∈ ℱ be given, and let SI_φ = {σ ∈ SEQ | for all τ ∈ SEQ, if rng(τ) ⊆ W_{φ(σ)}, then φ(σ ⌢ τ) = φ(σ)}. SI_φ is another conception of sufficient input; it does not require a sufficient input to be drawn from the learned language. Clearly LS_φ ⊆ SI_φ.
a. Specify φ ∈ ℱ such that SI_φ ≠ LS_φ. (Thus the SI_φ version of sufficient input is strictly more liberal than the LS_φ version.)
b. Call φ ∈ ℱ predictable′ just in case SI_φ is r.e. Prove the following variant of proposition 9.2B: [ℱ^rec ∩ ℱ^predictable′] ⊂ [ℱ^rec].
DEFINITION 9.3A
i. σ ∈ SEQ is called an EXT-locking sequence for φ ∈ ℱ and L ∈ RE just in case rng(σ) ⊆ L, W_{φ(σ)} = L, and, for all τ ∈ SEQ, if rng(τ) ⊆ L, then W_{φ(σ⌢τ)} = L.
ii. Let φ ∈ ℱ be given. The set {σ ∈ SEQ | for some L ∈ RE, σ is an EXT-locking sequence for φ and L} is denoted: LS_φ^ext.
iii. φ ∈ ℱ is called EXT-predictable just in case LS_φ^ext is r.e.
Example 9.3A
Let g ∈ ℱ^rec be defined as in the proof of proposition 6.3.1B. Then LS_g^ext = SEQ.
Proof The collection of languages ℒ used in the proof of proposition 9.2B works here by exactly the same proof. □
Exercise
9.3A Prove: [ℱ^rec ∩ ℱ^EXT-predictable, imperfect text, EXT] ⊂ [ℱ^rec, imperfect text, EXT].
* 10 Topological Perspective on Learning
Thus B_σ consists of all texts that begin with σ, and B_σ^L consists of all texts for subsets of L that begin with σ.
The following lemma is left to the reader.
COROLLARY 10.1A Let L ∈ RE and t ∈ 𝒯_L be given. Then {t} is closed in 𝒯_L.
With each learning function φ we now associate a function F_φ from 𝒯 to N^ω. Intuitively F_φ(t) is the infinite sequence of conjectures produced by φ in response to t (if φ is defined on t).
i. t is said to be stabilized on i just in case there is n ∈ N such that for all m ≥ n, t_m = i.
Topological Perspective on Learning 183
Exercises
10.1A Define the function d: 𝒯 × 𝒯 → ℝ (ℝ is the set of real numbers) as follows.
For all s, t ∈ 𝒯 with s ≠ t, d(s, t) = 2^{-n}, where n is the least m such that s_m ≠ t_m;
d(s, t) = 0 if s = t. Show that 𝒯 is a complete metric space with
respect to d.
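On finite approximations of texts the metric can be computed directly. The sketch below is an illustration only (texts modeled as Python tuples of numbers; the name d is taken from the exercise): it returns 2^{-n} for the least position n at which the two sequences differ.

```python
def d(s, t):
    """Prefix metric on (finite approximations of) texts: 2^-n for the
    least position n where s and t differ, 0 if they agree everywhere."""
    for n, (x, y) in enumerate(zip(s, t)):
        if x != y:
            return 2.0 ** -n
    if len(s) != len(t):          # one sequence properly extends the other
        return 2.0 ** -min(len(s), len(t))
    return 0.0

# Sequences agreeing on a long prefix are close; this "first difference"
# metric in fact satisfies the strong (ultrametric) triangle inequality.
print(d((0, 1, 2, 3), (0, 1, 7, 3)))  # differ first at position 2, so 0.25
```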
10.1B
a. Show that 𝒯 has a countable basis.
b. Show that 𝒯 is a regular space.
c. Use the Urysohn metrization theorem to provide an alternative proof that 𝒯 is
metrizable.
10.1C Show that the topology on 𝒯 is the product topology on N^ω, where N is endowed with the
discrete topology.
Proof Suppose that φ ∈ ℱ_total. We need to show that, given a basic open
set B_τ, F_φ^{-1}(B_τ) = {t | F_φ(t) ∈ B_τ} is an open set. But F_φ^{-1}(B_τ) = {t | for all n <
lh(τ), φ(t̄_n) = τ_n}. Thus F_φ^{-1}(B_τ) = ⋃ {B_γ | γ ∈ SEQ, lh(γ) ≥ lh(τ), and φ(γ̄_n) = τ_n for all n < lh(τ)}.
Thus F_φ^{-1}(B_τ) is a union of open sets and so is open. □
184 Other Paradigms of Learning
Exercise
10.2A Exhibit a continuous function f on 𝒯 such that for every φ ∈ ℱ, f ≠ F_φ.
(Hint: Let f be such that for all t ∈ 𝒯, f(t) is the result of removing t_0 from t.)
Let L ∈ RE and σ ∈ SEQ be such that rng(σ) ⊆ L. Note that for any τ ∈ SEQ
such that rng(τ) ⊆ L, B^L_{σ ⌢ τ} ⊆ B^L_σ. With this in mind it can be seen that
proposition 2.1A amounts to the following result.
PROPOSITION 10.3A Let φ ∈ ℱ_total identify L ∈ RE. Then there is some open
set B^L_σ of 𝒯_L, some i ∈ N, and some t ∈ 𝒯 such that (i) t is stabilized on i,
(ii) W_i = L, and (iii) F_φ[B^L_σ] = {t}.
The proof of the proposition hinges on the following lemma.
Proof For each n ∈ L, 𝒯_{𝒫(L−{n})} is nowhere dense in 𝒯_L. This follows from
the fact that for each σ with rng(σ) ⊆ L, B^L_σ ⊇ B^L_{σ ⌢ n}, which is disjoint from
𝒯_{𝒫(L−{n})}. Hence 𝒯_{𝒫(L)} − 𝒯_L = ⋃_{n∈L} 𝒯_{𝒫(L−{n})}, which is a countable union
of nowhere dense sets. □
union of closed sets ({t} is closed in 𝒯, so F_φ^{-1}({t}) is closed by the continuity
of F_φ). However, 𝒯_L is comeager in a complete metric space by the lemma,
so at least one of these closed sets F_φ^{-1}({t}) must contain a basic open set B^L_σ
by the Baire category theorem. This t and σ satisfy (i) and (iii); (ii) follows since φ
identifies L. □
Indeed, the original proof due to Blum and Blum (1975) of proposition
2.1A can be viewed as a special case of standard proofs of the Baire
category theorem (e.g., Levy, 1979, theorem VI.3.6).
Note that the proof of proposition 10.3A does not show that σ is a locking
sequence for φ and L, since it is possible that t_{lh(σ)} ≠ t_{lh(σ)+1}. However, the
proof does show that for some n and for every τ with rng(τ) ⊆ L and
lh(τ) ≥ n, σ ⌢ τ is a locking sequence for φ and L.
In the present context we may also provide an alternative proof of
corollary 2.1A, which amounts to the following proposition.
PROPOSITION 10.3B Let φ ∈ ℱ_total identify L ∈ RE, and let σ ∈ SEQ be such
that rng(σ) ⊆ L. Then there is some open set B^L_{σ ⌢ τ} of 𝒯_L, some i ∈ N,
and some t ∈ 𝒯 such that (i) t is stabilized on i, (ii) W_i = L, and (iii)
F_φ[B^L_{σ ⌢ τ}] = {t}.
Proof This follows from the Baire category theorem just as in proposition
10.3A, with B^L_σ substituted for 𝒯_L. □
DEFINITION 10.4A Let φ ∈ ℱ identify L ∈ RE. Let t be a text for L. t is called
a locking text for φ and L just in case there exists n ∈ N such that t̄_n is a
locking sequence for φ and L.
Locking texts were first discussed in exercise 2.1C. The following proposition
highlights the role of locking texts in determining the behavior of
learning functions.
PROPOSITION 10.4A Let φ ∈ ℱ_total identify L ∈ RE, and let ψ ∈ ℱ_total be such
that for all locking texts t for φ and L, F_ψ(t) = F_φ(t). Then for all t ∈ 𝒯_L,
F_ψ(t) = F_φ(t).
In particular, ψ identifies L in the preceding situation.
Proof By proposition 10.3B, the locking texts t for φ and L are dense in
𝒯_L. Thus F_φ and F_ψ are continuous functions that agree on a dense subset
of a complete metric space. Therefore they agree on all of 𝒯_L. □
Exercise
10.4A Prove: Let ψ, φ ∈ ℱ_total be such that for all recursive texts t, F_ψ(t) = F_φ(t).
Then for all texts t, F_ψ(t) = F_φ(t).
LEMMA 10.5.1A Let L, L′ ∈ RE be such that L ≠ L′. Then M_L(𝒯_{L′}) = 0.
LEMMA 10.5.1B Let φ ∈ ℱ and L ∈ RE be given. Then M_L({t ∈ 𝒯_L | φ identifies t}) is defined.
Exercise
10.5.1A Recall the definition of fat text (definition 5.5.4A). Prove: Let L ∈ RE be
given. Then M_L({t ∈ 𝒯_L | t is fat}) = 1.
DEFINITION 10.5.2A (Wexler and Culicover 1980, ch. 3) Let φ ∈ ℱ,
L ∈ RE, and ℒ ⊆ RE be given.
Intuitively, φ ∈ ℱ measure one identifies L ∈ RE just in case the probability
that φ identifies an arbitrary text for L is unity.
Measure one identification of a language differs from ordinary identification
only by a set of measure zero. The next proposition reveals the
significance of this small difference; it generalizes results due to Horning
(1969).
PROPOSITION 10.5.2A RE is measure one identifiable.
Proof We define f ∈ ℱ such that for all L ∈ RE, f measure one identifies L.
Let h be a function such that W_{h(0)}, W_{h(1)}, ..., is a listing of all the r.e. sets, and
let M_0, M_1, ..., be an enumeration of their associated measures. If n ∈ N,
σ ∈ SEQ, and W is an r.e. set, we say that σ agrees with W through n just in
case rng(σ) ⊆ W and W ∩ {0, ..., n} ⊆ rng(σ). For every j, n, m ∈ N, let A_{j,n,m} = {t | t is a text for W_{h(j)} and t̄_m does not agree with W_{h(j)}
through n}. It is easy to see that M_j(A_{j,n,m}) is defined and that for every j,
n ∈ N, lim_{m→∞} M_j(A_{j,n,m}) = 0.
Define a function d by
d(n) = least m such that M_i(A_{i,n,m}) < 2^{-n} for all i ≤ n.
Notice that Σ_{n∈N} M_i(A_{i,n,d(n)}) is finite for all i ∈ N. Now let X_i = {t | t ∈ A_{i,n,d(n)}
for infinitely many n} = ⋂_{k∈N} ⋃_{n>k} A_{i,n,d(n)}. Then by the Borel-Cantelli
lemma, M_i(X_i) = 0 for all i ∈ N.
Now, given a text t, define f on t as follows. For given m ∈ N, let j be the
least i ≤ m such that t̄_m agrees with W_{h(i)} through n, where n is the greatest
integer such that d(n) ≤ m if such exists, and 0 otherwise. Let f(t̄_m) equal the
least index for W_{h(j)}. With this definition of f it is clear that f converges to the
least index for W_{h(i)} on all texts t for W_{h(i)} which are not in X_i. □
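The Borel-Cantelli step can be checked numerically. The sketch below is an illustration only, not the construction in the proof: the sets A_{i,n,d(n)} are modeled as independent events with the summable probabilities 2^{-n} from the definition of d, and almost every simulated sample then falls in only finitely many of them, mirroring M_i(X_i) = 0.

```python
import random

# Model: event n occurs with probability 2^-n; since these probabilities are
# summable, the Borel-Cantelli lemma says the number of events that occur is
# almost surely finite.  (Hypothetical parameter names; toy model only.)
def count_occurrences(p=lambda n: 2.0 ** -n, trials=10000, horizon=30, seed=0):
    rng = random.Random(seed)
    # For each simulated sample, count how many of the events 1..horizon occur.
    return [sum(1 for n in range(1, horizon + 1) if rng.random() < p(n))
            for _ in range(trials)]

counts = count_occurrences()
# The expected number of events per sample is sum 2^-n < 1; long runs are rare.
print(max(counts), sum(counts) / len(counts))
```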
DEFINITION 10.5.3A
b. the predicate "M_{W_{h(i)}}(B_σ) = p" (where i ∈ N, σ ∈ SEQ, and p is rational)
is decidable.
iii. ℒ ⊆ RE is said to be uniformly measurable just in case ℒ is uniformly
measured by some collection of measures.
DEFINITION 10.5.3B Let φ ∈ ℱ and ℒ ⊆ RE be given, and let ℳ =
{M_L | L ∈ RE} be a collection of measures. φ is said to measure one
identify ℒ with respect to ℳ just in case for all L ∈ ℒ, M_L({t ∈ 𝒯_L | φ
identifies t}) = 1.
It may be that the human brain is able to generate arbitrarily long sequences
of random events and to employ such sequences in its internal
calculations. The following definitions provide one formalization of this
idea.
DEFINITION 10.6A
DEFINITION 10.6B Let φ ∈ ℱ, L ∈ RE, and c ∈ 𝒯_{{0,1}} be given. φ is said to
c-identify L just in case λσ. φ(⟨c̄_{lh(σ)}, σ⟩) identifies L.
Restated without the λ-notation, φ c-identifies L just in case for every t ∈ 𝒯_L,
(1) φ(⟨c̄_i, t̄_i⟩)↓ for all i ∈ N, and (2) for some j ∈ N such that W_j = L,
φ(⟨c̄_i, t̄_i⟩) = j for all but finitely many i ∈ N. Intuitively, to c-identify L, φ is
allowed to "flip a coin" once before each conjecture emitted; note that the
same coin is to serve for all texts in 𝒯_L.
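The λ-term in the definition can be read as a wrapper that pairs each text prefix with a coin prefix of the same length. A minimal sketch (all names hypothetical; the learner and coin below are toy stand-ins, with prefixes modeled as tuples):

```python
def c_identify_wrapper(phi, c):
    """Given a learner phi taking (coin prefix, text prefix) pairs and a coin
    c (a function from N to {0, 1}), return the induced one-argument learner
    sigma -> phi(c restricted to lh(sigma), sigma)."""
    def learner(sigma):
        coin_prefix = tuple(c(i) for i in range(len(sigma)))
        return phi(coin_prefix, sigma)
    return learner

# A degenerate phi that ignores its coin and conjectures the largest element
# seen so far (a toy stand-in for emitting an index); the wrapper then acts
# exactly like the underlying coin-free learner on every text prefix.
phi = lambda coin, sigma: max(sigma, default=0)
learner = c_identify_wrapper(phi, lambda i: i % 2)
print(learner((3, 1, 4)))  # 4
```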
To proceed, we let M* be the natural probability measure on 𝒯_{{0,1}}.
The proof of proposition 1O.6B proceeds in two steps. In the first step we
introduce a new criterion of learning.
Proof Let γ^0, γ^1, ..., be an effective listing of all coin sequences, and let
m(γ^i) = 2^{-lh(γ^i)} (= M*(B_{γ^i})). Given γ^i such that lh(γ^i) ≤ lh(σ), we say that φ
appears to be converging at σ with γ^i just in case for all γ ⊇ γ^i, if
lh(γ) ≤ lh(σ), then φ(γ, σ̄_{lh(γ)}) = φ(γ^i, σ̄_{lh(γ^i)}). Now let D_σ = {i | φ appears to
be converging with γ^i at σ}, and define i_σ = least i such that Σ {m(γ^j) | j ≤ i and j ∈ D_σ} >
0.5. Such an i must exist, since for every coin sequence γ of length lh(σ),
φ appears to be converging with γ at σ. Let C_σ = {i ≤ i_σ | i ∈ D_σ}. Now we
define ψ ∈ ℱ_rec which OEX-identifies ℒ as follows: W_{ψ(σ)} = {j | there is an
i ∈ C_σ such that φ(γ^i, σ̄_{lh(γ^i)}) = j}. W_{ψ(σ)} is just the collection of indexes of
languages that the coins beginning with γ^i, i ∈ C_σ, appear to be converging
to.
Suppose that L ∈ ℒ and that t is a text for L. Then there is a set of coins c
of measure > 0.5 such that φ c-converges on t. Thus the sets C_{t̄_n} have a limit
as n approaches infinity. (This requires compactness of the measure space
on coins.) Thus C = lim_n C_{t̄_n} has the following properties: if i ∈ C and γ ⊇ γ^i,
then φ(γ, t̄_{lh(γ)}) = φ(γ^i, t̄_{lh(γ^i)}), and Σ_{i∈C} m(γ^i) > 0.5. Further, ψ converges to
an index for {j | φ(γ^i, t̄_{lh(γ^i)}) = j for some i ∈ C}. Since the set of coins c on which
φ c-identifies t has measure > 0.5, there must be a coin c and an i ∈ C such that
γ^i is an initial segment of c and φ c-identifies t. Thus ψ OEX-identifies t. □
LEMMA 10.6B (Case and Smith 1983) If ℒ ∈ [ℱ_rec, text, OEX]_svt, then
ℒ ∈ [ℱ_rec, text, INT]_svt.
Proof Suppose that φ OEX-identifies ℒ. Then for each σ, φ(σ) is an index
for some finite set F_σ of indexes of r.e. sets. If i ∈ F_σ, we say that i is consistent
with σ at stage s if (x, z) ∈ W_{i,s} and (x, y) ∈ rng(σ) implies y = z. Define χ so
that W_{χ(σ)} = ⋃ {W_i | i ∈ F_σ and i is consistent with σ at lh(σ)}. To see that χ
INT-identifies ℒ, suppose that L ∈ ℒ. Let t be a text for L. Let n be such that
for all m ≥ n, φ(t̄_m) = φ(t̄_n). Then φ(t̄_n) is an index for some finite set F of
indexes, one of which is for L. Now if j ∈ F is not an index for L, either j is not
consistent with t or W_j ⊆ L. In the former case, for each such j there is an
m > n such that χ will never use W_j after m. Thus on t, χ stabilizes to an index
for ⋃ {W_i | i ∈ F and W_i consistent with t}. Since i ∈ F and W_i consistent with t
imply that W_i is a subset of L, and since there is an i ∈ F such that W_i = L, χ
stabilizes to an index for L. □
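The consistency check that drives χ can be sketched on finite data. A toy model only (hypothetical names; the stage-s approximation W_{i,s} is replaced by a finite set of (x, z) pairs, and the data σ by a list of observed (x, y) pairs):

```python
def consistent(W_i_s, sigma):
    """True iff no argument x receives conflicting values: whenever (x, z)
    lies in the stage-s approximation W_i_s and (x, y) occurs in the data
    sigma, we must have y == z."""
    data = dict(sigma)  # sigma is assumed single-valued, as in the lemma
    return all(data.get(x, z) == z for (x, z) in W_i_s)

# (2, 9) conflicts with the observed pair (2, 5), so this index is dropped.
print(consistent({(1, 4), (2, 9)}, [(1, 4), (2, 5)]))  # False
print(consistent({(1, 4), (3, 7)}, [(1, 4), (2, 5)]))  # True
```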
Exercise
10.6A Let finite D ⊆ N be given. c ∈ 𝒯_D is called a D-coin. Let M* be the
natural measure on 𝒯_D, and modify definition 10.6C accordingly. Prove the
corresponding versions of propositions 10.6A and 10.6B.
Bibliography
Angluin, D. 1980. Inductive inference of formal languages from positive data. Information and
Control 45: 117-135.
Angluin, D., and Smith, C. 1982. A survey of inductive inference: theory and methods.
Technical report 250, Department of Computer Science, Yale University, New Haven,
October.
Barzdin, J., and Podnieks, K. 1973. The theory of inductive inference. In Proceedings of the
Mathematical Foundations of Computer Science, pp. 9-15.
Blum, M. 1967. A machine independent theory of the complexity of the recursive functions.
Journal of the Association for Computing Machinery 14(2): 322-336.
Blum, M. 1967a. On the size of machines. Information and Control 11(3): 257-265.
Blum, L., and Blum, M. 1975. Toward a mathematical theory of inductive inference. Information
and Control 28: 125-155.
Brown, R., and Hanlon, C. 1970. Derivational complexity and the order of acquisition of child
speech. In Cognition and the Development of Language, J. Hayes (ed.). New York: Wiley.
Case, J., and Lynes, C. 1982. In Proceedings ICALP 82, Aarhus, Denmark, July 1982, Lecture
Notes in Computer Science. New York: Springer-Verlag.
Case, J., and Ngo-Manguelle, S. 1979. Refinements of inductive inference by Popperian
machines. Technical report, Department of Computer Science, SUNY, Buffalo.
Case, J., and Smith, C. 1983. Comparison of identification criteria for machine inductive
inference. Theoretical Computer Science 25: 193-220.
Chen, K.-J. 1982. Tradeoffs in the inductive inference of nearly minimal size programs.
Information and Control 52: 68-86.
Chomsky, N. 1957. Syntactic Structures. The Hague: Mouton & Co.
Chomsky, N. 1965. Aspects of the Theory of Syntax. Cambridge, Mass.: MIT Press.
Chomsky, N. 1975. Reflections on Language. New York: Random House.
Chomsky, N. 1980. Rules and Representations. New York: Columbia University Press.
Chomsky, N. 1980a. Initial states and steady states. In Language and Learning, M. Piattelli-Palmarini (ed.). Cambridge, Mass.: Harvard University Press, pp. 107-130.
Feldman, H., Goldin-Meadow, S., and Gleitman, L. 1978. Beyond Herodotus: the creation of
language by linguistically deprived deaf children. In Action, Symbol and Gesture: The
Emergence of Language, A. Lock (ed.). New York: Academic Press.
Fodor, J. 1976. The Language of Thought. Cambridge, Mass.: Harvard University Press.
Freivald, R., and Wiehagen, R. 1979. Inductive inference with additional information.
Elektronische Informationsverarbeitung und Kybernetik 15: 179-185.
Gold, E. M. 1967. Language identification in the limit. Information and Control 10: 447-474.
Hopcroft, J., and Ullman, J. 1979. Introduction to Automata Theory, Languages, and Computation.
Reading, Mass.: Addison-Wesley.
Horning, J. 1969. A Study of Grammatical Inference, Ph.D. dissertation. Computer Science
Department, Stanford University, Stanford.
Kripke, S. 1982. Wittgenstein on Rules and Private Language: An Elementary Exposition.
Cambridge, Mass.: Harvard University Press.
Lenneberg, E. 1967. Biological Foundations of Language. New York: Wiley.
Levy, A. 1979. Basic Set Theory. New York: Springer-Verlag.
Rogers, H. 1967. Theory of Recursive Functions and Effective Computability. New York:
McGraw-Hill.
Sankoff, G., and Brown, P. 1976. The origins of syntax in discourse: a case study of Tok Pisin
relatives. Language 52: 631-666.
Shapiro, E. 1981. Inductive inference of theories from facts. Research report 192, Department
of Computer Science, Yale University, New Haven.
Shoenfield, J. 1967. Mathematical Logic. Reading, Mass.: Addison-Wesley.
Smith, C. H. 1981. The power of parallelism for automatic program synthesis. In Proceedings of
the Twenty-Second Symposium on the Foundations of Computing. IEEE, pp. 283-295.
Solomonoff, R. J. 1964. A formal theory of inductive inference. Information and Control
7: 1-22, 224-254.
Wexler, K., and Culicover, P. 1980. Formal Principles of Language Acquisition. Cambridge,
Mass.: MIT Press.
Wiehagen, R. 1976. Limeserkennung rekursiver Funktionen durch spezielle Strategien.
Elektronische Informationsverarbeitung und Kybernetik 12: 93-99.
Wiehagen, R. 1977. Identification of formal languages. In Lecture Notes in Computer Science
53. New York: Springer-Verlag, pp. 571-579.
Wiehagen, R. 1978. Characterization problems in the theory of inductive inference. In
Proceedings of the Fifth Colloquium on Automata, Languages, and Programming. New York:
Springer-Verlag, pp. 494-508.
Wiehagen, R., Freivald, R., and Kinber, E. 1984. On the power of probabilistic strategies in
inductive inference. Theoretical Computer Science 28: 111-133.
Wittgenstein, L. 1953. Philosophical Investigations. New York: Macmillan.
List of Symbols
N Section 1.2.1, p. 8
§ Section 1.2.1, p. 8
φ(x)↓, φ(x)↑ Section 1.2.1, p. 8
⟨x, y⟩ Section 1.2.1, p. 8
A × B Section 1.2.1, p. 8
π₁(x), π₂(x) Section 1.2.1, p. 8
Section 1.2.1, p. 8
φ_i Section 1.2.1, p. 9
W_i Section 1.2.2, p. 10
RE Section 1.2.2, p. 10
REn n Definition 1.2.2A, p. 10
RE rec Definition 1.2.2B, p. 10
REm Definition 1.2.2E, p. 11
rng(t) Definition 1.3.3A, p. 13
y Definition 1.3.3A, Definition 5.3A, pp. 13,98
rng(σ) Section 1.3.4, Definition 5.2A, pp. 15, 97
lh(σ) Section 1.3.4, p. 15
SEQ Section 1.3.4, p. 15
Section 1.3.4, p, 14
Section 1.3.4, p, 14
Exercise 1.4.3K, p. 22
Section 2.1, p. 25
Section 2.1, p. 25
Section 2.1, p. 25
REid Definition 2.3B, p. 29
[9'l, [9']... Definition 4.1D, p. 45
fFP Definition 4.1C, p. 46
K Definition 4.2.1A, p. 48
REn ,, ]! Exercise 4.2.1G, p. 50
RE.... Exercise 4.2.1H, p. 50
Φ_i Definition 4.2.2A, p. 50
SD Definition 4.3.5B, p. 61
RE_SD Definition 4.3.5B, p. 61
REc b., Exercise 4.3.5C, p. 63
M(L) Definition 4.3.6B, p. 64
Definition 4.4.lA, p. 66
Definition 4.4.1A, p. 66
Definition 4.4.1C, p. 70
Name Index
Angluin, D., xi, xiii, 30, 56, 75, 76
Barzdin, J., 137
Blum, L., xiii, 25, 72, 83-85, 88, 108, 184
Blum, M., xiii, 25, 50, 51, 63-66, 72, 83-85, 88, 108, 184
Brown, P., 154
Brown, R., 14
Canny, J., 56, 72
Case, J., xiii, 62, 125, 127, 128, 132, 133, 137, 140-142, 149, 150, 151, 192
Chen, K., 143, 144
Chomsky, N., xiii, 20, 34
Culicover, P., xiii, 12, 34, 66, 73, 187
Feldman, H., 154
Fodor, J., 41
Freivald, R., 23, 147, 148, 191, 193
Fulk, M., 60, 89, 91
Gleitman, H., 14, 106
Gleitman, L., xiii, 14, 106, 154
Gold, E. M., xiii, 3, 7, 23, 27, 28, 48, 78, 79, 109, 113, 115, 116, 146, 149, 168, 174
Goldin-Meadow, S., 154
Hanlon, C., 14
Harrington, L., 140
Hopcroft, J., 12
Horning, J., 187
Kinber, E., 191, 193
Kripke, S., 41
Papadimitriou, C., xv
Pinker, S., xiii, 33, 186
Pitt, L., 193
Podnieks, K., 137
Popper, K., 62
Putnam, H., 3, 20, 63, 145
Rogers, H., xv, 9, 12, 29, 48, 59, 63, 88, 117, 135, 155, 156, 159
Sankoff, G., 154
Schafer, G., 72, 73, 81, 92
Shapiro, E., 59
Shoenfield, J., 155-157
Smith, C., xi, xiii, 125, 127, 128, 132, 133, 137, 140-142, 192
Solomonoff, R., 3
Steel, J., 133
Stob, M., xi, xiii
Ullman, J., 12
Weinstein, S., xi, 139
Wexler, K., xiii, 12, 34, 66, 73, 187
White, L., 76
Wiehagen, R., 23, 29, 58, 109, 147, 148, 191, 193
Wittgenstein, L., 41
Young, P., xv, 9, 51, 68
Subject Index