Stuart Russell
UC Berkeley
Expressiveness of formal models
• Atomic models: each possible world is an
atom or token, i.e., no internal structure.
E.g., Heads or Tails
• Propositional models: each possible world
defined by values assigned to variables.
E.g., Boolean logic, Bayes net, neural net
• First-order models: each possible world
defined by objects and relations
Expressiveness
• Expressive language => concise models
=> fast learning, sometimes fast reasoning
– E.g., rules of chess:
1 page in first-order logic (e.g., Prolog),
100000 pages in propositional logic,
100000000000000000000000000000000000000
pages as atomic-state model
• Expressive language => general-purpose
inference and learning methods
[figure slides from Milch, 2006]
Expressiveness
[diagram: formal languages arranged along two axes,
probability and logic]
First-order probabilistic languages
• Gaifman [1964]:
– distributions over first-order possible worlds
• Halpern [1990]:
– syntax for constraints on such distributions
• Poole [1993] and several others:
– KB defines distribution exactly
– assumes unique names and domain closure
– like Prolog, databases (Herbrand semantics)
• Milch et al. [2005] and others:
– distributions over full first-order possible worlds
• generative models for events and existence
– complete inference for all well-defined models
Herbrand vs full first-order
Given
Father(Bill,William) and Father(Bill,Junior)
How many children does Bill have?
• Herbrand semantics: exactly two (unique names make
William and Junior distinct; domain closure rules out
unnamed children)
• First-order open-world semantics: anywhere from one
up (William and Junior may denote the same object,
and there may be children no constant refers to)
Summary of distinction
• Herbrand (and DB, Prolog) semantics
assumes unique names and domain
closure, so all possible worlds have the
same, known, named objects
• First-order logic allows
– different constants to refer to the same objects
– objects that are not referred to by any constant
i.e., unknown worlds
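The distinction can be made concrete with a small enumeration. This is a sketch, not from the talk: the Python encoding and the fixed 3-element domain (a finite stand-in for the open-ended first-order case) are illustrative assumptions.

```python
from itertools import product

# Herbrand semantics: unique names + domain closure, so Father(Bill,William)
# and Father(Bill,Junior) give Bill exactly two distinct, named children.
herbrand_children = len({"William", "Junior"})

# First-order semantics over a fixed 3-element domain: each constant may
# denote any object, so William and Junior may co-refer, and the minimal
# set of Bill's children varies across interpretations.
domain = range(3)
minimal_counts = set()
for william, junior in product(domain, repeat=2):
    minimal_counts.add(len({william, junior}))   # minimal children of Bill

print(herbrand_children, sorted(minimal_counts))   # → 2 [1, 2]
```

Under Herbrand semantics the answer is pinned to 2; under first-order semantics even this tiny enumeration admits worlds with 1 or 2 named children (and unnamed children are possible besides).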
Example: balls and urns
Balls and urns contd.
• N balls, prior distribution P(N)
• True colours C1,…CN, identical priors P(Ci)
• k observations, observed colours O=O1,..,Ok
• Assignment ω specifies which ball was observed
in each observation
[diagram: the assignment ω maps each observation to a ball]
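A minimal sketch of this model in Python, with posterior inference by rejection sampling. The uniform prior on N, the two-colour palette, and the observation noise model are illustrative assumptions, not part of the talk.

```python
import random

COLOURS = ["blue", "green"]

def sample_world(k, max_n=8, noise=0.2, rng=random):
    """Sample N, ball colours, assignment omega, and k noisy observations."""
    n = rng.randint(1, max_n)                          # prior P(N): uniform (assumed)
    colour = [rng.choice(COLOURS) for _ in range(n)]   # identical priors P(Ci)
    omega = [rng.randrange(n) for _ in range(k)]       # which ball each draw observes
    obs = [c if rng.random() > noise else rng.choice(COLOURS)
           for c in (colour[i] for i in omega)]        # observed colours O1..Ok
    return n, obs

def posterior_n(observed, samples=20000, seed=0):
    """Rejection sampling for P(N | O): keep only sampled worlds whose
    observations match the evidence exactly."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(samples):
        n, obs = sample_world(len(observed), rng=rng)
        if obs == observed:
            counts[n] = counts.get(n, 0) + 1
    total = sum(counts.values())
    return {n: c / total for n, c in sorted(counts.items())}

print(posterior_n(["blue", "blue", "green"]))
```

Note that each sample instantiates only the finitely many variables its own N requires, which previews why inference over these unbounded models remains feasible.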
Example: Citation Matching
[Lashkari et al 94] Collaborative Interface Agents,
Yezdi Lashkari, Max Metral, and Pattie Maes,
Proceedings of the Twelfth National Conference on
Articial Intelligence, MIT Press, Cambridge, MA,
1994.
CiteSeer02: Russell w/4 Norvig
• Russell S, Norvig P (1995) Artificial Intelligence: A Modern
Approach, Prentice Hall Series in Artificial Intelligence. Englewood
Cliffs, New Jersey
• Stuart Russell and Peter Norvig. Artificial intelligence: A modern approach. Prentice-
Hall Series on Artificial Intelligence. Prentice-Hall, Englewood Cliffs, New Jersey,
1995.
• Stuart Russell and Peter Norvig. Articial Intelligence: A Modern Approach. Prentice
Hall, 1995.
• Stuart Russell and Peter Norvig. Artificial Intelligence. A modern approach. Prentice-
Hall, 1995.
• Stuart Russell and Peter Norvig. Artficial Intelligence: A Modern Approach. Prentice-
Hall, Saddle River, NJ, 1995.
• Stuart Russell and Peter Norvig. Articial Intelligence a modern approach. Prentice
Hall series in articial intelligence. Prentice Hall, Upper Saddle River, New Jersey,
1995.
• Stuart Russell and Peter Norvig. AI: A Modern Approach. Prentice Hall, NJ, 1995.
• 391-414. Russell SJ, Norvig P (
• In Proceedings of the Third Annual Conference on Evolutionary Programming (pp.
131--139). River Edge, NJ: World Scientific. Russell, S.J., \& Norvig, P. 1995.
Artificial Intelligence, A Modern Approach. Englewood Cliffs, NJ: Prentice Hall.
• John Wiley. Russell, S., \& Norvig, P. (1995). Artificial Intelligence: A Modern
Approach. Prentice-Hall, Inc.
• Stuart J. Russell and Peter Norvig. Articial Intelligence: A Modern Approach, chapter
17. Number 0-13-103805-2 in Series in Articial Intelligence. Prentice Hall, 1995.
• Stuart J. Russell and Peter Norvig. Articial Intelligence A Modern Approach. Prentice
Hall, Englewood Cli s, New Jersey, USA, 1995. 32
• Morgan Kaufmann Publishers. Russell, S., and Norvig, P. 1995. Artificial Intelligence:
A Modern Approach. Prentice Hall.
• W. Shavlik and T. G. Dietterich, eds., Morgan Kaufmann, San Mateo, CA. Russell, S.
and Norvig, P. (1995). Artificial Intelligence - A Morden Approach. Englewood Cliffs,
NJ: Prentice-Hall.
• Russell, Stuart and Peter Norvig, Artificial Intelligence, A Modern Approach, New
Jersey, Prentice Hall, 1995.
• Russell, S., \& Norvig, P. (1995) . Artificial intelligence: A modern monitoring methods
for information retrieval systems: From search approach. Prentice-Hall series on
artificial intelligence. Upper Saddle product to search process. Journal
• of the American Society for Information Science, 47, 568 -- 583. River, NJ: Prentice-
Hall.
• Stuart J. Russell and Peter Norvig. Artificial Intelligence: A Modern Approach, chapter
17. Number 0-13-103805-2 in Series in Artificial Intelligence. Prentice Hall, 1995.
• Russell, Stuart and Norvig, Peter: Artificial Intelligence: A Modern Approach, Prentice
Hall, Englewood Cliffs NJ, 1995
• Stuart J. Russell and Peter Norvig. Artiilcial Intelligence: A Modern
Approach. Prentice Hall, Englewood Cliffs, N J, 1995.
• Cambridge, MA: MIT Press. Russell, S. J., and Norvig, P. (1994). Artificial
Intelligence: A Modern Approach. Englewood Cliffs, NJ: Prentice-Hall.
• Fast Plan Generation Through Heuristic Search Russell, S., \& Norvig, P.
(1995). Artificial Intelligence: A Modern Approach. Prentice-Hall, Englewood
Cliffs, NJ.
• Hoffmann \& Nebel Russell, S., \& Norvig, P. (1995). Artificial Intelligence: A
Modern Approach. Prentice-Hall, Englewood Cliffs, NJ.
• Stuart Russel and Peter Norvig. Artificial Intelligence: A Modern Approach,
chapter 12.1 - 12.3, pages 367--380. Prentice Hall, 1995.
• S. Russel and P. Norvig Artificial Intelligence: A Modern Approach, MIT Press 1995.
• Russel, S., P. Norvig. Artificial Intelligence: A Modern Approach Prentice Hall 1995.
• Artificial Intelligence, S Russel \& P Norvig, Prentice Hall, 1995 21
Example: classical data association
[sequence of figures: tracking multiple moving targets
from unlabeled detections]
Example: modern data association
[figures: camera images of traffic. Same car?]
Example: natural language
• What objects are referred to in the
following natural language utterance?
[utterance shown on slide]
Example: vision
• What objects appear in this image
sequence?
[image sequence shown on slide]
Outline
• Background and Motivation
– Why we need more expressive formal languages for probability
– Why unknown worlds matter
• Technical development
– Specifying distributions over all possible worlds: BLOG
– Sound and complete inference
• Applications
– Citation matching
– State estimation
• Open problems, future work
Citation Matching Example
[diagram: researcher objects (Name: …) linked by AuthorOf
to paper objects (Title: …); citation objects linked by
PubCited to the papers they cite]
Simplest Generative Process for Citations
#Paper ~ NumPapersPrior();                  (number statement)
Title(p) ~ TitlePrior();
Text(c) ~ NoisyCitationGrammar(Title(PubCited(c)));
(The citations c are part of the skeleton: an exhaustive
list of distinct citations.)
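This three-statement process can be sketched as a forward sampler. Everything concrete here is an illustrative assumption: the priors, the uniform choice of cited paper (made explicit only on the next slide), and the character-dropping stand-in for NoisyCitationGrammar.

```python
import random
import string

def generate_citations(num_citations, seed=None):
    """Forward-sample the simplest citation model (sketch; priors assumed)."""
    rng = random.Random(seed)
    # #Paper ~ NumPapersPrior();
    num_papers = rng.randint(1, 5)
    # Title(p) ~ TitlePrior();
    titles = ["".join(rng.choice(string.ascii_lowercase) for _ in range(8))
              for _ in range(num_papers)]
    citations = []
    for _ in range(num_citations):        # citations form the fixed skeleton
        pub_cited = rng.randrange(num_papers)     # assumed uniform PubCited(c)
        # Text(c) ~ NoisyCitationGrammar(Title(PubCited(c)));
        # stand-in grammar: drop each character independently with prob 0.1
        text = "".join(ch for ch in titles[pub_cited] if rng.random() > 0.1)
        citations.append((pub_cited, text))
    return titles, citations
```

Inference then runs this process backwards: given only the noisy citation texts, recover how many papers exist and which citation refers to which paper.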
Adding Authors
#Researcher ~ NumResearchersPrior();
Name(r) ~ NamePrior();
#Paper ~ NumPapersPrior();
FirstAuthor(p) ~ Uniform({Researcher r});
Title(p) ~ TitlePrior();
PubCited(c) ~ Uniform({Paper p});
Text(c) ~ NoisyCitationGrammar(
    Name(FirstAuthor(PubCited(c))), Title(PubCited(c)));
Objects Generating Objects
• Can we also specify a distribution for how many
papers each author writes?
• Yes, if the “origin graph” for objects is acyclic:
– Allow objects to generate objects
– Designate FirstAuthor(p) as an origin function
• set when paper p is generated,
• ties p back to the Researcher object that generated it
– FirstAuthor(p) no longer has its own dependency
statement
Number Statement Syntax
• Include FirstAuthor in number statement:
#Paper(FirstAuthor = r) ~ NumPapersPrior(Position(r));
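The objects-generating-objects pattern can be sketched as follows. The particular priors, the position values, and the dictionary encoding are illustrative assumptions; the point is that each paper's FirstAuthor is fixed at the moment its generating researcher creates it.

```python
import random

def generate_papers(seed=None):
    """Objects generating objects: each researcher r generates a number of
    papers drawn from a position-dependent prior, and the origin function
    FirstAuthor ties each paper back to r.  All priors here are assumed."""
    rng = random.Random(seed)
    # #Researcher ~ NumResearchersPrior();
    positions = [rng.choice(["student", "professor"])
                 for _ in range(rng.randint(1, 4))]
    papers = []
    for r, pos in enumerate(positions):
        # #Paper(FirstAuthor = r) ~ NumPapersPrior(Position(r));
        n = rng.randint(0, 2 if pos == "student" else 6)
        for _ in range(n):
            # FirstAuthor is set when the paper is generated (origin function)
            papers.append({"FirstAuthor": r})
    return positions, papers
```

Because researchers are generated before any papers, the origin graph (Researcher to Paper) is acyclic, which is exactly the well-formedness condition the slide states.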
Semantics: Objects
[figure from Milch et al., IJCAI 2005]
Semantics: Infinite “Bayes net”
[diagram: node #Paper with children Title((Paper, 1)),
Title((Paper, 2)), Title((Paper, 3)), …; nodes PubCited(Cit1)
and PubCited(Cit2) feeding Text(Cit1) and Text(Cit2)]
Sufficient Condition for Well-Formedness
[figure from Milch et al., IJCAI 2005]
Inference for BLOG
• Does the infinite set of variables prevent inference?
• No: the sampling algorithms need to instantiate only
the finite set of relevant variables
• Algorithms:
– Rejection sampling [Milch et al., IJCAI 2005]
– Guided likelihood weighting [Milch et al., AI/Stats 2005]
– Markov chain Monte Carlo [Milch et al., UAI 2006]
• Theorem 2: For any well-formed BLOG model, these
sampling algorithms converge to the correct probability
for any query, using finite time per sampling step
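The second algorithm in the list can be sketched on a balls-and-urns style query. This is a hand-rolled illustration, not the guided scheme of Milch et al.: the uniform prior on N and the colour-flip noise rate are assumptions. Each sample touches only the finitely many colour variables its own N requires, which is the point of the slide.

```python
import random

def lw_posterior_n(observed, samples=10000, max_n=8, flip=0.2, seed=0):
    """Likelihood weighting for P(N | observed colours): sample N, colours,
    and the observation assignment from the prior, then weight each sample
    by the likelihood of the evidence instead of rejecting mismatches."""
    rng = random.Random(seed)
    colours = ["blue", "green"]
    weights = {}
    for _ in range(samples):
        n = rng.randint(1, max_n)                         # N ~ P(N) (assumed uniform)
        colour = [rng.choice(colours) for _ in range(n)]  # Ci ~ P(Ci)
        w = 1.0
        for o in observed:
            c = colour[rng.randrange(n)]                  # assignment omega
            w *= (1 - flip) if o == c else flip           # P(Oi | true colour)
        weights[n] = weights.get(n, 0.0) + w
    total = sum(weights.values())
    return {n: w / total for n, w in sorted(weights.items())}
```

Unlike rejection sampling, every sample contributes some weight, so far fewer samples are wasted when the evidence is unlikely under the prior.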
Outline
• Background and Motivation
– Why we need more expressive formal languages for probability
– Why unknown worlds matter
• Technical development
– Specifying distributions over all possible worlds: BLOG
– Sound and complete inference
• Applications
– Citation matching
– State estimation
• Open problems, future work
Citation Matching
[Pasula et al., NIPS 2002]
• Elaboration of generative model shown earlier
• Parameter estimation
– Priors for names, titles, citation formats learned offline
from labeled data
– String corruption parameters learned with Monte Carlo EM
• Inference
– MCMC with cluster recombination proposals
– Guided by “canopies” of similar citations
– Accuracy stabilizes after ~20 minutes on 2000 citations
Citation Matching Results
[bar chart: error (fraction of clusters not recovered
correctly), 0 to 0.25, on the Reinforce, Face, Reason, and
Constraint datasets, comparing phrase matching
[Lawrence et al. 1999] against the present model]
Preliminary Experiments:
Information Extraction
• P(citation text | title, author names)
modeled with simple HMM
• For each paper: recover title, author
surnames and given names
• Fraction whose attributes are recovered
perfectly in last MCMC state:
– among papers with one citation: 36.1%
– among papers with multiple citations: 62.6%
Can use inferred knowledge for disambiguation
State Estimation for “Aircraft”
• Dependency statements for simple model:
#Aircraft ~ NumAircraftPrior();
State(a, t)
  if t = 0 then ~ InitState()
  else ~ StateTransition(State(a, Pred(t)));
#Blip(Source = a, Time = t)
  ~ NumDetectionsCPD(State(a, t));
#Blip(Time = t)
  ~ NumFalseAlarmsPrior();
ApparentPos(r)
  if (Source(r) = null) then ~ FalseAlarmDistrib()
  else ~ ObsCPD(State(Source(r), Time(r)));
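A forward sample from this model can be sketched as follows. Every concrete distribution (the 1-D random-walk dynamics, the 0.9 detection probability, the false-alarm rate) is an illustrative stand-in for the named CPDs, which the slide leaves unspecified.

```python
import random

def sample_radar_run(num_steps=5, seed=None):
    """One forward sample: aircraft states evolve as a 1-D random walk, each
    aircraft yields at most one blip per step, and false-alarm blips have no
    source.  All concrete distributions are assumed stand-ins for the CPDs."""
    rng = random.Random(seed)
    num_aircraft = rng.randint(1, 3)            # #Aircraft ~ NumAircraftPrior()
    state = [rng.uniform(0.0, 100.0) for _ in range(num_aircraft)]  # InitState()
    blips = []                                  # (time, source, apparent position)
    for t in range(num_steps):
        for a in range(num_aircraft):
            # State(a, t) ~ StateTransition(State(a, Pred(t)))
            state[a] += rng.gauss(1.0, 0.5)
            # #Blip(Source = a, Time = t) ~ NumDetectionsCPD(State(a, t))
            if rng.random() < 0.9:
                # ApparentPos(r) ~ ObsCPD(State(Source(r), Time(r)))
                blips.append((t, a, state[a] + rng.gauss(0.0, 1.0)))
        # #Blip(Time = t) ~ NumFalseAlarmsPrior()
        for _ in range(rng.randint(0, 2)):
            # Source(r) = null, so ApparentPos(r) ~ FalseAlarmDistrib()
            blips.append((t, None, rng.uniform(0.0, 100.0)))
    return num_aircraft, blips
```

Data association is the inverse problem: given only the (time, position) blips, infer the number of aircraft and each blip's source.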
Aircraft Entering and Exiting
#Aircraft(EntryTime = t) ~ NumAircraftPrior();
Exits(a, t)
  if InFlight(a, t) then ~ Bernoulli(0.1);
InFlight(a, t)
  if t < EntryTime(a) then = false
  elseif t = EntryTime(a) then = true
  else = (InFlight(a, Pred(t)) & !Exits(a, Pred(t)));
State(a, t)
  if t = EntryTime(a) then ~ InitState()
  elseif InFlight(a, t) then
    ~ StateTransition(State(a, Pred(t)));
#Blip(Source = a, Time = t)
  if InFlight(a, t) then
    ~ NumDetectionsCPD(State(a, t));
…plus last two statements from previous slide
MCMC for Aircraft Tracking
[figures from Oh et al., CDC 2004]
Extending the Model: Air Bases
#Aircraft(InitialBase = b) ~ InitialAircraftPerBasePrior();
CurBase(a, t)
  if t = 0 then = InitialBase(a)
  elseif TakesOff(a, Pred(t)) then = null
  elseif Lands(a, Pred(t)) then = Dest(a, Pred(t))
  else = CurBase(a, Pred(t));
InFlight(a, t) = (CurBase(a, t) = null);
TakesOff(a, t)
  if !InFlight(a, t) then ~ Bernoulli(0.1);
Lands(a, t)
  if InFlight(a, t) then
    ~ LandingCPD(State(a, t), Location(Dest(a, t)));
Dest(a, t)
  if TakesOff(a, t) then ~ Uniform({Base b})
  elseif InFlight(a, t) then = Dest(a, Pred(t));
State(a, t)
  if TakesOff(a, Pred(t)) then
    ~ InitState(Location(CurBase(a, Pred(t))))
  elseif InFlight(a, t) then
    ~ StateTrans(State(a, Pred(t)), Location(Dest(a, t)));
Unknown Air Bases
• Just add two more lines:
#AirBase ~ NumBasesPrior();
Location(b) ~ BaseLocPrior();
BLOG Software
• Bayesian Logic inference engine available:
http://www.cs.berkeley.edu/~milch/blog
Summary
• Expressive probability languages are important for many
real-world problems and for real AI
• We must be able to specify probability distributions over
possible worlds with unknown objects and relations
• BLOG does this
– Every well-formed BLOG model specifies a unique distribution over
possible worlds
– Inference algorithms work correctly for all well-formed BLOG models
– BLOG inference creates probabilistic relational KBs from raw data
– Applications requiring several PhDs and thousands of lines of code
can be implemented in a few dozen lines of BLOG with no new
PhDs
Open Problems
• Inference
– Applying “lifted” inference to BLOG (like Prolog)
– Approximation algorithms for problems with huge
numbers of objects
– Effective filtering algorithm for DBLOG
• Structure learning
– Learning more complex dependency statements
– Hypothesizing new random functions, new types
• Applications to vision, NLP, WWW => KB
Expressiveness and complexity in logic
[diagram after Poole, Mackworth & Goebel, 1998: logical
languages ordered by inference complexity, from first-order
logic (undecidable), through 3-CNF (NP-hard) and Datalog
(decidable), down to 2-CNF, propositional definite clauses,
and propositional databases (polytime)]
Merging Logical and Probabilistic AI
• We need to develop a similarly rich hierarchy of
probabilistic languages
• Logical languages are special cases of probabilistic
languages
• Does probabilistic inference behave, in the
deterministic limit, like logical inference?
– One hopes so! The brain does not have a big switch
– MCMC is asymptotically identical to greedy SAT algorithms
– Several algorithms exhibit “default logic” phenomenology
• Modern AI should build on, not discard, classical AI