
Acquiring and Using World Knowledge using a Restricted Subset of English

Peter Clark, Phil Harrison, Tom Jenkins, John Thompson, Rick Wojcik
Boeing Phantom Works, Seattle
Introduction
• Knowledge acquisition is still a major bottleneck
– automated methods are good but still very restricted
• Our approach:
– Knowledge entry using Controlled Language
– Hits “sweet spot” between logic and full NLP
– language interpreter generates logic output
• Outline:
1. Our Controlled Language Processing technology
2. Discussion on Natural Language as a basis for KR
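The Language Spectrum
[Figure: a spectrum running from formal language to unrestricted natural language, with Controlled English in between]
• Formal language, e.g. “∀x∃y B(x) → R(x,y) ∧ C(y)”: too hard for the user
• Controlled English, e.g. “A ball falls from a cliff”
• Unrestricted natural language, e.g. “Consider the following possible situation in which a ball first…”: too hard for the computer to understand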
CPL (Computer-Processable Language)
Original text (incomprehensible to computer):

An object is thrown with a horizontal velocity of 20 m/s from a cliff that is 125 m above level ground. If air resistance is negligible, how long does it take the object to fall to the ground?

Rewritten in CPL (computer can understand):
An object is thrown from a cliff.
The horizontal velocity of the object is 20 m/s.
The top of the cliff is 125 m above level ground.
The object falls 125 m to the ground.
What is the duration of the fall?

CPL characteristics: short sentences, no pronouns, simple sentence structures.
Target Interpretation
• Sentences in first-order logic
• Capable of supporting machine inference

“An object is thrown from a cliff”

isa(_Object1, object_n1)
isa(_Cliff2, cliff_n1)
isa(_Throw3, throw_v1)
object(_Throw3, _Object1)
origin(_Throw3, _Cliff2)

[Figure: graph of the same assertions: a Throw node with an “object” arc to Object and an “origin” arc to Cliff]
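
To make the shape of this target representation concrete, here is a minimal Python sketch that builds the same five assertions as plain tuples. It is hand-wired for this one sentence and does no parsing; the function names (`new_instance`, `interpret_throw`) and the tuple encoding are assumptions for illustration, not part of the CPL system.

```python
from itertools import count

def new_instance(concept, ids):
    """Create a uniquely numbered instance of a word-sense concept such as 'object_n1'.

    Returns the instance name and its isa assertion as a tuple.
    """
    base = concept.split("_")[0].capitalize()   # 'object_n1' -> 'Object'
    name = f"_{base}{next(ids)}"
    return name, ("isa", name, concept)

def interpret_throw():
    """Hand-built interpretation of 'An object is thrown from a cliff' (no parsing)."""
    ids = count(1)                              # fresh numbering for each sentence
    obj, isa_obj = new_instance("object_n1", ids)
    cliff, isa_cliff = new_instance("cliff_n1", ids)
    throw, isa_throw = new_instance("throw_v1", ids)
    return [
        isa_obj,
        isa_cliff,
        isa_throw,
        ("object", throw, obj),                 # the thing being thrown
        ("origin", throw, cliff),               # where the throw starts
    ]

for pred, *args in interpret_throw():
    print(f"{pred}({', '.join(args)})")
```

Keeping each assertion as a (predicate, arguments...) tuple means the event instance (`_Throw3`) is the hub that the role relations hang from, which is what the graph on the slide depicts.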
IF “a person is carrying an entity that is inside a room”
THEN “the person is in the room.”

isa(_Person1, person_n1)
isa(_Room2, room_n1)
isa(_Entity3, entity_n1)
isa(_Carry4, carry_v1)
object(_Carry4, _Entity3)
agent(_Carry4, _Person1)
is-inside(_Entity3, _Room2)
=====>
is-inside(_Person1, _Room2)

[Figure: graph of the rule: a Carry node with an “agent” arc to Person and an “object” arc to Object; “is-inside” arcs lead from Object and from Person to Room]
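
One way to see why such a rule is “inference-capable” is to run it. The sketch below is a toy single-rule forward chainer in Python, not the inference engine used with CPL. Variables are written with a leading `?` (an assumed convention), and because the sketch has no taxonomy, the carried item is asserted directly as an `entity_n1`.

```python
def is_var(term):
    """Variables are strings starting with '?' (a convention assumed for this sketch)."""
    return isinstance(term, str) and term.startswith("?")

def match(pattern, fact, bindings):
    """Match one pattern against one ground fact; return extended bindings or None."""
    if len(pattern) != len(fact) or pattern[0] != fact[0]:
        return None
    new = dict(bindings)
    for p, f in zip(pattern[1:], fact[1:]):
        if is_var(p):
            if new.setdefault(p, f) != f:       # variable already bound to something else
                return None
        elif p != f:
            return None
    return new

def apply_rule(antecedents, consequent, facts):
    """Return every instance of the consequent derivable by one application of the rule."""
    solutions = [{}]
    for pattern in antecedents:
        solutions = [b2 for b in solutions for fact in facts
                     if (b2 := match(pattern, fact, b)) is not None]
    return {tuple(b.get(t, t) for t in consequent) for b in solutions}

# The rule from the slide, with the instance names replaced by variables.
ANTECEDENTS = [
    ("isa", "?p", "person_n1"), ("isa", "?r", "room_n1"),
    ("isa", "?e", "entity_n1"), ("isa", "?c", "carry_v1"),
    ("object", "?c", "?e"), ("agent", "?c", "?p"),
    ("is-inside", "?e", "?r"),
]
CONSEQUENT = ("is-inside", "?p", "?r")

facts = {
    ("isa", "_Person1", "person_n1"), ("isa", "_Room2", "room_n1"),
    ("isa", "_Entity3", "entity_n1"), ("isa", "_Carry4", "carry_v1"),
    ("object", "_Carry4", "_Entity3"), ("agent", "_Carry4", "_Person1"),
    ("is-inside", "_Entity3", "_Room2"),
}

derived = apply_rule(ANTECEDENTS, CONSEQUENT, facts)
print(derived)
```

With these facts the only derivable conclusion is `is-inside(_Person1, _Room2)`. Removing the fact that the entity is inside the room leaves nothing to derive.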
Overview of Processing
“An object is thrown from a cliff”

1. Parser & LF Generator
2. Word sense disambiguator
3. Relational disambiguator
4. Coreference identifier
5. Structural reorganizer

The stages draw on linguistic knowledge and world knowledge.

(_Object13320 instance_of object_n1)
(_Cliff13321 instance_of cliff_n1)
(_Throw13319 instance_of throw_v1)
(_Throw13319 object _Object13320)
(_Throw13319 origin _Cliff13321)

[Figure: graph of the output: a Throw node with an “object” arc to Object and an “origin” arc to Cliff]
Entering Quantified Expressions (Rules)
• Seven “rule templates” used:
IF sentence THEN sentence
ABOUT object: sentence
object IS noun/verb phrase
BEFORE sentence, sentence
BEFORE sentence, it is not true that sentence
AFTER sentence, sentence
AFTER sentence, it is not true that sentence
Processing:
1. Each sentence processed as a ground assertion
2. Quantifiers are added (Prolog-style)
3. “Action” templates become situation calculus rules
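
As a rough illustration of the first step only (recognizing which template a rule uses and splitting it into its parts), here is a small Python sketch based on regular expressions. It assumes the template keywords appear in upper case exactly as listed above; the names `TEMPLATES` and `classify` and the example sentences are hypothetical, and the sketch does not add quantifiers or produce situation calculus rules.

```python
import re

# (template name, pattern) pairs. The negated BEFORE/AFTER variants are listed
# ahead of the plain ones so that they are tried first.
TEMPLATES = [
    ("if-then",    re.compile(r"^IF (?P<a>.+?) THEN (?P<b>.+)$")),
    ("about",      re.compile(r"^ABOUT (?P<a>.+?): (?P<b>.+)$")),
    ("before-not", re.compile(r"^BEFORE (?P<a>.+?), it is not true that (?P<b>.+)$")),
    ("before",     re.compile(r"^BEFORE (?P<a>.+?), (?P<b>.+)$")),
    ("after-not",  re.compile(r"^AFTER (?P<a>.+?), it is not true that (?P<b>.+)$")),
    ("after",      re.compile(r"^AFTER (?P<a>.+?), (?P<b>.+)$")),
    ("is",         re.compile(r"^(?P<a>.+?) IS (?P<b>.+)$")),
]

def classify(rule_text):
    """Return (template name, first part, second part), or None if no template fits."""
    text = " ".join(rule_text.split()).rstrip(".")   # collapse whitespace, drop final period
    for name, pattern in TEMPLATES:
        m = pattern.match(text)
        if m:
            return name, m.group("a"), m.group("b")
    return None

print(classify("IF a person is carrying an entity that is inside a room "
               "THEN the person is in the room."))
print(classify("AFTER a person throws an object, it is not true that "
               "the person is holding the object."))
```

Each extracted part would then be handed to the sentence interpreter as an ordinary ground assertion, as step 1 describes.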
Overall Flow of Processing
[Figure: the user rewrites the original text into CPL (Controlled English), guided by rewriting advice from the system; the interpreter turns the CPL into logic, which goes into the KB; a paraphrase of the system’s understanding is shown back to the user.]

Example CPL text:
An object is thrown from a cliff. The horizontal velocity of the object is 20 m/s. The top of the cliff is 125 m above level ground.
Part II: Discussion

Controlled Languages:
Strengths and challenges
Strengths…
[Figure: the formal-logic notation “∀x∃y B(x) → R(x,y) ∧ C(y)” (???) contrasted with the CPL sentence “A man is driving a truck towards the factory”]

• CPL is easy to use, appears viable
– built KB with over 1000 rules
– KB is
• inference-capable
• easy to inspect and organize
• Makes knowledge entry accessible to many users
– major achievement
Challenges: 1. Reformulating in a Controlled Language is not trivial
• Task is not just grammatical reformulation
• Rather:
– “natural” English leaves much knowledge implicit
– CPL author must make that explicit
Original text:
“attack: intense adverse criticism”

CPL:
“IF a person attacks a 2nd person
THEN the first person criticizes the 2nd person intensely.”
Original text:
“axis: the center around which something rotates”

CPL:
“IF an object is rotating
THEN the object is turning around the object’s axis.”
2. Users may not be aware of the system’s mistakes
• User must be able to spot misinterpretations easily
– System’s paraphrase must be unambiguous
• User must know how to correct them

User’s input:
“The man ate the sandwich on the plate”

System’s paraphrase (a misinterpretation):
“The man ate on the plate. He ate the sandwich.”

User’s correction:
“The man ate the sandwich that was on the plate”


3. Natural-Language-based knowledge representations have limited expressivity

“Natural language is very expressive”

• …not to the computer! (Avoid “wishful semantics”)
• Expressiveness =
– the amount the computer understands
– the amount it can use to draw conclusions
• Everything else is meaningless to the computer
• e.g., CPL can’t express:
– constraints, defaults, some quantification patterns
4. Sometimes, linguistically motivated representations are poor
• Language-based KR:
– Most concepts correspond to words
– Structure of KB will mirror structure of language
• Is this bad? Sometimes…

“… walked for 10 miles”

NL-based KR                   “Traditional” KR
distance(_Walk1, _Mile1)      distance(_Walk1, _Distance1)
count(_Mile1, 10)             value(_Distance1, 10, mile)
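
The gap between the two representations can be bridged mechanically in simple cases. The Python sketch below rewrites the NL-based pair into the “traditional” quantity form. It assumes the unit instance also carries an `isa(_Mile1, mile_n1)` assertion (not shown on the slide), and the function name `to_quantity_form` is hypothetical.

```python
def to_quantity_form(facts):
    """Rewrite distance(E, U) + count(U, N) into distance(E, D) + value(D, N, unit).

    Assumes each unit instance U also has an isa(U, <unit>_n1) assertion,
    which is where the unit name is read from.
    """
    isa = {f[1]: f[2] for f in facts if f[0] == "isa"}
    counts = {f[1]: f[2] for f in facts if f[0] == "count"}
    # Unit instances that can be rewritten: used as a distance and fully described.
    units = {f[2] for f in facts
             if f[0] == "distance" and f[2] in isa and f[2] in counts}
    out, n = [], 0
    for f in facts:
        if f[0] == "distance" and f[2] in units:
            n += 1
            quantity = f"_Distance{n}"
            unit = isa[f[2]].split("_")[0]      # 'mile_n1' -> 'mile'
            out.append(("distance", f[1], quantity))
            out.append(("value", quantity, counts[f[2]], unit))
        elif f[0] in ("isa", "count") and f[1] in units:
            continue                            # absorbed into the value(...) assertion
        else:
            out.append(f)
    return out

nl_facts = [
    ("isa", "_Walk1", "walk_v1"),
    ("isa", "_Mile1", "mile_n1"),               # assumed; not shown on the slide
    ("distance", "_Walk1", "_Mile1"),
    ("count", "_Mile1", 10),
]
print(to_quantity_form(nl_facts))
```

The quantity form keeps the number and the unit together in one assertion, so that numeric reasoning does not have to go through an individual “mile” object.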
 
5. (Lack of) Canonicalization
“conducting a test of an entity”
“testing an entity”
• Many ways to say the same thing
• System needs to realize the equivalence
BUT: often NL-based KRs will not 

Solutions:
• Add equivalence rules. (But there are lots!!)
– e.g., “Conducting a X of Y ↔ Xing a Y”
• Have the interpreter normalize the input.
• Restrict the input language.
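
The second solution, having the interpreter normalize its input, could look roughly like the Python sketch below for the one pattern given above. It works on surface strings for brevity, whereas a real interpreter would more plausibly normalize the logical form; the lookup table `NOUN_TO_GERUND` is a hypothetical stand-in for a lexicon.

```python
import re

# Hypothetical lookup from event nouns to verb gerunds; a real system would
# draw this from a lexicon rather than a hand-written table.
NOUN_TO_GERUND = {
    "test": "testing",
    "inspection": "inspecting",
    "review": "reviewing",
}

# Matches "conducting a/an <noun> of", the pattern "Conducting a X of Y".
_CONDUCT = re.compile(r"\bconducting an? (\w+) of\b")

def normalize(phrase):
    """Rewrite 'conducting a X of Y' as 'Xing Y' when X is a known event noun."""
    def repl(m):
        gerund = NOUN_TO_GERUND.get(m.group(1))
        return gerund if gerund else m.group(0)   # unknown nouns are left untouched
    return _CONDUCT.sub(repl, phrase)

print(normalize("conducting a test of an entity"))
print(normalize("testing an entity"))
```

Both calls print the same phrase, so later stages see a single canonical form. The cost is the one noted above: every such paraphrase pattern needs its own rule or table entry.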
Summary
• CPL = a restricted English language for knowledge
– Hits “sweet spot” between logic and full NLP
– Produces inference-capable representations
– Is viable, used to build a large KB
• But: No “free lunch”
– requires skill to use it effectively

• NL-based KRs are becoming more important!


– Web: need semantically meaningful annotations
– AI: need better knowledge acquisition tools
• Some exciting possibilities ahead (esp. at Boeing!)
