
Jared Friedman

October 22, 2003


Harvard Science Review

The Sole Contender for AI

"AI has been brain-dead since the 1970s,"1 said Marvin Minksy, one of the
undisputed founding fathers of the field of artificial intelligence (AI), at a recent speech
at Boston University. Minsky was referring to the fact that while AI has had success in
narrow, expert domains, like playing chess or composing music, it remains stumped by
the ultimate goal of building a single computer system that is generally intelligent about
the world. While there are many AI researchers working on various sub-problems of the
field, it is surprising how few are actually tackling the age-old dream of building a
computer one could just have a conversation with, like HAL 9000 in 2001: A Space
Odyssey. Indeed, Minsky went on to say that, in his opinion, only one project was on the
right track to solving the deep problems of AI: the Cyc project.

The Cyc (derived from en-cyc-lopedia) project is based on the idea that the
primary impediment to a HAL-like system is the lack of commonsense knowledge in
computers. In AI, commonsense knowledge is loosely the set of facts that any normal
adult person would be expected to know, such as "when people die, they stay dead," "a
dog is a type of animal," and "submarines are meant to travel under water." While it is
difficult to estimate the size of this set of facts, it is known to be very large, perhaps in the
tens of millions, and there is currently no computer system that possesses it. Time and again,
AI has met a roadblock in this lack of commonsense knowledge. Expert systems, the hallmark of AI, are programs that attempt to replicate tasks that human experts perform, such as giving medical advice or making travel plans; they are known for their brittleness - their tendency to break down in unfamiliar situations. A classic example is the program
MYCIN, which diagnoses blood diseases better than most human experts but would be
likely to prescribe penicillin for the ailments of a rusting car. The lack of commonsense
knowledge also limits the accuracy of foreign language translation systems. For instance,
a program that attempts to translate the phrase "bats and other small mammals" into
Spanish must be able to tell whether the word "bat" refers to the animal ("murciélago") or to a baseball bat ("bate"), and to do this it must know that the former is a kind of small mammal and the latter is not. Because of the commonsense problem, even the best
computerized translators give very mediocre performance and commercially viable
expert systems are rare.

The Cyc project is an attempt to break this bottleneck on AI once and for all by
programming the whole of commonsense knowledge into a computer. Commonsense
facts in Cyc are stored in formal logic, and are hand-entered by human knowledge
engineers. An inference engine is able to reason with these facts to answer queries, and a
natural language interface can translate between English and Cyc's internal representation
language. Potential applications for Cyc range from the prosaic, such as a sanity-checker
1
"AI Founder Blasts Modern Research." Wired News, May 13, 2003.
<http://www.wired.com/news/technology/0,1282,58714,00.html> (cited 27 Sept., 2003).
for spreadsheets that would know that "24,562" is not a reasonable value for a person's
age, to the world-changing, such as a question-answering system that could use
information on the internet to respond to questions like "What is the second-tallest
mountain on earth and how high is it?" with a single answer "K2, 28,251 ft."
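To make the sanity-checker idea concrete, here is a minimal sketch in Python. The field names and numeric bounds are invented for illustration, and this is not a Cycorp product; the point is simply that "24,562" is a perfectly well-formed integer, so an ordinary type check passes it, and only background knowledge about human lifespans flags it as an absurd age.

# Hypothetical sketch (invented field names and bounds): a commonsense
# sanity-checker needs background facts, not just type checks.

COMMONSENSE_RANGES = {
    "person_age_years": (0, 120),       # assumed bound for illustration
    "mountain_height_ft": (0, 30_000),
}

def sanity_check(field: str, value: float) -> bool:
    """Return True if the value is commonsensically plausible."""
    low, high = COMMONSENSE_RANGES[field]
    return low <= value <= high

print(sanity_check("person_age_years", 24_562))    # False: flag the cell
print(sanity_check("mountain_height_ft", 28_251))  # True: K2's height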

The Cyc project has been working on this problem for about twenty years, and
Cyc now contains about 1.5 million facts, or assertions. This is by far the largest
collection of commonsense knowledge in the world. Cycorp, the company that produces
Cyc, has about eighty employees, most of whom are ontologists, who spend their time
adding to and improving the Cyc knowledge base (KB). While Cyc's KB contains a
significant fraction of commonsense knowledge, and Cyc has in general made great
strides towards its ultimate goal, it remains a work in progress. In particular, Cyc has yet
to be successfully deployed in a commercial situation, and its development is still funded
primarily by DARPA, the Defense Advanced Research Projects Agency. Cyc's KB is
large enough now to answer a substantial percentage of randomly chosen commonsense
questions, like "What color is grass?", "Do fish fly?", "What shape is the earth?". Cyc's
natural language system can almost always translate Cyc's internal knowledge and
thinking into English, but Cyc cannot yet understand most normal English sentences.

1. (#$isa #$HurricaneAndrew #$Hurricane)
2. (#$genls #$Hurricane #$RainProcess)
3. (#$relationAllExists #$eventOccursAt #$RainProcess #$RainyLocation)
4. (#$genls #$RainyLocation #$CloudyLocation)
5. (#$disjointWith #$CloudyLocation #$SunnyLocation)

Figure 1. Five Example Sentences in CycL
Commonsense knowledge in Cyc is stored in a proprietary formal language called CycL. CycL consists of about 100,000 constants, each of which corresponds to a commonsense concept, like #$Dog, #$Hammer, or #$Shouting. Combining Cyc constants into expressions, Cycorp's ontological engineers can state facts about the real world, such as in the example sentences in Figure 1. Cyc's inference engine can then reason with these facts to answer questions even if the answers are not directly stated in the KB. So for instance, say that some user testing Cyc's knowledge asked Cyc if it was cloudy during Hurricane Andrew. Even though no ontologist would assert such a fact directly, Cyc can still use the example sentences to derive it. Cyc would reason that (1) Hurricane Andrew is a hurricane, and (2) all hurricanes are rain processes, so Hurricane Andrew must be a rain process. Sentence three tells Cyc that all rain processes must occur at some rainy location, so Andrew must also occur at a rainy location. But (4) says that all rainy locations are also cloudy locations, so Cyc can conclude that Andrew must have occurred at a cloudy spot.

[Screenshot of Cyc: the Cyc KB browser, where users can browse the stored information and improve Cyc's knowledge. This page displays OpenCyc's knowledge of dogs.]
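To make that chain of reasoning concrete, here is a toy forward chainer in Python over the Figure 1 sentences. It is an illustrative sketch only: the rule names and the "witness" trick for sentence three are assumptions of this sketch, and Cyc's actual engine is vastly more general.

# A toy forward chainer over the Figure 1 sentences (sketch only).

isa = {("HurricaneAndrew", "Hurricane")}               # sentence 1
genls = {("Hurricane", "RainProcess"),                 # sentence 2
         ("RainyLocation", "CloudyLocation")}          # sentence 4
# Sentence 3: every instance of RainProcess occurs at *some* instance
# of RainyLocation.
relation_all_exists = [("eventOccursAt", "RainProcess", "RainyLocation")]

occurs_at = set()  # derived (event, location) pairs

changed = True
while changed:
    changed = False
    # Rule A: isa(x, C) and genls(C, D)  =>  isa(x, D)
    for (x, c) in list(isa):
        for (c2, d) in genls:
            if c == c2 and (x, d) not in isa:
                isa.add((x, d))
                changed = True
    # Rule B: isa(x, C) and relationAllExists(pred, C, D) introduce a
    # fresh "witness" location known only to be an instance of D.
    for (pred, c, d) in relation_all_exists:
        for (x, c2) in list(isa):
            if c == c2 and (f"loc_of_{x}", d) not in isa:
                isa.add((f"loc_of_{x}", d))
                occurs_at.add((x, f"loc_of_{x}"))
                changed = True

# Andrew is a Hurricane, hence a RainProcess (Rule A); it therefore
# occurs at some RainyLocation (Rule B), which is a CloudyLocation
# (Rule A again):
print(("loc_of_HurricaneAndrew", "CloudyLocation") in isa)  # True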

It may surprise you to learn that if asked whether the sun was shining during the
middle of Hurricane Andrew, Cyc will not be able to use (5), which says that cloudy
locations are never sunny, to prove that it was not. The problem here is a subtle logical
flaw in the third sentence, which does not rule out the possibility that Andrew might have
occurred at some other location (which might be sunny) in addition to the cloudy one.
This is exactly the kind of problem that Cycorp's ontologists constantly confront. Subtle
holes in Cyc's knowledge are omnipresent, and tremendously difficult to find and patch.
Subtle contradictions are just as much of a problem. For example, combining (4) and (5),
we can conclude that if it is rainy outside, it is not sunny. This may be a good general
rule of thumb, but it obviously breaks down in the case of sun showers. Considering
every such exceptional case is a daunting but necessary task for Cycorp's ontologists.
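The flaw in sentence three is easiest to see in first-order terms. Under one plausible reading of #$relationAllExists (a rendering for illustration, not Cycorp's official semantics), the sentence says only that every rain process occurs at some rainy location:

\forall x\,\big(\mathit{RainProcess}(x) \rightarrow \exists \ell\,(\mathit{eventOccursAt}(x,\ell) \wedge \mathit{RainyLocation}(\ell))\big)

To conclude that no venue of Hurricane Andrew was sunny, Cyc would instead need the stronger universal form, which makes every location of a rain process rainy:

\forall x\,\forall \ell\,\big(\mathit{RainProcess}(x) \wedge \mathit{eventOccursAt}(x,\ell) \rightarrow \mathit{RainyLocation}(\ell)\big)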
Another pressing problem for Cycorp is the speed of inference. As Cyc's
knowledge base has grown to an enormous size, it has become increasingly difficult for
the inference engine to search through the entire KB, and inference in Cyc has slowed to
a crawl. While simple inferences still usually finish in seconds, more complicated ones
can take hours. To see what takes so long, consider the last example, in which we asked
Cyc to prove that Hurricane Andrew could not have occurred in a sunny location. What
would happen if, in addition to the facts in Figure 1, Cyc also knew the somewhat arcane
but true fact that hurricanes never occur in the area approximately within five degrees of
the equator? Cyc's inference engine might then decide that the best way to prove that it
wasn't sunny during Hurricane Andrew was to prove that it is never sunny more than five
degrees from the equator. Cyc might have all sorts of facts that it could use to try to
prove this new statement, and it might spend a great deal of time working on the problem
before giving up and trying to prove that the hurricane wasn't sunny a different way.
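To see how a naive prover can burn time this way, here is a small Python sketch of depth-first backward chaining. The rules and goal names are invented for illustration, and Cyc's real engine is far more sophisticated; the point is only that a prover that tries alternatives in order can commit to a hopeless branch long before it finds the short proof.

# Illustrative sketch with invented rules: each goal maps to
# alternative lists of subgoals, tried depth-first in order.

rules = {
    "not_sunny(Andrew)": [
        ["never_sunny_anywhere()"],       # tried first: a dead end
        ["occurs_at_cloudy(Andrew)"],     # the short, correct proof
    ],
    "occurs_at_cloudy(Andrew)": [["rain_process(Andrew)"]],
    "rain_process(Andrew)": [[]],         # provable directly (Figure 1)
}

# The dead end expands into a long chain of subgoals that finally fails.
rules["never_sunny_anywhere()"] = [["chase_0()"]]
for i in range(300):
    rules[f"chase_{i}()"] = [[f"chase_{i + 1}()"]]  # chase_300 has no rule

calls = 0

def prove(goal):
    """Try each alternative list of subgoals for `goal`, depth-first."""
    global calls
    calls += 1
    for subgoals in rules.get(goal, []):  # no rule: this branch fails
        if all(prove(g) for g in subgoals):
            return True
    return False

print(prove("not_sunny(Andrew)"))  # True, eventually
print(calls)                       # ~300 calls wasted on the dead end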

Among Cycorp employees, this problem is sometimes called the "menstruating television problem,"[2] after a famous long inference that, when stopped midway out of sheer impatience, turned out to have decided that it could prove the question it had been asked if only it could prove that televisions could menstruate, and to have proceeded to spend a great deal of time trying to prove this dubious claim. Hope is not lost, though, for Cycorp is working hard to make the inference engine smarter, so that it wastes less time trying to prove things that are obviously false. Many people at Cycorp conjecture a speedup of as much as a hundred-fold in the foreseeable future.

Even more critical for Cyc's ultimate success, though, will be its support for
natural languages, like English. Successful reasoning about hurricanes is impressive to a
few, but ordinary people do not know CycL and will be interested in Cyc only if they can
ask such questions in English. But translating the free structure of a natural language into
a rigid formalism like CycL is a famously difficult problem of AI. Part of the trouble is
that the sentences translated into CycL don't look anything like the original English. To
ask Cyc "Was it cloudy during Hurricane Andrew?", you would have to say:

(#$implies
(#$eventOccursAt #$HurricaneAndrew ?LOC)
(#$isa ?LOC #$CloudyLocation))

There is just no straightforward algorithm for this conversion. Most randomly chosen newspaper sentences can be translated into CycL without adding new vocabulary -
an enormous and unparalleled achievement. However, the CycL translations of most
newspaper sentences fill at least a page and are very complex. Cycorp has made great
strides in the translation of CycL to English, which is now very good. But the harder
problem of English to CycL will likely remain a sticking point for Cycorp in the years to come.

In academia, Cyc has always been controversial. For one, it has been developed
entirely in the private sector, and Cycorp has not published or participated in AI conferences in years. Cycorp largely put an end to this criticism two years ago, when it released a version of Cyc online for free at www.opencyc.org. OpenCyc is similar
to full Cyc, but its knowledge base is just a few percent of the full KB and its
functionality is greatly reduced. Since Cyc's success lies in the completeness of its
knowledge base, the only people who really know the extent of Cyc's progress are
Cycorp employees.

But most of the criticism leveled at Cyc attacks its lack of a theoretical basis. There are many famously unsolved problems in the representation of commonsense
knowledge and Cycorp does not claim to have complete solutions. No one knows how
humans store and reason with commonsense knowledge, and there is no reason to believe
that Cyc's methods are anything like a human's. Since there is no sentient being that
thinks like Cyc, it is impossible to be certain that any amount of knowledge could make
Cyc intelligent.

Cyc's inference engine has also been a controversial topic. To allow CycL to be
expressive enough to represent almost any conceivable knowledge, Cycorp had to give
up both soundness and completeness in inference. This means that if you ask Cyc a
question and it does not reply for a while, you cannot in general know whether it is
unable to answer the question or whether it might respond if just given a little more time.
It is also theoretically possible that Cyc's inference engine could give erroneous answers
to questions, even if all the relevant knowledge in Cyc were correct.

Cycorp, however, bashes right back at the naysayers. Cyc, they say, is fundamentally an engineering project and is not meant to resolve theoretical issues. The subtle philosophical problems that AI researchers have spent so much time worrying about rarely come up in practice. Cycorp's president has accused AI researchers of "physics envy," of insisting that there must be some "free lunch" approach to achieving machine intelligence.[3] Academic AI has made little fundamental progress in recent years, he argues, because researchers are stuck in a circle of philosophizing and unwilling to do the hard work of actually creating artificial intelligence. The debate has been heated on both sides, but ultimately, argument is powerless to resolve the issue - the only way to know whether Cyc will work is to try it and see, and that is exactly what Cycorp is doing.

[Diagram: Inference in Cyc is unpredictable. This diagram shows an inference tree that might have been created in an attempt to prove that Hurricane Andrew wasn't sunny. Currently, the engine is stuck down a nearly bottomless rabbit hole - a proof method that, while plausible, will not succeed.]

[1] "AI Founder Blasts Modern Research." Wired News, May 13, 2003. <http://www.wired.com/news/technology/0,1282,58714,00.html> (cited 27 Sept. 2003).
[2] Personal correspondence with Daniel Mahler, a Cycorp employee.
[3] "The Know-It-All Machine." Lingua Franca, Vol. 11, No. 6 (Sept. 2001).
