Ferrucci-Watson2010 - Build Watson - An Overview of DeepQA Project

Published by: Franck Dernoncourt on Apr 09, 2012
Copyright:Attribution Non-commercial

Building Watson: An Overview of the DeepQA Project

David Ferrucci, Eric Brown, Jennifer Chu-Carroll, James Fan, David Gondek, Aditya A. Kalyanpur, Adam Lally, J. William Murdock, Eric Nyberg, John Prager, Nico Schlaefer, and Chris Welty

Articles, Fall 2010
Copyright © 2010, Association for the Advancement of Artificial Intelligence. All rights reserved. ISSN 0738-4602

IBM Research undertook a challenge to build a computer system that could compete at the human champion level in real time on the American TV quiz show, Jeopardy. The extent of the challenge includes fielding a real-time automatic contestant on the show, not merely a laboratory exercise. The Jeopardy Challenge helped us address requirements that led to the design of the DeepQA architecture and the implementation of Watson. After three years of intense research and development by a core team of about 20 researchers, Watson is performing at human expert levels in terms of precision, confidence, and speed at the Jeopardy quiz show. Our results strongly suggest that DeepQA is an effective and extensible architecture that can be used as a foundation for combining, deploying, evaluating, and advancing a wide range of algorithmic techniques to rapidly advance the field of question answering (QA).

The goals of IBM Research are to advance computer science by exploring new ways for computer technology to affect science, business, and society. Roughly three years ago, IBM Research was looking for a major research challenge to rival the scientific and popular interest of Deep Blue, the computer chess-playing champion (Hsu 2002), that also would have clear relevance to IBM business interests.

With a wealth of enterprise-critical information being captured in natural language documentation of all forms, the problems with perusing only the top 10 or 20 most popular documents containing the user’s two or three key words are becoming increasingly apparent. This is especially the case in the enterprise where popularity is not as important an indicator of relevance and where recall can be as critical as precision. There is growing interest to have enterprise computer systems deeply analyze the breadth of relevant content to more precisely answer and justify answers to users’ natural language questions. We believe advances in question-answering (QA) technology can help support professionals in critical and timely decision making in areas like compliance, health care, business integrity, business intelligence, knowledge discovery, enterprise knowledge management, security, and customer support. For researchers, the open-domain QA problem is attractive as it is one of the most challenging in the realm of computer science and artificial intelligence, requiring a synthesis of information retrieval, natural language processing, knowledge representation and reasoning, machine learning, and computer-human interfaces. It has had a long history (Simmons 1970) and saw rapid advancement spurred by system building, experimentation, and government funding in the past decade (Maybury 2004, Strzalkowski and Harabagiu 2006).

With QA in mind, we settled on a challenge to build a computer system, called Watson,¹ which could compete at the human champion level in real time on the American TV quiz show, Jeopardy. The extent of the challenge includes fielding a real-time automatic contestant on the show, not merely a laboratory exercise.

Jeopardy! is a well-known TV quiz show that has been airing on television in the United States for more than 25 years (see the Jeopardy! Quiz Show sidebar for more information on the show). It pits three human contestants against one another in a competition that requires answering rich natural language questions over a very broad domain of topics, with penalties for wrong answers. The nature of the three-person competition is such that confidence, precision, and answering speed are of critical importance, with roughly 3 seconds to answer each question. A computer system that could compete at human champion levels at this game would need to produce exact answers to often complex natural language questions with high precision and speed and have a reliable confidence in its answers, such that it could answer roughly 70 percent of the questions asked with greater than 80 percent precision in 3 seconds or less.

Finally, the Jeopardy Challenge represents a unique and compelling AI question similar to the one underlying Deep Blue (Hsu 2002): can a computer system be designed to compete against the best humans at a task thought to require high levels of human intelligence, and if so, what kind of technology, algorithms, and engineering is required? While we believe the Jeopardy Challenge is an extraordinarily demanding task that will greatly advance the field, we appreciate that this challenge alone does not address all aspects of QA and does not by any means close the book on the QA challenge the way that Deep Blue may have for playing chess.

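The target of answering roughly 70 percent of questions with more than 80 percent precision can be made concrete with a small calculation. The sketch below is only an illustration, not code from the DeepQA project: it assumes each question comes with a confidence score and a correctness flag, answers only the most confident fraction, and reports precision over the attempted questions. The function name and the sample data are hypothetical.

```python
def precision_at_fraction_answered(results, fraction):
    """Precision over the most confident `fraction` of questions.

    results: list of (confidence, is_correct) pairs, one per question.
    fraction: share of questions the system chooses to answer, in (0, 1].
    """
    if not results:
        raise ValueError("results must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Rank questions from most to least confident.
    ranked = sorted(results, key=lambda r: r[0], reverse=True)
    # Attempt only the top share; always attempt at least one question.
    n_attempted = max(1, round(fraction * len(ranked)))
    attempted = ranked[:n_attempted]
    n_correct = sum(1 for _, is_correct in attempted if is_correct)
    return n_correct / n_attempted


# Hypothetical data: ten questions with made-up confidences.
sample = [
    (0.95, True), (0.90, True), (0.85, True), (0.80, False), (0.75, True),
    (0.70, True), (0.60, True), (0.40, False), (0.30, False), (0.10, True),
]
p70 = precision_at_fraction_answered(sample, 0.7)
p100 = precision_at_fraction_answered(sample, 1.0)
print(f"precision at 70% answered: {p70:.3f}")
print(f"precision at 100% answered: {p100:.3f}")
```

When confidence is well calibrated, precision rises as the system answers a smaller, more confident share of the questions, which is why reliable confidence matters as much as raw accuracy.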
The Jeopardy Challenge

Meeting the Jeopardy Challenge requires advancing and incorporating a variety of QA technologies including parsing, question classification, question decomposition, automatic source acquisition and evaluation, entity and relation detection, logical form generation, and knowledge representation and reasoning.

Winning at Jeopardy requires accurately computing confidence in your answers. The questions and content are ambiguous and noisy and none of the individual algorithms are perfect. Therefore, each component must produce a confidence in its output, and individual component confidences must be combined to compute the overall confidence of the final answer. The final confidence is used to determine whether the computer system should risk choosing to answer at all. In Jeopardy parlance, this confidence is used to determine whether the computer will “ring in” or “buzz in” for a question. The confidence must be computed during the time the question is read and before the opportunity to buzz in. This is roughly between 1 and 6 seconds with an average around 3 seconds.

Confidence estimation was very critical to shaping our overall approach in DeepQA. There is no expectation that any component in the system does a perfect job: all components post features of the computation and associated confidences, and we use a hierarchical machine-learning method to combine all these features and decide whether or not there is enough confidence in the final answer to attempt to buzz in and risk getting the question wrong.

In this section we elaborate on the various aspects of the Jeopardy Challenge.
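
The idea of combining component confidences into one number and then deciding whether to buzz in can be illustrated with a deliberately simplified sketch. The article describes a hierarchical machine-learning method; the code below swaps in a single logistic combination as a stand-in, and the feature names, weights, bias, and threshold are all hypothetical values chosen for illustration.

```python
import math


def combine_confidence(features, weights, bias=0.0):
    """Combine per-component scores into one confidence in (0, 1).

    A single logistic model is used here as a stand-in for the
    hierarchical learner described in the text (an assumption).
    """
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def should_buzz(confidence, threshold=0.5):
    """Buzz in only when the combined confidence reaches the threshold."""
    return confidence >= threshold


# Hypothetical feature names and weights, not taken from DeepQA.
weights = {"type_match": 2.0, "passage_support": 1.5, "source_reliability": 1.0}
bias = -2.0

strong = {"type_match": 0.9, "passage_support": 0.8, "source_reliability": 0.7}
weak = {"type_match": 0.1, "passage_support": 0.2, "source_reliability": 0.3}

strong_conf = combine_confidence(strong, weights, bias)
weak_conf = combine_confidence(weak, weights, bias)
print(f"strong evidence: {strong_conf:.3f}, buzz: {should_buzz(strong_conf)}")
print(f"weak evidence:   {weak_conf:.3f}, buzz: {should_buzz(weak_conf)}")
```

In a real system the weights would be learned from training data and the threshold could vary with the game situation; both are fixed here only to keep the sketch short.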
The Categories
A 30-clue Jeopardy board is organized into six columns. Each column contains five clues and is associated with a category. Categories range from broad subject headings like “history,” “science,” or “politics” to less informative puns like “tutu much,” in which the clues are about ballet, to actual parts of the clue, like “who appointed me to the Supreme Court?” where the clue is the name of a judge, to “anything goes” categories like “potpourri.” Clearly some categories are essential to understanding the clue, some are helpful but not necessary, and some may be useless, if not misleading, for a computer.

A recurring theme in our approach is the requirement to try many alternate hypotheses in varying contexts to see which produces the most confident answers given a broad range of loosely coupled scoring algorithms. Leveraging category information is another clear area requiring this approach.

The Questions
There is a wide variety of ways one can attempt to characterize the Jeopardy clues: for example, by topic, by difficulty, by grammatical construction, by answer type, and so on. A type of classification that turned out to be useful for us was based on the primary method deployed to solve the clue. The
AI MAGAZINE
The Jeopardy! Quiz Show (sidebar)

The Jeopardy! quiz show is a well-known syndicated U.S. TV quiz show that has been on the air since 1984. It features rich natural language questions covering a broad range of general knowledge. It is widely recognized as an entertaining game requiring smart, knowledgeable, and quick players.

The show’s format pits three human contestants against each other in a three-round contest of knowledge, confidence, and speed. All contestants must pass a 50-question qualifying test to be eligible to play. The first two rounds of a game use a grid organized into six columns, each with a category label, and five rows with increasing dollar values. The illustration shows a sample board for a first round. In the second round, the dollar values are doubled. Initially all the clues in the grid are hidden behind their dollar values. The game play begins with the returning champion selecting a cell on the grid by naming the category and the dollar value. For example the player may select by saying “Technology for $400.”

The clue under the selected cell is revealed to all the players and the host reads it out loud. Each player is equipped with a hand-held signaling button. As soon as the host finishes reading the clue, a light becomes visible around the board, indicating to the players that their hand-held devices are enabled and they are free to signal or “buzz in” for a chance to respond. If a player signals before the light comes on, then he or she is locked out for one-half of a second before being able to buzz in again.

The first player to successfully buzz in gets a chance to respond to the clue. That is, the player must answer the question, but the response must be in the form of a question. For example, validly formed responses are, “Who is Ulysses S. Grant?” or “What is The Tempest?” rather than simply “Ulysses S. Grant” or “The Tempest.”

The Jeopardy quiz show was conceived to have the host providing the answer or clue and the players responding with the corresponding question or response. The clue/response concept represents an entertaining twist on classic question answering. Jeopardy clues are straightforward assertional forms of questions. So where a question might read, “What drug has been shown to relieve the symptoms of ADD with relatively few side effects?” the corresponding Jeopardy clue might read “This drug has been shown to relieve the symptoms of ADD with relatively few side effects.” The correct Jeopardy response would be “What is Ritalin?”

Players have 5 seconds to speak their response, but it’s typical that they answer almost immediately since they often only buzz in if they already know the answer. If a player responds to a clue correctly, then the dollar value of the clue is added to the player’s total earnings, and that player selects another cell on the board. If the player responds incorrectly then the dollar value is deducted from the total earnings, and the system is rearmed, allowing the other players to buzz in. This makes it important for players to know what they know, to have accurate confidences in their responses.

There is always one cell in the first round and two in the second round called Daily Doubles, whose exact location is hidden until the cell is selected by a player. For these cases, the selecting player does not have to compete for the buzzer but must respond to the clue regardless of the player’s confidence. In addition, before the clue is revealed the player must wager a portion of his or her earnings. The minimum bet is $5 and the maximum bet is the larger of the player’s current score and the maximum clue value on the board. If players answer correctly, they earn the amount they bet, else they lose it.

The Final Jeopardy round consists of a single question and is played differently. First, a category is revealed. The players privately write down their bet, an amount less than or equal to their total earnings. Then the clue is revealed. They have 30 seconds to respond. At the end of the 30 seconds they reveal their answers and then their bets. The player with the most money at the end of this third round wins the game. The questions used in this round are typically more difficult than those used in the previous rounds.
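
The scoring and wagering rules described in the sidebar are simple enough to write down directly. The following sketch is an illustration only: the function names are hypothetical, and the lower bound of zero for a Final Jeopardy wager is an assumption, since the sidebar states only the upper bound.

```python
MIN_DAILY_DOUBLE_BET = 5  # minimum Daily Double bet stated in the sidebar


def daily_double_wager_range(score, max_clue_value):
    """Return (min_bet, max_bet) for a Daily Double.

    The maximum is the larger of the player's current score and the
    highest clue value on the board.
    """
    return MIN_DAILY_DOUBLE_BET, max(score, max_clue_value)


def apply_response(score, amount, correct):
    """Add the clue value or wager when correct, deduct it otherwise."""
    return score + amount if correct else score - amount


def final_wager_is_valid(score, wager):
    """Final round: the wager may not exceed total earnings.

    The lower bound of 0 is an assumption, not stated in the sidebar.
    """
    return 0 <= wager <= score


# A player with $200 may still bet up to the top clue value on the board.
print(daily_double_wager_range(200, 1000))
# A correct $400 response raises $1000 to $1400; an incorrect one lowers it.
print(apply_response(1000, 400, True), apply_response(1000, 400, False))
```

Because a wrong response costs as much as a right one earns, these rules are what make an accurate confidence estimate valuable to a player.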
