
Frontiers of Computational Journalism
Columbia Journalism School
Week 8: Knowledge Representation
November 5, 2012

Week 8: Knowledge Representation

Story Metadata
Linked Open Data
Knowledge as Relations
Automatic story writing

Unstructured data

Structured data

Article Metadata
headline
photo
photo credit
photo caption
byline
publication date
dateline
article body
related articles

Schema.org news markup


Overall type of the object on this page, in HTML head

Headline, dateline, date as additions to div/span properties

Byline expressed as nested object (using itemscope) of type schema.org/Person
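As a rough illustration of the markup described above, the sketch below embeds a small microdata fragment and pulls out its `itemprop` values with Python's standard `html.parser`. The page fragment and the byline "Jane Reporter" are invented for this example; only the schema.org type and property names (`NewsArticle`, `Person`, `headline`, `author`, `datePublished`) are real. The parser ignores nesting depth, so treat it as a sketch rather than a full microdata extractor.

```python
from html.parser import HTMLParser

# Hypothetical article fragment; the byline is a nested Person object.
PAGE = """
<div itemscope itemtype="http://schema.org/NewsArticle">
  <h1 itemprop="headline">Syria's Rebels Open Talks on Forging United Political Front</h1>
  <span itemprop="author" itemscope itemtype="http://schema.org/Person">
    <span itemprop="name">Jane Reporter</span>
  </span>
  <time itemprop="datePublished" datetime="2012-11-05">November 5, 2012</time>
</div>
"""

class ItempropCollector(HTMLParser):
    """Collect (itemprop, value) pairs in document order."""

    def __init__(self):
        super().__init__()
        self.pending = None  # itemprop whose text content we are waiting for
        self.props = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemprop" not in attrs:
            return
        if "itemscope" in attrs:
            # Nested object: record its type instead of its text.
            self.props.append((attrs["itemprop"], attrs.get("itemtype")))
            self.pending = None
        elif "datetime" in attrs:
            # Machine-readable date takes priority over the display text.
            self.props.append((attrs["itemprop"], attrs["datetime"]))
            self.pending = None
        else:
            self.pending = attrs["itemprop"]

    def handle_data(self, data):
        if self.pending and data.strip():
            self.props.append((self.pending, data.strip()))
            self.pending = None

collector = ItempropCollector()
collector.feed(PAGE)
collector.close()
props = collector.props
for name, value in props:
    print(name, "=", value)
```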

Driving application: rich snippets

Schema.org covers not just news but music, restaurants, people, organizations, reviews, offers...
Snippets, and better searchability generally, are the motivation for Google, Yahoo, and Bing to push schema.org.

Additional metadata from indexing team

In database, but doesn't necessarily make it to HTML.

News application: content navigation

Articles about Syria on NYT topic page
More reliable than simple text search (because the relevance algorithm knows a story is "about" Syria).


Ontologies
What objects and relations are available?

Often represented as class hierarchy. Arrows = is_a relation

(Part of) a real ontology, from Cyc

Every big news org has their own big ontology :(

topics, people, organizations, places...

Yaaay Linked Data!


Triples of (subject relation object), each a URL or literal

<urn:x-states:New%20York> <http://purl.org/dc/terms/alternative> "NY"
<http://dbpedia.org/resource/Columbia_University> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/CollegeOrUniversity>

Abbreviations possible with many formats...

<http://dbpedia.org/resource/Columbia_University> rdf:type ns6:CollegeOrUniversity
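To make the abbreviation concrete, here is a minimal sketch that stores a triple as a Python tuple and expands `prefix:name` terms back to full URLs. The prefix table is an assumption: `rdf` is the standard RDF namespace, while `ns6` is taken to be the label a serializer happened to assign to `http://schema.org/`. Real RDF formats also distinguish quoted literals from URLs, which this sketch does not.

```python
# Assumed prefix table for the abbreviated triple above.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ns6": "http://schema.org/",
}

def expand(term):
    """Expand 'prefix:name' to a full URL; leave everything else unchanged."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return term  # full URLs ("http:...") and plain literals fall through

full_triple = (
    "http://dbpedia.org/resource/Columbia_University",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    "http://schema.org/CollegeOrUniversity",
)
short_triple = (
    "http://dbpedia.org/resource/Columbia_University",
    "rdf:type",
    "ns6:CollegeOrUniversity",
)

expanded = tuple(expand(term) for term in short_triple)
print(expanded == full_triple)
```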

NYT ontology available as LOD

owl:sameAs makes this interoperable

NYT API can return linked data


{
  "title": "Syria's Rebels Open Talks on Forging United Political Front",
  "body": "BEIRUT, Lebanon - Syria's fractious opposition groups began negotiations in Doha, Qatar, on Sunday to forge a more unified front to reshape the political landscape in a bloody conflict that claims more than 100 lives virtually every day. Given the scant prospects that any attempt to restructure the opposition will succeed the",
  "dbpedia_resource_url": [
    "http://dbpedia.org/resource/Hillary_Rodham_Clinton",
    "http://dbpedia.org/resource/Bashar_al-Assad"],
  "facet_terms": "CLINTON, HILLARY RODHAM ASSAD, BASHAR AL- SYRIA DOHA (QATAR) SYRIAN NATIONAL COUNCIL STATE DEPARTMENT WAR AND REVOLUTION DEFENSE AND MILITARY FORCES"
}
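A response like this can be consumed with a few lines of standard-library Python. The sketch below assumes the response is valid JSON with the field names shown in the sample (`title`, `dbpedia_resource_url`); the actual API may structure things differently. The helper `entity_label` is a hypothetical name introduced here.

```python
import json
from urllib.parse import unquote, urlparse

# Trimmed copy of the sample response, keeping only the fields used below.
response_text = """
{
  "title": "Syria's Rebels Open Talks on Forging United Political Front",
  "dbpedia_resource_url": [
    "http://dbpedia.org/resource/Hillary_Rodham_Clinton",
    "http://dbpedia.org/resource/Bashar_al-Assad"
  ]
}
"""

def entity_label(url):
    """Turn a DBpedia resource URL into a readable label."""
    last_segment = urlparse(url).path.rsplit("/", 1)[-1]
    return unquote(last_segment).replace("_", " ")

article = json.loads(response_text)
labels = [entity_label(url) for url in article["dbpedia_resource_url"]]
print(article["title"])
print(labels)
```

Because the URLs are shared identifiers, the same labels could be joined against any other dataset that links to DBpedia.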


Objects and relations in text?

names, dates, places, verbs.

Named Entity Recognition

Extract subjects, objects, from text. Also, resolve pronouns if possible.

"Gov. Andrew M. Cuomo on Wednesday gave a sea wall the nod. Because of the recent history of powerful storms hitting the area, he said, elected officials have a responsibility to consider new and innovative plans to prevent similar damage in the future."

NER state of the art


Commercial: Reuters OpenCalais
Academic: Stanford NER library

Next level of understanding: verbs


The water that made rivers of Avenues C and D receded on Tuesday, and the East Village was a mixture of disaster and nonchalance. A group of young men in pajama pants and shorts threw a football on East 12th Street, while workers pumped the basement of CHP Hardware on Avenue C and Eighth Street.

Key: subject, verb, object

Knowledge Representation in AI (a crazy brief introduction)

Classic "symbolic" paradigm represents knowledge as statements in mathematical logic.
Many variations. Most are subsets or modifications of standard first-order logic (FOL).

Predicates and Relations

Predicate: asserts that object belongs to a class
vehicle(schoolbus)
bird(tweety)
straight_gangsta(emily_bell)

Relation: asserts relationship between objects

is_a(car, vehicle)
higher_rank(general, colonel)
capital(paris, france)

Inference
General rules
a & (a => b) => b
p | !p

Domain-specific inferences

is_a(car, vehicle)
can_move(vehicle)
=> can_move(car)
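The domain-specific inference above can be sketched as a small forward-chaining loop: any property asserted of a class is copied down to everything that `is_a` member of that class, repeating until nothing new appears. The `sedan` fact is a hypothetical addition to show that the rule chains through more than one level.

```python
# Relations: (child, parent) pairs meaning is_a(child, parent).
is_a = {("car", "vehicle"), ("sedan", "car")}  # "sedan" is a made-up extra fact

# Properties: (predicate, class) pairs, e.g. can_move(vehicle).
properties = {("can_move", "vehicle")}

def infer(is_a, properties):
    """Propagate each property down the is_a hierarchy until a fixed point."""
    derived = set(properties)
    changed = True
    while changed:
        changed = False
        for child, parent in is_a:
            for predicate, cls in list(derived):
                if cls == parent and (predicate, child) not in derived:
                    derived.add((predicate, child))
                    changed = True
    return derived

derived = infer(is_a, properties)
print(sorted(derived))
```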

News as relations between entities

Alice attended the wedding
attended(alice, wedding)

IBM was founded in 1911.
founded(IBM, 1911)

Hurricane Sandy hit New York
hit(hurricane_sandy, New_York)

Encode facts as relation(subject, object),
also written (subject relation object)

Things we could do with this


Question answering
The granddaughter of which actor starred in E.T.?
(?x acted-in E.T.)(?y is-a actor)(?x granddaughter-of ?y)

Inference
(bob brother-of alice)
(alice mother-of lucy) =>
(bob uncle-of lucy)

Answer questions using inference
How many executives of publicly traded Canadian companies died in car crashes?
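Both uses can be sketched with one small pattern matcher over (subject relation object) triples. Terms starting with `?` are variables, and a query is a list of patterns that must all match with consistent bindings. The fact base below is a tiny hand-written stand-in for a real triple store, and the function names `match` and `query` are introduced here for illustration.

```python
def match(pattern, fact, bindings):
    """Unify one pattern with one fact; return extended bindings or None."""
    new = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if new.setdefault(p, f) != f:  # variable already bound elsewhere
                return None
        elif p != f:
            return None
    return new

def query(patterns, facts, bindings=None):
    """Yield every set of variable bindings that satisfies all patterns."""
    bindings = bindings or {}
    if not patterns:
        yield bindings
        return
    for fact in facts:
        extended = match(patterns[0], fact, bindings)
        if extended is not None:
            yield from query(patterns[1:], facts, extended)

# Question answering: the granddaughter of which actor starred in E.T.?
movie_facts = [
    ("drew_barrymore", "acted-in", "E.T."),
    ("henry_thomas", "acted-in", "E.T."),
    ("john_barrymore", "is-a", "actor"),
    ("drew_barrymore", "granddaughter-of", "john_barrymore"),
]
answers = list(query(
    [("?x", "acted-in", "E.T."), ("?y", "is-a", "actor"),
     ("?x", "granddaughter-of", "?y")],
    movie_facts,
))

# Inference: brother-of plus mother-of implies uncle-of.
family_facts = [("bob", "brother-of", "alice"), ("alice", "mother-of", "lucy")]
uncles = [
    (b["?a"], "uncle-of", b["?c"])
    for b in query([("?a", "brother-of", "?b"), ("?b", "mother-of", "?c")],
                   family_facts)
]
print(answers)
print(uncles)
```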

Problems
Not all subjects are simple.
Over a hundred guests attended the wedding
attended(num_guests, wedding)
greater_than(num_guests, 100)

Some relations have multiple parts.

Hurricane Sandy hit New York on Monday

hit(sandy, New_York, monday)
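One common workaround, not taken from the slides, is to introduce a node for the event itself and attach each part to it with a binary relation, so that a three-part fact still fits the triple format. The role names (`agent`, `target`, `when`) and the helper `reify` are hypothetical choices for this sketch.

```python
def reify(relation, event_id, **roles):
    """Rewrite relation(part1, part2, ...) as binary triples about one event."""
    triples = [(event_id, "is-a", relation)]
    triples += [(event_id, role, value) for role, value in roles.items()]
    return triples

triples = reify("hit", "event1", agent="sandy", target="New_York", when="monday")
for triple in triples:
    print(triple)
```

The cost is that every query about the event now has to join several triples instead of matching one.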

Standard inference doesn't allow defaults

All birds fly
bird(tweety)
bird(?x) => flies(?x)
=> flies(tweety)

But penguins and chickens don't fly

bird(?x) & !penguin(?x) & !chicken(?x) => flies(?x)

Now we can't guess that tweety flies

bird(tweety) => flies(tweety) ?
we don't know!
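The difference can be sketched in a few lines. Under the strict reading, `flies` is provable only when every premise, including the negated ones, has been established. Under a default reading (often called negation as failure, and not part of standard first-order logic), an exception is assumed absent unless it has been asserted. The bird `opus` is a hypothetical extra fact, and the function names are introduced here.

```python
EXCEPTIONS = ("penguin", "chicken")

def flies_strict(x, known_true, known_false):
    """Classical reading: True only if all premises are established.

    Returns None when the conclusion cannot be derived either way.
    """
    is_bird = ("bird", x) in known_true
    exceptions_ruled_out = all((e, x) in known_false for e in EXCEPTIONS)
    return True if is_bird and exceptions_ruled_out else None

def flies_default(x, known_true):
    """Default reading: assume no exception applies unless one is asserted."""
    is_bird = ("bird", x) in known_true
    is_exception = any((e, x) in known_true for e in EXCEPTIONS)
    return is_bird and not is_exception

known_true = {("bird", "tweety"), ("bird", "opus"), ("penguin", "opus")}
known_false = set()  # nobody told us tweety is not a penguin

print(flies_strict("tweety", known_true, known_false))
print(flies_default("tweety", known_true))
print(flies_default("opus", known_true))
```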

Standard mathematical logic doesn't deal well with exceptions

Some people don't have a last name.
Sometimes an election isn't decided on election day.
Is a trash can used as a flower pot still a trash can?
Is a broken car still a vehicle if it can't move?

Relations from sentence parsing

The water that made rivers of Avenues C and D receded on Tuesday, and the East Village was a mixture of disaster and nonchalance. A group of young men in pajama pants and shorts threw a football on East 12th Street, while workers pumped the basement of CHP Hardware on Avenue C and Eighth Street.

Key: subject, verb, object

Relation extraction systems

Commercial: IBM's DeepQA (Watson)
Academic: ReVerb algorithm

Ontology explosions
(water made rivers of Avenues C and D)
(East Village was a mixture of disaster and nonchalance)
(group of young men in pajama pants and shorts threw football)
(workers pumped the basement of CHP Hardware)

Do we have all of these in the ontology?

General Question Answering

Precision/recall tradeoff. State of the art is IBM's DeepQA
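The tradeoff can be made concrete with a toy example. Suppose a question answering system attaches a confidence to each answer and only responds when the confidence clears a threshold. Raising the threshold tends to raise precision (fewer wrong answers given) while lowering recall (fewer right answers given). The confidence scores below are invented for illustration and are not DeepQA results.

```python
# Hypothetical (confidence, is_correct) pairs, one per question.
answers = [
    (0.95, True), (0.90, True), (0.80, False),
    (0.60, True), (0.40, False), (0.20, False),
]

def precision_recall(answers, threshold):
    """Precision and recall when answering only at or above the threshold."""
    attempted = [ok for confidence, ok in answers if confidence >= threshold]
    correct_attempted = sum(attempted)
    total_correct = sum(ok for _, ok in answers)
    precision = correct_attempted / len(attempted) if attempted else 0.0
    recall = correct_attempted / total_correct if total_correct else 0.0
    return precision, recall

cautious = precision_recall(answers, 0.85)  # answer only when very sure
eager = precision_recall(answers, 0.50)     # answer more often
print(cautious, eager)
```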

DeepQA use of structured data


"Watson can also use detected relations to query a triple store and directly generate candidate answers. Due to the breadth of relations in the Jeopardy domain and the variety of ways in which they are expressed, however, Watson's current ability to effectively use curated databases to simply look up the answers is limited to fewer than 2 percent of the clues."
- Ferrucci et al., Building Watson


Wall Street is high on Molson Coors Brewing (TAP), expecting it to report earnings that are up 17.5% from a year ago when it reports its third quarter earnings on Wednesday, November 7, 2012. The consensus estimate is $1.34 per share, up from earnings of $1.14 per share a year ago. The consensus estimate has dipped over the past month, from $1.35, but it's still up from the consensus estimate of $1.19 three months ago. For the fiscal year, analysts are expecting earnings of $3.89 per share. Revenue is projected to eclipse the year-earlier total of $954.4 million by 31%, finishing at $1.25 billion for the quarter. For the year, revenue is projected to roll in at $4.04 billion. The company's net income has declined in the last two quarters. The company posted profit falling by 52.8% in the second quarter. This is after it reported a profit decline in the first quarter by 4.1%.
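A story like this can plausibly be produced by filling a template from structured data. The sketch below is a guess at the general approach, not the method used by the system that wrote the text above: it computes the percentage changes from the figures quoted in the story and picks wording based on their direction. The field names and the alternative phrasing for a decline are assumptions.

```python
def pct_change(new, old):
    """Fractional change from old to new."""
    return (new - old) / old

def earnings_preview(d):
    """Fill a fixed template, choosing wording from the direction of change."""
    eps_change = pct_change(d["eps_estimate"], d["eps_year_ago"])
    rev_change = pct_change(d["revenue_estimate"], d["revenue_year_ago"])
    mood = "high on" if eps_change > 0 else "cautious about"
    direction = "up" if eps_change > 0 else "down"
    rev_word = "above" if rev_change > 0 else "below"
    return (
        f"Wall Street is {mood} {d['company']} ({d['ticker']}), expecting it to "
        f"report earnings that are {direction} {abs(eps_change):.1%} from a year ago. "
        f"The consensus estimate is ${d['eps_estimate']:.2f} per share, "
        f"{direction} from ${d['eps_year_ago']:.2f} a year ago. "
        f"Revenue is projected at ${d['revenue_estimate'] / 1e9:.2f} billion, "
        f"{abs(rev_change):.0%} {rev_word} the year-earlier total."
    )

# Figures taken from the story text above.
data = {
    "company": "Molson Coors Brewing",
    "ticker": "TAP",
    "eps_estimate": 1.34,
    "eps_year_ago": 1.14,
    "revenue_estimate": 1.25e9,
    "revenue_year_ago": 954.4e6,
}
story = earnings_preview(data)
print(story)
```

The computed changes agree with the 17.5% and 31% quoted in the story, which suggests those numbers were derived from the underlying estimates rather than written by hand.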
