
AI Seminar

Our web page is at:
www.cs.nmsu.edu/~gradrep
Under “Events” in left frame

September 5, 2001 Melanie Martin - AI Seminar 1


Identifying Ideological Point of View
Part II

Melanie Martin
September 5, 2001



Outline of this presentation
❖ Where are we???
❖ Ideology
❖ Statistical NLP and Machine Learning
❖ Discourse features
❖ Internet
❖ Conclusion



Where are we???
❖ Let’s recall what we want to do:

❖ Build a system that could take
information from web pages and Usenet
newsgroups on a given topic and
segment, classify, or cluster it by
ideological point of view.



The Proposed System
[Diagram: User inputs topic → Search Engine (over Internet: Web pages, Usenet) → Set of documents on topic → Topic Clustering, Filtering → Ideological Clustering → Docs on topic clustered by IPV]



Where are we???
❖ What do we need?
– A computationally feasible definition of
ideological point of view

– A search engine, possibly with additional
processing, to produce a collection of
documents on the topic specified by the
user



Where are we???
❖ What else do we need?
– A module to cluster documents by
ideological point of view

– A user interface

– A way to evaluate the system



Where are we???
❖ Why do we need this?
❖ Some examples using Google (approximate result counts):
– query: back pain ~2,220,000
• scoliosis ~121,000
– query: lyme disease ~163,000
– query: zoning shopping center ~65,100
• (add) clark county nv ~299
– query: un racism conference ~74,000

Outline of this presentation
❖ Where are we???
❖ Ideology
❖ Statistical NLP and Machine Learning
❖ Discourse features
❖ Internet
❖ Conclusion



Ideology
❖ Working definition from van Dijk:
“Ideologies are the fundamental beliefs
of a group and its members.”
– instantiated as Us vs. Them
– predefined ideologies will not work across
domains
– want to avoid researcher bias
– definition likely needs more work

Ideology
❖ Linguistics
– van Dijk (1998)
– Blommaert & Verschueren (1998)
– Wang (1993)
– Wortham & Locher (1996)



Ideology
❖ The Systems
– Ideology Machine - 1965 to 1973 - Abelson et al.
– Politics - 1979 - Carbonell
– Pauline - 1987 - Hovy
– Tracking Point of View in Narrative - 1994 - Wiebe
– Spin Doctor - 1994 - Sack
– Terminal Time - 2000 - Mateas et al.



Ideology
❖ Some issues
– Evaluation!!!
– Hard-coded knowledge
– Domain dependence
– Cognitive plausibility
– More precise definitions



Outline of this presentation
❖ Where are we???
❖ Ideology
❖ Statistical NLP and Machine Learning
❖ Discourse features
❖ Internet
❖ Conclusion



Statistical NLP and ML
❖ Two techniques we will consider
– Latent Semantic Analysis
– Probabilistic Classification



Statistical NLP and ML
❖ Issues
– clustering versus classification
• categories may not be predefined
• may want to take a variety of features into
account
– favor learning over hard-coding knowledge
– supervised versus unsupervised
• cost of annotated training data



Statistical NLP and ML
❖ Latent Semantic Analysis
– text represented as a matrix
• entries are weighted frequency of word in
context
– semantic space obtained through SVD
• words appearing in similar context have similar
feature vectors
– characterizes semantic content of words in
context
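
The LSA steps above can be sketched in a few lines. This is only a minimal illustration, assuming NumPy is available: the toy corpus, the function names, and the use of raw counts (rather than the weighted frequencies the slide mentions) are all assumptions made for the example, not part of any system described in this talk.

```python
import numpy as np

# Hypothetical toy corpus: two documents on each of two topics.
docs = [
    "doctor treats back pain",
    "doctor treats spine pain",
    "council votes zoning plan",
    "council votes shopping plan",
]

def build_term_doc_matrix(docs):
    """Return (vocab, matrix); rows are terms, columns are documents.
    Entries are raw counts here; a real system would weight them."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    m = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            m[index[w], j] += 1.0
    return vocab, m

def lsa_doc_vectors(matrix, k):
    """Project documents into a k-dimensional space via truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (np.diag(s[:k]) @ vt[:k]).T  # one row per document

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

vocab, m = build_term_doc_matrix(docs)
doc_vecs = lsa_doc_vectors(m, k=2)
same_topic = cosine(doc_vecs[0], doc_vecs[1])   # two "pain" documents
cross_topic = cosine(doc_vecs[0], doc_vecs[2])  # "pain" vs. "zoning"
```

In the reduced space, the two documents that share most of their vocabulary end up close together, while documents with no words in common stay far apart, which is what makes the representation usable for clustering without predefined categories.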
Statistical NLP and ML
❖ Why LSA is a good choice here
– semantics is key component of ideological
discourse
– clustering without need for predefined
categories
– already shown useful for:
• summarization (Ando 2000)
• text segmentation (Choi 2001)
• measuring text coherence (Foltz 1998)

Statistical NLP and ML
❖ We want to look a little more closely at
Ando’s work
– uses term, sentence, and document
vectors
– modified SVD algorithm
– interesting interface
❖ Multi-document summarization by visualizing topical content.
Rie Kubota Ando, Branimir Boguraev, Roy Byrd, and Mary Neff.
ANLP/NAACL '00 Workshop on Automatic Summarization



Statistical NLP and ML
❖ Another option is a probabilistic
classifier
– assigns most probable class to an object
based on a probability model
– can we get around predefined classes?



Statistical NLP and ML
❖ Probability model
– defines joint distribution of variables
• set of feature variables and a class variable
❖ Wiebe and Bruce (1995) got around the
issue of not knowing the classes in
advance by breaking up the problem
and using a series of classifiers
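
To make "assign the most probable class under a probability model" concrete, here is a minimal sketch that assumes a naive Bayes model with add-one smoothing. That model choice, the class labels "A" and "B", and the word features are all hypothetical; this is not the model used by Wiebe and Bruce, and it still requires classes to be known in advance.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (feature_list, class_label) pairs.
    Returns the counts needed to estimate the joint distribution."""
    class_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocab = set()
    for features, label in examples:
        class_counts[label] += 1
        for f in features:
            feature_counts[label][f] += 1
            vocab.add(f)
    return class_counts, feature_counts, vocab

def classify(model, features):
    """Return the class maximizing log P(class) + sum of log P(feature | class)."""
    class_counts, feature_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)
        denom = sum(feature_counts[label].values()) + len(vocab)
        for f in features:
            # Add-one smoothing keeps unseen features from zeroing the product.
            score += math.log((feature_counts[label][f] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data: word features with made-up class labels.
training = [
    (["we", "protect", "families"], "A"),
    (["we", "defend", "families"], "A"),
    (["they", "threaten", "jobs"], "B"),
    (["they", "destroy", "jobs"], "B"),
]
model = train(training)
predicted = classify(model, ["protect", "families"])
```

The independence assumption between features is a simplification; the graphical models mentioned on the following slide are one way to relax it.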



Statistical NLP and ML
❖ We need to come up with a set of
features…our next topic

❖ Which features to use can then be
decided statistically, using the
goodness of fit of graphical models



Statistical NLP and ML
❖ Both methods seem to have a lot of
potential
❖ LSA would be easier to implement
– possibly a baseline for evaluation of
probabilistic classifiers
❖ LSA is likely to yield less linguistic
insight



Outline of this presentation
❖ Where are we???
❖ Ideology
❖ Statistical NLP and Machine Learning
❖ Discourse features
❖ Internet
❖ Conclusion



Discourse features
❖ If we use probabilistic classifiers we
need features, so we look at:
– linguistics
– previous systems
– discourse theory
– literary theory



Discourse features
❖ From linguistics and discourse:
❖ General strategy of most ideological
discourse (van Dijk’s Ideological Square):
– Emphasize positive things about Us
– Emphasize negative things about Them
– De-emphasize negative things about Us
– De-emphasize positive things about Them



Discourse features
❖ How are these strategies instantiated in
discourse? (van Dijk)
– What is there:
• argument structure
• syntactic patterns
• style and non-literal language
• actor descriptions
• thematic structure
• topoi (standardized topics)

Discourse features
– What is not there
• implication
• presupposition
• inference
• goals and plans



Discourse features
❖ Disclaimers, selected examples:
– Apparent Negation: I have nothing against X, but...
– Apparent Concession: They may be very smart, but...
– Apparent Empathy: They may have had problems, but...
– Apparent Effort: We do everything we can, but...
❖ Positive self-representation and face
keeping
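
One way disclaimers like these might become classifier features is through simple surface patterns. The sketch below is an assumption-heavy illustration: the regular expressions and feature names are hypothetical, are modeled only on the four example phrasings above, and would miss most real-world variants.

```python
import re

# Hypothetical surface patterns, one per disclaimer type on the slide.
DISCLAIMER_PATTERNS = {
    "apparent_negation": re.compile(
        r"\b(?:I|we) have nothing against\b.*\bbut\b", re.I),
    "apparent_concession": re.compile(
        r"\bthey may be\b.*\bbut\b", re.I),
    "apparent_empathy": re.compile(
        r"\bthey may have had\b.*\bbut\b", re.I),
    "apparent_effort": re.compile(
        r"\bwe do everything we can\b.*\bbut\b", re.I),
}

def disclaimer_features(sentence):
    """Map each disclaimer type to True if its pattern occurs in the sentence."""
    return {name: bool(pattern.search(sentence))
            for name, pattern in DISCLAIMER_PATTERNS.items()}

negation = disclaimer_features(
    "I have nothing against them, but they should follow the rules.")
empathy = disclaimer_features(
    "They may have had problems, but that is no excuse.")
```

Each sentence yields a small dictionary of boolean features that could be fed to a probabilistic classifier alongside other linguistic features.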
Discourse features
❖ Some discourse theories from
Computational Linguistics
– Mann & Thompson (RST) (1988)
– Grosz & Sidner (G&S) (1986)
– Morris & Hirst (Lexical chains) (1991)



Discourse features
❖ Issues

– implementation
• G&S, RST
– finite number of fixed primitives
• RST
– domain specific
• RST depends on training



Discourse features
❖ A reasonable first approach: Lexical
Chains (Morris & Hirst)
❖ Sequences of related words spanning a
topical unit in the text
– based on lexical cohesion
– encapsulates context
– helps identify key phrases



Discourse features
❖ Idea of Algorithm
– read next word
• if candidate
– check chains within suitable span
» check thesaurus or WordNet
» check other knowledge sources
– if found
» include in chain
» recalculate chain
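
The outline above can be turned into a small runnable sketch. Everything specific here is an assumption for illustration: the `RELATED` table is a hypothetical stand-in for a thesaurus or WordNet lookup, the stopword list decides what counts as a candidate, and "recalculate chain" is reduced to updating the chain's last position.

```python
# Hypothetical stand-in for a thesaurus or WordNet relatedness lookup.
RELATED = {
    "car": {"vehicle", "engine", "driver"},
    "court": {"judge", "verdict"},
    "judge": {"court", "verdict"},
}
STOPWORDS = {"the", "a", "of", "and", "in", "was", "by", "to"}

def is_related(a, b):
    """True if the two words are identical or linked in either direction."""
    return a == b or b in RELATED.get(a, set()) or a in RELATED.get(b, set())

def build_chains(words, max_span=5):
    """Greedy chaining; each chain is a list of (position, word) pairs."""
    chains = []
    for pos, word in enumerate(words):
        word = word.lower()
        if word in STOPWORDS:  # not a candidate word
            continue
        for chain in chains:
            last_pos, _ = chain[-1]
            within_span = pos - last_pos <= max_span
            if within_span and any(is_related(word, w) for _, w in chain):
                chain.append((pos, word))  # include in chain, extending its span
                break
        else:
            chains.append([(pos, word)])  # no suitable chain: start a new one
    return chains

words = ["the", "driver", "of", "the", "car", "and",
         "the", "judge", "in", "court"]
chains = build_chains(words)
chain_words = [[w for _, w in chain] for chain in chains]
```

On this toy input the words fall into two chains, one about driving and one about the courtroom, which hints at how chains could mark topical units in a longer text.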



Discourse features
❖ Lexical chains could help us in:
– topic segmentation
– intentional structure
– lexical features for a classifier



Discourse features
❖ Lexical chains are easy to implement,
but are unlikely to be sufficient…
❖ For the next approximation: RST
– Marcu’s implementation incorporating G&S
– Mostly used for summarization and
generation
– Would help get at the argument structure
of the text

Discourse features
❖ RST Basics
– about 23 rhetorical relations
• account for discourse coherence
• link adjacent spans of text
– 5 schemas
• defined in terms of relations
• specify how spans can co-occur
– nucleus and satellite spans
– end up with tree structure
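
A nucleus/satellite tree of this kind could be represented roughly as follows. This is a sketch under stated assumptions: the `Span` class, the hand-built example tree, and its sentences are hypothetical, and no RST parser is involved; the point is only to show how following nucleus spans picks out the central text.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Span:
    role: str                       # "nucleus" or "satellite"
    relation: Optional[str] = None  # relation holding among this node's children
    text: Optional[str] = None      # set only on leaf spans
    children: List["Span"] = field(default_factory=list)

def nuclear_text(node):
    """Collect leaf text reachable from the root through nucleus spans only."""
    if node.role != "nucleus":
        return []
    if not node.children:
        return [node.text]
    collected = []
    for child in node.children:
        collected.extend(nuclear_text(child))
    return collected

# Hand-built example tree (hypothetical sentences).
tree = Span("nucleus", relation="evidence", children=[
    Span("nucleus", text="The policy should be repealed."),
    Span("satellite", relation="elaboration", children=[
        Span("nucleus", text="It raised costs last year."),
        Span("satellite", text="Costs rose in every district."),
    ]),
])
core = nuclear_text(tree)
```

Here only the claim itself survives; the supporting material hangs off satellite spans and is dropped.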
Discourse features
❖ Would most likely use RST to generate
features for a classifier or as input to a
pattern recognizer
❖ Nucleus spans help pick out the more
important segments of text
❖ Produces a tree that captures the
rhetorical structure of the text



Outline of this presentation
❖ Where are we???
❖ Ideology
❖ Statistical NLP and Machine Learning
❖ Discourse features
❖ Internet
❖ Conclusion



Internet
❖ We would like to mine the structure of
the Internet
– see if there is a correspondence with
groups
– improved IR by topic
– figure out what search engine to use as a
base for our system



Internet
❖ Issues
– topic or query disambiguation
– what is a minimal unit
– how to use the structure of the web
• finding authorities
• communities and subgraphs
– Evaluation!!!



Internet
❖ Kleinberg (1997)
– link-based model
– hub - links to many related authorities
– authority - linked to by many related hubs
– iterative weighting algorithm that
converges (rapidly in practice)
– can disambiguate authorities by sense
– can be used to trawl for cyber communities
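
The iterative weighting can be sketched as below. This is a simplified illustration: the function name, the fixed iteration count, and the toy link graph are assumptions, and the step of building a focused subgraph from search-engine results is not shown.

```python
import math

def hits(links, iterations=50):
    """links: dict mapping each page to the list of pages it links to.
    Returns (hub, authority) score dictionaries, each normalized to unit length."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages that link in.
        auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                auth[q] += hub[p]
        # Hub score: sum of authority scores of the pages linked to.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        # Normalize both score vectors so the values stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth

# Hypothetical toy graph: three hub-like pages pointing at two targets.
links = {
    "h1": ["a1", "a2"],
    "h2": ["a1", "a2"],
    "h3": ["a1"],
    "a1": [],
    "a2": [],
}
hub, auth = hits(links)
top_authority = max(auth, key=auth.get)
```

The page linked to by all three hubs gets the highest authority score, and the hubs that point at both targets outrank the one that points at only one.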
Outline of this presentation
❖ Where are we???
❖ Ideology
❖ Statistical NLP and Machine Learning
❖ Discourse features
❖ Internet
❖ Conclusion



Conclusion
❖ It seems that such a system can be built
– find a good search engine
– use Kleinberg’s algorithm to improve
collection of documents retrieved
– use LSA and/or a probabilistic classifier to
handle the ideological point of view
– with a probabilistic classifier use linguistic
and discourse features
– develop evaluation methodology

The End

Thanks for listening!


If you want to know more, my
Comprehensive Exam paper is at:
www.CS.NMSU.Edu/~mmartin/courses/comps_all.html

