AI Seminar
Our web page is at:
www.cs.nmsu.edu/~gradrep
Under “Events” in the left frame
September 5, 2001 Melanie Martin - AI Seminar 1

Identifying Ideological Point of View
Part II
Melanie Martin
September 5, 2001

Outline of this presentation
❖ Where are we???
❖ Ideology
❖ Statistical NLP and Machine Learning
❖ Discourse features
❖ Internet
❖ Conclusion

Where are we???
❖ Let’s recall what we want to do:
❖ Build a system that could take information from web pages and Usenet newsgroups on a given topic and segment, classify, or cluster it by ideological point of view.

The Proposed System
[Diagram: the user inputs a topic. Boxes: Search Engine; Internet: Web pages, Usenet; Topic Clustering, Filtering; Set of documents on topic; Ideological Clustering; Docs on topic clustered by IPV]

Where are we???
❖ What do we need?
– A computationally feasible definition of ideological point of view
– A search engine, possibly with additional processing, to produce a collection of documents on the topic specified by the user

Where are we???
❖ What else do we need?
– A module to cluster documents by ideological point of view
– A user interface
– A way to evaluate the system

Where are we???
❖ Why do we need this?
❖ Some examples using Google:
– query: back pain ~2,220,000
• scoliosis ~121,000
– query: lyme disease ~163,000
– query: zoning shopping center ~65,100
• (add) clark county nv ~299
– query: un racism conference ~74,000

Ideology
❖ Working definition from van Dijk: “Ideologies are the fundamental beliefs of a group and its members.”
– instantiated as Us vs. Them
– predefined ideologies will not work across domains
– want to avoid researcher bias
– definition likely needs more work

Ideology
❖ Linguistics
– van Dijk (1998)
– Blommaert & Verschueren (1998)
– Wang (1993)
– Wortham & Locher (1996)

Ideology
❖ The Systems
– Ideology Machine - 1965 to 1973 - Abelson et al.
– Politics - 1979 - Carbonell
– Pauline - 1987 - Hovy
– Tracking Point of View in Narrative - 1994 - Wiebe
– Spin Doctor - 1994 - Sack
– Terminal Time - 2000 - Mateas et al.

Ideology
❖ Some issues
– Evaluation!!!
– Hard-coded knowledge
– Domain dependence
– Cognitive plausibility
– More precise definitions

Statistical NLP and ML
❖ Two techniques we will consider
– Latent Semantic Analysis
– Probabilistic Classification

❖ Issues
– clustering versus classification
• categories may not be predefined
• may want to take a variety of features into account
– favor learning over hard-coding knowledge
– supervised versus unsupervised
• cost of annotated training data

❖ Latent Semantic Analysis
– text represented as a matrix
• each entry is the weighted frequency of a word in a context
– semantic space obtained through SVD
• words appearing in similar contexts have similar feature vectors
– characterizes semantic content of words in context
❖ Why LSA is a good choice here
– semantics is a key component of ideological discourse
– clustering without need for predefined categories
– already shown useful for:
• summarization (Ando 2000)
• text segmentation (Choi 2001)
• measuring text coherence (Foltz 1998)
❖ We want to look a little more closely at Ando’s work
– uses term, sentence, and document vectors
– modified SVD algorithm
– interesting interface
❖ Multi-document summarization by visualizing topical content.
Rie Kubota Ando, Branimir Boguraev, Roy Byrd, and Mary Neff.
ANLP/NAACL '00 Workshop on Automatic Summarization
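As a rough illustration of the LSA steps listed above (weighted word-by-context matrix, then SVD), here is a minimal Python sketch. It assumes NumPy, a four-document toy corpus invented for this example, a simple log-tf times idf weighting, and the standard SVD rather than Ando's modified algorithm; none of these choices come from the talk.

```python
import numpy as np

# Toy corpus, invented for illustration only.
docs = [
    "doctor treats back pain",
    "doctor treats spine pain",
    "council votes on zoning",
    "council votes on shopping center zoning",
]

# Word-by-document count matrix: rows are words, columns are documents.
vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}
counts = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        counts[index[w], j] += 1

# Weighting (an assumption): log term frequency times inverse document frequency.
df = (counts > 0).sum(axis=1)
weighted = np.log1p(counts) * np.log(len(docs) / df)[:, None]

# Truncated SVD: keep the k largest singular values to get the semantic space.
U, s, Vt = np.linalg.svd(weighted, full_matrices=False)
k = 2
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # one k-dimensional vector per document


def cosine(a, b):
    """Cosine similarity between two vectors in the reduced space."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


print(cosine(doc_vecs[0], doc_vecs[1]))  # two health documents
print(cosine(doc_vecs[0], doc_vecs[2]))  # health vs. zoning
```

On this toy corpus the two health documents end up close together and far from the zoning documents, which is the kind of grouping a clustering step could build on without predefined categories.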

❖ Another option is a probabilistic classifier
– assigns the most probable class to an object based on a probability model
– can we get around predefined classes?

❖ Probability model
– defines joint distribution of variables
• set of feature variables and a class variable
❖ Wiebe and Bruce (1995) got around the issue of not knowing the classes in advance by breaking up the problem and using a series of classifiers
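A probabilistic classifier of the kind described above can be sketched as follows. This is not the Wiebe and Bruce model: it assumes the simplest possible joint distribution (naive Bayes, where features are independent given the class), and the feature names, class labels, and training examples are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """examples: list of (set_of_features, class_label) pairs."""
    class_counts = Counter(label for _, label in examples)
    feature_counts = defaultdict(Counter)
    vocab = set()
    for feats, label in examples:
        feature_counts[label].update(set(feats))
        vocab |= set(feats)
    return class_counts, feature_counts, vocab


def classify(model, feats):
    """Return the class maximizing P(class) * prod_i P(feature_i | class)."""
    class_counts, feature_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_logp = None, -math.inf
    for label, n in class_counts.items():
        logp = math.log(n / total)  # class prior
        for f in vocab:
            # Bernoulli feature model with add-one smoothing.
            p = (feature_counts[label][f] + 1) / (n + 2)
            logp += math.log(p if f in feats else 1 - p)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


# Hypothetical training data: documents reduced to sets of discourse features.
examples = [
    ({"us_positive", "them_negative"}, "group_a"),
    ({"us_positive", "disclaimer"}, "group_a"),
    ({"them_positive"}, "group_b"),
    ({"them_positive", "disclaimer"}, "group_b"),
]
model = train(examples)
print(classify(model, {"us_positive"}))
```

The sketch still needs class labels in advance, which is exactly the limitation the slide raises; it only shows what "most probable class under a joint model" means in code.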

❖ We need to come up with a set of features… our next topic
❖ Which features to use can then be determined statistically, by goodness of fit of graphical models

❖ Both methods seem to have a lot of potential
❖ LSA would be easier to implement
– possibly a baseline for evaluation of probabilistic classifiers
❖ Less gain in linguistic knowledge is likely with LSA

Discourse features
❖ If we use probabilistic classifiers we need features, so we look at:
– linguistics
– previous systems
– discourse theory
– literary theory

Discourse features
❖ From linguistics and discourse:
❖ General strategy of most ideological discourse (van Dijk’s Ideological Square):
– Emphasize positive things about Us
– Emphasize negative things about Them
– De-emphasize negative things about Us
– De-emphasize positive things about Them

Discourse features
❖ How are these strategies instantiated in discourse? (van Dijk)
– What is there:
• argument structure
• syntactic patterns
• style and non-literal language
• actor descriptions
• thematic structure
• topoi (standardized topics)
Discourse features
– What is not there
• implication
• presupposition
• inference
• goals and plans

Discourse features
❖ Disclaimers, selected examples:
– Apparent Negation: I have nothing against X, but...
– Apparent Concession: They may be very smart, but...
– Apparent Empathy: They may have had problems, but...
– Apparent Effort: We do everything we can, but...
❖ Positive self-representation and face keeping
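One crude way to turn the disclaimer types above into classifier features is surface pattern matching. The sketch below assumes four regular expressions built directly from the examples on the slide; the pattern set and the function name `disclaimer_features` are hypothetical, and real disclaimers vary far more than these patterns allow.

```python
import re

# Hypothetical patterns, one per disclaimer type listed above.
DISCLAIMER_PATTERNS = {
    "apparent_negation": re.compile(r"\bI have nothing against\b.*\bbut\b", re.I),
    "apparent_concession": re.compile(r"\bthey may be\b.*\bbut\b", re.I),
    "apparent_empathy": re.compile(r"\bthey may have had\b.*\bbut\b", re.I),
    "apparent_effort": re.compile(r"\bwe do everything we can\b.*\bbut\b", re.I),
}


def disclaimer_features(sentence):
    """Return the set of disclaimer types whose pattern matches the sentence."""
    return {
        name
        for name, pattern in DISCLAIMER_PATTERNS.items()
        if pattern.search(sentence)
    }


print(disclaimer_features("They may be very smart, but they are wrong."))
print(disclaimer_features("The council meets on Tuesday."))
```

The resulting sets could be fed to a classifier as binary features; anything beyond these fixed phrasings would need a parser or a discourse analysis rather than regular expressions.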
Discourse features
❖ Some discourse theories from Computational Linguistics
– Mann & Thompson (RST) (1988)
– Grosz & Sidner (G&S) (1986)
– Morris & Hirst (Lexical chains) (1991)

Discourse features
❖ Issues
– implementation
• G&S, RST
– finite number of fixed primitives
• RST
– domain specific
• RST depends on training

Discourse features
❖ A reasonable first approach: Lexical Chains (Morris & Hirst)
❖ Sequences of related words spanning a topical unit in the text
– based on lexical cohesion
– encapsulates context
– helps identify key phrases

Discourse features
❖ Idea of Algorithm
– read next word
• if candidate
– check chains within suitable span
» check thesaurus or WordNet
» check other knowledge sources
– if found
» include in chain
» recalculate chain
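The idea above can be sketched in a few lines of Python. The thesaurus here is a tiny hypothetical dictionary standing in for WordNet, the candidate test is just a stop-word filter, the span is counted in words, and "recalculate chain" is reduced to updating the chain's last position. Starting a new chain when no match is found is also an assumption.

```python
# Hypothetical toy thesaurus standing in for WordNet or another knowledge source.
RELATED = {
    "car": {"vehicle", "engine", "driver"},
    "vehicle": {"car", "truck"},
    "engine": {"car", "motor"},
    "court": {"judge", "law"},
    "judge": {"court", "law"},
}
STOPWORDS = {"the", "a", "of", "and", "in", "to", "was", "by"}


def related(a, b):
    """Two words are related if identical or linked in the toy thesaurus."""
    return a == b or b in RELATED.get(a, ()) or a in RELATED.get(b, ())


def build_chains(words, span=5):
    """Group words into lexical chains; each chain is a list of (position, word)."""
    chains = []
    for pos, word in enumerate(words):
        if word in STOPWORDS:  # not a candidate word
            continue
        for chain in chains:
            last_pos = chain[-1][0]
            # Only consider chains whose last word is within the allowed span.
            if pos - last_pos <= span and any(related(word, w) for _, w in chain):
                chain.append((pos, word))
                break
        else:
            chains.append([(pos, word)])  # no chain found: start a new one
    return chains


words = "the car engine was inspected by the judge in court".split()
chains = build_chains(words)
for chain in chains:
    print([w for _, w in chain])
```

Single-word chains such as the one for "inspected" would normally be discarded, leaving the chains that span a topical unit.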

Discourse features
❖ Lexical chains could help us in:
– topic segmentation
– intentional structure
– lexical features for a classifier

Discourse features
❖ Lexical chains are easy to implement, but are unlikely to be sufficient…
❖ For the next approximation: RST
– Marcu’s implementation incorporating G&S
– Mostly used for summarization and generation
– Would help get at the argument structure of the text

Discourse features
❖ RST Basics
– about 23 rhetorical relations
• account for discourse coherence
• link adjacent spans of text
– 5 schemas
• defined in terms of relations
• specify how spans can co-occur
– nucleus and satellite spans
– end up with tree structure

Discourse features
❖ Would most likely use RST to generate features for a classifier or as input to a pattern recognizer
❖ Nucleus spans help pick out the more important segments of text
❖ Produces a tree that gives the rhetorical structure of the text
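To make the nucleus/satellite idea concrete, here is a small sketch of an RST-style tree and a function that keeps only the spans reachable through nuclei. The `RSTNode` class, the example sentences, and the analysis of them are hypothetical; building such a tree automatically is the hard part and is not shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RSTNode:
    role: str                       # "nucleus" or "satellite"
    relation: Optional[str] = None  # relation holding among the children
    text: Optional[str] = None      # set only on leaf spans
    children: List["RSTNode"] = field(default_factory=list)


def nuclear_spans(node):
    """Return leaf texts reachable from this node through nuclei only."""
    if node.role != "nucleus":
        return []
    if not node.children:
        return [node.text]
    spans = []
    for child in node.children:
        spans.extend(nuclear_spans(child))
    return spans


# Hypothetical analysis of a three-span text.
tree = RSTNode(role="nucleus", relation="evidence", children=[
    RSTNode(role="nucleus", relation="concession", children=[
        RSTNode(role="satellite", text="They may be very smart,"),
        RSTNode(role="nucleus", text="but their plan will fail."),
    ]),
    RSTNode(role="satellite", text="Similar plans failed last year."),
])

print(nuclear_spans(tree))
```

The relation labels along the path to each span (here "evidence" and "concession") are the sort of thing that could be turned into features for a classifier.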

Internet
❖ We would like to mine the structure of the Internet
– see if there is a correspondence with groups
– improved IR by topic
– figure out what search engine to use as a base for our system

Internet
❖ Issues
– topic or query disambiguation
– what is a minimal unit
– how to use the structure of the web
• finding authorities
• communities and subgraphs
– Evaluation!!!

Internet
❖ Kleinberg (1997)
– link based model
– hub - links to many related authorities
– authority
– iterative weighting algorithm that converges (rapidly in practice)
– can disambiguate authorities by sense
– can be used to trawl for cyber communities
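The iterative weighting can be sketched as below, assuming the link graph is already available as a dictionary. The function name `hits`, the fixed iteration count, and the five-page graph are assumptions for illustration; the sketch skips the step of collecting the pages for a query.

```python
import math


def hits(links, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority weight: sum of the hub weights of pages pointing to it.
        auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                auth[q] += hub[p]
        # Hub weight: sum of the authority weights of pages it points to.
        hub = {p: sum(auth[q] for q in links.get(p, ())) for p in pages}
        # Normalize both weight vectors so the values stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


# Hypothetical link graph: three hub-like pages and two authority-like pages.
links = {
    "h1": ["a1", "a2"],
    "h2": ["a1", "a2"],
    "h3": ["a1"],
    "a1": [],
    "a2": [],
}
hub, auth = hits(links)
print(max(auth, key=auth.get))  # page with the highest authority weight
```

In this graph the page linked to by all three hubs gets the highest authority weight, and the hubs that point to both authorities outrank the one that points to only one.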

Conclusion
❖ It seems that such a system can be built
– find a good search engine
– use Kleinberg’s algorithm to improve collection of documents retrieved
– use LSA and/or a probabilistic classifier to handle the ideological point of view
– with a probabilistic classifier use linguistic and discourse features
– develop evaluation methodology

The End
Thanks for listening!

If you want to know more, my
Comprehensive Exam paper is at:
www.CS.NMSU.Edu/~mmartin/courses/comps_all.html
