Professional Documents
Culture Documents
Hypertext documents
•Text
•Links
Web
•billions of documents
•authored by millions of diverse people
•edited by no one in particular
•distributed over millions of computers, connected by
variety of media
History of Hypertext
Citation,
• Hyperlinking
Ramayana, Mahabharata, Talmud
• branching, non-linear discourse, nested
commentary,
Dictionary, encyclopedia
• self-contained networks of textual nodes
• joined by referential links
Classification
• For assisting human efforts in maintaining
taxonomies
• E.g.: IBM's Lotus Notes text processing system &
Universal Database text extenders
Mining the Web Chakrabarti and Ramakrishnan 14
Hyperlink analysis
Take advantage of the structure of the Web
graph.
• Indicators of prestige of a page (E.g. citations)
• HITS & PageRank
Bibliometry
• bibliographic citation graph of academic papers
Topic distillation
• Adapting to idioms of Web authorship and linking
styles