
Submitted by

Keerthana Subramani
ASU ID: 1203845157
• Source: Huffington Post
• News website that has news articles and user comments.
• Data type: Text data
• Structure of pages: HTML
• A seed URL and a search query are specified.
• Crawl URLs using a depth-first approach and store them in a database.
• Depth levels k, k+1, k+2, and so on.
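The crawl step could be sketched roughly as below. This is a minimal illustration, not the crawler used in the project: the function name `crawl_depth_first`, the `get_links` callback, the `max_depth` limit, and the toy link graph are all assumptions made for the example, and database storage is left out.

```python
from typing import Callable, Iterable, List, Set, Tuple


def crawl_depth_first(
    seed_url: str,
    get_links: Callable[[str], Iterable[str]],
    max_depth: int,
) -> List[Tuple[str, int]]:
    """Visit pages depth first from seed_url; return (url, depth) in visit order."""
    visited: Set[str] = set()
    order: List[Tuple[str, int]] = []
    stack: List[Tuple[str, int]] = [(seed_url, 0)]
    while stack:
        url, depth = stack.pop()
        if url in visited or depth > max_depth:
            continue
        visited.add(url)
        order.append((url, depth))
        # Push links in reverse so the first link on a page is explored first.
        for link in reversed(list(get_links(url))):
            if link not in visited:
                stack.append((link, depth + 1))
    return order


# Hypothetical link graph standing in for real pages (no network access).
TOY_GRAPH = {"seed": ["a", "b"], "a": ["c"], "b": [], "c": []}


def toy_links(url: str) -> List[str]:
    return TOY_GRAPH.get(url, [])


if __name__ == "__main__":
    for url, depth in crawl_depth_first("seed", toy_links, max_depth=2):
        print(depth, url)
```

In a real crawl, `get_links` would download the page and return the URLs found in its anchors, and each visited URL would be written to the database instead of a list.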
• Eyeballing
• Identify anchors
• Access the page source using the URL
• Run regular expressions (RE) over the HTML data
• Get the title, timestamp, tags and article data and store them in the database.
• User comments are also extracted.
• Identify by string matching
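The extraction step might look something like the sketch below. The HTML snippet, the class names (`timestamp`, `tag`, `article`, `comment`), and the helper names are invented for illustration; the real Huffington Post markup differs and the actual patterns would come from inspecting the page source.

```python
import re

# Hypothetical page markup, used only to exercise the patterns below.
SAMPLE_HTML = """
<html><head><title>Example headline</title></head>
<body>
<span class="timestamp">2014-03-01 10:15</span>
<a class="tag" href="/tag/politics">politics</a>
<a class="tag" href="/tag/economy">economy</a>
<div class="article"><p>First paragraph.</p><p>Second paragraph.</p></div>
<div class="comment">Nice article</div>
<div class="comment">I disagree</div>
</body></html>
"""

TITLE_RE = re.compile(r"<title>(.*?)</title>", re.S)
TIMESTAMP_RE = re.compile(r'<span class="timestamp">(.*?)</span>', re.S)
TAG_RE = re.compile(r'<a class="tag"[^>]*>(.*?)</a>', re.S)
ARTICLE_RE = re.compile(r'<div class="article">(.*?)</div>', re.S)
COMMENT_RE = re.compile(r'<div class="comment">(.*?)</div>', re.S)
MARKUP_RE = re.compile(r"<[^>]+>")


def first_match(pattern, html):
    """Return the first captured group, stripped, or None if absent."""
    match = pattern.search(html)
    return match.group(1).strip() if match else None


def extract_article(html):
    """Pull title, timestamp, tags, article text and comments out of one page."""
    body = first_match(ARTICLE_RE, html)
    # Replace inner tags with spaces, then collapse runs of whitespace.
    text = " ".join(MARKUP_RE.sub(" ", body).split()) if body else None
    return {
        "title": first_match(TITLE_RE, html),
        "timestamp": first_match(TIMESTAMP_RE, html),
        "tags": [tag.strip() for tag in TAG_RE.findall(html)],
        "text": text,
        "comments": [c.strip() for c in COMMENT_RE.findall(html)],
    }


record = extract_article(SAMPLE_HTML)
```

Regular expressions work here because the anchors are fixed strings; nested or irregular markup would usually call for an HTML parser instead.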
• Query the dataset using the timestamp.
• Create a timeline of events from the timestamp of each data item.
• Software used: Timeline creator
• Shows the date and title of each event and the occurrence of events over time
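A timestamp query that feeds a timeline could be sketched as follows. The table layout, column names, sample rows, and the `timeline` function are assumptions for the example; the sketch uses an in-memory SQLite database and does not reproduce the Timeline creator software.

```python
import sqlite3

# In-memory database with a hypothetical schema and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (url TEXT PRIMARY KEY, title TEXT, published TEXT)"
)
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)",
    [
        ("http://example.com/a", "Event begins", "2014-03-01 10:15"),
        ("http://example.com/b", "Follow-up report", "2014-03-03 08:00"),
        ("http://example.com/c", "Earlier background", "2014-02-20 17:30"),
    ],
)


def timeline(connection, start, end):
    """Return (published, title) pairs within [start, end], oldest first."""
    cursor = connection.execute(
        "SELECT published, title FROM articles "
        "WHERE published BETWEEN ? AND ? ORDER BY published",
        (start, end),
    )
    return cursor.fetchall()


events = timeline(conn, "2014-03-01 00:00", "2014-03-31 23:59")
for published, title in events:
    print(published, title)
```

Storing timestamps as `YYYY-MM-DD HH:MM` text keeps string order and chronological order the same, so a plain `ORDER BY` is enough to lay the events out over time.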
