37390158 Assignment 1 Database

Published by: kumarharsh on Sep 28, 2010
There are 10 text files that together contain the AOL query log for a 3-month
period. Each record in a file has the following structure:
anon_id  anonymous id  number
query  query entered by user  String
query time  time at which query was entered  number
item rank  rank of clicked URL - integer
click_url  Url clicked by the user  String
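The record structure above can be sketched as a small Java parser. This is only an illustration under assumptions the handout does not state: that the fields are tab-separated, that item rank and click_url are empty when the user did not click a result, and that query time is kept as raw text. The class name AolLogLine is hypothetical.

```java
/** One record of the query log. Hypothetical helper, not part of the assignment files. */
class AolLogLine {
    final long anonId;
    final String query;
    final String queryTime;   // kept as raw text; the exact time format is an assumption
    final Integer itemRank;   // null when the record has no clicked URL
    final String clickUrl;    // null when the record has no clicked URL

    AolLogLine(long anonId, String query, String queryTime, Integer itemRank, String clickUrl) {
        this.anonId = anonId;
        this.query = query;
        this.queryTime = queryTime;
        this.itemRank = itemRank;
        this.clickUrl = clickUrl;
    }

    /** Parses one tab-separated data line (assumed layout: id, query, time, rank, url). */
    static AolLogLine parse(String line) {
        // Limit -1 keeps trailing empty fields, so records without a click still split cleanly.
        String[] f = line.split("\t", -1);
        if (f.length < 3) {
            throw new IllegalArgumentException("Expected at least 3 fields: " + line);
        }
        long id = Long.parseLong(f[0].trim());
        Integer rank = (f.length > 3 && !f[3].trim().isEmpty()) ? Integer.valueOf(f[3].trim()) : null;
        String url = (f.length > 4 && !f[4].trim().isEmpty()) ? f[4].trim() : null;
        return new AolLogLine(id, f[1], f[2], rank, url);
    }
}
```

If the files start with a column-header line, skip it before calling parse, since the first field of a header is not a number.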

Step1: Create a MySQL database named aol
Step2: Under database aol, create a table, aol_queries
Step3: The table aol_queries has the following schema

Step4: Import the text files into the table aol_queries
Hint: This is a lot of data; for fast retrieval, create an index on the table aol_queries.
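Steps 1 to 4 could be driven from Java over JDBC roughly as follows. The schema is not reproduced in this text, so the column types below are guesses based on the field list, and query_time is declared DATETIME on the assumption that the files store a date and time string (use a numeric type if they store a number). The class name, index name, and the use of LOAD DATA LOCAL INFILE with a header line to skip are likewise assumptions, not requirements of the assignment.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

/** Sketch of database setup and bulk import. Column types are assumed, not given. */
class AolSchemaSetup {
    static final String CREATE_DATABASE = "CREATE DATABASE IF NOT EXISTS aol";

    static final String CREATE_TABLE =
        "CREATE TABLE IF NOT EXISTS aol_queries ("
        + " anon_id INT NOT NULL,"
        + " query VARCHAR(500) NOT NULL,"
        + " query_time DATETIME NOT NULL,"
        + " item_rank INT NULL,"
        + " click_url VARCHAR(500) NULL"
        + ")";

    // Composite index: supports per-user lookups and per-user ordering by time.
    static final String CREATE_INDEX =
        "CREATE INDEX idx_user_time ON aol_queries (anon_id, query_time)";

    /** Builds the bulk-load statement for one text file (tab-separated, one header line assumed). */
    static String loadStatement(String path) {
        String escaped = path.replace("\\", "\\\\").replace("'", "\\'");
        return "LOAD DATA LOCAL INFILE '" + escaped + "' INTO TABLE aol_queries"
            + " FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n'"
            + " IGNORE 1 LINES"
            + " (anon_id, query, query_time, @rank, @url)"
            // Empty rank/url fields become NULL instead of 0 or ''.
            + " SET item_rank = NULLIF(@rank, ''), click_url = NULLIF(@url, '')";
    }

    /** Runs Steps 1-4 against a MySQL server; requires a MySQL JDBC driver on the classpath. */
    static void run(String jdbcUrl, String user, String password, String... files)
            throws SQLException {
        try (Connection c = DriverManager.getConnection(jdbcUrl, user, password);
             Statement s = c.createStatement()) {
            s.execute(CREATE_DATABASE);
            s.execute("USE aol");
            s.execute(CREATE_TABLE);
            for (String f : files) {
                s.execute(loadStatement(f));
            }
            // Creating the index after the bulk load avoids maintaining it row by row.
            s.execute(CREATE_INDEX);
        }
    }
}
```

Loading with LOCAL may need to be enabled on both the server and the client connection. Inserting rows through a PreparedStatement in batches is a slower but more portable alternative.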
Step5: Write a Java program to connect to the table aol_queries,
- List the number of unique (distinct) users.
- For each user, list the query, clicked URL, and rank.
- For each user, list the query, clicked URLs, date, and time, grouped by day
and sorted by time, i.e. queries issued on a particular day should appear
together, sorted by time.
- Draw a graph that shows the percentage of users against the average clicked
page rank.
Take the average rank (AR) of the clicked URLs for each user.
Take the percentage of users whose average rank (AR) of clicked
URLs is 1. Do the same for AR of 2, 3, and so on up to 10.
Plot the graph with AR as the x-axis and percentage of users as the y-axis.
Plot another graph with AR as the y-axis and percentage of users as the x-axis.
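The numbers behind the AR graph can be computed along these lines. This is a sketch under two assumptions the handout leaves open: a user's average rank is rounded to the nearest integer to decide which AR value it counts toward, and the percentage is taken over users who clicked at least one URL. The class and method names are hypothetical, and the click list would in practice come from rows of aol_queries that have a non-empty item rank.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Computes per-user average rank (AR) and the share of users at each AR value. */
class AverageRankDistribution {

    /** One click: the user who clicked and the rank of the clicked URL. */
    static final class Click {
        final long anonId;
        final int rank;
        Click(long anonId, int rank) {
            this.anonId = anonId;
            this.rank = rank;
        }
    }

    /** Returns anon_id -> average rank of that user's clicked URLs. */
    static Map<Long, Double> averageRankPerUser(List<Click> clicks) {
        Map<Long, long[]> sumAndCount = new HashMap<>();
        for (Click c : clicks) {
            long[] acc = sumAndCount.computeIfAbsent(c.anonId, k -> new long[2]);
            acc[0] += c.rank;  // running sum of ranks
            acc[1]++;          // number of clicks
        }
        Map<Long, Double> averages = new HashMap<>();
        for (Map.Entry<Long, long[]> e : sumAndCount.entrySet()) {
            averages.put(e.getKey(), (double) e.getValue()[0] / e.getValue()[1]);
        }
        return averages;
    }

    /**
     * Returns AR value (1..maxRank) -> percentage of users whose rounded AR equals it.
     * Users whose rounded AR falls outside 1..maxRank count in the total but in no bucket.
     */
    static Map<Integer, Double> percentOfUsersByAverageRank(Map<Long, Double> averages, int maxRank) {
        Map<Integer, Double> percent = new TreeMap<>();
        for (int r = 1; r <= maxRank; r++) {
            percent.put(r, 0.0);
        }
        if (averages.isEmpty()) {
            return percent;
        }
        double sharePerUser = 100.0 / averages.size();
        for (double ar : averages.values()) {
            int bucket = (int) Math.round(ar);  // assumption: round to nearest integer
            if (bucket >= 1 && bucket <= maxRank) {
                percent.merge(bucket, sharePerUser, Double::sum);
            }
        }
        return percent;
    }
}
```

The resulting map gives the points for both plots: use its keys as x and its values as y for the first graph, and swap them for the second.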
Step6: For a particular user, extract the tags related to the clicked URL (use the
attached delicious.jar file; how to use it will be explained in class)
- For a particular user,
list the query, clicked URL, tags, tag weights, and time.
Store the user id, query, clicked URL, tags, tag weights, and time in
a file userid.txt.

tag1==weight1==date1==time1 Tag2==weight2==date1==time1 Tag3==weight3==date2==time2 Tag1==weight2==date1==time1
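Reading and writing entries in the tag==weight==date==time layout shown above might look like the following. The sketch assumes entries on a line are separated by whitespace, that a tag contains neither whitespace nor "==", and that weights are numeric; none of this is specified in the handout. The class TagRecord is hypothetical, and obtaining tags from delicious.jar is not shown because its API is not described here.

```java
import java.util.ArrayList;
import java.util.List;

/** One tag entry in the userid.txt layout: tag==weight==date==time. */
class TagRecord {
    static final String SEPARATOR = "==";

    final String tag;
    final double weight;  // assumption: weights are numeric
    final String date;    // kept as raw text; date and time formats are not specified
    final String time;

    TagRecord(String tag, double weight, String date, String time) {
        this.tag = tag;
        this.weight = weight;
        this.date = date;
        this.time = time;
    }

    /** Renders this entry as tag==weight==date==time. */
    String format() {
        return tag + SEPARATOR + weight + SEPARATOR + date + SEPARATOR + time;
    }

    /** Parses a single tag==weight==date==time entry. */
    static TagRecord parse(String entry) {
        String[] parts = entry.split(SEPARATOR, -1);
        if (parts.length != 4) {
            throw new IllegalArgumentException("Expected 4 parts in: " + entry);
        }
        return new TagRecord(parts[0], Double.parseDouble(parts[1]), parts[2], parts[3]);
    }

    /** Parses a whole line of whitespace-separated entries. */
    static List<TagRecord> parseLine(String line) {
        List<TagRecord> records = new ArrayList<>();
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return records;
        }
        for (String entry : trimmed.split("\\s+")) {
            records.add(parse(entry));
        }
        return records;
    }

    /** Joins entries with single spaces, the inverse of parseLine. */
    static String formatLine(List<TagRecord> records) {
        StringBuilder sb = new StringBuilder();
        for (TagRecord r : records) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(r.format());
        }
        return sb.toString();
    }
}
```

If real tags can contain spaces, a different entry separator (for example one entry per line) would be needed.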

Plot a graph showing how many new tags are generated as a
function of time. The x-axis shows time: 1st week, 2nd week, 3rd
week, 4th week, 2nd month, and 3rd month. The y-axis shows the number of tags.
Make a 3-dimensional graph showing time (x-axis), value of tags (y-axis),
and tag (z-axis); this graph shows how the tag values change
over time.
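The counts for the first graph could be derived as sketched below. It rests on assumptions: a tag counts as "new" only in the period where it first appears for the user, periods are measured in days from a chosen start date, the 4th week absorbs the remaining days of a 30-day first month, and the 2nd and 3rd months are 30-day windows with anything later falling into the last one. The class names are hypothetical.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Counts, per time period, how many tags appear for the first time. */
class NewTagsOverTime {
    static final String[] PERIODS = {
        "1st week", "2nd week", "3rd week", "4th week", "2nd month", "3rd month"
    };

    /** One observation of a tag on a given day. */
    static final class Sighting {
        final String tag;
        final LocalDate date;
        Sighting(String tag, LocalDate date) {
            this.tag = tag;
            this.date = date;
        }
    }

    /** Maps a day to an index into PERIODS, counting days from start (assumed boundaries). */
    static int periodIndex(LocalDate start, LocalDate day) {
        long days = ChronoUnit.DAYS.between(start, day);
        if (days < 0) {
            throw new IllegalArgumentException(day + " is before start " + start);
        }
        if (days < 7) return 0;
        if (days < 14) return 1;
        if (days < 21) return 2;
        if (days < 30) return 3;  // 4th week plus the rest of the first 30 days
        if (days < 60) return 4;
        return 5;                 // 3rd month and anything later
    }

    /** Returns one count per entry of PERIODS: tags whose first sighting falls in that period. */
    static int[] countNewTags(LocalDate start, List<Sighting> sightings) {
        Map<String, LocalDate> firstSeen = new HashMap<>();
        for (Sighting s : sightings) {
            // Keep the earliest date on which each tag was seen.
            firstSeen.merge(s.tag, s.date, (a, b) -> a.isBefore(b) ? a : b);
        }
        int[] counts = new int[PERIODS.length];
        for (LocalDate d : firstSeen.values()) {
            counts[periodIndex(start, d)]++;
        }
        return counts;
    }
}
```

The returned array lines up with PERIODS, so each pair (PERIODS[i], counts[i]) is one point on the graph.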

