
There are 10 text files. Together they contain the AOL query log for a period of 3 months.

The structure of each file is as follows:

anon_id – anonymous id – number

query – query entered by user – String

query time – time at which query was entered – number

item rank – rank of clicked URL – integer

click_url – URL clicked by the user – String
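The field list above can be turned into a small line parser. The following is only a sketch: it assumes the fields are tab-separated, that each file may start with a header line, and that item rank and click_url are empty or absent when the user did not click a result. The class name AolLogLine and its field names are made up for illustration.

```java
import java.util.Optional;

/** One parsed line of the query log (a sketch; the field layout is an assumption). */
final class AolLogLine {
    final long anonId;
    final String query;
    final String queryTime;  // kept as raw text; no date format is assumed here
    final Integer itemRank;  // null when the line records no click
    final String clickUrl;   // null when the line records no click

    private AolLogLine(long anonId, String query, String queryTime,
                       Integer itemRank, String clickUrl) {
        this.anonId = anonId;
        this.query = query;
        this.queryTime = queryTime;
        this.itemRank = itemRank;
        this.clickUrl = clickUrl;
    }

    private static boolean isDigits(String s) {
        return !s.isEmpty() && s.chars().allMatch(Character::isDigit);
    }

    /** Returns empty for a header line or any line that does not fit the assumed layout. */
    static Optional<AolLogLine> parse(String line) {
        String[] f = line.split("\t", -1);  // -1 keeps trailing empty fields
        if (f.length < 3 || !isDigits(f[0])) {
            return Optional.empty();
        }
        Integer rank = null;
        String url = null;
        if (f.length >= 5 && isDigits(f[3])) {
            rank = Integer.valueOf(f[3]);
            url = f[4];
        }
        return Optional.of(new AolLogLine(Long.parseLong(f[0]), f[1], f[2], rank, url));
    }
}
```

Calling AolLogLine.parse(line) on every line of a file and keeping only the non-empty results skips the header without special handling.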

Step1: Create a MySQL database named aol

Step2: Under database aol, create a table, aol_queries

Step3: The table aol_queries has the following schema

Step4: Import the text files into the table aol_queries

Hint: This is a lot of data. For fast retrieval, create an index on the table aol_queries.
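Steps 1 to 4 and the hint could be driven from Java over JDBC roughly as follows. This is a sketch under several assumptions: the column types and sizes are guesses, because the schema itself is not reproduced here. query_time is typed as DATETIME on the guess that the log stores a date-and-time string, so use a numeric type if the files really hold a number. The files are assumed to be tab-separated with one header line. The index is placed on anon_id because the later tasks read per user. The class name AolSchemaSetup and the index name idx_anon_id are made up.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;

final class AolSchemaSetup {
    /** Steps 1-3: database and table. Column types and sizes are assumptions. */
    static List<String> createStatements() {
        return Arrays.asList(
            "CREATE DATABASE IF NOT EXISTS aol",
            "CREATE TABLE IF NOT EXISTS aol.aol_queries ("
                + "anon_id BIGINT NOT NULL, "
                + "query VARCHAR(500), "
                + "query_time DATETIME, "
                + "item_rank INT NULL, "
                + "click_url VARCHAR(500) NULL)");
    }

    /** Step 4: bulk-load one tab-separated file, skipping an assumed header line. */
    static String loadStatement(String filePath) {
        // Use forward slashes in the path; a single quote is doubled for the SQL literal.
        return "LOAD DATA LOCAL INFILE '" + filePath.replace("'", "''") + "' "
            + "INTO TABLE aol.aol_queries "
            + "FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n' IGNORE 1 LINES "
            + "(anon_id, query, query_time, @rank, @url) "
            + "SET item_rank = NULLIF(@rank, ''), click_url = NULLIF(@url, '')";
    }

    /** Hint: index the user column, since the later tasks all read per user. */
    static String indexStatement() {
        return "CREATE INDEX idx_anon_id ON aol.aol_queries (anon_id)";
    }

    /** Runs everything on an open connection; the index is built once, after loading. */
    static void run(Connection conn, List<String> filePaths) throws SQLException {
        try (Statement st = conn.createStatement()) {
            for (String sql : createStatements()) st.execute(sql);
            for (String path : filePaths) st.execute(loadStatement(path));
            st.execute(indexStatement());
        }
    }
}
```

Building the index after the bulk load is usually faster than maintaining it row by row during the import. LOAD DATA LOCAL may also need the local_infile option enabled on the server, depending on the installation.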

Step5: Write a Java program that connects to the table aol_queries and does the following:

- List the number of unique (distinct) users.


- For each user, list the query, clicked URL, and rank.
- For each user, list the query, clicked URLs, date, and time, grouped by day
and time, i.e. queries issued on a particular day should appear together,
sorted by time.
- Draw a graph that shows the percentage of users against the average
clicked page rank.
o Compute the average rank (AR) of the clicked URLs for each user.
o Compute the percentage of users whose average rank (AR) of clicked
URLs is 1. Do the same for an AR of 2, 3, and so on up to 10.
o Plot the graph with AR on the x-axis and percentage of users on the
y-axis.
o Plot another graph with AR on the y-axis and percentage of users on
the x-axis.
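The average-rank computation behind the graph can be sketched as below. Two points are assumptions rather than part of the task statement: a user's AR is rounded to the nearest integer to decide which of the values 1 to 10 it counts toward, and the percentages are taken over users who clicked at least one result. The class and method names are made up, and the input is a plain array of {anon_id, item rank} pairs standing in for rows read from aol_queries.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class AverageRankStats {
    /** Running sum and count of clicked ranks for one user. */
    private static final class Acc {
        long sum;
        int n;
    }

    /** clicks: one {anonId, itemRank} pair per row that has a click. Returns the AR per user. */
    static Map<Long, Double> averageRankPerUser(long[][] clicks) {
        Map<Long, Acc> acc = new HashMap<>();
        for (long[] c : clicks) {
            Acc a = acc.computeIfAbsent(c[0], k -> new Acc());
            a.sum += c[1];
            a.n++;
        }
        Map<Long, Double> avg = new HashMap<>();
        for (Map.Entry<Long, Acc> e : acc.entrySet()) {
            avg.put(e.getKey(), (double) e.getValue().sum / e.getValue().n);
        }
        return avg;
    }

    /** Percentage of users whose rounded AR equals each value 1..10 (rounding is an assumption). */
    static Map<Integer, Double> percentOfUsersByRank(Map<Long, Double> avg) {
        Map<Integer, Double> pct = new TreeMap<>();
        for (int r = 1; r <= 10; r++) {
            pct.put(r, 0.0);
        }
        if (avg.isEmpty()) {
            return pct;
        }
        double share = 100.0 / avg.size();  // each user contributes an equal share
        for (double ar : avg.values()) {
            int bucket = (int) Math.round(ar);
            if (bucket >= 1 && bucket <= 10) {
                pct.merge(bucket, share, Double::sum);
            }
        }
        return pct;
    }
}
```

The size of the map returned by averageRankPerUser is the number of distinct users with at least one click. The returned percentages give the points for both plots; swapping the axes only changes which value is drawn horizontally.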

Step6: For a particular user, extract the tags related to the clicked URL (use the
attached delicious.jar file; how to use it will be explained in class)

- For a particular user:
o List the query, clicked URL, tags, tag weights, and time.
o Store the user id, query, clicked URL, tags, tag weights, and time in
a file “userid.txt”.
Ex: tag1==weight1==date1==time1
tag2==weight2==date1==time1
tag3==weight3==date2==time2
tag1==weight2==date1==time1

o Plot a graph showing how many new tags are generated as a function
of time. The x-axis shows time: 1st week, 2nd week, 3rd week, 4th week,
2nd month, and 3rd month. The y-axis shows the number of tags.
o Make a 3-D graph showing time (x-axis), tag value (y-axis), and tag
(z-axis). This graph will show how the tag values change over time.
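Counting new tags per time period from the stored tag==weight==date==time lines might look like the sketch below. It does not touch delicious.jar and only reads lines in the example format. Several things are assumptions: dates are written as yyyy-MM-dd, the lines are in chronological order, a tag counts as new only the first time it appears, and the period boundaries are days 0 to 27 for the four weeks, days 28 to 59 for the 2nd month, and day 60 onward for the 3rd month. The class name NewTagCounter is made up.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class NewTagCounter {
    static final String[] PERIODS = {
        "1st week", "2nd week", "3rd week", "4th week", "2nd month", "3rd month"
    };

    /** Maps a day offset from the start date to an index into PERIODS (boundaries are assumed). */
    static int periodIndex(long dayOffset) {
        long d = Math.max(0, dayOffset);  // dates before the start fall into the first week
        if (d < 28) {
            return (int) (d / 7);
        }
        return d < 60 ? 4 : 5;
    }

    /** lines: "tag==weight==date==time" records in chronological order, dates as yyyy-MM-dd. */
    static Map<String, Integer> newTagsPerPeriod(List<String> lines, LocalDate start) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String p : PERIODS) {
            counts.put(p, 0);
        }
        Set<String> seen = new HashSet<>();
        for (String line : lines) {
            String[] f = line.split("==");
            if (f.length < 4) {
                continue;  // skip lines that do not match the expected format
            }
            if (!seen.add(f[0].trim())) {
                continue;  // tag was already seen, so it is not new
            }
            long offset = ChronoUnit.DAYS.between(start, LocalDate.parse(f[2].trim()));
            counts.merge(PERIODS[periodIndex(offset)], 1, Integer::sum);
        }
        return counts;
    }
}
```

The returned map keeps the periods in x-axis order, so its values can be passed straight to whatever plotting tool is used for the graph.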
