The structure of the file is as follows:

Field        Description                           Type
anon_id      anonymous user ID                     number
query        query entered by the user             String
query_time   time at which the query was entered   number
item_rank    rank of the clicked URL               integer
click_url    URL clicked by the user               String
Step 1: Create a MySQL database named aol.
Step 2: In the database aol, create a table named aol_queries.
Step 3: The table aol_queries has the following schema:
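Steps 1 to 3, together with the indexing hint from Step 4, can be sketched with plain JDBC. The schema itself is not reproduced above, so the column types, the index name idx_anon_time, and the connection URL and credentials are all assumptions. The program also needs the MySQL Connector/J driver on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

/** Sketch for Steps 1-3 plus the index hint. Column types and names are assumptions. */
public class CreateAolSchema {

    // Hypothetical connection settings; adjust for your MySQL installation.
    static final String URL = "jdbc:mysql://localhost:3306/";
    static final String USER = "root";
    static final String PASSWORD = "";

    /** DDL statements, executed in order. */
    static List<String> ddl() {
        return List.of(
            "CREATE DATABASE IF NOT EXISTS aol",
            // One column per field of the log file; the types are assumed.
            "CREATE TABLE IF NOT EXISTS aol.aol_queries ("
                + " anon_id    INT NOT NULL,"
                + " query      VARCHAR(512),"
                + " query_time DATETIME NOT NULL,"
                + " item_rank  INT NULL,"          // NULL when nothing was clicked
                + " click_url  VARCHAR(512) NULL"
                + ")",
            // Composite index so per-user lookups ordered by time avoid a full scan.
            "CREATE INDEX idx_anon_time ON aol.aol_queries (anon_id, query_time)");
    }

    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(URL, USER, PASSWORD);
             Statement stmt = conn.createStatement()) {
            for (String sql : ddl()) {
                stmt.executeUpdate(sql);
            }
        }
    }
}
```

Building the index after the bulk import is usually faster than maintaining it during the load. In that case, run the last statement separately once the data is in. MySQL's CREATE INDEX does not accept IF NOT EXISTS, so running this program a second time fails on the index statement.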
Step 4: Import the text files into the table aol_queries.
Hint: This is a lot of data; for fast retrieval, create an index on the table aol_queries.
Step 5: Write a Java program that connects to the database and, using the table aol_queries:
- List the number of unique (distinct) users.
- For each user, list the query, clicked URL, and rank.
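A sketch of Step 4 and the first two Step 5 tasks follows. It assumes the text files are tab-separated, start with a header line beginning with AnonID, and record times like 2006-03-01 07:17:12, which is the layout of the public AOL release. It also reuses the assumed table from Steps 1 to 3. The class name, batch size, file encoding, and connection settings are illustrative only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.sql.*;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Sketch for Step 4 (import) and the first two Step 5 tasks. Names and settings are assumptions. */
public class LoadAndQueryAol {

    // Hypothetical connection settings; adjust for your MySQL installation.
    static final String URL = "jdbc:mysql://localhost:3306/aol";
    static final String USER = "root";
    static final String PASSWORD = "";

    static final DateTimeFormatter TIME_FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    static final String INSERT =
        "INSERT INTO aol_queries (anon_id, query, query_time, item_rank, click_url) VALUES (?, ?, ?, ?, ?)";
    static final String COUNT_USERS = "SELECT COUNT(DISTINCT anon_id) FROM aol_queries";
    static final String PER_USER =
        "SELECT anon_id, query, click_url, item_rank FROM aol_queries ORDER BY anon_id, query_time";

    /** One log line; itemRank and clickUrl are null when the user clicked nothing. */
    record Row(int anonId, String query, LocalDateTime queryTime, Integer itemRank, String clickUrl) {}

    /** Parses one tab-separated line; returns null for the header or a blank line. */
    static Row parseLine(String line) {
        if (line.isBlank() || line.startsWith("AnonID")) return null;
        String[] f = line.split("\t", -1);  // -1 keeps trailing empty fields
        Integer rank = f.length > 3 && !f[3].isEmpty() ? Integer.valueOf(f[3]) : null;
        String url = f.length > 4 && !f[4].isEmpty() ? f[4] : null;
        return new Row(Integer.parseInt(f[0]), f[1], LocalDateTime.parse(f[2], TIME_FORMAT), rank, url);
    }

    /** Step 4: batch-inserts every row of one file inside a single transaction. */
    static void load(Connection conn, Path file) throws IOException, SQLException {
        conn.setAutoCommit(false);
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8);  // encoding assumed
             PreparedStatement ps = conn.prepareStatement(INSERT)) {
            int pending = 0;
            String line;
            while ((line = in.readLine()) != null) {
                Row r = parseLine(line);
                if (r == null) continue;
                ps.setInt(1, r.anonId());
                ps.setString(2, r.query());
                ps.setTimestamp(3, Timestamp.valueOf(r.queryTime()));
                if (r.itemRank() == null) ps.setNull(4, Types.INTEGER); else ps.setInt(4, r.itemRank());
                if (r.clickUrl() == null) ps.setNull(5, Types.VARCHAR); else ps.setString(5, r.clickUrl());
                ps.addBatch();
                if (++pending == 10_000) { ps.executeBatch(); pending = 0; }  // flush in chunks
            }
            ps.executeBatch();
        }
        conn.commit();
        conn.setAutoCommit(true);
    }

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(URL, USER, PASSWORD);
             Statement st = conn.createStatement()) {
            for (String name : args) load(conn, Paths.get(name));  // Step 4: files given as arguments
            try (ResultSet rs = st.executeQuery(COUNT_USERS)) {   // Step 5: distinct users
                rs.next();
                System.out.println("Distinct users: " + rs.getLong(1));
            }
            try (ResultSet rs = st.executeQuery(PER_USER)) {       // Step 5: query, URL, rank per user
                while (rs.next()) {
                    System.out.printf("%d\t%s\t%s\t%s%n", rs.getInt("anon_id"), rs.getString("query"),
                        rs.getString("click_url"), rs.getObject("item_rank"));
                }
            }
        }
    }
}
```

The per-user listing touches every row, and Connector/J buffers a complete result set in memory by default. For the full data set it may be more practical to list one user at a time with a WHERE anon_id = ? filter, which the assumed index supports.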
- For each user, list the query, clicked URLs, date, and time, grouped by day and sorted by time, i.e. queries issued on a particular day should appear together in time order.
- Draw a graph that shows the percentage of users against the average clicked page rank.
  o Compute the average rank (AR) of the clicked URLs for each user.
  o Compute the percentage of users whose AR is 1. Do the same for AR values of 2, 3, and so on up to 10.
  o Plot the graph with AR on the x-axis and the percentage of users on the y-axis.
  o Plot another graph with AR on the y-axis and the percentage of users on the x-axis.
Step 6: For a particular user, extract the tags related to the clicked URLs (use the attached delicious.jar file; how to use it will be explained in class).
- For a particular user:
  o List the query, clicked URL, tags, tag weights, and time.
  o Store the user ID, query, clicked URL, tags, tag weights, and time in a file userid.txt. Example:
      tag1==weight1==date1==time1
      tag2==weight2==date1==time1
      tag3==weight3==date2==time2
      tag1==weight2==date1==time1
  o Plot a graph showing how many new tags are generated as a function of time. The x-axis shows time (1st week, 2nd week, 3rd week, 4th week, 2nd month, 3rd month); the y-axis shows the number of tags.
  o Make a three-dimensional graph with time on the x-axis, tag value on the y-axis, and tag on the z-axis. This graph shows how tag values change over time.
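For the remaining Step 5 tasks, the day grouping can be pushed into SQL with DATE() and TIME(), and the average rank per user with AVG(). The assignment does not say how a fractional AR such as 1.4 maps onto the points 1 to 10, so this sketch rounds to the nearest integer. That rule, the q_day/q_time aliases, the reuse of the assumed table, and the connection settings are all assumptions. Rather than drawing the graphs itself, it prints two columns that a plotting tool can read.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Sketch for the last two Step 5 tasks; connection settings and rounding are assumptions. */
public class AverageRankDistribution {

    static final String URL = "jdbc:mysql://localhost:3306/aol";  // hypothetical settings
    static final String USER = "root";
    static final String PASSWORD = "";

    /** One user's queries, grouped by day and sorted by time within each day. */
    static final String BY_DAY =
        "SELECT DATE(query_time) AS q_day, TIME(query_time) AS q_time, query, click_url"
        + " FROM aol_queries WHERE anon_id = ? ORDER BY q_day, q_time";

    /** Average clicked rank per user; queries without a click are excluded. */
    static final String AVG_RANK =
        "SELECT anon_id, AVG(item_rank) FROM aol_queries WHERE item_rank IS NOT NULL GROUP BY anon_id";

    static void printUserByDay(Connection conn, int anonId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(BY_DAY)) {
            ps.setInt(1, anonId);
            try (ResultSet rs = ps.executeQuery()) {
                String currentDay = null;
                while (rs.next()) {
                    String day = rs.getString("q_day");
                    if (!day.equals(currentDay)) {  // new day: print a header line
                        System.out.println("== " + day);
                        currentDay = day;
                    }
                    System.out.printf("  %s  %s  %s%n",
                        rs.getString("q_time"), rs.getString("query"), rs.getString("click_url"));
                }
            }
        }
    }

    /**
     * Percentage of users whose AR, rounded to the nearest integer (an assumption),
     * equals 1, 2, ..., 10. Users whose rounded AR is above 10 count toward the total only.
     */
    static Map<Integer, Double> percentByRank(Map<Integer, Double> avgRankByUser) {
        Map<Integer, Double> result = new TreeMap<>();
        for (int ar = 1; ar <= 10; ar++) result.put(ar, 0.0);
        int total = avgRankByUser.size();
        if (total == 0) return result;
        for (double avg : avgRankByUser.values()) {
            int ar = (int) Math.round(avg);
            if (ar >= 1 && ar <= 10) result.merge(ar, 100.0 / total, Double::sum);
        }
        return result;
    }

    public static void main(String[] args) throws SQLException {
        Map<Integer, Double> avgRank = new HashMap<>();
        try (Connection conn = DriverManager.getConnection(URL, USER, PASSWORD)) {
            if (args.length > 0) printUserByDay(conn, Integer.parseInt(args[0]));
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery(AVG_RANK)) {
                while (rs.next()) avgRank.put(rs.getInt(1), rs.getDouble(2));
            }
        }
        // Two tab-separated columns for a plotting tool; swap them for the second graph.
        System.out.println("AR\tpercent_of_users");
        percentByRank(avgRank).forEach((ar, pct) -> System.out.printf("%d\t%.2f%n", ar, pct));
    }
}
```

Because the percentages are computed over all users who clicked at least once, they sum to less than 100 whenever some users have an AR above 10.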
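Step 6 depends on delicious.jar, whose API is not described here, so this sketch starts after the tags have been fetched. It assumes each tag arrives as a hypothetical TagObservation holding the tag, its weight, and the time of the query. It writes lines in the tag==weight==date==time format from the example and counts how many tags first appear in each time bucket. Measuring buckets from the user's first query, using 30-day months, and counting days 21 to 29 as the 4th week are all assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch for Step 6 bookkeeping; fetching tags from delicious.jar is deliberately left out. */
public class TagTimeline {

    /** Hypothetical holder for one tag of a clicked URL, with its weight and the query time. */
    record TagObservation(String tag, double weight, LocalDateTime time) {}

    static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("yyyy-MM-dd");
    static final DateTimeFormatter TIME = DateTimeFormatter.ofPattern("HH:mm:ss");
    static final String[] BUCKETS = {"1st week", "2nd week", "3rd week", "4th week", "2nd month", "3rd month"};

    /** One line of the user's file: tag==weight==date==time. */
    static String toLine(TagObservation o) {
        return o.tag() + "==" + o.weight() + "==" + o.time().format(DATE) + "==" + o.time().format(TIME);
    }

    /** Writes all observations to a file named after the user ID, e.g. 142.txt (naming assumed). */
    static Path writeUserFile(int anonId, List<TagObservation> obs) throws IOException {
        List<String> lines = new ArrayList<>();
        for (TagObservation o : obs) lines.add(toLine(o));
        return Files.write(Paths.get(anonId + ".txt"), lines);
    }

    /** Bucket index for days since the user's first query; 30-day months assumed; -1 if out of range. */
    static int bucket(long days) {
        if (days < 0) return -1;
        if (days < 21) return (int) (days / 7);  // 1st to 3rd week
        if (days < 30) return 3;                 // rest of month 1 counts as the 4th week
        if (days < 60) return 4;                 // 2nd month
        if (days < 90) return 5;                 // 3rd month
        return -1;
    }

    /** Number of tags first seen in each bucket, i.e. new tags as a function of time. */
    static int[] newTagsPerBucket(List<TagObservation> obs, LocalDateTime start) {
        List<TagObservation> sorted = new ArrayList<>(obs);
        sorted.sort(Comparator.comparing(TagObservation::time));
        Set<String> seen = new HashSet<>();
        int[] counts = new int[BUCKETS.length];
        for (TagObservation o : sorted) {
            if (!seen.add(o.tag())) continue;  // tag already seen earlier: not new
            int b = bucket(ChronoUnit.DAYS.between(start, o.time()));
            if (b >= 0) counts[b]++;
        }
        return counts;
    }
}
```

The example format carries only tag, weight, date, and time, so the user ID, query, and clicked URL that the assignment also asks for would have to be added to each line (or as a file header) if they are required in the file itself.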