
Hadoop Programming Challenge

Submission deadline: September 30th, 2011 (11:59pm Toronto, Canada time)

Participation details

1. Ensure you fully complete your profile on BigDataUniversity.com (or DB2University.com). Should you be selected for the trip, this information will be used for validation purposes. To update your profile, visit http://www.db2university.com/courses/user/view.php

2. Choose a dataset from the link provided below, or search Google for a dataset of your choice. Ensure you follow any licensing requirements for using the dataset.
   http://www.delicious.com/pskomoroch/redistributable+dataset

3. With the dataset selected, use IBM InfoSphere BigInsights software to run a Hadoop MapReduce job that can discover something interesting about the dataset. This challenge requires you to be innovative and creative! Here are some examples:

   - B.C. federal prison seizures, 2008-2010 dataset
     http://buzzdata.com/mariusbutuc/b-c-federal-prison-seizures-2008-2010
     Using this dataset you can find interesting facts about things that have been seized from prisoners at British Columbia (Canada) prisons from 2008 to 2010. For example, the wordcount sample provided in the course could be used to count which items were seized the most.

   - Distribution of Venture Capital in the United States in 2011 dataset
     http://buzzdata.com/azad2002/the-united-states-of-venture-capital-2011
     Using this dataset, you can determine the city and industry where the most venture capital was raised in the first six months of 2011 in the United States.

   - Validate the "Halloween Effect" phenomenon in the stock market
     The Halloween Effect is a phenomenon in the stock market where returns are significantly higher during the November-April period (after Halloween) than during the May-October period. Using Hadoop and a stock market dataset, you can confirm or reject this phenomenon.

4. The dataset can be small, so that it can be run with Hadoop in pseudo-distributed mode on a single node. If you need to work on a larger dataset, you may want to use a Hadoop cluster on the Cloud, but first develop your program on a single node in pseudo-distributed mode and ensure it works on a subset of the dataset. The selection process will be based mainly on creativity and interesting results rather than the size of the dataset.

5. You must use IBM InfoSphere BigInsights software. You can use the VMware image provided in the course, install BigInsights Basic locally, or run it on the Cloud. To analyze or graph the results, you may use any other software for which you have a license, such as SPSS, Excel spreadsheets, and so on, or you may write an application in any language that can display results in a neat way.

6. By submitting your solution to this challenge, you agree to have your entire solution (including your code) added to examples for courses in BigDataUniversity.com or any other promotional avenues. If your submission is very interesting, you may be invited to present it in person at the IOD conference.
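
To make the "which items were seized the most" idea from the examples above more concrete, here is a minimal sketch of the map, shuffle and reduce steps written in plain Python so the logic can be tried out before it is ported to a real MapReduce job. It is not the wordcount sample from the course and it does not use BigInsights or any Hadoop API. The function names (map_items, reduce_counts, run_job), the assumed record layout (seized item in the last comma-separated field) and the sample rows are all made up for illustration; the real dataset's columns will likely differ.

```python
from itertools import groupby
from operator import itemgetter


def map_items(line):
    """Map step: emit (item, 1) for one record.

    ASSUMPTION: the seized item is the last comma-separated field.
    Adjust the index to match the real dataset's columns.
    """
    fields = [field.strip() for field in line.split(",")]
    item = fields[-1].lower()  # normalise case so "Tobacco" == "tobacco"
    if item:
        yield item, 1


def reduce_counts(item, counts):
    """Reduce step: sum all the 1s emitted for a single item."""
    return item, sum(counts)


def run_job(lines):
    """Run map, then a sort standing in for Hadoop's shuffle, then reduce."""
    pairs = [pair for line in lines for pair in map_items(line)]
    pairs.sort(key=itemgetter(0))  # group identical keys next to each other
    totals = [
        reduce_counts(key, (count for _, count in group))
        for key, group in groupby(pairs, key=itemgetter(0))
    ]
    # Most frequently seized items first; ties broken alphabetically.
    return sorted(totals, key=lambda kv: (-kv[1], kv[0]))


# Invented sample rows (year, institution, item), NOT taken from the dataset.
sample = [
    "2008,Institution A,cell phone",
    "2009,Institution B,Tobacco",
    "2009,Institution A,cell phone",
    "2010,Institution C,tobacco",
    "2010,Institution B,cell phone",
]

for item, total in run_job(sample):
    print(item, total)
```

Keeping the map and reduce logic in small separate functions like this makes it easier to test on a subset of the data first, as recommended in step 4, and then move the same logic into the mapper and reducer of the actual job.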

What needs to be submitted?


The following information has to be submitted in a zip file named Hadoop_<Lastname>_<Firstname>.zip, where:
- Lastname is your last name as entered in your BigDataUniversity.com profile
- Firstname is your first name as entered in your BigDataUniversity.com profile

For example, if your first name is John and your last name is Smith, you would submit the file as Hadoop_Smith_John.zip

The zip file should contain the following files:

1) PARTICIPANT.pdf
   This file should include your personal information as entered in your BigDataUniversity.com profile:
   - First name
   - Last name
   - Email address
   - Mailing address (street, city, province, country, postal code)
   - Telephone number including area code

2) README.pdf
   This file should describe each file included in the zip, and provide all the instructions required to recreate your solution. Include URLs and any other details needed. If we cannot reproduce your solution based on your instructions, we may not be able to select your solution.

3) MYPROGRAM
   This can be a single file or a directory containing several files that are required to run your program.

4) Any other file required to make your submission complete!

Don't include your dataset; just indicate the URL where it can be downloaded. If you can host your solution on a Web site, provide the URL. You still need to submit all supporting files.

How should I submit?


Send the zip file by email to administrator@db2university.com. Your submission file cannot be larger than 5MB. Remember, your dataset must NOT be included in the submission. The email title should be: Hadoop Programming Challenge submission
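
As an optional convenience, the naming and size rules above can be checked with a small script before you send the email. The following is only a sketch under stated assumptions: the helper names (archive_name, build_submission) are hypothetical, "5MB" is read conservatively as 5,000,000 bytes, and only PARTICIPANT.pdf and README.pdf are checked for, since the name of your program file or directory is up to you.

```python
import os
import tempfile
import zipfile

MAX_BYTES = 5_000_000  # ASSUMPTION: conservative reading of the 5MB limit
REQUIRED = ("PARTICIPANT.pdf", "README.pdf")


def archive_name(last_name, first_name):
    """Build the file name in the Hadoop_<Lastname>_<Firstname>.zip pattern."""
    return f"Hadoop_{last_name}_{first_name}.zip"


def build_submission(src_dir, out_dir, last_name, first_name):
    """Zip everything under src_dir and check the required files and size.

    out_dir should be outside src_dir so the archive does not include itself.
    Remember to keep the dataset out of src_dir.
    """
    missing = [
        name for name in REQUIRED
        if not os.path.isfile(os.path.join(src_dir, name))
    ]
    if missing:
        raise FileNotFoundError("Missing required files: " + ", ".join(missing))

    zip_path = os.path.join(out_dir, archive_name(last_name, first_name))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # Store paths relative to src_dir inside the archive.
                archive.write(full_path, os.path.relpath(full_path, src_dir))

    size = os.path.getsize(zip_path)
    if size > MAX_BYTES:
        raise ValueError(f"{zip_path} is {size} bytes, over the size limit")
    return zip_path


# Example using throwaway directories and placeholder files.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as out:
    for name in REQUIRED:
        with open(os.path.join(src, name), "wb") as handle:
            handle.write(b"placeholder")
    print(os.path.basename(build_submission(src, out, "Smith", "John")))
```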

What happens if I'm selected?

On October 3rd, we will announce at BigDataUniversity.com who has been selected for this trip. If you are selected, you will be notified by email and telephone. As soon as you are confirmed, you will need to arrange your travel visa for the United States.

Good luck!