
Introduction to Big Data Ecosystems

(Group Project)

Total Points: 100

Prof. Umesh Rao Hodeghatta, Ph.D.

Web: http://www.mytechnospeak.com

Description
As part of this course, you and your team will demonstrate your learning by developing an analytical
solution. Given the limited time and your varying programming backgrounds, your task is to address any of the
"Challenges Related to Big Data Analytics" and write a detailed report or case study. You may write code
to demonstrate your findings, but this is not compulsory.

You can select a topic from any domain (HR, marketing, finance, etc.). If two teams choose the same
domain, they must choose different topics.

Assignment
Your project report should clearly demonstrate the process that you have followed. You should explain
each step of the analysis and demonstrate your understanding of the subject. I encourage all of you to
publish this report in an open community for feedback.

1. Project report (50 points + 15 points for report writing, neatness, visual representations, etc.):
This is your final project deliverable (12-20 pages, 1.5 line spacing, 11-point font). At minimum, the
report should include the following sections:

1. Title of the Project (Problem)

2. Executive Summary (Challenges/problems you are trying to solve)

3. Introduction

4. Business Requirements (Detailed steps needed to solve the business problem)

5. Overview of data and data resources (if applicable)

6. Big Data Technology and Tools (to solve business problem)

7. Details of your modeling strategy (i.e., which technique you chose and why)

8. Insights and conclusions


You can include snapshots of your code and its outputs in the report (if applicable). Each group must
submit a single document in PDF format.

2. Class Presentation Slides (25 points):


Prepare presentation slides that share the insights from your project, assuming you are presenting to
senior executives of the company who will make decisions based on your report. The presentation should
focus on the benefits to management rather than on technical details. You will present to the class during
the last week of the semester, so you must complete your project before the presentation. The class will
assess your presentation (acting as the decision makers) and provide feedback.

Submission guidelines (10 points):


The first page of the document should include a table listing the names of the group members and a
summary of each member's contribution.

Submit the above items (1-2) on the course portal as a single zipped file. One submission per group is
sufficient. Name the zip file Section_X_Group_Y.zip, where X is your section and Y is your group number
(e.g., Section_A_Group_10.zip).

Sample Topics:
1. Big Data Tools and Technology

2. Big Data (Hadoop)-Based Web Mining

3. Growing IoT Networks

4. Big Data Analytics and Careers

5. Real-Time Streaming in Hadoop (Big Data)

6. Major Big Data Players in the Industry

7. Graph Databases for Big Data Analytics

8. Big Data Analytics in Manufacturing

9. Big Data Analytics in Finance

10. Big Data Visualization Tools

11. Using Big Data in Personalized Healthcare

Assessment: Total 100 points (30% of your final grade)


Sample Reference Data Sources:

NOTE: Read the usage terms of each data source carefully, follow its data policy, and reference it
properly in your report.

Sl. No. | Repository | URL
1 | UCI Machine Learning Library | http://archive.ics.uci.edu/ml/about.html
2 | Gapminder | https://www.gapminder.org/data/
3 | Dataset Repository | http://fimi.uantwerpen.be/data/
4 | R-Blogger datasets | https://www.r-bloggers.com/datasets-to-practice-your-data-mining/
5 | R-datasets | http://vincentarelbundock.github.io/Rdatasets/datasets.html
6 | UK Government Data | Data.gov.uk
7 | Walmart Recruiting - Store Sales Forecasting | https://www.kaggle.com/c/walmart-recruiting-store-sales-forecasting/data
8 | Airbnb | http://insideairbnb.com/get-the-data.html
9 | Dow Jones Index Data Set | http://archive.ics.uci.edu/ml/datasets/Dow+Jones+Index
10 | Yelp | https://www.yelp.com/dataset/challenge
11 | May 2015 Reddit Comments | https://www.kaggle.com/reddit/reddit-comments-may-2015
12 | Airline Safety | https://github.com/fivethirtyeight/data/tree/master/airline-safety
13 | Bosch Production Line Performance | https://www.kaggle.com/c/bosch-production-line-performance
14 | Harvard Dataverse | https://dataverse.harvard.edu/dataverse/cid?q=&types=datasets&sort=dateSort&order=desc&page=1
15 | MIT Data Source | https://ocw.mit.edu/courses/sloan-school-of-management/15-097-prediction-machine-learning-and-statistics-spring-2012/datasets/
16 | Datasets from MIT | http://web.mit.edu/towtank/www/vivdr/datasets.html
17 | Health Data | https://www.cpc.unc.edu/projects/addhealth/documentation
18 | Indiana University, Bloomington | http://cnets.indiana.edu/groups/nan/webtraffic/click-dataset/
19 | Pew Research Center | https://www.pewresearch.org/download-datasets/
20 | Million Song Dataset | http://millionsongdataset.com/pages/additional-datasets/
21 | Yahoo | https://webscope.sandbox.yahoo.com/
22 | FiveThirtyEight | https://data.fivethirtyeight.com/

Other References:
Big Data and AI Applications in Finance Industry:
http://www.ee.columbia.edu/~cylin/course/bigdata/EECS6893-BigDataAnalytics-Lecture11.pdf
Sapphirine Big Data Analytics Open Source Applications:
http://www.ee.columbia.edu/~cylin/course/bigdata/projects/
