
COMP6725 - Big Data Technologies

TOPIC 10
BIG DATA CASE STUDY
LEARNING OUTCOMES

At the end of this session, students will be able to:


o LO3. Analyze data using big data analytics methods
OUTCOMES

Students are able to analyze data using big data analytics methods


OUTLINE

1. Genome Data Analysis
2. Weather Data Analysis
3. Batch Analysis of News Articles
4. Interactive Querying of Weather Data
5. Song Recommendation System
6. Classifying Handwritten Digits
7. Movie Recommendation System
GENOME DATA ANALYSIS

Use GenBase genomics benchmark synthetic data generator, sample:

Pic 10.1. Genome datasets


Source: Big Data Science & Analytics: A Hands-On Approach, 2016.
GENOME DATA ANALYSIS (CONT)

Pic 10.2. Analytics flow for genome data analysis


Source: Big Data Science & Analytics: A Hands-On Approach, 2016.
GENOME DATA ANALYSIS (CONT)

Pic 10.3. Using big data stack for analysis of genome data
Source: Big Data Science & Analytics: A Hands-On Approach, 2016.
GENOME DATA ANALYSIS (CONT)

Pic 10.4. Steps involved in building a regression model for predicting drug response
Source: Big Data Science & Analytics: A Hands-On Approach, 2016.
GENOME DATA ANALYSIS (CONT)

Pic 10.5. Steps involved in computing correlation between the expression levels of all
pairs of genes
Source: Big Data Science & Analytics: A Hands-On Approach, 2016.
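
The figure for Pic 10.5 is not reproduced here, so the exact steps used in the book are not known from this text. As a minimal sketch of what "correlation between the expression levels of all pairs of genes" can mean, the snippet below assumes the Pearson correlation coefficient and a small in-memory table. The gene names and values are hypothetical toy data, and the case study itself runs on a big data stack rather than plain Python.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: gene id -> expression level per patient.
expression = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [2.0, 4.0, 6.0, 8.0],
    "gene_c": [4.0, 3.0, 2.0, 1.0],
}

# One correlation value for every unordered pair of genes.
correlations = {
    (g1, g2): pearson(expression[g1], expression[g2])
    for g1, g2 in combinations(sorted(expression), 2)
}

for pair, r in correlations.items():
    print(pair, round(r, 3))
```

The number of pairs grows quadratically with the number of genes, which is one reason a distributed framework is a natural fit for the full dataset.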
WEATHER DATA ANALYSIS

Pic 10.6. Analytics flow for weather data analysis application


Source: Big Data Science & Analytics: A Hands-On Approach, 2016.
WEATHER DATA ANALYSIS (CONT)

Pic 10.7. Using Big Data stack for weather data analysis
Source: Big Data Science & Analytics: A Hands-On Approach, 2016.
BATCH ANALYSIS OF NEWS ARTICLES

Pic 10.8. A realization of Alpha pattern for batch analysis of news articles
Source: Big Data Science & Analytics: A Hands-On Approach, 2016.
BATCH ANALYSIS OF NEWS ARTICLES (CONT)

Pic 10.9. Architecture of the system


Source: Big Data Science & Analytics: A Hands-On Approach, 2016.
INTERACTIVE QUERYING OF WEATHER DATA

Pic 10.10. A realization of Delta pattern for interactive querying of weather data
Source: Big Data Science & Analytics: A Hands-On Approach, 2016.
SONG RECOMMENDATION SYSTEM

Pic 10.11. Steps involved in song recommendation


Source: Big Data Science & Analytics: A Hands-On Approach, 2016.
CLASSIFYING HANDWRITTEN DIGITS

Pic 10.12. Digit recognition


Source: Big Data Science & Analytics: A Hands-On Approach, 2016.
MOVIE RECOMMENDATION SYSTEM

Pic 10.13. Architecture for movie recommendation system


Source: Big Data Science & Analytics: A Hands-On Approach, 2016.
THANK YOU
SUMMARY
o Implementations and examples of applying these algorithms using the Spark MLlib
and H2O machine learning frameworks were provided. Clustering is the process
of grouping data items so that items in the same cluster are more similar to
each other than to items in other clusters. The k-means clustering algorithm
partitions data items into k clusters such that every point is closer to the
centroid of its own cluster than to the centroid of any other cluster.
o Generalized Linear Models (GLMs) generalize ordinary linear regression to
allow response variables that are discrete, non-normally distributed, and/or
have non-constant variance.
o In a Support Vector Machine (SVM), a maximum-margin hyperplane is determined
that separates the two classes. Next, we described a specific implementation
of deep learning based on a multi-layer feed-forward artificial neural
network, which consists of multiple layers of interconnected neurons.
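
To make the k-means description in the summary concrete, here is a minimal sketch of Lloyd's algorithm in plain Python. It is illustrative only: the slides refer to Spark MLlib and H2O, whose implementations are not shown here, and the function name `kmeans`, the toy points, and the choice of initial centroids are all assumptions made for this example.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    centroids = [tuple(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid (Euclidean distance).
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: mean of each cluster; an empty cluster keeps its centroid.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, centroids=[points[0], points[3]])

print(labels)
print(centroids)
```

After convergence, every point is at least as close to the centroid of its own cluster as to any other centroid, which is the property stated in the summary. Real implementations typically add randomized initialization and a convergence check instead of a fixed iteration count.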
REFERENCES

o Arshdeep Bahga & Vijay Madisetti. (2016). Big Data Science & Analytics: A Hands-On Approach. 1st ed.
VPT. India. ISBN: 9781949978001. Chapters 1, 7, 10, and 11.
o https://towardsdatascience.com/step-by-step-twitter-sentiment-analysis-in-python-d6f650ade58d
o https://towardsdatascience.com/twitter-sentiment-analysis-in-python-1bafebe0b566
