Big Data With Hadoop & Spark - Introduction

Welcome to Big Data with Hadoop & Spark
Please introduce yourself while others are joining
Introduction to Hadoop &

ReachUs@CloudxLab.com
Session 1 - Big Data with Hadoop & Spark
Duration: 3 hours
Agenda:
• Introduction to Big Data
• 10 mins. break
• Spark & Hadoop Architecture
Notes:
• Please introduce yourself using the chat window while others are joining
• The session is being recorded; the recording and presentation will be shared
• This is Session 1 of 18 in the Big Data with Hadoop & Spark specialization.
• It suffices as an introduction to the Big Data technology stack.
Asking Questions?
• Everyone except the instructor is muted
• Please ask questions by typing in the Q&A window
• The instructor will read out the questions before answering
• To get better answers, keep your messages short and avoid chat language
Course Instructor
Sandeep Giri
Founder
Software Engineer
Loves Explaining Technologies
Worked On Large Scale Computing
Graduated from IIT Roorkee

Course Objective
Learn to process Big Data with Hadoop, Spark & related technologies

Course Structure
Videos, Quizzes, Hands-On Projects, Case Studies, Real Life Use Cases

Automated Hands-on Assessments
Learn by doing
Problem Statement, Hands On Assessment, Evaluation
My Courses

My Course List

Topics or PlayLists

Learning Item

Click when you are done!

Data Variety
ETL: Extract, Transform, Load

Distributed Systems
1. Groups of networked computers
2. Interact with each other
3. To achieve a common goal

Question
How Many Bytes in One Petabyte?
1.1259 x 10^15
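The figure above matches the binary reading of a petabyte (2^50 bytes). A minimal sketch of the arithmetic follows; the constant names are illustrative and not part of the slides.

```python
# Binary reading: 1 petabyte = 2**50 bytes (this matches the 1.1259 x 10^15 figure).
PETABYTE_BINARY = 2 ** 50

# Decimal (SI) reading, shown only for comparison: 1 petabyte = 10**15 bytes.
PETABYTE_DECIMAL = 10 ** 15

print(PETABYTE_BINARY)           # 1125899906842624
print(f"{PETABYTE_BINARY:.4e}")  # 1.1259e+15
print(PETABYTE_BINARY / PETABYTE_DECIMAL)
```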

Question
How Much Data Does Facebook Store in One Day?
600 TB

What is Big Data?
• Simply: Data of Very Big Size
• Can’t process with usual tools
• Distributed Architecture Needed
• Structured / Unstructured

Characteristics of Big Data

VOLUME: Data At Rest
Problems related to storage of huge data reliably.
e.g. Storage of logs of a website, storage of data by Gmail.
FB: 300 PB. 600 TB/day

VELOCITY: Data In Motion
Problems involving the handling of data coming at a fast rate.
e.g. Number of requests being received by Facebook, YouTube streaming, Google Analytics

VARIETY: Data in Many Forms
Problems involving complex data structures.
e.g. Maps, Social Graphs, Recommendations

Characteristics of Big Data - Variety
Problems involving complex data structures

e.g. Maps, Social Graphs, Recommendations

Question
Time taken to read 1 TB from HDD?
Around 6 hours
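The estimate above follows from dividing the data size by the disk's sustained read speed. The sketch below assumes a throughput of 50 MB/s, a hypothetical value chosen for illustration; the slides do not state which speed was used.

```python
def read_time_hours(size_bytes: int, throughput_bytes_per_sec: float) -> float:
    """Return the hours needed to read size_bytes sequentially at the given speed."""
    seconds = size_bytes / throughput_bytes_per_sec
    return seconds / 3600

ONE_TB = 10 ** 12                  # 1 TB, decimal reading
ASSUMED_THROUGHPUT = 50 * 10 ** 6  # 50 MB/s: an assumption, not a figure from the slides

hours = read_time_hours(ONE_TB, ASSUMED_THROUGHPUT)
print(round(hours, 1))  # 5.6
```

A faster disk shortens the time proportionally, but a single machine still reads the data sequentially.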

Is One Petabyte Big Data?
If you have to count just the vowels in 1 Petabyte of data every day, do you need a distributed system?

Is One Petabyte Big Data?
Yes.
Most of the existing systems can’t handle it.
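The vowel-counting question can be sketched as a split, count, and combine job. This is an illustration in plain Python on one machine, not a real distributed implementation: the function names and the chunk size are hypothetical, and in an actual cluster each chunk would be counted on a different node.

```python
from functools import reduce

VOWELS = set("aeiouAEIOU")

def count_vowels(chunk: str) -> int:
    """Count the vowels in one chunk (the work a single node would do)."""
    return sum(1 for ch in chunk if ch in VOWELS)

def split_into_chunks(text: str, chunk_size: int) -> list:
    """Cut the input into fixed-size pieces that could be processed independently."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def chunked_vowel_count(text: str, chunk_size: int = 8) -> int:
    """Count vowels per chunk, then add up the partial counts."""
    partial_counts = map(count_vowels, split_into_chunks(text, chunk_size))
    return reduce(lambda a, b: a + b, partial_counts, 0)

sample = "Big Data with Hadoop and Spark"
print(chunked_vowel_count(sample))  # 9
```

Because each chunk is counted independently, the partial counts can be computed in parallel and summed at the end, which is what makes the problem suitable for a distributed system.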

Why Big Data?

Why is It Important Now?
Devices X Connectivity => Application
Devices: Smart Phones. 4.6 billion mobile-phones. 1 - 2 billion people accessing the internet.
Connectivity: Wifi, 4G, NFC, GPS
Application: Social Networks, Internet of Things
The devices became cheaper, faster and smaller. The connectivity improved. Result: Many Applications

Computing Components
To process & store data we need
1. CPU Speed
2. RAM - Speed & Size
3. HDD or SSD - Disk Size + Speed
4. Network

Which Components Impact the Speed
of Computing?
A. CPU
B. Memory Size
C. Memory Read Speed
D. Disk Speed
E. Disk Size
F. Network Speed
G. All of Above

Example Big Data Customers
1. Ecommerce - Recommendations

Example Big Data Problems
Recommendations - How?

USER ID   MOVIE ID   RATING
KUMAR     matrix     4.0
KUMAR     Ice age    3.5

USER ID   MOVIE ID         RATING
GIRI      apocalypse now   3.6
KUMAR     apocalypse now   3.6
GIRI      Ice age          3.5
GIRI      matrix           4.0
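One way to read the ratings tables above: GIRI and KUMAR agree on the movies they have both rated, so a movie GIRI rated and KUMAR has not seen becomes a recommendation for KUMAR. The sketch below is a minimal illustration of that idea, not necessarily the method the course uses. It assumes the KUMAR / apocalypse now row is the predicted result rather than an input, and the function names and the `tolerance` parameter are hypothetical.

```python
from collections import defaultdict

# Input ratings from the tables; the KUMAR / apocalypse now row is left out
# because this sketch treats it as the prediction to be produced.
ratings = [
    ("KUMAR", "matrix", 4.0),
    ("KUMAR", "Ice age", 3.5),
    ("GIRI", "apocalypse now", 3.6),
    ("GIRI", "Ice age", 3.5),
    ("GIRI", "matrix", 4.0),
]

def build_profiles(rows):
    """Group ratings as {user: {movie: rating}}."""
    profiles = defaultdict(dict)
    for user, movie, rating in rows:
        profiles[user][movie] = rating
    return profiles

def recommend(target, rows, tolerance=0.5):
    """Suggest movies rated by users who agree with target on every shared movie."""
    profiles = build_profiles(rows)
    seen = profiles[target]
    suggestions = {}
    for other, theirs in profiles.items():
        if other == target:
            continue
        shared = set(seen) & set(theirs)
        if not shared:
            continue  # nothing in common, so similarity cannot be judged
        if all(abs(seen[m] - theirs[m]) <= tolerance for m in shared):
            for movie, rating in theirs.items():
                if movie not in seen:
                    suggestions[movie] = rating  # reuse the similar user's rating
    return suggestions

print(recommend("KUMAR", ratings))  # {'apocalypse now': 3.6}
```

With millions of users and movies, comparing every pair of users no longer fits on one machine, which is why recommendations appear here as a Big Data problem.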

2. Ecommerce - A/B Testing

Big Data Customers
Government
1. Fraud Detection
2. Cyber Security Welfare
3. Justice

Telecommunications
1. Customer Churn Prevention
2. Network Performance Optimization
3. Call Detail Record (CDR) Analysis
4. Analyzing Network to Predict Failure
Healthcare & Life Sciences
1. Health information exchange
2. Gene sequencing
3. Healthcare improvements
4. Drug Safety

Big Data Solutions
1. Apache Hadoop, Apache Spark
2. Cassandra
3. MongoDB
4. Google Compute Engine
5. AWS

What is Hadoop?
A. Created by Doug Cutting (of Yahoo)
B. Built for the Nutch search engine project
C. Joined by Mike Cafarella
D. Based on GFS, Google MapReduce (GMR) & Google Bigtable
E. Named after a toy elephant
F. Open Source - Apache
G. Powerful, Popular & Supported
H. Framework to handle Big Data
I. For distributed, scalable and reliable computing
J. Written in Java

[Diagram: Components. Labels: WorkFlow; SQL-like interface; SQL Interface; Spark; Machine learning / STATS; Compute Engine; NoSQL Datastore; Resource Manager; File Storage]

Apache Spark
• Really fast MapReduce
• 100x faster than Hadoop MapReduce in memory,
• 10x faster on disk.
• Builds on similar paradigms as MapReduce
• Integrated with Hadoop
Spark Core - A fast and general engine for large-scale data processing.
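The MapReduce paradigm mentioned above processes key-value pairs in three steps: map, shuffle (group by key), and reduce. The sketch below shows a word count in plain Python to illustrate the paradigm only. It does not use the Spark or Hadoop APIs, and the function names and input lines are made up for the example.

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)

def shuffle_phase(pairs):
    """Shuffle: group all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: combine the values of each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}

lines = ["spark is fast", "hadoop is reliable", "spark builds on mapreduce"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts["spark"], counts["is"])  # 2 2
```

In a cluster, the map and reduce steps run on many machines at once and the shuffle moves data between them over the network.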

Spark Architecture
Languages: Spark SQL, R, Java, Python, Scala
Libraries: Dataframes, MLLib, Streaming, GraphX
Spark Core
Data Sources: HDFS, HBase, Hive, Tachyon, Cassandra
Resource/cluster managers: Hadoop YARN, Apache Mesos, Amazon EC2, Standalone

Thank you. For the full course please enroll at
https://cloudxlab.com/
