Uploaded by mbasu ty

WORKSHOP - Fathoming A Data Lake: A Proof of Concept

You can keep pouring data into a crater, but that alone won't give you a data lake. Of course, you need
massively scalable data storage to start with, such as the Hadoop Distributed File System (HDFS).

But the goal of architecting a Data Lake is to enable continuous delivery, integration and recycling of
data for a multitude of diverse analytical applications. It supports data volume, variety and velocity that
far exceed the capabilities of a traditional data warehouse. More importantly, it offers a platform to
analyze and correlate data from various sources in a way that can lead us to new insights, going beyond
the task of populating predefined data cubes.

In this workshop we will walk you through an illustrative set of steps to design and implement a data
lake, using data from financial markets.

Does your organization need to invest in a data lake? We will start off with a discussion on why you
should consider designing a Data Lake in addition to your existing enterprise data warehouse.

Then we will explain a few salient features and building blocks of a Data Lake architecture, through a
demo, including:

• Setting up a scalable HDFS Layer
• Data Ingestion in Batch and Streaming mode
• Data Curation, Discovery and Metadata
• Extensible Pluggable Data Analytics Infrastructure enabling
    o Statistical Analysis
    o Data Mining
    o Event Processing
    o Machine Learning, and more
  from structured, semi-structured and unstructured data
• Data Integration into downstream applications including dashboards and decision support systems
• Enabling multi-tenant access for various stakeholders and departments in an organization

The PoC is built upon HDFS, Pentaho Data Integration and Apache Spark, all of which are widely
deployed open-source platforms.
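
To make the batch ingestion and metadata building blocks more concrete, here is a minimal sketch in plain Python. It is not the workshop's PoC code: a local directory stands in for HDFS, the standard library stands in for Pentaho Data Integration and Spark, and every name in it (`ingest_batch`, `discover`, the `raw/<source>/date=<day>` layout, the `catalog.jsonl` file, the sample ticks) is a hypothetical choice made for illustration.

```python
import csv
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def ingest_batch(lake_root, source, trade_date, rows):
    """Land one batch of raw records in a date-partitioned raw zone and catalog it.

    Assumption: a local directory plays the role of the HDFS layer.
    """
    if not rows:
        raise ValueError("cannot ingest an empty batch")
    lake_root = Path(lake_root)

    # Hypothetical layout: raw/<source>/date=<YYYY-MM-DD>/part-0000.csv
    partition = lake_root / "raw" / source / f"date={trade_date}"
    partition.mkdir(parents=True, exist_ok=True)
    data_file = partition / "part-0000.csv"

    # Store the batch as delivered; the column list is taken from the first record.
    fieldnames = sorted(rows[0].keys())
    with data_file.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    # Metadata entry: what landed, where, and how to verify it later.
    entry = {
        "source": source,
        "partition": f"date={trade_date}",
        "path": data_file.relative_to(lake_root).as_posix(),
        "columns": fieldnames,
        "row_count": len(rows),
        "sha256": hashlib.sha256(data_file.read_bytes()).hexdigest(),
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }
    # Append-only catalog, one JSON document per line.
    with (lake_root / "catalog.jsonl").open("a") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry


def discover(lake_root, source):
    """Return every catalog entry recorded for one source (data discovery)."""
    catalog = Path(lake_root) / "catalog.jsonl"
    if not catalog.exists():
        return []
    with catalog.open() as fh:
        entries = [json.loads(line) for line in fh if line.strip()]
    return [e for e in entries if e["source"] == source]


if __name__ == "__main__":
    # Made-up market ticks, used only to exercise the sketch.
    ticks = [
        {"symbol": "ABC", "price": 101.5, "volume": 200},
        {"symbol": "XYZ", "price": 55.0, "volume": 10},
    ]
    with tempfile.TemporaryDirectory() as root:
        ingest_batch(root, "equity_ticks", "2016-01-04", ticks)
        for item in discover(root, "equity_ticks"):
            print(item["path"], item["row_count"])
```

The point of the sketch is the separation of concerns: raw data is landed unchanged in a partitioned layout, while a separate catalog records what arrived, so that analytics jobs can find data by source and date without scanning the storage layer. In a real deployment the file writes would target HDFS and the catalog would more likely be a dedicated metadata service.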

We intend to drive home the architectural elements through our Proof of Concept rather than keep you
staring at a beautiful waterfront :-) We hope you will take away a set of documented steps to rebuild it
yourself at your organization, using our PoC as a reference point to start from.
