
BDA305 NEW LAUNCH! Intro to Amazon Redshift Spectrum - Now Query Exabytes of Data in S3


Introduction to Amazon Redshift Spectrum

Anurag Gupta, Vice President
Amazon Athena, Amazon CloudSearch, AWS Data Pipeline, Amazon Elasticsearch Service, Amazon EMR,
Amazon Redshift, AWS Glue, Amazon Aurora, Amazon RDS for MariaDB, RDS for MySQL, RDS for PostgreSQL

April 19, 2017

When your data sets become so large and diverse that you have to start innovating around how to collect, store, process, analyze and share them

It’s never been easier to generate vast amounts of data

Generate: Individual AWS customers generate over a PB/day
Collect & Store
Analyze
Collaborate & Act

Amazon S3 lets you collect and store all this data

Generate: Individual AWS customers generating over a PB/day
Collect & Store: Store exabytes of data in S3
Analyze
Collaborate & Act

But how do you analyze it?

Generate: Individual AWS customers generating over a PB/day
Collect & Store: Store exabytes of data in S3
Analyze: Highly constrained
Collaborate & Act

The Dark Data Problem
Most generated data is unavailable for analysis

[Chart: data volume by year (1990, 2000, 2010, 2020), comparing generated data with data available for analysis]

Sources:
Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011
IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
The tyranny of “OR”

Amazon EMR
Directly access data in S3
Scale out to thousands of nodes
Open data formats
Popular big data frameworks
Anything you can dream up and code

Amazon Redshift
Super-fast local disk performance
Sophisticated query optimization
Join-optimized data formats
Query using standard SQL
Optimized for data warehousing
But I don’t want to choose.

I shouldn’t have to choose

I want “all of the above”

I want
sophisticated query optimization and scale-out processing

super fast performance and support for open formats

the throughput of local disk and the scale of S3

I want all this
From one data processing engine

With my data accessible from all data processing engines

Now and in the future

We’re told “you have to choose”

Pick small clusters for joins or large ones for scans

Shuffles are expensive

Open formats can’t collocate data for joins

They have to deal with variable cluster sizes

Query optimization requires statistics

You can’t determine this for external data
“It’s just physics”
Amazon Redshift Spectrum
Run SQL queries directly against data in S3 using thousands of nodes

Fast @ exabyte scale
Elastic & highly available
On-demand, pay-per-query
High concurrency: Multiple clusters access same data
No ETL: Query data in-place using open file formats
Full Amazon Redshift SQL support
Life of a query

1. Query
SELECT COUNT(*)
FROM S3.EXT_TABLE
GROUP BY…

[Diagram: a client connects over JDBC/ODBC to Amazon Redshift; beneath it are nodes 1, 2, 3, 4, ... N, Amazon S3 (exabyte-scale object storage), and the Data Catalog (Apache Hive Metastore)]

Life of a query

2. Query is optimized and compiled at the leader node. Determine what gets run locally and what goes to Amazon Redshift Spectrum

Life of a query

3. Query plan is sent to all compute nodes

Life of a query

4. Compute nodes obtain partition info from Data Catalog; dynamically prune partitions
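
Partition pruning, as named in step 4, can be illustrated with a small sketch. The Python below is not how Amazon Redshift Spectrum is implemented. It assumes a hypothetical catalog that maps a partition key value to an S3 prefix; the `Partition` class, the `prune_partitions` function, the bucket name, and the `sale_date` key are all invented for illustration. It shows only the idea that a filter on the partition column lets the engine skip whole prefixes without reading any data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Partition:
    """One hypothetical catalog entry: a partition key value and its S3 prefix."""
    sale_date: date
    s3_prefix: str


def prune_partitions(partitions, start, end):
    """Keep only partitions whose key falls in [start, end]; the rest are never scanned."""
    return [p for p in partitions if start <= p.sale_date <= end]


# Assumed catalog contents: one partition per day for the first week of April 2017.
catalog = [
    Partition(date(2017, 4, day), f"s3://example-bucket/sales/sale_date=2017-04-{day:02d}/")
    for day in range(1, 8)
]

# Corresponds to a predicate such as:
#   WHERE sale_date BETWEEN '2017-04-03' AND '2017-04-04'
survivors = prune_partitions(catalog, date(2017, 4, 3), date(2017, 4, 4))
for p in survivors:
    print(p.s3_prefix)
```

In this sketch, five of the seven prefixes are dropped before any object is opened, which is the saving that pruning is meant to provide.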

Life of a query

5. Each compute node issues multiple requests to the Amazon Redshift Spectrum layer

Life of a query

6. Amazon Redshift Spectrum nodes scan your S3 data
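
Steps 5 and 6 describe a fan-out: each compute node issues several requests, and the Spectrum nodes scan the S3 data. The Python sketch below imitates that shape for a `COUNT(*) ... GROUP BY` query. It is an illustration under stated assumptions, not the service's design: the in-memory `OBJECTS` dictionary stands in for S3, `scan_object` stands in for one Spectrum request, a thread pool stands in for the parallel requests, and the merge step is assumed. None of these names come from the slides.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for S3: object key -> rows, each row being a group-by value.
OBJECTS = {
    "part-0000": ["books", "music", "books"],
    "part-0001": ["music", "games"],
    "part-0002": ["books", "games", "games"],
}


def scan_object(key):
    """Stand-in for one scan request: read one object, return a partial COUNT(*) per group."""
    return Counter(OBJECTS[key])


def run_query(keys, max_requests=4):
    """Issue one scan request per object in parallel, then merge the partial counts."""
    totals = Counter()
    with ThreadPoolExecutor(max_workers=max_requests) as pool:
        for partial in pool.map(scan_object, keys):
            totals.update(partial)  # Counter.update adds counts instead of replacing them
    return dict(totals)


result = run_query(sorted(OBJECTS))
print(result)
```

In the sketch, only the small per-object counts are passed back and merged, never the scanned rows themselves.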
