© 2023 Amazon Web Services, Inc. or its Affiliates. All rights reserved. 3
Course objectives
• Design and implement a data warehouse solution
Well-Architected Framework
Data analytics use cases
Data warehouse use case
AnyCorp Agribusiness uses Amazon Kinesis Data Streams to capture and analyze streaming data from IoT devices in the field, including weather stations, pump and irrigation switches, and machinery. … repair or replacement.
Overview of Data Analytics and the Data Pipeline
Data pipeline: Analytics functionality
Stages: data sources → ingestion → storage → cataloging, processing, and governance → analytics and visualization
Diagram labels: time to answer; mobile, IoT, and streaming data; managed storage; processes; queries; embedded analytics
Storage architectures
Storage optimization
AWS analytics services
• Data sources: objects, databases, mobile, IoT
• Ingestion: Amazon Kinesis, Amazon MSK, AWS DataSync, AWS Transfer Family
• Storage: databases, Amazon Redshift, HDFS cluster
• Cataloging, processing, and governance: AWS Glue, AWS Lake Formation, AWS Lambda
• Analytics and visualization: Amazon Redshift, Amazon Kinesis Data Analytics, Amazon OpenSearch Service
• Across all stages: security and monitoring
• Use cases: data lake, log analytics, machine learning, data warehousing
AWS modern data architecture
Diagram: a data lake on Amazon S3 connected with Amazon RDS, Amazon SageMaker, Amazon OpenSearch Service, and Amazon Redshift
Analysis without data movement
Topic A
Why a data warehouse?
• Informed decision making
• Consolidated data from many sources
• Data quality, consistency, and accuracy
• More data is created every hour today than in an entire year just 20 years ago
• Variety of sources and data types
• Slow performance
• Increasing and unpredictable cost
Data lake or data warehouse?
What is a data warehouse?
• 3-tiered architecture
• Structured and relational data
• Schema-on-write
• PB (petabyte) scale
Diagram: data warehouse → business intelligence
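Schema-on-write means data is checked against the table definition as it is written, so nonconforming rows are rejected at load time instead of surfacing later at query time. The Python sketch below models only that idea. The `SCHEMA` columns and the `validate` and `write_rows` helpers are hypothetical and are not Amazon Redshift APIs.

```python
from datetime import date

# Hypothetical table definition: column name -> required Python type.
SCHEMA = {"order_id": int, "order_date": date, "amount": float}

def validate(row, schema=SCHEMA):
    """Return a list of problems with one row; an empty list means it conforms."""
    if set(row) != set(schema):
        return [f"columns {sorted(row)} do not match {sorted(schema)}"]
    return [
        f"{col}: expected {typ.__name__}, got {type(row[col]).__name__}"
        for col, typ in schema.items()
        if not isinstance(row[col], typ)
    ]

def write_rows(table, rows, schema=SCHEMA):
    """Schema-on-write: check every row first and reject the batch if any row fails."""
    errors = {}
    for i, row in enumerate(rows):
        problems = validate(row, schema)
        if problems:
            errors[i] = problems
    if errors:
        raise ValueError(f"batch rejected: {errors}")
    table.extend(rows)  # only conforming rows ever reach storage
```

Rejecting the whole batch mirrors a load that fails on the first bad row. A schema-on-read system would instead store the raw rows and apply types only when they are queried.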
Comparison
Amazon Redshift use cases
• NTT DOCOMO: moved >10 PB of data from on-premises to cloud
• FOX Corp.: taking a lake house approach with RA3 nodes and Amazon S3
• Yelp: enabling a data-driven organization with concurrency scaling
• Jack in the Box: improved ops by moving off of on-premises DW
• Warner Bros. Games: performance, scale, cost-effective
Analyze all your data (Lake House with AWS integration)
• Amazon Redshift ML
• Data sharing
• SUPER data type with JSON support
• Federated Query
• Lambda UDF
• Partner console integration
• Amazon Redshift Spectrum + Lake Formation
• Data Lake Export
Low cost & best value (predictable costs)
• Automatic workload manager
• Cross-AZ cluster recovery
• Data API
• On-demand and RIs
• Pause and resume
• Cost controls
• Built-in security features
Building Data Analytics Solutions Using Amazon Redshift
Goals of data warehousing
Diagram: online transaction processing (OLTP), enterprise resource planning (ERP), and customer relationship management (CRM) systems → data warehouse → business intelligence
Benefits of Amazon Redshift
Using Amazon Redshift in the Data Analytics Pipeline
What is Amazon Redshift?
• Analysis of all your data
• Performance at scale
• Predictable costs
• Ease of use
Use cases for Amazon Redshift
Introduction to Amazon Redshift
Cluster architecture in Amazon Redshift
Diagram: a cluster with DC or DS nodes has a leader node and compute nodes; each DC or DS compute node is divided into slices.
RA3 nodes with managed storage
Diagram: a cluster with RA3 nodes has a leader node and compute nodes; each RA3 compute node is divided into slices, and storage is managed separately from compute.
Advanced Query Accelerator (AQUA)
Diagram: a cluster with RA3 nodes and AQUA. The leader node produces the query plan; the AQUA layer is made up of AQUA nodes, each with an analytics processor and SSDs.
Introduction to Amazon Redshift
Enabling analysis of all your data
Amazon Redshift ML
Enabling performance at scale
• Automated performance tuning
• Concurrency scaling
Enabling predictable costs
Ease of use: 4 steps to get started
Lab 1 architecture
Diagram labels: AWS Cloud, Amazon VPC, load data
Building Data Analytics Solutions Using Amazon Redshift
• Ingestion
• Data distribution and storage
• Querying data in Amazon Redshift
• Data analytics using Amazon Redshift Spectrum (Lab 2)
Ingestion and Storage
Ingestion
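Amazon Redshift data ingestion overview
Diagram: labels include AWS Lambda and the leader node. The input in /data/ is split into four files, stock_prices.csv.1 through stock_prices.csv.4, one for each of the four slices (slices 0 and 1 on compute 1, slices 0 and 1 on compute 2), so all slices load in parallel.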
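Splitting the input into one file per slice, as in the diagram, lets every slice load a file at the same time. The sketch below assumes plain data rows with no header line and a slice count of two compute nodes × two slices. It deals rows round-robin into evenly sized, gzip-compressed parts. The helper names are hypothetical, and the upload to Amazon S3 is not shown.

```python
import gzip
import tempfile
from pathlib import Path

def split_lines(lines, num_parts):
    """Deal data rows round-robin into num_parts lists; sizes differ by at most one row."""
    parts = [[] for _ in range(num_parts)]
    for i, line in enumerate(lines):
        parts[i % num_parts].append(line)
    return parts

def write_parts(lines, out_dir, base_name, num_parts):
    """Write each part as a gzip file named base_name.1.gz, base_name.2.gz, ..."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for idx, part in enumerate(split_lines(lines, num_parts), start=1):
        path = out / f"{base_name}.{idx}.gz"
        with gzip.open(path, "wt", encoding="utf-8") as f:
            f.writelines(part)
        paths.append(path)
    return paths

if __name__ == "__main__":
    # Two compute nodes x two slices each = four slices, as in the diagram.
    rows = [f"2023-01-{d:02d},AMZN,{100 + d}\n" for d in range(1, 11)]
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_parts(rows, tmp, "stock_prices.csv", num_parts=4):
            print(path.name)
```

Once the parts sit under a common Amazon S3 prefix, a single COPY that names the prefix and the GZIP option can load all of them in parallel.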
Amazon Redshift Data API
Architecting an event-driven architecture using the Amazon Redshift Data API
Interacting with the Data API
• Custom application using AWS SDKs
• Jupyter notebooks
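A custom application built on an AWS SDK usually submits a statement, polls until it finishes, and then pages through the result. The sketch assumes the boto3 `redshift-data` client with its `execute_statement`, `describe_statement`, and `get_statement_result` calls. It also assumes a provisioned cluster reached with temporary credentials through `DbUser`. The client is passed in so the logic can be exercised without AWS access, and the identifiers in the usage note are placeholders.

```python
import time

TERMINAL_STATES = {"FINISHED", "FAILED", "ABORTED"}
VALUE_KEYS = ("stringValue", "longValue", "doubleValue", "booleanValue", "blobValue")

def _field_value(field):
    """Unwrap one Data API field dict such as {"longValue": 1} or {"isNull": True}."""
    if field.get("isNull"):
        return None
    for key in VALUE_KEYS:
        if key in field:
            return field[key]
    return None

def run_query(client, sql, cluster_id, database, db_user, poll_seconds=1.0, sleep=time.sleep):
    """Submit SQL through the Data API, wait for it to finish, and return rows as lists."""
    stmt_id = client.execute_statement(
        ClusterIdentifier=cluster_id, Database=database, DbUser=db_user, Sql=sql
    )["Id"]
    # The Data API is asynchronous: poll until the statement reaches a terminal state.
    while True:
        desc = client.describe_statement(Id=stmt_id)
        if desc["Status"] in TERMINAL_STATES:
            break
        sleep(poll_seconds)
    if desc["Status"] != "FINISHED":
        raise RuntimeError(f"statement {stmt_id} {desc['Status']}: {desc.get('Error', '')}")
    if not desc.get("HasResultSet"):
        return []
    rows, token = [], None
    # Results can span several pages; follow NextToken until it is absent.
    while True:
        kwargs = {"Id": stmt_id} if token is None else {"Id": stmt_id, "NextToken": token}
        page = client.get_statement_result(**kwargs)
        rows.extend([_field_value(f) for f in record] for record in page["Records"])
        token = page.get("NextToken")
        if not token:
            return rows
```

In practice the client would come from `boto3.client("redshift-data")`, for example `run_query(client, "SELECT 1", "my-cluster", "dev", "awsuser")`. For an event-driven design, `execute_statement` also accepts `WithEvent=True`, which publishes an Amazon EventBridge event when the statement finishes, so a downstream AWS Lambda function can react instead of polling.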
Loading streaming data into Amazon Redshift
Migrating data to Amazon Redshift with AWS DMS
Diagram: Amazon RDS, Amazon Aurora, Microsoft SQL Server, PostgreSQL, MariaDB, MySQL, Oracle, and Microsoft Azure SQL sources → AWS DMS → Amazon Redshift
Ingestion and Storage
Amazon Redshift storage review
Amazon Redshift Spectrum and managed storage comparison
Characteristic | Redshift Spectrum | Managed storage (RA3)
Amazon Redshift supported data types
• Numeric
• Character
• Datetime
• Boolean
• SUPER
• Spatial
SUPER data type
Table: customers
 id (INTEGER) | name (SUPER)                        | phones (SUPER)
--------------+-------------------------------------+---------------------------------------------------------------------------
 1            | {"given":"Jane", "family":"Doe"}    | [{"type":"work", "num":"9255550100"}, {"type":"cell", "num": 6505550101}]
 2            | {"given":"Richard", "family":"Roe"} | [{"type":"work", "num": 5105550102}]

Query result:
 firstname | num
-----------+------------
 "Jane"    | 6505550101
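The slide shows the query result but not the query. Assuming the query returns each customer's given name together with the number of every phone of type "cell", the Python sketch below reproduces that result by unnesting the `phones` array. This is roughly what PartiQL navigation such as `c.name.given` over an unnested `c.phones` does in Amazon Redshift. `cell_numbers` is a hypothetical helper.

```python
import json

# The two rows from the customers table above, with SUPER values parsed from JSON text.
customers = [
    {"id": 1,
     "name": json.loads('{"given":"Jane", "family":"Doe"}'),
     "phones": json.loads('[{"type":"work", "num":"9255550100"}, {"type":"cell", "num": 6505550101}]')},
    {"id": 2,
     "name": json.loads('{"given":"Richard", "family":"Roe"}'),
     "phones": json.loads('[{"type":"work", "num": 5105550102}]')},
]

def cell_numbers(rows):
    """Unnest each phones array and keep (given name, number) for cell phones."""
    result = []
    for row in rows:
        for phone in row["phones"]:          # one candidate output row per array element
            if phone.get("type") == "cell":  # a missing attribute simply does not match
                result.append((row["name"].get("given"), phone.get("num")))
    return result

print(cell_numbers(customers))  # [('Jane', 6505550101)]
```

Richard Roe has no cell entry and so produces no output. The mixed string and number types in `num` pass through unchanged, which is the kind of flexibility SUPER provides.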
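Distribution styles
Diagram: KEY distribution, where rows with key values Key A, Key B, Key C, and Key D are placed on slices according to their key, and ALL distribution.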
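KEY distribution sends every row with the same key value to the same slice, while ALL keeps a full copy of the table on every node. The sketch below models both placements in Python. Using MD5 to hash the key is an assumption for illustration, not the hash function Amazon Redshift uses, and the helper names are hypothetical.

```python
import hashlib
from collections import defaultdict

def slice_for(value, num_slices):
    """Map a distribution-key value to a slice (MD5 is an assumption, not Redshift's hash)."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_slices

def distribute_key(rows, key, num_slices):
    """KEY style: rows that share a key value always land on the same slice."""
    placement = defaultdict(list)
    for row in rows:
        placement[slice_for(row[key], num_slices)].append(row)
    return dict(placement)

def distribute_all(rows, num_nodes):
    """ALL style: every node holds its own full copy of the table."""
    return {node: list(rows) for node in range(num_nodes)}
```

Giving two tables the same KEY on their join column co-locates matching rows, so the join needs no data movement. ALL suits small, slowly changing dimension tables, at the cost of extra storage and slower loads.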
Ingestion and Storage
Querying data on Amazon Redshift
Data lake export
Redshift Spectrum
Building Data Analytics Solutions Using Amazon Redshift
• Data transformation
• Advanced querying
• Data transformation and querying in Amazon Redshift (Lab 3)
• Resource management
• Automation and optimization
Pipeline components covered
• Ingestion
• Storage
• Catalog, processing, and governance
• Analytics and visualization
Processing and Optimizing Data
Data transformation
Extract, transform, and load (ETL)
Diagram: Extract → Stage (data staging) → Transform → Load (hot data)
Extract, load, and transform (ELT)
Diagram: Extract → Load → Transform, with the load and transform steps in Amazon Redshift
Amazon Redshift materialized views
Diagram: the materialized view MV_loc_sales stores precomputed results:
Red-loc | bucket_sales
SF      | 12.00
NY      | 10.00
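A materialized view stores a query's result so that reads skip the base-table work, and the stored result changes only when the view is refreshed. The sketch models that with sqlite3, which has no materialized views, so a table plus a full recompute stands in for one. The `sales` base table, its rows, and the `loc` and `bucket_sales` column names are hypothetical, chosen so the totals match the MV_loc_sales values above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (loc TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("SF", 5.0), ("SF", 7.0), ("NY", 10.0)])

MV_QUERY = "SELECT loc, SUM(amount) AS bucket_sales FROM sales GROUP BY loc"

def create_mv(conn):
    """Materialize the aggregate once; later reads hit the stored result, not the base table."""
    conn.execute(f"CREATE TABLE mv_loc_sales AS {MV_QUERY}")

def refresh_mv(conn):
    """Full refresh: recompute the stored result from the base table in one transaction."""
    with conn:
        conn.execute("DELETE FROM mv_loc_sales")
        conn.execute(f"INSERT INTO mv_loc_sales {MV_QUERY}")

create_mv(conn)
```

In Amazon Redshift the refresh step is `REFRESH MATERIALIZED VIEW`, which can update eligible views incrementally rather than recomputing everything, and a view can also be created with `AUTO REFRESH YES`.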
Materialized view benefits
Transformation best practices
• Copy data from evenly sized, compressed files
• Use Amazon Redshift Spectrum for one-time ELT operations
• Use UNLOAD to extract large result sets
Processing and Optimizing Data
Advanced querying
Advanced querying features
Easily query over external tables: Step 1
Create external schema and tables

CREATE EXTERNAL SCHEMA spectrum
FROM DATA CATALOG
DATABASE 'salesdb'
IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole'
REGION 'us-east-2';

CREATE EXTERNAL TABLE spectrum.customers (
    customer_key integer,
    first_name   varchar(50),
    last_name    varchar(50)
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION 's3://awssampledbuswest2/tickit/spectrum/customers/'
TABLE PROPERTIES ('numRows'='172000');

Diagram: external tables spectrum.customers and spectrum.products alongside the local store_sales table
Easily query over external tables: Step 2
Create a materialized view

CREATE MATERIALIZED VIEW MV.Top_10_products
AS
SELECT TOP 10 p.product_key,
       COUNT(ss.order_key) AS order_total,
       COUNT(c.customer_key) AS customer_total,
       COUNT(DISTINCT c.customer_key) AS unique_customer_total
FROM redshift.store_sales AS ss
JOIN spectrum.products AS p
  ON ss.product_key = p.product_key
JOIN spectrum.customers AS c
  ON ss.customer_key = c.customer_key
GROUP BY p.product_key;

Diagram: MV.Top_10_products combines the local table redshift.store_sales with the external tables spectrum.products and spectrum.customers
Easily query over external tables: Step 3
Query tables and views
Scheduled queries
Stored procedures
Group SQL statements together into a transaction
• Data transformation
• Data validation
• Business-specific logic
CREATE PROCEDURE get_result_set(param IN integer, tmp_name INOUT varchar(256))
AS $$
DECLARE
    row record;
BEGIN
    EXECUTE 'create temp table ' || tmp_name ||
            ' as select * from fact_tbl where id >= ' || param;
END;
$$ LANGUAGE plpgsql;

CALL get_result_set(1, 'myresult');
SELECT * FROM myresult;

 id | secondary_id | name
----+--------------+------
  1 |            1 | Joe
  1 |            2 | Ed
  2 |            1 | Mary
  1 |            3 | Mike
(4 rows)
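Grouping statements into one transaction means that transformation, validation, and business rules succeed or fail together. The sketch below shows that pattern with Python's sqlite3 module rather than an Amazon Redshift stored procedure. The `sales` and `daily_totals` tables and the no-negative-totals rule are hypothetical.

```python
import sqlite3

def make_db():
    """Create an in-memory database with hypothetical sales and daily_totals tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (day TEXT, amount REAL)")
    conn.execute("CREATE TABLE daily_totals (day TEXT, total REAL)")
    return conn

def rebuild_daily_total(conn, day):
    """Transform, validate, and apply a business rule as one all-or-nothing unit."""
    with conn:  # commits if the block succeeds, rolls back if an exception escapes
        # Transformation: replace the day's aggregate with a freshly computed one.
        conn.execute("DELETE FROM daily_totals WHERE day = ?", (day,))
        conn.execute(
            "INSERT INTO daily_totals SELECT day, SUM(amount) FROM sales "
            "WHERE day = ? GROUP BY day",
            (day,),
        )
        # Validation / business rule (hypothetical): a daily total may not be negative.
        (bad,) = conn.execute(
            "SELECT COUNT(*) FROM daily_totals WHERE day = ? AND total < 0", (day,)
        ).fetchone()
        if bad:
            raise ValueError(f"negative daily total for {day}; changes rolled back")
```

If validation fails, the rollback restores the previous total, so readers never see a half-applied change.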
What is Amazon Redshift ML?
Import a model into the cluster, making it available as a SQL function
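The end state described here, a model callable from SQL like any other function, can be mimicked locally with sqlite3's `create_function`. This is an analogy only. The linear coefficients, the `predict_churn` name, and the `customers` columns are hypothetical, and Amazon Redshift ML itself creates its prediction function through `CREATE MODEL` rather than this mechanism.

```python
import sqlite3

# Hypothetical "trained" model: fixed linear coefficients standing in for a real model artifact.
BIAS = -1.0
WEIGHTS = (-0.002, 0.4)  # monthly_minutes, support_calls

def predict_churn(monthly_minutes, support_calls):
    """Return 1 (likely to churn) or 0, like a classifier exposed to SQL."""
    score = BIAS + WEIGHTS[0] * monthly_minutes + WEIGHTS[1] * support_calls
    return 1 if score > 0 else 0

conn = sqlite3.connect(":memory:")
# Register the model so SQL can call it like a built-in function.
conn.create_function("predict_churn", 2, predict_churn)
conn.execute("CREATE TABLE customers (id INTEGER, monthly_minutes REAL, support_calls INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", [(1, 500, 0), (2, 100, 5)])

for row in conn.execute(
    "SELECT id, predict_churn(monthly_minutes, support_calls) FROM customers ORDER BY id"
):
    print(row)
```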
Secure data sharing across Amazon Redshift clusters
• Direct way to share data
• Instant, granular, and highly performant
• Live, transactionally consistent views
• Secure and governed collaboration
Amazon Redshift data sharing
Diagram labels: producer cluster with RA3 nodes, consumer cluster with RA3 nodes, copy data in custom ETL feed
Processing and Optimizing Data
Resource management
Amazon Redshift workload management (WLM) features
Query queues and concurrency
Query queues
• Can be managed automatically with Auto WLM
• Query priorities
Short query acceleration (SQA)
Diagram labels: short-running query, long-running query
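SQA runs queries it predicts to be short in a separate lane, so they do not wait behind long-running queries. The sketch below compares finish times with and without such a lane. It assumes that all queries arrive at time zero, that each lane runs one query at a time, and that the actual runtime stands in for the prediction. `max_short_runtime` is a hypothetical threshold parameter.

```python
def fifo_finish_times(runtimes):
    """One shared queue: every query waits for all queries submitted before it."""
    clock, finished = 0.0, []
    for runtime in runtimes:
        clock += runtime
        finished.append(clock)
    return finished

def sqa_finish_times(runtimes, max_short_runtime):
    """Two lanes: queries predicted to be short bypass the long-running lane."""
    short_clock = long_clock = 0.0
    finished = []
    for runtime in runtimes:
        if runtime <= max_short_runtime:
            short_clock += runtime
            finished.append(short_clock)
        else:
            long_clock += runtime
            finished.append(long_clock)
    return finished

# A 60-second report followed by two dashboard lookups (seconds).
workload = [60.0, 1.0, 2.0]
print(fifo_finish_times(workload))      # [60.0, 61.0, 63.0]
print(sqa_finish_times(workload, 5.0))  # [60.0, 1.0, 3.0]
```

The two dashboard lookups finish after 1 and 3 seconds instead of 61 and 63, while the long report is unaffected.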
Auto WLM with adaptive capacity
Diagram: in an Amazon Redshift cluster, Queue 1 (priority: high) serves dashboard users and Queue 2 (priority: normal) serves one-time queries. On the leader node, the catalog and query planner produce a query plan, and the workload manager uses a predictive ML model and adaptive concurrency to allocate query processes and memory.
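Query priorities decide which waiting query gets the next free slot. The sketch below uses a heap to admit higher-priority queries first, in first-in, first-out order within a priority. The fixed `slots` count is a simplification, because Auto WLM adjusts concurrency and memory dynamically. The class name is hypothetical. The priority names follow the levels Amazon Redshift exposes (HIGHEST through LOWEST).

```python
import heapq
import itertools

# Amazon Redshift query priority levels, most urgent first.
PRIORITY_RANK = {"HIGHEST": 0, "HIGH": 1, "NORMAL": 2, "LOW": 3, "LOWEST": 4}

class PriorityAdmission:
    """Admit waiting queries into a fixed number of slots, highest priority first."""

    def __init__(self, slots):
        self.slots = slots
        self.running = set()
        self._waiting = []                 # heap of (rank, arrival order, query id)
        self._arrival = itertools.count()

    def submit(self, query_id, priority):
        heapq.heappush(self._waiting, (PRIORITY_RANK[priority], next(self._arrival), query_id))
        return self._admit()

    def finish(self, query_id):
        self.running.discard(query_id)
        return self._admit()

    def _admit(self):
        """Fill free slots; the arrival counter keeps FIFO order within a priority."""
        admitted = []
        while self._waiting and len(self.running) < self.slots:
            _, _, query_id = heapq.heappop(self._waiting)
            self.running.add(query_id)
            admitted.append(query_id)
        return admitted
```

With one slot busy, a dashboard query submitted at HIGH overtakes an earlier NORMAL one-time query as soon as the slot frees up.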
Processing and Optimizing Data
Automated performance tuning
• Automatic distribution keys
• Automatic sort keys
• Automatic table sort
• Automatic vacuum delete
Optimization automations
Scaling an Amazon Redshift cluster
Cost optimizations
Cost controls
Building Data Analytics Solutions Using Amazon Redshift
Four access levels
Redshift database access through temporary database credentials or single sign-on (SSO)
Diagram labels: IAM, temporary DB credentials and SSO
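Temporary database credentials are issued in exchange for IAM permissions instead of a stored password. The sketch assumes the boto3 `redshift` client's `get_cluster_credentials` call, which returns `DbUser`, `DbPassword`, and `Expiration`. The client is injected so the function can run without AWS, and the cluster, user, and database names used with it are placeholders.

```python
MIN_DURATION, MAX_DURATION = 900, 3600  # allowed DurationSeconds range, in seconds

def temporary_db_credentials(redshift, cluster_id, db_user, db_name, duration_seconds=900):
    """Exchange the caller's IAM permissions for short-lived database credentials."""
    if not MIN_DURATION <= duration_seconds <= MAX_DURATION:
        raise ValueError(f"duration_seconds must be {MIN_DURATION}-{MAX_DURATION}")
    resp = redshift.get_cluster_credentials(
        DbUser=db_user,
        DbName=db_name,
        ClusterIdentifier=cluster_id,
        DurationSeconds=duration_seconds,
        AutoCreate=False,  # the database user must already exist
    )
    # Settings a database driver needs to open a connection with these credentials.
    return {
        "user": resp["DbUser"],
        "password": resp["DbPassword"],
        "database": db_name,
        "expires": resp["Expiration"],
    }
```

The returned user name carries an IAM prefix (for example `IAM:analyst`), and the password stops working at `Expiration`, so callers should request fresh credentials rather than cache them.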
Network topology for the Redshift cluster
Amazon Redshift network architecture
Diagram: Amazon Redshift and two Amazon RDS instances inside a VPC in the AWS Cloud
Database security model
Database object-level access
GRANT SELECT ON TABLE QA.Customers TO Database_User;
REVOKE permissions
REVOKE SELECT (cust_name) ON QA.Customers FROM Database_User;
Diagram: table QA.Customers
Amazon Redshift Spectrum fine-grained permissions
Diagram: AWS Lake Formation grants permissions on AWS Glue Data Catalog tables that describe data in an Amazon S3 data bucket; an IAM role with an assigned policy lets Amazon Redshift Spectrum query that data through an external schema and tables.
Single sign-on
Diagram: BI tools, analytics tools, and SQL clients sign in through existing infrastructure (a corporate Active Directory or other identity providers) using SAML; in the AWS Cloud, IAM then grants access to Amazon Redshift.
Data protection
Data encryption
• Encryption in transit (TLS)
• Encryption at rest
Governance and compliance
Compliance
Security and Monitoring of Amazon Redshift Clusters
Monitoring Amazon Redshift workloads
Troubleshooting slow clusters
Factors
• Number of nodes
• Node types
• Data distribution
• Dataset sort order
• Dataset size
• Concurrent operations
• Query structure
• Code compilation
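Uneven data distribution shows up as skew: a few slices hold most of the rows and become the bottleneck. The helper below computes the ratio of rows on the fullest slice to rows on the emptiest slice. This is the same ratio that the SVV_TABLE_INFO system view reports as `skew_rows`. Returning infinity for an empty slice is this sketch's own choice.

```python
import math

def skew_rows(rows_per_slice):
    """Rows on the fullest slice divided by rows on the emptiest; 1.0 is perfectly even."""
    if not rows_per_slice:
        raise ValueError("need the row count of at least one slice")
    fewest = min(rows_per_slice)
    if fewest == 0:
        return math.inf  # an empty slice makes the ratio unbounded
    return max(rows_per_slice) / fewest

print(skew_rows([100, 100, 100, 400]))  # 4.0
```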
AWS data analytics resources
AWS Well-Architected: Learn, measure, and build using architectural best practices
• https://aws.amazon.com/architecture/well-architected
For more information about querying a database using the query editor, see
• https://docs.aws.amazon.com/redshift/latest/mgmt/query-editor.html
Your feedback is important to us!
• Sign in to https://aws.training/Account/Transcript/Archived
• Choose Evaluate
Thank you!
Parvesh Chopra
choprapa@amazon.com
© 2021 Amazon Web Services, Inc. or its affiliates. All rights reserved. This work may not be reproduced or redistributed, in whole or in part, without prior
written permission from Amazon Web Services, Inc. Commercial copying, lending, or selling is prohibited. Corrections, feedback, or other questions? Contact us
at https://support.aws.amazon.com/#/contacts/aws-training. All trademarks are the property of their owners.