
AWS Partners - Building Data Analytics Solutions Using Amazon Redshift


Course Overview and Introductions

Agenda
Module A: Overview of Data Analytics and the Data Pipeline
Module 1: Using Amazon Redshift in the Data Analytics Pipeline
Module 2: Introduction to Amazon Redshift
• Lab 1: Load and Query Data in an Amazon Redshift Cluster
Module 3: Ingestion and Storage
• Lab 2: Data Analytics Using Amazon Redshift Spectrum
Module 4: Processing and Optimizing Data
• Lab 3: Data Transformations in Amazon Redshift
Module 5: Security and Monitoring of Amazon Redshift Clusters

© 2023 Amazon Web Services, Inc. or its Affiliates. All rights reserved. 3
Course objectives

In this course, you will learn how to:


• Compare the features and benefits of data warehouses, data lakes, and
modern data architectures
• Design and implement a data warehouse analytics solution
• Configure and deploy an Amazon Redshift data warehouse

Design and implement a data warehouse solution

Well-Architected Framework pillars:
• Operational excellence
• Security
• Reliability
• Performance efficiency
• Cost optimization

Building Data Analytics Solutions On AWS

Overview of Data Analytics and the Data Pipeline

Challenges in data analytics

• Data silos
• Exponential data growth
• Real-time processing
• Instant insights

Data analytics use cases

• Batch data: ExampleCorp Media
• Data warehouse: AnyCompany Mortgages
• Streaming data: AnyCorp Agriculture

Data warehouse use case

AnyCompany Mortgages operates in the housing industry, processing over a quarter million loans per day.

Challenge
• Difficulty scaling
• Performance drops as system scales
• High TCO
• Complex security and compliance requirements

Solution
• Amazon Redshift
• Amazon S3

Benefits
• Automatic scaling with consistent performance
• Decreased maintenance
• Reduced TCO
• Meets complex security and compliance requirements
• Focus on strategic analytic development

Streaming data use case

AnyCorp Agribusiness uses Amazon Kinesis Data Streams to capture and analyze
streaming data from IoT devices in the field, including weather stations, pump and
irrigation switches, and machinery, to anticipate repair or replacement.

Challenge
• Meet growing demand for real-time weather data
• Optimize resource planning, including irrigation, fertilizer, and seed requirements
• Predict machinery failure and replacement

Solution
• Amazon Kinesis
• Amazon EC2

Benefits
• Scale to support global growth strategy
• Utilize real-time weather and sensor data
• Reduce TCO and machinery cost

Overview of Data Analytics and the Data Pipeline

Using the data pipeline for analytics

Data pipeline: Analytics functionality

• Data sources: Objects, Databases, Mobile, IoT
• Ingestion: Database import, Object/file ingestion, Streaming data
• Storage: Databases, Managed storage
• Cataloging, processing, and governance: Data Catalog, Processes
• Analytics and visualization: Interactive dashboards, Search, Queries, Embedded analytics
• Security and monitoring: Single sign-on, Identity, Network security, Monitoring

Axis label: Time to answer

Storage architectures

Data lake Data warehouse Modern data architecture

Storage optimization

Best practice is to optimize for querying:

• Formatting – choose the optimal file storage format
• Partitioning – divide large datasets into manageable file sizes
• Compression – balance file size against query performance

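As a rough illustration of formatting and partitioning together, the sketch below unloads a table from Amazon Redshift to Amazon S3 as Parquet, partitioned by one column. The bucket, prefix, and role ARN follow the placeholders used elsewhere in this deck, and the partition column l_shipdate is an assumption about the lineitem table.

```sql
-- Sketch only: bucket, prefix, role ARN, and partition column are placeholders.
UNLOAD ('select * from lineitem')
TO 's3://mybucket/unload/lineitem/'
IAM_ROLE 'arn:aws:iam::123412341234:role/myRedshiftRole'
FORMAT AS PARQUET            -- columnar file format
PARTITION BY (l_shipdate);   -- one S3 prefix per distinct column value
```

For delimited text output, UNLOAD also accepts compression options such as GZIP.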
AWS analytics services

• Data sources: Objects, Databases, Mobile, IoT
• Ingestion: AWS DMS, Amazon Kinesis, Amazon MSK, AWS DataSync, AWS Transfer Family
• Storage: Amazon S3, Databases, Amazon Redshift, HDFS cluster
• Cataloging, processing, and governance: Amazon EMR, AWS Glue, AWS Lake Formation, AWS Lambda
• Analytics and visualization: Athena, QuickSight, Amazon Redshift, Amazon Kinesis Data Analytics, Amazon OpenSearch Service
• Security and monitoring: AWS SSO, Amazon CloudWatch, Amazon Cognito, IAM, AWS CloudTrail

Modern data architecture

A central data lake connected to:
• Relational databases
• Non-relational databases
• Machine learning
• Data warehousing
• Log analytics
• Big data processing

AWS modern data architecture
Amazon RDS

Amazon EMR Amazon DynamoDB

Data Lake
Amazon S3

Amazon SageMaker
Amazon OpenSearch Service

Amazon Redshift

Analysis without data movement

• Federated query
• Amazon Redshift Spectrum
• Amazon Redshift materialized views
• Amazon Redshift ML
• Amazon Redshift data sharing

Amazon Redshift

Module 1: Introduction to Redshift

Topic A

Why data warehouse?

Why data warehouse?

• Informed decision making
• Consolidated data from many sources
• Data quality, consistency, and accuracy
• Historical data analysis
• Separation of analytics processing from transactional databases, which improves performance of both systems

• More data is created every hour today
than in an entire year just 20 years ago



Challenges of data analytics at scale

Variety of sources and data types Slow performance Increasing and unpredictable cost

Multiple analytics needs Difficult to manage systems Inflexible tools

Data volume and velocity Complex to scale Security, compliance


Data lakes compared to data warehouses

Data lake or data warehouse?

Data lake Data warehouse

What is a data warehouse?

• 3-tiered architecture
• Structured and relational data
• Schema-on-write
• PB (petabyte) scale
• Traditionally, compute and storage are tightly coupled
• Optimized for analytic queries

Data flow: OLTP, ERP, CRM, and LOB sources → Data warehouse → Business intelligence

Comparison

Amazon Redshift use cases

• Business intelligence
• Operational analytics on events
• Data as a service
• Predictive analytics

Tens of thousands of customers process exabytes of data with Amazon Redshift daily.

• NTT DOCOMO: Moved >10 PB of data from on-premises to cloud
• FOX Corp.: Taking a lake house approach with RA3 nodes and Amazon S3
• Yelp: Enabling a data-driven organization with concurrency scaling
• Jack in the Box: Improved ops by moving off of on-premises DW
• Warner Bros. Games: Performance, scale, cost-effective



Amazon Redshift power of innovation

Several features are flagged NEW!, UPDATED!, or EARLY Q1! on the slide.

Analyze all your data (Lake House)
• Amazon Redshift ML
• Data sharing
• SUPER data type with JSON support
• Federated Query
• Lambda UDF integration
• Partner console integration with AWS
• Amazon Redshift Spectrum + Lake Formation
• Data Lake Export

Performance & scale (Fast and self-tuning)
• RA3 nodes & managed storage
• AQUA
• Performance tuning: automated
• Materialized views
• 100K tables
• HyperLogLog
• Concurrency scaling

Low cost & best value (Predictable costs)
• Automatic workload manager
• Cross-AZ cluster recovery
• Data API
• On-demand and RIs
• Pause and resume
• Cost controls
• Built-in security features

Building Data Analytics Solutions Using Amazon Redshift

Using Amazon Redshift in the Data Analytics Pipeline

Why Amazon Redshift for data warehousing?

Goals of data warehousing

Sources feeding the data warehouse:
• Online transaction processing (OLTP)
• Enterprise resource planning (ERP)
• Customer relationship management (CRM)
• Line of business (LOB)

Data flow: sources → Data warehouse → Business intelligence

Benefits of Amazon Redshift

• Consistent performance at scale
• Structured or semi-structured data
• Optionally decouple storage and compute
• Built-in security and compliance features
• Integration with other AWS services
• Simplified administration and reduced TCO

Choosing a purpose-built solution

• RDBMS: Amazon Relational Database Service (Amazon RDS), Amazon Aurora
• Data warehouse: Amazon Redshift
• Data processing: Amazon EMR, AWS Glue

Using Amazon Redshift in the Data Analytics Pipeline

Overview of Amazon Redshift

What is Amazon Redshift?

Purpose-built, cloud data warehouse

Analysis of all your data Performance at scale Predictable costs Ease of use

Use cases for Amazon Redshift

Business intelligence Data as a service

Operational analytics Predictive analytics

Introduction to Amazon Redshift

Amazon Redshift architecture

Cluster architecture in Amazon Redshift

Cluster with DC or DS nodes:
• Leader node
• Compute nodes (DC or DS), each divided into slices

Compute and storage scale together

RA3 nodes with managed storage

Cluster with RA3 nodes:
• Leader node
• Compute nodes (RA3), each divided into slices
• Amazon Redshift managed storage in Amazon S3

Separately scale compute and storage

Advanced Query Accelerator (AQUA)

Cluster with RA3 nodes and AQUA:
• Leader node: query plan
• Compute nodes
• AQUA layer: AQUA nodes, each with an analytics processor and SSDs, handling scans, filters, and aggregates
• Amazon Redshift managed storage in Amazon S3

Move compute closer to storage

Introduction to Amazon Redshift

Amazon Redshift features

Enabling analysis of all your data

• Amazon Redshift Spectrum
• Data lake export
• Federated query
• Data sharing
• Amazon Redshift ML

Amazon Redshift ML

Amazon Redshift cluster

SQL>

Data Amazon Redshift ML

Enabling performance at scale

• RA3 nodes and managed storage
• AQUA
• Materialized views
• Automated performance tuning
• Concurrency scaling

Enabling predictable costs

• Automatic workload manager
• Cross-AZ cluster recovery
• Data API
• Built-in security features
• On-Demand and Reserved Instances
• Pause and resume
• Cost controls

Ease of use: 4 steps to get started

1. Create roles
2. Create cluster
3. Load data
4. Query data

Lab 1 architecture

AWS Cloud, Amazon VPC:
• A data analyst uses the query editor in the AWS Management Console to reach the cluster node of the Amazon Redshift cluster
• Data is loaded into the cluster from a data source
• Supporting services: IAM, Amazon CloudWatch, AWS CloudTrail

Building Data Analytics Solutions Using Amazon Redshift

Ingestion and Storage


Agenda

• Ingestion
• Data distribution and storage
• Querying data in Amazon Redshift
• Data analytics using Amazon Redshift Spectrum (Lab 2)

Pipeline components covered

• Ingestion
• Storage
• Catalog, processing, governance
• Analytics and visualization
• Security and monitoring

Ingestion and Storage

Ingestion

Amazon Redshift data ingestion overview

Ways to load data into Amazon Redshift:
• COPY command
• Amazon Redshift Data API
• AWS Lambda
• Amazon Kinesis Data Firehose
• AWS Database Migration Service (AWS DMS)

Loading data from Amazon S3 using the COPY command

copy sales from 's3://databucket/data/stock_prices.csv'
credentials 'aws_iam_role=<iam-role-arn>'
delimiter '\t';

The leader node coordinates the load, and each slice on the compute nodes reads one of the split files in parallel:

File in s3://databucket/data/ | Compute node | Slice
stock_prices.csv.1            | compute 1    | Slice 0
stock_prices.csv.2            | compute 1    | Slice 1
stock_prices.csv.3            | compute 2    | Slice 0
stock_prices.csv.4            | compute 2    | Slice 1

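The parallel load described on this slide can be sketched as follows. COPY treats the S3 path as a key prefix, so one statement picks up every part file that starts with it. The role ARN is a placeholder, and the GZIP option assumes the part files are gzip-compressed, which the slide does not state.

```sql
-- Sketch only: role ARN is a placeholder; GZIP assumes compressed part files.
COPY sales
FROM 's3://databucket/data/stock_prices.csv'   -- prefix: matches parts .1 through .4
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
DELIMITER '\t'
GZIP;

-- Inspect the most recent load errors, if any were recorded.
SELECT query, filename, line_number, err_reason
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 10;
```

Keeping the number of part files at a multiple of the number of slices lets every slice take an equal share of the work.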
Amazon Redshift Data API

Clients that can reach Amazon Redshift through the Data API:
• Amazon Elastic Compute Cloud (Amazon EC2)
• Amazon Elastic Container Service (Amazon ECS)
• AWS Lambda
• AWS Tools and SDKs
• Amazon EventBridge
• AWS AppSync
• Jupyter notebooks

Architecting an event-driven architecture using the Amazon Redshift Data API

Components in the AWS Cloud:
• Amazon Simple Storage Service (Amazon S3)
• Amazon EventBridge → AWS Lambda → Amazon Redshift Data API → Amazon Redshift

Interacting with the Data API

Application users, custom applications using AWS SDKs, and Jupyter notebooks → Amazon Redshift Data API → Amazon Redshift

Loading streaming data into Amazon Redshift

Internet of Things (IoT) devices → Amazon Kinesis Data Firehose → Amazon S3 → Amazon Redshift

Migrating data to Amazon Redshift with AWS DMS

Sources migrated through AWS DMS into Amazon Redshift:
• Amazon RDS
• Amazon Aurora
• Microsoft SQL Server
• PostgreSQL
• MariaDB
• MySQL
• Oracle
• Microsoft Azure SQL

Ingestion and Storage

Data distribution and storage

Amazon Redshift storage review

• DC node type
• RA3 nodes with managed storage
• Amazon S3 storage accessed with Amazon Redshift Spectrum

Amazon Redshift Spectrum and managed storage comparison

Characteristic             | Redshift Spectrum                          | Managed storage – RA3
Partitioning               | Customer                                   | Amazon Redshift
Migration of hot/cold data | Customer                                   | Amazon Redshift
Data format                | Open file formats                          | Block storage on RA3 nodes; object storage on Amazon S3
Data access                | Create external tables in Amazon Redshift. | Load and unload data directly to and from Redshift compute nodes.

Amazon Redshift supported data types

• Numeric
• Datetime
• Boolean
• Character
• SUPER
• Spatial

SUPER data type
Table: customers

id (INTEGER) | name (SUPER)                        | phones (SUPER)
1            | {"given":"Jane", "family":"Doe"}    | [{"type":"work", "num":"9255550100"}, {"type":"cell", "num":6505550101}]
2            | {"given":"Richard", "family":"Roe"} | [{"type":"work", "num":5105550102}]

SELECT name.given AS firstname, ph.num
FROM customers c, c.phones ph
WHERE ph.type = 'cell';

 firstname | num
-----------+------------
 "Jane"    | 6505550101

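One way the customers table on this slide could be created and populated is sketched below. It uses the JSON_PARSE function to turn JSON text into SUPER values; the rows simply mirror the slide's sample data.

```sql
-- Sketch only: values mirror the sample rows shown on the slide.
CREATE TABLE customers (
    id     INTEGER,
    name   SUPER,   -- nested object
    phones SUPER    -- array of objects
);

INSERT INTO customers VALUES
  (1,
   JSON_PARSE('{"given":"Jane","family":"Doe"}'),
   JSON_PARSE('[{"type":"work","num":"9255550100"},{"type":"cell","num":6505550101}]')),
  (2,
   JSON_PARSE('{"given":"Richard","family":"Roe"}'),
   JSON_PARSE('[{"type":"work","num":5105550102}]'));
```

Because SUPER is schemaless, the two rows can hold phone arrays of different lengths and mixed value types without any table change.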
Distribution styles

• KEY: rows that share a distribution key value (Key A, Key B, Key C, Key D) are stored on the same slice
• ALL: a full copy of the table is stored on every node
• EVEN: rows are spread evenly across all slices

create table regions(
  travelDate date not null,
  region varchar not null,
  country varchar not null
)
diststyle auto;

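To contrast with DISTSTYLE AUTO, the sketch below sets each style explicitly. All table and column names here are hypothetical and chosen only to suggest a typical use of each style.

```sql
-- Sketch only: table and column names are hypothetical.

-- KEY: co-locate rows that join on region_id.
CREATE TABLE bookings (
    booking_id BIGINT  NOT NULL,
    region_id  INTEGER NOT NULL,
    travelDate DATE    NOT NULL
)
DISTSTYLE KEY DISTKEY (region_id);

-- ALL: small lookup table copied to every node.
CREATE TABLE region_lookup (
    region_id INTEGER     NOT NULL,
    region    VARCHAR(64) NOT NULL
)
DISTSTYLE ALL;

-- EVEN: no obvious join key, spread rows round-robin.
CREATE TABLE click_events (
    event_id BIGINT    NOT NULL,
    event_ts TIMESTAMP NOT NULL
)
DISTSTYLE EVEN;

-- Check the style in effect (the view lists only tables that contain rows).
SELECT "table", diststyle
FROM svv_table_info
WHERE "table" IN ('bookings', 'region_lookup', 'click_events');
```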
Ingestion and Storage

Querying data in Amazon Redshift

Querying data on Amazon Redshift

• Amazon Redshift query editor
• SQL client tool such as SQL Workbench/J
• Amazon Redshift Data API

Data lake export

Amazon Redshift → data lake export → Data lake (queried back through Redshift Spectrum)

UNLOAD ('select * from lineitem')
TO 's3://mybucket/unload/lineitem'
FORMAT as PARQUET
CREDENTIALS
'aws_iam_role=arn:aws:iam::123412341234:role/myRedshiftRole';

Building Data Analytics Solutions Using Amazon Redshift

Processing and Optimizing Data


Agenda

• Data transformation
• Advanced querying
• Data transformation and querying in Amazon Redshift (Lab 3)
• Resource management
• Automation and optimization
Pipeline components covered

• Ingestion
• Storage
• Catalog, processing, governance
• Analytics and visualization
• Security and monitoring

Processing and Optimizing Data

Data transformation

Extract, transform, and load (ETL)

1. Extract: pull data from the sources
2. Stage: land it in a data staging area
3. Transform: process it with Amazon EMR or AWS Glue
4. Load: load the resulting hot data into Amazon Redshift

Extract, load, and transform (ELT)

1. Extract: pull data from the sources
2. Load: load it into a data staging area in Amazon Redshift
3. Transform: transform it inside Amazon Redshift into hot data

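A minimal ELT sketch, assuming a hypothetical target table sales_clean and a tab-delimited source under a placeholder S3 prefix: raw rows are loaded into a staging table first and then transformed with SQL inside Amazon Redshift, all in one transaction.

```sql
-- Sketch only: tables, columns, S3 path, and role ARN are hypothetical.
BEGIN;

-- Load: land raw rows in a staging table, keeping untyped text columns.
CREATE TEMP TABLE stage_sales (
    sale_id BIGINT,
    loc     VARCHAR(16),
    amount  VARCHAR(32),
    sold_at VARCHAR(32)
);

COPY stage_sales
FROM 's3://databucket/data/sales/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
DELIMITER '\t';

-- Transform: clean and cast inside the warehouse, then write the hot table.
INSERT INTO sales_clean (sale_id, loc, amount, sold_at)
SELECT sale_id,
       UPPER(TRIM(loc)),
       CAST(amount AS DECIMAL(12,2)),
       CAST(sold_at AS TIMESTAMP)
FROM stage_sales
WHERE amount IS NOT NULL;

COMMIT;
```

Running the steps in a single transaction means readers of sales_clean never see a half-finished load.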
Amazon Redshift materialized views

Example: the materialized view MV_loc_sales, built from the tables Red-loc and bucket_sales, stores precomputed results:

SF 12.00
NY 10.00

Materialized view benefits

• Auto query rewrite
• Auto view refresh
• Simplify and accelerate ELT operations

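A materialized view along the lines of MV_loc_sales might look like the sketch below. The base tables loc and store_sales and their columns are assumptions, since the slide shows only the view name and two result rows.

```sql
-- Sketch only: base tables and columns are hypothetical.
CREATE MATERIALIZED VIEW mv_loc_sales
AUTO REFRESH YES                      -- let Amazon Redshift refresh the view
AS
SELECT l.loc_name,
       SUM(s.amount) AS total_sales
FROM loc AS l
JOIN store_sales AS s ON s.loc_id = l.loc_id
GROUP BY l.loc_name;

-- A manual refresh is still available when needed.
REFRESH MATERIALIZED VIEW mv_loc_sales;

-- Queries read the precomputed result instead of re-running the join.
SELECT loc_name, total_sales
FROM mv_loc_sales
ORDER BY total_sales DESC;
```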
Transformation best practices

• Load data in bulk
• Perform multiple steps in a single transaction
• Use workload management to improve ELT runtimes
• Copy data from evenly sized, compressed files
• Use Amazon Redshift Spectrum for one-time ELT operations
• Use UNLOAD to extract large result sets

Processing and Optimizing Data

Advanced querying

Advanced querying features

• Scheduled queries
• AWS Lambda user-defined functions (UDFs)
• Stored procedures
• Amazon Redshift ML

Easily query over external tables: Step 1
Create external schema and tables

CREATE EXTERNAL SCHEMA spectrum
FROM data catalog
DATABASE 'salesdb'
IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole'
REGION 'us-east-2';

CREATE EXTERNAL TABLE spectrum.customers(
  customer_key integer,
  first_name varchar(50),
  last_name varchar(50) )
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS textfile
LOCATION 's3://awssampledbuswest2/tickit/spectrum/customers/'
TABLE properties ('numRows'='172000');

Diagram: the spectrum schema exposes the external tables spectrum.customers and spectrum.products alongside the store_sales table.

Easily query over external tables: Step 2
Create a materialized view

CREATE MATERIALIZED VIEW MV.Top_10_products
AS
SELECT TOP 10 p.product_key,
  COUNT(ss.order_key) AS order_total,
  COUNT(c.customer_key) AS customer_total,
  COUNT(DISTINCT c.customer_key) AS unique_customer_total
FROM redshift.store_sales AS ss
JOIN spectrum.products AS p
  ON ss.product_key = p.product_key
JOIN spectrum.customers AS c
  ON ss.customer_key = c.customer_key
GROUP BY p.product_key;

Diagram: MV.Top_10_products is built from spectrum.customers, spectrum.products, and redshift.store_sales.

Easily query over external tables: Step 3
Query tables and views

SELECT (c.first_name + ' ' + c.last_name) AS name,
  SUM(ss.order_profit) AS profit,
  COUNT(t.product_key) AS number_purchased
FROM MV.Top_10_products AS t
JOIN spectrum.customers AS c
  ON t.customer_key = c.customer_key
JOIN redshift.store_sales AS ss
  ON ss.order_key = t.order_key
GROUP BY c.first_name, c.last_name;

Diagram: the query joins MV.Top_10_products with spectrum.customers, spectrum.products, and redshift.store_sales.

Scheduled queries

• Run SQL queries during non-business hours
• Load data with COPY statements every night
• Unload data with UNLOAD nightly or at regular intervals throughout the day

Stored procedures
Group SQL statements together into a transaction
• Data transformation
• Data validation
• Business-specific logic

CREATE PROCEDURE get_result_set(param IN integer, tmp_name INOUT varchar(256))
AS $$
DECLARE row record;
BEGIN
  EXECUTE 'create temp table ' || tmp_name ||
    ' as select * from fact_tbl where id >= ' || param;
END;
$$ LANGUAGE plpgsql;

CALL get_result_set(2, 'myresult');

SELECT * from myresult;
  id   | secondary_id |  name
-------+--------------+---------
 1     | 1            | Joe
 1     | 2            | Ed
 2     | 1            | Mary
 1     | 3            | Mike
(4 rows)

What is Amazon Redshift ML?

1. CREATE MODEL ... FROM (SELECT…) is run in Amazon Redshift
2. Export data: the selected data is exported to an S3 bucket
3. Amazon SageMaker Autopilot: preprocessing and model training
4. Amazon SageMaker Neo: optimize the trained model for the target hardware
5. Import model to the cluster, making it available as a SQL function

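The flow above starts from a single SQL statement. The sketch below shows what that statement and a later prediction query could look like; the table, columns, model name, function name, role ARN, and bucket are all hypothetical.

```sql
-- Sketch only: every name, the role ARN, and the bucket are hypothetical.
CREATE MODEL customer_churn_model
FROM (SELECT age, plan_type, monthly_minutes, churned
      FROM customer_activity
      WHERE signup_date < '2023-01-01')
TARGET churned                           -- column to predict
FUNCTION predict_customer_churn          -- SQL function created for inference
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftMLRole'
SETTINGS (S3_BUCKET 'my-redshift-ml-bucket');

-- After training completes, the model is called like any SQL function.
SELECT customer_id,
       predict_customer_churn(age, plan_type, monthly_minutes) AS churn_prediction
FROM customer_activity
WHERE signup_date >= '2023-01-01';
```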
Secure data sharing across Amazon Redshift clusters

• Direct way to share data
• Instant, granular, and highly performant
• Live, transactionally consistent views
• Secure and governed collaboration

Amazon Redshift data sharing

• Producer cluster with RA3 nodes: reads and writes its private data
• Consumer cluster with RA3 nodes: reads shared data, and reads and writes its own private data
• Shared data lives in Amazon Redshift managed storage in Amazon S3
• Alternative without data sharing: copy data in a custom ETL feed

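The producer and consumer roles can be sketched with the data sharing SQL commands. The datashare name, shared table, database name, and namespace IDs below are placeholders.

```sql
-- Sketch only: names and namespace IDs are placeholders.

-- On the producer cluster: define the share and grant it to the consumer.
CREATE DATASHARE salesshare;
ALTER DATASHARE salesshare ADD SCHEMA public;
ALTER DATASHARE salesshare ADD TABLE public.store_sales;
GRANT USAGE ON DATASHARE salesshare TO NAMESPACE '<consumer-namespace-id>';

-- On the consumer cluster: mount the share as a database and read it live.
CREATE DATABASE sales_db FROM DATASHARE salesshare OF NAMESPACE '<producer-namespace-id>';
SELECT COUNT(*) FROM sales_db.public.store_sales;
```

The consumer reads the producer's data in place, so no copy of store_sales is created on the consumer cluster.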
Processing and Optimizing Data

Resource management

Amazon Redshift workload management (WLM) features

• Parameter groups
• Query queues and concurrency
• Short query acceleration (SQA)
• Auto WLM

Query queues and concurrency

Query queues
• Can be managed automatically with Auto WLM
• Query priorities

Concurrency scaling mode
• Disabled by default
• Concurrency level
• Manual or Auto WLM
• 1-hour credit per day

Targeting queries
• User group
• Query group
• Query monitoring rules

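Targeting by query group can be sketched from the SQL side as follows. The label 'dashboard' is hypothetical and only has an effect if a WLM queue is configured to match it; the sales table is likewise a placeholder.

```sql
-- Sketch only: 'dashboard' must match a query group listed on a WLM queue.
SET query_group TO 'dashboard';

SELECT COUNT(*) FROM sales;   -- routed according to the matching queue

RESET query_group;

-- See which service class (queue) recent queries ran in and how long they waited.
SELECT query, service_class, total_queue_time, total_exec_time
FROM stl_wlm_query
ORDER BY queue_start_time DESC
LIMIT 5;
```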
Short query acceleration (SQA)

• Turbo Boost mode
• Prioritizes short-running queries over long-running queries
• Reduces the need for queues dedicated to short queries
• Enabled by default

Auto WLM with adaptive capacity

Sample queues: work hours
• Queue 1, Priority: High, dashboard users
• Queue 2, Priority: Normal, one-time queries
• Queue 3, Priority: Low, ETL jobs

Amazon Redshift cluster
• Leader node: catalog, query planner, query plan, workload manager
• Workload manager: adaptive concurrency for query processes, predictive ML model for memory
• Compute nodes

Processing and Optimizing Data

Automation and optimization

Automated performance tuning

Automatic table optimization
• Automatic sort keys
• Automatic distribution keys
• Automatic compression encoding

Automatic maintenance
• Automatic table sort
• Automatic vacuum delete
• Automatic analyze

Amazon Redshift Advisor
Recommendations based on performance analysis

Data ingestion
• Speed up COPY
• Skip compression analysis
• Split Amazon S3 objects

Query tuning
• Reallocate WLM memory
• Enable SQA

Table optimization
• Alter distribution keys
• Alter sort keys
• Column encoding

Cost savings
• Delete unused clusters

Optimization automations

Scaling an Amazon Redshift cluster

Elastic resize (Minutes)
• Quickly add or remove nodes from a cluster and change node type
• Can be scheduled

Classic resize (Hours-Days)
• Change size, type, or number of nodes

Snapshot, restore, and resize (Hours-Days)
• Cross-instance restore (snapshot and restore)

Cost optimizations

Cost controls

• Create daily, weekly, and monthly limits
• Up to four limits per feature
• Define actions, such as log to system table, alert, and disable feature
• Automatically generate Amazon CloudWatch alarm

Building Data Analytics Solutions Using Amazon Redshift

Security and Monitoring of Amazon Redshift Clusters

Securing the Amazon Redshift cluster

Four access levels

Amazon Redshift security:
• Cluster management: Redshift cluster permissions managed by IAM
• Cluster connectivity: Redshift cluster access managed through security groups for AWS instances
• Database access: Redshift database user accounts used to control database objects access
• Temporary DB credentials and SSO: Redshift database access through temporary database IAM credentials or single sign-on

Network topology for the Redshift cluster

• Amazon VPC
• Subnet group
• Security group
• Parameter group

Amazon Redshift network architecture

AWS Cloud, VPC spanning Availability Zone 1 and Availability Zone 2, with an Application Load Balancer:
• Public subnet in each Availability Zone: NAT gateway
• Private subnet in each Availability Zone: app instance (1 and 2) and Amazon RDS
• Amazon Redshift cluster in the private subnets, with its security group, cluster subnet group, and parameter group

Database security model

In Amazon Redshift, schemas are similar to operating system directories, except that schemas cannot be nested.

Users can be granted access to a single schema or to multiple schemas.

Diagram: a group contains User A and User B, User C stands outside the group, and a schema contains tables and other objects.

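The group-and-schema model can be sketched in SQL as below. The schema, group, and user names are hypothetical, and the users are assumed to exist already.

```sql
-- Sketch only: names are hypothetical; user_a, user_b, and user_c must already exist.
CREATE SCHEMA qa;
CREATE GROUP qa_readers;

-- Users A and B get access through the group.
ALTER GROUP qa_readers ADD USER user_a, user_b;
GRANT USAGE ON SCHEMA qa TO GROUP qa_readers;
GRANT SELECT ON ALL TABLES IN SCHEMA qa TO GROUP qa_readers;

-- User C is granted access to the schema directly.
GRANT USAGE ON SCHEMA qa TO user_c;
```

USAGE on the schema only lets a user reference objects inside it; reading a table still requires a privilege such as SELECT on that table.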
Database object-level access

GRANT SELECT on TABLE QA.Customers TO Database_User

GRANT ALL on SCHEMA QA TO Database_User

Objects shown for Database_user: Schema FinanceDB.QA, Table QA.Customers, MV QA.TopSales

Column-level permissions for database users

GRANT permissions
GRANT SELECT(cust_name, cust_phone)
on QA.Customers TO Database_User

REVOKE permissions
REVOKE SELECT(cust_name) on
QA.Customers FROM Database_User

Amazon Redshift Spectrum fine-grained permissions

• Data lake: Amazon S3 data bucket, cataloged in the AWS Glue Data Catalog
• AWS Lake Formation: grants permissions on the cataloged data
• IAM: Lake Formation role with policy assigned
• Amazon Redshift cluster: external schema and tables, queried through Amazon Redshift Spectrum

Single sign-on

• Existing infrastructure: BI tools, analytics tools, and SQL clients (SAML plugin); corporate Active Directory and identity providers (IdP)
• AWS Cloud: IAM (SAML), AWS SDK, Data API, Amazon Redshift

Data protection

Data encryption

• Encryption in transit (TLS)
• Encryption at rest

Governance and compliance

Compliance

Security and Monitoring of Amazon Redshift Clusters

Monitoring and troubleshooting Amazon Redshift clusters

Monitoring Amazon Redshift workloads

• Performance metrics
• Amazon Redshift Advisor recommendations
• Alerts from running queries
• Locking issues
• Workload management configuration
• Cluster node hardware performance

Amazon Redshift query monitoring

• Isolate and fix expensive queries
• Identify specific queries with a rich filter feature
• Isolate slow- or long-running queries
• Drill down to query plans and run query statistics
• Correlate query performance with cluster performance

Troubleshooting slow clusters

Factors
• Number of nodes
• Node types
• Data distribution
• Dataset sort order
• Dataset size
• Concurrent operations
• Query structure
• Code compilation

AWS data analytics resources

AWS Well-Architected: Learn, measure, and build using architectural best practices
• https://aws.amazon.com/architecture/well-architected

For data warehouse system architecture, see
• https://docs.aws.amazon.com/redshift/latest/dg/c_high_level_system_architecture.html

For clusters and nodes in Amazon Redshift, see
• https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-clusters.html#rs-about-clusters-and-nodes

For more information about querying a database using the query editor, see
• https://docs.aws.amazon.com/redshift/latest/mgmt/query-editor.html

Additional resources are listed in the slide notes.

Your feedback is important to us!

• Sign in to https://aws.training/Account/Transcript/Archived
  (Your instructor will provide a link if another is to be used.)
• Choose My Account, and then Transcript
• Find this course: Building Data Analytics Solutions Using Amazon Redshift
• Choose Evaluate

Thank you!
Parvesh Chopra
choprapa@amazon.com

© 2021 Amazon Web Services, Inc. or its affiliates. All rights reserved. This work may not be reproduced or redistributed, in whole or in part, without prior
written permission from Amazon Web Services, Inc. Commercial copying, lending, or selling is prohibited. Corrections, feedback, or other questions? Contact us
at https://support.aws.amazon.com/#/contacts/aws-training. All trademarks are the property of their owners.
