Tuan Vo
Solutions Architect
mintuan@amazon.com
© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
To Become a Leader, Data is Your Differentiator
For Data to Be a Differentiator, Customers Need to Be Able to…
New types of analytics
Traditionally, Analytics Used to Look Like This
• TBs–PBs scale
Data Lakes Extend the Traditional Approach
Business Intelligence
Big Data processing, real-time, Machine Learning
• TBs–EBs scale
Data Lakes and Analytics from AWS
Secure
Data Lake
on AWS
Cost-effective
On-premises Data Movement
Real-time Data Movement
Store Data in the Format You Want
Open and comprehensive
• Store data in the format you want:
• Text files like CSV
• Columnar like Apache Parquet, and Apache ORC
• Logstash like Grok
• JSON (simple, nested), AVRO
• And more…
Formats: CSV, ORC, Grok, Avro, Parquet, JSON
Services: Amazon S3, Amazon Glacier, AWS Glue
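To make the difference between row-oriented text formats and columnar formats concrete, here is a minimal sketch using only the Python standard library. The sample records and column names are hypothetical, and the "columnar" result is only an in-memory illustration of the idea behind Parquet and ORC, not either file format.

```python
import csv
import io
import json

# Hypothetical sample data; the column names are made up for illustration.
CSV_TEXT = "user_id,country,amount\n1,VN,10.5\n2,US,7.0\n3,VN,3.25\n"


def csv_to_rows(text):
    """Parse CSV text into a list of dicts, one per record (row-oriented)."""
    return list(csv.DictReader(io.StringIO(text)))


def rows_to_json_lines(rows):
    """Serialize each record as one JSON object per line."""
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows)


def rows_to_columns(rows):
    """Pivot records into one list per column (the idea behind columnar formats)."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


rows = csv_to_rows(CSV_TEXT)
json_lines = rows_to_json_lines(rows)
columns = rows_to_columns(rows)
```

A query that only needs `country` can read `columns["country"]` without touching the other fields, which is the main reason columnar layouts tend to suit analytics.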
Analyze with the Broadest Set of Analytic Tools
Open and comprehensive
• Analyze data with the broadest selection of analytics tools
• Data warehousing
• Interactive SQL queries
• Big Data processing
• Real-time analytics
• Dashboards & Visualizations
• Machine Learning
• Query in place without moving to a separate analytics system
• Up to 400% faster with S3 Select and Glacier Select
• Largest ISV ecosystem with built-in integration
• Ensures you can meet existing and future use cases, minimizing risks
Machine Learning: Amazon SageMaker, AWS Deep Learning AMIs, Amazon Rekognition, Amazon Lex, AWS DeepLens, Amazon Comprehend, Amazon Translate, Amazon Transcribe, Amazon Polly
Analytics: Amazon Athena, Amazon EMR, Amazon Redshift, Amazon Elasticsearch Service, Amazon Kinesis, Amazon QuickSight
Amazon S3, Amazon Glacier, AWS Glue
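"Query in place" with S3 Select means sending a SQL expression to S3 so that only the matching subset of an object comes back. The sketch below only builds the request parameters for boto3's `select_object_content` call and does not contact AWS. The bucket name, object key, column names, and the assumption that the object is a CSV file with a header row are all hypothetical.

```python
def build_s3_select_request(bucket, key, sql):
    """Return keyword arguments for the S3 client's select_object_content call."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": sql,
        # Assumption: the stored object is CSV with a header row.
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
        # Ask S3 to return the matching records as JSON.
        "OutputSerialization": {"JSON": {}},
    }


# Hypothetical bucket, key, and columns, used only for illustration.
request = build_s3_select_request(
    "example-data-lake",
    "sales/2020/01.csv",
    "SELECT s.country, s.amount FROM S3Object s WHERE s.country = 'VN'",
)

# With boto3 installed and AWS credentials configured, it could be sent as:
#   import boto3
#   response = boto3.client("s3").select_object_content(**request)
```

Keeping the request as a plain dictionary makes it easy to inspect or log before any data is read.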
Data Lakes from AWS
Secure
Data Lake
on AWS
AWS Provides Highest Levels of Security
Secure
Customers need multiple levels of security, identity and access management,
encryption, and compliance to secure their data lake
Secure
Data Lake
on AWS
Any Scale
Scalable and durable
Unmatched Durability and Availability
Scalable and durable
Data Lakes from AWS
Secure
Data Lake
on AWS
Tiered Storage to Optimize Price/Performance
Lowest Cost
Pay Only for the Resources You Use as You Scale
Lowest Cost
• Pay-as-you-go for the resources you consume
Traditional approach leads to wasted capacity
• Unmet demand: upset players, missed revenue
• Excess capacity: wasted $$$
AWS: Elastic
[Charts: servers over time, capacity vs. demand]
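The contrast between fixed and elastic capacity can be worked through with a small calculation. This is a sketch with made-up numbers: the hourly demand figures and the fixed capacity of 70 servers are hypothetical, and cost is counted in server-hours rather than any real price.

```python
def fixed_capacity_outcome(demand, capacity):
    """Return (wasted, unmet) server-hours for a fixed capacity level."""
    wasted = sum(max(capacity - d, 0) for d in demand)
    unmet = sum(max(d - capacity, 0) for d in demand)
    return wasted, unmet


# Hypothetical demand: servers needed in each of six consecutive hours.
demand = [20, 35, 80, 120, 60, 25]
fixed = 70

wasted, unmet = fixed_capacity_outcome(demand, fixed)
fixed_paid = fixed * len(demand)   # server-hours paid for with fixed capacity
elastic_paid = sum(demand)         # server-hours paid for when capacity follows demand
```

With these numbers the fixed approach pays for 420 server-hours, wastes 140 of them, and still leaves 60 server-hours of demand unmet, while the elastic approach pays for 340 and serves all of it.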
Lowest Total Cost of Ownership (TCO)
Cost-effective
• Less admin time to manage, and support
• No up-front costs: hardware acquisition, installation
• Save on operating costs: data center space, power, cooling
• Business value: cost of delays, risk premium, competitive abilities, governance, etc.
On-premises costs:
• Licensing Fees
• Support Costs
• Server Costs: Hardware (Server, Rack, Chassis, PDUs, ToR Switches, +Maintenance); Software (OS, Virtualization Licenses, +Maintenance)
• Network Costs: Network Hardware (LAN Switches, Load Balancer), Bandwidth costs; Software (Network Monitoring)
• IT Labor Costs: Server admin, virtualization admin, storage admin, network admin, support team
• Extras: Project planning, advisors, legal, contractors, managed services, training, cost of capital
AWS costs:
• Subscription Fee
• Support Costs
More Data Lakes & Analytics on AWS than Anywhere Else
Reference architecture: Data lake on AWS
Catalog and search: AWS Glue, Amazon DynamoDB, Amazon ES
Access and user interface: Amazon API Gateway, IAM, Amazon Cognito
Amazon RDS
On-premises data
Streaming data
Amazon S3
Amazon S3—Object Storage
• Built for eleven nines of durability; data distributed across 3 physical facilities in an AWS region; can be automatically replicated to any other AWS region
• Three different forms of encryption; encrypts data in transit when replicating across regions; log and monitor with CloudTrail, use ML to discover and protect sensitive data with Macie
• Run analytics & ML on data lake without data movement; S3 Select can retrieve subset of data, improving analytics performance by up to 400%
• Classify, report, and visualize data usage trends; objects can be tagged to see storage consumption, cost, and security; build lifecycle policies to automate tiering and retention
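Lifecycle policies that automate tiering and retention are expressed as rules on a bucket. The sketch below builds one such rule in the shape accepted by boto3's `put_bucket_lifecycle_configuration` and does not contact AWS. The `logs/` prefix, the day counts, the rule ID format, and the bucket name in the comment are hypothetical choices for illustration.

```python
def build_lifecycle_rule(prefix, ia_after_days, glacier_after_days, expire_after_days):
    """Build one lifecycle rule: tier to cheaper storage, then expire."""
    if not ia_after_days < glacier_after_days < expire_after_days:
        raise ValueError("days must increase: infrequent access < Glacier < expiration")
    return {
        "ID": f"tier-and-expire-{prefix.strip('/')}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": expire_after_days},
    }


# Hypothetical policy: log objects tier down after 30 and 90 days, expire after a year.
lifecycle = {"Rules": [build_lifecycle_rule("logs/", 30, 90, 365)]}

# With boto3 installed and AWS credentials configured, it could be applied as:
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-data-lake", LifecycleConfiguration=lifecycle)
```

The ordering check is a convenience of this sketch; it catches a rule that would try to expire objects before they are tiered.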
Amazon Glacier—Backup and Archive
• Built for eleven nines of durability; data distributed across 3 physical facilities in an AWS region; can be automatically replicated to any other AWS region
• Three retrieval options to fit your use case; expedited retrievals with Glacier Select can return data in minutes
• Log and monitor with CloudTrail; Vault Lock enables WORM storage capabilities, helping satisfy compliance requirements
• Lowest cost AWS object storage class, allowing you to archive large amounts of data at a very low cost
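"Eleven nines" of durability (99.999999999%) is easier to grasp as an expected loss rate. The sketch below treats it as an average annual per-object loss probability of 1 in 10^11, which is an interpretation assumed here for the arithmetic. The object count of 10 million is a hypothetical example.

```python
from fractions import Fraction


def annual_loss_rate(nines):
    """Annual per-object loss probability for a durability of `nines` nines."""
    return Fraction(1, 10 ** nines)


def expected_years_per_lost_object(object_count, nines=11):
    """Average number of years between single-object losses."""
    expected_losses_per_year = object_count * annual_loss_rate(nines)
    return 1 / expected_losses_per_year


# Hypothetical data lake holding 10 million objects.
years = expected_years_per_lost_object(10_000_000)
```

Under this reading, storing 10 million objects corresponds to losing one object on average every 10,000 years. Exact fractions are used so the result is not affected by floating-point rounding.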
AWS Glue
Storing is Not Enough, Data Needs to Be Discoverable
”
and direct monetizing).
Gartner IT Glossary, 2018
https://www.gartner.com/it-glossary/dark-data
CRM, ERP, Data warehouse, Mainframe data, Web, Social, Log files, Machine data, Semi-structured, Unstructured
AWS Glue—Data Catalog
Make data discoverable
Glue Data Catalog
Discover data and extract schema
• Automatically discovers data and stores schema
• Catalog makes data searchable, and available for ETL
• Catalog contains table and job definitions
Compliance
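To show what "discovers data and stores schema" can mean in practice, here is a toy schema inference over a CSV sample. It is a simplified illustration and not the algorithm AWS Glue uses. The sample data and the three type names (`int`, `double`, `string`) are assumptions made for this sketch.

```python
import csv
import io


def infer_type(values):
    """Pick the narrowest of int, double, string that fits every value."""
    for type_name, caster in (("int", int), ("double", float)):
        try:
            for value in values:
                caster(value)
            return type_name
        except ValueError:
            continue
    return "string"


def infer_schema(csv_text):
    """Return {column name: inferred type} for CSV text with a header row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {}
    return {name: infer_type([row[name] for row in rows]) for name in rows[0]}


# Hypothetical sample file contents.
SAMPLE = "id,price,city\n1,9.99,Hanoi\n2,12,Hue\n"
schema = infer_schema(SAMPLE)
```

The resulting dictionary is the kind of table definition a catalog would store so that other tools can find and query the data.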
Data Catalog: Crawlers
Built-in classifiers for popular types; custom classifiers using Grok expressions
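A Grok expression names reusable patterns and the fields they should capture, for example `%{IP:client}`. The sketch below expands such an expression into a Python regular expression with named groups. The four-entry pattern library is a simplified, hypothetical stand-in for the much larger set that real Grok provides, and the log line is made up.

```python
import re

# A tiny, hypothetical pattern library; real Grok ships many more patterns.
GROK_PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\w+",
    "NUMBER": r"\d+(?:\.\d+)?",
    "URIPATH": r"/\S*",
}


def grok_to_regex(grok):
    """Expand %{PATTERN:field} references into named regex groups."""
    def expand(match):
        pattern, field = match.group(1), match.group(2)
        return f"(?P<{field}>{GROK_PATTERNS[pattern]})"

    return re.compile(re.sub(r"%\{(\w+):(\w+)\}", expand, grok))


line_pattern = grok_to_regex(
    "%{IP:client} %{WORD:method} %{URIPATH:path} %{NUMBER:status}"
)
fields = line_pattern.match("10.0.0.7 GET /index.html 200").groupdict()
```

Each captured field becomes a column candidate, which is how a custom classifier can turn free-form log lines into a table schema.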
AWS Glue Data Catalog
Bring in metadata from a variety of data sources (Amazon S3, Amazon Redshift, etc.) into a single categorized
list that is searchable
Data Catalog: Table details
Table properties
Nested fields
Data statistics
Table schema
Data Catalog: Version control
Compare schema versions List of table versions
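Comparing two schema versions comes down to finding added, removed, and retyped columns. The following is a minimal sketch of that comparison over plain dictionaries. The two table versions and their column types are hypothetical, and this is not how the Data Catalog console computes its comparison.

```python
def diff_schemas(old, new):
    """Compare two {column: type} schemas and report what changed."""
    added = {col: typ for col, typ in new.items() if col not in old}
    removed = {col: typ for col, typ in old.items() if col not in new}
    changed = {
        col: (old[col], new[col])
        for col in old
        if col in new and old[col] != new[col]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical versions of the same table.
v1 = {"id": "int", "price": "double", "city": "string"}
v2 = {"id": "bigint", "price": "double", "country": "string"}
changes = diff_schemas(v1, v2)
```

Here `changes` reports `country` as added, `city` as removed, and `id` as changed from `int` to `bigint`.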
AWS Glue—ETL Service
Make ETL scripting and deployment easy
• Serverless
AWS Glue DataBrew (NEW)
Amazon Athena
Amazon Athena
Example Query
Amazon Athena: ETL & Query Use Case
Amazon QuickSight
Create Beautiful, Interactive
Dashboards
ML (Machine Learning) Insights
Cutting edge ML tools that automatically discover powerful insights for your users.
• Anomaly Detection
• Forecasting
• Bring your own model from
Amazon SageMaker
• Auto-generated natural language
narratives
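As a rough illustration of what anomaly detection on a metric looks like, the sketch below flags points that sit far from the mean of a series. It uses a plain z-score rule as a stand-in and is not the algorithm QuickSight uses. The daily values and the threshold of 2.0 are hypothetical.

```python
import statistics


def find_anomalies(values, threshold=2.0):
    """Return indexes of values more than `threshold` std devs from the mean."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []  # A constant series has no outliers.
    return [
        index
        for index, value in enumerate(values)
        if abs(value - mean) / spread > threshold
    ]


# Hypothetical daily order counts with one obvious spike.
daily_orders = [100, 102, 98, 101, 99, 180, 100]
anomalies = find_anomalies(daily_orders)
```

Only the spike at index 5 is flagged. A production detector would also need to handle trend and seasonality, which this sketch ignores.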
*currently in preview
THANK YOU