
AWS DevOps Engineer - Professional

Created by Leticia Massae
March/2022
Table of Contents

Encryption
Encryption Approaches
Encryption Context
Symmetric Encryption
Asymmetric Encryption
Signing
Steganography

Distributed Denial of Service (DDoS)
Application Layer - HTTP Flood
Protocol Attack - SYN Flood
Volumetric - DNS Amplification

Secure Sockets Layer (SSL) and Transport Layer Security (TLS)
1 - Cipher Suites
2 - Authentication
3 - Key Exchange

DNS
DNS - Basic
DNS Zone
DNS - Recall
DNS - Root
DNS - Hierarchy
DNS - Resolution
DNS - Remember

Route53 Fundamentals
Route53 - Product Basics
Route53 - Register Domains
Route53 - Hosted Zones

DNS Record Types
Nameserver (NS)
A and AAAA Records
CNAME Records
MX Records
TXT Records
TTL - Time To Live

IAM Users and ARNs
IAM Users

When to use IAM Roles
Roles for services
Roles for Emergency or out of the usual situations
Roles for SSO or > 5k identities
Roles for Mobile Applications
Roles for Cross-Account Access

CloudFormation Physical & Logical Resources
CloudFormation

Elastic Beanstalk Architecture
Elastic Beanstalk(EB) - Platforms
Elastic Beanstalk(EB) - Summary

Elastic Beanstalk Deployment Policies
EB Deployment - All at once
EB Deployment - Rolling
EB Deployment - Rolling with Additional batches
EB Deployment - Immutable
EB Deployment - Traffic Splitting
EB Deployment - Blue/Green

Elastic Beanstalk and RDS

Advanced Customisation via dockerrun
Customizing via dockerrun

Advanced Customisation via .ebextensions
Customizing via .ebextensions

Elastic Beanstalk and HTTPS

Elastic Beanstalk Environment Cloning
EB Cloning

EB and Docker
EB and Docker - Single Container
EB and Docker - Multi-Container

Lambda Versions
Lambda Aliases
Lambda Environment Variables

Monitoring & Logging & Tracing Lambda Based Applications
Lambda Monitoring
Lambda Logging
Lambda Tracing

VPC Lambda + EFS
Lambda Layers
Lambda Container Images
Lambda & ALB Integration
Lambda and ALB - Multi-Value Headers
Lambda Resource Policy

API Gateway
API Gateway - Refresher
API Gateway - Authentication
API Gateway - Endpoint Types
API Gateway - Stages
API Gateway - Errors
API Gateway - Caching

API Gateway Methods and Resources
API Gateway - Integrations
API Gateway - Overview
API Gateway - Integrations
API Gateway - Mapping Template

API Gateway Stages and Deployments
API Gateway - Stage Variables

API Gateway - OpenAPI and Swagger
Swagger and OpenAPI

Simple Notification Service

Simple Queue Service
SNS and SQS Fanout
Simple Queue Service (SQS) - Limitations
SQS Standard vs FIFO Queue
SQS Extended Client Library
SQS Delay Queue
SQS vs SQS Delay Queue - Process
SQS Dead-Letter Queues

Step Functions
Some problems with Lambda
State Machines

Introduction to Containers
Virtualisation Problems
Containerization
Image Anatomy
Container Anatomy
Container Registry
Container Key Concepts

ECS - Concepts
ECS
ECS Concepts

ECS - Cluster Mode
ECS - EC2 Mode
ECS - Fargate Mode
EC2 vs ECS(EC2) vs Fargate

OpsWorks Stacks
OpsWorks Stacks - Server Instances

OpsWorks Lifecycle Events
OpsWorks Stacks Lifecycle Events

OpsWorks Auto Healing & CloudWatch Events
OpsWorks Auto Healing

SSM Parameter Store
Parameters

AWS Secrets Manager
About

Security Token Service (STS)

Permissions Boundaries & Use Cases
Delegation Problems
IAM Administrator Job Role

AWS Permissions Evaluation
Policy Evaluation Logic - Same Account
Policy Evaluation Logic - Different Account

R53 Public Hosted Zones
R53 Hosted Zone
R53 Public Hosted Zones

R53 Private Hosted Zones
R53 Split View Hosted Zones

R53 CNAME vs ALIAS
R53 CNAME
R53 ALIAS

R53 Simple Routing
R53 Health Checks
R53 Failover Routing
R53 Multi Value Routing
R53 Weighted Routing
R53 Latency Routing
R53 Geolocation Routing
R53 Geoproximity Routing

R53 Interoperability
R53 - Both Roles
R53 - Registrar Only
R53 - Hosting Only

CI/CD in AWS

CodePipeline
CodePipeline - Basic

CodeBuild
CodeBuild - Basic
CodeBuild - Architecture

CodeDeploy
CodeDeploy - Basic
CodeDeploy - Configuration

Elastic Container Registry (ECR)
ECR - High-Level Architecture
ECR - Benefits

Jenkins
Jenkins Architecture
Jenkins on AWS
Jenkins with CodePipeline

CloudWatch
CloudWatch - Architecture Concepts
CloudWatch - Data
CloudWatch - Alarms
CloudWatch - Data Architecture

CloudWatch Logs
CloudWatch Logs - Ingestion
CloudWatch Logs - Subscriptions
CloudWatch Logs - Aggregation
CloudWatch Logs - Summary

Athena
Athena - Basic

Kinesis Data Streams
Kinesis Data Streams - Concepts
Kinesis Data Streams - Architecture
SQS vs Kinesis

Kinesis Data Firehose
Kinesis Data Firehose - Architecture

Kinesis Data Analytics
Kinesis Data Analytics - Architecture
Kinesis Data Analytics - When and Where

MapReduce

EMR Architecture

Amazon Redshift
Redshift - Basic
Redshift Architecture

Amazon QuickSight
Amazon QuickSight - Sources

Amazon Athena
Amazon Athena - Basic
Amazon Athena - Exam

SSM Architecture and Agent Activation
AWS Systems Manager
AWS Systems Manager - Agent

SSM Run Command
AWS Run Command

SSM Documents

SSM Inventory & SSM Patching
Patch Manager - Concepts
Patch Manager - Patching Architecture

AWS Config
AWS Config - Architecture

AWS Service Catalog
What is a Service Catalog
AWS Service Catalog - Architecture

AWS Inspector
Inspector - Basic
Inspector - Rules Packages
Inspector - Host Packages

AWS GuardDuty

Amazon Macie

Trusted Advisor
Trusted Advisor - Basic Price Plans
Trusted Advisor - Best Price Plans

AWS Cost Allocation
AWS Cost Allocation Tags

High-Availability vs Fault-Tolerance vs Disaster Recovery
High-Availability (HA)
Fault-Tolerance (FT)
Disaster Recovery (DR)
Summary
DR Tips
DR Checklist

Elastic Block Store (EBS) Service Architecture
Elastic Block Store (EBS)

EBS Volume Types - General Purpose
EBS - General Purpose SSD - GP2
EBS - General Purpose SSD - GP3

EBS Volume Types - Provisioned IOPS
EBS - Provisioned IOPS SSD (io1/io2)

EBS Volume Types - HDD-Based
EBS - HDD-Based

Instance Store Volumes - Architecture
Instance Store Volumes

Storage Gateway - Volume Gateway
Storage Gateway - Basic
Storage Gateway - Volume Stored
Storage Gateway - Volume Cached

Storage Gateway - Tape Gateway (VTL)
Storage Gateway Tape (VTL) - Basic
Traditional Tape Backup
Storage Gateway Tape (VTL)

Storage Gateway - File Gateway
Storage Gateway - File
Bridged on-premises file storage and S3
Storage Gateway - Multiple Contributors
Storage Gateway - Multiple Contributors and Replication
Storage Gateway - File Gateway

S3 Security (Resource Policies & ACLs)
S3 Bucket Policies
Access Control Lists (ACLs)
Block Public Access
S3 Security - Exam

S3 Static Hosting
Static Website Hosting

S3 Object Versioning & MFA
Object Versioning
MFA Delete

S3 Object Encryption
S3 Encryption
SSE-C
SSE-S3 (AES256)
SSE-KMS
Summary
Bucket Default Encryption

S3 Object Storage Classes
S3 Standard
S3 Standard-IA
S3 One Zone-IA
S3 Glacier
S3 Glacier Deep Archive
S3 Intelligent-Tiering

S3 Lifecycle Configuration
S3 Lifecycle Configuration - Basic
S3 Lifecycle Configuration - Transitions

S3 Presigned URL

S3 Select & Glacier Select

Cross-Origin Resource Sharing (CORS)

S3 Events
S3 Event Notifications

S3 Access Logs

S3 Object Lock
S3 Object Lock - Retention
S3 Object Lock - Legal Hold

S3 Access Points

Elastic File System (EFS) Architecture
EFS - Architecture

FSx for Windows File Server
FSx - Key Features and Benefits

FSx for Lustre

CloudFront Architecture
CloudFront - Basic
CloudFront Terms
CloudFront Architecture

TTL and Invalidations

AWS Certificate Manager
AWS Certificate Manager (ACM) - Basic
AWS Certificate Manager (ACM)

CloudFront, SSL/TLS & SNI
CloudFront and SSL
CloudFront and SNI
CloudFront and SSL/SNI

CloudFront Security - OAI & Custom Origins
Securing the CF Content Delivery Path
Origin Access Identity (OAI)
Securing Custom Origins

Lambda@Edge
Lambda@Edge - Use Cases

CloudFront Geo-Restriction
3rd Party Geolocation (or anything!)

DynamoDB Architecture Basics
DynamoDB
DynamoDB Tables
DynamoDB On-Demand Backups
Point-in-Time Recovery (PITR)
DynamoDB Considerations

DynamoDB Operations, Consistency and Performance
Reading and Writing
Query
Scan
DynamoDB Consistency Model
WCU Calculation
RCU Calculation

DynamoDB Indexes (LSI and GSI)
DynamoDB Indexes
Local Secondary Indexes (LSI)
Global Secondary Indexes (GSI)
LSI and GSI Considerations

DynamoDB Streams and Triggers
Streams Concepts
DynamoDB Streams
Trigger Concepts
DynamoDB Triggers

DynamoDB Accelerator
Traditional Caches vs DAX
DAX Architecture
DAX Considerations

DynamoDB Global Tables

DynamoDB Time-To-Live (TTL)

Types of DR - Cold, Warm, Pilot Light
DR / BC Architecture
DR / BC - Backup & Restore
DR / BC - Pilot Light
DR / BC - Warm Standby
DR / BC - Multi-Site - Active/Active
DR / BC - Architecture

DR Architecture - Storage
DR Architecture - Compute
DR Architecture - Database
DR Architecture - Global Databases
DR Architecture - Networking
DR Architecture - Global Networking

EC2 - Launch Configuration and Templates
Launch Configuration (LC) and Launch Template (LT) Key Concepts
LC and LT - Architecture

Auto-Scaling Groups
ASG - Basic
ASG - Architecture
ASG - Policies
ASG - Health
ASG - ASG + Load Balancers
ASG - Scaling Processes
ASG - Final Points

ASG Scaling Policies
ASG - Scaling Policies
ASG - Simple Scaling
ASG - Step Scaling

ASG Lifecycle Hooks
ASG Lifecycle Hooks - Basics
Custom actions on instances during ASG actions
.. instance launch or instance terminate transitions
Instances are paused within the flow .. they wait
.. until a timeout (then either CONTINUE or ABANDON)
.. or you resume the ASG process (CompleteLifecycleAction)
EventBridge or SNS Notification

ASG Health Checks - EC2 vs ALB
ASG - Health Checks

Load Balancing Evolution
ELB - Evolution
ELB Architecture
ELB - Architecture
ELB - Cross-Zone Load Balancing
ELB - About the Architecture

ALB vs NLB
Load Balancer Consolidation
Application Load Balancer (ALB)
Application Load Balancer (ALB) - Rules
Network Load Balancer (NLB)
ALB vs NLB

Session State
Session State - What It Is
Session State - State Matters

Session Stickiness
Connection Stickiness
Connection Stickiness - Key Points
ALB - Session Stickiness

Gateway Load Balancer (GWLB)
GWLB - Basic
GWLB - How It Works
GWLB - Architecture

Connection Draining
Deregistration Delay

X-Forwarded-For & Proxy Protocol
Why? X-Forwarded-For & Proxy Protocol
X-Forwarded-For
Proxy Protocol
Useful Links

https://learn.cantrill.io/
https://aws.amazon.com/pt/certification/certified-devops-engineer-professional/
https://aws.amazon.com/blogs/aws/aws-well-architected-framework-updated-white-papers-tools-and-best-practices/
Exam guide:
https://d1.awsstatic.com/training-and-certification/docs-devops-pro/AWS-Certified-DevOps-Engineer-Professional_Exam-Guide.pdf
Exam question examples:
https://d1.awsstatic.com/training-and-certification/docs-devops-pro/AWS-Certified-DevOps-Engineer-Professional_Sample-Questions.pdf
Exam readiness:
https://aws.amazon.com/pt/training/classroom/exam-readiness-aws-certified-devops-engineer-professional/
Well-architected framework whitepaper: http://bit.ly/1VH5DNs
AWS Best Cloud Practices: http://bit.ly/2ziWJWp
https://app.linuxacademy.com/
https://www.whizlabs.com/

Technical Fundamentals

Encryption

Encryption Approaches

● Encryption At Rest
● Encryption In Transit

Encryption Context
● Plaintext
○ Is unencrypted data
○ It can be text, but it doesn't have to be
○ Can be documents, images, or even applications
○ Is data that you can load into an application and use, or can load and immediately read
● Algorithm
○ Is a piece of code, or more specifically a piece of maths, which takes plaintext and an encryption key and generates encrypted data
○ Examples: Blowfish, AES, RC4, DES, RC5 and RC6
○ When an algorithm is being used, it needs the plaintext and it needs a key
● Key
○ A key at its simplest is a password, but it can be much more complex
○ When an algorithm takes plaintext and the key, the output that it generates is ciphertext
● Ciphertext
○ Just like plaintext, ciphertext isn't always text data
○ Is just encrypted data
○ The relationship between all of these: encryption takes plaintext, uses an algorithm and a key, and uses those to create ciphertext

Decryption is just the reverse: it takes ciphertext and a key, and it generates plaintext

Symmetric Encryption
● A symmetric encryption algorithm is used and this accepts the key and the plaintext
● Once both of those are accepted, it performs ancryption and it outputs ciphertext, the
encryped info
● The encrypted info is now secure because they are ciphertext and nobody can
decipher without the key
● They can be sent over any transmission method, even an insecure way to the end
receiver
● The encryption removes the risk of transmitting this data in the open
● So even if we handed the ciphertext overto an untrusted party and asked for him to
deliver that to the end receiver, that would still be safe because the cuphertext is
undecipherable without the key
● But the end receiver doen’t have the key wich was used to encrypt the data
● With symmetric key encryption, the same key is used for both the encryption and
decrypt processes
● It gets tricky beacause transfer this key electronically would not be secure and could
be intercepted by third parties
● This is why symmetric encryption is great for things like laptops, but no useful for
situations where the data needs to be transfered between two remote parties,
because arranging the transit of the key is the problem
● Generally this is done in advance so there is no delay in decrypting the data
● If the data that we’re transferring is time sensitive, the transit of the encryption key
needs to happen in advance
● And that is the most complex part of this method of encryption
● If we did have a way to transfer the key securely, then the same algorithm would
decrypt the data using the key and the ciphertext, and then would return the original
message that was sent
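A minimal Python sketch of that symmetric round-trip, assuming the third-party cryptography package (any AES-based library behaves similarly); note the same key object performs both encryption and decryption:

```python
# pip install cryptography
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # the shared secret both parties must hold
cipher = Fernet(key)

# plaintext + key -> ciphertext; safe to send over an insecure channel
ciphertext = cipher.encrypt(b"time sensitive message")

# the same key reverses the process
plaintext = cipher.decrypt(ciphertext)
assert plaintext == b"time sensitive message"
```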
Asymmetric Encryption
It makes it much easier to exchange keys, because the keys used in asymmetric encryption are themselves asymmetric
● To use asymmetric encryption, the first stage is for the sender and receiver to agree on an asymmetric algorithm to use, and then create encryption keys for the algorithm, which logically enough will be asymmetric encryption keys
● Asymmetric encryption keys are formed of two parts, a public key and a private key
● For both sides to be able to send and receive to each other, both sides would need to make both public and private keys
● In the one-way scenario described here, only the receiver needs to generate any keys
● A public key can be used to generate ciphertext, which can only be decrypted by the corresponding private key
● The public key cannot decrypt data that it was used to encrypt
● Only the private key can decrypt that data
● This means that the private key needs to be guarded really carefully, because it's what's used to decrypt data
● If it leaks, the receiver could be compromised
● The public key is only used to encrypt
● So the receiver can upload his public key to his cloud storage so that anyone can access it
● The worst thing that could happen to anyone who obtains the receiver's public key is that he or she could use it to encrypt plaintext into ciphertext that only the receiver could decrypt
● So there is no downside to anyone getting hold of the receiver's public key
● So with asymmetric encryption, there is no requirement to exchange keys in advance
● As long as the receiver uploaded his public key to somewhere that was accessible to the world, the first step would be for the sender to download the receiver's public key
● Using the receiver's public key and the plaintext, the asymmetric algorithm would generate some ciphertext
● The ciphertext can then be transmitted to the receiver, and only once received can the data be decrypted
● The receiver already has his private key, so he provides that private key and the ciphertext to the algorithm, which decrypts the ciphertext back into plaintext, and then the receiver has a copy of the original message
● Asymmetric encryption is generally used where two or more parties are involved
● And generally when those parties have never physically met before
● Asymmetric encryption is computationally much more difficult to do than symmetric
● And so many processes use asymmetric encryption to initially agree and communicate a symmetric key, and then the symmetric key is used for communication between those two parties from that point onward
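A hedged Python sketch of the same flow with the cryptography package (RSA with OAEP padding is one common choice, not the only one): only the private key can recover what the public key encrypted.

```python
# pip install cryptography
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# Receiver generates the key pair and publishes only the public half
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

# Sender: downloads the public key and encrypts
ciphertext = public_key.encrypt(b"hello receiver", oaep)

# Receiver: only the private key can decrypt
assert private_key.decrypt(ciphertext, oaep) == b"hello receiver"
```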

Signing
Generally used for ID verification and certain log on systems
● This requires that both sides to operate as one, and so the sender needs to know
that the receiver has received the message and that agreed with them
● The receiver might want to respond with a simple okay message
● So the message will be sent to the sender with the okay message
● The issue is that anyone can encrypt the message to another party using asymmetric
encryption
● Anyone could get a hold of the sender public key and encrypt a message saying
okay and send it to the sender and the sender wouldn’t necessarily be aware
whether that was from the receiver or not
● Encryption does not prove identity
● But for this, we can use the signing process
● With the signing, the receiver could write this okay message, and then he could take
the message and using his private key, he can sign that message
● Then, that message can be sent across to the sender and when the sender receives
that message, he can use the receiver public key to prove whether that message was
signed using the receiver‘s private key
● So key signing is generally used for ID verification and certain log on systems
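A short Python sketch of signing (again the cryptography package, with RSA-PSS assumed for illustration): sign with the private key, verify with the public key; verify raises an exception if the signature or message was tampered with.

```python
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes

pss = padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                  salt_length=padding.PSS.MAX_LENGTH)

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
message = b"okay"

# Receiver signs with the PRIVATE key
signature = private_key.sign(message, pss, hashes.SHA256())

# Sender verifies with the receiver's PUBLIC key;
# raises InvalidSignature if the message did not come from the receiver
private_key.public_key().verify(signature, message, pss, hashes.SHA256())
```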
Steganography
Method of hiding something in something else
● With steganography, the sender could generate some ciphertext and hide it in a
puppy image
● The image could be delivered to the receiver who knows to expect the image with
some data inside and then extract the data
● To anyone else, it would just look like a puppy image and everybody knows there’s
no way that the sender would send the receiver a puppy image, so there is plausible
deniability
● The effect of steganography might be a slightly larger file but it would look almost
identical
● Effective steganography algorithms make it almost impossible to find the hidden
data unless you know a certain key, a number, or a pattern
● Steganography is just another layer of protection
● The steganography algorithm would take the original picture, select the required
number of pixels, adjust those pixels by a certain range of values and what it would
generate as an output would be an almost identical puppy image, but hidden there
would be slight changes
● It allows you to embed data in another piece of data
● To be really secure, the sender would encrypt some data using the receiver's public key, take that ciphertext, use steganography to embed it in an image that wouldn't be tied back to the sender, and send this image to the receiver; the receiver could then use steganography to extract the piece of ciphertext and decrypt it using his private key
● The same process could be followed in reverse to signal an okay, but the receiver, in addition to encrypting that okay, would also sign it
● So the sender would know that it came from the receiver
Distributed Denial of Service (DDoS)

● Attacks designed to overload websites


● Compete against ‘legitimate connections’
● Distributed - hard to block individual IPs/Ranges

Application Layer - HTTP Flood


Takes advantage of the imbalance of processing between client and server
Protocol Attack - SYN Flood
Normally the connection is initiated via a three-way handshake

● Spoof a source IP address and initiate the connection attempt with a server
● The server tries to perform step two of the handshake, but it can’t contact the source
address because it’s spoofed
● In general, it hangs in this step waiting for a specified duration, and this consumes
network resources

Volumetric - DNS Amplification


Relies on how certain protocols, such as DNS, only take small amounts of data to make the
request, such as DNS resolution, but in response to that, they can deliver a large amount of
data
● One attack of this nature might make a large number of independent requests to DNS servers, where the source address is spoofed to the actual IP address of our website
● The DNS servers, potentially hundreds or thousands of them, respond to what they see as legitimate requests and overwhelm the service

DDoS attacks often involve large armies of compromised machines (botnets), e.g. user laptops or desktops infected with malware

Secure Sockets Layer (SSL) and Transport Layer Security (TLS)


● Provides privacy and data integrity between client & server
● Privacy: Communications are encrypted
● When using TLS: Asymmetric and then symmetric
● Identity (server or client/server) verified
● Reliable connection: Protect against alteration

1 - Cipher Suites
● TLS begins with an established TCP connection.
● Agree method of communications, the “Cipher Suite”
● At this point, the client and server have agreed on how to communicate and the client
has the server certificate.
● The certificate contains the server ‘public key’

2 - Authentication
● Ensure the server certificate is authentic, verifying the server as legitimate

3 - Key Exchange
● Move from asymmetric to symmetric keys in a secure way and begin the encryption process

● Both sides confirm the handshake and from then on, communications between client ⇔ server are encrypted
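The three stages can be observed from Python's standard ssl module; a small sketch (the endpoint is illustrative) that completes a TLS handshake and prints the negotiated protocol version and cipher suite:

```python
import socket
import ssl

ctx = ssl.create_default_context()  # verifies the server certificate (authentication step)

with socket.create_connection(("example.com", 443)) as sock:
    with ctx.wrap_socket(sock, server_hostname="example.com") as tls:
        print(tls.version())  # e.g. 'TLSv1.3'
        print(tls.cipher())   # agreed cipher suite, protocol, secret bits
```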

DNS
● DNS is probably one of the most important distributed databases which exist today.
● It is used for service discovery, configuration and the operation of most consumer
web browsing and other internet activities.
● While not strictly required in detail for the exam - understanding DNS will help you
answer DNS related questions and help make sense of other AWS lessons
throughout the course.
● IANA : https://www.iana.org
● Root hints : https://www.internic.net/domain/named.root
● Root Servers : https://www.iana.org/domains/root/servers
● Root Zone Database : https://www.iana.org/domains/root/db
● Root Zone File : https://www.internic.net/domain/root.zone
● Delegation Record for .com : https://www.iana.org/domains/root/db/com.html

DNS - Basic
● DNS is a discovery service
● Translates between human-readable names and machine-readable addresses, and vice-versa
● www.amazon.com => 104.98.34.131
● It's huge and has to be distributed
● 4,294,967,296 IPv4 addresses

DNS Zone

DNS - Recall
● DNS Client => your laptop, phone, tablet, PC
● Resolver => Software on your device, or a server which queries DNS on your behalf
● Zone => A part of the DNS database (amazon.com)
● Zonefile => physical database for a zone

DNS - Root
● www.amazon.com
● DNS root & Root Zone
○ DNS Root Servers (13)
DNS - Hierarchy

DNS - Resolution

DNS - Remember
● Root Hints => config points at the root server IPs and addresses
● Root Server => Hosts the DNS root zone
● Root Zone => Points at the TLD authoritative servers
● gTLD => generic top level domain (.com .org)
● ccTLD => country-code top level domain (.uk .eu etc)
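A tiny Python sketch of the client side of resolution; your OS resolver (pointed at by local config) walks root, TLD and authoritative servers on your behalf:

```python
import socket

# Ask the local resolver to turn a name into IP addresses (A/AAAA lookups)
for family, _, _, _, sockaddr in socket.getaddrinfo("www.amazon.com", 443,
                                                    proto=socket.IPPROTO_TCP):
    print(socket.AddressFamily(family).name, sockaddr[0])
```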

Route53 Fundamentals

Route53 - Product Basics


● 1. Register Domains
● 2. Host Zones ..managed nameservers
● Global Service ..Single database
● Globally Resilient

Route53 - Register Domains

Route53 - Hosted Zones


● Zone files in AWS
● Hosted on four managed name servers
● Can be public..
● Or private ..linked to VPC(s)
● …stored records (recordsets)

DNS Record Types


● DNS is capable of handling a number of different record types - which perform
different tasks.
● This lesson steps through
○ A & AAAA
○ CNAME
○ TXT
○ MX
○ NS
● as well as introducing TTL values on records.
Nameserver(NS)

A and AAAA Records

CNAME Records
MX Records

TXT Records

TTL - Time To Live


● Numeric value in seconds
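A hedged boto3 sketch (zone ID, name and IP are hypothetical) that creates an A record and sets its TTL, the number of seconds resolvers may cache the answer:

```python
import boto3

r53 = boto3.client("route53")
r53.change_resource_record_sets(
    HostedZoneId="Z0EXAMPLE12345",  # hypothetical hosted zone id
    ChangeBatch={"Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "www.example.com",
            "Type": "A",
            "TTL": 300,  # cache lifetime in seconds
            "ResourceRecords": [{"Value": "203.0.113.10"}],
        },
    }]},
)
```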
IAM, Accounts & Organizations

IAM Users and ARNs


IAM Users are an identity used for anything requiring long-term AWS access, e.g. humans, applications or service accounts

IAM Users
● 5000 IAM users per account
● IAM User can be a member of 10 groups
○ This has systems design impacts
○ May impact Internet-scale applications
○ May impact Large orgs & org mergers

When to use IAM Roles


Roles for services

Roles for Emergency or out of the usual situations


Roles for SSO or > 5k identities

Roles for Mobile Applications


Roles for Cross-Account Access

CloudFormation

CloudFormation Physical & Logical Resources

CloudFormation

Elastic Beanstalk

Elastic Beanstalk Architecture


● Platform as a Service (PaaS)
● Developer focused - not end user
● High level - managed application environments
● User provides code & EB handles the environment
● Focus on code, low infrastructure overhead
● Fully customizable - uses AWS products under the covers

Elastic Beanstalk(EB) - Platforms

● Built-in languages, Docker & custom platforms
● Go, Java SE, Tomcat
● .NET Core (Linux) & .NET (Windows)
● Node.js, PHP, Python & Ruby
● Single Container Docker & Multicontainer Docker
● Preconfigured Docker
● Custom via Packer

If everything is OK, then the CNAME swap is done (Blue/Green Deployment):

Elastic Beanstalk(EB) - Summary


● It doesn't come for free - app tweaks
● Great for small development teams
● Use Docker for anything unsupported
● Databases OUTSIDE of Elastic Beanstalk
● DBs in an ENV are lost if the ENV is deleted

Elastic Beanstalk Deployment Policies


● How application versions are deployed to environments
● All at once - Deploy to all at once, brief outage
● Rolling - Deploy in rolling batches
● Rolling with additional batch - As above, with a new batch to maintain capacity
during the process
● Immutable - All new instances with the new version
● Traffic Splitting - Fresh instances, with a traffic split

EB Deployment - All at once

● A new application version is deployed into all instances within the EB environment at
the same time
● But it causes an outage during the deployment and doesn't have a great method of handling failures
● This is a good one to use for a development or testing environment, but nothing of great importance

EB Deployment - Rolling

● With this method you deploy the new application version in batches
● This type of method is great when you want to step through all of the instances within your environment, taking things batch by batch
● Each batch is taken out of service
● The new application is deployed into that batch, and then when the instances pass health checks they're put back into service
● You can identify any problems before moving on to the next batch
● This means that you have additional control and it's safer, because the process continues only when the current batch passes its health checks
● But it does mean a loss in capacity, as instances are removed from service while the deployment is happening
● It also means there's no increase in cost, because no additional instances are launched during the deployment

EB Deployment - Rolling with Additional batches

● This method is similar to rolling deployment, but we don't drop any capacity
● We’re gonna start with an environment with four instances and all four are running
the currently deployed application version which is version 1
● This method of deployment starts by us having a version two of the application, and
we decide to deploy it into this environment
● Immediately whatever batch size we pick, let’s say that it’s two instances, this means
that a new batch is deployed running the new version of the application
● This now means that we have 150% capacity for our application because we chose a
batch size which is half of the number of instances in which our environment is
running
● So now we have two application versions running within our environment
● Four instances on version 1 and two on version two
● The deployment starts on two of the original instances, and these are removed from
service, taking the capacity back down to 100% levels
● Once this batch is finished, they’re added back into service and then deployment
happens to the next batch
● Again, they are taken out of service
● the new version is deployed and when finished they’re brought back into service, and
this means that we now have a total of six instances running version two
● This is two more, so one batch more than we originally had
● So once the whole deployment is finished the extra batch is removed and the
environment again has four instances
● This process takes longer
● There will be additional costs for the extra instances running during the deployment
but it’s safer and great for production usage because we don’t drop any capacity

EB Deployment - Immutable

● With this method, we start with a similar environment: three instances running the same application version
● When we start the deployment, the original instances aren't touched at all
● They're treated as immutable; instead, a temporary auto-scaling group is created, and within it another full set of instances is immediately created and the new application version is deployed onto these instances
● Once finished, the result is two sets of instances: the original set running version one in the original auto-scaling group, and the new set running version two in the temporary auto-scaling group
● Once completed and the health checks pass, the new instances are moved into the original auto-scaling group
● The original instances are terminated and the temporary auto-scaling group is deleted
● This method has the highest cost because it uses double the instances, but it offers the quickest and lowest-risk rollback, because if anything goes wrong the original instances are available until right at the end of the process
● So until the deployment finishes in its entirety, none of the original infrastructure is touched in any way

EB Deployment - Traffic Splitting

● The traffic splitting deployment method is just like immutable, but when the new set of instances is ready, traffic can be split between the new and the old version
● In this case, we're sending 50% of the traffic towards the new version and leaving the remaining 50% on the original version
● It's a form of A/B testing and it allows an extra set of verification before you finish the deployment
● Rollback is relatively quick, because you can just return the traffic flow and remove the new instances; once you're entirely happy with the new version, you complete the deployment
● Traffic splitting is a fairly new method of deployment, introduced in the middle of 2020
EB Deployment - Blue Green

● You have two different environments, a blue environment on the left and a green one on the right
● You have the DNS record pointing at the blue environment running version one of the application, and you deploy version two to a brand new environment
● You run both of them at the same time, get people to test the green environment, and then when ready you can change the DNS to point at the green environment
● This is really good because it means you have complete control over these two different environments and you're not relying on Elastic Beanstalk to orchestrate the deployment
● You can control when the switch-over occurs and leave the original environment in place for as long as you want to

Elastic Beanstalk and RDS


● You can create an RDS instance within an EB environment
● It’s linked to the EB environment
● Delete the environment = delete the RDS = data loss
● Different environments = different RDS = different data
● Environment properties: RDS_HOSTNAME, RDS_PORT, RDS_DB_NAME, RDS_USERNAME, RDS_PASSWORD

● You can also create an RDS instance outside of EB


● Add environment properties to point at the RDS instance: RDS_HOSTNAME, RDS_PORT, RDS_DB_NAME, RDS_USERNAME, RDS_PASSWORD
● Environments can be changed at will - data is outside of the lifecycle of that environment

● Decoupling an existing RDS instance from an EB environment

● Create an RDS snapshot
● "Enable Delete Protection"
● Create a new EB environment with the same app version
● Ensure the new environment can connect to the DB
● Swap environments (CNAME or DNS)
● Terminate the old environment - this will try to terminate the RDS instance
● Locate the DELETE_FAILED stack, manually delete it and choose to retain the stuck resources

Advanced Customisation via dockerrun

Customizing via dockerrun

● Used in cases where the team uses more than one runtime (Java, nginx…)
● Should be implemented using a multi-docker environment
● Use dockerrun.aws.json v2 for the multi-docker environment (a skeleton is sketched below)
○ This file describes the containers to deploy to each container instance
● Dockerrun.aws.json is an Elastic Beanstalk-specific JSON file that describes how to deploy a set of Docker containers as an Elastic Beanstalk application
● AWSEBDockerrunVersion is 2 for the multi-container environment
● https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/create_deploy_docker_v2config.html
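A minimal two-container skeleton of that file, generated here from Python purely for illustration (image names, memory values and ports are hypothetical):

```python
import json

dockerrun = {
    "AWSEBDockerrunVersion": 2,  # v2 = multi-container environments
    "containerDefinitions": [
        {
            "name": "nginx-proxy",
            "image": "nginx:latest",
            "essential": True,
            "memory": 128,
            "portMappings": [{"hostPort": 80, "containerPort": 80}],
        },
        {
            "name": "app",  # hypothetical application container from ECR
            "image": "111122223333.dkr.ecr.us-east-1.amazonaws.com/app:1.0",
            "essential": True,
            "memory": 256,
        },
    ],
}

# Place Dockerrun.aws.json at the root of the application source bundle
with open("Dockerrun.aws.json", "w") as f:
    json.dump(dockerrun, f, indent=2)
```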

Advanced Customisation via .ebextensions

Customizing via .ebextensions


● .ebextensions are a way to customize EB environments
● Inside the application source bundle (ZIP/WAR)
● .ebextensions folder
● Add YAML or JSON files ending in .config
● Uses CFN format to create additional resources within the environment
● option_settings allows you to set options of resources
● Resources allows entirely new resources
● …packages, sources, files, users, groups, commands, container_commands, and services

Elastic Beanstalk and HTTPS

● Apply the SSL cert to the LB directly

● .. via EB Console => Environment => Load Balancer Configuration
● Or via .ebextensions/securelistener-[alb | nlb].config
● Make sure you configure the security group
Elastic Beanstalk Environment Cloning

EB Cloning
● Create a NEW environment, by cloning an EXISTING one
● Copy PROD-ENV to a new TEST-ENV (for testing and Q/A)
● .. new version of platform branch
● Copies options, env variables, resources and other settings
● Includes RDS in ENV, but no data is copied
● “unmanaged changes” are not included
● Console UI, API or “eb clone EXISTING-ENVNAME”

EB and Docker

EB and Docker - Single Container


● Single Docker container
● This mode uses EC2 with Docker, not ECS
● Provide 1 of 3 things
● Dockerfile: EB will build a Docker image and use this to run a container
● Dockerrun.aws.json (version 1): use an existing Docker image - configure image, ports, volumes and other Docker attributes
● Docker-compose.yml - if you use Docker Compose

EB and Docker - Multi-Container


● Multiple container application
● Creates an ECS cluster
● …ECS instances are provisioned in the cluster, plus an ELB for high availability
● You need to provide a Dockerrun.aws.json (version 2) file in the application source bundle (root level)
● Any images need to be stored in a container registry such as ECR

Lambda, Serverless & Application Services

Lambda Versions

● Unpublished function - can be changed & deployed

● .. this is what you've used so far (deploys to $LATEST)
● Functions can be published …creating an immutable version
● ..locked, no editing of that published version
● ..function code, dependencies, runtime, settings & environment variables
● .. a unique ARN for that function (uniquely identifies it)
● Qualified ARN points at a specific version
● Unqualified ARN points at the function ($LATEST) - not a specific version

You can use versions to manage the deployment of your functions. For example, you can
publish a new version of a function for beta testing without affecting users of the stable
production version. Lambda creates a new version of your function each time that you
publish the function. The new version is a copy of the unpublished version of the function.

A function version includes the following information:


● The function code and all associated dependencies.
● The Lambda runtime that invokes the function.
● All of the function settings, including the environment variables.
● A unique Amazon Resource Name (ARN) to identify the specific version of the
function.
Lambda Aliases
● An alias is a pointer to a function version
● ..PROD => bestanimal:1, BETA => bestanimal:2
● Each alias has a unique ARN .. fixed for the alias
● Aliases can be updated, changing which version they reference
● Useful for PROD/DEV, BLUE/GREEN, A/B testing
● Alias routing ..percentage at v1 & percentage at v2
● .. needs the same role, the same dead-letter queue, and not $LATEST

● You can create one or more aliases for your Lambda function. A Lambda alias is like a pointer to a specific function version.
● Users can access the function version using the alias Amazon Resource Name (ARN).
● Aliases can point at a single version, or be configured to perform weighted routing between 2 versions (see the sketch below).
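A hedged boto3 sketch of versions plus a weighted alias (the function name and weights are hypothetical):

```python
import boto3

lam = boto3.client("lambda")

# Publish an immutable version from the current $LATEST code/config
v2 = lam.publish_version(FunctionName="bestanimal")["Version"]

# PROD alias points at version 1, with 10% of traffic weighted to the new version
lam.create_alias(
    FunctionName="bestanimal",
    Name="PROD",
    FunctionVersion="1",
    RoutingConfig={"AdditionalVersionWeights": {v2: 0.1}},
)
```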
Lambda Environment Variables
● KEY & VALUE pairs (0 or more)
● Associated with $LATEST (can be edited)
● Associated with a version (immutable // fixed)
● Can be accessed within the execution environment (sketch below)
● Can be encrypted with KMS
● Allow code execution to be adjusted based on variables

https://docs.aws.amazon.com/lambda/latest/dg/configuration-envvars.html#configuration-envvars-samples
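A minimal handler sketch reading a variable inside the execution environment (TABLE_NAME is a hypothetical variable name):

```python
import os

def lambda_handler(event, context):
    # Adjust behaviour per environment/version without changing code
    table = os.environ.get("TABLE_NAME", "default-table")
    return {"table": table}
```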

Monitoring & Logging & Tracing Lambda Based Applications


● AWS Lambda integrates with other AWS services to help you monitor and
troubleshoot your Lambda functions.
● Lambda automatically monitors Lambda functions on your behalf and reports metrics
through Amazon CloudWatch.
● To help you monitor your code when it runs, Lambda automatically tracks the number
of requests, the invocation duration per request, and the number of requests that
result in an error.

Lambda Monitoring
● All Lambda metrics are available within CloudWatch
● .. directly or via the monitoring tab on a specific function
● Dimensions - Function Name, Resource (Alias/Version), Executed Version (combination of alias and version, weighted alias) and ALL FUNCTIONS
● Invocations, Errors, Duration, Concurrent Executions
● DeadLetterErrors, DestinationDeliveryFailures

Lambda Logging
● Lambda execution logs => CloudWatch Logs
● stdout or stderr
● Log Group = /aws/lambda/functionname
● Log Stream = YYYY/MM/DD/[$LATEST || version]..random
● Permissions via Execution Role
● …the default role gives logging permissions

Lambda Tracing
● X-Ray shows the flow of requests through your application
● Enable "Active Tracing" on a function
● aws lambda update-function-configuration --function-name my-function --tracing-config Mode=Active
● AWSXRayDaemonWriteAccess managed policy
● Use the X-Ray SDK within your function (see the sketch below)
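Using the X-Ray SDK within a function is typically just patching supported libraries; a minimal sketch with the aws-xray-sdk package:

```python
# pip install aws-xray-sdk
from aws_xray_sdk.core import patch_all

patch_all()  # instruments supported libraries (boto3, requests, ...) for tracing

def lambda_handler(event, context):
    # downstream AWS calls made here now appear as X-Ray subsegments
    return "ok"
```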

VPC Lambda + EFS

Lambda Layers
● You can configure your Lambda function to pull in additional code and content in the
form of layers. A layer is a .zip file archive that contains libraries, a custom runtime,
or other dependencies.
● With layers, you can use libraries in your function without needing to include them in
your deployment package.

https://aws.amazon.com/blogs/aws/new-for-aws-lambda-use-any-programming-language-and-share-common-components/
Lambda Container Images
● Lambda is a Function as a Service (FaaS) product
● Create a function, upload code, it executes
● This is great …but has 2 problems
● Orgs use containers & CI/CD processes built for containers…
● …and would like a way of locally testing Lambda functions before deployment
● Lambda Runtime API - IN CONTAINER IMAGE
● AWS Lambda Runtime Interface Emulator (RIE) - local testing

Lambda & ALB Integration


You can use a Lambda function to process requests from an Application Load Balancer.
Elastic Load Balancing supports Lambda functions as a target for an Application Load
Balancer. Use load balancer rules to route HTTP requests to a function, based on path or
header values. Process the request and return an HTTP response from your Lambda
function.

Elastic Load Balancing invokes your Lambda function synchronously with an event that
contains the request body and metadata.
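A sketch of that synchronous contract: the function receives the HTTP request as the event and must return an HTTP-shaped response (the field values here are illustrative):

```python
def lambda_handler(event, context):
    # event includes path, httpMethod, headers, queryStringParameters, body...
    name = (event.get("queryStringParameters") or {}).get("name", "world")
    return {
        "statusCode": 200,
        "statusDescription": "200 OK",
        "headers": {"Content-Type": "text/plain"},
        "body": f"hello {name}",
        "isBase64Encoded": False,
    }
```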
Lambda and ALB - Multi-Value Headers

Lambda Resource Policy


● Required when a service needs to invoke lambda
● Required cross-account (lambda has no trust)
● NOT required for an identity in the same account
● Can view and perform limited edits in the UI
● .. also full control via the CLI/API
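Resource policies are edited via add-permission / remove-permission; a hedged boto3 sketch granting SNS (topic ARN hypothetical) the right to invoke a function:

```python
import boto3

lam = boto3.client("lambda")
lam.add_permission(
    FunctionName="my-function",
    StatementId="allow-sns-invoke",  # unique id within the resource policy
    Action="lambda:InvokeFunction",
    Principal="sns.amazonaws.com",
    SourceArn="arn:aws:sns:us-east-1:111122223333:my-topic",
)
```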

API Gateway
Amazon API Gateway is a fully managed service that makes it easy for developers to create,
publish, maintain, monitor, and secure APIs at any scale. APIs act as the "front door" for
applications to access data, business logic, or functionality from your backend services.

API Gateway - Refresher


● Create and manage APIs
● Endpoint/entry-point for applications
● Sits between applications & integrations (services)
● Highly available, scalable, handles authorization, throttling, caching CORS,
transformations, OpenAPI spec, direct integration and much more
● Can connect to services/endpoints in AWS or on-premises
● HTTP APIs, REST APIs and WebSocket APIs
API Gateway - Authentication

API Gateway - Endpoint Types


● Edge Optimized
○ Routed to the nearest CloudFront POP
● Regional - Clients in the same region
● Private - Endpoint accessible only within a VPC via interface endpoint

API Gateway - Stages

API Gateway - Errors


● 4XX - Client Error - invalid request on the client side
● 5XX - Server Error - valid request, backend issue
● 400 - Bad Request - generic
● 403 - Access Denied - authorizer denies .. WAF filtered
● 429 - Too Many Requests - API Gateway can throttle - this means you've exceeded that amount
● 502 - Bad Gateway Exception - bad output returned by Lambda
● 503 - Service Unavailable - backing endpoint offline? Major service issues
● 504 - Integration Failure/Timeout - 29s limit
https://docs.aws.amazon.com/apigateway/api-reference/handling-errors/#api-error-codes

API Gateway - Caching

API Gateway Methods and Resources

API Gateway - Integrations


● You choose an API integration type according to the types of integration endpoint you
work with and how you want data to pass to and from the integration endpoint.
● For a Lambda function, you can have the Lambda proxy integration, or the Lambda
custom integration.
● For an HTTP endpoint, you can have the HTTP proxy integration or the HTTP
custom integration.
● For an AWS service action, you have the AWS integration of the non-proxy type only.
● API Gateway also supports the mock integration, where API Gateway serves as an
integration endpoint to respond to a method request.
https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-api-integration-types.html

API Gateway - Overview

API Gateway - Integrations


● API methods (client) are integrated with a backend endpoint
● Different types of integrations are available…
● MOCK - used for testing, no backend involvement
● HTTP - backend HTTP endpoint
● HTTP_PROXY - pass through to the integration unmodified, return to the client unmodified (the backend needs to use a supported format) (passed as-is)
● AWS - lets an API expose AWS service actions
○ Needs configured mappings for integration requests and integration responses
● AWS_PROXY (Lambda) - low admin overhead Lambda endpoint
https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-api-integration-types.html

API Gateway - Mapping Template


● Used for AWS and HTTP (non-PROXY) integrations
● Modify or rename parameters
● Modify the body or headers of the request
● Filtering - removing anything which isn't needed
● Uses Velocity Template Language (VTL)
● Common exam scenario: REST API (on API Gateway) to SOAP API .. transform the request using a mapping template

API Gateway Stages and Deployments


● A stage is a named reference to a deployment, which is a snapshot of the API.
● You use a Stage to manage and optimize a particular deployment.
● For example, you can configure stage settings to enable caching, customize request
throttling, configure logging, define stage variables, or attach a canary release for
testing

● Changes made in API Gateway are not LIVE


● The current API state needs to be deployed to a stage
● Stages can be environments (PROD, DEV, TEST)
● ..or version (v1, v2, v3) - for breaking changes
● Each stage has its own configuration
● .. not immutable, can be overwritten and rolled back
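Deploying the current API state to a stage, sketched with boto3 (the REST API id is hypothetical):

```python
import boto3

apigw = boto3.client("apigateway")
apigw.create_deployment(
    restApiId="a1b2c3d4e5",  # hypothetical REST API id
    stageName="DEV",         # snapshot the current API state into the DEV stage
    description="version 2 candidate",
)
```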
API Gateway - Stage Variables

API Gateway - Open API and Swagger


● You can use API Gateway to import a REST API from an external definition file into
API Gateway.
● Currently, API Gateway supports OpenAPI v2.0 and OpenAPI v3.0 definition files.
● You can update an API by overwriting it with a new definition, or you can merge a
definition with an existing API.

● The OpenAPI Specification (OAS) defines a standard, language-agnostic interface to RESTful APIs which allows both humans and computers to discover and understand the capabilities of the service without access to source code, documentation, or network traffic inspection

Swagger and OpenAPI


● OpenAPI (OAS), formerly known as Swagger
● Swagger = OpenAPI v2
● OpenAPI v3 is a more recent version
● API description format for REST APIs
○ It's a way of defining what an API is, what it does, and how to interact with it, as well as all the important information
○ In a way it's like CloudFormation, but instead of defining AWS infrastructure, it defines an API
● Endpoint (/listcats) and Operation (GET /listcats)
● Input and output parameters & authentication methods
● Non-tech information - contact info, license, terms of use
● API Gateway can export to Swagger/OpenAPI & import from them
Simple Notification Service
● The Simple Notification Service (SNS) is a pub/sub system in AWS for the reliable
delivery of notification style messages between AWS components or between AWS
and external systems.

● Public AWS service - network connectivity with a public endpoint

● Coordinates the sending and delivery of messages
● Messages are <= 256KB payloads
● SNS Topics are the base entity - permissions and configuration are set on a topic
● Topics have subscribers which receive messages
● e.g. HTTP(S), Email(-JSON), SQS, Mobile Push, SMS messages & Lambda
● SNS is used across AWS for notifications - e.g. CloudWatch & CloudFormation

● Delivery Status - (Including HTTP, Lambda, SQS)


● Delivery Retries - Reliable Delivery
● HA and Scalable (Region)
● Server-Side Encryption (SSE)
● Cross-Account via Topic Policy

Simple Queue Service


● Public, fully managed, highly-available queues - Standard or FIFO
● Messages up to 256KB in size - link to large data
● Received messages are hidden (visibility timeout)
● ..then either reappear (retry) or are explicitly deleted
● Dead-letter queues (DLQ) can be used for problem messages
(General queue workflow)

SNS and SQS Fanout

(1 job logged => multiple different outputs are needed)
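A boto3 sketch of the fanout pattern (topic and queue ARNs are hypothetical; the queues also need a queue policy allowing SNS to send to them):

```python
import boto3

sns = boto3.client("sns")
topic_arn = "arn:aws:sns:us-east-1:111122223333:jobs"  # hypothetical

# Each subscribed queue receives its own copy of every published message
for queue_arn in ("arn:aws:sqs:us-east-1:111122223333:thumbs",
                  "arn:aws:sqs:us-east-1:111122223333:audit"):
    sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)

sns.publish(TopicArn=topic_arn, Message="job-logged")
```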

Simple Queue Service(SQS) - Limitations


● Standard = at-least-once, FIFO = exactly-once
● FIFO (performance): 3000 messages per second with batching, or up to 300 messages per second without batching
○ Doesn't offer the exceptional level of scaling that Standard does
● Billed based on 'requests'
● 1 request = 1-10 messages up to 64KB total
● Short (immediate) vs Long (WaitTimeSeconds) polling
○ Short - there can be 0 or more messages in the queue = the consumer can request every time and the response can come back empty (billable!)
○ Long - can be set up to 20s. If messages are available on the queue when you launch the request they will be received, otherwise the request waits for messages to arrive.
■ This is how SQS should be polled (see the sketch after this list)
● Encryption at rest (KMS) & in transit
● Queue policy…
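Long polling, sketched with boto3 (the queue URL and the process helper are hypothetical):

```python
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/111122223333/work"  # hypothetical

resp = sqs.receive_message(
    QueueUrl=queue_url,
    MaxNumberOfMessages=10,
    WaitTimeSeconds=20,  # long polling: wait up to 20s instead of returning empty
)
for msg in resp.get("Messages", []):
    process(msg["Body"])  # hypothetical handler
    # delete after successful processing, or the message reappears
    # once the visibility timeout expires
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```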

SQS Standard vs FIFO Queue


● SQS Standard = multi-lane highway
● FIFO = single-lane highway

SQS Extended Client Library

● Used when handling messages over the SQS max (256KB)
● Allows large payloads - stored in S3
● SendMessage uploads the payload to S3 and stores a link in the message
● Receiving a message loads the large payload from S3
● Deleting a message also deletes the large S3 payload
● Interface for SQS+S3 - handles the integration workload
● The exam often mentions Java with the Extended Client Library

https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-s3-messages.html

SQS Delay Queue


● Delay queues provide an initial period of invisibility for messages. Predefined periods can ensure that processing of messages doesn't begin until this period has expired.
● https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-message-timers.html
● https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-delay-queues.html
● https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-visibility-timeout.html

SQS vs SQS Delay Queue - Process

SQS Delay Queue vs SQS Visibility Timeout

● Delay queues are in a way similar to visibility timeouts, because both features make messages unavailable to consumers for a specific period of time
● But the difference between the two is that for delay queues, a message is hidden automatically when it's first added to the queue
● With visibility timeouts, a message is initially visible; it's only hidden after it's consumed from the queue, and it automatically reappears if that message isn't deleted
● Delay queues are generally used when you need to build a delay in processing into your application
○ Maybe you need to perform a certain set of tasks before you begin processing a message, or maybe you want to add a certain amount of time between an action that a customer takes and further processing of the message that represents that action
● Visibility timeouts are used to support automatic retries of messages (both are sketched below)
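Both knobs in one boto3 sketch, reusing the sqs client and hypothetical queue_url from the long-polling sketch above; note that standard queues support per-message DelaySeconds up to 900, while FIFO queues only support a queue-level delay:

```python
# Delay queue behaviour: the message is invisible for 120s after being sent
sqs.send_message(QueueUrl=queue_url, MessageBody="later", DelaySeconds=120)

# Visibility timeout: the message is hidden only after a consumer receives it
sqs.set_queue_attributes(QueueUrl=queue_url,
                         Attributes={"VisibilityTimeout": "60"})
```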

SQS Dead-Letter Queues


● Dead-letter queues allow messages which are causing repeated processing errors to be moved into a separate dead-letter queue
● In this queue, different processing methods, diagnostic methods or logging methods can be used to identify message faults (a redrive-policy sketch follows)
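A redrive policy wires a source queue to its DLQ; a boto3 sketch (the DLQ ARN is hypothetical) that moves messages after 5 failed receives:

```python
import json

sqs.set_queue_attributes(
    QueueUrl=queue_url,  # the source queue
    Attributes={"RedrivePolicy": json.dumps({
        "deadLetterTargetArn": "arn:aws:sqs:us-east-1:111122223333:work-dlq",
        "maxReceiveCount": "5",  # after 5 receives without delete -> DLQ
    })},
)
```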
Step Functions
● Step Functions is a product which lets you build long-running, serverless, workflow-based applications within AWS which integrate with many AWS services.

Some problems with Lambda

● Lambda is FaaS
● 15-minute max execution time
● Can be chained, but this gets messy at scale
● Runtime environments are stateless

State Machines
● Serverless workflow.. START -> STATES -> END
● States are THINGS which occur
● The maximum duration is 1 year…
● Standard Workflow (default)
● Express Workflow
○ High volume
○ Event processing workflows, e.g. IoT
○ Streaming data processing and transformation
○ Mobile application backends
● Started via API Gateway, IoT Rules, EventBridge, Lambda (see the sketch below)
https://github.com/acantril/learn-cantrill-io-labs/tree/master/aws-serverless-pet-cuddle-o-tron
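A minimal boto3 sketch creating and starting a trivial state machine (the role ARN is hypothetical; the definition is Amazon States Language expressed as a Python dict):

```python
import json
import boto3

sfn = boto3.client("stepfunctions")

definition = {
    "StartAt": "SayHello",
    "States": {
        "SayHello": {"Type": "Pass", "Result": "hello", "End": True},
    },
}

sm = sfn.create_state_machine(
    name="demo",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::111122223333:role/sfn-role",  # hypothetical
)
sfn.start_execution(stateMachineArn=sm["stateMachineArn"])
```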

Containers, ECR, EKS

Introduction to Containers

Virtualisation Problems
Containerization

Image Anatomy
Container Anatomy

Container Registry

Container Key Concepts


● Dockerfiles are used to build images
● Portable - self-contained, always runs as expected
● Lightweight - parent OS used, fs layers are shared
● Containers only run the application & environment they need
● Provides much of the isolation VMs do
● Ports are 'exposed' to the host and beyond
● Application stacks can be multi-container

ECS - Concepts
● ContainerDefinition - Amazon Elastic Container Service
● TaskDefinition - Amazon Elastic Container Service
ECS

ECS Concepts
● Container Definition - Image & Ports
● Task Definition - Security (Task Role), Container(s), Resources
● Task Role - IAM Role which the TASK assumes
● Service - How many copies, HA, Restarts

ECS - Cluster Mode


● ECS is capable of running in EC2 mode or Fargate mode.
● EC2 mode deploys EC2 instances into your AWS account which can be used to deploy tasks and services.
● With EC2 mode you pay for the EC2 instances regardless of container usage.
● Fargate mode uses shared AWS infrastructure, and ENIs which are injected into your VPC.
● You pay only for container resources used while they are running.
ECS - EC2 Mode

ECS - Fargate Mode

EC2 vs ECS(EC2) vs Fargate


● EC2
○ Quickly test containers
○ Large Workload - price conscious
● ECS(EC2)
○ If you use containers
● Fargate
○ Large workload - overhead conscious
○ Small / Burst workloads
○ Batch / Periodic workloads

OpsWorks
OpsWorks Stacks
● Handles infrastructure, configuration and application within one service

OpsWorks Stacks
● A Stack is a set of layers, instances and related AWS resources whose configuration you want to manage together
● Layers
○ Elastic Load Balancing layer
○ Application Server layer
■ Uses cookbooks for the application deployment (cookbooks = Chef recipes)
● Users have to indicate where they are: Git, HTTP archive, S3 archive
○ Amazon RDS layer

OpsWorks Stacks - Server Instances


AWS OpsWorks Stacks provides three ways to manage the number of server instances:
● 24/7 instances: started manually and run until they are manually stopped
● Time-based instances: automatically started and stopped by AWS OpsWorks Stacks on a user-specified schedule
● Load-based instances: automatically started and stopped by AWS OpsWorks Stacks when they cross a threshold for a user-specified load metric such as CPU or memory utilization

OpsWorks Lifecycle Events


● Each layer has a set of five lifecycle events, each of which has an associated set of
recipes that are specific to the layer.
● When an event occurs on a layer's instance, AWS OpsWorks Stacks automatically
runs the appropriate set of recipes.
● To provide a custom response to these events, implement custom recipes and assign
them to the appropriate events for each layer.
● AWS OpsWorks Stacks runs those recipes after the event's built-in recipes.
https://docs.aws.amazon.com/us_en/opsworks/latest/userguide/workingcookbook-events.html

OpsWorks Stacks Lifecycle Events


The five lifecycle events:
● Setup: This event occurs after a started instance has finished booting.
○ You can also manually trigger the Setup event by using the Setup stack
command.
○ A Setup event takes an instance out of service. Because an instance is not in
the Online state when the Setup lifecycle event runs, instances on which you
run Setup events are removed from a load balancer.
● Configure: This event occurs on all of the stack's instances when one of the
following occurs:
○ An instance enters or leaves the online state.
○ You associate an Elastic IP address with an instance or disassociate one from
an instance.
○ You attach an Elastic Load Balancing load balancer to a layer, or detach one
from a layer.
● Deploy: This event occurs when you run a Deploy command, typically to deploy an
application to a set of application server instances.
○ The instances run recipes that deploy the application and any related files
from its repository to the layer's instances.
○ Setup includes Deploy; it runs the Deploy recipes after setup is complete.
● Undeploy: This event occurs when you delete an app or run an Undeploy command
to remove an app from a set of application server instances.
○ The specified instances run recipes to remove all application versions and
perform any required cleanup.
● Shutdown: This event occurs after you direct AWS OpsWorks Stacks to shut an
instance down but before the associated Amazon EC2 instance is actually
terminated.
○ AWS OpsWorks Stacks runs recipes to perform cleanup tasks such as
shutting down services.
○ If you have attached an Elastic Load Balancing load balancer to the layer and
enabled support for connection draining, AWS OpsWorks Stacks waits until
connection draining is complete before triggering the Shutdown event.
○ After triggering a Shutdown event, AWS OpsWorks Stacks allows Shutdown
recipes a specified amount of time to perform their tasks, and then stops or
terminates the Amazon EC2 instance. The default Shutdown timeout value is
120 seconds. If your Shutdown recipes might require more time, you can edit
the layer configuration to change the timeout value. For more information on
instance Shutdown, see Stopping an Instance.
○ Rebooting an instance does not trigger any lifecycle events.

OpsWorks Auto Healing & CloudWatch Events


https://docs.aws.amazon.com/opsworks/latest/userguide/workinginstances-autohealing.html

OpsWorks Auto Healing


● Every instance has an AWS OpsWorks Stacks agent that communicates regularly
with the service. AWS OpsWorks Stacks uses that communication to monitor
instance health. If an agent does not communicate with the service for more than
approximately five minutes, AWS OpsWorks Stacks considers the instance to have
failed.
● Auto healing is set at the layer level; you can change the auto healing setting by
editing layer settings

● An instance can be a member of multiple layers. If any of those layers has auto
healing disabled, AWS OpsWorks Stacks does not heal the instance if it fails.

If a layer has auto healing enabled (the default setting), AWS OpsWorks Stacks
automatically replaces the layer's failed instances as follows:
● Instance store-backed instance
○ Stops the Amazon EC2 instance, and verifies that it has shut down.
○ Deletes the data on the root volume.
○ Creates a new Amazon EC2 instance with the same host name,
configuration, and layer membership.
○ Reattaches any Amazon EBS volumes, including volumes that were attached
after the old instance was originally started.
○ Assigns a new public and private IP Address.
○ If the old instance was associated with an Elastic IP address, associates the
new instance with the same IP address.
● Amazon EBS-backed instance
○ Stops the Amazon EC2 instance, and verifies that it has stopped.
○ Starts the EC2 instance.

Credentials and Secrets

SSM Parameter Store

Parameters
● Standard Parameters and SecureString (used for credentials)
● Hierarchical information
● Configuration for the CloudWatch agent
● Anything of this nature
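As a sketch of typical usage (the parameter names and values here are hypothetical), parameters can be written and read back via the CLI, with hierarchies fetched as a batch:

    # Store a credential as an encrypted SecureString
    aws ssm put-parameter --name "/myapp/prod/dbpassword" --type SecureString --value "s3cret"
    # Read it back, decrypting via KMS
    aws ssm get-parameter --name "/myapp/prod/dbpassword" --with-decryption
    # Fetch a whole hierarchy in one call
    aws ssm get-parameters-by-path --path "/myapp/prod" --recursive --with-decryption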

AWS Secrets Manager

About
● It does share functionality with Parameter Store
● Designed for secrets (..passwords, API Keys..)
● Usable via Console, CLI, API or SDK (integration)
● Supports automatic rotation, which uses Lambda
● Directly integrates with some AWS products (...RDS)
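A minimal CLI sketch (the secret name and values are hypothetical):

    # Create a secret holding database credentials
    aws secretsmanager create-secret --name "prod/myapp/db" \
        --secret-string '{"username":"admin","password":"s3cret"}'
    # Retrieve the current version of the secret
    aws secretsmanager get-secret-value --secret-id "prod/myapp/db"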
ECS Cluster with Fargate

Advanced Identity and Permissions


Security Token Service (STS)
● Generates temporary credentials (sts:AssumeRole*)
● Similar to access keys
● .. they expire and don’t belong to the identity
● Limited Access
● Used to access AWS resources
● Requested by an Identity (AWS or EXTERNAL)
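A sketch of requesting temporary credentials (the role ARN and session name are hypothetical):

    # Returns a temporary AccessKeyId, SecretAccessKey and SessionToken
    aws sts assume-role \
        --role-arn "arn:aws:iam::123456789012:role/ReadOnlyRole" \
        --role-session-name "demo-session"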

Permissions Boundaries & Uses Cases

Delegation Problems
● Julie has AdministratorAccess
● Julie wants Bob to be an IAM administrator
● Julie gives Bob iam:* to manage identities
● Nothing stops Bob changing his own permissions
● Nothing stops Bob creating a FullAdministrator
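Permissions boundaries address this: Bob's effective permissions become the intersection of his identity policies and the boundary, so iam:* alone no longer lets him escalate. A hedged sketch (policy name and account ID are hypothetical):

    # Create Bob with a boundary that caps anything his identity policies grant
    aws iam create-user --user-name bob \
        --permissions-boundary "arn:aws:iam::123456789012:policy/IAMAdminBoundary"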

IAM Administrator Job Role

AWS Permissions Evaluation


● Organization SCPs
● Resource Policies
● IAM Permissions Boundaries
● Session Policies
● Identity Policies
● What about different accounts?

Policy Evaluation Logic - Same Account


Policy Evaluation Logic - Different Account

Route53

R53 Public Hosted Zones

R53 Hosted Zone


● A R53 Hosted Zone is a DNS DB for a domain
● Globally resilient (multiple DNS Servers)
● Created with domain registration via R53 - can be created separately
● Host DNS Records (A, AAAA, MX, NS, TXT…)
● Hosted Zones are what the DNS system references - Authoritative for a domain

R53 Public Hosted Zones


● DNS Database (zone file) hosted by R53 (Public Name Servers)
● Accessible from the public internet & VPCs
● Hosted on “4” R53 Name Servers (NS) specific for the zone
● .. use “NS records” to point at these NS (connected to global DNS)
● Resource Records (RR) created within the Hosted Zone
● Externally registered domains can point at R53 Public Zones
R53 Private Hosted Zones
● A Public Hosted Zone …which isn’t public
● Associated with VPC
● …only accessible in those VPCs
● …Using different accounts is supported via CLI/API
● Split-view (overlapping public & private) for PUBLIC and INTERNAL use with the
same zone name
R53 Split View Hosted Zones

R53 CNAME vs ALIAS

R53 CNAME
● “A” maps a NAME to an IP Address
● leticia.io => 1.3.3.7
● CNAME maps a NAME to another NAME
● … www.leticia.io => leticia.io
● CNAME is invalid for naked/apex (leticia.io)
● Many AWS services use a DNS Name (ELBs)
● With just CNAME - leticia.io => ELB would be invalid

R53 ALIAS
● ALIAS records map a NAME to an AWS resource
● Can be used both for naked/apex and normal records
● For non apex/naked - functions like CNAME
● There is no charge for ALIAS requests pointing at AWS resources
● For AWS Services - default to picking ALIAS
● Should be the same “Type” as what the record is pointing at
● API Gateway, CloudFront, Elastic Beanstalk, ELB, Global Accelerator & S3
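As an illustrative sketch (all IDs and names here are placeholders - note the AliasTarget HostedZoneId must be the target service's own zone ID, not the domain's), an apex ALIAS record pointing at an ALB could be created like this:

    aws route53 change-resource-record-sets --hosted-zone-id ZEXAMPLE123 --change-batch '{
      "Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
          "Name": "leticia.io",
          "Type": "A",
          "AliasTarget": {
            "HostedZoneId": "ZELBEXAMPLE",
            "DNSName": "my-alb-1234.us-east-1.elb.amazonaws.com",
            "EvaluateTargetHealth": false
          }
        }
      }]
    }'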
R53 Simple Routing

R53 Health Checks


● Health checks are separate from, but are used by, records
● Health checkers are located globally
● Health checkers check every 30s (every 10s costs extra)
● TCP, HTTP/HTTPS, HTTP/HTTPS with String Matching
● Healthy or Unhealthy
● Endpoint, CloudWatch Alarm, Checks of Checks (Calculated)

Amazon Route 53 health checks monitor the health and performance of your web
applications, web servers, and other resources. Each health check that you create can
monitor one of the following:
● The health of a specified resource, such as a web server
● The status of other health checks
● The status of an Amazon CloudWatch alarm
R53 Failover Routing

● Failover routing lets you route traffic to a resource when the resource is healthy or to
a different resource when the first resource is unhealthy
R53 Multi Value Routing

● Multivalue answer routing lets you configure Amazon Route 53 to return multiple
values, such as IP addresses for your web servers, in response to DNS queries.
● You can specify multiple values for almost any record, but multivalue answer routing
also lets you check the health of each resource, so Route 53 returns only values for
healthy resources

R53 Weighted Routing

● Weighted routing lets you associate multiple resources with a single domain name
(catagram.io) and choose how much traffic is routed to each resource.
● This can be useful for a variety of purposes, including load balancing and testing new
versions of software.
R53 Latency Routing

● If your application is hosted in multiple AWS Regions, you can improve performance
for your users by serving their requests from the AWS Region that provides the
lowest latency.

R53 Geolocation Routing

● Geolocation routing lets you choose the resources that serve your traffic based on
the geographic location of your users, meaning the location that DNS queries
originate from.
R53 Geoproximity Routing

● Geoproximity routing lets Amazon Route 53 route traffic to your resources based on
the geographic location of your users and your resources.
● You can also optionally choose to route more traffic or less to a given resource by
specifying a value, known as a bias.
● A bias expands or shrinks the size of the geographic region from which traffic is
routed to a resource.

R53 Interoperability
● R53 normally has 2 jobs - Domain Registrar and Domain Hosting
● R53 can do BOTH, or either Domain Registrar or Domain Hosting
● R53 Accepts your money (domain registration fee)
● R53 allocates 4 Name Servers (NS) (Domain Hosting)
● R53 creates a zone file (domain hosting) on the above NS
● R53 communicates with the registry of the TLD(Domain Registrar)
● .. sets the NS records for the domain to point at the 4 NS above

● This details how Route53 provides Registrar and DNS Hosting features and steps
through architectures where it is used for BOTH, or only one of those functions - and
how it integrates with other registrars or DNS hosting.
R53 - Both Roles

R53 - Registrar Only


R53 - Hosting Only

Code Suite - SDLC Automation

CICD in AWS
● CI/CD is handled within AWS by CodeCommit, CodeBuild, CodeDeploy and
CodePipeline.
● For the exam, you don't need to have a detailed operational understanding, but
you will need a high-level, component-level understanding.
● appspec.yml or appspec.json reference
https://docs.aws.amazon.com/codedeploy/latest/userguide/reference-appspec-file.ht
ml
● buildspec.yml reference
https://docs.aws.amazon.com/codebuild/latest/userguide/build-spec-ref.html
● buildspec.yml, appspec.[yml | json]
● CodePipeline Deploy action providers:
○ CodeDeploy
○ AWS Elastic Beanstalk or AWS OpsWorks
○ AWS CloudFormation
○ AWS ECS or ECS (Blue/Green)
○ AWS Service Catalog or Alexa Skills Kit
○ Amazon S3

CodePipeline
● AWS CodePipeline is a continuous delivery service you can use to model, visualize,
and automate the steps required to release your software.
● You can quickly model and configure the different stages of a software release
process.
● CodePipeline automates the steps required to release your software changes
continuously

CodePipeline - Basic
● Pipeline is a Continuous Delivery tool
● Controls the flow from source, through build towards deployment
● Pipelines are built from STAGES
● STAGES can have sequential or parallel ACTIONS
● Movement between stages can require manual approval
● Artifacts can be loaded into an action, and generated from an action
● State Changes => EventBridge (Success, Failed, Cancelled)
● CloudTrail or Console UI can be used to view/interact
CodeBuild
● AWS CodeBuild is a fully managed continuous integration service that compiles
source code, runs tests, and produces software packages that are ready to deploy.
● With CodeBuild, you don’t need to provision, manage, and scale your own build
servers.
● CodeBuild scales continuously and processes multiple builds concurrently, so your
builds are not left waiting in a queue

CodeBuild - Basic
● Code Build as a service - fully managed
● Pay only for the resources consumed during builds
● Alternative to part of Jenkins functionality
● Used for builds and tests
● Uses docker with AWS services ..KMS, IAM, VPC, CloudTrail, S3

CodeBuild - Architecture
● Architecture - Gets sources from GitHub, CodeCommit, CodePipeline, S3
● Build … (and tests)
● Customised via buildspec.yml file (in root source)
● Logs => S3 and CloudWatch Logs
● Metrics => CloudWatch
● Event => EventBridge (event-driven response)
● Java, Ruby, Python, Node.JS, PHP, .NET, Go..and more…
● buildspec.yml - customize the build process (a sketch follows this list)
● Four main PHASES in the file
○ install - install packages in the build environment (frameworks etc)
○ pre_build - sign-in to things or install dependencies
○ build - commands run during the build process
○ post_build - package things up, push docker image, explicit notifications
● The file also defines:
○ Environment variables - shell, variables, parameter-store, secrets-manager
○ Artifacts - what stuff to put where
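A minimal buildspec.yml sketch, assuming a Node.js project that produces a build/ directory (the runtime and commands are assumptions):

    version: 0.2
    phases:
      install:
        runtime-versions:
          nodejs: 14          # runtime for the build environment
      pre_build:
        commands:
          - npm ci            # install dependencies / sign in to registries
      build:
        commands:
          - npm run build     # commands run during the build itself
      post_build:
        commands:
          - echo "Build completed on $(date)"   # package/push/notify
    artifacts:
      files:
        - 'build/**/*'        # what stuff to put where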

CodeDeploy
● CodeDeploy is a deployment service that automates application deployments to
Amazon EC2 instances, on-premises instances, serverless Lambda functions, or
Amazon ECS services.

CodeDeploy - Basic
● Code Deployment as a Service
● There are alternatives - Jenkins, Ansible, Chef, Puppet, CloudFormation and more…
● Deploys code ..not resources
● EC2, On-premises, Lambda Functions and ECS
● Code, Web, COnfiguration, EXE files, Packages, Scripts, media and more
● CodeBuild integrates with AWS services & AWS Code* tools
● CodeDeploy agent (On-premises or EC2)

CodeDeploy - Configuration
● appspec.yml (YAML or JSON formatted)
● Manage Deployments - config + lifecycle event hooks
● Files (EC2/On-prem)
○ Provides information to CodeDeploy, about which files from your application
should be installed on the instance during the deployment install
○ This is how you configure which things are installed
● Resources (ECS, Lambda)
○ Lambda
■ For lambda, it contains the name, alias, current version of a lambda
function
■ So it can be used to control all of the surrounding details about the
lambda function that is being used for the deployment
○ ECS
■ Contains things like the task definition or container and port details
used for routing traffic to your container
■ Think as the configuration for the thing running your application
● Permissions (EC2, On-prem)
○ Details any special permissions and how they should be applied to the files,
directories and folders, which are defined in the file sections
○ So if you use the file section to copy any files from your application on to
these deployments targets, then it’s permissions sections that’s going to be
used to set any special permissions on those files and folders
● Lifecycle Event Hooks - Depend on what and where is being deployed
○ ApplicationStop
■ Generally used when you want to prepare for the actual deployment
itself
○ DownloadBundle
■ This is when the CodeDeploy agent copies the application down to a
temporary location
○ BeforeInstall
■ This is an event that you can use for any pre-installation tasks
● Maybe you want to decrypt some files, or create a backup of
the current application or configuration, anything that you want
to do before the install itself
○ Install
■ During this part of the deployment lifecycle, the CodeDeploy agent
copies the application files from the temporary location to the final
destination folder
■ This is performed by the CodeDeploy agent and you can’t run any
scripts during this step
■ This is something that’s handled on your behalf by the CodeDeploy
product and CodeDeploy agent itself
○ AfterInstall
■ Allows you to perform install steps
■ So performing any application-specific configuration, maybe changing
file permissions or applying licensing, anything that you want to do
after the install
○ ApplicationStart
■ Typically used when you want to restart or start any services that were
stopped during the ApplicationStop event of the deploy
■ This is the part when you fully installed, you’ve performed all the
configuration, and now you’re wanting to start up the application
service or services
○ ValidateService
■ Where you are going to verify that the deployment was completed
successfully
■ This is the part that is going to allow CodeDeploy to determine
whether the deployment was successful or not
■ It is time to look at the application specifically, check any application
logs, or perform any tests to verify that the application has been
deployed as expected
● Remember the order and the names for the exam
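A minimal appspec.yml sketch for an EC2/On-premises deployment showing several of these hooks (the paths and hook scripts are hypothetical):

    version: 0.0
    os: linux
    files:
      - source: /                      # from the revision bundle
        destination: /var/www/html     # where files are installed
    permissions:
      - object: /var/www/html
        owner: apache
        mode: 755
    hooks:
      BeforeInstall:
        - location: scripts/backup.sh     # pre-install tasks
          timeout: 300
      ApplicationStart:
        - location: scripts/start_server.sh
          timeout: 300
          runas: root
      ValidateService:
        - location: scripts/validate.sh   # confirm the deployment worked
          timeout: 300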
Elastic Container Registry (ECR)

ECR - High-level Architecture


● Managed Container image registry service
● ..like Docker hub..but for AWS
● Each AWS account has a public and private registry
● Each registry can have many repositories
● Each repository can contain many images
● Images can have several tags
● Public = public R/O … R/W requires permissions
● Private = permissions required for any R/O or R/W

ECR - Benefits
● Integrated with IAM - Permissions
● Image scanning, basic and enhanced (inspector)
● Near RealTime Metrics => CW (auth, push, pull)
● API Actions = CloudTrail
● Events => EventBridge
● Replication …Cross-Region AND Cross-Account
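A typical push-workflow sketch (the account ID, region and repository name are placeholders):

    # Authenticate Docker to the private registry
    aws ecr get-login-password --region us-east-1 | \
      docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
    # Tag and push a local image into an ECR repository
    docker tag myapp:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp:latest
    docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp:latest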

Jenkins
● Open source CICD tool
● Can replace CodeBuild, CodePipeline & CodeDeploy
● Must be deployed in a Master / Agent (formerly Master / Slave) configuration
● Must manage multi-AZ, deploy on EC2, etc…
● All projects must have a “Jenkinsfile” (similar to buildspec.yml) to tell Jenkins what to
do
● Jenkins can be extended on AWS thanks to many plugins
Jenkins Architecture

Jenkins on AWS
Jenkins with CodePipeline

Monitoring and Logging

CloudWatch

CloudWatch - Architecture Concepts


● Ingestion, Storage and Management of Metrics
● Public Service - public space endpoints
● ..AWS Service integration - management plane
● ..Agent integration -> EC2 - richer metrics
● ..On-prem integration via API/Agent (custom metrics)
● View data via console UI, CLI, API, dashboards & anomaly detection
● Alarms..react to metrics, can be used to notify or perform actions
CloudWatch - Data
● Namespace = container for metrics (AWS/EC2 & AWS/Lambda)
● Datapoint = Timestamp, Value, (optional) unit of measure
● Metric .. time-ordered set of points
○ CPUUtilization, NetworkIn, DiskWriteBytes - EC2
● Every metric has a MetricName (CPUUtilization) and a Namespace (AWS/EC2)
● Dimension .. name/value pair
○ CPUUtilization Name = InstanceId, Value i-111111111
○ CPUUtilization Name = InstanceId, Value i-22222222
● Can also be used for data aggregation
○ AutoScalingGroupName, ImageId, InstanceId, InstanceType
● Resolution .. Standard (60s granularity)..High (1s)
● Retention .. sub 60s retained for 3 hours
● 60s (1 minute) retained for 15 days
● 300s (5 minute) retained for 63 days
● 3600s (1 hour) retained for 455 days
● As data ages, it’s aggregated and stored for longer at lower resolution
● Statistics .. aggregation over a period (Min, Max, Sum, Average..)
● Percentile - (p95 & p97.5)
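A sketch of publishing a custom datapoint tying these concepts together (the namespace, metric and dimension values are hypothetical):

    # One datapoint: Namespace + MetricName + Dimension + Value/Unit
    aws cloudwatch put-metric-data \
        --namespace "Custom/MyApp" \
        --metric-name "PageViews" \
        --dimensions InstanceId=i-1111111111 \
        --value 1 --unit Count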

CloudWatch - Alarms
● Alarm - Watches a metric over a time period
● ..ALARM or OK
● ..Value of metric vs threshold ..over time
● ..one or more actions
● Alarm Resolution
CloudWatch - Data Architecture

CloudWatch Logs
● CloudWatch logs is a product which can store, manage and provide access to
logging data for on-premises and AWS environments including systems and
applications
● It can also, via subscription filters, stream the data to Lambda, Elasticsearch, Kinesis
Streams and Firehose for further delivery
● Metric filters can be used to generate metrics within CloudWatch, alarms and
eventual events within EventBridge.

CloudWatch Logs - Ingestion


● Public Service - Store, Monitor, Access logging data
● AWS, On-Prem, IOT or any application
● CWAgent - system or custom application logging
● VPC Flow Logs
● CloudTrail …Account events and AWS API Calls
● Elastic Beanstalk, ECS Container Logs, API GTW .. Lambda execution logs
● Route53 - Log DNS Requests
CloudWatch Logs - Subscriptions
CloudWatch Logs - Aggregation

CloudWatch Logs - Summary


● For any log management scenarios default to CloudWatch Logs
● On-Premises AND AWS
● Export to S3 ..CreateExportTask - 12 hours
● Near Realtime or persist logs - Kinesis Firehose
● Firehose for any ‘firehose destinations’
● Realtime - Lambda or Kinesis Data Stream (KCL consumers)
● Elasticsearch - AWS Managed Lambda
● Metric Filter … scan log data, generate a CloudWatch Metric
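A sketch of a metric filter turning log lines containing ERROR into a metric (the group and metric names are hypothetical):

    aws logs put-metric-filter \
        --log-group-name "/myapp/production" \
        --filter-name "ErrorCount" \
        --filter-pattern "ERROR" \
        --metric-transformations metricName=ErrorCount,metricNamespace=Custom/MyApp,metricValue=1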

Athena

Athena - Basic
● Amazon Athena is an interactive query service that makes it easy to analyze data in
Amazon S3 using standard SQL.
○ Athena is serverless, so there is no infrastructure to manage, and you pay
only for the queries that you run.
● Athena is easy to use.
○ Simply point to your data in Amazon S3, define the schema, and start
querying using standard SQL.
○ Most results are delivered within seconds.
○ With Athena, there’s no need for complex ETL jobs to prepare your data for
analysis.

DB, Data Analytics & Streaming

Kinesis Data Streams


● Kinesis data streams are a streaming service within AWS designed to ingest large
quantities of data and allow access to that data for consumers.
● Kinesis is ideal for dashboards and large scale real time analytics needs.
● Kinesis data firehose allows the long term persistent storage of kinesis data onto
services like S3
● This lesson ends by evaluating the differences between SQS and Kinesis, and
identifying key factors in exam questions which suggest picking one vs the other.

Kinesis Data Streams - Concepts


● Kinesis is a scalable streaming service
● Producers send data into a kinesis stream
● Streams can scale from low to near-infinite data rates
● Public service & highly available by design
● Streams store a 24-hour moving window of data
● Multiple consumers access data from that moving window

Kinesis Data Streams - Architecture

SQS vs Kinesis
● Kinesis
○ Designed for huge scale ingestion of data
○ Multiple consumers ..rolling window
○ Ingestion of data at scale
○ Large throughput
○ Large numbers of devices
○ Persistence
○ High data rates
○ Consuming data at different rates
○ The consumer can consume data either in real-time or periodically
○ Move backward and forwards through time
○ Data ingestion, Analytics, Monitoring, App Clicks
● SQS
○ SQS 1 production group, 1 consumption group
○ Decoupling and Asynchronous communication
○ Persistence of messages, no window
○ Worker pools

Kinesis Data Firehose


● Kinesis Data Firehose is a stream based delivery service capable of delivering high
throughput streaming data to supported destinations in near realtime.
● It’s a member of the Kinesis family and for the PRO level exam it’s critical to have a
good understanding of how it functions in isolation and how it integrates with AWS
products and services.

Kinesis Data Firehose


● Fully managed service to load data for data lakes, data store, and analytics services
● Automatic scaling … fully serverless.. resilient
● Near Real-Time delivery (delay of ~60 seconds)
● Supports transformation of data on the fly (lambda)
○ May have latency due to the type of transformation
● Billing - volume through firehose

Kinesis Data Firehose - Architecture

Kinesis Data Analytics


● Amazon Kinesis Data Analytics is the easiest way to analyze streaming data, gain
actionable insights, and respond to your business and customer needs in realtime.
● It is part of the Kinesis family of products and is capable of operating in realtime on
high throughput streaming data.

Kinesis Data Analytics


● Real time processing of data
● ..Using Structured Query Language (SQL)
● Destinations
○ Near Real Time: Firehose (S3, Redshift, ElasticSearch & Splunk). Or any
other indirect service
○ RealTime:
■ AWS Lambda
■ Kinesis Data Streams

Kinesis Data Analytics - Architecture

Kinesis Data Analytics - When and Where


● Streaming data needing real-time SQL processing
● Time-series analytics … elections / e-sports
● Real-time dashboards - leaderboards for games
● Real-time metrics - Security & Response teams

MapReduce

MapReduce
● Data Analysis Architecture - huge scale, parallel processing
● Two Main Phases - MAP and REDUCE
● Optional - Combine & Partition
● Data is separated into ‘splits’ .. each assigned to a mapper
● Perform Operations at scale - Customisable
● Recombine Data into Results
● HDFS - Hadoop Distributed File System
● Highly Fault-tolerant - replicated between nodes
● Name Node - provides the ‘namespace’ for file system & controls access to HDFS
● Block ..segment of data on HDFS .. generally 64MB

EMR Architecture
● Elastic Map Reduce (EMR) is the AWS Managed implementation of EMR/Hadoop
within AWS.

EMR Architecture
● AWS Managed Implementation of Apache Hadoop
● .. and Spark, HBase, Presto, Flink, Hive, Pig…
● Can be operated long term .. or use ad-hoc (transient) clusters
● Runs in ONE AZ in a VPC using EC2 for compute
● Auto scales - Spot, Instance Fleet, Reserved, On-Demand
● Big data processing , manipulation, analytics, indexing, transformation and more
… (data pipeline *)

Amazon Redshift
● Redshift is a column based, petabyte scale, data warehousing product within AWS
● It’s designed for OLTP products within AWS/on-premises to add data to, for long term
processing, aggregation and trending.

Redshift - Basic
● Petabyte-scale Data warehouse
● OLAP (Column based) not OLTP (row/transaction)
● Pay as you use … similar structure to RDS
● Direct Query S3 using Redshift Spectrum
● Direct Query other DBs using federated query
● Integrates with AWS tooling such as Quicksight
● SQL-like interface JDBC/ODBC connections

Redshift Architecture
● Server based (not serverless)
● One AZ in a VPC - Network cost/performance
● Leader Node - Query input, planning and aggregation
● Compute Node - performing queries of data
● VPC Security, IAM Permissions, KMS at rest Encryption, CW Monitoring
● Redshift Enhanced VPC Routing - VPC Networking
Amazon Quicksight
● QuickSight is a BA/BI Visualisation and Dashboard tool which is capable of
integrating with AWS and external data sources

Amazon QuickSight
● Business Analytics & Intelligence (BA/BI) service
● Visualizations, Ad-hoc Analysis
● Discovery and Integration with AWS Data Sources
● .. and works with external data sources
● In the exam … dashboards or visualisation

Amazon QuickSight - Sources


● Athena, Aurora, Redshift, Redshift Spectrum
● S3, AWS IOT
● Jira, GitHub, Twitter, SalesForce
● Microsoft SQL Server, MySQL, PostgreSQL
● Apache Spark, Snowflake, Presto, Teradata

Amazon Athena
● Amazon Athena is an interactive query service that makes it easy to analyze data in
Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to
manage, and you pay only for the queries that you run.
● Athena is easy to use. Simply point to your data in Amazon S3, define the schema,
and start querying using standard SQL. Most results are delivered within seconds.
With Athena, there’s no need for complex ETL jobs to prepare your data for analysis.

Amazon Athena - Basic


● Serverless Interactive Querying Service
● Ad-hoc queries on data - pay only data consumed
● Schema-on-read - table-like translation
● Original data never changed - remains on S3
● Schema translates data => relational-like when read
● Output can be sent to other services

Amazon Athena - Exam


● Queries where loading/transformation isn’t desired
● Occasional / Ad-hoc queries on data in S3
● Serverless querying scenarios - cost conscious
● Querying AWS logs - VPC Flow Logs, CloudTrail, ELB logs, cost reports etc…
● AWS Glue Data Catalog & Web Server Logs
● w/ Athena Federated Query .. other data sources
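A hedged sketch of the kind of ad-hoc query this enables, assuming a vpc_flow_logs table has already been defined over S3 data (schema-on-read) and using a hypothetical results bucket:

    # Top rejected source/destination pairs from VPC Flow Logs
    aws athena start-query-execution \
        --query-string "SELECT srcaddr, dstaddr, COUNT(*) AS rejections FROM vpc_flow_logs WHERE action='REJECT' GROUP BY srcaddr, dstaddr ORDER BY rejections DESC LIMIT 10" \
        --result-configuration "OutputLocation=s3://examplebucket/athena-results/"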

System Manager
SSM Architecture and Agent Activation
● Systems Manager uses an agent architecture to allow communications between the
systems manager service and managed instances.

AWS System Manager


● View and control AWS and on-premises infrastructure
● Agent based - installed on Windows and Linux AWS AMIs
● Manage Inventory & Patch Assets
● Run commands & Manage Desired State
● Parameter Store … configuration and secrets*
● Securely connect to EC2 .. even in private VPCs

AWS System Manager - Agent

SSM Run Command


● Systems Manager Run Command is a foundational feature of Systems manager
which allows for commands to be executed on managed instances at scale
● It uses command documents, which define WHAT is executed - AWS Systems
Manager documents

AWS Run Command


● Run ‘Command documents’ on managed instances
● No SSH/RDP Access Required
● Instances, Tags or Resource Groups
● Command documents can be reused & can have parameters
● Rate Control - Concurrency & Error Threshold
● Output Options - S3 .. SNS
● EventBridge (*CloudWatch Events) Target
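A sketch of running a command across tagged instances with rate control (the tag, values and bucket are hypothetical; AWS-RunShellScript is one of the pre-configured documents):

    aws ssm send-command \
        --document-name "AWS-RunShellScript" \
        --targets "Key=tag:Environment,Values=Production" \
        --parameters 'commands=["uptime"]' \
        --max-concurrency "10%" \
        --max-errors "1" \
        --output-s3-bucket-name "examplebucket"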
SSM Documents
● An AWS Systems Manager document (SSM document) defines the actions that
Systems Manager performs on your managed instances.
● Systems Manager includes more than 100 pre-configured documents that you can
use by specifying parameters at runtime.
● Documents use JavaScript Object Notation (JSON) or YAML, and they include steps
and parameters that you specify.

SSM Documents
● JSON or YAML documents
● Stored in the SSM Document Store
● Ask for Parameters and include Steps
● Command Document - Run Command, State Manager & Maintenance Windows
● Automation Document - Automation, State Manager & Maintenance Windows
● Package Document - Distributor
○ Contains a .ZIP payload
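A minimal command-document sketch (schemaVersion 2.2; the parameter and step content are hypothetical):

    schemaVersion: '2.2'
    description: Example command document
    parameters:
      Message:
        type: String
        default: "Hello from SSM"
    mainSteps:
      - action: aws:runShellScript   # plugin executed by the SSM agent
        name: echoMessage
        inputs:
          runCommand:
            - echo "{{ Message }}"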
SSM Inventory & SSM Patching
● Patch Manager, a capability of AWS Systems Manager, automates the process of
patching managed instances with both security related and other types of updates.

Patch Manager - Concepts


● Patch Baseline
○ PreDefined ones, Custom ones and there are different ones depending on
OS
● Patch Groups
○ Groups of resources (specific resources)
● Maintenance Windows
● Run Command
○ Base level functionality
● Concurrency & Error Threshold
● Compliance
● Predefined Patch Baselines - Various OS (you can also create your own)
● For Linux - AWS-[OS]DefaultPatchBaseline, explicitly define patches
○ .. AWS-AmazonLinux2DefaultPatchBaseline
○ .. AWS-UbuntuDefaultPatchBaseline
● Windows - AWS-DefaultPatchBaseline - Critical and Security Updates
● AWS-WindowsPredefinedPatchBaseline-OS - Same as above
● AWS-WindowsPredefinedPatchBaseline-OS-Applications - Same as above + MS App Updates

Patch Manager - Patching Architecture

Config & Service Catalog

AWS Config
● AWS Config is a service which records the configuration of resources over time
(configuration items) into configuration histories.
● All the information is stored regionally in an S3 config bucket.
● AWS Config is capable of checking for compliance .. and generating notifications and
events based on compliance.

AWS Config
● Record configuration changes over time on resources
● Auditing of changes, compliance with standards
● Does not prevent changes happening … no protection
● Regional service … supports cross-region and account aggregation
● Changes can generate SNS notifications and near-realtime events via EventBridge &
Lambda

AWS Config - Architecture

AWS Service Catalog


● AWS Service Catalog is an AWS implementation of a Service Catalog system,
● It provides an end-user portal where products and portfolios can be deployed in a
self-service way as defined by technical administrators.

What is a Service Catalog


● Document or Database created by an IT Team
● Organized collection of products
● Offered by the IT Team
● Key Product Information:
○ Product Owner
○ Cost
○ Requirements
○ Support Information
○ Dependencies
● Defines approval of provisioning from IT and Customer side
● Manage costs and scale service delivery
AWS Service Catalog
● Self-Service Portal for ‘end users’
● … launch predefined (by admin) products
● End user permissions can be controlled
● Admins can define those products
● .. and the permissions required to launch them
● Build products into portfolios

AWS Service Catalog - Architecture

Inspector, Trusted Advisor & GuardDuty

AWS Inspector
● Amazon Inspector is an automated security assessment service that helps improve
the security and compliance of applications deployed on AWS.
● Amazon Inspector automatically assesses applications for exposure, vulnerabilities,
and deviations from best practices

Inspector - Basic
● Scans EC2 instances & the instance OS (and any other networking components
involved)
○ It’s not checking AMI’s or the applications themselves
● ..Vulnerabilities and deviations against best practice
● Length…15 min, 1 hour, 8/12 hours or 1 day
● Provides a report of findings ordered by priority
● Network Assessment (Agentless)
○ But you can add an agent to provide additional richer information

Inspector - Rules Packages


● Rules packages determine what is checked
● Network Reachability (no agent required)
● .. Agent can provide additional OS visibility
● Checks reachability end to end
○ EC2, ALB, DX, ELB, ENI, IGW, ACLs, RT’s, SG’s, Subnets, VPC, VGWs &
VPC Peering
● The network reachability package returns the following types of findings
○ RecognizedPortWithListener
■ Is the port exposed to public networks, and is the operating system
listening on that port?
○ RecognizedPortNoListener
■ It then checks if it’s a recognized port but with no listener, so where it’s
exposed to the internet but nothing on the operating system is
listening
○ RecognizedPortNoAgent
■ And then if you don’t use an agent it can identify any recognized port
which are exposed. But if there is no agent to check if the operating
system is listening, it can’t confirm that
■ This is why using an agent always adds more information versus no
agent
○ UnrecognizedPortWithListener
■ It can identify any unrecognized ports which are exposed with
listeners on the operating system

Inspector - HostPackages
● Packages (..Host Assessments, Agent required)
● Common vulnerabilities and exposures (CVE)
● Center for Internet Security Benchmarks (CIS)
● Security best practices for Amazon Inspector

AWS GuardDuty
● GuardDuty is an automatic threat detection service which reviews data from
supported services and attempts to identify any events outside of the 'norm' for a
given AWS account or Accounts.

● Continuous security monitoring service


● Integrates with and analyses supported Data Sources
● …plus AI/ML, plus threat intelligence feeds
● Identifies unexpected and unauthorized activity
● …notify or event-driven protection/remediation
● Supports multiple accounts (MASTER and MEMBER accounts)
● Input data Includes
○ CloudTrail Logs: Unusual API calls, unauthorized deployments
○ VPC Flow Logs: Unusual internal traffic, unusual IP addresses
○ DNS Logs: Compromised EC2 instances sending encoded data with DNS
queries
● Notifies you in case of findings with SNS
● Integration with AWS Lambda

Amazon Macie
● Amazon Macie is a data visibility security service that helps classify and protect your
sensitive and business-critical content

Macie
● Available in the N. Virginia and Oregon Regions

Trusted Advisor
● AWS Trusted Advisor is an online tool that provides you real time guidance to help
you provision your resources following AWS best practices.
● Trusted Advisor checks help optimize your AWS infrastructure, increase security and
performance, reduce your overall costs, and monitor service limits.

● Account level - no agents to install, it just works


● Cost Optimization, Performance, Security, Fault Tolerance and Service Limits
● 7 Core checks with basic & developer support plans
● Anything beyond requires Business or Enterprise

Trusted Advisor - Basic Price Plans


● Free 7 core checks (basic or developer)
● S3 Bucket Permissions - NOT OBJECTS
● Security Groups - Specific Ports Unrestricted
● IAM Use
● MFA on Root Account
● EBS Public Snapshots
● RDS Public Snapshots
● 50 Service limit checks

Trusted Advisor - Best Price Plans


● Business and Enterprise Support
● Core 7 checks, PLUS
● 115 Further checks (14 costs, 17 security, 24 fault tolerant, 10 performance and 50
service limit)
● Access via the AWS Support API
○ Checks any time required
○ Summaries and detailed information programmatically for your Trusted
Advisor checks, and requests that Trusted Advisor checks be refreshed
○ Allows you to obtain the status of each Trusted Advisor check, either all of
them or individual checks
■ This means you can code your own applications or integrate your
internal support systems with Trusted Advisor
● CloudWatch Integration - react to changes

AWS Cost Allocation

AWS Cost allocation Tags


● With Tags we can track resources that relate to each other
● With Cost Allocation Tags we can enable detailed costing reports
● Just like Tags, but they show up as columns in Reports
● Types of Cost Allocation Tags
○ AWS Generated Cost Allocation Tags
■ Automatically applied to the resource you create
■ Starts with Prefix aws: (e.g. aws:createdBy)
■ They are not applied to resources created before activation
○ User Tags
■ Defined by the user
■ Starts with Prefix user:
● Cost Allocation Tags just appear in the Billing Console
● Takes up to 24 hours for the tags to show up in the report

HA, FT and DR

High-Availability vs Fault-Tolerance vs Disaster Recovery


● High Availability (HA), Fault-Tolerance (FT) and Disaster Recovery (DR) are three
essential concepts to understand for every Solutions Architect.
● What's more important is understanding the differences between the three -
specifically HA and FT.
● Most technical people don't have a correct grasp .. this lesson aims to ensure that
you do, before starting the main course content.

High-Availability (HA)
● Aims to ensure an agreed level of operational performance, usually uptime, for a
higher than normal period
● Maximise system online time
○ 99.9% (Three 9’s) = 8.77 hours p/year downtime
○ 99.999% (Five 9’s) = 5.26 minutes p/year downtime
● Fast or automatic recovery from issues

Fault-Tolerance (FT)
● Is the property that enables a system to continue operating properly in the event of
the failure of some (one or more faults within) of its components
● Minimize outages, levels of redundancy and system components which can route
traffic and sessions around any failed components
● Operate through failure
● Expensive - because it’s much more complex

Disaster Recovery (DR)


● A set of policies, tools and procedures to enable the recovery or continuation of vital
technology infrastructure and systems following a natural or human-induced disaster
● DR is designed to keep the crucial and non replaceable parts of your system safe, so
that when a disaster occurs, you don’t lose anything irreplaceable and can rebuild
after the disaster

Summary
● High-Availability - Minimise any outages
● Fault-Tolerance - Operate Through Faults
● Disaster Recovery - Used when these don’t work

DR Tips
● Backup
○ EBS Snapshots, RDS automated backups/snapshots
○ Regular pushes to S3, S3 IA, Glacier, Lifecycle policy, Cross Region
Replication
○ From on-premises: Snowball or Storage Gateway
● High availability
○ Use Route53 to migrate DNS over from Region to Region
○ RDS Multi-AZ, ElastiCache Multi-AZ, EFS, S3
○ Site to Site VPN as a recovery from Direct Connect
● Replication
○ RDS Replication (cross region), Aurora Global Database
○ Database replication from on-premise to RDS
○ Storage Gateway
● Automation
○ CloudFormation / Elastic Beanstalk to re-create a whole new environment
○ Recover / Reboot EC2 instances with CloudWatch alarm actions
○ AWS Lambda functions for customized automations
● Chaos Testing
○ Netflix has a "simian-army" randomly terminating EC2

DR Checklist
● EFS Backup
○ AWS backup with EFS (frequency, when, retain time, lifecycle policy) -
managed
○ EFS to EFS backup
○ Multi-region: EFS -> S3 -> S3 CRR -> EFS
● Route 53 Backup
○ Use ListResourceRecordSets API for exports
○ Write your own script for imports into R53 or other DNS provider
● Elastic Beanstalk Backup
○ Saved configurations using the EB CLI or AWS Console

Storage

Elastic Block Store (EBS) Service Architecture

Elastic Block Store (EBS)


● Block Storage - raw disk allocations(volume) - can be encrypted using KMS
● …instances see block device and create a file system on this device (ext3/4, xfs)
● Storage is provisioned in ONE AZ (Resilient in that AZ)
● Attached to *one EC2 instance (or other service) over storage network
● ..detached and re-attached, not lifecycle linked to one instance..persistent
● Snapshot (backup) into S3. Create volume from snapshot (migrate between AZs)
● Different physical storage types. different sizes, different performance profiles
● Billed based on GB-month (and in some cases performance)
EBS Volume Types - General Purpose
● EBS provides a number of different volume types; this lesson covers:
● General Purpose SSD — Provides a balance of price and performance. We
recommend these volumes for most workloads.

EBS - General Purpose SSD - GP2


● Boot volumes
● Low-Latency interactive apps
● Development
● Testing

EBS - General Purpose SSD - GP3


● Standard: 3000 IOPS & 125MiB/s
○ IOPS/throughput beyond this must be provisioned explicitly - unlike GP2 it
does not scale automatically with volume size
● Cheaper(~20%) than GP2
● Use Cases
○ Virtual Desktops
○ Medium sized
○ Single instance databases such as MSSQL Server and Oracle DB
○ Low-latency interactive Apps
○ Development
○ Testing
○ Boot volumes

EBS Volume Types - Provisioned IOPS


● EBS provides a number of different volume types; this lesson covers:
● Provisioned IOPS SSD — Provides high performance for mission-critical, low-latency,
or high-throughput workloads.

EBS - Provisioned IOPS SSD (io1/io2)


● Are configurable independently of the size of the volume
● Designed for super high-performance situations
● Max of 64,000 IOPS per volume (4x GP2/GP3)
● Up to 256,000 IOPS per volume (Block Express)
● Up to 1,000 MB/s throughput
● Up to 4,000 MB/s throughput (Block Express)
● 4 GB-16 TB io1/io2
● 4 GB-64 TB Block Express
● io1 50 IOPS/GB max
● io2 500 IOPS/GB max
● Block Express 1,000 IOPS/GB max
EBS Volume Types - HDD-Based
● EBS provides a number of different volume types; this lesson covers:
● Throughput Optimized HDD — A low-cost HDD designed for frequently accessed,
throughput-intensive workloads.
● Cold HDD — The lowest-cost HDD design for less frequently accessed workloads.

EBS - HDD-based
● st1
○ Throughput Optimized
○ Fast hard drive
○ Not very agile
○ Cheap
○ Ideal for larger volumes of data
○ Designed for data sequentially accessed
■ Data which needs to be written or read in a fairly sequential way
○ From 125 GB to 16 TB in size
○ Max of 500 IOPS
■ st1 IO is measured in 1MB blocks, so 500 IOPS ≈ 500 MB/s
○ Performance of 40 MB/s/TB Base
○ Performance of 250 MB/s/TB Burst
○ Use Cases
■ Big data
■ Data warehouses
■ Log processing
● sc1
○ Cold HDD
○ Even cheaper
■ But comes with significant trade-offs
○ Designed for data infrequent workload
○ Geared towards maximum economy
○ Just for store lots of data and performance doesn’t matter
○ From 125 GB to 16 TB in size
○ Max of 250 IOPS = 250 MB/s
○ 12 MB/s/TB Base
○ 80 MB/s/TB Burst
○ Use Cases
■ Anything requiring less than a few loads or scans per day

Instance Store Volumes - Architecture


● An instance store provides temporary block-level storage for your instance. This
storage is located on disks that are physically attached to the host computer.
● Instance store is ideal for temporary storage of information that changes frequently,
such as buffers, caches, scratch data, and other temporary content, or for data that is
replicated across a fleet of instances, such as a load-balanced pool of web servers.
● An instance store consists of one or more instance store volumes exposed as block
devices.
● The size of an instance store as well as the number of devices available varies by
instance type.
● The virtual devices for instance store volumes are ephemeral[0-23]. Instance types
that support one instance store volume have ephemeral0. Instance types that
support two instance store volumes have ephemeral0 and ephemeral1, and so on.

Instance Store Volumes


● Block Storage Devices
● Physically connected to one EC2 Host
● Instances on that host can access them
● Highest storage performance in AWS
● Included in instance price
● Attached at Launch
● Lost on instance move, resize, or hardware failure
● High performance
● You pay for it anyway - included in instance price
● Temporary Data
● D3 = 4.6 GB/s throughput
● I3 = 16 GB/s of sequential throughput
● More IOPS and Throughput vs EBS
https://docs.aws.amazon.com/pt_br/AWSEC2/latest/UserGuide/InstanceStorage.html

Storage Gateway - Volume Gateway


● Storage gateway is a product which integrates local infrastructure and AWS storage
such as S3, EBS Snapshots and Glacier.
● This lesson looks at Volume Gateway - Stored and Volume Gateway - Cached
● It explores the features and architectures which each supports.

Storage Gateway - Basic


● Virtual machine (or hardware appliance*)
● Presents storage using iSCSI, NFS or SMB
● Integrates with EBS, S3 and Glacier within AWS
● Migrations, Extensions, Storage Tiering, DR and Replacement of backups
systems
● For the exam … picking the right mode…
Storage Gateway - Volume Stored

Storage Gateway - Volume Cached

Storage Gateway - Tape Gateway(VTL)


● Storage gateway in VTL mode allows the product to replace a tape based backup
solution with one which uses S3 and Glacier rather than physical tape media.

Storage Gateway Tape(VTL) - Basic


● Large backups => TAPE
● LTO-9 Media can hold 24TB Raw Data
● … up to 60TB Compressed
● 1 Tape Drive can use 1 tape at a time
● Loaders (Robots) can swap tapes
● A Library is 1+ drive(s), 1+ loader(s) and slots
● Drive … library … shelf (anywhere but the library)
Traditional Tape Backup

Storage Gateway Tape(VTL)

Storage Gateway - File Gateway


● File gateway bridges local file storage over NFS and SMB with S3 Storage
● It supports multi site, maintains storage structure, integrates with other AWS products
and supports S3 object lifecycle Management
● Links Mentioned
○ NotifyWhenUploaded - Storage Gateway

Storage Gateway - File


● Bridged on-premises file storage and S3
● Mount Points (shared) available via NFS or SMB
● Map directly onto an S3 Bucket
● Files stored into a mount point, are visible as objects in an S3 buckets
● Read and Write Caching ensure LAN-like performance

Storage Gateway - Multiple Contributors

● NotifyWhenUploaded - Storage Gateway


Storage Gateway - Multiple Contributors and Replication

Storage Gateway - File Gateway

S3 Security (Resource Policies & ACL)


● S3 Security is controlled via a combination of Identity Policies, Bucket Policies
(Resource Policies) and Legacy Bucket and Object ACLs
● This lesson introduces bucket policies and warns you off using ACLs
● Bucket Policy Examples ::
○ https://docs.aws.amazon.com/AmazonS3/latest/dev/example-bucket-policies.
html

S3 Bucket Policies
● A form of resource policy
● Like identity policies, but attached to a bucket
● Resource perspective permissions
● ALLOW/DENY same or different accounts
● ALLOW/DENY Anonymous principals
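A minimal sketch of a bucket policy allowing anonymous read of all objects (the bucket name is hypothetical):

    {
      "Version": "2012-10-17",
      "Statement": [{
        "Sid": "PublicRead",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::examplebucket/*"
      }]
    }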

Access Control Lists (ACLs)


● ACLs on objects and bucket
● A Subresource
● Legacy
● Inflexible & simple permissions

ACL permissions - effect on a Bucket vs effect on an Object:

● READ - Bucket: allows grantee to list the objects. Object: allows grantee to read the
object data and its metadata
● WRITE - Bucket: allows grantee to create, overwrite and delete any object in the
bucket. Object: not applicable
● READ_ACP - Bucket: allows grantee to read the bucket ACL. Object: allows grantee
to read the object ACL
● WRITE_ACP - Bucket: allows grantee to write the ACL for the applicable bucket.
Object: allows grantee to write the ACL for the applicable object
● FULL_CONTROL - Bucket: allows grantee the READ, WRITE, READ_ACP, and
WRITE_ACP permissions on the bucket. Object: allows grantee the READ,
READ_ACP, and WRITE_ACP permissions on the object
Block Public Access

S3 Security - Exam
● Identity: Controlling different resources
● Identity: You have a preference for IAM
● Identity: Same Account
● Bucket: Just controlling S3
● Bucket: Anonymous or Cross-Account
● ACLs: NEVER - unless you must

S3 Static Hosting
● Accessing S3 is generally done via APIs
● Static Website Hosting is a feature of the product which lets you define a HTTP
endpoint, set index and error documents and use S3 like a website.
● This lesson explores the functionality and some common usages.
● S3 Pricing : https://aws.amazon.com/s3/pricing/

S3 Static Hosting
● Normal access is via AWS APIs
● This feature allows access via HTTP
● Index and Error documents are set
● Website Endpoint is created
● Custom Domain via R53 - BucketName Matters
Static Website Hosting

S3 Object Versioning & MFA

Object Versioning

● Versioning lets you store multiple versions of objects within a bucket


● Operations which would modify objects generate a new version
● …cannot be switched off - only suspended
● Space is consumed by All versions
● You are billed for ALL versions
● Only way to 0 costs - is to delete the bucket

MFA Delete
● Enabled in versioning configuration
● MFA is required to change bucket versioning state
● MFA is required to delete versions
● Serial number (MFA) + Code passed with API CALLS

S3 Object Encryption
● This lesson steps through the various encryption options available within S3 and
finishes by looking at default bucket encryption settings
○ Client-Side Encryption
○ SSE-C
○ SSE-S3
○ SSE-KMS
● As part of the lesson we review how SSE-KMS impacts permissions and how it can
achieve role separation.

S3 Encryption
● Buckets aren’t encrypted.. objects are..
● Encryption AT REST
○ Client-Side Encryption
○ Server-Side Encryption
● Server-Side Encryption with Customer-Provided (SSE-C)
● Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3)
● Server-Side Encryption with Customer Master Keys (CMKs) Stored in AWS Key
Management Service (SSE-KMS)
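A sketch of selecting these modes at upload time via the CLI (the bucket and key alias are hypothetical):

    # SSE-S3 (AES256)
    aws s3 cp confidential.txt s3://examplebucket/ --sse AES256
    # SSE-KMS with a specific customer master key
    aws s3 cp confidential.txt s3://examplebucket/ --sse aws:kms --sse-kms-key-id alias/mykey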

SSE-C
SSE-S3(AES256)

SSE-KMS
Summary

Bucket Default Encryption

S3 Object Store Classes


● This lesson steps through the S3 storage classes
● Covers S3 Standard, S3 Standard-IA and S3 One Zone-IA, S3 Glacier, S3 Glacier
Deep Archive and Intelligent-Tiering
● https://aws.amazon.com/s3/pricing/
● https://aws.amazon.com/s3/storage-classes/

S3 Standard
● Objects are replicated across at least 3 AZ in a Region
● 99.999999999% durability (11 9’s)
● Replication over 3 AZ & content-MD5 Checksums and Cyclic Redundancy
Checks(CRCs) are used to detect and fix any data corruption
● HTTP/1 200 OK response is provided by S3 API Endpoints when Objects are stored
● Billed a GB/m fee for data stored
○ a $ per GB charge for transfer OUT (IN is free) and a price per 1,000 requests
○ No specific retrieval fee, no minimum duration, no minimum size
● Milliseconds first byte latency and objects can be made publicly available
● Use Cases
○ Frequently Accessed Data which is important
○ Non Replaceable

S3 Standard-IA
● Objects are replicated across at least 3 AZ in a Region
● 99.999999999% durability (11 9’s)
● Replication over 3 AZ & content-MD5 Checksums and Cyclic Redundancy
Checks(CRCs) are used to detect and fix any data corruption
● Billed a GB/m retrieval fee
○ Overall cost increases with frequent data access
○ a $ per GB charge for transfer OUT (IN is free) and a price per 1,000 requests
● Minimum duration charge of 30 days
○ Objects can be stored for less, but the minimum billing always applies
● Minimum capacity of 128KB per object
● Use Case
○ Long-lived data which is important
○ But where access is infrequent
S3 One Zone-IA
● Objects are stored in only 1 AZ in a Region
● 99.999999999% durability (11 9’s)
● Replication within only 1 AZ
○ Assuming that the AZ where the data is stored doesn’t fail during the time
period
● Billed a GB/m retrieval fee
○ Overall cost increases with frequent data access
● Minimum duration charge of 30 days
○ Objects can be stored for less, but the minimum billing always applies
● Minimum capacity of 128KB per object
● Use Case
○ Long-lived data
○ Non-critical & replaceable
○ Infrequent access
S3 Glacier
● Objects are replicated across at least 3 AZ in a Region
● 99.999999999% durability (11 9’s)
● Replication over 3 AZ & content-MD5 Checksums and Cyclic Redundancy
Checks(CRCs) are used to detect and fix any data corruption
● Data in Glacier is retrieved to S3 Standard-IA temporarily
○ Expedited - 1-5 minutes
○ Standard - 3-5 hours
○ Bulk - 5-12 hours
○ Faster = More Expensive
○ First byte latency = minutes or hours
● Object cannot be made publicly accessible…any access of data (beyond object
metadata) requires a retrieval process
● 40 KB min size
● 90 day min Duration
● Use Case
○ Archival data where frequent or realtime access isn’t needed
■ Minutes - hours retrieval

S3 Glacier Deep Archive


● Objects are replicated across at least 3 AZ in a Region
● 99.999999999% durability (11 9’s)
● Replication over 3 AZ & content-MD5 Checksums and Cyclic Redundancy
Checks(CRCs) are used to detect and fix any data corruption
● Object cannot be made publicly accessible…any access of data (beyond object
metadata) requires a retrieval process
● Data in Glacier is retrieved to S3 Standard-IA temporarily
○ Standard - 12 hours
○ Bulk - up to 48 hours
○ First byte latency = minutes or hours
○ Much longer than Glacier
● Use Case
○ Archival data that rarely if ever needs to be accessed - hours or days for
retrieval
■ Legal or regulation data storage

S3 Intelligent-Tiering
● Tiers
○ Frequent Access Tier
○ Infrequent Access
○ Archive
○ Deep Archive
● Intelligent-Tiering monitors and automatically moves any objects not accessed for 30
days to a low cost infrequent access tier and eventually to archive or deep archive
tiers
● As objects are accessed, they are moved back to the frequent access tier
○ There are no retrieval fees for accessing objects, only a 30 day minimum
duration
● Has a monitoring and automation cost per 1,000 objects
○ The frequent access tier costs the same as S3 Standard, the infrequent the
same as Standard-IA; Archive and Deep Archive are comparable to their
Glacier equivalents
● Use Cases
○ Long-lived data
○ With changing and unknown patterns
S3 Lifecycle Configuration

S3 Lifecycle Configuration - Basic


● A lifecycle configuration is a set of rules
● Rules consist of actions..
● .. on a Bucket or groups of objects
● Transition Actions
● Expiration Actions
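A sketch combining both action types (the bucket, prefix and timings are hypothetical):

    aws s3api put-bucket-lifecycle-configuration --bucket examplebucket \
      --lifecycle-configuration '{
        "Rules": [{
          "ID": "ArchiveThenExpireLogs",
          "Filter": {"Prefix": "logs/"},
          "Status": "Enabled",
          "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 90, "StorageClass": "GLACIER"}
          ],
          "Expiration": {"Days": 365}
        }]
      }'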

S3 Lifecycle Configuration - Transitions


S3 PreSigned URL
● Presigned URL's are a feature of S3 which allows the system to generate a URL with
access permissions encoded into it, for a specific bucket and object, valid for a
certain time period.

Presigned URL

● You can create a URL for an object you have no access to
● When using the URL, the permissions match the identity which generated it
● Access denied could mean the generating ID never had access .. or doesn’t now
● Don’t generate with a role .. URL stops working when temporary credentials
expire…
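A sketch (the bucket and key are hypothetical) - the URL embeds the permissions of whoever runs the command:

    # URL valid for 1 hour (3600 seconds)
    aws s3 presign s3://examplebucket/report.pdf --expires-in 3600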

S3 Select & Glacier Select


● S3 and Glacier Select allow you to use a SQL-Like statement to retrieve partial
objects from S3 and Glacier.
S3 Select and Glacier Select
● S3 can store HUGE objects (up to 5TB)
● You often want to retrieve the entire object
● Retrieving a 5TB object .. takes time, uses 5TB
● Filtering at the client side doesn’t reduce this
● S3/Glacier select let you use SQL-Like statements…
● .. to select part of the object, pre-filtered by S3
● CSV, JSON, Parquet, BZIP2 compression for CSV and JSON
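A sketch of a server-side filtered retrieval (the bucket, key and query are hypothetical):

    aws s3api select-object-content \
        --bucket examplebucket --key data.csv \
        --expression "SELECT s.name FROM S3Object s WHERE CAST(s.age AS INT) > 30" \
        --expression-type SQL \
        --input-serialization '{"CSV": {"FileHeaderInfo": "USE"}}' \
        --output-serialization '{"CSV": {}}' \
        results.csv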

Cross-Origin Resource Sharing(CORS)


● Cross-origin resource sharing (CORS) defines a way for client web applications that
are loaded in one domain to interact with resources in a different domain.
● With CORS support, you can build rich client-side web applications with Amazon S3
and selectively allow cross-origin access to your Amazon S3 resources.
● https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS
Cross-Origin Resource Sharing (CORS)

● Simple Requests
● Preflight & Preflighted requests
● Access-Control-Allow-Origin
● Access-Control-Max-Age
● Access-Control-Allow-Methods
● Access-Control-Allow-Headers
● Define Origin (original URL used)..other domains need CORS
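A sketch of a bucket CORS configuration allowing one external origin (the origin is hypothetical):

    aws s3api put-bucket-cors --bucket examplebucket --cors-configuration '{
      "CORSRules": [{
        "AllowedOrigins": ["https://www.example.com"],
        "AllowedMethods": ["GET", "PUT"],
        "AllowedHeaders": ["*"],
        "MaxAgeSeconds": 3000
      }]
    }'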

S3 Events
● The Amazon S3 notification feature enables you to receive notifications when certain
events happen in your bucket.
● To enable notifications, you must first add a notification configuration that identifies
the events you want Amazon S3 to publish and the destinations where you want
Amazon S3 to send the notifications.
● You store this configuration in the notification subresource that is associated with a
bucket

S3 Event Notifications
● Notification generated when events occur in a bucket
● .. can be delivered to SNS, SQS and Lambda Functions
● Object Created (Put, Post, Copy, CompleteMultiPartUpload)
● Object Deleted (*, Delete, DeleteMarkerCreated)
● Object Restore (Post(Initiated), Completed)
● Replication (OperationMissedThreshold, OperationReplicatedAfterThreshold,
OperationNotTracked, OperationFailedReplication)

S3 Access logs
● Server access logging provides detailed records for the requests that are made to a
bucket. Server access logs are useful for many applications.
○ For example, access log information can be useful in security and access
audits.
● It can also help you learn about your customer base and understand your Amazon
S3 bill.
S3 Access Logs

S3 Object Lock
● You can use S3 Object Lock to store objects using a write-once-read-many (WORM)
model. It can help you prevent objects from being deleted or overwritten for a fixed
amount of time or indefinitely.
● You can use S3 Object Lock to meet regulatory requirements that require WORM
storage, or add an extra layer of protection against object changes and deletion.

S3 Object Lock
● Object Lock enabled on ‘new buckets’ (support request needed for existing)
● Write-Once-Read-Many (WORM) - No Delete, No Overwrite
● Requires versioning - individual versions are locked
● 1 - Retention Period
● 2 - Legal Hold
● Both, one or the other, or none
● A bucket can have default Object Lock Settings
S3 Object Lock - Retention
● Specify DAYS & YEARS - A Retention Period
● COMPLIANCE - Can’t be adjusted, deleted, overwritten
● .. even by the account root user
● .. Until retention expires
● GOVERNANCE - special permissions can be granted allowing lock settings to be
adjusted
● s3:BypassGovernanceRetention..
● …. x-amz-bypass-governance-retention:true (console default)

S3 Object Lock - Legal Hold


● Set on an object version - ON or OFF
● .. no retention
● NO deletes or Changes until removed
● s3:PutObjectLegalHold is required to add or remove
● Prevent accidental deletion of critical object versions
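A minimal sketch of both lock types with boto3 (bucket/key are hypothetical; with no VersionId these calls apply to the current object version):

import boto3
from datetime import datetime, timezone

s3 = boto3.client('s3')

# GOVERNANCE retention - adjustable with s3:BypassGovernanceRetention
s3.put_object_retention(
    Bucket='audit-bucket', Key='record.csv',
    Retention={'Mode': 'GOVERNANCE',
               'RetainUntilDate': datetime(2026, 1, 1, tzinfo=timezone.utc)},
)

# Legal hold - ON or OFF, no retention date
s3.put_object_legal_hold(
    Bucket='audit-bucket', Key='record.csv',
    LegalHold={'Status': 'ON'},
)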

S3 Access Points
● Amazon S3 Access Points, a feature of S3, simplifies managing data access at scale
for applications using shared data sets on S3.
● Access points are unique hostnames that customers create to enforce distinct
permissions and network controls for any request made through the access point.
● Creating access points - Amazon Simple Storage Service
S3 Access Points
● Simplify managing access to S3 Buckets/Objects
● Rather than 1 bucket w/ 1 Bucket Policy…
● .. create many access points
● .. each with different policies
● .. each with different network controls
● Each access point has its own endpoint address
● Created via Console or aws s3control create-access-point --name secretcats
--account-id 1234565226 --bucket catpics

Elastic File System(EFS) Architecture


● The Elastic File System (EFS) is an AWS managed implementation of NFS which
allows for the creation of shared 'filesystems' which can be mounted within multiple
EC2 instances.
● EFS can play an essential part in building scalable and resilient systems.
● Lesson Links
○ File-system permissions - Wikipedia
○ Amazon EFS performance - Amazon Elastic File System
○ EFS storage classes - Amazon Elastic File System
○ Amazon EFS lifecycle management - Amazon Elastic File System

EFS - Architecture
● EFS is an implementation of NFSv4
● EFS Filesystems can be mounted in Linux
● Shared between many EC2 Instances
● Private service, via mount targets inside a VPC
● Can be accessed from on-premises - VPN or DX

● Linux Only
● General Purpose and MAX IO Performance Modes
● General Purpose = default for 99.9% of uses
● Bursting and Provisioned Throughput Modes
● Standard and Infrequent Access (IA) Classes
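A rough boto3 sketch showing where the performance and throughput modes fit (token, subnet and security group IDs are hypothetical):

import boto3

efs = boto3.client('efs')

fs = efs.create_file_system(
    CreationToken='shared-media-fs',   # hypothetical idempotency token
    PerformanceMode='generalPurpose',  # or 'maxIO'
    ThroughputMode='provisioned',      # or 'bursting'
    ProvisionedThroughputInMibps=64,
)

# Mount targets make the filesystem reachable from subnets in the VPC
efs.create_mount_target(
    FileSystemId=fs['FileSystemId'],
    SubnetId='subnet-0123456789abcdef0',      # hypothetical
    SecurityGroups=['sg-0123456789abcdef0'],  # hypothetical
)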

FSx for Windows File Server


● FSx for Windows Servers provides a native windows file system as a service which
can be used within AWS, or from on-premises environments via VPN or Direct
Connect
● FSx is an advanced shared file system accessible over SMB, and integrates with
Active Directory (either managed, or self-hosted).
● It provides advanced features such as VSS, Data de-duplication, backups, encryption
at rest and forced encryption in transit.

FSx for Windows File Server


● Fully managed native windows file servers/shares
● Designed for integration with windows environments
● Integrates with Directory service or self-managed AD
● Single or Multi-AZ within a VPC
● On-demand and Scheduled Backups
● Accessible using VPC, Peering, VPN, Direct Connect
FSx - Key Features and Benefits
● VSS - User-Driven Restores
○ File and folder level restores
● Native Windows file system accessible over SMB
● Windows permission model
● Supports DFS ..scale-out file share structure
● Managed - no file server admin
● Integrates with DS and your own directory

FSx for Lustre

FSx for Lustre


● Managed Lustre - Designed for HPC - LINUX Clients (POSIX)
● Machine Learning, Big Data, Financial Modeling
● 100’s GB/s throughput & sub millisecond latency
● Deployment types - Persistent or Scratch
● Scratch - Highly optimised for short term workloads .. fast but no replication
● Persistent - longer term, HA(in one AZ), self-healing
● Accessible over VPN or Direct Connect
● Metadata stored on Metadata Targets(MST)
● Objects are stored on object storage targets (OSTs) (1.17TiB each)
● Baseline performance based on size
● Size - min 1.2TiB then increments of 2.4TiB
● For Scratch - Base 200 MB/s per TiB of storage
● Burst up to 1,300 MB/s per TiB (Credit System)

● Scratch is designed for pure performance


○ Short term or temp workloads
○ NO HA .. NO REPLICATION
○ Larger file system means more servers, more disks and more chance of
failure
● Persistent has replication within ONE AZ only
○ Auto-heals when hardware failure occurs
● You can backup to S3 with both!!
○ Manual or Automatic 0-35 day retention

Content Delivery Network (CDN)

CloudFront Architecture

CloudFront - Basics

CloudFront Terms
● Origin - The source location of your content
● S3 Origin or Custom Origin
● Distribution - The ‘configuration’ unit of CloudFront
● Edge Location - Local cache of your data
● Regional Edge Cache - Larger version of an edge location
○ Provides another layer of caching
CloudFront Architecture

TTL and Invalidations


● This lesson steps through how CloudFront handles object expiry and invalidation
○ Covering
○ Default TTL, Minimum TTL, Maximum TTL
○ Cache Invalidation
TTL and Invalidations

● More frequent cache HITS = lower origin load


● Default TTL (behaviour) = 24 hours (validity period)
● You can set Minimum TTL and Maximum TTL values
● Origin Header: Cache-Control max-age (seconds)
● Origin Header: Cache-Control s-maxage (seconds)
● Origin Header: Expires (Date & Time)
● Custom Origin or S3 (Via Object metadata)
● Cache Invalidation .. performed on a distribution
● … applies to all edge locations ..takes time
● /image/whiskers1.jpg
● /image/whiskers*
● /image/*
● /*
● Version file names … whiskers1_v1.jpg // _v2.jpg // _v3.jpg
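A minimal sketch of a cache invalidation via boto3 (distribution ID is hypothetical):

import boto3, time

cf = boto3.client('cloudfront')
cf.create_invalidation(
    DistributionId='E1ABCDEF234567',  # hypothetical
    InvalidationBatch={
        'Paths': {'Quantity': 1, 'Items': ['/image/*']},
        'CallerReference': str(time.time()),  # must be unique per request
    },
)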
AWS Certificate Manager
● AWS Certificate Manager is a service that lets you easily provision, manage, and
deploy public and private Secure Sockets Layer/Transport Layer Security (SSL/TLS)
certificates for use with AWS services and your internal connected resources

AWS Certificate Manager (ACM) - Basic


● HTTP - Simple and Insecure
● HTTPS - SSL/TLS Layer of Encryption added to HTTP
● Data is encrypted in-transit
● Certificates prove identity
● Chain of trust - Signed by a trusted authority
● ACM lets you run a public or private Certificate Authority (CA)
● Private CA - Applications need to trust your private CA
● Public CA - Browsers trust a list of providers, which can trust other providers

AWS Certificate Manager (ACM)


● ACM can generate or import Certificates
● If generated .. it can automatically renew
● If imported .. you are responsible for renewal
● Certificates can be deployed out to supported services
● Supported AWS Services ONLY (CloudFront and ALBs..NOT EC2)
● ACM is a regional service
● Certs cannot leave the region they are generated or imported in
● To use a cert with an ALB in ap-southeast-2 you need a cert in ACM in
ap-southeast-2
● Global Services such as CloudFront operate as though within ‘us-east-1’
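A minimal sketch of requesting a public certificate, pinned to us-east-1 for use with CloudFront (domain is hypothetical):

import boto3

acm = boto3.client('acm', region_name='us-east-1')  # CloudFront certs live here
resp = acm.request_certificate(
    DomainName='cdn.catagram.io',  # hypothetical
    ValidationMethod='DNS',
    SubjectAlternativeNames=['*.catagram.io'],
)
print(resp['CertificateArn'])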
CloudFront, SSL/TLS & SNI

CloudFront and SSL


● CloudFront Default Domain Name (CNAME)
● https://d11111abcdef8.cloudfront.net/
● SSL supported by default … *.cloudfront.net cert
● Alternate Domain Names (CNAME) ex: cdn.categram…
● Verify Ownership (optionally HTTPS) using a matching certificate
● Generate or import in ACM .. in us-east-1
● HTTP or HTTPS, HTTP => HTTPS, HTTPS only
● Two SSL Connections: Viewer => CloudFront and CloudFront => Origin
● … Both need valid public certificates (and intermediate certs)

CloudFront and SNI


● Historically..
● ..every SSL enabled site needed its own IP
● Encryption starts at the TCP connection …
● Host headers happen after that - Layer 7 // Application
● SNI is a TLS extension, allowing a host to be included
● Resulting in many SSL Certs / Hosts using a shared IP
● Old browsers don’t support SNI … CF charges extra for dedicated IP

CloudFront and SSL/SNI

CloudFront Security - OAI & Custom Origins


● This lesson covers the main ways to secure origins from direct access (bypassing
CloudFront)
○ Origin Access identities (OAI) - for S3 Origins
○ Custom Headers - For Custom Origins
○ IP Based FW Blocks - For Custom Origins.

Securing the CF Content Delivery Path

Origin Access Identity (OAI)


● An OAI is a type of identity
● It can be associated with CloudFront Distributions
● CloudFront ‘becomes’ that OAI
● That OAI can be used in S3 Bucket Policies
● DENY all BUT one or more OAI’s
Securing Custom Origins

Lambda@Edge
● Lambda@Edge allows CloudFront to run Lambda functions at CloudFront edge
locations to modify traffic between the viewer and edge location, and between edge
locations and origins.
● Lambda@Edge example functions - Amazon CloudFront

Lambda@Edge
● You can run lightweight Lambda functions at edge locations
● Adjust data between the Viewer & Origin
● Currently supports Node.js and Python
● Run in the AWS public Space (NOT VPC)
● Layers are not supported
● Different Limits vs Normal Lambda Functions
Lambda@Edge - Use Cases
● A/B testing - Viewer Request
● Migration Between S3 Origins - Origin Request
● Different Objects Based on Device - Origin Request
● Content By Country - Origin Request
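A minimal Python sketch of a viewer-request Lambda@Edge function, here doing a crude A/B split (object names are hypothetical):

def handler(event, context):
    request = event['Records'][0]['cf']['request']
    # Crude, deterministic ~50% split based on the client IP's last character
    if request['uri'] == '/index.html' and request['clientIp'][-1] in '02468':
        request['uri'] = '/index-b.html'
    return request  # returning the request lets it continue to the cache/origin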

CloudFront Geo-Restriction
● There are two common architectures for restricting access to content via CloudFront.
● The built-in feature set - CloudFront Geo Restriction - allows for White or Black list
restrictions based ONLY on Country Code
● 3rd Party Geolocation requires a compute instance, a private distribution and the
generation of signed URLs or Cookies - but can restrict based on almost anything
(licensing, user login status, user profile fields and much more)

CloudFront Geo Restriction


● CloudFront Geo Restriction
● 3rd Party Geolocation
● CF - Whitelist or Blacklist - COUNTRY ONLY
● CF - GeoIP Database 99.8%+ Accurate
● CF - Applies to the entire distribution
● 3P - Completely customisable…
3rd Party Geo Location (or anything!)

Database

DynamoDB Architecture Basics


● DynamoDB is a NoSQL fully managed Database-as-a-Service (DBaaS) product
available within AWS.
● In this lesson I step through the key architectural components and features you will
need to understand for the exam.

DynamoDB
● NoSQL Public Database-as-a-Service(DBaaS) - Key/Value & Document
● No self-managed servers or infrastructure
● Manual / Automatic provisioned performance IN/OUT or On-Demand
● Highly Resilient …across AZs and optionally global
● Really fast..single-digit milliseconds (SSD based)
● Backups, point-in-time recovery, encryption at rest
● Event-Driven integration … do things when data changes
DynamoDB Tables

DynamoDB On-Demand Backups


Point-in-Time Recovery (PITR)

DynamoDB Considerations
● NoSQL…preference DynamoDB in the exam
● Relational Data … generally NOT DynamoDB
● Key/Value .. preference DynamoDB in the exam
● Access via console, CLI, API .. ‘NO SQL’
● Billed based on RCU, WCU, Storage and features

DynamoDB Operations, Consistency and Performance

Reading and Writing


● On-Demand - unknown, unpredictable, low admin
● On-Demand - price per million R or W units
● Provisioned…RCU and WCU set on a per table basis
● Every operation consumes at least 1 RCU/WCU(*)
● 1 RCU is 1 x 4KB read operation per second(*)
● 1 WCU is 1 x 1KB write operation per second
● Every table has a RCU and WCU burst pool (300 seconds)
Query

Scan

DynamoDB Consistency Model


WCU Calculation
● If you need to store 10 ITEMS per second … 2.5K average size per ITEM
● Calculate WCU per item…ROUND UP (ITEM SIZE / 1 KB) (3)
● Multiply by average number written per second (10)
● = WCU Required (30)

RCU Calculation
● If you need to retrieve 10 ITEMS per second … 2.5K average size
● Calculate RCU per item … ROUND UP (ITEM SIZE / 4KB) (1)
● Multiply by average read ops per second (10)
● Strongly Consistent RCU Required (10)
● 50% of strongly consistent = Eventually Consistent RCU Required (5)
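The same arithmetic as a small Python helper:

import math

def wcu(items_per_sec, item_kb):
    # 1 WCU = one 1KB write per second
    return items_per_sec * math.ceil(item_kb / 1)

def rcu(items_per_sec, item_kb, eventually_consistent=False):
    # 1 RCU = one 4KB strongly consistent read per second
    units = items_per_sec * math.ceil(item_kb / 4)
    return units / 2 if eventually_consistent else units

print(wcu(10, 2.5))        # 30
print(rcu(10, 2.5))        # 10
print(rcu(10, 2.5, True))  # 5.0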

DynamoDB Indexes (LSI and GSI)


● Local Secondary Indexes (LSI) and Global Secondary Indexes (GSI) allow for an
alternative presentation of data stored in a base table.
● LSI allow for alternative SK's whereas with GSIs you can use alternative PK and SK.
● This lesson details the difference between them - and steps through an example way
they can help improve database query performance.

DynamoDB Indexes
● Query is the most efficient operation in DDB
● Query can only work on 1 PK value at a time..
● .. and optionally a single, or range of SK values
● Indexes are alternative views on table data
● Different SK (LSI) or Different PK and SK (GSI)
● Some or all attributes (projection)

Local Secondary Indexes (LSI)


● LSI is an alternative view for a table
● MUST be created with a table
● 5 LSI’s per base table
● Alternative SK on the table
● Shares the RCU and WCU with the table
● Attributes - ALL, KEYS_ONLY & INCLUDE

Global Secondary Indexes (GSI)


● Can be created at any time
● Default limit of 20 per base table
● Alternative PK and SK
● GSI’s have their own RCU and WCU allocations
● Attributes - ALL, KEYS_ONLY & INCLUDE

LSI and GSI Considerations


● Careful with projection (KEYS_ONLY, INCLUDE, ALL)
● Queries on attributes NOT projected are expensive
● Use GSI as default, LSI only when strong consistency is required
● Use indexes for alternative access patterns
DynamoDB Streams and Triggers
● DynamoDB Streams are a 24 hour rolling window of time ordered changes to ITEMS
in a DynamoDB table
● Streams have to be enabled on a per table basis, and have 4 view types
○ KEYS_ONLY
○ NEW_IMAGE
○ OLD_IMAGE
○ NEW_AND_OLD_IMAGES
● Lambda can be integrated to provide trigger functionality - invoking when new entries
are added on the stream.

Streams Concepts
● Time ordered list of ITEM CHANGES in a table
● 24-Hour rolling window
● Enabled on a per table basis
● Records INSERTS, UPDATES and DELETES
● Different view types influence what is in the stream

DynamoDB Streams

Trigger Concepts
● ITEM changes generate an event
● That event contains the data which changed
● An action is taken using that data
● AWS = Stream + Lambda
● Reporting & Analytics
● Aggregation, Messaging or Notifications
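A minimal sketch of the Lambda side of a trigger, assuming a stream with the NEW_AND_OLD_IMAGES view type:

def handler(event, context):
    for record in event['Records']:
        if record['eventName'] == 'INSERT':
            # e.g. aggregate, message or notify using the new data
            print('New item:', record['dynamodb']['NewImage'])
        elif record['eventName'] == 'REMOVE':
            print('Deleted item:', record['dynamodb']['OldImage'])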
DynamoDB Triggers

DynamoDB Accelerator
● DynamoDB Accelerator (DAX) is an in-memory cache designed specifically for
DynamoDB.
● It should be your default choice for any DynamoDB caching related questions.

Traditional Caches vs DAX


DAX Architecture

DAX Considerations
● Primary NODE (Write) and Replicas (Read)
● Nodes are HA .. Primary failure = election
● In-Memory cache - Scaling … Much faster reads, reduced costs
● Scale UP and Scale OUT (Bigger or More)
● Supports write-through
● DAX Deployed WITHIN a VPC

DynamoDB Global Tables


● DynamoDB Global Tables provides multi-master global replication of DynamoDB
tables which can be used for performance, HA or DR/BC reasons.

DynamoDB Global Tables


● Global tables provides multi-master cross-region replication
● Tables are created in multiple regions and added to the same global table (becoming
replica tables)
● Last writer wins is used for conflict resolution
● Reads and Writes can occur to any region
● Generally sub-second replication
● Strongly consistent reads only in the same region as writes
DynamoDB Time-To-Live (TTL)
● Amazon DynamoDB Time to Live (TTL) allows you to define a per-item timestamp to
determine when an item is no longer needed.
● Shortly after the date and time of the specified timestamp, DynamoDB deletes the
item from your table without consuming any write throughput.
● TTL is provided at no extra cost as a means to reduce stored data volumes by
retaining only the items that remain current for your workload’s needs
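A minimal sketch of enabling TTL via boto3 (table and attribute names are hypothetical; the attribute must hold an epoch-seconds timestamp):

import boto3

ddb = boto3.client('dynamodb')
ddb.update_time_to_live(
    TableName='sessions',  # hypothetical
    TimeToLiveSpecification={
        'Enabled': True,
        'AttributeName': 'expires_at',  # hypothetical epoch-seconds attribute
    },
)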

DynamoDB TTL

HA vs FT vs DR

Types of DR - Cold, Warm, Pilot Light


DR / BC Architecture
● Effective DR / BC costs money all of the time
● You need some type of extra resource
● Executing a DR/BC process takes time
● How long .. depends on the type of DR/BC
● Types of DR/BC (from cheap but slower to recover to expensive but faster to recover)
○ Backup & Restore
○ Pilot Light
○ Warm Standby
○ Active / Active Multi-Site

DR / BC - Backup & Restore


● Data is constantly backed up at the primary site.
○ The only costs are backup media & management - no ongoing spare
infrastructure costs
● Restore requires new hardware or a lengthy restore process

DR / BC - Pilot Light
● A secondary environment is provisioned in advance, running only the absolute
minimum of infrastructure - like a pilot light in a heater
○ … it can be powered on much quicker than backup and restore
● Critical components such as Databases are always syncing ready to be used
DR / BC - Warm Standby
● A smaller sized, but fully functional version of your primary infrastructure is
running 24/7/365…
○ Ready to be increased in size when failover is required
○ .. faster than pilot light ..cheaper than full active

DR / BC - Multi-Site - Active / Active


● Data is constantly replicated from the primary site to the backup site
○ Costs are generally 200%
○ .. a full copy is running at all times …
● You can load balance across environments
○ … improving HA and performance
DR / BC - Architecture
● How important is DR to your business?
● How quickly do you need recovery?
● Backups - Cheap and Slow
● Pilot Light - Fairly Cheap but Faster
● Warm Standby - Costly, but quick to recover
● Active/Active - Expensive…no recovery time

DR Architecture - Storage
● This lesson steps through how the failure of various different parts of the AWS
infrastructure platform will affect Instance Store Volumes, EBS, EFS, S3 and S3
Snapshots

DR Architecture - Storage
DR Architecture - Compute

DR Architecture - Compute

DR Architecture - Database

DR Architecture - Database

DR Architecture - Global Databases


● DynamoDB Global Tables
● Aurora Global Databases
● RDS Cross Region Read Replica
DR Architecture - Networking

DR Architecture - Networking
DR Architecture - Global Networking

EC2 - Launch Configuration and Templates

Launch Configuration(LC) and Launch Templates(LT) Key Concepts


● Allow you to define the configuration of an EC2 instance in advance
● AMI, Instance Type, Storage & Key Pair
● Networking and Security Groups
● Userdata & IAM Role
● Both are NOT editable - defined once. LT has versions
● LT provide newer features - including T2/T3 Unlimited, Placement Groups, Capacity
Reservations, Elastic Graphics

LC and LT - Architecture
Auto-Scaling Groups
● An Auto Scaling group contains a collection of Amazon EC2 instances that are
treated as a logical grouping for the purposes of automatic scaling and management.
● An Auto Scaling group also enables you to use Amazon EC2 Auto Scaling features
such as health check replacements and scaling policies.
● Both maintaining the number of instances in an Auto Scaling group and automatic
scaling are the core functionality of the Amazon EC2 Auto Scaling service.

ASG - Basic
● Automatic Scaling and Self-Healing for EC2
● Uses Launch Templates or Configuration
● Has a Minimum, Desired and Maximum Size
● Keep running instances at the Desired capacity by provisioning or terminating
instances
● Scaling Policies automate based on metrics

ASG - Architecture
ASG - Policies
● Manual Scaling - Manually Adjust the desired capacity
● Scheduled Scaling - Time based adjustment
● Dynamic Scaling
○ Simple - “CPU above 50% +1”, “CPU Below 50% -1”
○ Stepped Scaling - Bigger +/- based on difference
○ Target Tracking - Desired Aggregate CPU = 40% ..ASG handle it
● Cooldown Periods …
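A minimal sketch of a target tracking policy via boto3 (ASG name is hypothetical):

import boto3

asg = boto3.client('autoscaling')
asg.put_scaling_policy(
    AutoScalingGroupName='web-asg',  # hypothetical
    PolicyName='hold-40-percent-cpu',
    PolicyType='TargetTrackingScaling',
    TargetTrackingConfiguration={
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'ASGAverageCPUUtilization',
        },
        'TargetValue': 40.0,  # ASG adds/removes instances to hold ~40% CPU
    },
)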

ASG - Health
ASG - ASG + Load Balancers

ASG - Scaling Processes


● Launch and Terminate - can be SUSPENDED and RESUMED
● AddToLoadBalancer - add to LB on launch
● AlarmNotification - accept notification from CW
● AZRebalance - Balances instances evenly across all of the AZs
● HealthCheck - instance health check on/off
● ReplaceUnhealthy - Terminate unhealthy and replace
● ScheduledActions - Scheduled on/off
● Standby - set instances to ‘InService vs Standby’

ASG - Final Points


● Autoscaling Groups are free
● Only the resources created are billed
● Use cooldowns to avoid rapid scaling
● Think about more, smaller instances - granularity
● Think about using with ALBs for elasticity - abstraction
● ASG defines WHEN and WHERE, LT defines WHAT

ASG Scaling Policies

ASG - Scaling Policies


● ASG’s don’t need scaling policies - they can have none
● Manual - Min, Max & Desired - Testing & Urgent
● Simple Scaling
● Step Scaling
● Target Tracking
● Scaling Based on SQS - ApproximateNumberOfMessagesVisible
ASG - Simple Scaling

ASG - Step Scaling

ASG Lifecycle Hooks


ASG LifeCycle Hooks - Basics
● Custom Actions on instances during ASG actions
● .. instance launch or instance terminate transitions
● Instances are paused within the flow .. they wait
● .. until a timeout (then either CONTINUE or ABANDON)
● .. or you resume the ASG process CompleteLifecycleAction
● EventBridge or SNS Notification
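A rough boto3 sketch of both halves of a launch hook (names and instance ID are hypothetical):

import boto3

asg = boto3.client('autoscaling')

# Pause new instances in a wait state until bootstrapping finishes
asg.put_lifecycle_hook(
    AutoScalingGroupName='web-asg',  # hypothetical
    LifecycleHookName='on-launch',
    LifecycleTransition='autoscaling:EC2_INSTANCE_LAUNCHING',
    HeartbeatTimeout=300,
    DefaultResult='ABANDON',         # what happens on timeout
)

# Later, from the bootstrap process, resume the ASG flow
asg.complete_lifecycle_action(
    AutoScalingGroupName='web-asg',
    LifecycleHookName='on-launch',
    LifecycleActionResult='CONTINUE',
    InstanceId='i-0123456789abcdef0',  # hypothetical
)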

https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-add-elb-healthcheck.html
https://docs.aws.amazon.com/autoscaling/ec2/userguide/AutoScalingGroupLifecycle.html
https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-enter-exit-standby.html

ASG Health-Checks EC2 vs ALB

ASG - Health Checks


● EC2(Default), ELB(can be enabled) & Custom
● EC2 - Stopping, Stopped, Terminated, Shutting Down or Impaired(not 2/2 status) =
UNHEALTHY
● ELB - HEALTHY = Running & passing ELB health check
● ..can be more application aware (layer 7)
● Custom - Instances marked healthy & unhealthy by an external system
● Health check grace period (Default 300s) - Delay before starting checks
● … allows system launch, bootstrapping and application start

Load Balancing Evolution

ELB - Evolution
● 3 types of Load Balancers (ELB) available within AWS
● Split between v1 (avoid/migrate) and v2 (prefer)
● Classic Load Balancer (CLB) - v1 - Introduced in 2009
○ Not really layer 7, lacking features, 1 SSL per CLB
● Application Load Balancer (ALB) - v2 - HTTP/HTTPS/WebSocket
● Network Load Balancer (NLB) - v2 - TCP, TLS & UDP
● v2 = faster, cheaper, supports target groups and rules

ELB Architecture

ELB - Architecture
ELB - Cross-Zone Load Balancer

ELB - About the Architecture


● ELB is a DNS A record pointing at 1+ Nodes per AZ
● Nodes (in one subnet per AZ) can scale
● Internet-facing means nodes have public IPv4 IPs
● Internal is private only IPs
● EC2 doesn’t need to be public to work with LB
● Listener Configuration controls WHAT the LB does
● 8+ free IPs per subnet, and /27 subnet to allow scaling

ALB vs NLB

Load Balancer Consolidation


● CLB
○ Don’t scale
○ 1 SSL per CLB
○ Every unique HTTPS name requires an individual CLB because SNI isn’t
supported
● ALB
○ 1 SSL per rule
○ v2 load balancers support rules and target groups
○ Host based rules using SNI and an ALB allow consolidation
Application Load Balancer (ALB)
● Layer 7 Load Balancer .. listen on HTTP and/or HTTPS
● No other Layer 7 protocols (SMTP, SSH, Gaming..)
● …and NO TCP/UDP/TLS Listeners
● L7 content type, cookies, custom headers, user location and app behaviour
● HTTPS(SSL/TLS) is always terminated on the ALB - no unbroken SSL (security
teams!)
● … a new connection is made to the application
● ALBs MUST have SSL certs if HTTPS is used
● ALBs are slower than NLBs..more levels of the network stack to process
● Health checks evaluate application health … layer 7

Application Load Balancer(ALB) - Rules


● Rules direct connections which arrive at a listener
● Processed in priority order
● Default rule = catchall
● Rule Conditions: host-header, http-header, http-request-method, path-pattern,
query-string & source-ip
● Actions: forward, redirect, fixed-response, authenticate-oidc & authenticate-cognito
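A minimal sketch of adding a listener rule via boto3 (ARNs and host names are hypothetical and truncated):

import boto3

elbv2 = boto3.client('elbv2')
elbv2.create_rule(
    ListenerArn='arn:aws:elasticloadbalancing:...',  # hypothetical listener ARN
    Priority=10,  # rules are processed in priority order
    Conditions=[
        {'Field': 'host-header',
         'HostHeaderConfig': {'Values': ['api.catagram.io']}},  # hypothetical
        {'Field': 'path-pattern',
         'PathPatternConfig': {'Values': ['/v1/*']}},
    ],
    Actions=[{'Type': 'forward',
              'TargetGroupArn': 'arn:aws:elasticloadbalancing:...'}],  # hypothetical
)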
Network Load Balancer (NLB)
● Layer 4 load balancer … TCP, TLS, UDP, TCP_UDP
● No visibility or understanding of HTTP or HTTPS
● No headers, no cookies, no session stickiness
● Really fast (millions of rps, 25% of ALB latency)
● .. SMTP, SSH, Game Servers, Financial apps (not https)
● Health checks JUST check ICMP / TCP handshake .. Not app aware
● NLBs can have static IPs - useful for whitelisting
● Forward TCP to instances … unbroken encryption
● Used with private link to provide services to other VPCs

ALB vs NLB
● NLB
○ Unbroken encryption
○ Static IP for whitelisting
○ The fastest performance (millions of rps)
○ Protocols not HTTP or HTTPS
○ PrivateLink
● Otherwise = ALB

Session State

Session State - What it is


● Server-side piece of information
● Persists while you interact with that application
● Shopping Cart, Workflow Position or Login State
● Session State loss = User Experience (UX) Issues
● Session state stored on a server or externally
Session State - State matters

Session Stickiness
● Session stickiness is a feature of AWS ELB's which allows applications which store
session state internally on EC2 instances to function with load balancers
● Sessions are locked to specific back end instances using a cookie generated by the
load balancer.
Connection Stickiness

Connection Stickiness - Key Points


● Stickiness locks a session to 1 backend instance
● Creates a cookie (AWSALB) .. which is held by the client
● Sessions move on expiry or instance failure
● Enable if an application doesn’t use external sessions
● Look for question keywords logout, lost carts, lost progress … These suggest lost
session state

ALB - Session Stickiness


Gateway Load Balancer(GWLB)
● Gateway Load Balancers enable you to deploy, scale, and manage virtual
appliances, such as firewalls, intrusion detection and prevention systems, and deep
packet inspection systems.
● It combines a transparent network gateway (that is, a single entry and exit point for
all traffic) and distributes traffic while scaling your virtual appliances with the demand.

GWLB - Basic

● Help you run and scale 3rd party appliances


● ..things like firewalls, intrusion detection and prevention systems
● Inbound and Outbound traffic (transparent inspection and protection)
● …GWLB endpoints..traffic enters/leaves via these endpoints
● …the GWLB balances across multiple backend appliances
● Traffic and metadata is tunnelled using the GENEVE protocol

GWLB - How it works


GWLB - Architecture

Connection Draining
● In order to provide a first-class user experience, you’d like to avoid
breaking open network connections while taking an instance out of
service, updating its software, or replacing it with a fresh instance that
contains updated software. Imagine each broken connection as a
half-drawn web page, an aborted file download, or a failed web service
call, each of which results in an unhappy user or customer.
● You can now avoid this situation by enabling the new Connection
Draining feature for your Classic Load Balancers or Deregistration delay
on ALB, NLB or GWLB

Connection Draining
● What happens when instances are unhealthy … or deregistered
● Normally all connections are closed & no new connections ..
● Connection draining allows in-flight requests to complete
● CLASSIC LOAD BALANCER ONLY - defined on the CLB
● Timeout: Between 1 and 3600 seconds (default 300)
● InService: Instance deregistration currently in progress
● Auto Scaling waits for all connections to complete or Timeout

Deregistration Delay
● Supported on ALB, NLB and GWLB (subtle differences)
● Defined on the Target Group - NOT the LB
● Stops sending requests to deregistering targets
● Existing connections can continue
● .. until they complete naturally
● … or the deregistration delay is reached
● Default 300 seconds (0-3600 seconds)
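A minimal sketch of tuning the delay, which (unlike CLB draining) lives on the target group (ARN is hypothetical and truncated):

import boto3

elbv2 = boto3.client('elbv2')
elbv2.modify_target_group_attributes(
    TargetGroupArn='arn:aws:elasticloadbalancing:...',  # hypothetical
    Attributes=[
        {'Key': 'deregistration_delay.timeout_seconds', 'Value': '120'},
    ],
)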

X-Forwarded-For & Proxy Protocol


● X-Forwarded-For and Proxy Protocol are two alternative ways of gaining visibility
of the original client IP address when using proxy servers or load balancers
● X-Forwarded-For - HTTP | MDN
● Proxy Protocol - HAProxy Technologies

Why? X-Forwarded-For & Proxy Protocol

X-Forwarded
● A set of HTTP headers (it only works with HTTP/S) (NO other protocols) (Layer 7)
● e.g. X-Forwarded-For: client
● The header is added or appended by proxies/LBs
● The client is left most in the list
● X-Forwarded-For: 1.3.3.7, proxy1, proxy2 …
● The LB adds the ^^ header, containing the original client’s IP
● The backend web server needs to be aware of this header
● Connections come from the LB, but X-Forwarded-For contains the original client
● Supported … CLB & ALB, NOT SUPPORTED on NLB (because it’s layer 4)
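A minimal sketch of how a backend might recover the client IP (the header dict and peer address are assumptions about the surrounding web framework):

def client_ip(headers, peer_addr):
    """Return the original client IP behind an ALB/CLB."""
    xff = headers.get('X-Forwarded-For')
    if xff:
        return xff.split(',')[0].strip()  # left-most entry = original client
    return peer_addr  # no proxy involved - use the socket peer address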

Proxy Protocol
● Proxy Protocol works at Layer 4 ..
● .. an additional layer 4 (tcp) header .. works with a range of protocols (including HTTP
and HTTPS)
● ..Works with CLB (v1) and NLB (v2 - binary encoded)
● End to end encryption - e.g. unbroken HTTPS (TCP listener)
● … use Proxy Protocol when you can’t add an HTTP header because the traffic isn’t
decrypted
References
https://learn.cantrill.io/
https://portal.tutorialsdojo.com/courses/aws-certified-devops-engineer-professional-practice-exams/
https://www.udemy.com/course/aws-certified-devops-engineer-professional-hands-on/
https://www.whizlabs.com/aws-devops-certification-training
