Topics
 Amazon Elastic Block Store
(Amazon EBS)
 Instance Store
 Amazon Elastic File System
(Amazon EFS)
 Amazon Simple Storage
Service (Amazon S3)
 Amazon S3 Glacier
 AWS Snowball

 Amazon Elastic Block Store (Amazon EBS)
 Amazon Elastic File System (Amazon EFS)
 Amazon FSx for Windows File Server
 Amazon S3
 Amazon S3 Glacier
 AWS Storage Gateway

Cloud storage is a critical component of cloud computing, holding the information
used by applications. Big data analytics, data warehouses, Internet of Things,
databases, backup and archive applications all rely on some form of data storage
architecture.

AWS offers a complete range of cloud storage services to support both application
and archival compliance requirements. Select from object, file, and block storage
services as well as cloud data migration options to start designing the foundation of
your cloud IT environment.

These are some of the storage services that we will be discussing in this module.

 AWS Transfer for SFTP
 AWS DataSync
 AWS Snowball

These are the data transfer and migration services we will discuss in this module.

 Storage, retrieval, and management of application data


 Storage of log files from transient Amazon EC2 instances for
later analysis
 Backup of Amazon EC2 instances and attached volumes
 Backup of on-premises data for disaster recovery

We will discuss these scenarios involving data management in the cloud.


• Backup of Amazon EC2 instances and attached volumes: Your Amazon EC2 instances
may contain critical business data, such as operations data stored in databases. Such
data needs to be backed up in case the Amazon EC2 instance becomes unstable or
you need to change the underlying instance (e.g., upgrading to a new instance type
that delivers better performance).
• Storage, retrieval, and management of application data: Applications such as web
services and image analysis applications must frequently interact with durable
storage.
• Storage of log files: While it is true that you need the ability to create and terminate
Amazon EC2 instances on demand, you don't want to lose the valuable log data that
you may have accumulated on such instances. AWS storage options provide durable
storage when mission-critical data must survive the lifetime of a specific instance.
• Backup: AWS storage solutions can be used to provide peace of mind and enhanced
durability for objects stored in an on-premises data center.

 Network-attached
 Persistent block storage
 Boot and data volumes supported

[Diagram: two instances on a VM host, each attached to its own Amazon EBS volume]

Amazon Elastic Block Store (Amazon EBS) provides block-level storage volumes for
use with Amazon EC2 instances. Amazon EBS volumes are highly available and are
reliable storage volumes that can be attached to any running instance that is in the
same Availability Zone. Amazon EBS volumes that are attached to an Amazon EC2
instance are exposed as storage volumes that persist independently from the life of
the instance. With Amazon EBS, you pay only for what you use.

 Solid State Drives (SSD)


◦ Provisioned IOPS SSD (io1) Volumes
◦ General Purpose SSD (gp2) Volumes
 Hard Disk Drives (HDD)
◦ Throughput Optimized HDD (st1) Volumes
◦ Cold HDD (sc1) Volumes

SSD-backed Volumes (IOPS-intensive)

Provisioned IOPS SSD (io1) Volumes


IO1 is backed by solid-state drives (SSDs) and is the highest performance EBS storage
option designed for critical, I/O intensive database and application workloads, as well
as throughput-intensive database and data warehouse workloads, such as HBase,
Vertica, and Cassandra. These volumes are ideal for both IOPS-intensive and
throughput-intensive workloads that require extremely low latency.

IO1 is the highest-performance SSD volume for mission-critical low-latency or high-
throughput workloads. To maximize the benefit of io1, we recommend using EBS-
optimized EC2 instances. When attached to EBS-optimized EC2 instances, io1 is
designed to achieve single-digit millisecond latencies and is designed to deliver the
provisioned performance 99.9% of the time. For more information about instance
types that can be launched as EBS-optimized instances, see Amazon EC2 Instance
Types. For more information about Amazon EBS performance guidelines, see
Increasing EBS Performance.

General Purpose SSD (gp2) Volumes


GP2 is the default EBS volume type for Amazon EC2 instances. These volumes are
backed by solid-state drives (SSDs) and are suitable for a broad range of transactional
workloads, including dev/test environments, low-latency interactive applications, and
boot volumes. I/O is included in the price of gp2, so you pay only for each GB of
storage you provision. GP2 is designed to deliver the provisioned performance 99% of
the time. If you need a greater number of IOPS than gp2 can provide, or if you have a
workload where low latency is critical or you need better performance consistency,
we recommend that you use io1. To maximize the performance of gp2, we
recommend using EBS-optimized EC2 instances.

HDD-backed Volumes (MB/s–intensive)

Throughput Optimized HDD (st1) Volumes


ST1 is backed by hard disk drives (HDDs) and is ideal for frequently accessed,
throughput intensive workloads with large datasets and large I/O sizes, such as
MapReduce, Kafka, log processing, data warehouse, and ETL workloads. These
volumes deliver performance in terms of throughput, measured in MB/s. ST1 is
designed to deliver the expected throughput performance 99% of the time and has
enough I/O credits to support a full-volume scan at the burst rate. To maximize the
performance of st1, we recommend using EBS-optimized EC2 instances.

Cold HDD (sc1) Volumes


SC1 is backed by hard disk drives (HDDs) and provides the lowest cost per GB of all
EBS volume types. It is ideal for less frequently accessed workloads with large, cold
datasets. Similar to st1, sc1 provides a burst model. SC1 is designed to deliver the
expected throughput performance 99% of the time and has enough I/O credits to
support a full-volume scan at the burst rate. To maximize the performance of sc1, we
recommend using EBS-optimized EC2 instances.

For more information, see: https://aws.amazon.com/ebs/details/ and
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html
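
For example, a Provisioned IOPS volume might be created by specifying io1 along with an IOPS value; the size, IOPS, and zone below are illustrative:

aws ec2 create-volume --volume-type io1 --iops 4000 --size 100 \
    --availability-zone us-east-1a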

 Volumes are tied to Availability Zones


 Example:
Use a --volume-type of gp2 to designate a General Purpose
SSD

aws ec2 create-volume --size 80 --availability-zone us-east-1a
--volume-type gp2

 Attach volumes to logical mount points in operating system:

aws ec2 attach-volume --volume-id vol-12345678
--instance-id i-abc1234 --device /dev/sdf

 Volumes restored from snapshots have “first-access” penalty


 Consider reading all blocks to eliminate penalty in production

New Amazon EBS volumes receive their maximum performance the moment that
they are available and do not require initialization. However, storage blocks on
volumes that were restored from snapshots must be initialized (pulled down from
Amazon S3 and written to the volume) before you can access the block. This
preliminary action takes time and can cause a significant increase in the latency of an
I/O operation the first time each block is accessed. Therefore, we recommend that
customers essentially ‘warm’ the volume in order to eliminate this penalty from
affecting production workloads, though this is not required and any increase in
latency may be minimal. If you’re restoring to an Amazon EBS volume from a
snapshot, the recommendation is to read all blocks from your volume.

For more information on the different mount points available in each operating
system, see http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-using-
volumes.html

For details on initializing Amazon EBS volumes, see
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-initialize.html
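
As a minimal sketch on Linux, a restored volume can be initialized by reading every block once; the device name /dev/xvdf is an assumption and will differ by instance and volume:

sudo dd if=/dev/xvdf of=/dev/null bs=1M    # read all blocks to trigger the first-access download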

 Optimized configuration stack


 Provides additional, dedicated capacity for Amazon EBS I/O
 EBS–Optimization
◦ EBS–optimized instance types – Enabled
◦ Non-EBS–optimized instance types – Must be enabled
 Amazon EC2 Nitro System Based Instances
◦ AWS-built hardware and software components
◦ Enable high performance, high availability, and high security

An Amazon EBS–optimized instance uses an optimized configuration stack and
provides additional, dedicated capacity for Amazon EBS I/O. This optimization
provides the best performance for your EBS volumes by minimizing contention
between Amazon EBS I/O and other traffic from your instance.
EBS–optimized instances deliver dedicated bandwidth to Amazon EBS, with options
between 425 Mbps and 14,000 Mbps, depending on the instance type you use. When
attached to an EBS–optimized instance, General Purpose SSD (gp2) volumes are
designed to deliver within 10% of their baseline and burst performance 99% of the
time in a given year, and Provisioned IOPS SSD (io1) volumes are designed to deliver
within 10% of their provisioned performance 99.9% of the time in a given year. Both
Throughput Optimized HDD (st1) and Cold HDD (sc1) guarantee performance
consistency of 90% of burst throughput 99% of the time in a given year. Non-
compliant periods are approximately uniformly distributed, targeting 99% of expected
total throughput each hour.

Amazon EC2 C5/C5d and M5/M5d instances are built on the Nitro system, a
collection of AWS-built hardware and software components that enable high
performance, high availability, high security, and bare metal capabilities to eliminate
virtualization overhead. With the latest set of enhancements to the Nitro system, we
have increased the maximum EBS-optimized instance bandwidth to 14Gbps, up from
9Gbps and 10Gbps for C5/C5d and M5/M5d respectively. We have also increased the
maximum EBS-optimized instance IOPS for Nitro systems to 80,000 IOPS, up from
64,000 IOPS and 65,000 IOPS for C5/C5d and M5/M5d respectively. In addition we
have increased the EBS-optimized instance burst performance on the large, xlarge
and 2xlarge C5/C5d and M5/M5d instances to 3.5Gbps, up from 2.25 Gbps and 2.12
Gbps respectively. This performance increase enables you to speed up sections of
your workflows dependent on EBS-optimized instance performance. For storage
intensive workloads, you will have an opportunity to use smaller instance sizes and
still meet your EBS-optimized instance performance requirement, thereby saving
costs. With this performance increase, you will be able to handle unplanned spikes in
EBS-optimized instance demand without any impact to your application performance.

All new C5/C5d and M5/M5d instances starting today will be able to take advantage
of this performance increase at no additional cost. This performance increase is
available in all AWS regions where C5/C5d and M5/M5d are available. For details on
the performance increase for each instance size, see
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSOptimized.html.
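
As an illustrative sketch, EBS optimization can be requested at launch for instance types where it is not enabled by default; the AMI and subnet IDs here are placeholders:

aws ec2 run-instances --image-id ami-0123456789abcdef0 \
    --instance-type m4.large --ebs-optimized \
    --subnet-id subnet-0123456789abcdef0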

[Diagram: a volume over time. Snapshot 1 captures blocks A, B, C. Blocks D, E, F are added, so Snapshot 2 = A, B, C, D, E, F. Blocks A and B are overwritten by G and H, so Snapshot 3 = G, H, C, D, E, F.]

It is crucial to take snapshots of any data you may have in development, test, or
production on a regular basis. Amazon EBS provides the ability to take snapshots of
your volumes so that they can be restored in case the underlying hardware supporting
your volume fails or your volume is accidentally deleted.

Snapshots copy block-level data to Amazon S3 infrastructure. Objects are redundantly
stored and can sustain the concurrent loss of data in two facilities. The first snapshot is
a full snapshot of the state of the disk at the time the snapshot was taken. All
subsequent snapshots will capture only the deltas compared to the previous snapshot.

If you add data to your Amazon EBS volume after copying your first snapshot and then
create a second snapshot, the new blocks will be copied to Amazon S3, and the second
snapshot will consist of a recipe to restore the volume consisting of the blocks from the
first snapshot and all new blocks. Snapshots are based on deltas. Only the changes from
previous snapshots need to be copied to Amazon S3.

If you update blocks on your Amazon EBS volume and then create a third snapshot,
only the blocks that have changed will be copied to Amazon S3, and the new recipe to
restore will consist of the updated blocks and any remaining blocks still present on your
Amazon EBS volume.
Test Your Knowledge
Question 1: If you delete snapshot 1, what will happen to blocks A, B, and C in the
Amazon S3 bucket?
Answer: Nothing. Because snapshot 2 has a recipe to restore that contains blocks A,
B, and C. Those blocks will remain even though Snapshot 1 has been deleted.

Question 2: If you delete snapshot 2, what will happen to blocks A, B, and C in the
Amazon S3 bucket?
Answer: Because blocks A and B are no longer required to restore any remaining
snapshots, they will be deleted from Amazon S3. Because block C is still needed for
snapshot 3, it will be retained.

 Create a snapshot:

aws ec2 create-snapshot --volume-id vol-1234567890abcdef0
--description "This is my root volume snapshot"

 Move your snapshot across AWS Regions:

aws ec2 copy-snapshot --region us-east-1 --source-region us-west-2
--source-snapshot-id snap-1234567890abcdef0
--description "This is my copied snapshot"

Question to Consider:
What command might you use to find the volume IDs associated with a given
Amazon EC2 instance?

Answer:
aws ec2 describe-instances will give you this information.
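
For example, adding a --query filter (with a placeholder instance ID) returns only the attached EBS volume IDs:

aws ec2 describe-instances --instance-ids i-abc1234 \
    --query 'Reservations[].Instances[].BlockDeviceMappings[].Ebs.VolumeId'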

 Snapshot command returns asynchronously


 Stop or quiesce the disk/database
◦ For consistency
◦ Critical for database servers, RAID configurations
◦ No subsequent writes are captured
◦ Use fsfreeze on Linux
 Remains in pending state until finished

As an alternative to quiescing the disk first, consider the following procedure.

First take a snapshot of the volume without quiescing. This step will take somewhat
longer than the next step, and probably will not contain all the data, as the volume
wasn't quiesced.

After the snapshot has been created, quiesce the file system, and then take another
snapshot, continuing with normal Amazon EBS usage.

Finally, delete the first snapshot.
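
For the quiesce-first approach on Linux, a minimal sketch looks like the following; /data is a placeholder mount point, and because create-snapshot returns once the point-in-time is captured, the freeze window can be kept short:

sudo fsfreeze -f /data                       # flush and block writes
aws ec2 create-snapshot --volume-id vol-1234567890abcdef0 \
    --description "Consistent snapshot of /data"
sudo fsfreeze -u /data                       # resume writes as soon as the command returns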



 Snapshots are stored in Amazon S3
 No direct Amazon S3 access; managed by AWS
 Find the snapshot ID, and restore to a new volume using the
aws ec2 create-volume command

aws ec2 create-volume --size 80 --availability-zone us-east-1a
--volume-type gp2 --snapshot-id snap-1234567890abcdef0

While snapshots are stored in Amazon S3, they are not directly accessible using the
Amazon S3 utilities. Instead, you must use the Amazon EBS tools to restore and
manage snapshots.

 Uses Amazon Data Lifecycle Manager (Amazon DLM)
 Automate the creation, retention, and deletion of
snapshots
 Use AWS Console
 Use command-line

You can use Amazon Data Lifecycle Manager (Amazon DLM) to automate the
creation, retention, and deletion of snapshots taken to back up your Amazon EBS
volumes. Automating snapshot management helps you to:
• Protect valuable data by enforcing a regular backup schedule.
• Retain backups as required by auditors or internal compliance.
• Reduce storage costs by deleting outdated backups.

Combined with the monitoring features of Amazon CloudWatch Events and AWS
CloudTrail, Amazon DLM provides a complete backup solution for EBS volumes at no
additional cost.

For more information, see
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/snapshot-lifecycle.html.
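
As a hedged sketch, a policy that snapshots volumes tagged backup=true every 24 hours and keeps the last seven snapshots might be created like this; the role ARN and tag are placeholders:

aws dlm create-lifecycle-policy \
    --description "Daily snapshots of tagged volumes" \
    --state ENABLED \
    --execution-role-arn arn:aws:iam::111122223333:role/AWSDataLifecycleManagerDefaultRole \
    --policy-details '{
        "ResourceTypes": ["VOLUME"],
        "TargetTags": [{"Key": "backup", "Value": "true"}],
        "Schedules": [{
            "Name": "DailySnapshots",
            "CreateRule": {"Interval": 24, "IntervalUnit": "HOURS", "Times": ["03:00"]},
            "RetainRule": {"Count": 7}
        }]
    }'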

 Definition
◦ Block-level storage on a shared-disk subsystem
◦ Fast
◦ Reclaimed when instance stopped or terminated
 Use case
◦ Buffers
◦ Cache
◦ Scratch data

[Diagram: a VM host computer in AWS running Instance 1 and Instance 2; Instance 1 uses an Amazon EBS volume, while Instance 2 uses ephemeral instance store volumes 0, 1, and 2]

An instance store provides temporary block-level storage for your instance. This storage
is located on disks that are physically attached to the host computer. Instance store is
ideal for temporary storage of information that changes frequently, such as buffers,
caches, scratch data, and other temporary content, or for data that is replicated across
a fleet of instances, such as a load-balanced pool of web servers.

An instance store consists of one or more instance store volumes exposed as block
devices. The size of an instance store as well as the number of devices available varies
by instance type. While an instance store is dedicated to a particular instance, the disk
subsystem is shared among instances on a host computer.

For more information, see
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html.

 Instance store available for some instance types


 Number, size, and type (HDD vs. SSD) differs by instance type
◦ NVMe on some instance types
 Mounting
◦ Auto-mounted by Windows EC2Launch
◦ Manual mounting on Linux

Some instance types use NVMe (Non-Volatile Memory express) or SATA-based solid
state drives (SSD) to deliver high random I/O performance. This is a good option
when you need storage with very low latency, but you don't need the data to persist
when the instance terminates or you can take advantage of fault-tolerant
architectures. For more information, see SSD Instance Store Volumes.

For more information, see
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/add-instance-store-volumes.html.
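
A minimal sketch of manual mounting on Linux, assuming an NVMe instance store volume appears as /dev/nvme1n1 (device names vary by instance type):

lsblk                                   # identify the instance store device
sudo mkfs -t ext4 /dev/nvme1n1          # create a file system (destroys existing data)
sudo mkdir -p /scratch
sudo mount /dev/nvme1n1 /scratch        # contents are lost when the instance stops or terminates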

 Shared file storage


 Petabyte-scale file system
 Elastic capacity
 Supports Network File System version 4.0 and 4.1 protocols
 Compatible with all Linux-based AMIs for Amazon EC2

Amazon EFS is a fully-managed service that makes it easy to set up and scale file
storage in the AWS cloud. With a few clicks in the AWS Management Console, you
can create file systems that are accessible to Amazon EC2 instances via a file system
interface (using standard operating system file I/O APIs) and that support full file
system access semantics (such as strong consistency and file locking).

Amazon EFS file systems can automatically scale from gigabytes to petabytes of data
without needing to provision storage. Tens, hundreds, or even thousands of Amazon
EC2 instances can access an Amazon EFS file system at the same time, and Amazon
EFS provides consistent performance to each Amazon EC2 instance. Amazon EFS is
designed to be highly durable and highly available. With Amazon EFS, there is no
minimum fee or setup costs, and you pay only for the storage you use.

When mounted on Amazon EC2 instances, an Amazon EFS file system provides a
standard file system interface and file system access semantics, allowing you to
seamlessly integrate Amazon EFS with your existing applications and tools. Multiple
Amazon EC2 instances can access an Amazon EFS file system at the same time,
allowing Amazon EFS to provide a common data source for workloads and
applications running on more than one Amazon EC2 instance. You can mount your
Amazon EFS file systems on your on-premises datacenter servers when connected to
your Amazon VPC with AWS Direct Connect. You can then migrate data sets to
Amazon EFS, enable cloud bursting scenarios, or backup your on-premises data to
Amazon EFS.
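
For example, an Amazon EFS file system can be mounted on Linux over NFSv4.1 with mount options along the lines AWS documents; the file system ID and Region are placeholders:

sudo mkdir -p /mnt/efs
sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
    fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs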

[Diagram: a VPC spanning Availability Zones A, B, and C. NFS clients on EC2 instances in each Availability Zone connect through network interfaces to a mount target in a subnet of that zone; all mount targets front the same Amazon EFS file system]

You can access your Amazon EFS file system concurrently from Amazon EC2 instances
in your Amazon VPC, so that applications that scale beyond a single connection can
access a file system. Amazon EC2 instances running in multiple Availability Zones
within the same region can access the file system, so that many users can access and
share a common data source.

In this illustration, the VPC has three Availability Zones, and each has one mount
target created in it. We recommend that you access the file system from a mount
target within the same Availability Zone. Note that one of the Availability Zones has
two subnets. However, a mount target is created in only one of the subnets.

 Home directories
 Line of business applications
 Web serving and content management
 Software development environments
 Media workflows
 Analytics

Amazon FSx for Windows Use Cases


• Fully managed
• Easy to deploy
• Windows file servers that provide a native Windows file system

It is easy to use and offers a simple interface to create and configure file systems that
are highly durable and highly available and that can be accessed from multiple
compute instances with the industry standard Server Message Block (SMB) protocol.

Built on Windows Server

Native-Windows Native SMB protocol NTFS


compatibility 2.0 to 3.1.1

Backed by high performance DFS Namespaces and


SSD storage DFS Replication

Amazon FSx for Windows File Server provides fully managed Windows file servers,
backed by a fully–native Windows file system with the features, performance, and
compatibility to easily lift and shift enterprise applications to AWS.

Amazon FSx supports a broad set of enterprise Windows workloads with fully
managed file storage built on Microsoft Windows Server. Amazon FSx has native
support for Windows file system features and for the industry-standard Server
Message Block (SMB) protocol to access file storage over a network. Amazon FSx is
optimized for enterprise applications in the AWS Cloud, with native Windows
compatibility, enterprise performance and features, and consistent sub-millisecond
latencies.

With file storage on Amazon FSx, the code, applications, and tools that Windows
developers and administrators use today can continue to work unchanged. Windows
applications and workloads ideal for FSx include business applications, home
directories, web serving, content management, data analytics, software build setups,
and media processing workloads.

As a fully managed service, FSx for Windows File Server eliminates the administrative
overhead of setting up and provisioning file servers and storage volumes.
Additionally, Amazon FSx keeps Windows software up to date, detects and addresses
hardware failures, and performs backups. It also provides rich integration with other
AWS services like AWS Directory Service for Microsoft Active Directory, Amazon
WorkSpaces, AWS Key Management Service, and AWS CloudTrail.

[Diagram: migrating files to Amazon FSx. (1) AWS Directory Service Microsoft Active Directory in the AWS cloud, (2) an Active Directory trust to the corporate data center, (3) an Amazon FSx file system, (4) an on-premises file server exposing \\Source\Share, (5) \\Target\Share on Amazon FSx, (6) connectivity over AWS Direct Connect or a VPN connection, (7) an Amazon EC2 Windows instance running ROBOCOPY]

Built on Microsoft Windows Server, Amazon FSx for Windows File Server allows you
to migrate your existing file datasets fully into your Amazon FSx file systems. You can
migrate not just the data for each file, but also all the relevant file metadata including
attributes, time stamps, access control lists (ACLs), owner information, and auditing
information. With this total migration support, Amazon FSx enables moving your
Windows-based workloads and applications relying on these file datasets to the AWS
Cloud.

1. Deploy an AWS Directory Service for Microsoft Active Directory (AWS Managed
Microsoft AD) directory in your Amazon VPC.
2. Establish a one-way forest-level trust relationship between the AWS Managed
Microsoft AD directory and your on-premises Microsoft AD directory, where the
AWS Managed Microsoft AD directory forest trusts your on-premises Microsoft
AD forest.
3. Create an Amazon FSx file system in the same Amazon VPC, joined to your AWS
Managed Microsoft AD directory.
4. Note the location (for example, \\Source\Share) of the file share (either on-
premises or in AWS) that contains the existing files you want to transfer over to
Amazon FSx.
5. Note the location (for example, \\Target\Share) of the file share on your Amazon
FSx file system to which you want to transfer over your existing files.
6. Create an Amazon EC2 instance from a Windows Server AMI.
7. From the Command Prompt or Windows PowerShell as Administrator (using
the Run as Administrator option from the context menu), execute the RoboCopy
command to copy the files from the source share to the target share.
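
A hedged sketch of step 7, using RoboCopy flags commonly recommended for preserving data, attributes, timestamps, and ACLs (the share names follow the placeholders above):

robocopy \\Source\Share \\Target\Share /E /B /COPY:DATSOU /SECFIX /MT:8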

 Fast, durable, highly available key-based access to objects
 Object storage built to store and retrieve data
 Not a file system

[Diagram: a client CLI sends a GET request via the S3 API to an Amazon S3 bucket, and the object is returned]

The term “bucket” is a hint to the different approach that Amazon S3 takes with file
storage. Instead of a “folder,” you create a “bucket,” which is a large space that can
hold any number of objects.

 Can be accessed anywhere (both inside and outside of AWS)
 Highly durable (designed for 99.999999999% durability)
through redundant storage
 Highly available (designed for 99.99%)
 No limit to the amount of data stored
 Up to 5 TB per object

Amazon S3 provides Block Public Access settings for buckets and accounts to help you
manage public access to Amazon S3 resources. By default, new buckets and objects
don't allow public access, but users can modify bucket policies or object permissions
to allow public access. Amazon S3 Block Public Access provides settings that override
these policies and permissions so that you can limit public access to these resources.
With Amazon S3 Block Public Access, account administrators and bucket owners can
easily set up centralized controls to limit public access to their Amazon S3 resources
that are enforced regardless of how the resources are created. The goal is to make
clear that public access is granted only when it is intended, for example for web
hosting. This feature is designed to be easy to use, and can be accessed from the
Amazon S3 Console, the CLI, the S3 APIs, and from within CloudFormation templates.

Amazon S3 Block Public Access provides four settings:


Block new public ACLs and uploading public objects – This option disallows the use
of new public bucket or object ACLs, and is used to ensure that future PUT requests
that include them will fail. It does not affect existing buckets or objects. Use this
setting to protect against future attempts to use ACLs to make buckets or objects
public. If an application tries to upload an object with a public ACL or if an
administrator tries to apply a public access setting to the bucket, this setting will
block the public access setting for the bucket or the object.
Remove public access granted through public ACLs – This option tells S3 not to
evaluate any public ACL when authorizing a request, ensuring that no bucket or
object can be made public by using ACLs. This setting overrides any current or future
public access settings for current and future objects in the bucket. If an existing
application is currently uploading objects with public ACLs to the bucket, this setting
will override the setting on the object.

Block new public bucket policies – This option disallows the use of new public bucket
policies, and is used to ensure that future PUT requests that include them will fail.
Again, this does not affect existing buckets or objects. This setting ensures that a
bucket policy cannot be updated to grant public access.

Block public and cross-account access to buckets that have public policies – If this
option is set, access to buckets that are publicly accessible will be limited to the
bucket owner and to AWS services. This option can be used to protect buckets that
have public policies while you work to remove the policies; it serves to protect
information that is logged to a bucket by an AWS service from becoming publicly
accessible.
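
For example, all four settings can be applied to a bucket with a single call (the bucket name is a placeholder):

aws s3api put-public-access-block --bucket mybucket \
    --public-access-block-configuration \
    BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true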

 Store objects using “write once, read many” (WORM) model


 Enable at time of bucket creation
 Two ways to manage object retention
◦ Retention
◦ Legal hold
 Two retention modes
◦ Compliance
 Deletes not allowed, even for root user
 Ensures that an object version can't be overwritten or deleted
◦ Governance
 Enables privileged delete of WORM-protected objects
 Protects against account compromise and rogue actors

Amazon S3 Object Lock enables you to store objects using the write once, read many
(WORM) model. Using Amazon S3 Object Lock, you can prevent an object from being
deleted or overwritten for a fixed amount of time or indefinitely. Amazon S3 Object
Lock enables you to meet regulatory requirements that require WORM storage or
simply to add an additional layer of protection against object changes and deletion.
Amazon S3 Object Lock has been assessed by Cohasset Associates for use in
environments that are subject to SEC 17a-4, CFTC, and FINRA regulations.

Amazon S3 Object Lock provides two ways to manage object retention: retention
periods and legal holds.

A retention period specifies a fixed period of time during which an object remains
locked. During this period, your object will be WORM-protected and can't be
overwritten or deleted.

A legal hold provides the same protection as a retention period, but has no expiration
date. Instead, a legal hold remains in place until you explicitly remove it. Legal holds
are independent from retention periods: an object version can have both a retention
period and a legal hold, one but not the other, or neither.
Retention Modes
Amazon S3 Object Lock provides two retention modes: Governance and Compliance.
These retention modes apply different levels of protection to your objects. You can
apply either retention mode to any object version that is protected by Amazon S3
Object Lock.

In Compliance mode, a protected object version can't be overwritten or deleted by
any user, including the root user in your AWS account. Once an object is locked in
Compliance mode, its retention mode can't be changed and its retention period can't
be shortened. Compliance mode ensures that an object version can't be overwritten
or deleted for the duration of the retention period.

In Governance mode, users can't overwrite or delete an object version or alter its lock
settings unless they have special permissions. Governance mode enables you to
protect objects against deletion by most users while still allowing you to grant some
users permission to alter the retention settings or delete the object if necessary. You
can also use Governance mode to test retention-period settings before creating a
Compliance-mode retention period. In order to override or remove Governance-
mode retention settings, a user must have
the s3:BypassGovernanceMode permission and must explicitly include x-amz-bypass-
governance-retention:true as a request header with any request that requires
overriding Governance mode.
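
As a sketch, Object Lock is enabled when the bucket is created, after which a default Governance-mode retention can be configured; the bucket name and retention period are placeholders:

aws s3api create-bucket --bucket mybucket --object-lock-enabled-for-bucket
aws s3api put-object-lock-configuration --bucket mybucket \
    --object-lock-configuration '{
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "GOVERNANCE", "Days": 30}}
    }'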

Configure buckets so notifications are issued when an object is:
 Added
 Deleted
 Overwritten

Notifications can be issued directly to:
 Amazon SQS queues
 Amazon SNS topics
 AWS Lambda functions

[Diagram: incoming objects arrive in an Amazon S3 bucket; a message is sent to an SQS queue, a message is published to an SNS topic, and a Lambda function is invoked, all within a Region]

Event notifications for Amazon S3 allow bucket owners (or others, as permitted by an
IAM policy) to configure their buckets so that notifications are issued to Amazon Simple
Queue Service (Amazon SQS), Amazon Simple Notification Service (Amazon SNS), or
AWS Lambda when a new object is added to or deleted from the bucket, or an existing
object is overwritten.

When configuring these notifications, you can also use prefix and suffix filters to opt in
to event notifications based on object name. For example, you can choose to receive
DELETE notifications for the images/ prefix and the .png suffix in a particular bucket.

Here’s what you need to do in order to start using this feature with your application:
1. Create the queue, topic, or Lambda function (which we’ll call target for brevity) if
necessary.
2. Grant Amazon S3 permission to publish to the target or invoke the Lambda
function. For Amazon SNS or Amazon SQS, you do this by applying an appropriate
policy to the topic or the queue. For AWS Lambda, you must create and supply an
IAM role, then associate it with the Lambda function.
3. Arrange for your application to be invoked in response to activity on the target. As
you will see in a moment, you have several options here.
4. Set the bucket’s Notification Configuration to point to the target.

For more information, see
https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html.
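
A minimal sketch of step 4, sending ObjectCreated events for .png objects under the images/ prefix to an SNS topic; the bucket name and topic ARN are placeholders, and the topic policy from step 2 must already allow Amazon S3 to publish:

aws s3api put-bucket-notification-configuration --bucket mybucket \
    --notification-configuration '{
        "TopicConfigurations": [{
            "TopicArn": "arn:aws:sns:us-east-1:111122223333:s3-events",
            "Events": ["s3:ObjectCreated:*"],
            "Filter": {"Key": {"FilterRules": [
                {"Name": "prefix", "Value": "images/"},
                {"Name": "suffix", "Value": ".png"}
            ]}}
        }]
    }'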

 Can use through web browser, CLI, API


 Use s3:// URI syntax to address objects

Create a new bucket:


aws s3 mb s3://mybucket

List all buckets in the AWS account:


aws s3 ls

 Copy a local file to Amazon S3:

aws s3 cp file.txt s3://mybucket/myprefix/file.txt


 Sync objects under a specified bucket and prefix to the current local
directory:

aws s3 sync s3://mybucket/myprefix/ .


 Remove an object:

aws s3 rm s3://mybucket/myprefix/file.txt

 Provide direct access to the Amazon S3 APIs


 Enable operations not exposed by aws s3

aws s3api

create-multipart-upload
put-object-acl
put-bucket-policy
list-object-versions

While the aws s3 command is fine for simple use cases, it does not expose the full
feature set of Amazon S3. To gain access to advanced features, use the aws s3api
command instead. s3api supports manipulating the full range of data and metadata
associated with Amazon S3 buckets and objects, including bucket policies, access
control lists, multipart uploads, and versioning.

The API-level commands (contained in the s3api command set) provide direct access
to the Amazon S3 APIs and enable some operations not exposed in the high-level
commands. This section describes the API-level commands and provides a few
examples.

For more information, see
https://docs.aws.amazon.com/cli/latest/userguide/cli-services-s3.html.
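
For example, listing the versions of objects under a prefix (something aws s3 cannot do) looks like:

aws s3api list-object-versions --bucket mybucket --prefix myprefix/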

Protects from accidental overwrites and deletes with no performance penalty

Upload object with the same key:
• Versioning enabled – Creates a new object with a different version ID. Both objects are retrievable by version ID.
• Versioning not enabled/suspended – Overwrites the original object; the previous object is no longer retrievable.

Delete:
• Versioning enabled – Adds a delete marker, but the object is still retrievable by version ID.
• Versioning not enabled/suspended – Deletes the object; it is no longer retrievable.

Amazon S3 versioning protects from accidental overwrites and deletes with no
performance penalty.

Versioning:
• Generates a new version with every upload.
• Deletes are logical deletes only; objects are not actually removed from bucket.
• Allows easy retrieval of deleted objects or roll back to previous versions

There are three states of an Amazon S3 bucket:


1. Default no versioning:
• Upload an object to a bucket with the same key; it overwrites old version.
• Delete an object and it is permanently deleted.
2. Versioning enabled:
• Upload an object with the same key: Amazon S3 keeps the old object and creates
a new object with a new version ID.
• Delete an object: Amazon S3 logically deletes it and adds a marker, but the old
version is retrievable by its version ID.
3. Versioning suspended:
• Versions of objects are maintained, but the bucket temporarily behaves as it does
when versioning is disabled.
• When enabled, it can't be disabled, only suspended.
When versioning has been enabled on a bucket, it cannot be disabled. However, it
can be suspended, in which case all versions created previously will remain.

Buckets with versioning suspended behave like buckets that never had versioning
enabled.
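
Versioning itself is managed through the s3api command set; a minimal sketch:

aws s3api put-bucket-versioning --bucket mybucket \
    --versioning-configuration Status=Enabled
# Versioning can later be suspended, but never disabled:
aws s3api put-bucket-versioning --bucket mybucket \
    --versioning-configuration Status=Suspended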

 General Purpose
◦ Amazon S3 Standard
 S3 Intelligent-Tiering
 Infrequent Access
◦ Amazon S3 Standard-Infrequent Access (S3 Standard-IA)
◦ Amazon S3 One Zone-Infrequent Access (S3 One Zone-IA)
 Archive
◦ Amazon S3 Glacier

Amazon S3 offers a range of storage classes designed for different use cases. These
include S3 Standard for general-purpose storage of frequently accessed data; S3
Intelligent-Tiering for data with unknown or changing access patterns; S3 Standard-
Infrequent Access (S3 Standard-IA) and S3 One Zone-Infrequent Access (S3 One Zone-
IA) for long-lived, but less frequently accessed data; and Amazon S3 Glacier (S3
Glacier) for long-term archive and digital preservation. Amazon S3 also offers
capabilities to manage your data throughout its lifecycle. Once an S3 Lifecycle policy
is set, your data will automatically transfer to a different storage class without any
changes to your application.

For more information, see https://aws.amazon.com/s3/storage-classes/.

For more information on performance across the Amazon S3 storage classes, see
https://aws.amazon.com/s3/storage-
classes/#Performance_across_the_S3_Storage_Classes
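
As a hedged sketch, a lifecycle rule that transitions objects under a placeholder logs/ prefix to S3 Standard-IA after 30 days and to S3 Glacier after 90 days might look like:

aws s3api put-bucket-lifecycle-configuration --bucket mybucket \
    --lifecycle-configuration '{
        "Rules": [{
            "ID": "archive-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"}
            ]
        }]
    }'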

Automatically move data to the most cost-effective access tier

 Two access tiers: frequent access and infrequent access
 Encrypts data in transit and at rest
 Can be used in lifecycle policies

Amazon S3 Intelligent-Tiering (S3 Intelligent-Tiering)


The S3 Intelligent-Tiering storage class is designed to optimize costs by automatically
moving data to the most cost-effective access tier, without performance impact or
operational overhead. It works by storing objects in two access tiers: one tier that is
optimized for frequent access and another lower-cost tier that is optimized for
infrequent access.

For a small monthly monitoring and automation fee per object, Amazon S3 monitors
access patterns of the objects in S3 Intelligent-Tiering, and moves the ones that have
not been accessed for 30 consecutive days to the infrequent access tier. If an object
in the infrequent access tier is accessed, it is automatically moved back to the
frequent access tier.

There are no retrieval fees when using the S3 Intelligent-Tiering storage class, and no
additional tiering fees when objects are moved between access tiers. It is the ideal
storage class for long-lived data with access patterns that are unknown or
unpredictable.

S3 Storage Classes can be configured at the object level and a single bucket can
contain objects stored in S3 Standard, S3 Intelligent-Tiering, S3 Standard-IA, and S3
One Zone-IA. You can upload objects directly to S3 Intelligent-Tiering, or use S3
Lifecycle policies to transfer objects from S3 Standard and S3 Standard-IA to S3
Intelligent-Tiering. You can also archive objects from S3 Intelligent-Tiering to S3
Glacier.

Key Features:
• Same low latency and high throughput performance of S3 Standard
• Small monthly monitoring and auto-tiering fee
• Automatically moves objects between two access tiers based on changing access
patterns
• Designed for 99.999999999% of durability of objects across multiple Availability
Zones
• Resilient against events that impact an entire Availability Zone
• Designed for 99.9% availability over a given year
• Backed with the Amazon S3 Service Level Agreement for availability
• Supports SSL for data in transit and encryption of data at rest
• S3 Lifecycle management for automatic migration of objects to other Amazon S3
storage classes

Use the AWS CLI to set the storage class on a specific object:

aws s3 cp file.txt s3://mybucket/myprefix/file.txt
--storage-class STANDARD_IA

aws s3 cp file.txt s3://mybucket/myprefix/file.txt
--storage-class ONEZONE_IA

To achieve greater cost savings, objects in Amazon S3 can be saved using lower-cost
storage classes.

All objects in Amazon S3 have what is called a storage class. The default storage class
is Standard, which replicates data to achieve durability.

However, if you have data that is already backed up at your own facilities and is easily
recreated, or you are storing data that is derived from other durably stored data, you
can consider using the One Zone-Infrequent Access storage class for storing your
objects.

For more information on using High-Level S3 Commands with the AWS Command
Line Interface, see http://docs.aws.amazon.com/cli/latest/userguide/using-s3-
commands.html.

For more information on Amazon S3 storage classes, see
https://aws.amazon.com/s3/storage-classes/

 Extremely low-cost data archiving and long-term backup
 Retrieval by request
 Can configure lifecycle archiving of Amazon S3 content to Amazon S3 Glacier

[Diagram: an on-premises server and EC2 instances archive data either directly to Amazon S3 Glacier via the API, or to Amazon S3 and then on to Amazon S3 Glacier using lifecycle rules]

Amazon S3 Glacier is a secure, durable, and extremely low-cost cloud storage service
for data archiving and long-term backup. It is designed to deliver 99.999999999%
durability, and provides comprehensive security and compliance capabilities that can
help meet even the most stringent regulatory requirements. Amazon S3 Glacier
provides query-in-place functionality, allowing you to run powerful analytics directly on
your archive data at rest. To keep costs low yet suitable for varying retrieval needs,
Amazon S3 Glacier provides three options for access to archives, from a few minutes to
several hours.

For more information, see https://aws.amazon.com/glacier/.



◦ Archives: A basic unit of data in Amazon S3 Glacier (document, video, image, etc.)
◦ Vaults: A collection of archives

Example vault and archive URLs used by an API application:
https://glacier.us-east-1.amazonaws.com/111122223333/vaults/examplevault
$VAULT_URL/archives/<archive_id>

An individual object archived into Amazon S3 Glacier—a document, video, or any
other type of file—is referred to as an archive. Each archive has a unique ID assigned
to it by AWS.

Archives are stored in vaults. A vault is addressed by its unique name assigned to it by
its creator. A given AWS account may create up to 1,000 vaults in Amazon S3 Glacier.

Archives in Amazon S3 Glacier are referenced by a URL that points to the Amazon S3
Glacier service, and consists of the following components:
• The account ID of your AWS account
• The name of the vault
• The ID of the individual archive

For more information about Amazon S3 Glacier, see
http://docs.aws.amazon.com/amazonglacier/latest/dev/introduction.html.

 Amazon S3 lifecycle policies


◦ Accessed via the Amazon S3 API
 Amazon S3 Glacier API
◦ Directly add files
◦ To retrieve files:
 Initiate a job request
 Restore an archive
 Restore by date range, or perform a byte range
retrieval

Data that is stored in Amazon S3 Glacier by Amazon S3 cannot be directly retrieved
using the Amazon S3 Glacier API; this is data that Amazon S3 manages on your behalf,
and it will not show up as a vault or a set of archives if you use the Amazon S3 Glacier
API with your account credentials. Amazon S3 archived files can be restored using the
Amazon S3 Management Console interface, or using the Amazon S3 API.

Files added directly to Amazon S3 Glacier using the Amazon S3 Glacier API are retrieved
using the Amazon S3 Glacier API. Retrieval requests can be one of the following:
• Direct retrieval of a single archive object by archive ID.
• Filter by archive creation date.
• Ranged archive retrieval; retrieve only a specific range of bytes from a specific
archive.

Users can poll for job completion using the DescribeJob API function. Amazon S3
Glacier completion notifications can also be sent using Amazon SNS. When an Amazon
S3 Glacier job has finished executing, the user may request a download of their
“thawed” data.

Use data retrieval policies to set data retrieval limits and simplify data retrieval cost
management. Amazon S3 Glacier provides a free retrieval tier of 5% of the monthly
storage (pro-rated daily) and charges for retrievals that exceed the free tier based on
how quickly data is retrieved. With data retrieval policies, you can limit retrievals to
"Free Tier Only," or specify a "Max Retrieval Rate" to limit retrieval speed and
establish a retrieval cost ceiling. In both cases, Amazon S3 Glacier will not accept
retrieval requests that would exceed the predefined retrieval limits.
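
For objects archived by an Amazon S3 lifecycle policy, a temporary restore is requested through the Amazon S3 API; a sketch with placeholder names:

aws s3api restore-object --bucket mybucket --key myprefix/archive.zip \
    --restore-request '{"Days": 7, "GlacierJobParameters": {"Tier": "Standard"}}'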

Amazon S3 Glacier supports audit logging with AWS CloudTrail, which records
Amazon S3 Glacier API calls for the customer account and delivers these log files to a
specified location. These log files provide visibility into actions performed on their
Amazon S3 Glacier assets. For instance, use audit logging to determine which users
have accessed a vault over the last month, or identify who deleted a particular
archive and when. Additionally, audit logging can help customers implement
compliance and governance objectives for their cloud-based archival system.

 Poll for job completion, or receive SNS notification


 Archive retrieval options:
◦ Expedited
1-5 minutes
◦ Standard
3-5 hours
◦ Bulk
5-12 hours

Archive Retrieval Options


Expedited — Expedited retrievals allow you to quickly access your data when
occasional urgent requests for a subset of archives are required. For all but the largest
archives (250 MB+), data accessed using Expedited retrievals are typically made
available within 1–5 minutes. There are two types of Expedited retrievals: On-
Demand and Provisioned. On-Demand requests are similar to EC2 On-Demand
instances and are available most of the time. Provisioned requests are guaranteed to
be available when you need them. For more information, see Provisioned Capacity.

Standard — Standard retrievals allow you to access any of your archives within
several hours. Standard retrievals typically complete within 3–5 hours. This is the
default option for retrieval requests that do not specify the retrieval option.

Bulk — Bulk retrievals are Amazon S3 Glacier’s lowest-cost retrieval option, which
you can use to retrieve large amounts, even petabytes, of data inexpensively in a day.
Bulk retrievals typically complete within 5–12 hours.

For more information, see
https://docs.aws.amazon.com/amazonglacier/latest/dev/downloading-an-archive-two-steps.html.
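
A sketch of initiating and then polling a retrieval job directly against the Amazon S3 Glacier API ("-" means the account of the current credentials; the archive and job IDs are placeholders):

aws glacier initiate-job --account-id - --vault-name examplevault \
    --job-parameters '{
        "Type": "archive-retrieval",
        "ArchiveId": "EXAMPLE-ARCHIVE-ID",
        "Tier": "Expedited"
    }'
aws glacier describe-job --account-id - --vault-name examplevault \
    --job-id EXAMPLE-JOB-ID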

[Diagram: on the customer premises, a server connects to a file gateway over NFS and/or SMB; the gateway connects over HTTPS to Amazon S3 and Amazon S3 Glacier]

AWS Storage Gateway is a hybrid storage service that enables your on-premises
applications to seamlessly use AWS cloud storage. You can use the service for backup
and archiving, disaster recovery, cloud data processing, storage tiering, and migration.
Your applications connect to the service through a virtual machine or hardware
gateway appliance using standard storage protocols, such as NFS, SMB and iSCSI. The
gateway connects to AWS storage services, such as Amazon S3, Amazon Glacier, and
Amazon EBS, providing storage for files, volumes, and virtual tapes in AWS. The
service includes a highly-optimized data transfer mechanism, with bandwidth
management, automated network resilience, and efficient data transfer, along with a
local cache for low-latency on-premises access to your most active data.

First, let’s go over the file interface.

This interface allows you to store and retrieve objects in Amazon S3 using industry-
standard file protocols. Once your files are transferred to Amazon S3, they’re stored
as objects and they’re accessed through a Network File System mount point. Here,
they can be managed as native Amazon S3 objects, and things like versioning,
lifecycle management, and cross-region replication apply directly to them.


[Diagram: on the customer premises, a server connects to a volume gateway over iSCSI; the gateway connects over HTTPS to Amazon S3, where point-in-time backups are stored as Amazon EBS snapshots]

The next interface we’ll go over is the volume interface.

The volume interface presents your applications with disk volumes using an industry-
standard protocol (iSCSI). Data on these volumes can be simultaneously backed up as point-in-time
snapshots, and stored in the cloud as Amazon EBS snapshots. Snapshots are
incremental backups that only capture changed blocks. All snapshot storage is
compressed to minimize the amount you’re charged, and you also get to set the
schedule for when snapshots occur.

When connecting with this block interface, you can run the gateway in two different
modes: cached and stored. In cached mode, your primary data is stored in Amazon S3
and you retain your frequently accessed data locally. This results in a substantial cost
savings for primary storage because it minimizes the need to scale your storage on-
premises while retaining low-latency access to your frequently accessed data.

In stored mode, you store your entire data set locally while performing asynchronous
backups of this data in Amazon S3. This provides durable and inexpensive offsite
backups that you can recover locally or from Amazon EC2.


[Diagram: on the customer premises, a backup server connects to a tape gateway over iSCSI; the gateway connects over HTTPS to AWS, where virtual tapes are stored in Amazon S3 and archived tapes are stored in Amazon S3 Glacier]

The third interface is the tape interface.

This interface presents the Storage Gateway to your existing backup application as a
virtual tape library, which consists of a virtual media changer and virtual tape drives.
You can continue to use your existing backup applications while writing to an almost
limitless collection of virtual tapes. Each virtual tape is stored in Amazon S3, and
when you no longer require access to data on virtual tapes, your backup application
can archive it from the virtual tape library into Amazon S3 Glacier. This further
reduces your storage costs.


Sending data to Amazon S3 can now be done with a managed, highly available SFTP
endpoint.

 Secure transfer using SFTP into and out of AWS
 Retain existing workflows
 Simple to use
 Data stored in Amazon S3 bucket

AWS Transfer for SFTP is a fully managed service that enables the transfer of files
directly into and out of Amazon S3 using the Secure File Transfer Protocol (SFTP)—also
known as Secure Shell (SSH) File Transfer Protocol. AWS helps you seamlessly migrate
your file transfer workflows to AWS Transfer for SFTP—by integrating with existing
authentication systems, and providing DNS routing with Amazon Route 53—so nothing
changes for your customers and partners, or their applications. With your data in
Amazon S3, you can use it with AWS services for processing, analytics, machine
learning, and archiving. Getting started with AWS Transfer for SFTP (AWS SFTP) is easy;
there is no infrastructure to buy and setup.
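
As a hedged sketch, a server using service-managed authentication can be created and a user mapped to a bucket path; the server ID, role, bucket, and key material are placeholders:

aws transfer create-server --identity-provider-type SERVICE_MANAGED
aws transfer create-user --server-id s-0123456789abcdef0 --user-name alice \
    --role arn:aws:iam::111122223333:role/sftp-s3-access \
    --home-directory /mybucket/home/alice \
    --ssh-public-key-body "ssh-rsa AAAA..."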

AWS DataSync

 Efficient and fast
 Sync between on-premises and AWS
 Managed service
 Connect via internet or AWS Direct Connect
 AWS DataSync agent (NFS protocol)

AWS DataSync is a data transfer service that simplifies, automates, and accelerates
moving and replicating data between on-premises storage systems and AWS storage
services over the internet or AWS Direct Connect. As a fully managed service,
DataSync removes the need to modify applications, develop scripts, or manage
infrastructure. DataSync automatically handles many of the tasks related to data
transfers that can slow down migrations or burden your IT operations, including
running your own instances, handling encryption, managing scripts, network
optimization, and data integrity validation.

You can use DataSync to transfer data at speeds up to 10 times faster than open-
source tools. DataSync uses an on-premises software agent to connect to your
existing storage or file systems using the Network File System (NFS) protocol, so you
don’t have to write scripts or modify your applications to work with AWS APIs.
You can use DataSync to copy data over AWS Direct Connect or internet links to AWS.
The service enables one-time data migrations, recurring data processing workflows,
and automated replication for data protection and recovery.

Getting started with DataSync is easy:


• Deploy the DataSync agent on premises
• Connect it to a file system or storage array
• Select Amazon EFS or Amazon S3 as your AWS storage, and start moving data
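
A hedged sketch of that flow from the CLI, after the agent has been deployed and activated; every ARN, hostname, and path below is a placeholder:

aws datasync create-location-nfs --server-hostname nfs.example.com \
    --subdirectory /export/data \
    --on-prem-config AgentArns=arn:aws:datasync:us-east-1:111122223333:agent/agent-0123456789abcdef0
aws datasync create-location-s3 --s3-bucket-arn arn:aws:s3:::mybucket \
    --s3-config BucketAccessRoleArn=arn:aws:iam::111122223333:role/datasync-s3-access
aws datasync create-task --source-location-arn <nfs-location-arn> \
    --destination-location-arn <s3-location-arn> --name nfs-to-s3
aws datasync start-task-execution --task-arn <task-arn>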

[Diagram: in the corporate data center, a shared file system connects to the AWS DataSync agent over NFS; the agent transfers data over TLS to the AWS DataSync service, which writes to an Amazon S3 bucket or an Amazon EFS file system in the Region]

The diagram shows a high-level view of the DataSync architecture for transferring files
between on-premises storage and AWS storage services.

 Secure enclosure
 50/80 TB, shipped in parallel
 Simplified logistics
 Strong encryption, end-to-end
 Large customer dataset
 End-to-end custody
 Customer dataset loaded

AWS Snowball is a solution for customers with large-scale data transfer challenges. It is
a petabyte-scale data transport solution that uses secure appliances to transfer large
amounts of data into and out of the AWS cloud. Using Snowball addresses common
challenges with large-scale data transfers including high network costs, long transfer
times, and security concerns. Transferring data with Snowball is simple, fast, secure, and
can be as little as one-fifth the cost of high-speed internet.

With Snowball, you don’t need to write any code or purchase any hardware to transfer
your data. Simply create a job in the AWS Management Console and a Snowball
appliance will be automatically shipped to you. Once it arrives, attach the appliance to
your local network, download and run the Snowball client to establish a connection,
and then use the client to select the file directories that you want to transfer to the
appliance. The client will then encrypt and transfer the files to the appliance at high
speed. Once the transfer is complete and the appliance is ready to be returned, the E
Ink shipping label will automatically update and you can track the job status via Amazon
Simple Notification Service (SNS), text messages, or directly in the console.

Snowball uses multiple layers of security designed to protect your data including
tamper-resistant enclosures, 256-bit encryption, and an industry-standard Trusted
Platform Module (TPM) designed to ensure both security and full chain-of-custody of
your data.
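
As a hedged sketch, the same job can be created from the CLI instead of the console; the address ID, role ARN, and bucket are placeholders:

aws snowball create-job --job-type IMPORT \
    --resources '{"S3Resources": [{"BucketArn": "arn:aws:s3:::mybucket"}]}' \
    --address-id ADID1234-5678-90ab-cdef-EXAMPLE11111 \
    --role-arn arn:aws:iam::111122223333:role/snowball-import \
    --shipping-option SECOND_DAY \
    --snowball-capacity-preference T80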

 Explain Amazon EBS volumes and snapshots


 Explain the benefits of Instance store
 Explain the role of Amazon Elastic File System
(Amazon EFS)
 Know the characteristics and purpose of Amazon S3
 Know briefly:
◦ Amazon S3 Glacier
◦ AWS Snowball
