
Storage Classes - What and Why

Usage of Data:
Data comes in different types and serves different use cases.
• Some data needs to be accessed very frequently, within a few days or hours of being stored. E.g., Fresco Play user details.
• Some data may not be accessed for a long time, for months or even many years. E.g., sales data more than a year old.
• Some data is stored purely as an archive. E.g., database backups.
• Accordingly, a storage service must balance several aspects: performance, availability, durability, how frequently data is accessed, and long-term retention.
Depending on the access frequency of the data, S3 provides six storage classes: Standard, Standard-IA, One Zone-IA, Intelligent-Tiering, Glacier, and Glacier Deep Archive. This lets users avoid paying for features they don't use.
Users can choose the storage class that suits their use case.

Course Introduction
Over the last decade, with the rise of the internet, one trend in computing has been loud and clear: Cloud Computing.
Amazon Web Services (AWS) is one of the most prominent global cloud service providers.
The AWS Essentials course gave a glance at all the services AWS provides.
In this course, we will dive deeply into the storage service provided by AWS, known as AWS Simple Storage Service (S3).
Let's get started.

Data and its Management


Any organization or business today generates a lot of data and relies on it heavily to run the business. With digital disruption, every organization's data is piling up rapidly. So how can these enormous amounts of data be handled? There are two ways of storing them:

• Local Storage
• Cloud Storage

Creating and maintaining local storage involves a huge investment in hardware, software, and human effort.
As an alternative, a business can store its data in cloud storage.
There are many third-party cloud service providers that can be leveraged.

Bucket Policy
A bucket policy is a resource-based AWS Identity and Access Management (IAM) policy.
It is used to grant or restrict access to the bucket it is attached to. You need some basic knowledge of policies to create one and understand how it works.
A policy applied to a bucket applies to all the objects in it.
A bucket policy is defined as a valid JSON document, which you can write yourself if you know policies well, or generate with the AWS Policy Generator.
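As an illustration, a policy document can be assembled with any JSON library before handing it to S3. This is a minimal sketch: the bucket name and statement here are hypothetical, chosen only to show the standard policy grammar (Version, Statement, Effect, Principal, Action, Resource).

```python
import json

# A hypothetical bucket policy that allows public read access to every
# object in "my-example-bucket". The fields follow the IAM policy grammar.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::my-example-bucket/*",
        }
    ],
}

# Serialize to the JSON text a bucket policy expects.
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

The resulting file could then be applied with the aws s3api put-bucket-policy command shown later in this course.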

Cloud Storage Providers


Cloud storage is a method of storing data in data centres situated at various geographical locations around the globe. These are owned and maintained by hosting companies, and the responsibility of maintaining and securing the data lies with them.

Some of the cloud storage providers are:

• Amazon Web Services S3
• Azure cloud storage by Microsoft
• Google Storage
• IBM Cloud Storage
• TCS Cloud Storage

In this course, we will learn about Amazon Object Storage.

Cloud Storage vs Local Storage


Some of the main advantages of cloud storage over local storage are:
• Cost-effectiveness, availability, and low maintenance, as cloud providers take care of all upkeep
• Scalability, as upscaling or downscaling the storage is a few clicks away and can be automated
• Security, as data remains safe with disaster-recovery mechanisms in place
• Accessibility, allowing users to access data instantly, anywhere, anytime

Amazon S3 Data Consistency Model
Amazon S3 follows a client-centric consistency model, which describes how data is seen by clients after it is created, updated, or deleted.
• S3 provides read-after-write consistency for PUTs of new objects into an S3 bucket, i.e., a newly inserted object is immediately visible to clients. One limitation: if you make a GET or HEAD request for an object before creating it, S3 falls back to eventual consistency for that key.
• Eventual consistency means that if no new updates are made to a particular piece of data, eventually all reads of that item will return the last updated value.
• S3 offers eventual consistency for overwrite PUTs and DELETEs in all regions.
• Read-after-write consistency allows you to build distributed systems with lower latency.
• In the US Standard region, Amazon S3 provides eventual consistency for all requests.
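The behaviour described above can be illustrated with a toy model (not real S3 code): a primary copy accepts writes, and a replica only catches up when a sync runs, so reads from the replica are eventually, rather than immediately, consistent.

```python
# Toy model of eventual consistency: writes land on a primary, and a
# replica only sees them after replication runs. This mimics how an
# overwrite PUT could briefly return stale data under this model.
class EventuallyConsistentStore:
    def __init__(self):
        self.primary = {}
        self.replica = {}

    def put(self, key, value):
        # Writes land on the primary immediately.
        self.primary[key] = value

    def read_replica(self, key):
        # Reads may hit a replica that has not caught up yet.
        return self.replica.get(key)

    def sync(self):
        # Replication eventually propagates all writes.
        self.replica.update(self.primary)

store = EventuallyConsistentStore()
store.put("report.csv", "v1")
store.sync()                              # replica now sees v1
store.put("report.csv", "v2")             # overwrite PUT
stale = store.read_replica("report.csv")  # still the old value
store.sync()
fresh = store.read_replica("report.csv")  # eventually the new value
print(stale, fresh)
```

Once `sync()` has run with no further writes, every read returns the last updated value, which is exactly the eventual-consistency guarantee described above.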

Advantages of AWS S3
Amazon S3 is the leading cloud storage platform in the market for the following reasons:
• S3's durability, availability, and scalability are unmatched.
• Security is a major advantage: S3 supports three different forms of encryption that can be applied at the object level.
• Amazon has a strong Content Distribution Network (CDN). With Amazon CloudFront, S3 can be configured to cache data across Amazon's global data centres.
• S3 is well suited to hosting static websites.
• Amazon S3 combined with Amazon QuickSight forms a powerful big-data tool.
• One of the major benefits of Amazon S3 is the ability to implement version-controlled backups of your data within S3 buckets.
• Competitive pricing gives S3 an edge as a cloud storage service.
• S3 can be accessed from your Amazon VPC using VPC endpoints. These are easy to configure and provide reliable connectivity to S3 without requiring an internet gateway.

Amazing Fact

AWS S3 offers 99.999999999% durability per year, famously called the 11 9s. I.e., if you store 10,000 objects, the chance of losing an object is only 1 in every 10 million years.
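The arithmetic behind this claim can be checked with exact fractions: eleven nines of durability means an annual object-loss probability of 1 in 10^11.

```python
from fractions import Fraction

# Eleven 9s of durability -> annual object-loss probability of 10^-11.
annual_loss_rate = Fraction(1, 10**11)
objects_stored = 10_000

# Expected objects lost per year, and years per single lost object.
expected_losses_per_year = objects_stored * annual_loss_rate
years_per_lost_object = 1 / expected_losses_per_year

print(years_per_lost_object)  # one lost object every ten million years
```

Using Fraction keeps the arithmetic exact, so the "1 in every 10 million years" figure falls straight out of the durability percentage.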

AWS S3 Pricing
• A new customer gets a free usage tier for one year, which includes:
  - 5 GB of Amazon S3 storage in the Standard storage class
  - 20,000 GET requests
  - 2,000 PUT requests
  - 15 GB of data transfer out each month
• Storage prices vary by region and differ across storage classes, with Standard storage priced highest and Glacier having the lowest per-GB price.
• Pricing also depends on the amount of data stored and requests made, transfers into and out of the bucket, and features such as transfer acceleration and cross-region replication.

You will see more on the storage classes in the upcoming topics.
To find the various pricing tiers, see the S3 Pricing page.

S3 Storage Classes
S3 Standard:

Used for frequently accessed data, offering high durability, availability, and performance.

• S3 Standard suits use cases such as cloud applications, dynamic websites, gaming, content distribution, big-data analytics, and so on.
• Data in this class remains resilient even if an entire Availability Zone is destroyed.

S3 Standard - Infrequent Access:

This is used for data that is accessed infrequently but needs to be available rapidly when required.

• It also offers high durability, availability, and performance, but with a lower per-GB storage price and a per-GB retrieval fee, making S3 Standard-IA ideal for long-term storage, backups, and as a data store for disaster recovery.
Amazon S3 One Zone-Infrequent Access:

Unlike S3 Standard-IA, S3 One Zone-Infrequent Access stores data in a single Availability Zone.

• It is 20% cheaper than S3 Standard-IA.
• It is best suited for secondary backup copies, or for data that is already cross-region replicated.
• Data stored in this class will be lost if the Availability Zone is destroyed.

S3 Glacier:

Amazon Glacier is a data archiving service that is highly durable, extremely low cost, and secure, for varying retrieval needs.

• Amazon Glacier provides three retrieval options for archives, ranging from a few minutes to several hours.
• Data remains resilient even if an entire Availability Zone is destroyed.

Setting Storage Classes

• You can set the storage class of an object while uploading it to a bucket, or change it later. Storage classes can be set using the management console, the AWS CLI (e.g., the --storage-class option of aws s3 cp), and the SDKs, as well as through lifecycle policies.

Getting Started with AWS


To start working on AWS, you need an AWS account and either the AWS CLI or the management console to work with services.
Setup AWS Account:

Creating an AWS account is a piece of cake and can be done in just a few simple steps by providing:

• A valid email id and physical address
• A valid mobile number to confirm the account creation
• Payment information (debit or credit card), from which ₹ 2.00 is deducted for account creation
• A support plan if needed, or the free subscription to start with

AWS CLI
All AWS services can be accessed and managed through both the Management Console and the AWS Command Line Interface (CLI).
• The GUI provides more interactivity and a user-friendly view.
• However, for performing tasks and automating processes, the AWS CLI is better suited.
• The AWS CLI acts as a unified tool to manage all services; it just needs to be downloaded and configured with your user keys.

S3 Buckets and Folders


In S3 there is no concept of a filesystem or hierarchy; all the objects uploaded into a bucket share one flat namespace.

The concept of folders in S3 is logical rather than physical. When you create a folder in S3, no physical folder exists; there is simply a group of objects sharing a common key prefix. The same applies when you upload a folder.

For example, if you upload a folder called images that contains two files, sample1.jpg and sample2.jpg, Amazon S3 uploads the files and then assigns the corresponding key names, images/sample1.jpg and images/sample2.jpg. The key names include the folder name as a prefix.
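The flat-namespace idea can be demonstrated with plain string prefixes. The keys below mirror the images/ example; the helper simply filters keys by prefix, which is essentially what "listing a folder" means in S3.

```python
# S3 "folders" are just shared key prefixes over a flat namespace.
keys = [
    "images/sample1.jpg",
    "images/sample2.jpg",
    "docs/readme.txt",
]

def list_folder(keys, prefix):
    """Return the keys that would appear 'inside' a logical folder."""
    return [k for k in keys if k.startswith(prefix)]

print(list_folder(keys, "images/"))
```

Deleting the "folder" is likewise just deleting every object whose key carries that prefix; there is no directory entry to remove.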

AWS CLI commands for basic operations


Let's learn some basic CLI commands that are used to perform some important
operations:

1. To create a new bucket:

aws s3 mb s3://bucket-name

2. To delete a bucket:

aws s3 rb s3://bucket-name
or, to delete a non-empty bucket along with its contents:
aws s3 rb s3://bucket-name --force

3. To copy a file from your machine:

aws s3 cp filename s3://bucket-name


4. To copy files recursively:

aws s3 cp . s3://bucket-name --recursive

For more CLI commands, check out the S3 CLI documentation.

Bucket Properties and permissions


You now know how to create a bucket and upload an object into it. For ease of use and better security, you also need to learn about the properties that can be applied to buckets and objects.
Let's take a look at what those are.
S3 Versioning

Enabling/Disabling Versioning using the CLI


To enable versioning you must be the owner of the bucket, not just an IAM user accessing it.
To enable versioning:

aws s3api put-bucket-versioning --bucket my-bucket --versioning-configuration Status=Enabled

To disable versioning, replace Enabled with Suspended.
Server Access Logging
If you need to monitor activity in a bucket for security and auditing, you need to know about an important S3 feature called logging. Logging is of two types:
Server Access Logging:
Server access logging provides detailed records for the requests made to a bucket. Enabling it captures details about each request: the requester, bucket name, request time, request action, response status, and any error code, all stored in a target bucket.
Object-level Logging:
This records all API activity at the object level in a bucket. You can control which buckets, prefixes, and objects are audited, and which types of actions are audited, via the AWS API auditing service called CloudTrail.

Bucket and object Tags


In a project, multiple teams work together, and each team has its resources in the cloud. In that case, you can easily trace what a particular team is using and what costs are associated with it.
Now consider a scenario where multiple teams share a common resource, say storage. How would you calculate the usage and work out which team is spending how much?
To fulfil this need, AWS has a feature called Tagging.
Tags are key-value metadata, simply labels, added to the resources used by teams.
Tags differentiate how much of a resource each team is using; reports can be generated from them and necessary actions taken.
They are used to explain costs and save money.
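A tag set is just key-value metadata. The sketch below builds the JSON shape that aws s3api put-bucket-tagging expects in its --tagging argument; the team and cost-centre values are made up for illustration.

```python
import json

# Tag set in the shape expected by:
#   aws s3api put-bucket-tagging --bucket my-bucket --tagging file://tags.json
# The Team and CostCenter values below are hypothetical.
tagging = {
    "TagSet": [
        {"Key": "Team", "Value": "Analytics"},
        {"Key": "CostCenter", "Value": "CC-1042"},
    ]
}

print(json.dumps(tagging, indent=2))
```

Cost-allocation reports can then be broken down by these keys, which is how per-team spending on a shared bucket is attributed.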

Data Encryption in S3
• Data protection is of paramount importance in S3. It covers protecting data both in transit and at rest, and for this purpose S3 offers encryption of data.
• You can protect data in transit by using SSL, or by using client-side encryption, in which you upload data that is already encrypted and all the encryption processes, keys, and tools are managed by you.
• AWS also offers Server-Side Encryption for S3: you request S3 to encrypt the data while storing it and decrypt it when downloading.
• In S3 you can set default encryption on a bucket so that you don't need to set encryption for each object; you can also set up a bucket policy to reject objects that carry no encryption information.

Types of Server-Side Encryption offered by S3
There are three kinds of server-side encryption, depending on how you choose to manage the encryption keys:
1. S3-Managed Encryption Keys (SSE-S3): This server-side encryption uses strong multi-factor encryption. Amazon S3 encrypts each object with a unique key. As an additional safeguard, it encrypts the key itself with a master key that it rotates regularly. It uses one of the strongest block ciphers, 256-bit Advanced Encryption Standard (AES-256), to encrypt your data.
2. AWS KMS-Managed Keys (SSE-KMS): AWS Key Management Service (AWS KMS) provides a secure key management system for the cloud. KMS uses customer master keys (CMKs) to encrypt your Amazon S3 objects. The first time you add an SSE-KMS-encrypted object to a bucket in a region, a default CMK is created for you automatically. This key is used for SSE-KMS encryption unless you select a CMK that you created separately using AWS Key Management Service.
To know this service in detail, check out the KMS documentation.
3. Encryption with Customer-Provided Encryption Keys (SSE-C): Here the customer provides the encryption keys as part of each request. Amazon S3 manages both encryption as it writes to disks and decryption. When you upload an object, Amazon S3 uses the encryption key you provide to apply AES-256 encryption to your data and then removes the key from memory. When you retrieve an object, you must provide the same encryption key as part of your request; only then does S3 decrypt the object.
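For SSE-C, each request must carry the key, its algorithm, and an MD5 digest of the key in dedicated headers. The sketch below builds those headers with a throwaway key generated locally; it constructs the request metadata only and makes no call to S3.

```python
import base64
import hashlib
import os

# SSE-C: the client sends its own AES-256 key with every request.
# This key is a throwaway generated purely for illustration.
key = os.urandom(32)  # 256-bit customer-provided key

headers = {
    "x-amz-server-side-encryption-customer-algorithm": "AES256",
    "x-amz-server-side-encryption-customer-key": base64.b64encode(key).decode(),
    "x-amz-server-side-encryption-customer-key-MD5": base64.b64encode(
        hashlib.md5(key).digest()
    ).decode(),
}

for name, value in headers.items():
    print(name, "=", value)
```

The MD5 header lets S3 verify the key arrived intact; because S3 discards the key after use, you must present the identical headers again on every GET of the object.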

Life Cycle Policy


S3 is object-based storage, i.e., every file loaded into S3 is an object.
• The storage-class requirements of an object (file) can change over its life.
• Some files may need to be accessed frequently for a month and then preserved for, say, a few more months without frequent access. Some files might simply need to be archived, say for legal regulations, for a few years.
• In such cases, the object's life cycle can be configured to use the appropriate storage class at each stage.
• AWS S3 provides lifecycle policies to set rules defining actions that S3 applies to a group of objects. There are two types of actions:
• Transition actions: define when objects move to a different storage class.
• Expiration actions: define when objects expire and are deleted.

Create a Life Cycle Policy using AWS CLI


An Amazon S3 lifecycle configuration is natively an XML document, but when using the CLI we supply JSON instead of XML.
Consider an example policy:

<LifecycleConfiguration>
  <Rule>
    <ID>ExampleRule</ID>
    <Filter>
      <Prefix>documents/</Prefix>
    </Filter>
    <Status>Enabled</Status>
    <Transition>
      <Days>365</Days>
      <StorageClass>GLACIER</StorageClass>
    </Transition>
    <Expiration>
      <Days>3650</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>

The equivalent JSON file is:

{
  "Rules": [
    {
      "Filter": {
        "Prefix": "documents/"
      },
      "Status": "Enabled",
      "Transitions": [
        {
          "Days": 365,
          "StorageClass": "GLACIER"
        }
      ],
      "Expiration": {
        "Days": 3650
      },
      "ID": "ExampleRule"
    }
  ]
}

Don't forget to save the JSON file in your working directory.
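The lifecycle JSON can also be generated and saved from a short script before running the CLI command. This sketch reproduces the same rule: transition to Glacier after a year, expire after ten.

```python
import json

# The same lifecycle rule as above: move objects under documents/ to
# Glacier after 365 days and expire them after 3650 days.
lifecycle = {
    "Rules": [
        {
            "ID": "ExampleRule",
            "Filter": {"Prefix": "documents/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 365, "StorageClass": "GLACIER"}],
            "Expiration": {"Days": 3650},
        }
    ]
}

# Save it where the CLI can find it (referenced as file://lifecycle.json).
with open("lifecycle.json", "w") as f:
    json.dump(lifecycle, f, indent=2)
```

Generating the file programmatically avoids hand-editing mistakes when the transition and expiration days change per bucket.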

AWS CLI Life Cycle Policy Commands


1. To set a lifecycle policy on a bucket:

$ aws s3api put-bucket-lifecycle-configuration --bucket bucketname --lifecycle-configuration file://lifecycle.json

2. To view the existing policy:

$ aws s3api get-bucket-lifecycle-configuration --bucket bucketname

3. To delete the existing policy:

aws s3api delete-bucket-lifecycle --bucket bucketname

Access Control List (ACL)
Access Control Lists (ACLs) are resource-based policy options that can be used to manage access to buckets and objects. When a bucket or object is created, S3 creates a default ACL that grants the resource owner full control over it.
The ACL permissions are the same for buckets and objects. S3 supports a set of predefined grants, known as canned ACLs. Each canned ACL has a predefined set of grantees and permissions. They are:

 private
 public-read
 public-read-write
 authenticated-read
 log-delivery-write

Accessing ACLs through CLI


You can grant permissions through ACLs using the Management Console, the AWS SDKs for Java and .NET, or the REST APIs. Let's look at an example through the CLI using the s3api commands.
To grant full control to specific AWS users and read permission to everyone:

aws s3api put-object-acl --bucket MyBucket --key file.txt --grant-full-control emailaddress=user1@example.com,emailaddress=user2@example.com --grant-read uri=http://acs.amazonaws.com/groups/global/AllUsers

Using --grant-read and --grant-write we can grant access to specific IAM users, while canned ACLs such as public-read and public-read-write control public access.
For more details, follow the AWS documentation.

Accessing Bucket Policies through CLI


To set up a bucket policy through the CLI, you need a good knowledge of writing JSON policies. First write the required policy in JSON format, then execute the following commands.

1. To put a bucket policy:

aws s3api put-bucket-policy --bucket MyBucket --policy file://policy.json

2. To delete the existing policy:

aws s3api delete-bucket-policy --bucket my-bucket

3. To get the details of the existing bucket policy:

aws s3api get-bucket-policy --bucket my-bucket

Cross Origin Resource Sharing (CORS)

• Cross-origin resource sharing (CORS) is a mechanism that uses additional HTTP headers to tell the browser to let a web application running at one domain (origin) access selected resources from a server at a different domain.
• Amazon S3 supports CORS, making it possible to build web applications that use JavaScript and HTML5 to interact directly with resources in Amazon S3 without the need for a proxy server.
• Cross-origin requests are made using the standard HTTP request methods. Most servers will allow GET requests, meaning they will allow resources from external origins (say, a web page) to read their assets.
• For example, say you are hosting a static website in a bucket named TestWebsite; users load the website at the endpoint http://TestWebsite.s3-website-us-east-1.amazonaws.com. Now you want the JavaScript on the web pages stored in this bucket to be able to make authenticated GET and PUT requests against the same bucket. A web browser would normally block such requests, but with CORS you can configure your bucket to explicitly enable cross-origin requests from TestWebsite.s3-website-us-east-1.amazonaws.com.

Configuring CORS on a Bucket in S3


A CORS configuration is an XML file with one or more rules that identify the origins you will allow to access your bucket, the operations (HTTP methods) you will support for each origin, and other operation-specific information.
The following XML is a CORS configuration:

<CORSConfiguration>
  <CORSRule>
    <AllowedOrigin>http://www.example1.com</AllowedOrigin>
    <AllowedMethod>PUT</AllowedMethod>
    <AllowedMethod>POST</AllowedMethod>
    <AllowedMethod>DELETE</AllowedMethod>
    <AllowedHeader>*</AllowedHeader>
  </CORSRule>
  <CORSRule>
    <AllowedOrigin>http://www.example2.com</AllowedOrigin>
    <AllowedMethod>PUT</AllowedMethod>
    <AllowedMethod>POST</AllowedMethod>
    <AllowedMethod>DELETE</AllowedMethod>
    <AllowedHeader>*</AllowedHeader>
  </CORSRule>
  <CORSRule>
    <AllowedOrigin>*</AllowedOrigin>
    <AllowedMethod>GET</AllowedMethod>
  </CORSRule>
</CORSConfiguration>

The configuration has three rules:

• The first rule allows cross-origin PUT, POST, and DELETE requests from the http://www.example1.com origin. It also allows all headers in a preflight OPTIONS request via the Access-Control-Request-Headers header; in response to preflight OPTIONS requests, Amazon S3 returns any requested headers.
• The second rule allows the same cross-origin requests as the first rule, but applies to another origin, http://www.example2.com.
• The third rule allows cross-origin GET requests from all origins. The * wildcard character refers to all origins.
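A configuration like this can also be inspected programmatically before uploading it. The sketch below parses a two-rule CORS document with the standard library's ElementTree; it checks the XML structure only and makes no AWS call.

```python
import xml.etree.ElementTree as ET

# A small CORS configuration, in the same XML vocabulary as above.
cors_xml = """
<CORSConfiguration>
  <CORSRule>
    <AllowedOrigin>http://www.example1.com</AllowedOrigin>
    <AllowedMethod>PUT</AllowedMethod>
    <AllowedMethod>POST</AllowedMethod>
    <AllowedMethod>DELETE</AllowedMethod>
    <AllowedHeader>*</AllowedHeader>
  </CORSRule>
  <CORSRule>
    <AllowedOrigin>*</AllowedOrigin>
    <AllowedMethod>GET</AllowedMethod>
  </CORSRule>
</CORSConfiguration>
"""

root = ET.fromstring(cors_xml)
rules = root.findall("CORSRule")

# Collect the allowed methods of the first rule.
first_rule_methods = [m.text for m in rules[0].findall("AllowedMethod")]
print(len(rules), first_rule_methods)
```

A quick parse like this catches malformed rules (for example, a rule with no AllowedOrigin) before S3 rejects the configuration.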

ACLs, Bucket Policy and CORS


ACLs and bucket policies are both part of the access-control toolbox, but they differ slightly and are used according to the use case.
• Bucket policies provide control at the bucket level, whereas ACLs provide fine-grained access control for both buckets and the objects within them.
• Both ACLs and bucket policies can be used to grant access to other AWS accounts, as well as public access to a bucket.
• Bucket policies are generally written in JSON, while ACLs are typically managed through the console GUI.
• Both ACLs and bucket policies can be updated through the SDKs, the console, and the CLI.
Cross-Origin Resource Sharing (CORS):
CORS is a feature that allows cross-origin access to S3. It is generally used when a bucket hosts a static website containing JavaScript.

Static Website Hosting on S3


A static website can be hosted on S3, but S3 doesn't support server-side scripting, so dynamic websites can't be hosted; other AWS services host those.
The website is specific to an AWS region, and its endpoint has the following format:

<bucket-name>.s3-website-<AWS-region>.amazonaws.com

If you want to serve the site from your own domain rather than the S3-provided endpoint, Amazon Route 53 helps you host the website at its root domain.
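The endpoint format can be captured in a one-line helper; the bucket name and region below are examples only.

```python
def website_endpoint(bucket: str, region: str) -> str:
    """Build the S3 static-website endpoint for a bucket and region."""
    return f"{bucket}.s3-website-{region}.amazonaws.com"

# Example: a bucket named "testwebsite" hosted in us-east-1.
print(website_endpoint("testwebsite", "us-east-1"))
```

Because the region appears in the hostname, moving the site to another region changes the endpoint, which is one reason a Route 53 alias on your own domain is more stable for visitors.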


Using CLI to host a website in a Bucket


It is easy to set up a bucket for website hosting using the CLI. After creating a bucket, the following command configures it for hosting a website:

aws s3 website s3://my-bucket/ --index-document index.html --error-document error.html

All files in the bucket that appear on the static site must be publicly readable so that visitors can open them.
Using AWS Lambda with S3
We know that AWS Lambda functions are event-driven; S3 can publish events to Lambda and invoke Lambda functions. This lets you write Lambda functions that process S3 events.
In S3, you add a bucket notification configuration that identifies the type of event you want S3 to publish and the Lambda function you want to invoke.
S3 and Lambda integration comes in two forms:
Non-stream-based (push) model:
S3 monitors a bucket, and whenever an event occurs (object creation, deletion, etc.) it invokes a Lambda function, passing the event data as a parameter.
In this push model, you maintain the event-source mapping within Amazon S3 using the bucket notification configuration, telling S3 which event types should invoke Lambda.
Asynchronous invocation:
Here, the Lambda function is invoked with the event asynchronously.
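A Lambda handler receives the S3 event as a JSON document. The sketch below pulls the bucket and key out of each record; the event structure follows the documented S3 notification format, while the sample event itself is hand-built for illustration.

```python
# Minimal sketch of a Lambda handler for S3 event notifications.
# Each record carries the bucket name and object key under record["s3"].
def handler(event, context=None):
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        processed.append((record.get("eventName"), bucket, key))
    return processed

# A hand-built sample event mimicking an object-created notification.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "my-bucket"},
                "object": {"key": "images/sample1.jpg"},
            },
        }
    ]
}

print(handler(sample_event))
```

In a real deployment, S3 invokes this handler asynchronously with such an event whenever a matching notification fires; testing it locally is just a matter of passing a hand-built event, as above.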

Points to remember in AWS Lambda

A few crucial aspects are important when using Lambda functions:
• Some dependencies must be installed or imported into the functions:
  - the AWS SDK for JavaScript in Node.js (similarly for other languages),
  - gm, the GraphicsMagick module for Node.js,
  - the async utility module.
• You must create an IAM user with an execution role; make sure the policy type is AWSLambdaExecute.
• The policy ARN is worth saving, as it may be used in later steps.
• Test the Lambda function before deploying it.

Setting Notifications for an S3 bucket


To track or get notified about important events, such as the upload or deletion of an object in a bucket, Event Notifications help a lot; you simply enable them.
You can find the event settings in the bucket properties, in the advanced settings section. Click Events, then Add notification; you will be asked to enter an event name and the events you want notifications for. Set the Send to dropdown to an SNS topic and save.

Advantages of using CloudFront

• Time-saving: CloudFront has a simple web services interface and can be up and running within 30 minutes.
• Privacy: with CloudFront's private content feature, we can restrict who can view the content.
• Quick content delivery: it uses the HTTP or HTTPS protocols to deliver content, and recognizes the origin servers that hold the original version of our content using URL rules that we configure.
• Less expensive: we pay only for the content delivered and the requests served.

Adding Custom Headers to CloudFront


Defining custom headers is very useful for troubleshooting, for informational purposes, and to drive server-side functionality. The CloudFront service lets you define your own custom headers during the distribution creation stage; note that these are origin custom headers.
Using only CloudFront and S3 there is a limitation: CloudFront does not support adding arbitrary HTTP headers to responses. It allows origin servers to pass headers through, but S3 doesn't allow adding custom headers to HTTP responses.
The Lambda@Edge functionality inserts Lambda functions into the request/response path of CloudFront distributions at the CDN edge.

Data Migration to S3
So you have finally decided to embrace the cloud for its advantages, and you want to store all your data in AWS S3.
For a few gigabytes or so, you can do it using the console, the AWS CLI with multipart upload, or a job written with the AWS SDKs.
But what if your data runs to petabytes? Imagine the time taken to upload all of it.
To overcome this hurdle, AWS offers a service called AWS Snowball.
AWS Snowball transfers large amounts of data from your on-premises data centers to the cloud using physical storage devices, bypassing the internet.

AWS Snowball vs SnowballEdge


SnowballEdge comes into play when you don't have a local workstation powerful enough to encrypt the data, when the data is greater than 80 TB, or when you have local clustering needs.

SnowballEdge is a data transfer device with on-board storage and compute power that can undertake local processing and edge-computing workloads. SnowballEdge is priced higher than Snowball, and it is not economical for data less than 30 petabytes.

To know the differences (hardware, use cases, tools) and the advantages of one over the other in detail, check out the Snowball documentation.

AWS Snowmobile

AWS Snowmobile is an exabyte-scale data transfer service used to move very large amounts of data to S3. You can transfer up to 100 PB per Snowmobile.

When AWS Snowmobile is on site, AWS personnel configure it for you so it can be accessed as a network storage target. After your data is loaded, the Snowmobile is driven back to AWS, where your data is imported into Amazon S3 or Amazon Glacier.

Snowmobile uses multiple layers of security to help protect your data, including dedicated security personnel, GPS tracking, alarm monitoring, 24/7 video surveillance, and an optional escort security vehicle while in transit. All data is encrypted with 256-bit encryption keys.

What Have We Learnt?

• We learned what cloud storage is and how much more flexible it is than local storage.
• What AWS S3 is and its advantages over other cloud providers.
• How to use the AWS CLI.
• How to secure storage using IAM policies.
• How S3 is used by other AWS services under the hood.
• How to host a static website using S3.
• The properties and features of S3.
Advantages of S3
The features that give S3 an edge over other cloud storage providers are:

• Scalability on demand
• Flexibility in integrating with third-party services
• Best-in-class security
• 99.99% availability backed by an actual SLA
• Content storage and distribution (CDN) via CloudFront
• Built-in big data and analytics support
• Static website hosting
• A free usage tier and competitive pricing
• Multiple storage classes with corresponding pricing

Quick Fact

AWS cloud revenue is larger than the revenue of the next four major competitors combined, with a market share of 47.1%, and S3 is the most used service among all AWS services.
