
Database & Containers

Documented by Kishor Krishna


What is a relational database?
Relational databases are what most of us are used to. They have been around since the 1970s. Think of a traditional spreadsheet:
• Database
• Tables
• Rows
• Fields (Columns)

ID      First Name     Surname    Gender
1024    Kishor         Krishna    M
1025    Ramakrishna    Rout       M
1026    Ravish         Raj        M
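A minimal sketch of this database/table/row/column mapping using Python's built-in sqlite3 module; the table and sample rows simply mirror the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")                   # the database
conn.execute(
    "CREATE TABLE students ("                        # a table
    "  id INTEGER PRIMARY KEY,"                      # fields (columns)
    "  first_name TEXT, surname TEXT, gender TEXT)"
)
conn.execute("INSERT INTO students VALUES (1024, 'Kishor', 'Krishna', 'M')")    # a row
conn.execute("INSERT INTO students VALUES (1025, 'Ramakrishna', 'Rout', 'M')")

for row in conn.execute("SELECT * FROM students WHERE id = 1024"):
    print(row)                                       # (1024, 'Kishor', 'Krishna', 'M')
```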



Relational Database Types
• SQL Server
• Oracle
• MySQL
• PostgreSQL
• Aurora (Amazon product)
• MariaDB
• DB2 (IBM)
• Informatica (no support)
• Mainframe



Non-Relational Databases
• Database (DynamoDB)
• Collection = Table
• Document = Row
• Key/Value Pairs = Fields (see the mapping sketch below)
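A minimal boto3 sketch of this mapping; it assumes a DynamoDB table named Students, keyed on StudentId, already exists (the names are illustrative).

```python
import boto3

dynamodb = boto3.resource("dynamodb")        # the database service
table = dynamodb.Table("Students")           # collection = table

table.put_item(Item={                        # document = row (an "item")
    "StudentId": "1024",                     # key/value pairs = fields ("attributes")
    "FirstName": "Kishor",
    "Surname": "Krishna",
})

item = table.get_item(Key={"StudentId": "1024"}).get("Item")
print(item)
```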



SQL vs NoSQL DB
• SQL (relational): ACID-compliant transactions, with primary keys and foreign keys relating tables.
• NoSQL: typically column-, document- or key/value-based, trading strict relational structure for scale and flexibility.



What is Data Warehousing?
• Used for business intelligence. Tools include Cognos, Jaspersoft, SQL Server Reporting Services, Oracle Hyperion and SAP NetWeaver.

• Used to pull in very large and complex data sets. Usually used by management to run queries on data (such as current performance vs targets, etc.).



OLTP vs OLAP
• Online Transaction Processing (OLTP) differs from Online Analytics Processing (OLAP) in terms of the types of queries run.

OLTP example:
Order number 2120121
Pulls up a row of data such as Name, Date, Address to Deliver to, Delivery Status, etc.
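A hedged sketch of that OLTP pattern as a point query; the database file, table and column names are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect("orders.db")
cur = conn.execute(
    "SELECT customer_name, order_date, delivery_address, delivery_status "
    "FROM orders WHERE order_number = ?",
    (2120121,),
)
print(cur.fetchone())   # a single small row, typical of OLTP workloads
```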



OLAP
OLAP transaction example:
Net Profit for EMEA and Pacific for the Digital Radio product.
Pulls in large numbers of records:
Sum of radios sold in EMEA
Sum of radios sold in Pacific
Unit cost of radios in each region
Sales price of each radio
Net profit = sales price – unit cost (see the aggregate query sketch below)
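By contrast, the OLAP query aggregates over many rows. A hedged sketch; the table and column names are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect("sales_warehouse.db")
cur = conn.execute(
    "SELECT region, SUM(sale_price - unit_cost) AS net_profit "
    "FROM radio_sales "
    "WHERE product = 'Digital Radio' AND region IN ('EMEA', 'Pacific') "
    "GROUP BY region"
)
print(cur.fetchall())   # scans a large number of records to produce two totals
```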

Data warehousing databases use a different type of architecture, both from a database perspective and at the infrastructure layer.

What is ElastiCache?
• ElastiCache is a web service that makes it easy to deploy, operate and scale an in-memory cache in the cloud. The service improves the performance of web applications by allowing you to retrieve information from fast, managed, in-memory caches, instead of relying entirely on slower disk-based databases.
• ElastiCache supports two open-source in-memory caching engines:
• Memcached
• Redis



What is DMS?
Announced at re:Invent 2015, DMS stands for Database Migration Service. It allows you to migrate your production database to AWS. Once the migration has started, AWS manages all the complexities of the migration process, such as data type transformation, compression and parallel transfer (for faster data transfer), while ensuring that data changes to the source database that occur during the migration are automatically replicated to the target.

The AWS Schema Conversion Tool automatically converts the source database schema and a majority of the custom code, including views, stored procedures and functions, to a format compatible with the target database.



AWS Database Types – Summary
• RDS – OLTP
  • MS SQL
  • MySQL
  • PostgreSQL
  • Oracle
  • Aurora
  • MariaDB
• DynamoDB – NoSQL
• Redshift – OLAP
• ElastiCache – In-Memory Caching
  • Memcached
  • Redis
• DMS



RDS – Backups, Multi-AZ & Read Replicas (DB performance)



Automated Backups
There are two different types of backups for RDS: Automated Backups and Database Snapshots.
Automated Backups allow you to recover your database to any point in time within a "retention period". The retention period can be between one and 35 days. Automated Backups take a full daily snapshot and also store transaction logs throughout the day. When you do a recovery, AWS first chooses the most recent daily backup and then applies the transaction logs relevant to that day. This allows you to do a point-in-time recovery down to a second, within the retention period.
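A hedged boto3 sketch of a point-in-time restore within the retention period; the instance identifiers and timestamp are illustrative assumptions.

```python
from datetime import datetime, timezone
import boto3

rds = boto3.client("rds")
rds.restore_db_instance_to_point_in_time(
    SourceDBInstanceIdentifier="my-production-db",
    TargetDBInstanceIdentifier="my-production-db-restored",   # new instance, new endpoint
    RestoreTime=datetime(2024, 1, 15, 10, 30, 0, tzinfo=timezone.utc),
)
```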



Automated Backups
Automated Backups are enabled by default. The backup data is stored in S3 and you get free storage space equal to the size of your database. So if you have an RDS instance of 10 GB, you will get 10 GB worth of storage.
Backups are taken within a defined window. During the backup window, storage I/O may be suspended while your data is being backed up, and you may experience elevated latency.



Snapshots
• DB Snapshots are done manually (i.e. they are user-initiated). They are stored even after you delete the original RDS instance, unlike automated backups.

Restoring Backups:
Whenever you restore either an Automated Backup or a manual snapshot, the restored version of the database will be a new RDS instance with a new DNS endpoint.



Encryption
Encryption at rest is supported for MySQL, Oracle, SQL Server, PostgreSQL & MariaDB. Encryption is done using the AWS Key Management Service (KMS). Once your RDS instance is encrypted, the data stored at rest in the underlying storage is encrypted, as are its automated backups, read replicas and snapshots.

At present, encrypting an existing DB instance is not supported. To use Amazon RDS encryption for an existing database, create a new DB instance with encryption enabled and migrate your data into it.
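Because an existing instance cannot simply be switched to encrypted, a new encrypted instance is created instead. A hedged boto3 sketch; the identifiers, engine, sizes and KMS key alias are illustrative assumptions.

```python
import boto3

rds = boto3.client("rds")
rds.create_db_instance(
    DBInstanceIdentifier="my-encrypted-db",
    DBInstanceClass="db.t3.micro",
    Engine="mysql",
    MasterUsername="admin",
    MasterUserPassword="change-me-please",
    AllocatedStorage=20,
    StorageEncrypted=True,        # encrypt data at rest (also covers backups, replicas, snapshots)
    KmsKeyId="alias/aws/rds",     # assumed: the default RDS key managed by KMS
)
```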



What is Multi-AZ? (DR)



What is Multi-AZ RDS?
Multi-AZ allows you to have an exact copy of your production database in another Availability Zone. AWS handles the replication for you, so when your production database is written to, the write is automatically synchronized to the standby database.

In the event of planned database maintenance, DB instance failure, or an Availability Zone failure, Amazon RDS will automatically fail over to the standby so that database operations can resume quickly without administrative intervention.
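A hedged boto3 sketch of enabling Multi-AZ on an existing instance; the identifier is an illustrative assumption.

```python
import boto3

rds = boto3.client("rds")
rds.modify_db_instance(
    DBInstanceIdentifier="my-production-db",
    MultiAZ=True,              # provision a standby copy in another Availability Zone
    ApplyImmediately=True,     # otherwise the change waits for the next maintenance window
)
```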



What is Multi-AZ RDS?
• Multi-AZ is for Disaster Recovery only. It is not primarily used for improving performance. For performance improvement you need Read Replicas.



What is Multi-AZ RDS?
Multi-AZ is available for:
• SQL Server
• Oracle
• MySQL
• PostgreSQL
• MariaDB
• Aurora (Multi-AZ by default)



What is a Read Replica? (DB performance)



What is a Read Replica?
Read replicas allow you to have a read-only copy of your production database. This is achieved by using asynchronous replication from the primary RDS instance to the read replica. You use read replicas primarily for very read-heavy database workloads.

Read replicas are supported for:
• MySQL
• PostgreSQL
• MariaDB



Read Replica Databases
• Used for scaling, not for DR!
• Automated backups must be turned on in order to deploy a read replica.
• You can have up to five read replica copies of any database.
• You can have read replicas of read replicas (but watch out for latency).
• Each read replica will have its own DNS endpoint.
• You can create read replicas of Multi-AZ source databases.
• Read replicas can be promoted to be their own databases. This breaks the replication (see the sketch below).
• Read replicas in a second region are supported for MySQL and MariaDB, but not for PostgreSQL.
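A hedged boto3 sketch of creating and later promoting a read replica; the identifiers are illustrative assumptions.

```python
import boto3

rds = boto3.client("rds")
rds.create_db_instance_read_replica(
    DBInstanceIdentifier="my-production-db-replica-1",    # gets its own DNS endpoint
    SourceDBInstanceIdentifier="my-production-db",        # automated backups must be enabled
)

# Promoting the replica turns it into a standalone database and breaks replication:
# rds.promote_read_replica(DBInstanceIdentifier="my-production-db-replica-1")
```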



DynamoDB vs RDS
• DynamoDB offers "push button" scaling, meaning that you can scale your database on the fly, without any downtime.
• RDS is not so easy: you usually have to use a bigger instance size or add a read replica.



DynamoDB



What is DynamoDB?
Amazon DynamoDB is a fast and flexible NoSQL database service for all applications that need consistent, single-digit-millisecond latency at any scale. It is a fully managed database and supports both document and key-value data models. Its flexible data model and reliable performance make it a great fit for mobile, web, gaming, ad-tech, IoT, and many other applications.



DynamoDB
• Stored on SSD storage.
• Spread across three geographically distinct data centers.
• Eventually consistent reads (the default).
• Strongly consistent reads.

• Eventually consistent reads: consistency across all copies of data is usually reached within a second. Repeating a read after a short time should return the updated data. (Best read performance.)

• Strongly consistent reads: a strongly consistent read returns a result that reflects all writes that received a successful response prior to the read. (Both modes are shown in the sketch below.)
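A hedged boto3 sketch of the two read modes; it assumes a Students table keyed on StudentId (illustrative names).

```python
import boto3

table = boto3.resource("dynamodb").Table("Students")

# Eventually consistent read (the default): best read performance, may briefly return stale data.
eventual = table.get_item(Key={"StudentId": "1024"})

# Strongly consistent read: reflects all writes acknowledged before the read.
strong = table.get_item(Key={"StudentId": "1024"}, ConsistentRead=True)

print(eventual.get("Item"), strong.get("Item"))
```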



DynamoDB: The Basics
• Tables
• Items (think of a row of data in a table)
• Attributes (think of a column of data in a table)

Example: a Student table containing items (Item 1, Item 2), each with a set of attributes.



DynamoDB Pricing
Provisioned throughput capacity:
• Write throughput: $0.0065 per hour for every 10 units of write capacity.
• Read throughput: $0.0065 per hour for every 50 units of read capacity.

First 25 GB of storage per month is free (SSD). Storage costs $0.25 per GB per month thereafter. (A rough monthly-cost sketch follows below.)

For more details please visit: https://aws.amazon.com/dynamodb/pricing/
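A rough monthly cost sketch using the figures quoted above; actual prices vary by region, so treat the numbers and the assumed provisioned capacity as illustrative.

```python
# Assumed provisioned capacity and storage for the sake of the calculation.
write_units = 10        # provisioned write capacity units
read_units = 50         # provisioned read capacity units
storage_gb = 40         # total data stored

hours_per_month = 24 * 30
write_cost = (write_units / 10) * 0.0065 * hours_per_month    # $0.0065/hr per 10 write units
read_cost = (read_units / 50) * 0.0065 * hours_per_month      # $0.0065/hr per 50 read units
storage_cost = max(storage_gb - 25, 0) * 0.25                 # first 25 GB free, then $0.25/GB

print(f"~${write_cost + read_cost + storage_cost:.2f} per month")
```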



DynamoDB Lab follows….



Redshift Data Warehouse (OLAP)
• Amazon Redshift is a fast, fully managed data warehouse that makes it simple
and cost-effective to analyze all your data using standard SQL and your existing
Business Intelligence (BI) tools. It allows you to run complex analytic queries
against petabytes of structured data, using sophisticated query optimization,
columnar storage on high-performance local disks, and massively parallel query
execution. Most results come back in seconds. With Amazon Redshift, you can
start small for just $0.25 per hour with no commitments and scale out to
petabytes of data for $1,000 per terabyte per year, less than a tenth the cost of
traditional solutions.
• Amazon Redshift also includes Redshift Spectrum, allowing you to directly run
SQL queries against exabytes of unstructured data in Amazon S3. No loading or
transformation is required, and you can use open data formats, including CSV,
TSV, Parquet, Sequence, and RCFile. Redshift Spectrum automatically scales query
compute capacity based on the data being retrieved, so queries against Amazon
S3 run fast, regardless of data set size.



Redshift configuration
• Single Node (160 GB)
• Multi-Node
  • Leader Node (manages client connections and receives queries)
  • Compute Nodes (store data and perform queries and computations); up to 128 Compute Nodes. (A configuration sketch follows below.)
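A hedged boto3 sketch of both configurations; the identifiers, node type and credentials are illustrative assumptions.

```python
import boto3

redshift = boto3.client("redshift")

# Single-node cluster (up to 160 GB):
redshift.create_cluster(
    ClusterIdentifier="reporting-single",
    ClusterType="single-node",
    NodeType="dc2.large",
    MasterUsername="awsuser",
    MasterUserPassword="ChangeMe123",
)

# Multi-node cluster: one leader node plus the requested compute nodes.
redshift.create_cluster(
    ClusterIdentifier="reporting-cluster",
    ClusterType="multi-node",
    NodeType="dc2.large",
    NumberOfNodes=3,        # compute nodes only; the leader node is added automatically
    MasterUsername="awsuser",
    MasterUserPassword="ChangeMe123",
)
```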



Redshift – 10 times faster
Columnar Data Storage: instead of storing data as a series of rows, Amazon Redshift organizes the data by column. Unlike row-based systems, which are ideal for transaction processing, column-based systems are ideal for data warehousing and analytics, where queries often involve aggregates performed over large data sets. Since only the columns involved in the queries are processed and columnar data is stored sequentially on the storage media, column-based systems require far fewer I/Os, greatly improving query performance.



Redshift – 10 times faster
• Advanced Compression: columnar data stores can be compressed much more than row-based data stores because similar data is stored sequentially on disk. Amazon Redshift employs multiple compression techniques and can often achieve significant compression relative to traditional relational data stores. In addition, Amazon Redshift doesn't require indexes or materialized views and so uses less space than traditional relational database systems.
When loading data into an empty table, Amazon Redshift automatically samples your data and selects the most appropriate compression scheme.



Redshift – 10 times faster
• Massively Parallel Processing (MPP): Amazon Redshift automatically distributes data and query load across all nodes. Amazon Redshift makes it easy to add nodes to your data warehouse and enables you to maintain fast query performance as your data warehouse grows.



Redshift pricing
• Compute node hours: the total number of hours you run across all your compute nodes for the billing period. You are billed for 1 unit per node per hour, so a 3-node data warehouse cluster running persistently for an entire month would incur 2,160 instance hours (3 nodes x 24 hours x 30 days). You are not charged for leader node hours; only compute nodes incur charges.
• Backup
• Data transfer (only within a VPC, not outside it)



Redshift pricing
• Amazon Redshift On-Demand pricing has no upfront costs: you simply pay an hourly rate based on the type and number of nodes in your cluster. You can save up to 75% over On-Demand rates by committing to use Amazon Redshift for a 1- or 3-year term. Prices include two additional copies of your data, one on the cluster nodes and one in Amazon S3. AWS takes care of backup, durability, availability, security, monitoring, and maintenance for you.
• Dense Storage (DS) nodes allow you to create large data warehouses using hard disk drives (HDDs) for a low price point. Dense Compute (DC) nodes allow you to create high-performance data warehouses using solid-state disks (SSDs). If you have less than 500 GB of data, your most cost-effective and highest-performance option is Dense Compute nodes. Above 500 GB, if your primary focus is performance, you can continue with Dense Compute nodes up to hundreds of compressed terabytes for $5,500/TB/year (3-Year Partial Upfront Reserved Instance pricing). If you want to reduce costs or need to scale further, you can switch to the larger, more cost-effective Dense Storage nodes and scale to over a petabyte of compressed data for under $1,000/TB/year (3-Year Partial Upfront Reserved Instance pricing). Scaling your cluster or switching between node types requires a single API call or a few clicks in the AWS Console.

• https://aws.amazon.com/redshift/pricing/



Redshift Security
• Encrypted in transit using SSL.
• Encrypted at rest using AES-256 encryption.
• By default, Redshift takes care of key management.
• You can manage your own keys through HSM.
• The AWS Key Management Service (KMS) can also be used.



Redshift Availability
• Currently only available in one AZ.
• Can restore snapshots to a new AZ in the event of an outage.



ElastiCache



What is ElastiCache ?
• Amazon ElastiCache is a web service that makes it easy to deploy, operate, and
scale an in-memory data store or cache in the cloud. The service improves the
performance of web applications by allowing you to retrieve information from
fast, managed, in-memory data stores, instead of relying entirely on slower disk-
based databases. Amazon ElastiCache supports two open-source in-memory
engines:
• Redis - a fast, open source, in-memory data store and cache. Amazon ElastiCache for Redis is
a Redis-compatible in-memory service that delivers the ease-of-use and power of Redis along
with the availability, reliability and performance suitable for the most demanding
applications. Both single-node and up to 15-shard clusters are available, enabling scalability
to up to 3.55 TiB of in-memory data. ElastiCache for Redis is fully managed, scalable, and
secure - making it an ideal candidate to power high-performance use cases such as Web,
Mobile Apps, Gaming, Ad-Tech, and IoT.
• Memcached - a widely adopted memory object caching system. ElastiCache is protocol
compliant with Memcached, so popular tools that you use today with existing Memcached
environments will work seamlessly with the service.



What is ElastiCache?

Amazon ElastiCache can be used to significantly improve latency and throughput for many read-heavy application workloads (such as social networking, gaming, media sharing and Q&A portals) or compute-intensive workloads (such as a recommendation engine).
Caching improves application performance by storing critical pieces of data in memory for low-latency access. Cached information may include the results of I/O-intensive database queries or the results of computationally intensive calculations.



How ElastiCache Works
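In broad terms, the application checks the cache first and falls back to the database on a miss (the cache-aside pattern). A hedged sketch using the redis-py client against an ElastiCache for Redis endpoint; the endpoint, key format and query function are illustrative assumptions.

```python
import json
import redis

cache = redis.Redis(host="my-cache.example.use1.cache.amazonaws.com", port=6379)

def get_product(product_id, query_database):
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:                        # cache hit: served from memory, low latency
        return json.loads(cached)
    result = query_database(product_id)           # cache miss: fall back to the slower database
    cache.setex(key, 300, json.dumps(result))     # keep the result in cache for 5 minutes
    return result
```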



ElastiCache Exam Tips
• ElastiCache is a good choice if your database is particularly read-heavy and not prone to frequent changes.
• Redshift is a good answer if the reason your database is under stress is that management keeps running OLAP queries on it.



Aurora



What is Aurora? (Not an exam topic at associate level, but good to have insight.)
Amazon Aurora is a MySQL-compatible relational database engine that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open-source databases.
Amazon Aurora provides up to five times better performance than MySQL at a price point one tenth that of a commercial database, while delivering similar performance and availability.
• In short, it is Amazon's competitor to commercial databases such as Oracle.



Aurora Scaling
• Starts with 10 GB and scales in 10 GB increments up to 64 TB (storage autoscaling).

• Compute resources can scale up to 32 vCPUs and 244 GB of memory.

• Two copies of your data are kept in each Availability Zone, with a minimum of three Availability Zones: six copies of your data in total.
• Aurora is designed to transparently handle the loss of up to two copies of data without affecting database write availability, and up to three copies without affecting read availability.
• Aurora storage is also self-healing. Data blocks and disks are continuously scanned for errors and repaired automatically.
Aurora Replicas
• Two types of replicas are available:
  • Aurora Replicas (currently 15)
  • MySQL Read Replicas (currently 5)
• It is not available in the free tier at all (so no lab).
• It is available only in selected regions.



AWS Database – Summary
• RDS – OLTP
  • SQL Server, MySQL, PostgreSQL, Oracle, Aurora, MariaDB
• DynamoDB – NoSQL
• Redshift – OLAP
• ElastiCache – In-Memory Caching
  • Memcached
  • Redis

Amazon RDS Security Groups:
http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Overview.RDSSecurityGroups.html
Storage for Amazon RDS:
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Storage.html



What is Elastic Container Service?
• What is Docker?



What is Docker?
• Docker is a software platform that allows you to build, test and deploy applications quickly.
• Docker is highly reliable: you can quickly deploy and scale applications into any environment and know your code will run.
• Docker is infinitely scalable: running Docker on AWS is a great way to run distributed applications at any scale.
• Docker packages software into standardized units called containers:
  • Containers allow you to easily package an application's code, configurations and dependencies into easy-to-use building blocks that deliver environmental consistency, operational efficiency, developer productivity and version control.



Virtualization vs Containerization
• Traditional virtual machines vs containers: this is where the technologies diverge.
• Traditional virtualization has density compromises.
• Docker achieves higher density and improves portability by removing the per-container guest OS.



Containerisation Benefits
• Escape from dependency hell.
• Consistent progression from DEV -> TEST -> QA -> PROD.
• Isolation: performance or stability issues with App A in Container A won't impact App B in Container B.
• Much better resource management.
• Extreme code portability.
• Micro-services.



Docker Components
• Docker Image
• Docker Container
• Layers / Union File System
• Dockerfile
• Docker Daemon / Engine
• Docker Client
• Docker Registries / Docker Hub
The sketch below ties these components together.
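A hedged sketch using the Docker SDK for Python (docker-py); it assumes a local Docker daemon is running and that the host has network access to Docker Hub.

```python
import docker

client = docker.from_env()                  # Docker Client talking to the local Daemon/Engine

client.images.pull("alpine")                # pull an image from a registry (Docker Hub)
container = client.containers.run(          # a container is a running instance of an image
    "alpine", "echo hello from a container", detach=True
)
container.wait()                            # let the command finish before reading logs
print(container.logs().decode())
container.remove()
```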

