
Rightsizing for Oracle Database Workloads in the Cloud

Karunakar Dutt & Simon Pane


December 4, 2019
Simon Pane
Pythian Principal Consultant

• ~25 years Oracle experience


• Community Volunteer
• Oracle ACE
• Oracle Certified

© Pythian Services Inc., 2019 2


Conference and/or Webcast Speaker For



The “About Karun” Slide Goes in Here
(“Hi, I am Karun”)
PYTHIAN
A global IT company that helps businesses leverage disruptive technologies to better compete.

Our services and software solutions unleash the power of cloud, data and analytics to drive better
business outcomes for our clients.

Our 20 years in data, commitment to hiring the best talent, and our deep technical and business expertise
allow us to meet our promise of using technology to deliver the best outcomes faster.



22 years in business
400+ experts in 35 countries
350+ clients globally


PYTHIAN TIMELINE
Karun joined Pythian in 2005; Simon joined in 2014.

• 1997-2012: Remote Database Management Services for Oracle, Microsoft SQL Server, MySQL
• 2013-2014: Cloud emerges; DevOps practice established; Hadoop practice established; first Cloud Managed Service; Analytics practice established
• 2015: Expanded open-source databases (Cassandra, MongoDB); Cloud partnerships with Google, AWS, Microsoft
• 2016: Competencies grow with Cloud partners (Data, Machine Learning, Migrations, DevOps); completed one of the world’s most complex Cloud migrations
• 2017: 11,000 database systems under Pythian management; Analytics as a Service launches


AGENDA

Impress you with our Credentials


Tell you why Pythian is awesome (it is, actually)
Ask a few survey questions
Give the reasons for (and against) moving to the Cloud

Talk about this fantastic new invention we built

DO YOU KNOW HOW MUCH IT WILL COST PER MONTH to right-size your existing WORKLOADS in the CLOUD? How would you begin estimating that?
Client Project Background
• A little common “cents” (pun intended)
• The customer was California-based, hence “cents” instead of “pence”

This stuff is real!

• Related blog article:
https://blog.pythian.com/consolidating-oracle-databases-cloud-vm/


Describing the Challenge
Real-world: A Sample Environment

The estate includes:
• Various 2-, 3-, and 4-node RAC configurations
• Multiple single-instance DBs
• Some hosted one per server, others with multiple DBs per server
• Data Guard replication


The Key Questions
• Fundamental questions:
1. How do we estimate our VM footprint for hosting our Oracle databases in the
cloud? (Finding the balance between over-provisioning and consolidation.)
2. How many VMs will we need?
3. What vCPU and RAM sizes should we be aiming for?
• Additional questions:
• How easy is it to scale up and down?
• Backup / recovery strategies & handling refreshes and clones?
• Encryption options?

(“You mean they are not using the Oracle Autonomous Database?”)


One-to-one = The Wrong Direction
• The model of one VM per database doesn’t usually fit:
• If moving from an on-premises RAC, what size single-instance node?
• One VM per DB probably results in over-provisioned infrastructure
• Scaling costs: upgrade & patching efforts; deployment & management of monitoring agents, other tools, etc.

(“Is that even possible?”)

Consequently, we need to look for a cloud VM configuration that allows us to align databases with virtual servers in an optimal manner!


The Desired Outcome: The “Org-Chart”?

Karun, can you create and insert an “org-chart” style diagram here?

(“What is he talking about?”)
(“Yes Sir! Org chart coming up.”)


And this is how it looks (when it is done)


This is what we use to build an org-chart:
• TRACK (DC-LHR): across clouds
• ENVIRONMENT (Prod, Dev, Replica, Test1…): across regions
• INSTANCE-GROUP (App1, Batch1): one region
• SERVER-GROUP (Primary, Standby 1, Standby 2)
• SERVER: one zone
• DB (CPU, RAM, DISK)
Or if you want more details...



More... if you can still read this…


Differences with Cloud Infrastructure
Managed Service or Build-Your-Own (IaaS)
• The cloud vendor might not offer a Managed Service

• Limitations of a DB Managed Service:
• Amazon RDS: limited access and OS interaction
• OCI: pre-defined configurations with limitations
• Managed services usually offer patching options but no upgrade options


IaaS: A Cloud VM is Different
• A cloud VM is unique: not quite the same as a hypervisor VM or a physical server:
• Software-based components (e.g. networking and I/O paths) introduce limiters
• Live (vendor-coordinated) migrations or pre-emptibility
• Shared or dedicated infrastructure


Cloud Software-Based Throttles
• Cloud vendors often use software throttling (governors):
• Network limitations:
• Network throughput based on the number of vCPUs
• Block storage performance:
• Read and write IOPS dependent on persistent disk (PD) size
• Sustained throughput dependent on PD size
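To see how these governors combine, here is a minimal sketch of the throttle model above, assuming each limit grows linearly (with vCPU count or PD size) up to a per-VM ceiling. The `vm_throttles` helper and every rate and cap constant except the 2 Gbps per vCPU figure (quoted later in this deck) are hypothetical placeholders; use your provider's published values.

```python
from dataclasses import dataclass

# Placeholder rates and caps: only NET_GBPS_PER_VCPU comes from this deck;
# the rest are hypothetical and vary by provider, disk type and VM shape.
IOPS_PER_GB = 30
MBPS_PER_GB = 0.5
IOPS_CAP = 60_000
MBPS_CAP = 1_200
NET_GBPS_PER_VCPU = 2
NET_GBPS_CAP = 32


@dataclass
class Throttles:
    iops: float
    mbps: float
    net_gbps: float


def vm_throttles(vcpus: int, pd_size_gb: int) -> Throttles:
    """Estimate the software-imposed limits for one VM and its persistent disk."""
    return Throttles(
        iops=min(pd_size_gb * IOPS_PER_GB, IOPS_CAP),           # grows with PD size
        mbps=min(pd_size_gb * MBPS_PER_GB, MBPS_CAP),           # grows with PD size
        net_gbps=min(vcpus * NET_GBPS_PER_VCPU, NET_GBPS_CAP),  # grows with vCPUs
    )
```

With these placeholder numbers, an 8-vCPU VM with a 1,000 GB disk would be held to 30,000 IOPS, 500 MB/s and 16 Gbps; the point is that storage limits move with disk size and network limits with vCPU count, independently of what the database actually needs.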



Licensing Complexities
• “Licensing Oracle Software in the Cloud Computing Environment”
• https://www.oracle.com/assets/cloud-licensing-070579.pdf

• Note the caveats at the bottom of the document (including one that prevents replicating its content here)

• Cloud vendors are responding to and challenging that position:
• Example: AWS’s “Optimize CPUs” for EC2 and RDS allows hyper-threading to be disabled


The “Solver” Approach
What Do We Really Need in the Cloud?

CPU === vCPU
SGA === RAM
Disk === MBPS/IOPS


The Cloud VM as an Oracle Server
• Every cloud VM is also allocated CPUs, RAM and disks, but with a few important differences:
• vCPUs are not “full” CPUs (each is a hyper-thread), so we need a conversion factor from AAS (Average Active Sessions) to vCPUs. After comparing results from www.spec.org, we decided for our purposes (and for our specific cloud provider) that required vCPUs = AAS × 1.5.
• All access to disk is via a network, and network throughput is limited (max 2 Gbps per vCPU in our case).
• MBPS and IOPS may depend on the disk type and size, and on the number of vCPUs in the VM.
• We will keep a few vCPUs unused to provide for the OS and tools that may be running on the VM.
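A minimal sketch of the vCPU arithmetic above, assuming the 1.5 factor is applied as vCPUs = AAS × 1.5 and rounded up to whole vCPUs. The number of reserved vCPUs (2 here) and the helper names are hypothetical.

```python
import math

AAS_TO_VCPU = 1.5    # factor from the deck: required vCPUs = AAS * 1.5
RESERVED_VCPUS = 2   # hypothetical: vCPUs kept free for the OS and tools


def required_vcpus(aas: float) -> int:
    """vCPUs needed for a database's average active sessions, rounded up."""
    return math.ceil(aas * AAS_TO_VCPU)


def usable_vcpus(vm_vcpus: int) -> int:
    """vCPUs left for databases after the OS/tools reservation."""
    return max(vm_vcpus - RESERVED_VCPUS, 0)
```

For example, a database running at 10 AAS needs 15 vCPUs, so under these assumptions it would not fit on a 16-vCPU VM (14 usable vCPUs).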



Three Significant Measurable Resources

• CPU: to run the server, plus background processes
• Memory: database buffers (cache), plus additional background process memory
• Storage: to store and retrieve the data; Grid Infrastructure (GI)?

• Additional considerations:
• Network IOPS and throughput
• Financial costs
Provide Some Room for Growth and Unknowns… But Not Too Much!
Requirements – Database to Server Re-alignment
• For each resource, we need to determine what is actually required:
• Existing systems may be over-provisioned, under-provisioned, or right-sized
• A simple stacking approach will almost certainly lead to over-provisioned cloud infrastructure at an unnecessarily high financial cost
• Is consolidation desired?

• Other “non-functional requirements” (NFRs):
• Some databases might require isolation due to internal policies, security requirements, regulatory reasons, etc.


Collecting the Oracle Performance Data
• An easy starting point is to use AWR and ASH data by running some performance data collection scripts: https://github.com/carlos-sierra/esp_collect

• Similar but more limited data collection is available if access to AWR is not licensed (or if running Standard Edition)

• The analysis is only as strong as the collected performance data!

• Should the AWR snapshot frequency and data retention first be increased to provide more data for analysis?
• Should the collection be re-run to capture seasonal cycles?
Instance numbers -1 and -2

SUM of MAX values

MAX of SUMMED Values
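The two aggregates can differ a lot for RAC. A sketch with made-up AAS samples (hypothetical data, one row per snapshot and one column per instance) shows why: instances rarely peak at the same moment, so adding each instance's own peak overstates the combined peak.

```python
# Hypothetical AAS samples: one row per AWR snapshot, one column per RAC instance.
samples = [
    [4.0, 1.0],
    [1.5, 3.5],
    [2.0, 2.0],
]


def sum_of_max(rows):
    """Take each instance's peak independently, then add the peaks together."""
    return sum(max(col) for col in zip(*rows))


def max_of_sum(rows):
    """Add the instances within each snapshot, then take the peak total."""
    return max(sum(row) for row in rows)


print(sum_of_max(samples))  # 7.5
print(max_of_sum(samples))  # 5.0
```

SUM of MAX is always greater than or equal to MAX of SUM, so it is the more conservative basis for sizing.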



Sample Data Collection
• A summary table of our databases and their “resource needs”:



Use the 97th Percentile Numbers

RAC: instance = -1
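Sizing to the 97th percentile rather than the absolute maximum presumably keeps a rare spike from dictating the VM size. A minimal sketch using the nearest-rank definition (one of several percentile conventions; the collection scripts may compute it differently), with hypothetical sample data:

```python
import math


def percentile_nearest_rank(values, pct):
    """Smallest sample with at least pct% of all samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical AAS samples: 99 quiet snapshots and one short spike.
aas_samples = [2.0] * 99 + [20.0]
print(max(aas_samples))                          # 20.0
print(percentile_nearest_rank(aas_samples, 97))  # 2.0
```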



Listing the “Capacities”
• We can build a table of “capacities” for each VM type that our cloud provider offers. Some examples:
• hm-32 indicates a High Memory VM with 32 vCPUs and 208 GB RAM
• s-8 is a standard VM with 8 vCPUs and 30 GB RAM
• The costs of running each VM (for these examples) are calculated for a full 730 hours (a month) with no usage discounts.
• We are also assuming that we can allocate at most 50% of RAM to the database SGA
• Maybe a different percentage is more applicable
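One way to hold the capacities table in code, as a sketch: the shape names and vCPU/RAM figures come from the examples above, while the hourly prices are hypothetical placeholders. Monthly cost assumes 730 hours with no discounts, and SGA capacity assumes the 50% rule (change `SGA_FRACTION` if a different percentage fits better).

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730   # a full month with no usage discounts
SGA_FRACTION = 0.5      # at most 50% of RAM given to the SGA


@dataclass(frozen=True)
class VMShape:
    name: str
    vcpus: int
    ram_gb: float
    hourly_usd: float   # hypothetical placeholder price

    @property
    def sga_capacity_gb(self) -> float:
        return self.ram_gb * SGA_FRACTION

    @property
    def monthly_cents(self) -> int:
        return round(self.hourly_usd * HOURS_PER_MONTH * 100)


capacities = [
    VMShape("s-8", vcpus=8, ram_gb=30, hourly_usd=0.38),      # price hypothetical
    VMShape("hm-32", vcpus=32, ram_gb=208, hourly_usd=1.89),  # price hypothetical
]
```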



Sample “Capacities” Table



We know what we need.
We know what sizes are available…

So go pick some VM shapes!


The Knapsack Problem

Reference sources:
Text: https://en.wikipedia.org/wiki/Knapsack_problem

Illustration: Creative Commons Attribution-Share Alike 2.5 Generic license, https://commons.wikimedia.org/wiki/File:Knapsack.svg
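For reference, the textbook single-constraint 0/1 knapsack can be solved exactly with dynamic programming when weights are integers. This sketch is the generic algorithm, not the deck's code; it returns the best total value and the indexes of the items taken, and the example numbers are arbitrary.

```python
def knapsack_01(values, weights, capacity):
    """Classic 0/1 knapsack via dynamic programming over integer capacities.

    Returns (best_value, chosen_item_indexes).
    """
    n = len(values)
    # best[i][c] = best value using only the first i items with capacity c
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, w = values[i - 1], weights[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # option 1: skip item i-1
            if w <= c and best[i - 1][c - w] + v > best[i][c]:
                best[i][c] = best[i - 1][c - w] + v  # option 2: take it
    # Walk back through the table to recover which items were taken.
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    return best[n][capacity], sorted(chosen)


print(knapsack_01([4, 2, 1, 2, 10], [12, 2, 1, 1, 4], 15))  # (15, [1, 2, 3, 4])
```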


The Knapsack Solver (KS) tells us how many databases we can put in a VM shape.

VMUSAGE tells us how much money (£) we are wasting!
Summarizing the “Process”
1. What are we trying to maximize?
• Typically, we want to utilize all the resources that we pay for each month
• So each month we want to maximize VMUSAGE
• Start by defining VMUSAGE

2. Given a VM shape, which databases do we choose to put in it?

3. Rinse/repeat until we allocate all databases



1. Defining VMUSAGE
• Starting point:
• VMUSAGE = (Cents paid for a VM / Cents wasted)
• Cents wasted = Cents paid for – Cents actually Utilized

• Must add “weightage” to larger VMs:


• We want to choose the largest VM where we waste the least
• Since bigger VMs cost more, weight the ratio by multiplying it by the VM cost
• VMUSAGE = (Cents paid for a VM) * (Cents paid for a VM / Cents wasted )
• If Cents wasted = 0 then set Cents wasted = 0.0001

Note 1: If we are not wasting anything, VMUSAGE would be infinite (a division by zero), so we put a boundary condition for that in our code. That is the reason for the “if” caveat.

Note 2: It does not have to be this way all the time! We could be solving for a VMUSAGE calculated differently.
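A minimal sketch of the VMUSAGE formula, assuming “cents actually utilized” is the summed value of the databases placed on the VM (as described on the Solver Inputs slide). It returns 0 when nothing is placed, matching the remark that VMs which cannot accommodate any databases score 0; the function name is hypothetical.

```python
MIN_WASTE_CENTS = 0.0001  # the deck's boundary condition for zero waste


def vmusage(vm_cost_cents: float, utilized_cents: float) -> float:
    """VMUSAGE = cost * (cost / wasted); higher means less money wasted.

    Assumes utilized_cents is the summed value of the databases placed on the VM.
    """
    if utilized_cents <= 0:
        return 0.0  # nothing fits on this VM shape
    wasted = vm_cost_cents - utilized_cents
    if wasted <= 0:
        wasted = MIN_WASTE_CENTS  # avoid dividing by zero when nothing is wasted
    return vm_cost_cents * (vm_cost_cents / wasted)


print(vmusage(1000, 900))   # 10000.0
print(vmusage(2000, 1800))  # 20000.0
```

Both VMs waste 10% of their cost, but the larger one scores twice as high: that is the effect of the “weightage”.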
2. For the Given VM Type – Which Databases?
• A classic bin-packing problem (explanation and sample Python code):
• Based on the Google OR-Tools collection of libraries and APIs:
https://developers.google.com/optimization/bin/knapsack

• Using a multi-dimensional “solver” (let’s call it “DBAllocate”):



Solver Inputs
• Value: The “intrinsic value” of picking a specific DB
• Since we are trying to maximize resource utilization, this is simply the cost
• Again, this is a simplification; we could calculate it differently, for example by scoring each DB by its “criticality”

• Observations: These are the resources that each DB uses from the capacities
• Choosing IOPS, MBPS, VCPU, SGA as the four resources that each DB will use from the VM
• Sourced from our AWR mining script outputs. Notice that we did not include the data disk size, which is not affected by the VM type that we select

• Capacities: The VM has fixed limits for IOPS, MBPS, VCPU, SGA
• The values in the capacities table allow a little overhead for growth

• Also consider measurement inaccuracies (data engineering experience)
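A sketch of turning per-database measurements into the three arrays. Multidimensional knapsack solvers such as the OR-Tools one referenced above take weights as one list per resource dimension, and OR-Tools works with integers, so the sketch transposes the per-DB rows and rounds usage up. The `solver_inputs` helper, its input layout and the 90% headroom default are hypothetical choices.

```python
import math

RESOURCES = ("IOPS", "MBPS", "VCPU", "SGA")  # the four dimensions used in the deck


def solver_inputs(dbs, vm_limits, headroom_pct=90):
    """Build (names, values, weights, capacities) for a multidimensional knapsack.

    dbs:          {name: {"IOPS": .., "MBPS": .., "VCPU": .., "SGA": .., "value": cents}}
    vm_limits:    {"IOPS": .., "MBPS": .., "VCPU": .., "SGA": ..} as ints, one VM shape
    headroom_pct: hypothetical share of each limit the databases may use
    """
    names = sorted(dbs)
    values = [int(dbs[n]["value"]) for n in names]
    # One list per resource dimension, one entry per database; usage rounded up.
    weights = [[math.ceil(dbs[n][r]) for n in names] for r in RESOURCES]
    # Integer arithmetic keeps the capacities exact.
    capacities = [vm_limits[r] * headroom_pct // 100 for r in RESOURCES]
    return names, values, weights, capacities
```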



Running the Solver
• Sample code lines after defining the three arrays (value, observations, capacities):

• The dbsallocated array shows the “indexes” of the selected databases, which
we can now use to get the rest of the details from the databases table.
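The deck's sample code calls the OR-Tools knapsack solver. As a dependency-free stand-in, the sketch below answers the same question by exhaustive search: it checks every subset of databases against every capacity and keeps the highest-value subset that fits, returning its indexes (the `dbsallocated` list). The name `db_allocate` is hypothetical, and the search is exponential in the number of databases, so it only suits small examples; a branch-and-bound solver such as OR-Tools scales much further.

```python
from itertools import combinations


def db_allocate(values, weights, capacities):
    """Exhaustive multidimensional 0/1 knapsack (a stand-in for OR-Tools).

    values[i]      -> value of database i
    weights[d][i]  -> database i's usage of resource d
    capacities[d]  -> the VM's limit for resource d
    Returns (best_value, dbsallocated), where dbsallocated lists chosen indexes.
    """
    n = len(values)
    best_value, dbsallocated = 0, []
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            fits = all(
                sum(weights[d][i] for i in subset) <= capacities[d]
                for d in range(len(capacities))
            )
            if fits:
                total = sum(values[i] for i in subset)
                if total > best_value:
                    best_value, dbsallocated = total, list(subset)
    return best_value, dbsallocated
```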



3. Repeating Until all DBs are Placed
• The Knapsack Solver tests DB placement and measures VMUSAGE for every VM type

• The Solver considers all of the possible combinations to find the optimal configuration

• VMUSAGE varies widely across VM types; higher numbers are better!

• VMs that cannot accommodate any databases show a VMUSAGE of 0.
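Putting the steps together, a self-contained sketch of the rinse/repeat loop: for each VM shape, find the best-fitting subset of the remaining databases, score it with VMUSAGE, place the highest-scoring shape, and repeat until nothing is left. It re-declares compact versions of the helpers so it runs on its own; all function names, shapes, capacities (vCPU, SGA GB), costs and database values are hypothetical.

```python
from itertools import combinations

MIN_WASTE_CENTS = 0.0001


def vmusage(cost, utilized):
    if utilized <= 0:
        return 0.0                      # nothing fits in this shape
    wasted = cost - utilized
    if wasted <= 0:
        wasted = MIN_WASTE_CENTS
    return cost * (cost / wasted)


def best_fit(dbs, caps):
    """Exhaustive multidimensional knapsack: highest-value subset that fits caps."""
    best_value, best_names = 0, []
    for k in range(1, len(dbs) + 1):
        for subset in combinations(dbs, k):
            if all(sum(db["needs"][d] for db in subset) <= caps[d] for d in range(len(caps))):
                value = sum(db["value"] for db in subset)
                if value > best_value:
                    best_value, best_names = value, [db["name"] for db in subset]
    return best_value, best_names


def place_all(dbs, shapes):
    """Score every shape, keep the highest VMUSAGE, remove placed DBs, repeat."""
    remaining, plan = list(dbs), []
    while remaining:
        scored = []
        for shape in shapes:
            value, names = best_fit(remaining, shape["caps"])
            scored.append((vmusage(shape["cost"], value), shape["name"], names))
        score, shape_name, names = max(scored, key=lambda t: t[0])
        if score == 0:
            raise ValueError("some databases do not fit in any VM shape")
        plan.append((shape_name, names))
        remaining = [db for db in remaining if db["name"] not in names]
    return plan


dbs = [
    {"name": "A", "needs": (6, 40), "value": 120},
    {"name": "B", "needs": (3, 20), "value": 60},
    {"name": "C", "needs": (2, 10), "value": 70},
]
shapes = [
    {"name": "small", "caps": (4, 30), "cost": 80},
    {"name": "large", "caps": (10, 64), "cost": 200},
]
print(place_all(dbs, shapes))  # [('large', ['A', 'C']), ('small', ['B'])]
```

Choosing the best shape per iteration is a greedy strategy: each step is optimal for the databases still unplaced, which keeps the loop simple but does not guarantee the globally cheapest plan.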



First Iteration
• For Iteration 1, the highest VMUSAGE is for VM type hm-64 (row 6):



First Iteration
• The Solver determines that the VM hm-64 will contain the following databases:



Second Iteration
• After the first iteration, only one database is left to allocate!
• The code keeps running further iterations automatically as long as there are databases yet to be allocated.


Resulting (Example) Solution
● Only two iterations were required for this example
● Outcome: a dense packing of our five databases in two kinds of VMs to minimize
expected costs

● The VM type hm-64 hosts the first four databases (APP, BATCH, DW, REPO)

● The VM type s-16 hosts one database (OLTP)



Expected monthly cost (with no discounts) = $1,932.16 + $388.00 = $2,320.16


Cautions!
• Strong disclaimer: the costs used in the capacities table are estimates for each of the VM types and focus on vCPU and RAM
• For an actual server, there will be additional expenses for network, database disks, backups, etc., which we could factor in as well, but we have kept it simple
• There may also be additional savings from commitments, sustained-usage discounts, etc.
• The cents per month are used to choose VMs for optimization and do not reflect actual costs!


The “Tools”
• A Jupyter Notebook running Python 3.7
• A SQLite Database holding the collected data

• Code link: https://github.com/pythiandutt/Solver101/

(“You mean give it away?”)


WRAP UP!



Summary…

• With a few simplifying assumptions, we can model our requirements as a use case of the well-known multidimensional knapsack problem

• Using the Google OR-Tools toolkit, data mined from the Oracle AWR and ASH, and some simple Python code, we can develop a future-state configuration based on our cloud provider’s VM shapes

• Change the values, observations, and capacities to meet your specific requirements

• Remember: cloud offerings are continuously evolving!
THANK YOU
http://bit.ly/OraSizing-UKOUG19

dutt@pythian.com
pane@pythian.com
