A  Checklist  for  Preventing  
Common  But  Deadly  MySQL®  
Problems  
 
 
 
 
 
 
 
 
A  Percona  Whitepaper  
September  15,  2015  

www.percona.com  
 

 
 

Contents  
 ©Percona.  All  rights  reserved.  

Introduction
The Busy MySQL DBA
The Impact on Your Business
The Common but Deadly MySQL Problems
Out-of-Date MySQL Version
Inadequate MySQL Configuration Settings
Unmindful Deployment Practices
Poor Migration Preparation
Upgrades
Capacity Alterations
Moving Operations Into or Out of the Cloud
Changing Products
Poor Query Performance
Lack of High Availability/Too Much Downtime
Inadequate Monitoring and Alerting
Faulty Security Practices
Limited Backup and Recovery Plan
How Percona Can Help You
About Percona

 
Percona  
400  W.  Main  Street,  Suite  204  
Durham,  North  Carolina  USA  
www.percona.com  
©Percona.  All  Rights  Reserved  
 

Introduction  
This  white  paper  gives  busy  DBAs  and  their  management  team  an  understanding  of  some  of  the  most  
common  MySQL  implementation,  performance,  and  recovery  problems  that  can  be  minimized  or  prevented  
by  proactive  and  diligent  MySQL  administration  best  practices.  The  issues  covered  in  this  white  paper  are:  

• Out-of-Date MySQL Version
• Inadequate MySQL Configuration Settings
• Unmindful Deployment Practices
• Poor Migration Preparation
• Poor Query Performance
• Lack of High Availability/Too Much Downtime
• Inadequate Monitoring and Alerting
• Faulty Security Practices
• Limited Backup and Recovery Plan
 
The information in this paper is based on collected advice from the Percona Support, Consulting, and
Managed Services teams. This group includes top experts in MySQL operations who have helped thousands of
clients over the past 8 years.

The  Busy  MySQL  DBA  


Busy  MySQL  DBAs  often  fall  into  one  of  three  categories:  

• You  may  be  filling  the  role  of  your  organization’s  MySQL  DBA  because  there  is  no  one  else  available.  
You  may  have  limited  DBA  experience  and  might  be  an  IT  professional,  a  SysAdmin,  the  CTO,  or  on  
the  engineering  and  development  staff.    
• You  may  be  a  DBA  by  profession,  but  are  overloaded  due  to  lack  of  resources  in  your  organization.  
Your  time  might  be  divided  between  database  operations  and  other  tasks,  such  as  application  
development.  You  may  be  addressing  issues  reactively  or  when  time  allows,  hopefully  before  disaster  
strikes.    
• You  may  be  a  DBA  transitioning  into  a  more  strategic  role,  in  which  the  handling  of  common  
operational  tasks  is  no  longer  your  main  focus  or  desire.  In  this  capacity,  common  but  necessary  
database  administrative  tasks  are  performed  ad  hoc,  as  time  allows,  or  perhaps  not  at  all.    

The  Impact  on  Your  Business  


Busy DBAs and their managers must also consider the problems caused by part-time or reactive MySQL
administration and the impact those problems may have on the overall business. It is management's
responsibility to maintain the resources necessary to manage the organization's MySQL environments. A
lack of needed resources inevitably leads to:

• Unacceptable downtime
• Unnecessary costs
• Potential malicious attacks
• Frustrated users
• Lost business

Percona  Services  can  help.  Percona  Consulting  and  Support  Services  for  MySQL  offer  customized  help  and  
immediate  assistance  from  MySQL  experts.  Percona  Managed  Services  for  MySQL  can  take  over  all  of  your  
organization’s  MySQL  operational  tasks,  letting  you  concentrate  on  other  projects.  

The  Common  but  Deadly  MySQL  Problems  


We  will  now  take  a  look  at  some  of  the  most  common  MySQL  implementation,  performance,  and  recovery  
problems.  

Out-of-Date MySQL Version


You may adopt or inherit an environment where the MySQL version is out of date. There are many scenarios,
such as relying on the version available from your Linux package manager, where you or your team may
unknowingly be using a legacy version of MySQL. Remaining on a legacy version carries several risks:

• Security: an old version of MySQL can leave you more vulnerable to security exploits and may
increase the surface area for potential attacks
• Performance: you will usually experience better performance with the latest version of MySQL
• Support: your version might be so old that it is no longer supported by your vendor, or it may be
difficult to find tools or a consultant to help
 
For  example,  Percona  had  an  online  consumer  business  client  that  was  running  MySQL  5.1.  They  were  
experiencing  edge  case  query  performance  issues  where  index  hints  were  unable  to  provide  relief  from  poor  
response  time.  The  problem  was  critically  impacting  sales  due  to  frustrated  users.  Testing  the  same  queries  
on  Percona  Server  5.6  proved  that  the  performance  enhancements  of  the  Percona  Server  5.6  query  optimizer  
reduced  their  overall  load  by  around  40%.  Upgrading  to  Percona  Server  5.6  provided  capacity  for  growth  on  
their  existing  systems,  avoided  costly  hardware  purchases,  and  delivered  better  response  times  and  a  
superior  customer  experience.    

Minor version upgrades may be something you can handle with little risk. However, for a major upgrade, you
may want to consider using outside help that has experience dealing with a multitude of upgrade scenarios.
Percona uses a carefully managed approach to upgrades that combines unique tools such as Percona Toolkit
with our decades of experience. For our Managed Services clients, the team works hand-in-hand with each
client to test, plan, execute, and re-test so there are no surprises when moving from a legacy version of
MySQL to the latest release.

Inadequate  MySQL  Configuration  Settings  


MySQL can run right out of the box. While this makes it easy to install and evaluate MySQL at a small scale,
many of the default MySQL configuration values are not appropriate for a large production environment.

 
 

You  might  use  the  default  MySQL  configuration  values  because  you  do  not  have  the  time  or  experience  to  
tune  them  appropriately.  Tuning  MySQL  without  understanding  how  to  measure  the  results  can  cause  more  
harm  than  good.  This  can  lead  to:  

• A highly inefficient usage of system resources
• A system that does not scale well
• Unpredictable behavior for application end users

Here  are  some  examples  of  how  default  settings  can  impact  your  system:  

• The  overhead  of  fully  synchronous  disk  writes  may  be  unnecessarily  throttling  your  site’s  rate  of  write  
traffic  
• The  size  of  the  InnoDB  buffer  pool  and  transaction  log  may  be  keeping  your  server  from  fully  using  its  
hardware  capacity  
• Using  the  outdated  MyISAM  storage  engine  (instead  of  the  preferred  InnoDB  engine)  risks  data  loss  
and  costly  table  repairs  
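As a rough illustration of the settings behind these three examples, the following Python sketch reviews a dictionary of server variables. The variable names are real MySQL settings, but the review_settings helper, the sample values, and the 50% buffer pool threshold are illustrative assumptions rather than Percona recommendations. Any real tuning decision should follow measurement of the workload.

```python
GIB = 1024 ** 3


def review_settings(variables, total_ram_bytes):
    """Return notes for settings that deserve a closer look (hypothetical helper)."""
    notes = []

    # 1 means the InnoDB log is flushed and synced to disk at every commit.
    if variables.get("innodb_flush_log_at_trx_commit") == 1:
        notes.append("innodb_flush_log_at_trx_commit=1: fully synchronous log "
                     "writes; confirm the workload needs this durability")

    # Assumed rule of thumb for a dedicated database host, not a fixed rule.
    pool = variables.get("innodb_buffer_pool_size", 0)
    if pool < 0.5 * total_ram_bytes:
        notes.append("innodb_buffer_pool_size is below half of RAM; the server "
                     "may not be using its hardware capacity")

    if str(variables.get("default_storage_engine", "")).lower() == "myisam":
        notes.append("default_storage_engine=MyISAM: new tables risk data loss "
                     "and table repairs after a crash")

    return notes


untuned = {
    "innodb_flush_log_at_trx_commit": 1,
    "innodb_buffer_pool_size": 128 * 1024 ** 2,  # 128 MiB
    "default_storage_engine": "MyISAM",
}
for note in review_settings(untuned, total_ram_bytes=64 * GIB):
    print(note)
```

A note from this helper is only a prompt to investigate. Whether relaxing a setting is safe depends on the durability the application requires.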
 
The  reality  is  that  organizations  can  often  use  their  database  servers  a  lot  longer  than  they  think.  Adding  
additional  hardware  should  only  occur  after  MySQL  is  tuned  appropriately.  A  new  Percona  client  in  the  
ecommerce  industry  asked  us  to  investigate  performance  issues  because  their  current  architecture  couldn't  
cope with their query workload. Upon investigation, we found that the variables configured to sync the
replication metadata were all set to 1, so the server synced to disk after every event written. This
generated a huge number of fsync calls in their environment and put significant pressure on their storage.
Changing these variables to values more suitable for their application cut storage usage by over 50% and
instantly resolved their problems with query performance.

Percona  approaches  MySQL  configuration  tuning  by  measuring  performance,  identifying  bottlenecks,  
applying  tuning  changes,  and  measuring  the  results.  This  approach  provides  a  record  of  the  reasons  behind  
tuning  changes  so  they  can  be  applied  with  confidence.  

Unmindful  Deployment  Practices  


Updating  the  code  in  your  production  application  is  a  frequent  necessity.  This  may  occur  every  few  weeks  or  
even  every  few  minutes.  Sometimes,  deploying  new  code  requires  changes  to  the  database  for  schema  
updates,  adding  indexes,  or  deleting  rows.  

Making  changes  to  a  database,  such  as  deleting  many  records,  can  cause  problems  if  not  thoughtfully  
planned  and  executed.  For  example,  many  database  design  and  configuration  activities  can  block  all  users  of  
an  application  for  unknown  periods  of  time.  

Unplanned downtime can occur when operations that require various types of locks run in the production
environment at the wrong time or using the wrong approach. MyISAM, which relies on table-level locking, is
a common culprit. However, even when a dataset uses InnoDB exclusively, some operations demand table-wide
locks. For example, running ALTER TABLE on a large table can require a complete table rebuild. Depending
on the MySQL version and the type of change, this can block reads and writes on the affected table and can
take a significant amount of time to complete on a multi-gigabyte table.

There are tricks and tools that Percona uses to mitigate the effect of an ALTER. For example, you might be
able to perform the actions on a slave and then promote it to become the master, or you might use
pt-online-schema-change from Percona Toolkit. You might even be able to run ALTER directly on small tables
or run the larger ALTERs during off hours. The point is to pay careful attention in each deployment not
only to the code changes but also to how the databases are affected.
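The pt-online-schema-change approach can be sketched as a dry run followed by a real run. The Python helper below only builds the command line; it does not connect to a database. The --alter, --dry-run, and --execute options and the D=...,t=... table specification belong to the tool, while the osc_command helper and the shop.orders table and index names are hypothetical.

```python
import shlex


def osc_command(database, table, alter_clause, execute=False):
    """Build a pt-online-schema-change argument list (hypothetical helper)."""
    # The tool expects one of --dry-run or --execute; default to the safe one.
    mode = "--execute" if execute else "--dry-run"
    return [
        "pt-online-schema-change",
        "--alter", alter_clause,
        f"D={database},t={table}",
        mode,
    ]


dry_run = osc_command("shop", "orders", "ADD INDEX idx_created (created_at)")
real_run = osc_command("shop", "orders", "ADD INDEX idx_created (created_at)",
                       execute=True)

# Print shell-quoted commands so they can be reviewed before anyone runs them.
for command in (dry_run, real_run):
    print(" ".join(shlex.quote(arg) for arg in command))
```

Defaulting to the dry run keeps the destructive path an explicit choice, which matches the advice to pay careful attention to every deployment.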

Percona works with clients to thoroughly inspect SQL to ensure that there are no hidden surprises. When we
have identified database changes in a client's code deployment that could impact performance, we have
found that applying the needed changes gradually over an extended period of time reduces risk. In
addition, we have deployment tools that enable us to perform dry runs before actually executing the changes
on the data. Running tests and manually checking changes has saved our clients from potential disasters in
the past.

Poor  Migration  Preparation  


There  may  come  a  time  in  your  application’s  lifecycle  when  it  makes  sense  to  migrate  to  a  new  infrastructure.  
This  may  involve  moving  from:  

• One variant of MySQL to another
• Single-node MySQL to a cluster environment
• On-premises MySQL to the cloud
• Your current infrastructure to different hardware

A  migration  should  result  in  a  positive  impact  on  your  bottom  line.  However,  migrations  face  a  number  of  
potential  issues  that  can  bring  down  your  system,  and  your  business,  for  an  extended  period  of  time.  
Upgrades,  capacity  alterations,  moving  operations  in  or  out  of  the  cloud,  and  changing  products  can  result  in  
downtime  if  not  properly  planned.  

Upgrades  
Upgrading  an  operating  system  or  MySQL  version  might  require  a  primary  master  to  be  taken  offline.  An  
upgrade  can  be  facilitated  by  promoting  a  slave  to  take  over  the  primary  master’s  role  for  the  duration  of  the  
upgrade.  However,  this  method  requires  careful  planning  and  knowledge  of  potential  problems.  In  some  
cases,  erroneous  writes  can  be  made  to  the  demoted  master,  resulting  in  data  integrity  issues.  Replication  
reconfiguration  mistakes  can  result  in  missing  transactions  or  data  consistency  issues.  Both  of  these  potential  
pitfalls  cost  organizations  extra  time  and  effort  to  manually  solve  the  problems  introduced  in  the  upgrade.  

When preparing for an upgrade in this manner, there are several things you can do to minimize disruption.
First, make sure that you isolate the node from potential connection sources. You will need to work with the
team responsible for the application to ensure that no new threads (connections) are being created on the
host that is undergoing the maintenance. Percona uses virtual IP addresses to relocate roles between nodes,
and then we use MHA to perform the reconfiguration of slaves.

You must be able to trust the integrity of the data on the slave before moving the read/write role to an
asynchronous slave, because MySQL does not guarantee that all events have been replayed on the slave. The
best way to achieve this is to run the Percona Toolkit tool pt-table-checksum regularly and then sync any
detected differences with pt-table-sync. It is a best practice to make a backup of the data before executing
the upgrade so you have a rollback plan. To further reduce the margin for error, we recommend that a
detailed action plan be produced and peer reviewed to make sure nothing is missed.
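To make the checksum comparison concrete, the Python sketch below takes rows shaped like the per-chunk results that pt-table-checksum stores on a slave and reports which tables differ from the master. The column names (db, tbl, chunk, this_crc, master_crc, this_cnt, master_cnt) are assumed to mirror the tool's checksum table. The tables_with_differences helper and the sample data are hypothetical.

```python
def tables_with_differences(checksum_rows):
    """Return sorted (db, tbl) pairs whose chunks differ from the master."""
    differing = set()
    for row in checksum_rows:
        crc_differs = row["this_crc"] != row["master_crc"]
        count_differs = row["this_cnt"] != row["master_cnt"]
        if crc_differs or count_differs:
            differing.add((row["db"], row["tbl"]))
    return sorted(differing)


rows = [
    {"db": "shop", "tbl": "orders", "chunk": 1,
     "this_crc": "a1b2", "master_crc": "a1b2", "this_cnt": 1000, "master_cnt": 1000},
    {"db": "shop", "tbl": "orders", "chunk": 2,
     "this_crc": "ffee", "master_crc": "c3d4", "this_cnt": 998, "master_cnt": 1000},
    {"db": "shop", "tbl": "users", "chunk": 1,
     "this_crc": "9f9f", "master_crc": "9f9f", "this_cnt": 250, "master_cnt": 250},
]

drifted = tables_with_differences(rows)
print(drifted)
```

Any table this check reports is a candidate for pt-table-sync, and it should be resolved before that slave is trusted with the read/write role.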

Capacity  Alterations  
You may need to take a server offline to perform hardware maintenance such as adding memory. There are
also cases where your primary master may be on hardware over-specified for its workload and you want to
reduce its capacity. High availability tools can reduce or even remove the need to take your application
offline while the work is performed.

For Percona, Master High Availability (MHA) with MySQL replication turns relocating a virtual IP address
and reconfiguring slave replication coordinates from a series of individual manual tasks into a smart and
swift automated process. The technique has permitted us to grow block storage by terabytes without end
users noticing any service interruptions. We recently diagnosed that a client had over-specified their
hardware and was paying for expensive high-performance storage capable of many times the required
throughput. Through the use of Percona's HA toolset, we were able to clone the master and migrate without
suffering any outages. In the end, we significantly reduced our customer's total cost for their overall system.

Moving  Operations  Into  or  Out  of  the  Cloud  


Moving from one cloud provider to another, or even between services of the same cloud provider (e.g., AWS
RDS and EC2), also requires careful planning. For example, the route from RDS MySQL version 5.5 to EC2 is
not an obvious one. Many organizations accept some lost business as the cost of making the change. Other
organizations may decide that the migration is not justified and remain on legacy versions that lack new
functionality or performance improvements. Switchover must be expertly timed and handled with minimal
interruption so operations can continue unaffected and users do not experience any availability issues.

Percona recently managed a multi-terabyte migration from legacy AWS to Amazon Virtual Private Cloud that
provided a 24x7 online gaming client with bigger, newer hardware. Our Managed Services team, which does
this type of operation regularly, completed the migration without any interruption of service. We were able
to plan and execute this migration by using Percona XtraBackup as a means of cloning the nodes and MHA to
provide replication failover to our preferred new node while completing the task of reconfiguring all slaves
for the new source. This protocol permitted operations to be performed completely online.

 
 

Changing  Products  
Some organizations migrate from one variant of MySQL to another to make use of functionality that does not
exist in their current variant. Taking time to plan and to identify the issues that may result in downtime
can highlight where measures need to be taken to minimize disruptions to daily workflow.

If you take the database offline while making changes, operations may be interrupted. Over many years of
implementing migrations for clients, Percona has developed techniques to minimize or eliminate downtime.
In many cases, Percona uses the MHA software package to simply and safely move the role of master to a
waiting slave host. This provides the agility to fail over to a machine that has had upgrades applied or
that has been built in the new datacenter.

Poor  Query  Performance  


Query  tuning  is  one  of  the  most  important  ways  you  can  improve  the  performance  of  your  MySQL  server,  yet  
it  is  often  overlooked.  Bad  queries  can  overload  your  server  or  affect  user  response  time.  Users  do  not  
tolerate  poor  response  time.  If  a  user  has  to  wait,  they  may  leave.  If  they  do  not  leave,  they  may  retry  their  
previous  action,  which  further  increases  the  number  of  concurrent  bad  queries  running  on  your  server.  Too  
many  poorly  performing  queries  may  result  in  hitting  the  max  connections  on  your  server,  causing  it  to  
become  overloaded  and  unresponsive.  

A good index on a simple select can be the difference between response times measured in minutes and
response times measured in milliseconds. For most of our clients experiencing excessive server loads or slow
MySQL, the root cause is one or more poorly performing queries that can be easily fixed with the correct
indexes or by rewriting the query.

We  have  improved  response  times,  reduced  server  load  and  improved  scalability  with  our  query  optimization  
services.  This  allows  our  clients  to  serve  more  concurrent  users  without  affecting  performance.  

The first step before making any changes to your application is measuring your query performance. Percona
uses tools like pt-query-digest to identify the worst performing queries and quantify the results of a change.
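The idea of identifying the worst performing queries can be illustrated with a small Python sketch that groups timing samples by normalized query text and ranks the groups by total time consumed. This is a simplified stand-in for the kind of report a digest tool produces, not a reimplementation of pt-query-digest. The rank_by_total_time helper and the sample queries are hypothetical.

```python
from collections import defaultdict


def rank_by_total_time(samples):
    """Group (fingerprint, seconds) samples and sort by total time, worst first."""
    totals = defaultdict(lambda: {"count": 0, "total_s": 0.0})
    for fingerprint, seconds in samples:
        totals[fingerprint]["count"] += 1
        totals[fingerprint]["total_s"] += seconds
    return sorted(totals.items(),
                  key=lambda item: item[1]["total_s"],
                  reverse=True)


ORDERS = "SELECT * FROM orders WHERE customer_id = ?"
USERS = "SELECT name FROM users WHERE id = ?"

samples = [(ORDERS, 2.5), (USERS, 0.002), (ORDERS, 3.1), (USERS, 0.003)]
ranked = rank_by_total_time(samples)

for fingerprint, stats in ranked:
    print(f"{stats['total_s']:.3f}s over {stats['count']} calls: {fingerprint}")
```

Capturing the same ranking before and after a change, such as adding an index, gives a simple way to quantify the result.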

Lack  of  High  Availability/Too  Much  Downtime  


A MySQL architecture with a single node has the potential to significantly impact application availability.
But even a multi-node MySQL architecture with a master and slaves using standard asynchronous replication
carries risk if failover is manual. Manually failing over can take time and requires precision that can be
lost when performed during a disaster. Downtime of production databases typically results in a major impact
on service quality, a poor user experience, and potential lost business.

In  order  to  minimize  downtime,  organizations  can  pursue  a  high  availability  (HA)  strategy  that  incorporates  
automation.  Automating  failover  processes  potentially  reduces  failover  time  from  minutes  to  seconds  and  
minimizes  human  error.  
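One small piece of such automation is deciding when a failover should be triggered at all. The Python sketch below is a generic illustration that requires several consecutive failed health checks before acting, so a single dropped probe does not cause an unnecessary failover. The should_fail_over helper and its default threshold of three are assumptions for illustration and do not describe how any particular HA tool behaves.

```python
def should_fail_over(check_results, required_failures=3):
    """Return True when the most recent checks all failed.

    check_results is a list of booleans, oldest first, where True means the
    master answered the health check.
    """
    if len(check_results) < required_failures:
        return False  # not enough history to make a decision
    return not any(check_results[-required_failures:])


print(should_fail_over([True, False, False, False]))  # three failures in a row
print(should_fail_over([False, False, True]))         # master recovered
```

The threshold trades detection speed against false positives, and it should be chosen and tested alongside the rest of the failover procedure.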

High  availability  consists  of  two  core  needs:  

 
 

• Data redundancy, which can be accomplished with native MySQL replication, solutions such as
Percona XtraDB Cluster (which incorporates Galera technology), lower level solutions such as DRBD,
and other storage-based software and technology
• Automated failover, which can use solutions based on virtual IPs, AWS elastic IPs, or load balancing
solutions such as HAProxy

While it is imperative to minimize downtime, you may lack the resources to implement a proven high
availability strategy. It is best to avoid an over-engineered solution and to opt instead for one you can
easily troubleshoot if required. Your chosen solution should be tried and tested so you can be confident it
will work when you need it. Among other characteristics, your strategy should include:

• A  warmed  cache  on  a  standby  host  to  reduce  the  time  to  recover  from  a  failure  
• Suitable  backup  hosts  to  cope  with  the  load  within  acceptable  tolerances  after  a  failover  

There  are  many  high  availability  options  available.  The  right  solution  depends  on  your  application’s  unique  
needs.  For  example,  a  Percona  client  in  the  retail  industry  had  implemented  Percona  XtraDB  Cluster  with  
HAProxy  and  came  to  Percona  for  help  with  poor  availability.  We  analyzed  their  environment  and  found  that  
their  approach  was  not  a  match  for  the  workload  of  their  application.  In  their  case,  large  transactions  were  
affecting  the  availability  of  the  cluster.  We  helped  migrate  them  to  a  solution  based  on  native  MySQL  
replication,  MHA,  and  AWS  EIP  failover.  The  result  was  an  improvement  in  availability  and  a  simple  failover  
that  suited  their  application’s  workload.  

Finding a high availability solution that is right for your business requires in-depth knowledge of the
options. We recommend working with an experienced team that understands the choices, knows how to assess
which is suitable for your situation, and can provide guidance on implementation matters.

Inadequate  Monitoring  and  Alerting  


Database  monitoring  is  an  essential  component  of  application  operations.  It  ensures  both  the  availability  and  
performance  of  the  service  are  maximized.  A  lack  of  monitoring  can  lead  to  incidents  that  take  longer  than  
necessary  to  diagnose  or  even  to  missed  incidents.  In  addition,  effective  monitoring  helps  deliver  an  optimal  
and  consistent  user  experience  by  proactively  alerting  the  business  to  potential  risks  and  incidents  before  
they  occur.  

Monitoring  MySQL  requires  a  solid  understanding  of  the  database’s  architecture  and  how  its  different  
components  interact  and  impact  system  resources  such  as  CPU,  memory,  and  disk  usage.  Monitoring  consists  
of  two  aspects:  

1. Availability  Monitoring  
Availability  monitoring  is  used  for  alerting.  Ideally,  these  monitors  should  be  limited  to  actionable  or  
potentially  service  affecting  checks  and  should  be  set  with  the  appropriate  tolerances.  This  safeguards  
against  important  alerts  being  lost  in  the  noise  and  helps  to  ensure  a  rapid  response  for  the  important  
problems  when  they  occur.  

 
 

2. Performance  Monitoring  
Performance  monitoring  solutions  are  usually  graphical  and  can  be  used  for  historical  analysis  to  aid  both  
troubleshooting  and  identifying  areas  for  improvement.  Performance  monitoring  frequently  includes  
monitoring  a  larger  number  of  performance  variables.  Additionally,  the  monitored  performance  variables  
don't  need  to  be  limited  to  those  that  affect  service.  It  is  useful  to  include  additional  monitors  that  may  
uncover  abnormalities.  

It is common practice to use alerts to monitor disk capacity and MySQL service status, and to capture
statistics such as CPU usage and the number of queries being handled. There are countless checks that can
be performed. Below are examples of some important checks that are frequently overlooked.

Replication  
Replication is easy to set up and is usually the first solution for scaling an architecture. Native MySQL
replication is asynchronous, and replication events from the master are applied in a single thread on the
slave. Therefore, we recommend deploying monitoring checks that observe the MySQL replication status and lag.
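A replication check of this kind can be sketched in Python as a function that classifies one row of SHOW SLAVE STATUS output. The field names Slave_IO_Running, Slave_SQL_Running, and Seconds_Behind_Master come from that statement. The replication_state helper and the 30 and 300 second tolerances are illustrative assumptions that should be tuned to what the application can tolerate.

```python
def replication_state(status, warn_lag_s=30, crit_lag_s=300):
    """Classify a SHOW SLAVE STATUS row as OK, WARNING, or CRITICAL."""
    io_ok = status.get("Slave_IO_Running") == "Yes"
    sql_ok = status.get("Slave_SQL_Running") == "Yes"
    if not (io_ok and sql_ok):
        return "CRITICAL"  # a replication thread is stopped

    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        return "CRITICAL"  # NULL lag means the slave cannot report its position
    if lag >= crit_lag_s:
        return "CRITICAL"
    if lag >= warn_lag_s:
        return "WARNING"
    return "OK"


healthy = {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes",
           "Seconds_Behind_Master": 2}
lagging = dict(healthy, Seconds_Behind_Master=45)
stopped = dict(healthy, Slave_SQL_Running="No", Seconds_Behind_Master=None)

for name, row in (("healthy", healthy), ("lagging", lagging), ("stopped", stopped)):
    print(name, replication_state(row))
```

The returned states map onto the levels used by common alerting systems, so the same logic can sit behind an existing monitoring check.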

Since native MySQL replication is asynchronous, the master and slave are not guaranteed to have identical
data sets, so the data can diverge. This could lead to inconsistent result sets when querying the master
and slave. It can also mean backups do not contain a valid copy of the data because backups are usually
taken on a slave. To guard against this, you should regularly run a process that identifies data drift and
actively monitor the results so you can take action when a divergence has been identified.

LVM  Snapshot  Size  


Logical Volume Manager (LVM) snapshots are used by some organizations to take a physical backup. LVM
snapshots use copy-on-write functionality: when blocks on the original logical volume are changed, the
original blocks are first copied into the snapshot. In our experience, most organizations that take this
approach monitor the disk capacity of the original logical volume. However, monitoring the capacity of the
snapshot while it is active is much less common. If the snapshot fills completely before a backup can be
taken, the backup will fail. If you use LVM snapshots, you will want to monitor the size of the snapshot so
you can be alerted proactively when the snapshot is reaching capacity. This enables you to increase the
size of the snapshot before backups begin to fail.
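A snapshot capacity check can be sketched as follows. The Python function parses two-column text of the form "name percent used", which is assumed here to be what a command such as lvs --noheadings -o lv_name,snap_percent prints. That command line, the 80% threshold, the snapshots_near_capacity helper, and the sample volume names are all assumptions to verify against your own LVM version.

```python
def snapshots_near_capacity(lvs_output, threshold_percent=80.0):
    """Return (name, percent_used) for snapshots at or above the threshold."""
    alerts = []
    for line in lvs_output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # blank line, or a volume that reports no snapshot usage
        name, percent = fields[0], float(fields[1])
        if percent >= threshold_percent:
            alerts.append((name, percent))
    return alerts


sample_output = """
  mysql_data
  mysql_snap   86.42
  backup_snap  12.05
"""

alerts = snapshots_near_capacity(sample_output)
print(alerts)
```

Running a check like this throughout the backup window, not only before it, catches a snapshot that fills up while the copy is still in progress.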

Query  Response  Time  


Query  response  time  is  arguably  the  most  important  check  when  it  comes  to  database  monitoring.  It  provides  
an  excellent  overview  of  performance  and  can  be  a  key  indicator  when  trying  to  understand  the  impact  of  
incidents  on  application  users.  Unfortunately,  detail  can  be  lost  when  looking  solely  at  an  average  so  it  is  
advisable  to  look  at  a  range  of  response  times.  One  way  we  do  this  at  Percona  is  by  visualizing  the  response  
time  with  a  histogram.  This  way,  we  can  look  at  both  the  query  volume  and  time  accrued  for  defined  
response time thresholds. Having fine-grained insight can focus your efforts to help ensure a more
consistent user experience.
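The histogram view described above can be sketched in a few lines of Python. Each response time is placed in the first bucket whose upper bound it does not exceed, and each bucket records both the number of queries and the time accrued. The bucket boundaries, the response_time_histogram helper, and the sample timings are illustrative assumptions.

```python
from bisect import bisect_left


def response_time_histogram(times_s, bounds_s=(0.001, 0.01, 0.1, 1.0, 10.0)):
    """Bucket response times; track query count and accrued time per bucket."""
    buckets = [{"upper_s": bound, "count": 0, "total_s": 0.0} for bound in bounds_s]
    buckets.append({"upper_s": float("inf"), "count": 0, "total_s": 0.0})
    for t in times_s:
        # bisect_left finds the first bound that is >= t (inclusive upper edge).
        bucket = buckets[bisect_left(bounds_s, t)]
        bucket["count"] += 1
        bucket["total_s"] += t
    return buckets


times = [0.0004, 0.003, 0.004, 0.08, 2.5, 0.0009]
histogram = response_time_histogram(times)

print(f"average: {sum(times) / len(times):.3f}s")
for bucket in histogram:
    print(f"<= {bucket['upper_s']}s: {bucket['count']} queries, "
          f"{bucket['total_s']:.4f}s accrued")
```

In this sample the average is dominated by a single 2.5 second query, while the histogram shows that most queries completed in under 10 milliseconds.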

 
 

There are many tools that Percona uses to enhance the monitoring and alerting capacity for our clients. For
example, Percona Monitoring Plugins are high-quality components that add enterprise-grade MySQL
monitoring and graphing to your existing monitoring solution. They currently integrate with Nagios, Cacti,
and Zabbix. The Percona Monitoring Plugins are open source and free to download.

If you lack the resources to adequately monitor your MySQL operations, you may want to consider using a
third party whose expertise you can leverage. For example, the Percona Remote DBA Service includes a
monitoring solution for both availability and performance. The alerts we implement are regularly reviewed
and refined to ensure that the relevant checks are active and that the appropriate tolerances are set.

Faulty  Security  Practices  


Security will always be a hot topic, especially in the wake of incidents like Heartbleed. For many
organizations, MySQL is the primary datastore, and the company DBA must always be vigilant in ensuring its
security. The security of your business, as well as the personal or financial information of your clients,
can be at risk.

Adopting  a  security  culture  within  your  organization  is  necessary  to  reduce  the  possibility  of  a  breach.  Taking  
a  few  foundational  steps  early  in  your  MySQL  instance’s  life  and  maintaining  firm  security  practices  will  
harden  your  environment  against  potential  intruders.  

Passwords are a great place to start. Two basic rules of thumb for a password policy that will provide a hurdle
to potential attackers are:

• Use a suitably complex password that avoids dictionary words
• Cycle passwords regularly
 
When  we  review  existing  client  installations,  we  often  see  users  without  passwords  or  accounts  open  to  the  
world  (user@%).  In  addition  to  fixing  any  password  or  open  account  issues,  we  recommend  that  applications  
and  engineers  not  use  the  same  accounts.  For  example,  engineers  should  perform  administration  tasks  using  
their  own  accounts  and  not  application  or  service  accounts.  Also,  users  should  only  have  the  appropriate  
privileges  for  their  function.  For  example,  the  user  that  connects  to  MySQL  to  read  content  for  your  blog  
posts  does  not  need  DROP  privileges.  Additionally,  the  SUPER  privilege  should  be  used  sparingly  and  never  
granted  to  an  application  or  service  account.  While  these  password  measures  add  some  administrative  
overhead,  they  are  good  practices  to  maintain.  
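
The account checks described above can be sketched as a small audit routine. This is a hedged illustration, not Percona's audit process: it assumes the account details have already been extracted from the MySQL grant tables (whose column names vary between MySQL versions), and the account names and the set of application accounts are hypothetical.

```python
from typing import NamedTuple


class Account(NamedTuple):
    user: str
    host: str
    has_password: bool
    has_super: bool


# Hypothetical set of application/service account names.
APP_ACCOUNTS = {"blog_reader"}


def audit_accounts(accounts, app_accounts=APP_ACCOUNTS):
    """Return (account, finding) pairs for the issues named in the text."""
    findings = []
    for a in accounts:
        label = f"{a.user}@{a.host}"
        if not a.has_password:
            findings.append((label, "no password"))
        if a.host == "%":
            findings.append((label, "open to any host"))
        if a.has_super and a.user in app_accounts:
            findings.append((label, "application account holds SUPER"))
    return findings


# Hypothetical accounts for illustration only.
accounts = [
    Account("alice", "10.0.0.5", True, True),     # engineer's own account
    Account("blog_reader", "%", True, True),      # app account: two findings
    Account("legacy", "localhost", False, False), # no password set
]

for label, finding in audit_accounts(accounts):
    print(f"{label}: {finding}")
```

Running a check like this on a schedule, rather than once, helps catch accounts that drift away from the policy over time.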

Remove the accounts of users who have left your organization. Through the use of orchestration
tools, you can update a common account's password without having to visit multiple servers. Err on the side
of caution: assume that a former employee could cause permanent damage, and handle their access as
though they intended to.

It does not take long to find horror stories where unsuspecting organizations were targeted by attackers and
lost client data. Take the necessary precautions to prevent your data from being compromised. A common
practice at Percona is to regularly audit the MySQL deployments of our clients to detect holes in their
security. This gives both us and our clients greater confidence in the ability to avoid a breach. We recommend
that organizations adopt this approach to reduce their vulnerability to attack.

Limited  Backup  and  Recovery  Plan  


System  failure  can  result  from  faulty  hardware,  application  bugs,  operational  mistakes,  or  malicious  attacks.  
Unless  you  strictly  adhere  to  a  MySQL  backup  strategy,  you  can  lose  part  or  all  of  your  data  and  potentially  
your  clients,  your  job,  and  your  business.  

An  inexperienced  or  very  busy  DBA  may  not  have  the  experience  or  bandwidth  to  put  a  robust  backup  and  
recovery  strategy  in  place.  Many  think  they  are  safe  just  running  daily  binary  backups  or  that  their  data  is  safe  
in  Amazon  Web  Services.  Others  feel  confident  because  they  have  slaves  or  delayed  slaves.  There  are  
problems  with  these  approaches:  

• Relying on one type of backup will likely be insufficient to recover the application database in the
shortest amount of time
• Slaves will be affected the same way as the master in cases of operational mistakes or application
bugs
• Delayed slaves will also be affected if you discover the issue only after the configured delay has elapsed

These approaches may help in some failure circumstances, but they are far from a complete solution.
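
The delayed-slave limitation in the list above comes down to simple arithmetic: the delayed copy is only useful if the problem is noticed before the delay window closes. A minimal sketch, using hypothetical times and a hypothetical one-hour delay:

```python
from datetime import datetime, timedelta


def delayed_replica_still_clean(incident_at, detected_at, replica_delay):
    """True if the bad change has not yet been applied on the delayed slave."""
    return detected_at - incident_at < replica_delay


# Hypothetical scenario: an accidental DROP TABLE runs on the master at 09:00.
incident = datetime(2015, 9, 15, 9, 0)
delay = timedelta(hours=1)

# Noticed after 40 minutes: the delayed slave has not applied the DROP yet.
print(delayed_replica_still_clean(incident, datetime(2015, 9, 15, 9, 40), delay))

# Noticed after 90 minutes: the delayed slave has already applied it.
print(delayed_replica_still_clean(incident, datetime(2015, 9, 15, 10, 30), delay))
```

This is why a delayed slave complements backups rather than replacing them: detection time is rarely under your control.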

A  robust  backup  and  recovery  strategy  should  include  monitors  for  the  backup  processes  to  ensure  they  work  
every  time  as  planned.  You  should  perform  binary,  logical,  and  binlog  backups  regularly  to  minimize  recovery  
time  no  matter  what  the  failure.  You  should  be  confident  that  the  recovery  processes  are  well  documented  
and  that  all  operations  staff  members  are  capable  of  performing  data  recovery  from  the  backups.  
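
A backup monitor of the kind described can be as simple as checking that each backup type has completed successfully within an expected window. The sketch below is an assumption-laden illustration: the maximum ages are hypothetical (the right values depend on your recovery objectives), and the timestamps would in practice come from your backup tooling's logs or status files.

```python
from datetime import datetime, timedelta

# Hypothetical maximum acceptable age for each backup type.
MAX_AGE = {
    "binary": timedelta(days=1),
    "logical": timedelta(days=7),
    "binlog": timedelta(hours=1),
}


def stale_backups(last_success, now, max_age=MAX_AGE):
    """Return a message for each backup type that is missing or too old."""
    problems = []
    for kind, limit in max_age.items():
        finished = last_success.get(kind)
        if finished is None:
            problems.append(f"{kind}: no successful backup recorded")
        elif now - finished > limit:
            problems.append(
                f"{kind}: last success {finished.isoformat()} is older than {limit}"
            )
    return problems


# Hypothetical state: no logical backup on record, binlog backup three hours old.
now = datetime(2015, 9, 15, 12, 0)
last_success = {
    "binary": datetime(2015, 9, 15, 2, 0),
    "binlog": datetime(2015, 9, 15, 9, 0),
}

for problem in stale_backups(last_success, now):
    print(problem)
```

A check like this confirms only that backups ran; it does not prove they can be restored, which is why documented and rehearsed recovery procedures remain necessary.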

These are the measures and practices that Percona takes with its clients to ensure their data is reliably
backed up and can be recovered in a timely manner. In addition, Percona uses tools that reduce disk space,
lower production workload, and provide full, partial, and point-in-time recovery of data.

How  Percona  Can  Help  You  


Managing your organization's MySQL operations requires in-depth knowledge of potential issues plus
diligent, dedicated practice. It is a vital competency if your business depends on MySQL. Being aware of the
issues above will help protect your organization's MySQL-based applications. It will also significantly enhance
both performance and scalability to deliver a better user experience.

Keeping  abreast  of  best  practices  for  preventing  these  dangers  takes  time  and  expertise.  If  you  are  your  
organization’s  busy  MySQL  DBA  and  you  worry  about  the  level  of  attention  being  given  to  your  databases,  
Percona  can  help.  

For the over-extended DBA who requires expert help to optimize and provide additional support for MySQL,
Percona can help. Percona Support services are accessible 24x7 online or by phone to ensure that your
MySQL installation is running optimally. We can also provide onsite or remote Percona Consulting for current
or planned projects, or in emergency situations. Every engagement is unique and we will work with you to
create the most effective solution for your business.

If  you  don’t  have  the  resources  to  adequately  manage  your  MySQL  operations,  you  can  outsource  the  full  
operational  control  of  your  MySQL  servers  to  Percona.  Percona  is  a  leader  in  MySQL  managed  services,  
offering  both  outsourced  database  administration  services  and  backup  and  recovery  services.  We  can  
evaluate  your  current  MySQL  operations,  your  goals,  and  your  resources  to  collaboratively  implement  a  
solution  that  will:  

• Prevent common operational problems
• Keep your MySQL performing optimally
• Lower your database operations and administration costs

Organizations  that  rely  on  Percona  Managed  Services  benefit  from  our  deep  operational  expertise,  24x7x365  
coverage  from  our  worldwide  team  of  experts,  SLA  commitments,  and  unlimited  incidents  without  additional  
hourly  fees.  

Percona Managed Services can support your existing database infrastructure whether it is hosted on premises
or at a colocation facility, or whether you purchase services from a cloud provider or database-as-a-service
provider.

To learn about any of our services, please contact us at (208) 473-2904 or +44 (203) 6086727 in Europe or
email us at sales@percona.com.

About  Percona  
Percona is the only company that delivers enterprise-class software, support, consulting and managed
services solutions for both MySQL and MongoDB® across traditional and cloud-based platforms that
maximize application performance while streamlining database efficiencies. Our global 24x7x365 consulting
team has worked with over 3,000 clients worldwide, including the largest companies on the Internet, who
use MySQL, Percona Server, Amazon® RDS for MySQL, MariaDB® and MongoDB.

Percona consultants have decades of experience solving complex database and data performance issues and
design challenges. We consult on the full LAMP stack, from hardware to operating systems and right up
through the database and web tiers. Because we are both broadly and deeply experienced, we can help build
complete solutions. Our consultants work both remotely and on site. We can also provide full-time or
part-time interim staff to cover employee absences or provide extra help on big projects.

Percona  was  founded  in  August  2006  by  Peter  Zaitsev  and  Vadim  Tkachenko  and  now  employs  a  global  
network  of  experts.  Our  customer  list  is  large  and  diverse  and  we  have  one  of  the  highest  renewal  rates  in  the  
business.  Our  expertise  is  visible  in  our  widely  read  Percona  Data  Performance  blog  and  our  book  High  
Performance  MySQL.  Visit  Percona  at  https://www.percona.com/.  

 