
2010 Calpont Corporation Confidential & Proprietary

Making MySQL Great for Business Intelligence

Robin Schumacher
VP Products
Calpont
Agenda
Quick overview of BI
Looking at the right technology foundation
General physical MySQL design decisions that impact success
A look at row vs. column MySQL databases
Conclusions
A Quick Overview of Business Intelligence
What is Business Intelligence?
Business Intelligence (BI) refers to skills, processes, technologies,
applications and practices used to support decision making.

BI technologies provide historical, current, and predictive views of business
operations. Common functions of Business Intelligence technologies are
reporting, online analytical processing, analytics, data mining, business
performance management, benchmarking, text mining, and predictive
analytics.
Why Business Intelligence?
All companies now recognize the need for BI
Information is a weapon that both large and small companies use to better understand their customers, competitors, and marketplace
Making poorly informed decisions can be disastrous
Overview of Most BI Frameworks
[Diagram: a typical BI framework. Operational source data (OLTP, files/XML, log files) passes through a staging or ODS ETL step into a staging area, then through a final ETL step into the data warehouse, with a purge/archive path to a warehouse archive. A reporting, BI, and notification layer serves ad-hoc queries, dashboards, reports, and notifications to users. Data warehouse and metadata management spans the framework.]
Simple Reporting Databases
[Diagram labels: OLTP Database, Read Shard One, Reporting Database; Application Servers; End Users; links labeled ETL, Data Archiving Link, and Replication]
Building the Right Technical Foundation
What is the Key Component for Success?
In other words, what you do with your MySQL Server in terms of physical design, schema design, and performance design will be the biggest factor in whether a BI system hits the mark
* Philip Russom, Next Generation Data Warehouse Platforms, TDWI, 2009.
What Technology Decisions are Being Made?
* Philip Russom, Next Generation Data Warehouse Platforms, TDWI, 2009.
What General MySQL Design Decisions Help Success?
First Get/Use a Modeling Tool
Horizontal Partitioning Model
Read Sharding / Horizontal Partitioning
Vertical Partitioning Model
General List of Top BI Design Decisions
Storage engine selection
Physical table/index partitioning
Index creation and placement
Set proper amounts for memory caches, etc.
Row vs. column engine / database
Core BI Features for MySQL
No practical storage limits (1 tablespace = 110 TB)
Automatic storage management
ANSI-SQL support for all datatypes (including BLOB and XML)
Data/index partitioning (range, hash, key, list, composite)
Built-in replication
Main memory tables (for dimension tables)
Variety of indexes (b-tree, fulltext, clustered, hash, GIS)
Multiple configurable data/index caches
Pre-loading of index data into index caches
Unique query cache (caches result set + query; not just data)
Parallel data load (5.1 and higher; multiple files)
Multi-insert DML
Data compression (depends on engine)
Read-only tables
Fast connection pooling
Cost-based optimizer
Wide platform support
Storage Engines Internal to MySQL
MyISAM
High-speed query/insert engine
Non-transactional, table locking
Good for data marts, small warehouses
Archive
Compresses data by up to 80%
Fastest for data loads
Only allows inserts/selects
Good for seldom accessed data
Memory
Main memory tables
Good for small dimension tables
B-tree and hash indexes
CSV
Comma separated values
Allows both flat file access and editing as well as SQL query/DML
Allows instantaneous data loads
Also: Merge for pre-5.1 partitioning
Partitioning and Performance (5.1+)
mysql> CREATE TABLE part_tab
-> ( c1 int ,c2 varchar(30) ,c3 date )
-> PARTITION BY RANGE (year(c3)) (PARTITION p0 VALUES LESS THAN (1995),
-> PARTITION p1 VALUES LESS THAN (1996) , PARTITION p2 VALUES LESS THAN (1997) ,
-> PARTITION p3 VALUES LESS THAN (1998) , PARTITION p4 VALUES LESS THAN (1999) ,
-> PARTITION p5 VALUES LESS THAN (2000) , PARTITION p6 VALUES LESS THAN (2001) ,
-> PARTITION p7 VALUES LESS THAN (2002) , PARTITION p8 VALUES LESS THAN (2003) ,
-> PARTITION p9 VALUES LESS THAN (2004) , PARTITION p10 VALUES LESS THAN (2010),
-> PARTITION p11 VALUES LESS THAN MAXVALUE );
mysql> create table no_part_tab (c1 int,c2 varchar(30),c3 date);
*** Load 8 million rows of data into each table ***
mysql> select count(*) from no_part_tab where c3 > date '1995-01-01' and c3 < date '1995-12-31';
+----------+
| count(*) |
+----------+
| 795181 |
+----------+
1 row in set (38.30 sec)
mysql> select count(*) from part_tab where c3 > date '1995-01-01' and c3 < date '1995-12-31';
+----------+
| count(*) |
+----------+
| 795181 |
+----------+
1 row in set (3.88 sec)
Roughly 90% response time reduction on the partitioned table (38.30 sec down to 3.88 sec)
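The gain above comes from the server skipping partitions that cannot contain matching rows. As a rough illustration only, here is a toy Python model of that idea. It is not how MySQL implements partitioning; the row counts, date range, and function names are all assumptions made for the sketch.

```python
from collections import defaultdict
from datetime import date, timedelta

def build_tables(n_rows=20000):
    """Create the same rows in an unpartitioned list and in per-year partitions."""
    start = date(1994, 1, 1)
    unpartitioned = []
    partitions = defaultdict(list)  # partition key mimics year(c3)
    for i in range(n_rows):
        c3 = start + timedelta(days=i % 3650)  # dates cycle over roughly ten years
        row = (i, f"row {i}", c3)
        unpartitioned.append(row)
        partitions[c3.year].append(row)
    return unpartitioned, partitions

def count_full_scan(rows, lo, hi):
    """Count rows with lo < c3 < hi by examining every row."""
    matches = sum(1 for _, _, c3 in rows if lo < c3 < hi)
    return matches, len(rows)

def count_with_pruning(partitions, lo, hi):
    """Count the same rows, skipping partitions whose year cannot match."""
    matches = examined = 0
    for year, rows in partitions.items():
        if year < lo.year or year > hi.year:
            continue  # pruned: no row in this partition can satisfy the predicate
        examined += len(rows)
        matches += sum(1 for _, _, c3 in rows if lo < c3 < hi)
    return matches, examined

unpartitioned, partitions = build_tables()
lo, hi = date(1995, 1, 1), date(1995, 12, 31)
full_matches, full_examined = count_full_scan(unpartitioned, lo, hi)
pruned_matches, pruned_examined = count_with_pruning(partitions, lo, hi)
print("full scan:", full_matches, "matches,", full_examined, "rows examined")
print("pruned:   ", pruned_matches, "matches,", pruned_examined, "rows examined")
```

Both counts agree, but the pruned version examines only the rows in the 1995 partition. That is the same reason the partitioned query above touches far less data.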
Index Creation and Placement
If query patterns are known and predictable, and data is relatively static, then indexing isn't that difficult
If the environment is very ad hoc, indexing becomes more difficult. You must analyze SQL traffic and index as best you can
Over-indexing a table that is frequently loaded / refreshed / updated can severely impact load and DML performance. Test dropping and re-creating indexes vs. doing in-place loads and DML. Realize, though, that queries relying on the dropped indexes will be slower until the indexes are re-created
Index maintenance (rebuilds, etc.) can cause issues in MySQL (locking, etc.)
Remember that some storage engines don't support normal indexes (Archive, CSV)
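One way to run the "drop and re-create vs. in-place" test is a small timing harness. The sketch below uses Python's built-in sqlite3 module purely as a stand-in so that it runs anywhere. The table name, index name, and row count are hypothetical, and timings from SQLite say nothing about MySQL itself. The same pattern would need to be repeated against the real server and storage engine.

```python
import sqlite3
import time

def make_db():
    """Create an in-memory table with one secondary index (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fact (c1 INTEGER, c2 TEXT, c3 TEXT)")
    conn.execute("CREATE INDEX idx_fact_c3 ON fact (c3)")
    return conn

def sample_rows(n):
    """Generate n illustrative rows with a date-like string in c3."""
    return [(i, f"name {i}", f"{1995 + i % 10}-01-01") for i in range(n)]

def load_in_place(conn, rows):
    """Load while the index exists, so it is maintained as rows arrive."""
    start = time.perf_counter()
    with conn:  # commits on success
        conn.executemany("INSERT INTO fact VALUES (?, ?, ?)", rows)
    return time.perf_counter() - start

def load_with_rebuild(conn, rows):
    """Drop the index, load the rows, then re-create the index in one pass."""
    start = time.perf_counter()
    with conn:  # commits on success
        conn.execute("DROP INDEX idx_fact_c3")
        conn.executemany("INSERT INTO fact VALUES (?, ?, ?)", rows)
        conn.execute("CREATE INDEX idx_fact_c3 ON fact (c3)")
    return time.perf_counter() - start

rows = sample_rows(20000)
conn_a, conn_b = make_db(), make_db()
t_in_place = load_in_place(conn_a, rows)
t_rebuild = load_with_rebuild(conn_b, rows)
print(f"in-place load: {t_in_place:.3f}s, drop/load/re-create: {t_rebuild:.3f}s")
```

Which approach wins depends on table size, number of indexes, and engine, which is why the slide says to test rather than assume.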
Row vs. Column Engines / Databases
Column vs. Row Orientation
A column-oriented architecture looks the same on the surface, but stores
data differently than legacy/row-based databases
Why a Column Database?
Column databases read only the columns needed to satisfy a query, rather than full rows
If you are selecting only a subset of columns from a table and/or are using very wide tables, column DBs are a great choice for BI
Most column databases remove the need for indexing because the column is the index
Column databases automatically eliminate unnecessary I/O both logically and physically, so they also do away with the need for partitioning, materialized views, etc.
As a rule of thumb, column databases provide 5-10x (or more) the query performance of legacy RDBMSs
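To make the "only read the columns needed" point concrete, here is a minimal Python sketch that stores the same table in a row layout and in a column layout, then counts how many values an aggregate query has to read in each. The six-column schema and the "values read" counter are assumptions for illustration. Real engines read blocks and compress data, so actual ratios will differ.

```python
# Hypothetical 6-column table; the column names are illustrative only.
COLUMNS = ["order_id", "customer", "region", "product", "quantity", "amount"]

def make_rows(n):
    """Row layout: one tuple per row, all columns stored together."""
    return [(i, f"cust{i % 50}", f"region{i % 4}", f"prod{i % 20}", i % 7 + 1, (i % 100) * 1.5)
            for i in range(n)]

def to_column_store(rows):
    """Column layout: pivot the row tuples into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}

def sum_by_region_row_store(rows):
    """Every full row is read even though only two columns are used."""
    totals, values_read = {}, 0
    region_idx, amount_idx = COLUMNS.index("region"), COLUMNS.index("amount")
    for row in rows:
        values_read += len(row)
        totals[row[region_idx]] = totals.get(row[region_idx], 0.0) + row[amount_idx]
    return totals, values_read

def sum_by_region_column_store(store):
    """Only the two referenced columns are read."""
    totals, values_read = {}, 0
    for region, amount in zip(store["region"], store["amount"]):
        values_read += 2
        totals[region] = totals.get(region, 0.0) + amount
    return totals, values_read

rows = make_rows(10000)
store = to_column_store(rows)
row_totals, row_reads = sum_by_region_row_store(rows)
col_totals, col_reads = sum_by_region_column_store(store)
print(row_reads, col_reads)  # prints: 60000 20000
```

The answers are identical. The column layout simply reads two of the six columns, and the gap widens as tables get wider.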
Why a Column Database?
"If you're bringing back all the columns, a column-store database
isn't going to perform any better than a row-store DBMS, but
analytic applications are typically looking at all rows and only a
few columns. When you put that type of application on a
column-store DBMS, it outperforms anything that doesn't take a
column-store approach."
- Donald Feinberg, Gartner Group
Why Not a Column Database?
If you routinely run SELECT * queries or queries that request the majority of columns in a table
If you are constantly doing lots of singleton inserts and deletes. Because these are row-based operations, they will normally run somewhat slower on a column DB than on a row-oriented DB (more block touches are needed). Updates tend to run OK as they are a column operation
If you want to do pure OLTP work. Some column DBs are transactional (so data integrity is ensured), but they are not suited for straight OLTP work
If you have a small database: with so little data, the benefit column databases offer over row DBs is negligible
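The "more block touches" remark can be pictured with a deliberately simplified cost model. The Python sketch below assumes a row store keeps a whole row in one block and a column store keeps each column's value in its own block. The function name and the 20-column table are hypothetical, and real engines batch, cache, and compress in ways this ignores.

```python
def blocks_touched(operation, n_columns, layout):
    """Toy cost model: blocks touched by an operation on a single row.

    Assumes a row store keeps a whole row in one block and a column store
    keeps each column's value in a separate block.
    """
    if layout not in ("row", "column"):
        raise ValueError(f"unknown layout: {layout}")
    if operation in ("insert", "delete", "select_all"):
        # The whole row is written or read.
        return 1 if layout == "row" else n_columns
    if operation == "update_one_column":
        # Only the block holding the changed value is rewritten.
        return 1
    raise ValueError(f"unknown operation: {operation}")

for op in ("insert", "delete", "select_all", "update_one_column"):
    print(op, blocks_touched(op, 20, "row"), blocks_touched(op, 20, "column"))
```

Under these assumptions a singleton insert on a 20-column table touches 20 blocks in the column layout against 1 in the row layout, while a single-column update costs the same in both.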
What is Calpont's InfiniDB?
InfiniDB is an open source, column-oriented database architected to handle
data warehouses, data marts, analytic/BI systems, and other read-intensive
applications. It delivers true scale up (more CPUs/cores, RAM) and massive
parallel processing (MPP) scale out capabilities for MySQL users. Linear
performance gains are achieved when adding either more capabilities to one
box or using commodity machines in a scale out configuration.
[Diagram: scale up vs. scale out]
InfiniDB vs. a Leading Row RDBMS
2 TB of raw data; 16 CPUs, 16 GB RAM, 14 SAS 15K RPM drives in RAID-0, 512 MB cache
Percona's Test of Column Databases
610 GB of raw data; 8-core machine
http://www.mysqlperformanceblog.com/2010/01/07/star-schema-bechmark-infobright-infinidb-and-luciddb/
Calpont Solutions
Calpont Analytic Database Server Editions
InfiniDB Community Server: column-oriented, multi-threaded, terabyte capable, single server
InfiniDB Enterprise Server: scale out / parallel processing, automatic failover
Calpont Analytic Database Solutions
InfiniDB Enterprise Solution: monitoring, 24x7 support, auto patch management, alerts & SNMP notifications, hot fix builds, consultative help
InfiniDB Community & Enterprise Server Comparison
Core Database Server Features | InfiniDB Community | InfiniDB Enterprise
MySQL front end | Yes | Yes
Column-oriented | Yes | Yes
Logical data compression | Yes | Yes
High-speed bulk loader w/ no blocking of queries while loading | Yes | Yes
Crash recovery | Yes | Yes
Transaction support (ACID compliant) | Yes | Yes
INSERT/UPDATE/DELETE (DML) support | Yes | Yes
Multi-threaded engine (queries/writes will use all CPUs/cores on box) | Yes | Yes
No indexing necessary | Yes | Yes
Automatic vertical (column) and logical horizontal partitioning of data | Yes | Yes
MVCC support; snapshot read (readers don't block writers) | Yes | Yes
Alter Table with online add column capability | Yes | Yes
High concurrency supported | Yes | Yes
Terabyte database capable | Yes | Yes
Multi-node, MPP scale out capable w/ failover | No | Yes
Support | Forums only | Formal production support
For More Information
Download InfiniDB Community Edition
Download InfiniDB documentation
Read InfiniDB technical white papers
Read InfiniDB intro articles on MySQL dev zone
Visit InfiniDB online forums
Trial the InfiniDB Enterprise Edition: http://www.calpont.com
www.infinidb.org
www.calpont.com
