You are on page 1of 55

Redundant Storage Cluster

For When It's Just Too Big

Bob Burgess
radian6 Technologies

MySQL User Conference 2009


Scope
 Fundamentals of MySQL Proxy
 Fundamentals of LuaSQL
 Description of the
Redundant Storage Cluster
◦ Architecture
◦ Sample code
Scope – NOT
 Complete Course on MySQL Proxy
 Lua Programming Course
 Amazing Lua Tricks
 Complete code listing of cluster
(Posted online)
The Problem
Growth!
Blog Posts – millions per day
12

10

0
Current Solution
month table

myisam merge month table


table
month table

month table

...
 Data over NFS can be unreliable
 Separate copy for each DB that could
use it (Master / Replicas)
Goals
 Single place for all content
 Redundancy without complete
duplication
 Add storage by adding nodes
 Survive a server failure
Existing Products
 HiveDB
◦ Java/Hibernate-based
◦ No redundancy
 Spock Proxy
◦ No redundancy
 MySQL Cluster
◦ All indexes in RAM
Cluster
Master DB Other
Other DBs
OtherDBs
DBs
content content
(federated) (federated)

Load Balancer

MySQL MySQL MySQL


Proxy Lua Proxy Lua Proxy Lua

content 1 content 2 content n


dir dir ... n
dir
1 2
Component Walk-Through
 Federated Engine
 Load Balancer
 MySQL Proxy
 Lua
 LuaSQL
Federated Engine
 Just a pointer to another table
create table sample (
id int primary key,
value varchar(100) )
engine=federated
connection=
'mysql://user:password@host:9999/schema/sample'
Federated Engine
create table sample (
id int primary key,
val int,
name varchar(50) )
select id,name from sample;
SELECT `id`, `val`, `name` FROM `sample`;
select id,name from sample where id=2;
SELECT `id`, `val`, `name`
FROM sample WHERE `id` = '2';
select val,name from sample where val=2;
SELECT `id`, `val`, `name` FROM `sample`;
select * from sample limit 10;
SELECT `id`, `val`, `name` FROM `sample`;
Federated Engine
select max(id)from sample;
SELECT `id`, `val`, `name` FROM `sample`;
select count(*) from sample;
SELECT `id`, `val`, `name` FROM `sample`;
insert into sample (id,name) values (5,'bob');
INSERT INTO `sample` (`id`, `val`, `name`)
VALUES ('5', NULL, 'bob');
insert into sample values (5,10,'bob');
INSERT INTO `sample` (`id`, `val`, `name`)
VALUES ('5', '10', 'bob');
Load Balancer Options
 MySQL Proxy Load Balancer
◦ For read balancing only
◦ For traditional master/slave architecture
 Custom load balancer
– MySQL Proxy & Lua script
◦ Still an option, depending on future
architecture
Load Balancer Options
 Linux Networking – NAT
◦ LB stays involved in session
◦ Limited Scalability
◦ Slower option
 Linux Networking – Direct Routing
◦ LB hands off session
◦ Node answers client directly
◦ Much more scalable
MySQL Proxy
 Communicates with a MySQL client
using MySQL Network Protocol
 Provides an API to a script
environment
◦ Pass query from client
◦ Return result set to the client
 Uses Lua scripting language
MySQL Proxy – Supplied
Constants and Methods
 global
proxy.global.connect_retry=3
proxy.global.conRemote={}
proxy.global.MySQLenv=luasql.mysql()
proxy.global.conRemote[tonumber(details.nodeId)] =
proxy.global.MySQLenv:connect(...)

 queries – list of queries going to the


server
proxy.queries:reset()
MySQL Proxy – Supplied
Constants and Methods
 response – response from server to
client
proxy.response.type = proxy.MYSQLD_PACKET_OK
proxy.response.resultset = {...}

 Constants
MYSQL_PACKET_OK
MYSQL_PACKET_ERR
MYSQL_TYPE_LONG
PROXY_SEND_RESULT
MySQL Proxy
MySQL 3306 MySQL
client server

MySQL 4040 MySQL 3306 MySQL


client Proxy server

MySQL
Proxy
MySQL 4040 3306 MySQL
client Lua
server
LuaSQL
MySQL Proxy
MySQL MySQL
client server

MySQL Proxy

Lua

read_query( )
read_query_result( )
MySQL Proxy
MySQL MySQL
client server

MySQL Proxy

Lua

LuaSQL calls
LuaSQL
 Connect directly to databases from
Lua scripts

environment

:connect cursor :fetch return set


:execute
(select)
connection
:execute
(ins/upd/del)
return code
& error msg
LuaSQL Connection Timeouts
 Connection drops after sleeping 10s
call the execute method
if cursor object is nil:
call connect method for the environment
call execute again
LuaSQL Connection Timeouts
function select_db (node,sql)
local reconnects = 0
local cur
local err
repeat
cur, err = proxy.global.conRemote[tonumber(node)]:execute(sql)
if cur == nil then
print ('Reconnecting select_db. Error='..err)
reopen_remote_db (node)
end
reconnects = reconnects+1
until (cur ~= nil or reconnects==proxy.global.connect_retry)
if cur==nil then
error("Could not reconnect / No result set in select_db",2)
else
return cur:fetch()
end
end
LuaSQL Connection Timeouts
function execute_db (node,sql)
local reconnects = 0
local LOST_CONNECTION = "MySQL server has gone away"
local rc
local err
repeat
rc, err = proxy.global.conRemote[tonumber(node)]:execute(sql)
if rc ~= nil then print ("execute_db RC="..rc); end
if err~=nil then print ("execute_db error="..err); end
if rc == nil and err:find(LOST_CONNECTION) then
print ('Reconnecting execute_db. Error='..err)
reopen_remote_db (node)
end
reconnects = reconnects+1
until (rc ~= nil or reconnects==proxy.global.connect_retry or not err:find(LOST_CONNECTION) )
if rc == nil and err:find(LOST_CONNECTION) then
error("Could not reconnect in execute_db.",2)
else
return rc,err
end
end
System Info Script
 Keeps cluster up to date on disk
usage
 Update Node table on all nodes with
disk size & free of this node
Accepting Queries from Client
 Query comes in to Proxy,
read_query is called
 The query appears in read_query's
parameter
 First byte of variable indicates the type
of query:
 proxy.COM_QUERY (Query)
proxy.COM_PROCESS_INFO (Process List )
proxy.COM_CONNECT (Connect)
proxy.COM_PROCESS_KILL (Kill)
Returning a result set
 Two response types
◦ "OK"
◦ Error
 Set properties of the "response" object
in the "proxy" environment
Returning a result set: Error
 Array:
◦ type
◦ errmsg
◦ errcode
◦ sqlstate
proxy.response={
type= proxy.MYSQLD_PACKET_ERR,
errmsg= "Malformed INSERT statement.",
errcode= 1064,
sqlstate= "42000"
}
return proxy.PROXY_SEND_RESULT
Returning a result set: OK
 Empty Result Set
 Array:
◦ type
proxy.response.type=proxy.MYSQLD_PACKET_OK
return proxy.PROXY_SEND_RESULT
Returning a result set: OK
 Full result set proxy

 Array: resultset (etc.)


◦ type (value)
◦ resultset (table) fields rows

 fields (table)
value value
 rows (table) type name

value value value value


Returning a result set: OK
fields rows

value value
type name

value value value value

proxy.response.type = proxy.MYSQLD_PACKET_OK
proxy.response.resultset = {
fields = {
{type=proxy.MYSQL_TYPE_LONGLONG, name="blogPostID" },
{type=proxy.MYSQL_TYPE_LONG, name="partitionKey"},
{type=proxy.MYSQL_TYPE_VAR_STRING, name="rawContent" } },
rows = { { tonumber(itemId),
tonumber(partKey),
contentValue } }
}
return proxy.PROXY_SEND_RESULT
Cluster Operation Overview
Cluster: Librarian
 Talks to the client
 Accepts items to store
 Retrieves items / gives them to client
 The single Lua script that runs under
Proxy
Cluster: Librarian
 Directory
◦ itemId
◦ nodeId
◦ partitionKey
 Event table
◦ serial no.
◦ event type
◦ event data
Cluster: Librarian
 Content_partitionKey
◦ itemId
◦ item – compressed into largeblob
 Node table
◦ nodeId
◦ connection / authentication info
◦ system status (disk)
◦ capacity factor
Cluster: Librarian
no
insert
yes
bad
syntax error
good
yes
exists error
no
Error: table
doesn't exist
store on this node create table store on this node

update ALL
directories

store Event

return OK to client
Cluster: Librarian
 Returning an error
proxy.response={
type= proxy.MYSQLD_PACKET_ERR,
errmsg= "Duplicate entry '"..itemId.."' for key 1",
errcode= 1062,
sqlstate= "23000"}
return proxy.PROXY_SEND_RESULT

Content Tables
◦MyISAM (concurrent_insert=2)
◦One table per partition key
◦For us: 300 MB / hour
Cluster: Librarian
no
select
yes
bad
syntax error
good
yes return empty result
1=0
set to client
no
yes calculate
info return result set
max(ID) and to client
table
count(*)
no
remote read doesn't exist
get item from return empty result
remote db dir set to client
local

get item from


this db

return result set to


client
Cluster: Librarian
 Federated
select max(id)from sample;
SELECT `id`, `val`, `name` FROM `sample`;

 Client
create table _content_info (
_max_id bigint unsigned,
_count bigint unsigned)
engine=federated
connection=(...schema/_content_info...);

 Target
create view _content_info as
select max(id) _max_id, count(*) _count
from _content;
Cluster: Librarian
 show table status like `_content`;
Auto_increment 0

 cluster node add


cluster node status
cluster node offline
Cluster: Rebalancer
 Enforces redundancy policy
◦ for new inserts
◦ to heal from node loss
 Rebalance
◦ node added
◦ equal disk usage
Cluster: Rebalancer
Policy enforce
for new items

get earliest new item no new


items

find another node with


the most free space

copy the item there

update all directories

next step
Cluster: Rebalancer
Policy enforce
for other items

get one item from the Directory which no items


exists on an insufficient number of nodes
and has "me" as the first listed node

copy the item to the node with the most


free space that doesn't already have it

update all directories

next step
Cluster: Rebalancer
Rebalancer

do I have the most no


free space of all the
nodes
yes

find the node with the


least free space

find an item on that node


that's not on this node (that
obeys the redundancy policy)

move the item from that


node to me

update all directories

done
Member Add
 Add new node to Node table
 Get a copy of the directory from all
nodes (a piece from each)
Member Remove
 Set free-space margin to 0
 Force "free disk space" for this node
to 0
 Wait for Rebalance to copy everything
off
 Restore the free-space margin and
update the Node list
Member Fail
 Update node list
 Remove all directory entries for this
node
Information Age-Out
 Drop tables for obsolete partition keys
 Remove directory entries for those
partitions
Table Optimizer
 information_schema.tables
compare data_free to data_length
 "Lock" that partition on that node
(row in PartitionLock table)
 Run optimize table but abandon if
any node fails
 Unlock table
Backup
 "Write Lock" one partition across all
nodes
 Copy that partition table for all nodes
 Copy directory entries for that node
 Unlock
Development Directions
 Bulletproof error handling
 Performance tuning
Alternative Architectures
 Load Balancer choices
 Move SQL parsing to a complex
proxy-based load balancer,
communicate with nodes on network
sockets
 Librarian in Perl, Java, C
 Alternative databases
Resources
 MySQL Documentation
 http://forge.mysql.com/wiki/MySQL_Proxy
 http://forge.mysql.com/tools/search.php?t=tag&k=mysqlproxy
 http://www.lua.org
 http://www.keplerproject.org/luasql
 http://lua-users.org
 http://jan.kneschke.de/projects/mysql/mysql-proxy
 http://www.linuxvirtualserver.org
Thank you!

Bob Burgess
bob.burgess@radian6.com

www.radian6.com/mysql

You might also like