mixi.jp
Scaling out with open source

Batara Kesuma, mixi, Inc. (bkesuma@mixi.co.jp)

Introduction
• Batara Kesuma
• CTO of mixi, Inc.

What is mixi?
• Social networking service
  • Diary, community, message, review, photo album, etc.
• Invitation only
• Largest and fastest-growing SNS in Japan

[Screenshot: mixi home page showing friends, latest information (friends' new diaries, comment history, community topics, friends' new reviews, friends' new albums), my latest diaries and reviews, and a community listing]

User Testimonials

History of mixi
• Development started in December 2003
  • Only 1 engineer (me)
  • 4 months of coding
• Opened in February 2004

Two months later
• 10,000 users
• 600,000 PV/day

The “Oh crap!” factor
• This model works
• But how do we scale out?

The first year
• The online population of mixi grew significantly
• From 600 users to 210,000 users

The second year
• From 210,000 users to 2 million users

And now?

• More than 3.7 million users
• 15,000 new users/day
• Population of Japan: 127 million; Internet users: 86.7 million (source: CIA World Factbook)
• 70% are active users (last login within 72 hours)
• Average user spends 3 hours 20 minutes on mixi per week
• Ranked 35th on Alexa worldwide, and 3rd in Japan

PV growth in 2 years
[Chart: page-view growth over two years for mixi, Google Japan, and Amazon Japan]

Users growth in 2 years
[Chart: user count growing from near 0 in 04/03 to about 3.5 million by 06/03]

Our technology solutions

The technology behind
• Linux 2.6
• Apache 2.0
• MySQL
• Perl 5.8
• memcached
• Squid

[Diagram: requests arrive at mod_proxy, which serves images directly and passes dynamic requests to mod_perl; mod_perl keeps hot objects in memcached and queries the diary, message, and other MySQL clusters ("Powered by MySQL")]

MySQL
• More than 100 MySQL servers
• Adding more than 10 servers/month
• Non-persistent connections
• Mostly InnoDB
• Heavy reliance on DB partitioning (our own solution)

DB replication
• MySQL server load gets heavy
• Add more slaves
[Diagram: mod_perl sends QUERY (WRITE) requests to the master DB, which replicates to a slave; QUERY (READ) requests go to the slave]

DB replication
• Classic problem with DB replication: every slave must replay every write
[Diagram: the master handles 50 writes/s and replicates them to every slave; adding slaves raises total read capacity from 50 reads/s to 100 reads/s, but each server, master and slaves alike, still carries the same 50 writes/s]
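The arithmetic behind this bottleneck can be sketched as follows (a Python illustration, not mixi's code; the per-server capacity is an assumed, illustrative number):

```python
# Replication scaling sketch: every slave must replay every write,
# so only the leftover capacity on each slave can serve reads.

CAPACITY = 100  # queries/s each server can handle (assumption)

def total_read_capacity(writes_per_s: int, num_slaves: int) -> int:
    """Reads/s the whole replica set can serve."""
    per_slave_reads = max(CAPACITY - writes_per_s, 0)
    return per_slave_reads * num_slaves

# With 50 writes/s, each added slave contributes only 50 reads/s of
# headroom; at 100 writes/s, adding slaves helps reads not at all.
```

This is why read-mostly workloads replicate well while write-heavy ones eventually need partitioning.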

Some statistics
• Diary related tables
  • Read 85%
  • Write 15%
• Message related tables
  • Read 75%
  • Write 25%

DB partitioning
• Replication couldn't keep up anymore
• Try to split the DB

How to split?
[Diagram: one DB holding the message tables, diary tables, and other tables for users A, B, and C]
Splitting vertically by users, or splitting horizontally by table types

Vertical partition
[Diagram: the DB is split vertically by users into DB 1 and DB 2, each holding the message, diary, and other tables for its own subset of users]

Vertical partition
• Too many tables to deal with at one time
• The transition in splitting gets complex and difficult

Horizontal partition
Also called level 1 partitioning within mixi

$dbh = $db->load_dbh();                    # other tables (OLD DB)
$dbh = $db->load_dbh(type => 'message');   # message tables (NEW DB)
$dbh = $db->load_dbh(type => 'diary');     # diary tables (NEW DB)

[Diagram: the OLD DB's message tables move to one NEW DB and its diary tables to another NEW DB; the other tables stay in the OLD DB]

Partition map for level 1
• Small and static
• Just put it in a configuration file
• For example:

$DB_DIARY   = 'DBI:mysql:host=db1;database=diary';
$DB_MESSAGE = 'DBI:mysql:host=db2;database=message';
...
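A minimal sketch of the load_dbh-style lookup over such a map (Python for illustration; mixi's real implementation is Perl, and the default entry is an assumption):

```python
# Level 1 partition map: table type -> DSN, small enough to live in a
# configuration file. The DSNs for diary and message come from the
# slide; the default entry is a hypothetical fallback.

PARTITION_MAP = {
    "diary":   "DBI:mysql:host=db1;database=diary",
    "message": "DBI:mysql:host=db2;database=message",
    "default": "DBI:mysql:host=db0;database=main",  # assumed fallback
}

def load_dbh(table_type: str = "default") -> str:
    """Return the DSN for the DB holding this table type."""
    return PARTITION_MAP.get(table_type, PARTITION_MAP["default"])
```

Because the map is static, every application server can carry a copy with no extra lookups at request time.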

Easy transition
[Diagram: (1) mod_perl writes to both the OLD DB and the NEW DB; (2) a background job copies rows with SELECT from the OLD DB and INSERT IGNORE into the NEW DB; (3) reads shift from the OLD DB to the NEW DB]
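The three transition steps can be sketched with a toy in-memory model (not mixi's code; dicts stand in for the two DBs, and setdefault plays the role of INSERT IGNORE, never clobbering a row the dual writes already placed):

```python
# Step 1: dual writes. Step 2: background copy with INSERT IGNORE
# semantics. Step 3: shift reads to the new DB once the copy is done.

old_db: dict = {}
new_db: dict = {}

def write(key, value):
    """Step 1: during the transition, every write goes to both DBs."""
    old_db[key] = value
    new_db[key] = value

def background_copy():
    """Step 2: copy old rows; skip keys the dual writes already set."""
    for key, value in old_db.items():
        new_db.setdefault(key, value)   # INSERT IGNORE equivalent

def read(key, shifted=False):
    """Step 3: flip `shifted` to move reads to the new DB."""
    return (new_db if shifted else old_db).get(key)
```

The same pattern reappears later when level 2 nodes are added, with the modulo formula deciding which writes must be doubled.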

Problems with level 1
• Cannot use JOIN anymore
  • Use FEDERATED TABLE from MySQL 5
  • Or do SELECT twice, which is faster than using FEDERATED TABLEs
  • If the table is small, just duplicate it
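The two-SELECT replacement for a JOIN might look like this sketch (illustrative in-memory data; in production each step would be a real query against its own DB):

```python
# Instead of JOINing messages to members across DBs, fetch the
# messages, collect the member ids, then fetch those members from
# the other DB. Table contents here are purely illustrative.

messages_db = [  # rows living in the message DB
    {"id": 1, "member_id": 10, "body": "hi"},
    {"id": 2, "member_id": 11, "body": "yo"},
]
members_db = {10: "Alice", 11: "Bob"}  # rows living in the member DB

def messages_with_names():
    rows = list(messages_db)                 # SELECT #1 (message DB)
    ids = {r["member_id"] for r in rows}
    names = {i: members_db[i] for i in ids}  # SELECT #2 (member DB)
    return [{**r, "name": names[r["member_id"]]} for r in rows]
```

Two indexed point lookups are cheap, which is why this usually beats routing a cross-server JOIN through FEDERATED tables.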

Next step
• When the new DB gets overloaded
• We split the DB, yet again
• Get ready for level 2

Partitioning key
• user id, message id
• Choose wisely!
[Diagram: message tables for users A and B can be partitioned either by user id or by message id]

Level 2 partition
[Diagram: the LEVEL 1 message DB holding users A, B, C, and D is split by partitioning key into NODE 1 and NODE 2, each a NEW message DB for part of the users]

Partition map for level 2
• Big and dynamic
• Cannot put it all in a configuration file

Partition map for level 2
• Manager based
  • Use another DB to do the partition mapping
• Algorithm based
  • Partition map is computed inside the application
  • node_id = member_id % TOTAL_NODE

Manager based
[Diagram: (1) mod_perl asks the MANAGER DB for the node_id of user_id=14; (2) the manager returns node_id=2; (3) mod_perl connects to NODE 2 of the three message-table nodes]
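The manager-based flow in the diagram can be sketched as follows (in-memory stand-ins for the manager DB and the node handles; all names and DSN strings are illustrative):

```python
# Manager-based mapping: one extra lookup against the manager DB
# tells us which node owns a user, then we connect to that node.

manager_db = {14: 2, 7: 1}   # user_id -> node_id (manager DB rows)
nodes = {1: "node1-dsn", 2: "node2-dsn", 3: "node3-dsn"}

def connect_for_user(user_id):
    node_id = manager_db[user_id]  # steps 1+2: ask manager, get node_id
    return nodes[node_id]          # step 3: connect to that node
```

The cost is the extra round trip per lookup; the benefit is that moving a user to another node is a single row update in the manager DB.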

Algorithm based
[Diagram: with number of nodes = 3, (1) mod_perl computes node_id = (user_id % 3) + 1, here node_id=3; (2) it connects directly to NODE 3]
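The algorithm-based mapping is just the modulo formula from the diagram, e.g.:

```python
# Algorithm-based mapping: any application server can compute the
# owning node locally, with no round trip to a manager DB.

NUM_NODES = 3  # from the diagram

def node_for_user(user_id: int) -> int:
    return (user_id % NUM_NODES) + 1

# user_id=14 -> 14 % 3 = 2 -> node 3, matching the diagram.
```

No extra query is needed, but NUM_NODES is now baked into application logic, which is exactly what makes adding nodes tricky.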

Manager based
• Pros:
  • Easy to manage
  • Easy to add a new node or move data between nodes
• Cons:
  • Each lookup adds 1 query for the partition map
  • It needs to send a request to the manager

Algorithm based
• Pros:
  • Application servers can compute the node id by themselves
  • Bypasses the connection to the manager
• Cons:
  • Difficult to manage
  • Adding new nodes is tricky

Adding nodes is tricky
[Diagram: (1) add new application logic: old_node_id = (member_id % 2) + 1 with 2 nodes becomes new_node_id = (member_id % 4) + 1 with 4 nodes; (2) mod_perl writes to both the old and the new node if the two node_ids differ; (3) data is copied in the background from NODE 1 and NODE 2 to the new NODE 3 and NODE 4; (4) reads shift to the new mapping]
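The dual-write rule during the 2-to-4-node transition can be sketched as follows (illustrative only; the background copy and the read shift are omitted):

```python
# During resharding, compute the node id under both the old and the
# new modulus. Writes go to both nodes only when the member moved.

OLD_NODES, NEW_NODES = 2, 4  # node counts from the diagram

def old_node(member_id: int) -> int:
    return (member_id % OLD_NODES) + 1

def new_node(member_id: int) -> int:
    return (member_id % NEW_NODES) + 1

def write_targets(member_id: int) -> set:
    """Node ids this member's writes must hit during the transition."""
    return {old_node(member_id), new_node(member_id)}
```

Members whose old and new node ids coincide need no dual writes at all, which keeps the extra write load during the migration modest.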

Problems with level 2
• Too many connections to different DBs
• Fortunately, on mixi, the majority are small data sets
• Cache them all by using distributed memory caching
• We rarely hit the DB
• Average page load time is about 0.02 sec*
[Diagram: member tables partitioned over NODE 1, 2, and 3; community tables over NODE 1 and 2]
* depending on the data sets, average load time may vary

Caching
• memcached
  • Also used in LiveJournal, Slashdot, etc.
• Install the server on the mod_perl machines
• 39 machines x 2 GB memory

Summary of DB partitioning
• Level 1 partition (split by table types)
• Level 2 partition (split by partitioning key)
  • Manager based
  • Algorithm based

Summary of DB partitioning
[Diagram: (1) the OLD DB holding users A, B, and C is split by table types into message, diary, and other tables (LEVEL 1); (2) the message tables are then split by partitioning key into two LEVEL 2 message DBs]

Image Servers

Statistics
• Total size is more than 8 TB of storage
• Growth rate is about 23 GB/day
• We use MySQL to store metadata only

Two types of images
• Frequently accessed images (about a few million files)
  • For example, user profile photos and community logos
  • The number of image files is relatively small
• Rarely accessed images (hundreds of millions of files)
  • Diary photos, album photos, etc.

Frequently accessed images

• A few hundred GBs of files
• Distributed via FTP and Squid
• Third-party Content Delivery Network

Frequently accessed images
[Diagram: (1) mod_perl uploads images to the storage servers (sto1.mixi.jp, sto2.mixi.jp); (2) Squid and the CDN pull images from storage on demand]

Rarely accessed images
• A few TBs of files
• Newer files get accessed more often
• Cache hit ratio is very bad
• Distribute directly from storage

Uploading rarely accessed images
[Diagram: (1) the MANAGER DB assigns an id for the image file (abc.gif); (2) it arranges a pair of area_ids (area_id = 1, 2); (3) mod_perl uploads the image to the storage servers for both areas (sto1.mixi.jp and sto2.mixi.jp, out of sto1 through sto4)]
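One way the id and area-pair assignment could work is sketched below (the alternating pairing policy is purely an assumption for illustration; the slide only says the manager arranges a pair of area_ids):

```python
# Upload sketch: the manager assigns each new image an id plus a
# pair of storage areas, and the file is written to both areas for
# redundancy. The pairing policy here is an assumed example.

STORAGE = {1: "sto1.mixi.jp", 2: "sto2.mixi.jp",
           3: "sto3.mixi.jp", 4: "sto4.mixi.jp"}

next_image_id = 0

def assign_upload():
    """Return (image_id, [two storage hosts]) for a new image."""
    global next_image_id
    next_image_id += 1
    # Assumed policy: alternate between the (1,2) and (3,4) pairs.
    area_pair = (1, 2) if next_image_id % 2 else (3, 4)
    return next_image_id, [STORAGE[a] for a in area_pair]
```

Storing only the id-to-area mapping in MySQL keeps the metadata tiny even as the image store grows by tens of GB per day.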

Viewing rarely accessed images
[Diagram: (1) the user asks for view_diary.pl; (2) mod_perl detects abc.gif in view_diary.pl; (3) it asks the MANAGER DB for the area_id of abc.gif; (4) the manager returns area_id = 1; (5) mod_perl creates the image URL; (6) it returns view_diary.pl and the URL for abc.gif; (7) the user asks sto1.mixi.jp for abc.gif; (8) the storage server returns abc.gif]
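Steps 3 to 5 of the viewing flow, turning the manager's area_id into an image URL, might be sketched as follows (the URL shape and the area-to-host mapping are assumptions for illustration):

```python
# While rendering view_diary.pl, look up each image's area_id in the
# manager DB, then point the URL directly at that area's storage host.

image_area_db = {"abc.gif": 1}   # image file -> area_id (manager DB)
AREA_HOSTS = {1: "sto1.mixi.jp", 2: "sto2.mixi.jp",
              3: "sto3.mixi.jp", 4: "sto4.mixi.jp"}

def image_url(filename: str) -> str:
    area_id = image_area_db[filename]   # steps 3+4: ask the manager
    host = AREA_HOSTS[area_id]
    return f"http://{host}/{filename}"  # step 5: URL placed in page
```

The browser then fetches the image straight from storage, so the rarely accessed files never pass through Squid or the application servers.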

To do

• Try MySQL Cluster
• Try to implement a better algorithm
• Level 3 partitioning?
  • Consistent hashing?
  • Linear hashing?
  • Split again by timestamp?

Questions?

Thank you
• Further questions to bkesuma@mixi.co.jp
• We are hiring :)
• Have a nice day!