Architecture
Presented by:
VIJET HEGDE
1st Sem, Mtech(S.E.)
M.S.R.I.T.
Agenda
• Introduction
• Platform
• Statistics
• Architecture
– Web servers
– Video serving
– Serving Thumbnails
– Databases
• References
Introduction
• What is YouTube?
• The largest video site.
• Offers free video hosting.
• Gradually replacing Google Video.
Platforms
• Apache
• Python
• Linux (SUSE)
• MySQL
• psyco, a dynamic Python->C compiler
• lighttpd for video instead of Apache
Statistics
• Founded 2/2005
• 3/2006 30 million video views/day
• Reached 1 billion views/day.
• #4 Largest Site on the Internet
• #1 Largest video site on the web
• In August 2008, YouTube became the #2 search engine on the Web, overtaking Yahoo.
Statistics (contd.)
• Videos uploaded per day: over 150,000
• Total videos uploaded as of March 17th, 2008: 78.3 million
• 15 hours of video uploaded every minute.
Serving Web Servers
Web Servers
• Linux is used.
• Scaled out by adding more machines.
• Less than 100 ms to serve a page.
• psyco, a dynamic Python->C compiler, is used.
• Pre-generated HTML is used.
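The pre-generated HTML idea can be illustrated with a minimal sketch. It assumes a hypothetical in-memory store and a stand-in rendering function; the names `pregenerate`, `serve`, and `render_video_block` are illustrative only and do not come from the actual site.

```python
import html

# Hypothetical in-memory store of pre-rendered page fragments, keyed by video id.
_pregenerated = {}

def render_video_block(video_id, title):
    """Stand-in for an expensive templating step."""
    return '<div class="video" id="v{}">{}</div>'.format(video_id, html.escape(title))

def pregenerate(videos):
    """Render every block once, ahead of time, instead of on each request."""
    for video_id, title in videos.items():
        _pregenerated[video_id] = render_video_block(video_id, title)

def serve(video_id):
    """Request path: a dictionary lookup, with no rendering work."""
    return _pregenerated[video_id]
```

The rendering cost is paid once per change instead of once per page view, which is one plausible way to help keep page times low.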
Serving Video
• Main issues: cost of bandwidth, hardware, and power consumption.
• Each video is hosted by a mini-cluster.
• Started with Apache.
• Moved from Apache to lighttpd.
• Moved from a single-process to a multiprocess configuration.
Serving Thumbnails
• Surprisingly difficult to serve efficiently
• Large number of thumbnails
• High number of requests/sec
• Apache performed badly under high load.
• Squid was then used.
• But its performance degraded as load increased.
• So lighttpd was used.
Solution to the problem
• Even then, the problems continued.
• Setting up a new machine took 24 hours.
• Rebooting took 6-10 hours.
• Moved to Google's BigTable.
• Images are replicated to different data centers using BigTable.
• Avoids the small-file problem.
• Fast and fault tolerant.
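As a rough illustration of the small-file problem, the sketch below packs many small blobs into one buffer and keeps an offset index, so a lookup does not need one filesystem entry per thumbnail. This is not the BigTable API. `PackedStore` and its methods are hypothetical names, and an in-memory buffer stands in for a large on-disk file.

```python
import io

class PackedStore:
    """Toy store: appends many small blobs to one buffer and indexes their offsets."""

    def __init__(self):
        self._buf = io.BytesIO()   # stand-in for one large file
        self._index = {}           # key -> (offset, length)

    def put(self, key, data):
        # Always append at the end, then remember where the blob lives.
        offset = self._buf.seek(0, io.SEEK_END)
        self._buf.write(data)
        self._index[key] = (offset, len(data))

    def get(self, key):
        offset, length = self._index[key]
        self._buf.seek(offset)
        return self._buf.read(length)
```

One container with an index replaces a very large number of tiny files, which is the general idea behind avoiding per-file overhead.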
Databases
• MySQL.
• Stores metadata.
• Started with one main database and a backup.
• This worked until users started using the site.
• So database replication was introduced.
Replication
Replica Lag
• The main downside of MySQL replication.
• Replication is asynchronous.
• Replicas fall behind the master database and so serve stale data.
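Replica lag can be illustrated with a toy model of asynchronous replication. This is a simplified sketch and does not use MySQL: the `Master` and `Replica` classes are hypothetical, and the replication log is just a Python list.

```python
class Master:
    def __init__(self):
        self.data = {}
        self.log = []  # ordered list of (key, value) writes

    def write(self, key, value):
        # The write is acknowledged immediately; replicas catch up later.
        self.data[key] = value
        self.log.append((key, value))

class Replica:
    def __init__(self, master):
        self.master = master
        self.data = {}
        self.applied = 0  # number of log entries applied so far

    def apply_next(self, n=1):
        """Apply up to n pending log entries, in order."""
        for key, value in self.master.log[self.applied:self.applied + n]:
            self.data[key] = value
            self.applied += 1

    def lag(self):
        """Number of writes the replica has not applied yet."""
        return len(self.master.log) - self.applied
```

If the master accepts two writes to the same key and the replica has applied only the first, a read from the replica returns the older value until it catches up.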
Replication: Master
Replication: Replica
Replicas falling down!
• Too many replicas.
• Writes started crowding reads.
• Replication lag was horrible.
• Extraordinary measures were needed.
Database Partitioning
Monolithic database
Shard 1 Shard 2 Shard 3
user1 user4 user7
user2 user5 user8
user3 user6 user9
Advantages of sharding
• A sharded architecture partitions data across multiple servers so that each server holds one shard of the data.
• Faster backup.
• Faster recovery.
• Data can fit into memory.
• Easier to manage.
Partition
• Partition by user.
• Spreads writes and reads across shards.
• 30% hardware reduction.
• Replica lag is reduced to 0.
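Partitioning by user can be sketched as a routing function that maps a user ID to a shard. The example follows the Shard 1 / Shard 2 / Shard 3 illustration (user1-user3, user4-user6, user7-user9) and assumes simple range partitioning with numeric user IDs. `shard_for_user` and `USERS_PER_SHARD` are hypothetical names, and a real deployment may well route differently.

```python
USERS_PER_SHARD = 3  # assumption taken from the 9-user, 3-shard illustration

def shard_for_user(user_id, users_per_shard=USERS_PER_SHARD):
    """Return the 1-based shard number that holds this user's data."""
    if user_id < 1:
        raise ValueError("user_id must be >= 1")
    return (user_id - 1) // users_per_shard + 1

# Route every user in the illustration to a shard.
layout = {}
for uid in range(1, 10):
    layout.setdefault(shard_for_user(uid), []).append("user{}".format(uid))
```

Because all of a user's reads and writes go to one shard, load is spread across servers instead of concentrating on a single master.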
References
• www.google.com
• www.highscalability.com
• http://youtubereport2009.com
• www.google.video