You are on page 1of 25

Architectur

e
Presented by:
VIJET HEGDE
1st Sem, Mtech(S.E.)
M.S.R.I.T.
Agenda
• Introduction
• Platform
• Statistics
• Architecture
– Web servers
– Video serving
– Serving Thumbnails
– Databases
• References
Introduction
• What is Youtube?

• Largest video site.

• It is offering video-hosting for free.

• Gradually replacing Google video.


Platforms
• Apache
• Python
• Linux (SuSe)
• MySQL
• psyco, a dynamic python->C
compiler
• lighttpd for video instead of Apache
Statistics
• Founded 2/2005
• 3/2006 30 million video views/day
• Reached 1 billion views/day.
• #4 Largest Site on the Internet
• #1 Largest video site on the web
• In August 2008, youtube became the
#2 search engines over Yahoo on
Web.
Statistics Contd..,
• Videos uploaded per day: over
150,000

• Total videos uploaded as of March


17th 2008: 78.3 Million

• 15 Hours of video uploaded every


minute.
Serving Web Servers
Web Servers

• Linux is used.
• Generally scaled by adding some
machines.
• Less than 100ms to serve a page.
• Psycho- c-compiler of python.
• Pre- generated html is used.
Serving Video
• Main issues: Cost of bandwidth,
hadware and power consumption.
• Each video hosted by mini-cluster.
• Started with Apache.
• Apache-> lighttpd.
• Single process to multiprocess.
Contd..,
Serving Thumbnails
• Surprisingly difficult to handle
• Large number
• High no. of request/sec
• Apache performed badly on high load.
• Squid is used.
• But performance degraded as load
increased.
• So lighttpd is used.
contd..,
Contd..,
Solution to the problem
• Then also problem continued.
• To create new machine it took 24hrs.
• To reboot 6-10 hrs.
• Used google’s Big Table.
• Images are replicated to different
data centers using BigTable.
• Avoids small file problem.
• Fast, fault tolerant.
Databases
• Mysql.
• Stores metadata.
• Started with one main database and
a backup.
• It was good until users started using
the site.
• So database replication is done.
Replication
Replica Lag
• The main down side of MySQL
replication.

• Replication is asynchronous.

• Replicas fall behind master database,


so serve old data.
Replication : Master
Replication: Replica
Replicas falling down!
• Too many replicas.

• Writes started crowding reads.

• Replication lag was horrible.

• Extraordinary measures were needed.


Database Partitioning

Monolithic database

Shard 1 Shard 2 Shard 3


user1 user4 user7
user2 user5 user8
user3 user6 user9
Advantages of sharding
• A shard architecture partitions data on
to multiple servers so each server holds
shard of data.
• Faster backup.
• Recovery
• Data can fit into memory.
• Easier to manage.
Partition
• Partition by user.

• Spreads write and read.

• 30% hardware reduction.

• Replica lag is reduced to 0.


References
• www.google.com

• www.highscalability.com

• http://youtubereport2009.com

• www.google.video