
YouTube Architecture Overview

This document summarizes the architecture of YouTube. It discusses how YouTube uses Apache, Python, Linux, and MySQL. It describes how YouTube scales its web servers, video serving, and thumbnail serving. It also discusses how YouTube overcame issues with databases by moving from a single database to replication and then sharding across multiple databases to improve performance and scalability.
Copyright
© Attribution Non-Commercial (BY-NC)

Architecture
Presented by:
VIJET HEGDE
1st Sem, Mtech(S.E.)
M.S.R.I.T.
Agenda
• Introduction
• Platform
• Statistics
• Architecture
– Web servers
– Video serving
– Serving Thumbnails
– Databases
• References
Introduction
• What is YouTube?
• The largest video site.
• Offers video hosting for free.
• Gradually replacing Google Video.


Platforms
• Apache
• Python
• Linux (SuSe)
• MySQL
• psyco, a dynamic Python->C compiler
• lighttpd for video instead of Apache
Statistics
• Founded 2/2005
• 3/2006: 30 million video views/day
• Reached 1 billion views/day.
• #4 largest site on the Internet
• #1 largest video site on the web
• In August 2008, YouTube became the #2 search engine on the web, ahead of Yahoo.
Statistics (contd.)
• Videos uploaded per day: over 150,000
• Total videos uploaded as of March 17th, 2008: 78.3 million
• 15 hours of video uploaded every minute.
Web Servers

• Linux is used.
• Generally scaled by adding more machines.
• A page is served in less than 100 ms.
• psyco, a dynamic Python->C compiler, is used to speed up the Python code.
• Pre-generated HTML is used.
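
The pre-generated HTML idea can be sketched roughly as follows. This is a minimal illustration only, not YouTube's actual code: the class `HtmlBlockCache`, its `render` method, and the TTL value are all hypothetical names and assumptions. The point is that an expensive page fragment is rendered once and then reused for later requests.

```python
import time


class HtmlBlockCache:
    """Hypothetical cache of pre-generated HTML blocks (illustration only)."""

    def __init__(self, ttl_seconds=300.0):
        self.ttl_seconds = ttl_seconds  # assumed lifetime of a cached block
        self.render_calls = 0           # how often the expensive render ran
        self._blocks = {}               # name -> (generated_at, html)

    def render(self, name):
        # Stand-in for an expensive template render.
        self.render_calls += 1
        return "<div id='{0}'>...</div>".format(name)

    def get(self, name, now=None):
        """Return cached HTML for `name`, regenerating it after the TTL."""
        now = time.time() if now is None else now
        entry = self._blocks.get(name)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]             # cache hit: no rendering work
        html = self.render(name)
        self._blocks[name] = (now, html)
        return html
```

Repeated calls to `get("sidebar")` within the TTL return the stored string without calling `render` again, which is one way a page could be kept under a tight latency budget.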
Serving Video
• Main issues: cost of bandwidth, hardware, and power consumption.
• Each video is hosted by a mini-cluster.
• Started with Apache.
• Moved from Apache to lighttpd.
• Moved from a single-process to a multi-process configuration.
Serving Thumbnails
• Surprisingly difficult to handle.
• Large number of images.
• High number of requests per second.
• Apache performed badly under high load.
• Squid was used next, but its performance degraded as load increased.
• So lighttpd was used instead.
Solution to the problem
• Even then, the problem continued.
• Setting up a new machine took 24 hours.
• Rebooting took 6-10 hours.
• Google's BigTable was adopted.
• Images are replicated to different data centers using BigTable.
• Avoids the small-file problem.
• Fast and fault tolerant.
Databases
• MySQL is used.
• It stores metadata.
• Started with one main database and a backup.
• This worked well until users started using the site.
• So database replication was introduced.
Replication
Replica Lag
• Replica lag is the main downside of MySQL replication.
• Replication is asynchronous.
• Replicas fall behind the master database, so they serve stale data.
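
Replica lag can be illustrated with a toy model. The sketch below is not MySQL code; the `Master` and `Replica` classes and their methods are hypothetical stand-ins for a master that logs writes and a single replica that replays them later. Because replay happens after the write is acknowledged, a read from the replica in between returns the old value.

```python
from collections import deque


class Master:
    """Toy master: applies writes locally and queues them for the replica."""

    def __init__(self):
        self.data = {}
        self.log = deque()  # pending (key, value) writes not yet replicated

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))  # shipped asynchronously, not now


class Replica:
    """Toy replica: only sees a write once it has replayed the log entry."""

    def __init__(self):
        self.data = {}

    def apply(self, master, max_events):
        """Replay up to max_events pending writes from the master's log."""
        applied = 0
        while master.log and applied < max_events:
            key, value = master.log.popleft()
            self.data[key] = value
            applied += 1
        return applied


master, replica = Master(), Replica()
master.write("video:1:title", "old title")
replica.apply(master, max_events=10)          # replica is caught up
master.write("video:1:title", "new title")    # replica has not replayed this
stale = replica.data["video:1:title"]         # still the old value
replica.apply(master, max_events=10)
fresh = replica.data["video:1:title"]         # now matches the master
```

The length of `master.log` at any moment is the lag in this model: the more writes queue up faster than the replica can replay them, the older the data it serves.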
Replication: Master
Replication: Replica
Replicas falling down!
• Too many replicas.

• Writes started crowding out reads.

• Replication lag was horrible.

• Extraordinary measures were needed.


Database Partitioning

Monolithic database

Shard 1    Shard 2    Shard 3

user1      user4      user7
user2      user5      user8
user3      user6      user9
Advantages of sharding
• A sharded architecture partitions data across multiple servers, so each server holds one shard of the data.
• Faster backup.
• Faster recovery.
• Data can fit into memory.
• Easier to manage.
Partition
• Partition by user.

• Spreads writes and reads across shards.

• 30% hardware reduction.

• Replica lag is reduced to 0.
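
Partitioning by user can be sketched as a routing function that sends all of a user's reads and writes to one shard. The example below is only an assumption-laden illustration: the slides do not say how users are mapped to shards, so the hash-based mapping, the class `ShardedUserStore`, and the default of three shards are all hypothetical. In-memory dictionaries stand in for separate database servers.

```python
import zlib


class ShardedUserStore:
    """Toy user-partitioned store; each dict stands in for one database."""

    def __init__(self, num_shards=3):
        self.shards = [dict() for _ in range(num_shards)]

    def shard_for(self, user_id):
        # Stable hash, so a given user always maps to the same shard.
        return zlib.crc32(user_id.encode("utf-8")) % len(self.shards)

    def put(self, user_id, key, value):
        # A write touches only the shard that owns this user.
        self.shards[self.shard_for(user_id)][(user_id, key)] = value

    def get(self, user_id, key):
        # A read goes to the same single shard.
        return self.shards[self.shard_for(user_id)].get((user_id, key))


store = ShardedUserStore(num_shards=3)
store.put("user1", "display_name", "Alice")
name = store.get("user1", "display_name")
```

Since each request touches one shard, both write and read load are divided among the servers, and each shard's working set is smaller. A fixed modulo mapping makes adding shards awkward, so a lookup table from user to shard is a common alternative.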


References
• www.google.com

• www.highscalability.com

• http://youtubereport2009.com

• www.google.video
