Wikipedia: Site Internals Configuration, Code Examples and Management Issues

Published by: api-26925601 on Oct 14, 2008
Copyright:Attribution Non-commercial

Table of contents
Introduction
The big picture
Content delivery network
Cacheable content
Cache efficiency
CDN Configuration files
CDN Notes
Application
Components
Caching
Profiling
Media storage
Database
Database balancing
Database API
Database servers
External Storage
Database queries
Splitting
Data itself
Compression
Search
LVS: Load balancer
Administration
NFS
dsh
Nagios
Ganglia
People

Wikipedia: Site internals, configuration, code examples and management issues
Domas Mituzas, MySQL Users Conference 2007

Introduction

Started as a Perl CGI script running on a single server in 2001, the site has grown into a
distributed platform containing multiple technologies, all of them open. The principle of
openness forced all operations to use only free and open-source software. With commercial
alternatives out of the question, Wikipedia had the challenging task of building an efficient
platform from freely available components.

Wikipedia's primary aim is to provide a platform for building a collaborative compendium of knowledge. Because of its funding model (it is mostly donation-driven), performance and efficiency have been prioritized above high availability or security of operation.

At the moment there are six people (some of them recently hired) actively working on the internal platform, though a few active developers also contribute to the open-source code base of the application.

Wikipedia's technology is in constant evolution, so information in this document may be
outdated and may no longer reflect reality.

The big picture

Generally, it is an extended LAMP environment. The core components, front to back, are:
• Linux - operating system (Fedora, Ubuntu)
• PowerDNS - geo-based request distribution
• LVS - used for distributing requests to cache and application servers
• Squid - content acceleration and distribution
• lighttpd - static file serving
• Apache - application HTTP server
• PHP5 - core language
• MediaWiki - main application
• Lucene, Mono - search
• Memcached - various object caching

Many of the components have to be extended to communicate efficiently with each
other, which tends to be major engineering work in LAMP environments.

This document describes the most important parts of gluing everything together, as well as
the adjustments required to remove performance hotspots and improve scalability.

[Figure: architecture overview. Components shown: People & Browsers; Content acceleration & distribution network; Application; Thumbs service; Media storage; Core database; Auxiliary databases; Search; Object cache; Management]

As the application tends to be the most resource-hungry part of the system, every component is built to be semi-independent of it, so that there is less interference between tiers when a request is served.

The most distinct separation is media serving, which can happen without touching any
PHP/Apache code.

Other services, like search, still have to be served by the application (to apply skin, settings
and content transformations).
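
The tier separation described above can be illustrated with a small sketch. It is not taken from Wikipedia's configuration: the tier names come from the component list in this document, while the `Request` type, the request kinds ("media", "page", "search") and the exact order of tiers are assumptions made for illustration only.

```python
from dataclasses import dataclass

# Tier names follow the component list in this document. The routing rules
# are illustrative assumptions, not Wikipedia's actual configuration.
MEDIA_TIERS = ["Squid", "lighttpd"]
APPLICATION_TIERS = ["Squid", "Apache", "PHP5", "MediaWiki"]


@dataclass(frozen=True)
class Request:
    kind: str  # hypothetical labels: "media", "page" or "search"


def tiers_for(request: Request) -> list[str]:
    """Return the tiers that a request of this kind would touch."""
    if request.kind == "media":
        # Media is served without entering any PHP/Apache code.
        return list(MEDIA_TIERS)
    if request.kind == "search":
        # Search still passes through the application, which applies
        # skin, settings and content transformations to the results.
        return APPLICATION_TIERS + ["Lucene"]
    if request.kind == "page":
        return list(APPLICATION_TIERS)
    raise ValueError(f"unknown request kind: {request.kind!r}")


if __name__ == "__main__":
    for kind in ("media", "page", "search"):
        print(kind, "->", " > ".join(tiers_for(Request(kind))))
```

In this sketch a media request never reaches the application tiers, which is the property the text emphasizes. A search request does, because the application has to render the results.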

The major component, often overlooked in designs, is how every user (and their agent)
treats content, connections, protocols and time.