Drupal Performance and Scalability

http://books.tag1consulting.com/book/export/html/1

Drupal Performance and Scalability
Obtaining Optimal Performance From Drupal And The LAMP Stack. This book is written by Jeremy Andrews and licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 Unported License.

Section 1: Drupal Performance
The first section of this book offers details on how to get good performance out of your Drupal powered website, and how to scale it as demand grows. The majority of the features discussed in this section are available without making any modifications to Drupal.

Chapter 1: Setting Goals
This first chapter discusses the importance of understanding your own performance and scalability goals. It helps readers identify specifically what they hope to accomplish, showing them how to set concrete, attainable goals by breaking larger requirements into smaller pieces. It will teach the importance of maintaining historical performance logs, discussing several technologies and services that are available to aid in this effort. It will stress the importance of making regular backups, and of testing those backups before changes are made. Finally, it will explain the importance of first testing changes on development servers, and describe best practices for deploying tested changes to production servers.
1. Setting Goals
   1. Performance and Scalability Checklist
   2. Understanding and Defining the Problem
   3. Goals versus Requirements
2. Measuring Progress
   1. Setting A Baseline
   2. Measuring Progress
   3. Online Services
3. Backups
   1. What To Back Up
1 of 19 7/18/08 10:30 AM

   2. Backup Schedules
   3. Validating Backups
4. Staging Changes
   1. Testing Changes
   2. Source Control
   3. Database Schema Changes
   4. Pushing Changes To Production

Chapter 2: Drupal Infrastructure
This chapter will provide an overview of what is covered later in the book. It will compare cheap $5/month web hosts with slightly more powerful Virtual Private Servers, and with dedicated servers and server farms. It will collect network diagrams for the various configurations, and point to the later chapters where each feature is more fully explained.
1. Bargain Basement Hosting
   1. Advantages
   2. Squeezing Water From A Rock
   3. Development and Testing
   4. Outgrowing Your Host
   5. Diagram
2. Virtual Private Servers
   1. Advantages
   2. What Is Virtualization?
   3. Competing For Resources
   4. Outgrowing Your Host
   5. Diagram
3. Multiple Installations versus Multi-site Installations
   1. Advantages
   2. Security Considerations
   3. Diagrams
4. Dedicated Hosting
   1. Single Server
   2. Multiple Servers
   3. Sharing Files And File Systems
   4. Load Balancers
   5. High Availability
   6. Scaling Up vs. Scaling Out
   7. Caching
   8. Network Diagrams

Chapter 3: Performance Configuration
This chapter introduces Drupal's built-in performance features. It explains how Drupal's page cache works and details how it can be configured. The chapter also discusses Drupal's built-in CSS and JavaScript aggregation and compression, and the importance of regularly purging Drupal's logs. Finally, it explores Drupal's throttle module.
1. Drupal's Page Cache
   1. Normal Caching
   2. Aggressive Caching
   3. Minimum Cache Lifetime
2. CSS and JavaScript Aggregation
   1. Aggregation
   2. Compression?
3. Purging Logs
   1. Watchdog Logs
   2. The Access Log
4. The Throttle Module
   1. Background
   2. Configuration
   3. Modules
   4. Blocks
   5. Integrating Custom Themes and Modules
   6. Why There Won't Be A Throttle In Drupal 7

Chapter 4: Too Many Modules
This chapter takes an in-depth look at Drupal's modular design. It explores the concept behind Drupal's “hooks”, using the nodeapi as an example, and also looks at Drupal's menu system. The chapter then puts all of this together by tracing what happens when you enable a single Drupal module. Finally, it discusses the temptation to enable hundreds of contributed modules.
1. Modules and Hooks
   1. Drupal Modules
   2. Adding Features With Hooks
   3. Example: the nodeapi
2. Menus
   1. Defining Pages
3. Enabling Modules
   1. Memory Limits
   2. .install Files
   3. Drupal 7 Registry Preview
   4. All You Can Eat?

Chapter 5: Caching Layer
This chapter dives into Drupal's code, taking a look at the underlying caching layer. It will begin with an accessible, high-level description before examining the actual implementation. Finally, it will teach module developers how to make better use of Drupal's built-in caching layers.
1. Understanding Drupal's Caching Layer
   1. Overview
   2. Variables
   3. The Many Cache Tables
2. Developing With Drupal's Caching Layer
   1. Drupal's Cache API
   2. Caching With Custom Modules
   3. Sessions

Chapter 6: The devel Module
This chapter will take a look at the contributed devel module, explaining its importance in performance tuning a Drupal website. It will discuss the module's many configuration options, and explain how it can be used to profile page loads.
1. More Than A Development Tool
   1. Visualizing Slow Queries
   2. Timing Page Creation
   3. Page Elements Versus The Database
2. Configuration
3. Profiling Database Queries
   1. Identifying Slow Queries
   2. Identifying Duplicate Queries
   3. Common Queries and What They Mean

Chapter 7: To Patch Or Not To Patch
Drupal offers considerable performance and scalability without any modifications to its code. However, significantly more performance can be obtained by patching the core code. This chapter weighs the pros and cons of patching Drupal, including the impact this has on keeping up to date with security patches and upgrading to new releases.
1. The Case For Patching
   1. Optimal Performance
   2. Community Patchsets
   3. Backports
   4. Hitting Modularity Limitations
2. The Case For Not Patching
   1. Avoiding The Unknown and Under-Tested
   2. Keeping Up With Security Updates
   3. Upgrading To New Releases

Section 2: Front End Performance
This second section of the book begins to look at the underlying LAMP stack, discussing how it can be optimized specifically to get the most out of Drupal. Most of the information will be presented so it is accessible to people without a background in system administration, though advanced topics will also be discussed.

Chapter 8: Optimizing PHP
This chapter will look at tuning PHP with php.ini. It will explain how to read the output of phpinfo(), and discuss PHP's memory footprint. It will explain how PHP code is recompiled for every page request unless you enable an opcode cache. It will then review some of the most popular opcode caches, how they work with Drupal, and their known issues and fixes.
1. Configuring PHP
   1. What Is php.ini?
   2. Finding php.ini
   3. phpinfo()
2. Tuning PHP
   1. Modifying php.ini
   2. PHP's Memory Footprint
   3. Disabling Unnecessary Features
3. Writing Good Code
   1. Common Pitfalls
   2. Investment vs. Return
4. Opcode Caches
   1. Scripting Languages
   2. APC
   3. XCache
   4. eAccelerator
   5. The White Screen Of Death

Chapter 9: Optimizing Apache
This chapter will review how Apache can be optimized to achieve better Drupal performance. It will discuss performance-oriented Apache configuration options, look at Apache modules, and explore the importance of minimizing Apache's memory footprint. Finally, it will look at the various web server architectures, exploring the use of load balancers to scale out this layer.
1. Configuring Apache
   1. httpd.conf
   2. vhosts
   3. Compression
2. Apache Modules
   1. Performance Features
   2. Memory Considerations
   3. Load Testing
3. Infrastructure Choices
   1. Basement Startups: All On One Server
   2. Standalone Web Servers
   3. Multiple Servers With Load Balancers
   4. Multiple Datacenters

Chapter 10: Alternatives To Apache
While Apache is the most popular open source web server, it is not the only one. This chapter will review the advantages and disadvantages of serving pages with the most popular alternative, lighttpd. It will detail how to get Drupal up and running with lighttpd, and explore configuration options for improving performance. It will also look at using lighttpd to complement Apache in an infrastructure, instead of replacing it. The chapter will then take a brief look at running Drupal on a newer, lesser-known alternative, Nginx. Finally, it will briefly explore WAMP-based Drupal installs and tuning IIS on Windows.
1. Lighttpd
   1. Feature Comparison
   2. Benchmarks
   3. Limitations
   4. Configuration
2. Other Alternatives
   1. Nginx
   2. IIS (WAMP versus LAMP)

Chapter 11: Optimizing Your Theme
Drupal themes are what give websites their own unique look. This chapter explores the impact of overly complex designs with many images, CSS files, and external JavaScript files. It will take a fresh look at CSS and JavaScript aggregation, previously discussed in Chapter 3. It will also review best practices for using images, and how image file sizes affect page load times. Finally, it will look at how to achieve a complex-looking design without slowing down page loads.
1. Images
   1. Multiple HTTP Requests
   2. File Size
2. CSS
   1. Inline Styles
   2. External CSS Files
   3. Caching
   4. Aggregation
   5. Compression
3. JavaScript
   1. Inline
   2. External
   3. Caching
   4. Aggregation
   5. Compression
4. Optimizations
   1. Multiple Sub-domains
   2. Browser Cookies
   3. Far-Future Expiration
   4. jQuery

Chapter 12: Content Delivery Networks
This chapter will provide background on Content Delivery Networks, or CDNs, explaining how they speed up page load times by bringing the contents of a web page physically closer to the visitor. It will examine contributed modules for quickly integrating Drupal websites with CDNs, and offer some insight into the pros and cons of several of the more powerful CDN services currently available.
1. Background
   1. Concepts
   2. Building a Mini-CDN
2. Integration
   1. Modules
   2. Themes
3. CDN Lineup
   1. Panther Express
   2. Akamai
   3. EdgeCast
   4. Limelight

Chapter 13: Front-end Performance Tools
Several useful tools are freely available for the open source Firefox web browser. This chapter will explore how to use Firebug to take apart and understand the elements that combine to form a web page. It will also explore the YSlow extension, detailing how to use its extremely useful performance reports. (I will research whether similar tools are available for other browsers, and if so will also cover them in this chapter.)
1. Firefox
   1. Firebug
   2. YSlow

Section 3: Improved Caching and Searching
This section will focus on two key areas where Drupal can benefit from third-party integration: caching and searching. Many of these advanced topics will require patching Drupal's core.

Chapter 14: Reverse Proxies
This chapter explores the use of reverse proxies, which add further layers of caching to your web infrastructure. It explains how this improves both performance and scalability, then looks at several specific reverse proxy options and their configurations, including Squid, Varnish, and Apache's mod_proxy.
1. Reverse Proxy Architecture
2. The Benefits Of Reverse Proxies
   1. Performance
   2. Scalability
   3. Layered Caching
3. Selecting and Configuring a Reverse Proxy
   1. Squid
   2. Varnish
   3. Apache and mod_proxy

Chapter 15: Integrating Third Party Caches
This chapter introduces the concept of integrating Drupal with a third-party cache. It will examine the use, advantages, and limitations of file-based caches, and of using PHP opcode caches to cache other content. The chapter will detail the many projects that help with this integration, explaining their configuration and use. It will also review the many patches available for improving Drupal's caching, including the advcache project, block caching, and improved taxonomy caching. Finally, the chapter will provide an initial introduction to memcached.
1. File Caches
   1. Boost Module
   2. Fastpath_fscache
   3. Cache Coherency
2. Patching Drupal
   1. Advcache
   2. Caching Blocks
   3. Caching Taxonomy
3. Memory Caches
   1. Opcode Caches
   2. Distributed Memory Cache

Chapter 16: Caching With Memcached
This chapter will offer an in-depth look at what memcache is, and how it improves both website performance and scalability. It will look at how memcache achieves its performance, reviewing the difference between hash tables and databases, and explaining how memcache can help websites of all sizes. This chapter will explore Drupal's contributed memcache integration module, and the patches that come with the project. It will look at how to modify Drupal so that anonymous pages can be served directly out of RAM, and so that pages for logged-in users can be assembled from objects stored in RAM. Finally, it will look at the areas in Drupal that most benefit from memcached integration.
1. Memcache Background
   1. LiveJournal.com
   2. Hash Table Lookups
2. Infrastructure Design
   1. Spare RAM
   2. Distributed Caching
   3. Failing Servers
   4. Memcache Clusters
3. Memcache Module
   1. Overview
   2. Base Features
   3. Administration
   4. Advanced Features
4. Beyond Core
   1. Finding Good Candidates For Caching
   2. Memcache Integration
   3. Common Mistakes
   4. AdvCache Project

Chapter 17: Drupal's Search Module
This chapter will explore how Drupal's search module works, explaining limitations introduced by the fact that SQL was not designed as a search language. It will discuss how to get the best performance out of Drupal's search module, and how to know when it's time to consider alternatives. This chapter will mostly look at search in Drupal 6, but will take a brief look at why an improved search API is likely to be one of the killer features in Drupal 7.
1. Search Module Design
   1. Background
   2. Searching With SQL
2. Performance Bottlenecks
   1. InnoDB Performance
   2. When To Replace
3. Searching Drupal's Future
   1. Search API in Drupal 7
   2. Introducing Third Party Search Integration

Chapter 18: Searching With Xapian, Sphinx & Solr
Xapian and Sphinx are two unrelated standalone search technologies written in C++. This chapter will explain how Xapian supports real-time indexing and exposes extremely flexible APIs, while Sphinx offers lightning-fast search performance. It will also explore the Java-based Solr search engine, discussing its steeper requirements and its flexible, advanced feature set. This chapter will detail how each solution can be integrated into a Drupal website, replacing or enhancing Drupal's core search functionality.
1. Xapian
   1. Background
   2. Strengths
   3. Weaknesses
   4. Benchmarks
   5. Integration
2. Sphinx
   1. Background
   2. Strengths
   3. Weaknesses
   4. Benchmarks
   5. Integration
3. Solr
   1. Background
   2. Strengths
   3. Weaknesses
   4. Benchmarks
   5. Integration

Section 4: Optimizing the Database Layer
The fourth section of the book will examine database administration for a Drupal-powered website.

Chapter 19: Drupal's Database Abstraction Layer
This chapter will note that Drupal is described as “database agnostic”, because its code strives not to depend on the underlying database. It will review the database abstraction layer and the currently supported databases. It will detail how, in spite of this noble aim, MySQL is still strongly favored, comparing MySQL support with PostgreSQL support. Finally, it will offer a preview of the database layer rewrite that is happening for Drupal 7, detailing how this may finally make Drupal database agnostic.
1. Abstraction Layer Design
   1. Abstraction Concepts
   2. MySQL Support
   3. PostgreSQL Support
2. Database Abstraction in Drupal 7

Chapter 20: Choosing a Storage Engine
This chapter will primarily compare MyISAM and InnoDB. It will look at Drupal's history of being designed for MyISAM, and talk about some of the Drupal-specific pitfalls of using InnoDB. It will then explain the many advantages of using InnoDB, presenting it as currently the only serious option for large, high-traffic websites using MySQL. This chapter will also briefly look at some of the up-and-coming MySQL storage engines currently being developed.
1. Storage Engines
   1. Concepts
   2. Mix and Match
2. MyISAM
   1. Strengths
   2. Weaknesses
3. InnoDB
   1. Strengths
   2. Weaknesses
4. Previews
   1. Falcon
   2. Maria

Chapter 21: Monitoring MySQL
This chapter will first explain the importance of monitoring your database server. It will then present several useful tools for monitoring MySQL, including mytop and innotop. It will also discuss MySQL's built-in reports, including SHOW FULL PROCESSLIST, SHOW GLOBAL STATUS, and SHOW INNODB STATUS, as well as MySQL's various logs.
1. Overview
   1. Why
   2. How Often
2. Monitoring Tools
   1. MySQL's Built-In Reports
   2. mysqlreport
   3. mytop
   4. innotop
   5. Cacti
   6. MySQL Enterprise Monitor
3. Logs
   1. Error Logs
   2. Slow Query Logs
   3. No Index Logs

Chapter 22: Tuning MySQL
This chapter will build upon what was learned in the previous chapter, detailing how to use that knowledge to isolate and fix performance bottlenecks. It will take a lengthy look at the mysqlreport Perl script, explaining how it summarizes many of the reports discussed in the previous chapter, and how to use this tool to tune your server for optimal performance. It will highlight the MySQL configuration options that most affect Drupal performance.
1. Isolating Trouble Spots
2. Tuning With mysqlreport
   1. Examples
3. Deploying Changes
   1. The Tortoise and the Hare
   2. Historical Monitoring
   3. Controlled Experimentation

Chapter 23: Slow Query Log, Indexes, and Query Performance
This chapter will take a closer look at MySQL queries. It will examine the mysqlsla Perl script, detailing how it is used to quickly track down the database queries that waste the most resources. It will then explain how to determine why a query is performing poorly. It will discuss how some queries can be optimized by adding indexes, while also looking at the impact of adding too many indexes to your tables. It will offer an in-depth look at how MySQL indexes work, comparing indexes in MyISAM versus InnoDB. It will also review when to use multiple simple queries instead of complex queries. Throughout this chapter, specific Drupal examples will be provided.
1. Revisiting the Slow Query Log
   1. Configuration
   2. mysqlsla
   3. Microsecond Patches
2. Query Performance
   1. Reviewing the devel Module
   2. Understanding Indexes
   3. Joining Tables

Chapter 24: MySQL Replication
This chapter will define MySQL replication, explaining how it works and how it can be used to improve a Drupal website's performance and scalability. It will explore patches that have been deployed on Drupal.org to send some database queries to a slave server and the rest to the master server. It will examine the idea of using master-master replication, arguing against it as a means of scaling Drupal websites. It will also briefly look at the concept of sharding, and at plans for potentially supporting these advanced scalability features in Drupal 7, reviewing the limitations imposed by Drupal's design in Drupal 5 and Drupal 6.
1. Concepts
   1. Configuration
   2. Monitoring
   3. Backups
   4. Errors
2. Drupal and Replication
   1. Mixing Storage Engines
   2. Redirecting Search Queries
   3. High Availability
3. Federated Databases
   1. Sharding
   2. MySQL Proxy
   3. Drupal 7 Preview

Section 5: Drupal In The Cloud
This final section will be considered a “bonus” in the first edition of this book, because cloud computing is very new and unproven. There is significant interest in the scalability potential of cloud computing, so it is important to explore this topic in these final chapters, while acknowledging that this is a quickly changing landscape.

Chapter 25: Cloud Computing
This chapter will offer a high-level overview of what cloud computing is, and how it potentially solves the scalability problem. It looks at the advantages of outsourcing your underlying infrastructure, as well as the limitations this imposes.
1. Overview
   1. Concepts
   2. Pay For What You Use
   3. Outsourcing
   4. Scalability
   5. Performance
   6. Latency
   7. Impermanence

Chapter 26: Running Drupal on Amazon's EC2
This chapter will provide details on how to get Drupal up and running with Amazon's EC2 cloud computing service. It will include screenshots, as it is intended as a high-level guide to getting things up and running. It will then examine performance concerns introduced by the high latency often found in a cloud environment, and provide specific suggestions for improving Drupal's performance while running in the cloud. It will also look at cloud impermanence, and how to provide reliability through redundancy, replication, and backups.
1. Getting Started
   1. Requirements
   2. ElasticFox
   3. AMIs
   4. 32-bit versus 64-bit
   5. Helpful Links
2. Drupal in the Cloud
   1. Installation
   2. Configuration
   3. Benchmarks
3. Cloud Impermanence
   1. Re-installation Scripts and Images
   2. Rsync
   3. Replication Across Zones
   4. Load Balancing
   5. Preserving IP Addresses
   6. Automated Backups and S3
4. Performance
   1. Dealing With Latency
   2. Striping Drives
   3. Layered Websites
   4. Revisiting Memcache

Chapter 27: Scaling In The Clouds
This chapter will take a high-level look at the many benefits of scaling Drupal websites in the clouds. It will explore many of the advanced features Amazon is planning for EC2, and how these will continue to make cloud computing a more attractive option.
1. Endless Scalability?
2. The Future

About The Author
Jeremy Andrews has been a core Drupal contributor since early 2002, when he was originally introduced to the project by its creator. His hobby website, KernelTrap.org, was the first online community to push Drupal to scale beyond its modest beginnings and to achieve popular recognition as a competitive CMS solution [1]. He has worked to improve Drupal's caching layer, optimized Drupal's bootstrap process, improved core Drupal queries, focused on improving Drupal's overall performance, and written core modules that are still included with every copy of Drupal. He has given seminars on Drupal performance and scalability, both in person [2] and over the Internet [3]. In 2007 Jeremy formed Tag1 Consulting, Inc., a successful consulting company that focuses on Drupal performance and scalability [4] and that has been recognized by Drupal's creator as being among the very best at what they do [5].

[1] http://luckofseven.com/vlog/episode13
[2] http://www.lullabot.com/seminar/drupal_performance_and_scalability/sunny...
[3] http://www.mysql.com/news-and-events/web-seminars/display-94.html
[4] http://tag1consulting.com/
[5] http://tag1consulting.com/blog/jeremy/Drupal_Creator_Praises_Tag1_Consul...

Back Cover
Drupal is a very flexible, modular framework often used as a content management system. This book is aimed at people who have already learned the basics of Drupal administration and theming. Through hands-on instruction, this book will take you to the next level of understanding, teaching you how to achieve optimal Drupal performance and scalability. Master Drupal's built-in performance features, and learn to scale Drupal through integration with third-party searching and caching tools. Gain a greater understanding of the underlying LAMP stack on which Drupal runs, with useful recipes and tips for monitoring and tuning Linux, Apache, MySQL and PHP. Learn performance secrets from other popular websites that continue to push Drupal to new levels, gaining insights from the problems they've experienced and how they solved them. Whether your Drupal-powered website has outgrown its current infrastructure, you want to be prepared for future growth, or you want to understand what Drupal is capable of before you commit to using it for your website, this book will be your guide.
