
WHITEPAPER

How Solid Database Planning Can Future-Proof Your Business

Introduction

Companies thrive on great ideas. A great idea for an application is often the start of an exciting and
profitable new business enterprise. Early design decisions, however, can affect the application as it
grows and changes to meet customer and business needs.

One example is insufficient planning or future-proofing of the database environment that
supports the application. After all, designing the application is the fun part; the database is often
viewed as a necessary component that functions in the background.

This approach is like building a house on a bad foundation. You can put as many windows, useful
fixtures, and finishes in the house as you want, but eventually the poor foundation will cause issues.
Building your application on a stable database backend prevents future problems that can be costly
and time-consuming to resolve. In the worst case, the lack of a strategy for dealing with data can lead to
the demise of a company.

Let’s Use the Email Address

A company was designing a new social media platform and needed to make some decisions about
data storage. They chose MySQL as their database and set about defining the schema. The
application was designed so that users would log into the platform using their
email address and password, and all interactions were associated with their email address.
Therefore, it made logical sense to choose the user’s email address as the primary key: it was
guaranteed to be unique, it could not be null, and all activity correlated to it.

Fortunately, the company was on a solid growth trajectory. This meant that their database
environment was growing rapidly. Eventually, they had about 600 database servers balancing their
database load, most of which were replaced yearly or every other year to ensure the fastest
performance possible. They were nearly doubling the number of servers every year.

While developing a five-year plan, they realized that this rate of growth and spending was not
sustainable. Given that they were normally doubling the number of servers each year, in five years
they would require almost 10,000 servers. Each server cost roughly $5,000 a year to maintain,
and the hardware would eventually need to be upgraded or replaced. This meant they were looking at
nearly $50 million in spending over the next five years, just for database hardware.

Obviously, something needed to be done to reduce this number.
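
The projection above can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes exact doubling, counts the current year as year one of the plan (so four doublings from 600 servers), and applies the $5,000 figure as a yearly cost per server. The function name project_servers and these assumptions are hypothetical, not taken from the company's own plan.

```python
def project_servers(start: int, growth: float, steps: int) -> list[int]:
    """Return the fleet size at the start and after each growth step."""
    sizes = [start]
    for _ in range(steps):
        sizes.append(round(sizes[-1] * growth))
    return sizes


ANNUAL_COST_PER_SERVER = 5_000  # dollars per server per year, from the text

# Assumption: exact doubling, four doublings over a five-year plan.
fleet = project_servers(start=600, growth=2.0, steps=4)
final_fleet = fleet[-1]
annual_cost = final_fleet * ANNUAL_COST_PER_SERVER

print(fleet)        # [600, 1200, 2400, 4800, 9600]
print(annual_cost)  # 48000000
```

Under these assumptions the fleet reaches 9,600 servers, and maintaining a fleet of that size costs $48 million a year, close to the figures quoted above. A slower growth rate or a different starting year shifts the numbers but not the conclusion.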

Character-Based Primary Key

One of the biggest issues was traced back to the decision to use the user’s email address as a
primary key.

When initially setting up a database, one of the decisions is which field to use as your
primary key. There are some seemingly obvious choices, such as customer_id for a sales
environment or employee_id for an HR database. However, attributes of that primary key field can
create storage and performance concerns that cost millions of dollars later.

The MySQL Reference Manual defines the primary key as:

The primary key for a table represents the column or set of columns that you use in your
most vital queries. It has an associated index, for fast query performance. Query
performance benefits from the NOT NULL optimizations, because it cannot include any
NULL values. With the InnoDB storage engine, the table data is physically organized to do
ultra-fast lookups and sorts based on the primary key column or columns.

If your table is big and important but does not have an obvious column or set of columns to
use as a primary key, you might create a separate column with auto-increment values to
use as the primary key. These unique IDs can serve as pointers to corresponding rows in
other tables when you join tables using foreign keys.

While this is a great description of a primary key, it leaves out a few critical details (although there are
hints hidden in the auto-increment suggestion).

Because the email address is character-based, it requires far more storage than a unique numeric ID
would.

In MySQL's InnoDB storage engine, table data is physically ordered on disk by the primary key, and every
secondary index stores the primary key value so that it can locate the row. A character field such as an
email address can take 20-40 bytes to store, while a numeric key requires only 4-8 bytes, depending on the
numeric data type chosen. This means that if you have 100M rows, 10 indexes on your table, and use an
email address with an average size of 30 bytes as your primary key, you would need roughly 31GB of index
space for the key alone. The same setup using a numeric key requires about 4GB.
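
The storage comparison above can be approximated with a back-of-the-envelope calculation. The sketch below assumes every index entry carries one copy of the primary key, adds one byte of length overhead per variable-length email value (an assumption chosen because it reproduces the 31GB figure), uses a 4-byte integer for the numeric key, and ignores page overhead and the indexed columns themselves. The helper name is hypothetical.

```python
GB = 1_000_000_000  # decimal gigabyte


def primary_key_bytes_in_indexes(rows: int, indexes: int, key_bytes: int) -> int:
    """Bytes spent storing one primary key copy per row in each index."""
    return rows * indexes * key_bytes


ROWS = 100_000_000
INDEXES = 10
EMAIL_KEY_BYTES = 30 + 1  # average email length plus an assumed 1-byte length prefix
INT_KEY_BYTES = 4         # assumed 4-byte integer key

email_gb = primary_key_bytes_in_indexes(ROWS, INDEXES, EMAIL_KEY_BYTES) / GB
int_gb = primary_key_bytes_in_indexes(ROWS, INDEXES, INT_KEY_BYTES) / GB

print(f"email key:   {email_gb:.0f} GB")  # email key:   31 GB
print(f"numeric key: {int_gb:.0f} GB")    # numeric key: 4 GB
```

Real index sizes also depend on row format, page fill, and the columns being indexed, so these numbers are only a rough guide to the relative cost of a wide key.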

Now imagine this across multiple tables, with 2-3 times the data volume. Even though disk storage is
cheap, there is still a practical limit to how much data can fit into memory. And because this company was
growing, there was no real limit to the amount of data it would record.

A Change to the Numeric Primary Key

Simply changing the primary key from the character-based email address to an automatically
incremented numeric value delivered massive benefits for this company. Performance improved by a factor
of three and data size was reduced by a factor of seven.
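
As a minimal sketch of what such a change looks like at the schema level, the example below keeps the email address as a unique, non-null login column but makes an auto-incremented integer the primary key that other tables reference. It uses Python's built-in sqlite3 module as a stand-in so that it runs anywhere; it is not MySQL, SQLite's storage layout differs from InnoDB's, and the table and column names (users, posts, user_id) are hypothetical rather than the company's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- compact numeric key
    email   TEXT NOT NULL UNIQUE                -- still unique and non-null for login
);
CREATE TABLE posts (
    post_id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER NOT NULL REFERENCES users(user_id),  -- references the numeric key
    body    TEXT NOT NULL
);
""")

# Create a user; the database assigns the numeric ID.
cur = conn.execute("INSERT INTO users (email) VALUES (?)", ("alice@example.com",))
user_id = cur.lastrowid

# Activity is associated with the numeric ID, not the email string.
conn.execute("INSERT INTO posts (user_id, body) VALUES (?, ?)", (user_id, "hello"))

# Login still looks the user up by email address.
row = conn.execute(
    "SELECT user_id FROM users WHERE email = ?", ("alice@example.com",)
).fetchone()

# The UNIQUE constraint still rejects a duplicate email address.
try:
    conn.execute("INSERT INTO users (email) VALUES (?)", ("alice@example.com",))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(row[0] == user_id, duplicate_rejected)
```

The email address keeps the properties that made it attractive as a key (unique, never null), while every reference to a user elsewhere in the schema costs only a few bytes.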

Before the change, they had 600 servers. After the change, the same data could be stored on 198 servers.
At roughly $5,000 per server per year, that is a saving of just over $2 million a year. With
fewer servers to manage, they were also able to reduce costs for power, cooling, rack space, and
maintenance.
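
The savings figure follows directly from the server counts. A short check, assuming the roughly $5,000 yearly maintenance cost per server quoted earlier applies equally to every server removed:

```python
SERVERS_BEFORE = 600
SERVERS_AFTER = 198
ANNUAL_COST_PER_SERVER = 5_000  # dollars, assumed uniform across servers

servers_removed = SERVERS_BEFORE - SERVERS_AFTER
annual_savings = servers_removed * ANNUAL_COST_PER_SERVER

print(servers_removed)  # 402
print(annual_savings)   # 2010000
```
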

While this is an impressive win, the original decision to use the email address as a primary key had already
cost the company money. Remember that they were doubling servers each year. Had this change been implemented
earlier, or at the outset, the cost savings would have been realized much sooner. The
performance improvements that came with the change could have been realized earlier as well.

There were many opportunities to proactively discover this issue and resolve it:

• During regular code deployments
• When redesigning the website
• By thoroughly reviewing why their database costs were increasing
• By proactive tuning and maintenance

Even though this company was growing and could afford the extra expense, what could they have done with
a few extra million dollars? What projects could they have started and completed? What additional revenue
streams could they have brought forward?

This is an example of how seemingly obvious decisions can have long-range consequences for an
organization. Luckily, this company could withstand the issue and survive, but the same mistake could be fatal to
another organization.

Conclusion

In the rush to get products out the door, meet deadlines, and boost sales, companies often overlook the need
to proactively maintain, tune, and optimize their infrastructure. Designing this infrastructure is often done
quickly and without thought to long-term business goals. Decisions made in the application development
process can (and do) affect how a business meets its goals, especially as the goals change with new
business objectives or business growth.

By committing to regular reviews of your infrastructure and application, as well as a regimen of proactive
activities, you can avoid wasting money, improve your ROI, and keep your business running smoothly.

Managing your organization’s database operations, on-premises or in the cloud, requires in-depth
knowledge of potential issues plus diligent, dedicated practice. Being aware of the issues above will help
protect your organization’s data-based applications when migrating your database to the cloud. It will also
significantly enhance both performance and scalability to deliver a better user experience.

To learn about how Percona can help you, contact us at 1-888-316-9775, or
0-800-051-8984 in Europe, or email sales@percona.com.
