
WHITE PAPER

Permabit Albireo
Data Optimization Software
Virtual Data Optimizer (VDO) for Linux
August 2012

Permabit Technology Corporation


One Alewife Center, Suite 410
Cambridge, MA 02140 USA
Phone: 617.252.9600
FAX: 617.252.9977
info@permabit.com
www.permabit.com

Contents
Executive Summary
Market Dynamics
  Low-End NAS Market: Big Storage is Coming to Town
  Enterprise Flash: New Solutions, New Challenges
  Cloud Computing: A Growth Opportunity
Ready-to-Run Deduplication Software
Albireo VDO Data Optimization Software
Albireo VDO Architecture
  Albireo VDO Deduplication Savings
  Albireo VDO Resource Efficiency
Why Permabit?
  Innovation
  Focus
  Expertise
Conclusion
About Permabit
Find Out More

"The Albireo technology from Permabit will save an OEM 18-24 months getting to market, if they can do it at all. This stuff is so far ahead in its capabilities and performance I can't see why you would want to do it yourself, unless you already have it baked."
Steve Duplessie
Founder & Sr. Analyst
Enterprise Strategy Group

Executive Summary
Rampant data growth is the IT industry's single most important issue, affecting budgets, operating
costs, floor space and, of course, capital expenditure through the amount of data created and its
associated cost. According to IDC, the amount of electronic data created is expected to reach 35
zettabytes by 2020 [1]. The conundrum facing every business is how to afford to store, analyze, manage
and house this data.
With recent advancements in data optimization, IT organizations are exploring this technology
as a more efficient means of housing information. In fact, another study by ESG shows that data
efficiency is the number one priority of storage professionals in IT today. As a direct result of this
increasing customer demand, storage manufacturers and online service providers are now offering
data optimization capabilities. Of the available data optimization technologies, data deduplication
is the one with the greatest potential to deliver substantial and recurring impact on the cost and
manageability of data growth.
The challenge for many of these manufacturers and providers is to incorporate deduplication
technology that can be leveraged across storage platforms while meeting market timing demands
and not derailing other high priority R&D projects. Several have made investments in deduplication
technology specifically to address problems associated with backup, only to find the resulting
solutions were not suitable for primary storage workflows or were not extendable to new
technologies.
Albireo Virtual Data Optimizer (Albireo VDO) provides the fastest route to market for deduplication
on systems running the popular Linux operating system. Powered by Permabit's Albireo Data
Optimization Software, Albireo VDO is a complete, ready-to-run solution that offers block-level
deduplication services for both Linux-based storage OEMs and online service providers.

[1] "Extracting Value from Chaos," IDC, June 2011

Market Dynamics
"There are dozens of Linux-based NAS vendors that are looking for ways to differentiate themselves from their competitors. Building the Albireo VDO technology into their devices is an excellent way to differentiate themselves and deliver a truly usable deduplication solution."
George Crump
Founder
Storage Switzerland

"It turns out that, like chocolate and peanut butter, data deduplication and SSDs combine to create a whole greater than the sum of its parts."
Howard Marks
"Data Deduplication And SSDs: Two Great Tastes That Taste Great Together"
Network Computing

Unbridled information growth is forcing all organizations to re-think their storage strategies
in light of flat IT budgets. As a result, the $25+ billion storage market is facing a sea change
that promises to reset the competitive landscape. Technologies that substantially reduce overall
storage costs are must-have requirements. OEMs that provide the highest storage efficiency and
most substantial top-line and bottom-line impacts are poised for the greatest success.

Low-End NAS Market: Big Storage is Coming to Town


The market for Linux-based OEM storage has grown 300% over a three-year period. Big Storage
providers recognize this growth as an opportunity and are moving down-market, using data
optimization technologies to deliver highly efficient storage at extremely low realized costs per GB.
Gartner defines this as the Low-end Enterprise NAS market segment, where Linux-based storage
dominates. Deduplication technology is allowing Big Storage to be increasingly competitive in this
cost-sensitive market. The challenge facing today's value-oriented, Linux-based storage OEMs is how
to continue leveraging open source to compete effectively with larger competitors that have
data optimization capabilities.

Enterprise Flash: New Solutions, New Challenges


Meanwhile, at the high end of the storage market, Enterprise Flash-based appliances have
rapidly evolved to become the performance leader in IT. An array of Enterprise Flash typically
accelerates overall application performance due to its I/O capabilities when compared to spinning
disk. Many of these appliances also provide caching capabilities that automatically move inactive
data to traditional hard drives. Mission-critical applications in the enterprise today such as database
indexing, online transaction processing, desktop and server virtualization, front-end Web serving,
and key infrastructure offerings (such as email and messaging) have been the targets of Enterprise
Flash deployments.
Over the past few years, the cost of high-performance storage has dropped significantly. The main
driver for this cost reduction has been the emergence of flash-based solid-state devices (SSDs)
in enterprise storage appliance configurations. When deduplication technologies are applied to
Enterprise Flash environments, the effective costs come even closer to those of spinning disk
storage. In addition, deduplication reduces flash write operations because less data is written
relative to the amount of data stored, which extends the life of the flash and improves data
safety.
Flash vendors are beginning to offer deduplication in these environments today, and deduplication is
the enabler that closes the cost and data safety gaps that have previously inhibited more widespread
adoption. However, deduplication technology designed for spinning disk-based primary or backup
storage does not readily apply to devices based on flash technology. Permabit Albireo for Enterprise
Flash has been designed specifically to meet the demands of these I/O intensive devices using
patented technology and algorithms that optimize resource efficiency in solid state environments.

Cloud Computing: A Growth Opportunity


As mentioned above, the amount of electronic data stored worldwide is on track to exceed 35
zettabytes by 2020, and as the data footprint expands, the process of storing and managing
information becomes more complex. By 2015, nearly 20% of this information will be touched by
(and as much as 10% maintained in) the cloud [2]. Cloud-based server and desktop Virtual Machine
(VM) providers, in particular, stand to benefit from these growth trends. Gartner estimates that 5%
of all VMs will be hosted by cloud providers by 2014 [3]. Infrastructure providers have emerged with
solutions to help manage and protect this massive pool of desktops and servers.

[2] "Extracting Value from Chaos," IDC, June 2011
[3] "Virtual Machines Will Slow in the Enterprise, Grow in the Cloud," Gartner, March 2011

Ready-to-Run Deduplication Software


Market dynamics justify storage manufacturer and service provider efforts to develop or
integrate comprehensive, sub-file-level deduplication capabilities into their existing single-tier
storage solutions, while providing a viable roadmap to tomorrow's universal storage solutions.
The overarching requirement is for data optimization to increase storage efficiency without
incurring a performance penalty. All differentiating features of the storage platform must remain
intact, with no compromises in functionality, data ingestion, or data access performance.
Key requirements involve the following areas:
Performance: Data optimization must be extremely efficient and maintain a level of
performance that does not impede overall storage performance on read and write operations.
Storage vendors have made billion-dollar R&D investments to optimize their storage
performance as a means of differentiating their offerings.
Feature Set Compatibility: Data optimization software must operate in conjunction with
existing storage software and not interfere with or impede existing features. Storage vendors
have invested millions into storage features that are vital to the operation and market value of
their respective storage solutions.
Resource Efficiency: Cost is king, particularly in the Low-end NAS appliance space. Accordingly,
data optimization software cannot add resource requirements that drive up that cost.

Albireo VDO Data Optimization Software


Albireo VDO provides ready-to-run data deduplication capabilities for Linux-based storage, enabling
OEMs to continue leveraging all of their storage solutions' existing features, including existing Linux
file systems, storage virtualization features, and data protection capabilities. Because Albireo VDO
uses Permabit's patented Albireo deduplication technology, it avoids the costs associated with
today's high-end enterprise deduplication solutions, which typically require large amounts of system
memory and proprietary PCI Express cards to achieve even a fraction of Albireo's scalability and
performance.
Albireo's high-performance data deduplication provides a truly competitive feature set for mixed
applications and use cases. Albireo VDO's straightforward block-level, content-agnostic approach to
data optimization provides an effortless solution that is both transparent and non-disruptive to
end-user customers. With Albireo's record-breaking performance, Linux-based storage OEMs can extend
their deduplication capabilities and out-compete even the high-end proprietary storage players
by providing data optimization capabilities for mission-critical application storage while effectively
leveraging Linux open source to maximize value. Because Albireo VDO is built on the
Linux device-mapper, it is well suited to Linux-based storage providers who wish
to leverage their existing Linux integration investments, increase margins, and accelerate
time-to-market with leading-edge data optimization.

Albireo VDO Architecture


The Albireo index provides the foundation for the Albireo VDO solution. The single greatest
challenge when implementing a deduplication system is in rapidly identifying duplicate information
across a storage pool that can contain hundreds of billions of items. To achieve acceptable levels of
performance the system must, for each new piece of data, quickly determine if that piece is identical
to any previously stored piece of data. If a match is found, the storage system can then internally
reference the existing item to avoid storing the same information a second time. The Albireo Index
Engine can identify duplicates across large storage pools in memory more than 99.95% of the
time, eliminating the largest deduplication bottleneck: disk-based fetches. Index lookup averages
just 5 microseconds on flash or 10 microseconds on traditional hard disk-based storage, orders
of magnitude faster than other deduplication solutions. This enables Albireo VDO to support
sustainable ingestion rates of over 1 GB/sec with a single 6-core processor.
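The duplicate check described above can be illustrated with a minimal sketch. It is not the Albireo index, whose internal design is not described in this paper; it assumes a plain in-memory dictionary keyed by SHA-256 fingerprints of fixed 4 KB blocks, and the names `DedupIndex` and `lookup_or_insert` are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size (4 KB)


class DedupIndex:
    """Toy in-memory index: block fingerprint -> physical block number."""

    def __init__(self):
        self._by_fingerprint = {}
        self._next_physical = 0

    def lookup_or_insert(self, block: bytes):
        """Return (physical_block_number, is_duplicate) for one block."""
        fingerprint = hashlib.sha256(block).digest()
        existing = self._by_fingerprint.get(fingerprint)
        if existing is not None:
            # Seen before: reference the stored copy instead of storing again.
            return existing, True
        physical = self._next_physical
        self._next_physical += 1
        self._by_fingerprint[fingerprint] = physical
        return physical, False


index = DedupIndex()
first = index.lookup_or_insert(b"A" * BLOCK_SIZE)   # new block
second = index.lookup_or_insert(b"B" * BLOCK_SIZE)  # new block
repeat = index.lookup_or_insert(b"A" * BLOCK_SIZE)  # duplicate of the first
```

A production index has to hold billions of entries in limited memory, which a plain dictionary does not attempt; the sketch only shows the lookup-then-reference decision.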

The Albireo VDO Linux kernel module is built on the Linux device-mapper. In the Linux kernel,
the device-mapper serves as a generic framework to map one block device onto another. It forms
the foundation of LVM2 and EVMS, software RAID, dm-crypt disk encryption, and additional features
such as file system snapshots. Device-mapper works by processing data passed in from a virtual
block device, in this case Albireo VDO, and then passing the resultant data on to another block device.
Figure 1: Albireo VDO Architecture

Minimum System Requirements
CPU Architecture: 64-bit x86
RAM: 350 MB
Disk Space: 55 GB
Linux Distribution: Debian, SuSE, Red Hat, CentOS, Ubuntu

Albireo VDO can run in asynchronous or synchronous mode:

When running in asynchronous mode, Albireo deduplication technology works in-line to
find duplicates and then writes only the unique blocks to underlying storage. As part of the
asynchronous data flow, VDO supports block-layer flush commands by persisting metadata
to ensure file system integrity. VDO also protects data integrity across unclean shutdowns by
ensuring that no more than 5 seconds of data is lost as a result of an unexpected system crash.
In synchronous mode, new blocks are always written to the underlying storage device first,
before Albireo checks for duplicates, to provide the highest level of data integrity. When VDO
receives deduplication advice from the Albireo Index, it removes duplicate blocks from storage.
In all cases, data optimization through block-level deduplication increases the overall capacity of the
underlying device.
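The difference between the two orderings can be sketched with a toy model. This is an illustration only, not VDO's implementation: `ToyVolume` and its method names are hypothetical, the index is a plain dictionary of SHA-256 fingerprints, and overwrites, reference counts, flushes, and crash recovery are all left out. `write_dedupe_first` follows the order described for asynchronous mode (check first, store only unique blocks), and `write_store_first` follows the order described for synchronous mode (store first, then drop the copy if it turns out to be a duplicate).

```python
import hashlib


class ToyVolume:
    """Toy model: logical address -> physical slot, duplicates share a slot."""

    def __init__(self):
        self.physical = {}   # slot -> block bytes
        self.mapping = {}    # logical block address -> slot
        self.index = {}      # fingerprint -> slot
        self._next_slot = 0

    def _store(self, block):
        slot = self._next_slot
        self._next_slot += 1
        self.physical[slot] = block
        return slot

    def write_dedupe_first(self, lba, block):
        """Check the index first; store the block only if it is unique."""
        fingerprint = hashlib.sha256(block).digest()
        slot = self.index.get(fingerprint)
        if slot is None:
            slot = self._store(block)
            self.index[fingerprint] = slot
        self.mapping[lba] = slot

    def write_store_first(self, lba, block):
        """Store the block first; remove it again if it is a duplicate."""
        slot = self._store(block)
        self.mapping[lba] = slot
        fingerprint = hashlib.sha256(block).digest()
        known = self.index.get(fingerprint)
        if known is None:
            self.index[fingerprint] = slot
        else:
            self.mapping[lba] = known
            del self.physical[slot]  # reclaim the redundant copy


blocks = [b"x" * 4096, b"y" * 4096, b"x" * 4096]
dedupe_first, store_first = ToyVolume(), ToyVolume()
for lba, block in enumerate(blocks):
    dedupe_first.write_dedupe_first(lba, block)
    store_first.write_store_first(lba, block)
```

Both volumes end up holding two physical blocks for three logical writes; the orderings differ only in whether the duplicate ever reaches storage before it is removed.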
In addition to deduplication, Albireo VDO provides thin provisioning services for Linux. Thin
provisioning allocates physical volume or file system capacity as applications write data, rather
than pre-allocating all physical capacity at the time of provisioning. This allows space savings to be
realized from the deduplication process, effectively making more virtual space accessible than is
physically available.
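As a rough illustration of thin provisioning, the sketch below allocates a physical block only when a logical block is first written. The class `ThinVolume` and its sizes are hypothetical and unrelated to VDO's actual on-disk layout.

```python
class ThinVolume:
    """Toy thin-provisioned volume: physical blocks are assigned on first write."""

    def __init__(self, logical_blocks, physical_blocks):
        self.logical_blocks = logical_blocks
        self.physical_blocks = physical_blocks
        self.allocated = {}  # logical block -> physical block

    def write(self, logical_block):
        """Return the physical block backing logical_block, allocating if needed."""
        if not 0 <= logical_block < self.logical_blocks:
            raise IndexError("logical block out of range")
        if logical_block not in self.allocated:
            if len(self.allocated) >= self.physical_blocks:
                raise RuntimeError("out of physical space")
            self.allocated[logical_block] = len(self.allocated)
        return self.allocated[logical_block]

    def physical_in_use(self):
        return len(self.allocated)


# The volume advertises 1000 logical blocks but owns only 100 physical ones.
vol = ThinVolume(logical_blocks=1000, physical_blocks=100)
vol.write(0)
vol.write(999)
vol.write(0)  # rewriting an already-allocated block consumes nothing new
```

Only two physical blocks are consumed here even though the addresses span the whole logical range, which is what lets the advertised capacity exceed the physical capacity.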

Albireo VDO Deduplication Savings


Deduplication savings are highly dependent on the way that data is used (workflow) as
well as the type of data being processed. Albireo has been tested on a wide range of popular
data types, including common office productivity files, backups, and VMware system images
(Table 1). Albireo achieved the best results with VMware images, with a deduplication rate as
high as 99%. Excellent results were also achieved with Exchange data and office files: Albireo
reduced Exchange data by 86% and office files by 33%. Across the board, Albireo deduplication
delivers massive cost savings.
Table 1: Albireo VDO Deduplication Savings

Sample Data                                        Dedupe Rate (4 KB Chunks)
User Directories, Fixed Chunk                      2.8 : 1
User Directories, Variable Chunk                   3.9 : 1
Tar Backups, Fixed Chunk with LZ77 compression     25.1 : 1
VMware Images, Fixed Chunk                         36.3 : 1
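A deduplication ratio can be converted to a percentage of space saved with the standard relation savings = 1 - 1/ratio. The short sketch below applies it to the ratios in Table 1; the function name `savings_percent` is illustrative.

```python
def savings_percent(ratio):
    """Space saved, in percent, for a deduplication ratio of ratio:1."""
    if ratio < 1:
        raise ValueError("a deduplication ratio cannot be below 1:1")
    return (1 - 1 / ratio) * 100


table_1 = {
    "User Directories, Fixed Chunk": 2.8,
    "User Directories, Variable Chunk": 3.9,
    "Tar Backups, Fixed Chunk with LZ77 compression": 25.1,
    "VMware Images, Fixed Chunk": 36.3,
}

for name, ratio in table_1.items():
    print(f"{name}: {ratio}:1 -> {savings_percent(ratio):.1f}% saved")
```

For example, 2.8 : 1 corresponds to about 64.3% of space saved and 36.3 : 1 to about 97.2%.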

Albireo VDO Resource Efficiency


Albireo VDO requires a single, dedicated Intel (or compatible) CPU core, 350 MB of memory, and
52 GB of disk space to address deduplication requirements for a 1 TB storage partition. Efficiency is
improved for larger configurations. For example, 32 GB of memory can be used to support a 256 TB
storage partition (0.13 GB of RAM/TB of disk). Albireo VDO also requires 42 GB of physical storage for
indexing along with 1 GB of physical storage per TB of logical storage for handling metadata.
Table 2: Albireo VDO Resource Efficiency

Physical Capacity    Logical Capacity    Memory Requirements    Disk Requirements*
1 TB                 10 TB               0.35 GB                55 GB
16 TB                160 TB              2.1 GB                 247 GB
64 TB                640 TB              8.4 GB                 859 GB
256 TB               2.5 PB              32 GB                  3.2 TB

*Assumes 10x logical storage
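The memory efficiency trend in Table 2 can be checked with simple arithmetic: dividing each memory requirement by its physical capacity gives the RAM needed per TB of disk. The sketch below uses only the figures from the table; the variable names are illustrative.

```python
# (physical capacity in TB, memory requirement in GB), taken from Table 2
table_2 = [(1, 0.35), (16, 2.1), (64, 8.4), (256, 32)]

# GB of RAM needed per TB of physical capacity
ram_per_tb = {physical_tb: memory_gb / physical_tb
              for physical_tb, memory_gb in table_2}

for physical_tb, gb_per_tb in ram_per_tb.items():
    print(f"{physical_tb} TB: {gb_per_tb:.3f} GB of RAM per TB")
```

The per-TB figure falls from 0.35 GB at 1 TB to 0.125 GB at 256 TB, which matches the roughly 0.13 GB of RAM per TB quoted in the text.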

Why Permabit?
Innovation
Only Permabit Albireo VDO enables OEMs to rapidly deliver high performance deduplication for
Linux-based storage solutions. Albireo VDO is a plug-and-play OEM solution that flexibly integrates
within the constraints of existing storage architectures and leverages existing significant R&D
investments.

Focus
Permabit is an expert in the development of highly scalable, next-generation storage solutions
that deploy full inline data deduplication. By offering the industry's first embedded OEM data
optimization solution, Permabit is enabling Linux-based storage OEMs to compete effectively with
breakthrough technology. Emerging storage vendors can capitalize on this major market shift by
introducing new storage solutions that take market share away from incumbents. Leading storage
vendors can leverage Albireo to further solidify their market position.

Expertise
The Permabit track record in storage expertise and innovation is without peer for a company of
its age and size. Permabit has a total of 37 patents filed and 28 patents granted, all in the storage-related field. Its MIT-educated engineers have earned multiple awards for product innovation. Since
2000, Permabit has worked to develop the latest storage technology to address the challenges of
highly scalable storage. With the release of Albireo, Permabit has made its core intellectual property
for data optimization available for the first time as an OEM offering to other manufacturers and
service providers. The Albireo architecture is a proven technology that has been implemented in
production environments as a core technology in the Permabit Enterprise Archive and Cloud Storage.

Conclusion
Data centers are dealing with explosive data growth and flat budgets. As a result,
IT organizations are making storage purchase decisions based on storage efficiency
and total storage costs rather than simply buying cheap capacity. Storage vendors and online
service providers who want to grow and flourish in today's business environment must adapt their
existing storage solutions and/or introduce new offerings that provide greater storage efficiency
and reduced operating cost.

Albireo (al-BEER-ee-oh) appears to the naked eye to be a single star but can be resolved with a telescope into a double star, consisting of a brighter yellow star and a fainter blue star.

Albireo VDO delivers advanced data optimization technology that substantially reduces effective
storage costs without sacrificing existing storage features. Once deployed, Albireo VDO provides
unsurpassed performance and exceptional deduplication efficiency. In addition, the process of
storage allocation is greatly simplified with the introduction of thin provisioning capabilities.
By delivering a virtual block device that "just works" out of the box with existing file systems
and data management features, Albireo VDO offers the fastest possible route to market for
Linux-based storage OEMs, both manufacturers and online service providers. Whether the OEM
is delivering NAS, SAN, or unified storage solutions based on traditional hard disks or flash-based storage, Albireo VDO provides the ideal ready-to-run data efficiency solution with leading
capabilities for powerful competitive differentiation.

About Permabit
Permabit is a recognized leader in data efficiency technology. We enable OEMs to leverage their
R&D investment, increase margin, accelerate time to market and achieve competitive advantage.
Permabit Albireo software massively improves performance and efficiency of data creation,
transmission and storage. Solutions built with Albireo are being delivered by leading hardware,
software and service providers.

Find Out More


To learn more about the Permabit Albireo technology, or to license our products, visit our website
at www.permabit.com or call us directly at 617.252.9600.

© 2012 Permabit Technology Corporation. All Rights Reserved. Permabit is a registered trademark and the Permabit logo, Albireo logo, Permabit Enterprise
Archive, and Scalable Data Reduction are trademarks of the Permabit Technology Corporation. All other products or services mentioned may be covered by
registered trademarks, trademarks, service marks, or product names as designated by the companies who market those products.
