
SAP HANA Overview

“Significant shifts in market share and fortunes occur not because companies try to play the game
better than the competition but because they change the rules of the game”

— Constantinos Markides1

Every industry has a certain set of “rules” that govern the way the companies in that industry operate.
The rules might be adjusted from time to time as the industry matures, but the general rules stay
basically the same — unless some massive disruption occurs that changes the rules or even the entire
game. SAP HANA is one of those massively disruptive innovations for the enterprise IT industry.

To understand this point, consider that you’re probably reading this book on an e-reader, which is a
massively disruptive innovation for the positively ancient publishing industry. The book industry has
operated under the same basic rules since Gutenberg mechanized the production of books in 1440.
There were a few subsequent innovations within the industry, primarily in the distribution chain, but the
basic processes of writing a book, printing it, and reading it remained largely unchanged for several
hundred years. That is — until Amazon and Apple came along and digitized the production, distribution,
and consumption of books. These companies are also starting to revolutionize the writing of books by
providing new authoring tools that make the entire process digital and paper-free. This technology
represents an overwhelming assault of disruptive innovation on a 500+ year-old industry in less than 5
years.

Today, SAP HANA is disrupting the technology industry in much the same way that Amazon and Apple
have disrupted the publishing industry. Before we discuss how this happens, we need to consider a few
fundamental rules of that industry.

The IT Industry: A History of Technology Constraints

Throughout the history of the IT industry, the capabilities of applications have always been constrained
to a great degree by the capabilities of the hardware that they were designed to run on. This explains
the “leapfrogging” behavior of software and hardware products, where a more capable version of an
application is released shortly after a newer, more capable generation of hardware — processors,
storage, memory, and so on — is released. For example, each version of Adobe Photoshop was designed to exploit the most capable hardware available at the time in order to achieve optimal performance.
Rendering a large image in Photoshop 10 years ago could take several hours on the most powerful PC. In
contrast, the latest version, when run on current hardware, can perform the same task in just a couple
of seconds, even on a low-end PC.

Enterprise software has operated on a very similar model. In the early days of mainframe systems, all of
the software — specifically, the applications, operating system, and database — was designed to
maximize the hardware resources located inside the mainframe as a contained system. The
transactional data from the application and the data used for reporting were physically stored in the
same system. Consequently, you could either process transactions or process reports, but you couldn’t
do both at the same time or you’d kill the system. Basically, the application could use whatever
processing power was in the mainframe, and that was it. If you wanted more power, you had to buy a
bigger mainframe.

The Database Problem: Bottlenecks

When SAP R/3 came out in 1992, it was designed to take advantage of a new hardware architecture —
client-server — where the application could be run on multiple, relatively cheap application servers
connected to a larger central database server. The major advantage of this architecture was that, as
more users performed more activities on the system, you could just add a few additional application
servers to scale out application performance. Unfortunately, the system still had a single database
server, so transmitting data from that server to all the application servers and back again created a huge
performance bottleneck.

Eventually, the ever-increasing requests for data from so many application servers began to crush even the largest database servers. The problem wasn't that the servers lacked sufficient processing power. Rather, the requests from the application servers got stuck in the same input/output (I/O) bottleneck
trying to get data in and out of the database. To address this problem, SAP engineered quite a few
“innovative techniques” in their applications to minimize the number of times applications needed to
access the database. Despite these innovations, however, each additional database operation continued
to slow down the entire system.

This bottleneck was even more pronounced when it came to reporting data. The transactional data — known as online transaction processing (OLTP) data — from documents such as purchase orders and production orders were stored in multiple locations within the database. The application would read a
small quantity of data when the purchasing screen was started up, the user would input more data, the
app would read a bit more data from the database, and so on, until the transaction was completed and
the record was updated for the last time. Each transactional record by itself doesn’t contain very much
data. When you have to run a report across every transaction in a process for several months, however,
you start dealing with huge amounts of data that have to be pulled through a very slow “pipe” from the
database to the application.

To create reports, the system must read multiple tables in the database all at once and then sort the
data into reports. This process requires the system to pull a massive amount of data from the database,
which essentially prevents users from doing anything else in the system while it’s generating the report.
To resolve this problem, companies began to build separate online analytical processing (OLAP) systems such as SAP Business
Warehouse to copy the transaction data over to a separate server and offload all that reporting activity
onto a dedicated “reporting” system. This arrangement would free up resources for the transactional
system to focus on processing transactions.

Unfortunately, even though servers were getting faster and more powerful (and cheaper), the
bottleneck associated with obtaining data from the disk wasn’t getting better; in fact, it was actually
getting worse. As more processes in the company were being automated in the transactional system, it
was producing more and more data, which would then get dumped into the reporting system. Because
the reporting system contained more, broader data about the company’s operations, more people
wanted to use the data, which in turn generated more requests for reports from the database under the
reporting system. Of course, as the number of requests increased, the quantities of data that had to be
pulled correspondingly increased. You can see how this vicious (or virtuous) cycle can spin out of control
quickly.

The Solution: In-Memory Architecture


This was the reality that SAP was seeing at its customers at the beginning of the 2000s. SAP R/3 had been hugely successful, and customers were generating dramatically increasing quantities of data. SAP had also just released SAP NetWeaver2, which added extensive internet and integration capabilities to its applications. SAP NetWeaver brought many new users and disparate systems into contact with the applications in the SAP landscape. Again, the greater the number of users, the greater the number of
application servers that flooded the database with requests. Similarly, as the amount of operational data
in the SAP Business Warehouse database increased exponentially, so did the number of requests for
reports. Looking forward, SAP could see this trend becoming even more widespread and the bottleneck
of the database slowing things down more and more. SAP was concerned that customers who had
invested massive amounts of time and money into acquiring and implementing these systems to make
their businesses more productive and profitable would be unable to get maximum value from them.

Fast forward a few years, and now the acquisitions of Business Objects and Sybase were generating
another exponential increase in demands for data from both the transactional and analytic databases
from increasing numbers of analytics users and mobile users. Both the volume of data and the volume
of users requesting data were now growing thousands of times faster than the improvements in
database I/O.

Having become aware of this issue, in 2004 SAP initiated several projects to innovate the core
architecture of their applications to eliminate this performance bottleneck. The objective was to enable
their customers to leverage the full capabilities of their investment in SAP while avoiding the data
latency issues. The timing couldn’t have been better. It was around this time that two other key factors
were becoming more significant: (1) internet use and the proliferation of data from outside the
enterprise, and (2) the regulatory pressures on corporations, generated by laws such as Sarbanes-Oxley,
to be answerable for all of their financial transactions. These requirements increased the pressure on
already stressed systems to analyze more data more quickly. The SAP projects resulted in the delivery of
SAP HANA in 2011, the first step in the transition to a new in-memory architecture for enterprise
applications and databases. SAP HANA flips the old model on its head and converts the database from
the “boat anchor” that slows everything down into a “jet engine” that speeds up every aspect of the
company’s operations.

SAP’s Early In-Memory Projects

SAP has a surprisingly long history of developing in-memory technologies to accelerate its applications.
Because disk I/O has been a performance bottleneck since the beginning of three-tier architecture, SAP
has constantly searched for ways to avoid or minimize the performance penalty that customers pay
when they pull large data sets from disk. So, SAP’s initial in-memory technologies were used for very
specific applications that contained complex algorithms that needed a great deal of readily accessible
data.

SAP In-Memory Technology Evolution

However, passing a benchmark and running tests in the labs are far removed from the level of
scalability and reliability needed for a database to become the mission-critical heart of a Fortune 50
company. So, for the next four years, SAP embarked on a “bullet-proofing” effort to evolve the “project”
into a “product”.

In May 2010, Hasso Plattner, SAP’s supervisory board chairman and chief software advisor, announced
SAP’s vision for delivering an entirely in-memory database layer for its application portfolio. If you
haven’t seen his keynote speech, it’s worth watching. If you saw it when he delivered it, it’s probably
worth watching again. It’s Professor Plattner at his best.

Different Game, Different Rules: SAP HANA

One year later, SAP announced the first live customers on SAP HANA and that SAP HANA was now
generally available. SAP also introduced the first SAP applications that were being built natively on top
of SAP HANA as an application platform. Not only did these revelations shock the technology world into
the “new reality” of in-memory databases, but they initiated a massive shift for both SAP and its
partners and customers into the world of “real-time business”.

In November 2011, SAP achieved another milestone when it released SAP Business Warehouse 7.3. SAP
had renovated this software so that it could run natively on top of SAP HANA. This development sent
shockwaves throughout the data warehousing world because almost every SAP Business Warehouse
customer could immediately3 replace their old, disk-based database with SAP HANA. What made this
new architecture especially attractive was the fact that SAP customers did not have to modify their
current systems to accommodate it. To make the transition as painless as possible for its customers, SAP
designed Business Warehouse 7.3 to be a non-disruptive innovation.

Innovation without Disruption

Clay Christensen’s book The Innovator’s Dilemma was very popular reading among the Tracker team
during the early days. In addition to all the technical challenges of building a completely new enterprise-
scale database from scratch on a completely new hardware architecture, SAP also had to be very
thoughtful about how its customers would eventually adopt such a fundamentally different core
technology underneath the SAP Business Suite.

To accomplish this difficult balancing act, SAP’s senior executives made the team’s primary objective
the development of a disruptive technology innovation that could be introduced into SAP’s customers’
landscapes in a non-disruptive way. They realized that even the most incredible database would be
essentially useless if SAP’s customers couldn’t make the business case to adopt it because it was too
disruptive to their existing systems. The team spoke, under NDA, with the senior IT leadership of several of SAP's largest customers to understand the concerns they would have about such a monumental technology shift at the bottom of their "stacks." The customers provided some
valuable guidelines for how SAP should engineer and introduce such a disruptive innovation into their
mission-critical landscapes. Making that business case involved much more than just the eye-catching
“speeds and feeds” from the raw technology. SAP’s customers would switch databases only if the new
database was minimally disruptive to implement and extremely low risk to operate. In essence, SAP
would have to build a hugely disruptive innovation to the database layer that could be adopted and
implemented by its customers in a non-disruptive way at the business application layer.

The Business Impact of a New Architecture

When viewed from a holistic perspective, the entire “stack” needed to run a Fortune 50 company is
maddeningly complex. So, to engineer a new technology architecture for a company, you first have to
focus on WHAT the entire system has to do for the business. At its core, the new SAP database
architecture was created to help users run their business processes more effectively4. It had to enable
them to track their inventory more accurately, sell their products more effectively, manufacture their products more efficiently, and purchase materials more economically. At the same time, however, it also had
to reduce the complexity and costs of managing the landscape for the IT department.

Today, every business process in a company has some amount of “latency” associated with it. For
example, one public company might require 10 days to complete its quarterly closing process, while its
primary competitor accomplishes this task in 5 days — even though both companies are using the same
SAP software to manage the process. Why does it take one company twice as long as its competitor to
complete the same process? What factors contribute to that additional “process latency”?

The answers lie in the reality that the software is simply the enabler for the execution of the business
process. The people who have to work together to complete the process, both inside and outside the
company, often have to do a lot of “waiting” both during and between the various process steps. Some
of that waiting is due to human activities, such as lunch breaks or meetings. Much of it, however, occurs
because people have to wait while their information systems process the relevant data. The old saying
that “time is money” is still completely true, and “latency” is just a nice way of saying “money wasted
while waiting.”

As we discussed earlier, having to wait several minutes or several hours or even several days to obtain
an answer from your SAP system is a primary contributor to process latency. It also discourages people
from using the software frequently or as it was intended. Slow-performing systems force people to take
more time to complete their jobs, and they result in less effective use of all the system’s capabilities.
Both of these factors introduce latency into process execution.

Clearly, latency is a bad thing. Unfortunately, however, there’s an even darker side to slow systems.
When businesspeople can’t use a system to get a quick response to their questions or get their job done
when they need to, they invent workarounds to avoid the constraint. The effort and costs spent on
“inventing” workarounds to the performance limitations of the system waste a substantial amount of
institutional energy and creativity that ideally should be channeled into business innovation. In
addition, workarounds can seriously compromise data quality and integrity.

As we have discussed, the major benefits of in-memory storage are that users no longer have to wait
for the system, and the information they need to make more intelligent decisions is instantly available at
their fingertips. Thus, companies that employ in-memory systems are operating in “real time.”
Significantly, once you remove all of the latency from the systems, users can focus on eliminating the
latency in the other areas of the process. It’s like shining a spotlight on all the problem areas of the
process now that the system latency is no longer clouding up business transparency.

The Need for Business Flexibility

In addition to speeding up database I/O throughput and simplifying the enterprise system architecture,
SAP also had to innovate in a third direction: business flexibility. Over the years, SAP had become adept
at automating “standard” business processes for 24 different industries globally. Despite this progress,
however, new processes were springing up too fast to count. Mobile devices, cloud applications, and big
data scenarios were creating a whole new set of business possibilities for customers. SAP’s customers
needed a huge amount of flexibility to modify, extend, and adapt their core business processes to reflect
their rapidly changing business needs. In 2003, SAP released their service-oriented architecture, SAP
NetWeaver, and began to renovate the entire portfolio of SAP apps to become extremely flexible and
much easier to modify. However, none of that flexibility was going to benefit their customers if the
applications and platform that managed those dynamic business processes were chained to a slow,
inflexible, and expensive database.

The only way out of this dilemma was for SAP to innovate around the database problem entirely. None
of the existing database vendors had any incentive to change the status quo (see The Innovator’s
Dilemma for all the reasons why), and SAP couldn’t afford to sit by and watch these problems continue
to get worse for their customers. SAP needed to engineer a breakthrough innovation in in-memory
databases to build the foundations for a future architecture that was faster, simpler, more flexible, and
much cheaper to acquire and operate. It was one of those impossible challenges that engineers and
business people secretly love to tackle, and it couldn’t have been more critical to SAP’s future success.

Faster, Better, Cheaper

There’s another fundamental law of the technology industry: Faster, Better, Cheaper. That is, each new
generation of product or technology has to be faster, better, and cheaper than the generation it is
replacing, or customers won’t purchase it. Geoffrey Moore has some great thoughts on how game-
changing technologies “cross the chasm.” He maintains, among other things, that faster, better, and
cheaper are fundamental characteristics that must be present for a successful product introduction.

In-memory computing fits the faster, better, cheaper model perfectly. I/O is thousands to millions of times faster on RAM than on disk; there's really no comparison between how quickly you can get data out of a database in RAM and out of a database on disk. In-memory databases are a better architecture due to
their simplicity, tighter integration with the apps, hybrid row/column store, and ease of operations.
Finally, when you compare the cost of an in-memory database to that of a disk-based database on the
appropriate metric — cost per terabyte per second — in-memory is actually cheaper. Also, when you
compare the total cost of ownership (TCO) of in-memory databases, they’re even more economical to
operate than traditional databases due to the reduction of superfluous layers and unnecessary tasks.

But faster, better, cheaper is about more than just the raw technology. If you really look at
what the switch from an “old” platform to a “new” platform can do for overall usability of the solutions
on top of the platform, there are some amazing possibilities.

In-Memory Basics

Thus far, we’ve focused on the transition to in-memory computing and its implications for IT. With this
information as background, we next “dive into the deep end” of SAP HANA. Before we do so, however,
here are a few basic concepts about in-memory computing that you’ll need to understand. Some of
these concepts might be similar to what you already know about databases and server technology.
There are also some cutting-edge concepts, however, that merit discussion.

Storing data in memory isn’t a new concept. What is new is that now you can store your whole
operational and analytic data entirely in RAM as the primary persistence layer5. Historically, database systems were designed to perform well on computer systems with limited RAM. As we have seen, in these systems slow disk I/O was the main bottleneck in data throughput. Today, multi-core CPUs — multiple CPUs located on one chip or in one package — are standard, with fast communication between processor cores enabling parallel processing. Currently, server processors have up to 64 cores, and 128
cores will soon be available. With the increasing number of cores, CPUs are able to process increased
data volumes in parallel. Main memory is no longer a limited resource. In fact, modern servers can have
2TB of system memory, which allows them to hold complete databases in RAM. Significantly, this
arrangement shifts the performance bottleneck from disk I/O to the data transfer between CPU cache
and main memory (which is already blazing fast and getting faster).

In a disk-based database architecture, there are several levels of caching and temporary storage to keep
data closer to the application and avoid excessive numbers of round-trips to the database (which slows
things down). The key difference with SAP HANA is that all of those caches and layers are eliminated
because the entire physical database is literally sitting on the motherboard and is therefore in memory
all the time. This arrangement dramatically simplifies the architecture.

It is important to note that there are quite a few technical differences between a database that was
designed to be stored on a disk versus one that was built to be entirely resident in memory. There’s a
techie book6 on all those conceptual differences if you really want to get down into the details. What
follows here is a brief summary of some of the key advantages of SAP HANA over its aging disk-based
cousins.

Pure In-Memory Database

With SAP HANA, all relevant data are available in main memory, which avoids the performance penalty
of disk I/O completely. Disk or solid-state drives are still required for permanent persistence in the event of a power failure or some other catastrophe. This doesn't slow down performance, however,
because the required backup operations to disk can take place asynchronously as a background task.1
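
To make the write-behind idea concrete, here is a minimal Python sketch (not SAP HANA's actual persistence layer) in which the in-memory structure is the primary copy of the data and a background thread appends each change to a log file asynchronously. The class name and log file are hypothetical and exist only for illustration.

```python
# Minimal sketch of asynchronous persistence: reads and writes are served
# from RAM, while durability is handled by a background log-writer thread.
import json
import queue
import threading

class InMemoryStore:
    def __init__(self, log_path="changelog.jsonl"):
        self.data = {}                       # primary copy lives in RAM
        self._pending = queue.Queue()        # changes waiting to be logged
        self._log_path = log_path
        threading.Thread(target=self._write_log_forever, daemon=True).start()

    def put(self, key, value):
        self.data[key] = value               # the caller returns immediately
        self._pending.put((key, value))      # durability happens in the background

    def _write_log_forever(self):
        with open(self._log_path, "a") as log:
            while True:
                key, value = self._pending.get()
                log.write(json.dumps({"key": key, "value": value}) + "\n")
                log.flush()

store = InMemoryStore()
store.put("order-4711", {"status": "open"})
print(store.data["order-4711"])              # read served purely from memory
```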

Main Features:

SAP HANA Architecture

SAP HANA is an in-memory data platform that can be deployed on premise or on demand. At its core, it is an innovative in-memory relational database management system. SAP HANA can make full use of the capabilities of current hardware to increase application performance, reduce the cost of ownership, and enable new scenarios and applications that were not previously possible. With SAP HANA, you can build applications that integrate the business control logic and the database layer with unprecedented performance. For a developer, one of the key questions is how to minimize data movements: the more you can do directly on the data in memory, next to the CPUs, the better the application will perform. This is the key to development on the SAP HANA data platform.

SAP HANA In-Memory Database

SAP HANA runs on multi-core CPUs with fast communication between processor cores and with terabytes of main memory. With SAP HANA, all data is available in main memory, which avoids the performance penalty of disk I/O. As noted above, disk or solid-state drives are still required for permanent persistence in the event of a power failure or some other catastrophe, but this does not slow down performance, because the required backup operations to disk take place asynchronously as a background task.
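
The "minimize data movements" principle above can be illustrated with a short, hypothetical Python sketch. It uses the standard sqlite3 module purely as a stand-in data engine (it is not SAP HANA and not its programming model); the point is simply that letting the engine aggregate in place moves one result row to the application instead of every raw row.

```python
# Contrast "pull all rows to the application" with "push the calculation
# down to the data engine" using an in-memory SQLite table as a stand-in.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EMEA", 120.0), ("EMEA", 80.0), ("APJ", 200.0)])

# Data-to-code: every row crosses from the engine into the application.
total_app_side = sum(amount for _, amount in conn.execute("SELECT * FROM sales"))

# Code-to-data: the engine aggregates and only a single result row moves.
(total_db_side,) = conn.execute("SELECT SUM(amount) FROM sales").fetchone()

assert total_app_side == total_db_side == 400.0
```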

Columnar Data Storage

A database table is conceptually a two-dimensional data structure organized in rows and columns. Computer memory, in contrast, is organized as a linear structure. A table can therefore be represented in row order or column order. A row-oriented organization stores a table as a sequence of records. Conversely, in column storage the entries of a column are stored in contiguous memory locations. SAP HANA supports both, but it is particularly optimized for column-order storage.

Columnar data storage allows highly efficient compression. If a column is sorted, there are often repeated adjacent values. SAP HANA employs highly efficient compression methods, such as run-length encoding, cluster encoding, and dictionary encoding. With dictionary encoding, columns are stored as sequences of bit-coded integers. That means that a check for equality can be executed on the integers; for example, during scans or join operations. This is much faster than comparing, for example, string values.

Columnar storage, in many cases, eliminates the need for additional index structures. Storing data in columns is functionally similar to having a built-in index for each column. The column scanning speed of the in-memory column store and the compression mechanisms – especially dictionary compression – allow read operations with very high performance. In many cases, additional indexes are not required. Eliminating additional indexes reduces complexity and eliminates the effort of defining and maintaining metadata.
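
To make dictionary encoding and run-length encoding concrete, here is a minimal Python sketch. It is a simplified illustration of the general techniques, not SAP HANA's actual implementation (a real column store works with bit-packed integers and far more sophisticated structures).

```python
# A column is stored as contiguous integer keys into a sorted dictionary,
# so an equality scan compares small integers instead of strings.

def dictionary_encode(column):
    """Replace each value with an integer key into a sorted dictionary."""
    dictionary = sorted(set(column))
    key_of = {value: key for key, value in enumerate(dictionary)}
    return dictionary, [key_of[value] for value in column]

def run_length_encode(encoded):
    """Collapse repeated adjacent keys into (key, run length) pairs."""
    runs = []
    for key in encoded:
        if runs and runs[-1][0] == key:
            runs[-1][1] += 1
        else:
            runs.append([key, 1])
    return runs

def scan_equals(dictionary, encoded, value):
    """Return row positions where the column equals value (integer compare)."""
    key = dictionary.index(value)            # one string lookup, then integers only
    return [row for row, k in enumerate(encoded) if k == key]

country = ["DE", "DE", "DE", "FR", "FR", "US", "US", "US", "US"]
dictionary, encoded = dictionary_encode(country)
print(dictionary)                              # ['DE', 'FR', 'US']
print(run_length_encode(encoded))              # [[0, 3], [1, 2], [2, 4]]
print(scan_equals(dictionary, encoded, "US"))  # [5, 6, 7, 8]
```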

Parallel Processing

SAP HANA was designed to perform its basic calculations, such as analytic joins, scans, and aggregations, in parallel. Often it uses hundreds of cores at the same time, fully utilizing the available computing resources of distributed systems. With columnar data, operations on single columns, such as searching or aggregation, can be implemented as loops over an array stored in contiguous memory locations. Such an operation has high spatial locality and can be executed efficiently in the CPU cache. With row-oriented storage, the same operation would be much slower, because data of the same column is distributed across memory and the CPU is slowed down by cache misses. Compressed data can be loaded into the CPU cache faster. Because the limiting factor is the data transport between memory and CPU cache, the performance gain exceeds the additional computing time needed for decompression.

Column-based storage also allows execution of operations in parallel using multiple processor cores. In a column store, data is already vertically partitioned, which means that operations on different columns can easily be processed in parallel. If multiple columns need to be searched or aggregated, each of these operations can be assigned to a different processor core. In addition, operations on one column can be parallelized by partitioning the column into multiple sections that can be processed by different processor cores.2
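
As an illustration of that last point, here is a minimal Python sketch (not SAP HANA code) that partitions a single column into sections and aggregates the sections on separate worker processes. SAP HANA performs this kind of partitioned scan natively inside the column store engine; the function names and partition count here are purely illustrative.

```python
# Parallelize an aggregation over one column by splitting it into sections
# and assigning each section to its own worker process.
from concurrent.futures import ProcessPoolExecutor

def partial_sum(section):
    """Aggregate one contiguous section of the column."""
    return sum(section)

def parallel_column_sum(column, workers=4):
    """Split the column into roughly equal sections and sum them in parallel."""
    size = (len(column) + workers - 1) // workers
    sections = [column[i:i + size] for i in range(0, len(column), size)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, sections))

if __name__ == "__main__":
    revenue = list(range(1_000_000))         # stands in for one table column
    print(parallel_column_sum(revenue))      # same result as sum(revenue)
```
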
References:

1) Word, Jeffrey (2014). SAP HANA Essentials. Epistemy Press.

2) SAP HANA Developer Guide, SAP HANA Platform SPS 09, Document Version 1.1, 2015-02-16.
