
SAP HANA: Simply Explained

This article is for SAP HANA beginners who don't really understand what it is about and are lost in all the marketing and technical jargon.
What is SAP HANA?
SAP HANA is the latest in-memory analytics product from SAP; using HANA, companies can do ad hoc analysis of large volumes of data in real time.
SAP HANA is a combination of hardware and software specifically designed to process massive amounts of real-time data using in-memory computing.
If a business asks a question and gets the answer only after 3 days, one would probably have forgotten what the question was by then.
So one of the key challenges IT faces now is not only extracting reports from huge amounts of real-time data, which keeps growing at an exponential rate and comes from different sources, but also analyzing it from different perspectives, and doing so in seconds.
To illustrate real-time data from a retail perspective: the POS (point of sale) data would be available for analytics even before the customer leaves the store.
What is in-memory?
In-memory means all the data is stored in memory (RAM). No time is wasted loading data from the hard disk into RAM, or shuffling some data into RAM while temporarily keeping the rest on disk during processing. Everything is in memory all the time, which gives the CPUs quick access to the data.
What is real-time analytics?
Using HANA, companies can analyze their data as soon as it becomes available. Previously, professionals had to wait at least a few hours before they could analyze the data being generated around the company. To put this in perspective, suppose a supermarket chain wants to start giving you discount coupons when you visit, based on your shopping habits. Before: they could only mail coupons to your address or hand you coupons for your next purchase. Now: while you are checking out, your entire shopping history can be processed and a discount applied to your current purchase. Imagine the customer loyalty for such a chain!
So is SAP making/selling the software or the hardware?
SAP has partnered with leading hardware vendors (HP, Fujitsu, IBM, Dell, etc.) to sell SAP-certified hardware for HANA. SAP sells licenses and related services for the SAP HANA product, which includes the SAP HANA database, SAP HANA Studio, and other software to load data into the database. Also, as already announced, the vision is to run all the application-layer enterprise software on the HANA platform; that is, ERP/BW/CRM/SCM etc. will use HANA as their database.

Hardware Design:
A huge amount of data is divided into multiple sets, which are then crunched separately by the blades, as shown below.

Pic: Data is divided into 4 blades with 2 standby blades


Each blade contains multiple CPUs, and each CPU has multiple cores.
This means, for example, with 8 cores per CPU and 4 such CPUs per blade, a mere 4 blades will have 128 cores crunching data in parallel.
Software Design:
HANA stores data column-wise for fast computing. The diagram below compares how data is stored row-wise and column-wise.

For example, suppose the system wants to compute the aggregate of the second column, i.e. 10+35+2+40+12.
Row-wise: the system has to jump between memory addresses to collect the successive values for aggregation. Data records are available as complete tuples in one read, which makes accessing just a few attributes an expensive operation.
Column-wise: a single sequential scan fetches the result much faster.
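A toy Python sketch of the two layouts and the aggregate above (illustrative only, not HANA internals; the table contents are the example values from the text):

```python
# Table with two columns; the second column holds 10, 35, 2, 40, 12.
rows = [("A", 10), ("B", 35), ("C", 2), ("D", 40), ("E", 12)]

# Row-wise layout: values of different columns are interleaved in memory,
# so summing column 2 means jumping over every column-1 value.
row_store = [value for record in rows for value in record]
total_row_wise = sum(row_store[1::2])  # strided access, skipping col 1

# Column-wise layout: each column is a contiguous block,
# so the aggregate is a single sequential scan.
col_store = {"col1": [r[0] for r in rows], "col2": [r[1] for r in rows]}
total_col_wise = sum(col_store["col2"])  # one contiguous scan

print(total_row_wise, total_col_wise)  # 99 99
```

Both layouts give the same answer; the difference is purely in the memory access pattern, which is what makes the column scan cheaper on real hardware.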
Can I just increase the memory of my traditional Oracle database to 2 TB and get similar performance?
Well, no. You might see performance gains due to more memory being available for your current Oracle/Microsoft/Teradata database, but HANA is not just a database with bigger RAM. It is a combination of many hardware and software technologies. The way data is stored and processed by the In-Memory Computing Engine (IMCE) is the true differentiator. Having the data available in RAM is just the icing on the cake.
Is HANA really fast? How is it possible that HANA is so fast?
HANA is fast for many reasons. The following picture1 depicts a very simplified version of what's inside the In-Memory Computing Engine. Most chip makers have been focusing on building CPUs with more cores, because there is a limit to increasing clock speed: the higher the clock speed, the more heat the chip emits while processing, which makes cooling expensive. If you look at the last decade, clock speed has not changed impressively.
              2002       2006        2010
Cores/CPU     1 core     4 cores     8 cores
Clock speed   1.8 GHz    1.6-3 GHz   2.26 GHz
Software has always been built for the hardware that these top chip makers design.
There is always a need to build software that can fully exploit the capabilities of the hardware design.
One of the main strong points of SAP HANA is its ability to process data in parallel: it cuts the initial (large) amount of data into small chunks and gives each chunk to a separate CPU core to work on; hence the need for the large number of CPU cores.
One other aspect of the system is that wherever possible, data is kept in memory, in order to
speed up access time. Where a traditional database system might set aside a gigabyte or two of
memory as a cache, SAP HANA takes this to the next level, using nearly all the server's memory
for the data, making access times nearly instantaneous.
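The chunk-and-combine idea can be sketched in a few lines of Python. This is only an illustration of the pattern, not HANA's scheduler; real HANA distributes chunks across CPU cores, while a thread pool here simply demonstrates the split, the parallel partial aggregation, and the final combine.

```python
from concurrent.futures import ThreadPoolExecutor

def crunch(chunk):
    """Each worker aggregates its own slice of the data."""
    return sum(chunk)

def parallel_sum(data, n_workers=4):
    # Cut the data into n_workers roughly equal chunks.
    size = (len(data) + n_workers - 1) // n_workers
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Hand each chunk to a worker and collect the partial results.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(crunch, chunks)
    # Combine the partial results into the final answer.
    return sum(partials)

data = list(range(1_000_000))
print(parallel_sum(data) == sum(data))  # True
```

The same three steps (partition, compute partials in parallel, merge) underlie most parallel aggregation engines, whatever the hardware.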

Column Storage
While traditional databases store a relational table one row after another, the IMCE stores tables in columns. Hopefully the following figure explains the difference between the two storage mechanisms.

Frankly, storing data in columns is not a new technology, but it has not been leveraged to its full potential YET. Columnar storage is read-optimized, that is, read operations can be processed very fast. However, it is not write-optimized, as a new insert might require moving a lot of data to make room for the new values. HANA handles this well with the delta merge (which is itself a topic for an entire article, coming next), so let us just assume here that columnar storage performs very well for reads and that write operations are taken care of by the IMCE in other ways. Columnar storage creates a lot of opportunities:
1. Compression: As the data written next to each other is of the same type, there is no need to
write the same values again and again. There are many compression algorithms in HANA,
the default being the dictionary algorithm, which, for example, maps long strings to
integers.
Example of the dictionary algorithm: You have a Country column in the Customer table
of your database. Let's say you have 10 million customers from 100 countries. In
standard row-based storage you would need 10 million string values stored in memory.
With dictionary compression, the 100 country values are assigned an integer-based
index, and now you need only 10 million integers + the 100 string values + the
mapping between them. This is a lot of compression in terms of bytes stored in memory.
There are more advanced compression algorithms (RLE etc.) which can reduce even
the 10 million integer storage.
Now imagine a scenario with 100 tables and a few thousand columns. You get the picture.
Less data means much faster processing. The official tests show a
compression of 5-10x, that is, a table which used to take 10 GB of space would now need
only 1-2 GB of storage space.
2. Partitioning: SAP HANA supports two types of partitioning: a single column can be
partitioned across many HANA servers, and different columns of a table can be placed
on different HANA servers. Columnar storage easily enables this partitioning.
3. Data stripping: Often when querying a table, many columns are not used; for
example, when you just want the revenue information from a Sales table that
stores a lot of other information as well. Columnar storage ensures that the
unnecessary data is not read or processed. As the tables are stored column by
column, no time is wasted wading through a lot of irrelevant data just to read
the relevant information.
4. Parallel processing: It is always performance-critical to make full use of the available
resources. With the current growth in the number of CPU cores, the more work they can do in
parallel, the better the performance. Columnar storage enables parallel processing:
different CPUs can each take one column and perform the required operations (aggregations,
etc.) in parallel, or multiple CPUs can work on a partitioned column in parallel for
faster output.
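Point 1 above, dictionary compression, is easy to sketch. The following is a simplified illustration of the idea described in the text (real HANA encoders also bit-pack the integers and can apply run-length encoding on top); the country values are made up for the example:

```python
def dictionary_encode(values):
    """Replace each value with a small integer index into a dictionary."""
    dictionary = {}  # value -> index
    encoded = []
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary)
        encoded.append(dictionary[v])
    # Reverse lookup table for decoding: index -> value.
    lookup = [None] * len(dictionary)
    for value, idx in dictionary.items():
        lookup[idx] = value
    return encoded, lookup

def dictionary_decode(encoded, lookup):
    """Recover the original column from indices plus the dictionary."""
    return [lookup[i] for i in encoded]

column = ["India", "Germany", "India", "USA", "Germany", "India"]
encoded, lookup = dictionary_encode(column)
print(encoded)  # [0, 1, 0, 2, 1, 0]
print(lookup)   # ['India', 'Germany', 'USA']
assert dictionary_decode(encoded, lookup) == column
```

With 10 million rows and only 100 distinct countries, the column shrinks to 10 million small integers plus a 100-entry dictionary, exactly the saving described above.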

Multiple Engines
SAP HANA has multiple engines inside its computing engine for better performance. As SAP HANA supports both SQL and OLAP reporting tools, there are separate SQL and OLAP engines to perform the respective operations. There is a separate calculation engine to do the calculations, and a planning engine used for financial and sales planning. Above all of them sits something like a controller, which breaks an incoming request into multiple pieces and sends sub-queries to the engines that are best at each job. There are also separate row and column engines to process operations between tables stored in row format and tables stored in column format.
Caution: Currently, you can't perform a join between a table stored in row format and a table stored in column format. Also, the query/reporting designer needs to be careful about which engines are used by a query: performance suffers if, for example, the SQL engine has to do the job of the calculation engine because the controller was not able to optimize the query perfectly.
What is ad hoc analysis?
In traditional data warehouses, such as SAP BW, a lot of pre-aggregation is done for quick results. That is, the administrator (the IT department) decides which information might be needed for analysis and prepares the results for the end users. This gives fast performance, but the end user has no flexibility, and performance drops dramatically if the user wants to analyze data that is not already pre-aggregated. With SAP HANA and its speedy engine, no pre-aggregation is required. The user can perform any kind of operation in their reports and does not have to wait hours for the data to be prepared for analysis.
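The difference between the two approaches can be sketched in a few lines. The sales rows, regions, and products below are invented for illustration; the point is that a pre-built aggregate can only answer the questions it was built for, while scanning raw rows on demand can answer any grouping:

```python
# Raw fact rows, as an in-memory engine would hold them.
sales = [
    {"region": "EU", "product": "laptop", "revenue": 1200},
    {"region": "EU", "product": "phone",  "revenue": 700},
    {"region": "US", "product": "laptop", "revenue": 900},
    {"region": "US", "product": "phone",  "revenue": 400},
]

# Classic warehouse style: IT pre-aggregates by region ahead of time.
# A question about products cannot be answered from this cube.
pre_aggregated_by_region = {"EU": 1900, "US": 1300}

# Ad hoc style: any grouping is computed on demand from the raw rows.
by_product = {}
for row in sales:
    by_product[row["product"]] = by_product.get(row["product"], 0) + row["revenue"]

print(by_product)  # {'laptop': 2100, 'phone': 1100}
```

With a fast enough engine, the on-demand scan replaces the pre-built cube entirely, which is exactly the flexibility described above.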
I hope the above information is useful to get a better understanding of SAP HANA. Please let me
know your comments/suggestions.
1: The picture is obviously a much simplified version of the engine and there is much more to it
than represented in the picture.