Research Papers: Sap Hana and Its Performance Benefits
ABSTRACT
In-memory computing has changed the landscape of database technology. Within the database and technology field, advancements occur over time that have the capacity to transform some fundamental tenets of the technology and how it is applied. The concept of Database Management Systems (DBMS) was realized in industry during the 1960s, allowing users and developers to use a navigational model to access the data stored by the computers of that day as they grew in speed and capability. This manuscript specifically examines the SAP High Performance Analytics Appliance (HANA) approach, which is one of the commonly used technologies today. Additionally, this manuscript provides an analysis of the first two of the four common main use cases for SAP HANA's in-memory computing database technology. The performance benefits are important factors for database calculations. Some of the benefits are quantified and demonstrated with defined sets of data.
Keywords: Database Management Systems (DBMS), SAP High Performance Analytics Appliance (HANA), Structured Query Language (SQL), Relational Database Management System (RDBMS).
administration and maintenance. Frequently, IT is not seen as a valuable partner in meeting the objectives of the business, and IT is not characterized as the means by which a business can create new opportunities and grow. More difficult times lead to further budget tightening and reductions in needed hardware refreshes. All of these factors leave no time for innovation and creativity, and no budget for evaluating the latest new technology.

The reality is a declining benefit of investment in physical disks. The apparent limitation in the physics of spinning disks leads to questioning of previous systems and development paradigms. The continued need for deeper, mobile and real-time access to all of a company's data calls for innovative solutions.

Emerging technology in the area of in-memory computing, and products like SAP HANA, are changing the database technology arena, the analytics field and overall system architectures [3]. These database advancements can bring improvements in the timeliness of access to data, and can enable improved decision making and the transformation of business processes to make use of this new capability.

1. A Historical View
In order to proceed with the subject at hand, an overview of Relational Database Management System (RDBMS) advancements is needed.

The database technology field dates to the 1960s, with the introduction of the Integrated Data Store by Charles Bachman [4]. Prior to 1970 the relational database was conceptual, and data was stored in separate files on magnetic tape [5]. Memory was expensive, while storage was relatively inexpensive, and Moore's law was a maturing concept without its full validity or its temporal limitations [6].

The following decades of database history can be tabulated into nine distinct eras. In viewing this data alone, a slowing trend of advancements can be seen since the mid-1990s. Table 1 outlines the nine eras.

Timeline        Era                                  Important Products
Before 1970     Predatabase                          File managers
1970-1980       Early database                       ADABAS, System 2000, Total, IDMS, IMS
1978-1985       Emergence of the relational model    DB2, Oracle
1982-1992       Microcomputer DBMS products          dBase II, R:base, Paradox, Access
1985-2000       Object-Oriented DBMS                 Oracle ODBMS and others
1995-present    Web database                         IIS, Apache, PHP, ASP.NET, and Java
1995-present    Open source DBMS products            MySQL, PostgreSQL, and others
1998-present    XML and web services                 XML, SOAP, WSDL, UDDI
2009-present    NoSQL movement begins                Apache Cassandra, dbXML, MonetDB/XQuery, and others

Table 1. Eras in the timeline of database technology development [5].

Through these decades of database history, and in computer science theory specifically, Moore's law has proven itself: the number of components in integrated circuits has doubled at a predictable pace. The linear nature of this relationship as it applies to Central Processing Unit (CPU) performance and CPU cost has begun to change very recently. CPU performance improvement has begun to slow, and clock rates have remained nearly steady for several years. Meanwhile, the cost of these components continues to decline substantially. Furthermore, the cost of memory has also declined substantially. During this same time, physical disks have seemed to reach a performance plateau and have not offered the same declining-cost, improving-performance relationship that CPU and RAM have demonstrated.

These changes have enabled hardware advancements, including more cores and more CPUs on a given server, and far more RAM is now feasible [7]. RAM costs have also changed such that it is less necessary to limit the amount of RAM and use it only for caching of critical processes. Instead, RAM can now go toe-to-toe with fixed disk for storage, and also combine with the CPU to provide a tremendous performance differential as compared to disk-based systems and databases [8]. When compared with disk, the physics of spinning platters and their resulting database read and write characteristics are best measured in milliseconds. RAM arrives without those physical limitations and delivers nanosecond response
times [9]. Increasing RAM is one possibility; one must begin to think in terms of terabytes of RAM and no longer gigabytes of RAM, in much the same way computer science moved beyond kilobytes and bytes of RAM quite some time ago. According to IBM, some applications simply cannot run without in-memory computing, even with high data volumes [10].

In-memory computing does dramatically change the landscape of database technology [11]. In-memory computing involves the use of main memory not as a cache but instead as a means of storage, retrieval and computational performance in close proximity to the Central Processing Unit (CPU). The use of row and columnar storage of data creates the ability to remove the need for the index, which predominates in the disk-based database world. The lack of the index is a key area of compression, and columnar storage allows further compression.

According to IBM, and as shown in Figure 1, "the main memory (RAM) is the fastest storage type that can hold a significant amount of data" [10].

These advancements arrive at a time when RDBMS advancements have increased and the typical IT infrastructure is faced with declining budgets and expectations of improving performance results, all at the same time. If a hardware refresh is being planned, Linux-based solutions and in-memory database solutions are increasingly becoming part of the reference architecture. At a minimum, the question of in-memory computing is now being asked as part of architecture planning for infrastructure updates. These technologies are increasingly being accepted as proven and ready for long-term capital investment.

The problem of data growth and the impact it has on run-times and data accessibility is faced by system administrators, programmers and functional analysts. Due to data growth, a program or system process that used to run each day can now run only once a week. Another problem is the runtime of a report, which creates the need for a developer to reduce the user's ability to ask questions of the data by limiting its selection criteria. These problems may also impact system administrators as they are asked to add database indexes to improve a program's performance.

Data growth is viewed as the largest datacenter infrastructure challenge based on Gartner research [12]. Survey data indicated business continuity and data availability were the most critical factors, followed by cost containment initiatives, and maintaining or improving user service levels was the third most important factor for data center planning and strategy development [12].

Another way to view large data analytics is to analyze three characteristics of big data: volume, velocity and variety [13].
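The columnar compression described above can be illustrated with a small sketch. The dictionary encoding below is an illustrative simplification of how a low-cardinality column can be stored compactly, not SAP HANA's actual algorithm; the column name and values are invented.

```python
# Illustrative sketch: why columnar storage compresses well.
# A column with few distinct values can be dictionary-encoded:
# each distinct value is stored once, plus a small integer code
# per row, instead of repeating the full value row by row.

def dictionary_encode(column):
    """Return (dictionary, codes) for a list of values."""
    dictionary = sorted(set(column))
    index = {value: code for code, value in enumerate(dictionary)}
    codes = [index[value] for value in column]
    return dictionary, codes

# A low-cardinality column, as often found in business data.
region = ["EMEA", "EMEA", "APAC", "EMEA", "AMER", "APAC"] * 1000

dictionary, codes = dictionary_encode(region)
print(len(dictionary))   # 3 distinct values stored once
print(len(codes))        # 6000 small integer codes
```

Because an entire column is scanned as one contiguous, encoded array, aggregations can run over the compressed form directly, which is part of why a separate index becomes unnecessary.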
2. The Setup
Two servers were configured to act as one database. When combined, the system has eight (8) Westmere EX Intel E7-8870 ten (10) core CPUs (80 cores) with a 2.4 GHz clock rate. The system has two (2) Fusion-io SSD (Solid-State Drive) cards with 640 GB of capacity each for database log volumes. The system has sixteen (16) 600 GB 10,000 RPM SAS drives in a RAID-5 configuration for persistence of the data volume, in addition to its storage and calculation on sixteen (16) 64 GB DDR3 RDIMMs.
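As a quick check, the aggregate capacities of this configuration can be derived from the per-component figures above. The RAID-5 usable-capacity line assumes a single array with one drive's worth of parity, which is the standard RAID-5 formula rather than a detail stated in the text.

```python
# Aggregate capacities of the two-server test system described above.
cpus, cores_per_cpu = 8, 10
ssd_count, ssd_gb = 2, 640       # Fusion-io drives for log volumes
sas_count, sas_gb = 16, 600      # 10,000 RPM SAS drives in RAID-5
dimm_count, dimm_gb = 16, 64     # DDR3 RDIMMs

total_cores = cpus * cores_per_cpu
total_ram_gb = dimm_count * dimm_gb
total_log_gb = ssd_count * ssd_gb
# RAID-5 reserves one drive's worth of capacity for parity (assumption).
usable_data_gb = (sas_count - 1) * sas_gb

print(total_cores)      # 80 cores, matching the text
print(total_ram_gb)     # 1024 GB (1 TB) of RAM
print(total_log_gb)     # 1280 GB of SSD log capacity
print(usable_data_gb)   # 9000 GB usable in the RAID-5 array
```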
Figure 1. Data Access Times of Various Storage Types, Relative to RAM (logarithmic scale).

3. SAP HANA in Action
It has been established that the emerging technology in the area of in-memory computing is making promising advancements and that the technology is being applied in ways that were not previously anticipated. According to IBM, SAP High Performance Analytics Appliance (HANA) "enables marketers to analyze and segment their customers in ways that were impossible to realize without the use of in-memory technology" [10]. Some applications are not available without in-memory technology [10].

Unique applications of this emerging technology have appeared within the healthcare field, from personalized patient treatment planning in real-time to greatly accelerated cancer screenings and human genome analysis; out-of-the-box thinking is needed to explore and find the limits of these new technologies [14].

The four use-case areas for SAP HANA are [15]:
· real-time replication of data from SAP and non-SAP source systems for business intelligence analytics,
· using this table replication capability to accelerate programs and reports from the SAP ERP system, reading them faster than they can be read from the disk-based database serving the SAP ERP system,
· as a database replacement for SAP Business Warehouse (BW), allowing a full and complete database migration and replacement path from various disk-based technologies to SAP HANA as the in-memory database for SAP Business Warehouse (BW), and
· as a database replacement for SAP ERP, allowing a full and complete database migration and replacement path from various disk-based technologies to SAP HANA as the in-memory database for SAP ERP.

The results were quantified and the system was measured in terms of the aggregation levels of a given report. Typically, a report is severely constrained in selection criteria to aid runtime performance, effectively removing scope from the user's ability to query the data. Another measurement will be in terms of data compression. The source database tables were measured for index and data size, and the compression of the data size calculated. As the data is stored in columnar form, the need for the index is removed. Finally, measurements in preparing the data for query, including the runtime of all background jobs, extracts, transformations, loads, aggregation and indexing, in addition to query time, will be compared to the replicated data and its query response time using the in-memory database.

The first of two use-cases accelerated a complex table-set and performed an analytics use-case demonstrating the real-time nature of the data access, the improved response time and the data compression characteristics of an in-memory database. The second use-case utilized the improved read-rate performance to accelerate an existing table-set and performed a database call from an existing program with the table-set residing on SAP HANA, instead of directing the database call to the disk-based database.

The hardware utilized was from one of several hardware vendors partnering with SAP to provide hardware capable of utilizing SAP HANA as an in-memory computing engine. As discussed previously, disk-based hardware advancements have slowed, the cost of various system components has fundamentally changed, and the user base along with the system administrator is faced with an ever-increasing growth of data size and increasingly complex data, with the result being slower and slower system performance.

When compared to a disk-based database, the in-memory database yields far faster read-rate response times [10]; for disk-based technologies the measurements are in milliseconds, while in-memory databases are measured in nanoseconds. The unit-of-measure shift alone is a matter of calculation, and not the result of a test, to outline the significant change from measurement in 1/1,000 (one one-thousandth) of a second to 1/1,000,000,000 (one one-billionth) of a second.

In this first use-case the SAP HANA appliance hardware
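The unit-of-measure shift described above is worth making concrete: before any workload is measured, moving from milliseconds to nanoseconds is a change of six orders of magnitude.

```python
# The shift from millisecond (disk) to nanosecond (RAM) response,
# discussed above, is a six-order-of-magnitude change in the unit
# of measure alone. Integer arithmetic keeps the ratio exact.
MS_PER_SECOND = 1_000              # disk latency is measured in ms
NS_PER_SECOND = 1_000_000_000      # RAM latency is measured in ns

ratio = NS_PER_SECOND // MS_PER_SECOND
print(ratio)  # 1000000: one millisecond contains one million nanoseconds
```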
4. Measured Results
Table 2 presents the data compression summarized. Dependent on the cardinality of the table, the data compression for a given table ranged from <1x compression to near 6x compression; overall this data set compressed 3x. When comparing the compressed data set in-memory to the data set including indexes on the traditional database, the data compression for a given table may range from <1x compression to over 13x compression; overall this data set compressed 7x.

As outlined by Figure 3, the unoptimized code calling the traditional database had a run-time of 291 seconds. The same unoptimized code calling the same table set replicated on SAP HANA had a run-time of 22 seconds. The difference represented a 13x improvement in the run-time of the sample program tested. Identifying and mitigating the unoptimized code further improved the run-time on the traditional database to 12 seconds, an improvement of 24x versus the baseline. The now optimized code calling SAP HANA has a run-time performance of 0.67 seconds, an
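The improvement factors reported above follow directly from the measured run-times; the small check below recomputes them from the numbers in the text, rounding to whole multiples as the paper does.

```python
# Recompute the improvement factors reported above from the
# measured run-times (in seconds, taken from the text).
baseline_disk = 291.0      # unoptimized code, traditional database
unoptimized_hana = 22.0    # same code, tables replicated on SAP HANA
optimized_disk = 12.0      # optimized code, traditional database
optimized_hana = 0.67      # optimized code on SAP HANA

print(round(baseline_disk / unoptimized_hana))  # 13x, as reported
print(round(baseline_disk / optimized_disk))    # 24x, as reported
```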
The second use-case outlined the ability to access the SAP HANA database through a secondary database connection, utilizing existing SAP ERP ABAP code to access replicated tables in the in-memory database. The program can yield significant improvements in run-time even on unoptimized code.

As the traditional disk-based architecture data footprint grows at over 36% per year [16], innovations such as in-memory computing are increasing in adoption rate, and organizations adopting this technology are also gaining a competitive business advantage [10].

Several areas remain for further study:
· A comparative study of in-memory technologies would be desirable to further analyze the pros and cons of in-memory computing.
· Financial implications and risks should be examined for in-memory applications.
· As accountability for lost data keeps increasing, data loss and vaulting of cache memory for in-memory applications need to be analyzed.
· The limitations of RAM and data sets for in-memory applications need to be further examined.
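The measurement idea behind the second use-case can be sketched in miniature. The snippet below is only an analogy, not SAP software: an on-disk SQLite file stands in for the disk-based ERP database, an in-memory SQLite copy stands in for the replicated tables reached over the secondary connection, and the table, file and column names are invented. The same unmodified query is timed against both.

```python
import os
import sqlite3
import tempfile
import time

# Stand-in for the disk-based database: an on-disk SQLite file.
path = os.path.join(tempfile.mkdtemp(), "erp_standin.db")
disk = sqlite3.connect(path)
disk.execute("CREATE TABLE sales (region TEXT, amount REAL)")
disk.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EMEA", i * 0.5) for i in range(50_000)])
disk.commit()

# Stand-in for the in-memory replica: copy the table into RAM.
mem = sqlite3.connect(":memory:")
disk.backup(mem)  # replicate the on-disk data into memory

# The same query runs unchanged against either connection,
# mirroring how the existing ABAP code was left unoptimized.
query = "SELECT region, SUM(amount) FROM sales GROUP BY region"

for name, conn in [("disk", disk), ("memory", mem)]:
    start = time.perf_counter()
    result = conn.execute(query).fetchall()
    elapsed = time.perf_counter() - start
    print(name, result, f"{elapsed:.6f}s")
```

The point of the sketch is that redirecting an existing read-only query to a faster replica requires no change to the query itself, which is the essence of the secondary-connection approach described above.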
When hardware refreshes are planned, SAP HANA and Linux-based operating systems require a thorough review for integration into the architecture where business requirements demand dynamic access to increasingly complex data sets in real-time. Real-time terminology is key to understanding the benefits of in-memory computing. Some applications and techniques would not even be possible without in-memory technologies [10].
Figure 3. Comparison of Sample Program Response Times on Traditional Database and SAP HANA with and without Code Optimization.

"In May 2011, the market research firm IDC released its Digital Universe report estimating data growth. The IDC researchers estimated that stored data grew by 62 percent or 800,000 petabytes in 2010. By the end of 2011, 1.2 zettabytes (ZB) of information will be held in computer storage systems, according to IDC. And that is only the beginning of the story. By 2020, data will have grown another 44-fold. Only 25 percent of that data will be original. The rest will be copies" [17].

In a survey of 237 respondents from a broad range of organizations, 12% indicated they had installed an in-memory database in the last 5 years, 27% between 2 and 5 years ago, 24% between 1 and 2 years ago, 15% within the last 6 months to 1 year, and 22% within the last 6 months. The data indicates the technology has had some attention for nearly 5 years, and while it is still maturing to its potential, it is certainly a technology worth further consideration, whether it has been evaluated previously or not.

Adding to the problem of exploding data is the new reality of a declining benefit of investment in physical disks and an apparent limitation in the physics of spinning disks. With the continued need for real-time mobile access to all of a company's data, this convergence of technology is creating the demand for innovative solutions such as in-memory computing.

References
[1]. Codd, E. F. (1970). A Relational Model of Data for Large Shared Data Banks. Retrieved on June 12, 2012, from http://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf
[2]. Sumathi, S., & Esakkirajan, S. (2007). Fundamentals of Relational Database Management Systems (pp. 96-97). Springer-Verlag Berlin Heidelberg.
[3]. King, C. (2012). SAP Innovation Day - Setting a Standard for the Next 40 Years. Retrieved on October 14, 2012, from http://www.ecommercetimes.com/story/75896.html
[4]. Bachman, C. (1965). "Integrated Data Store." DPMA Quarterly, January.
[5]. Kroenke, D. M., & Auer, D. J. (2012). Database Processing: Fundamentals, Design, and Implementation (12th ed.). Pearson: Boston.
[6]. Brock, D. (2006). Understanding Moore's Law: Four Decades of Innovation (pp. 25-37). Chemical Heritage Foundation: Philadelphia, PA.
[7]. Borkar, S., & Chien, A. (2011). The Future of Microprocessors. Communications of the ACM, Vol. 54, No. 5, pp. 67-77. Retrieved from http://cacm.acm.org/magazines/2011/5/107702-the-future-of-microprocessors/fulltext#T1
[8]. Missbach, M., & Hoffman, U. (2001). SAP Hardware Solutions: Servers, Storage, and Networks for mySAP.com (pp. 120-121, 162). Hewlett-Packard Company, Prentice Hall: Upper Saddle River, NJ.
[9]. ITL Education Solutions Unlimited (2006). Introduction to Computer Science (p. 126). Dorling Kindersley (India) Pvt. Ltd.
[10]. Vey, G., & Krutov, I. (2012). SAP In-Memory Computing on IBM eX5 Systems. Retrieved on October 9, 2012, from http://www.redbooks.ibm.com/redpapers/pdfs/redp4814.pdf
[11]. Plattner, H., & Zeier, A. (2011). In-Memory Data Management: An Inflection Point for Enterprise Applications (pp. 1-2). Springer-Verlag Berlin Heidelberg: London, New York.
[12]. Pettey, C., & Tudor, B. (2010). Gartner Survey Shows Data Growth as the Largest Data Center Infrastructure Challenge. Retrieved from http://www.gartner.com/it/page.jsp?id=1460213
[13]. Dumbill, E. (2012). Volume, Velocity, Variety: What You Need to Know About Big Data. Retrieved on October 14, 2012, from http://www.forbes.com/sites/oreillymedia/2012/01/19/volume-velocity-variety-what-you-need-to-know-about-big-data/
[14]. Omar, R. (2012). Personalized Patient Treatments. Retrieved from https://www.experiencesaphana.com/docs/DOC-1577
[15]. Stagge, C. (2011). Four Use Cases of SAP HANA. Retrieved on September 29, 2012, from http://en.sap.info/teched-2011-madrid-sikka-keynote-hana/59111
[16]. Rowe, N. (2012). In-memory Computing: Lifting the Burden of Big Data. Retrieved from http://www.aberdeen.com/Aberdeen-Library/7377/RA-big-data-business-intelligence.aspx
Craig Brockman is a business analyst for Lockheed Martin. He has over 14 years of experience in aeronautics, IT Project
Management, IT systems strategies, RFID implementations, programming and many other technologies.