http://saphanatutorial.com/prerequisites-for-learning-saphana/
Prerequisites for Learning SAP HANA
Do you want to learn SAP HANA and build a career in it, but don't know where to start?
Do you have questions like:

What are the prerequisites for learning SAP HANA?

Is ABAP required to learn SAP HANA?

I don't know SAP BW, SAP BO or SAP BI. Can I still learn SAP HANA and build a career in it?

Can people with varied backgrounds (Java, PHP, .NET, JavaScript, HTML etc.) and no prior SAP knowledge succeed in SAP HANA?

Where should I start learning SAP HANA?

How can I get access to a cost-effective SAP HANA Server, SAP HANA Studio and other client tools?

What are the SAP HANA certifications and how do they help in boosting my career?

If I want to make a career in SAP HANA, what is the roadmap?

Do you feel like this?

Then this article is for you. Continue reading and all your questions will be answered.

Career Growth in SAP HANA:


If you want to ensure your career growth, SAP HANA is a strong bet. SAP has positioned HANA as both the database and the platform on which all of its applications will run in the future, and this migration has already started. HANA is also highly sellable to clients looking to invest in IT, as it is relevant both now and in the future.

A career in HANA can be enriching and is likely to go a long way.


SAP HANA has been described as the fastest growing product in SAP's history.

To know more, check the article Top 10 Reasons Customers Choose SAP HANA

Let's look at SAP HANA from a beginner's point of view and get some answers.

What are the prerequisites for learning SAP HANA?


At its core, SAP HANA is like a relational database. You should know database concepts and have a basic knowledge of SQL before starting with SAP HANA.

You can find plenty of material on the web about database concepts and SQL. Or let me Google that for you. :)

Is ABAP required to learn SAP HANA?


NO.
Knowledge of ABAP may help in understanding how business logic is implemented. It is an advantage, but not mandatory.

However, if you are already an experienced ABAP programmer and are wondering whether the ABAPer's time has come to an end, you should not worry.
There is a new ABAP, one that is "SMARTER, LIGHTER and FASTER", and beneath it sits SAP HANA, powering ABAP silently without any disruption. This is what we call "ABAP FOR SAP HANA".

Learning SAP HANA together with ABAP will give a new boost to your career.

I don't know SAP BW, SAP BO or SAP BI. Can I still learn SAP HANA and build a career in it?
Let's check them one by one.
SAP Business Information Warehouse (SAP BW):
Knowing BW helps you understand modeling concepts, and it helps when you use DXC (Direct Extractor Connection) to transfer data from an SAP Business Suite system to HANA.

But even if you don't know BW, you can easily learn HANA modeling concepts.
However, BW knowledge is a must if you are going to work on BW on HANA.
SAP BusinessObjects (BO):
BO, or BusinessObjects, is the front-end reporting tool set from SAP.
If you know BO, reporting on HANA will be a piece of cake for you. But even if you don't, you will be able to pick up BO concepts easily once you start learning HANA reporting concepts.

You might want to gain an understanding of the different reporting tools in BO (Explorer, WebI etc.).
There are many step-by-step guides that can help you learn these tools.

SAP BI:
BI, or SAP BI, is the data warehousing package implementation tool from SAP. Understanding how data warehousing concepts are realized in SAP BI will help you understand implementation aspects from a BW on HANA perspective. Again, unless you are planning to work on BW on HANA, you don't necessarily have to learn SAP BI.

Can people with varied backgrounds (Java, PHP, .NET, JavaScript, HTML etc.) and no prior SAP knowledge succeed in SAP HANA?
As we discussed earlier, prior knowledge of SAP is an advantage, but it is not a must.
If you come from a different technology background, such as Java, JavaScript or PHP, you can still step into SAP HANA and upgrade your resume.

Java, PHP, Python and .NET all work well with SAP HANA.

From where to start learning SAP HANA?


We are providing FREE online training courses: see SAP HANA Online Training Courses.

You can also find lots of SAP HANA learning materials on this site. Just go to the Learning material section of SAP HANA Tutorial. The contents are categorized in a simple way.

Once you have gained knowledge of SAP HANA, you should also test your understanding with the SAP HANA Interview Questions and Answers.

Note: We will come up with more topics on SAP HANA. If you want a particular topic to be included
please leave a comment.

How to get access to SAP HANA Server, SAP HANA Studio and other client tools?
To get free access to SAP HANA Server, check the article SAP HANA Server Access.

SAP HANA Studio and Clients:
Check the article SAP HANA Studio Overview.

What are the SAP HANA certifications and how do they help in boosting my career?
SAP offers two main certification paths.
One path is for administration and operations and is considered more technical.
The second path is for implementation and modeling.

To know more about them, check the article SAP HANA Certification.

If I want to make a career in SAP HANA what is the roadmap?

If you want a career in HANA, there are several areas you can specialize in, such as:
SAP HANA Modeling
In this role you will need SAP HANA modeling skills. SAP BW on HANA skills will also come in handy.
The SAP HANA Modeler learning roadmap also has associate and professional certifications that will boost your career.

SAP HANA Application Development
In this role you will need SAP HANA development skills. Depending on the application type, HANA XS or ABAP on HANA knowledge will also be helpful.
SAP HANA Database Administration and Security
In this role you will be responsible for backup and recovery, performance, security and general administration of the SAP HANA database.
SAP HANA Data Replication (SLT Replication, BODS and DXC)
In this role you will be responsible for setting up and supporting data replication from different source systems to HANA systems.

This topic is not finished yet. We will come up with a more specific, timeline-based roadmap. Currently it is a little difficult to outline the SAP HANA career roadmap, as the market is still evolving and several new areas are likely to emerge in the future.

Related pages on this site: SAP HANA Training Materials, SAP HANA Online Training Courses, and All about SAP HANA Certifications.

If any of your questions is still not answered, feel free to contact us or leave a comment. We will try our best to guide you on your SAP HANA journey.

Introduction to SAP HANA Database - For Beginners
SAP HANA Database:

SAP HANA is an in-memory database:

- It is a combination of hardware and software made to process massive real-time data using in-memory computing.

- It combines row-based and column-based database technology.

- Data now resides in main memory (RAM) and no longer on a hard disk.

- It is best suited for performing real-time analytics, and for developing and deploying real-time applications.

An in-memory database means all the data is stored in memory (RAM). No time is wasted loading data from hard disk into RAM, or keeping some data in RAM and some temporarily on disk during processing. Everything is in memory all the time, which gives the CPUs quick access to data for processing.

The speed advantages offered by this RAM storage system are further accelerated by the use of
multi-core CPUs, and multiple CPUs per board, and multiple boards per server appliance.

Complex calculations on data are not carried out in the application layer, but are moved to the
database.

SAP HANA is equipped with a multi-engine query processing environment that supports relational as well as graph and text data within the same system. It provides features that support high processing speed, huge data sizes and text mining.

Conclusion: SAP HANA In Memory Technology

So is SAP making/selling the software or the hardware?


SAP has partnered with leading hardware vendors (HP, Fujitsu, IBM, Dell etc.) to sell SAP-certified hardware for HANA. SAP itself sells licenses and related services for the SAP HANA product, which includes the HANA database, an easy-to-use data modeling tool called HANA Studio, and other software to load data into the database.

SAP HANA Hardware Partners:

Want to know more about SAP HANA Hardware? Check the article - SAP HANA hardware

SAP HANA Architecture

With the help of technology like SLT replication, data can be moved to HANA in real time. It is also possible to copy data from SAP BW or other databases into SAP HANA. In HANA, we can use the modeling tool called HANA Studio to build the logic and structures, and use tools such as SAP BusinessObjects and SAP Visual Intelligence to visualize or analyze data.

Want to know more about SAP HANA Architecture? Check the article An insight into SAP HANA Architecture.

Can I just increase the memory of my traditional Oracle database to 2TB and get similar performance?
NO. You might have performance gains due to more memory available for your current
Oracle/Microsoft/Teradata database but HANA is not just a database with bigger RAM. It is a
combination of a lot of hardware and software technologies. The way data is stored and processed
by the In-Memory Computing Engine (IMCE) is the true differentiator. Having that data available in
RAM is just the icing on the cake.

SAP HANA Database: Business Value Proposition

Make Decisions in Real-time: Access to real-time analysis; fast and easy creation of ad-hoc business statistics.

Accelerate Business Processes: Increase speed of information processes such as planning, forecasting, pricing and offers.

Unlock New Insights: Remove constraints for analyzing massive data volumes, trends, data mining and predictive analytics.

Improve IT Efficiency: Manage growing data volume and complexity with lower cost of ownership.

Top 10 Reasons Customers Choose SAP HANA


SAP HANA is one of the fastest growing products in SAP's history and is viewed by the industry as a breakthrough solution for in-memory databases. SAP claims that HANA accelerates analytics and applications on a single, in-memory platform, combining database, data processing, and application platform capabilities.

SAP HANA is a next-generation business platform which brings together

Business transactions

Advanced analytics

Social media

Mobile experience

Collaborative business

Design connections

You may be thinking, "So what?" or "How does this help my business?" or "How can SAP HANA help my company make more money?"
In this article, we look at what we consider to be the top 10 reasons why customers should choose SAP HANA.

1. Speed:

The speed SAP HANA enables is sudden and significant, and has the potential to transform
entire business models.

SAP HANA manages massive data volume at high speeds.


It delivers the real real-time enterprise through the most advanced in-memory technology.
SAP HANA provides a foundation on which to build a new generation of applications, enabling
customers to analyze large quantities of data from virtually any source, in real time.

A live analysis by a consumer products company reveals how SAP HANA analyzes current point-of-sale data in real time, empowering this organization to review segmentation, merchandising, inventory management, and forecasting information at the speed of thought.

2. Real Time:
SAP HANA delivers the real real-time enterprise through the most advanced in-memory
technology

Pull up-to-the-minute data from multiple sources. Evaluate options to balance financial, operational, and strategic goals based on today's business.

3. Any Data:
SAP HANA helps you to gain insights from structured and unstructured data.

SAP HANA integrates structured and unstructured data from internal and external sources, and can
work on detailed data without aggregations.

4. Any Source:
SAP HANA provides multiple ways to load your data from existing data sources into SAP
HANA.

SAP HANA can be integrated into a wide range of enterprise environments, allowing it to handle
data from Oracle databases, Microsoft SQL Server, and IBM DB2.

5. Insight - Unlock new insights with predictive, complex analysis:
Before SAP HANA, analytics meant:

Preconfigured dashboards based on fixed business requirements.

Long wait times to produce custom reports.

Reactive views and an inability to define future expectations.


With SAP HANA, you can:

Quickly and easily create ad-hoc views without needing to know the data or query type - allowing
you to formulate your actions based on deep insights

Receive quick reactions to newly articulated queries so you can innovate new processes and
business models to outpace the competition.

Enable state-of-the-art, interactive analyses such as simulations and pattern recognition to create
measurable, targeted actions.

6. Innovation - The ultimate platform for business innovation:


SAP HANA is an early innovator for in-memory computing. Its configurability, easy integration, and
revolutionary capabilities make it flexible enough for virtually anything your business requires.

Some examples of this include:

Energy Management

Utility companies use SAP HANA to process and analyze vast amounts of data generated by smart meter technology, improving customers' energy efficiency and driving sustainability initiatives.
Real-time Transit Routing

SAP HANA is helping research firms calculate optimal driving routes using real-time GPS data
transmitted from thousands of taxis.
Software Piracy Detection and Prevention

Tech companies use SAP HANA to analyze large volumes of complex data to gain business insights
into software piracy, develop preventive strategies, and recover revenue.

7. Simplicity - Fewer layers, simpler landscape, lower cost:

Reduce or eliminate the data aggregation, indexing, mapping and extract-transform-load (ETL) needed in complex data warehouses and marts.

Incorporate prepackaged business logic, in-memory calculations and optimization for multicore 64-bit processors.

Spend less on real-time computing.

8. Cloud:

Step up to one of the world's most advanced clouds.

SAP HANA powers SAP's next-generation enterprise cloud.

Fast:

A single-location stack removes latency, enabling real-time collaboration, processing, and planning.
Scalable:

A highly robust cloud service allows quick deployment of current and next generation applications,
scaled to your business needs.
Secure:

We secure your data through the entire cloud solution with independently audited standards of data
security and governance.

9. Cost:
SAP HANA reduces your total IT cost so you can increase spending on innovation.

10. Choice:
SAP HANA provides you choice at every layer to work with your preferred partners.

Run on the hardware of your choice.

Work with the software you prefer.


Collaboration with a number of partners means that SAP can complete the software stacks of our diverse customer base in configurations that make sense for their business.
Plus, a variety of different options means that you won't be locked in by a single provider.

SAP HANA Hardware

SAP HANA is a combination of hardware and software made to process massive real time data
using In-Memory computing. To leverage the full power of the SAP HANA platform, you need the
right hardware infrastructure.

SAP HANA can only be installed and configured by certified hardware partners.

SAP HANA Hardware Partners:

Currently SAP HANA Hardware partners are:
HP, Fujitsu, Cisco, IBM, Hitachi, NEC and Dell.

You can find all SAP HANA components and the respective SAP HANA hardware and software requirements in the Product Availability Matrix (PAM).

More about SAP HANA Hardware Partners:


HP:
http://h30507.www3.hp.com/t5/Converged-Infrastructure/HP-supports-SAP-in-the-unveiling-of-SAPBusiness-Suite-powered/ba-p/129773

IBM:
http://www-03.ibm.com/systems/power/solutions/bigdata-analytics/sap-hana/

Fujitsu:
http://www.fujitsu.com/fts/solutions/high-tech/solutions/datacenter/sap/hana/

Cisco:
http://www.cisco.com/en/US/netsol/ns1160/index.html

Hitachi:
http://www.hds.com/solutions/applications/sap-application/

NEC:
http://www.nec.com/en/global/prod/express/related/sap_certified.html

DELL:
http://www.dell.com/Learn/us/en/555/shared-content~data-sheets~en/Documents~sap-hana-techsheet.pdf

SAP HANA Architecture Overview:

The SAP HANA database is developed in C++ and runs on SUSE Linux Enterprise Server. It consists of multiple servers: the Index Server, Name Server, Statistics Server, Preprocessor Server and XS Engine. The most important of these is the Index Server.

Index Server:

Index server is the main SAP HANA database component

It contains the actual data stores and the engines for processing the data.

The index server processes incoming SQL or MDX statements in the context of
authenticated sessions and transactions.
Persistence Layer:
The database persistence layer is responsible for durability and atomicity of transactions. It ensures
that the database can be restored to the most recent committed state after a restart and that
transactions are either completely executed or completely undone.
Preprocessor Server:
The index server uses the preprocessor server for analyzing text data and extracting the information
on which the text search capabilities are based.

Name Server:
The name server owns the information about the topology of SAP HANA system. In a distributed
system, the name server knows where the components are running and which data is located on
which server.
Statistics Server:
The statistics server collects information about status, performance and resource consumption from the other servers in the system. It also provides a history of measurement data for further analysis.
Session and Transaction Manager:
The Transaction manager coordinates database transactions, and keeps track of running and
closed transactions. When a transaction is committed or rolled back, the transaction manager
informs the involved storage engines about this event so they can execute necessary actions.
XS Engine:
The XS Engine is an optional component. Using the XS Engine, clients can connect to the SAP HANA database to fetch data via HTTP.

The heart of SAP HANA Index Server

The SAP HANA Index Server contains the majority of the magic behind SAP HANA.

Connection and Session Management

This component is responsible for creating and managing sessions and connections for the
database clients.

Once a session is established, clients can communicate with the SAP HANA database using
SQL statements.

For each session, a set of parameters is maintained, such as auto-commit and the current transaction isolation level.

Users are authenticated either by the SAP HANA database itself (login with user and password), or authentication can be delegated to an external authentication provider such as an LDAP directory.
The Authorization Manager

This component is invoked by other SAP HANA database components to check whether the
user has the required privileges to execute the requested operations.

SAP HANA allows granting of privileges to users or roles. A privilege grants the right to
perform a specified operation (such as create, update, select, execute, and so on) on a specified
object (for example a table, view, SQLScript function, and so on).

The SAP HANA database supports Analytic Privileges that represent filters or hierarchy
drilldown limitations for analytic queries. Analytic privileges grant access to values with a certain
combination of dimension attributes. This is used to restrict access to a cube with some values of
the dimensional attributes.
Request Processing and Execution Control:

The client requests are analyzed and executed by the set of components summarized as
Request Processing and Execution Control. The Request Parser analyses the client request and
dispatches it to the responsible component. The Execution Layer acts as the controller that invokes
the different engines and routes intermediate results to the next execution step.

SQL Processor:

Incoming SQL requests are received by the SQL Processor. Data manipulation
statements are executed by the SQL Processor itself.

Other types of requests are delegated to other components. Data definition statements are dispatched to the Metadata Manager, transaction control statements are forwarded to the Transaction Manager, planning commands are routed to the Planning Engine, and procedure calls are forwarded to the stored procedure processor.
SQLScript:

The SAP HANA database has its own scripting language named SQLScript that is designed
to enable optimizations and parallelization. SQLScript is a collection of extensions to SQL.

SQLScript is based on side-effect-free functions that operate on tables using SQL queries for set processing. The motivation for SQLScript is to offload data-intensive application logic into the database.
Multidimensional Expressions (MDX):

MDX is a language for querying and manipulating the multidimensional data stored in OLAP
cubes.

Incoming MDX requests are processed by the MDX engine and also forwarded to the Calc
Engine.

Planning Engine:

Planning Engine allows financial planning applications to execute basic planning operations
in the database layer. One such basic operation is to create a new version of a data set as a copy of
an existing one while applying filters and transformations. For example: planning data for a new
year is created as a copy of the data from the previous year.

Another example of a planning operation is disaggregation, which distributes target values from higher to lower aggregation levels based on a distribution function.
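
As a rough illustration of disaggregation, the sketch below distributes one target value across lower-level members in proportion to reference values. It is plain Python, not Planning Engine code: the function name `disaggregate`, the proportional distribution rule and the sample regions are assumptions made for this example only.

```python
def disaggregate(target, reference):
    """Distribute `target` across the keys of `reference` in proportion to
    their reference values (a simple proportional distribution function)."""
    if not reference:
        return {}
    total = sum(reference.values())
    if total == 0:
        # No usable weights: fall back to an equal split.
        share = target / len(reference)
        return {key: share for key in reference}
    return {key: target * value / total for key, value in reference.items()}


# Hypothetical example: a yearly target of 1200 is broken down by region,
# using last year's regional sales as the distribution weights.
last_year = {"North": 300, "South": 100, "West": 200}
plan = disaggregate(1200, last_year)
print(plan)  # {'North': 600.0, 'South': 200.0, 'West': 400.0}
```

The lower-level values always add up to the original target, which is the property a disaggregation step has to preserve.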
Calc engine:

The SAP HANA database features such as SQLScript and Planning operations are
implemented using a common infrastructure called the Calc engine.

SQLScript, MDX, planning models and domain-specific models are converted into calculation models. The Calc Engine creates a logical execution plan for each calculation model. It breaks up a model, for example some SQLScript, into operations that can be processed in parallel.
Transaction Manager:
In HANA database, each SQL statement is processed in the context of a transaction. New sessions
are implicitly assigned to a new transaction. The Transaction Manager coordinates database
transactions, controls transactional isolation and keeps track of running and closed transactions.
When a transaction is committed or rolled back, the transaction manager informs the involved
engines about this event so they can execute necessary actions.
The transaction manager also cooperates with the persistence layer to achieve atomic and durable
transactions.
Metadata Manager:

Metadata can be accessed via the Metadata Manager component. In the SAP HANA
database, metadata comprises a variety of objects, such as definitions of relational tables, columns,
views, indexes and procedures.

Metadata of all these types is stored in one common database catalog for all stores. The
database catalog is stored in tables in the Row Store. The features of the SAP HANA database
such as transaction support and multi-version concurrency control, are also used for metadata
management.

In the center of the figure you see the different data stores of the SAP HANA database. A store is a sub-system of the SAP HANA database that includes in-memory storage as well as the components that manage that storage.

The Row Store:
The Row Store is the SAP HANA database's row-based in-memory relational data engine.
The Column Store:
The Column Store stores tables column-wise. It originates from the TREX (SAP NetWeaver Search and Classification) product.
Want to know more about Row Data and Column Data Storage?
Check Column Data Storage vs Row Data Storage: How Different are they Really?

Persistence Layer:
The Persistence Layer is responsible for durability and atomicity of transactions. This layer ensures
that the database is restored to the most recent committed state after a restart and that transactions
are either completely executed or completely undone. To achieve this goal in an efficient way, the
Persistence Layer uses a combination of write-ahead logs, shadow paging and savepoints.

The Persistence Layer offers interfaces for writing and reading persisted data. It also contains the Logger component that manages the transaction log. Transaction log entries are written explicitly by using a log interface or implicitly when using the virtual file abstraction.

Column Vs Row Data Storage


In the article An insight into SAP HANA Architecture we explained the basic architecture of SAP
HANA.
In this article we will learn about column data store and row data store in HANA.

Overview of Row Data Storage and Column Data Storage


Relational databases typically use row-based data storage. However, column-based storage is more suitable for many business applications. SAP HANA supports both row-based and column-based storage, and is particularly optimized for column-based storage.

As shown in the figure below, a database table is conceptually a two-dimensional structure composed of cells arranged in rows and columns.

Because computer memory is structured linearly, there are two options for the sequences of cell
values stored in contiguous memory locations:

Row Storage - It stores table records in a sequence of rows.

Column Storage - It stores table records in a sequence of columns, i.e. the entries of a column are stored in contiguous memory locations.
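
The two options can be made concrete with a small sketch. The Python below flattens the same three-record table into a row-wise and a column-wise sequence; the table contents are made up for illustration, and a Python list only stands in for contiguous memory.

```python
# A small table: each tuple is one record (id, country, sales).
table = [
    (1, "US", 100),
    (2, "DE", 250),
    (3, "US", 175),
]

# Row storage: records are laid out one after another.
row_layout = [value for record in table for value in record]

# Column storage: all values of a column are laid out together.
column_layout = [value for column in zip(*table) for value in column]

print(row_layout)     # [1, 'US', 100, 2, 'DE', 250, 3, 'US', 175]
print(column_layout)  # [1, 2, 3, 'US', 'DE', 'US', 100, 250, 175]

# Reading the sales column touches one contiguous slice in the column layout,
# but every third cell in the row layout.
n_rows, n_cols = len(table), len(table[0])
sales_from_columns = column_layout[2 * n_rows : 3 * n_rows]
sales_from_rows = row_layout[2::n_cols]
print(sales_from_columns, sales_from_rows)  # [100, 250, 175] [100, 250, 175]
```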

Traditional databases store data simply in rows. The HANA in-memory database stores data in both
rows and columns. It is this combination of both storage approaches that produces the speed,
flexibility and performance of the HANA database.

Advantages of column-based tables:

Faster Data Access:


Only affected columns have to be read during the selection process of a query. Any of the columns
can serve as an index.

Better Compression:
Columnar data storage allows highly efficient compression because the majority of columns contain only a few distinct values (compared to the number of rows).
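
One common way to exploit a column with few distinct values is dictionary encoding: store each distinct value once and keep only a small integer per row. The text above does not say which compression scheme is used, so the sketch below is an illustrative assumption, and the function names and sample data are hypothetical.

```python
def dictionary_encode(column):
    """Return (dictionary, value_ids): the sorted distinct values and,
    for each row, the index of its value in that dictionary."""
    dictionary = sorted(set(column))
    position = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [position[value] for value in column]


def dictionary_decode(dictionary, value_ids):
    """Rebuild the original column from the dictionary and the value IDs."""
    return [dictionary[i] for i in value_ids]


# Hypothetical country column: 8 rows but only 3 distinct values.
country = ["US", "DE", "US", "US", "FR", "DE", "US", "FR"]
dictionary, value_ids = dictionary_encode(country)
print(dictionary)  # ['DE', 'FR', 'US']
print(value_ids)   # [2, 0, 2, 2, 1, 0, 2, 1]
```

With three distinct values, each row needs only two bits instead of a full string, and the gain grows with the number of rows.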

Better parallel Processing:


In a column store, data is already vertically partitioned. This means that operations on different columns can easily be processed in parallel. If multiple columns need to be searched or aggregated, each of these operations can be assigned to a different processor core.
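
The sketch below shows the shape of this decomposition in Python: each column is an independent task submitted to a worker pool. The column names and values are hypothetical. Note that threads in standard CPython do not run CPU-bound work on several cores at once, so this illustrates how the work splits per column rather than demonstrating a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical column-store table: one list per column.
columns = {
    "quantity": [3, 1, 4, 1, 5],
    "price": [10, 20, 30, 40, 50],
    "discount": [0, 5, 0, 10, 5],
}


def aggregate(item):
    """Sum a single column; it needs no data from any other column."""
    name, values = item
    return name, sum(values)


# Each column is handed to the pool as an independent task.
with ThreadPoolExecutor() as pool:
    totals = dict(pool.map(aggregate, columns.items()))

print(totals)  # {'quantity': 14, 'price': 150, 'discount': 20}
```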

Advantages and disadvantages of row-based tables:

Row-based tables have advantages in the following circumstances:

The application needs to process only a single record at a time (many selects and/or updates of single records).

The application typically needs to access a complete record (or row).

Neither aggregations nor fast searching are required.

The table has a small number of rows (e.g. configuration tables, system tables).

Row-based tables have disadvantages in the following circumstances:

Analytic applications that use aggregations and require fast search and processing. In row-based tables, all data in a row has to be read even though the requirement may be to access data from only a few columns.

Which type of tables should be preferred - Row-based or Column-based?

For analytic applications, where aggregations are used and fast search and processing are required, row-based storage is not a good fit. In row-based tables, all data in a row has to be read even though the requirement may be to access data from only a few columns. Hence these queries on huge amounts of data take a lot of time.

In columnar tables, this information is stored physically next to each other, significantly increasing
the speed of certain data queries.

The following example shows the different usage of column and row storage, and positions them relative to row and column queries. Column storage is most useful for OLAP queries (queries that use SQL aggregate functions), because these queries read just a few attributes from every data entry. For traditional OLTP queries (queries that do not use SQL aggregate functions), it is more advantageous to store all attributes side by side in row tables. HANA combines the benefits of both row- and column-storage tables.

To enable fast on-the-fly aggregations and ad-hoc reporting, and to benefit from compression mechanisms, it is recommended that transaction data is stored in a column-based table.

The SAP HANA database allows joining row-based tables with column-based tables. However, it is more efficient to join tables that are located in the same row or column store. For example, master data that is frequently joined with transaction data should also be stored in column-based tables.

HANA Memory Usage


So far we have learnt about SAP HANA Architecture and Column Data Storage Vs Row Data Storage.
In this article we will explain memory usage and calculation in SAP HANA.

Introduction:

SAP HANA is a leading in-memory database and data management platform, specifically developed
to take full advantage of the capabilities provided by modern hardware to increase application
performance. By keeping all relevant data in main memory (RAM), data processing operations are
significantly accelerated.

"SAP HANA has become the fastest growing product in SAP's history."

A fundamental SAP HANA resource is memory. Understanding how the SAP HANA system
requests, uses and manages this resource is crucial to the understanding of SAP HANA. SAP
HANA provides a variety of memory usage indicators, to allow monitoring, tracking and alerting.

This article explores the key concepts of SAP HANA memory utilization, and shows how to
understand the various memory indicators.

Memory Concepts:
As an in-memory database, it is critical for SAP HANA to handle and track its memory consumption carefully and efficiently. For this purpose, the SAP HANA database pre-allocates and manages its own memory pool and provides a variety of memory usage indicators to allow monitoring.

SAP HANA tracks memory from the perspective of the host. The most important concepts are as
follows:

Physical memory:
The amount of (system) physical memory available on the host.

SAP HANA Allocated memory


The memory pool reserved by SAP HANA from the operating system.

SAP HANA Used memory


The amount of memory from this pool that is actually used by the SAP HANA database.

Determining Physical Memory Size:


Physical memory (DRAM) is the basis for all memory discussions. On most SAP HANA hosts, it ranges from 256 gigabytes to 2 terabytes. It is used to run the Linux operating system, SAP HANA, and all other programs that run on the host. The following table lists the various ways of determining the amount of physical memory:

You can use the M_HOST_RESOURCE_UTILIZATION view to explore the amount of Physical
Memory as follows:

Determine Available Physical Memory:

Execute the SQL query:
select round((USED_PHYSICAL_MEMORY + FREE_PHYSICAL_MEMORY) /1024/1024/1024, 2)
as "Physical Memory GB"
from PUBLIC.M_HOST_RESOURCE_UTILIZATION;

Execute the Linux command:
cat /proc/meminfo | grep MemTotal

Determine Free Physical Memory:

Execute the SQL query:
select round(FREE_PHYSICAL_MEMORY/1024/1024/1024, 2)
as "Free Physical GB"
from PUBLIC.M_HOST_RESOURCE_UTILIZATION;

Execute the Linux command:
awk 'BEGIN {sum = 0}; /^(MemFree|Buffers|Cached):/ {sum = sum + $2}; END {print sum}' /proc/meminfo
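
The awk command prints its sum in kilobytes, whereas the SQL query reports gigabytes. As a minimal sketch, the Python below performs the same sum and converts it so the two figures can be compared. The function name `free_memory_kb` and the sample excerpt are made up for illustration; on a real Linux host you would pass in the contents of /proc/meminfo.

```python
def free_memory_kb(meminfo_text):
    """Sum the MemFree, Buffers and Cached entries (in kB) of /proc/meminfo text."""
    wanted = {"MemFree", "Buffers", "Cached"}
    total = 0
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name in wanted:
            total += int(rest.split()[0])
    return total


# Hypothetical excerpt; on a Linux host use open("/proc/meminfo").read() instead.
sample = """MemTotal:       264000000 kB
MemFree:         52428800 kB
Buffers:          1048576 kB
Cached:          10485760 kB
SwapCached:             0 kB
"""
free_kb = free_memory_kb(sample)
print(free_kb)                          # 63963136
print(round(free_kb / 1024 / 1024, 2))  # 61.0
```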

SAP HANA Allocated Memory Pool:


The SAP HANA database (across its different processes) reserves a pool of memory before actual
use.

This pool of allocated memory is pre-allocated from the operating system over time, up to a
predefined global allocation limit, and is then efficiently used as needed by the SAP HANA database
code. More memory is allocated to the pool as used memory grows. If used memory nears the
global allocation limit, the SAP HANA database may run out of memory if it cannot free memory.
The default allocation limit is 90% of available physical memory, but this value is configurable.

To find the global allocation limit of the database, run the following SQL query:
select HOST, round(ALLOCATION_LIMIT/1024/1024/1024, 2) as "Allocation Limit GB"
from PUBLIC.M_HOST_RESOURCE_UTILIZATION;

Effective Allocation Limit:


In addition to the global allocation limit, each process running on the host has an allocation limit, the
process allocation limit. Given that all processes cannot collectively consume more memory than
the global allocation limit, each process also has what is called an effective allocation limit. The
effective allocation limit of a process specifies how much physical memory a process can in reality
consume given the current memory consumption of other processes.

Example:
A single-host system has 100 GB physical memory. Both the global allocation limit and the individual
process allocation limits are 90% (default values). This means the following:

Collectively, all processes of the HANA database can use a maximum of 90 GB.

Individually, each process can use a maximum of 90 GB.


If 2 processes are running and the current memory pool of process 1 is 50 GB, then the effective
allocation limit of process 2 is 40 GB. This is because process 1 is already using 50 GB and
together they cannot exceed the global allocation limit of 90 GB.
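The arithmetic of this example can be written down as a small Python sketch. It is a simplified model, assuming that a process is bounded both by its own process allocation limit and by whatever the other processes leave of the global allocation limit. The function name `effective_allocation_limit` is hypothetical and is not part of SAP HANA.

```python
def effective_allocation_limit(global_limit_gb, process_limit_gb, other_pools_gb):
    """Return how much memory (GB) one process can consume in this simplified model."""
    # Memory that the other processes have left of the global limit.
    remaining = global_limit_gb - sum(other_pools_gb)
    # The process can never exceed its own limit, and never go below zero.
    return max(0, min(process_limit_gb, remaining))

physical_gb = 100
global_limit = physical_gb * 90 // 100    # default: 90% of physical memory
process_limit = physical_gb * 90 // 100   # default process allocation limit

# Process 1 currently holds a 50 GB pool; what is left for process 2?
print(effective_allocation_limit(global_limit, process_limit, [50]))
```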

SAP HANA Used Memory:


Used memory serves several purposes:

Program code and stack

Working space and data tables (heap and shared memory)


The program code area contains the SAP HANA database itself while it is running. Different parts
of SAP HANA can share the same program code.

The stack is needed to do actual computations.

The heap and shared memory are the most important part of used memory. They are used for working
space, temporary data and for storing all data tables.

You can use the M_SERVICE_MEMORY view to explore the amount of SAP HANA Used Memory
as follows:

Total Memory Used:


SELECT round(sum(TOTAL_MEMORY_USED_SIZE/1024/1024)) AS "Total Used MB"
FROM SYS.M_SERVICE_MEMORY;

Code and Stack Size:


SELECT round(sum(CODE_SIZE+STACK_SIZE)/1024/1024) AS "Code+stack MB"
FROM SYS.M_SERVICE_MEMORY;

Total Memory Consumption of All Columnar Tables:


SELECT round(sum(MEMORY_SIZE_IN_TOTAL)/1024/1024) AS "Column Tables MB"
FROM M_CS_TABLES;

Total Memory Consumption of All Row Tables

SELECT round(sum(USED_FIXED_PART_SIZE +
USED_VARIABLE_PART_SIZE)/1024/1024) AS "Row Tables MB"
FROM M_RS_TABLES;

Total Memory Consumption of All Columnar Tables by Schema:


SELECT SCHEMA_NAME AS "Schema",
round(sum(MEMORY_SIZE_IN_TOTAL) /1024/1024) AS "MB"
FROM M_CS_TABLES GROUP BY SCHEMA_NAME ORDER BY "MB" DESC;

Memory Consumption of Columnar Tables:


The SAP HANA database loads columnar tables into memory column by column only upon use.
This is sometimes called "lazy loading". This means that columns that are never used are not
loaded, which avoids memory waste.
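The load-on-first-use idea can be pictured with a small Python model. This is only an illustration of lazy loading, not SAP HANA code. The class `LazyColumnTable` and its methods are hypothetical names, and a plain dictionary stands in for persistent storage.

```python
class LazyColumnTable:
    """Toy model of a column table whose columns load on first use."""

    def __init__(self, columns_on_disk):
        self._disk = columns_on_disk  # stands in for persistent storage
        self._memory = {}             # columns currently loaded

    def read(self, column):
        # Load the column only when it is touched for the first time.
        if column not in self._memory:
            self._memory[column] = list(self._disk[column])
        return self._memory[column]

    def unload(self, column):
        # Free memory; the column stays available on disk.
        self._memory.pop(column, None)

    def load_all(self):
        # Load every column, e.g. before measuring the worst-case size.
        for column in self._disk:
            self.read(column)

    def loaded_columns(self):
        return sorted(self._memory)


table = LazyColumnTable({"ID": [1, 2, 3], "NAME": ["a", "b", "c"], "NOTE": ["x", "y", "z"]})
table.read("ID")
print(table.loaded_columns())   # only the column that was used
table.load_all()
print(table.loaded_columns())   # every column
```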

When the SAP HANA database runs out of allocated memory, it may also unload rarely used
columns to free up some memory. Therefore, if it is important to precisely measure the total, or
"worst case", amount of memory used for a particular table, it is best to ensure that the table is fully
loaded first by executing the following SQL statement:
LOAD table_name ALL;
To examine the memory consumption of columnar tables, you can use the M_CS_TABLES and
M_CS_COLUMNS views.

The following examples show how you can use these views to examine the amount of memory
consumed by a specific table. You can also see which of its columns are loaded and the
compression ratio that was accomplished.

List All Columnar Tables of Schema 'SYSTEM':


SELECT TABLE_NAME AS "Table", round(MEMORY_SIZE_IN_TOTAL/1024/1024, 2) as "MB"
FROM M_CS_TABLES WHERE SCHEMA_NAME = 'SYSTEM' ORDER BY "MB" DESC;

Show Column Details of Table "TABLE1":


SELECT COLUMN_NAME AS "Column", LOADED AS "Is Loaded",
round(UNCOMPRESSED_SIZE/1024/1024) AS "Uncompressed MB",
round(MEMORY_SIZE_IN_MAIN/1024/1024) AS "Main MB",
round(MEMORY_SIZE_IN_DELTA/1024/1024) AS "Delta MB",
round(MEMORY_SIZE_IN_TOTAL/1024/1024) AS "Total Used MB",
round(COMPRESSION_RATIO_IN_PERCENTAGE/100, 2) AS "Compr. Ratio"
FROM M_CS_COLUMNS WHERE TABLE_NAME = 'TABLE1';

Note: The M_CS_TABLES and M_CS_COLUMNS views contain a lot of additional information
(such as cardinality, main-storage versus delta storage and more). For example, use the following
query to obtain more information:
SELECT * FROM M_CS_COLUMNS WHERE TABLE_NAME = '<table_name>' AND COLUMN_NAME = '<column_name>';

Memory Consumption of Row-Ordered Tables:


Several system tables are in fact row-ordered tables. You can use the M_RS_TABLES view to
examine the memory consumption of row-ordered tables.

For instance, you can execute the following SQL query, which lists all row tables of schema "SYS"
by descending size:
SELECT SCHEMA_NAME, TABLE_NAME, round((USED_FIXED_PART_SIZE +
USED_VARIABLE_PART_SIZE)/1024/1024, 2) AS "MB Used"
FROM M_RS_TABLES
WHERE schema_name = 'SYS' ORDER BY "MB Used" DESC, TABLE_NAME

Memory Consumption Configuration:


By default, SAP HANA can pre-allocate up to 90% of the available physical memory on the host.
There is normally no reason to change the value of this variable, except in the case where a license
was purchased for less than the total of the physical memory. In this case, you should change the
global allocation limit to remain in compliance with the license.

Example 1:
You have a server with 512GB, but purchased an SAP HANA license for only 384 GB. Set the
global_allocation_limit to 393216 (384 * 1024 MB).

Example 2:
You have a distributed HANA system on four hosts with 512GB each, but purchased an SAP HANA
license for only 768 GB. Set the global_allocation_limit to 196608 (192 * 1024 MB on each host).
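Both examples follow the same calculation: convert the licensed amount to MB and divide it evenly across the hosts. Here is a minimal Python sketch of that arithmetic. The helper name `global_allocation_limit_mb` is hypothetical, and the even split across hosts is an assumption taken from Example 2.

```python
def global_allocation_limit_mb(licensed_gb, hosts=1):
    """Per-host global_allocation_limit in MB for a license given in GB."""
    if hosts < 1:
        raise ValueError("hosts must be at least 1")
    # 1 GB = 1024 MB; the licensed memory is split evenly across all hosts.
    return licensed_gb * 1024 // hosts

print(global_allocation_limit_mb(384))           # Example 1: single host
print(global_allocation_limit_mb(768, hosts=4))  # Example 2: four hosts
```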

Resident memory:
Resident memory is the physical memory actually in operational use by a process.

Over time, the operating system may "swap out" some of a process' resident memory, according to
a least-recently-used algorithm, to make room for other code or data. Thus, a process' resident
memory size may fluctuate independently of its virtual memory size. In a properly sized SAP HANA
appliance there is enough physical memory, and thus swapping is disabled and should not be
observed.

To display the size of the Physical Memory and Resident part, you can use the following SQL
command:
select HOST, round((USED_PHYSICAL_MEMORY +
FREE_PHYSICAL_MEMORY)/1024/1024/1024, 2) as "Physical Memory GB",
round(USED_PHYSICAL_MEMORY/1024/1024/1024, 2) as "Resident GB"
from PUBLIC.M_HOST_RESOURCE_UTILIZATION

Memory Sizing:
Memory sizing is the process of estimating, in advance, the amount of memory that will be required
to run a certain workload on SAP HANA. To understand memory sizing, you will need to answer the
following questions:

1. What is the size of the data tables that will be stored in SAP HANA?
You may be able to estimate this based on the size of your existing data, but unless you precisely
know the compression ratio of the existing data and the anticipated growth factor, this estimate may
only be partially meaningful.

2. What is the expected compression ratio that SAP HANA will apply to these tables?
The SAP HANA Column Store automatically uses a combination of various advanced compression
algorithms (dictionary, run-length encoding (RLE), sparse, and more) to best compress each table
column separately. The achieved compression ratio depends on many factors, such as the nature of
the data, its organization and data types, the presence of repeated values, the number of indexes
(SAP HANA requires fewer indexes), and more.

3. How much extra working memory will be required for DB operations and temporary
computations?
The amount of extra memory will somewhat depend on the size of the tables (larger tables will
create larger intermediate result-tables in operations like joins), but even more on the expected
workload in terms of the number of users and the concurrency and complexity of the analytical
queries (each query needs its own workspace).

SAP Notes 1514966, 1637145 and 1736976 provide additional tools and information to help you
size the required amount of memory, but the most accurate method is ultimately to import several
representative tables into a SAP HANA system, measure the memory requirements, and extrapolate
from the results.

SAP HANA Studio:


You can view some of the most important memory indicators on the Overview tab of the SAP HANA
studio administrative perspective:

For even more details, check out the new Memory Overview feature of the SAP HANA studio. To
access it, right click on a system in the Systems View, and select "Open Memory Overview" in the
context menu, as follows:

This will open the Memory Overview, which looks as follows:

Note: To view the Memory Overview, you need Monitoring privileges. For example, use the following SQL
statement (replace 'youruser' with the actual user name):
call GRANT_ACTIVATED_ROLE('sap.hana.admin.roles::Monitoring','youruser');

Summary:
SAP HANA maintains many system views and memory indicators, to provide a precise way to
monitor and understand the SAP HANA memory utilization. The most important of these indicators
is Used Memory and the corresponding historic snapshots. In turn, it is possible to drill down into
very detailed reports of memory utilization using additional system views, or by using the convenient
Memory Overview from the SAP HANA studio.

Since SAP HANA contains its own memory manager and memory pool, external indicators, like the
host-level Resident Memory size, or the process-level virtual and resident memory sizes, can be
misleading when estimating the real memory requirements of a SAP HANA deployment.

System Generated Schemas


A database schema is a way to logically group objects such as tables, views, stored procedures etc.
Think of a schema as a container of objects.

Types of Schemas
There are 3 types of schemas.
1. User Defined Schema
2. System Defined Schema
3. SLT Derived Schema


User Defined Schema:
These are created by a user (DBA or system administrator).

SLT Derived Schema:


When SLT is configured, it creates a schema in the HANA system. All the tables replicated into the HANA
system are contained in this schema.

System Defined Schema:


These schemas are delivered with the SAP HANA database and contain HANA system
information. There are system schemas like _SYS_BIC, _SYS_BI, _SYS_REPO,
_SYS_STATISTICS etc.

System Generated Schemas

_SYS_BIC:
This schema contains all the column views of activated objects. When the user activates an
Attribute View, Analytic View, Calculation View, Analytic Privilege or Procedure, the respective run-time
objects are created under _SYS_BIC > Column Views.

_SYS_REPO:
All objects in the system are available in the repository. This schema contains the list
of activated objects, inactive objects, package details, runtime object information etc.
The _SYS_REPO user must also have the SELECT privilege with grant option on the data schema.
Read more about "GRANT SELECT PRIVILEGE ON _SYS_REPO"

_SYS_BI:
This schema stores all the metadata of created column Views. It contains the tables for created
Variables, Time Data (Fiscal, Gregorian), Schema Mapping and Content Mapping tables.

_SYS_STATISTICS:
This schema contains the data collected by the statistics server, such as historical monitoring data
and alerts about the system.

_SYS_XS:
This schema is used for SAP HANA Extended Application Services.

Backup and Recovery


SAP HANA is an in-memory database. This means all the data is in RAM. As we all know,
RAM is volatile memory and all its data is lost when the power goes down.
This leads to a very obvious question:
What happens when the power goes down in SAP HANA? Do we lose all the valuable
data?

The answer is NO.


SAP HANA is an in-memory database, which means all the data resides in RAM. But the data
is also kept on disk storage so that it can be restored after a failure.

In-memory computing is safe: The SAP HANA database holds the bulk of its data in
memory for maximum performance, but still uses persistent storage (disk memory) to
provide a fallback in case of failure.

Why Backup is Required?


In database technology, atomicity, consistency, isolation, and durability (ACID) is a set
of requirements that guarantees that database transactions are processed reliably:
A transaction has to be atomic. That is, if part of a transaction fails, the entire transaction
has to fail and leave the database state unchanged.
The consistency of a database must be preserved by the transactions that it performs.
Isolation ensures that no transaction is able to interfere with another transaction.
Durability means that after a transaction has been committed it will remain committed.

While the first three requirements are not affected by the in-memory concept, durability is a
requirement that cannot be met by storing data in main memory alone.
Main memory is volatile storage. That is, it loses its content when it is out of electrical
power. To make data persistent, it has to reside on non-volatile storage, such as hard
drives, SSD, or Flash devices.

How Backup and Recovery Works in SAP HANA?

The main memory (RAM) in SAP HANA is divided into pages. When a transaction changes
data, the corresponding pages are marked and written to disk storage in regular intervals.
In addition, a database log captures all changes made by transactions. Each committed
transaction generates a log entry that is written to disk storage. This ensures that all
transactions are permanent.

Figure below illustrates this. SAP HANA stores changed pages in savepoints, which are
asynchronously written to disk storage in regular intervals (by default every 5 minutes).
The log is written synchronously. That is, a transaction does not return before the
corresponding log entry has been written to persistent storage, in order to meet the
durability requirement, as described above.

After a power failure, the database can be restarted like a disk-based database.
The database pages are restored from the savepoints, and then the database logs are
applied (rolled forward) to restore the changes that were not captured in the savepoints.

This ensures that the database can be restored in memory to exactly the same state as
before the power failure.
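The interplay of savepoints and the log can be illustrated with a toy Python model. It is a sketch under simplifying assumptions, not how SAP HANA is implemented: the class `TinyStore` is hypothetical, plain dictionaries stand in for memory and disk, and the log is simply emptied at each savepoint.

```python
import copy

class TinyStore:
    """Toy key-value store with a savepoint and a redo log."""

    def __init__(self):
        self.memory = {}     # volatile in-memory data
        self.savepoint = {}  # last image written to "disk"
        self.log = []        # committed changes since the last savepoint

    def commit(self, key, value):
        # The log entry is written first, then the change is applied in memory.
        self.log.append((key, value))
        self.memory[key] = value

    def write_savepoint(self):
        # Persist the current state; earlier log entries are no longer needed here.
        self.savepoint = copy.deepcopy(self.memory)
        self.log.clear()

    def crash(self):
        # Power failure: everything held in memory is lost.
        self.memory = {}

    def recover(self):
        # Restore the savepoint, then roll the log forward.
        self.memory = copy.deepcopy(self.savepoint)
        for key, value in self.log:
            self.memory[key] = value


store = TinyStore()
store.commit("A", 1)
store.write_savepoint()
store.commit("B", 2)   # captured only in the log
store.commit("A", 3)   # captured only in the log
store.crash()
store.recover()
print(store.memory)
```

After recovery the store holds the changes made after the savepoint as well, because they were replayed from the log.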

Data backup can be taken manually or can be scheduled.

Few Important Concepts:


What is Database Backup and Recovery
Backup and Recovery is the process of copying/storing data for the specific purpose of
restoring. Backing up files can protect against accidental loss of user data, database
corruption, hardware failures, and even natural disasters.

Savepoint:
A savepoint is the point at which changed data is written to disk. During recovery after an
unexpected shutdown or crash, the database starts from the last savepoint and applies the
changes recorded in the log.
The database administrator determines the frequency of savepoints.

Data and Log:


Data backups

Contain the current payload of the data volumes (data and undo information)

Manual (SAP HANA studio, SQL commands), or scheduled (DBA Cockpit)

Log backups

Contain the content of closed log segments; the backup catalog is also written
as a log backup

Automatic (asynchronous) whenever a log segment is full or the timeout for
log backup has elapsed

The SAP HANA Studio at a glance:

HANA Studio is an Eclipse-based, integrated development environment (IDE) that is
used to develop artifacts in a HANA server.

It enables technical users to manage the SAP HANA database, to create and
manage user authorizations, to create new or modify existing models of data etc.

It is a client tool, which can be used to access local or remote HANA system.

Supported Platforms:
The SAP HANA studio runs on the Eclipse platform 3.6. We can use the SAP HANA studio on the
following platforms:

Microsoft Windows 32-bit (x86) and 64-bit (x64) versions of: Windows XP, Windows Vista, Windows 7

SUSE Linux Enterprise Server SLES 11: x86 64-bit version


Note: For Mac OS, HANA studio is available but there is no HANA client for that.

System Requirements:
Java JRE 1.6 or 1.7 must be installed to run the SAP HANA studio. The Java runtime must be
specified in the PATH variable. Make sure to choose the correct Java variant for installation of SAP
HANA studio:

For a 32-bit installation, choose a 32-bit Java variant.

For a 64-bit installation, choose a 64-bit Java variant.

How to Download and Install SAP HANA Studio:


To download SAP HANA Studio, check the article:
SAP HANA Studio Download

HANA Client:
HANA Client is the piece of software which enables you to connect any other entity, including
non-native applications, to a HANA server. This "other" entity can be, say, an NW Application Server, an
IIS server etc.
The HANA Client installation also provides JDBC and ODBC drivers. This enables applications written
in .Net, Java etc. to connect to a HANA server, and use the server as a remote database. So,
consider the client as the primary connection enabler to the HANA server.
HANA Client is installed separately from the HANA studio.

Installation Paths:
If we do not specify an Installation Path during installation, the following default values apply:

Microsoft Windows 32-bit -> C:\Program Files\sap\hdbstudio

Microsoft Windows 64-bit -> C:\Program Files\sap\hdbstudio

Microsoft Windows 32-bit (x86) studio on a 64-bit system -> C:\Program Files (x86)\sap\hdbstudio

Linux x86, 64-bit -> /usr/sap/hdbstudio

How to Open SAP HANA studio?


In Microsoft Windows:
1. Go to the Start menu.
2. Select Start > All Programs > SAP HANA > SAP HANA Studio.
The SAP HANA studio starts.

In Linux:
1. Open a shell and go to the installation directory, such as /usr/sap/hdbstudio
2. Execute the following command "./hdbstudio".
The SAP HANA studio starts.

SAP HANA Studio Perspectives:


The SAP HANA studio provides an environment for Administration, Modeling and Data
Provisioning. It offers several predefined user interface layouts, called Perspectives, each
addressing a different type of task.

Modeler perspective:
Provides views and menu options that enable you to define your analytic model, for
example, attribute, analytic, and calculation views of SAP HANA data.

SAP HANA Development perspective:


Provides views and menu options that enable you to perform all the tasks relating to
application development on SAP HANA XS, for example: to manage application-development
projects, display content of application packages, and browse the SAP HANA
repository.

The Debug perspective:


Provides views and menu options that help you test your applications, for example: to view
the source code, monitor or modify variables, and set break points.

Administration Console perspective:


Provides views that enable you to perform administrative tasks on SAP HANA instances.

Catalog and Content:


In HANA Studio every HANA system has two main sub-nodes, Catalog and Content.

Catalog

The Catalog represents SAP HANA's data dictionary, i. e. all data structures,
tables, and data which can be used.

All the physical tables and views can be found under the Catalog node.

This node contains a list of Schemas which are used to categorize tables
according to user defined groupings.

Content

The Content represents the design-time repository which holds all information
of data models created with the Modeler.

Physically these models are stored in database tables which are also visible
under Catalog.

The Models are organized in Packages. The Contents node just provides a
different view on the same physical data.

Add HANA System in HANA Studio


In order to connect to a SAP HANA system we need to know the Server Host ID and the Instance
Number. We also need a Username & Password combination to connect to the instance. The
Navigator on the left side shows all the HANA systems added to the SAP HANA Studio.

Steps to add new HANA system:


1. Right click in the Navigator space and click on Add System.

2. Enter the HANA system details, i.e. the Hostname & Instance Number, and click Next.

3. Enter the database username & password to connect to the SAP HANA database. Click on Next
and then Finish.

The SAP HANA system now appears in the Navigator.


Reporting in HANA: Overview


Once the modeling view (attribute view, analytic view and calculation view) is created in SAP HANA,
you can see the output in HANA Studio.
Finally it will be non-technical end users (for example manager, sales executive, board members
etc.) who will consume the modeling views to visualize different business scenarios.
Asking a non-technical user to use HANA Studio is not really practical. You also would not want to
put the modeling software in the hands of all your users.

That's why we need separate reporting tools which can connect to SAP HANA, take the data
from modeling views and show in a nice, easy to understand format.

Currently there are a number of tools/applications which can be used for reporting in SAP HANA. SAP
suggests using frontends of the SAP BusinessObjects BI Suite. Recommended frontend tools
include SAP BusinessObjects Crystal Reports, Analysis Office, and Explorer.

Reporting on SAP HANA can be done in most of SAP's BusinessObjects BI Suite of applications, or
in tools which can create and consume MDX queries and data.
Microsoft Excel can also be used for reporting on SAP HANA.
You can use Excel to connect to SAP HANA, access the modeling views and slice and dice the data
to create meaningful reports.
Check Build Reports in Excel using SAP HANA Data

Note: You can think of SAP Business Objects BI Suite as a collection of different front-end tools
provided by SAP. The most important of them are:

SAP BusinessObjects Analysis

SAP BusinessObjects Explorer

SAP Lumira

SAP Business Objects Crystal Reports

SAP BusinessObjects Dashboards

Reporting on SAP HANA - Clients and Connectivity Options:


SAP HANA supports different connectivity options such as MDX, SQL, BICS etc. It provides JDBC
and ODBC drivers which are standardized programming interfaces to access relational databases.

Below image shows the different connectivity options supported for frontend tools and SAP HANA.

ODBO - OLE DB for OLAP

Microsoft-driven specification for multidimensional (MDX) reporting

Requests are sent to the database via MDX (MultiDimensional eXpression language)

ODBC - Open DataBase Connectivity

Microsoft-driven specification for relational reporting

Database requests are made via SQL (Structured Query Language)

Heavily adopted in industry

No longer Microsoft-centric - Unix and Linux drivers exist

JDBC - Java DataBase Connectivity

Relational reporting drivers specified by the Java community. Popular on Unix platforms.

BICS - BI Consumer Services

SAP proprietary interface that offers advantages for OLAP access over MDX on multidimensional reporting objects

Common driver technology used by SAP BusinessObjects Analysis, Office Edition for connectivity to SAP NetWeaver BW

SQLDBC - SAP's native database SDK

Build Reports in Excel using HANA


In the article SAP HANA Modeling Introduction and SAP HANA Calculation View we explained the
basics of SAP HANA data modeling.
This article describes how to build a simple report using data stored on SAP HANA.

Prerequisite:

You have access to an SAP HANA System

You have created a modeling view


Check Build Your First SAP HANA Modeling View in 10 Minutes

Download and install SAP HANA Client:


Excel can connect to SAP HANA using the MDX language in the form of pivot tables. These in turn
allow users to "slice and dice" data as they require, to extract the metrics they need.

In order to make MDX connections to SAP HANA, the SAP HANA Client software is needed. This is
separate from the Studio, and must be installed on the client system.

Like the Studio itself, it can be found on the SAP Service Marketplace. Additionally, SAP provides a
developer download of the client software on SDN, at the following link:
HANA Developer Edition-SAP HANA Client

Note: Download and install the appropriate SAP HANA Client as per your operating system version
and Microsoft Office installation.
If you are using a 64-bit operating system in combination with a 32-bit Office installation, then you'll
need the 32-bit version of the SAP HANA Client software.

Once the software is installed, there is no shortcut created on your desktop, and no entry will be
created in your "Start" menu, so don't be surprised to not see anything to run.

Connect to SAP HANA from Excel:

1. Open Excel.

2. Go to the Data tab, and click on From Other Sources, then From Data Connection Wizard,
as shown:

3. Select Other/Advanced, then SAP HANA MDX provider, and then click Next.

4. The SAP HANA Logon dialog will appear, so enter your Host, Instance, and login information
(the same information you use to connect to SAP HANA with the Studio).

5. Click on Test Connection to validate the connection. If the test succeeds, click on OK to
choose the Modeling views to which you want to connect. Select the package which contains the
modeling views.

6. Click on the name of the analytic view or calculation view. Click "Finish".

7. On this screen there's a checkbox Save password in file - this will avoid having to type in the
SAP HANA password every time the Excel file is opened - but the password is stored in the Excel
file, which is a little less secure.

8. Click on the Finish button to create the connection to SAP HANA, and your View.

9. Now that you have established your connection to the SAP HANA database and specified
the data that you want to use, you can start exploring it in Microsoft Excel, using a pivot table.

Congratulations! You now have your reporting application available in Microsoft Excel.

SAP HANA Text Analysis

Mining social media data for customer feedback is one of the greatest untapped
opportunities for customer analysis in many organizations today.

As many are aware, twenty-first century corporations are facing a crisis. Many corporations have
been accurately and comprehensively storing data for years. The data comes in a variety of forms, such as
social media posts, email, blogs, news, feedback, tweets, business documents etc.

It is very important to extract meaningful information without having to read every single sentence.
But what is meaningful information? The extraction process should identify the "who", "what",
"where", "when" and "how much" (among other things) from this data.
For example, use social media data to find out -

What people are saying about my brand or products?

How many people recommend my brand vs. advocate against it?

Text Analysis is the solution to all these problems.


In this article we will explain:

What is Text Analysis?

Why Text Analysis is so important for business?

How does SAP HANA support text analysis?

Before understanding Text Analysis, you will have to first understand Structured Data and
Unstructured Data.

Structured and Unstructured Data:


Structured Data:
Data that resides in a fixed field within a record or file is called structured data. This includes data
contained in relational databases and spreadsheets.
For example, data stored in database tables is structured data.

Structured data has the advantage of being easily entered, stored, queried and analyzed.

Unstructured Data:
The phrase "unstructured data" usually refers to information that doesn't reside in a traditional
row-column database.

Unstructured data files often include text and multimedia content. Examples include e-mail
messages, word processing documents, videos, photos, audio files, presentations, webpages and
many other kinds of business documents.

Digging through unstructured data can be cumbersome and costly. Email is a good example of
unstructured data. It's indexed by date, time, sender, recipient, and subject, but the body of an
email remains unstructured. Other examples of unstructured data include books, documents,
medical records, and social media posts.

Why unstructured data is so important for business?


Experts estimate that 80 to 90 percent of the data in any organization is unstructured. And the
amount of unstructured data in enterprises is growing significantly -- often many times faster than
structured databases are growing.

The only problem is extracting meaningful information from unstructured data.

What is Text Analysis?

Text Analysis is the process of analyzing unstructured text, extracting relevant information
and then transforming that information into structured information that can be leveraged in
different ways.

Text Analysis refers to the ability to do Natural Language Processing, linguistically understand the
text and apply statistical techniques to refine the results.


With the help of text analysis we can model and structure the information content of unstructured
data for the purpose of business analysis, research and investigation.

Mapping Business Needs to Text Analysis

Example of Meaning Extraction from a sentence

There are a few important techniques used in Text Analysis:

Full Text Search

Full Text Indexing

Fuzzy Search

Let's have a look into them one by one.

Full Text Search:


The primary function of full-text search is to optimize linguistic searches.

Full text search is designed to perform linguistic (language-based) searches against text and
documents stored in your database.
In a full-text search, the search engine examines all of the words in every stored document as it tries
to match search criteria (text specified by a user).

Full Text Indexing:


When dealing with a small number of documents, it is possible for the full-text-search engine to
directly scan the contents of the documents with each query, a strategy called "serial scanning."
This is what some rudimentary tools, such as grep, do when searching.

However, when the number of documents to search is potentially large, the problem of full-text
search is often divided into two tasks: indexing and searching.

The indexing stage will scan the text of all the documents and build a list of search terms (often
called an index). In the search stage, when performing a specific query, only the index is referenced,
rather than the text of the original documents.
The indexer will make an entry in the index for each term or word found in a document, and possibly
note its relative position within the document.

Conceptually, full-text indexes support searching on columns in the same way that indexes support
searching through books.
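The two stages described above, indexing and searching, can be sketched in a few lines of Python. This is a generic illustration of an inverted index, not the SAP HANA full-text index implementation. The function names and sample documents are hypothetical, and tokenisation is reduced to lower-casing and splitting on non-word characters.

```python
import re
from collections import defaultdict

def build_index(documents):
    """Indexing stage: map each word to {doc_id: [positions in that document]}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, word in enumerate(re.findall(r"\w+", text.lower())):
            index[word].setdefault(doc_id, []).append(position)
    return index

def search(index, term):
    """Search stage: consult only the index, never the original text."""
    return sorted(index.get(term.lower(), {}))

# Hypothetical sample documents.
DOCS = {
    1: "SAP HANA is an in-memory database",
    2: "Full text search in SAP HANA",
    3: "A database stores data",
}

index = build_index(DOCS)
print(search(index, "database"))
print(search(index, "HANA"))
```

Because each entry also records word positions, the same structure could be extended to phrase searches.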

Fuzzy Search:
Also known as approximate string matching.
Fuzzy search is the technique of finding strings that match a pattern approximately (rather than
exactly).
It is a type of search that will find matches even when users misspell words or enter in only partial
words for the search.

A Real World Example:


If a user types "SAP HANA Tutorl" into Yahoo or Google (both of which use fuzzy matching), a list of
hits is returned along with the question "Did you mean 'SAP HANA Tutorial'?"
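One common way to implement approximate string matching is the edit (Levenshtein) distance, which counts the single-character insertions, deletions and substitutions needed to turn one string into another. The Python sketch below illustrates that general technique only. It is not how SAP HANA or the search engines mentioned above implement fuzzy search, and the names `edit_distance`, `suggest` and the sample titles are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def suggest(query, candidates, max_distance=2):
    """Return the closest candidate within max_distance edits, else None."""
    if not candidates:
        return None
    best = min(candidates, key=lambda c: edit_distance(query.lower(), c.lower()))
    if edit_distance(query.lower(), best.lower()) <= max_distance:
        return best
    return None

# Hypothetical list of known titles.
TITLES = ["SAP HANA Tutorial", "SAP HANA Studio", "SAP BW Overview"]
print(suggest("SAP HANA Tutorl", TITLES))
```

The `max_distance` threshold controls how tolerant the match is: a larger value accepts more misspellings but also more false hits.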

To know more about Fuzzy Search, please check
How to use the Fuzzy Search in SAP HANA

How Businesses Can Leverage Text Analysis:

All that tech talk is fine, but how can Text Analysis help companies make more money?

Below are a few real-world examples.

Automate the process of customer response:


There is an airline company that wanted to automate the process of responding to customer
requests via email. Using SAP Text Analysis technology, they are able to classify incoming emails
and accurately and effectively respond to requests. This also helps them reduce their call-center
costs.

Automate document categorization, search and retrieval:


Another example is of a financial services company that uses SAP Text Analysis technology as the
backbone for their automatic content enrichment platform. They use Text Analysis to discover metadata in input text data feeds, making document categorization, search and retrieval a seamless
process.

Find public intent to buy a product from Twitter:


Suppose your company is planning to launch a new product (say, a smartphone or a bike) in the market.
You can run text analysis on Twitter data to find out:

How many people are showing interest in buying this product?

How frequently are people talking about this new product?

Are there any negative comments or rumors circulating about this product?

Top Business Use Cases of Text Analysis:

Brand/ Product/ Reputation Management

Market research and social media monitoring, i.e. what people are saying about my brand or
products

Voice of the Customer/ Customer Experience Management

Do I need to step in and offer customer service?

How many people recommend my brand vs. advocate against it?


Search, Information Access, or Question Answering

Which bloggers are negative towards USA policies?

Which hotels in India get great reviews for their room service?

Competitive Intelligence

What competing products are people considering, and why?

Is competitors' media spend generating purchase intent?

Implementation of Text Analysis in SAP HANA:


The implementation of Text Analysis is one of the coolest features of SAP HANA.
Text analysis is supported from SAP HANA SP05.

SAP HANA Text Analysis has market-leading, out-of-the-box predefined entity types that are
packaged as part of the platform. Looking at a clause, sentence, paragraph, or document, the
technology can identify the "who", "what", "where", "when" and "how much" and classify it
accordingly.
For example, in the sentence "India celebrates Independence Day on 15th August", the
analysis can identify the country, holiday and month using HANA's predefined core extraction.

If you have read this far, you should have a clear understanding of Text Analysis.
If you have any doubts or questions, please leave a comment.

Implement SAP HANA Text Analysis in 10 minutes

In the article SAP HANA Text Analysis - One of the coolest features of SAP HANA, we explained what
Text Analysis is and why it is so important for business nowadays.
In this article we will show how you can easily implement Text Analysis in SAP HANA.

Use-case:
Suppose I am planning to buy a new iPhone 5 and I want to know what reviews on the internet say
about it. I want to get a pulse of the iPhone 5 before I buy it, not just from the critics but from
actual users like me.
I also want to search blogs, news and social media to find out whether people's reviews are
positive, negative or neutral.
Let's see how we can do this with the help of SAP HANA Text Analysis.

Prerequisites:
Download unstructured data (iPhone-News.pdf)
To save time, I have created a PDF file which contains news and blog articles on the iPhone 5.
Download it from here.

Create Table in SAP HANA

Create a table in SAP HANA which will contain this unstructured data. Replace <SCHEMA_NAME>
with your schema.
CREATE COLUMN TABLE <SCHEMA_NAME>."IPHONE_NEWS" (
"File_Name" NVARCHAR(20),
"File_Content" BLOB ,
PRIMARY KEY ("File_Name"));

Upload pdf file to SAP HANA using Python


Use the below Python code to upload pdf file to SAP HANA.

Note: Check below article to configure Python before running the Python code.
Power of Python Integrated with SAP HANA
import dbapi

# Assume the HANA host is abcd1234 and the instance number is 00 (port 30015),
# the SAP HANA user id is USER1 and the password is Password1
conn = dbapi.connect('abcd1234', 30015, 'USER1', 'Password1')
# Check whether the database connection was successful
print(conn.isconnected())
# Open a cursor
cur = conn.cursor()
# Open the file in read-only binary mode and read its content;
# the with block closes the file automatically
with open('iPhone-News.pdf', 'rb') as pdf_file:
    content = pdf_file.read()
# Save the content to the table - replace SCHEMA1 with your schema
cur.execute("INSERT INTO SCHEMA1.IPHONE_NEWS VALUES(?,?)", ('iPhoneNews.pdf', content))
print('pdf file uploaded to HANA')
# Close the cursor
cur.close()
# Close the connection
conn.close()

After executing the above Python script the pdf data will be uploaded in HANA table.

Implement Text Analysis in SAP HANA:


The most impressive thing about Text Analysis is how easy it is to implement it.
The only thing we need to do is run the following statement:
Create FullText Index "PDF_FTI" On <SCHEMA_NAME>."IPHONE_NEWS"("File_Content")
TEXT ANALYSIS ON
CONFIGURATION 'EXTRACTION_CORE_VOICEOFCUSTOMER';

This will create a full text index called "PDF_FTI" (you can use any name) on the BLOB column
"File_Content" of the table "IPHONE_NEWS".
With the execution of this script a new column table is created called $TA_PDF_FTI
($TA_<Index_Name>) that contains the result of our Text Analysis Process.

Note: If you do not see this table under your schema, try refreshing the schema.

That's it. Text Analysis is implemented; SAP HANA does the rest.

Further Analysis:
Two columns of the table $TA_PDF_FTI are particularly important for us.
TA_TOKEN
This column contains the extracted entity or element (for example, an identifiable person, place,
topic, organization, or sentiment).
TA_TYPE
This is the category the entity falls under. For example PERSON, PLACE, PRODUCT etc.

To find out people's reviews and sentiments about the iPhone, we can query the table $TA_PDF_FTI
like this.
SELECT "TA_TYPE",
       ROUND("SENTIMENT_VALUE" / "TOTAL_SENTIMENT_VALUE" * 100, 2) AS "SENTIMENT_VALUE_PERCENTAGE"
FROM
(
  SELECT "TA_TYPE", SUM("TA_COUNTER") AS "SENTIMENT_VALUE"
  FROM <SCHEMA_NAME>."$TA_PDF_FTI"
  WHERE "TA_TYPE" IN ('WeakPositiveSentiment','StrongPositiveSentiment','NeutralSentiment',
                      'WeakNegativeSentiment','StrongNegativeSentiment','MajorProblem','MinorProblem')
  GROUP BY "TA_TYPE"
) AS TABLE1,
(
  SELECT SUM("TA_COUNTER") AS "TOTAL_SENTIMENT_VALUE"
  FROM <SCHEMA_NAME>."$TA_PDF_FTI"
  WHERE "TA_TYPE" IN ('WeakPositiveSentiment','StrongPositiveSentiment','NeutralSentiment',
                      'WeakNegativeSentiment','StrongNegativeSentiment','MajorProblem','MinorProblem')
) AS TABLE2;

You will get the output like this.

The result shows that a larger percentage of people are giving this product a positive review.
Good, now I can go ahead and buy my new iPhone 5.
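The percentage calculation done by the query can also be written out in plain Python, which may help if you want to check the arithmetic by hand. The counts below are invented placeholder values, not the real results for iPhone-News.pdf, and the function name sentiment_percentages is made up for this sketch.

```python
def sentiment_percentages(counts):
    """Return each sentiment type's share of the total, in percent (2 decimals)."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ta_type: round(100.0 * value / total, 2)
            for ta_type, value in counts.items()}


# Hypothetical summed values per TA_TYPE (placeholders, not real results)
sample_counts = {
    "StrongPositiveSentiment": 30,
    "WeakPositiveSentiment": 45,
    "NeutralSentiment": 5,
    "WeakNegativeSentiment": 15,
    "StrongNegativeSentiment": 5,
}
shares = sentiment_percentages(sample_counts)
for ta_type, percent in sorted(shares.items(), key=lambda item: -item[1]):
    print(ta_type, percent)
```

Each value is divided by the overall total and multiplied by 100, which is the same calculation the outer SELECT performs on the two subqueries.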

What's Next:
We can use this full text index table to get a lot of information other than just sentiments.
Let's take a look at the structure of this table.
Column Name | Key | Description | Data Type
File_Name | Yes | This is the primary key of my table. If you have more than one column in your primary key, the $TA table will include every single column. | Same as in source table. In this case: NVARCHAR(20)
RULE | Yes | Stores the rule package that yielded the token. In my case: "Entity Extraction" | NVARCHAR(200)
COUNTER | Yes | Counts all tokens across the document | BIGINT
TOKEN | No | The token that was extracted (the "who", "what", "where", "when" and "how much") | NVARCHAR(250)
LANGUAGE | No | You can either specify a language column when you create the fulltext index or it can be derived from the text. In my case it was derived from the text and is English (en) | NVARCHAR(2)
TYPE | No | The token type, whether it is a "who", a "what", a "where", etc. | NVARCHAR(100)
NORMALIZED | No | Stores a normalized representation of the token. This becomes relevant e.g. for German with umlauts, or ß/ss. Normalization with regards to capitalization would not be as important as to justify this column. | NVARCHAR(250)
STEM | No | Stores the linguistic stemming information, e.g. the singular nominative for nouns, or the indicative for verbs. If text analysis yields several stems, only the first stem will be stored, assuming this to be the best match. | NVARCHAR(300)
PARAGRAPH | No | The paragraph number where my token is located in the document | INTEGER
SENTENCE | No | The sentence number where my token is located in the document | INTEGER
CREATED_AT | No | Creation timestamp | TIMESTAMP
Hope you liked this article. If you have any question please leave a comment.

SAP HANA Text Analysis using Twitter Data


In this tutorial, we are going to do the following things:
1. Use the Twitter API to get the tweets
2. Save the tweets into the SAP HANA system using a JDBC connection
3. Run Text Analysis in HANA on top of the tweets


After this tutorial, you will have learned about:

SAP HANA integration with Twitter

Programming with SAP HANA using JDBC in Java

SAP HANA Text Analysis

Prerequisites:
Register an Application at Twitter Developers:
As we are going to use the Twitter API to extract data from Twitter, we need to create an
application at Twitter Developers. We will need the application's authentication information
to invoke the APIs later.

If you haven't used Twitter before, you first need to create a Twitter account.
You can register an application and create your OAuth tokens at Twitter Developers by following
the steps below.

1. Log on with your Twitter account, click your profile picture and click "My applications".
2. Click the button "Create a new application".
3. Provide the information. You can give any name and description of your choice.
4. Follow the instructions and finally click "Create your Twitter application".
5. Scroll down the screen until you see the button "Create my access token", and click it to
generate the token.
6. After that, you will be able to see the OAuth settings. Save the values of Consumer
key, Consumer secret, Access token and Access token secret.

Download Twitter API Java library - Twitter4J


Twitter4J is an unofficial open source Java library for the Twitter API. With Twitter4J, you can easily
integrate your Java application with the Twitter services.
The link to download it is http://twitter4j.org/en/index.html

Download "twitter4j-3.0.5.zip" and save it. We will need it later.

Prepare the HANA jdbc library


In order to access SAP HANA from Java, we need the JDBC library, which you can find at

C:\Program Files\SAP\hdbclient\ngdbc.jar on Windows

and /usr/sap/hdbclient/ngdbc.jar on Linux.

Download Eclipse IDE for Java Developers


In this exercise, we will use Eclipse IDE for Java Developers to run the Java Project.
You can add the Plugins in your HANA Studio or directly download the new IDE from here.

Now we are ready!! Let's fetch data from Twitter and save it in HANA.

Create a column table in HANA:


Before running the Java program, we need to create a table in HANA to store the tweets fetched
from the Twitter services.
Copy and paste the script below into the SQL editor and execute it.
Note: You need to replace <SCHEMA_NAME> with your own schema.
CREATE COLUMN TABLE <SCHEMA_NAME>.TWEETS(
"ID" INTEGER NOT NULL,
"USER_NAME" NVARCHAR(100),
"CREATED_AT" DATE,
"TEXT" NVARCHAR (140),
"HASH_TAGS" NVARCHAR (100),
PRIMARY KEY("ID")
);
CREATE SEQUENCE <SCHEMA_NAME>."TWEET_SEQUENCE"
INCREMENT BY 1 START WITH 1 NO CYCLE;

Create and configure JAVA program:


1. Download the Java project "TwitterAnalysis.zip" from here and save it to your local
computer.
2. Open Eclipse and create a Java project called "TwitterAnalysis".
3. Go to File -> Import and select "Archive File".
4. Click Browse and select the "TwitterAnalysis.zip" file you downloaded in step 1. Click
Finish.
5. You will now be able to see the project structure.

Understanding the Java Project:


TwitterConnection.java
Build the connection to twitter services

HDBConnection.java
Build the jdbc connection to HANA

Configurations.java
The public interface for the network, twitter authentication configurations, override it by your own
account or settings

Tweet.java
The java bean class for the tweet objects

TweetDAO.java
The data access object

ngdbc.jar
SAP HANA jdbc library

twitter4j-core-3.0.3.ja
Twitter4j library for twitter services in java

Update the configurations


To keep the configuration easy to maintain, we put all the required information in a single
interface. You must update it with your own account and settings before you can
connect to either HANA or Twitter.

Open the file Configurations.java in your project. There are four categories of settings you can
override:
Network Proxy Settings:
The proxy host and port. Set HAS_PROXY to false if you do not need to use a proxy.
To find the proxy host, open a command prompt and type "ping proxy". This will show you the proxy
host.

HANA Connection Settings:


Replace the HANA URL with your own HANA host and port, user, password and the schema where
you created your table.

Twitter Authentication Settings:


Replace with your own authentication information from your twitter application as described in the
prerequisites.

Search Term:
We will search Twitter for the term "HANA Training" to find out what people are saying about
HANA training on Twitter. You can always replace it with your own
term if you are interested in other topics.

Test Connection to Twitter


Once the Twitter authentication settings are maintained correctly as described in the previous
step, you can open TwitterConnection.java and run it.

You will see the message "Connection to Twitter Successfully!" followed by your Twitter user id in
the console.

Test Connection to SAP HANA


Now open the file HDBConnection.java and run it.
You will see the message "Connection to HANA Successfully!" in the console.
Check Configurations.java if you encounter any issues.

Invoke Twitter API and save the tweets into HANA:


Now it's time to do the real work. Open the file SearchTweets.java and run it. It searches for
tweets based on the search term specified in Configurations.java and saves everything it finds
to the HANA table.
Messages in the console indicate that the tweets have been inserted into HANA successfully.

After that, you can run the data preview in HANA Studio and see the contents of the table TWEETS
in your schema.

Run text analysis in HANA:


Now that the tweets are stored in the HANA table, the next step is to run text analysis to see
what people are saying about "HANA Training" on Twitter.

To run text analysis, the only thing we need to do is create a full-text index on the column of
the table we want to analyze. HANA performs the linguistic analysis, entity extraction and
stemming for us and saves the results in a generated table $TA_YOUR_INDEX_NAME in the same
schema.
After that, you can build views on top of the table and use the existing analysis tools around
HANA for visualization and even predictive analysis.

Copy the SQL statement and execute it in SQL console:


Note: Replace <Schema_Name> with your own schema.
Create FullText Index <Schema_Name>."TWEETS_FTI"
On <Schema_Name>."TWEETS"("TEXT")
TEXT ANALYSIS ON
CONFIGURATION 'EXTRACTION_CORE';

You will see the table $TA_TWEETS_FTI under your schema.
If you don't see it, try refreshing the folder.

Text Analysis is done!! Yes, it was that simple.

Open the data preview of $TA_TWEETS_FTI and go to the Analysis tab.
Select the chart type "Other" - "Tag Cloud" to get a better view.
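A tag cloud is essentially a picture of token frequencies. If you prefer to look at the numbers, the same counting can be sketched in Python. The rows below are invented sample (token, type) pairs standing in for the TA_TOKEN and TA_TYPE columns; the type labels and the function name token_frequencies are placeholders for illustration.

```python
from collections import Counter


def token_frequencies(rows, ta_type=None):
    """Count how often each token occurs, optionally for a single token type."""
    counts = Counter()
    for token, row_type in rows:
        if ta_type is None or row_type == ta_type:
            counts[token.lower()] += 1
    return counts


# Hypothetical (TA_TOKEN, TA_TYPE) rows, not real analysis output
sample_rows = [
    ("HANA", "PRODUCT"),
    ("SAP", "ORGANIZATION"),
    ("hana", "PRODUCT"),
    ("Walldorf", "PLACE"),
    ("SAP", "ORGANIZATION"),
    ("HANA", "PRODUCT"),
]
print(token_frequencies(sample_rows).most_common(2))
```

The most frequent tokens are the ones a tag cloud would draw in the largest font.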

Reference: This example was taken from SAP Startup Focus Program.
If you are from a startup, interested in developing on top of the in-memory database and application
platform SAP HANA, then you may check the SAP Startup Focus program for help

How to use the Fuzzy Search in SAP HANA

In this article we will talk about

What is Fuzzy Search?

Why is Fuzzy Search important?

Real Time Example of Fuzzy Search Based Applications.

How to Implement Fuzzy Search in SAP HANA?

What is Fuzzy Search?


Also known as approximate string matching.
Fuzzy search is the technique of finding strings that match a pattern approximately (rather than
exactly).
It is a type of search that will find matches even when users misspell words or enter only partial
words.
Purpose:
With fuzzy search, queries containing misspellings and typos still return relevant results.

A Real World Example:


If a user types "SAP HANA Tutorl" into Yahoo or Google (both of which use fuzzy matching), a list of
hits is returned along with the question, "Did you mean "SAP HANA Tutorial"?"

Fuzzy Search in SAP HANA:

In SAP HANA, you can call the fuzzy search by using the CONTAINS predicate with the FUZZY
option in the WHERE clause of a SELECT statement.
Syntax:
SELECT * FROM <tablename>
WHERE CONTAINS (<column_name>, <search_string>, FUZZY (0.8))

A search with FUZZY(x) returns all values that have a fuzzy score greater than or equal to x.

The SCORE() Function


The fuzzy search algorithm calculates a fuzzy score for each string comparison. The higher the
score, the more similar the strings are. A score of 1.0 means the strings are identical. A score of 0.0
means the strings have nothing in common.

You can request the score in the SELECT statement by using the SCORE() function.

You can sort the results of a query by score in descending order to get the best records first (the
best record is the record that is most similar to the user input). When a fuzzy search of multiple
columns is used in a SELECT statement, the score is returned as an average of the scores of all
columns used.

So not only does it find a "fault tolerant" match, it also puts a score behind it.
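To get a feeling for scores and thresholds, here is a small Python sketch. It uses difflib.SequenceMatcher from the standard library as a stand-in similarity measure, so the numbers will differ from the scores SAP HANA calculates. The function names and the sample rows are assumptions made for this illustration.

```python
from difflib import SequenceMatcher


def score(a, b):
    """Similarity between 0.0 (nothing in common) and 1.0 (identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_search(rows, term, threshold):
    """Keep rows whose score is >= threshold, best matches first
    (comparable in spirit to FUZZY(x) with ORDER BY score DESC)."""
    scored = [(round(score(term, name), 2), row_id, name) for row_id, name in rows]
    hits = [hit for hit in scored if hit[0] >= threshold]
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


# Hypothetical sample rows: (ID, COMPANY_NAME)
companies = [(1, "SAP"), (3, "SAP AG"), (4, "ASAP Corp"), (6, "IBM Corp")]
for hit in fuzzy_search(companies, "SAP", 0.5):
    print(hit)
```

Raising the threshold makes the search stricter: with 0.9 only the identical entry 'SAP' remains in this sample.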

Example:
When searching for 'SAP', a record like 'SAP AG' gets a high score, because the term 'SAP' exists
in the text. A record like 'BSAP Corp' gets a lower score, because 'SAP' is only a part of the longer
term 'BSAP Corp'.

Create the table and data:


-- REPLACE <Schema_Name> WITH YOUR SCHEMA NAME
CREATE COLUMN TABLE <Schema_Name>.COMPANIES(
ID INTEGER PRIMARY KEY,
COMPANY_NAME SHORTTEXT(200) FUZZY SEARCH INDEX ON);
INSERT INTO <Schema_Name>.COMPANIES VALUES (1, 'SAP');
INSERT INTO <Schema_Name>.COMPANIES VALUES (2, 'SAP in Walldorf');
INSERT INTO <Schema_Name>.COMPANIES VALUES (3, 'SAP AG');
INSERT INTO <Schema_Name>.COMPANIES VALUES (4, 'ASAP Corp');
INSERT INTO <Schema_Name>.COMPANIES VALUES (5, 'BSAP orp');
INSERT INTO <Schema_Name>.COMPANIES VALUES (6, 'IBM Corp');

Perform the search on one column:


SELECT SCORE() AS score, * FROM <Schema_Name>.COMPANIES
WHERE CONTAINS(COMPANY_NAME,'SAP',
FUZZY(0.7,'textSearch=compare,bestMatchingTokenWeight=0.7'))
ORDER BY score DESC;

The output of fuzzy search contains 5 entries. Based on the fuzzy search factor (which is 0.7 in this
case), it will also consider the similar words. In this case "SAP AG", "BSAP orp" etc.

A Real Time Example of Fuzzy Search:

Use Case
A call center agent who receives an order by phone needs to find the customer number or, in the
case of a new entry, needs the system to warn about a potentially duplicate entry.
A name may be misspelled, or different people may have names that sound the same but are
spelled differently. For example, "Jimi Hendricks" can be misspelled as "Jimy Hendricks" or "Jimi
Hendrix". An address can also be spelled differently, for example "Berliner Platz 43", "Berliner
Plats 43" or "Berliner Platz".

Without fuzzy search, the system can only find exact matches, that is, entries that are 100%
identical. With fuzzy search, the system can find the misspelled entries too.

Create table and some data:


-- REPLACE <Schema_Name> WITH YOUR SCHEMA NAME
create column table <Schema_Name>."CUSTOMERS"(
"CUSTOMER_ID" VARCHAR (5) not null default '',
"FIRST_NAME" VARCHAR (20) null default '',
"LAST_NAME" VARCHAR (20) null default '',
"STREET" VARCHAR (20) null default '',
"CITY" VARCHAR (20) null default '',
"COUNTRY" VARCHAR (20) null default '',
"POSTAL_CODE" VARCHAR (20) null default '',
primary key ("CUSTOMER_ID"));
insert into <Schema_Name>."CUSTOMERS" values('00001','Jimi','Hendricks','Berliner Platz 43','Munchen','Germany','80805');
insert into <Schema_Name>."CUSTOMERS" values('00002','Jimy','Hendricks','Berlinr Platz 43','Munchen','Germany','80805');
insert into <Schema_Name>."CUSTOMERS" values('00003','Jimi','Hendrix','Berliner Plats 43','Munchen','Germany','80805');
insert into <Schema_Name>."CUSTOMERS" values('00004','Jimy','Feuer','Berliner','Munchen','Germany','80805');
insert into <Schema_Name>."CUSTOMERS" values('00006','Sven','Ottlieb','Walserweg 21','Aachen','Germany','52066');
insert into <Schema_Name>."CUSTOMERS" values('00007','Philip','Cramer','Maubelstr. 90','Brandenburg','Germany','14776');
insert into <Schema_Name>."CUSTOMERS" values('00008','Renate','Messner','Magazinweg 7','Frankfurt','Germany','60528');
insert into <Schema_Name>."CUSTOMERS" values('00009','Alexander','Feuer','Heerstr. 22','Leipzig','Germany','04179');
insert into <Schema_Name>."CUSTOMERS" values('00010','Antonio','Moreno','Mataderos 2312','Mexico','Mexico','05023');
insert into <Schema_Name>."CUSTOMERS" values('00011','Thomas','Hardy','120 Hanover','London','UK','WA1 1DP');
insert into <Schema_Name>."CUSTOMERS" values('00012','Christina','Berglund','Berguvsvagen 8','Lulea','Sweden','S-958 22');

Without Fuzzy Search:


Suppose you want to search a customer with name "Jimi".
SQL Query:
SELECT * FROM <Schema_Name>."CUSTOMERS"
WHERE CONTAINS(FIRST_NAME, 'Jimi')
ORDER BY "CUSTOMER_ID" DESC;

The output will contain only the entries whose first name exactly matches "Jimi".

Now let us try the fuzzy search function.


SQL Query:
SELECT SCORE() AS score, * FROM <Schema_Name>."CUSTOMERS"
WHERE
CONTAINS(FIRST_NAME, 'Jimi', FUZZY(0.7))
ORDER BY score DESC;

The output of fuzzy search contains 4 entries. Based on the fuzzy search factor (which is 0.7 in this
case), it will also consider the similar words. In this case "Jimy".

We can also do fuzzy search on 2 columns. For example First Name and Last Name.
SQL Query:
SELECT SCORE() AS score, * FROM <Schema_Name>."CUSTOMERS"
WHERE
CONTAINS(FIRST_NAME, 'Jimi', FUZZY(0.7))
and CONTAINS(LAST_NAME, 'Hendricks', FUZZY(0.7))
ORDER BY score DESC;

The output contains 3 entries. Based on the fuzzy search factor (which is 0.7 in this case), it will also
consider the similar names. In this case "Jimy Hendricks" and "Jimi Hendrix".
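The duplicate check from the use case can be imitated in Python to see how two columns are combined. As before, difflib.SequenceMatcher is only a stand-in for HANA's fuzzy score, so the actual values differ. The sample rows reuse names from the CUSTOMERS table above, and the function names are made up for this sketch.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity between 0.0 and 1.0, ignoring upper/lower case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_possible_duplicates(customers, first_name, last_name, threshold=0.7):
    """Keep rows where both columns reach the threshold.
    The row score is the average of the two column scores."""
    hits = []
    for customer_id, first, last in customers:
        s_first = similarity(first_name, first)
        s_last = similarity(last_name, last)
        if s_first >= threshold and s_last >= threshold:
            hits.append(((s_first + s_last) / 2, customer_id, first, last))
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


# (CUSTOMER_ID, FIRST_NAME, LAST_NAME) taken from the sample data above
customers = [
    ("00001", "Jimi", "Hendricks"),
    ("00002", "Jimy", "Hendricks"),
    ("00003", "Jimi", "Hendrix"),
    ("00004", "Jimy", "Feuer"),
    ("00006", "Sven", "Ottlieb"),
]
for hit in find_possible_duplicates(customers, "Jimi", "Hendricks"):
    print(hit)
```

With this stand-in measure, "Jimy Feuer" is rejected because the last name is too different, even though the first name is close. That is the effect of combining both conditions with AND.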

SAP BW on HANA

SAP BW on HANA is the next wave of SAP's in-memory technology vision that enables SAP
NetWeaver BW to use SAP HANA as a fully functioning in-memory database.
Running SAP BW on HANA results in dramatically improved performance, simplified administration
and streamlined IT landscape resulting in lower total cost of ownership.

In this article we will explore

What is SAP BW and SAP BW on HANA?

SAP BW on RDBMS Vs SAP BW on HANA.

Different aspects of SAP BW migration to HANA.

What is SAP BW?


SAP NetWeaver Business Warehouse (SAP NetWeaver BW), also called SAP BW for short, is the
business intelligence, analytics, reporting and data warehousing solution produced by
SAP.

Serving as a powerful Enterprise Data Warehouse application platform, BW provides flexible
reporting and analysis tools. It is a packaged, comprehensive business intelligence product
centered around a data warehouse that is optimized for (but not limited to) the R/3 environment
from SAP.

What is SAP HANA?


SAP HANA is an in-memory database.
Check the article What is SAP HANA? for more details.

What is SAP BW on HANA?

SAP BW on HANA is nothing but SAP's existing NetWeaver BW data warehouse, running on SAP
HANA.
SAP now supports SAP HANA as the underlying database for the NetWeaver BW Data
Warehouse.

Because SAP HANA is much faster than regular relational databases like Oracle or Microsoft SQL
Server, the data warehouse performs much faster.
The purpose of SAP BW on HANA is to combine the power of both.

What are the drawbacks of SAP BW on a RDBMS like Oracle?


Time to change:
Any change in structure or report takes too much time. Normally it takes 3-12 months to get a
change in structure or report.

Flexibility:
It lacks granular detail. In SAP BW, data is aggregated and materialized, so users cannot get
the data the way they need it or at the granularity they require.

Performance:
Every report has to be tuned for acceptable performance.

Real time data not available:


Real time data are not available for analysis. Data is moved to SAP BW in batches and takes time.

SAP BW on HANA Benefits:


SAP BW on HANA is smarter, simpler and more efficient.

Customer value of SAP BW on HANA:

Excellent query performance for improved decision making

Performance boost for Data Load processes for decreased data latency

Accelerated In-Memory planning capabilities for faster planning scenarios

Flexible - combine EDW with HANA-native data for real-time insights and decision
making

Fewer data persistency layers and reduced administration effort

Simplified data modeling and remodeling

Faster decision-making - Having the right information when you need it


In today's business world, fast access and manipulation is required on top of massive data stores.
This is beyond the capabilities of traditional disk-based systems.

SAP HANA helps your SAP NetWeaver Business Warehouse run better than ever. HANA enables
you to analyze large amounts of data, from virtually any source, in near real time, making it possible
to access reports with up-to-the-minute information. As an example, having the most current order
and logistics information makes it possible to manage your inventory more efficiently, and to predict
Available to Promise (ATP) more accurately.

Lower Total Cost of Ownership (TCO):


When you take into account the cost of hardware, software licenses, maintenance, performance
tuning and project development, SAP BW is always cheaper in terms of TCO when running on
HANA.

Whether HANA also is cheaper in terms of Total Cost of Acquisition (TCA) depends on where you
are in your procurement lifecycle. Do you have to replace hardware anyhow? Can you free up
resources on expensive UNIX equipment? Can you reuse the Oracle or DB2 licenses elsewhere or
save on maintenance revenue? Do you have to complete a SAP system upgrade as well? Do you
save substantially on storage costs? Many times, the answer is yes!

Simplified configuration and operational management:


Non-disruptive innovation and advanced administrative tools.
The current business processes inside BW can stay as they are and will mesh perfectly with HANA.
System operation stays as it is, and process chains do not need to be remodeled.

With HANA there is no need to retrain end users familiar with BW. There is still the same BW
application process but, with BW on SAP HANA, it is now possible to run queries, updates and
reports much faster than before. Expert users do not need to get retrained because they can
continue to use their current BI or other frontend tools.

In addition, HANA supports the BW Analysis Authorization Concept, and can be integrated with
NetWeaver Identity Management to ensure security remains intact.

When should customers move to SAP BW on HANA?


The answer should be: now! The benefits are huge for most BW systems and it is now Generally
Available and stable. Check SAP BW on HANA - Customer Success Stories.

Migration to SAP BW on HANA:


The migration to SAP BW on HANA is quite simple and requires no additional modeling or
adaptation of existing information assets. BW is the first SAP application to be re-architected and
rewritten to take full advantage of the enhanced capabilities of HANA. Processing that would
traditionally be done at the application (ABAP) layer is pushed down into the database where it can
be optimally executed with the various calculation and aggregation engines.

BW includes two model types optimized for best performance on the HANA platform. Existing BW
models can be used, in most cases, with only a parameter setting change.

Existing BW client tools, like SAP Business Explorer, are supported by BW on SAP HANA. The
HANA database also supports direct clients like Microsoft Excel and SAP's Business Objects BI
(Business Intelligence) tools.

Do we need to rewrite all the code and stuff after SAP BW migration to HANA?
Absolutely not! All your models will run just like they did before. There are definitely instances where
you may choose to optimize your models to run better on SAP HANA but this is not a requirement.

What do we need to do after migration to SAP HANA?


The answer is that you don't need to do anything. However in most cases you would choose to do
the following:

Remove SAP BW aggregates (they're only overheads in SAP HANA and this is done automatically
for you).

Convert your SAP BW cubes and DSOs to SAP HANA cubes and DSOs (a simple process that
improves performance and reduces space).
Note that everything is optional.

SAP BW on HANA - Customer Success Stories

In our previous article, SAP BW on HANA, we explored the different aspects of SAP BW on
HANA.

In this article we will talk about customer success stories. Let us look at some numbers and facts
for reference. These figures were reported by customers who already have BW on HANA operating
in their IT landscape.

Red Bull
Red Bull started to run its SAP NetWeaver Business Warehouse 7.3 on HANA in 2011.

The migration:

Non-disruptive fast migration

Project completed in less than 2 weeks

Database size before migration: 1.5 TB; reduced by 80% after migration
The result:

Boost in load performance

Simplified architecture - InfoCubes no longer required as an aggregation layer

Data can be read directly out of the DataStore Object (DSO).

Replication of data to InfoCubes and Business Warehouse Accelerator (BWA)
indexes no longer necessary

Utilities industry representative


This utilities customer also upgraded to SAP NetWeaver BW 7.3 and operates it on the latest
HANA database release.

Data Load Performance:

For DSO activation of 5.2 million records, performance improved by a factor of 32x.
Before: 21 hours 40 minutes - After: 40 minutes.

Data load into a write-optimized DSO and InfoCube upload with 50K
invoice headers + 350K items: load accelerated by a factor of 2.7x. Before: 1 hour 30 minutes -
After: 30 minutes.
Query Performance

Query with aggregated result set: performance improved by a factor of 471x
for queries on aggregated data. Before: 471 seconds - After: 1 second.

Performance acceleration by a factor of 5x on granular data sets. Before: 308 seconds -
After: 64 seconds.
Massive Data Volume Reduction

From RDBMS space of 4.3 TB reduced to 0.73 TB RAM on HANA

Average compression factor (column tables) 5.8x
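The improvement factors above are simply the "before" time divided by the "after" time. A short Python check, using only the rounded timings quoted above, looks like this (the dictionary keys are labels chosen for this sketch):

```python
MINUTE = 60
HOUR = 60 * MINUTE


def speedup(before_seconds, after_seconds):
    """Improvement factor = time before / time after."""
    return float(before_seconds) / after_seconds


# (before, after) in seconds, taken from the rounded figures quoted above
timings = {
    "DSO activation": (21 * HOUR + 40 * MINUTE, 40 * MINUTE),
    "DSO and InfoCube load": (1 * HOUR + 30 * MINUTE, 30 * MINUTE),
    "Aggregated query": (471, 1),
    "Granular query": (308, 64),
}
factors = {name: round(speedup(before, after), 1)
           for name, (before, after) in timings.items()}
for name, factor in factors.items():
    print(name, factor)
```

The rounded timings give 32.5x, 3.0x, 471.0x and 4.8x. The reported 2.7x for the data load therefore does not follow exactly from "1 hour 30 minutes" and "30 minutes"; presumably the published timings were rounded, although the source does not say so.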

Automotive industry representative


This automotive customer upgraded and migrated their data on BW 7.3 on HANA.
Data Load Performance:

InfoCube loading speeds up about 1.39x.

DSO activation accelerated 4.15x


Query Performance:

Compared to BW on a traditional relational Database Management System (DBMS)

BW queries were 8.5x faster

BO WebI execution, 11.6x faster

Response times with 40 million data records in a HANA-optimized InfoCube: 45
seconds; 26 times faster compared to the legacy DBMS

Power of Python Integrated with SAP HANA

Python is a high-level, object-oriented programming language.


Python is easy to write, easy to read and easy to understand.

SAP HANA works well with Python too.

This article gives a basic idea of how to use Python on top of SAP HANA.

Note: Even if you are not familiar with Python, no worries. We will cover everything in detail
for you.

Python API:
There are several APIs available to connect Python to SAP HANA. In this article we will use one of
the simplest, the dbapi module.
The dbapi module implements the Python DB API 2.0.
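Because DB API 2.0 is a common standard, the basic pattern (connect, open a cursor, execute, fetch) looks the same across database drivers. If you want to try the pattern before you have access to a HANA system, the sketch below uses sqlite3 from the Python standard library as a stand-in. This is an assumption for practice purposes only: sqlite3 is not SAP HANA, and the table and function names are made up.

```python
import sqlite3


def run_demo():
    """Walk through the DB API 2.0 pattern: connect, cursor, execute, fetch."""
    conn = sqlite3.connect(":memory:")  # stand-in for a HANA connection
    cursor = conn.cursor()
    # Create a table
    cursor.execute("CREATE TABLE table1 (ID INTEGER, name VARCHAR(10))")
    # Insert rows using ? placeholders instead of building SQL strings by hand
    cursor.executemany("INSERT INTO table1 VALUES (?, ?)", [(1, "A"), (2, "B")])
    conn.commit()
    # Fetch the table data
    cursor.execute("SELECT * FROM table1 ORDER BY ID")
    rows = cursor.fetchall()
    cursor.close()
    conn.close()
    return rows


result = run_demo()
print(result)
```

The ? placeholders keep the values separate from the SQL text, which avoids quoting problems and SQL injection.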

Configure Python API in SAP HANA:


1.

Navigate to the path where the HANA client is installed and copy these 3 files:
__init__.py, dbapi.py, resultrow.py

By default they are saved in the C:/Program Files/sap/hdbclient/hdbcli folder.


Note: If you have not installed the HANA client yet, install it first.
2.

Go to the Python folder under the hdbclient folder and paste all 3 files into the Lib folder.
By default the location will be C:/Program Files/sap/hdbclient/Python/Lib

3.

Copy the pyhdbcli.pdb and pyhdbcli.pyd files from the hdbclient folder.


By default this location will be "C:/Program Files/sap/hdbclient"

4.

Go to the same Python/Lib folder and paste these 2 files.

That's all. Configuration is done!
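
If you prefer to script the four copy steps above instead of doing them by hand, a minimal sketch could look like this. The function name is hypothetical, and the folder layout is assumed to be exactly the default one described in the steps.

```python
import os
import shutil

PYTHON_FILES = ("__init__.py", "dbapi.py", "resultrow.py")  # in <client>/hdbcli
BINARY_FILES = ("pyhdbcli.pdb", "pyhdbcli.pyd")             # in <client>

def install_python_api(client_dir):
    """Copy the five files named in steps 1 to 4 into <client>/Python/Lib."""
    lib_dir = os.path.join(client_dir, "Python", "Lib")
    if not os.path.isdir(lib_dir):
        raise IOError("Target folder not found: %s" % lib_dir)
    sources = [os.path.join(client_dir, "hdbcli", n) for n in PYTHON_FILES]
    sources += [os.path.join(client_dir, n) for n in BINARY_FILES]
    # Check everything first so that nothing is copied on a partial install
    missing = [s for s in sources if not os.path.isfile(s)]
    if missing:
        raise IOError("Missing HANA client files: %s" % ", ".join(missing))
    for src in sources:
        shutil.copy2(src, lib_dir)
    return [os.path.join(lib_dir, os.path.basename(s)) for s in sources]

# Example call with the default location from the steps above:
# install_python_api("C:/Program Files/sap/hdbclient")
```

Writing into C:/Program Files usually requires an elevated command prompt.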

Connect to SAP HANA and Run SQL Queries using Python:


1.

Write the code below in a text editor such as Notepad and save it with a .py extension.


import dbapi

# Replace SCHEMA1 with your schema.
# Assume the HANA host is abcd1234 and the instance number is 00 (SQL port 30015).
# The user ID is USER1 and the password is Password1.
conn = dbapi.connect('abcd1234', 30015, 'USER1', 'Password1')

# Check whether the database connection was successful
print(conn.isconnected())

# Create a table
cursor = conn.cursor()
stmnt = 'create column table SCHEMA1.table1 (ID integer, name varchar(10))'
cursor.execute(stmnt)
print('table created')

# Insert some data into the table
stmnt = 'insert into SCHEMA1.table1 values (?, ?)'
cursor.execute(stmnt, (1, 'A'))
cursor.execute(stmnt, (2, 'B'))
print('2 records inserted into table')

# Fetch the table data
stmnt = 'select * from SCHEMA1.table1'
cursor.execute(stmnt)
result = cursor.fetchall()
print(result)

# Release the cursor and the connection
cursor.close()
conn.close()

Provide your HANA system details and credentials in the file.

Make sure you replace the schema name with your own schema.

Now save this file to the hdbclient/Python folder.


The default location of Python folder is "C:/Program Files/sap/hdbclient/Python"

Open a command prompt, navigate to the Python folder, and run the command:
python filename.py
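
The port 30015 in the script above belongs to instance number 00. As a small sketch, assuming the common 3<instance number>15 convention for the SQL port (this holds for a single-database system; other setups, such as tenant databases, may use different ports), the port can be derived like this. The function name is made up for illustration.

```python
def sql_port(instance_number):
    """Return the SQL port 3<nn>15 for a two-digit HANA instance number."""
    nn = int(instance_number)
    if not 0 <= nn <= 99:
        raise ValueError("instance number must be between 00 and 99")
    return int("3%02d15" % nn)

print(sql_port("00"))  # prints 30015, the port used in the script above
```

If the connection fails, confirm the actual port with your system administrator rather than relying on this convention.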

Image Processing in SAP HANA


In this article we will learn:

How an image is saved in the HANA database in BLOB format

How to upload an image to SAP HANA

How to download an image from HANA and show it in HTML

What is a BLOB?
BLOB (Binary Large Object) is a field type for storing binary data.
A BLOB can store a large chunk of data, such as documents and even media files like audio or video
files. In HANA the BLOB size can be up to 2 GB.

In the HANA database, we can use the BLOB data type to store an image.

Create Image Table:


Let's first create a table in HANA to store images.
Copy and paste the script below to create the table.
--REPLACE <SCHEMA_NAME> WITH YOUR SCHEMA NAME
CREATE COLUMN TABLE "<SCHEMA_NAME>"."IMAGE_STORE"(
IMAGE_NAME NVARCHAR(20) PRIMARY KEY,
IMAGE_CONTENT BLOB
);

Download some sample images:


Download some sample images and save them in a folder.
For this example I downloaded some images of SAP HANA and saved them in C:\ImageToBeUploaded

Upload images into HANA database:


There are several ways of uploading images into the HANA database, for example using Java or Python.
Here we will use a Java program to upload the images.
Note: You can either use a separate Eclipse installation for Java or use HANA Studio itself to create a
Java project. If you are using HANA Studio, make sure you have selected the SAP HANA
Development perspective.

1.

Download the Java project ImageUploader.zip and save it to your local computer.

2.

Open Eclipse (or HANA Studio) and create a Java project called "ImageUploader".

Go to File --> Import --> General and select "Archive File".

Click on Browse and select the "ImageUploader.zip" file you downloaded in step 1. Click on Finish.

Open the ImageUploader.java file and change the HANA system details.

Run the Java program. You should see a confirmation message in the console.

We have successfully uploaded the image files into HANA database.
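
As mentioned earlier, Python is an alternative to Java for the upload. The following is only a rough sketch and not the ImageUploader project used above: the function name is hypothetical, it works on any DB API 2.0 connection that accepts ? placeholders, and it has only been reasoned through against a generic driver. With the HANA dbapi module, binary parameters may need additional handling.

```python
import os

def upload_images(conn, folder, table="IMAGE_STORE"):
    """Insert every file in 'folder' as an (IMAGE_NAME, IMAGE_CONTENT) row.

    'table' is pasted into the SQL text, so pass only trusted names,
    for example '"MYSCHEMA"."IMAGE_STORE"'.
    """
    sql = "INSERT INTO %s (IMAGE_NAME, IMAGE_CONTENT) VALUES (?, ?)" % table
    cursor = conn.cursor()
    uploaded = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue  # skip sub-folders
        with open(path, "rb") as handle:  # binary mode keeps the bytes intact
            cursor.execute(sql, (name, handle.read()))
        uploaded.append(name)
    conn.commit()
    cursor.close()
    return uploaded

# Example call, assuming 'conn' is an open connection:
# upload_images(conn, r"C:\ImageToBeUploaded", '"MYSCHEMA"."IMAGE_STORE"')
```

Keep in mind that IMAGE_NAME was declared as NVARCHAR(20), so file names longer than 20 characters will be rejected by the table.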

Have the images really been uploaded to HANA?


You might be wondering whether the files have really been uploaded to HANA. Let us check.
Open HANA Studio, open the SQL console, and check the contents of the IMAGE_STORE table.
SELECT * FROM "<SCHEMA_NAME>"."IMAGE_STORE";

The output shows that the image files are stored in BLOB format.

Download image from SAP HANA:


We need to create a HANA XS project to download the image from HANA.

1.

Create a HANA XS project.

Download the zip file and extract it somewhere on your local system.

Copy all the content inside the ImageProcessing folder.

Go to HANA Studio and paste everything inside the XS project.

Open the GetImage.xsjs file and change YOUR_SCHEMA_NAME in the first line to your schema name.
Make sure it is the same schema where you created the IMAGE_STORE table.

Right click on the project and select Team --> Activate.

Done!!
Now let us test this.

Open the ImageBrowser.html file under the ui folder. Right-click and select Run As --> HTML.
Search for an image that you uploaded. If the image is found, it will be displayed.

You can search with the name of the images you uploaded.
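
The XS project above does the download on the server side. To illustrate the same idea outside XS, here is a hedged Python sketch (Python 3) that reads one BLOB through a DB API 2.0 connection and embeds it in an HTML img tag as a base64 data URI. It is not the GetImage.xsjs implementation; the function name is hypothetical.

```python
import base64
import mimetypes
from html import escape

def image_as_html(conn, image_name, table="IMAGE_STORE"):
    """Return an <img> tag with the stored image embedded, or None if absent.

    'table' is pasted into the SQL text, so pass only trusted names.
    """
    sql = "SELECT IMAGE_CONTENT FROM %s WHERE IMAGE_NAME = ?" % table
    cursor = conn.cursor()
    cursor.execute(sql, (image_name,))
    row = cursor.fetchone()
    cursor.close()
    if row is None:
        return None
    content = row[0]
    if hasattr(content, "read"):  # some drivers return a LOB object
        content = content.read()
    # Guess the MIME type from the file extension stored in IMAGE_NAME
    mime = mimetypes.guess_type(image_name)[0] or "application/octet-stream"
    encoded = base64.b64encode(bytes(content)).decode("ascii")
    return '<img alt="%s" src="data:%s;base64,%s">' % (
        escape(image_name, quote=True), mime, encoded)
```

Writing the returned string into an .html file and opening it in a browser shows the image. Data URIs are convenient for small images; for large ones, serving the bytes from a URL, as the XS service does, is usually the better choice.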


SAP HANA Live Overview

What is SAP HANA Live?

SAP HANA Live (previously known as SHAF, SAP HANA Analytic Foundation) is a solution for real-time reporting on HANA.
It is a separate package that comes with predefined SAP HANA content across the SAP Business
Suite.

What does SAP HANA Live provide?


SAP HANA Live provides SAP-delivered content (similar in concept to SAP BW content) in the form of
SAP HANA calculation views for real-time operational reporting. The calculation views span
the majority of ECC modules (FI, CO, MM, PP, SD, PS, CRM, GTS, AM and GRC).

The content is represented as a VDM - virtual data model, which is based on the transactional and
master data tables of the SAP Business Suite.

Currently more than 2000 views are delivered in the HANA Live package.

Architecture of HANA Live:

HANA Live calculation views are designed on top of SAP Business Suite tables. These views are
optimized for best performance and analytic purposes. These views form a Virtual Data Model
(VDM) that customers and partners can reuse.
Data provided by the virtual data model can be presented through multi-purpose analytical UIs,
such as SAP BusinessObjects BI Suite UIs, and domain-specific web applications.

The HANA Live views are divided into 3 categories:

Query Views: The views that are exposed for consumption by end users for reporting
needs.

Reusable views: Customers can build upon reusable views to create their own custom
query views.

Private Views: Views that are built on top of the tables and not intended to be changed.

Advantages of HANA Live:

SAP HANA Live for SAP Business Suite provides the following advantages compared to regular
reporting solutions:

Open
Any access to the reporting framework is based on standard mechanisms such as SQL or MDX. No
BW modeling or ABAP programming will be required.

Uniform
One approach is chosen for all SAP Business Suite applications, enabling common reporting
across application boundaries.

Intuitive
The virtual data model hides the complexity and Customizing dependencies of our SAP Business
Suite data model to make data available without requiring a deep understanding of SAP models.

Fast
SAP HANA Live for SAP Business Suite features SAP HANA as the underlying computing engine, to
enable fast analytics on high data volumes.

Real-time
Since all reporting happens on primary data (or a real-time replication of it), there is no need to wait
for data warehousing loading jobs to finish. The cycle time from recording to reporting is
dramatically reduced.
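
To make the "Open" point above concrete: an activated view can be read with plain SQL. The sketch below only builds such a statement. It assumes that activated views are addressed as "_SYS_BIC"."<package>/<view>"; the package, view, and column names are hypothetical placeholders and not real HANA Live content.

```python
def view_query(package, view, columns, limit=100):
    """Build a SELECT against an activated view in the _SYS_BIC schema."""
    def quote(identifier):
        # Double any embedded quotes so the identifier stays one SQL token
        return '"%s"' % identifier.replace('"', '""')
    column_list = ", ".join(quote(c) for c in columns)
    return 'SELECT %s FROM "_SYS_BIC".%s LIMIT %d' % (
        column_list, quote(package + "/" + view), int(limit))

# Hypothetical package, view, and column names, for illustration only
print(view_query("my.package", "MyQueryView", ["CompanyCode", "NetAmount"], 10))
```

The resulting statement can be run from the SQL console in HANA Studio or through any SQL client, provided the user has SELECT privileges on the view.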

How does HANA Live help customers?

If customers want to create a new custom report or modify/enhance existing reports on native ECC,
it takes a lot of time to find the right ABAP resources, code in ABAP, test, promote through the various
stages, and finally release.

With SAP HANA Live, all customers have to do is edit the existing virtual models/views provided by SAP
or create new HANA models or views to support new development. Everything happens
in the virtual layer, so development is more efficient and faster, and no ABAP knowledge is needed.

This reduces development time, and thereby the cost of development and support, and increases SAP
usability. Cross-functional reports across various SAP modules can also be created easily.

SAP HANA Live Overview - Part 2


In the previous article SAP HANA Live Overview part 1, we explained the basic features and
functionalities of HANA Live. In this article, we will see some more important points about HANA
Live.

Deployment Options for HANA Live:

There are 2 deployment options for HANA Live.

Side-by-side scenario
In the side-by-side scenario, the database tables that are used by the SAP HANA Live products
need to be replicated from the corresponding SAP Business Suite back-end system into the SAP
HANA database. This is done using SAP Landscape Transformation Replication Server. If you want
to execute SAP HANA Live views, the data from the corresponding tables must be available.

SAP recommends creating all required tables as specified in the SAP Notes corresponding to the
SAP HANA Live products, and to replicate the data only for those tables that are used in executed
analytical scenarios. This ensures that no unnecessary data is replicated, that no unnecessary SAP
Landscape Transformation Replication Server resources are consumed, and that no unnecessary
DB memory is consumed.

Integrated scenario
In the integrated scenario, you do not need to create and replicate the database tables, as they are
already available in the SAP HANA database. They are maintained through the data dictionary of
the corresponding ABAP Application Server. Therefore, all steps regarding table creation and data
replication are not relevant in this scenario.

Since the ABAP server creates all tables in one specific database catalog schema (typically
<SAPSID>), this needs to be mapped to the authoring schema of the imported content packages.
See Schema Mapping.

Frontend tools for HANA Live reporting:

SAP HANA Live reports can be accessed through HTML5, native Excel or SAP BusinessObjects
Business Analytics applications.

The following SAP BusinessObjects analytical solutions are recommended:

SAP Crystal Reports for Enterprise 4.0.4

SAP BusinessObjects Analysis (Edition for Microsoft Office) 1.0.3

SAP BusinessObjects Explorer

SAP BusinessObjects Dashboards 4.0.4

SAP Lumira

Backend tools to access HANA Live views:

The HANA Live back-end can be accessed using SAP HANA Studio. We can also use the HTML-based
SAP HANA Live View Browser to access the structures and elements of the virtual data model.

SAP HANA Live Tools:


SAP HANA Live Browser:
With this application you can quickly and easily search, browse, tag and consume HANA analytical
content views using an internet web browser. This application is available as an option for business
users who want to consume HANA content views or preview data without using HANA studio.
You can access the application using the URL
http://<HANA Server Host>:80<SAP HANA Instance Number>/sap/hba/explorer
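
As a quick illustration of how the placeholders in this URL are filled in, here is a small sketch; the function name and the host name in the example are made up.

```python
def live_browser_url(host, instance_number):
    """Build the SAP HANA Live Browser URL for a host and instance number."""
    nn = int(instance_number)
    if not 0 <= nn <= 99:
        raise ValueError("instance number must be between 00 and 99")
    # The HTTP port is 80 followed by the two-digit instance number
    return "http://%s:80%02d/sap/hba/explorer" % (host, nn)

print(live_browser_url("hanahost", "00"))  # http://hanahost:8000/sap/hba/explorer
```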

SAP HANA Live Authorization Assistant:


With the SAP HANA Live Authorization Assistant, you can give users the authorizations in the SAP
HANA system that are required to access business data displayed by the virtual data model of SAP
HANA Live.
SAP HANA Live Authorization Assistant is used to manage both the analytical privileges that
restrict access to specific business data, and the object privileges that control the
database views the user uses to report