Best Practices for SAP HANA Modelling and SAP Data Services Data Loading

Dr. Bjarne Berg

PwC
Produced by Wellesley Information Services, LLC, publisher of SAPinsider. © 2016 Wellesley Information Services. All rights reserved.
In This Session
• We will explore SAP Data Services and how to load information into SAP HANA
• You will learn how to create transformations, merges, and joins
• We will look at the best practices of modeling in SAP HANA
• We will see step-by-step how to create calculation, attribute, and analytical views
• At the end of this session, you will know how to load data and create views to analyze the data
2
What We’ll Cover
• SAP Data Services
• SAP HANA
• Wrap-up
3
Data Services Overview
• SAP Data Services is a leading technology for enterprise information management, providing solutions for:
  - Data integration
  - Data quality
  - Data profiling
  - Text data processing
SAP Data Services transforms, refines, and delivers trusted data for the EDW
4
Step-by-Step: Creating Batch Jobs
1. Create a new project and give it a relevant project name
2. Right-click on the project to create a new batch job
Giving relevant names to your projects and batch jobs helps keep them organized
5
Step-by-Step: Loading from Flat Files
1. Select the related batch job to enter its workspace
2. From the “Format” category in the “Local Object Library” panel, right-click on “Flat Files” and select “New”
6
Examples of Other Available Data Sources
• There are many other data sources that can be used in Data Services
Use the local object library to find existing data sources under the “Datastore” category
You can upload more files under the “Format” category

7
Formatting the Flat File
3. In the “File Format Editor” popup, fill in the appropriate fields
The date format must match the format of the dates in the data
“Tab” was chosen because the data fields were separated by tabs
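As a rough illustration of why the delimiter and date format settings matter, the following Python sketch parses a small tab-delimited sample outside of Data Services. The column names (CASE_ID, ODATE, CDATE), the sample rows, and the MM/DD/YYYY date format are assumptions made for this example, not values taken from the session.

```python
import csv
import io
from datetime import datetime

# Hypothetical tab-delimited sample; a real flat file will have other columns.
SAMPLE = (
    "CASE_ID\tODATE\tCDATE\n"
    "1001\t01/05/2016\t01/15/2016\n"
    "1002\t02/01/2016\t02/03/2016\n"
)

# Assumed date format; it has to match how dates are written in the file.
DATE_FORMAT = "%m/%d/%Y"


def load_flat_file(text, delimiter="\t", date_format=DATE_FORMAT):
    """Parse delimited text and convert the two date columns to date objects."""
    rows = []
    for record in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        rows.append({
            "CASE_ID": int(record["CASE_ID"]),
            "ODATE": datetime.strptime(record["ODATE"], date_format).date(),
            "CDATE": datetime.strptime(record["CDATE"], date_format).date(),
        })
    return rows


rows = load_flat_file(SAMPLE)
```

If the declared date format does not match the data (for example `%Y-%m-%d` against the sample above), `strptime` raises a `ValueError`, which is the kind of load error the format settings are meant to prevent.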
8
Defining Table Fields
4. Enter in the field properties
Notice the updated view below
9
Preview Data
5. In the Repository, under Format, right-click and select “View Data” to preview the newly added data source
This allows you to check if the source data populated without error before using the data
10
SAP Data Services
• In this section:
  - Data Services overview, creating batch jobs and loading from flat files
  - Building transforms and using functions
  - Creating table joins and utilizing data merging
11
Transforms Overview
• Transforms are built-in objects that process source data to bring about desired outputs
• The most commonly used transform is the Query Transform
• The Query Transform enables you to:
  - Filter and select data from a source
  - Join data from multiple sources
  - Map columns from input to output schemas
  - Perform data nesting and unnesting
  - Add new columns to the output schema
  - Assign primary keys to output schema
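To make three of these capabilities concrete (filtering, column mapping, and adding a new output column), here is a minimal Python sketch. It is only an analogy for what the Query Transform does graphically; the function name `query_transform` and the sample columns are hypothetical and not part of Data Services.

```python
def query_transform(rows, where, mapping, new_columns=None):
    """Filter rows, map input columns to output names, and add derived columns."""
    new_columns = new_columns or {}
    output = []
    for row in rows:
        if not where(row):
            continue  # row filtered out
        # Map input schema columns to output schema columns.
        out = {out_name: row[in_name] for out_name, in_name in mapping.items()}
        # Add new output columns computed from the input row.
        for name, func in new_columns.items():
            out[name] = func(row)
        output.append(out)
    return output


# Hypothetical source rows.
source = [
    {"CASE_ID": 1, "STATUS": "CLOSED", "HOURS": 4},
    {"CASE_ID": 2, "STATUS": "OPEN", "HOURS": 1},
]

result = query_transform(
    source,
    where=lambda r: r["STATUS"] == "CLOSED",
    mapping={"ID": "CASE_ID", "STATUS": "STATUS"},
    new_columns={"MINUTES": lambda r: r["HOURS"] * 60},
)
```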
12
Adding a Data Flow Object to the Workspace
• The tool palette contains icons that allow the creation of new objects in the workspace
1. Drag a data flow icon from the tool palette to the workspace
2. Double-click on the data flow to enter its workspace
When you create a reusable object, such as a data flow, it automatically appears in the local object library
13
Adding a Data Source to the Workspace
1. Drag a data source (e.g., a flat file) from the local object library onto the workspace
2. Create a connection between the data source and the query
14
Query Editor Overview
• The query editor is a graphical interface for carrying out query operations
• It contains three areas:
  - Schema in area
  - Schema out area
  - Parameters area
3. Double-click on the query transform to open the “Query Editor”
15
Setting Up the Output Table
4. Drag the desired output fields to “Schema Out” from the “Schema In” section
It is not necessary to drag all fields from schema in to schema out unless you want all the fields to appear in schema out
16
Creating a New Output Column
1. Right-click on an output field and select “New Output Column”
2. Select where to insert the new column
New columns can be created to display results from calculations
17
Defining Column Properties
3. The “Column Properties” will pop up for you to define and rename the new column and its properties
Give the column a descriptive name that properly identifies what the column is used for
18
Using Functions
1. Double-click in the cell under “Mapping”
2. Click on “Functions”
3. Select the appropriate category and then the specific function
For this demo, we want to calculate the number of days a case was open
19
Setting Up the Function
4. Define the input parameters for the function
Use the drop-down list to state the input parameters to avoid typos
Notice the updated code in this panel for the NO_DAYS_CASE_OPEN column after defining the input parameters. This formula will deliver the number of days from ODATE to CDATE, giving us a measurement of how long it takes to close a case.
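The same calculation can be sketched in Python to check expected values for the new column. This is not the Data Services function itself; the function name `no_days_case_open` simply mirrors the column name, and the sample dates are made up.

```python
from datetime import date


def no_days_case_open(odate, cdate):
    """Return the number of whole days from the open date to the close date."""
    return (cdate - odate).days


# A case opened on January 5 and closed on January 15 was open for 10 days.
days_open = no_days_case_open(date(2016, 1, 5), date(2016, 1, 15))
```

Comparing a few hand-computed values like this against the previewed output table is a quick way to confirm that the input parameters were supplied in the intended order.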
20
Adding an Output Table to the Workspace
1. Drag and drop a table template in the workspace to be our output table
2. Link the query to the template table
A template table is an object that can be used as a target for data to populate when a job gets executed successfully, and it can also be saved in the object library for use as a data source at a later time
A template table allows us to view the specific information we want without the risk of altering the source data. The data that gets populated in the template table is based on the output schema requirements in the query transform.
21
Executing a Job
1. Right-click on the job and select execute
To analyze any issues that may occur during data loading, click “Enable auditing” and make sure that “Use collected statistics” is checked
22
Job Log Overview
• The log file displays a list of actions in the job execution
• If any errors occur, the error icon will appear. Otherwise, “Job is completed successfully” will be displayed
• The job log has five columns:
  - Pid: Process thread identification number of the executing thread
  - Tid: Thread identification number of the thread
  - Number: Number prefix of the error followed by a number
  - Time Stamp: Date and time the thread generated a message
  - Message: Error description of the thread
23
Job Log Overview (cont.)
A successful job execution
A job with errors will show the error icon
Double-click on the error icon to view the list of errors as shown below
24
How to Preview the Output Table
1. Click on the Data Flow to open its workspace
2. Click on the magnifying glass of the output table to view data in the output table
Notice that the column created earlier is formatted correctly as a number and that the data is the result of the function defined
25
SAP Data Services
  - Data Services overview, creating batch jobs and loading from flat files
  - Building transforms and using functions
  - Creating table joins and utilizing data merging
26
Creating Table Joins
A join can be used to combine data from multiple sources into one target
Source 1
Source 2
Use the Query Transform FROM clause to join the two sources: Query and Join
In this example, Source 1 has the Car Description for the case, while Source 2 has the Resolution to the
case. The query transform will combine the data from the two sources in the schema out section to produce
a result displaying the overall case solution.
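The effect of this join can be sketched in Python as an inner join on a shared key. The key column CASE_ID, the other column names, and the sample values are assumptions for illustration; the sketch also assumes the key is unique in the second source.

```python
def join_sources(source1, source2, key):
    """Inner-join two lists of dicts on a shared key column."""
    # Index the second source by key (assumes the key is unique there).
    lookup = {row[key]: row for row in source2}
    joined = []
    for row in source1:
        match = lookup.get(row[key])
        if match is not None:
            # Combine the fields of both sources into one output row.
            joined.append({**row, **match})
    return joined


# Hypothetical sample data: descriptions in one source, resolutions in the other.
source1 = [
    {"CASE_ID": 1, "DESCRIPTION": "Engine noise"},
    {"CASE_ID": 2, "DESCRIPTION": "Brake wear"},
]
source2 = [{"CASE_ID": 1, "RESOLUTION": "Replaced belt"}]

result = join_sources(source1, source2, "CASE_ID")
```

Case 2 has no matching resolution, so it does not appear in the result of this inner join.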
27
Result from a Table Join
1. Once the tables have been joined in the query transform, execute the job as discussed in the earlier slides
2. Enter the data flow workspace and click on the magnifying glass to view the results in the output table
Notice in the output table below how the Solution column from the Join source is now combined with the fields from the Query transform
28
Merges Overview
You can merge rows from two or more sources into a single data set
• All sources must have the same schema to execute the Merge Transform:
  - Same number of columns
  - Same column names
  - Same data type for each column
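As a minimal sketch of these schema requirements, the following Python function appends rows from several sources only if every row has the same column names and value types. The function name `merge_sources` and the sample rows are hypothetical; the sketch keeps duplicate rows rather than removing them.

```python
def merge_sources(*sources):
    """Append rows from all sources after checking that their schemas match."""
    schemas = []
    for rows in sources:
        for row in rows:
            # A row's schema here is its column names plus each value's type.
            schemas.append({name: type(value) for name, value in row.items()})
    if any(schema != schemas[0] for schema in schemas[1:]):
        raise ValueError("all sources must share column names and data types")
    return [row for rows in sources for row in rows]


# Hypothetical sources with identical schemas.
north = [{"CASE_ID": 1, "REGION": "North"}]
south = [{"CASE_ID": 2, "REGION": "South"}, {"CASE_ID": 3, "REGION": "South"}]

merged = merge_sources(north, south)
```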
29
How to Create a Merge
1. To merge two sources, add a query transform to each source to format all the data to be the same in both sources
2. Join the queries to a “Merge Transform”
3. When opening the “Merge Transform,” notice how all the fields and data types match for all output and input fields
30
How to Avoid Creating Duplicated Data in Merges
4. To avoid duplicate rows, add a query transform to display distinct rows only
5. Execute the job to complete the merged table
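What "distinct rows only" means can be sketched in Python: rows that are identical in every column are kept once. The function name `distinct_rows` and the sample data are hypothetical, and the sketch assumes all column values are hashable.

```python
def distinct_rows(rows):
    """Return rows with exact duplicates removed, keeping first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        # Use the sorted (column, value) pairs as a hashable fingerprint.
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(row)
    return unique


# Hypothetical merged output in which case 1 arrived from both sources.
merged_rows = [
    {"CASE_ID": 1, "REGION": "North"},
    {"CASE_ID": 2, "REGION": "South"},
    {"CASE_ID": 1, "REGION": "North"},
]

unique_rows = distinct_rows(merged_rows)
```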

31
Demo of Data Loading with Data Services
32
SAP HANA
  - SAP HANA overview
  - Creating attribute views and analytical views
  - Making calculation views
33
SAP HANA — In-Memory Options
• SAP HANA is sold as an in-memory appliance. This means that both software and hardware are included from the vendors.
• Currently you can buy SAP HANA solutions from Cisco, Dell, Fujitsu, IBM, Lenovo, HP, NEC, Huawei, Silicon Graphics, and others
• SAP HANA indexes and compresses the data from a variety of sources, including ERP, and stores the data in-memory
SAP HANA can radically change the way databases operate and make systems dramatically faster
Source: SAP SE, 2016
34
HANA Editions and Components
• While HANA is sold as an appliance, there are many internal components, and the edition you buy may contain different licenses to these components

Area | Component ID | Component Name
Platform Edition | BC-DB-HDB | SAP HANA database
Platform Edition | BC-DB-HDB-ENG | SAP HANA database engine
Platform Edition | BC-DB-HDB-PER | SAP HANA database persistence
Platform Edition | BC-DB-HDB-SYS | SAP HANA database interface
Platform Edition | BC-DB-HDB-DBA | SAP HANA database/DBA cockpit
Platform Edition | BC-DB-HDB-POR | SAP HANA DB Porting
Platform Edition | BC-DB-HDB-BAC | SAP HANA Backup and Recovery
Platform Edition | BC-CCM-HAG | SAP Host agent
Platform Edition | BC-DB-HDB-CCM | SAP HANA CCMS
Platform Edition | BC-DB-HDB-CLI | SAP HANA Clients (JDBC/ODBC)
Platform Edition | BC-DB-HDB-R | SAP HANA Integration with R
Platform Edition | BC-DB-HDB-SCR | SAP HANA SQL scripts
Platform Edition | BC-DB-HDB-MDX | MDX engine: Microsoft Excel client
Platform Edition | BC-HAN-DXC | SAP HANA Direct Extractor Connection
Platform Edition | BC-HAN-MOD | SAP HANA Studio - Information Modeler
Platform Edition | BC-HAN-3DM | Information Composer
Platform Edition | BC-HAN-SRC | SAP HANA UI toolkit
Platform Edition | BC-DB-HDB-TXT | SAP HANA Text and Search features
Platform Edition | BC-DB-HDB-DXC | SAP HANA Direct extraction connector
Platform Edition | BC-DB-HDB-SEC | SAP HANA Security and User Mgmt
Platform Edition | BC-DB-HDB-XS | SAP HANA Application Services
Platform Edition | BC-DB-HDB-AFL | SAP HANA Advanced functions library
Platform Edition | BC-DB-HDB-AFL-PAL | SAP HANA Predictive analysis library
Platform Edition | BC-DB-HDB-AFL-SOP | SAP HANA Sales & Operations Planning
Platform Edition | BC-DB-HDB-PLE | SAP HANA Planning Engine
Lifecycle Management | BC-HAN-SL-STP | SAP HANA unified installer
Lifecycle Management | BC-HAN-UPD | Software Update Manager
Lifecycle Management | BC-DB-HDB-INS | SAP HANA database installation
Lifecycle Management | BC-DB-HDB-UPG | SAP HANA database upgrade
Enterprise Edition (also have platform edition components) | EIM-DS | SAP Data Services: ETL-based
Enterprise Edition (also have platform edition components) | BC-HAN-LOA | SAP HANA Load Controller: log-based
Enterprise Edition (also have platform edition components) | BC-HAN-LTR | SAP Landscape Transformation (SLT): trigger-based
Enterprise Edition (also have platform edition components) | BC-HAN-REP | Sybase Replication Server: log-based
End User Clients | BI-BIP-CMC, BI-BIP | BI Platform
End User Clients | BI-RA-WBI | Web Intelligence
End User Clients | BI-RA-XL | Dashboard Designer
End User Clients | BI-RA-CR, BI-BIP-CRS | SAP Crystal reports
End User Clients | BI-RA-EXP | SAP BusinessObjects Explorer
End User Clients | BI-BIP-IDT | Information Design Tool (for universes)
End User Clients | BI-RA-AO-XLA | Microsoft Excel add-in
35
Hardware Options as of July 2016 (changes often)
36
SAP HANA
  - Creating attribute views and building analytical views
37
Attribute Views — Overview
• Master data reporting can be modeled using attribute views
• Can be regarded as Master Data Tables
• Can be linked to fact tables in Analytic Views
• A measure, e.g., weight, can be defined as an attribute
38
Creating a New Attribute View
1. Open HANA Studio and expand the “Content” folder
2. Right-click on the appropriate package in your system
3. Navigate to New → Attribute View…
39
Naming the New Attribute View
1. Give the view a name
2. Add a description
The name and description provided should accurately describe the attribute view you want to create
3. Finish and start adding and joining tables to the view
40
Adding Tables to the Data Foundation
1. Open the “Catalog” folder
2. Expand the system
3. Expand the “Tables” folder
4. Drag the necessary table to the “Data Foundation”
41
Adding More Tables to the Data Foundation
Add more tables by dragging them to the data foundation area
Join type is set using the Properties panel
The first table that was added will be on the left in the “Details” panel
42
Applying Filters to the View
• Filters can be used to limit the data being displayed
• Right-click on the attribute you want to filter on and select “Apply Filter” from the context menu
This example shows the creation of a filter on the “VALID_TO” date field. Setting that value to “9999-12-31” forces the result set to only show values that are always valid.
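The effect of this filter can be sketched in Python: only rows whose VALID_TO equals the open-ended date 9999-12-31 are kept. The function name `apply_filter`, the PRODUCT column, and the sample rows are hypothetical.

```python
from datetime import date

# Open-ended validity date used as the filter value.
OPEN_ENDED = date(9999, 12, 31)


def apply_filter(rows, column="VALID_TO", value=OPEN_ENDED):
    """Keep only the rows whose filter column equals the given value."""
    return [row for row in rows if row[column] == value]


# Hypothetical master data with one expired and one open-ended record.
master_data = [
    {"PRODUCT": "P1", "VALID_TO": date(2015, 12, 31)},
    {"PRODUCT": "P1", "VALID_TO": OPEN_ENDED},
]

current_rows = apply_filter(master_data)
```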
43
Making Attributes Visible to End Users
1 & 2. To make an attribute visible to users, simply click the circle beside each attribute
3. An attribute can be set to a key or changed to a certain type of label
Save and validate once complete
44
Analytic View — Overview
• Logically close to star-schema modeling
• Join one central fact table, containing measures for reporting, with attribute views
• Can consist of calculated measures and variables
• Analytic views do not store data
• Data is read from the column store tables or views based on the analytic view structure
An example of an analytic view might be sales by product, customer, and organizational entity
45
Adding a New Analytic View
1. Find the appropriate package
2. Right-click and choose “New → Analytic View”
3. Provide a technical name and a description in the popup that follows
Make sure that the “View Type” dropdown is set to Analytic View
46
Adding Fields to the Output
Add tables to the data foundation by clicking and dragging tables to it
You should also select which attributes will be shown in the output by selecting the gray circles next to each item
47
Setting Attributes and Measures
• In the semantic layer, you can assign attributes and measures to the items that were selected to be in the output
• This is necessary for attributes and measures to be displayed and aggregated properly in the reporting layer
48
Joining Tables
In the “Logical Join,” two or more tables must be joined together on fields that are identical or that share the same values
1. Select the “Logical Join” node
2. Drag another view or table into the node
3. Drag from one view to the other on the common field (e.g., Product to Product)
By default, this creates a referential join of the table to the “Data Foundation”
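The idea of joining a fact table to master data on a common field and then reporting a measure by an attribute can be sketched in Python. This is a plain lookup-style join for illustration only and does not model HANA's referential join semantics; the function name, the column names (PRODUCT, AMOUNT, CATEGORY), and the sample data are assumptions.

```python
from collections import defaultdict


def sales_by_attribute(fact_rows, products, attribute):
    """Join fact rows to product master data and sum AMOUNT by an attribute."""
    totals = defaultdict(float)
    for fact in fact_rows:
        product = products.get(fact["PRODUCT"])
        if product is None:
            continue  # fact rows without master data are skipped in this sketch
        totals[product[attribute]] += fact["AMOUNT"]
    return dict(totals)


# Hypothetical fact table and product master data keyed by product ID.
facts = [
    {"PRODUCT": "P1", "AMOUNT": 10.0},
    {"PRODUCT": "P2", "AMOUNT": 5.0},
    {"PRODUCT": "P1", "AMOUNT": 2.5},
]
product_master = {
    "P1": {"CATEGORY": "Bikes"},
    "P2": {"CATEGORY": "Helmets"},
}

sales_by_category = sales_by_attribute(facts, product_master, "CATEGORY")
```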
49
Creating a New Calculated Column
Now we will add a new calculated field called “Net Sales”
Using the “Advanced” tab, you can set the type of value, such as currency or percentage
50
Demo — Building Attribute and Analytical Views
51
SAP HANA
  - Creating attribute views and building analytical views
52
Creating a New Calculation View
• A calculation view will now be created to join together other tables and views and utilize calculations and aggregations to analyze the data
1. Right-click on the appropriate package
2. In the context menu, click “New → Calculation View”
53
Naming the New Calculation View
Give the calculation view a proper name and label
The “Copy From” option can be used to copy and extend an existing calculation view without editing the original view or having to create a new one each time
54
Propagate to Semantics
In the projection layer, right-click on the attributes you want to display in the semantic layer and choose “Propagate to Semantics”
If you choose “Add to Output” instead, that field in every node will have to be activated manually
55
Creating a New Calculation in the View
• Calculated columns are used to derive some meaningful information in the form of columns from existing columns
1. Give the column a proper name
2. Set the “Data Type”
3. Choose a function
4. Select the text within the parentheses
5. Choose an element (or attribute in your table)
6. Validate the syntax
You can add your own calculations to the calculation view just as in the analytic view
56
Aggregation — Overview
• Aggregation Node: columns will be rolled up or aggregated when placed in this layer

Customer  Product  Amount
1         1        20
1         1        20
2         2        30
3         3        25
4         4        20

With an aggregated column on customer and amount, you would get a data set that looks like the following:

Customer  Amount
1         40
2         30
3         25
4         20

Customer 1’s amounts were added up, so there is one less row to display
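The roll-up in this example can be reproduced with a short Python sketch that sums the amount per customer. It uses the five sample rows from the slide; the function name is hypothetical.

```python
from collections import defaultdict

# (Customer, Product, Amount) rows from the example.
ROWS = [
    (1, 1, 20),
    (1, 1, 20),
    (2, 2, 30),
    (3, 3, 25),
    (4, 4, 20),
]


def aggregate_amount_by_customer(rows):
    """Sum the amount for each customer, dropping the product column."""
    totals = defaultdict(int)
    for customer, _product, amount in rows:
        totals[customer] += amount
    return dict(totals)


totals = aggregate_amount_by_customer(ROWS)
```

The two rows for customer 1 collapse into a single row with an amount of 40, so five input rows become four output rows.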
57
Adding a Calculated Column to the Aggregation Layer
• In the aggregation node, calculated columns can be added as aggregated columns
If calculations are not added to a projection layer and then sent to an aggregation node, the totals will not work properly in reporting
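One way to see why the order matters is to compare a row-level calculation that is summed afterwards with the same formula applied to already aggregated columns. The sketch below uses a hypothetical PRICE times QUANTITY calculation and made-up rows; it illustrates the general effect rather than HANA's engine behavior.

```python
# Hypothetical detail rows.
detail_rows = [
    {"PRICE": 10, "QUANTITY": 2},
    {"PRICE": 20, "QUANTITY": 1},
]


def total_calculated_per_row(rows):
    """Calculate PRICE * QUANTITY on each row first, then aggregate."""
    return sum(r["PRICE"] * r["QUANTITY"] for r in rows)


def total_calculated_after_aggregation(rows):
    """Aggregate PRICE and QUANTITY first, then apply the formula to the sums."""
    return sum(r["PRICE"] for r in rows) * sum(r["QUANTITY"] for r in rows)


correct_total = total_calculated_per_row(detail_rows)              # 10*2 + 20*1
distorted_total = total_calculated_after_aggregation(detail_rows)  # (10+20) * (2+1)
```

Here the row-level calculation gives 40, while multiplying the aggregated columns gives 90, which is why the calculation belongs before the aggregation step.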
58
Assigning Column Types to the View
• In the semantics layer, each item needs to be assigned the “type” attribute or measure
1. Click on the “Semantics” node
2. Click the “Auto Assign” button to automatically assign the “Type”
3. If any of the types are incorrect, you can manually adjust them
• Once all assignments are complete, save and validate the view
You can set each of these types manually, but the automatic assignments are usually correct
59
What We’ll Cover
• SAP Data Services
• SAP HANA
• Wrap-up
60
Where to Find More Information
• www.sap-press.com/products/SAP-HANA%3A-An-Introduction-(2nd-Edition).html
  - Bjarne Berg and Penny Silvia, SAP HANA: An Introduction (SAP PRESS, 2014).
• www.saphana.com/welcome
  - SAP’s main page for all SAP HANA-related information
• www.saphana.com/community/try
  - Try HANA for free
• http://scn.sap.com/community/hana-in-memory
  - SAP HANA and In-Memory Computing by SAP HANA Community
61
7 Key Points to Take Home
• SAP Data Services transforms, refines, and delivers trusted data for the Enterprise Data Warehouse
• Multiple data sources can be used for Data Services, including Flat Files, DTDs, XML Schemas, Excel Workbooks, and more
• Utilize built-in transforms, which are objects that process source data to bring about desired outputs
• SAP HANA indexes data from a variety of sources and stores the results on a dedicated server
• Attributes add details and can be modeled using Attribute Views
• Analytic views join one central fact table with attribute views and can include calculated measures and variables for reporting
• Calculation views bring together database tables, attribute views, analytic views, and other calculation views
62
Your Turn!
How to contact me:

Dr. Berg
Bjarne.Berg@pwc.com
Please remember to complete your session evaluation
63
Disclaimer
SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate company) in Germany and other countries. All other product and service names mentioned are the trademarks of their respective companies. Wellesley Information Services is neither owned nor controlled by SAP SE.
64
