
Best Practices for SAP HANA Modelling and SAP Data Services Data Loading

Dr. Bjarne Berg


PwC
Produced by Wellesley Information Services, LLC, publisher of SAPinsider. © 2016 Wellesley Information Services. All rights reserved.
In This Session
• We will explore SAP Data Services and how to load information into SAP HANA
• You will learn how to create transformations, merges, and joins
• We will look at the best practices of modeling in SAP HANA
• We will see step-by-step how to create calculation, attribute, and analytical views
• At the end of this session, you will know how to load data and create views to analyze the data

What We’ll Cover
• SAP Data Services
• SAP HANA
• Wrap-up

Data Services Overview
• SAP Data Services is a leading technology for enterprise information management providing solutions for:
  – Data integration
  – Data quality
  – Data profiling
  – Text data processing

SAP Data Services transforms, refines, and delivers trusted data for the enterprise data warehouse (EDW)
Step-by-Step: Creating Batch Jobs

1. Create a new project and give it a relevant name
2. Right-click the project to create a new batch job

Giving your projects and batch jobs relevant names keeps them organized
Step-by-Step: Loading from Flat Files

1. Select the related batch job to open its workspace

2. In the “Local Object Library” panel, right-click “Flat Files” under the “Format” category and select “New”
Examples of Other Available Data Sources
• Data Services can use many other data sources

Use the local object library to find existing data sources under the “Datastore” category

You can add more files under the “Format” category
Formatting the Flat File
3. In the “File Format Editor” popup, fill in the appropriate fields

The date format must match the format of the dates in the source data

“Tab” was chosen as the delimiter because the data fields are separated by tabs
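To show why the delimiter and date format settings must match the file, here is a minimal Python sketch (an illustration only, not Data Services code) that reads tab-separated records and parses the date columns with an explicit format string. The CASE_ID column, the sample rows, and the `%Y.%m.%d` format are hypothetical; ODATE and CDATE are the date fields used later in this session.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample file: a header row plus tab-separated records.
SAMPLE = (
    "CASE_ID\tODATE\tCDATE\n"
    "1001\t2016.01.04\t2016.01.19\n"
    "1002\t2016.02.01\t2016.02.03\n"
)

# Assumed date format; it must match how the dates are written in the file.
DATE_FORMAT = "%Y.%m.%d"


def load_flat_file(text: str, date_format: str = DATE_FORMAT) -> list[dict]:
    """Parse tab-delimited text and convert the two date columns."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for record in reader:
        rows.append(
            {
                "CASE_ID": int(record["CASE_ID"]),
                # strptime raises ValueError when the format does not match the data.
                "ODATE": datetime.strptime(record["ODATE"], date_format).date(),
                "CDATE": datetime.strptime(record["CDATE"], date_format).date(),
            }
        )
    return rows


rows = load_flat_file(SAMPLE)
print(rows[0]["CASE_ID"], rows[0]["ODATE"])
```

Passing a format that does not match the file, such as `"%d/%m/%Y"`, makes `strptime` raise `ValueError`, which is the script-level equivalent of a date format that does not match the data.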
Defining Table Fields
4. Enter the field properties

Notice the updated view below
Preview Data

5. In the Repository, under “Format”, right-click the new file format and select “View Data” to preview the newly added data source

This lets you check that the source data loaded without errors before you use it
SAP Data Services
• In this section:
  – Data Services overview, creating batch jobs, and loading from flat files
  – Building transforms and using functions
  – Creating table joins and using data merging
Transforms Overview
• Transforms are built-in objects that process source data to produce the desired output
• The most commonly used transform is the Query transform
• The Query transform enables you to:
  – Filter and select data from a source
  – Join data from multiple sources
  – Map columns from input to output schemas
  – Perform data nesting and unnesting
  – Add new columns to the output schema
  – Assign primary keys to the output schema
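As a rough mental model of the capabilities listed above (plain Python, not how Data Services executes a Query transform internally), a query transform filters input rows, maps input columns to output columns, and can add new columns computed from existing ones. The column names CUST_ID, STATUS, AMOUNT, and AMOUNT_CENTS are hypothetical.

```python
from typing import Any, Callable, Iterable

Row = dict[str, Any]


def query_transform(
    rows: Iterable[Row],
    where: Callable[[Row], bool],
    mapping: dict[str, Callable[[Row], Any]],
) -> list[Row]:
    """Keep rows that pass the filter, then build each output row from the mappings."""
    return [
        {out_col: expr(row) for out_col, expr in mapping.items()}
        for row in rows
        if where(row)
    ]


source = [
    {"CUST_ID": 1, "STATUS": "OPEN", "AMOUNT": 20},
    {"CUST_ID": 2, "STATUS": "CLOSED", "AMOUNT": 30},
    {"CUST_ID": 3, "STATUS": "CLOSED", "AMOUNT": 25},
]

output = query_transform(
    source,
    # Filter: keep closed cases only.
    where=lambda r: r["STATUS"] == "CLOSED",
    mapping={
        # Map (and rename) an input column to the output schema.
        "CUSTOMER": lambda r: r["CUST_ID"],
        "AMOUNT": lambda r: r["AMOUNT"],
        # Add a new column derived from an existing one.
        "AMOUNT_CENTS": lambda r: r["AMOUNT"] * 100,
    },
)
print(output)
```

Input columns that are not named in `mapping` (here, STATUS) simply do not reach the output, just as you do not have to drag every Schema In field to Schema Out.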
Adding a Data Flow Object to the Workspace
• The tool palette contains icons that let you create new objects in the workspace

1. Drag a data flow icon from the tool palette to the workspace

2. Double-click the data flow to open its workspace

When you create a reusable object, such as a data flow, it automatically appears in the local object library
Adding a Data Source to the Workspace

1. Drag a data source (e.g., a flat file) from the local object library onto the workspace

2. Create a connection between the data source and the query
Query Editor Overview
• The Query Editor is a graphical interface for performing query operations
• It contains three areas:
  – Schema In area
  – Schema Out area
  – Parameters area

3. Double-click the query transform to open the “Query Editor”
Setting Up the Output Table

4. Drag the desired output fields from the “Schema In” section to “Schema Out”

You only need to drag the fields that you want to appear in Schema Out; there is no need to drag all of them
Creating a New Output Column
1. Right-click an output field and select “New Output Column”

2. Select where to insert the new column

New columns can be created to display the results of calculations
Defining Column Properties

3. In the “Column Properties” popup, name the new column and define its properties

Give the column a descriptive name that clearly identifies what it is used for
Using Functions

1. Double-click in the cell under “Mapping”

2. Click “Functions”

3. Select the appropriate category and then the specific function

For this demo, we want to calculate the number of days a case was open
Setting Up the Function

4. Define the input parameters for the function

Use the drop-down list to select the input parameters and avoid typos

After you define the input parameters, notice the updated code in this panel for the NO_DAYS_CASE_OPEN column. This formula returns the number of days from ODATE to CDATE, which measures how long it takes to close a case.
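The NO_DAYS_CASE_OPEN mapping comes down to date subtraction. The Python sketch below reproduces that arithmetic outside Data Services. The function name mirrors the column, and rejecting a close date that is earlier than the open date is our own assumption, not something the Data Services function is known to do.

```python
from datetime import date


def no_days_case_open(odate: date, cdate: date) -> int:
    """Return the number of days from the open date (ODATE) to the close date (CDATE)."""
    if cdate < odate:
        # Assumption: a case cannot close before it opens, so treat this as bad data.
        raise ValueError(f"CDATE {cdate} is earlier than ODATE {odate}")
    return (cdate - odate).days


print(no_days_case_open(date(2016, 1, 4), date(2016, 1, 19)))  # 15
```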
Adding an Output Table to the Workspace
1. Drag and drop a template table into the workspace to serve as our output table

2. Link the query to the template table

A template table is an object that can be used as a target for data to populate when a job executes successfully. It can also be saved in the object library for use as a data source at a later time.

A template table lets us view the specific information we want without the risk of altering the source data. The data that populates the template table is based on the output schema requirements in the query transform.
Executing a Job

1. Right-click the job and select “Execute”

To analyze any issues that may occur during data loading, click “Enable auditing” and make sure that “Use collected statistics” is checked
Job Log Overview
• The log file displays a list of the actions performed during job execution
• If any errors occur, the error icon appears; otherwise, “Job is completed successfully” is displayed
• The job log has five columns:
  – Pid: Process identification number of the executing process
  – Tid: Thread identification number of the thread
  – Number: Error number, made up of a prefix followed by a number
  – Time Stamp: Date and time the thread generated the message
  – Message: Description of the error generated by the thread
Job Log Overview (cont.)

A successful job execution

A job with errors shows the error icon. Double-click the error icon to view the list of errors, as shown below
How to Preview the Output Table

1. Click the data flow to open its workspace

2. Click the magnifying glass on the output table to view its data

Notice that the column created earlier is correctly formatted as a number and that its data is the result of the function defined earlier
SAP Data Services
• In this section:
  – Data Services overview, creating batch jobs, and loading from flat files
  – Building transforms and using functions
  – Creating table joins and using data merging
Creating Table Joins
A join combines data from multiple sources into one target

Use the FROM clause of the Query transform to join the two sources: Query and Join

In this example, Source 1 has the car description for each case, while Source 2 has the resolution to the case. The query transform combines the data from the two sources in the Schema Out section to produce a result that shows the overall case solution.
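Conceptually, the join in the FROM clause matches rows from the two sources on a shared key. The Python sketch below shows an inner join under several stated assumptions: both sources carry a hypothetical CASE_ID key, that key is unique in Source 2, and CAR_DESCRIPTION and SOLUTION stand in for the car description and resolution fields.

```python
def inner_join(left: list[dict], right: list[dict], key: str) -> list[dict]:
    """Combine rows from two sources that share the same value in `key`."""
    # Index the right-hand source by the join key (assumes one row per key).
    right_by_key = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # Rows without a match in the right-hand source are dropped (inner join).
            joined.append({**row, **match})
    return joined


source_1 = [
    {"CASE_ID": 1, "CAR_DESCRIPTION": "Engine stalls"},
    {"CASE_ID": 2, "CAR_DESCRIPTION": "Brake noise"},
    {"CASE_ID": 3, "CAR_DESCRIPTION": "Flat tire"},
]
source_2 = [
    {"CASE_ID": 1, "SOLUTION": "Replaced fuel pump"},
    {"CASE_ID": 2, "SOLUTION": "Replaced brake pads"},
]

result = inner_join(source_1, source_2, "CASE_ID")
for row in result:
    print(row)
```

Case 3 has no resolution in Source 2, so it does not appear in the result. An outer join would keep it with an empty SOLUTION instead.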
Result from a Table Join
1. Once the tables have been joined in the query transform, execute the job as discussed in the earlier slides

2. Enter the data flow workspace and click the magnifying glass to view the results in the output table

Notice in the output table below how the Solution column from the Join source is now combined with the fields from the Query transform
Merges Overview

You can merge rows from two or more sources into a single data set

• All sources must have the same schema to execute the Merge transform:
  – Same number of columns
  – Same column names
  – Same data type for each column
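The three schema rules above can be expressed as a simple pre-check. The Python sketch below is a hypothetical helper, not a Data Services feature. It treats a schema as an ordered list of (column name, data type) pairs, and it compares columns position by position, which is an assumption of this sketch. The data type strings are illustrative.

```python
Column = tuple[str, str]  # (column name, data type)


def schema_mismatches(first: list[Column], second: list[Column]) -> list[str]:
    """List the reasons two schemas cannot be merged; an empty list means they match."""
    problems = []
    if len(first) != len(second):
        problems.append(f"column count differs: {len(first)} vs {len(second)}")
    # Compare column by column in order (ordering is an assumption of this sketch).
    for (name_a, type_a), (name_b, type_b) in zip(first, second):
        if name_a != name_b:
            problems.append(f"column name differs: {name_a} vs {name_b}")
        elif type_a != type_b:
            problems.append(f"data type differs for {name_a}: {type_a} vs {type_b}")
    return problems


schema_a = [("CASE_ID", "int"), ("SOLUTION", "varchar(100)")]
schema_b = [("CASE_ID", "int"), ("SOLUTION", "varchar(100)")]
schema_c = [("CASE_ID", "varchar(10)"), ("SOLUTION", "varchar(100)")]

print(schema_mismatches(schema_a, schema_b))  # []
print(schema_mismatches(schema_a, schema_c))
```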
How to Create a Merge
1. To merge two sources, add a query transform to each source so that the data is formatted identically in both sources

2. Connect the queries to a “Merge Transform”

3. When you open the “Merge Transform”, notice how the fields and data types match for all input and output fields
How to Avoid Creating Duplicated Data in Merges

4. To avoid duplicate rows, add a query transform that outputs distinct rows only

5. Execute the job to complete the merged table
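Steps 4 and 5 can be pictured in miniature with the Python sketch below. It models the merge as simply appending the rows of each source (a UNION ALL-style combination, which is our simplifying assumption), so duplicates survive. It then keeps only distinct rows, in their original order. The sample rows are hypothetical.

```python
def merge_sources(*sources: list[dict]) -> list[dict]:
    """Append the rows of every source; duplicates are kept at this stage."""
    merged: list[dict] = []
    for source in sources:
        merged.extend(source)
    return merged


def distinct_rows(rows: list[dict]) -> list[dict]:
    """Drop repeated rows, keeping the first occurrence of each."""
    seen: set[tuple] = set()
    result = []
    for row in rows:
        # Sort the items so that column order does not affect the comparison.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


north = [{"CASE_ID": 1, "SOLUTION": "Reset"}, {"CASE_ID": 2, "SOLUTION": "Replaced"}]
south = [{"CASE_ID": 2, "SOLUTION": "Replaced"}, {"CASE_ID": 3, "SOLUTION": "Adjusted"}]

merged = merge_sources(north, south)
print(len(merged), len(distinct_rows(merged)))  # 4 3
```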
Demo of Data Loading with Data Services

SAP HANA
• In this section:
  – SAP HANA overview
  – Creating attribute views and analytical views
  – Making calculation views
SAP HANA — In-Memory Options
• SAP HANA is sold as an in-memory appliance. This means that both the software and the hardware are included from the vendors.

• Currently you can buy SAP HANA solutions from Cisco, Dell, Fujitsu, IBM, Lenovo, HP, NEC, Huawei, Silicon Graphics, and others

• SAP HANA indexes and compresses the data from a variety of sources, including ERP, and stores the data in memory

Source: SAP SE, 2016

SAP HANA can radically change the way databases operate and make systems dramatically faster
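Dictionary encoding is one of the compression techniques used in column stores such as SAP HANA's. The Python sketch below is a simplified illustration of the idea, not HANA's actual implementation. Each distinct value of a column is stored once in a sorted dictionary, and the column itself becomes a list of small integer codes. The sample column values are hypothetical.

```python
def dictionary_encode(column: list[str]) -> tuple[list[str], list[int]]:
    """Store each distinct value once and replace the values with integer codes."""
    dictionary = sorted(set(column))
    code_of = {value: code for code, value in enumerate(dictionary)}
    codes = [code_of[value] for value in column]
    return dictionary, codes


def dictionary_decode(dictionary: list[str], codes: list[int]) -> list[str]:
    """Rebuild the original column from the dictionary and the codes."""
    return [dictionary[code] for code in codes]


country = ["DE", "US", "DE", "FR", "US", "DE"]
dictionary, codes = dictionary_encode(country)
print(dictionary)  # ['DE', 'FR', 'US']
print(codes)  # [0, 2, 0, 1, 2, 0]
```

The fewer distinct values a column has, the smaller the dictionary and the fewer bits each code needs, which is why this works well for typical master data columns.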
HANA Editions and Components
• While HANA is sold as an appliance, there are many internal components, and the edition you buy may contain different licenses to these components

Platform Edition and Lifecycle Management
Component ID         Component Name
BC-DB-HDB            SAP HANA database
BC-DB-HDB-ENG        SAP HANA database engine
BC-DB-HDB-PER        SAP HANA database persistence
BC-DB-HDB-SYS        SAP HANA database interface
BC-DB-HDB-DBA        SAP HANA database/DBA cockpit
BC-DB-HDB-POR        SAP HANA DB Porting
BC-DB-HDB-BAC        SAP HANA Backup and Recovery
BC-CCM-HAG           SAP Host agent
BC-DB-HDB-CCM        SAP HANA CCMS
BC-HAN-SL-STP        SAP HANA unified installer
BC-DB-HDB-CLI        SAP HANA Clients (JDBC/ODBC)
BC-HAN-UPD           Software Update Manager
BC-DB-HDB-R          SAP HANA Integration with R
BC-DB-HDB-INS        SAP HANA database installation
BC-DB-HDB-SCR        SAP HANA SQL scripts
BC-DB-HDB-UPG        SAP HANA database upgrade
BC-DB-HDB-MDX        MDX engine: Microsoft Excel client
BC-HAN-DXC           SAP HANA Direct Extractor Connection
BC-HAN-MOD           SAP HANA Studio - Information Modeler
BC-HAN-3DM           Information Composer
BC-HAN-SRC           SAP HANA UI toolkit
BC-DB-HDB-TXT        SAP HANA Text and Search features
BC-DB-HDB-DXC        SAP HANA Direct extraction connector
BC-DB-HDB-SEC        SAP HANA Security and User Mgmt
BC-DB-HDB-XS         SAP HANA Application Services
BC-DB-HDB-AFL        SAP HANA Advanced functions library
BC-DB-HDB-AFL-PAL    SAP HANA Predictive analysis library
BC-DB-HDB-AFL-SOP    SAP HANA Sales & Operations Planning
BC-DB-HDB-PLE        SAP HANA Planning Engine

Enterprise Edition (also has the Platform Edition components)
Component ID         Component Name
EIM-DS               SAP Data Services: ETL-based
BC-HAN-LOA           SAP HANA Load Controller: log-based
BC-HAN-LTR           SAP Landscape Transformation (SLT): trigger-based
BC-HAN-REP           Sybase Replication Server: log-based

End User Clients
Component ID         Component Name
BI-BIP-CMC, BI-BIP   BI Platform
BI-RA-WBI            Web Intelligence
BI-RA-XL             Dashboard Designer
BI-RA-CR, BI-BIP-CRS SAP Crystal reports
BI-RA-EXP            SAP BusinessObjects Explorer
BI-BIP-IDT           Information Design Tool (for universes)
BI-RA-AO-XLA         Microsoft Excel add-in
Hardware Options as of July 2016 (changes often)

SAP HANA
• In this section:
  – SAP HANA overview
  – Creating attribute views and building analytical views
  – Making calculation views
Attribute Views — Overview

• Master data reporting can be modeled using attribute views

• Can be regarded as Master Data Tables

• Can be linked to fact tables in Analytic Views

• A measure, e.g., weight, can be defined as an attribute

Creating a New Attribute View

1. Open HANA Studio and expand the “Content” folder

2. Right-click the appropriate package in your system

3. Navigate to New > Attribute View…
Naming the New Attribute View

1. Give the view a name

2. Add a description

3. Click “Finish” and start adding and joining tables to the view

The name and description you provide should accurately describe the attribute view you want to create
Adding Tables to the Data Foundation
1. Open the “Catalog” folder
2. Expand the system
3. Expand the “Tables” folder
4. Drag the necessary table to the “Data Foundation”
Adding More Tables to the Data Foundation

Add more tables to the data foundation by dragging them into the data foundation area

Set the join type in the “Properties” panel

The first table you added appears on the left in the “Details” panel
Applying Filters to the View
• Filters limit the data being displayed

• Right-click the attribute you want to filter on and select “Apply Filter” from the context menu

This example creates a filter on the “VALID_TO” date field. Setting the filter value to “9999-12-31” restricts the result set to records with an open-ended validity period, that is, the records that are currently valid.
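The effect of the VALID_TO filter can be sketched in a few lines of Python. This is an illustration of the filter logic only, not how HANA evaluates it. The product rows are hypothetical: an older price version that has expired and two versions that are still open-ended.

```python
from datetime import date

# Conventional "open-ended" end date used for records with no expiry.
OPEN_ENDED = date(9999, 12, 31)


def currently_valid(rows: list[dict]) -> list[dict]:
    """Keep only records whose VALID_TO date is the open-ended 9999-12-31."""
    return [row for row in rows if row["VALID_TO"] == OPEN_ENDED]


products = [
    {"PRODUCT": "P1", "PRICE": 90, "VALID_TO": date(2015, 12, 31)},
    {"PRODUCT": "P1", "PRICE": 95, "VALID_TO": OPEN_ENDED},
    {"PRODUCT": "P2", "PRICE": 40, "VALID_TO": OPEN_ENDED},
]
print(currently_valid(products))
```

Only the current version of each product survives the filter, so P1 appears once with its latest price.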
Making Attributes Visible to End Users

1 & 2. To make an attribute visible to users, click the circle beside each attribute

3. An attribute can be set as a key or changed to a specific type of label

Save and validate when complete
Analytic View — Overview

• Logically similar to star-schema modeling
• Join one central fact table, containing the measures for reporting, with attribute views
• Can include calculated measures and variables
• Analytic views do not store data
• The data resides in the underlying column store tables or views and is read according to the analytic view structure

An example of an analytic view might be sales by product, customer, and organizational entity
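The star-schema idea behind an analytic view can be sketched in Python: a central fact table holds the measures and keys, attribute data describes those keys, and results are computed at query time rather than stored. The table contents, names, and the `sales_by_attribute` helper below are hypothetical and only illustrate the "sales by product" example.

```python
from collections import defaultdict

# Hypothetical fact table: one row per sale, with keys pointing to attribute data.
FACT_SALES = [
    {"PRODUCT_ID": 1, "CUSTOMER_ID": 10, "SALES": 100.0},
    {"PRODUCT_ID": 2, "CUSTOMER_ID": 10, "SALES": 40.0},
    {"PRODUCT_ID": 1, "CUSTOMER_ID": 11, "SALES": 60.0},
]

# Hypothetical attribute data for the product dimension.
PRODUCTS = {1: {"PRODUCT_NAME": "Brakes"}, 2: {"PRODUCT_NAME": "Tires"}}


def sales_by_attribute(facts, dimension, key, attribute):
    """Look up each fact row's dimension row on `key`, then total SALES per attribute value."""
    totals = defaultdict(float)
    for row in facts:
        label = dimension[row[key]][attribute]
        totals[label] += row["SALES"]
    return dict(totals)


print(sales_by_attribute(FACT_SALES, PRODUCTS, "PRODUCT_ID", "PRODUCT_NAME"))
# {'Brakes': 160.0, 'Tires': 40.0}
```

Nothing is written back to the fact table; the totals are derived on each call, which loosely mirrors the point that analytic views do not store data.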
Adding a New Analytic View

1. Find the appropriate package

2. Right-click it and choose “New > Analytic View”

3. Provide a technical name and a description in the popup that follows

Make sure that the “View Type” dropdown is set to Analytic View
Adding Fields to the Output

Add tables to the data foundation by dragging them into it

You should also choose which attributes appear in the output by clicking the gray circles next to each item
Setting Attributes and Measures

• In the semantic layer, you can assign each item selected for the output as an attribute or a measure

• This is necessary for attributes and measures to be displayed and aggregated properly in the reporting layer
Joining Tables

In the “Logical Join”, two or more tables must be joined on fields that are identical or that contain matching values

1. Select the “Logical Join” node
2. Drag another view or table into the node
3. Drag from one view to the other on the common field (e.g., Product to Product)

By default, this creates a referential join to the “Data Foundation”
Creating a New Calculated Column
Now we will add a new calculated field called “Net Sales”

Using the “Advanced” tab, you can set the type of value, such as currency or percentage
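The Net Sales formula itself is not given here, so the Python sketch below assumes a hypothetical one: gross sales less a percentage discount. It uses `Decimal` and rounds to whole cents, which loosely mirrors setting the value type to currency on the “Advanced” tab. The names `net_sales`, `gross_sales`, and `discount_pct` are illustrative only.

```python
from decimal import ROUND_HALF_UP, Decimal

CENT = Decimal("0.01")


def net_sales(gross_sales: Decimal, discount_pct: Decimal) -> Decimal:
    """Hypothetical Net Sales formula: gross sales minus a percentage discount."""
    net = gross_sales * (Decimal(1) - discount_pct / Decimal(100))
    # Currency values are rounded to whole cents.
    return net.quantize(CENT, rounding=ROUND_HALF_UP)


print(net_sales(Decimal("199.99"), Decimal("15")))  # 169.99
```

`Decimal` avoids the binary floating-point rounding errors that `float` would introduce into currency amounts.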
Demo — Building Attribute and Analytical Views

SAP HANA
• In this section:
  – SAP HANA overview
  – Creating attribute views and building analytical views
  – Making calculation views
Creating a New Calculation View
• Next, we will create a calculation view that joins other tables and views and uses calculations and aggregations to analyze the data

1. Right-click the appropriate package

2. In the context menu, click “New > Calculation View”
Naming the New Calculation View

Give the calculation view a proper name and label

Use the “Copy From” option to copy and extend an existing calculation view without editing the original view or having to create a new one each time
Propagate to Semantics

In the projection layer, right-click the attributes you want to display in the semantic layer and choose “Propagate to Semantics”

If you choose “Add to Output” instead, you will have to activate that field manually in every node
Creating a New Calculation in the View

• Calculated columns derive meaningful information, in the form of new columns, from existing columns

1. Give the column a proper name
2. Set the “Data Type”
3. Choose a function
4. Select the text within the parentheses
5. Choose an element (or attribute in your table)
6. Validate the syntax

You can add your own calculations to the calculation view, just as in the analytic view
Aggregation — Overview
• Aggregation node: columns are rolled up, or aggregated, when placed in this layer

Customer  Product  Amount
1         1        20
1         1        20
2         2        30
3         3        25
4         4        20

With an aggregation on Customer and Amount, you would get a data set that looks like the following:

Customer  Amount
1         40
2         30
3         25
4         20

Customer 1’s amounts were added up, so there is one less row to display
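The Python sketch below reproduces the customer and amount aggregation example from this slide: it sums Amount per Customer and leaves Product out of the result. It only illustrates the arithmetic. In HANA the aggregation node is configured graphically.

```python
from collections import defaultdict

# (Customer, Product, Amount) rows from the example.
ROWS = [(1, 1, 20), (1, 1, 20), (2, 2, 30), (3, 3, 25), (4, 4, 20)]


def aggregate_amount_by_customer(rows):
    """Roll Amount up to one total per Customer; Product is not carried forward."""
    totals = defaultdict(int)
    for customer, _product, amount in rows:
        totals[customer] += amount
    return dict(totals)


print(aggregate_amount_by_customer(ROWS))  # {1: 40, 2: 30, 3: 25, 4: 20}
```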
Adding a Calculated Column to the Aggregation Layer
• In the aggregation node, calculated columns can be added as aggregated columns

If calculations are not added in a projection layer before the data is sent to an aggregation node, the totals will not be correct in reporting
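Here is one common way totals go wrong, sketched in Python with hypothetical PRICE and QUANTITY columns. A row-level calculation such as price times quantity has to be computed per row first (the projection step) and only then summed (the aggregation step). Multiplying already-aggregated totals gives a different, wrong answer.

```python
# Hypothetical rows: two sales lines with different prices and quantities.
ROWS = [{"PRICE": 10, "QUANTITY": 2}, {"PRICE": 5, "QUANTITY": 4}]


def revenue_calculated_before_aggregation(rows):
    # Calculate per row (projection), then sum the results (aggregation).
    return sum(r["PRICE"] * r["QUANTITY"] for r in rows)


def revenue_calculated_after_aggregation(rows):
    # Sum each column first, then multiply the totals: the wrong order.
    total_price = sum(r["PRICE"] for r in rows)
    total_quantity = sum(r["QUANTITY"] for r in rows)
    return total_price * total_quantity


print(revenue_calculated_before_aggregation(ROWS))  # 40
print(revenue_calculated_after_aggregation(ROWS))  # 90
```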
Assigning Column Types to the View

• In the semantics layer, each item needs to be assigned the type “attribute” or “measure”

1. Click the “Semantics” node
2. Click the “Auto Assign” button to assign the “Type” automatically
3. If any of the types are incorrect, adjust them manually

• Once all assignments are complete, save and validate the view

You can set each of these types manually, but the automatic assignments are usually correct
What We’ll Cover
• SAP Data Services
• SAP HANA
• Wrap-up

Where to Find More Information
• www.sap-press.com/products/SAP-HANA%3A-An-Introduction-(2nd-Edition).html
  – Bjarne Berg and Penny Silvia, SAP HANA: An Introduction (SAP PRESS, 2014)
• www.saphana.com/welcome
  – SAP’s main page for all SAP HANA-related information
• www.saphana.com/community/try
  – Try HANA for free
• http://scn.sap.com/community/hana-in-memory
  – SAP HANA and In-Memory Computing, by the SAP HANA Community
7 Key Points to Take Home
• SAP Data Services transforms, refines, and delivers trusted data for the Enterprise Data Warehouse
• Multiple data sources can be used in Data Services, including flat files, DTDs, XML schemas, Excel workbooks, and more
• Use built-in transforms, which are objects that process source data to produce the desired output
• SAP HANA indexes and compresses data from a variety of sources and stores it in memory on a dedicated appliance
• Attributes add details and can be modeled using attribute views
• Analytic views join one central fact table containing the measures for reporting, and can include calculated measures and variables
• Calculation views bring together database tables, attribute views, analytic views, and other calculation views
Your Turn!

How to contact me:


Dr. Berg
Bjarne.Berg@pwc.com

Please remember to complete your session evaluation

Disclaimer
SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate company) in Germany and other
countries. All other product and service names mentioned are the trademarks of their respective companies. Wellesley Information Services is neither owned nor controlled by SAP SE.
