Alex Kotopoulis
Senior Principal Product Manager
Architecture Overview
This hands-on lab is based on a fictional movie streaming company that provides online access
to movie media. The goal of this lab is to load customer activity data, including movie rating
actions, together with a movie database sourced from a MySQL DB into Hadoop Hive, aggregate
the average rating per movie and join it with the movie data, and load the result into an
Oracle DB target.
[Figure: architecture overview. Components: Flume log stream, Logs, MySQL Movie, Hive movie,
Hive movie_rating, Oracle MOVIE_RATING. Tasks: Task 1: Topology and Models; Task 2: Sqoop Map;
Task 3: Hive Map (Avg. Movie Ratings); Task 4: OLH Map; Task 5: ODI Package; Task 6: OGG Load.]
Overview
Time to Complete
Perform all 6 tasks: 60 minutes
Prerequisites
Before you begin this tutorial, you should
2. In the Start/Stop Services window, use the arrow keys to scroll down to ORCL Oracle
Database 12c and select it. Press OK.
Note: The ORCL option is not visible initially; you need to scroll down to reach it.
4. Within the Physical Architecture accordion on the left, expand the Technologies folder.
Note: For this HOL, the setting Hide Unused Technologies has been set to hide all
technologies without a configured data server.
Info: A technology is a type of data source that can be used by ODI as a source, target, or other
connection. A data server is an individual server of a given technology, for example a database
server. A data server can have multiple schemas. ODI uses a concept of logical and physical
schemas to allow execution of the same mapping in different environments, for example in
development, QA, and production environments.
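The logical/physical schema idea can be pictured as a lookup: a mapping refers only to a logical schema name, and the environment chosen at run time (ODI calls this a context) decides which physical schema is used. The following is a minimal sketch of that idea, not ODI's implementation; the logical schema name HIVE_MOVIE, the context names, and the host and schema strings are all hypothetical.

```python
# Illustrative only: every name and address below is a made-up placeholder,
# not a value from the lab environment.
PHYSICAL_SCHEMAS = {
    ("HIVE_MOVIE", "DEVELOPMENT"): "hive-dev.example.com/moviedemo_dev",
    ("HIVE_MOVIE", "QA"): "hive-qa.example.com/moviedemo_qa",
    ("HIVE_MOVIE", "PRODUCTION"): "hive-prod.example.com/moviedemo",
}


def resolve_schema(logical_schema: str, context: str) -> str:
    """Return the physical schema that a logical schema points to in a context."""
    try:
        return PHYSICAL_SCHEMAS[(logical_schema, context)]
    except KeyError:
        raise LookupError(
            f"no physical schema for {logical_schema!r} in context {context!r}"
        ) from None


# The same mapping definition resolves to different servers per environment.
for ctx in ("DEVELOPMENT", "QA", "PRODUCTION"):
    print(ctx, "->", resolve_schema("HIVE_MOVIE", ctx))
```

Because the mapping never names a physical server directly, moving it from development to production only requires selecting a different context.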
6. Double-click on the Hive data server to review its settings.
8. Switch to the Designer navigator and open the Models accordion. Expand all models.
Info: A model is a set of metadata definitions regarding a source such as a database schema or
a set of files. A model can contain multiple data stores, which follow the relational concept of
columns and rows and can be database tables, structured files, or XML elements within an
XML document.
1. The first mapping to be created will load the MySQL Movie table into the Hive movie
table.
To create a new mapping, open the Project accordion within the Designer navigator.
Info: A mapping is a data flow to move and transform data from sources into targets. It
contains declarative and graphical rules about data joining and transformation.
9. In the New Mapping dialog change the name to A - Sqoop Movie Load and press OK.
4. For this mapping, we will load the table MOVIE from model MySQL into the table movie
within the model HiveMovie.
To view the models, open the Models accordion.
5. Drag the datastore MOVIE from model MySQL as a source and the datastore movie
from Model HiveMovie as a target onto the mapping diagram panel.
7. Click OK on the Attribute Matching dialog. ODI will map all same-name fields from
source to target.
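Same-name matching can be pictured with a small sketch. This is only an illustration of the idea, not how ODI implements the Attribute Matching dialog. The source column names come from the INSERT statement used later in this lab; the lowercase Hive column names and the case-insensitive comparison are assumptions.

```python
def match_attributes(source_cols, target_cols):
    """Map each target column to the source column with the same name.

    The comparison ignores case (an assumption made for this sketch).
    Target columns without a same-name source column stay unmapped.
    """
    by_name = {col.lower(): col for col in source_cols}
    return {
        target: by_name[target.lower()]
        for target in target_cols
        if target.lower() in by_name
    }


# Source columns as they appear in the MySQL MOVIE table.
source = ["MOVIE_ID", "TITLE", "YEAR", "BUDGET", "GROSS", "PLOT_SUMMARY"]
# Assumed column names of the Hive movie table.
target = ["movie_id", "title", "year", "budget", "gross", "plot_summary"]

mapping = match_attributes(source, target)
for tgt, src in mapping.items():
    print(f"{tgt} <- {src}")
```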
8. The logical flow has now been set up. To set the physical implementation, click on the
Physical tab of the editor.
9. The Physical tab shows the actual systems involved in the transformation, in this case
the MySQL source and the Hive target.
In the Physical tab, users can choose the Load Knowledge Module (LKM) that controls
data movement between systems as well as the Integration Knowledge Module (IKM)
that controls transformation of data.
Select the access point MOVIE_AP to select an LKM.
Note: The KMs that will be used have already been imported into the project.
Info: A knowledge module (KM) is a template that represents best practices to perform an
action in an interface, such as loading from/to a certain technology (Load knowledge module
or LKM), integrating data into the target (Integration Knowledge Module or IKM), checking
data constraints (Check Knowledge Module or CKM), and others. Knowledge modules can be
customized by the user.
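The "template plus options" idea behind a knowledge module can be sketched as follows. This is a deliberately simplified illustration, not the text of any real KM: the template string, the function name, the JDBC URL, and the way a truncate option maps to a Sqoop flag are all assumptions. The Sqoop flags themselves (`--connect`, `--table`, `--hive-import`, `--hive-table`, `--hive-overwrite`) are standard `sqoop import` options.

```python
from string import Template

# Hypothetical template text; a real knowledge module is far more elaborate.
LOAD_TEMPLATE = Template(
    "sqoop import --connect $jdbc_url --table $source_table "
    "--hive-import --hive-table $target_table$overwrite"
)


def render_load_command(jdbc_url, source_table, target_table, truncate=False):
    """Fill the template from mapping metadata and a single KM-style option."""
    # Assumption: a truncate option is modeled as Sqoop's --hive-overwrite flag.
    overwrite = " --hive-overwrite" if truncate else ""
    return LOAD_TEMPLATE.substitute(
        jdbc_url=jdbc_url,
        source_table=source_table,
        target_table=target_table,
        overwrite=overwrite,
    )


# The JDBC URL is a placeholder, not the lab's actual connection string.
print(render_load_command("jdbc:mysql://localhost/moviedb", "MOVIE", "movie", truncate=True))
```

The point of the design is that the mapping supplies the metadata (tables, connections) while the KM supplies the reusable code pattern, so changing an option changes the generated command without touching the mapping logic.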
13. Review the list of IKM Options for this KM. These options are used to configure and
tune the Sqoop process to load data. Change the option TRUNCATE to true.
15. Click OK in the Run dialog. We will use all defaults and run this mapping on the local
agent that is embedded in the ODI Studio UI. After a moment, a Session started dialog
will appear; press OK there as well.
16. To review execution, go to the Operator navigator and expand the All Executions node
to see the current execution. If the execution has not finished yet, it shows the icon for
an ongoing task. You can refresh the view once or have it refresh automatically every
5 seconds using the refresh buttons.
18. Go to Designer navigator and Models and right-click HiveMovie.movie. Select View
Data from the menu to see the loaded rows.
For this mapping, we will use the two Hive tables movie and movieapp_log_avro as sources
and the Hive table movie_rating as target.
1. To create a new mapping, open the Project accordion within the Designer navigator,
expand the Big Data HOL > First Folder folder, right-click on Mappings, and click
New Mapping.
3. Open the Models accordion and expand the model HiveMovie. Drag the datastores
movie and movieapp_log_avro as sources and movie_rating as target into the new
mapping.
4. First, we would like to filter the movie activities to include only rating activities (ID 1).
To do this, drag a Filter from the Component Palette and place it behind the
movieapp_log_avro source.
5. Drag the attribute activity from movieapp_log_avro onto the FILTER component. This
will connect the components and use the attribute activity in the filter condition.
6. Select the FILTER component and go to the Property Editor. Expand the section
Condition and complete the condition to movieapp_log_avro.activity = 1
8. Drag and drop the attributes movieid and rating from movieapp_log_avro directly
onto AGGREGATE in order to map them. They are automatically routed through the
filter.
10. Now we would like to join the aggregated ratings with the movie table to obtain
enriched movie information. Drag a Join component from the Component Palette to
the mapping.
12. Highlight the JOIN component and go to the Property Editor. Expand the Condition
section and check the property Generate ANSI Syntax.
14. Click OK on the Attribute Matching dialog. ODI will map all same-name fields from
source to target.
16. The logical flow has now been set up. Compare the diagram below with your actual
mapping to spot any differences. To set the physical implementation, click on the Physical
tab of the editor.
17. The Physical tab shows that in this mapping everything is performed in the same
system, the Hive server. Because of this, no LKM is necessary.
Select the target MOVIE_RATING to select an IKM.
19. The mapping is now complete. Press the Run button on the taskbar above the mapping
editor. When asked to save your changes, press Yes.
20. Click OK in the Run dialog. After a moment, a Session started dialog will appear; press
OK there as well.
21. To review execution go to the Operator navigator and expand the All Executions node
to see the current execution.
23. In the Session Task Editor that opens, click on the Code tab on the left. The generated
SQL code will be shown. The code is generated from the mapping logic and contains a
WHERE condition, a JOIN, and a GROUP BY clause, each of which corresponds directly to
one of the mapping components (filter, join, and aggregate).
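To make the relationship between mapping components and SQL clauses concrete, the following sketch runs an equivalent query against an in-memory SQLite database standing in for Hive. It is not the code ODI generates. The table names and the columns movieid, activity, and rating come from this lab; the movie columns movie_id and title, the join key, the output column avg_rating, and the sample rows are assumptions.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE movie (movie_id INTEGER, title TEXT);
    CREATE TABLE movieapp_log_avro (movieid INTEGER, activity INTEGER, rating INTEGER);
    -- Made-up sample rows for illustration only.
    INSERT INTO movie VALUES (1, 'Movie A'), (2, 'Movie B');
    INSERT INTO movieapp_log_avro VALUES
        (1, 1, 4), (1, 1, 2), (2, 1, 5), (2, 2, NULL);
    """
)

QUERY = """
SELECT m.movie_id, m.title, r.avg_rating
FROM movie AS m
JOIN (                                -- JOIN component
    SELECT movieid, AVG(rating) AS avg_rating
    FROM movieapp_log_avro
    WHERE activity = 1                -- FILTER component: rating activities only
    GROUP BY movieid                  -- AGGREGATE component: one row per movie
) AS r ON m.movie_id = r.movieid
ORDER BY m.movie_id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

With the sample rows above, the non-rating activity (activity = 2) is filtered out before the average is computed, so it does not affect the result for the second movie.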
25. A data view editor appears with all rows of the movie_rating table in Hive.
1. To create a new mapping, open the Project accordion within the Designer navigator,
expand the Big Data HOL > First Folder folder, right-click on Mappings, and click
New Mapping.
3. Open the Models accordion and expand the model HiveMovie. Drag the datastore
movie_rating as source into the new mapping. Then open model OracleMovie and
drag in the datastore MOVIE_RATING_ODI as a target.
4. Drag from the output port of the source movie_rating to the input port of the target
MOVIE_RATING_ODI.
6. The logical flow has now been set up. To set the physical implementation, click on the
Physical tab of the editor.
10. The mapping is now complete. Press the Run button on the taskbar above the mapping
editor. When asked to save your changes, press Yes.
11. Click OK in the Run dialog. We will use all defaults and run this mapping on the local
agent that is embedded in the ODI Studio UI. After a moment, a Session started dialog
will appear; press OK there as well.
1. To create a new package, open the Designer navigator and Project accordion on the
Big Data HOL / First Folder, then right-click on Packages and select New Package.
Info: A package is a task flow to orchestrate execution of multiple mappings and define
additional logic, such as conditional execution and actions such as sending emails, calling web
services, uploads/downloads, file manipulation, event handling, and others.
Notice the green arrow on this mapping which means it is the first step.
4. Drag the mappings B Hive Calc Ratings and C OLH Load Oracle onto the panel.
7. The package is now set up and can be executed. To execute the package, click the
Execute button in the toolbar. When prompted to save, click Yes.
8. Click OK in the Run dialog. After a moment, a Session started dialog will appear; press
OK there as well.
9. To review execution, go to the Operator navigator and open the latest session
execution. The 3 steps are separately shown and contain the same tasks as the
mapping executions in the prior tutorials.
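The orchestration a package performs can be sketched as a list of steps executed in order. This is a conceptual illustration only: the step names are the mapping names used in this lab, while the `run_package` function, the placeholder actions, and the stop-at-first-failure behavior are assumptions made for the sketch, not a description of the ODI agent.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]


def run_package(steps: List[Step]) -> List[Tuple[str, str]]:
    """Run steps in order and stop at the first failure (assumed behavior)."""
    results = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            results.append((name, f"failed: {exc}"))
            break
        results.append((name, "done"))
    return results


# Placeholder actions that only record that a step ran.
log: List[str] = []
steps: List[Step] = [
    ("A - Sqoop Movie Load", lambda: log.append("sqoop")),
    ("B Hive Calc Ratings", lambda: log.append("hive")),
    ("C OLH Load Oracle", lambda: log.append("olh")),
]

results = run_package(steps)
for name, status in results:
    print(f"{name}: {status}")
```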
[Figure: Oracle GoldenGate flow for Task 6. Labels: MySQL table MOVIE, Capture Extract EMOV
(EMOV.prm), Trail File, Pump Extract PMOV, Java Adapter, HDFS file ogg_movie, Hive table movie.]
7. Insert a new row into the MySQL table movie by executing the following command:
insert into MOVIE (MOVIE_ID,TITLE,YEAR,BUDGET,GROSS,PLOT_SUMMARY) values
(1, 'Sharknado 2', 2014, 500000, 20000000, 'Flying sharks attack city');
Note: Alternatively you can execute the following command:
source ~/movie/moviework/ogg/mysql_insert_movie.sql;
8. Go to the ODI Studio and open the Designer navigator and Models accordion. Right-click
on datastore HiveMovie.movie and select View Data.
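The path the inserted row takes can be pictured as a chain of stages named after the components in the Task 6 diagram: a capture extract, a trail file, a pump extract, and a Java adapter writing to an HDFS file. The sketch below simulates that chain with in-memory lists and a string buffer. It is a conceptual illustration only: the function names, the record layout, and the comma-delimited output format are assumptions and do not reflect actual GoldenGate internals or file formats.

```python
import io


def capture(row, local_trail):
    """Stand-in for the capture extract: record a committed insert."""
    local_trail.append(("INSERT", row))


def pump(local_trail, remote_trail):
    """Stand-in for the pump extract: forward records and drain the local trail."""
    while local_trail:
        remote_trail.append(local_trail.pop(0))


def deliver(remote_trail, hdfs_file, delimiter=","):
    """Stand-in for the Java adapter: append each inserted row as one text line."""
    while remote_trail:
        operation, row = remote_trail.pop(0)
        if operation == "INSERT":
            hdfs_file.write(delimiter.join(str(value) for value in row) + "\n")


local_trail, remote_trail = [], []
ogg_movie = io.StringIO()  # plays the role of the HDFS file ogg_movie

# The row inserted into the MySQL MOVIE table in step 7.
capture(
    (1, "Sharknado 2", 2014, 500000, 20000000, "Flying sharks attack city"),
    local_trail,
)
pump(local_trail, remote_trail)
deliver(remote_trail, ogg_movie)
print(ogg_movie.getvalue(), end="")
```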
Summary
You have now successfully completed the hands-on lab and performed an end-to-end load
through a Hadoop Data Reservoir using Oracle Data Integrator and Oracle GoldenGate. The
strength of these products is that they provide an easy-to-use approach to developing
performant data integration flows that utilize the strengths of the underlying environments
without adding proprietary transformation engines. This is especially relevant in the age of Big
Data.