Big Data Workshop Lab Guide
http://www.oracle-developer-days.com
Copyright 2012, Oracle and/or its affiliates. All rights reserved
TABLE OF CONTENTS
1. Introduction
2. Hadoop Hello World
2.1 Introduction to Hadoop
2.2 Overview of Hands on Exercise
2.3 Word Count
2.4 Summary
3. Pig Exercise
3.1 Introduction to Pig
3.2 Overview of Hands on Exercise
3.3 Working with Pig
3.4 Summary
4. Hive Coding
4.1 Introduction to Hive
4.2 Overview of Hands on Exercise
4.3 Queries with Hive
4.4 Summary
5. Oracle ODI and Hadoop
5.1 Introduction to Oracle Connectors
5.2 Overview of Hands on Exercise
5.3 Setup and Reverse Engineering in ODI
5.4 Using ODI to import text file into Hive
5.5 Using ODI to import Hive Table into Oracle
5.6 Using ODI to import Hive Table into Hive
5.7 Summary
6. Working with External Tables
6.1 Introduction to External Tables
6.2 Overview of Hands on Exercise
6.3 Configuring External Tables
6.4 Summary
7. Working with Mahout
7.1 Introduction to Mahout
7.2 Overview of Hands on Exercise
7.3 Clustering with K-means
1. INTRODUCTION
Big data is not just about managing petabytes of data. It is also about managing large numbers of
complex unstructured data streams which contain valuable data points. However, which data
points are the most valuable depends on who is doing the analysis and when they are doing the
analysis. Typical big data applications include: smart grid meters that monitor electricity usage in
homes, sensors that track and manage the progress of goods in transit, analysis of medical
treatments and drugs that are used, analysis of CT scans etc. What links these big data
applications is the need to track millions of events per second, and to respond in real time. Utility
companies will need to detect an uptick in consumption as soon as possible, so they can bring
supplementary energy sources online quickly. Probably the fastest growing area relates to location
data being collected from mobile always-on devices. If retailers are to capitalise on their
customers' location data, they must be able to respond as soon as a customer steps through the door.
In the conventional model of business intelligence and analytics, data is cleaned, cross-checked
and processed before it is analysed, and often only a sample of the data is used in the actual
analysis. This is possible because the kind of data that is being analysed - sales figures or stock
counts, for example - can easily be arranged in a pre-ordained database schema, and because BI
tools are often used simply to create periodic reports.
At the center of the big data movement is an open source software framework called Hadoop.
Hadoop has become the technology of choice to support applications that in turn support petabyte-sized
analytics utilizing large numbers of computing nodes. The Hadoop system consists of three
projects: Hadoop Common, a utility layer that provides access to the Hadoop Distributed File
System (HDFS) and the other Hadoop subprojects; HDFS, which acts as the data storage platform for the
Hadoop framework and can scale to massive size when distributed over numerous computing nodes;
and Hadoop MapReduce, a framework for processing data sets across clusters of Hadoop nodes.
The Map and Reduce process splits the work by first mapping the input across the nodes
of the cluster, then splitting the workload into even smaller data sets and distributing it further
throughout the computing cluster. This allows Hadoop to leverage massively parallel processing (MPP), a
computing advantage that technology has introduced to modern system architectures. With MPP,
Hadoop can run on inexpensive commodity servers, dramatically reducing upfront capital costs.
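The map, shuffle and reduce steps described above can be sketched in miniature, outside Hadoop, in plain Python. This is an illustration of the idea only, not Hadoop's implementation:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in the input split
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as Hadoop does
    # between the map and reduce phases
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word
    return {word: sum(ones) for word, ones in groups.items()}

counts = reduce_phase(shuffle(map_phase(["hello world", "hello hadoop"])))
print(counts)  # {'hello': 2, 'world': 1, 'hadoop': 1}
```

In Hadoop the map and reduce functions run on many nodes at once and the shuffle moves data across the network, but the logic is exactly this.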
Avro is a data serialization system that converts data into a fast, compact binary data
format. When Avro data is stored in a file, its schema is stored with it.
Chukwa is a large-scale monitoring system that provides insights into the Hadoop
Distributed File System and MapReduce.
Hive is a data warehouse infrastructure that provides ad hoc query and data
summarization for Hadoop-supported data. Hive utilizes a SQL-like query language called
HiveQL. HiveQL can also be used by programmers to execute custom MapReduce jobs.
Data exploration of Big Data result sets requires displaying millions or billions of data points to
uncover hidden patterns or records of interest.
Many vendors talk about Big Data in terms of managing petabytes of data. For example,
EMC has a number of Big Data storage platforms, such as its new Isilon storage platform. In reality
the issue of big data is much bigger, and Oracle's aim is to focus on providing a big data platform
which provides the following:
Deep Analytics: a fully parallel, extensive and extensible toolbox full of advanced and
novel statistical and data mining capabilities
Massive Scalability: the ability to scale analytics and sandboxes to previously unknown
scales while leveraging previously untapped data potential
Low Latency: the ability to instantly act based on these advanced analytics in your
operational, production environment
NOTE: During this exercise you will be asked to run several scripts. If you would like to see the
content of these scripts, type cat scriptName and the contents of the script will be displayed in
the terminal.
2. To get into the folder where the scripts for the first exercise are, type in the terminal:
cd /home/oracle/exercises/wordCount
Then press Enter
3. Let's look at the Java code which will run word count on a Hadoop cluster. Type in the
terminal:
gedit WordCount.java
Then press Enter
4. A new window will open with the Java code for word count. We would like you to look at
lines 14 and 28 of the code, where you can see the Mapper and Reducer interfaces
being implemented.
5. When you are done evaluating the code you can click on the X in the upper right corner of
the window to close it.
6. We can now go ahead and compile the Word Count code. We need to run the compile.sh
script, which will set the correct classpath and output directory while compiling
WordCount.java. Type in the terminal:
./compile.sh
Then press Enter
7. We can now create a jar file from the compile directory of Word Count. This jar file is
required because the code for word count will be sent to all of the nodes in the cluster, and the
code will be run simultaneously on all nodes that have appropriate data. To create the jar file,
in the terminal type:
./createJar.sh
Then press Enter
8. For the exercise to be more interesting we need to create some files on which word count
will be executed. To create the files, go to the terminal and type:
./createFiles.sh
Then press Enter
In the terminal window you will see the contents of the two files, each file having 4 words
in it. Although these are quite small files, the code would run identically with more than 2 files
and with files that are several gigabytes or terabytes in size.
10. Now that we have the files ready, we must move them into the Hadoop Distributed File System
(HDFS). Hadoop cannot work with files on other file systems; they must be within
HDFS to be usable. It is also important to note that files within HDFS
are split into multiple chunks and stored on separate nodes for parallel parsing. To upload
our two files into HDFS you need to use the copyFromLocal command of Hadoop. Run
the command by typing at the terminal:
hadoop dfs -copyFromLocal file01 /user/oracle/wordcount/input/file01
Then press Enter
For convenience you can also run the script copyFiles.sh and it will upload the files for you,
so you do not need to type in this and the next command.
11. We should now upload the second file. Go to the terminal and type:
hadoop dfs -copyFromLocal file02 /user/oracle/wordcount/input/file02
Then press Enter
12. We can now run our MapReduce job to do a word count on the files we just uploaded. Go
to the terminal and type:
hadoop jar WordCount.jar org.myorg.WordCount /user/oracle/wordcount/input /user/oracle/wordcount/output
Then press Enter
For your convenience you can also run the script runWordCount.sh and it will run the
Hadoop job for you, so you do not need to type in the above command.
A lot of text will roll by in the terminal window. This is informational data coming from the
Hadoop infrastructure to help track the status of the job. Wait for the job to finish; this is
signaled by the command prompt coming back.
13. Once you have your command prompt back, your MapReduce task is complete. It is now
time to look at the results. We can display the results file right from HDFS by
using the cat command from Hadoop. Go to the terminal and type the following command:
hadoop dfs -cat /user/oracle/wordcount/output/part-00000
Then press Enter
For your convenience you can also run the script viewResults.sh and it will run the
Hadoop command for you to see the results.
In the terminal the word count results are displayed. You will see that the job counted the
number of times each word appears.
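The job's output is just a per-word tally. With two hypothetical 4-word files (the actual contents produced by createFiles.sh may differ), the same result can be reproduced in plain Python:

```python
from collections import Counter

# Hypothetical contents of the two generated files
file01 = "Hello World Bye World"
file02 = "Hello Hadoop Goodbye Hadoop"

counts = Counter(file01.split()) + Counter(file02.split())

# part-00000 lists each word with its count, one word per line
for word in sorted(counts):
    print(word, counts[word])
```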
14. As an experiment let's try to run the Hadoop job again. Go to the terminal and type:
hadoop jar WordCount.jar org.myorg.WordCount /user/oracle/wordcount/input /user/oracle/wordcount/output
Then press Enter
For your convenience you can also run the script runWordCount.sh and it will run the
Hadoop job for you, so you do not need to type in the above command.
15. You will notice an error message appears and no MapReduce task is executed. This is
easily explained by the immutability of data. Since Hadoop does not allow updates to
data files (just reads and writes), you cannot update the data in the results directory, hence
the execution has nowhere to place its output. To re-run the MapReduce job you
must either point it to another output directory or clean out the current output directory.
Let's go ahead and clean out the previous output directory. Go to the terminal and type:
hadoop dfs -rmr /user/oracle/wordcount/output
Then press Enter
For convenience you can also run the script deleteOutput.sh and it will delete the files for
you, so you do not need to type in this command.
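Hadoop's refusal to write into an existing output directory can be pictured as a simple guard. The following is a plain-Python illustration of that behaviour, not Hadoop's actual code:

```python
import os
import shutil
import tempfile

def run_wordcount(output_dir):
    # Hadoop-style guard: results are written once and never updated
    # in place, so an existing output directory is an error
    if os.path.exists(output_dir):
        raise FileExistsError(f"output directory {output_dir} already exists")
    os.makedirs(output_dir)
    # ... the actual MapReduce work would happen here ...

base = tempfile.mkdtemp()
out = os.path.join(base, "output")

run_wordcount(out)          # first run: succeeds
failed = False
try:
    run_wordcount(out)      # second run: fails, as in the exercise
except FileExistsError:
    failed = True           # fix: delete the directory first (dfs -rmr)

shutil.rmtree(base)
```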
16. Now that we have cleared the output directory, we can re-run the MapReduce task. Let's
go ahead and make sure it works again. Go to the terminal and type:
hadoop jar WordCount.jar org.myorg.WordCount /user/oracle/wordcount/input /user/oracle/wordcount/output
Then press Enter
For your convenience you can also run the script runWordCount.sh and it will run the
Hadoop job for you, so you do not need to type in the above command.
The MapReduce job now ran fine again, as shown by the output on the screen.
17. This completes the word count example. You can now close the terminal window. Go to the
terminal and type:
exit
Then press Enter
2.4 Summary
In this exercise you were able to see the basic steps required in setting up and running a very
simple MapReduce job. You saw which interfaces must be implemented when creating a
MapReduce task, how to upload data into HDFS, and how to run the MapReduce task. It is
worth commenting on the execution time for this exercise: the amount of time required to count 8
words is quite high in absolute terms. It is important to understand that Hadoop needs to start a
separate Java Virtual Machine to process each file, or chunk of a file, on each node of the cluster.
As such, even a trivial job has some processing overhead, which limits the possible applications of
Hadoop: it can only handle batch jobs. Real-time applications, where answers are required immediately, can't be
run on a Hadoop cluster. At the same time, as data volumes increase, processing time does not
increase that much as long as there are enough processing nodes. A recent benchmark of a
Hadoop cluster saw the complete sorting of 1 terabyte of data in just over 3 minutes on 910 nodes.
3. PIG EXERCISE
3.1 Introduction to Pig
Apache Pig is a platform for analyzing large data sets that consists of a high-level language for
expressing data analysis programs, coupled with infrastructure for evaluating these programs. The
salient property of Pig programs is that their structure is amenable to substantial parallelization,
which in turn enables them to handle very large data sets.
At the present time, Pig's infrastructure layer consists of a compiler that produces sequences of
Map-Reduce programs, for which large-scale parallel implementations already exist (e.g., the
Hadoop subproject). Pig's language layer currently consists of a textual language called Pig Latin,
which has the following key properties:
Ease of programming: it is trivial to achieve parallel execution of simple, "embarrassingly parallel"
data analysis tasks. Complex tasks comprised of multiple interrelated data transformations are
explicitly encoded as data flow sequences, making them easy to write, understand, and maintain.
Optimization opportunities: the way in which tasks are encoded permits the system to optimize
their execution automatically, allowing the user to focus on semantics rather than efficiency.
Extensibility: users can create their own functions to do special-purpose processing.
2. To get into the folder where the scripts for this exercise are, type in the terminal:
cd /home/oracle/exercises/pig
Then press Enter
3. To get an idea of what our dividends file looks like, let's look at the first couple of rows. In
the terminal type:
head NYSE_dividends
Then press Enter
The first 10 rows of the data file will be displayed on the screen
4. Now that we have an idea of what our data file looks like, let's load it into HDFS for
processing. To load the data we use the copyFromLocal command of Hadoop. Go to the
terminal and type:
hadoop dfs -copyFromLocal NYSE_dividends /user/oracle/NYSE_dividends
Then press Enter
For convenience you can also run the script loadData.sh and it will upload the file for you,
so you do not need to type in the command above.
5. We will be running our Pig script in interactive mode so we can see each step of the
process. For this we will need to open the Pig interpreter, called grunt. Go to the terminal and
type:
pig
6. Once at the grunt shell we can start typing Pig Latin. The first thing we need to do is load
the data file from HDFS into Pig for processing. The data is not actually copied; a
handle is created for the file so Pig knows how to interpret the data. Go to the grunt shell
and type:
dividends = load 'NYSE_dividends' as (exchange, symbol, date, dividend);
Then press Enter
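The fact that Pig's load only records how to read the file, rather than copying it, can be illustrated with a Python generator. This is an analogy for the behaviour, not Pig's internals; the file name and tab-separated format are assumptions for the sketch:

```python
def load(path, schema):
    # Like Pig's load, this only describes how to interpret the file;
    # nothing is read until the rows are actually consumed
    def rows():
        with open(path) as f:
            for line in f:
                yield dict(zip(schema, line.rstrip("\n").split("\t")))
    return rows

dividends = load("NYSE_dividends", ["exchange", "symbol", "date", "dividend"])
# The file is only opened when something (like dump) iterates the rows:
# for row in dividends(): ...
```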
7. Now that the data is loaded as a four-column table, let's see what the data looks like in Pig. Go
to the grunt shell and type:
dump dividends;
Then press Enter
You will see output similar to the first exercise on the screen. This is normal: Pig is
merely a high-level language, and all commands which process data simply run MapReduce
tasks in the background, so the dump command becomes a MapReduce task that is
run. This applies to all of the commands you will run in Pig. The output on the screen will
show you all of the rows of the file in tuple form.
8. The first step in analyzing the data will be grouping the data by stock symbol, so we have all
of the dividends of one company grouped together. Go to the grunt shell and type:
grouped = group dividends by symbol;
Then press Enter
9. Let's go ahead and dump this grouped variable to the screen to see what its contents look
like. Go to the grunt shell and type:
dump grouped;
Then press Enter
On the screen you will see all of the groups displayed in tuple-of-tuples form. As the
output might look a bit confusing, only one tuple is highlighted in the screenshot below to aid
clarity. The highlighted region shows all of the rows of the table for the CATO stock symbol.
10. In the next step we will go through each group tuple and get the group name and the
average dividend. Go to the grunt shell and type:
avg = foreach grouped generate group, AVG(dividends.dividend);
Then press Enter
11. Let's go ahead and see what this output looks like. Go to the grunt shell and type:
dump avg;
Then press Enter
Now you can see on the screen a dump of all stock symbols with their respective average
dividends. A couple of them are highlighted in the image below.
12. Now that we have the dividends for each company, it would be ideal to have them in order
from highest to lowest dividend. Let's get that list. Go to the grunt shell and type:
sorted = order avg by $1 DESC;
Then press Enter
We can now see what the sorted list looks like. Go to the grunt shell and type:
dump sorted;
Then press Enter
On the screen you now see the list sorted in descending order. Shown are the
lowest dividends, but you can scroll up to see the rest of the values.
13. We now have the final results we want, so it might be worth writing these results out to
HDFS. Let's do that. Go to the grunt shell and type:
store sorted into 'average_dividend';
Then press Enter
14. The new calculated data is now permanently stored in HDFS. We can now exit the grunt
shell. Go to the grunt shell and type:
quit;
Then press Enter
15. Now, back at the terminal, let's view the top 10 companies by average dividend directly from
HDFS. Go to the terminal and type:
hadoop dfs -cat /user/oracle/average_dividend/part-r-00000 | head
Then press Enter
For convenience you can also run the script viewResults.sh and it will display the results for
you, so you do not need to type in the command above.
This command simply ran cat on the results file available in HDFS. The results are
shown on the screen.
16. That concludes the Pig exercise. You can now close the terminal window. Go to the
terminal and type:
exit
Then press Enter
3.4 Summary
In this exercise you saw what a Pig script looks like and how to run it. It is important to understand
that Pig is a scripting language which ultimately runs MapReduce jobs on a Hadoop cluster, hence
all of the power of a distributed system and the high data volumes which HDFS can
accommodate are exploitable through Pig. Pig provides an easier interface to the MapReduce
infrastructure, allowing scripting paradigms to be used rather than direct Java coding.
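The data flow from this exercise (load, group, average, order) maps directly onto ordinary collection operations. A plain-Python sketch using a few hypothetical rows, mirroring the Pig Latin statements rather than how Pig executes them:

```python
from collections import defaultdict

# A few hypothetical rows of the four-column dividends relation
rows = [
    ("NYSE", "CATO", "2009-01-01", 0.25),
    ("NYSE", "CATO", "2009-04-01", 0.75),
    ("NYSE", "XYZ",  "2009-01-01", 0.125),
]

# grouped = group dividends by symbol;
grouped = defaultdict(list)
for exchange, symbol, date, dividend in rows:
    grouped[symbol].append(dividend)

# avg = foreach grouped generate group, AVG(dividends.dividend);
avg = {symbol: sum(ds) / len(ds) for symbol, ds in grouped.items()}

# sorted = order avg by $1 DESC;
ranked = sorted(avg.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)  # [('CATO', 0.5), ('XYZ', 0.125)]
```

In Pig each of these statements becomes one or more MapReduce jobs running across the cluster; here they are just in-memory operations.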
4. HIVE CODING
4.1 Introduction to Hive
Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc
queries, and the analysis of large datasets stored in Hadoop compatible file systems. Hive
provides a mechanism to project structure onto this data and query the data using a SQL-like
language called HiveQL. At the same time this language also allows traditional map/reduce
programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to
express this logic in HiveQL.
2. To get into the folder where the scripts for the Hive exercise are, type in the terminal:
cd /home/oracle/exercises/hive
Then press Enter
3. We already have an idea of what our data file looks like, so let's load it into HDFS for
processing. This is done identically to the way it was done in the first two exercises. We
will see a better way to load data in the next exercise. To load the data we use the
copyFromLocal command of Hadoop. Go to the terminal and type:
hadoop dfs -copyFromLocal NYSE_dividend /user/oracle/NYSE_dividend
Then press Enter
For convenience you can also run the script loadData.sh and it will upload the file for you,
so you do not need to type in the command above.
4. Let's now enter the Hive interactive shell environment to create tables and run queries
against those tables. To give an analogy, this is similar to SQL*Plus, but this
environment is specifically for the HiveQL language. To enter the environment, go to the
terminal and type:
hive
Then press Enter
5. The first thing we need to do in Hive is create a table. We will create a table named
dividends with four fields called exchange, symbol, dates and dividend, which maps
naturally onto the data set. Go to the terminal and type:
create table dividends(exchange string, symbol string, dates string, dividend float);
Then press Enter
An OK should be printed on the screen indicating the success of the operation. This OK
message will be printed for all operations, but we will only mention it this time. It is left up to
the user to check for this message on future HiveQL commands.
6. We can now run a command to see all of the tables available to this OS user. Go to the
hive terminal and type:
show tables;
Then press Enter
You can see the only table currently available is the one we just created.
7. As with normal SQL you also have a describe command available to see the columns in
the table. Go to the terminal and type:
describe dividends;
Then press Enter
As you can see, the dividends table has the 4 fields, each with its own Hive-specific
data type. This is to be expected, as this is the way we created the table.
8. Let's go ahead and load some data into this table. Data is loaded into Hive from flat files
available in the HDFS file system. Go to the terminal and type:
load data inpath '/user/oracle/NYSE_dividend' into table dividends;
Then press Enter
Five lines from the table are printed to the screen; only 3 of the lines are highlighted in the
image below.
10. Now that we have all of the data loaded into a Hive table, we can run SQL queries
against it. As we have the same data set as in the Pig exercise, let's try to extract the same
data. We will look for the top 10 companies by average dividend. Go to the terminal and
type:
select symbol, avg(dividend) avg_dividend from dividends group by
symbol order by avg_dividend desc limit 10;
Then press Enter
On the screen you will see a lot of log information scroll through. Most of this is generated
by Hadoop as Hive (just like Pig) takes the queries you write and rewrites them as
MapReduce jobs then executes them. The query we wrote can take full advantage of the
distributed computational power of Hadoop as well as the striping and parallelism that
HDFS enables.
When the query is done you should see on the screen the top 10 companies in
descending order. This output shows the exact same information as we got in the
previous exercise. As the old idiom goes, there is more than one way to skin a cat; with
Hadoop, there is always more than one way to achieve a task.
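Because the HiveQL statement above is close to standard SQL, essentially the same query runs on any SQL engine. A sketch using Python's built-in sqlite3 module with a few hypothetical rows (the dividend values here are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table dividends(exchange text, symbol text, dates text, dividend real)"
)
conn.executemany(
    "insert into dividends values (?, ?, ?, ?)",
    [
        ("NYSE", "CATO", "2009-01-01", 0.25),
        ("NYSE", "CATO", "2009-04-01", 0.75),
        ("NYSE", "XYZ", "2009-01-01", 0.125),
    ],
)

# The same shape as the HiveQL query from this step
top = conn.execute(
    "select symbol, avg(dividend) avg_dividend from dividends "
    "group by symbol order by avg_dividend desc limit 10"
).fetchall()
print(top)  # [('CATO', 0.5), ('XYZ', 0.125)]
```

The difference is where the query runs: sqlite3 evaluates it in a single process, while Hive rewrites it into MapReduce jobs distributed across the cluster.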
11. This is the end of the Hive exercise. You can now exit the hive interpreter. Go to the
terminal and type:
exit;
Then press Enter
12. Then close the terminal. Go to the terminal and type:
exit
Then press Enter
4.4 Summary
In this exercise you were introduced to the Hive Query Language. You saw how to create
and view tables using HiveQL. Once tables were created, you were introduced to some of
the standard SQL constructs which HiveQL has available. It is
important to understand that Hive is an abstraction layer for Hadoop and MapReduce jobs. All
queries written in HiveQL get transformed into a DAG (Directed Acyclic Graph) of MapReduce
tasks which are then run on the Hadoop cluster, hence taking advantage of all of Hadoop's
performance and scalability capabilities, but also retaining all of Hadoop's limitations.
HiveQL has most of the functionality available in standard SQL and has a series of DDL and DML
functions implemented, but it does not strictly adhere to the SQL-92 standard. HiveQL
offers extensions not in SQL, including multi-table inserts and create table as select, but only offers
basic support for indexing. HiveQL also lacks support for transactions and materialized views, and
has only limited subquery support. It is intended for long-running queries of a data warehousing type
rather than a transactional, OLTP type of workload.
Loading data into Hadoop from the local file system and HDFS.
Loading processed data from Hadoop to Oracle Database for further processing and
generating reports.
Typical processing in Hadoop includes data validation and transformations that are programmed
as MapReduce jobs. Designing and implementing a MapReduce job requires expert programming
knowledge. However, using Oracle Data Integrator and the Oracle Data Integrator Application
Adapter for Hadoop, you do not need to write MapReduce jobs. Oracle Data Integrator uses Hive
and the Hive Query Language (HiveQL), a SQL-like language for implementing MapReduce jobs.
The Oracle Data Integrator graphical user interface enhances the developer's experience and
productivity while enabling them to create Hadoop integrations.
When implementing a big data processing scenario, the first step is to load the data into Hadoop.
The data source is typically in the local file system, HDFS, Hive tables, or external Hive tables.
After the data is loaded, you can validate and transform the data using HiveQL like you do in SQL.
You can perform data validation such as checking for NULLS and primary keys, and
transformations such as filtering, aggregations, set operations, and derived tables. You can also
include customized procedural snippets (scripts) for processing the data.
When the data has been aggregated, condensed, or crunched down, you can load it into Oracle
Database for further processing and analysis. Oracle Loader for Hadoop is recommended for
optimal loading into Oracle Database.
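The load, validate, transform, and aggregate flow described above can be sketched in plain Python. The field names and sample rows below are hypothetical; in the actual workflow ODI generates equivalent HiveQL/MapReduce jobs rather than running Python:

```python
# Hypothetical raw rows as loaded into Hadoop: (symbol, dividend).
raw = [("AAA", 0.50), ("BBB", None), ("AAA", 0.70), (None, 0.10)]

# Validation: reject rows containing NULLs, as a NOT NULL check would.
valid = [r for r in raw if all(field is not None for field in r)]
errors = [r for r in raw if any(field is None for field in r)]

# Transformation/aggregation: total dividend per symbol,
# the kind of "crunching down" done before loading into Oracle.
totals = {}
for symbol, dividend in valid:
    totals[symbol] = totals.get(symbol, 0.0) + dividend
```

Invalid rows are isolated (here in `errors`) rather than silently dropped, mirroring how ODI routes rejected rows to an error table for later inspection or recycling.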
Knowledge Modules:

KM Name | Description | Source | Target
IKM File To Hive (Load Data) | Loads data from local and HDFS files into Hive tables. It provides options for better performance through Hive partitioning and fewer data movements. | File System | Hive
IKM Hive Control Append | Integrates data into a Hive target table, with optional control (validation) of the data. | Hive | Hive
IKM Hive Transform | Integrates data into a Hive target table after the data has been transformed by a customized script such as Perl or Python. | Hive | Hive
IKM File-Hive to Oracle (OLH) | Integrates data from a Hive table or HDFS file into an Oracle database target using Oracle Loader for Hadoop. | File System or Hive | Oracle
CKM Hive | Checks data against constraints defined on a Hive table. | NA | Hive
RKM Hive | Reverse engineers Hive tables into ODI data models. | Hive Metadata | NA
2. To get into the folder where the scripts for the first exercise are, type in the terminal:
cd /home/oracle/exercises/odi
Then press Enter
3. Next we need to run a script to set up the environment for the next exercise. We will be
loading the same data as in the previous exercise (the dividends table), only this time we will
be using ODI to perform this task. For this we need to drop that table and recreate it so it is
empty for the import. We also need to start the Hive server to enable ODI to communicate
with Hive. We have a script which will perform both tasks. Go to the terminal and type:
./setup.sh
Then press Enter
4. Next we need to start Oracle Data Integrator. Go to the terminal and type:
./startODI
Then press Enter
5. Once ODI opens we need to connect to the repository. In the right upper corner of the
screen click on Connect To Repository.
6. In the dialog that pops up all of the connection details should already be configured.
Login Name: DEFAULT_LOGIN1
User: SUPERVISOR
Password: Welcome1
If all of the data is entered correctly you can simply click OK
7. Once you login make sure you are on the Designer Tab. Near the top of the screen on the
left side click on Designer.
8. Near the bottom of the screen on the left side there is a Models tab click on it.
You will notice that we have already created a File, Hive, and Oracle model for you. These
were pre-created to reduce the number of steps in the exercise. For details on how to use
flat files and the Oracle database with ODI please see the excellent Oracle by Example
Tutorials found at http://www.oracle.com/technetwork/tutorials/index.html.
9. The first feature of ODI we would like to show involves reverse engineering a data store.
The reverse engineering function takes a data store and finds all of the tables and their
structure automatically. In the Models tab on the left side of the screen there is a Hive
model. Let's click on the + to expand it out.
10. You will notice there is no information about the data that is stored in that particular
location. Right click on the Hive folder and select Reverse Engineer
11. You will see two items appear in the Hive folder, called dividends and dividends2. These
are the tables we created in Hive. You can click on the + beside dividends to see some more
information.
12. You can also expand the Columns folder to see all of the columns in this particular table.
This is the power of the Hive Reverse Engineering Knowledge Module (RKM) integrated in
Oracle Data Integrator. Once you define a data store (in our case a Hive source) the RKM
will automatically discover all tables and their corresponding columns available at that
source. Once a data model is created there is no need to rewrite it in ODI. ODI will
automatically discover that model for you, so you can get straight to developing the
data movement.
2. In the new window that opened up on the right side of the screen enter the following
information:
Name: Hadoop
Code: HADOOP
Then click on the Save All in the right upper corner of the screen.
3. In the left hand menu in the Projects section a new item called Hadoop appears. Click on
the + to expand it out
4. Next to the folder called First Folder there is another +. Expand out that folder as well by
clicking the +
6. We can now start to define the new interface. In this interface we will map out the columns
in the text file and move the data into the Hive table. To start out let's give the interface a
name. In the new tab that opened on the right side of the screen type in the following
information.
Name: File_To_Hive
7. Next we need to move to the mapping tab of the File_To_Hive interface. Click on Mapping
at the bottom of the screen.
8. We now need to define the sources and target data stores. On the left bottom of the
screen in the Models Section expand the File folder by clicking on the + beside it.
9. Now we can drag and drop the Dividends table from the File model into the source
section of the interface.
10. Next we will drag and drop the dividends Hive table into the target section of the interface.
11. A pop up window will appear which will ask if you would like to create automatic mapping.
This will try to automatically match source columns with target columns based on column
name. Click on Yes to see what happens.
12. By name it was able to match all of the columns. The mapping is now complete. Let's go back
to the Overview tab to set up one last thing. Click on the Overview tab on the left side of
the screen.
In the Definition tab tick the box Staging Area Different From Target.
13. A drop down menu below the tick box now gets activated. Select File: Comments File
14. We can now click on the Flow tab at the bottom of the screen to see what the interface will
look like.
15. On the screen in the top right box you will see a diagram of the data flow. Let's see all of
the options for the integration. Click on the Target (Hive Server) header.
16. At the bottom of the screen a new window appears: a Property Inspector. There you can
inspect and modify the configuration of the integration process. Let's change one of the
properties. We don't need a staging table so let's disable it. Set the following option:
USE_STAGING_TABLE: false
Let's now execute this interface. Click on the Execute button at the top of the screen.
17. You will be asked to save your interface before running it. Click Yes
18. Next you will be asked for the execution options. Here you can choose agents, contexts
and other elements. You can just accept the default options and click OK
19. An informational screen will pop up telling you the session has started. Click OK
20. We will now check if the execution of the interface was successful. In the left menu click
on the Operator Tab
21. In the menu on the left side make sure the Date tab is expanded. Then expand the Today
folder
You have now successfully moved data from a flat file into a Hive table without touching
the terminal. All of the data was moved without a cumbersome command line interface,
while allowing the use of all of the functionality of a powerful ETL tool.
22. You can now move back to the Designer tab in the left menu and close all of the open tabs
on the right side
2. On the right side of the screen a new window pops up. Enter the following name for the
Interface
Name: Hive_To_Oracle
3. Next we need to move to the Mapping tab to set up the interface. At the bottom of the
screen click on Mapping
4. To free up more viewing space let's clean up the Models tab in the left bottom part of the
screen. Minimize the File tab and the dividends table.
5. In the same tab (the Models tab) we now see the Oracle folder. Let's expand that out, as
we will need the Oracle tables as our target. Click on the + beside the Oracle folder
6. We can now drag and drop the Hive dividends table into the sources window
7. Similarly you can drag the Oracle DIVIDENDS table into the destination window.
8. As before you will be asked if you would like to perform automatic mapping. Click on Yes
The target table's columns are STOCK, DATES and DIVIDEND.
10. One of the advantages of an ETL tool can be seen when doing transformations during the
data movement. Let's concatenate the exchange and symbol into one string and load that
into the STOCK column in the database. Go to the Property Inspector screen of the
STOCK column by clicking on it in the targets window
11. The property inspector window should open at the bottom of the screen. In the
implementation edit box type the following:
concat(DIVIDENDS.exchange, DIVIDENDS.symbol)
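What this column mapping does is apply a HiveQL expression to each row as it moves. A minimal Python equivalent of the same row-level transformation, using hypothetical sample rows, is:

```python
def to_stock(exchange: str, symbol: str) -> str:
    """Mimic the concat(DIVIDENDS.exchange, DIVIDENDS.symbol) mapping."""
    return exchange + symbol

# Hypothetical (exchange, symbol) pairs from the source table.
rows = [("NYSE", "AAA"), ("NASDAQ", "BBB")]
stocks = [to_stock(e, s) for e, s in rows]
```

In ODI the expression is pushed down into the generated HiveQL, so the concatenation runs inside the MapReduce job rather than in a client process.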
12. The transformation is now set up. Let's now go back to the Overview tab to configure the
staging area. Click on the Overview tab
13. In the Definition tab tick the box Staging Area Different From Target.
14. A drop down menu below the tick box now gets activated. Select Hive: Hive Store
15. We are now ready to run the interface. To run the interface go to the left upper corner of
the screen and click the Execute button
16. A window will pop up telling you a save is required before the execution can continue. Just
click Yes
17. Another window will pop up asking you to configure the Execution Context and Agent.
The default options are fine; just click OK
18. A final window will pop up telling you the session has started. Click OK
19. Let's now go to the Operator tab to check if the execution was successful. In the top left
corner of the screen click on the Operator tab
20. When you get to the Operator tab you might see a lightning bolt beside the
Hive_To_Oracle execution. This means the integration is still executing; wait a little bit until
the checkmark appears.
23. You can now see all of the steps taken to perform this particular mapping. Let's investigate
the fourth step in the process further. Double click on 4 Integration Hive_To_Oracle
Create Hive staging table to open up a window with its details.
In this window you will see exactly what code was run. If an error occurs this information
becomes quite useful in debugging your transformations.
25. To check the data that is now in the Oracle database go back to the Designer tab, by
going to the left upper corner of the screen and clicking on Designer
26. Then in the Models section at the left bottom of the screen right click on the table LOGS in
the Oracle folder and select View Data
On the left side of the screen a new window will pop up with all of the data inside that table
of the Oracle database.
27. We can now go ahead and close all of the open windows in the right side of the screen to
prepare for the next exercise.
2. As before lets give the interface a name. In the new tab that opened on the right side of
the screen type in the following information.
Name: Hive_To_Hive
4. We will first drag the Hive dividends table into the source window on the right
5. Next we will drag the dividends2 table into the target window on the right
6. You will be asked if you would like to perform auto mapping. Just click Yes
7. All of the mappings auto complete without a problem. We now need to specify the
Integration Knowledge Module (IKM) which will be used to perform the integration. In ODI
an IKM is the engine which has all of the code templates for the integration process; hence
without an appropriate IKM the integration is not possible. In the previous section there
was only one appropriate IKM, hence it was chosen automatically. In this case there are
multiple possible IKMs so we need to select one. In the left upper corner of the screen in the
Designer window right click on Knowledge Modules and select Import Knowledge
Modules.
8. A window will pop up which will allow you to import Knowledge Modules. First we need to
specify the folder in which the Knowledge Modules are stored. Fill in the following
information.
File import directory: /u01/ODI/oracledi/xml-reference
Then Press Enter
11. Let's now add a constraint to the target table to see what happens
during the data movement. In the left bottom part of the screen in the Models window
expand out the dividends2 store by pressing the + beside it.
12. In the subsections that appear in dividends2 you will see a section called Constraints.
Right click on it and select New Condition.
13. On the right side a new window will open allowing you to define the
properties of this condition. We will set a check condition which will check if the dividend
value is too low. Enter the following information.
Name: Low_Dividend
Type: Oracle Data Integrator Condition
Where: dividends2.dividend>0.01
Message: Dividend Too Low
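Conceptually, a check condition like this is a row-level predicate: rows that fail it are routed to an error table along with the message. A minimal sketch in Python, using hypothetical rows for dividends2:

```python
# Hypothetical dividends2 rows: (symbol, date, dividend).
rows = [("AAA", "2009-01-01", 0.005), ("BBB", "2009-01-01", 0.25)]

def low_dividend_check(row):
    """The Low_Dividend condition: dividends2.dividend > 0.01."""
    return row[2] > 0.01

# Rows passing the check flow on; failing rows are isolated
# with the condition's message, as ODI's error table does.
passed = [r for r in rows if low_dividend_check(r)]
failed = [(r, "Dividend Too Low") for r in rows if not low_dividend_check(r)]
```

This is why, later in the exercise, the Control > Errors view shows exactly the rows that violated the constraint rather than aborting the whole load.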
14. We now need to save our constraint. In the top right corner of the screen click on the Save
button
15. We are now ready to run the interface. In the top right section of the screen click back to
our interface by clicking on the Hive_To_Hive tab.
16. Now at the top of the screen we can click the Play button to run the interface
17. A new window pops up saying you need to save all of the changes before the interface can
be run. Just click Yes
18. A new window will pop up asking for the execution context; just click OK
19. An informational pop up will show up telling you the execution has started. Simply click OK
20. It is now time to check our constraint. In the left bottom part of the screen (in the Models
section) right click on the dividends2 model, then go to the Control section and click on
Check
21. This check is its own job that must be run; hence a window will pop up asking you to select
a context for the execution. The default options are good so just click OK
22. An informational window pops up telling you the execution has started. Just click OK
23. We can now see all of the rows that failed our check. Again in the left bottom part of the
screen (in the Models section) right click on the dividends2 model, go to the Control menu
and select Errors
A new tab will pop up on the right side of the screen. You will see all of the rows which did
not pass the constraint.
24. This concludes the ODI section of the workshop. Go to the right upper corner of the screen
and click the X to close ODI.
5.7 Summary
In this exercise you were introduced to Oracle's integration of ODI with Hadoop. It is
worth noting that this integration with ODI is only available for the Oracle database and only available
from Oracle. It is a custom extension for ODI developed by Oracle to allow users who already have
ETL as part of their Data Warehousing methodologies to continue using the same tools and
procedures with the new Hadoop technologies.
It is quite important to note that ODI is a very powerful ETL tool which can offer all of the
functionality typically found in an enterprise quality ETL product. Although the examples given in this
exercise are quite simple, this does not mean the integration of ODI and Hadoop is limited. All of the power
and functionality of ODI is available when working with Hadoop: workflow definition, complex
transforms, flow control and multiple sources, to name just a few of the capabilities of ODI that
can be used with Hadoop.
Through this exercise you were introduced to three Knowledge Modules of ODI: reverse
engineering for Hive, integration into Hive, and integration from Hive to Oracle. These are not the
only Knowledge Modules available, and we encourage you to review the table available in section
5.2 of this document to get a better idea of all the functionality currently available.
Access CSV files and Data Pump files generated by Oracle Loader for Hadoop
2. To get into the folder where the scripts for the external tables
exercise are, type in the terminal:
cd /home/oracle/exercises/external
Then press Enter
3. The first step in this exercise is to create some random files. This is just so we have some
data in Hadoop to load as an external table. We will create three files called sales1, sales2
and sales3 with a single row comprised of 3 numbers in each file. To create the files go to
the terminal and type:
./createFiles.sh
Then press Enter
4. Next we will load these files into HDFS. We have a script for that process as well. Go to
the terminal and type:
./loadFiles.sh
Then press Enter
5. Next we will need to create the external table in Oracle. As the SQL code is quite long we
have written a script with that code. This being quite important, let's look at what that code
looks like. In the terminal type:
gedit createTable.sh
Then press Enter
Looking at the code for creating the table you will notice syntax very similar to other types
of external tables, except for two lines: the preprocessor and the type, highlighted in the image
below
6. When you are done evaluating the code you can close the window by clicking the X in the
right upper corner of the window
7. Let's go ahead now and run that piece of code. In the terminal type:
./createTable.sh
Then press Enter
8. Now that the table is created we need to connect that table with the files we loaded into
HDFS. To make this connection we must run a Hadoop job which calls Oracle loader code.
Go to the terminal and type:
./connectTable.sh
Then press Enter
9. You will be asked to enter a password for the code to be able to login to the database user.
Enter the following information
[Enter Database Password:]: tiger
Then Press Enter
NOTE: No text will appear while you type
10. We can now use SQL from Oracle to read those files in HDFS. Let's experiment with that.
First we connect to the database using SQL*Plus. Go to the terminal and type:
sqlplus scott/tiger
Then press Enter
11. Now let's query that data. Go to the terminal and type:
select * from sales_hdfs_ext_tab;
Then press Enter
12. This concludes this exercise. You can now exit SQL*Plus. Go to the terminal and type:
exit;
Then press Enter
13. Then close the terminal. Go to the terminal and type:
exit
Then press Enter
6.4 Summary
In this chapter we showed how data in HDFS can be queried using standard SQL right from the
Oracle database. With the data stored in HDFS, full advantage is taken of all of the parallelism and
striping that naturally occur, while at the same time you can use all of the power and
functionality of the Oracle Database.
When implementing this method, parallel processing is extremely important when working with
large volumes of data. When using external tables, consider enabling parallel query with this SQL
command:
ALTER SESSION ENABLE PARALLEL QUERY;
Before loading data into Oracle Database from the external files created by Oracle Direct
Connector, enable parallel DDL:
ALTER SESSION ENABLE PARALLEL DDL;
Before inserting data into an existing database table, enable parallel DML with this SQL command:
ALTER SESSION ENABLE PARALLEL DML;
Hints such as APPEND and PQ_DISTRIBUTE also improve performance when inserting data.
2. To get into the folder where the scripts for the Mahout exercise are, type in the terminal:
cd /home/oracle/exercises/mahout
Then press Enter
3. To get an idea of what our data file looks like, let's look at the first row. In the
terminal type:
head -n 1 synthetic_control.data
Then press Enter
As you can see on the screen, all there is in the file are random data points. It is within
this data that we would like to find patterns.
4. The first step in analyzing this data is loading it into HDFS. Let's go ahead and do that.
Go to the terminal and type:
./loadData.sh
Then press Enter
5. Now that the data is loaded we can run Mahout against the data. In this example the data
is already in vector form and a distance function has already been compiled into the
example. When clustering your own data, the command line for running the clustering
should include the distance function, written and compiled in Java.
Go to the terminal and type:
This would be an excellent time to get a cup of coffee. The clustering is quite
computationally intensive and should take a couple of minutes to execute.
6. Once you get the command prompt back the clustering is done, but the results are stored
in binary format inside Hadoop. We need to first bring all of the results out of Hadoop and
then convert the data from binary format to text format. We have a script which will
perform both tasks. Let's run that script. Go to the terminal and type:
./extractData.sh
Then press Enter
7. We can now go ahead and look at the results of the clustering. We will look at the text
output of the results. Go to the terminal and type:
gedit Clusters
Then press Enter
The output is not very user friendly, but there are several indicators to look for, as follows:
n= the number of clusters
c= the centers of each one of the clusters
r= the radius of the circle which defines the cluster
Points= the data points in each cluster
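To make these indicators concrete, here is a minimal K-means implementation in pure Python on a tiny hypothetical 2-D data set (standing in for the synthetic_control.data vectors), computing the same quantities Mahout reports: the clusters, their centers (c), a radius per cluster (r), and the member points:

```python
import math

# Tiny hypothetical 2-D data set with two obvious groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (7.8, 8.2), (8.1, 7.9)]

def dist(a, b):
    """Euclidean distance, the default distance measure here."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

# K-means with k=2 and fixed initial centers for determinism.
centers = [points[0], points[3]]
for _ in range(10):
    # Assignment step: each point joins its nearest center's cluster.
    clusters = [[], []]
    for p in points:
        i = min(range(2), key=lambda j: dist(p, centers[j]))
        clusters[i].append(p)
    # Update step: recompute each center as the cluster mean.
    centers = [(sum(x for x, _ in c) / len(c),
                sum(y for _, y in c) / len(c)) for c in clusters]

# Mahout-style outputs: number of clusters, c= centers, r= radii.
n = len(clusters)
radii = [max(dist(p, centers[i]) for p in clusters[i]) for i in range(n)]
```

The radius here is taken as the farthest member's distance from the center, one common way to characterize a cluster's extent; Mahout's production variant runs the same assign/update iteration as distributed MapReduce passes.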
8. Once you are done evaluating the results you can click the X in the right upper corner of
the screen to close the window.
9. Despite the highlighting, the raw data points are not very easy to interpret. Mahout also has some
graphing functions for simple data points. We will run a much simpler clustering, with points
that can be displayed on an X,Y plane, to visually see the results. Go to the terminal and
type:
./displayClusters
Then press Enter
A new window will pop up with a visual display of a K-means clustering. The black squares
represent data points and the red circles define the clusters. The yellow and green lines
represent the error margin for each cluster.
10. Once you are done evaluating the image you can click the X in the right upper corner of
the window to close it.
11. This concludes our Mahout exercise. You can now close the terminal window. Go to the
terminal and type:
exit
Then press Enter
7.4 Summary
In this exercise you were introduced to the K-means clustering algorithm and how to run the
algorithm using Mahout, and hence on a Hadoop cluster. It is important to note that Mahout
does not only focus on K-means but also offers many different algorithms in the categories of
Clustering, Classification, Pattern Mining, Regression, Dimension Reduction, Evolutionary
Algorithms, Recommendation/Collaborative Filtering and Vector Similarity. Most of these
algorithms have special variants which are optimized to run on a massively distributed
infrastructure (Hadoop) to allow for rapid results on very large data sets.
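The core K-means iteration itself is simple enough to sketch in a few lines. The following Python sketch is not part of the lab (Mahout's distributed implementation is far more elaborate, and chooses initial centers more carefully); it only shows the two steps the algorithm repeats: assign each point to its nearest center, then recompute each center as the mean of its points.

```python
def kmeans(points, k, iterations=10):
    """Minimal K-means on a list of (x, y) tuples.
    For simplicity the initial centers are just the first k points."""
    centers = list(points[:k])
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        # Assignment step: each point joins the cluster of its nearest center.
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2 +
                                  (p[1] - centers[c][1]) ** 2)
            clusters[i].append(p)
        # Update step: each center moves to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:
                centers[i] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    return centers

# Two obvious blobs, around (0, 0) and (5, 5).
pts = [(0.1, 0.2), (0.0, -0.1), (0.2, 0.0),
       (5.1, 4.9), (4.8, 5.2), (5.0, 5.0)]
print(sorted(kmeans(pts, 2)))
```

With well-separated data like this, the centers converge to the two blob means after a couple of iterations.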
8. PROGRAMMING WITH R
8.1 Introduction to Enterprise R
R is a language and environment for statistical computing and graphics. It is a GNU project which
is similar to the S language and environment which was developed at Bell Laboratories (formerly
AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a
different implementation of S. There are some important differences, but much code written for S
runs unaltered under R.
R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests,
time-series analysis, classification, clustering, etc.) and graphical techniques, and is highly
extensible. The S language is often the vehicle of choice for research in statistical methodology,
and R provides an Open Source route to participation in that activity.
One of R's strengths is the ease with which well-designed publication-quality plots can be
produced, including mathematical symbols and formulae where needed. Great care has been
taken over the defaults for the minor design choices in graphics, but the user retains full control.
Oracle R Enterprise integrates the open-source R statistical environment and language with
Oracle Database 11g, Exadata, Big Data Appliance, and Hadoop massively scalable computing.
Oracle R Enterprise delivers enterprise-level advanced analytics based on the R environment.
Oracle R Enterprise allows analysts and statisticians to leverage existing R applications and use
the R client directly against data stored in Oracle Database 11g, vastly increasing scalability,
performance and security. The combination of Oracle Database 11g and R delivers an enterprise-ready,
deeply integrated environment for advanced analytics. Data analysts can also take
advantage of analytical sandboxes, where they can analyze data and develop R scripts for
deployment while results stay managed inside Oracle Database.
As an embedded component of the RDBMS, Oracle R Enterprise eliminates R's memory
constraints since it can work on data directly in the database. Oracle R Enterprise leverages
Oracle's in-database analytics and scales R for high performance on Exadata and the Big Data
Appliance. Being part of the Oracle ecosystem, ORE enables execution of R scripts in the
database to support enterprise production applications and OBIEE dashboards, both for structured
results and graphics. Since it is R, we are able to leverage the latest R algorithms and contributed
packages.
Oracle R Enterprise users not only can build models using any of the data mining algorithms in the
CRAN task view for machine learning, but also leverage in-database implementations for
predictions (e.g., stepwise regression, GLM, SVM), attribute selection, clustering, feature
extraction via non-negative matrix factorization, association rules, and anomaly detection.
2. To get into the folder where the scripts for the R exercises are, type in the terminal:
cd /home/oracle/exercises/R
Then press Enter
3. To work with R you can write scripts for the interpreter to execute, or you can use the
interactive shell environment. To get a more hands-on experience with R we will use the
interactive shell. To start the interactive shell go to the terminal and type:
R
Then press Enter
4. During the login process many different libraries load, which extend the functionality of R. If a
particular library is not loaded automatically, one can load it manually after login. We will
need to load a library to interface with HDFS, so let's load that now.
Go to the R shell and type:
library(ORCH)
Then press Enter
5. Now let's go ahead and generate some pseudo-random data points so we have some data to
play with. We will generate 2D data points so we can easily visualize the data. Go to the R
terminal and type:
myDataPoints = rbind(matrix(rnorm(100, mean=0, sd=0.3), ncol=2),
    matrix(rnorm(100, mean=1, sd=0.3), ncol=2))
Then press Enter
Now the variable myDataPoints will have some data points in it.
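For readers who think in Python rather than R, the rnorm/rbind call above can be mimicked with the standard library. This is an illustrative analogue only; the lab itself stays in R, and the function name `blob` is our own invention.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is reproducible

def blob(n, mean, sd):
    """n two-dimensional points with independent Gaussian coordinates,
    roughly what matrix(rnorm(2*n, mean, sd), ncol=2) produces in R."""
    return [(rng.gauss(mean, sd), rng.gauss(mean, sd)) for _ in range(n)]

# "rbind" of a 50-point blob around (0, 0) and one around (1, 1),
# mirroring the myDataPoints matrix built above.
my_data_points = blob(50, mean=0, sd=0.3) + blob(50, mean=1, sd=0.3)
print(len(my_data_points))
```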
6. To be able to save data into the database or HDFS you need to have the data in columns
(as we already do) and you also need to have each of the columns labeled. This is
because column names are required within a database to be able to identify the columns.
Let's go ahead and label the columns x and y. Go to the R terminal and type:
colnames(myDataPoints) <- c("x", "y")
Then press Enter
7. We can now create a data frame which will load the data into the Oracle Database. Go to
the terminal and type:
ore.create(as.data.frame(myDataPoints, optional = TRUE),
    table="DATA_POINTS")
Then press Enter
8. If required we can even load this data into HDFS. Let's go ahead and do that. Go to the R
terminal and type:
hdfs.put(DATA_POINTS, dfs.name="data_points")
Then press Enter
9. Now that we have loaded the data into both the database and HDFS, let's exit R and look at
that data. Go to the R shell and type:
q()
Then press Enter
10. You will be asked if you want to save the workspace image. Type:
n
Then press Enter
Note: when typing n the information typed does not appear on the screen.
11. At this point all data and calculated results would be wiped from memory, and hence lost,
in classic R. With R Enterprise Edition we saved our data in the database, so let's go and
query that data. Go to the terminal and type:
./queryDB.sh
Then press Enter
On the screen you will see the table displayed which contains our data points.
12. We can also look at the data we stored inside HDFS. Go to the terminal and type:
./queryHDFS.sh
Then press Enter
Again on the screen you will see all of the data points displayed.
As you can see all of the work done in R can now be exported to the database or HDFS
for further processing based on business needs.
2. Let's now go ahead and load the data from the Oracle database. Go to the R shell and
type:
myData=ore.pull(DATA_POINTS)
Then press Enter
3. Now that we have pulled our data back into R we can manipulate it. Let's do k-means clustering on the data. Go to the R shell and type:
cl <- kmeans(myData, 2)
Then Press Enter
4. The clustering is now done, but displaying the data in text format is not very interesting.
Let's graph the data. Go to the R terminal and type:
plot(myData, col = cl$cluster)
Then press Enter
5. A new window pops up with the data. The two colors (red and black) differentiate the two
clusters we asked the algorithm to find. We can even see where the cluster centers are.
Go back to the R shell. The terminal might be hidden behind the graph; move the windows
around until you find the terminal, then type:
points(cl$centers, col=1:2, pch = 8, cex=2)
Then press Enter
When you go back to the graph you will see the centers marked with a * and the data points
marked with circles: raw random data clustered using the K-means algorithm.
6. When you are done evaluating the image you can click on the X in the right upper corner
of the window.
7. You can also close the R terminal by going to the R shell and typing:
q()
Then press Enter
8. When asked if you want to save the workspace image, go to the terminal and type:
n
Then Press Enter
9. This concludes this exercise. You can now go ahead and close the terminal. Go to the
terminal and type:
exit
Then press Enter
8.5 Summary
In this exercise you were introduced to the R programming language and how to do clustering
with it. You also saw one of the advantages of Oracle R Enterprise
Edition, where you can save your results into the Oracle database as well as extract data from the
database for further calculations. Oracle R Enterprise Edition also has a set of functions which
can be run on data directly in the database. This enables the user to work with very
large data sets which would not fit into the normal memory of R.
Oracle R Enterprise provides these collections of functions:
ore.corr
ore.crosstab
ore.extend
ore.freq
ore.rank
ore.sort
ore.summary
ore.univariate
2. To get into the folder where the scripts for the NoSQL exercises are, type in the terminal:
cd /home/oracle/exercises/noSQL
Then press Enter
3. Before we do anything with the NoSQL database we must first start it. So let's go ahead and
do that. Go to the terminal and type:
./startNoSQL.sh
Then press Enter
4. To check if the database is up and running we can ping it. Let's do
that. Go to the terminal and type:
./pingNoSQL.sh
Then press Enter
You will see Status: RUNNING displayed within the text. This shows that the database is
running.
5. Oracle NoSQL Database uses a Java interface to interact with the data. This is a dedicated
Java API which will let you insert, update, delete and query data in the Key-Value store
that is the NoSQL database. Let's look at a very simple example of Java code where we
insert a Key-Value pair into the database and then retrieve it. Go to the terminal and type:
gedit Hello.java
Then press Enter
A new window will pop up with the code. In this code there are a couple of things to be
noted. We see the config variable, which holds our connection string, and the store variable,
which is our connection factory to the database. They are the initialization variables for the
Key-Value store and are highlighted in yellow. Next we define two variables, of type Key and
Value, which will serve as our payload to be inserted. These are highlighted in green. Next
we have, highlighted in purple, the actual insert command. Highlighted in blue is the retrieve
command for getting data out of the database.
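The pattern Hello.java follows can be sketched, in spirit, with a tiny in-memory key-value store. This is an illustrative Python analogue only: the real Oracle NoSQL Database client talks to a distributed store through its Java API, and the class below is our own toy.

```python
class KVStore:
    """Toy in-memory key-value store, mimicking the put/get calls in Hello.java."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value     # the insert call (purple in the lab's code)

    def get(self, key):
        return self._data.get(key)  # the retrieve call (blue in the lab's code)

# Mirror the lab's example: insert one Key-Value pair, then read it back.
store = KVStore()
store.put("Hello", "Big Data World")
print("Hello", store.get("Hello"))
```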
6. When you are done evaluating the code press the X in the right upper corner of the
window to close it.
7. Let's go ahead and compile that code. Go to the terminal and type:
javac Hello.java
Then press Enter
8. Now that the code is compiled let's run it. Go to the terminal and type:
java Hello
Then press Enter
You will see Hello Big Data World printed on the screen, which is the key and the value we
inserted into the database.
9. Oracle NoSQL Database allows a key to have both a major and a minor component. This
feature can be very useful when trying to group and retrieve multiple items at the
same time from the database. In the next code example we have 2 major components to the key
(Mike and Dave) and each major component has minor components (Question and
Answer). We will insert a value for each key, but we will use a multiget function to retrieve
all of the values under Mike, regardless of the minor component of the key, and completely
ignore Dave. Let's see what that code looks like. Go to the terminal and type:
gedit Keys.java
Then press Enter
10. A new window will pop up with the code. If you scroll to the bottom you will see the
following pieces of code. Highlighted in purple are the insertion calls which add the data
to the database. The retrieval of multiple records is highlighted in blue, and the green
shows the display of the retrieved data. Do note that there were 4 Key-Value pairs inserted
into the database.
11. When you are done evaluating the code press the X in the right upper corner of the
window to close it.
12. Let's go ahead and compile that code. Go to the terminal and type:
javac Keys.java
Then press Enter
13. Now that the code is compiled let's run it. Go to the terminal and type:
java Keys
Then press Enter
You will see the 2 values that are stored under the Mike major key displayed on the
screen, and no data points for the Dave major key. Major and minor parts of the key can
actually be composed of multiple strings, and further filtering can be done. This is left up to the
participants to experiment with.
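The major/minor-key multiget pattern in Keys.java can also be sketched with a toy store. Again this is a hedged Python analogue, not the real API: Oracle NoSQL keys are multi-part paths, and the question/answer strings below are hypothetical placeholders.

```python
class MajorMinorStore:
    """Toy store keyed by a (major, minor) pair, mimicking Keys.java's multiget."""

    def __init__(self):
        self._data = {}

    def put(self, major, minor, value):
        self._data[(major, minor)] = value

    def multi_get(self, major):
        # Return every value stored under the given major component,
        # regardless of its minor component.
        return {minor: value
                for (maj, minor), value in self._data.items()
                if maj == major}

# Hypothetical payloads; Keys.java inserts its own values.
store = MajorMinorStore()
store.put("Mike", "Question", "What is Big Data?")
store.put("Mike", "Answer", "More data than one machine can handle")
store.put("Dave", "Question", "What is Hadoop?")
store.put("Dave", "Answer", "A distributed processing framework")
print(store.multi_get("Mike"))   # both of Mike's values; Dave is ignored
```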
14. The potential of a Key-Value store grows significantly when integrated with the power of
Hadoop and distributed computing. Oracle NoSQL Database can be used as a source and
target for the data consumed and produced by Hadoop jobs. Let's look at a modified example of
the word count run in Hadoop; only this time we will count the number of values under each
major component of the key in the NoSQL database. To see the code go to the terminal
and type:
gedit Hadoop.java
Then press Enter
The code you see is very similar to the Word Count seen in the first section of the workshop.
There are only 2 differences. The first (highlighted in yellow) is the retrieval of
data from the NoSQL database rather than a flat file.
The second can be seen if you scroll down into the run function and
notice the InputFormatClass is now KVInputFormat.
15. When you are done evaluating the code press the X in the right upper corner of the
window to close it.
16. Let's go ahead and run that code. We will need to go through the entire procedure of the
first exercise, where we compile the code, create a jar, then execute it on the Hadoop
cluster. We have written a script which will do all of that for us. Let's run that script; go to
the terminal and type:
./runHadoop.sh
Then press Enter
17. You will see a Hadoop job being executed, with all of the terminal output that comes with it.
Once the execution is done it is time to see the results. We will just cat the results directly from
HDFS. Go to the terminal and type:
./viewHadoop.sh
Then press Enter
You will see displayed on the screen a word count based on the major components of the keys
in the NoSQL database. In the previous exercise we inserted 2 pieces of data under each of the
major keys Dave and Mike. We also inserted a hello key in the first exercise. This is
exactly the data the word count displays.
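The counting logic of this modified word count can be sketched as a map and reduce over key-value records. This is a toy Python analogue under stated assumptions: the real job uses Hadoop's KVInputFormat to read records from the NoSQL store, and the values below are placeholders.

```python
from collections import Counter

# Key-value records as ((major, minor), value) pairs, matching the data
# inserted in the earlier exercises (the "..." values are placeholders).
records = [
    (("Hello", ""), "Big Data World"),
    (("Mike", "Question"), "..."),
    (("Mike", "Answer"), "..."),
    (("Dave", "Question"), "..."),
    (("Dave", "Answer"), "..."),
]

# Map step: emit each key's major component; reduce step: sum the counts.
counts = Counter(major for (major, _minor), _value in records)
print(dict(counts))
```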
18. That concludes our exercises on the NoSQL database. It is time to shut down our NoSQL
database. Go to the terminal and type:
./stopNoSQL
Then press Enter
19. We can now close our terminal window. Go to the terminal and type:
exit
Then press Enter
9.4 Summary
In this exercise you were introduced to Oracle's NoSQL database. You saw how to insert and
retrieve key-value pairs, as well as the multiget function, where multiple values could be retrieved
under the same major component of a key. The last example showed how a NoSQL database can
be used as a source for a Hadoop job and how the two technologies can be integrated.
It is important to note here the differences between the NoSQL database and a traditional RDBMS.
With relational data the queries performed are much more powerful and more complex, while
NoSQL simply stores and retrieves values for a specific key. Given that simplicity in its
storage model, NoSQL has a significant performance and scaling advantage. A NoSQL database can store
petabytes worth of information in a distributed cluster and still maintain very good performance on
data interaction, at a much lower cost per megabyte of data. NoSQL has many uses and has been
implemented successfully in many different circumstances, but at the same time it does not mimic
or replace the use of a traditional RDBMS.
APPENDIX A
A.1 Setup of a Hive Data Store
1. Once we are connected to ODI we need to set up our models: the logical and physical
definitions of our data sources and targets. To start off, at the top of the screen click on
Topology.
2. Next, in the left menu, make sure you are on the Physical Architecture tab and expand the
Technologies list
4. In this folder we need to create a new Data Server. Right click on the
Hive Technology and select New Data Server
5. A new tab will open on the right side of the screen. Here you can define all of the
properties of this data server. Enter the following details:
Name: Hive Server
Then click on the JDBC tab in the left menu
6. On the right of the JDBC Driver field click on the Magnifying Glass to select the JDBC
Driver
7. A new window will pop up which will allow you to select from a list of drivers. Click on the
Down Arrow to see the list
8. From the list that appears select the Apache Hive JDBC Driver.
11. We need to set some Hive-specific variables. In the menu on the left, go now to the
Flexfields tab
12. In the Flexfields tab uncheck the Default check box and enter the following information:
Value: thrift://localhost:10000
Don't forget to press Enter when done typing to set the variable
13. It is now time to test to ensure we set everything up correctly. In the left upper corner of
the right window click on Test Connection
14. A window will pop up asking if you would like to save your data before testing. Click OK
15. An informational message will pop up asking to register a physical schema. We can ignore
this message as that will be our next step. Just click OK
16. You need to select an agent to use for the test. Leave the default
Physical Agent: Local(No Agent)
Then click Test
If any other message is displayed please ask for assistance to debug. It is critical for the
remainder of this exercise that this connection is fully functional.
18. In the menu on the left side of the screen, in the Hive folder, there should now be a
physical server called Hive Server. Right click on it and select New Physical
Schema.
19. A new tab will again open on the right side of the screen to enable you to define the details
of the Physical Schema. Enter the following details.
Schema (Schema): default
Schema (Work Schema): default
20. Then click Save All in the left upper part of the screen
21. A warning will appear about No Context specified. This again will be the next step we
undertake. Just click OK
22. We now need to expand the Logical Architecture tab in the left menu. Toward the bottom
left of the screen you will see the Logical Architecture tab; click on it.
23. In the Logical Architecture tab you will need to again find the Hive folder and click on the +
to expand it.
24. Now to create the logical store, right click on the Hive Folder and select New Logical
Schema.
25. In the new window that opens on the right of the screen enter the following information:
Name: Hive Store
Context: Global
Physical Schemas: Hive Server.default
26. This should set up the Hive data store to enable us to move data into and out of Hive with
ODI. We now need to save all of the changes we made. In the left upper corner of the
screen click on the Save All button.
27. We can close all of the tabs we have opened on the right side of the screen. This will help
reduce the clutter. Click on the X for each of the windows.
We would theoretically need to repeat steps 7-29 for each of the different types of data
stores. As the procedure is almost the same, a flat file source and an Oracle database target
have already been set up for you. This is to reduce the number of steps in this exercise.
For details on how to use flat files and Oracle databases with ODI please see the excellent
ODI tutorials offered by Oracle by Example, found at
http://www.oracle.com/technetwork/tutorials/index.html.
28. We now need to go to the Designer Tab in the left menu to perform the rest of our
exercise. Near the top of the screen on the left side click on the Designer tab.
29. Near the bottom of the screen on the left side there is a Models tab; click on it.
30. You will notice there are already File and Oracle models created for you. These were
pre-created as per the earlier note. Let's now create a model for the Hive data store we
just created. In the middle of the screen in the right panel there is a folder icon next to the
word Models. Click on the Folder icon and select New Model
31. In the new tab that appears on the right side enter the following information:
Name: Hive
Code: HIVE
Technology: Hive
Logical Schema: Hive Store
32. We can now go to the left upper corner of the screen and save this Model by clicking on
the Save All icon.