IN 961: Big Data Trial Sandbox for Hortonworks Install and Config
© 2014 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any
means (electronic, photocopying, recording or otherwise) without prior consent of Informatica Corporation. All other
company and product names may be trade names or trademarks of their respective owners and/or copyrighted
materials of such owners.
Abstract
This document describes how to use Informatica Big Data Edition Sandbox for Hortonworks to run sample mappings
based on common big data use cases. After you understand the sample big data use cases, you can create and run
your own big data mappings.
Supported Versions
Informatica 9.6.1 HotFix 1
Table of Contents
Installation and Configuration Overview
Step 1. Download the Software
Download and Install VMware Player
Register at Informatica Marketplace
Download the Big Data Trial Sandbox for Hortonworks Files
Step 2. Start the Big Data Trial Sandbox for Hortonworks Virtual Machine
Step 3. Configure and Install the Big Data Trial Sandbox for Hortonworks Client
Configure the Domain Properties on the Windows Machine
Configure a Static IP Address on the Windows Machine
Install the Big Data Trial Sandbox for Hortonworks Client
Step 4. Access the Big Data Trial Sandbox for Hortonworks Sandbox
Apache Ambari
Informatica Administrator
Informatica Developer
Big Data Trial Sandbox for Hortonworks Samples
Running Common Tutorial Mappings on Hadoop
Performing Data Discovery on Hadoop
Performing Data Warehouse Optimization
Processing Complex Files
Reading and Parsing Complex Files
Writing to Complex Files
Working with NoSQL Databases
HBase
Troubleshooting
The Big Data Trial Sandbox for Hortonworks virtual machine has the following components:
Note: The Informatica Big Data Trial Sandbox for Hortonworks installation and configuration document is available on
the desktop of the virtual machine.
The Big Data Trial Sandbox for Hortonworks client installs the libraries and binaries required for the Informatica
Developer (Developer tool) client.
The software available for download at the referenced links belongs to a third party or third parties, not Informatica
Corporation. The download links are subject to the possibility of errors, omissions or change. Informatica assumes no
responsibility for such links and/or such software, disclaims all warranties, either express or implied, including but not
limited to, implied warranties of merchantability, fitness for a particular purpose, title and non-infringement, and
disclaims all liability relating thereto.
You must have at least 10 GB of RAM and 30 GB of disk space available on the machine on which you download and
install VMware Player.
When you register with Informatica Marketplace, you get a free 60-day trial to use Big Data Trial Sandbox for
Hortonworks.
Download the Big Data Trial Sandbox for Hortonworks Files
After you log in to Informatica Marketplace, download the Big Data Trial Sandbox for Hortonworks virtual machine and
client.
Includes the Big Data Trial Sandbox for Hortonworks virtual machine. Download the file to the machine on
which VMware Player is installed.
961_BigDataTrial_Client_Installer_win32_x86.zip
Includes the compressed Big Data Trial Sandbox for Hortonworks client. Download the file to an Informatica
client installation directory on a Microsoft Windows-32 machine.
Extract the files in the client zip file to a directory on your local machine. For example, extract the files to the C:\ drive on your machine.
Step 3. Configure and Install the Big Data Trial Sandbox for
Hortonworks Client
To communicate with the virtual machine before you run the client, you must configure the domain properties for the
Big Data Trial Sandbox for Hortonworks client installation.
Optionally, to avoid updating the IP address of the virtual machine each time it changes, you can configure a static IP
address for the virtual machine.
Then, you can run the silent installer to install the Big Data Trial Sandbox for Hortonworks client.
1. Click Applications > System Tools > Terminal to open the terminal to run commands.
2. Run the ifconfig command to find the IP address of the virtual machine.
The ifconfig command returns all interfaces on the virtual machine. Select the eth interface to get the IP address value.
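If you prefer to work from the terminal, the address can also be pulled out of the ifconfig output with sed. This is a minimal sketch that assumes the older "inet addr:" output format; the sample line below is illustrative, so substitute a line from your own ifconfig output:

```shell
# Extract the IPv4 address from an "inet addr:" line of ifconfig output.
# The sample line mimics older Linux ifconfig output; replace it with your own.
sample="          inet addr:192.168.159.159  Bcast:192.168.159.255  Mask:255.255.255.0"
ip=$(echo "$sample" | sed -n 's/.*inet addr:\([0-9.]*\).*/\1/p')
echo "$ip"
```

On the sandbox, you would pipe the output of ifconfig itself through the same sed command instead of using a sample string.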
The following image shows the ifconfig command with the return value for inet addr highlighted with a red
arrow:
3. Add the IP address and the default hostname hdp-bde-demo to the hosts file on the Windows machine on
which you install the Developer tool.
The hosts file can be located in the following location: C:\Windows\System32\drivers\etc\hosts. Add the
following line to the hosts file: <IP address> <hostname>. For example, add the following line:
192.168.159.159 hdp-bde-demo
1. Click Applications > System Tools > Terminal to open the terminal to run commands.
2. Run the ifconfig command to find the IP address and hardware ethernet address of the virtual machine.
The ifconfig command returns all interfaces on the virtual machine. Select the eth interface to get the hardware ethernet address value.
The following image shows the ifconfig command with the return values for inet addr and HWaddr outlined
with red boxes:
3. Edit vmnetdhcp.conf to add the values for host name, IP address, and hardware ethernet address.
vmnetdhcp.conf is located in the following directory: C:\ProgramData\VMware.
Add the following entry before the #END tag at the end of the file:
host <hostname> {
hardware ethernet <your HWaddr>;
fixed-address <your inet addr>;
}
The following sample code shows how to set a static IP address:
host hdp-bde-demo {
hardware ethernet 00:0C:29:10:F9:4C;
fixed-address 192.168.159.159;
}
4. Add the IP address and the default hostname hdp-bde-demo to the hosts file on the Windows machine on
which you install the Developer tool.
The hosts file can be located in the following location: C:\Windows\System32\drivers\etc\hosts. Add the
following line to the hosts file: <IP address> <hostname>. For example, add the following line:
192.168.159.159 hdp-bde-demo
5. Shut down the virtual machine.
6. Restart the host machine and virtual machine.
The silent installer runs in the background. The process can take several minutes.
The command window displays a message that indicates that the installation is complete.
You can find the Informatica_Version_Client_InstallLog.log file in the following directory: C:\Informatica\9.6.1_BDE_Trial\
After the installation process is complete, you can launch the Big Data Trial Sandbox for Hortonworks Client.
You can log in to Informatica Administrator (the Administrator tool) to monitor Informatica services and the status of
mapping jobs.
You can log in to the Developer tool to run the sample mappings based on common big data use cases. You can
create your own mappings and run the mappings from the Developer tool.
For more information on how to run mappings in the Developer tool, see the Informatica Big Data Trial Sandbox for
Hortonworks User Guide.
Apache Ambari
You can log in to Ambari from the following URL: http://hdp-bde-demo:8080/#/login.
Password: admin
Informatica Administrator
You can access the Administrator tool from the following URL: http://hdp-bde-demo:6005
Password: Administrator
Informatica Developer
You can start the Developer tool client from the Windows Start menu.
Password: Administrator
The Big Data Trial Sandbox for Hortonworks includes samples for the following use cases:
After you run the mappings in the Developer tool, you can monitor the mapping jobs in the Administrator tool.
m_DataLoad_1
m_DataLoad_1 loads data from the READ_WordFile1 flat file from your machine to the
WRITE_HDFSWordFile1 flat file on HDFS.
m_DataLoad_2
m_DataLoad_2 loads data from the READ_WordFile2 flat file from your machine to the
WRITE_HDFSWordFile2 file on HDFS.
The following image shows the mapping m_DataLoad_2:
m_WordCount
m_WordCount reads two source files from HDFS, parses the data, and writes the output to a flat file on HDFS.
The DataDiscovery project in the Developer tool includes the following samples that you can use to perform data
discovery on Hadoop:
Use the samples to understand how to perform data discovery on Hadoop. You want to discover the quality of the
source customer data in the CustomerData flat file before you use the customer data as a source in a mapping. You
should verify the quality of the customer data to determine whether the data is ready for processing. You can run the
Profile_CustomerData profile based on the source data to determine the characteristics of the customer data.
The profile determines the characteristics of columns in a data source, such as value frequencies, unique values, null
values, patterns, and statistics.
The number of unique and null values in each column, expressed as a number and percentage.
The patterns of data in each column and the frequencies with which these values occur.
Statistics about the column values, such as the maximum value length, minimum value length, first value, and
last value in each column.
The data types of the values in each column.
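The Profile_CustomerData results themselves come from the Developer tool, but the kind of per-column statistics a profile reports can be sketched with standard command-line tools. The sample column values below are made up for illustration and are not taken from the CustomerData file:

```shell
# Count null (empty) and unique values in a single column of data,
# the way a profile reports null and unique value counts per column.
# The column values are made-up sample data.
printf 'US\nUK\n\nUS\nDE\n' > /tmp/country_col.txt
nulls=$(grep -c '^$' /tmp/country_col.txt)
unique=$(sort -u /tmp/country_col.txt | grep -vc '^$')
echo "nulls=$nulls unique=$unique"
```

A real profile computes these statistics, plus patterns and value frequencies, across every column of the source at once.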
The following figure shows the profile results that you can analyze to determine the characteristics of the customer
data:
The DataWarehouseOptimization project in the Developer tool includes samples that you can use to perform data
warehouse optimization on Hadoop.
Use the samples to analyze customer portfolios by processing the records that have changed in a 24-hour period.
You can offload the data on Hadoop, find the customer records that have been inserted, deleted, and updated in the
last 24 hours, and then update those records in your data warehouse. You can capture these changes even if the number of columns changes or the keys change in the source files.
To capture the changes, use the Data Warehouse Optimization workflow. The workflow contains mappings that move
the data from local flat files to HDFS, identify the changes, and then load the final output to flat files.
The following image shows the sample Data Warehouse Optimization workflow:
To run the workflow from the command line, enter the following command:
./infacmd.sh wfs startWorkflow -dn infa_domain -sn infa_dis -un Administrator -pd
Administrator -Application App_DataWarehouseOptimization -wf wf_DataWarehouseOptimization
To run the mappings in the workflow, open a mapping and right-click the mapping to run the mapping.
Mapping_Day2
The workflow object Mapping_Day2 reads customer data from flat files in a local file system and writes to an HDFS target for the next 24-hour period.
m_CDC_DWHOptimization
The workflow object m_CDC_DWHOptimization captures the changed data. It reads data from HDFS and
identifies the data that has changed. To increase performance, you can configure the mapping to run on
Hadoop cluster nodes in a Hive environment.
Sources. HDFS files that were the targets of the previous two mappings. The Data Integration Service
reads all of the data as a single column.
Expression transformations. Extract a key from the non-key values in the data. The expressions use the
INSTR function and SUBSTR function to perform the extraction of key values.
Joiner transformation. Performs a full outer join on the two sources based on the keys generated by the
Expression transformations.
Filter transformations. Use the output of the Joiner transformation to filter rows based on whether or not
the rows should be updated, deleted, or inserted.
Targets. HDFS files. The Data Integration Service writes the data to three HDFS files based on whether
the data is inserted, deleted, or updated.
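The guide does not list the exact expressions, but the INSTR/SUBSTR pattern used to split a key out of a single-column row can be sketched with the equivalent awk string functions, index and substr. The sample row and the comma delimiter below are assumptions for illustration, not the real sample data:

```shell
# Split the leading key from a delimited row, the way INSTR locates the
# delimiter position and SUBSTR extracts the characters before it.
# awk's index() plays the role of INSTR, substr() the role of SUBSTR.
row="CUST001,John,Smith,Portfolio-A"
key=$(echo "$row" | awk '{ print substr($0, 1, index($0, ",") - 1) }')
echo "$key"
```

In the mapping itself, the extracted keys feed the Joiner transformation that matches rows across the two sources.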
Consolidated_Mapping
The workflow object Consolidated_Mapping consolidates the data in the HDFS files and loads the data to the
data warehouse.
Sources. The HDFS files that were the target of the previous mapping are the sources of this mapping.
Expression transformations. Add the deleted, updated, or inserted tags to the data rows.
Union transformation. Combines the records.
Target. Flat file that acts as a staging location on the local file system.
Big Data Trial Sandbox includes samples that demonstrate the following use cases to process complex files:
The LogProcessing project in the Developer tool includes samples that you can use to read and parse complex files.
Use the samples to process daily web logs from an online trading site and write the parsed data to a flat file. The web
logs contain details about visitors who log in to the website and look up the value of stocks using stock symbols.
To process the web logs, use the web log processing workflow.
The following image shows the sample web log processing workflow:
To run the workflow from the command line, enter the following command:
./infacmd.sh wfs startWorkflow -dn infa_domain -sn infa_dis -un Administrator -pd
Administrator -Application app_logProcessing -wf wf_LogProcessing
To run the mappings in the workflow, open a mapping and right-click the mapping to run the mapping.
You can run the following mappings and transformations in the workflow:
m_LoadData
The workflow object m_LoadData reads the parsed web log data and writes to a flat file target. The source
and target are flat files.
m_sample_weblog_parsing
The workflow object m_sample_weblog_parsing is a logical data object read mapping that reads data from an HDFS source, parses the data using a Data Processor transformation, and writes to a logical data object.
The following image shows the mapping m_sample_weblog_parsing:
The following image shows the expanded logical data object read mapping m_sample_weblog_parsing:
Source. HDFS file that was the target of the previous mapping.
Data Processor transformation. Processes the input binary stream of data, parses the data, and writes to
XML format.
Joiner transformation. Combines the activity of visitors who return to the website on the same day with
stock queries.
Expression transformation. Adds the current date to each transformed record.
Target. Flat file.
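The actual parsing logic lives inside the Data Processor transformation, but the kind of field extraction it performs on each web log line can be illustrated with awk and sed. The sample line and its layout below are assumptions for illustration, not the real sample logs:

```shell
# Pull the client address and the queried stock symbol out of one web log line.
# The line format (common-log style with a symbol query parameter) is assumed.
line='10.0.0.1 - - [12/Mar/2014:10:15:32] "GET /quote?symbol=INFA HTTP/1.1" 200 512'
client=$(echo "$line" | awk '{ print $1 }')
symbol=$(echo "$line" | sed -n 's/.*symbol=\([A-Z]*\).*/\1/p')
echo "client=$client symbol=$symbol"
```

The transformation goes further than this sketch: it parses the full binary log stream and emits structured XML rather than individual fields.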
The Complex_File_Writer project in the Developer tool includes samples that you can use to write unstructured data
to complex files.
Use the samples to generate a report in XML format of the sales by country for each customer. You know the customer
purchase order details such as customer ID, product names, and item quantity sold. The purchase order details are
stored in semi-structured compressed XML files in HDFS. Create a mapping that reads all the customer purchase
records from the files in HDFS and use a Data Processor transformation to process the sales by country for each
customer. The mapping converts the semi-structured data to relational data and writes it to a relational target.
The following figure shows the Complex File Writer sample mapping:
Transformations
HDFS output
The output, Write_binary_single_file, is a complex file stored in HDFS.
Big Data Trial Sandbox for Hortonworks provides samples for the following NoSQL database:
HBase
HBase
Use HBase when you need random, real-time reads and writes from a database. HBase is a non-relational distributed
database that runs on top of the Hadoop Distributed File System (HDFS) and can store sparse data. Big Data Trial
Sandbox for Hortonworks provides samples that demonstrate how to read and process binary data from HBase.
The HBase_Binary_Data project in the Developer tool includes samples that you can use to read binary data from HBase tables, convert it to string data, and write it to a flat file target.
The sample HBase table contains the details of people and the cars that they purchased over a period of time. The
table contains the Details and Cars column families. The column names of the Cars column family are of String data
type. You can get all columns in the Cars column family as a single binary column. You can use the sample Java transformation to convert the binary data to string data. You can join the data from both the column families and write it
to a flat file.
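The conversion that the sample Java transformation performs, turning binary column bytes into readable strings, can be sketched in the shell. The decoded value below is a made-up illustration, not data from the sample HBase table, and the sketch assumes bash's printf escape handling:

```shell
# Decode a binary column value (shown here as byte escapes) into a string,
# the way the sample Java transformation converts HBase binary data.
# The bytes spell a made-up car name, not data from the sample table.
text=$(printf '\x54\x6f\x79\x6f\x74\x61\x20\x43\x61\x6d\x72\x79')
echo "$text"
```

In the mapping, the Java transformation performs this decoding for every binary value in the Cars column family before the join.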
To run the workflow from the command line, enter the wfs startWorkflow command.
To run the mappings in the workflow, open a mapping and right-click the mapping to run the mapping.
m_preson_Cars_Write_Static1
The workflow object references the m_pers_cars_static_reader mapping that transforms the binary data in an HBase data object to columns of the String data type and writes the details to a flat file data object.
Person_Car_Static_Read
The first source for the mapping is an HBase data object named Person_Car_Static that contains the
columns in the Details column family. The HBase read data object operation is named
Person_Car_Static_Read.
pers_cars_Static_bin_read
The second source for the mapping is an HBase data object named Person_cars_Static_bin that
contains the data in the Cars column family. The HBase read data object operation is named
pers_cars_Static_bin_read.
Transformations
Write_Person_Cars_FF
The target for the mapping is a flat file data object named Person_Cars_FF. The flat file data object write
operation is named Write_Person_Cars_FF to write data from the Cars and Details column families.
The Data Integration Service converts the binary column in Person_cars_Static_bin, joins the data in
Person_Car_Static, and writes the data to the flat file data object Write_Person_Cars_FF.
Troubleshooting
This section describes troubleshooting information.
The Informatica services might shut down when the machine on which you run the virtual machine goes into
hibernation or when you resume the virtual machine.
Run the following command to restart the services on the operating system of the virtual machine: sh /home/infauser/BDETRIAL/.cmdInfaServiceUtil.sh start
To debug mapping failures, check the error messages in the mapping log file.
VMWare Player displays a message that states it cannot power on a 64-bit virtual machine. Or, VMware
Player might display the following error when you play the virtual machine: The host supports Intel VT-x,
but Intel VT-x is disabled. Intel VT-x might be disabled if it has been disabled in the
BIOS/firmware settings or the host has not been power-cycled since changing this setting.
You must enable Intel Virtualization Technology in the BIOS of the machine on which VMware Player runs. For more information, see the VMware Knowledge Base.
Virtual machine is in a suspended state
If the virtual machine is in a suspended state, resume it and log in to the virtual machine. After you log in, the Informatica services and Hadoop services start automatically.
In VMware Player, select the virtual machine and click Play virtual machine.
Enter a user name and password for the virtual machine. The default user name and password are infa / infa.
The Developer tool takes a long time to connect to the Model repository
The Developer tool might take a long time to connect to the Model repository because the virtual machine
cannot find the IP address and host name of the client machine.
You must add the IP address and host name of the client machine on the hosts file of the virtual machine.
Use the ipconfig and hostname commands from the command line of the Windows machine to find the IP
address and hostname of the Windows machine.
Add the IP address and the host name to the hosts file on the virtual machine.
For example, the hosts file is located in the following location on the virtual machine: /etc/hosts
Mapping fails and job execution failed errors appear in the mapping log
If the mapping fails and you cannot determine the cause of the job execution failed errors that appear in the
mapping log, you can clear the contents of the following directory on the machine that hosts the virtual
machine: /tmp/infa. Then, run the mapping again.