
Talend Real-Time Big Data Sandbox

Big Data Insights Cookbook


Overview of Real-time Big Data Sandbox | Pre-requisites to Run Sandbox | Sandbox Setup & Configuration | Obtaining a Talend License | Demo (Scenario)

About this cookbook

What is the Talend Cookbook?

Using the Talend Real-Time Big Data Platform, this Cookbook provides step-by-step instructions to build and run an end-to-end integration scenario.

The demo is built on a real-world use case in the retail industry and demonstrates how Talend, Spark, NoSQL and real-time messaging can be easily used together to provide real-time “offers” as part of an online shopping experience.

Whether batch, streaming or real-time integration, understand how Talend can be used to address your big data challenges and move you into and beyond the sandbox stage.

About Talend

What does Talend offer?

At Talend, it’s our mission to connect the data-driven enterprise, so our customers can operate in real-time with new insight about their
customers, markets and business.

• Talend helps companies with big data challenges with the most advanced big data integration platform, used by businesses to deliver timely and easy access to all their data.

• Talend provides the industry’s first data integration platform with native support for Apache Spark, Spark Streaming and Hadoop.

• Talend delivers unmatched data processing speed and enables any company to convert streaming big data or IoT sensor information into immediately actionable insights.

About Talend Big Data

1st Data Integration Platform on Apache Spark

Visually develop jobs that run 100% on Spark:
• 5X faster using independent benchmarks
• 10X developer productivity gained over hand-coding Spark
• 100X faster with in-memory processing

Over 100 new drag-n-drop Spark components:
• HDFS, RDBMS, NoSQL, Cloud Storage, Transformation, Messaging, In-memory analytics & machine learning recommendations, and much more
• In-memory data caching & “windowed” computations
• Click to enable Spark Streaming for real-time data processing

Convert Talend MapReduce jobs to Spark with the click of a button, future-proofing your investment.

What is the Big Data Sandbox?

Virtual Environment

The Talend Real-Time Big Data Sandbox is a virtual environment that combines the Talend Real-Time Big Data Platform with some sample scenarios pre-built and ready-to-run.

Sandbox Examples

See how Talend can turn data into real-time decisions through sandbox examples that integrate Apache Kafka, Spark, Spark Streaming, Hadoop and NoSQL.

What pre-requisites are required to run the Sandbox?

Talend Platform for Big Data includes a graphical IDE (Talend Studio), teamwork management, data quality, and advanced big data features.

To see a full list of features, please visit Talend’s website: http://www.talend.com/products/platform-for-big-data

You will need a Virtual Machine player such as VMware Player, which can be downloaded from the VMware Player site. Follow the VM Player install instructions from the provider.

The recommended host machine:

Memory:     8GB
Disk Space: 20GB (10GB is for the image download)
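
As a rough aid, the recommended figures above can be checked with a short script before downloading the image. This is a minimal sketch, not part of the Sandbox: the function names are illustrative, and installed memory is entered by hand because the Python standard library offers no portable way to read it.

```python
import shutil

# Recommended host resources quoted in this cookbook.
RECOMMENDED_MEMORY_GB = 8
RECOMMENDED_DISK_GB = 20  # 10GB of this is for the image download


def meets_recommendation(memory_gb, disk_gb):
    """Report, per resource, whether the host meets the recommendation."""
    return {
        "memory": memory_gb >= RECOMMENDED_MEMORY_GB,
        "disk": disk_gb >= RECOMMENDED_DISK_GB,
    }


def free_disk_gb(path="."):
    """Free space, in GB, on the drive that holds `path`."""
    return shutil.disk_usage(path).free / 1024 ** 3


if __name__ == "__main__":
    # Hypothetical value: replace with the RAM installed on your host.
    host_memory_gb = 16
    print(meets_recommendation(host_memory_gb, free_disk_gb()))
```

Pass the path of the drive where you plan to store the VM disk to `free_disk_gb` so that the right volume is measured.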

How do I set-up & configure Sandbox?

Download the Sandbox Virtual Machine file at www.talend.com/talend-big-data-sandbox.

You will receive an email with a license key attachment and a second email with a list of support resources and videos.

Follow the steps below to install and configure your Big Data Sandbox:

1. Open the VMware Player.
2. Click on “Open a Virtual Machine”.
3. Find the .ova file that you downloaded. Select it and click Open.
4. Select where you would like the disk to be stored on your local host machine, e.g. C:/vmware/sandbox.
5. Click on “Import”.

Note: The Username/Sudo Username = talend; Password = talend.

Having trouble with Sandbox configuration settings? See the troubleshooting guide.

How do I set-up & configure Sandbox? (cont.)

6. Edit Settings if needed:
   a) Right-click the NAT icon in the upper-right corner and select Settings. Check the settings to make sure the memory and processors are not too high for your host machine.
   b) It is recommended to have 8GB or more allocated to the Sandbox VM; it runs very well with 10GB if your host machine can afford the memory.

7. The “NAT” Network Adapter should already be configured for your VM. If it is not, you can add it by following the steps below:
   a) Click “Add”.
   b) Select Network Adapter: “NAT” and select “Next”.
   c) Once finished, select Finish to return to the main Player home page.

8. Start the VM.

How do I set-up & configure Sandbox? (cont.)

Follow the steps below to install and configure your Big Data Sandbox (cont.):

1. Click on “Play Virtual Machine”.
2. The virtual machine starts loading.

How do I set-up & configure Sandbox? (cont.)

Follow the steps below to install and configure your Big Data Sandbox (cont.):

1. Once the virtual machine has finished loading, you are brought to the login screen. Enter the password “talend” to continue.

How do I set up the Talend license on the virtual machine?

You should have been provided a license file by your Talend representative or by an automatic email from the Talend Real-time Big Data Sandbox program.

If you did not receive a license key, you can obtain one here: https://info.talend.com/prodevaltpbdrealtimesandboxdrive.html



How do I set up the Talend license on the virtual machine? (cont.)

This license file is required to open the Talend Studio and must reside within the VM.

To get the license file on the VM:

1. Click the Download button of the license key document and click Save As to save it on your laptop in a place where you will be able to find it.
2. In the Virtual Player, click Files.
3. Double-click the “Documents” folder.
4. Locate the license key document and drag-and-drop it into the Documents folder on the Virtual Player.

Important Notes:

For VirtualBox users, there is a known issue with drag-and-drop functionality. The easiest way to get the Talend license file onto the VM is by saving it to a cloud storage site such as Dropbox.com or sending it to a web-based email client that you have access to (such as Gmail, Yahoo, Hotmail, etc.), then navigating to that location from within the Virtual Machine web browser to download the file.

Real-time Recommendation Demo

In this Demo you will see a simple version of making your website an Intelligent Application.

You will experience:

• Building a Spark Recommendation Model
• Setting up a new Kafka topic to help simulate live web traffic coming from live web users browsing a retail web store.
• Most important, you will see first-hand with Talend how you can take streaming data and turn it into real-time recommendations to help improve shopping cart sales.

[Figure: architecture diagram. Labels: Customer Channels (Email, Website, Store); Internal Systems (POS, Clickstream, …); Streaming; Spark Engine (Recommendation); NoSQL (Recommendations); Shopping Cart; Window Updates]

The following Demo will help you see the value that using Talend can bring to your big data projects:
The Real-time Recommendation Demo is designed to illustrate the simplicity and flexibility Talend brings to using Spark in your Big Data Architecture.

Real-time Recommendation Demo

In this Demo, you will see how you can…

• Create a Kafka Topic: Create a Kafka topic to produce and consume real-time streaming data.
• Create a recommendation model: Create a Spark recommendation model based on specific user actions.
• Stream Live Recommendations Pipeline: See live recommendations streamed to a Cassandra NoSQL database for “Fast Data” access from a WebUI.

If you are familiar with the ALS model, you can update the ALS parameters to enhance the model or just leave the default values.

Real-time Recommendation Demo (REQUIRED)

Running a shell script:

1. From the Desktop, double-click on the “Start_Kafka” icon. If prompted for a password, enter talend.
2. You can stop Kafka at any time by double-clicking on “Stop_Kafka”. If prompted for a password, enter talend.

Real-time Recommendation Demo (REQUIRED)

Starting Talend Studio:

The first time you start up Talend Studio, you have to browse for the license:

1. To begin, click on “Talend-Studio”.
2. Click “My product license is on the local file system”, then click “Browse”.
3. Navigate to the “Documents” folder. Click on the license file you downloaded.
4. Click “OK”, then click “Next”.
5. The Talend Real-Time Big Data Platform window pops up. Let it load, and when complete click “Finish”.

Real-time Recommendation Demo


To execute the Real-time Recommendation Demo:

First, a Kafka topic must be created. This task can be completed by executing the following job:

1. Navigate to the “job designs” folder.
2. Click on Standard Jobs > Realtime_Recommendation_Demo.
3. Double-click on OneTime_Create_Clickstream_Kafka_Topic 0.1. This opens the job in the designer window.
4. From the Run tab, click on Run to execute.

Now you can generate the recommendation model by loading the product ratings data into the Alternating Least Squares (ALS) algorithm. Rather than coding a complex algorithm with Scala, a single Spark component available in Talend Studio simplifies the model creation process. The resultant model can be stored in HDFS or, in this case, locally.

If you are familiar with the ALS model, you can update the ALS parameters to enhance the model or just leave the default values.
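
For readers new to ALS, the idea can be sketched in a few lines of plain Python. This is only an illustration, not what the Talend component or Spark runs: it uses a single latent factor (rank 1), a tiny made-up ratings dictionary, and illustrative parameter names (`iterations`, `reg`) that are assumptions rather than the labels shown in Talend Studio.

```python
def als_rank1(ratings, n_users, n_items, iterations=10, reg=0.1):
    """Rank-1 alternating least squares.

    ratings maps (user, item) -> rating. Returns (user_factors, item_factors).
    """
    u = [1.0] * n_users
    v = [1.0] * n_items
    for _ in range(iterations):
        # Hold item factors fixed; solve each user's factor in closed form.
        for i in range(n_users):
            rated = [(j, r) for (ui, j), r in ratings.items() if ui == i]
            u[i] = sum(r * v[j] for j, r in rated) / (
                sum(v[j] ** 2 for j, r in rated) + reg)
        # Hold user factors fixed; solve each item's factor in closed form.
        for j in range(n_items):
            rated = [(i, r) for (i, ij), r in ratings.items() if ij == j]
            v[j] = sum(r * u[i] for i, r in rated) / (
                sum(u[i] ** 2 for i, r in rated) + reg)
    return u, v


def loss(ratings, u, v, reg=0.1):
    """Regularized squared error that each ALS step reduces."""
    fit = sum((r - u[i] * v[j]) ** 2 for (i, j), r in ratings.items())
    return fit + reg * (sum(x * x for x in u) + sum(x * x for x in v))


def recommend(user, ratings, u, v, n_items):
    """Items the user has not rated, best predicted score first."""
    unrated = [j for j in range(n_items) if (user, j) not in ratings]
    return sorted(unrated, key=lambda j: u[user] * v[j], reverse=True)


# Made-up sample data: 3 users, 3 products.
ratings = {(0, 0): 5, (0, 1): 4, (1, 0): 4, (1, 2): 2, (2, 1): 5, (2, 2): 1}
u, v = als_rank1(ratings, n_users=3, n_items=3)
print(recommend(0, ratings, u, v, n_items=3))
```

Raising the number of iterations or changing the regularization value alters how closely the factors fit the known ratings, which is the kind of trade-off the ALS parameters control.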

Real-time Recommendation Demo

Run the job to generate the recommendation model:

1. Navigate to the “job designs” folder.
2. Click on BigData batch > Realtime_Recommendations_Demo.
3. Double-click on Build_Recommendation_Model_with_Spark. This opens the job in the designer window.
4. From the Run tab, click on Run to execute.

With the Recommendation model created, your lookup tables populated and your Kafka topic ready to consume data, you can now stream your Clickstream data into your Recommendation model and put the results into your Cassandra tables for reference from a WebUI.

Real-time Recommendation Demo

1. Navigate to the “job designs” folder.
2. Click on Standard Jobs > Realtime_Recommendations_Demo.
3. Double-click on Push_Clickstream_To_Kafka 0.1. This opens the job in the designer window.

First, let’s look quickly at the Push_Clickstream_To_Kafka job.

This job is set up to simulate real-time streaming of web traffic and clickstream data into a Kafka topic that will then be consumed by our recommendation engine to produce our recommendations.

We are reviewing this job now. It will be executed in the next few steps.
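
To make the producer side concrete, here is a minimal sketch of what simulating a clickstream amounts to. It does not use Kafka or reproduce the Talend job: an in-memory `queue.Queue` stands in for the topic, and the event fields (`seq`, `user_id`, `product_id`, `action`) are hypothetical, not the schema used in the Sandbox.

```python
import json
import queue
import random


def simulate_clickstream(topic, n_events, user_ids, product_ids, seed=0):
    """Put n_events JSON-encoded click events onto the stand-in topic."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    for seq in range(n_events):
        event = {
            "seq": seq,
            "user_id": rng.choice(user_ids),
            "product_id": rng.choice(product_ids),
            "action": rng.choice(["view", "add_to_cart"]),
        }
        topic.put(json.dumps(event))


def consume(topic):
    """Drain the stand-in topic and decode each message in arrival order."""
    events = []
    while not topic.empty():
        events.append(json.loads(topic.get()))
    return events


topic = queue.Queue()  # in-memory stand-in for a Kafka topic
simulate_clickstream(topic, 5, user_ids=[1, 2], product_ids=[10, 11, 12])
for event in consume(topic):
    print(event)
```

In the demo itself the producer and the consumer are separate jobs connected by the Kafka topic, so either side can be started or stopped independently.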

Real-time Recommendation Demo

1. Navigate to the “job designs” folder.
2. Click on Big Data Streaming > Realtime_Recommendation_Demo.
3. Double-click on Realtime_Recommendation_Engine_Pipeline 0.1. This opens the job in the designer window.

Real-time Recommendation Demo

Next, take a look at the Realtime_Recommendation_Engine_Pipeline job.

• In this job, you will see the input is your Kafka Consumer of Clickstream Data.
• The data will be fed into your Recommendation Engine to produce real-time “offers” based on the current user’s activity.
• Using the tWindow component, you can control how often you send recommendations.
• Your recommendations are sent to 3 output streams: the execution window for viewing purposes, a flat file for later processing in your Big Data Analytics environment, and a Cassandra table for use in your “Fast Data” layer by your WebUI.

Click on “Run” to start the Recommendation Engine.
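
The windowing and three-way output described above can be pictured with a small sketch. It is plain Python, not Spark Streaming or the tWindow component: `window_length` and `slide_interval` are illustrative names for the two settings a time-based window typically has, a list stands in for the execution window, `io.StringIO` for the flat file, and a dictionary for the Cassandra table.

```python
import io


def windowed(events, window_length, slide_interval):
    """Group (timestamp, payload) events into time windows.

    A window is emitted every `slide_interval` seconds and covers the
    previous `window_length` seconds. Events must be sorted by timestamp.
    """
    if not events:
        return []
    last = events[-1][0]
    results = []
    end = slide_interval
    while end - slide_interval <= last:
        start = end - window_length
        payloads = [p for t, p in events if start <= t < end]
        results.append((end, payloads))
        end += slide_interval
    return results


def fan_out(recommendations, log_lines, flat_file, table):
    """Send each (user_id, product_id) pair to three sinks."""
    for user_id, product_id in recommendations:
        log_lines.append(f"user {user_id} -> product {product_id}")  # viewing
        flat_file.write(f"{user_id},{product_id}\n")  # later batch analysis
        table[user_id] = product_id  # latest recommendation per user


events = [(0, (1, 10)), (1, (2, 11)), (2, (1, 12)), (3, (2, 13))]
log_lines, flat_file, table = [], io.StringIO(), {}
for window_end, recs in windowed(events, window_length=2, slide_interval=2):
    fan_out(recs, log_lines, flat_file, table)
print(table)
```

A shorter slide interval sends recommendations more often, while a longer window lets each batch draw on more of the user's recent activity.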

Real-time Recommendation Demo


With your recommendation engine running, you can start sending data to your Kafka topic.

1. Navigate back to the Push_Clickstream_To_Kafka job.
2. Click “Run” on the Run tab to execute.
3. Once this job starts, switch back over to the Recommendation Engine job.
4. Watch the execution output window. You will now see your real-time data coming through with recommended products based on your Recommendation Model.
5. Once you have seen the results, you can “kill” the Recommendation Engine to stop the streaming recommendations.

Your recommendations are also written to a Cassandra database so they can be referenced by a WebUI to offer, for instance, last-minute product suggestions when a customer is about to check out.

Conclusion

Product recommendations have evolved…

• ETL – It would take weeks to gather and process the required data
• MapReduce – Now you can process even more data than before in hours rather than days and weeks
• Spark – NOW you can process even more in minutes and even seconds

The good news is that with Talend, it is now just a few clicks to make this type of transformation a reality.

What are your next steps?

Now that you understand how you can address your big data opportunities using Talend, the next step would be to discuss with your Talend sales representative your specific requirements and how Talend can help “Jumpstart” your big data project into production.

Let’s take one final look at how Talend will help you…

Conclusion

How will Talend help you?

Talend vastly simplifies big data integration

First, Talend vastly simplifies big data integration, allowing you to leverage in-house resources to use Talend's rich graphical tools that generate big data code (Spark, MapReduce, PIG, Java) for you.

Talend is based on standards such as Eclipse, Java, and SQL, and is backed by a large collaborative community.

So you can upskill existing resources instead of finding new resources.

Talend is built for batch and real-time big data

Second, Talend is built for batch and real-time big data. Unlike other solutions that “map” to big data or support a few components, Talend is the first data integration platform built on Spark with over 100 Spark components.

Whether integrating batch (MapReduce, Spark), streaming (Spark), NoSQL, or in real-time, Talend provides a single tool for all your integration needs.

Talend’s native Hadoop data quality solution delivers clean and consistent data at infinite scale.

Talend lowers operations costs

And third, Talend lowers operations costs.

Talend’s zero footprint solution takes the complexity out of integration deployment, management, and maintenance.

A usage-based subscription model provides a fast return on investment without large upfront costs.
