Talend Preparing Metadata For HDFS Connection

This document provides instructions for preparing file metadata in the Talend Repository by retrieving the schema of files stored in HDFS. It describes retrieving the schema of the movies.csv file stored in HDFS to define its metadata in the Repository, so that the schema can be reused by other Big Data components without having to manually define the parameters. The steps include expanding the Hadoop cluster and HDFS connection nodes, right clicking the HDFS connection, selecting "Retrieve Schema" from the menu, browsing and selecting the movies.csv file, reviewing the retrieved schema, and clicking "Finish" to validate and see the file metadata under the HDFS connection in the Repository tree view.

Uploaded by

geoinsys

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as ODT, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

160 views4 pages

Talend Preparing Metadata For HDFS Connection

Uploaded by

geoinsys

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as ODT, PDF, TXT or read online on Scribd

4.7.

Preparing file metadata

In the Repository, setting up the metadata of a file stored in HDFS allows you to directly reuse its
schema in a
related Big Data component without having to define each related parameter manually.
Prerequisites:
You have launched your Talend Studio and opened the Integration perspective.
The source files movies.csv and directors.txt have been uploaded into HDFS as explained in
Uploading files
to HDFS.
Talend Open Studio for Big Data Getting Started Guide
21Preparing file metadata
The connection to the Hadoop cluster to be used and the connection to the HDFS system of this
cluster have
been set up from the Hadoop cluster node in the Repository.
If you have not done so, see Setting up Hadoop connection manually and then Setting up
connection to HDFS
to create these connections.
The Hadoop cluster to be used has been properly configured and is running and you have the
proper access
permission to that distribution and the HDFS folder to be used.
You have ensured that the client machine on which the Talend Studio is installed can recognize the
host names
of the nodes of the Hadoop cluster to be used. For this purpose, add the IP address/hostname
mapping entries
for the services of that Hadoop cluster in the hosts file of the client machine.
For example, if the host name of the Hadoop Namenode server is talend-cdh550.weave.local and its
IP address
is 192.168.x.x, the mapping entry reads 192.168.x.x talend-cdh550.weave.local.
Since the movies.csv file you need to process has been stored in the HDFS system being used, you
can retrieve
its schema to set up its metadata in the Repository.
The schema of the directors.txt file can also be retrieved, but is intentionally ignored in the retrieval
procedure
explained below, because in this scenario, this directors.txt file is used to demonstrate how to
manually define
a schema in a Job.
1. Expand the Hadoop cluster node under Metadata in the Repository tree view.
2. Expand the Hadoop connection you have created and then the HDFS folder under it.
In this example, it is the my_cdh Hadoop connection.
3.
Right click the HDFS connection in this HDFS folder and from the contextual menu, select Retrieve
schema.
In this scenario, this HDFS connection has been named to cdh_hdfs.
A [Schema] wizard is displayed, allowing you to browse to files in HDFS.
4. Expand the file tree to show the movies.csv file, from which you need to retrieve the schema, and
select it.
In this scenario, the movies.csv file is stored in the following directory: /user/ychen/input_data.
5.
Click Next to display the retrieved schema in the wizard.
The schema of the movie data is displayed in the wizard and the first row of the data is
automatically used
as the column names.
If the first row of the data you are using is not used this way, you need to review how you set the
Header
configuration when you were creating the HDFS connection as explained in Setting up connection
to HDFS.
6. Click Finish to validate these changes.
You can now see the file metadata under the HDFS connection you are using in the Repository tree
view.

HDFS: Overview and Command Guide
No ratings yet
HDFS: Overview and Command Guide
19 pages
1 Hdfs Notes
No ratings yet
1 Hdfs Notes
38 pages
BigData Unit2
No ratings yet
BigData Unit2
80 pages
Hadoop HDFS Setup and Commands Guide
No ratings yet
Hadoop HDFS Setup and Commands Guide
35 pages
3a HDFS
No ratings yet
3a HDFS
17 pages
Unit 3
No ratings yet
Unit 3
61 pages
Lecture Notes On Big-Data Storage
No ratings yet
Lecture Notes On Big-Data Storage
9 pages
HDFS Basics for Tech Professionals
No ratings yet
HDFS Basics for Tech Professionals
13 pages
6 - HDFS
No ratings yet
6 - HDFS
37 pages
Big Data Class Activity Assignment 2
No ratings yet
Big Data Class Activity Assignment 2
17 pages
HDFS Commands Updated
No ratings yet
HDFS Commands Updated
87 pages
3 Hadoop
No ratings yet
3 Hadoop
40 pages
Understanding HDFS Architecture and Commands
No ratings yet
Understanding HDFS Architecture and Commands
8 pages
HDFS Commnads
No ratings yet
HDFS Commnads
26 pages
Hadoop MapReduce Overview and Commands
100% (1)
Hadoop MapReduce Overview and Commands
129 pages
HDFS Overview and Command-Line Guide
No ratings yet
HDFS Overview and Command-Line Guide
29 pages
Chapter 4 - Hadoop Ecosystem
No ratings yet
Chapter 4 - Hadoop Ecosystem
24 pages
Hadoop on EMC Isilon Configuration Guide
No ratings yet
Hadoop on EMC Isilon Configuration Guide
5 pages
HDFS Overview and Command Guide
No ratings yet
HDFS Overview and Command Guide
5 pages
Hadoop & HDFS for Data Engineers
No ratings yet
Hadoop & HDFS for Data Engineers
6 pages
HDFS Operations and Java API Guide
No ratings yet
HDFS Operations and Java API Guide
6 pages
Hadoop
No ratings yet
Hadoop
71 pages
Yahoo Hadoop Tutorial
No ratings yet
Yahoo Hadoop Tutorial
28 pages
HDFS Overview and Command Guide
No ratings yet
HDFS Overview and Command Guide
25 pages
Unit-2 Introduction To Hadoop
No ratings yet
Unit-2 Introduction To Hadoop
19 pages
Big Data and HDFS Overview Guide
No ratings yet
Big Data and HDFS Overview Guide
14 pages
CS441 Final Term Preparation Answers
100% (1)
CS441 Final Term Preparation Answers
4 pages
HDFS Guide for Developers
No ratings yet
HDFS Guide for Developers
49 pages
HDFS Concepts and Command Line Guide
No ratings yet
HDFS Concepts and Command Line Guide
42 pages
Hadoop
No ratings yet
Hadoop
51 pages
Final Bda 1-8 Lab Aayush
No ratings yet
Final Bda 1-8 Lab Aayush
17 pages
Hadoop Java Training Guide
No ratings yet
Hadoop Java Training Guide
26 pages
Introduction to Hadoop Ecosystem
No ratings yet
Introduction to Hadoop Ecosystem
13 pages
Hadoop Training with CentOS VM Setup
No ratings yet
Hadoop Training with CentOS VM Setup
26 pages
kh5 (Bda) Merged
No ratings yet
kh5 (Bda) Merged
21 pages
Hadoop & Hive Essentials
No ratings yet
Hadoop & Hive Essentials
30 pages
Wa Introhdfs PDF
No ratings yet
Wa Introhdfs PDF
11 pages
Hadoop FS Shell Command Overview
No ratings yet
Hadoop FS Shell Command Overview
8 pages
BDA Exp 1
No ratings yet
BDA Exp 1
7 pages
HDFS
No ratings yet
HDFS
18 pages
Hadoop Linux Commands
No ratings yet
Hadoop Linux Commands
8 pages
Understanding Hadoop HDFS Architecture
No ratings yet
Understanding Hadoop HDFS Architecture
20 pages
Introduction To HDFS
No ratings yet
Introduction To HDFS
18 pages
Overview of HDFS Architecture
No ratings yet
Overview of HDFS Architecture
20 pages
BDA UNIT-2dhhhhbv
No ratings yet
BDA UNIT-2dhhhhbv
23 pages
HDFS fsck Command Overview
No ratings yet
HDFS fsck Command Overview
7 pages
Hadoop and IBM Big Insights Overview
No ratings yet
Hadoop and IBM Big Insights Overview
112 pages
Big Data Unit - 2
No ratings yet
Big Data Unit - 2
18 pages
@bigdatalabfile 09
No ratings yet
@bigdatalabfile 09
35 pages
Understanding Hadoop Cluster and HDFS
No ratings yet
Understanding Hadoop Cluster and HDFS
64 pages
Apache Hive Installation and Basic Usage Guide
No ratings yet
Apache Hive Installation and Basic Usage Guide
10 pages
HDFS Queries in Hadoop Ecosystem
No ratings yet
HDFS Queries in Hadoop Ecosystem
5 pages
Introduction to Hadoop Basics
No ratings yet
Introduction to Hadoop Basics
26 pages
Hafs Commands
No ratings yet
Hafs Commands
17 pages
Big Data Hadoop HDFS
No ratings yet
Big Data Hadoop HDFS
32 pages
HDFS Shell Commands On AWS
No ratings yet
HDFS Shell Commands On AWS
12 pages
L2 Accessing HDFS On Cloudera Distribution
No ratings yet
L2 Accessing HDFS On Cloudera Distribution
5 pages
Customer and Supplier Data Mapping
No ratings yet
Customer and Supplier Data Mapping
37 pages
Linuxfun PDF
No ratings yet
Linuxfun PDF
365 pages
Chapter 4: Creating Simple Queries: 4.1 Filtering and Sorting Data
No ratings yet
Chapter 4: Creating Simple Queries: 4.1 Filtering and Sorting Data
76 pages
EAWB Product Sheet
No ratings yet
EAWB Product Sheet
2 pages
Big Data Crash Course: 3-Day Overview
No ratings yet
Big Data Crash Course: 3-Day Overview
3 pages
Couchbase Installation Guide for Windows & Linux
No ratings yet
Couchbase Installation Guide for Windows & Linux
1 page
Gasue34 003 PDF
No ratings yet
Gasue34 003 PDF
270 pages
Gasue34 003 PDF
No ratings yet
Gasue34 003 PDF
270 pages
Data Warehousing and OLAP Basics
No ratings yet
Data Warehousing and OLAP Basics
50 pages
Text Clustering Case Study
No ratings yet
Text Clustering Case Study
1 page
UNIX Shell Script Standards For PowerMart 622
No ratings yet
UNIX Shell Script Standards For PowerMart 622
3 pages
MicroStrategy Education Catalog PDF
100% (1)
MicroStrategy Education Catalog PDF
72 pages
Apache Spark: Unified Framework for Data Processing
No ratings yet
Apache Spark: Unified Framework for Data Processing
6 pages
Placement-Assured IT Training Programs
No ratings yet
Placement-Assured IT Training Programs
3 pages
Anna University B.E./B.Tech. Programs
No ratings yet
Anna University B.E./B.Tech. Programs
2 pages
3.2 Informatica - SCD
No ratings yet
3.2 Informatica - SCD
3 pages
Talend Tutorial12 Writing and Reading Data in HDFS
No ratings yet
Talend Tutorial12 Writing and Reading Data in HDFS
4 pages
Talend Tutorial8 Using Condition Based Filters
No ratings yet
Talend Tutorial8 Using Condition Based Filters
2 pages
Talend Data Integration Tutorial
No ratings yet
Talend Data Integration Tutorial
2 pages
Talend Tutorial7 Configuring Joins in TMap
No ratings yet
Talend Tutorial7 Configuring Joins in TMap
2 pages
Talend Metadata Creation and Usage Guide
No ratings yet
Talend Metadata Creation and Usage Guide
4 pages
Talend Tutorial9 Using Context Variables
No ratings yet
Talend Tutorial9 Using Context Variables
2 pages
Talend tMap Data Filtering Guide
No ratings yet
Talend tMap Data Filtering Guide
2 pages
Big Data Integration with Talend Studio
No ratings yet
Big Data Integration with Talend Studio
8 pages
Talend CSV File Reading Tutorial
No ratings yet
Talend CSV File Reading Tutorial
2 pages
Talend Studio Data Integration Guide
No ratings yet
Talend Studio Data Integration Guide
2 pages
Smart Postpaid Billing Statement
No ratings yet
Smart Postpaid Billing Statement
4 pages
UGC NET Paper I Solved Questions 2005
No ratings yet
UGC NET Paper I Solved Questions 2005
12 pages
Assignment 4
No ratings yet
Assignment 4
6 pages
Concept Study Extension of IO-Link For Single Pair Ethernet Transmission
No ratings yet
Concept Study Extension of IO-Link For Single Pair Ethernet Transmission
20 pages
S32G2PFEPB
No ratings yet
S32G2PFEPB
12 pages
AJP Sample Oral Questions
No ratings yet
AJP Sample Oral Questions
3 pages
4G LTE FDD-TDD Carrier Aggregation Guide
100% (2)
4G LTE FDD-TDD Carrier Aggregation Guide
21 pages
Automotive Diagnostic Tool Buyers
No ratings yet
Automotive Diagnostic Tool Buyers
3 pages
CCBoot Manual - System Requirements
No ratings yet
CCBoot Manual - System Requirements
36 pages
TOEIC Writing A Sentence Based On Picture
100% (1)
TOEIC Writing A Sentence Based On Picture
21 pages
Troubleshooting Oracle Clusterware Guide
No ratings yet
Troubleshooting Oracle Clusterware Guide
9 pages
SoMachine Basic - Operating Guide
No ratings yet
SoMachine Basic - Operating Guide
206 pages
UBC1326AA00 Wireless eMTA Installation Guide
No ratings yet
UBC1326AA00 Wireless eMTA Installation Guide
4 pages
Assignment CS206
No ratings yet
Assignment CS206
2 pages
XC220 G3v
No ratings yet
XC220 G3v
6 pages
Apc320 R1.30 en Ori
No ratings yet
Apc320 R1.30 en Ori
9 pages
3500 Galvanic Isolator Interface: Manual
100% (1)
3500 Galvanic Isolator Interface: Manual
36 pages
J Link
100% (1)
J Link
68 pages
Hardware Installation Guide (N610E) - (V100R006C00 - 02)
No ratings yet
Hardware Installation Guide (N610E) - (V100R006C00 - 02)
42 pages
FLEX I/O EtherNet/IP Adapter Guide
No ratings yet
FLEX I/O EtherNet/IP Adapter Guide
12 pages
Commander SK Catalog PDF
No ratings yet
Commander SK Catalog PDF
12 pages
Nessus 7.1.x User Guide: Last Updated: March 26, 2019
No ratings yet
Nessus 7.1.x User Guide: Last Updated: March 26, 2019
387 pages
Wireless Extender Manual
No ratings yet
Wireless Extender Manual
2 pages
Protect Yourself from Fake Profiles
No ratings yet
Protect Yourself from Fake Profiles
3 pages
ESET Smart Security 9
No ratings yet
ESET Smart Security 9
6 pages
AX1500 Specification
No ratings yet
AX1500 Specification
4 pages
TL-SG108E (UN) V3 IG 1478166819649d
No ratings yet
TL-SG108E (UN) V3 IG 1478166819649d
2 pages
Saphira 5.3 Manual PDF
No ratings yet
Saphira 5.3 Manual PDF
105 pages
Optus Push to Talk: Instant Mobile Communication
No ratings yet
Optus Push to Talk: Instant Mobile Communication
2 pages
ECE512 1stExamTopics1
No ratings yet
ECE512 1stExamTopics1
104 pages

Talend Preparing Metadata For HDFS Connection

Uploaded by

Talend Preparing Metadata For HDFS Connection

Uploaded by

4.7.

Preparing file metadata

You might also like