
Big Data Huawei Course

Loader
NOTICE
This document was generated from Huawei study material. Treat the
information in this document as supporting material.



Table of Contents

1. What is Loader
2. Application Scenarios of Loader
3. Position of Loader in FusionInsight
4. Features of Loader
5. Module Architecture of Loader
   5.1. Module Description
6. Service Status WebUI of Loader
7. Job Management WebUI of Loader
   7.1. Job Conversion Rules
8. Creating a Loader Job – Basic Information
   8.1. Creating a Loader Job – From
   8.2. Creating a Loader Job – Transform
   8.3. Creating a Loader Job – To
9. Monitoring Job Execution Status
   9.1. Job Execution History
   9.2. Dirty Data
   9.3. Job Execution Failure Alarm



Loader – Huawei Course
1. What is Loader

• Loader is a data loading tool. Functional enhancements have been made to Loader based
on the open-source Sqoop.
• Loader is used to exchange data and files between FusionInsight Hadoop, relational
databases, and file systems.

2. Application Scenarios of Loader



• Loader can import data from relational databases or file servers to the HBase/HDFS of
FusionInsight, or export data from HBase/HDFS to relational databases or file
servers.
• Note that Loader currently does not support data export from Hive.

3. Position of Loader in FusionInsight

• The figure shows the position of Loader in FusionInsight: it is used to exchange data and
files between FusionInsight Hadoop, relational databases, and file systems.

4. Features of Loader

• Loader manages jobs on the WebUI and also provides a CLI (command-line interface) to
meet customer requirements for program scheduling and script automation.
• Loader uses MapReduce for parallel data processing. Job parameters affect how
MapReduce splits the input; therefore, proper parameter configuration is required to
ensure optimal data import performance.
• In addition, Loader servers are deployed in active/standby mode, which ensures high
reliability.
• Secure Loader versions are configured in a unified manner by FusionInsight Manager.

5. Module Architecture of Loader

• Loader consists of the following components:



o The Loader client provides two types of interactive interface: a web user interface
and a command-line interface.
o The Loader server processes operation requests sent from the client, manages
connectors and metadata, submits MapReduce jobs, and monitors MapReduce job
status.
o The REST API provides a representational state transfer interface, based on HTTP
and JSON, to process operation requests from the client (see the sketch below).
o The Job Scheduler periodically executes Loader jobs. There are three types of
engines:
  - The Transform Engine is a data transformation engine that supports operations
    such as field combination and string reversing.
  - The Execution Engine executes Loader jobs in MapReduce mode.
  - The Submission Engine submits Loader jobs to MapReduce.
o The Job Manager manages Loader jobs, including creating, querying, updating,
deleting, activating, deactivating, starting, and stopping jobs.
o The Metadata Repository stores and manages data about Loader connectors,
transformation procedures, and jobs.
o The HA Manager manages the active/standby status of Loader servers. Two Loader
servers are deployed in active/standby mode.
• Loader implements parallel import and export jobs using MapReduce. Some import or
export jobs involve only Map operations, while others involve both Map and Reduce
operations.
• Loader implements fault tolerance using MapReduce: jobs can be rescheduled when job
execution fails.
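The following is a minimal sketch of how a client could talk to an HTTP/JSON interface of this kind. The endpoint URL, path, and payload fields are hypothetical placeholders invented for illustration; they are not Loader's documented REST API.

    import json
    import urllib.request

    # Hypothetical endpoint and payload -- placeholders, not Loader's real API.
    LOADER_URL = "https://loader-server.example.com/loader/v1/jobs"

    def submit_job(job_name: str, job_type: str) -> dict:
        """POST a JSON job-submission request and return the parsed JSON reply."""
        payload = json.dumps({"name": job_name, "type": job_type}).encode("utf-8")
        request = urllib.request.Request(
            LOADER_URL,
            data=payload,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)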

5.1. Module Description

• This page of the course material describes each module of Loader in detail.

6. Service Status WebUI of Loader

• This is a web page of FusionInsight Manager where we can see the health status of the
Loader components. We can also perform operation and maintenance tasks such as
starting or stopping services and downloading clients.
• Click LoaderServer (Active) here to go to the Loader job management page.

7. Job Management WebUI of Loader



• A job describes the process of extracting, transforming, and loading data from the
data source to the target end.
• Loader provides many functions to manage job-related operations, including creating/
importing/exporting/starting/stopping/copying/deleting jobs, migrating job groups,
deleting jobs in batches, and viewing job history.

7.1. Job Conversion Rules

• Loader provides various job conversion rules for cleaning data and converting it into the
target data structure in different service scenarios. If no conversion is required in the
actual application, no conversion rules need to be specified, apart from the conversion
operators mentioned here.
• Loader also provides the following operators (illustrated by the sketch after this list):
o EL Operation: specifies an algorithm to calculate field values.
o String Operation: converts existing fields between uppercase and lowercase to
generate new fields.
  - String Reverse: reverses existing string fields to generate new fields.
  - String Trim: clears the spaces contained in existing string fields to generate new
    fields.
o Filter Rows: filters rows that match the triggering conditions by configuring logical
conditions.
o Update Fields: updates field values when certain conditions are met.
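As an illustration of what some of these operators do to fields and rows, here is a minimal Python sketch. The function names and the list-of-fields row format are assumptions made for this example and do not reflect Loader's internal implementation.

    # Illustrative equivalents of some Loader conversion operators.
    # Function names and row format are assumptions for this sketch.

    def string_trim(field: str) -> str:
        # String Trim: clear surrounding spaces to generate a new field.
        return field.strip()

    def string_reverse(field: str) -> str:
        # String Reverse: reverse an existing string field.
        return field[::-1]

    def string_upper(field: str) -> str:
        # String Operation: convert case to generate a new field.
        return field.upper()

    def filter_rows(rows, condition):
        # Filter Rows: keep only rows matching a logical condition.
        return [row for row in rows if condition(row)]

    rows = [["  alice ", "30"], ["bob", "17"]]
    cleaned = [[string_trim(field) for field in row] for row in rows]
    adults = filter_rows(cleaned, lambda row: int(row[1]) >= 18)
    print(adults)  # [['alice', '30']]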

8. Creating a Loader Job – Basic Information

• Then, how do we create a Loader job? The following describes an example of loading an
SFTP file into HDFS.
• To create a new job, the first step is to configure the basic information, including name,
type, connection, group, queue, and priority (see the sketch after this list).
• Name identifies the new job and must be unique.
• Type specifies the type of the job: importing or exporting data.
• Connection provides the data source connection information for the new job. If no
suitable connection is available, click Add here to create one.
• Group indicates the job group, and Queue indicates the Yarn queue to which the new job
belongs.
• Priority indicates the priority level of the job in the Yarn queue.
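To make these fields concrete, here is a hypothetical rendering of the basic information as a Python dictionary. The keys and values are assumptions for illustration only; this is not Loader's actual job definition format.

    # Hypothetical job basic information -- keys and values are assumptions.
    job_basic_info = {
        "name": "sftp_to_hdfs_daily",  # identifies the job; must be unique
        "type": "import",              # import or export data
        "connection": "sftp_conn_01",  # existing data source connection
        "group": "default",            # job group
        "queue": "default",            # Yarn queue the job is submitted to
        "priority": "NORMAL",          # priority level within the Yarn queue
    }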



8.1. Creating a Loader Job – From

• The next step is to configure the input information, including Input path, File split type,
Filter type, Path filter, File filter, Encode type, Suffix name, and Compression (see the
sketch after this list).
• Input path can be a directory or the name of the source file.
• File split type can be set to FILE or SIZE. When the parameter is set to FILE, a file is not
split and can only be processed by one Map job; the file name and content remain
unchanged during data reads. When the parameter is set to SIZE, the file structure is
changed: a file is split into multiple segments, which are read by different Map jobs.
• Usually, FILE is recommended when the original file name needs to be retained. SIZE is
recommended when the original file name does not need to be retained or when a
very large file is to be processed.
• Filter type indicates the file filtering criterion.
• Path filter is used together with Filter type to specify the expression for filtering the
directories in the input path of the source files. If there are multiple filter conditions,
separate them with commas. If the value is empty, directories are not filtered.
• File filter is also used together with Filter type to specify the expression for filtering the
file names of the source files. If there are multiple filter conditions, separate them with
commas. The value cannot be empty.
• Encode type indicates the encoding format of a source file.
• Suffix name indicates the suffix added to a source file after the source file is imported.
• Compression indicates whether to compress data to reduce I/O resource consumption
when an SFTP server is used for data transmission.
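The following sketch shows how comma-separated filters like Path filter and File filter might be applied, assuming (as an illustration, not Loader's documented behavior) that the filters are glob-style wildcard expressions.

    import fnmatch

    def matches_any(name: str, filter_expr: str) -> bool:
        """Return True if name matches any comma-separated wildcard pattern."""
        patterns = [p.strip() for p in filter_expr.split(",") if p.strip()]
        return any(fnmatch.fnmatch(name, p) for p in patterns)

    # Example: select only CSV and log files from a listing.
    files = ["trades_2024.csv", "notes.txt", "app.log"]
    selected = [f for f in files if matches_any(f, "*.csv,*.log")]
    print(selected)  # ['trades_2024.csv', 'app.log']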

8.2. Creating a Loader Job – Transform

• Loader operators are of three types: input operators, transform operators, and output
operators (see the pipeline sketch after this list).

• An input operator is used in the first step of data conversion. This type of operator
converts data into fields. Only one input operator can be used in each conversion.
• A transform operator is used in the intermediate step of data conversion. This type of
operator is optional, and transform operators can be used together in any combination.
Transform operators can process only fields; therefore, an input operator must be used
first to convert data into fields.



• An output operator is used in the last step of data conversion. Only one output operator
can be used in each conversion, to export the processed fields.
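The one-input, optional-transforms, one-output structure can be sketched in a few lines of Python. The operator names below are assumptions made for illustration and do not correspond to Loader's internal classes.

    # Illustrative conversion pipeline: exactly one input operator,
    # any number of transform operators, and exactly one output operator.

    def csv_input(line: str) -> list[str]:
        # Input operator: converts raw data into fields (first step).
        return line.split(",")

    def trim_fields(fields: list[str]) -> list[str]:
        # Transform operator: processes fields only (optional middle step).
        return [field.strip() for field in fields]

    def tsv_output(fields: list[str]) -> str:
        # Output operator: exports the processed fields (last step).
        return "\t".join(fields)

    def run_conversion(line: str, transforms) -> str:
        fields = csv_input(line)        # one input operator
        for transform in transforms:    # zero or more transform operators
            fields = transform(fields)
        return tsv_output(fields)       # one output operator

    print(run_conversion(" a , b ,c", [trim_fields]))  # prints "a\tb\tc"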

8.3. Creating a Loader Job – To

• The last step is to configure the output information, including Storage type, File type,
Compression format, Output directory, File operate type, and Number.
• Storage type indicates the target data storage type, which can be HDFS, HBASE, or
HIVE.
• File type indicates the file type in which data is imported. The available values are
TEXT_FILE, BINARY_FILE (binary read bytes), and SEQUENCE_FILE (sequence file).
• Compression format specifies the data compression format applied after the data is
imported to HDFS.
• Output directory indicates the target directory. If a file with the same name already
exists in the target directory, Loader provides five operate types (see the sketch after
this list):
o OVERRIDE overwrites the old file.
o RENAME renames the new file and imports it to the target directory.
o APPEND adds the new file content to the old file.
o IGNORE ignores the new file and keeps the old file.
o ERROR generates an error report when a file has the same name as one in the
target directory.
• Number indicates the number of Map jobs.
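The five operate types amount to a name-collision policy, which the following sketch reconstructs for illustration; it is not Loader's code, and the renaming scheme shown is an assumption.

    from pathlib import Path

    def resolve_collision(target: Path, operate_type: str) -> str:
        """Decide what to do when the target file name already exists."""
        if not target.exists():
            return "write"                 # no collision: write normally
        if operate_type == "OVERRIDE":
            return "write"                 # overwrite the old file
        if operate_type == "RENAME":
            # Illustrative new name; the actual renaming scheme is an assumption.
            return f"write as {target.stem}_1{target.suffix}"
        if operate_type == "APPEND":
            return "append"                # add new content to the old file
        if operate_type == "IGNORE":
            return "skip"                  # keep the old file, drop the new one
        if operate_type == "ERROR":
            raise FileExistsError(target)  # report the name clash as an error
        raise ValueError(f"unknown operate type: {operate_type}")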

9. Monitoring Job Execution Status

• Then go to the Loader job management page. This page displays all current jobs and
their last execution status.

9.1. Job Execution History

• Select a job and then click the History button in the Operation column to view the
execution records of the specified job.

9.2. Dirty Data

• When Loader is used to transform data, any data that does not meet the Loader
conversion rules is called dirty data. Users can check it on the job history page; dirty
data is stored in HDFS (see the sketch at the end of this section).



• On the job history page, click the Log button to open the MapReduce log page for that
execution.
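As a small illustration of the concept, the sketch below sets aside records that fail a conversion rule instead of loading them, much as Loader keeps dirty data for inspection. The rule and the record format are assumptions made for this example.

    # Illustrative dirty-data handling: records that fail a conversion rule
    # are set aside rather than loaded. Rule and format are assumptions.

    def convert(record: str) -> tuple[str, int]:
        name, age = record.split(",")   # raises ValueError on malformed rows
        return name.strip(), int(age)   # raises ValueError on non-numeric ages

    records = ["alice,30", "bob,notanumber", "carol"]
    loaded, dirty = [], []
    for record in records:
        try:
            loaded.append(convert(record))
        except ValueError:
            dirty.append(record)        # kept for inspection, like dirty data in HDFS

    print(loaded)  # [('alice', 30)]
    print(dirty)   # ['bob,notanumber', 'carol']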

9.3. Job Execution Failure Alarm

• When a Loader job fails, an alarm is reported.



