You are on page 1of 18

Apache Sqoop

Hanoi – Autumn 2019


Apache Sqoop

• Sqoop exchanges data between an RDBMS and Hadoop.

• It can import all tables, a single table, or a portion of a table into HDFS.

• Does this very efficiently via a Map-only MapReduce job

• Result is a directory in HDFS containing comma-delimited text files (default)

• Sqoop can export data from HDFS back to RDBMS


Apache Sqoop Architect
Importing Tables with Sqoop

This will imports the customers table from a MySQL database

• Will create /mydata/customers directory in HDFS

• Directory will contain comma-delimited text files.


Importing An Entire Database with Sqoop

Import all tables from database (fields will be tab-delimited)


Importing Partial Tables with Sqoop

Import only specified columns from a table.


Importing Partial Tables with Sqoop

Import only matching rows


Importing With Free-Form Query

Sometimes what we want to import is harder to get than directly extracting


columns from one table. Therefore, Sqoop provides Free-Form Query for us to
create more complex import queries
● Must specify a destination directory with target-dir
Importing With Free-Form Query

● The query must include the token $CONDITIONS


○ Sqoop can use $CONDITIONS to get a table’s metadata by setting the
token to ‘1 = 0’
● Must select a splitting column with split-by
○ Helps Mappers parallelize the query
Importing With Free-Form Query

● We can tell the query to be executed once and imported serially by specifying
a single map task with -m 1
$ sqoop import \
--query ‘SELECT a.*, b.* FROM a JOIN b ON (a.id == b.id) WHERE $CONDITIONS’\
-m 1
-- target-dir /user/foo/joinresults

● $CONDITIONS can be listed along with other constraints


$ sqoop import \
--options-file world-options.txt -m 1 \
--target-dir /user/hdfs/sqoop-my-sql-import/canada-city
--query “SELECT ID, Name FROM City \
WHERE CountryCode=’CAN’ AND \$CONDITIONS”
Importing With Free-Form Query

$ sqoop import \
--connect jdbc:mysql://mysql.example.com/sqoop \
--username sqoop --password sqoop \
--query ‘SELECT normcities.id, countries.country, normcities.city FROM
normcities \
JOIN countries USING(country_id) WHERE $CONDITIONS’ \
--split-by id
--target-dir cities
Importing With Free-Form Query
#
# Options file for Sqoop import
#

# Specifies the tool being invoked


import

# Connect parameter and value


--connect
jdbc:mysql://localhost/db

# Username parameter and value


--username
foo

#
# Remaining options should be specified in the command line.
#
Arguments
Import control arguments

Output formatting arguments


Supported Database

Supports almost every RDBMS

Supports --direct option for MySql and PostgreSQL to increase importing speed
Incremental Imports with Sqoop

Two incremental import modes: append and lastmodified


● Sqoop’s incremental append mode imports only new records based on value
of last record in specified column
Handling Modifications with Incremental Imports

• What if existing records are also modified in the databases?

• Incremental append mode doesn’t handle this.

• Sqoop’s lastmodified incremental import mode adds and updates records


Exporting Data from Hadoop to RDBMS
Sqoop User Guide

https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html

You might also like