
APACHE SQOOP

Deepak Sharma
September 9, 2015

Agenda of the Day

Introduction to Apache Sqoop
Why Sqoop?
How does it work?
Sqoop Installation and Configuration
Sqoop Export/Import Workflow
Sqoop Examples
Sqoop Limitations
Cisco Architecture (Sqoop): High Level


Introduction to Apache Sqoop

Apache Sqoop is a tool designed for efficiently transferring bulk
data between Hadoop and structured data stores such as
relational databases.
A command-line interface (also available through a web UI in
Sqoop 2).
Exports/imports tables, or subsets of tables, from HDFS to an
RDBMS and vice versa.
Uses map-only jobs from Hadoop MapReduce.
Supports plugins, so new external systems can be integrated.
Written in Java.
Generates Java classes that let you interact with your imported
data (see the codegen sketch below).
Integrates with Oozie for scheduling and automating exports and
imports.
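For example, the standalone codegen tool emits the same record class an import would generate; a minimal sketch, where the host, database, table, and credentials are placeholders:

# writes EMPLOYEES.java, a record class reusable in your own MapReduce code
sqoop codegen \
  --connect jdbc:mysql://dbhost/sales \
  --username scott -P \
  --table EMPLOYEES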

Why Sqoop?

A way was needed to transfer processed data from HDFS to an
RDBMS.
Parallelism (multiple mapper processes) is needed when loading
data into an RDBMS.
Applications need to move data from an RDBMS into Hadoop.
Hand-written scripts for transferring data are inefficient and
time-consuming.

How does it work?

Sqoop runs in the Hadoop cluster.
Sqoop uses mappers to slice the incoming data.
Data is sliced into partitions by a map-only job, with
individual mappers responsible for transferring their slice
of the data.
Each record is handled in a type-safe manner, since Sqoop
uses the database metadata to infer the data types.
Can import into:
Hive (use the --hive-import flag; see the sketch below)
HBase (use the --hbase-* flags)
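A Hive import might look like the following sketch; the host, database, and table names are placeholders:

# --hive-import creates the Hive table and loads the data once the HDFS copy completes
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username scott -P \
  --table ORDERS \
  --hive-import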

Sqoop Installation

Prerequisite: the machine must have a Hadoop server or client
installed and configured (e.g., one of the slave nodes).
Download sqoop-1.4.4.bin (or the latest release matching your
Hadoop version) from a mirror listed at http://sqoop.apache.org/

Untar the downloaded file:

tar xvzf sqoop-1.4.4.bin_hadoop-1.0.0.tar.gz

Copy the extracted folder to /usr/local/sqoop:

sudo cp -r sqoop-1.4.4.bin_hadoop-1.0.0 /usr/local/sqoop

Sqoop Configuration

Set the path in the Linux bash environment:

vi $HOME/.bashrc
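The slide does not show the entries themselves; a minimal sketch of the lines to append, assuming Sqoop was copied to /usr/local/sqoop as above:

export SQOOP_HOME=/usr/local/sqoop
export PATH=$PATH:$SQOOP_HOME/bin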

Assign permissions to the Linux user (hadoop):

sudo chown hadoop /usr/local/sqoop
sudo chown hadoop /usr/local/sqoop/lib
sudo chown hadoop /usr/local/sqoop/bin
sudo chown hadoop /usr/local/sqoop/conf
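A quick sanity check once the path and permissions are in place (reloads the environment first):

source $HOME/.bashrc
sqoop version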

Connectors for Sqoop

A range of connectors is available to connect Sqoop to
traditional RDBMSs:
Teradata: http://www.cloudera.com/content/cloudera/en/downloads/connectors/sqoop/teradata/1-4c5-powered-by-teradata.html
MySQL: http://dev.mysql.com/downloads/connector/j/
Oracle: http://www.oracle.com/technetwork/database/enterprise-edition/jdbc-112010-090769.html
MS SQL Server: http://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=11774
PostgreSQL: http://jdbc.postgresql.org/download.html
Copy the downloaded JAR into the $SQOOP_HOME/lib directory.
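For example, for MySQL (assuming Connector/J was downloaded to the current directory):

sudo cp mysql-connector-java-*.jar /usr/local/sqoop/lib/
# verify the driver loads; host and credentials are placeholders
sqoop list-databases --connect jdbc:mysql://dbhost/ --username scott -P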

Sqoop Commands
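The command reference from the original slide is not reproduced in this text; for orientation, these are the core Sqoop 1 tools (run sqoop help for the full list):

sqoop import          # import a table from a database into HDFS
sqoop export          # export an HDFS directory into a database table
sqoop list-databases  # list schemas on a database server
sqoop list-tables     # list tables in a database
sqoop eval            # run a SQL statement and print the results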

Sqoop Import Workflow

Sqoop Import - Example

Sqoop import: from RDBMS to HDFS

import: imports a table from an RDBMS into HDFS

--connect: JDBC connection string
--table: name of the table
--username: Oracle username
--fields-terminated-by: field delimiter (here, the tab character)
--num-mappers: number of map processes to run in parallel
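Assembled into a full command, the flags above might look like this sketch; the host, service name, table, and credentials are placeholders:

sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott -P \
  --table EMPLOYEES \
  --fields-terminated-by '\t' \
  --num-mappers 4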


Sqoop Export Workflow


Sqoop Export - Example

Sqoop export: from HDFS to RDBMS

export: exports an HDFS file to a database table

--connect: JDBC connection string
--table: name of the table
--username: Oracle username
--export-dir: HDFS directory from which the data is exported
--verbose: print detailed output while working
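Assembled into a full command, the export might look like this sketch, with the same placeholders as the import example:

sqoop export \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott -P \
  --table EMPLOYEES \
  --export-dir /user/hadoop/employees \
  --verbose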


Sqoop Limitations

Sqoop has some limitations, including:
Error-prone syntax (cryptic, context-dependent command-line
arguments).
Client-only architecture.
Tight coupling to the JDBC model, which is not a good fit for
non-RDBMS systems.
Poor support for security, e.g.:

$ sqoop import --username scott --password tiger ...

Sqoop can read command-line options from an options file, but
this still leaves holes (see the sketch below).
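A sketch of the options-file approach, with a placeholder host and database; the remaining hole is that the file itself must still be protected with filesystem permissions:

$ cat import.options
import
--connect
jdbc:mysql://dbhost/sales
--username
scott

$ sqoop --options-file import.options --table ORDERS -P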


Fortunately...

Sqoop 2 (incubating) will address many of these
limitations:
Adds a web-based GUI.
Centralized configuration (client-server architecture).
More flexible model.
Improved security model.


Sqoop 2 Architecture (Proposed)


Cisco Example on Sqoop import/export


Standard Data Workflow in Hadoop Env.


QUESTIONS?
