Apache Sqoop

What is it?
How does it work?
Interfaces
Example
Architecture

Sqoop - What is it?

A command line interface (plus a web interface in Sqoop 2)
For importing data into and exporting data from Hadoop
Uses map jobs from MapReduce
Supports incremental loads
Written in Java
Released under the Apache License
Uses plugins for new types of data source
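As a rough illustration of an incremental load, the sketch below builds a Sqoop 1 import that fetches only rows whose check column is above the last imported value. The connection URL, table name, column name, and last value are hypothetical, and credentials are left out for brevity. The script prints the command rather than running it.

```shell
# Dry-run sketch: builds and prints the command instead of running it.
# All values below are hypothetical examples, not taken from a real system.
DB_URL="jdbc:mysql://dbhost:3306/db3"
TABLE="orders"
CHECK_COLUMN="id"   # column Sqoop inspects to find new rows
LAST_VALUE=1000     # highest value already imported by an earlier run

set -- sqoop import \
  --connect "$DB_URL" \
  --table "$TABLE" \
  --incremental append \
  --check-column "$CHECK_COLUMN" \
  --last-value "$LAST_VALUE"

INCREMENTAL_CMD="$*"
echo "$INCREMENTAL_CMD"
# To run it for real on a host with Sqoop installed, execute "$@" instead.
```

In append mode only rows with a check column value greater than the last value are imported, so the last value needs to be recorded between runs.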

Sqoop - How does it work?

Data is sliced into partitions
Mappers transfer the data
Data types are determined via metadata
Many data transfer formats are supported, e.g. CSV, Avro
Can import into
  Hive (use the --hive-import flag)
  HBase (use the --hbase-* flags)
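The slicing step can be pictured as dividing the range of a split column evenly among the mappers. The sketch below is a simplified illustration of that idea, not Sqoop's actual splitter code. The bounds and mapper count are hypothetical; in a real import the column is chosen with --split-by and the mapper count with --num-mappers.

```shell
# Simplified illustration of range partitioning across mappers.
# Hypothetical inputs: in a real import the bounds would come from a
# query such as SELECT MIN(id), MAX(id) on the source table.
MIN_ID=1
MAX_ID=1000
NUM_MAPPERS=4

# Print one inclusive id range per mapper, covering min..max exactly once.
split_ranges() {
  min=$1
  max=$2
  n=$3
  total=$((max - min + 1))
  i=0
  lo=$min
  while [ "$i" -lt "$n" ]; do
    size=$((total / n))
    # Spread any remainder over the first few mappers.
    if [ "$i" -lt $((total % n)) ]; then
      size=$((size + 1))
    fi
    hi=$((lo + size - 1))
    echo "mapper $i: id >= $lo AND id <= $hi"
    lo=$((hi + 1))
    i=$((i + 1))
  done
}

split_ranges "$MIN_ID" "$MAX_ID" "$NUM_MAPPERS"
```

Each mapper would then read only the rows in its own range, which is what lets the transfer run in parallel.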

Sqoop Interfaces

Get data from
  Relational databases
  Data warehouses
  NoSQL databases
Load to Hive and HBase
Integrates with Oozie for scheduling
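For the HBase side, a Sqoop 1 import might look roughly like the sketch below. The table names, column family, and row key column are hypothetical, and credentials are omitted. The script only prints the command.

```shell
# Dry-run sketch: builds and prints the command instead of running it.
# All names below are hypothetical examples.
DB_URL="jdbc:mysql://dbhost:3306/db3"
TABLE="orders"
HBASE_TABLE="orders_hbase"   # target HBase table
COLUMN_FAMILY="cf"           # column family that receives the imported columns
ROW_KEY="id"                 # source column used as the HBase row key

set -- sqoop import \
  --connect "$DB_URL" \
  --table "$TABLE" \
  --hbase-table "$HBASE_TABLE" \
  --column-family "$COLUMN_FAMILY" \
  --hbase-row-key "$ROW_KEY" \
  --hbase-create-table

HBASE_CMD="$*"
echo "$HBASE_CMD"
# To run it for real on a host with Sqoop and HBase configured, execute "$@".
```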

Sqoop Example

An example Sqoop command to load data from MySQL into Hive:

  bin/sqoop-import \
    --connect jdbc:mysql://<mysql host>:<mysql port>/db3 \
    --username <username> \
    --password <password> \
    --table <tableName> \
    --hive-table <Hive tableName> \
    --create-hive-table \
    --hive-import \
    --hive-home <hive path>
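Data can also go in the opposite direction, from Hadoop back to a database table. A minimal sketch of such an export follows, assuming a hypothetical target table, user name, and HDFS directory. The -P option asks Sqoop to prompt for the password rather than taking it on the command line. The script only prints the command.

```shell
# Dry-run sketch: builds and prints the command instead of running it.
# All values below are hypothetical examples.
DB_URL="jdbc:mysql://dbhost:3306/db3"
DB_USER="etl_user"
TABLE="orders_summary"                             # existing target table
EXPORT_DIR="/user/hive/warehouse/orders_summary"   # HDFS directory to read from

set -- sqoop export \
  --connect "$DB_URL" \
  --username "$DB_USER" \
  -P \
  --table "$TABLE" \
  --export-dir "$EXPORT_DIR"

EXPORT_CMD="$*"
echo "$EXPORT_CMD"
# To run it for real on a host with Sqoop installed, execute "$@" instead.
```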

Sqoop Architecture

Sqoop has moved from Sqoop 1 to Sqoop 2
Changed from client to server install
Now has web and command line access
Server now accesses Hive & HBase
Oozie uses the REST API
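Because Sqoop 2 runs as a server, clients such as Oozie talk to it over HTTP. The sketch below shows what a simple version check could look like. The host, the port 12000, and the /sqoop/version path are assumptions based on a default Sqoop 2 setup and should be checked against the actual server. The script only prints the curl command.

```shell
# Dry-run sketch: prints a curl command instead of contacting a server.
# Host, port, and path are assumptions about a default Sqoop 2 install.
SQOOP_HOST="localhost"
SQOOP_PORT=12000
VERSION_URL="http://${SQOOP_HOST}:${SQOOP_PORT}/sqoop/version"

echo "curl -s $VERSION_URL"
```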

Sqoop Architecture - Sqoop 1

Sqoop Architecture - Sqoop 2

