Quicklook Sqoop
What is Sqoop?
Sqoop is a tool designed to transfer data between Hadoop and relational databases.
Sqoop can be used to import data from a relational database management system (RDBMS)
such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data in
Hadoop MapReduce, and then export the data back into an RDBMS.
Sqoop automates most of this process, relying on the database to describe the schema
for the data to be imported. Sqoop uses MapReduce to import and export the data,
which provides parallel operation as well as fault tolerance.
Sqoop Tools
Sqoop is a collection of related tools. To use Sqoop, you specify the tool you want to use and
the arguments that control the tool.
sqoop-import
The import tool imports an individual table from an RDBMS to HDFS. Each row from a table is
represented as a separate record in HDFS. Records can be stored as text files (one record per
line), or in binary representation as Avro or SequenceFiles.
Syntax:
$ sqoop import (generic-args) (import-args)
$ sqoop-import (generic-args) (import-args)
Argument                              Description
--connect <jdbc-uri>                  Specify JDBC connect string
--connection-manager <class-name>     Specify connection manager class to use
--driver <class-name>                 Manually specify JDBC driver class to use
--hadoop-home <dir>                   Override $HADOOP_HOME
--help                                Print usage instructions
-P                                    Read password from console
--password <password>                 Set authentication password
--username <username>                 Set authentication username
--verbose                             Print more information while working
--connection-param-file <filename>    Optional properties file that provides connection parameters
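Example invocations:
A minimal single-table import, as a sketch; it assumes the local MySQL test database and employee table used in the notes at the end of this document, so adjust the connect string, credentials, and table name for your environment:
$ sqoop import --connect jdbc:mysql://localhost/test --table employee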
sqoop-import-all-tables
The import-all-tables tool imports a set of tables from an RDBMS to HDFS. Data from each table
is stored in a separate directory in HDFS. For the import-all-tables tool to be useful, the
following conditions must be met:
• Each table must have a single-column primary key.
• You must intend to import all columns of each table.
• You must not intend to use a non-default splitting column, nor impose any conditions via a WHERE clause.
Syntax:
$ sqoop import-all-tables (generic-args) (import-args)
$ sqoop-import-all-tables (generic-args) (import-args)
Example invocations:
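A sketch, assuming the same local MySQL test database; every table in the database is imported, each into its own HDFS directory:
$ sqoop import-all-tables --connect jdbc:mysql://localhost/test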
sqoop-export
The export tool exports a set of files from HDFS back to an RDBMS. The target table must
already exist in the database. The input files are read and parsed into a set of records according
to the user-specified delimiters.
The default operation is to transform these into a set of INSERT statements that inject the
records into the database.
Syntax:
$ sqoop export (generic-args) (export-args)
$ sqoop-export (generic-args) (export-args)
Example invocations:
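A sketch, assuming comma-delimited records already sit in an HDFS directory (the path here is illustrative) and that an empty employee table with matching columns exists in MySQL:
$ sqoop export --connect jdbc:mysql://localhost/test \
    --table employee \
    --export-dir /user/cloudera/employee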
sqoop-create-hive-table
The create-hive-table tool populates a Hive metastore with a definition for a table based on a
database table previously imported to HDFS, or one planned to be imported. This effectively
performs the "--hive-import" step of sqoop-import without running the preceding import.
If data was already loaded to HDFS, you can use this tool to finish the pipeline of importing the
data to Hive. You can also create Hive tables with this tool; data then can be imported and
populated into the target after a pre-processing step run by the user.
Syntax:
$ sqoop create-hive-table (generic-args) (create-hive-table-args)
$ sqoop-create-hive-table (generic-args) (create-hive-table-args)
Example invocations:
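A sketch that defines a Hive table emps based on the schema of the employee table; the database, table, and Hive table names are illustrative:
$ sqoop create-hive-table --connect jdbc:mysql://localhost/test \
    --table employee --hive-table emps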
sqoop-eval
The eval tool allows users to quickly run simple SQL queries against a database; results are printed to the console. This allows users to preview their import queries to ensure they import the data they expect.
Syntax:
$ sqoop eval (generic-args) (eval-args)
$ sqoop-eval (generic-args) (eval-args)
Example Invocations:
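A quick sanity check of a connect string, as a sketch; the test database and employee table are assumptions carried over from the notes at the end of this document:
$ sqoop eval --connect jdbc:mysql://localhost/test \
    --query "SELECT * FROM employee LIMIT 2"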
sqoop-list-databases
The list-databases tool lists database schemas present on a server.
Syntax:
$ sqoop list-databases (generic-args) (list-databases-args)
$ sqoop-list-databases (generic-args) (list-databases-args)
Example Invocations:
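List the schemas available on a MySQL server; the host in the connect string is illustrative:
$ sqoop list-databases --connect jdbc:mysql://database.example.com/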
sqoop-list-tables
The list-tables tool lists tables present in a database.
Syntax:
$ sqoop list-tables (generic-args) (list-tables-args)
$ sqoop-list-tables (generic-args) (list-tables-args)
Example Invocations:
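The four table names below are typical output; an invocation that would produce such a listing (the connect string is illustrative) is:
$ sqoop list-tables --connect jdbc:mysql://database.example.com/corp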
employees
payroll_checks
job_descriptions
office_supplies
For the import, first create a table named employee in MySQL and insert some rows into it.
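A basic import matching that setup might look like this; the connect string assumes a local MySQL database named test:
sqoop import --connect jdbc:mysql://localhost/test --table employee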
incremental append:
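A sketch of an append-mode incremental import, which fetches only rows newer than the last import; the check column id and starting value are assumptions about the employee table:
sqoop import --connect jdbc:mysql://localhost/test --table employee \
  --incremental append --check-column id --last-value 0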
For the export, you need a file with data in HDFS and an empty table named employee in MySQL (each line of the HDFS file becomes one table row, with column values separated by commas).
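A matching export sketch; the HDFS path is illustrative, so point --export-dir at the directory holding your data file:
sqoop export --connect jdbc:mysql://localhost/test --table employee \
  --export-dir /user/cloudera/employee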
eval:
sqoop eval --connect jdbc:mysql://localhost/test \
  --query "SELECT * FROM employee LIMIT 2"
Parallelism: Sqoop imports data in parallel; --num-mappers (or -m) sets the number of concurrent map tasks:
sqoop import --connect jdbc:mysql://localhost/test --table employee2 \
  --num-mappers 4