
Individual Project

Big Data Technology: Sqoop

Declaration: The Big Data technology that I am presenting is unique in the class: YES

Video link: https://youtu.be/1OL_W81IsWQ


Name: Rishu Verma
Sqoop
Sqoop: "SQL to Hadoop and Hadoop to SQL"

Sqoop is a data transfer tool that connects Hadoop and relational database servers. It is used to import data from relational databases such as MySQL and Oracle into Hadoop HDFS, and to export data from Hadoop HDFS back into relational databases. It is provided by the Apache Software Foundation.
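As a minimal sketch of the two directions described above (the host, database, user, and table names here are hypothetical placeholders), a Sqoop import and export look like this:

```shell
# Import a table from MySQL into HDFS (-P prompts for the password)
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username analyst -P \
  --table orders \
  --target-dir /user/hadoop/orders

# Export the (possibly transformed) HDFS data back into a MySQL table
sqoop export \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username analyst -P \
  --table orders_summary \
  --export-dir /user/hadoop/orders_summary
```

Both commands require a running Hadoop cluster and a reachable database, so they are shown here only as a usage sketch.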
Why Sqoop?

 As more organizations deploy Hadoop to analyze vast streams of information, they may find they need to transfer large amounts of data between Hadoop and their existing databases, data warehouses, and other data sources.

 Loading bulk data into Hadoop from production systems, or accessing it from MapReduce applications running on a large cluster, is a challenging task, since transferring data using ad-hoc scripts is inefficient and time-consuming.
Architecture of SQOOP
Features of SQOOP

 Full Load: Apache Sqoop can load a whole table with a single command. You can also load all the tables from a database with a single command.

 Incremental Load: Apache Sqoop also provides incremental load, where you can load only the parts of a table that have changed since the last run. Sqoop import supports two types of incremental imports (append and lastmodified).

 Compression: You can compress your data using the deflate (gzip) algorithm with the --compress argument, or by specifying the --compression-codec argument. You can also load a compressed table into Apache Hive.

 Import results of an SQL query: You can also import the result set returned by an SQL query into HDFS.

 Parallel import/export: Sqoop uses the YARN framework to import and export data, which provides fault tolerance on top of parallelism.
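The incremental, query, and compression features listed above can be sketched with the following commands (connection details, table, and column names are hypothetical):

```shell
# Incremental append import: fetch only rows whose id is greater than
# the previously recorded --last-value
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username analyst -P \
  --table orders \
  --incremental append \
  --check-column id \
  --last-value 10000

# Free-form query import with gzip compression; the $CONDITIONS token
# is required so Sqoop can split the query across parallel mappers
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username analyst -P \
  --query 'SELECT o.id, o.total FROM orders o WHERE $CONDITIONS' \
  --split-by o.id \
  --target-dir /user/hadoop/order_totals \
  --compress \
  --compression-codec org.apache.hadoop.io.compress.GzipCodec
```

With --incremental lastmodified (the second incremental mode), --check-column would instead name a timestamp column that is updated whenever a row changes.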
Programming/Technology Skills
 Big Data frameworks or technologies: Learning Hadoop is the first step towards becoming a successful Big Data developer. Hadoop is not a single tool; it is a complete ecosystem containing a number of tools that serve different purposes.

 Hadoop-based data mining: Data mining tools like Apache Mahout, RapidMiner, KNIME, etc. are very important.

 Programming language (Java/Python/R): Knowledge of data structures, algorithms, and at least one programming language.

 Real-time processing framework (Apache Spark): Spark is a real-time distributed processing framework with in-memory computing capabilities, so big data developers should be skilled in at least one real-time processing framework.

 SQL-based technologies: Structured Query Language (SQL) is the data-centered language used to structure, manage, and process the structured data stored in databases.

 Visualization tools: Data visualization tools like Tableau, QlikView, and QlikSense help in understanding the analysis performed by various analytics tools.
Use Cases of Sqoop

 Use Case 1: An enterprise runs a nightly Sqoop import to load the day's data from a production transactional RDBMS into a Hive data warehouse for further analysis.

 Use Case 2: Parallelizes data transfer for fast performance and optimal system utilization.

 Use Case 3: Mitigates excessive loads on external systems and makes data analysis more efficient.

 Use Case 4: Allows data imports from external datastores and enterprise data warehouses into Hadoop.
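The nightly RDBMS-to-Hive import in Use Case 1 can be sketched as a single scheduled command (the Oracle connection string, credentials, and table names are hypothetical):

```shell
# Nightly job sketch: pull the day's transactions from a production
# Oracle database directly into a Hive table, using 8 parallel mappers.
# Assumes Hive is configured on the cluster.
sqoop import \
  --connect jdbc:oracle:thin:@dbhost:1521/prod \
  --username etl -P \
  --table DAILY_TXNS \
  --hive-import \
  --hive-table warehouse.daily_txns \
  --num-mappers 8
```

In practice such a command would be wrapped in a cron or Oozie job and combined with an incremental import so only new rows are fetched each night.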
How does Sqoop work?

• Sqoop provides a pluggable connector mechanism for optimal connectivity to external systems.

• The Sqoop extension API provides a convenient framework for building new connectors, which can be dropped into Sqoop installations to provide connectivity to various systems.

• Sqoop itself comes bundled with various connectors for popular database and data warehousing systems.
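The connector mechanism described above is mostly transparent on the command line: Sqoop picks a bundled connector from the JDBC URL scheme (mysql, oracle, postgresql, ...), and for other databases you can fall back to the generic JDBC path by naming the driver class explicitly. A sketch with hypothetical connection details:

```shell
# Generic JDBC fallback: --driver names the JDBC driver class directly,
# bypassing the specialized bundled connectors
sqoop import \
  --connect "jdbc:sqlserver://dbhost:1433;databaseName=sales" \
  --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
  --username analyst -P \
  --table orders \
  --target-dir /user/hadoop/orders
```

The driver JAR must be placed in Sqoop's lib directory for this to work; specialized connectors are generally preferred because they can use database-specific bulk transfer paths.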
Who Uses Sqoop

 Online marketer Coupons.com uses Sqoop to exchange data between Hadoop and the IBM Netezza data warehouse appliance. The organization can query its structured databases and pipe the results into Hadoop using Sqoop.

 Education company the Apollo Group also uses the software, not only to extract data from databases but to inject the results of Hadoop jobs back into relational databases.

 Healthcare companies use Sqoop to transfer data from relational databases into Hadoop, where they analyze it to gain insights.

 Countless other Hadoop users use Sqoop to efficiently move their data.
Companies using SQOOP
THANK YOU
