Sqoop Practice
The class will start with a question-and-answer recap of the previous class…. (30)
Today’s topics:
To remember
Every sqoop command starts with the word sqoop followed by a <space>
To break a long command across lines, end the line with <space>\
mysql commands end with ;
Sqoop commands do not end with any symbol or sign
In this lesson all commands are written in the “consolas” font in red for easy understanding
All comments start with # (the comment is not part of any command)
The import command is used to import data from an RDBMS into HDFS (by default)
The export command is used to export data from HDFS to an RDBMS (by default)
3306 is the default MySQL port. localhost can be replaced with the IP address of the database server
Unless otherwise mentioned, the default source of data is mysql
Unless otherwise mentioned, the default target directory is in hdfs
Arguments of a tool start with <--> (for example --connect)
The main tool (for example import or export) follows sqoop after a <space>
Short options start with a single <-> (for example -m)
exit; is the command to get out of mysql
The sqoop help command shows all available sqoop tools.
sudo jps can be used to check whether all required daemons are running.
It is suggested to type names in all lowercase letters with no spaces; if a separator is required, use _ (underscore)
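The points above can be seen together in one command. The sketch below is a dry run: it only prints a Sqoop import command, so the <space>\ line breaks and the -- arguments can be checked before anything is executed. The table name customers, the target directory and the mapper count are assumptions modelled on the rest of this lesson, and the function name print_import_cmd is made up for illustration.

```shell
# Dry run: print the import command instead of executing it.
# Assumed values (not confirmed by the lesson): table "customers",
# target directory /user/cloudera/customer, one mapper (-m 1).
print_import_cmd() {
  echo sqoop import \
    --connect jdbc:mysql://localhost:3306/myfirsttutorial \
    --username root \
    --password cloudera \
    --table customers \
    --target-dir /user/cloudera/customer \
    -m 1
}

print_import_cmd
```

Removing the word echo would run the command for real on a machine where Sqoop and MySQL are available.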
Start MySQL
create database myfirsttutorial; # to create a database “myfirsttutorial”
(101, 'a.ravi', 'ravi', '102 ny street', 11102, 'ny', 'usa'), # to insert data in table directly
(102, 'b.avi', 'avi', '103 nj street', 11103, 'nj', 'usa'), # all varchar within quotes
(103, 'c.javi', 'javi', '104 ct street', 11104, 'ct', 'usa');
# to view the list of files from the terminal, type the command below and the list will be shown
hadoop fs -ls / # the imported data is listed under /user/cloudera
# There will be one mapper file, part-m-00000. If 2 mappers were configured there would be 2 files, the other one being part-m-00001
hadoop fs -cat /user/cloudera/customer/part-m-00000
# it will show the data as comma-separated text with all the information of the “Customers” table
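The naming of the mapper output files can be sketched in a few lines of shell. This is only an illustration of the part-m-NNNNN pattern described above; the helper name part_files is hypothetical and nothing here talks to HDFS.

```shell
# Print the file names that an import with N mappers is expected to write.
# Usage: part_files N   (part_files is a made-up helper for illustration)
part_files() {
  i=0
  while [ "$i" -lt "$1" ]; do
    # Five-digit, zero-padded mapper index: part-m-00000, part-m-00001, ...
    printf 'part-m-%05d\n' "$i"
    i=$((i + 1))
  done
}

part_files 2
```

With 2 mappers this prints part-m-00000 and part-m-00001, one per line, which is the list to expect from hadoop fs -ls on the target directory.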
Before exporting data we must create an empty table where the data will be stored (let this table be named consumer)
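One possible way to create the empty table is to copy the structure of the table that was imported earlier. The sketch below writes the SQL to a file so it can be reviewed first. It assumes the source table is called customers, which this lesson does not state explicitly; the file name create_consumer.sql is also made up.

```shell
# Write the SQL to a file so it can be checked before it is applied.
# Assumption: the table imported earlier is named "customers".
cat > create_consumer.sql <<'EOF'
USE myfirsttutorial;
CREATE TABLE consumer LIKE customers;
EOF

# Show what would be sent to MySQL.
cat create_consumer.sql

# To apply it on the MySQL machine:
#   mysql -u root -p < create_consumer.sql
```

CREATE TABLE ... LIKE copies only the column definitions, so consumer starts out empty, which is what the export needs.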
Export command
Switch to sqoop terminal
sqoop export \
--connect jdbc:mysql://localhost:3306/myfirsttutorial \
--username root \
--password cloudera \
--table consumer \
# No mapper
select * from myfirsttutorial.consumer; # to check the exported data (run in mysql)
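A Sqoop export also has to be told which HDFS directory to read from, using --export-dir. The sketch below is a dry run that prints a complete export command. The directory /user/cloudera/customer is an assumption based on the path used with hadoop fs -cat earlier in this lesson, and the function name print_export_cmd is made up.

```shell
# Dry run: print the export command instead of executing it.
# Assumption: --export-dir is the directory written by the earlier import.
print_export_cmd() {
  echo sqoop export \
    --connect jdbc:mysql://localhost:3306/myfirsttutorial \
    --username root \
    --password cloudera \
    --table consumer \
    --export-dir /user/cloudera/customer
}

print_export_cmd
```

Removing the word echo would run the export for real; the select statement above can then be used in mysql to confirm that the rows arrived.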
PuTTY
Please refer to attachment Lesson-17a
LINKS
LESSON-17a
PuTTY
What is PuTTY
PuTTY is a free and open source terminal emulator that provides a user interface for connecting to a linux machine. Say a
database is running on linux: once the user connects through putty, he can perform any operation (view, move, copy or
delete files etc.) on the server or database running on linux.
Installation
Click “putty.exe”
Enter the “host name/ip address” (this is the IP address of the machine you want to work with; it is to be
collected from the cluster host)
If there is any issue getting the correct ip address of the cloudera vm (which should be like 192.168.56.101), we have to reset
the network as below
Select “network”
Select “adaptor-2”
# to view the list of files under myfirstdata from the terminal, type the command below and the list will be shown
hadoop fs -ls /user/cloudera/customer
# There will be one mapper file, part-m-00000. If 2 mappers were configured there would be 2 files, the other one being part-m-00001
hadoop fs -cat /user/cloudera/customer/part-m-00000
# it will show the data as comma-separated text with all the information of the “Customers” table
Settings
Click the “putty icon” at the top left of the command-line window to manage settings
Click “appearance”
Click “change”
Click “apply”
1. Choose os
2. Instance type
3. Conf instance
4. Security
5. ……
6. …..
7. Review
8. Launch: on clicking “launch instance”, a dialogue box for creating a “key pair” opens…
Once the instances are created, we name them and get the private and public DNS names for all the
instances.
From the downloaded .pem file we have to create a .ppk file
To do this, we need a program called putty gen (to be downloaded from the web)
Search the web for “putty gen”, and go to the “download page”
Find the “puttygen.exe” file and click to download it
Once the download is complete, open puttygen from the computer search…
Click “conversions” at the top of the puttygen dialogue box.
Select “import key” from the dropdown list
A box will open with a list of files, from which we have to select the “.pem” file
This will open a box with the buttons “save public key” and “save private key” at the bottom right
Click the one we need. To create the .ppk file that putty loads to connect to the instances, this is “save private key”
Click yes to save………..
Give .ppk name for saving the file
And click save
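For reference, the same conversion can also be done without the dialogue boxes on machines that have the command-line build of puttygen (the Unix version). The sketch below only prints the command it would run. The key file name my-cluster-key.pem and the helper name print_ppk_cmd are hypothetical.

```shell
# Derive the .ppk name from a .pem name and print the puttygen command
# that would convert it. Dry run only: nothing is converted here.
# "my-cluster-key.pem" below is a hypothetical file name.
print_ppk_cmd() {
  pem="$1"
  ppk="${pem%.pem}.ppk"   # strip the .pem suffix, append .ppk
  # -O private writes the key in PuTTY's own private-key format
  echo puttygen "$pem" -O private -o "$ppk"
}

print_ppk_cmd my-cluster-key.pem
```

The resulting .ppk file plays the same role as the one saved from the puttygen dialogue box in the steps above.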
Now, in the putty “configuration box” (opened from the “putty.exe” file downloaded earlier),
Repeat the above process for the other EC2 instances with the appropriate naming like “Slave-1”, “Slave-2” and so on as
per the cluster design.
Finally click “open” at the bottom to connect to the instance (the user name and password are required as usual)
https://www.mybloggingthing.com/ssh-putty-commands/
Putty installation
https://www.youtube.com/watch?v=69MEd9O6W0U