
LESSON-17

Sqoop Practice
• Class will start with a Q&A review of the previous class (30)

Today’s topics (lesson plan):

1. To remember (commands) (10)
2. Creating/inserting data in table under db (30)
3. Importing data from RDBMS (mysql) to HDFS (20)
4. Exporting data from HDFS to RDBMS (20)
5. PuTTY (Attachment Cr-24) (50)
6. QA and break (20)

To remember

• Every Sqoop command starts with the word sqoop followed by a space.
• To continue a command on the next line, end the line with <space>\ (the backslash must be the last character on the line).
• MySQL commands end with ;
• Sqoop commands end with no symbol or sign.
• In this lesson all commands are written in the “Consolas” font in red for easy reading.
• All comments start with # (the comment is not part of any command).
• The import command is used to import data from an RDBMS to HDFS (by default).
• The export command is used to export data from HDFS to an RDBMS (by default).
• 3306 is the default MySQL port. localhost can be replaced with the server’s IP address.
• Unless otherwise mentioned, the default source of data is MySQL.
• Unless otherwise mentioned, the default target directory is in HDFS.
• Tool-specific arguments start with two dashes (--), for example --table.
• The main tool (import, export, help) follows sqoop after a space.
• Single-character arguments (for example -m) and generic Hadoop arguments start with one dash (-).
• exit; is the command to get out of mysql.
• The sqoop help command lists all available Sqoop tools.
• sudo jps can be used to check whether all required daemons are running.
• Type names in lowercase with no spaces; if a separator is required, use _ (underscore).
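
The line-breaking rule can be tried safely with any shell command before using it with Sqoop. This is a minimal sketch, assuming a POSIX shell; printf stands in for sqoop here, and the words first and second are placeholders.

```shell
# The backslash must be the very last character on the line.
# A comment or a trailing space after it ends the command early.
joined=$(printf '%s %s' \
  first \
  second)
echo "$joined"
```

The three physical lines are read as one command, which is exactly how the multi-line sqoop commands in this lesson are read.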

Creating table ‘customer’ under database ‘myfirsttutorial’ in MySQL.

Start MySQL

mysql -u root -p # connecting to mysql

cloudera # enter this as the default password

mysql> # prompt will appear

create database myfirsttutorial; # to create a database “myfirsttutorial”

show databases; # to view the list of databases

use myfirsttutorial; # to work under database “myfirsttutorial”

select database(); # to view which database is currently in use

create table customer ( # create table “customer” under the above db
customer_id int(20) primary key not null, # 3 constraints: (20)/PK/NN
customer_name varchar(20),
contact_name varchar(20),
address varchar(45),
postal_code int(6),
city varchar(20),
country varchar(20));

describe customer; # to view the schema of table customer

insert into customer values
(101, 'a.ravi', 'ravi', '102 ny street', 11102, 'ny', 'usa'), # to insert data in the table directly
(102, 'b.avi', 'avi', '103 nj street', 11103, 'nj', 'usa'), # all varchar values go within straight quotes
(103, 'c.javi', 'javi', '104 ct street', 11104, 'ct', 'usa');

select * from customer; # to view the whole table
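
The statements above can also be kept in a script file and replayed, which avoids retyping and avoids curly quotes creeping in. This is a sketch under assumptions: the file name customer_setup.sql is hypothetical, and the block only writes the file; it does not contact MySQL.

```shell
# Hypothetical file name; any writable path works.
sql_file="${TMPDIR:-/tmp}/customer_setup.sql"

# The quoted 'SQL' marker stops the shell from altering the quotes inside.
cat > "$sql_file" <<'SQL'
create database if not exists myfirsttutorial;
use myfirsttutorial;
create table if not exists customer (
  customer_id int(20) primary key not null,
  customer_name varchar(20),
  contact_name varchar(20),
  address varchar(45),
  postal_code int(6),
  city varchar(20),
  country varchar(20));
insert into customer values
  (101, 'a.ravi', 'ravi', '102 ny street', 11102, 'ny', 'usa'),
  (102, 'b.avi', 'avi', '103 nj street', 11103, 'nj', 'usa'),
  (103, 'c.javi', 'javi', '104 ct street', 11104, 'ct', 'usa');
select * from customer;
SQL

echo "wrote $sql_file"
# To run it (prompts for the password): mysql -u root -p < "$sql_file"
```

Running the script a second time would fail at the insert, because the customer_id values are primary keys and already exist.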

Importing data from RDBMS (mysql) to hdfs

Switch on Sqoop 2, HDFS and YARN through Cloudera Manager


Open a new terminal
# sqoop import command to import the “customer” table from database “myfirsttutorial” into the default
# directory using one mapper.
# By default the files land under /user/cloudera/customer (the HDFS home directory plus the table name).
# -m 1 selects one mapper (if omitted, the default is -m 4).
# A comment cannot follow the \ at the end of a line, so the comments are listed here.
sqoop import \
--connect jdbc:mysql://localhost:3306/myfirsttutorial \
--username root \
--password cloudera \
--table customer \
-m 1
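
When the default location is not wanted, Sqoop accepts --target-dir, and -P prompts for the password instead of placing it on the command line. This is a sketch under assumptions: the helper name import_cmd and the directory /user/cloudera/customer_copy are hypothetical, and the function only prints the command so it can be reviewed before it is run.

```shell
# Hypothetical helper: prints a sqoop import command instead of running it.
# $1 = table name, $2 = HDFS target directory (must not exist yet).
import_cmd() {
  printf 'sqoop import --connect jdbc:mysql://localhost:3306/myfirsttutorial --username root -P --table %s --target-dir %s -m 1\n' "$1" "$2"
}

import_cmd customer /user/cloudera/customer_copy
```

Copy the printed line into the Sqoop terminal once it looks right.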


# Go to “HDFS” in Cloudera Manager and click “NameNode Web UI”, then “Utilities” and “Browse the file system”, to view the
# data under /user/cloudera/customer. There will be one file, “part-m-00000”, since one mapper was configured.
# Click the part file to download it. Click “Places” at the top left of the Cloudera desktop, then “Downloads”, and the
# “part-m-00000” file will be there.

# to list the files from the terminal, type the commands below
hadoop fs -ls / # lists the HDFS root; the imported data is under /user/cloudera

hadoop fs -ls /user/cloudera/customer # all part files are shown under customer

# There will be one mapper file, part-m-00000. If 2 mappers had been configured there would be 2 files, the other
# being part-m-00001.
hadoop fs -cat /user/cloudera/customer/part-m-00000

# it will show the data as CSV text with all the rows of the “customer” table
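
Because the part file is plain comma-separated text, ordinary shell tools can pick out columns. This is a minimal sketch, assuming Sqoop’s default comma delimiter; the sample line is built from the first row inserted earlier, not copied from a real part file.

```shell
# Sample line in the shape of a comma-separated row of the customer table.
sample='101,a.ravi,ravi,102 ny street,11102,ny,usa'

# Print customer_id and city (fields 1 and 6).
id_city=$(printf '%s\n' "$sample" | awk -F',' '{ print $1 " " $6 }')
echo "$id_city"
```

On the cluster the same awk filter can be placed after hadoop fs -cat /user/cloudera/customer/part-m-00000 and a pipe.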

Exporting data to RDBMS from hdfs

Before exporting data we must create an empty table where the data will be stored (let this table be named consumer).

Switch to mysql terminal


create table consumer ( # create empty table “consumer”
customer_id int(20) primary key not null,
customer_name varchar(20),
contact_name varchar(20),
address varchar(45),
postal_code int(6),
city varchar(20),
country varchar(20));

Export command
Switch to sqoop terminal

sqoop export \
--connect jdbc:mysql://localhost:3306/myfirsttutorial \
--username root \
--password cloudera \
--table consumer \
--export-dir /user/cloudera/customer
# --export-dir is the HDFS directory we are going to export from
# No mapper option is given, so the default number of mappers is used

# To check, switch to the mysql terminal and run:
select * from myfirsttutorial.consumer;
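
The export can also state the field delimiter and mapper count explicitly instead of relying on defaults. This is a sketch under assumptions: export_cmd is a hypothetical helper that only prints the command for review, and the comma delimiter is assumed to match how the data was imported.

```shell
# Hypothetical helper: prints a sqoop export command instead of running it.
export_cmd() {
  cat <<'EOF'
sqoop export \
--connect jdbc:mysql://localhost:3306/myfirsttutorial \
--username root \
-P \
--table consumer \
--export-dir /user/cloudera/customer \
--input-fields-terminated-by ',' \
-m 1
EOF
}

export_cmd
```

Running the export twice against the same consumer table would fail on the second run, because customer_id is a primary key and the rows already exist.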

PuTTY
Please refer to attachment Lesson-17a

LINKS

Create and import


https://www.youtube.com/watch?v=ebngkH20FyU

use case commands import export and so on


https://www.youtube.com/watch?v=r1NLCComQ9Q

LESSON-17a
PuTTY

What is PuTTY

PuTTY is a free and open source terminal emulator that provides a user interface for connecting to a Linux machine. Say a
database is running on Linux: once the user connects through PuTTY, they can perform any operation (view, move, copy or
delete files, etc.) on the server or database running on Linux.

To remember for (PuTTY)

• A PuTTY session starts by typing the user name at the command line.
• Then type the password. On Ubuntu it is not displayed while typing, so type carefully, and use the number keys below the function keys.
• In PuTTY, highlighting (selecting) text is the same as copying. CTRL+C is not needed.
• A right click alone is the same as pasting.
• CTRL+L clears the screen.
• Typing exit and pressing Enter closes the PuTTY session.

Installation

Switch to main machine or host machine

Go to browser – browse to “putty download”

Click “PuTTY Download page”

Click “putty.exe”

Run the .exe file

Open PuTTY dialog box

By default connection type is set to “ssh” and port “22”

Enter the “host name/IP address” (this is the IP address of the machine you want to work with; collect it from the
cluster host)

(If there is any issue getting the correct IP address of the Cloudera VM, which should look like 192.168.56.101, reset the
network as below)

Switch off the Cloudera VM by selecting power off


Go and click the “Settings” icon at the top right of the VirtualBox home page

Select “Network”

Select “Adapter 2”

Enable the network adapter by clicking the check box

Select “Host-only Adapter”


Finally click “ok”

In PuTTY click “Open”, which opens a terminal for logging in to the server

Provide the user name as root and the password as cloudera

# the password will not be visible

Now we have a PuTTY terminal connected to the remote Linux server

Using PuTTY (for local host solo machine)

# to list the imported files from the terminal, type the command below
hadoop fs -ls /user/cloudera/customer

# There will be one mapper file, part-m-00000. If 2 mappers had been configured there would be 2 files, the other
# being part-m-00001.
hadoop fs -cat /user/cloudera/customer/part-m-00000

# it will show the data as CSV text with all the rows of the “customer” table

Settings

Click the PuTTY icon at the top left of the terminal window to manage settings

Click “Change setting”

Click “appearance”

Click “change”

Set or edit the font size and other options as needed

Click “apply”

This will help us to read or write with more clarity

Setting PuTTY in AWS

After creating an account with AWS, create the instances needed to set up a cluster. The launch steps are:

1. Choose os
2. Instance type
3. Conf instance
4. Security
5. ……
6. …..
7. Review
8. Launch: on clicking “Launch instance”, a dialog box for creating a “key pair” opens

So before launching instances, we have to:

• Select “Create new key pair” (a public key and a private key)
• Type a suitable name
• Click “Download key pair”, so that a .pem (Privacy Enhanced Mail) file is downloaded. Keep it secure, as it will be
required to connect to the EC2 instances.
• Check the acknowledgement

Only then can we proceed by clicking “Launch instances”.

Once the instances are created, we name them and get the private and public DNS names for all the
instances.
• From the downloaded .pem file we have to create a .ppk file
• To do this, we need a program called PuTTYgen (to be downloaded from the web)
• Search for “puttygen” and go to the download page
• Find the “puttygen.exe” file and click to download it
• Once the download is complete, open PuTTYgen from the computer’s search
• Click “Conversions” at the top of the PuTTYgen dialog box
• Select “Import key” from the dropdown list
• A box will open with a list of files, from which we select the .pem file
• This shows the buttons “Save public key” and “Save private key” at the bottom right
• Click “Save private key”, since the .ppk file that PuTTY uses to log in is the private key
• Click “Yes” to save
• Give the file a name ending in .ppk
• And click “Save”
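
Where a Linux or macOS machine is used instead of Windows, the command-line puttygen can do the same conversion. This is a sketch under assumptions: it presumes the PuTTY tools are installed, the key name mykey.pem is hypothetical, and the helper names are invented for illustration.

```shell
# Hypothetical helper: derive the .ppk file name from the .pem file name.
pem_to_ppk_name() {
  printf '%s\n' "${1%.pem}.ppk"
}

# Hypothetical helper: convert the key if puttygen is installed, otherwise show the command.
convert_pem() {
  pem="$1"
  ppk=$(pem_to_ppk_name "$pem")
  if command -v puttygen >/dev/null 2>&1; then
    puttygen "$pem" -O private -o "$ppk"
  else
    echo "puttygen not found; would run: puttygen $pem -O private -o $ppk"
  fi
}

pem_to_ppk_name mykey.pem
```

Calling convert_pem mykey.pem in the directory that holds the downloaded key would write mykey.ppk next to it.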

Now, in the PuTTY configuration box (opened from the putty.exe file downloaded earlier):

• Put the IP address in the designated box for “Host Name”
• Give this session a name to save it for future use (say, Master)
• From the left category panel of the PuTTY configuration box, click “Auth” under “SSH” under “Connection”
• Browse to the .ppk file using the “Browse” button
• Go back to “Session” in the left category panel
• And finally hit “Save”

Repeat the above process for the other EC2 instances with the appropriate naming like “Slave-1”, “Slave-2” and so on as
per the cluster design.

Finally click “Open” at the bottom to connect to the instances (user name and password are required as usual)

link putty and commands

https://www.mybloggingthing.com/ssh-putty-commands/

Putty installation
https://www.youtube.com/watch?v=69MEd9O6W0U
