You are on page 1of 4

Sqoop can be used to transform a relational database schema into an HBase schema.

course, the main goal here is to demonstrate how Sqoop can import data from an RDBMS or
data warehouse directly into HBase, but it’s always better to see how a tool is used in context
versus how it’s used in the abstract.
The figure shows how the Service Order Database might look after being transformed into an
HBase schema.

For this particular import example, you want to import thecustomercontactinfo table directly into
an HBase table in preparation for building the HBase Service Order Database schema. To
complete the HBase schema, you’d have to execute the same steps to import
the productinfo table, and then the serviceorders table could be built with a Java MapReduce

Sqoop doesn’t now permit you to import, all at once, a relational table directly into an HBase
table having multiple column families. To work around this limitation, you create the HBase table
first and then execute three Sqoop import operations to finish the task. The listing shows the
task of creating the table.

contactinfo" --hbase-table customercontactinfo --column-family ContactInfo --hbase-row-key customernum -m 1 Enter password: . 13/08/17 16:53:01 INFO mapreduce.. 'CustomerName'.hbase(main):017:0> create 'customercontactinfo'.ImportJobBase: Retrieved 5 records.. The customernumprimary key also becomes the HBase row key. $ sqoop import --connect jdbc:mysql://localhost/serviceorderdb --username root -P --table customercontactinfo --columns "customernum.0680 seconds In the following listing. as specified by the –hbase-row-keyCLA. hbase(main):018:0* 'ContactInfo'...ImportJobBase: Retrieved 5 records. for each Sqoop import command. 13/08/17 17:00:59 INFO mapreduce. $ sqoop import --connect jdbc:mysql://localhost/serviceorderdb --username root -P --table customercontactinfo --columns "customernum.customername" --hbase-table customercontactinfo --column-family CustomerName --hbase-row-key customernum -m 1 Enter password: . 'ProductNums' 0 row(s) in 1. . note that the target HBase column family specified by the –column-family CLA and the corresponding MySQL columns specified by the –columns CLA are in bold.

A200. timestamp=1376773256317. value=John Timothy Smith 10000 column=ProductNums:productnums. timestamp=1376773256317. timestamp=1376772776684. 11111. you’d see that the import and translation from a relational database table on MySQL directly into HBase was a success. timestamp=1376772776684.$ sqoop import --connect jdbc:mysql://localhost/serviceorderdb --username root -P --table customercontactinfo --columns "customernum. CA. to quickly move relational tables that may be exceeding capacity on your RDBMS or data warehouse into HBase. timestamp=1376772776684. value=Jane Ann Doe 20000 column=ProductNums:productnums. timestamp=1376773256317. timestamp=1376773551221. value=2 HBase Ave.B500. value=1 Hadoop Lane.. value=B500 10001 column=ContactInfo:contactinfo.B400. value=A100. CO. timestamp=1376773256317. value=Joe Developer .ImportJobBase: Retrieved 5 records. timestamp=1376773551221. 22222 20000 10000 column=CustomerName:customername. The customercontactinfo table in this example is rather small. but imagine the power you now have. where capacity is virtually unlimited and scalability is automatic. If you were to carry out an HBase scan of your new table. value=1 Piglatin Ave. NY.Sm ith@xyz.C60 20000 column=ContactInfo:contactinfo. 13/08/17 17:05:54 INFO mapreduce. 22222 10001 column=CustomerName:customername. hbase(main):033:0> scan 'customercontactinfo' ROW COLUMN+CELL 10000 column=ContactInfo:contactinfo.D700 column=ProductNums:productnums. John. value=A100. CA. value=Bill Jones 10001 0.. value=1 Expert HBase Ave.A200.C500. 33333 20001 column=CustomerName:customername.productnums" --hbase-table customercontactinfo --column-family ProductNums --hbase-row-key customernum -m 1 Enter password: . timestamp=1376772776684.A300 20001 column=ContactInfo:contactinfo. using Sqoop and HBase.A300. timestamp=1376773551221.

1120 seconds Importing existing relational data via Sqoop into Hive and HBase tables can potentially enable a wide range of new and exciting data analysis workflows. value=Data Scientist 30000 column=ProductNums:productnums. timestamp=1376773551221. value=C500 5 row(s) in 0. MA. value=D700 30000 column=ContactInfo:contactinfo. timestamp=1376773256317. timestamp=1376772776684. 33333 30000 column=CustomerName:customername. .20001 column=ProductNums:productnums. value=1 Statistics Lane. timestamp=1376773551221. check out the Apache Sqoop documentation for additional Hive and HBase command line arguments and features. If this feature is of interest to you.