
Log in to MySQL using the following credentials
-----------------------------------------------
username - root
password - cloudera

mysql -u root -p

Use retail_db;

show tables;

select count(*) from customers;

Let's Sqoop now!!!!


---------------------

sqoop import --connect jdbc:mysql://localhost/retail_db --username root \
  --password cloudera --as-avrodatafile -m 1 \
  --target-dir /customers_avro_metadata --table customers --where "1=2"
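The `--where "1=2"` predicate matches no rows, so the import writes only the Avro file layout (with its embedded schema) and no data. A minimal sketch of why the trick works, using Python's built-in sqlite3 in place of MySQL (table and rows below are made up for illustration):

```python
import sqlite3

# In-memory database standing in for MySQL's retail_db
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, customer_fname TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Mary'), (2, 'Ann')")

# 1=2 is always false, so the predicate filters out every row:
# the result set still carries the table structure, just no data
rows = conn.execute("SELECT * FROM customers WHERE 1=2").fetchall()
print(len(rows))  # 0
conn.close()
```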

Extract the schema from the file using avro-tools


--------------------------------------------------

avro-tools getschema hdfs://localhost:8020/customers_avro_metadata/part-m-00000.avro \
  > /home/cloudera/Desktop/schema
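The extracted schema is plain JSON describing the record and its fields, so it can be inspected with any JSON tooling. A sketch using Python's standard json module — the field list below is an assumption based on the usual retail_db customers columns, not the exact avro-tools output for your import:

```python
import json

# Assumed shape of the schema avro-tools writes; the real output has one
# entry per column, with Sqoop-generated nullable types like ["null", "int"]
schema_text = """
{
  "type": "record",
  "name": "customers",
  "fields": [
    {"name": "customer_id",    "type": ["null", "int"],    "default": null},
    {"name": "customer_fname", "type": ["null", "string"], "default": null},
    {"name": "customer_lname", "type": ["null", "string"], "default": null}
  ]
}
"""

schema = json.loads(schema_text)
print(schema["name"])                         # customers
print([f["name"] for f in schema["fields"]])  # column names
```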

Upload the schema to HDFS


--------------------------

hdfs dfs -put /home/cloudera/Desktop/schema /customers_avro_metadata/

Let's Sqoop again!!!! This time the whole table...


----------------------------------------------------

sqoop import --connect jdbc:mysql://localhost/retail_db --username root \
  --password cloudera --as-avrodatafile --target-dir /customers_avro_data \
  --table customers

Create a Hive table on top of the data using the schema extracted in earlier steps.
----------------------------------------------------------------------------------
CREATE EXTERNAL TABLE customers_avro
STORED AS AVRO
LOCATION '/customers_avro_data'
TBLPROPERTIES ('avro.schema.url'='hdfs:///customers_avro_metadata/schema');
Sqooping using SNAPPY compression codec
----------------------------------------

sqoop import --connect jdbc:mysql://localhost/retail_db --username root \
  --password cloudera --as-avrodatafile --target-dir /customers_avro_data_snappy \
  --table customers --compression-codec snappy

Create a Hive table on top of the Snappy-compressed data, again using the schema extracted in earlier steps.
-----------------------------------------------------------------------------------------------------------
CREATE EXTERNAL TABLE customers_avro_snappy
STORED AS AVRO
LOCATION '/customers_avro_data_snappy'
TBLPROPERTIES ('avro.schema.url'='hdfs:///customers_avro_metadata/schema');
