
MINI PROJECT ON BIG DATA

Contents
 Prerequisites before initiating HDFS file operations
 $ start-dfs.sh (to start all daemons)
 $ jps (to check that all daemons are running)

1) Basic HDFS File Operations


 put Command (import a file from the local file system to HDFS)
 get Command (copy a file from HDFS to the local file system)
 cp Command (copy a file from one HDFS directory to another HDFS directory)
 mv Command (move a file from one HDFS directory to a destination HDFS directory)

2)Sqoop Commands
 Sqoop import command.
 Sqoop import with Where clause command.
 Sqoop export command.
 Sqoop Incremental append.
3) Hive Commands
 Internal/Managed table Creation in Hive
 External table Creation in Hive
 Loading data from Local file system to Hive
 Static partitioning in Hive
 Dynamic partitioning in Hive
 Bucketing in Hive

HDFS File Operations:


 put Command
Loads a file from the local file system into a specific directory in HDFS.
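A minimal sketch of the command; the local file name student.csv and the HDFS target directory /user/sumit/data are assumptions used for illustration:

$ hdfs dfs -put /home/user/student.csv /user/sumit/data/

The -copyFromLocal command can be used the same way for local sources.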

 get Command
The get command is used to copy data from the Hadoop file system to the local file system; it copies files from the directories where they are stored in HDFS down to the local file system. The same can be done with the copyToLocal command.
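A sketch of both forms; the HDFS path and the local destination directory are assumptions:

$ hdfs dfs -get /user/sumit/data/student.csv /home/user/
$ hdfs dfs -copyToLocal /user/sumit/data/student.csv /home/user/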

 cp Command
Copies a file from one HDFS directory to a destination directory within HDFS itself.
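A sketch with assumed source and destination HDFS directories:

$ hdfs dfs -cp /user/sumit/data/student.csv /user/sumit/archive/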
 mv Command

Using the mv command, File1.txt in the wep directory is moved to the new directory /user/sumit.
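A sketch of that move, assuming wep is a directory under the user's HDFS home directory:

$ hdfs dfs -mv wep/File1.txt /user/sumit/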
Sqoop Commands
 Sqoop import command.
RDBMS-HDFS

The Sqoop import command copies data from the student table of a MySQL database running on localhost into HDFS. Sqoop first opens a JDBC connection to MySQL and then writes the table's data as part files in the target HDFS directory.
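A minimal sketch of such an import; the database name mydb, the credentials, and the target directory are assumptions, while the student table and localhost host come from the description above:

$ sqoop import \
  --connect jdbc:mysql://localhost/mydb \
  --username root --password hadoop \
  --table student \
  --target-dir /user/sumit/sqoop/student \
  -m 1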
 Sqoop import with Where clause command.
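The --where option restricts the import to rows that satisfy a condition. A sketch, reusing the assumed connection details above and an assumed marks column:

$ sqoop import \
  --connect jdbc:mysql://localhost/mydb \
  --username root --password hadoop \
  --table student \
  --where "marks > 60" \
  --target-dir /user/sumit/sqoop/student_filtered \
  -m 1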
 Sqoop export command.
HDFS to RDBMS

 Sqoop Incremental append.

The command loads data from the database into HDFS in an incremental manner: Sqoop looks at the last value of the check column, and only the rows whose check-column value comes after the specified last value are loaded into HDFS.
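A sketch, assuming the student table has an auto-increment id column used as the check column and that rows up to id 100 were already imported:

$ sqoop import \
  --connect jdbc:mysql://localhost/mydb \
  --username root --password hadoop \
  --table student \
  --target-dir /user/sumit/sqoop/student \
  --incremental append \
  --check-column id \
  --last-value 100 \
  -m 1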

Hive Commands:
 Internal table creation
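A minimal sketch of a managed (internal) table; the table name student and its columns are assumptions:

CREATE TABLE student (
  id INT,
  name STRING,
  marks INT,
  year INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

Dropping a managed table removes both its metadata and the data files kept under Hive's warehouse directory.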
 External table creation

Creating the table as external helps in the following way: if the external table is dropped, only the table definition (metadata) is deleted, while the data associated with the table (the actual files) remains in its underlying HDFS directory (for example, under Hive's warehouse directories).
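A sketch of an external table; the LOCATION path and the columns are assumptions:

CREATE EXTERNAL TABLE student_ext (
  id INT,
  name STRING,
  marks INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/user/sumit/hive/student_ext';

After DROP TABLE student_ext, the files under /user/sumit/hive/student_ext are left in place.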

 Loading data from Local file system to Hive
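A sketch using LOAD DATA with the LOCAL keyword, which copies a file from the local file system into the table's directory; the local path is an assumption and the student table is the one sketched above:

LOAD DATA LOCAL INPATH '/home/user/student.csv' INTO TABLE student;

Without the LOCAL keyword, the path is read from HDFS and the file is moved into the table's directory rather than copied.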


STEPS TO DO PARTITIONING
Data can be stored in either of two ways: in an internal/managed table or in an external table.
Step 1: Create a non-partitioned table (internal or external)
Step 2: Load data into the created table
Step 3: Create the partitioned table
Step 4: For dynamic partitioning, set the required properties (not needed for static partitioning)
Step 5: Load data into the partitioned table

 Static partitioning in Hive

In static partitioning you explicitly specify the partition value when loading data, and a directory corresponding to that value of the partition column is created under the table's directory in the Hive warehouse.
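A sketch of the static case; the student_part table, its columns, and the partition column year are assumptions:

CREATE TABLE student_part (
  id INT,
  name STRING,
  marks INT
)
PARTITIONED BY (year INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

-- partition value given explicitly
LOAD DATA LOCAL INPATH '/home/user/student_2023.csv'
INTO TABLE student_part PARTITION (year = 2023);

This creates a directory such as .../student_part/year=2023 under the warehouse.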
 Dynamic partitioning in Hive
Unlike static partitioning, where you explicitly specify partition values, dynamic partitioning lets Hive determine these values automatically based on the data itself. A separate directory is created implicitly for each partition value under the table's directory in the Hive warehouse.
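A sketch of the dynamic case, inserting into the student_part table from the non-partitioned student table sketched earlier; the property names are standard Hive settings:

SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

INSERT INTO TABLE student_part PARTITION (year)
SELECT id, name, marks, year FROM student;

Hive reads the year value from each row and writes the row into the matching year=<value> directory.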

 Bucketing in Hive

Bucketing is based on the hashing technique.


For a given column value, compute the hash of that value modulo the number of required buckets (say, F(x) % 3).

Based on the resulting value, the row is stored in the corresponding bucket, so the data is distributed roughly evenly across the buckets.
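A sketch of a bucketed table, clustering on an assumed id column into 3 buckets to match the F(x) % 3 example above:

-- needed on older Hive versions; later versions enforce bucketing by default
SET hive.enforce.bucketing = true;

CREATE TABLE student_bucketed (
  id INT,
  name STRING,
  marks INT
)
CLUSTERED BY (id) INTO 3 BUCKETS
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

INSERT INTO TABLE student_bucketed
SELECT id, name, marks FROM student;

Each row's id is hashed, taken modulo 3, and written to the corresponding bucket file.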
