Credits: 04
Arun Kumar P
Dept. of ISE, JNNCE, Shivamogga
10 Hours
Introduction to Hadoop, HDFS and Essential Tools
Introduction to Hadoop
Hadoop core components
Spark
Features of Hadoop
Hadoop Ecosystem components
Hadoop Streaming
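Hadoop Streaming lets any executable that reads lines from stdin and writes lines to stdout act as the mapper or reducer. A minimal word-count sketch in Python (the sample input is made up for illustration; a real job would submit separate mapper and reducer scripts to the streaming jar):

```python
from itertools import groupby

def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield word, 1

def reducer(pairs):
    """Reduce phase: sum the counts per word; Streaming delivers pairs sorted by key."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    text = ["hadoop streaming example", "hadoop example"]
    mapped = sorted(mapper(text))  # sorting stands in for the shuffle/sort phase
    print(dict(reducer(mapped)))   # {'example': 2, 'hadoop': 2, 'streaming': 1}
```

In a real streaming job the framework performs the sort between the two scripts; the `sorted()` call here only simulates that step.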
Hadoop Pipes
Example: IBM PowerLinux enables working with Hadoop Pipes and libraries
HDFS: Data Storage
- Racks
- Each rack has many DataNodes
- Each DataNode has many data blocks
- Racks are distributed across the cluster
- A file is divided into data blocks
- Data block size is 64 MB
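The block layout above comes down to simple arithmetic: a file is split into fixed-size blocks (using the 64 MB size stated here), with the last block possibly only partially filled. A quick sketch:

```python
import math

BLOCK_SIZE_MB = 64  # default block size used in this material

def num_blocks(file_size_mb):
    """Number of HDFS data blocks needed to hold a file of the given size."""
    return max(1, math.ceil(file_size_mb / BLOCK_SIZE_MB))

print(num_blocks(50))   # a file < 64 MB fits in 1 block
print(num_blocks(500))  # a 500 MB file needs 8 blocks (the last one partially filled)
```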
Features
- Supports create, append, delete, rename and attribute-modification functions
- File contents cannot be modified in place, but data can be appended at the end
- Write once, read many times during usage and processing
- Average file size can be more than 500 MB
HDFS: Physical Organization
The master, the slaves and the Hadoop client node load the data into the cluster
Hadoop 2
- A single-NameNode failure in Hadoop 1 is an operational failure
- Scaling is also restricted beyond a few thousand nodes and clusters
- Hadoop 2 provides multiple NameNodes, which enables higher resource availability
Each MasterNode has the following components:
- An associated NameNode
- A ZooKeeper coordination client, which functions as a centralized repository for distributed applications
ZooKeeper handles synchronization, serialization and coordination activities
HDFS Commands
- Commands for interacting with files in HDFS take the form:
/bin/hdfs dfs <args>
- -copyToLocal copies a file from HDFS to the local file system
- -cat copies file contents to standard output (stdout)
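Such commands can also be composed from a script. A small helper sketch (it assumes the `hdfs` client is on the PATH when the commented lines are actually run; the paths are illustrative):

```python
import subprocess  # used only by the commented-out invocations below

def hdfs_dfs(*args):
    """Build the argument list for an 'hdfs dfs' file system command."""
    return ["hdfs", "dfs", *args]

# On a cluster, these could be executed with subprocess.run(...):
# subprocess.run(hdfs_dfs("-cat", "/user/demo/data.txt"))             # file contents to stdout
# subprocess.run(hdfs_dfs("-copyToLocal", "/user/demo/data.txt", "."))
print(hdfs_dfs("-cat", "/user/demo/data.txt"))  # ['hdfs', 'dfs', '-cat', '/user/demo/data.txt']
```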
MapReduce Framework and Programming Model
Features:
1. Provides automatic parallelization and distribution of computation
2. Processes data stored on distributed clusters of DataNodes and racks
3. Allows processing of large amounts of data in parallel
4. Provides scalability for the use of a large number of servers
5. Provides the batch-oriented MapReduce programming model in Hadoop 1
6. Provides additional processing modes in the Hadoop 2 YARN-based system and enables the required parallel processing
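The automatic parallelization above follows the map -> shuffle/sort -> reduce pattern. An in-memory sketch of the three phases (the per-city temperature records are made up for illustration):

```python
from collections import defaultdict

records = [("delhi", 41), ("mumbai", 33), ("delhi", 39), ("mumbai", 35)]

# Map phase: emit (key, value) pairs (here the records are already pairs)
mapped = [(city, temp) for city, temp in records]

# Shuffle phase: group all values by key, as the framework does between phases
groups = defaultdict(list)
for city, temp in mapped:
    groups[city].append(temp)

# Reduce phase: apply a reducer (max) to each key's list of values
reduced = {city: max(temps) for city, temps in groups.items()}
print(reduced)  # {'delhi': 41, 'mumbai': 35}
```

In a real job the map and reduce phases run in parallel on many DataNodes; only the grouping logic is framework-provided.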
Hadoop MapReduce Framework
[Figure: three computing models]
- Centralized data (shared data) with distributed computing
- Distributed data and distributed computing
- Distributed computing with no shared data
[Figure: HDFS block placement example]
- A file (stuData) smaller than 64 MB fits in a single data block.
- A larger file (stuData1) is divided into data blocks DB1 and DB2.
- The example cluster has DataNodes DN 1 ... DN 240 spread over racks Rack 1 ... Rack 120, two DataNodes per rack.
- Each DataNode's capacity is 64 GB and each rack has 2 DataNodes, so the capacity of one rack is 2 x 64 GB = 128 GB.
- Each data block is replicated 3 times across DataNodes, so the usable capacity is the raw capacity divided by 3: 120 x 1024 / 3 = 40960.
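The capacity arithmetic above can be checked in a few lines (using the numbers given in this example):

```python
def usable_capacity_gb(raw_gb, replication=3):
    """Usable HDFS capacity: each block is stored 'replication' times."""
    return raw_gb / replication

# Rack capacity: 2 DataNodes of 64 GB each
rack_gb = 2 * 64
print(rack_gb)  # 128

# Cluster total from the example, with 3x replication
print(usable_capacity_gb(120 * 1024))  # 40960.0
```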
Hadoop YARN based Execution Model
• Client Node
• Resource Manager
• Node Manager
• App Master
• Containers
• Job History Server
• Master Node
• Resource Manager
Hadoop Ecosystem Tools
Hadoop Ecosystem: Zookeeper, Oozie, Sqoop and Flume
• Apache Sqoop loads voluminous data efficiently between Hadoop and external repositories that reside on enterprise servers or relational databases
• Sqoop works with relational databases such as Oracle, MySQL, PostgreSQL and DB2
• Sqoop provides a mechanism for importing data from an external data store into HDFS
Ambari
Features:
1. Simplifies installation, configuration and management
2. Enables easy, efficient, repeatable and automated creation of clusters
3. Manages and monitors scalable clustering
4. Provides an intuitive web interface and REST API
5. Visualizes the health of clusters and critical metrics for their operations
6. Enables detection of faulty node links
7. Provides extensibility and customizability
HBase
Features:
6. Access rows serially
7. Provides random, real-time read/write access to Big Data
8. Fault-tolerant storage
9. Similar to Google BigTable
Hive
- Hive interacts with structured data stored in HDFS through the Hive Query Language (HQL)
- HQL translates SQL-like queries into MapReduce jobs
Pig
HDFS Basics
HDFS Components
Design is based on two types of nodes: NameNode and DataNode
• A single NameNode manages all the metadata needed to store and retrieve the actual data from the DataNodes
• File system namespace operations (opening, closing and renaming files and directories) are all managed by the NameNode
• The slaves (DataNodes) are responsible for serving read and write requests from the file system's clients
• The NameNode manages block creation, deletion and replication
The NameNode keeps its metadata in two kinds of files: fsimage_* (a checkpoint image of the file system) and edits_* (the log of changes since the last checkpoint)
HDFS Block Replication
HDFS Safe Mode
• When the NameNode starts, it enters a read-only safe mode, where blocks cannot be replicated or deleted
Safe mode enables the NameNode to perform two important processes:
1. The previous file system state is reconstructed by loading the fsimage file into memory and replaying the edit log
2. The mapping between blocks and DataNodes is created by waiting for enough DataNodes to register
Rack Awareness
Name Node high availability
HDFS Name Node Federation
HDFS Checkpoints and Backups
HDFS Snapshots
HDFS NFS Gateway
- Users can easily download files from, and upload files to, the HDFS file system using their local file system
HDFS User Commands
oiv           apply the offline fsimage viewer to an fsimage
oiv_legacy    apply the offline fsimage viewer to a legacy fsimage
oev           apply the offline edits viewer to an edits file
fetchdt       fetch a delegation token from the NameNode
getconf       get config values from the configuration
groups        get the groups to which users belong
snapshotDiff  diff two snapshots of a directory, or diff the current directory
portmap run a portmap service
nfs3 run an NFS version 3 gateway
cacheadmin configure the HDFS cache
crypto configure HDFS encryption zones
storagepolicies get all the existing block storage policies
version print the version
List Files in HDFS
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "hdfs.h"

int main(int argc, char **argv)
{
    hdfsFS fs = hdfsConnect("default", 0);
    const char* writePath = "/tmp/testfile.txt";
    /* Open the file write-only, creating it if necessary (was: WRONGLY) */
    hdfsFile writeFile = hdfsOpenFile(fs, writePath, O_WRONLY|O_CREAT, 0, 0, 0);
    if(!writeFile)
    {
        fprintf(stderr, "Failed to open %s for writing!\n", writePath);
        exit(-1);
    }
    char* buffer = "Hello, World!";
    hdfsWrite(fs, writeFile, (void*)buffer, strlen(buffer)+1);
    if (hdfsFlush(fs, writeFile))
    {
        fprintf(stderr, "Failed to flush %s\n", writePath);
        exit(-1);
    }
    hdfsCloseFile(fs, writeFile);
    return 0;
}
Essential Hadoop Tools
Essential Hadoop Tools
• Hadoop ecosystem offers many tools to help Data input, High level
processing,
workflow management and creation of huge database.
• Each tool is managed is managed as a separate Apache Software
foundation project
• But designed to operate with core Hadoop services including HDFS, YARN
and MapReduce
• Background on each tool with start and finish example given here
Using Apache Pig
Using Apache Hive
Using Apache Sqoop to Acquire Relational Data
Apache Sqoop is a tool designed to transfer data between Hadoop and relational databases.
• Sqoop can be used to import data from an RDBMS into HDFS
• Transform the data in Hadoop
• Export the data back into an RDBMS
• It can be used with any JDBC-compliant database
• It has been tested with Microsoft SQL Server, PostgreSQL, MySQL and Oracle
Version 1: data were accessed using connectors written for specific databases
Version 2: does not support connectors
Import Methods
Step 1: Sqoop examines the database to gather the necessary metadata for the data to be imported
Using Apache Flume to Acquire Data Streams
• Used for log files, social media-generated data, email messages, and
just about any continuous data source
Channel: A channel is a data queue that forwards the source data to the sink destination. It acts as a buffer that manages the input (source) and output (sink) flow rates.
Sink: The sink delivers the data to a destination such as HDFS, a local file, or another Flume agent.
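A Flume agent is wired together in a properties file. A minimal sketch (the names agent1/src1/ch1/sink1, the port and the HDFS path are all illustrative) connecting a netcat source to an HDFS sink through a memory channel:

```properties
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = sink1

# Source: listens for lines of text on a TCP port
agent1.sources.src1.type = netcat
agent1.sources.src1.bind = localhost
agent1.sources.src1.port = 44444
agent1.sources.src1.channels = ch1

# Channel: in-memory buffer between source and sink
agent1.channels.ch1.type = memory

# Sink: writes events into HDFS
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /flume/events
agent1.sinks.sink1.channel = ch1
```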
• Flume agents may be placed in a pipeline, possibly to traverse several machines or domains
• This configuration is normally used when data are collected on one machine (e.g., a web server) and sent to another machine that has access to HDFS
• The data transfer format used by Flume is called Apache Avro, which provides several useful features:
1. Avro is a data serialization/deserialization system that uses a compact binary format
2. The schema is sent as part of the data exchange and is defined using JSON
3. Avro also uses RPCs to send data; that is, an Avro sink will contact an Avro source to send data
Manage Hadoop Workflows with Apache Oozie
• Oozie is integrated with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (e.g., Java MapReduce, Streaming MapReduce, Pig, Hive, and Sqoop), as well as system-specific jobs (e.g., Java programs and shell scripts)
• Oozie also provides a CLI and a web UI for monitoring jobs
• Control flow nodes define the beginning and the end of a workflow. They include start, end, and optional fail nodes.
• Action nodes are where the actual processing tasks are defined. When an action node finishes, the remote systems notify Oozie and the next node in the workflow is executed.
• Fork/join nodes enable parallel execution of tasks in the workflow. The fork node enables two or more tasks to run at the same time. A join node represents a rendezvous point that must wait until all forked tasks complete.
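A fork/join pair appears in an Oozie workflow.xml roughly as follows (a sketch: the node names, paths and the simple fs actions are illustrative; real action bodies would name specific MapReduce, Pig or Hive jobs):

```xml
<workflow-app xmlns="uri:oozie:workflow:0.4" name="fork-join-example">
  <start to="forking"/>
  <fork name="forking">
    <path start="taskA"/>
    <path start="taskB"/>
  </fork>
  <action name="taskA">
    <fs><mkdir path="${nameNode}/tmp/outA"/></fs>
    <ok to="joining"/>
    <error to="fail"/>
  </action>
  <action name="taskB">
    <fs><mkdir path="${nameNode}/tmp/outB"/></fs>
    <ok to="joining"/>
    <error to="fail"/>
  </action>
  <join name="joining" to="end"/>
  <kill name="fail">
    <message>A forked task failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

Both actions route to the same join node, which proceeds to end only after every forked path has completed.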
Using Apache HBase
HBase Data Model Overview
• It is possible to have many versions of data within an HBase cell.
• Almost anything can serve as a row key, from strings to binary representations of longs to serialized data structures.
• Rows are lexicographically sorted, with the lowest order appearing first in a table.
• The empty byte array denotes both the start and the end of a table's namespace.
• All table accesses are via the table row key, which is considered its primary key.
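The lexicographic ordering of row keys has a practical consequence: numeric keys stored as strings do not sort numerically unless they are zero-padded. A quick sketch (the row-key names are illustrative):

```python
# Row keys sort as byte strings, not as numbers
keys = ["row1", "row10", "row2"]
print(sorted(keys))  # ['row1', 'row10', 'row2'] -- "row10" sorts before "row2"

# Zero-padding restores the expected numeric order
padded = ["row01", "row10", "row02"]
print(sorted(padded))  # ['row01', 'row02', 'row10']
```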
Create the Database
hbase(main):006:0> create 'apple', 'price' , 'volume'
0 row(s) in 0.8150 seconds
The put command is used to add data to the database from within the shell.
put 'apple','6-May-15','price:open','126.56'
put 'apple','6-May-15','price:high','126.75'
put 'apple','6-May-15','price:low','123.36'
put 'apple','6-May-15','price:close','125.01'
put 'apple','6-May-15','volume','71820387'
Inspect the Database :The entire database can be listed using the scan
command.
hbase(main):006:0> scan 'apple'
ROW COLUMN+CELL
6-May-15 column=price:close, timestamp=1430955128359, value=125.01
6-May-15 column=price:high, timestamp=1430955126024, value=126.75
6-May-15 column=price:low, timestamp=1430955126053, value=123.36
6-May-15 column=price:open, timestamp=1430955125977, value=126.56
6-May-15 column=volume:, timestamp=1430955141440, value=71820387
Get a Row: You can use the row key to access an individual row.
hbase(main):008:0> get 'apple', '6-May-15'
COLUMN        CELL
price:close   timestamp=1430955128359, value=125.01
price:high    timestamp=1430955126024, value=126.75
price:low     timestamp=1430955126053, value=123.36
price:open    timestamp=1430955125977, value=126.56
volume:       timestamp=1430955141440, value=71820387
5 row(s) in 0.0130 seconds
Get Table Cells: A single cell can be accessed using the get command and the COLUMN option:
Delete a Cell: A specific cell can be deleted using the following command:
hbase(main):009:0> delete 'apple', '6-May-15', 'price:low'
Delete a Row: You can delete an entire row by giving the deleteall command:
hbase(main):009:0> deleteall 'apple', '6-May-15'
Remove a Table: To remove (drop) a table, you must first disable it. The following two commands remove the apple table from HBase:
hbase(main):009:0> disable 'apple'
hbase(main):010:0> drop 'apple'