
HADOOP AND MAPREDUCE CHEAT SHEET

Hadoop

Hadoop is a framework designed to handle large volumes of data, both structured and unstructured.

HDFS

Hadoop Distributed File System (HDFS) is a framework designed to manage huge volumes of data in a simple and pragmatic way. It contains a vast number of servers, and each one stores a part of the file system.

In order to secure Hadoop, configure Hadoop with the following aspects:
• Authentication:
  • Define users
  • Enable Kerberos in Hadoop
  • Set up the Knox gateway to control access and authentication to the HDFS cluster
• Authorization:
  • Define groups
  • Define HDFS permissions
  • Define HDFS ACLs
• Audit:
  • Enable a process execution audit trail
• Data protection:
  • Enable wire encryption with Hadoop
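For example, Kerberos authentication and service-level authorization are switched on in core-site.xml. A minimal sketch, assuming the standard Hadoop property names; the rest of a Kerberos setup (principals, keytabs) depends on your cluster:

  <!-- core-site.xml: switch authentication from "simple" to Kerberos -->
  <configuration>
    <property>
      <name>hadoop.security.authentication</name>
      <value>kerberos</value>
    </property>
    <property>
      <!-- enable service-level authorization checks -->
      <name>hadoop.security.authorization</name>
      <value>true</value>
    </property>
  </configuration>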

HDFS List File Commands

Commands                      Tasks
hdfs dfs -ls /                Lists all the files and directories given for the HDFS destination path
hdfs dfs -ls -d /hadoop       Lists all the details of the Hadoop files
hdfs dfs -ls -R /hadoop       Recursively lists all the files in the Hadoop directory and all subdirectories in the Hadoop directory
hdfs dfs -ls hadoop/dat*      Lists all the files in the Hadoop directory starting with 'dat'
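The listing uses the familiar long format: permissions, replication factor (- for directories), owner, group, size, modification time, and path. A representative sample; the entries shown here are hypothetical:

  $ hdfs dfs -ls /
  Found 2 items
  drwxr-xr-x   - hdfs  supergroup          0 2023-01-15 10:30 /data
  -rw-r--r--   3 admin hadoop        5242880 2023-01-15 10:31 /logs.csv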
HDFS Basic Commands

Commands                                                       Tasks
hdfs dfs -put logs.csv /data/                                  Uploads the file from the local file system to HDFS
hdfs dfs -cat /data/logs.csv                                   Reads the content of the file
hdfs dfs -chmod 744 /data/logs.csv                             Changes the permission of the file
hdfs dfs -chmod -R 744 /data/logs.csv                          Changes the permission of the files recursively
hdfs dfs -setrep -w 5 /data/logs.csv                           Sets the replication factor of the file to 5
hdfs dfs -du -h /data/logs.csv                                 Checks the size of the file
hdfs dfs -mv logs.csv logs/                                    Moves the file to a newly created subdirectory
hdfs dfs -rm -r logs                                           Removes the directory from HDFS
stop-all.sh                                                    Stops the cluster
start-all.sh                                                   Starts the cluster
hadoop version                                                 Checks the version of Hadoop
hdfs fsck /                                                    Checks the health of the file system
hdfs dfsadmin -safemode leave                                  Turns off the safemode of the NameNode
hdfs namenode -format                                          Formats the NameNode
hadoop [--config confdir] archive -archiveName NAME -p         Creates a Hadoop archive
hadoop fs [generic options] -touchz <path> ...                 Creates an empty file in an HDFS directory
hdfs dfs [generic options] -getmerge [-nl] <src> <localdst>    Concatenates all files in a directory into one file
hdfs dfs -chown -R admin:hadoop /new-dir                       Changes the owner and group of the directory recursively
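A typical sequence ties these together; a sketch reusing the example names from the table above:

  hdfs dfs -put logs.csv /data/          # copy the local file into HDFS
  hdfs dfs -du -h /data/logs.csv         # confirm the upload and check its size
  hdfs dfs -setrep -w 5 /data/logs.csv   # raise replication to 5 and wait until done
  hdfs dfs -chmod 744 /data/logs.csv     # owner: rwx; group and others: read-only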
YARN

Commands                      Tasks
yarn                          Shows the yarn help
yarn [--config confdir]       Defines the configuration file to be used
yarn [--loglevel loglevel]    Defines the log level, which can be fatal, error, warn, info, debug, or trace
yarn classpath                Shows the Hadoop classpath
yarn application              Shows and kills the Hadoop applications
yarn applicationattempt       Shows the application attempts
yarn container                Shows the container information
yarn node                     Shows the node information
yarn queue                    Shows the queue information
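For instance, a running application can be listed and then killed by its ID; the application ID below is hypothetical:

  yarn application -list                                  # list applications known to the ResourceManager
  yarn application -kill application_1673772222593_0001   # kill one by its application ID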
MAPREDUCE

MapReduce Basics

MapReduce is a framework for processing parallelizable problems across huge datasets using a large number of systems, referred to as clusters. Basically, it is a processing technique and a programming model for distributed computing based on Java.
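The canonical illustration of the model is word counting: the Mapper emits a (word, 1) pair for every token, and the Reducer sums the counts per word. A minimal sketch against the org.apache.hadoop.mapreduce API:

  import java.io.IOException;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.Reducer;

  // Mapper: turns (byte offset, line) pairs into intermediate (word, 1) pairs
  public class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
      private static final IntWritable ONE = new IntWritable(1);
      private final Text word = new Text();

      @Override
      protected void map(LongWritable key, Text value, Context context)
              throws IOException, InterruptedException {
          for (String token : value.toString().split("\\s+")) {
              if (!token.isEmpty()) {
                  word.set(token);
                  context.write(word, ONE);   // emit (word, 1)
              }
          }
      }
  }

  // Reducer: sums all counts collected for each word
  class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
      @Override
      protected void reduce(Text key, Iterable<IntWritable> values, Context context)
              throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable v : values) {
              sum += v.get();
          }
          context.write(key, new IntWritable(sum));   // emit (word, total)
      }
  }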
Mahout

Apache Mahout is an open-source algebraic framework used for data mining, which works along with distributed environments and simple programming languages.

Components of MapReduce

PayLoad: The applications implement the Map and Reduce functions and form the core of the job
MRUnit: Unit test framework for MapReduce
Mapper: Maps the input key/value pairs to a set of intermediate key/value pairs
NameNode: The node that manages the HDFS is known as the NameNode
DataNode: Node where the data is presented before processing takes place
MasterNode: Node where the JobTracker runs and which accepts job requests from the clients
SlaveNode: Node where the Map and Reduce programs run
JobTracker: Schedules jobs and tracks the assigned jobs to the TaskTracker
TaskTracker: Tracks the task and updates the status to the JobTracker
Job: A program which is an execution of a Mapper and Reducer across a dataset
Task: An execution of a Mapper and Reducer on a piece of data
Task Attempt: A particular instance of an attempt to execute a task on a SlaveNode
Commands used to interact with MapReduce

Commands                                                   Tasks
hadoop job -submit <job-file>                              Used to submit the jobs created
hadoop job -status <job-id>                                Shows the map and reduce completion status and all job counters
hadoop job -counter <job-id> <group-name> <countername>    Prints the counter value
hadoop job -kill <job-id>                                  Kills the job
hadoop job -events <job-id> <fromevent-#> <#-of-events>    Shows the event details received by the JobTracker for the given range
hadoop job -history [all] <jobOutputDir>                   Prints the job details, and the killed and failed tip details
hadoop job -list [all]                                     Displays all the jobs
hadoop job -kill-task <task-id>                            Kills the task
hadoop job -fail-task <task-id>                            Fails the task
hadoop job -set-priority <job-id> <priority>               Changes and sets the priority of the job
HADOOP_HOME/bin/hadoop job -kill <JOB-ID>                  Kills the job created
HADOOP_HOME/bin/hadoop job -history <DIR-NAME>             Shows the history of the jobs
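For example, checking on and then terminating a submitted job; the job ID below is hypothetical:

  hadoop job -status job_202301151030_0001   # map/reduce completion and all counters
  hadoop job -kill job_202301151030_0001     # kill the job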
Important commands used in MapReduce

Usage: mapred [Generic commands] <parameters>

Parameters                                        Tasks
-input directory/file-name                        Inputs the location for the mapper
-output directory-name                            Shows the output location for the mapper
-mapper executable or script or JavaClassName     Used for the mapper executable
-reducer executable or script or JavaClassName    Used for the reducer executable
-file file-name                                   Makes the mapper, reducer, or combiner executable available locally on the computing nodes
-numReduceTasks                                   Specifies the number of reducers
-mapdebug                                         Script to call when a map task fails
-reducedebug                                      Script to call when a reduce task fails
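These parameters come together in a Hadoop Streaming run, where any executable can act as the mapper or reducer. A sketch; the jar path and the mapper.py/reducer.py scripts are placeholders for your own:

  hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
      -input /data/logs.csv \
      -output /data/wordcount-out \
      -mapper mapper.py \
      -reducer reducer.py \
      -file mapper.py -file reducer.py \
      -numReduceTasks 2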
FURTHERMORE:

Big Data Hadoop Certification Training Course
