
Hadoop Troubleshooting
Following are some pitfalls and bugs that we have run into while running Hadoop. If you have a problem that isn't listed here, please let the TA know so that
we can help you out and share the solution with the rest of the class.


General Advice
If you are having problems, check the logs in the logs directory to see if there are any Hadoop errors or Java Exceptions.

Log files are named after the machine and the role it carries out in the cluster, which can help you figure out which part of your configuration is giving you trouble.
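For example, a quick way to scan every daemon's log for trouble (a rough sketch; it assumes the default logs directory under your Hadoop install and the usual hadoop-<username>-<daemon>-<hostname>.log naming):

# Look for Java exceptions and errors across all daemon logs
grep -i -E 'exception|error|fatal' logs/*.log

# Follow one daemon's log (e.g. the DataNode on this machine) while you retry the failing command
tail -f logs/hadoop-$USER-datanode-`hostname`.log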

Even if you were very careful, the problem is probably with your configuration. Try running the grep example from the QuickStart; if it doesn't run, you need
to check your configuration.
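For reference, the QuickStart test looks roughly like this (a sketch; the jar name below matches the 0.15.x examples jar used elsewhere on this page, so adjust it for your version):

# Copy the conf directory into HDFS as the job's input
bin/hadoop dfs -put conf input

# Run the grep example and look at its output
bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
bin/hadoop dfs -cat output/*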

If you can't get it to work on a real cluster, try it on a single node first.

Sometimes it just takes some time and sweat to make complex systems run, but it never hurts to ask for help, so please ask the TA and your fellow
students ASAP if you are having trouble making Hadoop run.

Symptoms and Possible Solutions


Symptom: You get an error that your cluster is in "safe mode".

Possible Problem: Your cluster enters safe mode when it hasn't been able to verify that all the data nodes necessary to replicate your data are up and responding. Check the documentation to learn more about safe mode.

Possible Solution:

1. First, wait a minute or two and then retry your command. If you just started your cluster, it's possible that it isn't fully initialized yet.

2. If waiting a few minutes didn't help and you still get a "safe mode" error, check your logs to see if any of your data nodes didn't start correctly (either they have Java exceptions in their logs or they have messages stating that they are unable to contact some other node in your cluster). If this is the case you need to resolve the configuration issue (or possibly pick some new nodes) before you can continue.
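If you want to check or clear safe mode from the command line, something like this should work (a sketch; the dfsadmin -safemode options are assumed to be available in your Hadoop version):

# Ask the NameNode whether it is in safe mode
bin/hadoop dfsadmin -safemode get

# Block until the NameNode leaves safe mode on its own
bin/hadoop dfsadmin -safemode wait

# Force it out of safe mode (only if you are sure the DataNodes are healthy)
bin/hadoop dfsadmin -safemode leave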
Symptom: You get a NoRouteToHostException in your logs or in stderr output from a command.

Possible Problem: One of your nodes cannot be reached correctly. This may be a firewall issue, so you should report it to me.

Possible Solution: The only workaround is to pick a new node to replace the unreachable one. Currently, I think that creusa is unreachable, but all other Linux boxes should be okay. None of the Macs will currently work in a cluster.
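Before picking a replacement, it can help to confirm which node is actually unreachable (a sketch; badnode is a placeholder for the hostname that appears in the exception):

# Can we reach the machine at all?
ping -c 3 badnode

# Can we log in, and does it report the name we expect?
ssh badnode hostname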

Symptom: You get an error that "remote host identification has changed" when you try to ssh to localhost.

Possible Problem: You have moved your single-node cluster from one machine in the Berry Patch to another. The name localhost thus is pointing to a new machine, and your ssh client thinks that it might be a man-in-the-middle attack.

Possible Solution: You can ask your login to skip checking the validity of localhost. You do this by setting NoHostAuthenticationForLocalhost to yes in ~/.ssh/config. You can accomplish this with the following command:

echo "NoHostAuthenticationForLocalhost yes" >>~/.ssh/config
Symptom: Your DataNode is started and you can create directories with bin/hadoop dfs -mkdir, but you get an error message when you try to put files into the HDFS (e.g., when you run a command like bin/hadoop dfs -put).

Possible Problem: Creating directories is only a function of the NameNode, so your DataNode is not exercised until you actually want to put some bytes into a file. If you are sure that the DataNode is started, then it could be that your DataNodes are out of disk space.

Possible Solution: Go to the HDFS info web page (open your web browser and go to http://namenode:dfs_info_port where namenode is the hostname of your NameNode and dfs_info_port is the port you chose for dfs.info.port; if you followed the QuickStart on your personal computer then this URL will be http://localhost:50070). Once at that page, click on the number where it tells you how many DataNodes you have to look at a list of the DataNodes in your cluster.

If it says you have used 100% of your space, then you need to free up room on the local disk(s) of the DataNode(s).

If you are on Windows then this number will not be accurate (there is some kind of bug either in Cygwin's df.exe or in Windows). Just free up some more space and you should be okay. On one Windows machine we tried, the disk had 1GB free but Hadoop reported that it was 100% full. Then we freed up another 1GB and it said that the disk was 99.15% full and started writing data into the HDFS again. We encountered this bug on Windows XP SP2.
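You can also check DataNode capacity from the command line instead of the web page (a sketch, assuming your Hadoop version supports dfsadmin -report):

# Prints total, used, and remaining DFS capacity, plus per-DataNode details
bin/hadoop dfsadmin -report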
Symptom: You try to run the grep example from the QuickStart but you get an error message like this:

java.io.IOException: Not a file:
hdfs://localhost:9000/user/ross/input/conf

Possible Problem: You may have created a directory inside the input directory in the HDFS. For example, this might happen if you run bin/hadoop dfs -put conf input twice in a row (this would create a subdirectory in input... why?).

Possible Solution: The easiest way to get the example running is to just start over and make the input anew:

bin/hadoop dfs -rmr input
bin/hadoop dfs -put conf input
Symptom: Your DataNodes won't start, and you see something like this in logs/*datanode*:

Incompatible namespaceIDs in /tmp/hadoop-ross/dfs/data

Possible Problem: Your Hadoop namespaceID became corrupted. Unfortunately the easiest thing to do is to reformat the HDFS.

Possible Solution: You need to do something like this:

bin/stop-all.sh
rm -Rf /tmp/hadoop-your-username/*
bin/hadoop namenode -format

Be VERY careful with rm -Rf.


Symptom: When you try the grep example in the QuickStart, you get an error like the following:

org.apache.hadoop.mapred.InvalidInputException:
Input path doesnt exist : /user/ross/input

Possible Problem: You haven't created an input directory containing one or more text files.

Possible Solution:

bin/hadoop dfs -put conf input
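To see what the job will actually find, you can list your HDFS home directory first (a sketch using the same dfs commands as elsewhere on this page):

# Show your HDFS home directory (e.g. /user/ross)
bin/hadoop dfs -ls

# Show what is inside input, if it exists
bin/hadoop dfs -ls input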
Symptom: When you try the grep example in the QuickStart, you get an error like the following:

org.apache.hadoop.mapred.FileAlreadyExistsException:
Output directory /user/ross/output already exists

Possible Problem: You might have already run the example once, creating an output directory. Hadoop doesn't like to overwrite files.

Possible Solution: Remove the output directory before rerunning the example:

bin/hadoop dfs -rmr output

Alternatively you can change the output directory of the grep example, something like this:

bin/hadoop jar hadoop-*-examples.jar \
  grep input output2 'dfs[a-z.]+'
Symptom: You can run Hadoop jobs written in Java (like the grep example), but your HadoopStreaming jobs (such as the Python example that fetches web page titles) won't work.

Possible Problem: You might have given only a relative path to the mapper and reducer programs. The tutorial originally just specified relative paths, but absolute paths are required if you are running in a real cluster.

Possible Solution: Use absolute paths like this from the tutorial:

bin/hadoop jar contrib/hadoop-0.15.2-streaming.jar \
  -mapper $HOME/proj/hadoop/multifetch.py \
  -reducer $HOME/proj/hadoop/reducer.py \
  -input urls/* \
  -output titles
