
1. Unable to locate completed jobs in the History Server (HS)? Why do they point to old month directories?

All completed jobs are maintained in the HS cache. If the cache is full, the HS cannot load new jobs/applications into it.

Solution:

- Stop the History Server
- Delete or move old job directories
- Restart the HS to load new jobs into the cache
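A minimal sketch of that sequence on a MapR cluster; the node name, archive path, and done-dir location below are assumptions, so check mapreduce.jobhistory.done-dir for your actual path:

# Stop the History Server on its node (node name is a placeholder)
maprcli node services -name historyserver -action stop -nodes <hs-node>

# Move an old month directory out of the done dir (path is illustrative)
hadoop fs -mv /var/mapr/cluster/yarn/rm/staging/history/done/2015/01 /history-archive/2015/01

# Start the HS again so it rescans and caches the current jobs
maprcli node services -name historyserver -action start -nodes <hs-node>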

2. Too many open files in the hive log.

Count how many files the mapr user has open, and check the configured limit:

sudo lsof | grep mapr | wc -l
grep -i mapr /etc/security/limits.conf

If the open-file count has hit the limit, restarting HiveServer2 releases the leaked descriptors:


sudo /etc/init.d/hiveserver2 status
sudo /etc/init.d/hiveserver2 stop
sudo /etc/init.d/hiveserver2 start
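If the problem keeps recurring, the restart only masks a limit that is too low. A hedged example of raising the open-file limit for the mapr user (the value 65536 is illustrative; PAM applies limits.conf on new login sessions, so the service must be restarted from a fresh session afterwards):

# Raise the nofile limit (soft and hard) for the mapr user
echo "mapr - nofile 65536" | sudo tee -a /etc/security/limits.conf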

3. Why do jobs fail in the COMMIT stage with a "COMMIT_SUCCESS file exists" exception?

This appears to be an issue with speculative execution, where duplicate attempts of the same task each try to create the COMMIT_SUCCESS file.

Solution:
Rerun the job with the below properties:
mapreduce.map.speculative=false
mapreduce.reduce.speculative=false
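For example, passed as -D options on the rerun (the jar, driver class, and paths are placeholders, and this assumes the driver parses generic options via ToolRunner):

hadoop jar myjob.jar com.example.MyDriver \
    -Dmapreduce.map.speculative=false \
    -Dmapreduce.reduce.speculative=false \
    /input /output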

4. How to troubleshoot the "GC overhead limit exceeded" issue at the Reducer phase?

Tasks may fail if they don't have enough memory to store their input data:

2015-11-04 13:56:49,465 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded

The general fix is to increase the number of Reducers. At times, however, even after increasing the number of Reducers, the data may be skewed toward only a few of them.

For example:

If there are 10000 map output keys in total and 5 Reducers process this data, and the keys are distributed as unevenly as below, the 1st Reducer takes almost the complete load and fails with out-of-memory.

REDUCE_INPUT_RECORDS for Reducer R1 = 9000
REDUCE_INPUT_RECORDS for Reducer R2 = 100
REDUCE_INPUT_RECORDS for Reducer R3 = 600
REDUCE_INPUT_RECORDS for Reducer R4 = 200
REDUCE_INPUT_RECORDS for Reducer R5 = 100

Solution:

Try increasing the Reducer memory and Java opts properties for that particular job only, as shown below. This launches each Reducer container with 6GB of memory, and the tasks should succeed without memory issues.

-Dmapreduce.reduce.memory.mb=6144
-Dmapreduce.reduce.java.opts=-Xmx4915m
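If the job is launched from Hive, the same properties can be set for just that session (a sketch; values as above):

set mapreduce.reduce.memory.mb=6144;
set mapreduce.reduce.java.opts=-Xmx4915m;

Note the heap (-Xmx4915m) is kept at roughly 80% of the 6144MB container so that non-heap overhead does not trigger container kills.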

5. Time difference in job execution due to disk latency?

At times, tasks take longer to complete because of disk latency on the nodes they run on.
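A quick way to check a suspect node (assuming the sysstat package is installed):

# Extended per-device stats, 5-second intervals, 3 samples;
# consistently high await values point to a slow disk
iostat -dx 5 3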

6. Hive CLI is not showing the hive prompt and just hangs?

Possible checks:

Check the below file for any startup errors:

/tmp/<username>/hive.log

Run hive in debug mode to see the errors:

hive -hiveconf hive.root.logger=DEBUG,console

Check for any defunct (zombie) processes that have been around for a long time. A zombie cannot be killed directly; kill its parent process instead, as in the sketch after the command below.

ps -aef | grep -i defunct
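A sketch for locating zombies together with their parent PIDs (killing the parent lets init reap the zombie; the PID below is a placeholder):

# List zombie processes (stat column starts with Z) with their parent PIDs
ps -eo pid,ppid,stat,cmd | awk '$3 ~ /^Z/'

# Kill the parent to reap the zombie
sudo kill -9 <parent-pid>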

7. No LoginModules Exception?

Exception in thread "main" java.io.IOException: failure to login: No LoginModules configured for hadoop_simple
        at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:724)
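This usually means the JVM could not find a JAAS login entry named hadoop_simple; on MapR that entry normally lives in /opt/mapr/conf/mapr.login.conf. A hedged check, using the standard JAAS system property for the last step:

# Verify the JAAS config exists and contains the entry
ls -l /opt/mapr/conf/mapr.login.conf
grep hadoop_simple /opt/mapr/conf/mapr.login.conf

# If a standalone JVM bypasses the Hadoop scripts, point it at the config explicitly
export HADOOP_OPTS="$HADOOP_OPTS -Djava.security.auth.login.config=/opt/mapr/conf/mapr.login.conf"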

8. How to enable the verbose property to know which jars are picked up from where?

Add the below property to mapred-site.xml:

<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx512M -verbose:class</value>
</property>

On the Spark side, set the below property in the CLI:

export SPARK_SUBMIT_OPTS=-verbose:class

And add the below properties to /opt/mapr/spark/spark-1.6.1/conf/spark-defaults.conf:

spark.driver.userClassPathFirst=true
spark.executor.userClassPathFirst=true
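A usage sketch (the application jar and class are placeholders). -verbose:class makes the JVM log every class it loads, so the driver output or container logs can be grepped for the jar a class came from:

export SPARK_SUBMIT_OPTS=-verbose:class
/opt/mapr/spark/spark-1.6.1/bin/spark-submit --class com.example.MyApp myapp.jar

# Class-load lines look like: [Loaded com.example.Foo from file:/path/some.jar]
grep "Loaded com.example" driver.log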

9. RM process growing beyond its Xmx value?

A bug in the ResourceManager, fixed by MapR.

10. RM not utilizing resources even when resources are available?

A bug in the ResourceManager, fixed by MapR.

