Professional Documents
Culture Documents
L13-1
Learning Goals
L13-2
How Do You Know There's a Problem?
• Alarm raised
• Performance issues
• Jobs failing
L13-3
Locate Troubleshooting Resources
• Log files
• MapR documentation
– maprdocs.mapr.com
L13-4
Default Log File Locations
L13-5
Set Log File Retention
YARN
To change, set:
yarn.nodemanager.log.retain-seconds
in:
yarn-site.xml
/opt/mapr/logs
To retain indefinitely set:
log.retention.exceptions
in:
warden.conf
L13-6
Problem: Services Not Running
• Typically corresponds to a raised alarm
L13-7
Strategy for Troubleshooting Service Problems
1. Open two terminal windows
2. Window 1:
$ service mapr-warden stop
3. Window 2:
$ tail –f /opt/mapr/logs/warden.log
4. Window 1:
$ service mapr-warden start
L13-8
Problem: Cluster Startup Issues
1. ZooKeepers up?
$ service mapr-zookeeper qstatus
L13-9
Problem: Data Not Available
• Is affected volume node-specific?
– /var/mapr/local/<node name>
– Is node offline?
• Permission problems
L13-10
Problem: Jobs Not Running
• Start with the YARN job logs
– Command line
– ResourceManager page (<IP address>:19888)
• Permissions problems
• Resource problems
L13-11
Problem: Degraded Performance
• Check /opt/mapr/logs/mfs.log-3
– Failures communicating with the CLDB nodes or other servers?
L13-12
Learning Goals
L13-13
Core Files
• Located in /opt/cores
– Name includes service involved
L13-14
Caution:
fsck
Using fsck –r
• Find/fix file system inconsistencies can result in loss
of data. Contact
– /opt/mapr/server/fsck
support first!
• Verification or repair
– Use -r for repair
L13-15
gfsck
• Find/fix volume inconsistencies
– /opt/mapr/bin/gfsck
• Typical process
– Take storage pool offline
– Run fsck on storage pool
– Bring storage pool back online
– Run gfsck on affected cluster/volumes/snapshots
L13-16
mapr-support-dump.sh
• Collects information/logs from a single node
$ /opt/mapr/support/tools/mapr-support-dump.sh
• Output at /opt/mapr/support/dump
– Creates a .tar file
L13-17
mapr-support-collect.sh
• Collects information/logs from entire cluster
$ /opt/mapr/support/tools/mapr-support-collect.sh
• Output at /opt/mapr/support/collect
– Uses SSH and SCP
– Creates a .tar file
L13-18
Congratulations!
L13-19