
HADOOP

HADOOP BASICS
The Motivation for Hadoop
 Problems with traditional large-scale systems
 Data Storage: Literature Survey
 Data Processing: Literature Survey
 Network Constraints
 Requirements for a new approach
Hadoop: Basic Concepts
 What is Hadoop?
 The Hadoop Distributed File System
 How Hadoop MapReduce Works
 Anatomy of a Hadoop Cluster
HDFS (Hadoop Distributed File System)
 Blocks and Splits
 Input Splits
 HDFS Splits
 Data Replication
 Hadoop Rack Awareness
 Data High Availability
 Cluster architecture and block placement (see the HDFS API sketch after this list)
 Case Studies
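
The block, replication, and placement ideas above can be seen directly through the HDFS Java API. The sketch below is a minimal example, assuming the cluster address comes from the usual configuration files; the path, replication factor, and block size are illustrative only.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
      // Picks up core-site.xml / hdfs-site.xml from the classpath.
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);

      // Illustrative path and settings: 3 replicas, 128 MB blocks.
      Path file = new Path("/user/training/demo.txt");
      short replication = 3;
      long blockSize = 128L * 1024 * 1024;

      try (FSDataOutputStream out = fs.create(file, true, 4096, replication, blockSize)) {
        out.writeUTF("hello HDFS");
      }

      // Ask the NameNode how the file is actually stored.
      FileStatus status = fs.getFileStatus(file);
      System.out.println("replication = " + status.getReplication()
          + ", block size = " + status.getBlockSize());
    }
  }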

Programming Practices & Performance Tuning
 Pseudo-distributed Mode
 Fully Distributed Mode

Hadoop Development
Writing a MapReduce Program
 Examining a Sample MapReduce Program, with several examples (a WordCount sketch follows this list)
 Basic API Concepts
 The Driver Code
 The Mapper
 The Reducer
 Hadoop's Streaming API
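
The Driver, Mapper, and Reducer pieces above fit together as in this minimal WordCount sketch, written against the current org.apache.hadoop.mapreduce API; the class names and the input/output paths taken from args are illustrative.

  import java.io.IOException;
  import java.util.StringTokenizer;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.Reducer;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class WordCount {

    // Mapper: one input line in, one (word, 1) pair out per token.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
      private static final IntWritable ONE = new IntWritable(1);
      private final Text word = new Text();

      @Override
      protected void map(Object key, Text value, Context context)
          throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
          word.set(itr.nextToken());
          context.write(word, ONE);
        }
      }
    }

    // Reducer: sums the 1s emitted for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
      private final IntWritable result = new IntWritable();

      @Override
      protected void reduce(Text key, Iterable<IntWritable> values, Context context)
          throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
          sum += v.get();
        }
        result.set(sum);
        context.write(key, result);
      }
    }

    // Driver: wires the job together and submits it.
    public static void main(String[] args) throws Exception {
      Job job = Job.getInstance(new Configuration(), "word count");
      job.setJarByClass(WordCount.class);
      job.setMapperClass(TokenizerMapper.class);
      job.setCombinerClass(IntSumReducer.class);
      job.setReducerClass(IntSumReducer.class);
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(IntWritable.class);
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }
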
Performing Several Hadoop Jobs
 The configure and close Methods
 Sequence Files
 Record Reader
 Record Writer
 Role of Reporter
 Output Collector
 Counters
 Directly Accessing HDFS
 ToolRunner (see the driver sketch after this list)
 Using The Distributed Cache
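
ToolRunner and Counters, listed above, come together in a driver like the sketch below: a map-only job whose mapper drops empty lines and tallies them in a user-defined counter, while main() hands generic option parsing (-D, -files, -libjars) to ToolRunner. The class, counter, and path names are illustrative.

  import java.io.IOException;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.conf.Configured;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.NullWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Counter;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
  import org.apache.hadoop.util.Tool;
  import org.apache.hadoop.util.ToolRunner;

  public class LineFilterTool extends Configured implements Tool {

    // Map-only pass-through that counts empty input lines in a custom counter.
    public static class FilterMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        if (value.toString().trim().isEmpty()) {
          context.getCounter("Quality", "EMPTY_LINES").increment(1);
          return;
        }
        context.write(value, NullWritable.get());
      }
    }

    @Override
    public int run(String[] args) throws Exception {
      // getConf() already reflects any -D options stripped off by ToolRunner.
      Job job = Job.getInstance(getConf(), "line filter");
      job.setJarByClass(LineFilterTool.class);
      job.setMapperClass(FilterMapper.class);
      job.setNumReduceTasks(0);                     // map-only job
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(NullWritable.class);
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));

      boolean ok = job.waitForCompletion(true);
      Counter empty = job.getCounters().findCounter("Quality", "EMPTY_LINES");
      System.out.println("empty lines skipped: " + empty.getValue());
      return ok ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
      System.exit(ToolRunner.run(new Configuration(), new LineFilterTool(), args));
    }
  }
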
Several MapReduce Jobs (In Detail)
 Most Effective Search Using MapReduce
 Generating Recommendations Using MapReduce
 Processing Log Files Using MapReduce
 Identity Mapper
 Identity Reducer (see the identity-job sketch after this list)
 Exploring well-known problems using MapReduce applications
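
The classic org.apache.hadoop.mapred API ships IdentityMapper and IdentityReducer in its lib package; in the current API the base Mapper and Reducer classes already pass every record through unchanged, so an identity job only has to register them, as in this minimal sketch (paths are illustrative).

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.Reducer;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class IdentityJob {
    public static void main(String[] args) throws Exception {
      Job job = Job.getInstance(new Configuration(), "identity");
      job.setJarByClass(IdentityJob.class);
      // The base classes forward (key, value) untouched, which is exactly what the
      // classic IdentityMapper/IdentityReducer did; the shuffle still sorts by key.
      job.setMapperClass(Mapper.class);
      job.setReducerClass(Reducer.class);
      job.setOutputKeyClass(LongWritable.class);    // TextInputFormat key: byte offset
      job.setOutputValueClass(Text.class);          // TextInputFormat value: the line
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }
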
Advanced MapReduce Programming
 The Secondary Sort (see the sketch after this list)
 Customized Input Formats and Output Formats
 Joins in MapReduce
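
Secondary sort rests on three cooperating pieces: a composite key, a partitioner that looks only at the natural key, and a grouping comparator that groups reducer input by the natural key. The sketch below assumes a (Text name, IntWritable value) composite key; a driver would register the pieces with job.setMapOutputKeyClass(NameValueKey.class), job.setPartitionerClass(...) and job.setGroupingComparatorClass(...).

  import java.io.DataInput;
  import java.io.DataOutput;
  import java.io.IOException;

  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.io.WritableComparable;
  import org.apache.hadoop.io.WritableComparator;
  import org.apache.hadoop.mapreduce.Partitioner;

  // Composite key: sorts by name first, then by value, so each reducer group
  // receives its values in ascending order.
  public class NameValueKey implements WritableComparable<NameValueKey> {
    private final Text name = new Text();
    private final IntWritable value = new IntWritable();

    public void set(String n, int v) { name.set(n); value.set(v); }
    public Text getName() { return name; }

    @Override
    public void write(DataOutput out) throws IOException {
      name.write(out);
      value.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
      name.readFields(in);
      value.readFields(in);
    }

    @Override
    public int compareTo(NameValueKey other) {
      int cmp = name.compareTo(other.name);
      return cmp != 0 ? cmp : value.compareTo(other.value);
    }

    // Partitioner: only the natural key picks the reducer, so every record for
    // a given name lands on the same partition regardless of its value.
    public static class NaturalKeyPartitioner extends Partitioner<NameValueKey, IntWritable> {
      @Override
      public int getPartition(NameValueKey key, IntWritable val, int numPartitions) {
        return (key.getName().hashCode() & Integer.MAX_VALUE) % numPartitions;
      }
    }

    // Grouping comparator: records that share a name are handed to one reduce()
    // call even though their full composite keys differ.
    public static class NaturalKeyGroupingComparator extends WritableComparator {
      public NaturalKeyGroupingComparator() { super(NameValueKey.class, true); }

      @Override
      public int compare(WritableComparable a, WritableComparable b) {
        return ((NameValueKey) a).getName().compareTo(((NameValueKey) b).getName());
      }
    }
  }
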
Tuning for Performance in MapReduce
 Reducing network traffic with a Combiner (see the tuning sketch after this list)
 Partitions
 Reducing the amount of input data
 Using Compression
 Reusing the JVM
 Running with speculative execution
 Other Performance Aspects
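
Most of the topics above boil down to driver-side settings. A hedged sketch using Hadoop 2.x property names follows; the values are illustrative rather than recommendations, the Snappy codec needs the native library installed, and the JVM-reuse property only has an effect on the classic MRv1 runtime.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.io.compress.CompressionCodec;
  import org.apache.hadoop.io.compress.SnappyCodec;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Reducer;

  public class TuningExample {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // Compress intermediate map output to cut shuffle traffic and spill I/O.
      conf.setBoolean("mapreduce.map.output.compress", true);
      conf.setClass("mapreduce.map.output.compress.codec", SnappyCodec.class, CompressionCodec.class);
      // Speculative execution: rerun slow tasks elsewhere and take the first finisher.
      conf.setBoolean("mapreduce.map.speculative", true);
      conf.setBoolean("mapreduce.reduce.speculative", false);
      // JVM reuse (classic MRv1 runtime only; -1 means unlimited tasks per JVM).
      conf.setInt("mapreduce.job.jvm.numtasks", 10);

      Job job = Job.getInstance(conf, "tuned job");
      // A combiner runs reducer-style aggregation on the map side, shrinking the data
      // that crosses the network; it must be associative and commutative.
      job.setCombinerClass(Reducer.class);          // placeholder: use the job's real reducer
      // The number of reduce tasks fixes the number of output partitions.
      job.setNumReduceTasks(8);
      // ... mapper/reducer classes and input/output paths, then job.waitForCompletion(true)
    }
  }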

HADOOP ANALYST
Hive
 Hive concepts
 Hive architecture
 Install and configure Hive on a cluster
 Different types of tables in Hive (see the JDBC sketch after this list)
 Hive library functions
 Buckets
 Partitions
 File formats
 Joins in Hive
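
Hive tables, partitions, and joins are normally exercised from the hive or beeline shell; to keep the examples in Java, the sketch below goes through the HiveServer2 JDBC driver instead. The host, port, credentials, and the page_views/users tables are assumptions.

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.ResultSet;
  import java.sql.Statement;

  public class HiveJdbcExample {
    public static void main(String[] args) throws Exception {
      // HiveServer2 JDBC driver; the hive-jdbc jar must be on the classpath.
      Class.forName("org.apache.hive.jdbc.HiveDriver");
      try (Connection con = DriverManager.getConnection(
               "jdbc:hive2://localhost:10000/default", "training", "");
           Statement stmt = con.createStatement()) {

        // A managed table partitioned by date; dropping it also removes its data.
        stmt.execute("CREATE TABLE IF NOT EXISTS page_views ("
            + " user_id INT, url STRING)"
            + " PARTITIONED BY (view_date STRING)"
            + " STORED AS ORC");

        // A simple join against a users table (assumed to exist already).
        ResultSet rs = stmt.executeQuery(
            "SELECT u.name, COUNT(*) AS views"
            + " FROM page_views v JOIN users u ON v.user_id = u.id"
            + " WHERE v.view_date = '2024-01-01'"
            + " GROUP BY u.name");
        while (rs.next()) {
          System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
        }
      }
    }
  }
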
Sqoop
 Install and configure Sqoop on a cluster
 Connecting to an RDBMS
 Installing MySQL
 Import data from Oracle/MySQL into Hive (see the sketch after this list)
 Export data to Oracle/MySQL
 Internal mechanism of import/export
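
Sqoop is driven from the command line; Sqoop 1.x also exposes the same tool entry point to Java as org.apache.sqoop.Sqoop.runTool, which this sketch assumes is available. The JDBC URL, credentials, and table name are placeholders, and the flags mirror a plain "sqoop import" invocation.

  import org.apache.sqoop.Sqoop;

  public class SqoopImportExample {
    public static void main(String[] args) {
      // Equivalent to: sqoop import --connect ... --table customers --hive-import
      String[] importArgs = {
          "import",
          "--connect", "jdbc:mysql://dbhost/sales",
          "--username", "training",
          "--password", "training",
          "--table", "customers",
          "--hive-import",                // load the result straight into a Hive table
          "--num-mappers", "4"            // parallelism of the underlying MapReduce import
      };
      // Assumption: Sqoop 1.x ships this static helper that runs the CLI tool in-process.
      System.exit(Sqoop.runTool(importArgs));
    }
  }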

SPARK INTRODUCTION WITH EXAMPLES
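
A minimal word-count example against the Spark 2.x+ Java API; the local[*] master and the input path are placeholders for whatever spark-submit would supply on a real cluster.

  import java.util.Arrays;

  import org.apache.spark.api.java.JavaPairRDD;
  import org.apache.spark.api.java.JavaRDD;
  import org.apache.spark.sql.SparkSession;
  import scala.Tuple2;

  public class SparkWordCount {
    public static void main(String[] args) {
      // local[*] runs Spark inside this JVM; on a cluster the master comes from spark-submit.
      SparkSession spark = SparkSession.builder()
          .appName("spark word count")
          .master("local[*]")
          .getOrCreate();

      JavaRDD<String> lines = spark.read().textFile(args[0]).javaRDD();

      JavaPairRDD<String, Integer> counts = lines
          .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
          .mapToPair(word -> new Tuple2<>(word, 1))
          .reduceByKey((a, b) -> a + b);

      // Print a small sample of the result, then shut down cleanly.
      counts.take(20).forEach(t -> System.out.println(t._1() + "\t" + t._2()));
      spark.stop();
    }
  }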


KAFKA INTRODUCTION
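
A first Kafka example in Java: a producer that writes a handful of string records, assuming a broker at localhost:9092, a topic named events, and the kafka-clients jar on the classpath.

  import java.util.Properties;

  import org.apache.kafka.clients.producer.KafkaProducer;
  import org.apache.kafka.clients.producer.ProducerRecord;
  import org.apache.kafka.common.serialization.StringSerializer;

  public class KafkaProducerExample {
    public static void main(String[] args) {
      Properties props = new Properties();
      props.put("bootstrap.servers", "localhost:9092");          // assumed broker address
      props.put("key.serializer", StringSerializer.class.getName());
      props.put("value.serializer", StringSerializer.class.getName());
      props.put("acks", "all");                                  // wait for full acknowledgement

      try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
        for (int i = 0; i < 10; i++) {
          // Records with the same key always land in the same partition of the topic.
          producer.send(new ProducerRecord<>("events", "key-" + i, "message-" + i));
        }
        producer.flush();
      }
    }
  }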
