Program Highlights :
The Data Lake Analytics Program is developed by experienced, proven professionals from the core big data industry. The program offers three major streams; a candidate chooses one certification track, and our experienced big data professionals tailor the training to the needs of that track and help the candidate become part of the big data industry. The program includes extensive foundation training on Red Hat Linux, Apache Hadoop, and Apache Spark, while the advanced training includes hands-on work with the Hortonworks distribution of Apache Hadoop (Hortonworks Data Platform) and a data science Spark lab with the Zeppelin notebook and Scala IDE. Our experienced trainers help you perform the essential tasks set out in the exam objectives of your chosen certification track and earn its recognition badge.
Data Lake Analytics Program training can help engineering and IT graduates increase their employability in the market through niche technologies that are in demand in the big data industry. The program is also valuable for experienced IT professionals such as Linux administrators, BI developers, and data analysts who want to upgrade their skills with next-generation technologies and take up new roles such as Hadoop Administrator, Big Data/Hadoop Architect, Data Engineer, and Data Scientist.
Nasscom, the IT industry’s trade association, decided to clear the air and set the record straight:
“The big challenge for IT companies, however, will be to re-engineer its 3.9 million-strong human resource base to meet the demands of a fast-transforming marketplace. Not only is technology changing rapidly, with automation and big data making deep inroads, the demands of industry’s global clientele have also evolved.”
Data Lake Analytics Program training is delivered in two modes: candidates can opt for either physical classroom training or virtual classroom training delivered through audio conferencing and desktop sharing.
The training kicks off on 15th and 22nd April 2018 with introductory sessions, after which the training calendar will be distributed to enrolled candidates.
Additional Services:
Industry-based projects and case studies, job assistance (referrals), resume building, and professional grooming and mock interviews for freshers
Training Duration: 2-3 months (weekends: Saturday and Sunday)
Training Location:
Shop 136, Boulevard Mall, Mumbai Agra Road, Thane West, Thane, Maharashtra
For any queries on the training program and fees, email us at datalakeacademy@gmail.com.
*Google Cloud Platform, offered by Google, is a suite of cloud computing services that runs on the same
infrastructure that Google uses internally for its end-user products, such as Google Search and YouTube.
*Red Hat® Enterprise Linux® gives you the tools you need to modernize your infrastructure, boost efficiency through
standardization and virtualisation, and ultimately prepare your datacenter for an open, hybrid cloud IT architecture.
*Scala IDE provides advanced editing and debugging support for the development of pure Scala and mixed Scala-
Java applications.
*Apache Spark™ is a fast and general engine for large-scale data processing.
*Apache Zeppelin is a web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala, and more.
*Ansible delivers simple IT automation that ends repetitive tasks and frees up DevOps teams for more strategic work.
FORMAT
50% Lecture
50% Hands-on Labs
AGENDA SUMMARY
Week 1 : Introduction to Linux, Big Data, Apache Hadoop
Day 1 :
OBJECTIVES \ LECTURES :
1) Introduction to Big Data, Apache Hadoop
2) Introduction to Linux
3) Linux Boot Process and Architecture
4) Linux Commands, Shell Scripts , Cron Utility
5) RHEL Linux OS Best Practices for Hadoop
6) Virtualization (VMware, VirtualBox)
7) Quiz & Q&A
LAB :
1) Installation of Linux OS
2) Practising Linux OS Commands
3) Configuring RHEL Linux OS Best Practices for Hadoop
4) Read and Execute Shell Script and Schedule through Cron Utility
Day 2 :
OBJECTIVES \ LECTURES :
1) Design of HDFS
HDFS Concepts
Blocks
NameNodes and DataNodes
HDFS Federation
HDFS High Availability
2) Manage HDFS using Command-line Tools
3) Discussing Hadoop Cluster Installation Options
4) Understanding Hadoop Configuration Files
5) Quiz & Q&A
LAB :
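As an illustrative sketch only (not official lab material; it assumes an HDP sandbox whose NameNode listens on the placeholder address below), the same operations covered by the HDFS command-line tools can be driven from the Hadoop FileSystem API in Scala:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsLabSketch {
  def main(args: Array[String]): Unit = {
    // Point the client at the cluster; "localhost:8020" is a placeholder NameNode address.
    val conf = new Configuration()
    conf.set("fs.defaultFS", "hdfs://localhost:8020")
    val fs = FileSystem.get(conf)

    // Equivalent of: hdfs dfs -mkdir -p /user/train/lab1
    val dir = new Path("/user/train/lab1")
    fs.mkdirs(dir)

    // Equivalent of: hdfs dfs -put localfile.txt /user/train/lab1/
    fs.copyFromLocalFile(new Path("localfile.txt"), dir)

    // Equivalent of: hdfs dfs -ls /user/train/lab1, plus the block locations,
    // which makes the Blocks and replication concepts from the lecture visible.
    for (status <- fs.listStatus(dir)) {
      println(s"${status.getPath} len=${status.getLen} replication=${status.getReplication}")
      for (block <- fs.getFileBlockLocations(status, 0, status.getLen))
        println(s"  block on: ${block.getHosts.mkString(", ")}")
    }
    fs.close()
  }
}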
Week 2 :
Day 3 :
OBJECTIVES \ LECTURES :
1) MR1 vs YARN (MR2)
2) YARN Architecture: Anatomy of a YARN Application Run
3) Scheduling in YARN and Scheduler Options
4) Capacity Scheduler Configuration
5) Fair Scheduler Configuration
6) Preemption and Delay Scheduling
7) Dominant Resource Fairness (DRF) Configuration
8) Quiz & Q&A
LAB :
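As a hedged illustration of the scheduler topics above (not official lab material), the YarnClient API can list the queues and capacities that the Capacity Scheduler and Fair Scheduler lectures configure; it assumes a reachable ResourceManager described by a yarn-site.xml on the classpath:

import org.apache.hadoop.yarn.client.api.YarnClient
import org.apache.hadoop.yarn.conf.YarnConfiguration
import scala.jdk.CollectionConverters._

object YarnQueueSketch {
  def main(args: Array[String]): Unit = {
    // Reads the ResourceManager address from yarn-site.xml on the classpath.
    val client = YarnClient.createYarnClient()
    client.init(new YarnConfiguration())
    client.start()

    // Print every queue with its configured capacity -- the values that
    // capacity-scheduler.xml sets via yarn.scheduler.capacity.root.<queue>.capacity.
    for (queue <- client.getAllQueues.asScala)
      println(f"queue=${queue.getQueueName}%-20s capacity=${queue.getCapacity}%.2f")

    client.stop()
  }
}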
Day 4 :
OBJECTIVES \ LECTURES :
1) Introduction to HDP and Its Architecture
2) Understanding a Typical Production Cluster Specification: NameNodes, DataNodes, Edge Nodes, Management/Utility Nodes, and Hardware Requirements
3) Understanding the Network Architecture/Topology of a Typical Production Hadoop Cluster
4) Understanding the Role of ZooKeeper and Journal Nodes
LAB :
Week 3 :
Day 5 :
OBJECTIVES \ LECTURES :
1) Recap of Key Concepts of HDFS and YARN
2) Quiz & Q&A
3) Discussing Best Practices for HDP Cluster Deployment
4) Case Study of a Typical Production Cluster Big Data Architecture
5) Understanding Hive and Spark Architecture
6) Manage HDFS using Ambari Web, NameNode and DataNode UIs
7) Summarize the Purpose and Benefits of Rack Awareness
LAB :
Day 6:
LAB :
Week 4:
Day 7 : HIGH AVAILABILITY WITH HDP, DEPLOYING HDP WITH BLUEPRINTS, AND THE HDP
UPGRADE PROCESS
OBJECTIVES \ LECTURES :
Recap of Week 3 - Quiz & Q&A
Summarize the Purpose of NameNode HA
Configure NameNode HA Using Ambari
Describe the Features and Benefits of the Apache Ambari Dashboard
Ambari Views and Blueprints
LAB :
Configuring NameNode HA
Configuring Resource Manager HA
Configuring Ambari Alerts
Day 8:
OBJECTIVES \ LECTURES :
Recall the Types and Methods of Upgrades Available in HDP
Describe the Upgrade Process, Restrictions and Pre-upgrade Checklist
Perform an Upgrade Using the Apache Ambari Web UI
LAB :
Week 5 :
Day 9 : Ranger
OBJECTIVES \ LECTURES :
Authentication and Authorization
Ambari User Management
Ranger Architecture
Atlas Architecture
Hue Architecture
Case Study of Typical User Management in an Enterprise Cluster through AD (Kerberos)
SmartSense Usage
LAB :
Ranger Installation
Creating Ranger Policies (HDFS, Hive, etc.)
Day 10 :
HDPCA Certification Tasks - Part 1
Week 6:
Day 11 :
Recap of Week 5
HDPCA Certification Tasks - Part 2
Day 12:
Project Assignment
Day 13
Q&A
Hadoop Developer Certification Course
FORMAT
50% Lecture
50% Hands-on Labs
AGENDA SUMMARY
Week 1 : Data Ingestion
Day 1 :
OBJECTIVES \ LECTURES :
1) Introduction to Hive
2) Hive Architecture
3) Flume Architecture
LAB :
Import data from a table in a relational database into HDFS
Import the results of a query from a relational database into HDFS
Import a table from a relational database into a new or existing Hive table
Insert or update data from HDFS into a table in a relational database
Given a Flume configuration file, start a Flume agent
Given a configured sink and source, configure a Flume memory channel with a specified capacity
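As a hedged sketch of the first import task above (the JDBC URL, credentials, and table name are placeholders; the lab tasks are typically done with the equivalent sqoop import command line), Sqoop's Java entry point can drive the same import from Scala:

import org.apache.sqoop.Sqoop

object SqoopImportSketch {
  def main(args: Array[String]): Unit = {
    // Programmatic equivalent of:
    //   sqoop import --connect jdbc:mysql://dbhost/shop --username train \
    //     --password train --table orders --target-dir /user/train/orders
    val exitCode = Sqoop.runTool(Array(
      "import",
      "--connect", "jdbc:mysql://dbhost/shop", // placeholder JDBC URL
      "--username", "train",                   // placeholder credentials
      "--password", "train",
      "--table", "orders",                     // placeholder source table
      "--target-dir", "/user/train/orders"
    ))
    sys.exit(exitCode)
  }
}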
Day 2 :
OBJECTIVES \ LECTURES :
1) Introduction to Pig
2) Transformations Available in Pig
LAB :
Write and execute a Pig script
Load data into a Pig relation without a schema
Load data into a Pig relation with a schema
Load data from a Hive table into a Pig relation
Use Pig to transform data into a specified format
Transform data to match a given Hive schema
Group the data of one or more Pig relations
Use Pig to remove records with null values from a relation
Store the data from a Pig relation into a folder in HDFS
Store the data from a Pig relation into a Hive table
Sort the output of a Pig relation
Remove the duplicate tuples of a Pig relation
Specify the number of reduce tasks for a Pig MapReduce job
Join two datasets using Pig
Perform a replicated join using Pig
Run a Pig job using Tez
Within a Pig script, register a JAR file of User Defined Functions
Within a Pig script, define an alias for a User Defined Function
Within a Pig script, invoke a User Defined Function
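Several of the Pig tasks above can be seen in one place in the following hedged sketch (the file name, field names, and output folder are invented for the example); it embeds Pig Latin via the PigServer API and runs in local mode so it stays self-contained:

import org.apache.pig.{ExecType, PigServer}

object PigLabSketch {
  def main(args: Array[String]): Unit = {
    // Local mode keeps the sketch self-contained; use ExecType.MAPREDUCE on a cluster.
    val pig = new PigServer(ExecType.LOCAL)

    // Load into a relation with a schema, remove records with null values,
    // de-duplicate, sort, and store into a folder -- several lab tasks in one pipeline.
    pig.registerQuery("users = LOAD 'users.txt' USING PigStorage(',') " +
      "AS (name:chararray, age:int);")
    pig.registerQuery("clean = FILTER users BY age IS NOT NULL;")
    pig.registerQuery("uniq  = DISTINCT clean;")
    pig.registerQuery("srtd  = ORDER uniq BY age DESC;")
    pig.store("srtd", "output_dir")

    pig.shutdown()
  }
}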
Home Assignment: Data Ingestion and Transformation using Sqoop, Flume, and Pig
Week 2 : Hive SQL
Day 3 :
OBJECTIVES \ LECTURES :
1) DDL (create/drop/alter/truncate/show/describe), Statistics (analyze), Indexes, Archiving
2) DML (load/insert/update/delete/merge, import/export, explain plan)
3) File Formats and Compression: RCFile, Avro, ORC, Parquet; LZO Compression
4) Hive Configuration Properties
5) Hive Clients (JDBC, ODBC, Thrift)
6) HiveServer2: Overview, HiveServer2 Client and Beeline, Hive Metrics
Reference: https://cwiki.apache.org/confluence/display/Hive/Home
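To make the HiveServer2 client topic concrete, here is a hedged Scala sketch of a JDBC session (the host, the default port 10000, the user, and the query are placeholders; the hive-jdbc driver must be on the classpath). Beeline talks to HiveServer2 over this same interface:

import java.sql.DriverManager

object HiveJdbcSketch {
  def main(args: Array[String]): Unit = {
    // HiveServer2's JDBC endpoint; 10000 is the default port, the host is a placeholder.
    val conn = DriverManager.getConnection(
      "jdbc:hive2://localhost:10000/default", "train", "")
    val stmt = conn.createStatement()

    // Any of the DDL/DML statements from the lecture can be issued the same way.
    val rs = stmt.executeQuery("SHOW TABLES")
    while (rs.next()) println(rs.getString(1))

    rs.close(); stmt.close(); conn.close()
  }
}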
Spark Certification Course
FORMAT
50% Lecture
50% Hands-on Labs
AGENDA SUMMARY
Week 1 : Introduction to Spark
Day 1 :
OBJECTIVES \ LECTURES :
1) Introduction to Spark
2) Benefits of Spark over MapReduce
3) Spark Architecture
4) Spark IDEs Overview (Scala IDE, IntelliJ, Maven)
LAB :
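As a minimal first-session sketch (the input file path is a placeholder), the classic word count shows the Scala API that the IDEs above are set up for:

import org.apache.spark.sql.SparkSession

object WordCountSketch {
  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside the IDE; on a cluster the master is set by spark-submit.
    val spark = SparkSession.builder()
      .appName("WordCountSketch")
      .master("local[*]")
      .getOrCreate()

    // Classic word count over a placeholder input file.
    val counts = spark.sparkContext.textFile("input.txt")
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.take(10).foreach(println)
    spark.stop()
  }
}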
Day 2 :
OBJECTIVES \ LECTURES :
1) Spark SQL architecture
2) Spark Streaming Architecture
3) Data Visualisation using Zeppelin
LAB :
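Again as a hedged sketch (the JSON file and its columns are invented for the example), Spark SQL registers a DataFrame as a temporary view and queries it with plain SQL; in Zeppelin the same statements run in a notebook paragraph with built-in charting on top:

import org.apache.spark.sql.SparkSession

object SparkSqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SparkSqlSketch")
      .master("local[*]")
      .getOrCreate()

    // Load a placeholder JSON file and expose it to SQL as a view.
    val sales = spark.read.json("sales.json")
    sales.createOrReplaceTempView("sales")

    // Plain SQL over the view; Zeppelin renders the result as a table or chart.
    spark.sql(
      "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY total DESC"
    ).show()

    spark.stop()
  }
}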
Candidates can opt for any of the course offerings below. Our recommendation is the Big Data Architect / SME program, which offers the most value, builds a strong career with leading big data organisations and startups, and is a real investment in your career.
HDPCA - This entry-level program helps you achieve the Hortonworks administration certification badge.
HDPCA + Ansible - This program caters to the industry's need for automation alongside Hadoop, and therefore carries more value in the market and better incentives.
HDP Developer Spark - This program covers Apache Hadoop and HDP basics along with core Spark developer tools, using Scala IDE and Zeppelin, plus RStudio for visualisation and statistical reporting.
Big Data Architect / SME Course - This unique program meets the needs of the Big Data Architect or SME role. After completing this course, you will be able to help your clients build end-to-end solutions on the Hadoop platform, including a data ingestion framework for data at rest and data in motion; it also covers the latest Spark developer tools for building real-time use cases and stunning visualisations. This course is recommended for experienced hires who want to move into a big data architect role, and for new graduates who want the overall big picture of building end-to-end solutions, which can open up big opportunities in start-up firms.