
Sathish Yellanki’s Spark

Mark Your Next Generation In-Memory Data Management

We Make You Start With Concepts, Take You into The Architecture, Analysis And Application Development.
Overview of Big Data and Apache Spark
Getting Introduced To Apache Spark
o Why We Need Apache Spark?
o Apache Spark in The Context of Big Data
o Apache Spark Time Line
o Making The Spark Environment Ready
Downloading Spark Locally
Understanding Spark Interactive Consoles
Starting Our First Spark Environment
Understanding The Spark Session
Getting into Apache Spark Architecture
o Basic Architecture of Apache Spark
Spark Applications
o Understanding The Apache Spark’s Language APIs
o Understanding The Spark’s APIs
o Introduction To Resilient Distributed Datasets (RDD’s)
o Introduction To Apache Spark DataFrames
Understanding The Partitions in DataFrames
o Understanding Transformations in Apache Spark
Lazy Evaluation in Spark Transformations
o Understanding The Apache Spark Actions
o Understanding The Spark UI
o Understanding The Spark SQL
Getting An Idea on Apache Spark’s Toolset
o How To Run Production Applications in Spark
o Datasets of Apache Spark And Type-Safe Structured APIs
o What is Meant By Structured Streaming in Apache Spark?
o Provisions of ML and Advanced Analytics in Spark
o Introduction of Lower-Level API’s in Apache Spark
o Getting An Idea on SparkR
o Catching An Idea on Spark’s Ecosystem and Packages
Understanding The Apache Spark DataFrames and Datasets
o What Are Schemas in Apache Spark DataFrames?
o Getting The Overview of Structured Spark Types
Difference Between DataFrames & Datasets
Understanding The Columns And Rows
Understanding The Apache Spark Types
Overview of Structured API Execution in Apache Spark
Understanding Logical Planning
Understanding Physical Planning
Understanding Execution
o Basic Structured Operations
Schemas
Columns and Expressions
Columns
Expressions
Records and Rows
Creating Rows
DataFrame Transformations
How To Create DataFrames?
Understanding Select And SelectExpr
Converting Data to Spark Types
Adding Columns into The DataFrame
How To Rename OR Alias Columns?
Spark DataFrame Reserved Characters and Keywords
Understanding The Case Sensitivity Issues
How To Remove Columns in DataFrame
Implementing Type Casting on Columns
Data Filtering on Rows
Representing Unique OR Distinct Rows
Getting Random Samples
Implementing Random Splits
Concatenating and Appending Rows OR Applying Union
Implementing Data Sorting on Rows
Implementing The Concept of Limit
Implementing The Repartition and Coalesce
Collecting Rows to the Driver
Learning How To Work With Different Types of Data
o Excavating The Required API’s
o Understanding The Concept of Converting to Spark Types
o Working With Booleans in Spark
o Working With Numbers in Spark
o Working With Strings And Regular Expressions in Spark
o Working with Dates and Timestamps in Spark
o Different Strategies For Handling Null’s in Spark
coalesce
ifnull, nullIf, nvl, and nvl2
drop
fill
replace
o Implementing Ordering Upon The Data
o Understanding Complex Types in Spark
structs
arrays
split
array Length
array_contains
explode
Maps
o Handling And Operating With JSON Format
o Designing User-Defined Functions in Spark
Executing Aggregations Using Spark
o Understanding Aggregation Functions in Spark
count
countDistinct
approx_count_distinct
first and last
min and max
sum
sumDistinct
avg
Variance and Standard Deviation
skewness and kurtosis
Covariance and Correlation
Aggregating to Complex Types
o Applying Grouping Upon The Data
Grouping With Expressions
Grouping With Maps
o Applying Window Functions Upon The Data
o Implementing Grouping Sets Upon The Data
Rollups
Cube
Grouping Metadata
Pivot
o User-Defined Aggregation Functions
Executing Joins in Spark
o Join Expressions
o Join Types
Inner Joins
Outer Joins
Left Outer Joins
Right Outer Joins
Left Semi Joins
Left Anti Joins
Natural Joins
Cross (Cartesian) Joins
o What Are The Challenges When Using Joins
Joins on Complex Types
Handling Duplicate Column Names in Joins
o How Spark Architecture Performs Joins
Understanding Communication Strategies in Spark
Working With Different Data Sources in Spark
o Understanding Structure of The Data Sources API in Spark
Data Reading API Structure
Understanding Basics of Reading Data
Data Write API Structure
Understanding Basics of Writing Data
o Working With CSV Files
CSV Options
Reading CSV Files
Writing CSV Files
o Working With JSON Files
JSON Options Available in Spark
Reading JSON Files
Writing JSON Files
o Parquet Files
Reading Parquet Files
Writing Parquet Files
o ORC Files
Reading ORC Files
Writing ORC Files
o SQL Databases
Reading from SQL Databases
Query Pushdown
Writing to SQL Databases
o Text Files
Reading Text Files
Writing Text Files
o Advanced I/O Concepts
Splittable File Types and Compression
Reading Data in Parallel
Writing Data in Parallel
Writing Complex Types
Managing File Size
Associating With Spark SQL
o What is Spark SQL?
o Big Data and SQL With Apache Hive
o Big Data and SQL With Spark SQL
Understanding Spark’s Relationship To Hive
o How to Run Spark SQL Queries
Understanding Spark SQL CLI
Spark’s Programmatic SQL Interface
Spark SQL Thrift JDBC/ODBC Server
o Understanding The Concept of Catalog in Spark SQL
o Working With Tables in Spark SQL
Spark-Managed Tables
Creating Tables
Creating External Tables
Inserting into Tables
Describing Table Metadata
Refreshing Table Metadata
Dropping Tables
Caching Tables
Working With Views in Spark
Creating Views in Spark
Dropping Views in Spark
o Working With Databases in Spark
Creating Databases in Spark
Setting the Database in Spark
Dropping Databases in Spark
o Implementing Select Statements
case…when…then Statements
o Understanding The Advanced Topics
Working With Complex Types
Implementing Functions in Spark SQL
Building Subqueries in Spark SQL
o Configurations in Spark SQL
o Setting Configuration Values in SQL
Understanding Datasets Concept in Spark
o When to Use Datasets in Spark
o How To Create Datasets in Spark
Understanding Datasets in Java: Encoders
Understanding Datasets in Scala: Case Classes
o Understanding Actions on Datasets
o Understanding Transformations on Datasets
Working With Data Filtering
Working With Mapping Concepts
o Implementing Joins on Datasets
o Implementing Grouping and Aggregations on Datasets
Working With Low-Level API’s in Spark
o Understanding Resilient Distributed Datasets (RDDs)
o What Are Low-Level API’s Available in Apache Spark?
When to Use Low-Level API’s?
How to Use Low-Level API’s?
o Understanding RDD’s in Apache Spark
What Are The Types of RDD’s
When to Use RDD’s in Apache Spark?
Applying Datasets and RDD’s of Case Classes
o How To Create RDD’s in Apache Spark?
Interoperating Between DataFrames, Datasets, and RDD’s
From a Local Collection
From Data Sources
o How To Manipulate RDD’s in Apache Spark?
o Implementing Transformations on Apache Spark RDD’s
distinct
filter
map
sort
Random Splits
o Implementing Actions on Spark RDD’s
reduce
count
first
max and min
take
o How To Save Files Using Spark RDD’s?
saveAsTextFile
SequenceFiles
Hadoop Files
o What is Caching in Spark RDD’s?
o Implementing Checkpointing Concept in Apache Spark
o Pipe RDDs to System Commands
mapPartitions
foreachPartition
glom
Advanced RDDs
o Understanding Key-Value Basics in RDD’s
keyBy
Mapping over Values
Extracting Keys and Values
lookup
sampleByKey
o Implementing Aggregations on Spark RDD’s
countByKey
Understanding Aggregation Implementations
Other Aggregation Methods
o What Are CoGroups in Apache Spark?
o Executing Joins in Apache Spark
Inner Join
zips
o Controlling Partitions in Apache Spark
coalesce
repartition
repartition And Sort Within Partitions
Custom Partitioning
o Custom Serialization Concept in Apache Spark
Understanding Distributed Shared Variables in Spark
o What Are Broadcast Variables
o Understanding Accumulators in Spark
Basic Example
Custom Accumulators
Production Applications
o Understanding How Spark Runs on a Cluster
Looking into The Architecture of a Spark Application
Understanding The Spark Execution Modes
Understanding The Life Cycle of a Spark Application (Outside Spark)
Client Request
Launch
Execution
Completion
Understanding The Life Cycle of a Spark Application (Inside Spark)
Understanding SparkSession
What Are Logical Instructions
Insight of Spark Job
Understanding Stages in Spark
Understanding The Spark Tasks
o Looking into The Execution Details of Spark
Pipelining Concepts in Apache Spark
What is Shuffle Persistence
Developing Spark Applications
o Learning The Basics of Writing Spark Applications
Understanding A Simple Scala-Based App
How To Write Python Applications
How To Write Java Applications
o Understanding And Testing Spark Applications
Introduction of Strategic Principles
What Are Tactical Takeaways
How To Connect to Unit Testing Frameworks
How To Connect to Data Sources
o Understanding The Spark Development Process
o How To Launch Spark Applications
o Configuring Spark Applications
Understanding The SparkConf
What Are Application Properties?
What Are Runtime Properties?
What Are Execution Properties?
Configuring Memory Management in Apache Spark
Configuring Shuffle Behavior of Apache Spark
Environmental Variables in Apache Spark
Job Scheduling Within an Application of Spark
Understanding Concept of Deploying Spark
o Where to Deploy Spark Cluster to Run Spark Applications
Understanding On-Premise Cluster Deployments
Understanding Spark in the Cloud
o What Are Cluster Managers in Apache Spark?
Understanding Standalone Mode
Spark on YARN
Configuring Spark on YARN Applications
Spark on MESOS
What Are Secure Deployment Configurations in Apache Spark?
What Are Cluster Networking Configurations in Apache Spark?
What is Application Scheduling in Apache Spark?
Understanding Monitoring and Debugging in Apache Spark
o The Monitoring Landscape
o What to Monitor
Driver and Executor Processes
Queries, Jobs, Stages, and Tasks
o Spark Logs
o The Spark UI
Spark REST API
Spark UI History Server
o Debugging and Spark First Aid
Spark Jobs Not Starting
Errors Before Execution
Errors During Execution
Slow Tasks or Stragglers
Slow Aggregations
Slow Joins
Slow Reads and Writes
Driver OutOfMemoryError OR Driver Unresponsive
Executor OutOfMemoryError OR Executor Unresponsive
Unexpected NULL’s in Results
No Space Left on Disk Errors
Serialization Errors
Performance Tuning
o Indirect Performance Enhancements
Design Choices
Object Serialization in RDDs
Cluster Configurations
Scheduling
Data at Rest
Shuffle Configurations
Memory Pressure and Garbage Collection
o Direct Performance Enhancements
Parallelism
Improved Filtering
Repartitioning and Coalescing
User-Defined Functions (UDFs)
Temporary Data Storage (Caching)
Joins
Aggregations
Broadcast Variables
Understanding Streaming in Apache Spark
o Understanding Stream Processing Fundamentals
o What is Meant By Stream Processing?
Stream Processing Use Cases
Advantages of Stream Processing
Challenges of Stream Processing
o Understanding Stream Processing Design Points
What is Meant By Record-at-a-Time Versus Declarative APIs
Event Time Versus Processing Time
Continuous Versus Micro-Batch Execution
o What Are Spark’s Streaming APIs
Understanding The DStream API
Understanding Structured Streaming
Understanding Structured Streaming Basics
o Fundamentals of Structured Streaming
o Catching Up With Core Concepts
Transformations and Actions
Input Sources
Sinks
Output Modes
Triggers
Event-Time Processing
o Applying Structured Streaming in Action
o Applying Transformations on Streams
Selections and Filtering
Aggregations
Joins
o Catching up Input and Output
Where Data Is Read and Written (Sources and Sinks)
Reading from the Kafka Source
Writing to the Kafka Sink
How Data Is Output (Output Modes)
When Data Is Output (Triggers)
o Streaming Dataset API
Event-Time and Stateful Processing
o Event Time
o Stateful Processing
o Arbitrary Stateful Processing
o Event-Time Basics
o Windows on Event Time
Tumbling Windows
Handling Late Data with Watermarks
o Dropping Duplicates in a Stream
o Arbitrary Stateful Processing
Time-Outs
Output Modes
mapGroupsWithState
flatMapGroupsWithState
Structured Streaming in Production
o Fault Tolerance and Checkpointing
o Updating Your Application
Updating Your Streaming Application Code
Updating Your Spark Version
Sizing and Rescaling Your Application
o Metrics and Monitoring
Query Status
Recent Progress
Spark UI
o Alerting
o Advanced Monitoring with the Streaming Listener

Contact For Details And Counseling


Plot No 47, 3rd Floor (Top Floor),
Gayathri Nagar,
Behind Maithrivanam,
SAP Street, Ameerpet, Hyderabad
Phones : 040-6464 0047, 9985798869,
Whatsapp For Enquiry : 9985798869

Request For Demo Video by Whatsapp Message


“Please Share Demo Video – BigData <yourmail-ID>”
