Spark: Your Next-Generation In-Memory Data Management
We Start You With The Concepts, Then Take You Into The Architecture, Analysis, And Application Development.

Overview of Big Data and Apache Spark

Getting Introduced To Apache Spark
o Why Do We Need Apache Spark?
o Apache Spark in The Context of Big Data
o Apache Spark Timeline
o Making The Spark Environment Ready
  - Downloading Spark Locally
  - Understanding Spark Interactive Consoles
  - Starting Our First Spark Environment
  - Understanding The Spark Session

Getting Into Apache Spark Architecture
o Basic Architecture of Apache Spark
  - Spark Applications
o Understanding Apache Spark's Language APIs
o Understanding Spark's APIs
o Introduction To Resilient Distributed Datasets (RDDs)
o Introduction To Apache Spark DataFrames
  - Understanding Partitions in DataFrames
o Understanding Transformations in Apache Spark
  - Lazy Evaluation in Spark Transformations
o Understanding Apache Spark Actions
o Understanding The Spark UI
o Understanding Spark SQL

Getting An Idea of Apache Spark's Toolset
o How To Run Production Applications in Spark
o Datasets of Apache Spark And Type-Safe Structured APIs
o What is Meant By Structured Streaming in Apache Spark?
o Provisions of ML and Advanced Analytics in Spark
o Introduction To Lower-Level APIs in Apache Spark
o Getting An Idea of SparkR
o Catching An Idea of Spark's Ecosystem and Packages

Understanding Apache DataFrames and Datasets
o What Are Schemas in Apache DataFrames?
o Getting An Overview of Structured Spark Types
  - Difference Between DataFrames & Datasets
  - Understanding Columns And Rows
  - Understanding Apache Spark Types
o Overview of Structured API Execution in Apache Spark
  - Understanding Logical Planning
  - Understanding Physical Planning
  - Understanding Execution
o Basic Structured Operations
  - Schemas
  - Columns and Expressions
    - Columns
    - Expressions
  - Records and Rows
    - Creating Rows
  - DataFrame Transformations
    - How To Create DataFrames?
    - Understanding select And selectExpr
    - Converting Data to Spark Types
    - Adding Columns Into The DataFrame
    - How To Rename OR Alias Columns?
    - Spark DataFrame Reserved Characters and Keywords
    - Understanding Case Sensitivity Issues
    - How To Remove Columns in a DataFrame
    - Implementing Type Casting on Columns
    - Data Filtering on Rows
    - Representing Unique OR Distinct Rows
    - Getting Random Samples
    - Implementing Random Splits
    - Concatenating and Appending Rows OR Applying Union
    - Implementing Data Sorting on Rows
    - Implementing The Concept of Limit
    - Implementing Repartition and Coalesce
    - Collecting Rows to the Driver

Learning How To Work With Different Types of Data
o Excavating The Required APIs
o Understanding The Concept of Converting to Spark Types
o Working With Booleans in Spark
o Working With Numbers in Spark
o Working With Strings And Regular Expressions in Spark
o Working With Dates and Timestamps in Spark
o Different Strategies For Handling Nulls in Spark
  - coalesce
  - ifnull, nullIf, nvl, and nvl2
  - drop
  - fill
  - replace
o Implementing Ordering Upon The Data
o Understanding Complex Types in Spark
  - structs
  - arrays
    - split
    - Array Length
    - array_contains
    - explode
  - Maps
o Handling And Operating With JSON Format
o Designing User-Defined Functions in Spark

Executing Aggregations Using Spark
o Understanding Aggregation Functions in Spark
  - count
  - countDistinct
  - approx_count_distinct
  - first and last
  - min and max
  - sum
  - sumDistinct
  - avg
  - Variance and Standard Deviation
  - skewness and kurtosis
  - Covariance and Correlation
  - Aggregating to Complex Types
o Applying Grouping Upon The Data
  - Grouping With Expressions
  - Grouping With Maps
o Applying Window Functions Upon The Data
o Implementing Grouping Sets Upon The Data
  - Rollups
  - Cube
  - Grouping Metadata
  - Pivot
o User-Defined Aggregation Functions

Executing Joins in Spark
o Join Expressions
o Join Types
  - Inner Joins
  - Outer Joins
  - Left Outer Joins
  - Right Outer Joins
  - Left Semi Joins
  - Left Anti Joins
  - Natural Joins
  - Cross (Cartesian) Joins
o What Are The Challenges When Using Joins?
  - Joins on Complex Types
  - Handling Duplicate Column Names in Joins
o How Spark Architecture Performs Joins
  - Understanding Communication Strategies in Spark

Working With Different Data Sources in Spark
o Understanding The Structure of The Data Sources API in Spark
  - Data Reading API Structure
  - Understanding Basics of Reading Data
  - Data Write API Structure
  - Understanding Basics of Writing Data
o Working With CSV Files
  - CSV Options
  - Reading CSV Files
  - Writing CSV Files
o Working With JSON Files
  - JSON Options Available in Spark
  - Reading JSON Files
  - Writing JSON Files
o Parquet Files
  - Reading Parquet Files
  - Writing Parquet Files
o ORC Files
  - Reading ORC Files
  - Writing ORC Files
o SQL Databases
  - Reading from SQL Databases
  - Query Pushdown
  - Writing to SQL Databases
o Text Files
  - Reading Text Files
  - Writing Text Files
o Advanced I/O Concepts
  - Splittable File Types and Compression
  - Reading Data in Parallel
  - Writing Data in Parallel
  - Writing Complex Types
  - Managing File Size

Associating With Spark SQL
o What is Spark SQL?
o Big Data and SQL With Apache Hive
o Big Data and SQL With Spark SQL
  - Understanding Spark's Relationship To Hive
o How To Run Spark SQL Queries
  - Understanding The Spark SQL CLI
  - Spark's Programmatic SQL Interface
  - Spark SQL Thrift JDBC/ODBC Server
o Understanding The Concept of The Catalog in Spark SQL
o Working With Tables in Spark SQL
  - Spark-Managed Tables
  - Creating Tables
  - Creating External Tables
  - Inserting Into Tables
  - Describing Table Metadata
  - Refreshing Table Metadata
  - Dropping Tables
  - Caching Tables
o Working With Views in Spark
  - Creating Views in Spark
  - Dropping Views in Spark
o Working With Databases in Spark
  - Creating Databases in Spark
  - Setting the Database in Spark
  - Dropping Databases in Spark
o Implementing Select Statements
  - case…when…then Statements
o Understanding The Advanced Topics
  - Working With Complex Types
  - Implementing Functions in Spark SQL
  - Building Subqueries in Spark SQL
o Configurations in Spark SQL
o Setting Configuration Values in SQL

Understanding The Datasets Concept in Spark
o When To Use Datasets in Spark
o How To Create Datasets in Spark
  - Understanding Datasets in Java: Encoders
  - Understanding Datasets in Scala: Case Classes
o Understanding Actions on Datasets
o Understanding Transformations on Datasets
  - Working With Data Filtering
  - Working With Mapping Concepts
o Implementing Joins on Datasets
o Implementing Grouping and Aggregations on Datasets

Working With Low-Level APIs in Spark
o Understanding Resilient Distributed Datasets (RDDs)
o What Are The Low-Level APIs Available in Apache Spark?
  - When To Use Low-Level APIs?
  - How To Use Low-Level APIs?
o Understanding RDDs in Apache Spark
  - What Are The Types of RDDs?
  - When To Use RDDs in Apache Spark?
  - Applying Datasets and RDDs of Case Classes
o How To Create RDDs in Apache Spark?
  - Interoperating Between DataFrames, Datasets, and RDDs
  - From a Local Collection
  - From Data Sources
o How To Manipulate RDDs in Apache Spark?
o Implementing Transformations on Apache Spark RDDs
  - distinct
  - filter
  - map
  - sort
  - Random Splits
o Implementing Actions on Spark RDDs
  - reduce
  - count
  - first
  - max and min
  - take
o How To Save Files Using Spark RDDs?
  - saveAsTextFile
  - SequenceFiles
  - Hadoop Files
o What is Caching in Spark RDDs?
o Implementing The Checkpointing Concept in Apache Spark
o Piping RDDs to System Commands
  - mapPartitions
  - foreachPartition
  - glom

Advanced RDDs
o Understanding Key-Value Basics in RDDs
  - keyBy
  - Mapping over Values
  - Extracting Keys and Values
  - lookup
  - sampleByKey
o Implementing Aggregations on Spark RDDs
  - countByKey
  - Understanding Aggregation Implementations
  - Other Aggregation Methods
o What Are CoGroups in Apache Spark?
o Executing Joins in Apache Spark
  - Inner Join
  - zips
o Controlling Partitions in Apache Spark
  - coalesce
  - repartition
  - repartitionAndSortWithinPartitions
  - Custom Partitioning
o Custom Serialization Concept in Apache Spark

Understanding Distributed Shared Variables in Spark
o What Are Broadcast Variables?
o Understanding Accumulators in Spark
  - Basic Example
  - Custom Accumulators

Production Applications
o Understanding How Spark Runs on a Cluster
  - Looking Into The Architecture of a Spark Application
  - Understanding The Spark Execution Modes
  - Understanding The Life Cycle of a Spark Application (Outside Spark)
    - Client Request
    - Launch
    - Execution
    - Completion
  - Understanding The Life Cycle of a Spark Application (Inside Spark)
    - Understanding The SparkSession
    - What Are Logical Instructions?
    - Insight Into a Spark Job
    - Understanding Stages in Spark
    - Understanding Spark Tasks
o Looking Into The Execution Details of Spark
  - Pipelining Concepts in Apache Spark
  - What is Shuffle Persistence?

Developing Spark Applications
o Learning The Basics of Writing Spark Applications
  - Understanding A Simple Scala-Based App
  - How To Write Python Applications
  - How To Write Java Applications
o Understanding And Testing Spark Applications
  - Introduction To Strategic Principles
  - What Are Tactical Takeaways?
  - How To Connect To Unit Testing Frameworks
  - How To Connect To Data Sources
o Understanding The Spark Development Process
o How To Launch Spark Applications
o Configuring Spark Applications
  - Understanding The SparkConf
  - What Are Application Properties?
  - What Are Runtime Properties?
  - What Are Execution Properties?
  - Configuring Memory Management in Apache Spark
  - Configuring Shuffle Behavior of Apache Spark
  - Environment Variables in Apache Spark
  - Job Scheduling Within a Spark Application

Understanding The Concept of Deploying Spark
o Where To Deploy a Spark Cluster To Run Spark Applications
  - Understanding On-Premises Cluster Deployments
  - Understanding Spark in the Cloud
o What Are Cluster Managers in Apache Spark?
  - Understanding Standalone Mode
  - Spark on YARN
  - Configuring Spark on YARN Applications
  - Spark on Mesos
o What Are Secure Deployment Configurations in Apache Spark?
o What Are Cluster Networking Configurations in Apache Spark?
o What is Application Scheduling in Apache Spark?

Understanding Monitoring and Debugging in Apache Spark
o The Monitoring Landscape
o What To Monitor
  - Driver and Executor Processes
  - Queries, Jobs, Stages, and Tasks
o Spark Logs
o The Spark UI
  - Spark REST API
  - Spark UI History Server
o Debugging and Spark First Aid
  - Spark Jobs Not Starting
  - Errors Before Execution
  - Errors During Execution
  - Slow Tasks or Stragglers
  - Slow Aggregations
  - Slow Joins
  - Slow Reads and Writes
  - Driver OutOfMemoryError OR Driver Unresponsive
  - Executor OutOfMemoryError OR Executor Unresponsive
  - Unexpected Nulls in Results
  - No Space Left on Disk Errors
  - Serialization Errors

Performance Tuning
o Indirect Performance Enhancements
  - Design Choices
  - Object Serialization in RDDs
  - Cluster Configurations
  - Scheduling
  - Data at Rest
  - Shuffle Configurations
  - Memory Pressure and Garbage Collection
o Direct Performance Enhancements
  - Parallelism
  - Improved Filtering
  - Repartitioning and Coalescing
  - User-Defined Functions (UDFs)
  - Temporary Data Storage (Caching)
  - Joins
  - Aggregations
  - Broadcast Variables

Understanding Apache Spark Streaming
o Understanding Stream Processing Fundamentals
o What is Meant By Stream Processing?
  - Stream Processing Use Cases
  - Advantages of Stream Processing
  - Challenges of Stream Processing
o Understanding Stream Processing Design Points
  - Record-at-a-Time Versus Declarative APIs
  - Event Time Versus Processing Time
  - Continuous Versus Micro-Batch Execution
o What Are Spark's Streaming APIs?
  - Understanding The DStream API
  - Understanding Structured Streaming

Understanding Structured Streaming Basics
o Fundamentals of Structured Streaming
o Catching Up With Core Concepts
  - Transformations and Actions
  - Input Sources
  - Sinks
  - Output Modes
  - Triggers
  - Event-Time Processing
o Applying Structured Streaming in Action
o Applying Transformations on Streams
  - Selections and Filtering
  - Aggregations
  - Joins
o Catching Up With Input and Output
  - Where Data Is Read and Written (Sources and Sinks)
  - Reading from the Kafka Source
  - Writing to the Kafka Sink
  - How Data Is Output (Output Modes)
  - When Data Is Output (Triggers)
o Streaming Dataset API

Event-Time and Stateful Processing
o Event Time
o Stateful Processing
o Arbitrary Stateful Processing
o Event-Time Basics
o Windows on Event Time
  - Tumbling Windows
  - Handling Late Data with Watermarks
o Dropping Duplicates in a Stream
o Arbitrary Stateful Processing
  - Time-Outs
  - Output Modes
  - mapGroupsWithState
  - flatMapGroupsWithState

Structured Streaming in Production
o Fault Tolerance and Checkpointing
o Updating Your Application
  - Updating Your Streaming Application Code
  - Updating Your Spark Version
  - Sizing and Rescaling Your Application
o Metrics and Monitoring
  - Query Status
  - Recent Progress
  - Spark UI
o Alerting
o Advanced Monitoring with the Streaming Listener
Contact For Details And Counseling
Plot No 47, 3rd Floor (Top Floor), Gayathri Nagar, Behind Maithrivanam, SAP Street, Ameerpet, Hyderabad. Phones: 040-6464 0047, 9985798869. WhatsApp For Enquiry: 9985798869
Request For Demo Video by Whatsapp Message
“Please Share Demo Video – BigData <your-mail-ID>”