
Spark

What is Spark

• Apache Spark is a lightning-fast cluster computing
technology, designed for fast computation. It is based on
the Hadoop MapReduce model and extends it to efficiently
support more types of computation, including interactive
queries and stream processing. The main feature of Spark
is its in-memory cluster computing, which increases the
processing speed of an application.
• Spark is designed to cover a wide range of workloads
such as batch applications, iterative algorithms,
interactive queries, and streaming. Apart from supporting
all these workloads in a single system, it reduces the
management burden of maintaining separate tools.
Features of Apache Spark

Apache Spark has the following features:
• Speed − Spark helps run an application in a Hadoop
cluster up to 100 times faster in memory, and 10 times
faster when running on disk. This is possible by reducing
the number of read/write operations to disk: it stores the
intermediate processing data in memory.
• Supports multiple languages − Spark provides built-in
APIs in Java, Scala, and Python, so you can write
applications in different languages. Spark also comes with
80 high-level operators for interactive querying.
• Advanced Analytics − Spark supports not only ‘Map’ and
‘Reduce’ but also SQL queries, streaming data, machine
learning (ML), and graph algorithms.
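The ‘Map’ and ‘Reduce’ operations mentioned above can be illustrated with a minimal word-count sketch in plain Python (a conceptual toy of the programming model Spark generalizes, not Spark's actual API):

```python
from functools import reduce

# Toy word count using the map/reduce model that Spark generalizes.
lines = ["spark is fast", "spark is flexible"]

# Map phase: emit a (word, 1) pair for every word in every line.
pairs = [(word, 1) for line in lines for word in line.split()]

# Reduce phase: sum the counts for each distinct word.
def merge(counts, pair):
    word, n = pair
    counts[word] = counts.get(word, 0) + n
    return counts

word_counts = reduce(merge, pairs, {})
print(word_counts)  # {'spark': 2, 'is': 2, 'fast': 1, 'flexible': 1}
```

In Spark the map phase would run in parallel across partitions on different nodes, and the reduce phase would combine partial counts per key.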
Components of Spark

• Apache Spark Core
• Spark SQL
• Spark Streaming
• MLlib (Machine Learning Library)
• GraphX
Apache Spark Core
• Spark Core is the underlying general execution engine for
the Spark platform, upon which all other functionality is
built. It provides in-memory computing and the ability to
reference datasets in external storage systems.
Spark SQL
• Spark SQL is a component on top of Spark Core that
introduces a new data abstraction called SchemaRDD, which
provides support for structured and semi-structured data.
Spark Streaming
• Spark Streaming leverages Spark Core's fast scheduling
capability to perform streaming analytics. It ingests data in
mini-batches and performs RDD (Resilient Distributed
Datasets) transformations on those mini-batches of data.
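The mini-batch idea can be sketched in plain Python (a toy stand-in, not Spark Streaming's actual API): the incoming stream is cut into small fixed-size batches, and the same transformation is applied to each batch in turn.

```python
# Toy mini-batch processing: chunk a stream of events into
# fixed-size batches and apply the same transformation to each
# batch, the way Spark Streaming applies RDD transformations
# per mini-batch.
def mini_batches(stream, batch_size):
    for i in range(0, len(stream), batch_size):
        yield stream[i:i + batch_size]

stream = [1, 2, 3, 4, 5, 6, 7]
results = []
for batch in mini_batches(stream, batch_size=3):
    # Per-batch transformation (analogous to rdd.map(...) then a sum).
    results.append(sum(x * 10 for x in batch))

print(results)  # [60, 150, 70]
```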
MLlib (Machine Learning Library)
• MLlib is a distributed machine learning framework on top
of Spark, made possible by the distributed memory-based
Spark architecture. According to benchmarks done by the
MLlib developers against the Alternating Least Squares
(ALS) implementations, Spark MLlib is nine times as fast
as the Hadoop disk-based version of Apache Mahout (before
Mahout gained a Spark interface).
GraphX
• GraphX is a distributed graph-processing framework on top of
Spark. It provides an API for expressing graph computation
that can model user-defined graphs using the Pregel
abstraction API. It also provides an optimized runtime for
this abstraction.
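Pregel's "think like a vertex" model, which GraphX builds on, can be sketched in plain Python: each vertex repeatedly looks at the values sent by its neighbours and updates its own value, sweep after sweep, until nothing changes. The toy below propagates the maximum vertex value through a small graph (illustrative names and data, not GraphX's API):

```python
# Toy Pregel-style loop: propagate the maximum vertex value
# through an undirected graph until no vertex changes.
edges = [(1, 2), (2, 3), (3, 4)]      # undirected edges
values = {1: 3, 2: 6, 3: 2, 4: 1}     # initial vertex values

neighbours = {v: set() for v in values}
for a, b in edges:
    neighbours[a].add(b)
    neighbours[b].add(a)

changed = True
while changed:                        # one sweep over all vertices
    changed = False
    for v in values:
        # "Receive messages": the current values of all neighbours.
        incoming = [values[n] for n in neighbours[v]]
        best = max([values[v]] + incoming)
        if best != values[v]:
            values[v] = best
            changed = True

print(values)  # every vertex converges to the global maximum, 6
```

A real Pregel superstep exchanges messages between synchronized rounds across the cluster; this sequential sweep reaches the same fixpoint and only illustrates the vertex-centric style.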
Main abstractions of Spark

• RDD – Resilient Distributed Dataset
• DAG – Directed Acyclic Graph
RDD
• A Resilient Distributed Dataset (RDD) is a fundamental
data structure of Spark. It is an immutable distributed
collection of objects. Each dataset in an RDD is divided
into logical partitions, which may be computed on different
nodes of the cluster. RDDs can contain any type of Python,
Java, or Scala objects, including user-defined classes.
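The partitioning idea can be sketched in plain Python: a collection is split into logical partitions, and the same transformation is applied to each partition independently, as Spark would do on different cluster nodes (a conceptual toy, not Spark's RDD API):

```python
# Toy RDD-like flow: split an immutable collection into logical
# partitions, apply a map to each partition independently (as
# Spark would on separate nodes), then collect the results.
data = list(range(10))
num_partitions = 3

# Split the data into roughly equal logical partitions.
partitions = [data[i::num_partitions] for i in range(num_partitions)]

# Apply the same transformation to each partition independently.
mapped = [[x * x for x in part] for part in partitions]

# Collect: merge the per-partition results back into one list.
result = sorted(x for part in mapped for x in part)
print(result)  # the squares of 0..9
```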
DAG
• A DAG is a finite directed graph with no directed cycles:
there are finitely many vertices and edges, each edge is
directed from one vertex to another, and the vertices can
be arranged in a sequence such that every edge is directed
from earlier to later in the sequence. The DAG execution
model is a strict generalization of the MapReduce model,
and DAG operations allow better global optimization than
systems like MapReduce. The benefit of the DAG becomes
clear in more complex jobs.
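The defining property above, that every edge points from an earlier to a later vertex in some ordering, can be made concrete with a small topological-sort sketch in plain Python (Kahn's algorithm; the stage names are illustrative, not Spark's scheduler):

```python
from collections import deque

# Kahn's algorithm: produce an ordering of a DAG's vertices in
# which every edge points from an earlier vertex to a later one.
def topological_order(vertices, edges):
    indegree = {v: 0 for v in vertices}
    adjacent = {v: [] for v in vertices}
    for src, dst in edges:
        adjacent[src].append(dst)
        indegree[dst] += 1

    queue = deque(v for v in vertices if indegree[v] == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for nxt in adjacent[v]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    if len(order) != len(vertices):
        raise ValueError("graph has a cycle, so it is not a DAG")
    return order

# A toy job graph: load -> map -> filter -> save, plus a side edge.
vertices = ["load", "map", "filter", "save"]
edges = [("load", "map"), ("map", "filter"),
         ("filter", "save"), ("load", "filter")]
order = topological_order(vertices, edges)
print(order)  # ['load', 'map', 'filter', 'save']
```

Spark's scheduler does something analogous: it turns a job into a DAG of stages and, because it sees the whole graph at once, can optimize globally rather than one map/reduce pair at a time.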
