• Speed:
• Hadoop: Hadoop MapReduce is disk-based, which can result in slower
processing times for iterative algorithms as data is written to and read
from disk in each iteration.
• Spark: Spark's in-memory processing capability speeds up data
processing, especially for iterative algorithms, by keeping
intermediate data in memory between stages. This can significantly
improve performance compared to Hadoop MapReduce.
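The cost difference described above can be sketched in plain Python (this is not Hadoop or Spark code, just an illustration of the idea): one version round-trips the intermediate dataset through disk on every iteration, the way MapReduce does, while the other keeps it in memory between steps, the way Spark does. Both produce the same result; only the I/O pattern differs.

```python
import json
import os
import tempfile

def iterate_on_disk(data, steps):
    """MapReduce-style: write intermediate results to disk each iteration."""
    path = os.path.join(tempfile.mkdtemp(), "intermediate.json")
    with open(path, "w") as f:
        json.dump(data, f)
    for _ in range(steps):
        with open(path) as f:           # read the previous iteration back from disk
            data = json.load(f)
        data = [x * 2 for x in data]    # the per-iteration transformation
        with open(path, "w") as f:      # write the result back to disk
            json.dump(data, f)
    with open(path) as f:
        return json.load(f)

def iterate_in_memory(data, steps):
    """Spark-style: keep the intermediate dataset in memory between iterations."""
    for _ in range(steps):
        data = [x * 2 for x in data]    # same transformation, no disk I/O
    return data

print(iterate_on_disk([1, 2, 3], 3))    # [8, 16, 24]
print(iterate_in_memory([1, 2, 3], 3))  # [8, 16, 24]
```

For a three-step job over three numbers the difference is negligible, but for an iterative algorithm over a large dataset the repeated serialization, disk writes, and disk reads in the first version dominate the runtime, which is exactly the overhead Spark's in-memory caching avoids.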
Difference between Spark & Hadoop frameworks
• Ease of Use:
• Hadoop: Writing programs in Hadoop MapReduce typically involves
low-level coding in Java, which can be complex and time-consuming
for developers.
• Spark: Spark provides high-level APIs in multiple programming
languages, including Java, Scala, Python, and R, which makes
applications much shorter and easier to write than the equivalent
MapReduce code.
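To show how compact the high-level style is, here is a classic word count written in the shape of Spark's API (flatMap, then map, then reduceByKey), but in plain Python so it runs without a cluster. The equivalent PySpark pipeline is sketched in the comment; a hand-written Java MapReduce job for the same task typically needs separate mapper, reducer, and driver classes.

```python
from collections import Counter

# In PySpark the same pipeline is roughly:
#   sc.textFile(path).flatMap(str.split) \
#     .map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)

lines = ["spark is fast", "hadoop is reliable", "spark is easy"]

words = (w for line in lines for w in line.split())  # flatMap: lines -> words
counts = Counter(words)                              # map + reduceByKey in one step

print(counts["spark"])  # 2
print(counts["is"])     # 3
```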
• Use Cases:
• Hadoop: Hadoop is well-suited for batch processing of large datasets
where latency is not critical. It is commonly used for tasks like log
processing, data warehousing, and ETL (Extract, Transform, Load)
jobs.
• Spark: Spark is versatile and can handle batch processing, interactive
queries, machine learning, graph processing, and streaming data. Its
in-memory processing makes it suitable for applications that require
low-latency processing.
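Among the use cases above, streaming is the one MapReduce cannot express at all. The sketch below shows the micro-batch model that Spark's streaming engines use, again in plain Python for illustration: the input arrives as a sequence of small batches, and a running state is updated after each one rather than waiting for the whole dataset.

```python
from collections import Counter

def process_micro_batches(batches):
    """Keep a running word count over a stream of micro-batches,
    updating state after each batch (the micro-batch streaming model)."""
    state = Counter()
    for batch in batches:
        for line in batch:
            state.update(line.split())
        # in a streaming engine, each batch's updated result is emitted here
    return state

stream = [["error on node1"], ["ok", "error on node2"]]
totals = process_micro_batches(stream)
print(totals["error"])  # 2
```

A batch job would see the full input once at the end; the streaming model produces an up-to-date answer after every batch, which is what enables low-latency applications.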
Spark vs Hadoop
Spark Features
Apache Spark
• Contrary to a common belief, Spark is not a modified version of
Hadoop, nor does it really depend on Hadoop, because it has its
own cluster management. Hadoop is just one of the ways to
deploy Spark.
• Spark can use Hadoop in two ways – one is storage and the second
is processing. Since Spark has its own cluster-management
computation, such deployments use Hadoop (HDFS) for storage only.
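A deployment of this kind might be launched as below. This is a hedged configuration fragment, not a complete recipe: the application file and HDFS paths are placeholders, while `--master yarn` (run on Hadoop's resource manager) and the `hdfs://` URI scheme (read from and write to Hadoop storage) are the parts that tie Spark to an existing Hadoop cluster.

```shell
# Hypothetical job submission to a Hadoop/YARN cluster;
# my_job.py and the hdfs:// paths are placeholders.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  my_job.py hdfs:///data/input hdfs:///data/output
```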
Spark Built on Hadoop