Pros:
1. Provides a functional-programming-style view with better control for the
programmer. Even a single mapper/reducer can easily be split into multiple
stages, with intermediate results cached.
2. Better suited for iterative computations: steps are performed through
chaining, and data can be kept in memory between stages as required.
3. Shared variables (broadcast variables and accumulators) across workers
appear to be a better fit than Hadoop's distributed cache.
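The chaining and caching described in points 1 and 2 can be sketched in plain Python (no Spark dependency; the `Dataset` class below is a hypothetical stand-in for an RDD, not Spark's API):

```python
from functools import reduce

class Dataset:
    """Minimal stand-in for an RDD-like collection: chainable and cacheable."""
    def __init__(self, items):
        self._items = list(items)

    def map(self, fn):
        return Dataset(fn(x) for x in self._items)

    def filter(self, pred):
        return Dataset(x for x in self._items if pred(x))

    def cache(self):
        # In Spark, cache() pins partition data in executor memory;
        # here the list is already materialised, so this is a no-op marker.
        return self

    def reduce(self, fn):
        return reduce(fn, self._items)

# One logical "mapper" split into two chained stages, with the
# intermediate result cached for reuse.
numbers = Dataset(range(10))
evens = numbers.filter(lambda x: x % 2 == 0).cache()

# Iterative computation: the cached stage is reused on each pass
# instead of being recomputed from the source.
total = 0
for step in range(3):
    total += evens.map(lambda x: x * (step + 1)).reduce(lambda a, b: a + b)

print(total)  # 20 * (1 + 2 + 3) = 120
```

In real Spark the same shape applies, but transformations are lazy and distributed across executors, which is what makes keeping intermediate data in memory between iterations pay off.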
Cons:
1. The core is developed in Scala; although Java APIs are provided, Spark has
not yet been used extensively with Java in industry. It appears stable with
Java, but at times Scala-related exceptions are hard to diagnose.
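The shared-variable idea from point 3 of the Pros can also be sketched in plain Python (again, a simulation with sequential "workers", not Spark's actual broadcast/accumulator API):

```python
class Accumulator:
    """Spark-style semantics: workers only add; the driver reads the value."""
    def __init__(self, value=0):
        self.value = value

    def add(self, amount):
        self.value += amount

# "Broadcast" variable: a read-only lookup shipped once to every worker,
# rather than re-sent with each task (the role the distributed cache
# plays in Hadoop MapReduce).
broadcast_stopwords = frozenset({"the", "a", "an"})
skipped = Accumulator()

def worker(partition):
    # Each simulated worker reads the broadcast value and writes
    # side-channel counts to the shared accumulator.
    kept = []
    for word in partition:
        if word in broadcast_stopwords:
            skipped.add(1)
        else:
            kept.append(word)
    return kept

partitions = [["the", "quick", "fox"], ["a", "lazy", "dog"]]
results = [worker(p) for p in partitions]

print(results)        # [['quick', 'fox'], ['lazy', 'dog']]
print(skipped.value)  # 2
```

The key design point is the asymmetry: broadcast values flow driver-to-workers and are read-only, while accumulators flow workers-to-driver and are write-only from the worker side.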
Here are some useful links, which also include differences between Hadoop and
Spark.
http://stackoverflow.com/questions/25267204/hadoop-vs-spark
http://datascience.stackexchange.com/questions/441/what-are-the-use-cases-for-apache-spark-vs-hadoop
http://www.researchgate.net/post/What_is_the_differences_between_SPARK_and_Hadoop_MapReduce2
http://www.devx.com/opensource/getting-started-with-apache-spark.html
https://databricks.com/spark
http://java.dzone.com/articles/apache-spark-next-big-data
http://stackoverflow.com/questions/24119897/apache-spark-vs-apache-storm
https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/JavaPageRank.java
http://spark.apache.org/docs/latest/programming-guide.html
https://spark.apache.org/examples.html
I don't have a comparison document, as the technology selection was done by
DLA, but here are the main differences per my understanding:
http://www.dezyre.com/article/hadoop-mapreduce-vs-apache-spark-who-wins-the-battle/83#.VG78S_mUfl8
http://www.qubole.com/spark-vs-mapreduce/
http://planetcassandra.org/blog/the-new-analytics-toolbox-with-apache-spark-going-beyond-hadoop/