
Lightning-fast unified analytics engine


Apache Spark™ is a unified analytics engine for large-scale data processing.

Latest News
Spark 2.4.5 released (Feb 08, 2020)
Preview release of Spark 3.0 (Dec 23, 2019)
Preview release of Spark 3.0 (Nov 06, 2019)
Spark 2.3.4 released (Sep 09, 2019)

Speed
Run workloads 100x faster.

Apache Spark achieves high performance for both batch and streaming
data, using a state-of-the-art DAG scheduler, a query optimizer, and a
physical execution engine.
Figure: Logistic regression in Hadoop and Spark

Ease of Use
Write applications quickly in Java, Scala, Python, R, and SQL.

Spark offers over 80 high-level operators that make it easy to build parallel apps. And you can use it interactively from the Scala, Python, R, and SQL shells.

    # Read a JSON file; the schema is inferred automatically.
    df = spark.read.json("logs.json")
    # Keep rows where age > 21 and show the nested name.first field.
    df.where("age > 21").select("name.first").show()

Spark's Python DataFrame API: read JSON files with automatic schema inference.

Built-in Libraries:
SQL and DataFrames
Spark Streaming
MLlib (machine learning)
GraphX (graph)
Third-Party Projects

Generality
Combine SQL, streaming, and complex analytics.

Spark powers a stack of libraries including SQL and DataFrames, MLlib for
machine learning, GraphX, and Spark Streaming. You can combine these
libraries seamlessly in the same application.

Runs Everywhere
Spark runs on Hadoop, Apache Mesos,
Kubernetes, standalone, or in the cloud. It can
access diverse data sources.

You can run Spark using its standalone cluster mode, on EC2, on Hadoop
YARN, on Mesos, or on Kubernetes. Access data in HDFS, Alluxio, Apache
Cassandra, Apache HBase, Apache Hive, and hundreds of other data
sources.
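
In practice, the choice of cluster manager is usually expressed through the master URL handed to Spark. The helper below is a minimal sketch of the common URL forms. The function name `master_url`, its parameters, and the example hosts and ports are hypothetical, chosen only for illustration.

```python
def master_url(target, host=None, port=None, cores="*"):
    """Return a Spark master URL string for a deployment target.

    Hypothetical helper: target is one of "local", "yarn",
    "standalone", "mesos", or "kubernetes".
    """
    if target == "local":
        # local[*] uses as many worker threads as there are cores.
        return f"local[{cores}]"
    if target == "yarn":
        # YARN reads the cluster location from the Hadoop configuration.
        return "yarn"

    schemes = {
        "standalone": "spark://",
        "mesos": "mesos://",
        "kubernetes": "k8s://",
    }
    if target not in schemes:
        raise ValueError(f"unknown deployment target: {target!r}")
    if host is None or port is None:
        raise ValueError(f"{target} requires both host and port")
    return f"{schemes[target]}{host}:{port}"


if __name__ == "__main__":
    print(master_url("local"))
    print(master_url("standalone", "10.0.0.5", 7077))
    print(master_url("kubernetes", "k8s.example.com", 6443))
```

The resulting string can be passed to `SparkSession.builder.master(...)` or to the `--master` option of `spark-submit`, so the same application code can target a laptop or a cluster.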

Community
Spark is used at a wide range of organizations to process large datasets. You can find many example use cases on the Powered By page.

There are many ways to reach the community:
Use the mailing lists to ask questions.
In-person events include numerous meetup groups and conferences.
We use JIRA for issue tracking.

Contributors
Apache Spark is built by a wide set of developers from over 300 companies. Since 2009, more than 1200 developers have contributed to Spark!

The project's committers come from more than 25 organizations.

If you'd like to participate in Spark, or contribute to the libraries on top of it, learn how to contribute.

Getting Started
Learning Apache Spark is easy whether you come from a Java, Scala, Python, R, or SQL background:
Download the latest release: you can run Spark locally on your laptop.
Read the quick start guide.
Learn how to deploy Spark on a cluster.

Apache Spark, Spark, Apache, the Apache feather logo, and the Apache Spark project logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other
countries. See guidance on use of Apache Spark trademarks. All other marks mentioned may be trademarks or registered trademarks of their respective owners. Copyright © 2018 The Apache Software
Foundation, Licensed under the Apache License, Version 2.0.
