
Install Spark on Windows 10 or macOS
Kazi Aminul Islam
Department of Computer Science
Kennesaw State University
Acknowledgment:
Dr. Dan Lo
Install Spark on Windows 10
Steps
• Download and install 7-Zip if it is not already on your computer
• Download and install a JVM if it is not already on your computer
• Download and install Spark
• Download and install Hadoop
• Configure environment variables
• Grant permission to the temp folder
• Test it
Download and Install 7-Zip (latest version)

https://www.7-zip.org/download.html
Download and Install JVM v1.8.0_221-b11
• Check the Java version you have: java -version
• Other Java versions may not work!
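As a quick check from a Command Prompt (a sketch; the exact output format varies by Java vendor and build):

```shell
:: Print the installed Java version; for this guide it should report 1.8 (Java 8).
java -version
```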
Download and Install Spark
http://spark.apache.org/downloads.html

• Choose a Spark release: 3.0.1 (Sep 02 2020)
• Package type: Pre-built for Apache Hadoop 2.7
• Download the file spark-3.0.1-bin-hadoop2.7.tgz
C:\spark
• Create a folder c:\spark
• Unzip the Spark tarball and copy everything into c:\spark
• This will ease maintenance later. For example, if you want to try a
different version in the future, simply overwrite the folder with the
new release.
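The unzip step can be sketched from a Command Prompt using 7-Zip (the paths below assume 7-Zip's default install location and that the downloaded file sits in the current folder; adjust as needed):

```shell
mkdir C:\spark
:: A .tgz is a gzipped tar, so 7-Zip extracts it in two passes.
"C:\Program Files\7-Zip\7z.exe" x spark-3.0.1-bin-hadoop2.7.tgz
"C:\Program Files\7-Zip\7z.exe" x spark-3.0.1-bin-hadoop2.7.tar
:: Copy the extracted contents into C:\spark.
xcopy /E /I spark-3.0.1-bin-hadoop2.7 C:\spark
```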
Copy over c:\spark
Download and install Hadoop
• Go to https://github.com/steveloughran/winutils
• Click the green button labeled "Code" and download the ZIP.
It is easier to clone everything to your local PC.
• Unzip the downloaded archive
• Create a folder c:\hadoop
• Copy everything under winutils-master\hadoop-2.7.1\ into
c:\hadoop
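These steps can be sketched from a Command Prompt (assuming the ZIP was extracted into the current folder):

```shell
mkdir C:\hadoop
:: Copy the Hadoop 2.7.1 binaries (including bin\winutils.exe) into C:\hadoop.
xcopy /E winutils-master\hadoop-2.7.1 C:\hadoop
```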
Copy Hadoop 2.7.1 over C:\hadoop
Configure Environment Variables
• From the Windows logo, search for and launch "Advanced system
settings", then click the "Environment Variables" button
• JAVA_HOME=C:\Program Files\Java\jre1.8.0_221
• SPARK_HOME=C:\spark
• HADOOP_HOME=C:\hadoop
• Append %SPARK_HOME%\bin into "Path"
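Equivalently, the variables can be set from a Command Prompt with setx (a sketch; setx writes user-level variables and truncates very long values, so the GUI route above is safer for Path):

```shell
setx JAVA_HOME "C:\Program Files\Java\jre1.8.0_221"
setx SPARK_HOME C:\spark
setx HADOOP_HOME C:\hadoop
:: Appending to Path this way rewrites the user Path; prefer the GUI if unsure.
setx Path "%Path%;C:\spark\bin"
```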
Grant permission to temp folder
• Create a temp folder c:\tmp\hive
• Change mod to 777 by
>winutils.exe chmod 777 c:\tmp\hive
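Spelled out with full paths (assuming winutils.exe landed in C:\hadoop\bin in the previous step):

```shell
mkdir C:\tmp\hive
C:\hadoop\bin\winutils.exe chmod 777 C:\tmp\hive
```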
Test it
• Run spark-shell under command prompt
• Run pyspark under command prompt
• Run spark-submit <app_name>
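A minimal sanity check once spark-shell comes up (a sketch; the result variable name may differ on your machine):

```shell
spark-shell
:: then, at the scala> prompt, evaluate a tiny job:
::   sc.parallelize(1 to 10).sum
:: a working install prints a Double result of 55.0
```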
Spark-shell
pyspark
Run Hello World in Scala
• Change directory to c:\spark
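One way to run it non-interactively is to pipe a single Scala statement into spark-shell (a hypothetical one-liner; you can equally type the println at the scala> prompt):

```shell
cd C:\spark
echo println("Hello World") | bin\spark-shell
```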
Install Spark on macOS
• Open a terminal (running bash)
• Use Homebrew, a free and open-source software package
management system that simplifies the installation of software on
Apple's macOS operating system and Linux.
Steps
1. /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"
2. brew install java
3. brew install apache-spark
4. mv spark-3.0.1-bin-hadoop2.7 /usr/local/spark
5. spark-shell
6. Copy sample datasets
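After step 5, the install can be verified from the same terminal (version numbers below are just what this guide used and may differ on your machine):

```shell
# Confirm Spark and Java are on PATH and report their versions.
spark-submit --version
java -version
```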
Install Homebrew
(base) ltksup39868mac:~ dlo2$ /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
Warning: The Ruby Homebrew installer is now deprecated and has been rewritten in
Bash. Please migrate to the following command:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"

Password:
==> This script will install:
/usr/local/bin/brew
/usr/local/share/doc/homebrew
/usr/local/share/man/man1/brew.1
/usr/local/share/zsh/site-functions/_brew
/usr/local/etc/bash_completion.d/brew
/usr/local/Homebrew
==> The following existing directories will be made group writable:
/usr/local/bin
/usr/local/include
/usr/local/lib
/usr/local/share
/usr/local/lib/pkgconfig
/usr/local/share/doc
==> The following existing directories will have their owner set to dlo2:
/usr/local/bin
/usr/local/include
/usr/local/lib
/usr/local/share
Install Java
(base) ltksup39868mac:~ dlo2$ brew cask install java
==> Tapping homebrew/cask
Cloning into '/usr/local/Homebrew/Library/Taps/homebrew/homebrew-cask'...
remote: Enumerating objects: 543760, done.
remote: Total 543760 (delta 0), reused 0 (delta 0), pack-reused 543760
Receiving objects: 100% (543760/543760), 238.43 MiB | 29.48 MiB/s, done.
Resolving deltas: 100% (383842/383842), done.
Tapped 3790 casks (3,911 files, 255.9MB).
Error: Calling brew cask install is disabled! Use brew install [--cask] instead.
(base) ltksup39868mac:~ dlo2$ brew install java
==> Downloading https://homebrew.bintray.com/bottles/openjdk-15.0.1.catalina.bot
==> Downloading from https://d29vzk4ow07wi7.cloudfront.net/9376a1c6fdf8b0268b6cb
######################################################################## 100.0%
==> Pouring openjdk-15.0.1.catalina.bottle.tar.gz
==> Caveats
For the system Java wrappers to find this JDK, symlink it with
sudo ln -sfn /usr/local/opt/openjdk/libexec/openjdk.jdk /Library/Java/JavaVirtualMachines/openjdk.jdk

openjdk is keg-only, which means it was not symlinked into /usr/local,


because it shadows the macOS `java` wrapper.

If you need to have openjdk first in your PATH run:


echo 'export PATH="/usr/local/opt/openjdk/bin:$PATH"' >> /Users/dlo2/.bash_profile
Install Scala
(base) ltksup39868mac:~ dlo2$ brew install scala
==> Downloading https://downloads.lightbend.com/scala/2.13.4/scala-2.13.4.tgz
######################################################################## 100.0%
==> Caveats
To use with IntelliJ, set the Scala home to:
/usr/local/opt/scala/idea
==> Summary
🍺 /usr/local/Cellar/scala/2.13.4: 42 files, 23.4MB, built in 2 seconds
(base) ltksup39868mac:~ dlo2$ brew install apache-spark
==> Downloading https://homebrew.bintray.com/bottles/openjdk%4011-11.0.9.catalin
==> Downloading from https://d29vzk4ow07wi7.cloudfront.net/c640eade77c3ad69fef4d
######################################################################## 100.0%
==> Downloading https://www.apache.org/dyn/closer.lua?path=spark/spark-3.0.1/spa
==> Downloading from https://apache.osuosl.org/spark/spark-3.0.1/spark-3.0.1-bin
######################################################################## 100.0%
==> Installing dependencies for apache-spark: openjdk@11
==> Installing apache-spark dependency: openjdk@11
==> Pouring openjdk@11-11.0.9.catalina.bottle.tar.gz
==> Caveats
For the system Java wrappers to find this JDK, symlink it with
sudo ln -sfn /usr/local/opt/openjdk@11/libexec/openjdk.jdk /Library/Java/JavaVirtualMachines/openjdk-11.jdk

openjdk@11 is keg-only, which means it was not symlinked into /usr/local,


Running Spark-Shell
(base) ltksup39868mac:~ dlo2$ spark-shell
21/01/13 14:29:29 WARN Utils: Your hostname, ltksup39868mac.local resolves to a loopback address: 127.0.0.1; using
192.168.1.67 instead (on interface en0)
21/01/13 14:29:29 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/usr/local/Cellar/apache-
spark/3.0.1/libexec/jars/spark-unsafe_2.12-3.0.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
21/01/13 14:29:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes
where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://ltksup39868mac.attlocal.net:4040
Spark context available as 'sc' (master = local[*], app id = local-1610566175650).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.1
      /_/
Copy Sample Dataset
(base) ltksup39868mac:Downloads dlo2$ ls spark*
spark-3.0.1-bin-hadoop2.7.tgz

spark-3.0.1-bin-hadoop2.7:
LICENSE README.md conf jars python
NOTICE RELEASE data kubernetes sbin
R bin examples licenses yarn
(base) ltksup39868mac:Downloads dlo2$ mv spark-3.0.1-bin-hadoop2.7 /usr/local/spark
mv: rename spark-3.0.1-bin-hadoop2.7 to /usr/local/spark: Permission denied
(base) ltksup39868mac:Downloads dlo2$ sudo mv spark-3.0.1-bin-hadoop2.7 /usr/local/spark
Password:
(base) ltksup39868mac:Downloads dlo2$ pwd
/Users/dlo2/Downloads
(base) ltksup39868mac:Downloads dlo2$ cd /usr/local
(base) ltksup39868mac:local dlo2$ ls
Caskroom bin jamf outset share
Cellar dockutil lib remotedesktop spark
Frameworks etc munki sal var
Homebrew include opt sbin
(base) ltksup39868mac:local dlo2$ cd spark/
(base) ltksup39868mac:spark dlo2$ ls
LICENSE README.md conf jars python
