
Spark Installation

1. Log in to the Linux EC2 instance.


2. Prerequisites: install OpenJDK 1.8
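A minimal sketch of the JDK install, assuming a yum-based Amazon Linux AMI (on Ubuntu/Debian the package would instead be openjdk-8-jdk via apt-get):

```shell
# Install OpenJDK 1.8 (Amazon Linux package name; adjust for other distributions)
sudo yum install -y java-1.8.0-openjdk-devel

# Confirm the reported version is 1.8.x
java -version
```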
3. Switch to /usr/local/src
4. Download the following packages
 sudo wget https://downloads.lightbend.com/scala/2.11.12/scala-2.11.12.tgz
 sudo wget http://apachemirror.wuchna.com/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
5. Extract scala-2.11.12.tgz using the command
 tar -xvf scala-2.11.12.tgz
6. Extract spark-2.4.4-bin-hadoop2.7.tgz using the command
 tar -xvf spark-2.4.4-bin-hadoop2.7.tgz
7. Rename spark-2.4.4-bin-hadoop2.7 folder to spark by using the command
 sudo mv spark-2.4.4-bin-hadoop2.7 spark
8. Set full access and ownership for the extracted folders (note: chmod 777 is world-writable; tighten this on shared systems)
 sudo chmod 777 -R scala-2.11.12
 sudo chown ec2-user:ec2-user scala-2.11.12
 sudo chmod 777 -R spark
 sudo chown ec2-user:ec2-user spark
9. Create temporary directory '/tmp/spark-events/'.
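One way to create the directory; /tmp/spark-events is the default location Spark writes event logs to when spark.eventLog.enabled is turned on in a later step:

```shell
# Create the Spark event-log directory; -p makes this a no-op if it already exists
mkdir -p /tmp/spark-events
# Open up permissions so the Spark daemons can write event logs here
chmod 777 /tmp/spark-events
```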

10. Switch to /usr/local/src/spark/conf

11. Rename spark-env.sh.template to spark-env.sh by using the command

 sudo mv spark-env.sh.template spark-env.sh

12. Edit the file spark-env.sh, add the lines below at the end of the file, save the file, and exit

Command: sudo nano spark-env.sh

export SPARK_MASTER_IP=REPLACE_WITH_EC2_INSTANCE_IP
export SCALA_HOME=/usr/local/src/scala-2.11.12
export SPARK_EXECUTOR_MEMORY=16g
export SPARK_WORKER_INSTANCES=2
export SPARK_DRIVER_MEMORY=10g
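As an alternative to editing with nano, the same lines can be appended non-interactively with a heredoc. The 10.0.0.5 value below is a hypothetical instance IP used only for illustration; substitute your own:

```shell
# Append the Spark environment settings in one shot.
# MASTER_IP is a placeholder value; replace it with your EC2 instance's IP.
MASTER_IP=10.0.0.5
sudo tee -a spark-env.sh > /dev/null <<EOF
export SPARK_MASTER_IP=${MASTER_IP}
export SCALA_HOME=/usr/local/src/scala-2.11.12
export SPARK_EXECUTOR_MEMORY=16g
export SPARK_WORKER_INSTANCES=2
export SPARK_DRIVER_MEMORY=10g
EOF
```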

13. Execute the spark-env.sh file by using the command ./spark-env.sh

14. Rename slaves.template to slaves, add the line below at the end of the file, save the file, and exit

 sudo mv slaves.template slaves

Command: sudo nano slaves

REPLACE_WITH_EC2_INSTANCE_IP
15. Rename spark-defaults.conf.template to spark-defaults.conf, uncomment the lines below at the end of the file, save the file, and exit

 sudo mv spark-defaults.conf.template spark-defaults.conf

Command: sudo nano spark-defaults.conf

spark.master spark://REPLACE_WITH_EC2_INSTANCE_IP:7077
spark.eventLog.enabled true
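Rather than typing the IP by hand, the placeholder in the config files can be filled in with sed. This sketch assumes the IMDSv1 instance metadata endpoint is reachable (IMDSv2 requires a session token first) and GNU sed for the in-place edit:

```shell
# Look up this instance's private IP from the EC2 instance metadata service
MASTER_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)

# Fill in the placeholder in all three config files (GNU sed in-place edit)
sudo sed -i "s/REPLACE_WITH_EC2_INSTANCE_IP/${MASTER_IP}/g" \
    spark-env.sh slaves spark-defaults.conf
```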
16. Switch to /usr/local/src/spark/sbin
17. Run spark master by using the command
 ./start-master.sh
18. Run spark slave by using the command
 ./start-slave.sh spark://REPLACE_WITH_EC2_INSTANCE_IP:7077
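A quick sanity check once both daemons are started, assuming the JDK's jps tool is on PATH and the master web UI is on its default port 8080:

```shell
# Expect 'Master' and 'Worker' entries in the JVM process list
jps

# The master's web UI listens on port 8080 by default
curl -sf http://localhost:8080 > /dev/null && echo "Spark master UI is up"
```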

Submit Spark Job


1. Clone the project from GitHub.

 git clone git@github.com:PilotFlyingJ/gpss-telematics.git

2. Change directory to gpss-telematics.

 cd gpss-telematics

3. Install the required Python libraries.

 pip install -r requirements.txt

4. Set up the environment variables below.

export PYTHON_PATH=service/emr/DeliveryStandards/
export AWS_PROFILE=dev
export AWS_DEFAULT_REGION=us-east-1
export GLOBAL_SECRET=global-secret-trips-dev
export GPSS_PGDB_READ_SECRET=aurorapg-gameplansharedservices-dev-read-secret
export TM_PGDB_WRITE_SECRET=aurorapg-telematics-dev-write-secret
export STACK_NAME=dev
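To avoid re-exporting these variables in every new shell, one option is to keep them in a small env file and source it before running the job. The spark_job.env filename here is just an example, not part of the project:

```shell
# Write the job's environment to a reusable file (the name is arbitrary)
cat > spark_job.env <<'EOF'
export PYTHON_PATH=service/emr/DeliveryStandards/
export AWS_PROFILE=dev
export AWS_DEFAULT_REGION=us-east-1
export GLOBAL_SECRET=global-secret-trips-dev
export GPSS_PGDB_READ_SECRET=aurorapg-gameplansharedservices-dev-read-secret
export TM_PGDB_WRITE_SECRET=aurorapg-telematics-dev-write-secret
export STACK_NAME=dev
EOF

# Load the variables into the current shell before spark-submit
source spark_job.env
```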

5. Run the delivery standards job.

 spark-submit --packages postgresql:postgresql:9.1-901-1.jdbc4 service/emr/DeliveryStandards/delivery_standards_main.py
