1. Install Spark
cd /home/hduser/install
mkdir -p /home/hduser/sparkdata/logs
mkdir -p /home/hduser/sparkdata/tmp
cd /usr/local/spark/conf
cp spark-env.sh.template spark-env.sh
Now edit spark/conf/spark-env.sh and specify the location of the Hadoop configuration directory and a
YARN job queue where you have permission to submit jobs:
vi spark-env.sh
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export YARN_CONF_DIR=/usr/local/hadoop/etc/hadoop
export SPARK_LOG_DIR=/home/hduser/sparkdata/logs
export SPARK_WORKER_DIR=/home/hduser/sparkdata/tmp
export SPARK_WORKER_INSTANCES=1
export SPARK_MASTER_MEMORY=400m
export SPARK_EXECUTOR_MEMORY=400m
export SPARK_WORKER_MEMORY=400m
export SPARK_WORKER_CORES=2
export SPARK_EXECUTOR_CORES=1
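A quick way to sanity-check these settings is to source the file and echo the values the Spark daemons will inherit. The snippet below writes a minimal stand-in spark-env.sh to a temporary directory so it is self-contained; in practice you would source /usr/local/spark/conf/spark-env.sh directly:

```shell
# Write a minimal stand-in for /usr/local/spark/conf/spark-env.sh
tmpdir=$(mktemp -d)
cat > "$tmpdir/spark-env.sh" <<'EOF'
export SPARK_WORKER_MEMORY=400m
export SPARK_WORKER_CORES=2
EOF

# Source it and confirm the values the Spark daemons will pick up
. "$tmpdir/spark-env.sh"
echo "worker memory=$SPARK_WORKER_MEMORY cores=$SPARK_WORKER_CORES"

rm -rf "$tmpdir"
```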
start-master.sh
6. Spark Workers/Executors
start-slaves.sh
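After starting the daemons, you can confirm they are running with jps (ships with the JDK). A healthy standalone setup shows something like the following (the PIDs are illustrative):

```
$ jps
12345 Master
12346 Worker
```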
7. Web UI:
Open http://localhost:8080 in a browser to view the Spark master web UI.
8. Run Spark
To start PySpark:
pyspark
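Once the shell comes up, a tiny job confirms the setup end to end; sc is the SparkContext that the pyspark shell creates for you:

```
>>> sc.parallelize(range(100)).count()
100
```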
To set default job properties, create spark-defaults.conf from its bundled template:
cd /usr/local/spark/conf/
cp -p spark-defaults.conf.template spark-defaults.conf
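As a sketch, spark-defaults.conf might carry defaults such as the master URL and executor memory. The property names below are standard Spark configuration keys, but the values are illustrative and should match your own setup:

```
spark.master            spark://localhost:7077
spark.executor.memory   400m
spark.eventLog.enabled  true
spark.eventLog.dir      /home/hduser/sparkdata/logs
```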