TP Spark

The document outlines steps to run Spark batch and streaming jobs locally and on a Hadoop cluster. It describes starting Hadoop containers, loading data into HDFS, running a sample Scala Spark job, developing a Java Spark batch job to count words and running it locally and on a cluster, and developing a Spark streaming job to run locally and on a cluster.

Uploaded by

Samar Mhenyy

Mohamed Aziz Zouaghia SR-A

TP SPARK
Start Hadoop containers:

Enter the master container and start the YARN and HDFS daemons:

Check the daemon status on the master container:

Check the daemon status on the slave container:

Create a text file file1.txt on the master node:

Load the file into HDFS:

Check Spark installation:

Test Spark with a Scala code snippet:

Download the result directory file1.count from HDFS:

Check the result:

Spark Batch in Java:

Project Setup:

Add dependencies to the pom.xml file:

Create a package tn.isetcom.tp21 under the java directory with a class named
WordCountTask inside of it:
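
The source of WordCountTask is not reproduced here. As a minimal sketch of what such a word-count job computes, the plain-Java program below (no Spark dependency) splits lines into words and sums a count per word. This is the same flatMap, map-to-pair, reduce-by-key shape that a Spark word count typically follows. The class name WordCountSketch, the method countWords, and the sample lines are hypothetical and are not taken from the original project.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration only: plain Java, not the author's Spark class.
class WordCountSketch {

    // Splits each line on whitespace and sums one count per word occurrence.
    static Map<String, Integer> countWords(List<String> lines) {
        return lines.stream()
                // "flatMap" step: one line becomes many words
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                // drop the empty token that a blank line produces
                .filter(word -> !word.isEmpty())
                // "map to pair" and "reduce by key": (word, 1) merged with Integer::sum;
                // a TreeMap keeps the words sorted so the printout is stable
                .collect(Collectors.toMap(word -> word, word -> 1, Integer::sum, TreeMap::new));
    }

    public static void main(String[] args) {
        // Sample input invented for this sketch.
        List<String> lines = List.of("lorem ipsum dolor", "ipsum lorem lorem");
        // Prints one (word,count) pair per line, similar to how tuples appear in a part file.
        countWords(lines).forEach((word, count) ->
                System.out.println("(" + word + "," + count + ")"));
    }
}
```

Run on the two sample lines, the sketch prints (dolor,1), (ipsum,2) and (lorem,3). In a Spark version the list of lines would be replaced by an RDD read from the input path, and the result would be saved to an output directory rather than printed.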

Insert loremipsum.txt in the src/main/resources directory:

Create a run configuration:

After running the configuration, a directory named out is created under resources,
containing two files: part-00000 and part-00001:

part-00000:

part-00001:
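
Spark writes one part-NNNNN file per partition of the saved RDD, so two files indicate that the result had two partitions. The sketch below shows how a hash partitioner decides which partition, and therefore which part file, a word lands in: the key's hash code modulo the partition count, kept non-negative. It assumes the job relied on Spark's default hash partitioning with two partitions, which the original does not state. The class PartitionSketch and the sample words are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of hash partitioning; not Spark code.
class PartitionSketch {

    // Non-negative hash modulo: the rule a hash partitioner applies to a key.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Groups keys by the partition index they would be assigned to.
    static Map<Integer, List<String>> assign(List<String> keys, int numPartitions) {
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String key : keys) {
            byPartition
                    .computeIfAbsent(partitionFor(key, numPartitions), p -> new ArrayList<>())
                    .add(key);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        // Sample words invented for this sketch; 2 partitions is an assumption.
        List<String> words = List.of("lorem", "ipsum", "dolor", "sit", "amet");
        assign(words, 2).forEach((partition, keys) ->
                System.out.println(String.format("part-%05d", partition) + " -> " + keys));
    }
}
```

All occurrences of a given word hash to the same partition, so each word's total appears in exactly one of the part files.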

Cluster Execution:
Modify the WordCountTask class for cluster execution:

Run mvn package install:

The file wordcount-1.0-SNAPSHOT.jar is created under the target directory:

Copy the JAR file to the Hadoop master container:

At the master container:

Submit the Spark job in local mode:

After the job completes, the output directory is created and contains the result files:

part-00000:

part-00001:

Launch the Spark job on YARN in cluster mode:

After the job completes, the output2 directory is created and contains the result file:

Spark Streaming:
Create a new Maven project:

pom.xml configuration:

Create a class Stream in the package tn.isetcom.tp22:

Local Testing:
Execute the Stream class:

Cluster Execution:
Modify the Stream class for cluster streaming:

Run mvn package install to create the JAR file:

Copy the JAR file to the Hadoop container:

Launch the Spark streaming job on the cluster:

Observe the result:
