TP Spark
Start Hadoop containers:
Project Setup:
Create a package tn.isetcom.tp21 under the java directory with a class named
WordCountTask inside of it:
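The counting logic that a class like WordCountTask typically implements can be sketched in plain Java, without any Spark dependency. This is only an illustration under assumptions: the class name WordCountSketch and the method countWords are hypothetical and are not taken from the lab's code. The comments name the Spark operators (flatMap, mapToPair, reduceByKey) that usually play the same role in a Spark job.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration class; not the WordCountTask from the lab.
class WordCountSketch {

    // Splits each line on whitespace and sums one count per word occurrence.
    static Map<String, Integer> countWords(List<String> lines) {
        return lines.stream()
                // Role of Spark's flatMap: one line becomes many words.
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                // Blank lines produce an empty token; drop it.
                .filter(word -> !word.isEmpty())
                // Role of mapToPair + reduceByKey: (word, 1) pairs summed per word.
                .collect(Collectors.toMap(word -> word, word -> 1, Integer::sum, TreeMap::new));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("hello spark", "hello hadoop");
        countWords(lines).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

In the Spark version, the same three stages would run on an RDD and the result would be written to an output directory rather than printed.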
After running the configuration, a directory named out is created under resources, containing
two files: part-00000 and part-00001:
part-00000:
part-00001:
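Spark writes one part file per partition, so two part files suggest the job ran with two partitions. The sketch below shows, in plain Java, how hash partitioning assigns each key to a partition by taking a non-negative modulo of its hash code. The class PartitionSketch and its methods are hypothetical, and the partition count of 2 is an assumption inferred from the two files above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of hash partitioning; not part of the lab's code.
class PartitionSketch {

    // Non-negative modulo of the key's hash code selects the partition index.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Groups keys into numPartitions buckets, one bucket per future part file.
    static List<List<String>> assign(List<String> keys, int numPartitions) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(new ArrayList<>());
        }
        for (String key : keys) {
            parts.get(partitionFor(key, numPartitions)).add(key);
        }
        return parts;
    }

    public static void main(String[] args) {
        // Assumed partition count: 2, matching the two part files.
        List<List<String>> parts = assign(List.of("hello", "spark", "hadoop"), 2);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println(String.format("part-%05d", i) + " -> " + parts.get(i));
        }
    }
}
```

Because the assignment depends only on the key's hash, all occurrences of the same word land in the same part file.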
Mohamed Aziz Zouaghia SR-A
Cluster Execution:
Modify the WordCountTask class for cluster execution:
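A common change for cluster execution is to stop hard-coding the input and output locations and read them from the command line instead, so the same jar can be submitted with different paths. The sketch below illustrates only that argument handling in plain Java. It assumes the job takes two arguments (input path, then output directory); the class JobArgsSketch and the method parsePaths are hypothetical and do not come from the lab's code.

```java
// Hypothetical illustration of argument handling; not the lab's WordCountTask.
class JobArgsSketch {

    // Returns {inputPath, outputPath}, or fails fast if either is missing.
    static String[] parsePaths(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException("Usage: <inputPath> <outputDir>");
        }
        return new String[] { args[0], args[1] };
    }

    public static void main(String[] args) {
        // Example values only; real paths depend on the cluster setup.
        String[] paths = parsePaths(new String[] { "input.txt", "output" });
        System.out.println("input=" + paths[0] + " output=" + paths[1]);
    }
}
```

Passing a different second argument on a later run would produce a separate output directory without touching the code.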
After the job completes, the output directory is created and contains the result files:
part-00000:
part-00001:
After the job completes, the output2 directory is created and contains the result file:
Spark Streaming:
Create a new Maven project:
pom.xml configuration:
Local Testing:
Execute the Stream class:
Cluster Execution:
Modify the Stream class for cluster streaming: