
Christu Jyothi Institute of Technology & Science

Colombo Nagar, Yeshwanthapur, Janagaon-506167

Department of Computer Science and Engineering

(CS606PC) Industrial Oriented Mini Project / Internship / Skill Development Course


(Big Data - Spark)

List of Experiments:

1. Study of Big Data Analytics and Hadoop Architecture

(i) Knowing the concept of Big Data architecture

(ii) Knowing the concept of Hadoop architecture

2. Loading a data set into HDFS for Spark analysis; installation of Hadoop and cluster
management

(i) Installing Hadoop single node cluster in Ubuntu environment

(ii) Knowing the difference between single-node clusters and multi-node clusters

(iii) Accessing the web UI and its port number

(iv) Installing and accessing environments such as Hive and Sqoop
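
As a quick check for item (iii), the short Python sketch below probes the default web UI ports of a Hadoop 3.x single-node cluster on localhost: 9870 for the NameNode UI and 8088 for the YARN ResourceManager UI (Hadoop 2.x used 50070 for the NameNode). The host and port numbers are assumptions for a default installation.

    # Probe the default Hadoop 3.x web UI ports on localhost.
    from urllib.request import urlopen
    from urllib.error import URLError

    for name, port in [("NameNode UI", 9870), ("ResourceManager UI", 8088)]:
        try:
            with urlopen(f"http://localhost:{port}", timeout=5) as resp:
                print(f"{name} on port {port}: HTTP {resp.status}")
        except URLError as err:
            print(f"{name} on port {port}: not reachable ({err})")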

3. File management tasks & Basic Linux commands

(i) Creating a directory in HDFS

(ii) Moving back and forth between directories

(iii) Listing directory contents

(iv) Uploading and downloading a file in HDFS

(v) Checking the contents of the file

(vi) Copying and moving files

(vii) Copying and moving files between the local and HDFS environments

(viii) Removing files and paths

(ix) Displaying a few lines of a file

(x) Displaying the aggregate length of a file

(xi) Checking the permissions of a file


(xii) Zipping and unzipping files, with and without permissions, and pasting them to a
location

(xiii) Copy and paste commands
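
The sketch below drives the core file-management tasks of Experiment 3 from Python by shelling out to the hdfs command-line client. It assumes Hadoop 3.x is installed with hdfs on the PATH (hdfs dfs -head, for example, is a Hadoop 3 option); all paths and file names are illustrative.

    import subprocess

    def hdfs(*args):
        """Run an `hdfs dfs` command and print its output."""
        result = subprocess.run(["hdfs", "dfs", *args],
                                capture_output=True, text=True)
        print(result.stdout or result.stderr)

    hdfs("-mkdir", "-p", "/user/student/lab3")       # create a directory
    hdfs("-ls", "/user/student/lab3")                # list contents and permissions
    hdfs("-put", "local.txt", "/user/student/lab3")  # upload a local file
    hdfs("-get", "/user/student/lab3/local.txt", "copy.txt")  # download a file
    hdfs("-cat", "/user/student/lab3/local.txt")     # check file contents
    hdfs("-cp", "/user/student/lab3/local.txt", "/tmp")       # copy within HDFS
    hdfs("-mv", "/tmp/local.txt", "/tmp/moved.txt")  # move/rename a file
    hdfs("-head", "/user/student/lab3/local.txt")    # first lines of a file
    hdfs("-du", "-s", "/user/student/lab3")          # aggregate length
    hdfs("-rm", "-r", "/tmp/moved.txt")              # remove files and paths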

4. MapReduce

(i) Definition of MapReduce

(ii) Its stages and terminology

(iii) Word-count program to understand MapReduce (Mapper phase, Reducer phase,
Driver code)
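
The minimal, locally runnable Python sketch below illustrates the Mapper, shuffle/sort, and Reducer phases of word count; the same two functions could serve as the mapper and reducer scripts of a Hadoop Streaming job. The sample lines are illustrative.

    from itertools import groupby
    from operator import itemgetter

    def mapper(line):
        # Mapper phase: emit (word, 1) for every word in the line.
        for word in line.split():
            yield (word.lower(), 1)

    def reducer(word, counts):
        # Reducer phase: sum all counts received for one key.
        return (word, sum(counts))

    lines = ["Apache Spark and Hadoop", "Spark runs on Hadoop"]
    pairs = [kv for line in lines for kv in mapper(line)]
    pairs.sort(key=itemgetter(0))            # shuffle/sort: group by key
    for word, group in groupby(pairs, key=itemgetter(0)):
        print(reducer(word, (count for _, count in group)))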

5. Implementing Matrix Multiplication with Hadoop MapReduce
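
One compact way to see the algorithm of Experiment 5 is with PySpark RDDs, using the same (row, column, value) key design as the classic two-phase MapReduce formulation; the 2x2 matrices below are illustrative.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("matmul").getOrCreate()
    sc = spark.sparkContext

    # Matrices as (row, col, value) triples: A and B are both 2x2.
    A = sc.parallelize([(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0), (1, 1, 4.0)])
    B = sc.parallelize([(0, 0, 5.0), (0, 1, 6.0), (1, 0, 7.0), (1, 1, 8.0)])

    # Key A by its column j and B by its row j, join on j, then sum the
    # partial products per output cell (i, k); the join and reduceByKey
    # play the role of the MapReduce shuffle and reduce phases.
    a = A.map(lambda t: (t[1], (t[0], t[2])))    # (j, (i, a_ij))
    b = B.map(lambda t: (t[0], (t[1], t[2])))    # (j, (k, b_jk))
    C = (a.join(b)
          .map(lambda jv: ((jv[1][0][0], jv[1][1][0]),
                           jv[1][0][1] * jv[1][1][1]))
          .reduceByKey(lambda x, y: x + y))
    print(sorted(C.collect()))   # [((0, 0), 19.0), ((0, 1), 22.0), ...]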

6. Compute Average Salary and Total Salary by Gender for an Enterprise.
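
A minimal PySpark sketch for Experiment 6, assuming an employees.csv file with a header that includes gender and salary columns (the file name and column names are assumptions):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("salary-by-gender").getOrCreate()
    df = spark.read.csv("employees.csv", header=True, inferSchema=True)
    (df.groupBy("gender")
       .agg(F.avg("salary").alias("avg_salary"),
            F.sum("salary").alias("total_salary"))
       .show())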

7. (i) Creating Hive tables (external and internal)

(ii) Loading data into external Hive tables from SQL tables (or a structured CSV) using
Sqoop

(iii) Performing operations such as filtering and updating

(iv) Performing joins (inner, outer, etc.)

(v) Writing user-defined functions (UDFs) on Hive tables
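
The sketch below covers items (i), (iii), and (iv) through Spark's Hive support; the Sqoop import of item (ii) is a separate command-line step and is not shown. It assumes a Spark build with Hive support, and the table names, columns, and HDFS location are illustrative.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder.appName("hive-lab")
             .enableHiveSupport().getOrCreate())

    # Internal (managed) table: Hive owns both the metadata and the data.
    spark.sql("CREATE TABLE IF NOT EXISTS emp_managed (id INT, name STRING)")

    # External table: Hive owns only the metadata; data stays at LOCATION.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS emp_ext (id INT, name STRING)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        LOCATION '/user/student/emp_ext'
    """)

    # Filtering and joins work as in ordinary SQL.
    spark.sql("""
        SELECT m.id, m.name
        FROM emp_managed m JOIN emp_ext e ON m.id = e.id
        WHERE m.id > 100
    """).show()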

8. Create SQL tables: an Employee table (id, designation) and a Salary table (salary, dept id).
Create external tables in Hive with schemas similar to the above tables. Move the data to hive
using Sqoop and load the contents into the tables. Filter into a new table, write a UDF to
encrypt the table with the AES algorithm, and decrypt it with the key to show the contents.
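
One way to sketch the encryption and decryption steps is with Spark's built-in aes_encrypt and aes_decrypt SQL functions (available since Spark 3.3) instead of a hand-written Hive UDF; the sample key, data, and column names are illustrative, and a real key should never be hard-coded.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("aes-lab").getOrCreate()
    key = "0123456789abcdef"          # AES-128 key: exactly 16 bytes

    df = spark.createDataFrame([(1, "Manager"), (2, "Developer")],
                               ["id", "designation"])
    enc = df.withColumn(
        "designation_enc",
        F.expr(f"base64(aes_encrypt(designation, '{key}'))")
    ).drop("designation")
    enc.show(truncate=False)          # contents are now ciphertext

    dec = enc.withColumn(
        "designation",
        F.expr(f"cast(aes_decrypt(unbase64(designation_enc), '{key}') as string)"))
    dec.show(truncate=False)          # decrypting with the key shows the contents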

9. (i) PySpark definition (Apache PySpark) and the differences between PySpark, Scala, and pandas

(ii) PySpark files and class methods

(iii) get(filename)

(iv) getRootDirectory()
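
A minimal sketch of items (ii)-(iv): SparkFiles is the PySpark class whose class methods get(filename) and getRootDirectory() resolve files shipped to the executors with sc.addFile(). The file name data.txt is an assumption.

    from pyspark import SparkFiles
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sparkfiles-lab").getOrCreate()
    sc = spark.sparkContext

    sc.addFile("data.txt")                # distribute a local file
    print(SparkFiles.get("data.txt"))     # absolute path of the shipped copy
    print(SparkFiles.getRootDirectory())  # directory holding all added files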

10. PySpark RDDs

(i) What are RDDs?

(ii) Ways to create RDDs

(iii) Parallelized collections

(iv) External datasets

(v) Existing RDDs

(vi) Spark RDD operations (count(), foreach(), collect(), join(), cache())
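
A short sketch touching each of the topics above; the input file path is illustrative.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("rdd-lab").getOrCreate()
    sc = spark.sparkContext

    rdd1 = sc.parallelize([1, 2, 3, 4], numSlices=2)  # parallelized collection
    rdd2 = sc.textFile("input.txt")                   # external dataset
    rdd3 = rdd1.map(lambda x: x * 2)                  # from an existing RDD

    print(rdd1.count())                # action: number of elements
    print(rdd3.collect())              # action: fetch all elements to the driver
    rdd1.foreach(lambda x: None)       # action: run a function on each element
    pairs = sc.parallelize([("a", 1), ("b", 2)])
    other = sc.parallelize([("a", 3)])
    print(pairs.join(other).collect()) # [('a', (1, 3))]
    rdd3.cache()                       # keep rdd3 in memory for reuse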

11. Performing PySpark transformations

(i) map() and flatMap()

(ii) Removing words that are not necessary for analyzing the text

(iii) groupBy()

(iv) Calculating how many times each word occurs in the corpus

(v) Performing a task (say, counting the words 'spark' and 'apache' in rdd3)
separately on each partition and getting the output of the task performed in these
partitions

(vi) Union of RDDs

(vii) Joining two pair RDDs based on their keys
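
The sketch below walks through these transformations on a tiny in-memory corpus; the sample sentences and the stop-word list are assumptions.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("transform-lab").getOrCreate()
    sc = spark.sparkContext

    lines = sc.parallelize(["apache spark is fast", "spark and apache hadoop"])
    words = lines.flatMap(lambda l: l.split())     # flatMap: line -> words
    stop = {"is", "and"}
    rdd3 = words.filter(lambda w: w not in stop)   # remove unneeded words

    # groupBy (here: by first letter), shown as (key, [values]).
    print(rdd3.groupBy(lambda w: w[0]).mapValues(list).collect())

    # How many times each word occurs in the corpus: map + reduceByKey.
    print(rdd3.map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b).collect())

    # Per-partition task: count 'spark' and 'apache' inside each partition.
    def count_in_partition(part):
        ws = list(part)
        yield (ws.count("spark"), ws.count("apache"))
    print(rdd3.mapPartitions(count_in_partition).collect())

    # Union of RDDs, and a key-based join of two pair RDDs.
    print(rdd3.union(sc.parallelize(["hive"])).collect())
    a = sc.parallelize([("spark", 1)])
    b = sc.parallelize([("spark", 2)])
    print(a.join(b).collect())                     # [('spark', (1, 2))]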

12. PySpark SparkConf: attributes and applications

(i) What is PySpark SparkConf?

(ii) Using SparkConf, create a SparkSession to read details from a CSV into a
DataFrame and later move that CSV to another location
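
A minimal sketch for Experiment 12; the CSV paths are illustrative.

    from pyspark import SparkConf
    from pyspark.sql import SparkSession

    conf = SparkConf().setAppName("conf-lab").setMaster("local[*]")
    spark = SparkSession.builder.config(conf=conf).getOrCreate()
    print(conf.toDebugString())       # inspect the configured attributes

    df = spark.read.csv("details.csv", header=True, inferSchema=True)
    df.show(5)
    # Write the data out to another location (a directory of CSV parts).
    df.write.mode("overwrite").csv("output/details_copy")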
