
Zero Lecture

BIG DATA ANALYTICS LAB


BCA04206

From:
Megha Garg

Objective

• Self Introduction
• Introduction of Course
• Syllabus
• Course Objective: Course Outcome, and the why and what of this subject
• Relevance of Subject
• ABC Analysis of Subject
• Certification
• Library Resource
• Online Reference & MOOC
• Examination System

Self Introduction

Name : Megha Garg

Qualification : B.Tech (Honors)

Designation : Assistant Professor

Research Area : Complex Networks

Experience : 6+ Yrs.

E-mail Id : megha.garg@poornima.edu.in

Research Paper : 1 Paper

Other : 15 certifications (3+ MOOC courses, 10+ FDPs)

Contact No. : +91-8233095306

Introduction of Course

• The Big Data Analytics Lab (BDaL) is an interdisciplinary research laboratory that focuses on large-scale
data analytics problems arising in different application domains and disciplines.
• One primary focus of the lab is to investigate an alternative computational paradigm that involves
"humans-in-the-loop" for large-scale analytics problems.
• These problems arise at different stages of a traditional data science pipeline (e.g., data cleaning, query
answering, ad-hoc data exploration, or predictive modeling).
• We study the optimization opportunities that arise from this human-machine collaboration
and address data management and computational challenges to enable large-scale analytics with humans-in-the-loop.

What is Big Data?

• Big Data is a collection of data that is huge in volume and keeps growing exponentially with time.
• Its size and complexity are so large that traditional data management tools cannot store or process it efficiently.
• In other words, Big Data is still data, only at a much larger scale.
• The New York Stock Exchange is an example: it generates about one terabyte of new trade data per day.
• Statistics show that 500+ terabytes of new data are ingested into the databases of the social media site Facebook every day.
• Types of Big Data:
1. Structured
2. Unstructured
3. Semi-structured
• Any data that can be stored, accessed and processed in a fixed format is termed 'structured' data.
• Any data whose form or structure is unknown is classified as unstructured data. Example: the output returned by a Google Search.
• Semi-structured data can contain both forms of data. It looks structured, but its structure is not actually defined
by, e.g., a table definition in a relational DBMS.
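To make the three types concrete, here is a minimal Python sketch. The student records, field names and sentence are made-up sample data, not part of the course material.

```python
import csv
import io
import json

# Structured: fixed columns, every row follows the same schema (like a DBMS table).
structured_text = "student_id,name,marks\n1,Asha,82\n2,Ravi,74\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))

# Semi-structured: self-describing keys, but records need not share one schema.
semi_structured_text = (
    '[{"student_id": 1, "name": "Asha"},'
    ' {"student_id": 2, "email": "ravi@example.com"}]'
)
records = json.loads(semi_structured_text)

# Unstructured: free text with no predefined fields at all.
unstructured_text = "Asha scored well this term; Ravi asked for a re-evaluation."

structured_columns = sorted(rows[0].keys())
record_keys = [sorted(record.keys()) for record in records]
word_count = len(unstructured_text.split())

print("Columns shared by every structured row:", structured_columns)
print("Keys found in each semi-structured record:", record_keys)
print("Words in the unstructured text:", word_count)
```

Note that the two JSON records carry different keys, which a relational table definition would not allow.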

Why Big Data Analytics Lab?

• To learn hidden patterns, unknown correlations, market trends, and customer preferences
• To learn about data filtering, data extraction, data aggregation and data analysis.
• To visualize data.
• Practical implementation of the above concepts.
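The filtering, extraction, aggregation and analysis steps named above can be sketched in a few lines of plain Python. The log format ("date,region,amount") and the sample lines are assumptions chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical raw sales log lines in the form "date,region,amount".
raw_lines = [
    "2024-01-01,north,120.0",
    "2024-01-01,south,80.5",
    "bad line",
    "2024-01-02,north,40.0",
    "2024-01-02,south,",
]


def extract(line):
    """Extraction: parse one line into (region, amount), or None if malformed."""
    parts = line.split(",")
    if len(parts) != 3 or not parts[2]:
        return None
    try:
        return parts[1], float(parts[2])
    except ValueError:
        return None


# Filtering: drop the lines that could not be parsed.
parsed = [record for record in map(extract, raw_lines) if record is not None]

# Aggregation: total amount per region.
totals = defaultdict(float)
for region, amount in parsed:
    totals[region] += amount

# Analysis: find the region with the highest total.
top_region = max(totals, key=totals.get)

print(dict(totals))
print("Top region:", top_region)
```

On a real cluster the same stages would run in parallel over many files. This sketch only shows the order of the steps.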

Syllabus
Code: BCA04206 Big Data Analytics Lab 1.5 Credits [LTP: 0-0-3]
Students are expected to create a single-node cluster to execute programs for the Big Data Analytics lab.
The objective of this exercise is to make you industry-ready for the roles and responsibilities
of a Big Data administrator as well as a Big Data developer.

Part A
Experiment 1:
Prepare the infrastructure and understand the software requirements for setting up a single-node
Hadoop cluster.
• WinSCP
• PuTTY
• Ubuntu
• VMPlayer
• Hadoop version

Experiment 2:
Create a single-node Hadoop cluster.
• Installing Ubuntu on VM
• Installing Java
• SSH Configuration
• core-site.xml Configuration
• yarn-site.xml Configuration
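As a rough illustration of the two configuration steps, the Python sketch below generates minimal core-site.xml and yarn-site.xml contents. The helper `build_site_xml` is hypothetical. The property names `fs.defaultFS` and `yarn.nodemanager.aux-services` are standard Hadoop settings, but the values shown (including port 9000) are assumed single-node defaults, so check them against the Hadoop version used in the lab.

```python
import xml.etree.ElementTree as ET


def build_site_xml(properties):
    """Return the text of a Hadoop *-site.xml file for the given name/value pairs."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Assumed single-node values; adjust host and port to your own setup.
core_site = build_site_xml({"fs.defaultFS": "hdfs://localhost:9000"})
yarn_site = build_site_xml({"yarn.nodemanager.aux-services": "mapreduce_shuffle"})

print(core_site)
print(yarn_site)
```

The printed text can be saved into the `etc/hadoop` directory of the Hadoop installation, or the same properties can be typed directly into the existing files.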

• Experiment 3:
• Test the single-node cluster and its Web UI ports, and explore the different daemons of the Hadoop cluster.
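One possible way to test the Web UI ports is a small reachability check in Python. The port numbers below are assumptions: 9870 is the default NameNode web port in Hadoop 3.x (Hadoop 2.x used 50070) and 8088 is the default ResourceManager port, but both can be changed in the site configuration files. The running daemons themselves (typically NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager) can be listed with the JDK tool `jps`.

```python
import socket

# Assumed default web UI ports; they differ between Hadoop versions
# and can be overridden in the *-site.xml files.
WEB_UI_PORTS = {
    "NameNode (Hadoop 3.x)": 9870,
    "ResourceManager": 8088,
}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_web_uis(host="localhost"):
    """Map each web UI name to whether its port is reachable."""
    return {name: port_is_open(host, port) for name, port in WEB_UI_PORTS.items()}


if __name__ == "__main__":
    for name, is_up in check_web_uis().items():
        print(f"{name}: {'reachable' if is_up else 'not reachable'}")
```

A reachable port only shows that something is listening. Opening the page in a browser confirms that it is the Hadoop UI.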
• Experiment 4:
• Perform / execute the following set of basic Hadoop commands:
• appendToFile
• cat
• chgrp
• chmod
• chown
• copyFromLocal
• copyToLocal
• count
• cp
• Experiment 5:
• Perform / execute the following set of basic Hadoop commands:
• du
• dus
• expunge
• get
• getfacl
• getfattr
• getmerge
• ls
• lsr
• mkdir

Part B
• Experiment 6:
• Perform / execute the following set of basic Hadoop commands:
• moveFromLocal
• moveToLocal
• mv
• put
• rm
• rmr
• setfacl
• setfattr
• setrep
• stat
• tail
• test
• text
• touchz
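Three of the commands listed in Experiments 5 and 6 (`dus`, `lsr` and `rmr`) are marked as deprecated in current Hadoop releases, which recommend `du -s`, `ls -R` and `rm -r` instead. The sketch below shows that mapping. The helper name `modernize` and the example paths are hypothetical.

```python
# Deprecated FS shell commands and their current equivalents.
REPLACEMENTS = {
    "dus": ["du", "-s"],
    "lsr": ["ls", "-R"],
    "rmr": ["rm", "-r"],
}


def modernize(command, *args):
    """Return the `hadoop fs` argument list, swapping deprecated commands."""
    parts = REPLACEMENTS.get(command, [command])
    return ["hadoop", "fs", f"-{parts[0]}", *parts[1:], *args]


# Hypothetical HDFS paths, used only for illustration.
print(" ".join(modernize("rmr", "/user/student/old_output")))
print(" ".join(modernize("lsr", "/user/student")))
print(" ".join(modernize("stat", "/user/student/notes.txt")))
```

The older spellings may still work on the lab's Hadoop version, usually with a deprecation warning, so both forms are worth trying.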
• Experiment 7:
• Install the Eclipse IDE on the single-node cluster for executing MapReduce jobs, and understand the role of the dependent
libraries in processing a job.
• Experiment 8:
• Run a MapReduce word count job on a given input file with the number of reducers set to 2.
• Experiment 9:
• Run a MapReduce word count job on a given input file with the number of reducers set to 6, and compare the results of
Experiments 8 and 9.
• Experiment 10:
• Run a MapReduce word count job on a given input file with only a mapper (no reducer involved),
and compare the results of Experiments 8, 9 and 10.
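Before running Experiments 8 to 10 on the cluster, the effect of the reducer count can be previewed with a small in-memory simulation in Python. This is only a sketch of the idea, not Hadoop code: the `partition` function is a simplified stand-in for Hadoop's hash-based partitioner, and the input lines are made up. In a real job the reducer count is typically set with `job.setNumReduceTasks(n)`.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit (word, 1) for each word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def partition(key, num_reducers):
    """Assign a key to a reducer (simplified stand-in for a hash partitioner)."""
    return sum(key.encode("utf-8")) % num_reducers


def run_word_count(lines, num_reducers):
    """Return one dict per reducer, or the raw map output if num_reducers is 0."""
    pairs = [pair for line in lines for pair in mapper(line)]
    if num_reducers == 0:
        # Map-only job: no shuffle and no aggregation take place.
        return pairs
    buckets = [defaultdict(int) for _ in range(num_reducers)]
    for word, count in pairs:
        buckets[partition(word, num_reducers)][word] += count
    return [dict(sorted(bucket.items())) for bucket in buckets]


sample_lines = ["big data big cluster", "data node"]  # made-up input
two_reducers = run_word_count(sample_lines, 2)
six_reducers = run_word_count(sample_lines, 6)
map_only = run_word_count(sample_lines, 0)

print("2 reducers:", two_reducers)
print("6 reducers:", six_reducers)
print("map only:  ", map_only)
```

The total counts are the same with 2 or 6 reducers, only split across a different number of output partitions (some possibly empty). The map-only run leaves repeated words unaggregated.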
• Experiment 11:
• Implement an executable Hadoop MapReduce program that performs an inner join of two tables on "Student
ID". You can create sample data in the format below and then run this exercise.
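The logic of a reduce-side inner join on Student ID can be sketched in plain Python before writing the Hadoop version. The two sample tables, their columns and the "S"/"M" tags below are assumptions made up for illustration. They are not the sample format intended by the exercise.

```python
from collections import defaultdict

# Hypothetical sample tables, invented for this sketch.
students = ["101,Asha", "102,Ravi", "103,Meena"]  # student_id,name
marks = [  # student_id,subject,marks
    "101,Maths,82",
    "101,Physics,75",
    "103,Maths,91",
    "104,Maths,60",
]


def map_students(line):
    """Emit (student_id, tagged name) so the reducer knows the source table."""
    student_id, name = line.split(",")
    return student_id, ("S", name)


def map_marks(line):
    """Emit (student_id, tagged subject and score)."""
    student_id, subject, score = line.split(",")
    return student_id, ("M", subject, int(score))


def reduce_join(student_id, values):
    """Pair every name with every marks row that shares the same student_id."""
    names = [value[1] for value in values if value[0] == "S"]
    scores = [value[1:] for value in values if value[0] == "M"]
    return [(student_id, name, subject, score)
            for name in names for subject, score in scores]


# Shuffle: group all mapper outputs by student_id.
grouped = defaultdict(list)
mapped = [map_students(line) for line in students] + [map_marks(line) for line in marks]
for key, value in mapped:
    grouped[key].append(value)

joined = []
for student_id in sorted(grouped):
    joined.extend(reduce_join(student_id, grouped[student_id]))

for row in joined:
    print(row)
```

Student 102 (no marks) and ID 104 (no student record) do not appear in the result, which is what makes this an inner join.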

Course Outcome

CO's OUTCOME

CO1: Learn the development of a Hadoop cluster.
CO2: Explore the daemons of a Hadoop cluster.
CO3: Interpret and perform basic Hadoop commands.
CO4: Interpret the use of MapReduce programs.
CO5: Enhance implementable MapReduce programs.

Relevance

• Relevance to Branch: It teaches you how to create your own data cluster and use it to solve MapReduce problems.

• Relevance to Society: Designing and analyzing MapReduce problems.

• Relevance to Self: Understanding of Big Data Analytics tools and data cleaning processes.

• Relation with Laboratory: To store data in Hadoop clusters and to use MapReduce, we need to know
how to apply Big Data Analytics tools.
• Connection with previous year and next year: Big Data is a new subject that is currently trending in research.

• Potential for career: Knowledge of the Big Data Analytics Lab and its advanced subjects is a requirement for many CS jobs, and
this subject will help in research work on analyzing MapReduce problems.
Books/ Website/Journals & Handbooks/ Association & Institution
S. No. Title of Book Authors Publisher
Text Book
T1 Big Data Analytics Made Easy Lakshmi Prasad

Websites related to course

1 https://centers.njit.edu/bdal/node/62

2 https://www.greatlearning.in/academy/learn-for-free/courses/mastering-big-data-analytics

Outcome
1. Understand the role of big data in data analytics.
2. Analyze the various stages of data preprocessing: filtering, extraction, aggregation and analysis.
3. Describe the use of a Hadoop cluster.
4. Describe the use of Hadoop MapReduce.

Online Reference & MOOC

https://nptel.ac.in/courses/106/104/106104189/
https://onlinecourses.nptel.ac.in/noc20_cs92/preview
https://nptel.ac.in/courses/106/106/106106142/
https://www.coursera.org/specializations/jhu-data-science?action=enroll
https://www.upgrad.com/big-data-pgd-iiitb
