 Big Data is a data analysis methodology enabled by recent advances in technologies that support high-velocity data capture, storage, and analysis.
 Data sources extend beyond the traditional corporate database to include emails, mobile device outputs, and sensor-generated data. Data is no longer restricted to structured database records; much of it is unstructured, with no standard formatting.
 Apache Flume is a distributed, reliable, and available system for efficiently
collecting, aggregating and moving large amounts of log data from many
different sources to a centralized data store.
 Flume deploys as one or more agents, each contained within its own
instance of the Java Virtual Machine (JVM). Agents consist of three
pluggable components: sources, sinks, and channels.
 Multiple Flume agents can be connected together for more complex workflows by configuring the sink of one agent to forward events to the source of another.
 Flume sources listen for and consume events.
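As an illustration only, a minimal sketch of such an agent in Flume's properties-file configuration format, assuming a hypothetical agent a1 with a netcat source, a memory channel, and an Avro sink that forwards events to a downstream agent's Avro source (agent, component names, hosts, and ports are made up):

```properties
# Hypothetical agent a1: one source, one channel, one sink
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

# Source: listens for events on a TCP port and writes them to the channel
a1.sources.r1.type     = netcat
a1.sources.r1.bind     = localhost
a1.sources.r1.port     = 44444
a1.sources.r1.channels = c1

# Channel: buffers events between the source and the sink
a1.channels.c1.type     = memory
a1.channels.c1.capacity = 1000

# Sink: forwards events to the Avro source of a second agent,
# chaining two agents into a multi-hop flow
a1.sinks.k1.type     = avro
a1.sinks.k1.hostname = agent2.example.com
a1.sinks.k1.port     = 4141
a1.sinks.k1.channel  = c1
```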
 Channels are the mechanism by which Flume agents transfer events from their sources to their sinks. Events written to the channel by a source are not removed from the channel until a sink removes them in a transaction.
 A sink is an interface implementation that removes events from a channel and transmits them to the next agent in the flow or to the event's final destination. Sinks remove events from the channel within transactions and write them to output.
 Hive is a technology developed at Facebook that turns Hadoop into a data warehouse, complete with a dialect of SQL for querying. Being a SQL dialect, HiveQL is a declarative language.
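A small illustrative example, assuming a HiveServer2 instance reachable over JDBC and a hypothetical web_logs table; the HiveQL statement describes what to compute, and Hive translates it into Hadoop jobs (the Hive JDBC driver jar must be on the classpath):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQuery {
  public static void main(String[] args) throws Exception {
    // Hypothetical HiveServer2 endpoint, database, and credentials
    Connection conn = DriverManager.getConnection(
        "jdbc:hive2://localhost:10000/default", "hive", "");
    Statement stmt = conn.createStatement();

    // Declarative HiveQL: states *what* to compute, not how to compute it
    ResultSet rs = stmt.executeQuery(
        "SELECT page, COUNT(*) AS hits FROM web_logs GROUP BY page");
    while (rs.next()) {
      System.out.println(rs.getString("page") + "\t" + rs.getLong("hits"));
    }

    rs.close();
    stmt.close();
    conn.close();
  }
}
```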
 MongoDB is an open-source, document-oriented NoSQL database that has recently gained a notable place in the data industry. It is considered one of the most popular NoSQL databases and favors master-slave replication.
 The master's role is to perform reads and writes, whereas the slave is confined to copying the data received from the master, serving read operations, and backing up the data.
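In current MongoDB releases this master/slave arrangement is provided by replica sets, where the primary plays the master role and the secondaries play the slave role. A sketch with the MongoDB Java driver, using made-up host names and database, that sends writes to the primary while allowing reads to be served from a secondary:

```java
import com.mongodb.ReadPreference;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;

public class MongoReplication {
  public static void main(String[] args) {
    // Hypothetical connection string listing the replica members
    MongoClient client = MongoClients.create(
        "mongodb://db1.example.com:27017,db2.example.com:27017/?replicaSet=rs0");

    // Writes always go to the primary (master); this read preference
    // lets reads be answered by a secondary (slave) copy when available.
    MongoDatabase db = client.getDatabase("analytics")
        .withReadPreference(ReadPreference.secondaryPreferred());

    System.out.println(db.getName());
    client.close();
  }
}
```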
 MongoDB's query system can return particular fields and supports search by field, range queries, regular-expression search, and more; queries may also include user-defined JavaScript functions.
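A sketch of such a query with the MongoDB Java driver, combining a range condition and a regular-expression match and projecting only selected fields; the users collection and its field names are hypothetical:

```java
import static com.mongodb.client.model.Filters.and;
import static com.mongodb.client.model.Filters.gte;
import static com.mongodb.client.model.Filters.lt;
import static com.mongodb.client.model.Filters.regex;
import static com.mongodb.client.model.Projections.include;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class MongoQueries {
  public static void main(String[] args) {
    MongoClient client = MongoClients.create("mongodb://localhost:27017");
    MongoCollection<Document> users =
        client.getDatabase("shop").getCollection("users");   // hypothetical names

    // Range query combined with a regular-expression search,
    // returning only selected fields of each matching document
    for (Document doc : users
        .find(and(gte("age", 18), lt("age", 30), regex("name", "^A")))
        .projection(include("name", "age"))) {
      System.out.println(doc.toJson());
    }
    client.close();
  }
}
```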
 MongoDB practices a flexible schema: the document structure within a grouping, called a collection, may vary, and common fields across documents in a collection can hold disparate types of data.
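A brief sketch of that flexibility with the Java driver: two documents with different structures, and a shared field of differing types, stored in the same hypothetical products collection:

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class FlexibleSchema {
  public static void main(String[] args) {
    MongoClient client = MongoClients.create("mongodb://localhost:27017");
    MongoCollection<Document> products =
        client.getDatabase("shop").getCollection("products"); // hypothetical names

    // Two documents in one collection with different structures:
    // the second adds a field the first lacks, and "price" even changes type.
    products.insertOne(new Document("name", "pen").append("price", 2));
    products.insertOne(new Document("name", "notebook")
        .append("price", "3.50")                              // string, not a number
        .append("tags", java.util.Arrays.asList("paper", "a5")));
    client.close();
  }
}
```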
 The Apache Hadoop software library is a framework that enables the
distributed processing of large data sets across clusters of computers. It is
designed to scale up from single servers to thousands of machines, with
each offering local computation and storage.
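As a minimal sketch of that processing model, the classic word-count MapReduce job: mappers tokenize raw text locally on each node, and reducers distill the tokens into compact per-word counts (class names and input/output paths are illustrative):

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: runs where the data lives and emits (word, 1) for each token
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sums the counts for each word, producing a much denser result
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. an HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory must not exist
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```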
 Data may contain valuable information that is sometimes not properly explored by existing systems. Most of this data is stored in an unstructured manner, using different languages and formats, which in many cases are incompatible.
 Hadoop is a popular choice when you need to filter, sort, or pre-process
large amounts of new data in place and distill it to generate denser data that
theoretically contains more information.
 Distributions must integrate with data warehouses, databases, and other
data management products so data can move among Hadoop clusters and
other environments to expand the data pool to process or query.
