Unit 1
Agenda
• What is Big data?
• Real world examples
• Structuring big data
• Types of data
Presented by : A.H.Shanthakumara
Big Data
What is Big Data?
➢ Lots of data is being collected and warehoused
➢ Web data, e-commerce
➢ Purchases at department/grocery stores
➢ Bank/credit card transactions
➢ Social networks
➢ A new data challenge that requires leveraging existing systems differently
➢ Classified in terms of four V's: volume, variety, velocity, and veracity
➢ Usually unstructured and qualitative in nature
➢ The process of capturing or collecting big data is known as datafication.
➢ Big data is datafied so that it can be used productively
Types of data:
➢ Data is obtained primarily from two types of sources
1. Internal sources
➢ Provide structured or organized data
➢ Used to support daily business operations
➢ Data generated by CRM, ERP, OLTP, and POS systems, etc.
2. External sources
➢ Provide unstructured or unorganized data
➢ Often analyzed to understand entities mostly external to the organization
➢ Data generated from social media, the internet, government agencies, and syndicated data suppliers
Types of data:
➢ On the basis of the data received from various sources, big data comprises structured data, unstructured data, and semi-structured data.
➢ Typically, unstructured data is larger in volume than structured and semi-structured data
Structured data:
➢ It can be defined as data that has a defined repeating pattern
➢ Much easier and faster to process
➢ Mostly in tabular form
➢ Fixed fields within a record or a file
➢ Used to query and report against predetermined data types
➢ Some sources of structured data
➢ Relational databases
➢ Flat files in the form of records
➢ Multidimensional databases
➢ Legacy databases
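The points above (fixed fields, tabular form, querying against predetermined data types) can be illustrated with a small sketch. It assumes an in-memory SQLite table; the table name `sales`, its columns, and the helper `total_sales_by_store` are hypothetical and chosen only for illustration.

```python
import sqlite3


def total_sales_by_store(rows):
    """Load fixed-field records into a table and aggregate them with SQL."""
    conn = sqlite3.connect(":memory:")
    # Every record has the same three fields with predetermined types.
    conn.execute("CREATE TABLE sales (store TEXT, item TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT store, SUM(amount) FROM sales GROUP BY store ORDER BY store"
    ).fetchall()
    conn.close()
    return result


# Hypothetical point-of-sale records: (store, item, amount)
rows = [("north", "milk", 2.5), ("north", "bread", 1.5), ("south", "milk", 3.0)]
print(total_sales_by_store(rows))  # [('north', 4.0), ('south', 3.0)]
```

Because every record shares the same fields, the query can be written once and applied to any number of rows, which is why structured data is easier and faster to process.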
Unstructured data:
➢ It is a set of data that might or might not have any logical or repeating patterns
➢ Typically consists of metadata
➢ Inconsistent data obtained from files, social media, websites, satellites, etc.
➢ Data in different formats such as emails, text, audio, video, or images
➢ Some sources of unstructured data:
➢ Text data both internal and external to the organization
➢ Social media
➢ Mobile data
Semi-structured data:
➢ Also described as having a schema-less or self-describing structure.
➢ Data is stored inconsistently in rows and columns of a database
➢ Some sources of semi-structured data:
➢ File systems such as web data in the form of cookies
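A short sketch may help show what "self-describing" means here. It assumes cookie-like web data encoded as JSON; the records, the field names, and the helper `field_names` are hypothetical. Each record carries its own field names, and different records need not share the same fields.

```python
import json

# Hypothetical cookie-like records: each one names its own fields.
raw = """[
  {"user": "a1", "theme": "dark"},
  {"user": "b2", "cart": ["milk", "bread"], "lang": "en"}
]"""


def field_names(records):
    """Return the sorted set of field names seen across all records."""
    names = set()
    for record in records:
        names.update(record.keys())
    return sorted(names)


records = json.loads(raw)
print(field_names(records))         # ['cart', 'lang', 'theme', 'user']
print(records[0].get("cart", []))   # [] because the first record has no cart
```

There is no schema declared up front: the structure is discovered by reading the data, and code must tolerate missing fields.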
Agenda
• Elements of big data
• Big Data Analytics
– Advantages of Big Data Analytics
– The application areas of big data
Agenda
• Use of big data in social networks
• Use of big data in preventing fraudulent
activities
• Use of big data in Retail Industry
a wide audience.
Use of big data in Retail Industry:
Agenda
• Distributed and parallel computing for big
data
• How are data models and computer models different?
big data
circumstances.
➢ Generates the notion of a job divided into tasks
➢ Implements the MapReduce computing model
➢ Considers every task as either a map or a reduce
Agenda
• Introducing Hadoop
• Hadoop multinode cluster architecture
• Important features of Hadoop
• How does Hadoop function?
Introducing Hadoop:
➢ Traditional technologies are incapable of handling large data
➢ Hadoop combines a number of technologies and products into a system that can overcome the challenges faced by traditional processing systems
➢ Hadoop is an open-source platform that provides analytical technologies and computational power
➢ Provides an improved programming model
➢ There are two main components: the Hadoop Distributed File System (HDFS) and MapReduce
➢ HDFS is used for storage and MapReduce is used for processing
MapReduce:
➢ A framework that helps developers write programs to process large volumes of unstructured data in parallel over a distributed architecture
➢ Programmers use MapReduce libraries to build tasks without communication or coordination between nodes
➢ Performs all mathematical computations
➢ Parallel and distributed implementation provides high performance
➢ Each node periodically reports its status to the master node
➢ If a node does not respond as expected, the master node re-
MapReduce:
➢ MapReduce consists of several components, including:
➢ JobTracker: master node that manages all jobs and resources in a cluster
➢ TaskTrackers: agents deployed on each machine in the cluster to run the map and reduce tasks at the terminal
➢ JobHistoryServer: component that tracks completed jobs
How does Hadoop function?
➢ Hadoop clusters are built from racks of commodity machines
➢ Tasks are created and distributed across these nodes.
➢ Nodes work independently and return their responses to the starting node.
➢ Hadoop can add or remove nodes dynamically in a cluster.
➢ Accomplishes its operations with the MapReduce model.
➢ The MapReduce model comprises two functions: mapper and reducer
➢ The mapper maps the computational subtasks to different nodes, handles load balancing, and manages failure recovery.
➢ The reducer reduces the responses from the compute nodes to a single result.
➢ Aggregates all the elements together after the completion of the distributed computation
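The map-then-reduce flow described above can be sketched on a single machine. This is a minimal illustration, not Hadoop code: it assumes a word-count job, uses a thread pool to stand in for independent worker nodes, and leaves out scheduling, load balancing, and failure recovery. The function names `mapper`, `shuffle`, `reducer`, and `word_count` are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def mapper(chunk):
    """Map step: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped):
    """Group the values emitted by all mappers under their key."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce step: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    # Each chunk is processed independently, as a separate map task would be.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(mapper, chunks))
    groups = shuffle(mapped)
    return dict(reducer(key, values) for key, values in groups.items())


chunks = ["big data is big", "data is everywhere"]
print(word_count(chunks))  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

The map tasks never communicate with each other; only the grouping step between map and reduce brings their outputs together, which is what allows the work to be spread across many nodes.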
Agenda
• Cloud computing and big data
• Cloud Computing model
• Features of Cloud computing
• Cloud deployment models
• Cloud delivery models
• In-memory computing Technology for big
data
➢ Cloud computing uses data centers to collect data and ensures that data backup and recovery are performed automatically.
➢ Both cloud computing and big data analytics use the distributed computing model
Community cloud:
Hybrid cloud:
➢ Cloud environment in which various internal or external service providers offer services to many organisations
➢ An organisation can use both private and public cloud together
➢ The organisation can manage an internal private cloud for general use and may access public cloud during the peak periods