
• Data Warehouse & Data Mart

• Big Data
• MapReduce

@SPJIMR Courage . Heart


Data Warehouse

• A data warehouse is a database that stores current and historical data of potential interest to decision makers throughout the company
• The data originate in many core operational transaction systems, such as systems
for sales, customer accounts, and manufacturing, and may include data from Web
site transactions
• The data warehouse consolidates and standardizes information from different
operational databases so that the information can be used across the enterprise
for management analysis and decision making
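A minimal Python sketch of the consolidate-and-standardize step, assuming two hypothetical source systems. The record layouts, field names, and function names are invented for illustration. Real warehouses do this with ETL tooling; the sketch only shows differently shaped records from a sales system and a web site ending up in one shared schema.

```python
from datetime import datetime

# Hypothetical source records: each operational system uses its own conventions.
SALES_ROWS = [{"cust": "c1", "date": "2010-02-03", "amount_usd": 120.0}]
WEB_ROWS = [{"customer_id": "C1", "ts": "03/02/2010", "amount_cents": 4550}]


def standardize_sales(row):
    """Map a sales-system record onto the shared warehouse schema."""
    return {
        "customer_id": row["cust"].upper(),
        "date": datetime.strptime(row["date"], "%Y-%m-%d").date(),
        "amount": row["amount_usd"],
        "source": "sales",
    }


def standardize_web(row):
    """Map a web-site record onto the same schema (day/month/year, cents)."""
    return {
        "customer_id": row["customer_id"].upper(),
        "date": datetime.strptime(row["ts"], "%d/%m/%Y").date(),
        "amount": row["amount_cents"] / 100,
        "source": "web",
    }


def load_warehouse(sales_rows, web_rows):
    """Consolidate both feeds into one list of uniformly shaped rows."""
    return [standardize_sales(r) for r in sales_rows] + [
        standardize_web(r) for r in web_rows
    ]


warehouse = load_warehouse(SALES_ROWS, WEB_ROWS)
```

After loading, the same customer and the same day look identical regardless of which system the record came from, which is what makes enterprise-wide analysis possible.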



Data Mart

• Companies often build enterprise-wide data warehouses, where a central data warehouse serves the entire organization, or they create smaller, decentralized warehouses called data marts
• A data mart is a subset of a data warehouse in which a
summarized or highly focused portion of the organization’s data
resides
• For example, a company might develop marketing and sales data
marts to deal with customer information
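A data mart can be pictured as a filter plus a summary over warehouse rows. The sketch below is a hypothetical illustration (the rows, the `subject` field, and `build_sales_mart` are invented): it keeps only the sales subject area and summarizes it to one total per customer, the kind of focused portion a marketing and sales mart might hold.

```python
from collections import defaultdict

# Hypothetical enterprise-wide warehouse rows covering several subject areas.
WAREHOUSE = [
    {"subject": "sales", "customer_id": "C1", "amount": 120.0},
    {"subject": "sales", "customer_id": "C1", "amount": 45.5},
    {"subject": "sales", "customer_id": "C2", "amount": 80.0},
    {"subject": "manufacturing", "plant": "P7", "amount": 500.0},
]


def build_sales_mart(rows):
    """Keep only sales rows and summarize them to one total per customer."""
    totals = defaultdict(float)
    for row in rows:
        if row["subject"] == "sales":
            totals[row["customer_id"]] += row["amount"]
    return dict(totals)


sales_mart = build_sales_mart(WAREHOUSE)
print(sales_mart)  # {'C1': 165.5, 'C2': 80.0}
```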



Data Warehouse & Data Marts



Online Analytical Processing

• OLAP supports multidimensional data analysis
• Enables users to view the same data in different ways using multiple dimensions
• Each aspect of information (product, pricing, cost, region, or time period) represents a different dimension
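To show what "viewing the same data along different dimensions" means, here is a small Python sketch over hypothetical sales facts. `rollup` and `slice_rows` are illustrative helper names, not an OLAP product's API: the same four rows are summed by region, by product and quarter, and then sliced to a single region.

```python
from collections import defaultdict

# Hypothetical sales facts: each row carries several dimensions plus one measure.
FACTS = [
    {"product": "403", "region": "North", "quarter": "Q1", "sales": 100},
    {"product": "403", "region": "South", "quarter": "Q1", "sales": 150},
    {"product": "403", "region": "North", "quarter": "Q2", "sales": 120},
    {"product": "512", "region": "South", "quarter": "Q2", "sales": 90},
]


def rollup(rows, dimensions, measure="sales"):
    """Sum the measure over every combination of the chosen dimensions."""
    cube = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        cube[key] += row[measure]
    return dict(cube)


def slice_rows(rows, dimension, value):
    """Fix one dimension to a single value (a 'slice' of the cube)."""
    return [row for row in rows if row[dimension] == value]


print(rollup(FACTS, ["region"]))
# {('North',): 220, ('South',): 240}
print(rollup(FACTS, ["product", "quarter"]))
# {('403', 'Q1'): 250, ('403', 'Q2'): 120, ('512', 'Q2'): 90}
print(rollup(slice_rows(FACTS, "region", "North"), ["quarter"]))
# {('Q1',): 100, ('Q2',): 120}
```

Real OLAP servers precompute and index such aggregates so these views return quickly on large data; the sketch recomputes them each time for clarity.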



Big Data

• Big data is characterized by the 3 V's, often extended with a fourth (Veracity):
• Volume: terabytes & petabytes
• Velocity: speed of creation or change (e.g., TB/sec)
• Variety: type (text, audio, video, images, geospatial, ...)
• Veracity: accuracy, correctness, applicability

• Increasing processing power, storage capacity, and networking have caused data to
grow in all 3 dimensions
• An estimated 80% of data being processed today is unstructured
• Examples: social network data, sensor networks, Internet search, genomics, astronomy, …



Why Big Data Now

1. Low-cost storage to keep data that was discarded earlier
2. Powerful multi-core processors
3. Low latency through distributed computing: compute clusters and grids connected via high-speed networks
4. Virtualization
5. Better understanding of task distribution (MapReduce) and computing architecture (Hadoop) for parallel processing
6. Advanced analytical techniques (machine learning)
7. Managed big data platforms: cloud service providers
8. Open-source software: OpenStack, PostgreSQL



Why Big Data Now (cont.)
• Structured data: data that has a pre-set format
• E.g., address books, product catalogs, banking transactions
• Unstructured data: data that has no pre-set format
• E.g., movies, audio, text files, web pages, computer programs, social media
• Batch vs. streaming data
• Real-time data: streaming data that needs to be analysed as it comes in
• E.g., intrusion detection; aka “data in motion”
• Data at rest: non-real-time, e.g., sales analysis
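The difference between data at rest and data in motion shows up in how code consumes its input. In this hypothetical Python sketch, `batch_total` needs the complete list before it can answer, while `stream_alerts` inspects each event as it arrives and can run over an endless iterator. The "failed logins per interval" signal and the threshold are invented stand-ins for an intrusion-detection feed.

```python
def batch_total(sales):
    """Data at rest: the complete dataset exists before the analysis runs."""
    return sum(sales)


def stream_alerts(events, threshold):
    """Data in motion: examine each event as it arrives and flag it immediately.

    `events` can be any iterator, including one that never ends.
    """
    for position, failed_logins in enumerate(events):
        if failed_logins > threshold:
            yield position  # alert now, without waiting for the rest of the stream


print(batch_total([120.0, 45.5, 80.0]))  # 245.5
print(list(stream_alerts(iter([0, 1, 7, 2, 9]), threshold=5)))  # [2, 4]
```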



MapReduce
• MapReduce
• Software framework to process massive amounts of unstructured data in parallel
• Map: takes a set of data and converts it into another set of key-value pairs
• Between the two steps, the framework groups (shuffles) the Map output by key, so each reducer receives all the values for its keys
• Reduce: takes the output from Map as input and outputs a smaller set of key-value pairs
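The classic illustration of the Map and Reduce steps is counting words. Below is a single-process Python sketch of that pattern: it mimics the map, shuffle, and reduce phases sequentially rather than running them in parallel on a cluster, and the function names are illustrative, not Hadoop's API.

```python
from collections import defaultdict


def map_phase(document):
    """Map: turn one input record (a line of text) into (word, 1) pairs."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all values by key, as the framework does between Map and Reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single pair."""
    return key, sum(values)


def word_count(documents):
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


print(word_count(["Big data big ideas", "data at rest"]))
# {'big': 2, 'data': 2, 'ideas': 1, 'at': 1, 'rest': 1}
```

Because each `map_phase` call depends only on its own document and each `reduce_phase` call only on one key's values, a framework is free to spread both across many machines.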



Hadoop Big Data Processing
Two key components: the Hadoop Distributed File System (HDFS) and MapReduce

Many implementations exist: Cloudera, AWS (e.g., Amazon EMR)



Hadoop Implementation Examples

• 100 files with daily temperatures for two cities; each file has 10,000 entries
• For example, one file may contain (Toronto, 20), (New York, 30), …
• Our goal is to compute the maximum temperature in each of the two cities
• Assign the task to 100 Map processors, each working on one file
• Each processor outputs a list of key-value pairs holding the maximum temperature it found for each city, e.g., (Toronto, 30), (New York, 65)
• Now we have 100 lists, each with two elements. We give these lists to two reducers, one for Toronto and one for New York
• The reducers produce the final answer: (Toronto, 55), (New York, 65)
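The temperature example above can be sketched in Python as follows. This is a sequential simulation under stated assumptions: three small hypothetical files stand in for 100 files of 10,000 entries, each `map_max` call plays one Map processor, and the helper names are illustrative.

```python
from collections import defaultdict


def map_max(records):
    """Map: one file's (city, temperature) records -> (city, max in this file)."""
    local_max = {}
    for city, temp in records:
        if city not in local_max or temp > local_max[city]:
            local_max[city] = temp
    return list(local_max.items())


def shuffle(mapped_lists):
    """Group the mappers' outputs by city so each reducer sees one city."""
    groups = defaultdict(list)
    for pairs in mapped_lists:
        for city, temp in pairs:
            groups[city].append(temp)
    return groups


def reduce_max(city, temps):
    """Reduce: the maximum of the per-file maxima is the overall maximum."""
    return city, max(temps)


def max_temperature_job(files):
    mapped = [map_max(f) for f in files]  # one "Map processor" per file
    return dict(reduce_max(c, t) for c, t in shuffle(mapped).items())


# Three small hypothetical files instead of 100 files of 10,000 entries.
FILES = [
    [("Toronto", 20), ("New York", 30), ("Toronto", 25)],
    [("Toronto", 55), ("New York", 65)],
    [("Toronto", 30), ("New York", 40)],
]
print(max_temperature_job(FILES))  # {'Toronto': 55, 'New York': 65}
```

Taking the maximum inside each mapper first is what shrinks each file to just two pairs; this works because the maximum of per-file maxima equals the overall maximum.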



Data Mining
• Traditional database queries answer such questions as,
• “How many units of product number 403 were shipped in February 2010?”
• OLAP, or multidimensional analysis, supports much more complex requests for
information, such as
• “Compare sales of product 403 relative to plan by quarter and sales region for the past
two years.”
• With OLAP and query-oriented data analysis, users need to have a good idea about the
information for which they are looking
• Data mining is more discovery-driven. Data mining provides insights into corporate data that cannot be obtained with OLAP by finding hidden patterns and relationships
• One popular use for data mining is to provide detailed analyses of patterns in customer data for one-to-one marketing campaigns or for …
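The slide does not name a specific mining technique, so as a swapped-in illustration here is association analysis (market-basket style), one common way to surface hidden patterns. This Python sketch counts how often each pair of items is bought together and keeps pairs whose support (the fraction of transactions containing the pair) meets a threshold. The baskets and the 0.75 threshold are invented.

```python
from collections import Counter
from itertools import combinations

# Hypothetical customer transactions (each basket is a set of items).
BASKETS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
]


def frequent_pairs(transactions, min_support):
    """Return item pairs appearing together in at least min_support of baskets."""
    counts = Counter()
    for basket in transactions:
        # sorted() gives each pair one canonical order, e.g. ('beer', 'diapers')
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


print(frequent_pairs(BASKETS, min_support=0.75))  # {('beer', 'diapers'): 0.75}
```

Unlike an OLAP query, nobody asked about beer and diapers in advance; the pattern emerges from the data.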



Big Data Analysis
• Basic Analytics for insight
• Slicing & dicing of data, reporting, simple visualization, basic monitoring
• Advanced Analytics for insight
• More complex analysis such as predictive modelling & other pattern matching
• Data mining
• Operational analytics
• Analysis becomes part of the business process
• Monetized Analytics
• Analytics to directly drive revenues
• Selling data sets
• Telecom companies sell location-based insights to retailers



[Figure: big data reference architecture. Columns: Data Ingestion; Information Reservoir; Data Access & Extraction; Information Consumption & Analytic Methods. Components shown include batch integration (native EL, standard ETL, file transfer, custom loader), a landing zone, stores for regions NA, EMEA and AP (Redshift, RDS, legacy platform), Hive, HBase and Impala, an in-memory data grid (HANA), operational reporting, access services and data/web services (REST, SOAP), global connectors, and consumption via dashboards, reporting, OLAP, mobile BI (Tableau, Business Objects, MicroStrategy), advanced visualization, portal, geospatial, search & exploration, and data sciences. Cross-cutting: data validation, blending & processing; files & extracts.]



Internet Of Things



Internet Of Things

• The Internet of Things refers to the ever-growing network of physical objects that feature an IP address for internet connectivity, and the communication that occurs between these objects and other Internet-enabled devices and systems
• Connects the physical world to the Internet so that you can use data from devices to increase productivity and efficiency
• What is causing the growth of IoT:
• Cheap sensors (RFID tags and micro-electro-mechanical systems, MEMS)
• Internet connectivity
• Growth of data processing capability

IoT Impacts & The Red Flags
• Impacts :
• Reduced costs
• Improved performance
• Innovative solutions
• New revenue streams
• Red Flags
• Privacy
• Security
• Autonomy & control
• Influences on human decision making



IoT

• Take things and add the ability to sense, touch, communicate & control
• Just as humans see, hear, touch, taste, and smell
• How many sensors does your smartphone have?
• How many sensors does your Fitbit have?
• Nest: senses temperature and movement, and communicates
• Arduino board
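To make "sense, communicate & control" concrete, here is a hypothetical thermostat loop in Python, loosely in the spirit of the Nest example. It is not Nest's actual logic: the target temperature, the 0.5-degree dead band (which stops the heater from toggling on every small fluctuation), and the function names are all assumptions.

```python
def thermostat_step(current_temp, target, heating_on, band=0.5):
    """Decide whether heating should be on; inside the dead band, keep the old state."""
    if current_temp < target - band:
        return True
    if current_temp > target + band:
        return False
    return heating_on


def run(readings, target):
    """Sense each reading, decide, and record the result (e.g. to report to an app)."""
    heating_on = False
    log = []
    for temp in readings:  # sense
        heating_on = thermostat_step(temp, target, heating_on)  # decide / control
        log.append({"temp": temp, "heating_on": heating_on})  # communicate
    return log


for entry in run([19.0, 20.8, 21.6, 21.2], target=21.0):
    print(entry)
```

On real hardware such as an Arduino board, the readings would come from a temperature sensor and the decision would switch a relay; here both are simulated with plain values.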



Big Enough Business To Get The Big Business Interested

• Huge investments
• GE – Predix
• Samsung – purchased SmartThings
• IBM – Smarter Planet
• Cisco
• Intel
• Google – purchased Nest for 3.2 billion USD
• The telecom players



Internet Of Things

• Readily leveraged in:
• Consumer products such as refrigerators, security cameras, and cable set-top boxes;
• Industrial systems such as conveyor belts and manufacturing equipment;
• Commercial devices such as traffic signals and smart meters
• Any device that can be powered on could be part of an IoT application
• Preventive Maintenance of Machines

