
Big Data Adoption

Quang Duong

Queen’s College

September 16, 2023

September 16, 2023 BDM 1043 Fall 2023 Week 2 1 / 21
Part 1: Business Motivation and Drivers

1 Marketplace Dynamics

2 Business Architecture

3 Information and Communication Technology



Marketplace Dynamics

The global economy can experience periods of uncertainty due to various factors
Enterprises need to leverage external data sources as a means of sensing the marketplace
Big Data delivers analytic results that can be used to infer information about the marketplace and assist management in making decisions



Business Architecture

Big Data ties to business architecture at each of the organizational layers

1 Strategic: C-level executives and advisory group
2 Tactical or managerial layer: Steer the organization
3 Operations: Execute core processes

1 Strategic: Wisdom (know-why)
2 Tactical or managerial layer: Knowledge (know-how)
3 Operations: Information (know-what)



Information and Communication Technology

The following ICT developments have accelerated the pace of Big Data adoption in business:
Data Analytics and Data Science: statistical techniques, data warehousing, machine learning
Digitization: online banking, online shopping, streaming video
Affordable Technology and Commodity Hardware: open-source software
Social Media: customer interaction
Hyper-Connected Communities and Devices: increase in the number of available data streams
Cloud Computing: external datasets, scalable processing, and vast amounts of storage
Internet of Everything



Part 2: Planning Considerations

4 Organization Prerequisites

5 Privacy

6 Security

7 Governance Requirement

8 Clouds

9 Big Data Analytic Lifecycle



Organization Prerequisites

Enterprises need to have data management and Big Data governance frameworks
A long-term plan and roadmap for the Big Data environment



Privacy

Performing analytics on datasets can reveal confidential information about organizations or individuals
Addressing this requires an understanding of data privacy regulations and of techniques for data tagging and anonymization
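
One way to picture tagging plus anonymization is the minimal sketch below. It assumes sensitive fields have already been tagged by name and replaces their values with a keyed hash. Strictly speaking this is pseudonymization, a weaker form of anonymization. The field names, the sample record, and `SECRET_KEY` are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be kept outside the dataset.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed SHA-256 digest that stands in for the original value."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_record(record: dict, sensitive_fields: set) -> dict:
    """Copy a record, replacing the tagged sensitive fields with pseudonyms."""
    return {
        field: pseudonymize(str(value)) if field in sensitive_fields else value
        for field, value in record.items()
    }


# Hypothetical record; the set of sensitive fields acts as the "tag".
record = {"customer_id": "C-1001", "email": "ann@example.com", "basket_total": 42.5}
safe = anonymize_record(record, sensitive_fields={"customer_id", "email"})
print(safe)
```

The same input always maps to the same pseudonym, so records can still be joined on the hashed key without exposing the original identifier.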



Security

Securing Big Data involves ensuring that the data networks and repositories are sufficiently secured via authentication and authorization mechanisms
Big Data security also involves establishing data access levels for different categories of users
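
Access levels per user category could be modelled roughly as follows. This is only an illustrative sketch: the level names, user categories, and dataset names are assumptions, and a real deployment would rely on the authorization features of the platform in use.

```python
from enum import IntEnum


class AccessLevel(IntEnum):
    """Hypothetical access levels, ordered from least to most restricted."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3


# Hypothetical clearance per user category.
USER_CLEARANCE = {
    "guest": AccessLevel.PUBLIC,
    "analyst": AccessLevel.INTERNAL,
    "data_steward": AccessLevel.CONFIDENTIAL,
}

# Hypothetical classification per dataset.
DATASET_LEVEL = {
    "web_logs": AccessLevel.INTERNAL,
    "customer_profiles": AccessLevel.CONFIDENTIAL,
}


def can_read(user_category: str, dataset: str) -> bool:
    """Allow access only when the user's clearance meets the dataset's level."""
    # Unknown users get the lowest clearance; unknown datasets the highest level.
    clearance = USER_CLEARANCE.get(user_category, AccessLevel.PUBLIC)
    required = DATASET_LEVEL.get(dataset, AccessLevel.CONFIDENTIAL)
    return clearance >= required


print(can_read("analyst", "web_logs"))
print(can_read("analyst", "customer_profiles"))
```

Defaulting unknown users to the lowest clearance and unknown datasets to the highest level keeps the check restrictive when the mappings are incomplete.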



Governance Requirement

Standardization of how data is tagged and the metadata used for tagging
Policies that regulate the kind of external data that may be acquired
Policies regarding the management of data privacy and data
anonymization
Policies for the archiving of data sources and analysis results
Policies that establish guidelines for data cleansing and filtering



Clouds

A cloud environment is worth considering when:
In-house hardware resources are inadequate
Upfront capital investment for system procurement is not available
The Big Data initiative is a proof of concept
Datasets that need to be processed are already cloud resident
The limits of available computing and storage resources used by an in-house Big Data solution are being reached



Big Data Analytic Lifecycle

1 Business Case Evaluation
2 Data Identification
3 Data Acquisition and Filtering
4 Data Extraction
5 Data Validation and Cleansing
6 Data Aggregation and Representation
7 Data Analysis
8 Data Visualization
9 Utilization of Analysis Results



Business Case Evaluation

What are the motivation and goals of performing the analysis?
Evaluate the business case to define how the success of the project will be measured
Define KPIs
Or use SMART goals: specific, measurable, attainable, relevant, and timely
Estimate the budget



Data Identification

Find the right datasets and their sources
Internal datasets
External datasets: publicly available datasets, third-party data providers, web scraping



Data Acquisition and Filtering

Gather data from the sources, then filter it to remove poor-quality or irrelevant data
Store the original copy of the datasets
Add metadata to improve classification and querying
What is metadata?
– Time and date of creation
– Creator or author of the data
– File size
– Source of the data
– Process used to create the data
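
The metadata fields listed above could be captured at acquisition time along these lines. This is a sketch under assumptions: `describe_file`, the sample file, and the source and process labels are hypothetical. File modification time stands in for creation time, which is not portable across operating systems, and the checksum is an extra field not on the list.

```python
import hashlib
import os
import tempfile
from datetime import datetime, timezone


def describe_file(path: str, source: str, process: str, author: str) -> dict:
    """Build a metadata record for an acquired file."""
    stat = os.stat(path)
    with open(path, "rb") as handle:
        checksum = hashlib.sha256(handle.read()).hexdigest()
    return {
        "path": path,
        "acquired_at": datetime.now(timezone.utc).isoformat(),
        # Modification time is used as a stand-in for time of creation.
        "modified_at": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
        "author": author,
        "size_bytes": stat.st_size,
        "source": source,
        "process": process,
        "sha256": checksum,  # extra field: helps verify the stored original copy
    }


# Demo with a throwaway file; all values are made up.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("id,value\n1,10\n")

meta = describe_file(
    tmp.name,
    source="hypothetical-vendor-feed",
    process="nightly export",
    author="data team",
)
print(meta)
os.remove(tmp.name)
```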



Data Extraction

Extract data and transform it into a format that the downstream process can use
The extraction and transformation process depends on the type of analytics and the capabilities of the Big Data solution
Example: Extract useful info from logs
Example: Extract data from XML documents
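
Both examples can be sketched with the Python standard library. The log line format (a common web-server access-log layout) and the XML structure below are assumptions made for illustration, not formats taken from the slides.

```python
import re
import xml.etree.ElementTree as ET

# Assumed access-log layout: ip, two ignored fields, [timestamp], "request", status.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)


def parse_log_line(line: str):
    """Return the useful fields of one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    row = match.groupdict()
    row["status"] = int(row["status"])
    return row


def parse_orders(xml_text: str) -> list:
    """Flatten <order> elements into dictionaries for downstream processing."""
    root = ET.fromstring(xml_text)
    return [
        {"id": order.get("id"), "amount": float(order.findtext("amount"))}
        for order in root.iter("order")
    ]


# Hypothetical inputs.
line = '203.0.113.7 - - [16/Sep/2023:10:15:32 +0000] "GET /cart HTTP/1.1" 200 512'
xml_text = "<orders><order id='A1'><amount>19.99</amount></order></orders>"

print(parse_log_line(line))
print(parse_orders(xml_text))
```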



Data Validation and Cleansing

Data input into Big Data analyses can be unstructured without any
indication of validity
Remove duplicate or irrelevant observations
Fix structural errors
Filter unwanted outliers and handle missing data
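
The cleansing steps above might look like this on a small in-memory dataset. It is a sketch only: the rows are made up, missing values are simply dropped, and the outlier rule (more than 5 median absolute deviations from the median) is one possible choice among many.

```python
import statistics


def clean(rows: list, mad_factor: float = 5.0) -> list:
    """Normalize, de-duplicate, drop incomplete rows, then filter outliers."""
    seen = set()
    cleaned = []
    for row in rows:
        # Fix structural errors: trim whitespace and normalize capitalization.
        city = (row.get("city") or "").strip().title()
        amount = row.get("amount")
        # Handle missing data: here, rows missing a key field are dropped.
        if not city or amount is None:
            continue
        # Remove duplicates (detected after normalization).
        key = (row["id"], city, amount)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"id": row["id"], "city": city, "amount": amount})

    # Filter outliers using the median absolute deviation (MAD).
    amounts = [row["amount"] for row in cleaned]
    if not amounts:
        return cleaned
    median = statistics.median(amounts)
    mad = statistics.median(abs(a - median) for a in amounts)
    if mad == 0:
        return cleaned
    return [row for row in cleaned if abs(row["amount"] - median) <= mad_factor * mad]


# Hypothetical raw observations.
rows = [
    {"id": 1, "city": " toronto ", "amount": 10},
    {"id": 1, "city": "Toronto", "amount": 10},    # duplicate once normalized
    {"id": 2, "city": "OTTAWA", "amount": 12},
    {"id": 3, "city": "Ottawa", "amount": None},   # missing value
    {"id": 4, "city": "Toronto", "amount": 11},
    {"id": 5, "city": "Ottawa", "amount": 13},
    {"id": 6, "city": "Toronto", "amount": 12},
    {"id": 7, "city": "Toronto", "amount": 1000},  # outlier
]
result = clean(rows)
print([row["id"] for row in result])
```

Whether to drop, impute, or flag missing values and outliers depends on the analysis; dropping is used here only because it is the shortest to show.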



Data Aggregation and Representation

Integrate multiple datasets to arrive at a unified view
This can require extensive time and effort due to the volume of Big Data
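
On a toy scale, integrating two datasets into a unified view can be sketched as a join on a shared key. The `customers` and `orders` datasets and the `customer_id` key are hypothetical. At Big Data volumes this would typically run on a distributed engine rather than in memory.

```python
from collections import defaultdict


def unify(customers: list, orders: list) -> list:
    """Attach per-customer order aggregates to each customer record."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for order in orders:
        totals[order["customer_id"]] += order["amount"]
        counts[order["customer_id"]] += 1
    # Orders that reference unknown customers are left out of the unified view.
    return [
        {
            **customer,
            "order_count": counts.get(customer["customer_id"], 0),
            "total_spent": totals.get(customer["customer_id"], 0.0),
        }
        for customer in customers
    ]


# Hypothetical datasets from two different sources.
customers = [
    {"customer_id": "C1", "region": "East"},
    {"customer_id": "C2", "region": "West"},
]
orders = [
    {"customer_id": "C1", "amount": 20.0},
    {"customer_id": "C1", "amount": 5.5},
    {"customer_id": "C3", "amount": 9.0},
]
unified = unify(customers, orders)
print(unified)
```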



Data Analysis

Confirmatory data analysis
Analyze the data to prove or disprove a hypothesis and provide definitive answers to specific questions

Exploratory data analysis
Explore the data to find patterns, anomalies, trends, and correlations
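
The two approaches can be contrasted in a small sketch. The exploratory part looks for a correlation without a prior hypothesis; the confirmatory part tests one stated hypothesis (the new layout raises the average order value) with a one-sided permutation test. All numbers are made up, and the permutation test is one illustrative choice of method that yields statistical evidence rather than strict proof.

```python
import random
import statistics


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(group_a: list, group_b: list, rounds: int = 5000, seed: int = 0) -> float:
    """Share of random regroupings whose mean difference (b - a) reaches the observed one."""
    rng = random.Random(seed)
    observed = statistics.fmean(group_b) - statistics.fmean(group_a)
    pooled = list(group_a) + list(group_b)
    split = len(group_a)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[split:]) - statistics.fmean(pooled[:split])
        if diff >= observed:
            hits += 1
    return hits / rounds


# Exploratory: is there any relationship between ad spend and sales?
ad_spend = [1, 2, 3, 4, 5]
sales = [12, 15, 19, 22, 27]
r = pearson(ad_spend, sales)
print(f"correlation: {r:.3f}")

# Confirmatory: does the new layout raise the average order value?
old_layout = [20, 22, 19, 21, 20, 23]
new_layout = [25, 27, 24, 26, 28, 25]
p = permutation_p_value(old_layout, new_layout)
print(f"p-value: {p:.4f}")
```

A small p-value suggests the observed difference is unlikely under random regrouping, which supports the hypothesis.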



Data Visualization

Use visualization techniques and tools to communicate the analysis results to business users

Figure: Heatmap
Figure: Faceted logistic regression



Utilization of Analysis Results

Input for Enterprise Systems
Business Process Optimization
Alerts

