
Digital Transformation using AI and Emerging Technologies


Term IV Elective
Session 5 & 6
Agenda
• BCG’s Digital Transformation Framework
• TDPP framework
• Big Data basics
BCG Digital Transformation Framework
TDPP framework
[Figure: TDPP framework diagram; recoverable labels: DESIGN, IMPLEMENTATION]
Big Data

• Big Data is a digital-era phenomenon involving the unprecedented generation of diverse data from internal and external sources, encompassing structured, semi-structured, and unstructured data.

• 5Vs
• Volume refers to the ever-growing large magnitude of data.
• Velocity refers to the continuous and high speed of data generation.
• Variety refers to the diverse data formats, from structured data to unstructured data.
• Veracity refers to data quality and integrity, which can be undermined by biases, noise, and abnormalities.
• Value is the economic and social value that can be derived from Big Data.
Big Data Solutions
Big Data Cloud-based Solutions
Big Data Basics
• Database Management Systems (DBMS) are software used to:
• Manage additions, updates, and deletions of data as transactions occur
• Support data queries and reporting

DBMS Components
• Data definition: Helps create the data dictionary and structure the database.
• Data manipulation: Allows users to create, read, update, and delete information using views, report generators, QBE, and SQL.
• Application generation: Includes tools for creating visually appealing and easy-to-use applications.
• Data administration: Manages the database environment (backup, recovery, security, and performance).
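The data definition, data manipulation, and data administration components listed above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration only: the `customer` table, its columns, and the sample values are assumptions made for the example, not part of the slides.

```python
import sqlite3

# In-memory database used purely for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Data definition: create the table structure (hypothetical schema).
cur.execute(
    "CREATE TABLE customer (customer_id TEXT PRIMARY KEY, contact_name TEXT)"
)

# Data manipulation: create, update, and read a record.
cur.execute("INSERT INTO customer VALUES (?, ?)", ("C001", "Asha"))
cur.execute(
    "UPDATE customer SET contact_name = ? WHERE customer_id = ?",
    ("Asha Rao", "C001"),
)
rows_after_update = cur.execute(
    "SELECT customer_id, contact_name FROM customer"
).fetchall()
conn.commit()

# Data administration: take a backup copy of the whole database.
backup_conn = sqlite3.connect(":memory:")
conn.backup(backup_conn)

# Data manipulation: delete the record from the live database.
cur.execute("DELETE FROM customer WHERE customer_id = ?", ("C001",))
conn.commit()
rows_after_delete = cur.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
rows_in_backup = backup_conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
```

After the delete, the live database is empty while the backup copy still holds the record, which is the kind of recovery support the data administration component provides.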
Big Data Basics
• Databases are collections of datasets or records stored in a systematic way.
• They can store data generated from multiple sources, both internal and external.

Database Component: Definition
• Entity: Person, place, thing, transaction or event about which information is stored.
• Entity Class (Table): A collection of similar entities.
• Attributes: Characteristics or properties of an entity class (fields or columns).
• Primary Key: Field that uniquely identifies a given entity in a table.
• Foreign Key: Primary key of one table that appears in another table; captures the logical relationship between the two.
Big Data Basics

Examples
• Entity: Customer, Distributor, Orders
• Entity Class: Order Line, Product
• Attributes: Customer ID, Contact Name
• Primary Key: Order ID, Product ID
• Foreign Key: Distributor ID, Customer ID
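To make the primary key and foreign key roles concrete, here is a small sketch using Python's built-in `sqlite3` module. The two tables, their column names, and the sample rows are hypothetical; they loosely follow the Customer and Orders examples above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# customer_id is the primary key of the customer table.
conn.execute(
    "CREATE TABLE customer (customer_id TEXT PRIMARY KEY, contact_name TEXT)"
)
# order_id is the primary key of orders; customer_id is a foreign key
# that points back to the customer table.
conn.execute(
    """CREATE TABLE orders (
           order_id TEXT PRIMARY KEY,
           customer_id TEXT NOT NULL REFERENCES customer(customer_id)
       )"""
)

conn.execute("INSERT INTO customer VALUES ('C001', 'Asha')")
conn.execute("INSERT INTO orders VALUES ('O100', 'C001')")

# The foreign key captures the logical relationship, so the tables can be joined.
joined = conn.execute(
    "SELECT o.order_id, c.contact_name "
    "FROM orders o JOIN customer c ON o.customer_id = c.customer_id"
).fetchall()

# An order that refers to a non-existent customer is rejected.
try:
    conn.execute("INSERT INTO orders VALUES ('O101', 'C999')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```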
Big Data Basics
A data warehouse is a logical collection of information, gathered from many different operational databases, in an aggregated form (totals, counts, averages) more suited to analysis and decision making.

ETL (Extraction, Transformation, and Loading)
Process that extracts information from databases, transforms it using a common set of enterprise definitions, and loads the information into a data warehouse.

Data Marts
A subset of data warehouse information focused on the needs of a given business unit.
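The ETL flow and the data mart idea can be sketched in a few lines of plain Python. Everything here is assumed for illustration: the two "operational databases" are small in-memory lists, and the field names, regions, and amounts are made up.

```python
from collections import defaultdict

# Two hypothetical operational sources with inconsistent conventions.
store_db = [{"region": "north", "amount": "120.50"},
            {"region": "South", "amount": "80"}]
web_db = [{"Region": "NORTH", "sale_amt": 40.0},
          {"Region": "south", "sale_amt": 20.0}]

def extract():
    """Extraction: pull raw records out of each source."""
    yield from (("store", record) for record in store_db)
    yield from (("web", record) for record in web_db)

def transform(source, record):
    """Transformation: map each record onto one common set of definitions."""
    if source == "store":
        return {"region": record["region"].title(),
                "amount": float(record["amount"])}
    return {"region": record["Region"].title(),
            "amount": float(record["sale_amt"])}

def load(rows):
    """Loading: store the data in aggregated form (total and count per region)."""
    warehouse = defaultdict(lambda: {"total": 0.0, "count": 0})
    for row in rows:
        aggregate = warehouse[row["region"]]
        aggregate["total"] += row["amount"]
        aggregate["count"] += 1
    return dict(warehouse)

warehouse = load(transform(source, record) for source, record in extract())

# Data mart: the subset of the warehouse relevant to one business unit.
north_mart = {region: agg for region, agg in warehouse.items() if region == "North"}
```

A production pipeline would normally read from real databases and write to a warehouse system; the in-memory structures here only show the shape of the three steps.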
Big Data Basics
Student ID  Name     Location   Gender  Age
ID001       Sachin   Mumbai     M       32
ID002       Sourav   Kolkata    M       31
ID003       Mithali  Delhi      F       28
ID004       Smriti   Hyderabad  F       34

New Entry:
{"Student ID": "ID008",
 "Name": "Ravi",
 "Hobby": "Football",
 "Gender": "M",
 "Age": "41"}

• A major disadvantage of a relational database is that every new entry requires data for all fields. If any field is missing, a dummy value is entered, which wastes space.
• Another disadvantage is loss of information. If new data contains a value for a new column, it cannot be captured unless the whole database is reconfigured to include the new field.
• A NoSQL database can accommodate these anomalies through its schemaless architecture.
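The two disadvantages can be illustrated with the new entry shown above, which lacks a Location and carries an extra Hobby field. This is a rough sketch under stated assumptions: the relational side uses Python's built-in `sqlite3` with a hypothetical `student` table, the "N/A" placeholder is an arbitrary dummy value, and the schemaless side is modelled with a plain dictionary rather than a real NoSQL product.

```python
import sqlite3

# The new entry from the slide: no "Location", but an extra "Hobby" field.
new_entry = {"Student ID": "ID008", "Name": "Ravi", "Hobby": "Football",
             "Gender": "M", "Age": "41"}

# Relational side: a fixed schema with five columns (hypothetical table).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_id TEXT PRIMARY KEY, name TEXT, "
    "location TEXT, gender TEXT, age INTEGER)"
)
conn.execute(
    "INSERT INTO student VALUES (?, ?, ?, ?, ?)",
    (
        new_entry["Student ID"],
        new_entry["Name"],
        new_entry.get("Location", "N/A"),  # dummy value for the missing field
        new_entry["Gender"],
        int(new_entry["Age"]),
    ),
)
# "Hobby" has no column, so it is silently lost unless the schema is altered.
relational_row = conn.execute("SELECT * FROM student").fetchone()

# Schemaless side: the record is stored exactly as it arrives.
document_store = {}
document_store[new_entry["Student ID"]] = new_entry
```

The relational row carries a placeholder and no hobby, while the stored document keeps the hobby and simply has no Location key.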
Big Data Basics
Relational Database

• RDBMS
• Schema based
• Allows vertical scaling
• Disadvantages:
• Requires maintaining data consistency, which makes it hard to scale and resource intensive
• Does not scale horizontally easily, due to its structured and schema-based nature

NoSQL

• Schema-less
• Allows vertical scaling as well as horizontal scaling
• Each item in the database has two fields: (i) a unique key, and (ii) a value
• To keep keys consistent, a hash function converts each key into a value within a fixed range.
• Largest known NoSQL database: Apple, with 75,000+ servers.
• When the application and data structure are constantly evolving, a NoSQL database is preferred
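The key/value layout and the use of a hash function to map keys into a fixed range can be sketched as a toy key-value store spread across several nodes. This is an assumption-laden illustration, not how any particular NoSQL product works: `NUM_NODES`, `node_for`, `put`, and `get` are hypothetical names, and simple modulo partitioning is used, which real systems often replace with schemes that move fewer keys when nodes are added.

```python
import hashlib

NUM_NODES = 4  # hypothetical cluster size; this defines the fixed range 0..3

def node_for(key: str) -> int:
    """Hash the key and map it into the fixed range of node indexes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_NODES

# Each node is modelled as a dictionary of key -> value.
nodes = [dict() for _ in range(NUM_NODES)]

def put(key: str, value) -> None:
    """Store the value on the node chosen by the key's hash."""
    nodes[node_for(key)][key] = value

def get(key: str):
    """Look up the value on the node chosen by the key's hash."""
    return nodes[node_for(key)].get(key)

# Values need not share a schema.
put("ID001", {"Name": "Sachin", "Location": "Mumbai"})
put("ID008", {"Name": "Ravi", "Hobby": "Football"})
```

Because the same key always hashes to the same node, reads go straight to the right node, and capacity can in principle be added by adding nodes (horizontal scaling).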
Hash Function

[Figure: Input → Hash Function (non-reversible, one-way function) → Output (fixed range)]

https://emn178.github.io/online-tools/sha256.html
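The same SHA-256 behaviour that the online tool demonstrates can be tried locally with Python's standard `hashlib` module. The helper name `sha256_hex` and the sample inputs are assumptions for this sketch.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of the text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Inputs of very different sizes produce outputs of the same fixed length.
short_digest = sha256_hex("a")
long_digest = sha256_hex("Relational database is schema based. " * 100)

# The function is deterministic: the same input always gives the same output.
repeat_digest = sha256_hex("a")

# A different input gives a different digest.
other_digest = sha256_hex("b")
```

Every digest is 64 hexadecimal characters (256 bits) regardless of input size, and there is no practical way to recover the input from the output.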
Big Data – Hadoop ecosystem

[Figure: data stored as multiple copies across systems (fault tolerant)]
Big Data – Hadoop ecosystem
[Figure: parallel processing of a word count through the stages INPUT, SPLIT, MAPPER PHASE, SHUFFLE & SORT, REDUCE PHASE]
• INPUT: Relational database is schema based. noSQL database is schema less. noSQL database is horizontally scalable.
• SPLIT: one sentence per split.
• MAPPER PHASE: each word is emitted with a count of 1 (Relational, 1; database, 1; is, 1; schema, 1; based, 1; ...).
• SHUFFLE & SORT: identical words are grouped together (database, database, database; is, is, is; ...).
• REDUCE PHASE: the counts for each word are summed (database, 3; is, 3; ...).
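The word count walked through in the figure can be reproduced with a short Python sketch. It runs the phases one after another in a single process, whereas the point of the Hadoop approach is that mappers and reducers run in parallel across machines; the function names here are illustrative and are not Hadoop APIs.

```python
from collections import defaultdict

# INPUT: the three sentences used on the slide.
text = ("Relational database is schema based. "
        "noSQL database is schema less. "
        "noSQL database is horizontally scalable.")

# SPLIT: one sentence per split.
splits = [sentence.strip() for sentence in text.split(".") if sentence.strip()]

def mapper(split):
    """MAPPER PHASE: emit (word, 1) for every word in one split."""
    return [(word, 1) for word in split.split()]

def shuffle_and_sort(mapped):
    """SHUFFLE & SORT: group the emitted 1s by word, in sorted word order."""
    groups = defaultdict(list)
    for pairs in mapped:
        for word, one in pairs:
            groups[word].append(one)
    return dict(sorted(groups.items()))

def reducer(groups):
    """REDUCE PHASE: sum the counts for each word."""
    return {word: sum(ones) for word, ones in groups.items()}

counts = reducer(shuffle_and_sort(mapper(split) for split in splits))
```

Since each split is mapped independently of the others, the mapper calls are the part that can be distributed across nodes.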
Big Data – Hadoop ecosystem
• YARN processes job requests and manages cluster resources.
• It helps in the efficient management of resources (RAM, network bandwidth, CPU).
• Comprises four roles:
• Resource Manager: assigns resources
• Node Manager: handles nodes and monitors the resources used on each node
• Application Master: requests containers from the Resource Manager whenever a task arises, and works with the Node Managers to run the task in them
• Containers: hold a collection of physical resources.
Big Data – Hadoop ecosystem
Big Data Solutions

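[Figure: CAP theorem triangle with vertices Consistency, Availability, and Partition Tolerance; the sides are labelled CA (Consistency + Availability), CP (Consistency + Partition Tolerance), and AP (Availability + Partition Tolerance)]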
BDA Strategic Value
BDA Strategic Roles
BDA – Value Creation
Thank you.
