DATA AS A SERVICE

It starts with an idea


ABOUT ME

▪ Sjoerd Rieske
▪ Business Analytics Consultant at Digital Sundai

▪ 2018 - 2022: Lecturer Research Methods at ITACA (Premaster)


▪ 2014 - 2022: Business Analytics Consultant at Atos Digital Consulting
▪ 2010 - 2014: Lecturer Research Methods at University of Groningen

2 BS / ITACA / Data as a Service


Vrije Universiteit Amsterdam
AGENDA

Topic Indicative start time


Introduction 13:30
Data as a Service concept 13:35
Architecture & Infrastructure 13:50
Break 14:30
Data Management 14:40
Data Processes & MLOps 14:55
Organizational Change 15:10
Implications for Auditors 15:20
Break 15:30
Case description 15:40
Case work 15:50
Presentations 3 mins/team (elevator pitch) 16:20
Wrap up 16:55

LEARNING OBJECTIVES

• To understand why it is important for organizations to become data driven
• To understand the trends in data analytics tooling, platforms, processes and methodologies
• To show how to increase organizational analytical maturity
• To apply these concepts in the context of the IT auditor

INTRO CASE

NEXT LEVEL FLYING

DATA AS A SERVICE

INTRODUCTION

DATA IS THE NEW OIL
[Chart: Data Increase vs Utilization. Series: Data Quantity, Data Utilization. Time axis: Past, Present, Future (?)]

YET, ORGANIZATIONS ARE STILL FINDING IT DIFFICULT
TO UNLEASH THE POWER OF DATA

• 88% of enterprise data is still unexploited for better business insights (Forrester)
• 70% of data is being created by customers, not in enterprise applications
• 80% of organisations are in the early stages of Big Data and analytics initiatives
• 66% of organizations don’t know how to truly get value from Big Data (IDC)
• 96% of organisations do not have the right analytics skills & tools
• 60% of Big Data projects will fail to survive beyond the pilot phase (Gartner)

BECOMING A DATA DRIVEN ORGANIZATION
REQUIRES A CHANGE FOR MOST ORGANIZATIONS

Dimension: Today → Tomorrow

Capability: Low maturity in analytics → Mature internal analytics capability
Execution: Delivery in months → Delivery in days
Engagement: IT controlled → Business driven collaboration
Cost: Duplicated cost across BU’s → Lower shared cost
Knowledge: Knowledge scattered → Knowledge centralized and shared
Decisioning: Decisions based on best effort data analysis → Decisions based on high quality insight
Sourcing: Depending on external resources → Internal analytics competences

WHAT IS DATA AS A SERVICE?

To position analytics as an integral value driver for the organization in order to become data driven.

This involves transformations in:
> The core IT layer (data lakes, cloudification)
> The supporting organizational processes
> Data management
> The workforce

This fits in the ‘XaaS-trend’ and results in the transformation of the data competency into a service.

Note: Commercial vendors offer out-of-the-box DaaS solutions. These generally only cover the first point.

Source: Y. Duan, G. Fu, N. Zhou, X. Sun, N. C. Narendra and B. Hu, "Everything as a Service (XaaS) on the Cloud: Origins,
Current and Future Trends," 2015 IEEE 8th International Conference on Cloud Computing

WHAT IS DATA AS A SERVICE?

Data is stored centrally in data lakes, identified through meta data management

All data products follow standardized design principles

Processes are automated-where-possible

Designs and delivered products are documented

Products and services are updated to the latest available analytical insights

Users are trained to work with all deliverables

HOW DO YOU ORGANIZE DAAS?

Design your organization following Data Mesh principles.

Data Mesh is an architecture to support organization-wide data-driven decision-making.

Data Mesh is based on four fundamental principles:
• Domain ownership
• Data as a product
• Self-serve data platform
• Federated computational governance

Source: Dehghani Z. (2022), Data Mesh: Delivering Data-Driven Value at Scale, O’Reilly Media

DATA MESH

Source: datamesharchitecture.com
DATA AS A SERVICE

ARCHITECTURE &
INFRASTRUCTURE

MAIN DATA PLATFORMS
A standardized architecture enables faster, better and cheaper analytics

Real Time
• Platform for real time data processing
• Strong SLA
• No data storage

Data Lake
• Storage and processing of large data volumes
• Low cost (standardized) hardware
• Consolidation and archive

Data Warehouse
• Structured and consistent data
• Self service function
• Strong query possibilities

R&D
• Standalone dev platform (playground)
• Fewer restrictions
• Data stored only temporarily

Note the difference between operational and analytical (reporting) systems

ENTERPRISE DATA ARCHITECTURE / DATA FABRIC

Data Sources
Customer data, Application data, Logs, Social, Text, Images, Audio, Video, IoT, (...)

Stream Processing: Message Broker
• Delivers services/API in streaming format
• Does not store data

Data Lake
• Batch processing
• Data in all structures
• Large data sets
• Cost effective data storage
• Archiving and cold data
• Support filtering, preprocessing, and ETL before integration in DWH

Analytical Data Store
• High performance data querying
• Ad-hoc analytics

R&D
• Research and Development
• Agile and Flexible
• Data Science toolkit: R, Python…

Consumers
Internal users, External users (partners), Customers, Operational Systems, Data Scientists, Business Analysts, (…)

Functional reference architecture based on lambda principles


Can be deployed on premise, (multi)cloud or in hybrid form
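
As a rough illustration of the lambda principle named above, the sketch below keeps a batch view that is recomputed from all stored events up to a cutoff, and a speed view that only covers events newer than that cutoff; a query merges the two. The `Event` shape, the function names and the in-memory lists are assumptions for illustration and are not part of the reference architecture itself.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A hypothetical source event: one customer action at a point in time."""
    timestamp: int
    customer: str


def build_batch_view(events, batch_cutoff):
    """Batch layer: recompute counts from all events up to the cutoff."""
    return Counter(e.customer for e in events if e.timestamp <= batch_cutoff)


def build_speed_view(events, batch_cutoff):
    """Speed layer: count only events the batch run has not seen yet."""
    return Counter(e.customer for e in events if e.timestamp > batch_cutoff)


def query(batch_view, speed_view, customer):
    """Serving layer: merge both views at query time."""
    return batch_view[customer] + speed_view[customer]


events = [Event(1, "alice"), Event(2, "bob"), Event(5, "alice"), Event(9, "alice")]
batch_view = build_batch_view(events, batch_cutoff=5)
speed_view = build_speed_view(events, batch_cutoff=5)
print(query(batch_view, speed_view, "alice"))  # 3
```

In a real platform the batch view would typically live in the data lake or warehouse and the speed view in the streaming layer; here both are plain in-memory counters.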

UNIFIED DATA ARCHITECTURE

[Diagram: Data Sources feed four platforms (Real Time Data, Data Warehouse, Data Lake, R&D), which in turn serve Consumers]

CLOUD DATA ARCHITECTURE

TRENDS IN DATA INFRASTRUCTURE

Main trends:
• Standardization
• Cost reduction
• Increasing scarcity of human resources

Result in:
• Further cloudification
• Outsourcing of data platforms
• Serverless implementations
• Industry specific products / algorithms

Source:
• McKinsey Digital, 2021: How to build a data architecture to drive innovation—
today and tomorrow
• Ravat F., Zhao Y. (2019) Data Lakes: Trends and Perspectives

BREAK

DATA AS A SERVICE

DATA MANAGEMENT

DATA STRATEGY

▪ Augmentation: obtain data that you do not have
▪ Origin and reliability: where/how data is obtained
▪ Integration: link/consolidate data across sources
▪ Completeness: which data should you have?
▪ Structure: nature of the data you have
▪ Uniqueness: data no one else has

IF YOU GO SHOPPING, WHAT DO YOU PREFER?

Suppose you need some canned tomatoes

IF YOU GO SHOPPING, WHAT DO YOU PREFER?

Easily find canned tomatoes ….

DATA MANAGEMENT PRINCIPLES

Best practices (simplified):

• Data is properly described in one central place
• All data is classified and has an owner
• Data definitions are easy to understand
• System of record is unique and known
• Data consumption request is simple for both data consumer and data owner
• Data lineage is visible

[Images: two examples of what these practices help to avoid]
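
The practices on this slide can be made concrete with a minimal sketch of a central catalogue entry. The `CatalogEntry` class, its field names and the example data sets are hypothetical and only illustrate how ownership, classification, system of record and lineage might be recorded and checked; a real data catalogue tool will have its own model.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """A hypothetical record describing one data set in a central catalogue."""
    name: str
    definition: str = ""
    owner: str = ""
    classification: str = ""  # e.g. "public", "internal", "confidential"
    system_of_record: str = ""
    upstream: list = field(default_factory=list)  # lineage: names of source data sets


def missing_metadata(entry):
    """Return the names of required fields that are still empty."""
    required = ("definition", "owner", "classification", "system_of_record")
    return [name for name in required if not getattr(entry, name)]


def lineage(catalog, name):
    """Walk upstream links to list every data set that `name` depends on."""
    seen = []
    stack = list(catalog[name].upstream)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(catalog[current].upstream)
    return seen


catalog = {
    "crm_customers": CatalogEntry(
        "crm_customers", "Customers as registered in CRM", "Sales", "confidential", "CRM"
    ),
    "web_orders": CatalogEntry(
        "web_orders", "Orders placed online", "E-commerce", "internal", "Webshop"
    ),
    "customer_orders": CatalogEntry(
        "customer_orders", upstream=["crm_customers", "web_orders"]
    ),
}

print(missing_metadata(catalog["customer_orders"]))
print(sorted(lineage(catalog, "customer_orders")))  # ['crm_customers', 'web_orders']
```

A check like `missing_metadata` is the kind of control that could be run automatically before a data set is published to consumers.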

EXAMPLES OF DATA MANAGEMENT ARTIFACTS

Example: Data as a Product visible in a data catalogue

Example: Federated Data Governance policies

PRACTICAL EXAMPLE: FRAUD DETECTION

DATA OWNERSHIP

▪ Data ownership typically resides at the owner of the data producing system.
▪ Centralizing data in a data lake results in a federated model of ownership.
▪ Risk for analytical model development
▪ How to govern effectively?

Source: Klein (2017), Six Things You Need to Know About Data, Carnegie Mellon University SEI blog

TRENDS & RELEVANCE IT AUDIT

▪ Centralizing Data Ownership / Chief Data Officer


▪ Increasing relevance of data quality
▪ Data quality over time
▪ Data product management
▪ Data lineage
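
To illustrate what "data quality over time" could mean in practice, the sketch below measures the completeness of one column per snapshot and reports snapshots where it dropped sharply. The function names, the 10-point threshold and the sample data are assumptions chosen for illustration, not a prescribed audit method.

```python
def completeness(rows, column):
    """Share of rows in which `column` is filled (not None and not empty)."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) not in (None, ""))
    return filled / len(rows)


def quality_drops(snapshots, column, max_drop=0.10):
    """Report snapshots whose completeness fell by more than `max_drop`
    (absolute) compared with the previous snapshot."""
    alerts = []
    previous = None
    for label, rows in snapshots:
        score = completeness(rows, column)
        if previous is not None and previous - score > max_drop:
            alerts.append((label, round(previous, 2), round(score, 2)))
        previous = score
    return alerts


snapshots = [
    ("2024-01", [{"email": "a@x.org"}, {"email": "b@x.org"}, {"email": "c@x.org"}, {"email": None}]),
    ("2024-02", [{"email": "a@x.org"}, {"email": "b@x.org"}, {"email": "c@x.org"}, {"email": ""}]),
    ("2024-03", [{"email": "a@x.org"}, {"email": None}, {"email": None}, {"email": ""}]),
]
print(quality_drops(snapshots, "email"))  # [('2024-03', 0.75, 0.25)]
```

The same pattern extends to other quality dimensions (uniqueness, validity, timeliness) by swapping the scoring function.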

DATA AS A SERVICE

DATA PROCESSES

PERSONAS

Personas are fictional, role-based characters. They represent the different user types that will use the DaaS services. The persona cards will help you to understand the users’ needs, behavior and goals.

FROM INTERFACE TO BACK-END

[Diagram: three layers, from interface to back-end]
• User Interface: Business Services & Products, offered through a Service Catalog (e.g. Request User Access, Data Ingestion, Search in existing Data)
• Orchestration: E2E Processes
• Technical backend: (Automated) Technical Processes (e.g. Kerberos, FTP setup, Oracle DB, PW reset, Dockers, Create VM, Add HDFS)
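
The layering on this slide (service catalog, orchestration, automated technical processes) can be sketched roughly as follows. The step names echo the slide, but the mapping of steps to services, the `run_step` stand-in and the request format are assumptions made for illustration.

```python
# Hypothetical mapping from business services to ordered technical steps.
# Which steps belong to which service is assumed, not taken from the slide.
SERVICE_CATALOG = {
    "request_user_access": ["kerberos", "password_reset"],
    "data_ingestion": ["ftp_setup", "create_vm", "add_hdfs"],
}


def run_step(step, request):
    """Stand-in for an automated backend task; a real one would call the platform."""
    return f"{step} done for {request['user']}"


def orchestrate(service, request, catalog=SERVICE_CATALOG, runner=run_step):
    """Translate one business service request into its end-to-end technical steps."""
    if service not in catalog:
        raise ValueError(f"Unknown service: {service}")
    return [runner(step, request) for step in catalog[service]]


log = orchestrate("data_ingestion", {"user": "analyst01"})
print(log)
```

The user only sees the service name; the orchestration layer decides which technical steps run and in which order.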

IMPRESSION OF END RESULT

DATA/ML OPS

ML DevOps = ML + Data + DevOps

[Diagram: ML DevOps cycle with the steps Plan, Create, Verify, Configure, Release and Monitor, supported by Data Pipeline, ML Orchestration and Model Management]

Modeling
▪ Data Acquisition
▪ Business Understanding
▪ ML Modeling

Develop
▪ Modeling + Testing
▪ Continuous Integration
▪ Continuous Deployment

Operate
▪ Continuous Delivery
▪ Data Feedback Loop
▪ Model Monitoring
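
As one possible illustration of model monitoring, the sketch below computes a Population Stability Index (PSI) between a baseline sample of a model input and a recent sample. PSI is a swapped-in example technique, not something the slide prescribes; the bin count, the floor for empty bins and the 0.25 threshold (a commonly quoted rule of thumb) are assumptions.

```python
import math


def psi(expected, actual, bins=5):
    """Population Stability Index between a baseline and a recent sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


baseline = [i / 100 for i in range(100)]        # roughly uniform between 0 and 1
shifted = [0.5 + i / 200 for i in range(100)]   # concentrated in the upper half

print(psi(baseline, baseline))        # no shift
print(psi(baseline, shifted) > 0.25)  # True: the input distribution has moved
```

A monitoring job could run such a check on every scoring batch and feed alerts back into the data feedback loop.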

ANALYTICAL DEVELOPMENT ACTIVITIES

Source: Google Cloud

DATA/ML OPS

[Diagram: from data to product to value, under shared management]
• Innovation Pipeline (development & innovation): Sandbox → Develop → Orchestrate → Test
• Data Factory: Deploy → Orchestrate → Monitor
• Environments: Individual environment, Team development environment, Test/UAT environment, Production environment
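
The progression through environments can be sketched as a simple promotion gate: a data product only moves to the next environment when its checks pass. The environment names follow the slide; the `promote` function and the idea of a single pass/fail check result are assumptions for illustration.

```python
# Environments from the slide, in promotion order.
ENVIRONMENTS = ["individual", "team development", "test/UAT", "production"]


def promote(current, checks_passed):
    """Move a data product one environment forward if its checks passed."""
    position = ENVIRONMENTS.index(current)  # raises ValueError for unknown names
    if position == len(ENVIRONMENTS) - 1:
        return current  # already in production
    if not checks_passed:
        return current  # stay put until the checks pass
    return ENVIRONMENTS[position + 1]


stage = "individual"
for result in [True, True, False, True]:
    stage = promote(stage, result)
print(stage)  # production
```

In practice the check result would come from automated tests and approvals in the pipeline rather than a hard-coded list.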

ML PIPELINE

Source: Google Cloud
DATA AS A SERVICE

ORGANIZATIONAL CHANGE

CHALLENGES

No data governance in place yet

Differences in maturity level per business

SAFe but not really SAFe

Budget (and continuously proving value)

Consolidating enterprise in one way of working

High business expectations

Changing platforms

TRANSFORMATION PRINCIPLES

01 Be fast and iterative
Work with a quick feedback loop

02 Work on site as much as possible
Visibility helps to drive change

03 Be pro-active
Transformation takes time. Minimally double your initial estimate of duration and you are getting near reality

04 Engage the organization at all levels
Expect, accept and understand/work with different agendas and opinions from different stakeholders

05 Connect the different stakeholders inside the client's organization
Connect the business to the IT initiative

06 Right person on the right job
Ensure the match between the role and the qualities and personality of the individual

DATA AS A SERVICE

IMPACT FOR IT AUDIT

TRENDS IN DATA ANALYTICS

Source: Gartner

IMPACT ON THE IT AUDIT PROFESSION

• Analytics is a key component of organizational strategy
• Data is being consolidated, but strong governance is needed
• Data positioned as corporate assets with subsequent risks

Key attention points
• Data management is essential
• GDPR
• Paper vs. practice
• Holistic view on analytics

Examples of data management audit outcomes:
▪ The data strategy is defined locally, but we didn’t identify a global data strategy
▪ DAMA-DMBOK2 framework is not always used
▪ Maturity among business domains is heterogeneous
▪ Roles and responsibilities are not always well defined and translated into professional objectives
▪ The business ownerships (entities / processes) are not always defined

Q&A

CASE PRESENTATION

• 5 min: Read case description KEI B.V.
• 30 min: Case work, 4 groups
• 3 min: Elevator pitch, presentation in under 3 mins.
• 5 min: Wrap-up

CASE STUDY: KEI B.V.

Create an elevator pitch (<3 mins) based on the case description of KEI B.V. that addresses the following points:
• Describe the organizational challenge(s) within the analytics domain
• Provide at least 2 suggestions on how to enable more efficient analytics
• Draft a high level data governance structure that addresses the data management challenges

WRAP UP:

Data as a Service:
▪ Enables organizations to become more data driven
▪ This requires a significant organizational transformation
▪ Data governance is a key topic for many organizations adopting DaaS-related concepts
▪ IT Auditors will experience an increase in the importance of data in their work. This implies:
  ▪ Shift in organizational risks
  ▪ Soft side is an important element in becoming data driven
  ▪ Data Governance & Data Management are essential for data driven organizations

CLOSURE

An overload of rules, processes and metrics keeps us from doing our best work together
https://www.ted.com/talks/yves_morieux_how_too_many_rules_at_work_keep_you_from_getting_things_done

http://students.brown.edu/seeing-theory/index.html

http://www.tylervigen.com/spurious-correlations
https://www.reddit.com/r/dataisbeautiful/
