November 2019
Digital Transformation Imperative

What drives the journey towards Digital Transformation? Drivers span the enterprise: Customers, Marketing, Finance Data, and R&D, enabled by cloud-native solutions with on-demand storage and compute.
Goals
• Centralize Data Platform: Utilize the movement to Cloud to centralize data assets for the business to leverage
• Enable Digitization: Transform the Enterprise by building new business models around data and analytics capabilities

Benefits
• Enhanced Security: Through continuous testing and cloud-enabled protocols
• End User Enablement: Improved self-service and a platform for innovation
Challenges of the Traditional Data Warehouse Modernization Approach
• Total TCO savings of 25%
• CPU utilization above 70%, indicating resource optimization
• Solution has an overall SLA of 99.95%
• 100% of instances have automated monitoring
• Infrastructure savings of ~50%
• Unused storage footprint reduced to 5%
• DR failover time reduced to 2 hours
• Data retention policy increased to 7 years
• Demand-driven Big Data Analytics workloads: over-utilized capacity during peak hours and under-utilized at other times
• New Cloud Native Analytics Platform, and migrating Operational and Managed Reporting with strict SLAs to Cloud
• Exploring Advanced Analytics, Machine Learning, and Data Science models that need scalable compute services
• MPP (Netezza, Teradata) Offload: running out of capacity on existing MPP appliances that are cost-prohibitive to scale, and the desire to optimize cost without compromising performance
• Enhancing Business Intelligence and Visualization capability with interactive dashboards using both structured and unstructured datasets
©2019 Deloitte Touche Tohmatsu India LLP DA Modernization Deloitte 8
Deloitte’s Data Modernization Reference Architecture
We will leverage this reference architecture to achieve the vision of a NextGen Data Lake
Our Perspective: Deployment Strategy for Data & Analytics Modernization
Deployment choices should be driven by aligning the organization's current investment landscape to the future strategic direction for analytics.
Challenges:
• Usage is variable and there is a lack of elastic compute
• Infrastructure is over-utilized during peak hours and under-utilized at other times
• Cost-prohibitive to size the infrastructure for peak capacity
• Lack of ability to scale to meet the needs of exponential growth in data

Solution:
• Leverage scalable infrastructure, using the AWS Auto Scaling feature to monitor compute needs and automatically adjust capacity to maintain steady, predictable performance at the lowest possible cost
• Ability to scale an EMR cluster from zero to thousands of nodes within a few minutes, and then scale down when processing needs are met, which helps optimize cost and reduces TCO

AWS architecture that can be considered in such scenarios:
Sales order, inventory, and trends data from multiple online and physical locations → Batch inserts/updates uploaded to S3 → Auto-scaling AWS EMR cluster → Cleansed data loaded into S3 → Processed data loaded into Redshift → Reporting and BI for analytics
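The auto-scaling EMR cluster in the flow above can be driven by an instance-group scaling policy keyed to YARN memory pressure. The sketch below builds one such policy as a plain dictionary; the thresholds, adjustment sizes, and the cluster/instance-group IDs in the commented `boto3` call are illustrative assumptions, not values from this document.

```python
# Hedged sketch: an EMR instance-group auto-scaling policy that adds task
# nodes when available YARN memory is low and removes them when it is high.
import json

def task_group_autoscaling_policy(min_nodes=2, max_nodes=20):
    """Build an EMR AutoScalingPolicy on YARNMemoryAvailablePercentage."""
    return {
        "Constraints": {"MinCapacity": min_nodes, "MaxCapacity": max_nodes},
        "Rules": [
            {
                "Name": "ScaleOutOnLowMemory",
                "Action": {"SimpleScalingPolicyConfiguration": {
                    "AdjustmentType": "CHANGE_IN_CAPACITY",
                    "ScalingAdjustment": 2,   # add 2 nodes per trigger
                    "CoolDown": 300,
                }},
                "Trigger": {"CloudWatchAlarmDefinition": {
                    "ComparisonOperator": "LESS_THAN",
                    "MetricName": "YARNMemoryAvailablePercentage",
                    "Period": 300,
                    "Threshold": 15.0,        # scale out below 15% free memory
                }},
            },
            {
                "Name": "ScaleInOnIdle",
                "Action": {"SimpleScalingPolicyConfiguration": {
                    "AdjustmentType": "CHANGE_IN_CAPACITY",
                    "ScalingAdjustment": -2,  # remove 2 nodes per trigger
                    "CoolDown": 300,
                }},
                "Trigger": {"CloudWatchAlarmDefinition": {
                    "ComparisonOperator": "GREATER_THAN",
                    "MetricName": "YARNMemoryAvailablePercentage",
                    "Period": 300,
                    "Threshold": 75.0,        # scale in above 75% free memory
                }},
            },
        ],
    }

if __name__ == "__main__":
    policy = task_group_autoscaling_policy()
    # Attaching requires AWS credentials and a running cluster:
    # import boto3
    # boto3.client("emr").put_auto_scaling_policy(
    #     ClusterId="j-XXXXXXXX", InstanceGroupId="ig-XXXXXXXX",
    #     AutoScalingPolicy=policy)
    print(json.dumps(policy, indent=2))
```

Tuning the two thresholds against actual workload peaks is what delivers the "steady, predictable performance at the lowest possible cost" described above.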
Ingestion
• Batch ingestion can be done by tools like Talend, Informatica BDM, or AWS Glue
• Stream ingestion can be done using Apache Kafka or Kinesis Streams

Consumption / Analytics DB
• For a conventional database, use AWS RDS
• To run complex analytic queries against petabytes of structured data, use Redshift
• For dynamic querying directly on top of S3, use Redshift Spectrum or Athena

Custom Streaming Application
• A streaming application can use Amazon Kinesis Data Analytics, the Amazon Kinesis API, Elasticsearch, or the Amazon Kinesis Client Library (KCL)
• For heavy lifting, use KCL
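For the "dynamic querying directly on top of S3" option above, a minimal sketch of submitting an Athena query follows. The database, table, and bucket names are hypothetical; the guarded `boto3` call at the end is the real `start_query_execution` API.

```python
# Minimal sketch, assuming a hypothetical "sales_lake" Athena database whose
# "orders" table points at S3 data, and an "example-analytics-bucket" for results.
def build_athena_request(database, query, output_bucket):
    """Assemble the keyword arguments for athena.start_query_execution()."""
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {
            "OutputLocation": f"s3://{output_bucket}/athena-results/",
        },
    }

request = build_athena_request(
    database="sales_lake",
    query="SELECT store_id, SUM(amount) AS revenue FROM orders GROUP BY store_id",
    output_bucket="example-analytics-bucket",
)
# To execute (requires AWS credentials):
# import boto3
# qid = boto3.client("athena").start_query_execution(**request)["QueryExecutionId"]
```

Because Athena is serverless, this pattern incurs cost only per query scanned, which fits the variable-usage scenario described above.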
Use Case – New Cloud Analytics Platform (Cloud Native Data Warehouse)

Current Challenges
• In today's digital world, every organization's data is doubling every 18 months. In an on-premise setup, you will need to scale up the infrastructure every 12 months to keep up with demand, and on-premise hardware depreciates every 3 years
• This on-premise infrastructure is over-utilized during peak hours and under-utilized at other times
• Current DWs are not built for different data types: structured, semi-structured, and unstructured
• Traditional DWs are robust, but it can take months and cost millions of dollars just to get started

Solution and Benefits
• AWS provides fast, fully managed data warehouses that are simple and cost-effective to stand up
• Increased agility and reduced costs, in terms of both time and money, promote low-risk experimentation and analytics
• A pay-for-what-you-use model, using elasticity to scale your data architecture up or down and increase or decrease performance as required

AWS architecture that can be considered in such scenarios:
Ingestion
• Batch ingestion can be done by tools like Talend, Informatica BDM, or AWS Glue
• Stream ingestion can be done using Apache Kafka or Kinesis Streams

Consumption / Analytics DB
• A combination of AWS RDS data marts or Redshift can be used, depending on the BI/Analytics workloads
• For complex analytic queries against petabytes of structured data, use Redshift
• For dynamic querying directly on top of S3, use Redshift Spectrum or Athena
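The Redshift Spectrum route above works by registering S3 data as an external table that the warehouse can join against local tables. A sketch of generating that DDL is below; the schema, table, columns, and S3 path are illustrative assumptions.

```python
# Hedged sketch: build CREATE EXTERNAL TABLE DDL for Redshift Spectrum over
# Parquet files in S3. Names and paths are hypothetical examples.
def external_table_ddl(schema, table, columns, s3_path):
    """Generate Spectrum DDL from a list of (column_name, redshift_type) pairs."""
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n  {cols}\n)\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{s3_path}';"
    )

ddl = external_table_ddl(
    "spectrum", "orders",
    [("order_id", "bigint"), ("store_id", "int"), ("amount", "decimal(12,2)")],
    "s3://example-lake/processed/orders/",
)
print(ddl)
```

The external schema itself must first be created against the AWS Glue or Hive catalog; once it exists, queries can mix Spectrum tables with native Redshift tables transparently.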
Current Challenges
• Lack of scalable compute for processing large datasets for different data science and ML models
• Exponential growth of data for any organization, and the need for data scientists to access raw, processed, and curated data
• Cost of maintaining a large cluster sized for expected demand is prohibitive

Solution and Benefits
• DevOps automation using tools like Jupyter, Zeppelin, or RStudio, as well as elastic scalability to support burst demands
• Access to all data in S3 (structured, unstructured, semi-structured), with data science tools on the edge connected to the S3 data lake

Data Science Workbench
• Choice of JupyterHub, Zeppelin, or RStudio on the edge node connected to S3 or Redshift for Advanced Analytics or ML models

Model Source
• Access to raw, processed, and curated data from S3, as well as AWS Redshift for aggregated structures
• Model output is fed back to S3, along with AWS RDS, for dashboarding and analysis
Use cases and AWS services
• Internet of Things (predictive maintenance, connected vehicles/telematics, smart cities): AWS IoT, AWS Greengrass, Apache Kudu on AWS, Apache Spark on AWS
• Real-time analytics (e.g., social network data): Amazon Kinesis Firehose, Kinesis Streams, Kinesis Analytics, Apache Kafka on AWS
• Recommendation engines: AWS Lambda, Amazon EMR, Apache Spark on AWS, Amazon ML

Benefits
• Scalability: Data scientists have more access to scalable compute. Auto-scalable AWS EC2 instances make it easy to scale systems up or down by changing the configuration for memory, number of vCPUs, and bandwidth
• Lower Cost: Reduce the cost of a data science cluster significantly by using AWS EMR on demand: set it to grab the code and data from S3, run the task on the cluster, store the results in S3/Redshift, and terminate the cluster. Different instance types suit different use cases; for example, compute-optimized instances have relatively lower cost for CPU usage, while memory-optimized instances have lower cost for memory usage. With AWS Lambda, run code without servers and pay only for the compute time consumed
• Diversity: Supports all major machine learning frameworks, including TensorFlow, Caffe2, and Apache MXNet. AWS offers a broad array of compute options for training and inference, with powerful GPU-based instances, compute- and memory-optimized instances, and even FPGAs
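The transient-cluster cost pattern described above (grab code and data from S3, run, write results, terminate) can be expressed as an EMR `run_job_flow` request that auto-terminates when its only step finishes. Cluster name, release label, instance types, and S3 paths below are illustrative assumptions.

```python
# Hedged sketch of a self-terminating EMR cluster for a single Spark job.
# All identifiers and paths are hypothetical.
def transient_emr_request(script_s3_path, output_s3_path, workers=4):
    """Build kwargs for emr.run_job_flow(): one Spark step, then shut down."""
    return {
        "Name": "transient-model-scoring",
        "ReleaseLabel": "emr-5.27.0",
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge",
                 "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": "m5.xlarge",
                 "InstanceCount": workers},
            ],
            # Terminate the cluster as soon as the step queue is empty:
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": "score-models",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", script_s3_path,
                         "--output", output_s3_path],
            },
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }

request = transient_emr_request(
    "s3://example-bucket/jobs/score.py", "s3://example-bucket/results/")
# To launch (requires AWS credentials):
# import boto3
# cluster_id = boto3.client("emr").run_job_flow(**request)["JobFlowId"]
```

Because the cluster exists only for the duration of the step, you pay for compute only while the model actually runs, which is the "Lower Cost" benefit above.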
• Our innovative approach goes beyond a like-for-like migration and helps our clients modernize their data platforms, and how they leverage data and extract new insights.
• Our methods and reference architectures, coupled with Deloitte proprietary accelerators, help expedite delivery of the new platform.
• Our enhanced offerings help our clients better manage operational maintenance and quickly react to changing business requirements.

Offerings: Immersive Labs | Assessment Tools, Cloud Value Calculators, Business Case | Reference Architecture | Agile Delivery Methods | Managed Services | Sustainment Methods

1. Maintain: We can acquire, then maintain and lease, your current data assets during the course of your platform modernization
2. Sunset: Decommission legacy assets
Related Deloitte publications:
• 2017 Cognitive Survey: Bullish on the business value of Cognitive
• Navigating the future of Work
• Technology, media, and telecom get smarter
• Minds and machines: The art of forecasting in the age of artificial intelligence
• Opting in: Using IoT connectivity to drive differentiation
• Tech Trends 2018: The Symphonic Enterprise
• The rise of cognitive work (re)design
• Robotic process automation: A path to the cognitive enterprise
• Mission analytics: Data-driven decision making in government
• From security monitoring to cyber risk monitoring: Enabling business-aligned cybersecurity
Cloud Security
• By 2021, cloud data centers will process 94% of workloads, and the cloud computing market is expected to reach $623 billion by 2023
• Growth in cloud-based security will remain strong, at about 19% through 2020
• Public cloud spending is predicted to grow quickly, attaining 18% year-over-year growth in 2019
• Public cloud IaaS workloads will experience 60% fewer security incidents than traditional data centers by 2020
• The revenue from the global public cloud computing market is set to reach $258 billion in 2019
• In total, Gartner estimates the cloud-based security services market will reach close to $9 billion by 2020
Enterprises are adopting cloud technologies at an increasing pace. Security and availability are the top
concerns in cloud adoption.
Cloud adoption is growing rapidly. The top concerns for cloud computing among clients are Security, Privacy, Support, Compliance, and IT Governance.

Although cloud computing is maturing, security and risk remain a top concern due to growing complexity and compliance obligations.
Vendor Management
• Lack of vendor monitoring
• Failure to plan for cloud portability and interoperability
• Unclear security requirements in contracts
• Lack of comprehensive contractual agreements on cloud provider and subscriber roles and responsibilities
• Unclear roles during incidents and investigations
• Unclear legal liability and insurance coverage
• Inability to independently test security

Infrastructure Security
• Security vulnerabilities introduced by cloud content and ecosystem partners
• Compromise of the cloud environment due to poor security practices by the customer
• Compromise of cloud management interfaces due to targeted attacks
• Lack of defense against attacks originating from within the cloud environment
• Inadequate facilities to capture and store application logs
• Inadequate cloud security controls or an uncertified environment
Key questions:
• “Are controls in place to guard against known and emerging threats?”
• “Can we detect malicious or unauthorized activity, including the unknown?”
• “Are we prepared to respond quickly to minimize impact?”

Capability areas:
• Identity: ID & context in the cloud, privileged access management in the cloud, cloud access auditing, context-aware identity
• Data: Data protection and privacy in the cloud, encrypted data and key management, Data Loss Prevention in the cloud, content-aware DRM
• Governance: Govern risk and compliance with cloud providers, cloud usage discovery and vendor management, cloud provider governance, automated enforcement and executive reporting
• Resilience: Resilience in the cloud, BCR for cloud services, orchestrated response, attack simulation
• Infrastructure: Infrastructure and platform security in the cloud, cloud and mobile asset management, cloud and mobile configuration compliance, software-defined perimeter
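One concrete, automatable control from the "encrypted data and key management" capability above is default server-side encryption on an S3 data-lake bucket with a KMS key. The sketch below builds the configuration for the real `put_bucket_encryption` API; the bucket name and key alias are placeholders.

```python
# Hedged sketch: enforce SSE-KMS as the default encryption for a bucket.
# "alias/data-lake-key" and "example-data-lake" are hypothetical names.
def bucket_encryption_config(kms_key_id):
    """Configuration payload for s3.put_bucket_encryption (SSE-KMS default)."""
    return {
        "Rules": [{
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "aws:kms",
                "KMSMasterKeyID": kms_key_id,
            }
        }]
    }

config = bucket_encryption_config("alias/data-lake-key")
# To apply (requires AWS credentials):
# import boto3
# boto3.client("s3").put_bucket_encryption(
#     Bucket="example-data-lake",
#     ServerSideEncryptionConfiguration=config)
```

Applying this through infrastructure-as-code rather than ad hoc is what turns the control into the "automated enforcement and executive reporting" item in the governance row above.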
Reference architecture layers: DATA SOURCES → INGEST → DATA STORAGE AND PROCESSING → INFORMATION DELIVERY → INFORMATION CONSUMERS

• Data Sources: Business applications (customer & distribution; finance and accounting; marketing, sales & distribution; HR & finance), ERP, other RDBMS, external data, IoT data, unstructured data, geospatial data, live streams
• Ingest (Data Movement): Batch ETL & ELT (AWS Glue; Informatica, Talend); AWS Direct Connect; file ingestion (batch, intra-day batch, mini-batch); AWS Storage Gateway; messaging (AWS Kinesis Firehose); AWS Database Migration Service; NiFi; stream ingestion (AWS Kinesis; Confluent Kafka); AWS CLI S3 sync; AWS S3 Transfer Acceleration
• Data Storage and Processing: Landing zone and production data lake on AWS S3, with a Raw layer, a DQ-applied Processed layer (cleansed data from transactional applications, structured data by domain), and a Consumption layer; a Research data lake for processing, analysis, and visualization; batch/micro-batch processing (AWS EMR, Databricks); in-memory processing (Presto on EMR); streaming analytics and stream processing (AWS EMR, Kinesis); IoT processing (AWS IoT Analytics); interactive querying (AWS Athena) for real-time dashboards and transactional applications; real-time search (AWS Elasticsearch)
• Information Delivery: Data storage and provisioning platform: analytics marts (AWS Redshift), atomic data marts/ODS (AWS RDS), master data management, external data; Business Intelligence (QuickSight; AtScale, Cognos, MicroStrategy); advanced analytics/ML (SageMaker, AWS ML, Rekognition) for text analytics, predictive modeling, and data mashup over analytical datasets; analytics and visualization (QuickSight; Tableau, Qlik); portals (Kibana); enterprise search; model repository
• Information Consumers: Functional users; business consumers/analytics; corporate business functions and HR; data scientist community; external consumers; mobile; enterprise applications; information access channels
• Developer and Management Tools: AWS Identity & Access Management, AWS Key Management Service, AWS CloudTrail, AWS CloudWatch (Datadog), AWS Management Console, AWS Directory Service, AWS CloudFormation (Ansible), code repository (Git, Bitbucket), AWS CodeDeploy (Jenkins/CircleCI)
Introducing Deloitte’s Augmented Data Lake Framework (ADLF)
Deloitte’s ADLF is our foundational architecture framework for achieving the target state of the Data and Analytics modernization roadmap
The ADLF spans the end-to-end pipeline from Sources through the Data Lake & Data Warehouse to Consumers:

• Sources: Internal and external systems, databases, unstructured data, events/IoT
• Data Ingestion & Acquisition: CDC, batch ingestion framework, events/IoT adapters, multi-tenancy management
• Data Storage & Management: Raw layer, curated layers, exploration layer, consumption layer
• Data Processing (Compute): Data exploration, preparation and feature engineering, machine learning / artificial intelligence, spatial processing, streaming analytics, processing pipeline frameworks, orchestration & scheduling
• Data Provisioning & Orchestration: Provisioning (pub/sub), analytics, BI/OLAP, dashboards, applications
• Governance & Metadata: Data masking, lineage, tagging, technical & business metadata
• Audit, Balance and Control: Job stats, batch stats, error handling, reporting, audit layer
• Data Quality: Data integrity rules, data cleansing, de-duplication, user validation
• Platform & Infrastructure: Network, storage, DevOps, monitoring/operations, batch, API and visualization
• Design principles: Handle demand volatility, improve time to market, standardization, fail early
Assets Library (1 of 2)
Reusable accelerators, frameworks, and utilities developed by Deloitte that can be used during different project phases of any Data and Analytics modernization
• Parses ETL source code and extracts metadata dependencies; expedites design and build of the new solution and orchestration by 30%. Used for data migrations onto multiple analytical platforms or the cloud.
• Disguises data on a distributed platform, creating a structurally similar functional substitute of sensitive data that can be used for sandboxes, user training, etc., while adhering to compliance requirements.
• A framework which converts existing SQL scripts into an equivalent Spark/Scala code base, helping reduce development effort and cost when transitioning from a traditional RDBMS to a Data Lake environment; reduces up to 95% of effort during the implementation phase.
• Automated ingestion of data from on-premise to the cloud platform using Talend; accelerates data ingestion into a data lake on Big Data implementations.
• A collection of reusable metadata-driven processors to assemble data ingestion pipelines for both structured and semi-structured files; performs configurable file validations and routes data to the correct destination (90% processing time gains, 70% effort savings).
• A schema builder which automates the conversion of S3 data to Athena/Hive/HBase table objects, expediting creation of the data consumption layer in platform modernization projects; metadata-driven, generates objects in batch mode, with automated target-specific partitioning.
• A tool for migrating various source systems, ingestion patterns, and data formats onto the Enterprise Analytics Platform through metadata-driven configuration; schema-less and configurable, flattens multi-level JSON/XML hierarchies, and supports real-time and batch data ingestion.
• Enables validation and comparison of Hive data models across different Hadoop environments; reduced reconciliation effort from 2 days to a couple of hours for over 3,000 tables.
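The schema-builder accelerator above is proprietary, but the core idea, deriving Athena/Hive table objects from data already in S3, can be illustrated with a toy sketch: infer column types from a sample JSON record and emit the corresponding DDL. Table name, sample record, and S3 location are hypothetical.

```python
# Toy sketch only, not the Deloitte accelerator: map Python sample values to
# Athena column types and generate CREATE EXTERNAL TABLE DDL for JSON data.
def infer_athena_type(value):
    """Map a sample value to an Athena column type (bool before int, since
    bool is a subclass of int in Python)."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "string"

def athena_ddl_from_sample(table, sample, s3_location):
    """Emit DDL for an external JSON table whose schema matches `sample`."""
    cols = ",\n  ".join(f"{k} {infer_athena_type(v)}" for k, v in sample.items())
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'\n"
        f"LOCATION '{s3_location}';"
    )

sample = {"order_id": 1001, "amount": 25.5, "priority": True, "region": "west"}
print(athena_ddl_from_sample("lake.orders", sample, "s3://example-lake/raw/orders/"))
```

A production version would read many sample objects, reconcile conflicting types, and handle nested structures, which is exactly the batch-mode, metadata-driven behavior the accelerator description claims.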
Assets Library (2 of 2)
Reusable accelerators, frameworks, and utilities developed by Deloitte that can be used during different project phases of any Data and Analytics modernization
• An integrated framework to capture job execution information, with job statistics, rejected and error records, standard S3 folder management, and CloudWatch integration; saves the manual effort of job monitoring and error tracking through a quickly deployable template, and enables restorability from the point of failure.
• An advanced data management solution built on the AWS platform using analytics, semantic models, and machine learning techniques to accelerate data management and stewardship activities; expedites understanding of data from a new system during an acquisition, and its supervised learning capabilities can expedite cleansing and standardization activities.
• Automation of building and configuring the AWS platform for enabling the ADLF architecture, using preconfigured frameworks and scripts for delivery customization; reduced platform automation effort by 80% and time by 70%.
• An operating model and services framework designed for the modernized platform: a revamped operating model for support that simplifies remapping of the existing operations team.
• Agile Data DevOps Model: A Data DevOps model for post-migration to the AWS Data Platform, including best practices and templates for CI/CD pipelines; modernizes the operations and development teams to enhance their skills in a continuous-deployment model with a DevOps culture.
• CogX: A solution that ingests and extracts content from semi-structured and unstructured sources to automate processes and improve operational efficiency; accelerates content extraction using OCR, machine learning, NLP, workflows, and intelligent review, with pipelines of out-of-the-box and custom models and services orchestrated to build solutions.
• ACE Framework: An AWS service capacity and cost estimator for enabling the ADLF architecture; helps arrive at approximate monthly costs over a period of time, ensures all service components are considered for cost planning, and acts as input to the AWS Ecosystem Automation framework.
let’s create
Thank you
Deloitte refers to one or more of Deloitte Touche Tohmatsu Limited, a UK private company limited by guarantee
(“DTTL”), its network of member firms, and their related entities. DTTL and each of its member firms are legally
separate and independent entities. DTTL (also referred to as “Deloitte Global”) does not provide services to clients.
Please see www.deloitte.com/about for a more detailed description of DTTL and its member firms.
The information contained in this material is meant for internal purposes and use only among personnel of Deloitte
Touche Tohmatsu Limited, its member firms, and their related entities (collectively, the “Deloitte Network”). The
recipient is strictly prohibited from further circulation of this material. Any breach of this requirement may invite
disciplinary action (which may include dismissal) and/or prosecution. None of the Deloitte Network shall be responsible
for any loss whatsoever sustained by any person who relies on this material.
©2019 Deloitte Touche Tohmatsu India LLP. Member of Deloitte Touche Tohmatsu Limited