November 2019
Digital Transformation Imperative

What drives the journey towards Digital Transformation? Drivers span the enterprise: Customers, Marketing, Finance Data, and R&D, enabled by cloud-native solutions with on-demand storage and compute.
Goals
• Centralize Data Platform: Utilize the movement to Cloud to centralize data assets for the business to leverage
• Enable Digitization: Transform the Enterprise by building new business models around data and analytics capabilities

Benefits
• Enhanced Security: Through continuous testing and cloud-enabled protocols
• End User Enablement: Improved self-service and a platform for innovation
Challenges of the Traditional Data Warehouse Modernization Approach
• Total TCO savings of 25%
• CPU utilization above 70%, indicating resource optimization
• Solution has an overall SLA of 99.95%
• 100% of instances have automated monitoring
• Infrastructure savings of ~50%
• Unused storage footprint reduced to 5%
• DR failover time reduced to 2 hours
• Data retention policy increased to 7 years
• Demand-driven Big Data Analytics workloads: over-utilized capacity during peak hours and under-utilized at other times
• New Cloud Native Analytics Platform, and migrating Operational and Managed Reporting with strict SLAs to Cloud
• Exploring Advanced Analytics, Machine Learning, and Data Science models that need scalable compute services
• MPP (Netezza, Teradata) Offload: running out of capacity on existing MPP appliances that are cost-prohibitive to scale, and the desire to optimize cost without compromising performance
• Enhancing Business Intelligence and Visualization capability with interactive dashboards using both structured and unstructured datasets
©2019 Deloitte Touche Tohmatsu India LLP DA Modernization Deloitte 8
Deloitte’s Data Modernization Reference Architecture
We will leverage this reference architecture to achieve the vision of a NextGen Data Lake
Our Perspective: Deployment Strategy for Data & Analytics Modernization
Deployment choices should be driven by aligning the organization's current investment landscape to the future strategic direction for analytics.
Challenges:
• Usage is variable and there is a lack of elastic compute
• Infrastructure is over-utilized during peak hours and under-utilized at other times
• Cost-prohibitive to size the infrastructure for peak capacity
• Lack of ability to scale to meet the needs of exponential growth in data

Solution:
• Leverage scalable infrastructure, using the AWS Auto Scaling feature to monitor compute needs and automatically adjust capacity to maintain steady, predictable performance at the lowest possible cost
• Ability to scale an EMR cluster from zero to thousands of nodes within a few minutes, and then scale down when processing needs are met, which helps optimize cost and reduces TCO

AWS architecture that can be considered in such scenarios:
Sales order, inventory, and trends data from multiple online and physical locations → Batch inserts/updates uploaded to S3 → Auto-scaling AWS EMR cluster → Cleansed data loaded into S3 → Processed data loaded into Redshift → Reporting and BI for analytics
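The auto-scaling EMR cluster in the flow above can be driven by an instance-group scaling policy keyed to YARN memory pressure. The sketch below builds one such policy as a plain dictionary; the thresholds, adjustment sizes, and the cluster/instance-group IDs in the commented `boto3` call are illustrative assumptions, not values from this document.

```python
# Hedged sketch: an EMR instance-group auto-scaling policy that adds task
# nodes when available YARN memory is low and removes them when it is high.
import json

def task_group_autoscaling_policy(min_nodes=2, max_nodes=20):
    """Build an EMR AutoScalingPolicy on YARNMemoryAvailablePercentage."""
    return {
        "Constraints": {"MinCapacity": min_nodes, "MaxCapacity": max_nodes},
        "Rules": [
            {
                "Name": "ScaleOutOnLowMemory",
                "Action": {"SimpleScalingPolicyConfiguration": {
                    "AdjustmentType": "CHANGE_IN_CAPACITY",
                    "ScalingAdjustment": 2,   # add 2 nodes per trigger
                    "CoolDown": 300,
                }},
                "Trigger": {"CloudWatchAlarmDefinition": {
                    "ComparisonOperator": "LESS_THAN",
                    "MetricName": "YARNMemoryAvailablePercentage",
                    "Period": 300,
                    "Threshold": 15.0,        # scale out below 15% free memory
                }},
            },
            {
                "Name": "ScaleInOnIdle",
                "Action": {"SimpleScalingPolicyConfiguration": {
                    "AdjustmentType": "CHANGE_IN_CAPACITY",
                    "ScalingAdjustment": -2,  # remove 2 nodes per trigger
                    "CoolDown": 300,
                }},
                "Trigger": {"CloudWatchAlarmDefinition": {
                    "ComparisonOperator": "GREATER_THAN",
                    "MetricName": "YARNMemoryAvailablePercentage",
                    "Period": 300,
                    "Threshold": 75.0,        # scale in above 75% free memory
                }},
            },
        ],
    }

if __name__ == "__main__":
    policy = task_group_autoscaling_policy()
    # Attaching requires AWS credentials and a running cluster:
    # import boto3
    # boto3.client("emr").put_auto_scaling_policy(
    #     ClusterId="j-XXXXXXXX", InstanceGroupId="ig-XXXXXXXX",
    #     AutoScalingPolicy=policy)
    print(json.dumps(policy, indent=2))
```

Tuning the two thresholds against actual workload peaks is what delivers the "steady, predictable performance at the lowest possible cost" described above.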
Ingestion
• Batch ingestion can be done by tools like Talend, Informatica BDM, or AWS Glue
• Stream ingestion can be done using Apache Kafka or Kinesis Streams

Consumption / Analytics DB
• For a conventional database, use AWS RDS
• To run complex analytic queries against petabytes of structured data, use Redshift
• For dynamic querying directly on top of S3, use Redshift Spectrum or Athena

Custom Streaming Application
• A streaming application can use Amazon Kinesis Data Analytics, the Amazon Kinesis API, Elasticsearch, or the Amazon Kinesis Client Library (KCL)
• For heavy lifting, use KCL
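For the "dynamic querying directly on top of S3" option above, a minimal sketch of submitting an Athena query follows. The database, table, and bucket names are hypothetical; the guarded `boto3` call at the end is the real `start_query_execution` API.

```python
# Minimal sketch, assuming a hypothetical "sales_lake" Athena database whose
# "orders" table points at S3 data, and an "example-analytics-bucket" for results.
def build_athena_request(database, query, output_bucket):
    """Assemble the keyword arguments for athena.start_query_execution()."""
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {
            "OutputLocation": f"s3://{output_bucket}/athena-results/",
        },
    }

request = build_athena_request(
    database="sales_lake",
    query="SELECT store_id, SUM(amount) AS revenue FROM orders GROUP BY store_id",
    output_bucket="example-analytics-bucket",
)
# To execute (requires AWS credentials):
# import boto3
# qid = boto3.client("athena").start_query_execution(**request)["QueryExecutionId"]
```

Because Athena is serverless, this pattern incurs cost only per query scanned, which fits the variable-usage scenario described above.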
Use Case – New Cloud Analytics Platform (Cloud Native Data Warehouse)

Current Challenges
• In today's digital world, every organization's data is doubling every 18 months. In an on-premise setup, you will need to scale up the infrastructure every 12 months to keep up with demand, and on-premise hardware depreciates every 3 years
• This on-premise infrastructure is over-utilized during peak hours and under-utilized at other times
• Current DWs are not built for different data types: structured, semi-structured, and unstructured
• Traditional DWs are robust, but it can take months and cost millions of dollars just to get started

Solution and Benefits
• AWS provides fast, fully managed data warehouses that are simple and cost-effective to stand up
• Increased agility and reduced costs, in terms of both time and money, promote low-risk experimentation and analytics
• A pay-for-what-you-use model, using elasticity to scale your data architecture up or down and increase or decrease performance as required

AWS architecture that can be considered in such scenarios:
Ingestion
• Batch ingestion can be done by tools like Talend, Informatica BDM, or AWS Glue
• Stream ingestion can be done using Apache Kafka or Kinesis Streams

Consumption / Analytics DB
• A combination of AWS RDS data marts or Redshift can be used, depending on the BI/Analytics workloads
• For complex analytic queries against petabytes of structured data, use Redshift
• For dynamic querying directly on top of S3, use Redshift Spectrum or Athena
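The Redshift Spectrum route above works by registering S3 data as an external table that the warehouse can join against local tables. A sketch of generating that DDL is below; the schema, table, columns, and S3 path are illustrative assumptions.

```python
# Hedged sketch: build CREATE EXTERNAL TABLE DDL for Redshift Spectrum over
# Parquet files in S3. Names and paths are hypothetical examples.
def external_table_ddl(schema, table, columns, s3_path):
    """Generate Spectrum DDL from a list of (column_name, redshift_type) pairs."""
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n  {cols}\n)\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{s3_path}';"
    )

ddl = external_table_ddl(
    "spectrum", "orders",
    [("order_id", "bigint"), ("store_id", "int"), ("amount", "decimal(12,2)")],
    "s3://example-lake/processed/orders/",
)
print(ddl)
```

The external schema itself must first be created against the AWS Glue or Hive catalog; once it exists, queries can mix Spectrum tables with native Redshift tables transparently.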
Current Challenges
• Lack of scalable compute for processing large datasets for different data science and ML models
• Exponential growth of data for any organization, and the need for data scientists to access raw, processed, and curated data
• Cost of maintaining a large cluster sized for expected demand is prohibitive

Solution and Benefits
• DevOps automation using tools like Jupyter, Zeppelin, or RStudio, as well as elastic scalability to support burst demands
• Access to all data in S3 (structured, unstructured, semi-structured), with data science tools on the edge connected to the S3 data lake

Data Science Workbench
• Choice of JupyterHub, Zeppelin, or RStudio on the edge node connected to S3 or Redshift for Advanced Analytics or ML models

Model Source
• Access to raw, processed, and curated data from S3, as well as AWS Redshift for aggregated structures
• Model output is fed back to S3, along with AWS RDS, for dashboarding and analysis
Use cases and AWS services
• Internet of Things (predictive maintenance, connected vehicles/telematics, smart cities): AWS IoT, AWS Greengrass, Apache Kudu on AWS, Apache Spark on AWS
• Real-time analytics (e.g., social network data): Amazon Kinesis Firehose, Kinesis Streams, Kinesis Analytics, Apache Kafka on AWS
• Recommendation engines: AWS Lambda, Amazon EMR, Apache Spark on AWS, Amazon ML

Benefits
• Scalability: Data scientists have more access to scalable compute. Auto-scalable AWS EC2 instances make it easy to scale systems up or down by changing the configuration for memory, number of vCPUs, and bandwidth
• Lower Cost: Reduce the cost of a data science cluster significantly by using AWS EMR on demand: set it to grab the code and data from S3, run the task on the cluster, store the results in S3/Redshift, and terminate the cluster. Different instance types suit different use cases; for example, compute-optimized instances have relatively lower cost for CPU usage, while memory-optimized instances have lower cost for memory usage. With AWS Lambda, run code without servers and pay only for the compute time consumed
• Diversity: Supports all major machine learning frameworks, including TensorFlow, Caffe2, and Apache MXNet. AWS offers a broad array of compute options for training and inference, with powerful GPU-based instances, compute- and memory-optimized instances, and even FPGAs
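The transient-cluster cost pattern described above (grab code and data from S3, run, write results, terminate) can be expressed as an EMR `run_job_flow` request that auto-terminates when its only step finishes. Cluster name, release label, instance types, and S3 paths below are illustrative assumptions.

```python
# Hedged sketch of a self-terminating EMR cluster for a single Spark job.
# All identifiers and paths are hypothetical.
def transient_emr_request(script_s3_path, output_s3_path, workers=4):
    """Build kwargs for emr.run_job_flow(): one Spark step, then shut down."""
    return {
        "Name": "transient-model-scoring",
        "ReleaseLabel": "emr-5.27.0",
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge",
                 "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": "m5.xlarge",
                 "InstanceCount": workers},
            ],
            # Terminate the cluster as soon as the step queue is empty:
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": "score-models",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", script_s3_path,
                         "--output", output_s3_path],
            },
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }

request = transient_emr_request(
    "s3://example-bucket/jobs/score.py", "s3://example-bucket/results/")
# To launch (requires AWS credentials):
# import boto3
# cluster_id = boto3.client("emr").run_job_flow(**request)["JobFlowId"]
```

Because the cluster exists only for the duration of the step, you pay for compute only while the model actually runs, which is the "Lower Cost" benefit above.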
• Our innovative approach goes beyond a like-for-like migration and helps our clients modernize their data platforms, and how they leverage data and extract new insights.
• Our methods and reference architectures, coupled with Deloitte proprietary accelerators, help expedite delivery of the new platform.
• Our enhanced offerings help our clients better manage operational maintenance and quickly react to changing business requirements.

Offerings: Immersive Labs | Assessment Tools, Cloud Value Calculators, Business Case | Reference Architecture | Agile Delivery Methods | Managed Services | Sustainment Methods

1. Maintain: We can acquire, then maintain and lease, your current data assets during the course of your platform modernization
2. Sunset: Decommission legacy assets
Related Deloitte publications:
• 2017 Cognitive Survey: Bullish on the business value of Cognitive
• Navigating the future of Work
• Technology, media, and telecom get smarter
• Minds and machines: The art of forecasting in the age of artificial intelligence
• Opting in: Using IoT connectivity to drive differentiation
• Tech Trends 2018: The Symphonic Enterprise
• The rise of cognitive work (re)design
• Robotic process automation: A path to the cognitive enterprise
• Mission analytics: Data-driven decision making in government
• From security monitoring to cyber risk monitoring: Enabling business-aligned cybersecurity
Cloud Security
• By 2021, cloud data centers will process 94% of workloads, and the cloud computing market is expected to reach $623 billion by 2023
• Growth in cloud-based security will remain strong, at about 19% through 2020
• Public cloud spending is predicted to grow quickly, attaining 18% year-over-year growth in 2019
• Public cloud IaaS workloads will experience 60% fewer security incidents than traditional data centers by 2020
• The revenue from the global public cloud computing market is set to reach $258 billion in 2019
• In total, Gartner estimates the cloud-based security services market will reach close to $9 billion by 2020
Enterprises are adopting cloud technologies at an increasing pace. Security and availability are the top
concerns in cloud adoption.
Cloud adoption is growing rapidly. The top concerns for cloud computing among clients are Security, Privacy, Support, Compliance, and IT Governance.

Although cloud computing is maturing, security and risk remain a top concern due to growing complexity and compliance obligations.
Vendor Management
• Lack of vendor monitoring
• Failure to plan for cloud portability and interoperability
• Unclear security requirements in contracts
• Lack of comprehensive contractual agreements on cloud provider and subscriber roles and responsibilities
• Unclear roles during incidents and investigations
• Unclear legal liability and insurance coverage
• Inability to independently test security

Infrastructure Security
• Security vulnerabilities introduced by cloud content and ecosystem partners
• Compromise of the cloud environment due to poor security practices by the customer
• Compromise of cloud management interfaces due to targeted attacks
• Lack of defense against attacks originating from within the cloud environment
• Inadequate facilities to capture and store application logs
• Inadequate cloud security controls or an uncertified environment
Key questions:
• “Are controls in place to guard against known and emerging threats?”
• “Can we detect malicious or unauthorized activity, including the unknown?”
• “Are we prepared to respond quickly to minimize impact?”

Capability areas:
• Identity: ID & context in the cloud, privileged access management in the cloud, cloud access auditing, context-aware identity
• Data: Data protection and privacy in the cloud, encrypted data and key management, Data Loss Prevention in the cloud, content-aware DRM
• Governance: Govern risk and compliance with cloud providers, cloud usage discovery and vendor management, cloud provider governance, automated enforcement and executive reporting
• Resilience: Resilience in the cloud, BCR for cloud services, orchestrated response, attack simulation
• Infrastructure: Infrastructure and platform security in the cloud, cloud and mobile asset management, cloud and mobile configuration compliance, software-defined perimeter
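One concrete, automatable control from the "encrypted data and key management" capability above is default server-side encryption on an S3 data-lake bucket with a KMS key. The sketch below builds the configuration for the real `put_bucket_encryption` API; the bucket name and key alias are placeholders.

```python
# Hedged sketch: enforce SSE-KMS as the default encryption for a bucket.
# "alias/data-lake-key" and "example-data-lake" are hypothetical names.
def bucket_encryption_config(kms_key_id):
    """Configuration payload for s3.put_bucket_encryption (SSE-KMS default)."""
    return {
        "Rules": [{
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "aws:kms",
                "KMSMasterKeyID": kms_key_id,
            }
        }]
    }

config = bucket_encryption_config("alias/data-lake-key")
# To apply (requires AWS credentials):
# import boto3
# boto3.client("s3").put_bucket_encryption(
#     Bucket="example-data-lake",
#     ServerSideEncryptionConfiguration=config)
```

Applying this through infrastructure-as-code rather than ad hoc is what turns the control into the "automated enforcement and executive reporting" item in the governance row above.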
Reference architecture layers: DATA SOURCES → INGEST → DATA STORAGE AND PROCESSING → INFORMATION DELIVERY → INFORMATION CONSUMERS

• Data Sources: Business applications (customer & distribution; finance and accounting; marketing, sales & distribution; HR & finance), ERP, other RDBMS, external data, IoT data, unstructured data, geospatial data, live streams
• Ingest (Data Movement): Batch ETL & ELT (AWS Glue; Informatica, Talend); AWS Direct Connect; file ingestion (batch, intra-day batch, mini-batch); AWS Storage Gateway; messaging (AWS Kinesis Firehose); AWS Database Migration Service; NiFi; stream ingestion (AWS Kinesis; Confluent Kafka); AWS CLI S3 sync; AWS S3 Transfer Acceleration
• Data Storage and Processing: Landing zone and production data lake on AWS S3, with a Raw layer, a DQ-applied Processed layer (cleansed data from transactional applications, structured data by domain), and a Consumption layer; a Research data lake for processing, analysis, and visualization; batch/micro-batch processing (AWS EMR, Databricks); in-memory processing (Presto on EMR); streaming analytics and stream processing (AWS EMR, Kinesis); IoT processing (AWS IoT Analytics); interactive querying (AWS Athena) for real-time dashboards and transactional applications; real-time search (AWS Elasticsearch)
• Information Delivery: Data storage and provisioning platform: analytics marts (AWS Redshift), atomic data marts/ODS (AWS RDS), master data management, external data; Business Intelligence (QuickSight; AtScale, Cognos, MicroStrategy); advanced analytics/ML (SageMaker, AWS ML, Rekognition) for text analytics, predictive modeling, and data mashup over analytical datasets; analytics and visualization (QuickSight; Tableau, Qlik); portals (Kibana); enterprise search; model repository
• Information Consumers: Functional users; business consumers/analytics; corporate business functions and HR; data scientist community; external consumers; mobile; enterprise applications; information access channels
• Developer and Management Tools: AWS Identity & Access Management, AWS Key Management Service, AWS CloudTrail, AWS CloudWatch (Datadog), AWS Management Console, AWS Directory Service, AWS CloudFormation (Ansible), code repository (Git, Bitbucket), AWS CodeDeploy (Jenkins/CircleCI)
Introducing Deloitte’s Augmented Data Lake Framework (ADLF)
Deloitte’s ADLF is our foundational architecture framework for achieving the target state of the Data and Analytics modernization roadmap
The ADLF spans the end-to-end pipeline from Sources through the Data Lake & Data Warehouse to Consumers:

• Sources: Internal and external systems, databases, unstructured data, events/IoT
• Data Ingestion & Acquisition: CDC, batch ingestion framework, events/IoT adapters, multi-tenancy management
• Data Storage & Management: Raw layer, curated layers, exploration layer, consumption layer
• Data Processing (Compute): Data exploration, preparation and feature engineering, machine learning / artificial intelligence, spatial processing, streaming analytics, processing pipeline frameworks, orchestration & scheduling
• Data Provisioning & Orchestration: Provisioning (pub/sub), analytics, BI/OLAP, dashboards, applications
• Governance & Metadata: Data masking, lineage, tagging, technical & business metadata
• Audit, Balance and Control: Job stats, batch stats, error handling, reporting, audit layer
• Data Quality: Data integrity rules, data cleansing, de-duplication, user validation
• Platform & Infrastructure: Network, storage, DevOps, monitoring/operations, batch, API and visualization
• Design principles: Handle demand volatility, improve time to market, standardization, fail early
Assets Library (1 of 2)
Reusable accelerators, frameworks, and utilities developed by Deloitte that can be used during different project phases of any Data and Analytics modernization
• Parses ETL source code and extracts metadata dependencies; expedites design and build of the new solution and orchestration by 30%. Used for data migrations onto multiple analytical platforms or the cloud.
• Disguises data on a distributed platform, creating a structurally similar functional substitute of sensitive data that can be used for sandboxes, user training, etc., while adhering to compliance requirements.
• A framework which converts existing SQL scripts into an equivalent Spark/Scala code base, helping reduce development effort and cost when transitioning from a traditional RDBMS to a Data Lake environment; reduces up to 95% of effort during the implementation phase.
• Automated ingestion of data from on-premise to the cloud platform using Talend; accelerates data ingestion into a data lake on Big Data implementations.
• A collection of reusable metadata-driven processors to assemble data ingestion pipelines for both structured and semi-structured files; performs configurable file validations and routes data to the correct destination (90% processing time gains, 70% effort savings).
• A schema builder which automates the conversion of S3 data to Athena/Hive/HBase table objects, expediting creation of the data consumption layer in platform modernization projects; metadata-driven, generates objects in batch mode, with automated target-specific partitioning.
• A tool for migrating various source systems, ingestion patterns, and data formats onto the Enterprise Analytics Platform through metadata-driven configuration; schema-less and configurable, flattens multi-level JSON/XML hierarchies, and supports real-time and batch data ingestion.
• Enables validation and comparison of Hive data models across different Hadoop environments; reduced reconciliation effort from 2 days to a couple of hours for over 3,000 tables.
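The schema-builder accelerator above is proprietary, but the core idea, deriving Athena/Hive table objects from data already in S3, can be illustrated with a toy sketch: infer column types from a sample JSON record and emit the corresponding DDL. Table name, sample record, and S3 location are hypothetical.

```python
# Toy sketch only, not the Deloitte accelerator: map Python sample values to
# Athena column types and generate CREATE EXTERNAL TABLE DDL for JSON data.
def infer_athena_type(value):
    """Map a sample value to an Athena column type (bool before int, since
    bool is a subclass of int in Python)."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "string"

def athena_ddl_from_sample(table, sample, s3_location):
    """Emit DDL for an external JSON table whose schema matches `sample`."""
    cols = ",\n  ".join(f"{k} {infer_athena_type(v)}" for k, v in sample.items())
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'\n"
        f"LOCATION '{s3_location}';"
    )

sample = {"order_id": 1001, "amount": 25.5, "priority": True, "region": "west"}
print(athena_ddl_from_sample("lake.orders", sample, "s3://example-lake/raw/orders/"))
```

A production version would read many sample objects, reconcile conflicting types, and handle nested structures, which is exactly the batch-mode, metadata-driven behavior the accelerator description claims.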
Assets Library (2 of 2)
Reusable accelerators, frameworks, and utilities developed by Deloitte that can be used during different project phases of any Data and Analytics modernization
• An integrated framework to capture job execution information, with job statistics, rejected and error records, standard S3 folder management, and CloudWatch integration; saves the manual effort of job monitoring and error tracking through a quickly deployable template, and enables restorability from the point of failure.
• An advanced data management solution built on the AWS platform using analytics, semantic models, and machine learning techniques to accelerate data management and stewardship activities; expedites understanding of data from a new system during an acquisition, and its supervised learning capabilities can expedite cleansing and standardization activities.
• Automation of building and configuring the AWS platform for enabling the ADLF architecture, using preconfigured frameworks and scripts for delivery customization; reduced platform automation effort by 80% and time by 70%.
• An operating model and services framework designed for the modernized platform: a revamped operating model for support that simplifies remapping of the existing operations team.
• Agile Data DevOps Model: A Data DevOps model for post-migration to the AWS Data Platform, including best practices and templates for CI/CD pipelines; modernizes the operations and development teams to enhance their skills in a continuous-deployment model with a DevOps culture.
• CogX: A solution that ingests and extracts content from semi-structured and unstructured sources to automate processes and improve operational efficiency; accelerates content extraction using OCR, machine learning, NLP, workflows, and intelligent review, with pipelines of out-of-the-box and custom models and services orchestrated to build solutions.
• ACE Framework: An AWS service capacity and cost estimator for enabling the ADLF architecture; helps arrive at approximate monthly costs over a period of time, ensures all service components are considered for cost planning, and acts as input to the AWS Ecosystem Automation framework.
let’s create
Thank you
Deloitte refers to one or more of Deloitte Touche Tohmatsu Limited, a UK private company limited by guarantee
(“DTTL”), its network of member firms, and their related entities. DTTL and each of its member firms are legally
separate and independent entities. DTTL (also referred to as “Deloitte Global”) does not provide services to clients.
Please see www.deloitte.com/about for a more detailed description of DTTL and its member firms.
The information contained in this material is meant for internal purposes and use only among personnel of Deloitte
Touche Tohmatsu Limited, its member firms, and their related entities (collectively, the “Deloitte Network”). The
recipient is strictly prohibited from further circulation of this material. Any breach of this requirement may invite
disciplinary action (which may include dismissal) and/or prosecution. None of the Deloitte Network shall be responsible
for any loss whatsoever sustained by any person who relies on this material.
©2019 Deloitte Touche Tohmatsu India LLP. Member of Deloitte Touche Tohmatsu Limited