
26th – 28th April 2022

Informatica Cloud Data Integration Bootcamp
Global Technical Alliances (GTA) and PTS Team
Agenda – Day 1
9:30 AM–11:30 AM BST | 10:30 AM–12:30 PM CEST | 2:00 PM–4:00 PM IST

1 Welcome Note and Program Overview
2 Introduction to IDMC
3 Cloud DW/DL Modernization
4 Cloud Mass Ingestion
5 Cloud Data Integration
6 Cloud Data Integration–E

2 © Informatica. Proprietary and Confidential.


Today’s Presenters

Anil Datar: VP GTA
Salvatore Moretto: Director PTS
Archana Sundarakrishna: Tech. Alliance & Program Manager GTA
Pascal Hurel: Sen. Solutions Architect
Kilian Ingelfinger: Sen. Solutions Consultant


CDI Bootcamp : Associate

Recommended Prerequisites:
1. CDW/DL Foundation Cert
2. Cloud On-boarding Certification

Weekly Checkpoint calls

Week-1: CDI SME Sessions
  In-Person: 1. Three Day SME-led Bootcamp
  Self-paced: 1. Cloud Data Integration Services Informatica University Course
Week-2&3: Deeper CDI Enablement
  In-Person: 1. Special Guest session
  Self-paced: 1. Cloud Data Integration Services Informatica University Course
Week-4: Basic Bootcamp Wrap-up
  In-Person: 1. Closing Notes, INFA Guest Speakers 2. Bootcamp Exam & Associate Badge
  Self-paced: 1. CAI Services (IU Training Course)
Certification: Professional Certification (Cost Involved)
  Self-paced: 1. CAI Services (IU Training Course) 2. IICS Professional Cert**

All trainings in this timeline are free of charge

** IU Certifications involve costs. Select Top Performers are entitled to free certification vouchers
Key Initiative Timeline

Weekly Cadence Call: 9:30 AM–10:30 AM BST | 2:00 PM–3:00 PM IST
Preparation Call: 9:30 AM–10:30 AM BST | 2:00 PM–3:00 PM IST
Live Session Time: 9:30 AM–10:30 AM BST | 10:30 AM–11:30 AM CEST | 2:00 PM–3:00 PM IST

Weeks: WEEK 0 | WEEK 1 | WEEK 2 | WEEK 3 | WEEK 4 | WEEK 5 | Certification
Calendar weeks: April (25 to 29) | May (2 to 6) | May (9 to 13) | May (16 to 20) | May (23 to 27)
Dates: 19 April 2022 | 4 May 2022 | 10 May 2022 | 17 May 2022 | June 2022
Live Sessions: 26, 27, 28 April | 11 May | 24 May
Activities: Preparation Call | Three Day SME-led Bootcamp | Special Guest Session | Closing Notes, Informatica Guest Speakers | On Demand
Certification: 1. CDI Associate Certification 2. Cloud Data and Application Integration RXX

Associate Bootcamp Highly recommended Offerings
IDMC
Introduction
IDMC vs. IICS

Informatica's Intelligent Data Management Cloud
… is a cloud-native, comprehensive Cloud Data Management Platform, enhanced and automated by AI/ML capabilities, that is comprised of modular services like…
… Cloud Data Integration, Cloud Data Quality, Cloud Data Governance and Catalog…

Informatica Intelligent Cloud Services
… are exactly these modular services the IDMC is comprised of, like…
… CDI, CDI-e, CDQ, Data Profiling, CDGC, CDMP, CMI, Monitor, Administrator, C360 SaaS, API Manager, API Portal, B2B Gateway, Application Integration, Integration Hub, Operational Insights…


Data Complexity Increasing Exponentially

Timeline (THEN to TODAY): Mainframe | Relational Databases | Enterprise Apps | Enterprise Data Warehouses | SaaS Apps | “Big Data” | Cloud Data Platforms | AI & ML
FRAGMENTATION
TYPE: Structured and Unstructured
LATENCY: Batch and Real-time
USERS: Technical and Business Users


Companies are Challenged by Data Complexity

68% of organizations have not been able to operationalize AI across the organizations
65% of organizations can’t effectively manage customer experience in a multichannel journey
88% of organizations are ill prepared for future supply chain disruptions
75% of employees don’t feel fully prepared to use data effectively and compliantly
75% of organizations are actively migrating data to the cloud
90% of organizations, by 2022, will utilize multiple CSPs and will require significant augmented data management and integration


Companies are Investing in These Initiatives for Digital Transformation

Initiatives: Analytics, AI & DW/L | Application Integration & Hyperautomation | MDM & 360 Applications | Data Governance & Privacy | Cloud Modernization

92% experience negative effects of AI/ML practices that undervalue data
56% of apps are bought & managed outside of IT
82% of organizations say they need to improve their master and reference data functions
60% of organizations are challenged by data quality and complexity


Data Management Landscape is Fragmented

CATALOG: Discover, catalog, and curate all enterprise data
INGEST: Multi-latency data ingestion and edge computing
INTEGRATE: Integrate all types of data
CLEANSE: Make data fit for purpose
RELATE: Match and relate identities and entities
GOVERN: Define and verify data governance policies
PROTECT: Detect and protect sensitive data
PREPARE: For analytics and collaborate on projects
SHARE & DELIVER: Publish and manage APIs and Data Services


Data Management Landscape is Fragmented

• Data remains trapped in silos
• Diminished trust in data
• Lack of agility and innovation


IDMC Delivers Best-of-Breed Products in a Single Platform

CATALOG | INGEST | INTEGRATE | CLEANSE | RELATE | GOVERN | PROTECT | PREPARE | SHARE & DELIVER
INTELLIGENT DATA MANAGEMENT CLOUD


Pioneering the Intelligent Data Management Cloud

ANY DATA | MULTI CLOUD | BATCH, REAL-TIME & STREAMING DATA | ANY USER

INTELLIGENT DATA MANAGEMENT CLOUD


Pioneering a New Layer in the Enterprise Data Stack

Application Cloud: Salesforce | Oracle | SAP | Adobe | ServiceNow | Coupa | Workday | Apptio
Analytics Cloud: Tableau | ThoughtSpot | Power BI

INTELLIGENT DATA MANAGEMENT CLOUD

Data Cloud: Snowflake | Amazon Redshift | Google Big Query | Azure | Databricks
Database Cloud: Microsoft SQL Azure | MongoDB | Oracle | Amazon Aurora
Infrastructure Cloud: Azure | Amazon | Google Cloud | Oracle | Rackspace

MULTI-HYBRID: INFRASTRUCTURE | DATABASES | DATA WAREHOUSE | BUSINESS INTELLIGENCE TOOLS | CUSTOM APPS | PACKAGED APPS


Best-of-Breed Product Suite
Intelligent Data Management Cloud

DATA CATALOG | DATA INTEGRATION | API & APP INTEGRATION | DATA QUALITY | MDM & 360 APPLICATIONS | GOVERNANCE & PRIVACY | DATA MARKETPLACE

AI-Powered Metadata Intelligence & Automation
Connectivity: 10K+ Metadata-Aware Connectors
Metadata System of Record

SaaS | Self-Managed | On-premises | Enterprise Cloud


IDMC Built for the Next Generation of Data Management Needs

Modern Technology Architecture
• Cloud-native
• Microservices-based
• API-driven
• Multi-cloud, hybrid

Powering Modern Capabilities
✓ Google-like search and discovery
✓ Single source of truth
✓ Low-code / no-code
✓ Single pane of glass
✓ Data democratization


28 Trillion
Transactions per month

As of Q4 2021


Best-of-Breed Products: A Leader in Four Gartner® Magic Quadrant™ Reports

Informatica is placed highest in ability to execute in ALL of these Magic Quadrant Reports
Enterprise Integration Platform as a Service: Eric Thoo, et al., 29 Sep 2021
Data Quality Solutions: Melody Chien, et al., 29 Sep 2021
Master Data Management Solutions: Simon Walker, et al., 27 Jan 2021
Data Integration Tools: Ehtisham Zaidi, et al., 25 Aug 2021
These graphics were published by Gartner, Inc. as part of larger research documents and should be evaluated in the context of the entire document. The Gartner documents are available upon request from Informatica. Gartner does not endorse
any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of
Gartner's research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular
purpose. GARTNER and Magic Quadrant are registered trademarks and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and are used herein with permission. All rights reserved.



How you can benefit from IDMC

Higher Services Revenue | One Platform One Skill | Higher CSAT for Repeat Business


Cloud Modernization
Data Warehouse, Lakes and App Modernization
Cloud Modernization
Data Warehouse, Lakes and Application Modernization Journey

Data Engineering Data Warehouse Data Lakes Applications Data Science

Integration is the Common Theme

Source:
1 – Harvard Business Review Services Survey, 2019
DATA CONSUMERS: ETL Developer | Data Engineer | Citizen Integrator | Data Scientist | Data Analyst | Business Users

Intelligent Data Management Cloud

DISCOVER & UNDERSTAND | ACCESS & INTEGRATE | CONNECT & AUTOMATE | CLEANSE & TRUST | MASTER & RELATE | GOVERN & PROTECT | SHARE & DEMOCRATIZE

DATA CATALOG | DATA INTEGRATION | API & APP INTEGRATION | DATA QUALITY | MDM & 360 APPLICATIONS | GOVERNANCE & PRIVACY | DATA MARKETPLACE

AI-Powered Metadata Intelligence & Automation
Connectivity: 10K+ Metadata-Aware Connectors
Metadata System of Record

DATA SOURCES: SaaS Apps Sources + On-premises Sources + Real-time / Streaming Sources
Mainframe | Applications | Databases | IoT | Machine Data | Logs


Cloud Data Management Design Principles

Radical Simplicity: Simple and Easy Data Access, Processing and Consumption for All Data Practitioners
Productivity for ALL Users: 10X to 100X Productivity for Data Practitioners with AI-Powered Automation
Elastic Scale: Cloud First, Cloud Native Data Management at Enterprise Scale

Microservices-based | API driven | Multi-tenant | Multi-cloud


Simplicity
Address Any Cloud Integration Need

Cloud Data Integration: Codeless, and optimized data integration
Cloud Data Quality: Identify, fix, and monitor data quality problems
Cloud Data Integration - Elastic: Dynamic scaling and automation for streamlined cloud-based processing on serverless Spark engine
Cloud Integration Hub: High-performance, governed pub/sub data hub
Advanced Serverless: Managed environment with no cloud administration, software, servers or clusters to manage.
Cloud B2B: Self-service partner onboarding and management
Cloud Mass Ingestion: Ingest Database, Change Data Capture, IoT, Streaming
Cloud Application Integration & API: No-code, no-build UX to build, monitor & maintain integrations and API
Simplicity Promotes Self-Service

Modern UX | No code/no build | 5 step wizard | Pre-built templates

Users: Architect | Operations | Application Developers | IT Specialist | Data Engineer | Data Scientist | Business Analyst

Data Integration | Application & API Integration | Business Process Integration

Self-service Experience for All Users


Productivity
Only AI @ Scale Delivering the Enterprise System of Record for Metadata

AI-powered Metadata Intelligence & Automation
• Most Comprehensive Active Metadata Across the Enterprise
• Broadest Deployment of AI/ML Algorithms
• Market Leading AI for Data Management

✓ Google-like search
✓ Amazon-like recommendations
✓ Social-graph of data
✓ Democratization of data
✓ NLP to create rules
✓ Automated match, merge and consolidation

11 Petabytes of active metadata powering intelligent and automated data management


Productivity is Driven by Automation

BUSINESS RULES TRANSLATION | SELF HEALING PROCESSING | DATA RELATIONSHIP INFERENCE | TRANSFORMATION RECOMMENDATIONS
COLUMN SIMILARITY | SEARCH RANKING | DATA VOLUME PROJECTIONS | DATASET RECOMMENDATIONS
ENTITY MATCHING | MASS DATA CORRECTION | DATA DOMAIN INFERENCE | DATA ANOMALY DETECTION | SELF TUNING PROCESSING
NATURAL LANGUAGE DESCRIPTION OF CODE | OPERATIONAL ANOMALY DETECTION | BUSINESS TERM ASSOCIATIONS | SCHEMA INFERENCE | DATASET SIMILARITY
BUSINESS RULE ASSOCIATIONS | ECONOMIC VALUE OF DATA | SMART DATA VISUALIZATION | SCHEMA MAPPING
COST OF DATA BREACH | PREDICTIVE OPERATIONAL ANALYTICS | SCHEMA MATCHING | ENTITY EXTRACTION
Scale
SCALE: Performant & cost-effective data management engine
CODELESS AND SERVERLESS

Advanced Elastic Engine | Advanced Pushdown Optimization | True Serverless | Auto Scaling + Auto Tuning

Headline figures: 3TB data processed under 2 hours | 50X more performant than ETL | 65% cheaper than commercial spark vendor* | 60% Lower TCO

• Router Performance
• Connectivity Performance such as S3 writes
• UDF Performance
• Local storage Scaling
• Spot instances
• CLAIRE based autotuning
• Workload aware auto scale
• Mapping runtime recommendations
• Zero data egress charges
• Connectivity to all CDW/DL ecosystems
• Best-in-class performance
• Switch to PDO with One click
• Multi-tenant compute cluster in our cloud
• No instances to manage for customers
• Easy trials
• Single bill for customers

* Based on benchmark study performed against commercial spark vendor
The Informatica Difference
Plug-and-Play Connectivity to Any Data Type

Social Data Machine Data Data Lake IoT Data

Application Data SaaS Applications Big Data Web Services

Mainframe Local Files Databases Data Warehouses



Multi Cloud Support
Informatica supports your Azure, AWS, Google Cloud, and on-prem environments

AWS: S3 | Redshift | EMR | Kinesis | Kinesis Firehose
Azure: ADLS | Cosmos DB | Blob Storage | SQL DW | Event Hubs | HDInsight
Google Cloud: Cloud Storage | Dataproc | BigQuery | Bigtable | Cloud SQL | Cloud Datastore

Achieve the core goal of delivering trusted, actionable data when and how the business needs it


Most Secure and Trusted Cloud Data
Management Provider
VENDOR A VENDOR B VENDOR C VENDOR E VENDOR D VENDOR F VENDOR G VENDOR H

Aligned Program



Informatica Data Warehouse and DataLake Architecture

[Architecture diagram; numbered callouts 1 to 6]
Sources: Streaming (IoT, Machine Data, Apps, Log Files, Social, Mobile) | On-Premises (Mainframe, Application Servers, Databases, Documents) | SaaS (ERP, DRM)
Ingestion and processing: Mass Ingestion | Stream Processing | Stream Storage | Data Integration & Quality | CDI-Elastic | A-PDO
Cloud Data Lake: Landing Zone | Data Enrichment Zone | Enterprise Zone
CDW: Data Warehouse
Consumption: Data Provisioning | Real-time Analytics | Enterprise Analytics | Line of Business / Self-Service Analytics | Data Science / AI
Users: Business User | Data Analyst | Line of Business | Data Engineer | Data Scientist


Informatica Cloud Integration Reference Architecture

Deployment options (numbered in the diagram):
1 Locally hosted agent managed by customer
2 Cloud hosted agent managed by customer
3 Cloud hosted agent managed by Informatica
4 Serverless Spark on Kubernetes managed by Informatica
5 Push Down Optimization to Warehouse

Diagram labels: Cloud Data Lake / Warehouse | Informatica managed Serverless Services | Kubernetes | Informatica hosted Intelligent Cloud Services | Metadata | Data | Corporate Network | Secure Agent Group | Cloud Applications | firewall
DEMO
Admin Service
Cloud Mass Ingestion (CMI)
Informatica Data Warehouse and DataLake Architecture
[Architecture diagram repeated from the earlier slide of the same title]


Enterprise Cloud Data Management – Reference Architecture

[Architecture diagram]
Sources: Files and Databases (Files, Databases) | Streaming (IoT, Machine Data, Logs) | Messaging (Kafka, Amazon Kinesis, Azure EventHub)
Ingestion and integration: Mass Ingestion | Data Integration | Streaming | Messaging | Event Store
Cloud Data Lakes: Cloud Storage (Google Cloud Storage, ADLS Gen2) | Cloud Data Warehouse
Consumption: AI/ML & Data Science | Advanced Analytics | Reports & Dashboards | Self Service Analytics | Streaming Analytics


Ingesting large volumes of data from a variety of sources into a CDW/DL is a challenge

Sources: Files | Mainframe | Database | On-Premise Data Warehouse | Machine Data | IoT | Streaming | Logs | Apps


Supporting Any Ingestion Pattern by Any User

Cloud Data Integration | Cloud Application Integration + Mass Ingestion

Table stakes: Cloud Connectivity | API Management
New and unique patterns: Streaming Ingestion | File Mass Ingestion | Database Mass Ingestion | Mass Ingestion Applications


Cloud Mass Ingestion Service – Overview

Ingest in Real Time: Databases & CDC | Streaming Sources | Files

Unified Experience for Ingestion | Real time monitoring

➢ Step by Step wizard for designing & creating an ingestion task
➢ Deployment, Scheduling, Real time Monitoring & Lifecycle Management
➢ Versatile Out of the box connectivity to sources & targets
Mass Ingestion Files
Mass Ingestion Files - Overview
• Provides file transfer capabilities for exchanging files between on-premise and Cloud repositories, using standard protocols
• Transfer any file type with high performance and scalability
• Job and file level tracking and monitoring
• Orchestrate file transfer and ingestion in hybrid/cloud as managed and secure service

[Diagram; numbered steps 1 to 4]
Components: MI Metadata | Cloud Data Integration | MI Task | Secure Agent | File Mass Ingestion service | Advanced FTP/SFTP/FTPS connector | Update Job log
Targets: Google storage | Redshift | S3 | Azure DW, Blob, Data Lake


Main capabilities
• Unified user experience for all ingestion types (Streaming, Database, File)
• Simple, wizard-based task definition
• Wide list of supported sources/targets
• Advanced, highly scalable connectors for handling FTP/SFTP/FTPS
• Filter files by file name pattern, file size, file date
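
The file filter in the last bullet can be pictured with a small sketch. This is not the Mass Ingestion implementation: `FileInfo`, `select_files` and the parameter names are hypothetical, and shell-style wildcards are an assumption about how a file name pattern might be expressed.

```python
from dataclasses import dataclass
from datetime import datetime
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional


@dataclass
class FileInfo:
    """Hypothetical description of one candidate file."""
    name: str
    size_bytes: int
    modified: datetime


def select_files(
    files: Iterable[FileInfo],
    name_pattern: str = "*",
    min_size_bytes: int = 0,
    modified_after: Optional[datetime] = None,
) -> List[FileInfo]:
    """Keep only files that pass the name, size and date filters."""
    selected = []
    for f in files:
        if not fnmatchcase(f.name, name_pattern):
            continue  # name does not match the wildcard pattern
        if f.size_bytes < min_size_bytes:
            continue  # file is too small
        if modified_after is not None and f.modified <= modified_after:
            continue  # file is too old
        selected.append(f)
    return selected


candidates = [
    FileInfo("orders_2022-04-26.csv", 2048, datetime(2022, 4, 26)),
    FileInfo("orders_2022-04-01.csv", 4096, datetime(2022, 4, 1)),
    FileInfo("orders_2022-04-27.csv", 10, datetime(2022, 4, 27)),
    FileInfo("readme.txt", 2048, datetime(2022, 4, 27)),
]
picked = select_files(
    candidates,
    name_pattern="orders_*.csv",
    min_size_bytes=1024,
    modified_after=datetime(2022, 4, 20),
)
picked_names = [f.name for f in picked]
```

Only the first candidate passes all three filters; the others fail on date, size and name respectively.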



Main capabilities
• API, schedule or file event triggered
• File actions:
- Compress/decompress (Zip, Gzip, Tar)
- Encrypt/decrypt (PGP)
• Highly scalable, any file type
• Unified monitoring and tracking experience
- Tracking and monitoring - Job and file level
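
A chain of file actions can be sketched as below, using only Gzip from the Python standard library. The action names and the `apply_actions` helper are hypothetical; PGP encryption is left out because it would need a third-party library.

```python
import gzip
from typing import Callable, Dict, List

# Hypothetical registry of file actions; only Gzip is shown here.
FILE_ACTIONS: Dict[str, Callable[[bytes], bytes]] = {
    "compress": gzip.compress,
    "decompress": gzip.decompress,
}


def apply_actions(payload: bytes, actions: List[str]) -> bytes:
    """Apply the named actions to the file content, in order."""
    for action in actions:
        payload = FILE_ACTIONS[action](payload)
    return payload


original = b"id,amount\n1,10\n2,20\n"
compressed = apply_actions(original, ["compress"])
restored = apply_actions(compressed, ["decompress"])
```

Keeping the actions in a registry means a task definition only needs to carry a list of action names.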



File listener
Benefits
• A platform level asset that provides file
listener capabilities that can be used by
different services
• User can define/manage file listeners and different
apps/services can register/invoke file listeners (via
UI or API)
• Usage:
- File Mass ingestion as a scheduling option-move files
when they land in a specific folder
- Taskflow:
• Trigger taskflow when file event occurs
• File watch inside a taskflow process
- B2B Gateway - as a scheduling option- process files
when they land in a specific folder
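
The file listener idea (register a callback, fire it when a file lands in a folder) can be sketched with simple polling. This is an assumption-laden illustration, not the platform's listener: `PollingFileListener`, `poll` and `demo` are hypothetical names, and a real listener would also handle schedules, partially written files and deletions.

```python
import os
import tempfile
from typing import Callable, List


class PollingFileListener:
    """Hypothetical listener: calls on_arrival for each new file in a folder."""

    def __init__(self, folder: str, on_arrival: Callable[[str], None]):
        self.folder = folder
        self.on_arrival = on_arrival
        self._seen = set(os.listdir(folder))  # files present at registration

    def poll(self) -> List[str]:
        """Check the folder once and fire the callback for new arrivals."""
        current = set(os.listdir(self.folder))
        new_files = sorted(current - self._seen)
        for name in new_files:
            self.on_arrival(os.path.join(self.folder, name))
        self._seen = current
        return new_files


def demo() -> List[str]:
    """Drop one file into a temporary folder and record what gets triggered."""
    triggered = []
    with tempfile.TemporaryDirectory() as folder:
        listener = PollingFileListener(
            folder, lambda path: triggered.append(os.path.basename(path))
        )
        listener.poll()  # nothing has arrived yet
        with open(os.path.join(folder, "invoice.csv"), "w") as fh:
            fh.write("id,amount\n1,10\n")
        listener.poll()  # fires once for invoice.csv
        listener.poll()  # no second trigger for the same file
    return triggered
```

The callback stands in for whatever registers with the listener, for example a file ingestion task or a taskflow.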



DEMO
CMI-Files
Mass Ingestion
Databases
Cloud Mass Ingestion Databases

• Provides Database ingestion capabilities as part of IICS Mass Ingestion service
• Ingest relational database data from Oracle, SQL-Server & MySQL. Also supporting Schema Drift on CDC supported Databases
• Real-time monitoring of ingestion jobs with lifecycle management and alerting in case of issues
• Orchestrate Database data ingestion in hybrid/cloud as managed and secure service

Supported Sources: Oracle, SQL Server, MySQL, Teradata
Supported Targets: Amazon S3 | Azure ADLS & Synapse | Apache Kafka | Snowflake

[Diagram] On-Premises Sources (Mainframe, Databases, App Servers, Data Warehouse) | Secure Agent | Informatica Intelligent Cloud Services | Cloud Targets (Data Warehouses, Data Lakes, Kafka)
Benefits of Mass Ingestion Databases

1 Supports both data synchronization & real time analytics use cases: Faster decision making
2 Wizard driven experience for ingestion: Increase business agility
3 Efficiently ingest CDC data from 1000’s of tables: No expensive maintenance
4 Automatic schema drift addressing: Increased trust in data assets
5 OOTB Connectivity to CDC sources, Data Lake & DWH targets: No need to hand code
6 Real time monitoring and alerting: Faster troubleshooting


Mass Ingestion
Streaming
Mass Ingestion Streaming - Overview

• Provides streaming ingestion capabilities as part of IICS Data Ingestion service
• Ingest streaming data: Logs, clickstream, social media, Kafka, Kinesis, S3, ADLS, Firehose, etc.
• Real-time monitoring of ingestion jobs with lifecycle management and alerting in case of issues
• Orchestrate streaming data ingestion in hybrid/cloud as managed and secure service

[Diagram] Sources: Sensor Data | Machine Data / IOT | WebLogs | Social Media | Messaging Systems
Targets: Messaging Systems | Real time analytics | Data Lake & ML Consumption


Benefits

1 Single ingestion solution for all patterns: Save time and money
2 Wizard driven experience for ingestion: Increase business agility
3 Enable the business to ingest streaming data for their usage: Faster decision making
4 Edge transformations for cleansing data: Increased trust in data assets
5 Connectivity to streaming sources & targets: No need to hand code
6 Real time monitoring and alerting: Faster troubleshooting


Mass Ingestion
Applications
Mass Ingestion Applications
Benefits
• MIA can transfer data from Software-as-a-Service (SaaS) and on-premise applications to
cloud-based data warehouses.

• The SaaS and on-premises applications used in your business or organization store large
amounts of business-critical data on a daily basis. You can use MIA to transfer the data
stored by your applications to cloud-based targets that can handle large volumes of
data.

• After you transfer the data to the target, you can consolidate the data and use it for
various purposes, such as advanced data analytics and data warehousing.

MIA can perform the following types of load operations:

• Initial load
- Loads source data read at a single point in time to a target.
• Incremental load
- Loads data changes continuously or until the ingestion job is stopped or ends.
• Initial and Incremental load
- Performs an initial load of point-in-time data to the target and then automatically switches to
propagating incremental data changes made to the same source objects on a continuous
basis
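
The three load types can be contrasted in a minimal sketch. It assumes the source is a keyed snapshot plus an ordered list of change records; the function names and the `(operation, key, row)` change format are hypothetical and say nothing about how MIA captures changes internally.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, str]
Change = Tuple[str, int, Optional[Row]]  # (operation, key, new row or None)


def initial_load(source: Dict[int, Row], target: Dict[int, Row]) -> None:
    """Copy a point-in-time snapshot of the source to the target."""
    target.clear()
    target.update({key: dict(row) for key, row in source.items()})


def incremental_load(changes: List[Change], target: Dict[int, Row]) -> None:
    """Apply captured inserts, updates and deletes in order."""
    for operation, key, row in changes:
        if operation == "delete":
            target.pop(key, None)
        else:  # "insert" or "update"
            target[key] = dict(row or {})


def initial_and_incremental_load(
    source: Dict[int, Row], changes: List[Change], target: Dict[int, Row]
) -> None:
    """Snapshot first, then switch to propagating changes."""
    initial_load(source, target)
    incremental_load(changes, target)


source_rows = {1: {"name": "Ann"}, 2: {"name": "Bob"}}
captured = [
    ("update", 2, {"name": "Bobby"}),
    ("insert", 3, {"name": "Cy"}),
    ("delete", 1, None),
]
warehouse: Dict[int, Row] = {}
initial_and_incremental_load(source_rows, captured, warehouse)
```

After the combined load, `warehouse` holds keys 2 and 3 only; in a real job the incremental phase keeps running until the job is stopped.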



DEMO
CMI-Applications
Summary

Cloud native ingestion
• Unified service for ingestion from various sources
• Orchestration for ingestion from variety of patterns

Connectivity
• On-prem Database & CDC
• On-prem & cloud files
• IoT & Streaming
• Cloud data lakes, Datawarehouse and messaging hub

Wizard Driven Design
• Simple easy to use wizard
• Edge transformations
• Intent driven ingestion

Real-time Monitoring
• Pictorial view of the ingestion job
• Real time flow visualization
• Lifecycle management
Cloud Data Integration (CDI)
Informatica Data Warehouse and DataLake Architecture
[Architecture diagram repeated from the earlier slide of the same title]


Multi Cloud Integrations using CDI

Key Platform Capabilities

• Ease of Use
• Templates and Wizards
• Micro-service Architecture
• Reusability
• Broad Hybrid and Multi-Cloud
Connectivity
• No coding across the platform
• Performance optimizations like
CDC, parallel processing,
pushdown optimization, Mass
Ingestion, etc

Hybrid, Multi-Cloud integrations using CDI Transformations and Patterns


With modern role-based unified experience

• Uniform front-end for cloud services
• Role-based, easy access, individualized “Home Page”
• Integrated access to Marketplace, Community and guided tutorials

Unified experience across all cloud services


Integration Task Wizards for Citizen Integrators



DEMO
Synchronization
Cloud Mapping Designer for Integration Experts
Transformations

Task Flows



Intelligent Structure Discovery



Dynamic Mappings in IICS
Mapping Challenges
• Mappings are tightly bound to schemas
• Change in metadata (data type, column, etc.) may involve manual changes to 100s of transformations and mappings
• Multiple mappings/workflows are created, tested, maintained for each source
Dynamic Mapping – Goals

• Support Any Data Integration Pattern


- Give customers the ability to develop a highly parameterized mapping
• Schema Drift
- Use one mapping to support multiple file formats
- Discover the schema at run-time
• Simplify Maintenance
- Turn hundreds of mappings into 1
- Support table changes without changing the related mappings
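
Discovering the schema at run-time can be illustrated with plain CSV files: one function reads any layout by taking the column list from the file itself. This is a sketch of the idea only; `read_with_runtime_schema` is a hypothetical name and the snippet does not use any Informatica API.

```python
import csv
import io
from typing import Dict, List, Tuple


def read_with_runtime_schema(text: str) -> Tuple[List[str], List[Dict[str, str]]]:
    """Read delimited text whose columns are only known when the job runs."""
    reader = csv.DictReader(io.StringIO(text))
    rows = [dict(row) for row in reader]
    columns = list(reader.fieldnames or [])  # schema taken from the header row
    return columns, rows


# Two versions of the same feed; the second has gained a column.
customers_v1 = "id,name\n1,Ann\n"
customers_v2 = "id,name,country\n2,Bob,DE\n"

cols_v1, rows_v1 = read_with_runtime_schema(customers_v1)
cols_v2, rows_v2 = read_with_runtime_schema(customers_v2)
```

The same function handles both layouts, which is the "one mapping, multiple file formats" goal in miniature.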



Efficiency & Flexibility with Dynamic Mapping

• Data Integration: Build a template once, then automate mapping execution for 1000’s of sources with different schemas
• Mapping self-adjusts dynamically to external schema changes and column characteristics

Rule-based ports and links, e.g., include all String ports

Design time: Generic Source and Target with varying schemas
Run time: Varying logic, e.g., apply TRIM for varying number of String fields in the Source
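
The "apply TRIM to however many String fields the source has" rule can be sketched as one function that works for any row shape. The name `trim_string_ports` is hypothetical, and Python's `str.strip` stands in for the TRIM function on the slide.

```python
from typing import Any, Dict


def trim_string_ports(row: Dict[str, Any]) -> Dict[str, Any]:
    """Rule-based logic: trim every string field, pass other types through."""
    return {
        port: value.strip() if isinstance(value, str) else value
        for port, value in row.items()
    }


# Two sources with different schemas go through the same rule.
narrow_row = {"id": 1, "name": "  Ann "}
wide_row = {"id": 2, "name": "Bob  ", "city": " Berlin", "amount": 9.5}

trimmed_narrow = trim_string_ports(narrow_row)
trimmed_wide = trim_string_ports(wide_row)
```

The rule selects fields by type rather than by name, so adding or removing a String column needs no change to the logic.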
Dynamic Mapping – Features

• Major Enhancements
- Parameterization
- Dynamic Schema
- Dynamic Target Creation
- Concurrent Mapping Execution
- Dynamic Expressions
• Other Enhancements
- Dynamic Ports
- Dynamic Transformations
- Run-time Linking


DEMO
Mappings
Cloud Data Integration Elastic
Informatica Data Warehouse and DataLake Architecture
[Architecture diagram repeated from the earlier slide of the same title]


CDI-Elastic
Enabling Kubernetes for auto-scaling and provisioning

Informatica Intelligent Cloud Services: Same, familiar Informatica Design-Time

Serverless Kubernetes Cluster: Deployed to your Cloud Network


Architecture
• IICS-based Spark serverless solution
• Cutting edge technology
• Open source and best-in-class
• Built on the cloud, built for the future
• Lower the overall TCO for customers with CLAIRE-based auto scaling and provisioning
- Informatica will manage the compute cluster
• Vendor neutral architecture
• Ready for multi-cloud from the get-go
- Kubernetes based orchestration for Serverless

Building blocks: Spark + Containerization + Kubernetes (CDI-Elastic)


Standard vs. Advanced Serverless

[Diagram comparing three deployment models: Current CDI (Customer Managed Agent) | CDI-e | Advanced Serverless (CDI/CDI-e)]
Current CDI (Customer Managed Agent): Data | Agent | Compute
CDI-e: Data | Agent | Elastic Compute
Advanced Serverless (CDI/CDI-e): Data | Elastic Agent | Elastic Compute
IICS Platform, delivered as a service (all three models): Design time | Microservices | Middleware | OS | Virtualization | Server | Infrastructure
Legend: Scalable Unit | Customer Managed | Informatica Managed


CDI-Elastic Deployment Options

Azure AWS GCP

• Compute cluster is launched by Secure Agent in the customer network
• Customer has complete control over network peering and the assignment of roles and privileges


CDI-E Standard Serverless Architecture

Informatica Managed Cloud Customer Managed Cloud

Storage Layer Cloud Data Warehouse

Mappings Secure Agent Compute Cluster



CDI-E Advanced Serverless Architecture

Informatica Managed Cloud Customer Managed Cloud

Storage Layer Cloud Data Warehouse

Mappings

Secure Agent Compute Cluster



CDI-E Advanced Serverless Architecture: Details

[Diagram] Informatica Account: Metadata | Microservices | Informatica VPC | Compute
Customer Account: Customer VPC | Data | ERP | Databases | Data Warehouse
Connections: Web browser (Build & manage) | VPN | Secure Link | Trusted Data | DMZ

✓ No infrastructure or SA/clusters to manage
✓ Auto upgrade (No s/w to manage)
✓ Secure (Tenant isolation, HIPAA, SOC 2)
✓ High Availability with Zero downtime
✓ Auto-Scaling
✓ Multiple AWS regions
✓ Resiliency and HA
✓ Tenant Isolation
CDI-e: Automated Performance Tuning
Powered by CLAIRE



CDI-e: Why tune?

Manual work: 30% of your engineers' time
Frequent Outages: Pager ringing at 3 AM
Slow and expensive: Missing SLAs every week

Developer tuning loop: Pick new Parameters | Run the Job | Analyze the Logs


CDI-e: What is tuned?

Optimal cluster parameters
• Size
• Instance Type
• # of processors
• Amount of memory
• Disks
• …

Optimal Spark Configuration
• Parallelism
• Shuffle
• Storage
• JVM Tuning
• Feature Flags
• …
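
To make the tuning categories concrete, the sketch below maps some of them to standard Apache Spark properties and renders them as `spark-submit --conf` arguments. The values are arbitrary examples, the helper `to_spark_submit_args` is hypothetical, and the slides do not say which properties CLAIRE actually adjusts.

```python
from typing import Dict, List, Union

# Illustrative values only; the property names are standard Spark settings.
example_conf: Dict[str, Union[int, str]] = {
    "spark.executor.instances": 4,            # cluster size
    "spark.executor.cores": 4,                # processors per executor
    "spark.executor.memory": "8g",            # memory per executor
    "spark.default.parallelism": 32,          # parallelism
    "spark.sql.shuffle.partitions": 64,       # shuffle
    "spark.executor.extraJavaOptions": "-XX:+UseG1GC",  # JVM tuning
}


def to_spark_submit_args(conf: Dict[str, Union[int, str]]) -> List[str]:
    """Render a configuration as spark-submit '--conf key=value' arguments."""
    args: List[str] = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


submit_args = to_spark_submit_args(example_conf)
```

Each run of the manual tuning loop amounts to editing a dictionary like this one, which is the work automated tuning is meant to remove.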



CDI-e: Auto Scaling

• Auto scaling to meet your SLA at the lowest possible cost
• Dynamically respond to changes in environment and workloads to meet the data volume and compute requirements
• Algorithm to scale up/down effectively
• Auto adjust based on concurrency
• Horizontal and vertical scaling
• Increase/decrease parallelism by arriving at the optimum number of nodes and Spark executors based on the job demands
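
A simplified view of horizontal scaling: derive the node count from pending work and clamp it to a configured range. The formula, the function name `target_nodes` and all parameters are assumptions for illustration; the actual scaling algorithm is not described in the slides.

```python
import math


def target_nodes(
    pending_tasks: int,
    executors_per_node: int,
    tasks_per_executor: int,
    min_nodes: int = 1,
    max_nodes: int = 10,
) -> int:
    """Nodes needed to run all pending tasks at once, within cluster limits."""
    capacity_per_node = executors_per_node * tasks_per_executor
    needed = math.ceil(pending_tasks / capacity_per_node)
    return max(min_nodes, min(max_nodes, needed))


light_load = target_nodes(20, executors_per_node=2, tasks_per_executor=4)
heavy_load = target_nodes(100, executors_per_node=2, tasks_per_executor=4)
idle = target_nodes(0, executors_per_node=2, tasks_per_executor=4)
```

With 8 task slots per node, 20 pending tasks need 3 nodes, 100 tasks hit the 10-node ceiling, and an idle cluster falls back to the 1-node floor.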



CDI-e: Incremental File Load
• Challenge:
• I want to load data (different flat files) into Cloud Storage
• Files which already have been processed should be ignored.
• I cannot just delete them, since they are used by other processes as well.

[Diagram: Source File Directory | Execute Elastic Mapping | Process]

• Solution:
• CDI-Elastic can track data that has been processed during a previous run of an MCT by
persisting the state information of the job run.
• Incremental File Load is a feature of CDI-Elastic which will maintain the state information and
prevent reprocessing of old data.
• Time travel will help to go back in time and re-process files
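
The state-tracking idea can be sketched as a watermark on file modification time: each run picks up only files newer than the last load, source files are never deleted, and "time travel" resets the watermark. The class and method names are hypothetical, and whether CDI-Elastic persists a timestamp or some other state is an assumption here.

```python
from typing import Dict, List


class IncrementalFileLoad:
    """Hypothetical state holder for an incremental file load."""

    def __init__(self, last_load_time: int = 0):
        self.last_load_time = last_load_time  # persisted between job runs

    def run(self, files: Dict[str, int]) -> List[str]:
        """Return files modified after the last load and advance the state."""
        new_files = sorted(
            name for name, modified in files.items()
            if modified > self.last_load_time
        )
        if new_files:
            self.last_load_time = max(files[name] for name in new_files)
        return new_files

    def time_travel(self, earlier_time: int) -> None:
        """Move the state back so that later files are processed again."""
        self.last_load_time = earlier_time


# File name -> modification time (plain integers keep the example short).
directory = {"a.csv": 100, "b.csv": 200}
job = IncrementalFileLoad()
first_run = job.run(directory)    # both files are new
second_run = job.run(directory)   # nothing new, nothing reprocessed
directory["c.csv"] = 300
third_run = job.run(directory)    # only the newly arrived file
job.time_travel(150)
replay_run = job.run(directory)   # files after time 150 are processed again
```

Because the source directory is only read, other processes that depend on the same files are unaffected.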



DEMO
CDI-e
