
Big Data Implementation at Fintech

Muhammad Saipul Rohman


Data Engineer
A glimpse about LinkAja
LinkAja 2020 3
… enabled by the support we receive from key stakeholders and SOEs

Big Data at LinkAja
Oct 2019, before the fun begins...

Scalability issues
Vendor-driven development
Information retrieval is challenging
No extended data products
Where were we?

Scalability issues

1. Monolithic architecture

2. Architecture & network modification was a challenge

3. Hard to scale servers/engines

4. Infrastructure was not so easy to govern

5. No clear logging & monitoring system

6. Scattered databases
Now, talking about the team

Vendor-driven development

1. No internal resources; too much dependency on vendors

2. Communication is a challenge

3. No clear data strategy


Talking about a classic problem

Information retrieval is challenging

1. Not so consistent data

2. Metrics definitions are scattered all over the organization

3. Difficult to extend information influence through visualization

4. Democratizing data.. Are you kidding me?


Limited room for innovation

No extended data products

1. Cannot build AI/ML products

2. Cannot do UI/UX & product experimentation


Moving All Data Infrastructure to Cloud

Scalability issues
Compute engine & networking

Storage and data warehouse

Logging, monitoring, & IAM


Why we exist

The needs of data unification, layering, & democratization

Talent pool demand to build big data capabilities, such as data engineering, analytics & data governance, and AI/ML

Critical business activities require big data technology, i.e. business decision, product development, & crucial operational works

Requirement to advocate high quality and secure data

What are our missions

1. Building a single source of truth of company data lake, that is scalable, reliable, and with high availability
2. Instil the organisation as a whole with fit-for-purpose data governance principles
3. Empowering business with accessible and secure data platform to extract the benefit from data driven culture
4. Building Artificial Intelligence/Machine Learning products that produce high business value
What we do and how we do it

Getting the basics perfect

We focus our access management on our cloud assets

Our cloud assets are great, but we are the ones optimizing them

We are the experts in data, from storing it, accessing it, and everything in between. So partner with us when using data

we try to do less of these… instead, we do these…

instant, in-the-moment requests -> structured, planned work
ad-hoc data retrieval -> data enablement and evangelism
working on requests with short-term thinking -> solution building and educating on the most efficient and sustainable outcome
Big Data Group

Data Engineering

Data Solution Engineering
Data pipeline development, cross data sources integration, and data product enablement to support business decision & operational optimization

Data Infrastructure Engineering
Build scalable & high performing data infrastructure that enhances the efficiency and productivity of the data environment

Data Platform & Core BI

Core BI & Reporting
Proactive data enablement by anticipating business needs of self-service data access

Product Analytics
Partnering with Product Owners and Product Managers to empower the business in making product development data-driven decisions

Data Governance
Raise awareness, build frameworks and enforce activities of data quality and data security in all data domains across the organisation

AI/ML Platform

AI/ML Engineering
Bringing best-in-class of self-service AI/ML platform to support business in developing AI/ML model

AI/ML Scientist
Proactively empowering AI/ML activities in providing business and product insights through simulation, intelligent pattern, and unstructured data insight
How Big Data started?
Evolution of Big Data
5V of Big Data
Type of Data Source
Common roles in Data Teams
Big Data Architecture

• Lambda Architecture
• Kappa Architecture
Lambda Architecture
Example Lambda Architecture
Kappa Architecture

Data Sources -> Streaming/Real-time Ingestion -> Streaming/Real-time Processing -> Analytical Data Store -> Analytics & Reporting
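
The Kappa stages above can be sketched as one chain of Python generators. This is only an illustrative toy, not LinkAja's pipeline: the event fields (`user`, `amount`), the function names, and the in-memory dictionary standing in for the analytical data store are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def ingest(source: Iterable[dict]) -> Iterator[dict]:
    """Streaming ingestion: hand events downstream one at a time."""
    for event in source:
        yield event


def process(events: Iterable[dict]) -> Iterator[dict]:
    """Streaming processing: skip malformed events, normalise the rest."""
    for event in events:
        if "user" not in event or "amount" not in event:
            continue  # hypothetical rule: drop events missing required fields
        yield {"user": event["user"].lower(), "amount": int(event["amount"])}


def store(events: Iterable[dict], table: Dict[str, int]) -> Dict[str, int]:
    """Analytical data store: keep a running total per user (in memory here)."""
    for event in events:
        table[event["user"]] += event["amount"]
    return table


def report(table: Dict[str, int]) -> List[Tuple[str, int]]:
    """Analytics & reporting: users ordered by total amount, largest first."""
    return sorted(table.items(), key=lambda item: item[1], reverse=True)


def run_pipeline(source: Iterable[dict]) -> List[Tuple[str, int]]:
    """Single code path for all data, which is the defining trait of Kappa."""
    table: Dict[str, int] = defaultdict(int)
    store(process(ingest(source)), table)
    return report(table)
```

Because there is only one processing path, recomputing history in this sketch means replaying the same events through `run_pipeline` again, rather than maintaining a separate batch job as in Lambda.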
Example Kappa Architecture
Analytical Data Store
• Data Warehouse/Data Marts
• Data Lake
• Lakehouse
Data Warehouse VS Data Lake VS Lakehouse
Data Ingestion

Sources
Datalake
Datamart
Self service playground

BI, Reporting, AI/ML
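
A minimal sketch of the layering listed above, assuming the stages run in the order sources, then datalake, then datamart. The record fields (`channel`, `amount`) and both function names are hypothetical; a real setup would write to cloud storage and a warehouse rather than Python lists.

```python
import json
from typing import Dict, List


def land_in_datalake(records: List[dict]) -> List[str]:
    """Datalake layer: keep every source record untouched, one JSON line each."""
    return [json.dumps(record, sort_keys=True) for record in records]


def build_datamart(raw_lines: List[str]) -> Dict[str, int]:
    """Datamart layer: derive a small, query-friendly total per channel."""
    totals: Dict[str, int] = {}
    for line in raw_lines:
        record = json.loads(line)
        channel = record["channel"]
        totals[channel] = totals.get(channel, 0) + record["amount"]
    return totals
```

Keeping the raw copy separate from the aggregate means the datamart can be rebuilt from the datalake whenever a metric definition changes.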
Democratization is nonsense without a clear governance framework

We need to ensure the data is of high quality

1. Plant Airflow sensors in ingestion process
2. Looker dashboard for ingestion processes

We need to ensure the data is secure

1. Looker dashboard for all individual & service accounts access
2. Platform security assessment
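
The idea behind planting Airflow sensors in the ingestion process can be illustrated in plain Python. In Airflow, a sensor's `poke` method is called repeatedly, every `poke_interval` seconds, until it returns true or a `timeout` elapses. The sketch below imitates that loop without importing Airflow; `wait_until`, `make_row_count_poke`, and the row-count threshold are hypothetical names and rules chosen for the example, not the checks LinkAja runs.

```python
import time
from typing import Callable


def wait_until(
    poke: Callable[[], bool],
    poke_interval: float,
    timeout: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call poke() until it succeeds (True) or the timeout passes (False)."""
    deadline = clock() + timeout
    while True:
        if poke():
            return True
        if clock() >= deadline:
            return False
        sleep(poke_interval)


def make_row_count_poke(count_rows: Callable[[], int], min_rows: int) -> Callable[[], bool]:
    """Build a poke that passes once the ingested table has enough rows."""
    def poke() -> bool:
        return count_rows() >= min_rows
    return poke
```

In a pipeline, downstream tasks would run only after such a check passes, so an empty or partial load stops at ingestion instead of reaching dashboards.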


AI is useless if you have a scrappy foundation

Tertiary: AI/ML
Secondary: Reporting, Dashboard, & Analytics
Primary: Data engineering
Use Case Big Data at LinkAja
Tech Stacks and Production Components
Data Engineering Technology Ecosystems
Data Sources
• Databases
• Excel
• Files
• API
• .....

Data Pipelines
• Streaming Infrastructure (Connect, Connect)
• Batching Infrastructure

Persistence Layer
• Google BigQuery
• Google Cloud Storage

Monitoring
• Airflow
• GCP Console Dashboard
Core Business Intelligence

Data evangelist
Data linkage and data access consultant
Data-driven business advocate

Visualisation
Business Intelligence dashboards
Core reporting

Data enablement
Business-focused enablement

Self-service platform
• Business data dictionary
• Operational data dashboard
• Query optimization how-to guide

Government projects
• Kartu Prakerja disbursement
• Pegadaian Emas integration with LinkAja

Financial regulators partner
• Bank Indonesia reporting
• Data pipeline documentation

Product Analytics and Experimentation

Analytics dashboards
• Customer segmentation dashboards
• Product usage dashboards
• Performance dashboards

Product cataloguing
• Product Requirement Document with data in mind
• Product categorization based on data

Data-focused product development
• Data points enablement during product development
• Event tracking analysis

Data Governance

Data quality and security
Data governance policy implementation
Best in class tools
Awareness-building
Business-focussed requirements
Data governance framework

Data quality centre of excellence
• Prioritisation of data of high importance and value
• Data quality tools and processes improvements
• Data ingestion flow metrics (work in progress!)

Adoption of data governance tools
• Maximum utilization of Google Cloud Platform built-in tools
• Bespoke data catalog tool (work in progress!)

Simplification through clear framework
• Optimising the number of user access groups
• Bespoke data catalog tool (work in progress!)

AI/ML Engineering
• Building AI/ML Platform
• Making AI/ML platform adoption easier
• Automation of AI/ML production

AI/ML Scientist

Starting AI Guild as internal community

eKYC improvement
• Image quality metrics
• ID card detection

NLP Project
• Gender prediction
• Syariah recommendation
