
Institute for Development and Research

in Banking Technology
(Established by Reserve Bank of India)

[Cover illustration: a sample customer record contrasting Poor Quality Data with Quality Data]

Name: Mr. XYZ
Date of Birth: 23-02-1982
Gender: Male
Address: #215-5C/1-B, Plot No. 22, Belahalli Nagar
City: Bengaluru
State: Karnataka
Country: India
Occupation: Doctor
Marital Status: Married
Qualification: MBBS
Annual Income: INR 7,50,000/-
PAN No.: BACGB0022C
UIDAI: 9876 5432 1234
Mobile: 90000 00009
Email: xyz@email.com
Credit Card Details: 9845 8855 0000 1234
Debit Card Details: 1234 5678 9101 1123
Family Details
Income/Wealth Profile

Data Quality Framework for Indian Banking Sector
Contents
Foreword 01
Message from IBA 02
Preface 03
Executive Summary 04
Introduction 05
Chapter 1: Importance of Data Quality 06
Need for Data Quality
Regulatory Perspective
Product Pricing Perspective
Treasury Management Perspective
Risk Management Perspective
Business Development/CRM/Analytics Perspective
Data Quality Management Lifecycle
Sources of Unclean Data in Banking
Broad Taxonomy of Data Quality Issues
Chapter 2: Advent of Big Data and its Implications on Achieving Good Data Quality in Banks 09
Data Augmentation
Identifying Good Data and Bad Data
Chapter 3: New Framework for Good Data Quality 10
Workflow for Implementing Data Quality Programme
Data Collection and Administration Strategies
Chapter 4: Data Quality Management, Data Governance and Master Data Management 13
Facets of Data Quality Management
Data Quality Governance Management Framework for Responsibility and Accountability
Design of Operational/Organizational Model
Master Data Management
Way Forward for the Bank in Data Quality Management Journey
Chapter 5: HR Matters 17
Chapter 6: Case Studies 18
Karur Vysya Bank
Mailers to Customers
Data Cleaning Algorithms
ICICI Bank
Role of Logic Algorithms in Identifying Unclean Data
Initiatives Rolled Out
HDFC Bank
Methodologies Adopted
Allahabad Bank
Rolling Out Data Quality Improvement Initiative
Implementation
Data Quality Improvement in SBI
Activities at Other Banks
Action Plan for DQ Management 24
References

© An IDRBT Publication, 2014. All Rights Reserved. For restricted circulation in the Indian Banking Sector.
Foreword

Accurate data is a sine qua non for improving the quality of MIS in any organization and thus
forms the backbone of an effective Decision Support System (DSS). For most organizations
including banks, operational challenges have become a part of their day-to-day activities,
making them devise both simple and complex workarounds to compensate for inadequate
data quality. Over the years, the policy and decision-making processes have become more
information-intensive. Therefore, it is imperative to ensure quality of data and its timely
submission by banks not only to the regulator but to the banks' managements as well. This
area requires more attention, given that data quality may have an impact on the reputation
of banks besides posing other risks. Improving data quality would also assist the banks
through improved timeliness and enhanced efficiency of processes.
Towards this, the Automated Data Flow Project was initiated by the Reserve Bank to
automate flow of data from banks' internal systems to a centralised environment and then to
the regulator. By adopting the automated process for submission of returns, the banks are
now able to submit accurate and timely data to a centralised environment at their end and
then to the regulator without any manual intervention.
The Framework on Data Quality for Indian Banking Industry compiled by IDRBT is being
brought out at the right time when Indian Banks are going through a phase of
transformation. It has aptly dealt with issues relating to data quality and has suggested a new
framework towards achieving this. While doing so, it has also touched upon the relevant HR
issues which are ever so important while dealing with data and its quality. Here is an
opportunity for the banking community to make good use of the document. I appreciate the
good work done by IDRBT in this regard.

Anand Sinha
Deputy Governor, Reserve Bank of India,
Chairman, IDRBT

01
Data Quality Framework for Indian Banking Sector
Message from IBA

The new waves of regulation are driving banks to collect broader, deeper data sets, to enhance risk identification and control structures, to upgrade reporting capabilities, and to improve transparency. Data Management has been pushed to the forefront today by the multi-pronged squeeze of compliance, risk management, operating efficiencies, effective client relationships and marketing. All of these functions rely on the accuracy of data for effective decision making.

Data quality is ubiquitous. It has emerged clearly as an issue wherever data is present; therefore, data quality participates as a consideration in every information system's theme such as Business Intelligence (BI), Enterprise Resource Planning (ERP), Customer Relationship Management (CRM), Master Data Management (MDM), Service Oriented Architecture (SOA), Security, etc. Within data quality, the sub-disciplines which follow the management and implementation of data quality are: data governance and data ownership – who owns the data, who is best able to know if the data is wrong, and who knows what rules/logic to apply to repair the data; assessment and profiling – examining the status quo to identify core data quality issues; matching and cleansing – the process of cleaning the data; and monitoring and improvement – the ongoing process of monitoring and improving the overall data quality of systems.

The Institute for Development and Research in Banking Technology (IDRBT) has made an attempt to put together the Data Quality domain in the Framework on Data Quality for Indian Banking Sector. It covers the data quality issues and why it is necessary and important to solve these issues before any valuable knowledge can be extracted from the data using sophisticated software and algorithms. It provides a framework detailing various stages of data quality improvement, as well as a roadmap to achieve good data quality. It describes a workflow to improve and manage data quality on a continual basis. The Framework covers the factors and ways for evolution in Data Quality and Data Governance, and gives a Data Quality Governance Framework for Responsibility and Accountability. Using the Information Maturity Model, banks can understand where they stand today and accordingly plan next steps for achieving a higher level of data quality. The Framework also covers HR matters and recommends specialist positions to be created within the bank in order to get good quality data on a continual basis. The case studies describe efforts taken by Indian banks to improve and manage their data quality and give an overview of the techniques adopted.

I hope the Framework will be of immense use to the banking industry for improvement of their data quality activities, which will enhance performance management and reporting, including data governance, assignment of responsibilities, and processes for data definitions, ownership, validation, change control, data sourcing, etc. I thank and congratulate the Institute for Development and Research in Banking Technology (IDRBT) for doing an excellent job in preparing and timely release of this Framework.

K. Ramakrishnan
Chief Executive,
Indian Banks' Association

Preface
“Richness of Data leads to riches to stakeholders.”
Data is a key corporate asset for any organization and more so for financial institutions. As such, quality of data has significant implications for corporate governance. Data-driven decision making can be a competitive differentiator. Given the volume, velocity and variety of data (Big Data), data governance is an important component of corporate governance, especially for financial institutions. Quality of data directly impacts performance, compliance and Profit & Loss.

Analytics has emerged as an important field of business discipline. Industries like health care and retail have greatly benefited by the deployment of analytics. The banking industry in India as well is seriously exploring opportunities for deployment of analytics. Cross-sell, up-sell, lifetime value, reduction in churn and fraud detection/mitigation are some of the important business goals sought to be achieved through deployment of analytics. To achieve these important business objectives, banks are acquiring technology solutions like data warehouses, BI, BA, etc. But investment in these technologies would yield healthy ROI only if the data quality is good enough for analytics to provide actionable insights. Mere investment in technology without commensurate investment in data quality would lead to suboptimal results. Secondly, in terms of sequence, investment in data quality shall precede investment in technology solutions. This is also essential to mitigate technology obsolescence. Some of the banks have rushed into investment in data warehouse/BI/BA tools without addressing data quality issues, and consequently the ROI is poor.

Given the fact that data quality is a critical success factor for quality business growth as well as compliance, there needs to be a top-down approach to address this issue. It shall be the responsibility of the board and top management to put in place not merely policies but well-defined processes and organizational structures to improve data quality on top priority. Efforts and achievements in improving data quality need to be given due recognition, as is done for achieving business targets, since data quality improvement is also a business goal.

Besides new business processes and workflows, banks need to acquire and nurture new skill sets/roles like data architect, data steward, information architect, data scientist, etc., to execute the data quality programme successfully on an ongoing basis. The HR departments play an important role in building these new skill sets. The CMD and ED need to champion the programmes. A culture of data quality needs to spread across the entire organization. Business benefits of good quality data are immense, and the right blend of people, process and technology needs to be deployed. Given the multi-disciplinary nature of the work, direct involvement of the CMD and ED is essential and the progress needs to be reviewed quarterly.

In order to provide a framework for banks to improve data quality, IDRBT has now come out with a publication which is highly practitioner-oriented. Besides IDRBT faculty, bankers, subject experts, technical experts and academicians were involved in bringing out this publication.

I would like to acknowledge the contribution made by the working group members and other bankers.

It is interesting to have state-of-the-art technology. But it is important to have good quality data. Enlightened leadership distinguishes between "Interesting" and "Important". It is not about tools and technology, but about creating the right data culture.

B. Sambamurthy
Director, IDRBT

Executive Summary
Customer data is paramount to the success of any
business – especially in service-oriented business like
retail banking. To convert data into information and
knowledge, banks in India spend significant amount of
money to deploy Enterprise Data Warehouse (EDW)
and Customer Relationship Management (CRM)
solutions. Both EDW and CRM are the business
initiatives for furthering business growth. It is important
to observe that information is not knowledge.
Online Analytical Processing (OLAP) or Business Intelligence (BI) tools help in converting data into information. Then, to convert information into knowledge buried deep underneath the data, we need data mining (also known as predictive analytics). Consequently, the knowledge extracted from data using sophisticated tools is only as good as the quality of the data that goes into them. Apart from business needs, maintaining good and accurate data is also a regulatory compliance issue. Recent regulatory directives from Basel/RBI, etc., and business efficiency imperatives demand diligence on granular data. Hence, data quality becomes more important now, and we attempt to put together in this booklet a guide on the data quality domain for the banks.

The issue of data quality with Indian banks is manifold. These issues range from data collection and data entry to timely updation of changes to customer data.

Chapter 1 highlights data quality issues and why it is necessary and important to solve these issues before any valuable knowledge can be extracted from customer data using sophisticated software and algorithms.

Chapter 2 briefs about the significance and impact of Big Data on banking. It identifies certain key characteristics that differentiate good data from bad data, thereby addressing data quality issues that banks face while deploying analytics.

Chapter 3 deals with approaches to address data quality issues. It provides a framework detailing various stages of data quality improvement in banks, as well as a workflow to achieve and maintain good data quality.

Chapter 4 explains the factors and ways for evolution in Data Quality, and consequently enables better Data Governance.
Chapter 5 deals with the HR perspective and provides
an organizational structure to monitor, improve and
manage data quality on a daily basis.
Chapter 6 provides detailed case studies of efforts taken
by some of the Indian banks to improve their data
quality. It provides an overview of the techniques
adopted by these banks to improve and manage data
quality.
Innovative incentive programmes for both the
customer and staff may be useful for improving and
retaining data quality. All delivery channels may be
leveraged for data quality improvement programmes.

Introduction

The quintessential importance of data quality in the banking industry is best appreciated if one understands the Fourth Paradigm in Science. Science has four important paradigms or facets viz., (i) Theoretical, (ii) Experimental, (iii) Computational and finally (iv) Data-Intensive scientific discovery. Over the last few decades, computational aspects stole the limelight, primarily because many theories could be best explained by resorting to mathematical modeling and the associated computational structures. However, during the last decade, when all fields of science and engineering experienced a glut or abundance of data, data mining/data analytics/data science became absolutely essential. Thus arose the significance of the fourth paradigm.

Banking and finance is replete with computational and data-oriented paradigms. While financial engineering exploits the computational paradigm, banking can thoroughly benefit from the fourth paradigm viz., data mining/data science, so much so that one can get extremely accurate results solely based on the data about customers. However, the prerequisite for such highly accurate results is high quality data.

The relevance and importance of the fourth paradigm to banking is best understood if one realizes that most of the business problems in banking and finance can be solved efficiently by analyzing customer data. Therefore, the purpose of data is to be able to achieve business goals. This makes it necessary to identify what needs to be measured and recorded. A comprehensive programme on data quality shall cover the following:
§ Data governance strategy and framework which defines policies, processes, and organizational structures to manage data quality
§ Ability to quantify the amount of bad data and, more importantly, what it costs the business
§ Proper knowledge of data quality management principles, tools and processes
§ Defining and delineating data quality responsibilities between IT and business
§ Proper software solutions to automate the key functions of data entry and data quality management, from profiling and cleansing to standardization, monitoring and reporting.

Poor data quality could be of two types: wrong data being populated into the forms, or inadequate data being collected (attributes missing in the database). While wrong data can drive data mining algorithms to wrong conclusions (by way of gaining wrong knowledge), inadequate data can make it impossible to mine knowledge from the data. Collecting wrong data adds cost to the process without adding value. Since data is meant for the purpose of driving the business of the bank, it is important to identify and design collection forms accordingly. For example, while it is currently easy to find the total number of accounts, there is no way to determine how many unique customers exist and how many households or families (or other user-defined groups of customers viz., employees of a common company, doctors, etc.) a particular bank serves.

This problem can be solved by gathering additional information from a new customer, such as whether he or any of his immediate family (dependents) currently holds an account at the same bank. Based on this information, the reverse relationship can be easily established by automated tools. Such information is more valuable for cross-selling other products (viz., insurance policies) and establishing better credit lines to the customers. Hence, it is important to design forms so as to be able to collect all the necessary and relevant information. The failure to do so will increase the cost of gathering information.

Chapter 1: Importance of Data Quality

Need for Data Quality

The importance of good quality data for banks is manifested in five perspectives viz., (I) Regulatory, (II) Product Pricing, (III) Treasury Management, (IV) Risk Management and (V) Business/CRM/Analytics (see Figure 1).

[Figure 1: Need for Maintaining Data Quality – the five perspectives (Regulatory, Product Pricing, Treasury Management, Risk Management, Business/CRM/Analytics) arranged around the central purpose]

Regulatory Perspective

As per regulations, every bank should have accurate customer data to trace the account of a specific individual. This is necessary to deal with cases of possible money laundering, etc. Know Your Customer (KYC) policy forms, if completely filled by the customers, help a bank in this regard. Further, data quality is important for the successful implementation of Automated Data Flow (ADF). With the advent of global banking and globalization, it is now necessary to understand the perspectives of other banking regulators and agencies across the globe. Various governing bodies have set guiding principles and metrics to benchmark the quality of data in a bank. Figure 2 provides a summary of data quality attributes defined by some of these governing bodies.

[Figure 2: Data Quality Standards Defined by Different Governing Bodies – Bank of England, European, Basel Committee and the Indian scenario]

Product Pricing Perspective

Correct and real-time data in the following helps this crucial area, for example:
§ Maturing asset/liability/investment data and competitor/market rates data, for comparative study to judge alternatives of funds sourcing/deployment and their impact. These are the bases for interest re-pricing in both asset and liability products, promotion/cool-down of selected products, or introducing new products and their prices
§ Data on resources, e.g., fixed assets, implements (computer/software/rent/space), human costs, etc., and competitors' rates. This is to recognize costs and pricing of services, like remittance/safe custody/guarantee/various account-related charges, etc. Non-interest income is a very big source of earning.
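Accurate customer records of the kind the KYC process demands can be partially verified mechanically. A minimal sketch, assuming illustrative field names and using the public formats for PAN (five letters, four digits, one letter), Aadhaar (twelve digits) and Indian mobile numbers (ten digits, commonly beginning with 6 to 9):

```python
import re

# Hypothetical field names; the regexes encode the publicly known formats.
RULES = {
    "pan": re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]"),
    "aadhaar": re.compile(r"\d{12}"),
    "mobile": re.compile(r"[6-9]\d{9}"),
}

def validate(record: dict) -> dict:
    """Return a field -> bool map for each field a rule exists for."""
    results = {}
    for field, pattern in RULES.items():
        value = record.get(field, "").replace(" ", "")  # tolerate spacing
        results[field] = bool(pattern.fullmatch(value))
    return results

record = {"pan": "BACGB0022C", "aadhaar": "9876 5432 1234", "mobile": "90000 00009"}
print(validate(record))  # all three fields pass their format check
```

Format checks like these catch impossible values at the point of entry; they do not, of course, establish that a well-formed value actually belongs to the customer.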

Treasury Management Perspective

Correct and real-time data in the following areas are vital for efficient operations:
§ Asset-Liability Mismatch Data Over Time Ranges – Required for deciding funds need levels and market borrowings/investments. Over/under borrowing/investment may mean undesired losses
§ Granular asset/liability maturity dates, interest rates/accrual figures/re-pricing times and expected rates ahead, and renewal probability in the portfolio. Errors in micro-levels of data may prove costly
§ Correct Loan Data in Detail – Required for correctly classifying standard/doubtful/bad assets, primary/collateral security coverage, and classification of loans with their parameters. This is required for provisioning as well as capital requirements that significantly affect the business result
§ Physical assets and valuations, employee costs and classifications, in fact, all items used to do business with – affects cost recognition, insurance outlays, cost controls, provisions, and profitability
§ Aggregated Data of Deposits, Investments, etc. – To comply with RBI directives on SLR/CRR, etc. Lack of completeness/recency may mean over-borrowing/under-investment causing financial loss
§ Correct past data of credits with delinquencies, write-offs, correct investment history data and market data with trends, etc. – Required for correct Risk Capital computations. In the absence of good data, the provisions demand higher capital and costs, affecting the business result of the bank, and may also restrict scales of business
§ Correct HR Information – Ability, strengths, job history, personal background, etc., to utilize human resources well for business. Very few banks, despite some HRMS/personnel data, are in a position to leverage staff well for productivity as also for churn reduction.

Risk Management Perspective

Without good quality data, banks struggle to sustain sufficient liquidity and deal with tighter regulatory scrutiny. Yet, most banks are still relying on reactive approaches while responding to the escalating data demands. With Basel III, there will be a lot of emphasis on risk, capital and liquidity management. Therefore, smart banks recognize the urgent need for a full review and possible overhaul of their data quality, integrity, underlying architecture and governance. The solid business case for an early and strategic approach to the development of Basel III data capabilities includes the ability to manage market stresses, funding and liquidity constraints more confidently and effectively. Without this strategic approach to data, many of the problems exposed by the financial crisis, including the lack of consistency and alignment in the information coming from risk, finance and the front office, will be perpetuated.

Business Development/CRM/Analytics Perspective

Data Quality is necessary for obtaining the best RoI from EDW/CRM implementation:
§ Customer segmentation for acquiring customers through target marketing
§ Cross-Sell/Up-Sell
§ Mitigate various frauds on a real-time basis
§ Predict and reduce customer churn
§ Increase customer loyalty and maximize Customer Lifetime Value (CLV) for the enterprise
§ Developing better credit scoring models and improving asset quality
§ Achieve Single Version of Truth
§ Achieve 360° view of the customer.

Data Quality Management Life Cycle

Technology that currently exists allows organizations to improve and consolidate corporate information. The five components shown in Figure 3 - data profiling, data quality, data integration, data
enrichment and data monitoring - are best addressed through a single platform, providing a unified view of any type of data, including customer and product information. The technology companies who figure in the Leaders quadrant of Gartner's Magic Quadrant as of October 2013 are IBM, Informatica, SAP, SAS and Trillium Software, in alphabetical order.

[Figure 3: Different Aspects of Data Quality Management – the five components arranged as a cycle around enterprise data]

Sources of Unclean Data in Banking

Figure 4 identifies the three primary causes for poor data quality in banks: data entry errors, data migration issues and data collection issues. While some of these issues can be addressed by improved business processes, others can be overcome by appropriate use of technology. Figure 5 shows the delineation of data quality issues between the two.

[Figure 4: Primary Causes Affecting Data Quality – data entry errors, data migration issues and data collection issues]

Business Process
§ Establishing DQ standards and customer on-boarding processes for various business verticals
§ Establishing acceptance criteria for customer information pertaining to name, address, mobile number and email address
§ Training staff responsible for capturing customer information.

Technology
§ Taking care of DQ issues during system migration
§ DQ issues during integration of various software systems, especially during business mergers
§ Reducing data entry errors.

Figure 5: Delineation of Issues between Business Process and Technology

Broad Taxonomy of Data Quality Issues

Banks employ software tools to regularly monitor and identify gaps in the customer data. When gaps are identified, software tools are used to fill them in. So, it is necessary to understand the broad nature of data quality issues. This taxonomy can be used to identify and compare features of different software tools.

[Figure 6: Taxonomy of Data Quality Issues – data migration, data integration, de-duplication, data enrichment, master data management and data cleansing, around the central data quality issue]
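The data profiling component mentioned above can be illustrated with a short sketch. The field names, sample values and the treatment of blanks are assumptions for the example, not a tool's actual behaviour:

```python
# Per-field completeness and distinct-value counts over customer records,
# the kind of summary a profiling step produces before cleansing begins.
def profile(records):
    fields = sorted({f for r in records for f in r})
    report = {}
    for f in fields:
        values = [r.get(f) for r in records]
        # Treat None, empty strings and a placeholder "NA" as missing.
        present = [v for v in values if v not in (None, "", "NA")]
        report[f] = {
            "completeness": round(len(present) / len(records), 2),
            "distinct": len(set(present)),
        }
    return report

customers = [
    {"name": "XYZ", "city": "Bengaluru", "pan": "BACGB0022C"},
    {"name": "ABC", "city": "Bengaluru", "pan": ""},
    {"name": "PQR", "city": "", "pan": None},
]
print(profile(customers))  # e.g. pan is only one-third complete
```

A report like this points the cleansing effort at the worst-populated fields first, and the same counts feed naturally into the monitoring stage.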

Chapter 2: Advent of Big Data and its Implications on Achieving Good Data Quality in Banks

Of late, there is a new paradigmatic shift that has taken place in the form of Big Data, which is set to have a profound influence on the way banking will be done in future. It is a catch-all word that connotes new dimensions of data, including volume, velocity and variety. Volume refers to the sheer size of the data in terms of transactional data with respect to several products and customers. Velocity refers to the speed with which customer data comes on board. Customer views aired on their banking experience in social media are an example of this dimension. Variety refers to the different types of data viz., structured and unstructured. Transaction data, demographic data and geographic data form part of the structured type, while customer complaints, e-mails, voice recordings from the call center and customer views in social media, etc., form part of the unstructured type. With new channels making waves across Gen-X and Gen-Y customers, the unstructured data becomes a gold mine of customer data for identifying cross-sell opportunities, detecting customer churn and detecting frauds in near-real time. The quality of unstructured data is as important as that of structured data. Therefore, Big Data throws a new challenge to the banks in their data quality initiatives. It requires new investments in IT to deal with this new phenomenon, which would improve the bottom line for the banks.

During the last few years, Indian banks have successfully deployed powerful transaction processing engines by way of CBS. While banks achieved significant productivity gains, data quality has not received the desired effort in the migration process. Banks need to evolve to the next stage of information processing and management, ultimately progressing to the deployment of analytics. There has been a long pause in this journey. Before they rush to technology deployment, banks need to address serious issues of data quality, which is one of the main barriers to the deployment of analytics. Many have set up data warehouses without satisfactorily addressing data quality issues.

Data Augmentation

This aspect is closely related to Big Data. Many case studies have demonstrated that augmenting structured data (described above) with unstructured data viz., customer complaints, e-mails, voice recordings from the call center and customer views in social media, indeed improved the accuracies of churn models, cross-sell models, fraud detection models, etc. Further, data augmentation also connotes collecting customer data from external sources such as credit bureau reports, BASEL guidelines, RBI circulars, stock market data, census data, etc. This further enhances the level of success in our endeavour to achieve a 360° view of customers.

Identifying Good Data and Bad Data

Bad data is generally characterized by:
§ Improbable values
§ Impossible values
§ Illogical values
§ Mis-informative values
§ Missing values

Based on the above characteristics of bad data, good data quality can be ensured by checking for the following aspects in the bank's data:
§ Frequency of errors
§ Unreasonable distributions
§ Detection of unexpected quality
§ Numeric variables
§ Out-of-range data
§ Outliers
§ Logic of data

Today, technological advances have been made in terms of software tools and algorithms to identify unclean data and mitigate the same. A single version of truth and a single view of the customer shall be the important hallmarks of data quality.

Chapter 3: New Framework for Good Data Quality
In this chapter, we propose and recommend a new framework for achieving data quality in the banking industry
in India. It encompasses all aspects of data quality. Figure 7 provides the set of activities that need to be executed
to improve and maintain data quality in the long run.

[Figure 7: Activity Classification Chart for Improving Data Quality – six work streams (Data Domain Definition, System Mapping, Gap Analysis, Data Quality Remediation Strategy, Remediate, Monitor and Control), each with its constituent activities, from building the team and common definitions through profiling, gap identification and remediation to scorecard-based monitoring]

Descriptions of these work streams are listed below:

Data Domain Definition
§ Development of common business definitions for a data domain
§ Develop standards about the meaning and format of each critical data element.

System Mapping
§ Identify systems that contain data for the domain defined in Data Domain Definition
§ Map each critical element as defined through Data Domain Definition to the actual field instance in the system.

Gap Analysis
§ Identify collection and discovery of data quality issues
§ Identify and log issues that are in 'Data Domain Definition' and 'System Mapping'.

Data Quality Remediation Strategy
§ Identify steps and tools necessary to research a data quality issue and develop options to remediate it
§ Develop remediation plans to address specific data quality issues.

Remediate
§ Prioritize, obtain funding and execute approved remediation plans
§ Data quality issues should be resolved to the level required by the business.

Monitor and Control
§ Monitor data quality
§ Measure effectiveness of remediation plans related to data quality
§ Develop a Data Quality Scorecard and identify appropriate metrics to measure data quality.

Workflow for Implementing Data Quality Programme

Figure 8 establishes a workflow diagram that explains the input, output, controls and mechanisms to implement
data quality programme in the bank.
Each activity is represented as a block: arrows entering on the left are the inputs necessary to execute it, and arrows leaving on the right are its expected outputs. For each activity, arrows on the top indicate the controls that determine the scope of the activity, and arrows at the bottom indicate the mechanism through which the activity is achieved, i.e., the person in the bank responsible for it.
[Figure 8 shows an activity chain: A0 (Define Business Purpose for Data), A1 (Identify Data Fields Necessary), A2 (Identify Sources of Data), A3 (Create and Populate Test Database), A4 (Evaluate Data Quality), A5 (Identify Fields in Error and Reasons), A6 (Populate Master Database) and A7 (Identify Possible Technologies to Improve Quality). Records failing the "Meets Data Quality Thresholds?" check at A4 loop back through data cleaning and enrichment. Controls include thresholds for acceptable data quality, interoperability of business operations and business growth; mechanisms include the Data Architect, Information Architect, Data Quality Manager, Board, Data Steward and Technology Specialist. Activities A0 to A3 are to be jointly handled by both Business and IT teams; the remaining activities are to be handled by the IT team.]

Figure 8: Work Flow Diagram for Continuous Monitoring and Management of Data Quality
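The evaluate-and-loop portion of this workflow (evaluate data quality against business-set thresholds, re-clean failing fields, then populate the master database) can be sketched as follows. The threshold value, quality measure and cleaning stub are illustrative assumptions, not part of the framework:

```python
# Sketch of the Figure 8 loop: evaluate data quality (A4) against a
# threshold; if it fails, identify fields in error (A5) and re-clean,
# otherwise the record is ready for the master database (A6).
# All names and values are illustrative assumptions.

THRESHOLD = 0.95  # assumed acceptable-quality threshold set by the business

def quality(record):
    """Fraction of fields that are non-empty (a stand-in quality measure)."""
    return sum(1 for v in record.values() if v) / len(record)

def fields_in_error(record):
    """A5: list the fields failing the (stand-in) quality rule."""
    return [k for k, v in record.items() if not v]

def process(record, clean):
    """Loop until quality meets the threshold, then hand over to A6."""
    while quality(record) < THRESHOLD:
        for field in fields_in_error(record):
            # clean() must fill the field, otherwise the loop cannot end
            record[field] = clean(field)
    return record  # ready to populate the master database

fixed = process({"name": "XYZ", "city": "", "state": ""},
                clean=lambda f: "UNKNOWN-" + f.upper())
print(fixed)
```

In practice the cleaning step would call the tools and algorithms selected at A7 rather than a stub.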

Data Collection and Administration Strategies

Figure 9 presents two broad implementation strategies to manage data quality.

[Figure 9 contrasts a Centralized Approach, in which data science sits with IT, with a Decentralized Approach, in which data science sits with Business.]

Figure 9: Implementation Strategies for Dealing with Data Quality

Each of these approaches has its own advantages and disadvantages; both are evaluated in Table 1.

Centralized Approach

Advantages:
§ Ensures data quality proactively
§ Ensures proper quality checks at data collection and data entry points
§ Easy to establish a unique identity for each customer
§ Saves administrative costs in terms of manpower and software necessary to deal with incomplete and inaccurate data.

Disadvantages:
§ Delay in start of banking services to the customer
§ Multiple interactions with the customer might be necessary to make sure correct data is being captured
§ All data entry operations are moved to a back-end office.

Decentralized Approach

Advantages:
§ Allows for immediate start of banking services
§ Only basic information for account activation is sufficient
§ Takes a reactive approach to improving and maintaining data quality.

Disadvantages:
§ Capturing missing customer data is difficult as the customer may not be motivated to provide it
§ Cost of managing multiple systems: a front-end office, often called a staging area, and a back-end office for Master Data.

Table 1: Comparison between Centralized and Decentralized Banking Systems

Chapter 4 Data Quality Management, Data Governance and Master Data Management
Managing a sustainable data quality process requires a data governance strategy and framework. Data governance often lands in the hands of IT by default. When banks focus on data for data's sake, they miss the broader picture, i.e., data is only as valuable as the business processes, decisions, and interactions it enables and improves. The ultimate objective of data quality governance is to generate the greatest possible return on data assets. If the business wants to be sure to capture critical opportunities to leverage data to support operations, strategy, and customer experience, it needs to govern data assets as it does other enterprise assets such as financial securities, cash, and human resources (HR).

Data is a key asset of the bank and, like any other asset, it has to be carefully managed, administered and guarded. But unlike other assets, data quality is both an IT and a business issue, and it is a continuous process. Therefore, successful management, data quality administration and governance require a combination of both tactical and strategic skills, as highlighted in Figure 10. Figure 10 also provides a hierarchical view of activities, from local data quality management at the lowest level to the IT governance activity of the Bank at the highest level.

Figure 10: Relationship between Data Quality Management and IT Governance

Facets of Data Quality Management

Vision and Business Case
Data governance is not just about the data. It is about the business processes, decisions and stakeholder interactions you want to enable. The major benefits of a good data management process are:
§ Productivity improvements through reduced average turnaround time, viz., handling time in the call centre's inbound support line
§ Revenue growth through increased campaign response rate
§ Lower direct marketing costs.

Policies
§ Establish policies for data accountability and ownership, organizational roles and responsibilities, data capture & validation standards, information security and data privacy guidelines, data access and usage, data retention, data masking and archiving policies.

Organizational Alignment
§ Establishes a hierarchical relationship between different roles and teams of people
§ Provides details of the responsibilities of each role and team.

People
Having defined roles, and the right people in them to support, sponsor, steward, operationalize, and ultimately deliver a positive return on data assets, is important in any data governance programme. The major objectives to be accomplished by the team are:
§ Support: Co-ordination of communication between different business teams
§ Sponsor: Deal with prioritization and funding
§ Steward: Help establish the relationship between data and business objectives.

The above objectives, when met, will ensure positive returns on the data assets.

[Figure 11 depicts the facets of data quality management: Vision & Business Case, Policies, Organizational Alignment, People, Tools & Architecture, Dependent Processes, Programme Management, Measurement and Monitoring, and Defined Processes.]

Figure 11: Facets of Data Quality Governance

Programme Management
§ Develop a multi-phase, multi-year plan for data quality management
§ Effective programme management must ensure adoption, visibility, and momentum for future improvements.

Dependent Processes
§ Identify the upstream and downstream processes impacted by data quality. This is done by understanding the lifecycle of data
§ The upstream business processes create, update, transform, enrich, purchase or import data
§ Downstream processes are operational and analytical processes that consume, and derive insights and value from, data.

Tools and Architecture
Specific enabling software capabilities that should be considered to help launch a data governance effort include tools that:
§ Aid in establishing the relative importance of data fields and also help understand the mutual relationship between data in different systems
§ Facilitate building and configuring rules to determine the validity of data as it first enters the system and also when it moves between different systems
§ Facilitate data visualization so that business managers can easily interpret the data and get the right message
§ Trigger timely alerts should any inconsistency in data be detected across different business verticals.

Measurement and Monitoring
§ Identify and highlight the qualitative level of organizational influence and ensure the data quality governance efforts deliver
§ Quantitative business value measurement: revenue growth, cost savings, risk reduction, efficiency improvements, customer satisfaction.

Defined Processes
Processes that cleanse, repair, mask, secure, reconcile, escalate, and approve data discrepancies, policies and standards.

Data Quality Governance Management Framework for Responsibility and Accountability

One of the biggest historical problems with data governance is the absence of follow-through; while some organizations may have well-defined governance policies, they may not have established the underlying organizational structure to make them useful. This requires two things: the definition of the management structure to oversee the execution of the governance framework, and a compensation model that rewards that execution.

Design of Operational/Organizational Model

Sponsorship: Deputy General Manager, Chief Data Officer (CDO)
§ Warrants the enterprise adoption of measurably high-quality data
§ Negotiates quality SLAs with external data suppliers
§ Reports to GM (IT) and GM (Marketing).

Oversight: Data Quality Management Council
§ Strategic committee composed of business clients to oversee the quality programme
§ Ensures data quality priorities are set and aided by business goals
§ Delineates data accountability.

Coordination: Data Quality Coordination Council
§ Tactical team headed by the Chief Manager (DQ)
§ Ensures data activities have defined metrics and acceptance thresholds
§ Ensures data quality meets business client expectations
§ Manages governance across lines of business
§ Sets priorities for LOBs
§ Communicates opportunities to the oversight committee.

Stewardship: LOB Data Stewards
§ Data quality governance structure at the business level
§ Defines data quality criteria for LOB expectations
§ Delineates stewardship roles
§ Reports activities and issues to the Data Quality Coordination Council.

Figure 12: A Framework for Data Quality Management

Master Data Management

The adoption of Master Data Management (MDM) promises many benefits ranging from business agility and
improved business performance to increased revenue and lower IT and business costs. However, according to
Gartner Inc., achieving these benefits often entails overcoming formidable technical, organizational and
political hurdles.
Gartner defines MDM as a technology-enabled discipline that ensures the uniformity, accuracy, stewardship and
semantic consistency of an enterprise's official, shared master data assets. Organizations use master data for
consistency, simplification, uniformity of process, analysis and communication across the business.
What MDM is not | What MDM is
About implementing technology | About understanding how business processes are supposed to work.
Just a project | MDM is implemented as a programme that forever changes the way the business creates and manages its master data.
Same as Enterprise Data Warehouse (EDW) | MDM should/will span the organization across all business units and processes (including data stores, operational and analytical).
A substitute for Enterprise Resource Planning (ERP) | ERP generally means a packaged business application strategy, most often centred on a single, large vendor. ERP implied, but rarely realized for the user organization, a single process and data model across the organization.
Just for large, complex enterprises | The principle of MDM applies whenever two or more business processes must view or share (master) data. The size of the organization does not matter.

What MDM is not | What MDM is
An IT effort | MDM must be driven by the business and a business case, and supported/enabled by IT.
Small; it is too big an effort to handle | MDM can be, and presently most often is, adopted one domain at a time and one use case at a time.
Separate from data governance and data quality | MDM includes governance (of master data) and data quality (of master data); MDM cannot be established without them.
Dependent on a vendor, as every vendor's MDM has the same features | Vendor MDM capability has focused on specialization across data domain, industry, use case, organization and implementation style. Consequently, vendor selection is critical if organizations are to find the right partner.

Way Forward for the Bank in Data Quality Management Journey

Finally, before a bank can move forward in this journey, it is imperative for the bank to first understand where it stands today and accordingly plan the next steps for achieving a higher level of data quality. Based on the above discussion, we recommend the following maturity model, which qualitatively identifies the current maturity level of a bank's data quality process and helps it plan its way ahead in this journey. Figure 13 gives the data quality maturity model and the business capabilities that a bank can accomplish at each of these different stages.

[Figure 13 plots four maturity stages against technology adoption (from database marketing, sales force automation and CRM, through ERP, data warehousing and customer/product/employee/location MDM, to business process automation) and business capabilities, with risk highest at the undisciplined end and reward highest at the governed end:
§ Undisciplined: IT-driven projects; duplicate, inconsistent data; high cost to maintain multiple applications.
§ Reactive: line of business influences IT projects; little cross-functional collaboration; inability to adapt to business changes.
§ Proactive: IT and business groups collaborate; enterprise view of certain domains; data is a corporate asset.
§ Governed: business requirements drive IT projects; repeatable, automated business processes; personalized customer relationships and optimized operations.]

Figure 13: Maturity Model for Data Quality Management

Chapter 5 HR Matters
We suggest that the following specialist positions be created within a bank in order to obtain good quality data on a continual basis:
Job Title: Deputy General Manager, Chief Data Officer (CDO)
Responsibilities: Dual reporting to IT and Marketing Heads. Participate in the Steering Council meetings with the Board and understand business expectations from data. Report ongoing data quality activities/initiatives to the Board.
Qualification: MCA with 15 years of work experience in the Banking and IT domain.
Experience: Sound knowledge of business operations; knowledge of Statistics.

Job Title: Chief Manager (Data Quality)
Responsibilities: Organize Data Quality Coordination Council meetings. Work closely with data quality technical staff, application developers and the data owners/subject matter experts from the business. Establish a Data Quality Methodology. Collaborate directly with the business data owners to establish the data quality business rules for dealing with data quality issues.
Qualification: B.Tech/M.Tech (CSE/IT) with 5 years of work experience.
Experience: Practitioner of Statistics and IT tools for Statistics.

Job Title: Data Architects
Responsibilities: Establish measures to chart progress related to completeness and quality of metadata for enterprise information, to support reduction of data redundancy and fragmentation, elimination of unnecessary movement of data, and improvement of data quality. Ensure the accuracy and accessibility of all important data. Put in place governance processes around metadata to ensure an integrated definition of data for enterprise information, and to ensure the accuracy, validity, and reusability of metadata.
Qualification: B.Tech/M.Tech (CSE/IT) with 3 years of work experience.
Experience: Database application development, process design, data quality, ETL development and data migration, of which three years has been in an analytical or management role.

Job Title: Data Stewards
Responsibilities: Manage data assets in order to improve their reusability, accessibility, integrity, consistency and structure. Develop measures of customer data quality, maintain metrics and publish them to stakeholders. Create dashboards and report on data quality metrics. Resolve data integration issues. Work with IT to administer business metadata within systems and tools.
Qualification: M.Sc. (Statistics/Operations Research) with 3 years of work experience.
Experience: 2 years of experience in data processing/management and reporting.
Chapter 6 Case Studies
In this chapter, we share approaches adopted by Indian banks to improve data quality in their operations. The root cause of the data quality problem in Indian banks is two-fold. The first is missing or wrong data. The second is data errors, which are largely attributable to data entry mistakes.

Karur Vysya Bank

Mailers to Customers
We have sent statements of accounts to current account & CC account holders, where the reject percentage is low, and on a test basis also to SB account holders, where the reject percentage is high.
Hence, we have sent the KYC Self-Declaration document, with a self-addressed envelope, to the customers who have been with our bank for more than five years, and requested them to either submit the form at the nearest branch or drop the self-addressed postal envelope at the nearest post box with the latest photograph, proof of identity and proof of address. The collected details are updated at a centralized processing zone. On enquiry with the centralized processing zone, we found that the latest KYC documents were received from 20% of the customers to whom letters were sent.

Data Cleaning Algorithms
Initially, when we decided to clean the customer data, we chose to clean the following fields: Country, State, City, Pin code, Gender, Mobile Number, Email ID and PAN number, which are the key fields required for any customer communication, in addition to the address fields. Unclean data in these fields has several costs:
§ If the bank sends a physical letter to a wrong address, it incurs the cost of printing and postage and also operational cost
§ Intimation of new facilities/product features cannot be communicated to the customer
§ Sending multiple mails to the same person is incredibly unprofessional
§ Special day wishes cannot be sent, viz., Birthday, Anniversary, Doctor's Day, Women's Day, etc.

To start with, we formed a team to identify incomplete, incorrect, inaccurate and irrelevant parts of the data and arrived at an approach to be followed for replacing, modifying, or deleting the irrelevant data. We faced challenges in cleaning migrated data: either the data is not available or the data available is incorrect. During the migration, wherever DOB was not available, we put in a standard date, 01-Jan-1800. If such a customer comes in for opening a new account or for renewal of a deposit/facility, we have put in a system check that does not proceed until the correct date of birth is given. Apart from the above, after migration, we have started collecting income, education, dependency details, profession, etc.

Field: Country
Problem: Free text, containing special characters, no uniformity in names, e.g., India, INDIA, Ind, etc.

Data Cleaning Algorithm:
§ All the irrelevant data in the country field, i.e., values not available in the country master, have been removed
§ If the name of an Indian city/town or Indian state is available in the address field, the country code has been changed to India
§ Normalized the field by converting the field values into upper case as in the country master.

Data Validation Algorithm:
§ Free text removed
§ Pick list provided at the data entry stage to bring it to a standard format
§ Cross validation built across the fields.

Field: State
Problem: Free text, containing special characters, no uniformity in names. The state name Tamil Nadu has been represented in more than 1 lac patterns, like Tamil Nadu, T.N., T. Nadu, TamilNaadu, Tamilnadu, Tamil Nad, etc.
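Variants like these can in principle be collapsed onto a master entry by canonicalizing the text and fuzzy-matching it against the master list. The sketch below is illustrative only (the master list and similarity cutoff are assumptions), not the bank's actual algorithm:

```python
# Normalize free-text state names against a (hypothetical) state master
# using stdlib fuzzy matching. Master entries and cutoff are assumptions.
import difflib

STATE_MASTER = ["TAMIL NADU", "KARNATAKA", "KERALA", "ANDHRA PRADESH"]

def canon(text):
    """Keep letters only and upper-case, e.g. 'T. Nadu' -> 'TNADU'."""
    return "".join(ch for ch in text.upper() if ch.isalpha())

def normalize_state(raw):
    """Map a raw variant to a master entry, or None if no close match."""
    target = canon(raw)
    match = difflib.get_close_matches(
        target, [canon(s) for s in STATE_MASTER], n=1, cutoff=0.6)
    if not match:
        return None
    # Recover the master spelling behind the matched canonical form
    return next(s for s in STATE_MASTER if canon(s) == match[0])

for variant in ["Tamilnadu", "T. Nadu", "TamilNaadu", "Tamil Nad"]:
    print(variant, "->", normalize_state(variant))
```

Ambiguous inputs that fall below the cutoff would still need manual review, which is where the branch checks described below come in.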

Data Cleaning Algorithm
§ Got a CD from the Postal Department. Created a separate master table for State, City & Pin code for data cleansing
§ All the irrelevant data in the state field, i.e., values not available in the state master, have been removed
§ If the name of an Indian city/town or Indian state is available in the address field, the relative state codes have been corrected
§ Checked on the internet, in case of any doubt, in arriving at the state name
§ Contacted the customer's home branch, in case of any doubt, in arriving at the state name
§ The geographical spread of our branches was predominantly Tamil Nadu, Andhra Pradesh, Karnataka, Kerala and major cities in the Northern states when we started data cleaning. Our data cleaning team had people belonging to those states or with work experience in those geographical areas, and with their help the data cleaning was done
§ Normalized the field by converting the field values into upper case as in the state master.

Data Validation Algorithm
§ Free text removed
§ Pick list provided at the data entry stage to bring it to a standard format. First, the branch has to choose the country code from the pick list. If the country chosen is India, the list of states is displayed in the pick list and the branch has to choose the state code from it
§ Cross validation built across the fields.

Field: City
Problem: Free text, containing special characters, no uniformity in names.

Data Cleaning Algorithm
§ Got a CD from the Postal Department. Created a separate master table for State, City & Pin code for data cleansing
§ If the name of an Indian city/town or Indian state is available in the address field, the relative city codes have been fetched and updated in the city name
§ Contacted the customer's home branch, in case of any doubt, in arriving at the city names
§ The geographical spread of our branches was predominantly Tamil Nadu, Andhra Pradesh, Karnataka, Kerala and major cities in the Northern states when we started data cleaning. Our data cleaning team had people belonging to one of those states or with work experience in that geographical area, and with their help the data cleaning was done
§ Normalized the field by converting the field values into upper case.

Data Validation Algorithm
§ Free text removed
§ Pick list provided at the data entry stage to bring it to a standard format. If the state chosen is Tamil Nadu, only the town names available in that state are displayed, and the branch has to choose the appropriate city name from the pick list
§ Cross validation built across the fields.

Field: Pin codes
Problem: Free text, containing special characters, non-numeric characters found.

Data Cleaning Algorithm
§ Got a CD from the Postal Department. Created a separate master table for State, City & Pin code for data cleansing
§ After doing the city correction, by referring to the city name, the pin code details are corrected
§ Contacted the customer's home branch, in case of any doubt, in arriving at the pin codes
§ The geographical spread of our branches was predominantly Tamil Nadu, Andhra Pradesh, Karnataka, Kerala and major cities in the Northern states when we started data cleaning. Our data cleaning team had people belonging to one of those states or with work experience in that geographical area, and with their help the data cleaning was done
§ Normalized the field by converting the field values into upper case.

Data Validation Algorithm
§ Free text removed
§ Pick list provided at the data entry stage to bring it to a standard format. If the city chosen is Chennai, the first 3 digits of the pin code are displayed automatically; the branch has to type only the remaining three digits. In that way, we have restricted the error level to the last three digits only
§ Cross validation built across the fields.

Field: Prefix & Gender
Problem: Free text, containing special characters.

Data Cleaning Algorithm
By cross verifying the gender with the prefix along with the customer name, the data cleaning has been done.
Ex: If the gender is male but the prefix is marked as Ms., by reading the name and verifying the customer photo, either the prefix or the gender is corrected.

Data Validation Algorithm
Cross validation built between prefix and gender at the data entry level itself. If the prefix is chosen as "Mr.", then the gender cannot be chosen as anything other than male.

Field: Mobile Number
Problem: Free text, containing special characters, non-numeric characters found.

Data Cleaning Algorithm
§ We have removed all the spaces, special characters and character values other than numeric characters
§ If the country chosen is India and the length of the mobile number field is 10 digits, such data are prefixed with the country code
§ If the length of the mobile number is less than 10 digits, we concluded that the number is a wrong number, and those are removed.

Data Validation Algorithm
§ We adopted a parsing method at the data entry stage. Parsing in data cleansing is performed for the detection of syntax errors. Data is validated on whether the entry is within the allowed data specifications, i.e., the length of the mobile number should be 12 digits including the country code, and it should be numeric
§ Repeated numbers are not accepted (e.g., 11111111111).

Field: Email ID
Problem: Free text containing special characters; non-numeric characters found.

Data Cleaning Algorithm
§ Special characters are removed, other than dots and the @ symbol
§ Ensured there are no spaces in the mail ID
§ Checked that only one @ symbol and one dot are available.

Data Validation Algorithm
§ We adopted a parsing method at the data entry stage. Parsing in data cleansing is performed for the detection of syntax errors. Data is validated on whether the entry is within the allowed data specifications, i.e., one @ symbol and one dot.

Field: PAN Number
Problem: Free text, containing special characters and spaces.

Data Cleaning Algorithm
§ Removed special characters and spaces
§ Checked whether the first five characters are alphabets, the next four are numeric and the last is an alphabet
§ If any PAN number is typed like AAAAA1111A, such PAN numbers are removed.

Data Validation Algorithm
§ We adopted a parsing method at the data entry stage. Parsing in data cleansing is performed for the detection of syntax errors. Data is validated on whether the entry is within the allowed data specifications, i.e., the first five characters are alphabets, the next four are numeric and

the last digit is an alphabet
§ We have also provided, in our in-house software, verification of the PAN number given on the NSDL site.

ICICI Bank

The primary reason for the bank to take up data cleaning activities was maintaining better communication with the customer. The need for better communication falls into the following four areas:
§ Product and Service Communication: To ensure that customers understand the product features and use them. This also includes transaction updates, service updates, etc.
§ Promotional Communication: To make customers aware of the offers and services that the customer qualifies for as a result of holding the product.
§ Corporate Communication: To communicate with customers on wider issues not directly relating to the products held, such as updates on the bank, fraud education, credit bureau related education, etc.
§ Marketing Communication: To promote products of the bank that might be of interest to the customer for his/her financial needs.

Since the ability to communicate with the bank is so important for using financial products, the Bank focuses on various approaches to improve the contactability of its customers. The contactability problem is handled through the following three-pronged approach:
§ Contactability of new customers: The initial customer contact is extremely important for getting quality information on customer contact details. Data quality checks at this stage ensure that quality customer information is captured.
§ Contactability of existing customers: As customers might move addresses, change phone numbers, etc., ongoing maintenance of customer contactability information is important even when the initial data capture is of good quality. We have adopted a multi-channel approach to customer contactability enhancement. If there is a concern with the contactability of a customer (emails bouncing, incorrect phone numbers, incorrect address), various customer facing channels (branches, ATMs, call centres) are provided alerts to ensure that these details are captured when the customer interacts with any of these channels.
§ Contactability of delinquent customers: Specific to delinquent customers, there is also a need to ensure that the bank looks at all available data sources to contact these customers. For this, de-duplication based logic is run with the help of the credit bureaus, and any additional contact details thus identified are used for collections activity.

Role of Logic Algorithms in Identifying Unclean Data
It is possible to ascertain incorrect data through basic logic implemented in the data warehouse. Examples of these rules are: mobile numbers of less than 10 digits, PAN numbers not following the specified format, address and PIN code mismatches, etc. These rules help identify the customers whose contactability needs to be improved through the approaches mentioned above.

Initiatives Rolled Out
While contactability is a continuous process, the following initiatives taken at the bank in the last two years have further sharpened the process:
§ A synchronized multichannel contactability capability to ensure that the right channel and the right message are used to obtain customer contactability information
§ The availability of credit bureau data in the last few years, which has led to significant further improvements in contactability for the target customer base.

HDFC Bank

It is possible to ascertain incorrect data through basic rules implemented in the data warehouse. Examples of these rules are: mobile numbers of less than 10 digits, PAN numbers not following the specified format, address and PIN code mismatches, etc. These rules help identify the customers whose contactability needs to be improved.
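Warehouse rules of the kind both banks describe (short mobile numbers, malformed PAN, address/PIN mismatch) can be sketched as simple predicate checks over customer records. The field names and the PIN-prefix lookup below are illustrative assumptions, not either bank's actual rule set:

```python
# Sketch of warehouse-style rules flagging records whose contact data
# needs improvement. PIN_PREFIX is a hypothetical city-to-PIN-prefix map.
import re

PIN_PREFIX = {"CHENNAI": "600", "BENGALURU": "560"}

def flags(record):
    """Return a list of human-readable data quality problems."""
    problems = []
    # Rule 1: mobile number shorter than 10 digits (ignoring non-digits)
    if len(re.sub(r"\D", "", record.get("mobile", ""))) < 10:
        problems.append("mobile shorter than 10 digits")
    # Rule 2: PAN must be 5 letters, 4 digits, 1 letter
    if not re.fullmatch(r"[A-Z]{5}\d{4}[A-Z]", record.get("pan", "")):
        problems.append("PAN not in specified format")
    # Rule 3: PIN code prefix must agree with the city, where known
    prefix = PIN_PREFIX.get(record.get("city", "").upper())
    if prefix and not record.get("pin", "").startswith(prefix):
        problems.append("address and PIN code mismatch")
    return problems

rec = {"mobile": "98765", "pan": "ABCDE12F", "city": "Chennai", "pin": "560001"}
print(flags(rec))
```

In a warehouse, such predicates would typically run as batch queries whose output feeds the channel alerts described above.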

Methodologies Adopted System Level People Level
Data Level
Software tools are used to deal with data de- § Master Data § Understand § C o n t a c t
duplication and errors at the time of data migration. management existing CBS customer to fill
Missing or partial data information is corrected either § Correctness of implementation gaps in legal fields.
existing data § Dealing with § Training banking
by contacting the customer or by comparison with
§ Availability. multiple systems/ personnel
internal database. sources of data. § Monitoring &
Over and above the controls built into the source control of data
quality.
system to ensure correct capture of data,
standardization of data values is undertaken at Figure 14: Data Quality Improvement
regular intervals as per business needs. In this case, a
data profiling exercise is undertaken wherein the
values of the particular field are profiled. Based on the
Data Quality Improvement in SBI
outcome, decisions are taken on the standardization to be done. Then, using programming logic or ETL tools, the transformation is carried out. In the recent past, this has been done primarily for mobile numbers and the city/state/country fields.

Allahabad Bank
Allahabad Bank followed an outsourced model for dealing with data quality issues.

Rolling Out Data Quality Improvement Initiative
The challenges were divided into three buckets – data level, system level and people level – as given in Figure 14.

Data Level: Evaluate the current state of data in terms of:
§ Completeness: Identify missing values
§ Correctness and Consistency: Identify fields that were filled with default values
§ Structure of data stored in the master data

System Level: Understand the organizational structure of data:
§ Understand technology aspects of the CBS
§ How data is spread across multiple systems
§ All data corrections were made in a data repository extracted from the CBS (a Data Mart) – not in the live CBS

People Level: Deal with aspects like:
§ Training people to identify the shortcomings in the existing data
§ Contacting customers to collect missing values
§ Establishing a monitoring system to track data quality

Data Quality Improvement in SBI
In case of the bank, the different branches migrated to CBS at different times. This led to a host of Data Quality (DQ) issues, as validation rules on many important fields or data elements were not present in the legacy systems, resulting in incomplete data for reporting and analytics. Reporting was therefore done with a lot of manual intervention, leading to errors, delays and costs. Compounding this was the sheer number of customers and accounts which needed to be worked upon. The bank therefore took up a major initiative to improve data quality with technology support.

The bank has been using tools to profile data and generate data quality related reports, which are shared with the respective business units. There are provisions for continuous data quality improvement, right from cleansing, enrichment and de-duplication to migration. The 'Project Ganga' initiative has been taken up by the bank to address and improve data quality. Initially, the key data fields which impact regulatory and statutory reporting and customer contactability were taken up. The branches worked on the DQ reports and updated the fields in the Core Banking System, ensuring that the quality of data in both the source and the reporting & analytics systems is improved and kept in sync. The project has been implemented in a phased manner and has produced successful results in various areas. There has been an improvement in the quality of data pertaining to credit risk, and customer demographic data is now being worked upon.

Continued in page no. 23...
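The data-level checks described above – identifying missing values and fields filled with default values – are typically automated with a small profiling routine. A minimal sketch in Python; the field names and default markers are illustrative assumptions, not any bank's actual rules:

```python
# Illustrative data-profiling sketch: count missing and default-valued
# entries per field in customer records. Default markers are assumed examples.
DEFAULT_MARKERS = {"", "NA", "N/A", "NOT AVAILABLE", "XXXXX", "0"}

def profile(records, fields):
    """Return per-field counts of missing or default-valued entries."""
    report = {f: {"missing": 0, "default": 0} for f in fields}
    for rec in records:
        for f in fields:
            value = rec.get(f)
            if value is None:
                report[f]["missing"] += 1
            elif str(value).strip().upper() in DEFAULT_MARKERS:
                report[f]["default"] += 1
    return report

customers = [
    {"name": "XYZ", "mobile": "9000000009", "city": "Bengaluru"},
    {"name": "ABC", "mobile": "NA", "city": ""},
    {"name": "PQR", "city": "Hyderabad"},  # mobile field missing entirely
]
report = profile(customers, ["mobile", "city"])
# report["mobile"] -> {"missing": 1, "default": 1}
# report["city"]   -> {"missing": 0, "default": 1}
```

Such counts feed directly into the branch-level DQ reports mentioned in the case studies.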

22
Data Quality Framework for Indian Banking Sector
Implementation
Figure 15 shows the solution diagram for the Data Quality Improvement activity undertaken.

[Figure 15: Solution Approach Taken for Data Quality Improvement. The diagram links the following steps: Extract Data from CBS Tables; Quality Audit/Rules Definition; Data Cleansing; De-duplication; Tool to Certify/Missing Data Population; Updating Data; Branches.]
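The de-duplication step in such workflows is commonly driven by a normalized match key built from name and contact fields. A minimal sketch; the normalization rules are illustrative assumptions, not the tool actually used by the bank:

```python
import re
from collections import defaultdict

def match_key(record):
    """Build a crude match key: lower-cased name with salutations and
    punctuation stripped, plus the last 10 digits of the mobile number."""
    name = re.sub(r"\b(mr|mrs|ms|dr|shri)\b\.?", "", record["name"].lower())
    name = re.sub(r"[^a-z]", "", name)
    mobile = re.sub(r"\D", "", record.get("mobile", ""))[-10:]
    return (name, mobile)

def find_duplicates(records):
    """Group records by match key and return groups with more than one record."""
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append(rec)
    return [grp for grp in groups.values() if len(grp) > 1]

dups = find_duplicates([
    {"name": "Mr. XYZ Kumar", "mobile": "+91 90000 00009"},
    {"name": "XYZ KUMAR",     "mobile": "9000000009"},
    {"name": "Ms. ABC Rao",   "mobile": "9888888888"},
])
# dups contains one group holding the two "XYZ Kumar" records
```

Production tools add fuzzy matching (phonetic codes, edit distance) on top of such exact keys.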

Activities at Other Banks


The scope of the standardization activities undertaken by other Indian banks can be summarized as:
§ Standardization of addresses
§ Name normalizer (splitting of two names)
§ Enrichment of phone numbers with STD codes depending on the city names
§ PAN standardization
§ State name enrichment depending on city names
§ Email address standardization
§ Enrichment of salutation if it is part of the name
§ Removal of noise (spaces, commas, dashes, etc.)
§ Standardization of branch code, customer number, etc.
§ Records which are not standardized are given to the Business User console
§ Address parsing and standardization
Ÿ Identifies international address elements in partially fielded addresses and assigns them to the proper fields
Ÿ Performs formatting and standardization of elements to ensure consistent representation
§ Global address validation
Ÿ Performs matching of addresses against a reference database, with a unique deliverability assessment feature that classifies addresses according to their probable deliverability
Ÿ Validates individual address elements
w Batch validation mode only checks elements for correctness and changes them if possible
w Interactive suggestion mode checks elements for correctness, improves them where possible, and provides pick lists of alternatives for ambiguous input data records

Data Quality Improvement in SBI
...Continued from page no. 22

We also work closely with the Credit Information Companies (CICs) to constantly improve the data quality and the acceptance rate of loan-related data by the CICs. In our case, due to the above-mentioned initiatives, our acceptance rate is now amongst the highest for PSU banks, and this quantum jump has happened over the last one and a half years.

We have also taken up the work of customer de-duplication and address standardization using tools to further improve data quality. This improvement in the quality, standardization and completeness of data has enabled the bank to fully leverage its investments in Data Warehouse and Business Intelligence & Statistical Modeling tools, by completing a large number of business intelligence and analytics projects in the areas of CRM, Risk, Pricing & Profitability, etc. We believe that Data Quality Improvement is not a one-time project but a continuous process.
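A few of the standardization activities described in this section – PAN format checks, noise removal and salutation splitting – can be sketched as follows. The patterns are illustrative assumptions, not any bank's production rules:

```python
import re

PAN_RE = re.compile(r"^[A-Z]{5}[0-9]{4}[A-Z]$")  # e.g. BACGB0022C
SALUTATIONS = ("MR", "MRS", "MS", "DR", "SHRI")

def remove_noise(text):
    """Replace commas, dashes, hashes and slashes, then collapse spaces."""
    return re.sub(r"\s+", " ", re.sub(r"[,\-#/]", " ", text)).strip()

def split_salutation(name):
    """Return (salutation, bare name) if the name starts with a salutation."""
    parts = name.strip().split(None, 1)
    first = parts[0].rstrip(".").upper()
    if first in SALUTATIONS and len(parts) == 2:
        return first, parts[1]
    return "", name.strip()

def is_valid_pan(pan):
    """Check a PAN against the ten-character letter/digit pattern."""
    return bool(PAN_RE.match(pan.strip().upper()))

pan_ok = is_valid_pan("BACGB0022C")                # True
sal, bare = split_salutation("Dr. XYZ")            # ("DR", "XYZ")
clean = remove_noise("215-5C/1-B,  Plot No. 22")   # "215 5C 1 B Plot No. 22"
```

Records failing such checks would be routed to the Business User console for manual review, as described above.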

Action Plan for DQ Management
Based on the study conducted by the working group on data quality issues and the current activities undertaken by various banks in India, we recommend the following business action points to monitor and maintain the data quality of the existing customer records with the banks.

n Banks may form a top-level DQ Management Council headed by DGM (Data Quality)
Ÿ Identify the broad business goals which data quality has to meet (refer pages 6 & 7)
Ÿ Consult Data Stewards to identify the data fields to be collected and maintained
n Establish and build a DQ Coordination Council responsible for continuous monitoring and improvement of customer records
Ÿ Chief Manager (Data Quality) responsible for proactively monitoring customer information at regular intervals
Ÿ Conduct audit runs at regular intervals to measure the contactability of customers
Ÿ Identify gaps in customer information
Ÿ Take remedial steps to fill the gaps
Ÿ Adopt data enrichment policies to deal with gaps in legal and non-legal fields in customer data records
Ÿ Improve data collection policies through training of operations and IT personnel to improve the effectiveness of data quality activities
n Identify and adopt emerging technologies to reduce data errors at the time of data entry and data migration of customer records
Ÿ Reduce human intervention at the time of customer record creation and data migration
n Appoint Data Stewards to manage and resolve data quality issues
n Establish a process for Data Quality Maintenance (DQM)
Ÿ Process for purchase of IT solutions for DQM
w Tools should have features for data profiling, data cleansing, data enrichment and data integration
w Tools should be backward compatible, providing wrappers to extract and cleanse data from the data sources currently existing with the banks
n Establish a process and tools for Metadata Management
Ÿ Tools for building a repository for Business Vocabulary should support:
w Data Modeling & Data Integrity tool
w Metadata Discovery tool & repository
w DQ Profiling & Monitoring tool
w Data Cleansing & Matching tools
w Data Integration tool
n Establish a training process for staff:
Ÿ Banks need to place qualified personnel in DQ jobs; train or recruit as needed, and help develop expertise
Ÿ Regular user and operator awareness needs to become part of the culture, to maintain the mindset that avoids poor data
n Generate quarterly audit reports on data quality and the costs incurred on maintaining it
Ÿ Data Stewards to generate data quality reports at various levels of the organization, such as zone-wise and branch-wise, and submit them to the Chief Manager (Data Quality)
Ÿ Chief Manager (Data Quality) to identify the best branch and zone in terms of data quality and provide incentives for the same, with approval from GM (DQ)
Ÿ Identify zones and branches that lag in data quality and conduct a root-cause analysis
Ÿ Provide the necessary resources and training programmes to improve data quality in the lagging zones and branches
n Identify and deploy metrics to measure data quality continuously
n Incentivise and celebrate good quality data practices
n Define accountability for data quality between the front office, mid office and back office.
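The metrics recommended above can start as simple per-field completeness and validity scores, reported zone-wise or branch-wise. A minimal sketch; the field list and validity rules are illustrative assumptions, not prescribed by the framework:

```python
import re

# Illustrative validity rules per field; real rules would come from the
# bank's data dictionary.
RULES = {
    "mobile": lambda v: bool(re.fullmatch(r"[6-9]\d{9}", v)),
    "pan":    lambda v: bool(re.fullmatch(r"[A-Z]{5}\d{4}[A-Z]", v)),
}

def quality_scores(records):
    """Completeness and validity percentages for each field with a rule."""
    scores = {}
    total = len(records)
    for field, is_valid in RULES.items():
        present = [r[field] for r in records if r.get(field)]
        valid = [v for v in present if is_valid(v)]
        scores[field] = {
            "completeness": round(100 * len(present) / total, 1),
            "validity": round(100 * len(valid) / max(len(present), 1), 1),
        }
    return scores

scores = quality_scores([
    {"mobile": "9000000009", "pan": "BACGB0022C"},
    {"mobile": "12345",      "pan": ""},
    {"mobile": "9876543210"},
])
# scores["mobile"] -> {"completeness": 100.0, "validity": 66.7}
# scores["pan"]    -> {"completeness": 33.3, "validity": 100.0}
```

Tracking such scores quarterly would give the DQ Management Council a concrete basis for the audit reports and branch incentives proposed above.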

Mentor
Shri B. Sambamurthy, Director, IDRBT

Contributors
¬ Dr. N. Raghu Kisore, Assistant Professor, IDRBT
¬ Dr. V. Ravi, Associate Professor, IDRBT
¬ Mr. S. Mukhopadhyay, Senior Domain Expert, IDRBT
¬ Mr. K. Karthikeyan, Karur Vysya Bank
¬ Mr. Manish Desai, SAS
¬ Mr. M. Srinivas Kiran, IBM
¬ Mr. Vyom Upadhyay, ICICI Bank
¬ Ms. Aparna Kumar, HDFC Bank
¬ Mr. Pradeep Sheokand, Informatica

Acknowledgements
¬ Mr. Ajay Kapoor, HDFC Bank
¬ Mr. Pushan Mahapatra, State Bank of India
¬ Mr. K. K. Seth, Central Bank of India
¬ Mr. Sheshadri Achari, Bank of India
¬ Mr. J. B. Thomas, Dena Bank
¬ Mr. Navin Bhattacharya, CIBIL
¬ Mr. Sunil Modak, Ixsight Technologies Pvt. Ltd

Research Support
¬ Mr. V. Siddeshwar, Research Associate
¬ Mr. A. Aditya, Research Associate
¬ Mr. K. Ramanuja Rao, Research Associate
¬ Mr. D. Datta Sai Babu, Research Associate

References
1. "Principles for Effective Risk Data Aggregation and Risk Reporting", Basel Committee on Banking Supervision, Jan 2013
2. Circulars issued by Reserve Bank of India from time to time
3. Eurostat and European Financial Stability Board
4. "DQ Issues in Indian Banking", SAS
5. "Data Quality Overview", Informatica
6. "Data Quality Framework and Governance", IBM
7. "Holistic CRM and Analytics for Indian Banking Industry", 2011, IDRBT Publication
8. http://www.betterregulation.com/external/Basel%20III%20and%20beyond%20Dont%20make%20data%20quality%20the%20elephant%20in%20the%20room.pdf
9. "A Simple Introduction to Data Science", Lars Nielsen & Noreen Burlingame
Institute for Development and Research in Banking Technology
Castle Hills, Road No. 1, Masab Tank, Hyderabad - 57, A.P, INDIA
Ph : +91-040-23534981, Fax : +91-040-23535157
E-mail : publisher@idrbt.ac.in; Website : www.idrbt.ac.in
