Institute for Development and Research in Banking Technology
(Established by Reserve Bank of India)
[Cover illustration: "Quality Data" - a sample customer data profile showing address (Belahalli Nagar, Bengaluru, Karnataka, India), occupation, marital status, qualification, annual income, PAN, UIDAI number, mobile, email, card details, family details and income/wealth profile.]
© An IDRBT Publication, 2014. All Rights Reserved. For restricted circulation in the Indian Banking Sector.
Foreword
Accurate data is a sine qua non for improving the quality of MIS in any organization and thus
forms the backbone of an effective Decision Support System (DSS). For most organizations
including banks, operational challenges have become a part of their day-to-day activities,
making them devise both simple and complex workarounds to compensate for inadequate
data quality. Over the years, the policy and decision-making processes have become more
information-intensive. Therefore, it is imperative to ensure quality of data and its timely
submission by banks not only to the regulator but to the banks' managements as well. This
area requires more attention, given that data quality may have an impact on the reputation
of banks besides posing other risks. Improving data quality would also assist the banks in
terms of improved timelines and improved efficiency of processes.
Towards this, the Automated Data Flow Project was initiated by the Reserve Bank to
automate flow of data from banks' internal systems to a centralised environment and then to
the regulator. By adopting the automated process for submission of returns, the banks are
now able to submit accurate and timely data to a centralised environment at their end and
then to the regulator without any manual intervention.
The Framework on Data Quality for Indian Banking Industry compiled by IDRBT is being
brought out at the right time when Indian Banks are going through a phase of
transformation. It has aptly dealt with issues relating to data quality and has suggested a new
framework towards achieving this. While doing so, it has also touched upon the relevant HR
issues which are ever so important while dealing with data and its quality. Here is an
opportunity for the banking community to make good use of the document. I appreciate the
good work done by IDRBT in this regard.
Anand Sinha
Deputy Governor, Reserve Bank of India,
Chairman, IDRBT
01
Data Quality Framework for Indian Banking Sector
Message from IBA
Preface
“Richness of Data leads to riches to stakeholders.”
Data is a key corporate asset for any organization, and more so for financial institutions. As such, the quality of data has significant implications for corporate governance. Data-driven decision making can be a competitive differentiator. Given the volume, velocity and variety of data (Big Data), data governance is an important component of corporate governance, especially for financial institutions. Quality of data directly impacts performance, compliance and Profit & Loss.

Analytics has emerged as an important business discipline. Industries like health care and retail have greatly benefited from the deployment of analytics. The banking industry in India, too, is seriously exploring opportunities for deployment of analytics. Cross-sell, up-sell, lifetime value, reduction in churn and fraud detection/mitigation are some of the important business goals sought to be achieved through deployment of analytics. To achieve these business objectives, banks are acquiring technology solutions like data warehouses, BI, BA, etc. But investment in these technologies will yield healthy ROI only if the data quality is good enough for analytics to provide actionable insights. Mere investment in technology without commensurate investment in data quality leads to suboptimal results. Secondly, in terms of sequence, investment in data quality shall precede investment in technology solutions. This is also essential to mitigate technology obsolescence. Some banks have rushed into investment in data warehouse/BI/BA tools without addressing data quality issues, and consequently their ROI is poor.

Given that data quality is a critical success factor for quality business growth as well as compliance, there needs to be a top-down approach to address this issue. It shall be the responsibility of the board and top management to put in place not merely policies but well-defined processes and organizational structures to improve data quality on top priority. Efforts and achievements in improving data quality need to be given due recognition, as is done for achieving business targets, since data quality improvement is also a business goal.

Besides new business processes and workflows, banks need to acquire and nurture new skill sets/roles like data architect, data steward, information architect, data scientist, etc., to execute a data quality programme successfully on an ongoing basis. The HR departments play an important role in building these new skill sets. The CMD and ED need to champion the programmes. A culture of data quality needs to spread across the entire organization. The business benefits of good quality data are immense, and the right blend of people, process and technology needs to be deployed. Given the multi-disciplinary nature of the work, direct involvement of the CMD and ED is essential, and the progress needs to be reviewed quarterly.

In order to provide a framework for banks to improve data quality, IDRBT has now come out with a publication which is highly practitioner-oriented. Besides IDRBT faculty, bankers, subject experts, technical experts and academicians were involved in bringing out this publication. I would like to acknowledge the contribution made by the working group members and other bankers.

It is interesting to have state-of-the-art technology. But it is important to have good quality data. Enlightened leadership distinguishes between "Interesting" and "Important". It is not about tools and technology, but about creating the right data culture.

B. Sambamurthy
Director, IDRBT
Executive Summary
Customer data is paramount to the success of any business - especially in service-oriented businesses like retail banking. To convert data into information and knowledge, banks in India spend significant amounts of money to deploy Enterprise Data Warehouse (EDW) and Customer Relationship Management (CRM) solutions. Both EDW and CRM are business initiatives for furthering business growth. It is important to observe that information is not knowledge.
Online Analytical Processing (OLAP) or Business
Intelligence (BI) tools help in converting data into
information. Then, to convert information into knowledge buried deep underneath the data, we need data mining (also known as predictive analytics). Consequently, the knowledge extracted from data using sophisticated tools is only as good as the quality of the data that goes into them. Apart from business needs, maintaining good and accurate data is also a regulatory compliance issue. Recent regulatory directives from Basel/RBI, etc., and business efficiency imperatives demand diligence on granular data. Hence, data quality becomes more important now, and we attempt to put together in this booklet a guide on the data quality domain for banks.

The issue of data quality in Indian banks is manifold. The issues range from data collection and data entry to timely updation of changes to customer data.

Chapter 1 highlights data quality issues and why it is necessary and important to solve these issues before any valuable knowledge can be extracted from customer data using sophisticated software and algorithms.

Chapter 2 describes the significance and impact of Big Data on banking. It identifies certain key characteristics that differentiate good data from bad data, thereby addressing data quality issues that banks face while deploying analytics.

Chapter 3 deals with approaches to address data quality issues. It provides a framework detailing various stages of data quality improvement in banks, as well as a workflow to achieve and maintain good data quality.

Chapter 4 explains the factors and ways for evolution in Data Quality, which consequently enables better Data Governance.
Chapter 5 deals with the HR perspective and provides
an organizational structure to monitor, improve and
manage data quality on a daily basis.
Chapter 6 provides detailed case studies of efforts taken
by some of the Indian banks to improve their data
quality. It provides an overview of the techniques
adopted by these banks to improve and manage data
quality.
Innovative incentive programmes for both the
customer and staff may be useful for improving and
retaining data quality. All delivery channels may be
leveraged for data quality improvement programmes.
Introduction
Chapter 1 Importance of Data Quality
Need for Data Quality

The importance of good quality data for banks manifests in five perspectives viz., (I) Regulatory, (II) Product Pricing, (III) Treasury Management, (IV) Risk Management and (V) Business/CRM/Analytics (see Figure 1).

… customers, help a bank in this regard. Further, Data Quality is important for the successful implementation of Automated Data Flow (ADF). With the advent of global banking and globalization, it is now necessary to understand the perspectives of other banking regulators and agencies across the globe. Various governing bodies have set guiding principles and metrics to benchmark the quality of data in a bank. Figure 2 provides a summary of data quality attributes defined by some of these governing bodies.
Treasury Management Perspective

Correct and real-time data in the following areas are vital for efficient operations:

§ Asset-Liability Mismatch Data Over Time Ranges - Required for deciding funds-need levels and market borrowings/investments. Over/under borrowing/investment may mean undesired losses
§ Granular Asset/Liability maturity dates, interest rates/accrual figures/re-pricing times, expected rates ahead, and renewal probability in the portfolio. Errors in micro-levels of data may prove costly
§ Correct Loan Data in Detail - Required for correctly classifying standard/doubtful/bad assets, primary/collateral security coverage, and classification of loans with their parameters. This is required for provisioning as well as capital requirements that significantly affect the business result
§ Physical assets and valuations, employee costs and classifications - in fact, all items used to do business with - affect cost recognition, insurance outlays, cost controls, provisions and profitability
§ Aggregated Data of Deposits, Investments, etc. - To comply with RBI directives on SLR/CRR, etc. Lack of completeness/recency may mean over-borrowing/under-investment causing financial loss
§ Correct past data of credits with delinquencies and write-offs, correct investment history data, and market data with trends, etc. - required for correct Risk Capital computations. In the absence of good data, the provisions demand higher capital and costs, affecting the business result of the bank

Risk Management Perspective

Without good quality data, banks struggle to sustain sufficient liquidity and to deal with tighter regulatory scrutiny. Yet, most banks are still relying on reactive approaches while responding to the escalating data demands. With Basel III, there will be a lot of emphasis on risk, capital and liquidity management. Therefore, smart banks recognize the urgent need for a full review and possible overhaul of their data quality, integrity, underlying architecture and governance. The solid business case for an early and strategic approach to the development of Basel III data capabilities includes the ability to manage market stresses, funding and liquidity constraints more confidently and effectively. Without this strategic approach to data, many of the problems exposed by the financial crisis, including the lack of consistency and alignment in the information coming from risk, finance and the front office, will be perpetuated.

Business Development/CRM/Analytics Perspective

Data Quality is necessary for obtaining the best RoI from EDW/CRM implementation:

§ Customer segmentation for acquiring customers through target marketing
§ Cross-Sell/Up-Sell
§ Mitigate various frauds on a real-time basis
§ Predict and reduce customer churn
§ Increase customer loyalty and maximize Customer Lifetime Value (CLV) for the enterprise
§ Develop better credit scoring models and improve asset quality
§ Achieve a Single Version of Truth
§ Achieve a 360° view of the customer.
enrichment and data monitoring - are best addressed through a single platform, providing a unified view of any type of data, including customer and product information. The technology companies that figure in the Leaders quadrant of Gartner's Magic Quadrant as of October 2013 are IBM, Informatica, SAP, SAS and Trillium Software, in alphabetical order.

Business Process

§ Establishing DQ standards and customer on-boarding processes for various business verticals
§ Establishing acceptance criteria for customer information pertaining to name, address, mobile number and email address
§ Training staff responsible for capturing customer information.

Technology

§ Taking care of DQ issues during system migration
§ DQ issues during integration of various software systems, especially during business mergers
§ Reducing Data Entry Errors.

[Figure 4: Primary Causes Affecting Data Quality - master data management, data cleansing, data migration issues and data collection issues as reasons for poor data.]
[Figure 5: Delineation of Issues between Business Process and Technology.]
[Figure 6: Taxonomy of Data Quality Issues.]
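The acceptance-criteria idea above can be sketched in code. The field names and rules below are illustrative assumptions for this booklet (a bank would take its actual rules from its own DQ standards), though the PAN and mobile shapes follow the usual Indian conventions:

```python
import re

# Hypothetical on-boarding acceptance rules; illustrative only.
PAN_RE = re.compile(r"^[A-Z]{5}[0-9]{4}[A-Z]$")        # PAN shape: AAAAA9999A
MOBILE_RE = re.compile(r"^[6-9][0-9]{9}$")             # Indian 10-digit mobile
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")   # coarse email shape

def check_customer(record: dict) -> list:
    """Return a list of acceptance-criteria violations for one record."""
    issues = []
    if not record.get("name", "").strip():
        issues.append("name: missing")
    if not PAN_RE.match(record.get("pan", "")):
        issues.append("pan: malformed")
    if not MOBILE_RE.match(record.get("mobile", "").replace(" ", "")):
        issues.append("mobile: malformed")
    if not EMAIL_RE.match(record.get("email", "")):
        issues.append("email: malformed")
    return issues
```

A record that passes every rule yields an empty list; otherwise the list names each failing field, which can feed the staff-training and remediation steps discussed in this chapter.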
Chapter 2 Advent of Big Data and its Implications on Achieving Good Data Quality in Banks
Of late, a new paradigmatic shift has taken place in the form of Big Data, which is set to have a profound influence on the way banking will be done in future. It is a catch-all word that connotes new dimensions of data including volume, velocity and variety. Volume refers to the sheer size of the data in terms of transactional data with respect to several products and customers. Velocity refers to the speed with which customer data comes on board; customer views aired on their banking experience in social media are an example of this dimension. Variety refers to the different types of data viz., structured and unstructured. Transaction data, demographic data and geographic data form part of the structured type, while customer complaints, e-mails, voice recordings from the call center, customer views in social media, etc., form part of the unstructured type. With new channels making waves across Gen-X and Gen-Y customers, unstructured data becomes a gold mine of customer data for identifying cross-sell opportunities, detecting customer churn and detecting frauds in near-real time. The quality of unstructured data is as important as that of structured data. Therefore, Big Data throws a new challenge to banks in their data quality initiatives. It requires new investments in IT to deal with this new phenomenon, which would improve the bottom line for the banks.

Data Augmentation

This aspect is closely related to Big Data. Many case studies have demonstrated that augmenting structured data (described above) with unstructured data viz., customer complaints, e-mails, voice recordings from the call center and customer views in social media, indeed improved the accuracy of churn models, cross-sell models, fraud detection models, etc. Further, data augmentation also connotes collecting customer data from external sources such as credit bureau reports, BASEL guidelines, RBI circulars, stock market data, census data, etc. This further enhances the level of success in our endeavour to achieve a 360° view of customers.

During the last few years, Indian banks have successfully deployed powerful transaction processing engines by way of CBS. While banks achieved significant productivity gains, data quality has not received the desired effort in the migration process. Banks need to evolve to the next stage of information processing and management, ultimately progressing to the deployment of analytics. There has been a long pause in this journey. Before they rush to technology deployment, banks need to address serious issues of data quality, which is one of the main barriers to the deployment of analytics. Many have set up data warehouses without satisfactorily addressing data quality issues.

Identifying Good Data and Bad Data

Bad data is generally characterized by:

§ Improbable values
§ Impossible values
§ Illogical values
§ Mis-informative values
§ Missing values

Based on the above characteristics of bad data, good data quality can be ensured by checking for the following aspects in the bank's data:

§ Frequency of errors
§ Unreasonable distribution
§ Detection of unexpected quality
§ Numeric variables
§ Out-of-range data
§ Outliers
§ Logic of data

Today, technological advances have been made in terms of software tools and algorithms to identify unclean data and mitigate the same. A single version of truth and a single view of the customer shall be the important hallmarks of data quality.
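As a concrete illustration of a few of these checks, the sketch below profiles a single numeric field for missing, impossible, out-of-range and outlier values. The field (age) and the thresholds are assumptions chosen for the example, not prescriptions from this framework:

```python
from statistics import mean, stdev

def profile_ages(ages: list) -> dict:
    """Count bad-data symptoms in a list of ages (None = missing)."""
    report = {"missing": 0, "impossible": 0, "out_of_range": 0, "outliers": 0}
    valid = []
    for a in ages:
        if a is None:
            report["missing"] += 1          # missing values
        elif a < 0 or a > 150:
            report["impossible"] += 1       # impossible values
        elif not (18 <= a <= 100):
            report["out_of_range"] += 1     # assumed business range for account holders
        else:
            valid.append(a)
    if len(valid) > 2:
        m, s = mean(valid), stdev(valid)
        # flag values more than three standard deviations from the mean
        report["outliers"] = sum(1 for a in valid if s and abs(a - m) > 3 * s)
    return report
```

Run over a whole field, such a profile gives the "frequency of errors" and "unreasonable distribution" view of that data element in one pass.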
Chapter 3 New Framework for Good Data Quality
In this chapter, we propose and recommend a new framework for achieving data quality in the banking industry
in India. It encompasses all aspects of data quality. Figure 7 provides the set of activities that need to be executed
to improve and maintain data quality in the long run.
[Figure 7: Activity Classification Chart for Improving Data Quality - six work streams (Data Domain Definition, System Mapping, Gap Analysis, Data Quality Remediation Strategy, Remediate, Monitor and Control), each broken into activities such as building the team, identifying source systems, profiling data elements, building data quality rules and standards, performing root cause analysis, developing and prioritizing remediation options, cleansing and correcting data, building data quality scorecards and reports, and tracking remediation status.]
Descriptions of these work streams are listed below:

Data Domain Definition
§ Development of common business definitions for a data domain
§ Develop standards about the meaning and format of each critical data element.

System Mapping
§ Identify systems that contain data for the domain defined in Data Domain Definition
§ Map each critical element as defined through Data Domain Definition to the actual field instance in the system.

Gap Analysis
§ Identify collection and discovery of data quality issues
§ Identify and log issues that are in ‘Data Domain Definition’ and ‘System Mapping’.

Data Quality Remediation Strategy
§ Identify steps and tools necessary to research a data quality issue and develop options to remediate it
§ Develop remediation plan to address specific data quality issues.

Remediate
§ Prioritize, obtain funding and execute approved remediation plans
§ Data quality issues should be resolved to the level required by the business.

Monitor and Control
§ Monitor Data Quality
§ Measure effectiveness of remediation plans related to data quality
§ Develop Data Quality Score Card and identify appropriate Metrics to measure Data Quality.
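The Data Quality Score Card of the Monitor and Control stream could, in its simplest form, track completeness and validity per critical data element. The field names and rules below are placeholder examples of such metrics, not rules from the framework itself:

```python
# Minimal data quality scorecard: completeness and validity per field.
def scorecard(records, rules):
    """rules maps field name -> predicate judging a non-empty value valid."""
    scores = {}
    n = len(records)
    for field, is_valid in rules.items():
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        valid = sum(1 for r in records
                    if r.get(field) not in (None, "") and is_valid(r[field]))
        scores[field] = {
            "completeness": filled / n if n else 0.0,   # share of records filled
            "validity": valid / filled if filled else 0.0,  # share of filled that pass
        }
    return scores

# Illustrative sample data and rules
records = [
    {"pan": "ABCDE1234F", "mobile": "9876543210"},
    {"pan": "", "mobile": "12345"},
]
rules = {
    "pan": lambda v: len(v) == 10,
    "mobile": lambda v: len(v) == 10 and v.isdigit(),
}
```

Published periodically, such per-element scores let the council measure whether remediation plans are actually moving the metrics.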
Workflow for Implementing Data Quality Programme
Figure 8 establishes a workflow diagram that explains the input, output, controls and mechanisms to implement
data quality programme in the bank.
Each activity, represented as a block, shows the inputs necessary to execute it (arrows on the left side of the block, pointing into it) and the expected outputs (arrows on the right, directed out of the block). For each sub-activity, we also indicate the controls that determine the scope of the activity (arrows at the top) and the mechanism through which the activity is achieved, i.e., the person in the Bank responsible for it (arrows at the bottom of each block).
[Figure 8: Work Flow Diagram for Continuous Monitoring and Management of Data Quality - activities A0 to A7: define business purpose for data; identify data fields necessary; identify sources of data; create and populate test database; evaluate data cleaning against data quality criteria such as completeness, accuracy, currency and lineage; populate master database; identify fields in error and reasons; identify possible technologies to improve quality. Controls include ADF, mailers to customers and data cleaning tools/algorithms; mechanisms include the Data Architect, Data Quality Manager, Information Architect, Technology Specialist and the Board. Activities up to the test database are jointly handled by the Business and IT teams; the remainder is handled by the IT team.]
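The evaluate-then-promote loop at the heart of this workflow can be sketched as follows. The per-field predicate rules are illustrative assumptions; the 01-Jan-1800 value mirrors the placeholder date some banks assign to missing dates of birth during migration, as described in the Karur Vysya Bank case study later in this booklet:

```python
# Sketch of the quality gate between test database and master database.
def run_quality_gate(test_db, rules):
    """rules maps field -> (predicate, reason); failures go to remediation."""
    master_db, remediation_queue = [], []
    for record in test_db:
        failures = [(field, reason)
                    for field, (check, reason) in rules.items()
                    if not check(record.get(field))]
        if failures:
            # identify fields in error and reasons, for remediation
            remediation_queue.append((record, failures))
        else:
            # record passes every rule: populate master database
            master_db.append(record)
    return master_db, remediation_queue

# Illustrative rules
rules = {
    "dob": (lambda v: v not in (None, "", "01-Jan-1800"),
            "missing or placeholder DOB"),
    "mobile": (lambda v: bool(v) and str(v).isdigit() and len(str(v)) == 10,
               "mobile not a 10-digit number"),
}
```

Records in the remediation queue would loop back through the cleaning tools before re-entering the gate, giving the continuous cycle the figure describes.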
Data Collection and Administration Strategies
There are two approaches: the Centralized Approach, in which data science sits with IT, and the Decentralized Approach. Each of these approaches has its own advantages and disadvantages; both are evaluated in Table 1.
Chapter 4 Data Quality Management, Data Governance and Master Data Management
Managing a sustainable data quality process requires a data governance strategy and framework.

Data governance often lands in the hands of IT by default. When banks focus on data for data's sake, they miss the broader picture - i.e., data is only as valuable as the business processes, decisions, and interactions it enables and improves. The ultimate objective of data quality governance is to generate the greatest possible return on data assets. If a business wants to be sure to capture critical opportunities to leverage data to support operations, strategy, and customer experience, it needs to govern data assets as it does other enterprise assets such as financial securities, cash, and human resources (HR).

Data is a key asset of the bank and, like any other asset, it has to be carefully managed, administered and guarded. But unlike other assets, data quality is both an IT and a business issue, and is a continuous process. Therefore, successful data quality management, administration and governance requires a combination of both tactical and strategic skills, as highlighted in Figure 10. Figure 10 also provides a hierarchical view of activities, from local data quality management at the lowest level to IT governance activity of the Bank at the highest level.

Facets of Data Quality Management

Vision and Business Case
Data governance is not just about the data. It's about the business processes, decisions and stakeholder interactions you want to enable. The major benefits of a good data management process are:
§ Productivity improvements through reduced average turnaround time, viz., handling time in the call center's inbound support line
§ Revenue growth through increased campaign response rate
§ Lower direct marketing costs.

Policies
§ Establish policies for data accountability and ownership, organizational roles and responsibilities, data capture & validation standards, information security and data privacy guidelines, data access and usage, data retention, data masking and archiving policies.

Organizational Alignment
§ Establishes a hierarchical relationship between different roles and teams of people
§ Provides details of the responsibilities of each role and team.

People
Having defined roles, and the right people in them to support, sponsor, steward, operationalize and ultimately deliver a positive return on data assets, is important in any data governance program. The major objectives to be accomplished by the team are:
§ Support: Co-ordination of communication between different business teams
§ Sponsor: Deal with prioritization and funding
§ Steward: Help establish the relationship between data and business objectives.
[Figure: Facets of Data Quality Management - Vision & Business Case, Define Policies, Organizational Alignment, and Process Measurement and Monitoring, arranged around the central theme.]
Design of Operational/Organization Model

Sponsorship - Deputy General Manager / Chief Data Officer (CDO)
§ Warrants the enterprise adoption of measurably high-quality data
§ Negotiates quality SLAs with external data suppliers
§ Reports to GM(IT) and GM(Marketing).

Oversight - Data Quality Management Council
§ Strategic committee composed of business clients to oversee the quality program
§ Ensures data quality priorities are set and aided by business goals
§ Delineates data accountability.

Stewardship - LOB Data Stewards
§ Data quality governance structure at the business level
§ Defines data quality criteria for LOB expectations
§ Delineates stewardship roles
§ Reports activities and issues to the Data Quality Coordination Council.
The adoption of Master Data Management (MDM) promises many benefits ranging from business agility and
improved business performance to increased revenue and lower IT and business costs. However, according to
Gartner Inc., achieving these benefits often entails overcoming formidable technical, organizational and
political hurdles.
Gartner defines MDM as a technology-enabled discipline that ensures the uniformity, accuracy, stewardship and
semantic consistency of an enterprise's official, shared master data assets. Organizations use master data for
consistency, simplification, uniformity of process, analysis and communication across the business.
What MDM is not, and what MDM is:
§ Not about implementing technology: MDM is about understanding how business processes are supposed to work.
§ Not just a project: MDM is implemented as a program that forever changes the way the business creates and manages its master data.
§ Not the same as Enterprise Data Warehouse (EDW): MDM should/will span the organization across all business units and processes (including data stores, operational and analytical).
§ Not a substitute for Enterprise Resource Planning (ERP): ERP generally means a packaged business application strategy, most often centered on a single, large vendor. ERP implied, but rarely realized for the user organization, a single process and data model across the organization.
§ Not just for large, complex enterprises: the principle of MDM applies whenever two or more business processes must view or share (master) data. Size of organization does not matter.
§ Not an IT effort: MDM must be driven by the business and a business case, and supported/enabled by IT.
§ Not small - it is too big an effort to handle at once: MDM can be, and most commonly is, adopted one domain at a time and one use case at a time.
§ Not separate from data governance and data quality: MDM includes governance (of master data) and data quality (of master data) - MDM cannot be established without them.
§ Not vendor-neutral, as if every vendor's MDM had the same features: vendor MDM capability has focused on specialization across data domain, industry, use case, organization and implementation style. Consequently, vendor selection is critical if organizations are to find the right partner.
Finally, before a bank can move forward in this journey, it is imperative for the bank to first understand where it stands today and accordingly plan the next steps for achieving a higher level of data quality. Based on the above discussion, we recommend the following maturity model, which qualitatively identifies the current maturity level of a bank's data quality process so that the bank can plan its way ahead in this journey. Figure 13 gives the data quality maturity model and the business capabilities that a bank can accomplish at each of these stages.
[Figure 13: Data Quality Maturity Model - four stages, with risk decreasing and reward increasing as a bank progresses:
§ Undisciplined: IT-driven projects; duplicate, inconsistent data; high cost to maintain multiple applications
§ Reactive: line of business influences IT projects; little cross-functional collaboration; inability to adapt to business changes
§ Proactive: IT and business groups collaborate; enterprise view of certain domains; data is a corporate asset
§ Governed: business requirements drive IT projects; repeatable, automated business processes; personalized customer relationships and optimized operations
Technology adoption across the stages spans database marketing, CRM, ERP, sales force automation, data warehouse, customer/product/employee/location MDM and business process automation.]
Chapter 5 HR Matters
We suggest the following specialist positions to be created within a bank in order to get good quality data on a
continual basis:
Deputy General Manager / Chief Data Officer (CDO)
Responsibilities: Dual reporting to IT and Marketing Heads. Participate in the Steering Council meetings with the Board and understand business expectations from data. Report ongoing data quality activities/initiatives to the Board.
Qualification: MCA with 15 years of work experience in the Banking and IT domain.
Experience: Sound knowledge of business operations; knowledge of Statistics.

Chief Manager (Data Quality)
Responsibilities: Organize data quality co-ordination council meetings. Work closely with data quality technical staff, application developers and the data owners/subject matter experts from the business. Establish a Data Quality Methodology. Collaborate directly with the business data owners to establish the data quality business rules for dealing with data quality issues.
Qualification: B.Tech / M.Tech (CSE/IT) with 5 years of work experience.
Experience: Practitioner of Statistics and IT tools for Statistics.

Data Architects
Responsibilities: Establish measures to chart progress related to completeness and quality of metadata for enterprise information, to support reduction of data redundancy and fragmentation, elimination of unnecessary movement of data, and improvement of data quality. Ensure the accuracy and accessibility of all important data. Put in place governance processes around metadata to ensure an integrated definition of data for enterprise information, and to ensure the accuracy, validity and reusability of metadata.
Qualification: B.Tech / M.Tech (CSE/IT) with 3 years of work experience.
Experience: Database application development, process design, data quality, ETL development and data migration, of which three years in an analytical or management role.

Data Stewards
Responsibilities: Manage data assets in order to improve their reusability, accessibility, integrity, consistency and structure. Develop measures of customer data quality, maintain metrics and publish them to stakeholders. Create dashboards and report on data quality metrics. Resolve data integration issues. Work with IT to administer business metadata within systems and tools.
Qualification: M.Sc. (Statistics/Operations Research) with 3 years of work experience.
Experience: 2 years of experience in Data Processing/Management and Reporting.
Chapter 6: Case Studies

In this chapter, we share approaches adopted by Indian banks to improve data quality in their operations. The root cause of data quality problems in Indian banks is two-fold: the first is missing or wrong data; the second is data errors, largely attributable to data-entry mistakes.

Karur Vysya Bank

Mailers to Customers
We sent statements of account to current account and CC account holders, where the reject percentage was low, and, on a test basis, to SB account holders, where the reject percentage was high. Hence, we sent the KYC Self-Declaration document, along with a self-addressed envelope, to all customers who have been with our bank for more than five years, requesting them either to submit the form at the nearest branch or to drop the self-addressed postal envelope in the nearest post box, together with a latest photograph, proof of identity and proof of address. The collected details are updated in a centralized processing zone. On enquiry with the centralized processing zone, we found that the latest KYC documents had been received from 20% of the customers to whom letters were sent.

Data Cleaning Algorithms
Initially, when we decided to clean the customer data, we chose to clean the following fields: Country, State, City, Pin code, Gender, Mobile Number, Email ID and PAN number. These are the key fields required for any customer communication, in addition to the address fields. Poor quality in these fields has direct consequences:
§ If the bank sends a physical letter to a wrong address, it incurs printing, postage and operational costs
§ Intimation of new facilities/product features cannot be communicated to the customer
§ Sending multiple mails to the same person is unprofessional
§ Special-day wishes (birthday, anniversary, Doctor's Day, Women's Day, etc.) cannot be sent.

To start with, we formed a team to identify incomplete, incorrect, inaccurate and irrelevant parts of the data, and arrived at an approach for replacing, modifying or deleting the irrelevant data. We faced challenges in cleaning migrated data: either the data was not available, or the available data was incorrect. During migration, wherever the date of birth was not available, we put in a standard date of 01-Jan-1800. If such a customer comes to open a new account or to renew a deposit/facility, a system check does not let the transaction proceed until the correct date of birth is given. After migration, we also started collecting income, education, dependency details, profession, etc.

Field: Country
Problem: Free text containing special characters; no uniformity in names, e.g. India, INDIA, Ind, etc.

Data Cleaning Algorithm:
§ All irrelevant data in the country field, i.e. values not available in the country master, have been removed
§ If the name of an Indian city/town or Indian state is available in the address field, the country code has been changed to India
§ Normalized the field by converting the values into upper case, as in the country master.

Data Validation Algorithm:
§ Free text removed
§ Pick list provided at the data-entry stage to bring the value to a standard format
§ Cross-validation built across the fields.

Field: State
Problem: Free text containing special characters; no uniformity in names. The state name Tamil Nadu has been represented in more than one lakh patterns, such as Tamil Nadu, T.N., T. Nadu, TamilNaadu, Tamilnadu, Tamil Nad, etc.
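As a minimal sketch of the country-field cleaning steps above, the logic might look like this. The country master, place list and function name are illustrative assumptions, not the bank's actual reference data or software:

```python
# Sketch of the country-field cleaning algorithm described above.
# COUNTRY_MASTER and INDIAN_PLACES are small illustrative assumptions.
COUNTRY_MASTER = {"INDIA", "NEPAL", "SRI LANKA"}

# Indian cities/towns/states used to infer the country from the address field.
INDIAN_PLACES = {"CHENNAI", "KARUR", "TAMIL NADU", "KARNATAKA"}

def clean_country(country, address):
    """Return a cleaned country value, or None if it cannot be recovered."""
    # Strip special characters and normalize to upper case, as in the master.
    value = "".join(ch for ch in country.upper()
                    if ch.isalnum() or ch.isspace()).strip()
    if value in COUNTRY_MASTER:
        return value
    # If an Indian city/town/state appears in the address, set country to India.
    if any(place in address.upper() for place in INDIAN_PLACES):
        return "INDIA"
    # Irrelevant data (not in the country master) is removed.
    return None

print(clean_country("india", ""))                               # INDIA
print(clean_country("Ind", "12 Main Road, Karur, Tamil Nadu"))  # INDIA
print(clean_country("Xyz", "somewhere"))                        # None
```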
Data Cleaning Algorithm
§ Got a CD from the Postal Department and created a separate master table of States, Cities and Pin codes for data cleansing
§ All irrelevant data in the state field, i.e. values not available in the state master, have been removed
§ If the name of an Indian city/town or Indian state is available in the address field, the corresponding state code has been corrected
§ Checked on the internet in case of any doubt in arriving at the state name
§ Contacted the customer's home branch in case of any doubt in arriving at the state name
§ The geographical spread of our branches was predominantly in Tamil Nadu, Andhra Pradesh, Karnataka, Kerala and major cities in the northern states when we started data cleaning. Our data cleaning team had people belonging to those states or with work experience in those geographical areas; with their help, data cleaning was done
§ Normalized the field by converting the values into upper case, as in the state master.

Data Validation Algorithm
§ Free text removed
§ Pick list provided at the data-entry stage to bring the value to a standard format. The branch first chooses the country code from the pick list; if the country chosen is India, the list of states is displayed, and the branch chooses the state code from the pick list
§ Cross-validation built across the fields.

Field: City
Problem: Free text containing special characters; no uniformity in names.

Data Cleaning Algorithm
§ Got a CD from the Postal Department and created a separate master table of States, Cities and Pin codes for data cleansing
§ If the name of an Indian city/town or Indian state is available in the address field, the corresponding city code has been fetched and updated in the city name
§ Contacted the customer's home branch in case of any doubt in arriving at the city name
§ The geographical spread of our branches was predominantly in Tamil Nadu, Andhra Pradesh, Karnataka, Kerala and major cities in the northern states when we started data cleaning. Our data cleaning team had people belonging to those states or with work experience in those geographical areas; with their help, data cleaning was done
§ Normalized the field by converting the values into upper case.

Data Validation Algorithm
§ Free text removed
§ Pick list provided at the data-entry stage to bring the value to a standard format. If the state chosen is Tamil Nadu, only the town names available in that state are displayed, and the branch chooses the appropriate city name from the pick list
§ Cross-validation built across the fields.

Field: Pin codes
Problem: Free text containing special characters; non-numeric characters found.

Data Cleaning Algorithm
§ Got a CD from the Postal Department and created a separate master table of States, Cities and Pin codes for data cleansing
§ After correcting the city, the pin code details are corrected by referring to the city name
§ Contacted the customer's home branch in case of any doubt in arriving at the pin codes
§ The geographical spread of our branches was predominantly in Tamil Nadu, Andhra Pradesh, Karnataka, Kerala and major cities in the northern states when we started data cleaning. Our data cleaning team had people belonging to those states or with work experience in those geographical areas; with their help, data cleaning was done
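The cascading pick-list validation described above (country determines the state list, state determines the city list) can be sketched roughly as follows. The master tables are tiny illustrative assumptions standing in for the Postal Department masters:

```python
# Sketch of the cascading pick-list validation described above: the branch
# first picks a country, then a state within it, then a city within that
# state. STATE_MASTER and CITY_MASTER are illustrative assumptions.
STATE_MASTER = {"INDIA": {"TAMIL NADU", "KARNATAKA"}}
CITY_MASTER = {"TAMIL NADU": {"CHENNAI", "KARUR"}, "KARNATAKA": {"BENGALURU"}}

def states_for(country):
    """Pick list of states shown once a country is chosen."""
    return sorted(STATE_MASTER.get(country, ()))

def cities_for(state):
    """Pick list of cities shown once a state is chosen."""
    return sorted(CITY_MASTER.get(state, ()))

def validate(country, state, city):
    """Cross-validation across the fields: each value must belong to its parent."""
    return state in STATE_MASTER.get(country, ()) and city in CITY_MASTER.get(state, ())

print(states_for("INDIA"))                       # ['KARNATAKA', 'TAMIL NADU']
print(validate("INDIA", "TAMIL NADU", "KARUR"))  # True
print(validate("INDIA", "KARNATAKA", "KARUR"))   # False
```

Because the entry form only ever offers values from the masters, free text never reaches the field, which is the point of the pick-list approach.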
§ Normalized the field by converting the values into upper case.

Data Validation Algorithm
§ Free text removed
§ Pick list provided at the data-entry stage to bring the value to a standard format. If the city chosen is Chennai, the first three digits of the pin code are displayed automatically and the branch has to type only the remaining three digits. In this way, we have restricted the scope for error to the last three digits
§ Cross-validation built across the fields.

Field: Prefix & Gender
Problem: Free text containing special characters

Data Cleaning Algorithm
By cross-verifying the gender against the prefix and the customer name, data cleaning has been done. For example, if the gender is male but the prefix is marked as Ms., then by reading the name and verifying the customer photo, either the prefix or the gender is corrected.

Data Validation Algorithm
Cross-validation between prefix and gender is built in at the data-entry level itself. If the prefix chosen is "Mr.", the gender cannot be anything other than male.

Field: Mobile Number
Problem: Free text containing special characters; non-numeric characters found.

Data Cleaning Algorithm
§ If the length of the mobile number was less than 10 digits, we concluded that it was a wrong number, and such numbers were removed.

Data Validation Algorithm
§ We adopted a parsing method at the data-entry stage. Parsing in data cleansing is performed to detect syntax errors. The data entered is validated against the allowed specifications, i.e. the length of the mobile number should be 12 digits including the country code, and it should be numeric
§ Repeated numbers are not accepted (e.g. 11111111111).

Field: Email ID
Problem: Free text containing special characters; non-numeric characters found.

Data Cleaning Algorithm
§ Special characters other than dots and the @ symbol are removed
§ Ensured there are no spaces in the mail ID
§ Checked that exactly one @ symbol and a dot are present.

Data Validation Algorithm
§ We adopted a parsing method at the data-entry stage. Parsing in data cleansing is performed to detect syntax errors. The data entered is validated against the allowed specifications, i.e. one @ symbol and a dot.

Field: PAN Number
Problem: Free text containing special characters and spaces

Data Cleaning Algorithm
§ We adopted a parsing method at the data-entry stage. Parsing in data cleansing is performed to detect syntax errors. The data entered is validated against the allowed specifications, i.e. the first five characters are letters, the next four characters are numeric,
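The mobile-number parsing rules above (12 digits including the country code, numeric only, and runs of a single repeated digit rejected) could be outlined as below; the function name and the separator handling are our assumptions, not the bank's software:

```python
import re

# Sketch of the mobile-number validation rules described above:
# 12 digits including the country code, numeric only, and a number made
# of one repeated digit (e.g. 11111111111) is rejected.
def valid_mobile(number):
    digits = re.sub(r"\D", "", number)  # drop spaces, dashes, other non-digits
    if len(digits) != 12:               # 12 digits including country code
        return False
    if len(set(digits)) == 1:           # repeated numbers not accepted
        return False
    return True

print(valid_mobile("91 90000 00009"))   # True  (12 digits with country code)
print(valid_mobile("9000000009"))       # False (only 10 digits)
print(valid_mobile("111111111111"))     # False (repeated digits)
```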
and the last character is a letter
§ We have also provided, in our in-house software, verification of the given PAN number against the NSDL site.

ICICI Bank

The primary reason for the bank to take up data cleaning activities was to maintain better communication with the customer. The need for better communication falls into the following four areas:
§ Product and service communication: to ensure that customers understand the product features and use them. This also includes transaction updates, service updates, etc.
§ Promotional communication: to make customers aware of the offers and services they qualify for as a result of holding the product.
§ Corporate communication: to communicate with customers on wider issues not directly related to the products held, such as updates on the bank, fraud education, credit-bureau-related education, etc.
§ Marketing communication: to promote products of the bank that might be of interest to the customer for his/her financial needs.

Since the ability to communicate with the bank is so important to the use of financial products, the bank focuses on various approaches to improve the contactability of its customers. The contactability problem is handled through the following three-pronged approach:
§ Contactability of new customers: The initial customer contact is extremely important for obtaining quality information on customer contact details. Data quality checks at this stage ensure that quality customer information is captured.
§ Contactability of existing customers: As customers may move addresses, change phone numbers, etc., ongoing maintenance of customer contactability information is important even when the initial data capture is of good quality. We have adopted a multi-channel approach to enhancing customer contactability. If there is a concern with the contactability of a customer (emails bouncing, incorrect phone numbers, incorrect address), the various customer-facing channels (branches, ATMs, call centres) are provided alerts to ensure that these details are captured when the customer interacts with any of these channels.
§ Contactability of delinquent customers: Specifically for delinquent customers, there is also a need to ensure that the bank looks at all available data sources to contact them. For this, de-duplication-based logic is run with the help of the credit bureaus, and any additional contact details thus identified are used for collections activity.

Role of Logic Algorithms in Identifying Unclean Data
It is possible to identify incorrect data through basic logic implemented in the data warehouse. Examples of such rules are: mobile numbers of fewer than 10 digits, PAN numbers not following the specified format, address and PIN code mismatches, etc. These rules help identify the customers whose contactability needs to be improved through the approaches mentioned above.

Initiatives Rolled Out
While contactability is a continuous process, the following initiatives taken at the bank in the last two years have further sharpened the process:
§ A synchronized multichannel contactability capability, to ensure that the right channel and the right message are used to obtain customer contact information
§ The availability of credit bureau data in the last few years, which has led to significant further improvements in contactability for the target customer base.
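A rough sketch of such warehouse rules is shown below. The record layout, flag names and regular expressions are illustrative assumptions, not the banks' implementations:

```python
import re

# Sketch of basic warehouse rules for flagging unclean records, as mentioned
# above: short mobile numbers, a PAN not following the five-letters/
# four-digits/one-letter pattern, and a malformed e-mail ID.
PAN_RE = re.compile(r"^[A-Z]{5}[0-9]{4}[A-Z]$")
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s.]+$")

def unclean_flags(record):
    """Return a list of rule names that the record fails."""
    flags = []
    if len(re.sub(r"\D", "", record.get("mobile", ""))) < 10:
        flags.append("mobile_too_short")
    if not PAN_RE.match(record.get("pan", "")):
        flags.append("pan_bad_format")
    if not EMAIL_RE.match(record.get("email", "")):
        flags.append("email_bad_format")
    return flags

rec = {"mobile": "12345", "pan": "BACGB0022C", "email": "xyz@email.com"}
print(unclean_flags(rec))  # ['mobile_too_short']
```

Records with a non-empty flag list would then be routed into the contactability-improvement approaches described above.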
Methodologies Adopted
Software tools are used to deal with data de-duplication and errors at the time of data migration. Missing or partial data is corrected either by contacting the customer or by comparison with internal databases. Over and above the controls built into the source system to ensure correct capture of data, standardization of data values is undertaken at regular intervals as per business needs. In such cases, a data profiling exercise is undertaken in which the values of the particular field are profiled; based on the outcome, decisions are taken on the standardization to be done, and the transformation is then carried out using programming logic or ETL tools. In the recent past, this has been done primarily for mobile numbers and the city/state/country fields.

Allahabad Bank

Allahabad Bank followed an outsourced model for dealing with data quality issues.

Rolling Out the Data Quality Improvement Initiative
The challenges were divided into three buckets, namely data level, system level and people level, as given in Figure 14.

Data Level: Evaluate the current state of data in terms of:
§ Completeness: identify missing values
§ Correctness and consistency: identify fields that were filled with default values
§ Structure of data stored in the master data.

System Level: Understand the organizational structure of data:
§ Understand technology aspects of the CBS
§ How data is spread across multiple systems
§ All data corrections were made in a data repository extracted from the CBS (a Data Mart), not in the live CBS.

People Level: Deal with aspects like:
§ Training people to identify the shortcomings in the existing data
§ Contacting customers to collect missing values
§ Establishing a monitoring system to monitor data quality.

[Figure 14: Data Quality Improvement. Data level: master data management; correctness of existing data; availability. System level: understand the existing CBS implementation; deal with multiple systems/sources of data. People level: contact customers to fill gaps in legal fields; train banking personnel; monitor and control data quality.]

Data Quality Improvement in SBI

In the case of the bank, the different branches migrated to CBS at different times. This led to a host of Data Quality (DQ) issues, as validation rules on many important fields or data elements were not present in the legacy systems, resulting in incomplete data for reporting and analytics. Reporting was therefore done with a great deal of manual intervention, leading to errors, delays and costs. Compounding this was the sheer number of customers and accounts which needed to be worked upon. The bank therefore took up a major initiative to improve data quality with technology support.

The bank has been using tools to profile data and generate data quality reports, which are shared with the respective business units. There are provisions for continuous data quality improvement, right from cleansing, enrichment and de-duplication to migration. The 'Project Ganga' initiative was taken up by the bank to address and improve data quality. Initially, the key data fields which impact regulatory and statutory reporting and customer contactability were taken up. The branches worked on the DQ reports and updated the fields in the Core Banking System, ensuring that the quality of data in both the source and the reporting and analytics systems is improved and in sync. The project has been implemented in a phased manner and has produced successful results in various areas. There has been improvement in the quality of data pertaining to credit risk, and customer demographic data is now being worked upon.
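The data-level evaluation described above (completeness and default-value checks) could be sketched as below. The records, field names and the list of known default values are illustrative assumptions; the 01-Jan-1800 placeholder echoes the migration default mentioned earlier in this chapter:

```python
# Sketch of data-level profiling: for each field, count missing values
# (completeness) and known placeholder defaults (correctness/consistency).
# KNOWN_DEFAULTS and the sample records are illustrative assumptions.
KNOWN_DEFAULTS = {"dob": "01-Jan-1800", "pin": "000000"}

def profile(records, fields):
    """Return per-field counts of missing and defaulted values."""
    report = {}
    for field in fields:
        missing = sum(1 for r in records if not r.get(field))
        defaulted = sum(1 for r in records
                        if r.get(field) == KNOWN_DEFAULTS.get(field))
        report[field] = {"missing": missing, "defaulted": defaulted}
    return report

records = [
    {"dob": "01-Jan-1800", "pin": "600001"},
    {"dob": "", "pin": "000000"},
]
print(profile(records, ["dob", "pin"]))
# {'dob': {'missing': 1, 'defaulted': 1}, 'pin': {'missing': 0, 'defaulted': 1}}
```

Reports like this are what the profiling tools mentioned above would share with business units so that branches can correct the flagged fields at source.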
Implementation

Figure 15 shows the solution diagram for the data quality improvement activity undertaken.

[Figure 15: Solution Approach Taken for Data Quality Improvement. CBS → extract data from tables → data quality audit / rules definition → tool to certify and populate missing data, data cleansing, de-duplication → updating of data → branches.]

The standardization and enrichment steps included:
§ PAN standardization
§ State name enrichment depending on city names
§ Email address standardization
§ Enrichment of salutation if it is part of the name
§ Removal of noise (spaces, commas, dashes, etc.)
§ Standardization of branch code, customer number, etc.
§ Records which could not be standardized are routed to a Business User console
§ Address parsing and standardization: identifies international address elements in partially fielded addresses and assigns them to the proper fields; performs formatting and standardization of elements to ensure consistent representation
§ Global address validation: performs matching of addresses against a reference database, with a deliverability assessment feature that classifies addresses according to their probable deliverability; validates individual address elements.

Data Quality Improvement in SBI (continued)

We also work closely with the Credit Information Companies (CICs) to constantly improve data quality and the acceptance rate of loan-related data by the CICs. In our case, owing to the above-mentioned initiatives, our acceptance rate is now among the highest for PSU banks, and this quantum jump has happened over the last one and a half years.

We have also taken up customer de-duplication and address standardization using tools to further improve data quality. This improvement in the quality, standardization and completeness of data has enabled the bank to fully leverage its investments in Data Warehouse and Business Intelligence & Statistical Modeling tools, by completing a large number of business intelligence and analytics projects in the areas of CRM, risk, pricing and profitability, etc. We believe that data quality improvement is not a one-time project but a continuous process.
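A simplified sketch of key-based customer de-duplication, in the spirit of the de-duplication work mentioned above; the normalization choices (upper-casing the name, keeping the last ten digits of the mobile number) are our assumptions, not any bank's actual matching logic:

```python
import re
from collections import defaultdict

# Sketch of customer de-duplication by normalized match key: records that
# normalize to the same (name, mobile) key are treated as candidate
# duplicates. The record layout is an illustrative assumption.
def match_key(record):
    name = re.sub(r"[^A-Z]", "", record["name"].upper())   # R. Kumar -> RKUMAR
    mobile = re.sub(r"\D", "", record["mobile"])[-10:]      # last 10 digits
    return (name, mobile)

def find_duplicates(records):
    """Return groups of record ids sharing the same match key."""
    groups = defaultdict(list)
    for r in records:
        groups[match_key(r)].append(r["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

records = [
    {"id": 1, "name": "R. Kumar", "mobile": "91 90000 00009"},
    {"id": 2, "name": "R Kumar",  "mobile": "9000000009"},
    {"id": 3, "name": "S. Devi",  "mobile": "9888877776"},
]
print(find_duplicates(records))  # [[1, 2]]
```

In practice, candidate groups from such a pass are usually reviewed (manually or with fuzzier matching) before records are merged.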
Action Plan for DQ Management

Based on the study conducted by the working group on data quality issues and the current activities undertaken by various banks in India, we recommend the following business action points to monitor and maintain the data quality of the existing customer records with the banks:

§ Banks may form a top-level DQ Management Council headed by a DGM (Data Quality)
  – Identify the broad business goals which data quality has to meet (refer pages 6 & 7)
  – Consult the Data Stewards to identify the data fields to be collected and maintained.
§ Establish and build a DQ Coordination Council responsible for continuous monitoring and improvement of customer records
  – The Chief Manager (Data Quality) is responsible for proactively monitoring customer information at regular intervals
  – Conduct audit runs at regular intervals to measure the contactability of customers
  – Identify gaps in customer information
  – Take remedial steps to fill the gaps
  – Adopt data enrichment policies to deal with gaps in legal and non-legal fields in customer data records
  – Improve data collection policies through training of operations and IT personnel, to improve the effectiveness of data quality activities.
§ Identify and adopt emerging technologies to reduce data errors at the time of data entry and data migration of customer records
  – Reduce human intervention at the time of customer record creation and data migration.
§ Appoint Data Stewards to manage and resolve data quality issues.
§ Establish a process for Data Quality Maintenance (DQM)
  – Process for purchase of IT solutions for DQM:
    · Tools should have features for data profiling, data cleansing, data enrichment and data integration
    · Tools should be backward compatible, providing wrappers to extract and cleanse data from the banks' currently existing data sources.
§ Establish processes and tools for metadata management
  – Tools for building a repository for the Business Vocabulary should support:
    · Data modeling and data integrity
    · Metadata discovery and a metadata repository
    · DQ profiling and monitoring
    · Data cleansing and matching
    · Data integration.
§ Establish a training process for staff
  – Banks need to place qualified personnel in DQ jobs; train or recruit as required, and help develop expertise
  – Regular user and operator awareness needs to become part of the culture, to maintain the mindset that avoids poor data.
§ Generate quarterly audit reports on data quality and the costs incurred in maintaining it
  – Data Stewards to generate data quality reports at various levels of the organization, such as zone-wise and branch-wise, and submit them to the Chief Manager (Data Quality)
  – The Chief Manager (Data Quality) to identify the best branch and zone in terms of data quality and provide incentives for the same, with approval from the GM (DQ)
  – Identify zones and branches that lag in data quality and conduct a root-cause analysis
  – Provide the necessary resources and training programmes to improve data quality in the zones and branches that lag.
§ Identify and deploy metrics to measure data quality continuously.
§ Incentivise and celebrate good data quality practices.
§ Define accountability for data quality between front office, mid office and back office.
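One possible shape for a branch-wise data quality metric, as recommended above, is the share of records in each branch that pass all quality rules. The records, field names and rule below are illustrative assumptions:

```python
# Sketch of a simple branch-wise data quality metric: the percentage of
# records per branch that pass all quality rules. Records and the sample
# rule are illustrative assumptions.
def branch_scores(records, passes):
    """Return, per branch, the percentage of records satisfying `passes`."""
    totals, ok = {}, {}
    for r in records:
        b = r["branch"]
        totals[b] = totals.get(b, 0) + 1
        ok[b] = ok.get(b, 0) + (1 if passes(r) else 0)
    return {b: round(100.0 * ok[b] / totals[b], 1) for b in totals}

records = [
    {"branch": "Karur",   "pan_ok": True,  "mobile_ok": True},
    {"branch": "Karur",   "pan_ok": False, "mobile_ok": True},
    {"branch": "Chennai", "pan_ok": True,  "mobile_ok": True},
]
rule = lambda r: r["pan_ok"] and r["mobile_ok"]
print(branch_scores(records, rule))  # {'Karur': 50.0, 'Chennai': 100.0}
```

Scores of this kind would feed the zone-wise and branch-wise reports, incentives and root-cause analyses described in the action plan.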
Mentor
Shri B. Sambamurthy, Director, IDRBT
Contributors
§ Dr. N. Raghu Kisore, Assistant Professor, IDRBT
Acknowledgements
§ Mr. Ajay Kapoor, HDFC Bank