
US Insurance Company - Informatica Data Quality

TCS Digital Enterprise – Analytics, Big Data and Information Management


PLEASE CONTACT THE ACCOUNT OWNER BEFORE USING CUSTOMER INFORMATION AND REFERENCES.

Copyright © 2013 Tata Consultancy Services Limited



US Insurance Company - Data Quality
Insurance client

Customer Overview
 US based Fortune 500 insurance company.
 The company has three segments: Retirement & Protection, US Mortgage Insurance, and International.

Business Case
 The data quality process consists of profiling the data to discover inconsistencies and other anomalies, as well as performing data cleansing activities (e.g. removing outliers, handling missing data) to improve data quality. Broadly, it comprises three steps: profile, analyze and cleanse.
 Data quality is measured across various DQ factors: 1) Duplicate Data (Uniqueness) 2) Stale Data (Timeliness) 3) Incomplete Data (Completeness) 4) Invalid Data (Validity) 5) Conflicting Data (Consistency) 6) Incorrect Data (Accuracy).

Challenges
 Data quality measures are in place to ensure customer satisfaction and product/service effectiveness. Challenges that were faced:
 Client was not aware of the quality of its data.
 Not a DQ-focused approach.
 Complications in business rules.
 Challenge in loading data into the target due to millions of records.
 Insertion of records from Informatica into Greenplum was time-consuming and the process was complicated.
TCS’ Solution
Designed a framework for data quality:
 In the Informatica Developer tool, designed Logical Data Objects (LDOs). LDOs are virtual mappings that allow filters to be applied and can be reused in multiple profiles where the LDO is the source object.
 The Informatica Analyst tool is used to perform the data integration (profiling) task on the LDO.
 Define business logic that makes data available in a virtual table.
 Analyze the structure and quality of data.
 Create a single view of data.
 A data quality dashboard is published at the end to verify the data quality index.

Benefits
 Business can drill down and filter to see more details as needed.
 PII-sensitive data is still protected, while the overall data quality is available for analysis.
 Development time is drastically reduced: there is no need to call mapplets for each and every column.
 Reusability of existing mappings: a new mapping can be developed with very few changes to an existing one.
 Tracking of bad records is easy, since each column value is available along with its DQ analysis indicators.
 Better use of IDQ product capabilities and functionality.
Technology Stack
 Informatica Developer 9.6.1, Informatica Analyst tool, Greenplum, Spotfire reporting tool, WinSCP, PuTTY, SQL Developer.


US Insurance Company - Data Quality
Business Case
 Data profiling is the process of examining the data available in an existing information source (e.g. a database or a file) and collecting statistics or small but informative summaries about that data. Types of analysis performed:
 Completeness Analysis: How often is a given attribute populated, versus blank or null?
 Uniqueness Analysis: How many unique (distinct) values are found for a given attribute across all records? Are there duplicates? Should there be?
 Range Analysis: What are the minimum, maximum, average and median values found for a given attribute?
 Pattern Analysis: What formats were found for a given attribute, and what is the distribution of records across these formats?
 Values Distribution Analysis: What is the distribution of records across different values for a given attribute?
 Data from source systems are gathered for reporting and analysis. This helps to identify data quality issues that must be corrected in the source systems, and it helps data governance improve data quality. Quality benchmarking is done on the basis of Completeness, Accuracy, Validity, Consistency, Uniqueness and Leading/Trailing Spaces analysis.
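The analyses above can also be expressed directly in SQL. Below is a minimal, illustrative sketch in Greenplum/PostgreSQL-style SQL against a hypothetical policy_holder table (all table and column names are assumptions for illustration, not the client's actual schema):

-- Completeness: how often is email populated, versus blank or null?
SELECT COUNT(*) AS total_rows,
       SUM(CASE WHEN email IS NULL OR email = '' THEN 1 ELSE 0 END) AS missing_email
FROM policy_holder;

-- Uniqueness: distinct values vs. total rows for policy_id (duplicates exist if they differ).
SELECT COUNT(*) AS total_rows,
       COUNT(DISTINCT policy_id) AS distinct_policy_ids
FROM policy_holder;

-- Range: minimum, maximum, average and median of premium_amount.
SELECT MIN(premium_amount) AS min_premium,
       MAX(premium_amount) AS max_premium,
       AVG(premium_amount) AS avg_premium,
       PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY premium_amount) AS median_premium
FROM policy_holder;

-- Pattern: distribution of formats for zip_code, with every digit masked as '9'.
SELECT regexp_replace(zip_code, '[0-9]', '9', 'g') AS zip_pattern,
       COUNT(*) AS record_count
FROM policy_holder
GROUP BY 1
ORDER BY record_count DESC;

-- Values distribution: spread of records across state_code values.
SELECT state_code, COUNT(*) AS record_count
FROM policy_holder
GROUP BY state_code
ORDER BY record_count DESC;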
Solution Approach
Designed a framework for data quality:
 In the Informatica Developer tool, designed Logical Data Objects (LDOs). LDOs are virtual mappings that allow filters to be applied and can be reused in multiple profiles where the LDO is the source object.
 Data quality rules are applied in the LDO, and the same LDO is used as the source object in the mapping.
 The Informatica Analyst tool is used to analyze, profile and score data on the LDO without loading data to a physical target. This is made possible by the virtual database.
 A virtual table can have a virtual mapping that defines the data flow between the sources and the virtual table. A virtual table can be created manually, or from a physical or logical data object.
 The LDO is used in a mapping that generates a flat file. Data from the flat file is read by an external table in Greenplum, as sketched below.
 A summary table is created using the external table, which gives the data governance team a consolidated count of error records per table in the reporting tool.
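A minimal sketch of the Greenplum side of this flow, assuming an illustrative flat-file layout, a gpfdist location, and hypothetical object names (ext_dq_results, dq_summary):

-- External table over the flat file produced by the Informatica mapping,
-- served via gpfdist (host, port and file name are assumptions).
CREATE EXTERNAL TABLE ext_dq_results (
    table_name   text,
    column_name  text,
    column_value text,
    dq_indicator text,   -- e.g. COMPLETENESS / VALIDITY / UNIQUENESS
    is_error     int
)
LOCATION ('gpfdist://etl-host:8081/dq_results.txt')
FORMAT 'TEXT' (DELIMITER '|' NULL '');

-- Consolidated error counts per table, loaded into the summary table
-- read by the reporting tool.
INSERT INTO dq_summary (table_name, dq_indicator, error_count, load_date)
SELECT table_name, dq_indicator, SUM(is_error), CURRENT_DATE
FROM ext_dq_results
GROUP BY table_name, dq_indicator;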


Challenges
 Not a DQ-focused approach.
 Challenges while designing a framework for the LDO (Logical Data Object) so that all the analyses, such as Completeness, Null analysis, Range analysis and Pattern analysis, can be performed.
 Lack of defined business logic rules and null-override rules.
 No data dictionary available for data quality.
 Undefined dimensions/categories of data quality.
 Large data volumes.
 How to control and improve data quality.
 Challenge in showing the statistics of data in the summary table. The summary table gives details such as row_count, record_count (no. of rows * no. of columns) and total_issue_count, as sketched below.
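A sketch of how such summary statistics could be assembled in SQL, assuming hypothetical dq_table_stats (per-table row and column counts) and dq_error_detail (one row per detected issue) tables:

-- Summary per table: row_count, record_count (rows * columns)
-- and total_issue_count (all object names are illustrative).
SELECT t.table_name,
       t.row_count,
       t.row_count * t.column_count AS record_count,
       COALESCE(e.total_issue_count, 0) AS total_issue_count
FROM dq_table_stats t
LEFT JOIN (SELECT table_name, COUNT(*) AS total_issue_count
           FROM dq_error_detail
           GROUP BY table_name) e
       ON e.table_name = t.table_name;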


Functional Coverage
 Data quality is measured across various DQ factors. A metric corresponding to each DQ factor is calculated to arrive at the final DQ factor percentage, as illustrated below.
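As an illustration only (the actual metric definitions were client-specific), a per-factor percentage and an overall index could be computed as follows, assuming a hypothetical dq_factor_scores table holding checked and failed counts per DQ factor:

-- Per-factor score: percentage of checked values that passed the factor's rules.
SELECT dq_factor,
       ROUND(100.0 * (checked_count - failed_count) / NULLIF(checked_count, 0), 2) AS pct_score
FROM dq_factor_scores
WHERE table_name = 'policy_holder';

-- Final DQ index: average of the factor percentages for the table.
SELECT ROUND(AVG(100.0 * (checked_count - failed_count) / NULLIF(checked_count, 0)), 2) AS dq_index
FROM dq_factor_scores
WHERE table_name = 'policy_holder';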

[Diagram: functional understanding for the data quality project]



Attributes for Logical Data Objects



Logical Architecture



Physical Architecture



Sample Screenshots (esp. Reports)



Benefits

 Business can drill down and filter to see more details as needed.
 A single tool connects to multiple data sources and brings data onto a single platform.
 Specify, validate, configure and test data quality rules.
 All required statistics of a table are available on a single screen.
 Easy to test specific business rules.
 Quality benchmarking quantifies quality and builds trust in the data.
 Real-time data quality processes ensure that high-quality data feeds accurate business intelligence. A unified approach to data quality brings reliability, scalability and flexibility to managing data in the enterprise.
 Better-informed, more reliable decisions come from using the right data quality technology while loading a data warehouse.
 Operate proactively and efficiently.
 No code development.
 Quick results.


Thank You

Copyright © 2013 Tata Consultancy Services Limited

