Seeking Data Quality

Using Agile Methods to Test a Data Warehouse

© Copyright Ideaca 2008

Agenda – Seeking Data Quality
• Data Warehouse Overview
• The Value of a Data Warehouse
• Agile as Business Value Driver
• Test Strategy
• Test Techniques
• Test Results
• Conclusions


What is a Data Warehouse?
• A non-transactional data repository
• Integrates data from multiple sources
• Organized around relevant subjects
• Queryable by business users
• Used for reporting
• Used for analysis

The Structure of a Data Warehouse
• Kimball’s Star Schema
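As an illustration of the star layout named on this slide, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (fact_sales, dim_date, dim_product) and the sample rows are hypothetical and do not come from the project described in this deck.

```python
import sqlite3

# Hypothetical star: one fact table surrounded by two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, calendar_date TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    amount      REAL
);
INSERT INTO dim_date    VALUES (1, '2008-01-01'), (2, '2008-01-02');
INSERT INTO dim_product VALUES (10, 'Widget'), (20, 'Gadget');
INSERT INTO fact_sales  VALUES (1, 10, 100.0), (1, 20, 50.0), (2, 10, 25.0);
""")

def sales_by_product(conn):
    """Aggregate the fact by an attribute of one of its dimensions."""
    rows = conn.execute("""
        SELECT p.product_name, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_product p ON p.product_key = f.product_key
        GROUP BY p.product_name
        ORDER BY p.product_name
    """).fetchall()
    return dict(rows)

totals = sales_by_product(conn)
```

A business user's report is typically a query of this shape: the fact supplies the measure and the dimensions supply the grouping attributes.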

The Flow of Data
• Typical data flow

The Value of a Data Warehouse
• To provide information that will help people make better choices
• This information is a solution to the problem of making choices in a complex environment
• The benefit of the information is that it reduces risk by providing an accurate representation of the state of the world
• This comes at the cost of building and maintaining the data warehouse now and into the future

Data Value Drivers
• Our research led us to these value drivers:
  • The more accurate the data is, the more useful it is, and therefore the more valuable it is
  • The value of data increases when combined with other data
  • The value of data increases with its use; in fact, it only has value when people use it
• Focus on high risk problems using limited resources
• Emphasis on Data Quality
  • Relevance
  • Completeness
  • Correctness
  • Consistency

Agile Principles as Guides
• Testing is a process of investigation and evaluation
• Customer involved in deciding test relevance
• Customer involved in deciding test priority
• Communication of test goals and approach
• Simple and lightweight test “scripts”
• Avoid effort on low value tasks

Test Strategy Outline
• Data Warehouse Test Targets
  • Stars are the business view of a data warehouse
  • Stars are composed of a Fact and its Dimensions
  • Fact and Dimension tables are loaded through ETLs
• Each target had a similar test approach
• The test backlog was a prioritized list of these tests
• Detailed test scripts are expensive to produce
• Our “scripts” outlined a guided exploration
• Progress could be measured through a burndown chart
• Regulatory requirements needed to be met
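The backlog and burndown idea above can be sketched as follows. This is a minimal illustration, not the team's actual tooling: the BacklogItem class, the target names, and the priorities are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class BacklogItem:
    name: str          # a test target: a star, fact, dimension, or ETL (hypothetical)
    priority: int      # 1 = highest value, as agreed with the customer
    done: bool = False

def prioritized(backlog):
    """Return the backlog ordered by priority, highest value first."""
    return sorted(backlog, key=lambda item: item.priority)

def remaining(backlog):
    """Count the test targets that have not been tested yet."""
    return sum(1 for item in backlog if not item.done)

backlog = [
    BacklogItem("dim_customer", 2),
    BacklogItem("fact_sales", 1),
    BacklogItem("etl_load_sales", 3),
]

# One burndown point before testing starts, then one per completed target.
burndown = [remaining(backlog)]
for item in prioritized(backlog):
    item.done = True
    burndown.append(remaining(backlog))
```

Plotting the burndown values against time gives the progress chart. Because each target gets a similar test approach, counting remaining targets is a reasonable unit of progress.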

Business View of a Data Warehouse
• Testing progress reported on the basis of stars

Tests
• We tested for completeness
  • No missing records
  • No missing fields
• We tested for correctness
  • Correct keys
  • Correct calculations
  • Correct aggregations
  • Correct data type/size
• We tested for consistency
  • Consistent aggregations
  • Consistent calculations
  • Consistent data type/size
  • Consistent granularity
  • Consistent business rules
  • Consistent use of nulls and defaults
  • Consistent formatting
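A few of the checks listed above can be sketched as SQL queries run from Python's sqlite3 module. The tables (src_orders, dw_orders), their columns, and the sample rows are hypothetical; the sketch assumes the source and warehouse tables share a key, which may not hold in a real load.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_orders (order_id INTEGER, customer TEXT, amount REAL);
CREATE TABLE dw_orders  (order_id INTEGER, customer TEXT, amount REAL);
INSERT INTO src_orders VALUES (1, 'A', 10.0), (2, 'B', 20.0), (3, 'C', 30.0);
-- Simulated faulty load: order 3 was dropped, order 2 lost its customer.
INSERT INTO dw_orders  VALUES (1, 'A', 10.0), (2, NULL, 20.0);
""")

def missing_records(conn):
    """Completeness: source rows with no matching warehouse row."""
    rows = conn.execute("""
        SELECT s.order_id
        FROM src_orders s
        LEFT JOIN dw_orders d ON d.order_id = s.order_id
        WHERE d.order_id IS NULL
        ORDER BY s.order_id
    """).fetchall()
    return [r[0] for r in rows]

def missing_fields(conn):
    """Completeness: warehouse rows whose customer field is unpopulated."""
    return conn.execute(
        "SELECT COUNT(*) FROM dw_orders WHERE customer IS NULL"
    ).fetchone()[0]

def totals_match(conn):
    """Correctness: the warehouse total must reconcile with the source total."""
    src = conn.execute("SELECT SUM(amount) FROM src_orders").fetchone()[0]
    dw = conn.execute("SELECT SUM(amount) FROM dw_orders").fetchone()[0]
    return src == dw
```

With the sample rows, all three checks report a problem. Consistency checks follow the same pattern, comparing the same measure or format across two tables rather than source against target.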

Test Points
• Test every ETL, Fact, and Dimension
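One check that applies at the boundary between a fact and a dimension is a key test: every key in the fact should resolve to a dimension row. The sketch below assumes hypothetical tables fact_sales and dim_product; it shows one possible check at this test point, not the full set the team ran.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE fact_sales  (product_key INTEGER, amount REAL);
INSERT INTO dim_product VALUES (10, 'Widget');
-- Key 99 has no dimension row, so it is an orphan.
INSERT INTO fact_sales  VALUES (10, 100.0), (99, 5.0);
""")

def orphan_fact_keys(conn):
    """Return fact keys that do not match any row in the dimension."""
    rows = conn.execute("""
        SELECT DISTINCT f.product_key
        FROM fact_sales f
        LEFT JOIN dim_product p ON p.product_key = f.product_key
        WHERE p.product_key IS NULL
        ORDER BY f.product_key
    """).fetchall()
    return [r[0] for r in rows]

orphans = orphan_fact_keys(conn)
```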

Test Results
• Greater than 99.99995% data accuracy
• Testing less than 20% of development effort
• Common “scripts”, common understanding
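To put the accuracy figure in concrete terms, the arithmetic can be sketched as below. The deck does not say how accuracy was counted; this sketch assumes it is the share of defect-free data values, and the ten million values used in the example are hypothetical.

```python
from fractions import Fraction

def accuracy_percent(total_values, defective_values):
    """Share of defect-free values, as an exact percentage."""
    return Fraction(total_values - defective_values, total_values) * 100

# Threshold reported on the slide, held exactly to avoid float rounding.
threshold = Fraction("99.99995")

# Hypothetical volume: ten million data values.
with_four_defects = accuracy_percent(10_000_000, 4)
with_five_defects = accuracy_percent(10_000_000, 5)

beats_threshold = with_four_defects > threshold   # strictly greater than 99.99995%
equals_threshold = with_five_defects == threshold  # exactly 99.99995%, not greater
```

Under that assumption, "greater than 99.99995%" means fewer than 5 defective values in every 10 million.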

Root Cause Analysis
Defects classified by root cause

Cause                           Defect %
Development Standards Issues    23%
Implementation Errors           22%
ETL Errors                      21%
Database Issues                 13%
Design Issues                   9%
Other Issues                    12%
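The percentages above can be ranked and accumulated to see how much of the defect load the leading causes explain. This is a small sketch over the figures on this slide; the cumulative view is an added illustration, not an analysis taken from the deck.

```python
# Defect share by root cause, in percent, as reported on this slide.
defect_share = {
    "Development Standards Issues": 23,
    "Implementation Errors": 22,
    "ETL Errors": 21,
    "Database Issues": 13,
    "Design Issues": 9,
    "Other Issues": 12,
}

def cumulative(shares):
    """Rank causes by share and attach a running total to each."""
    ranked = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)
    running, out = 0, []
    for cause, pct in ranked:
        running += pct
        out.append((cause, pct, running))
    return out

table = cumulative(defect_share)
top_three_share = table[2][2]  # running total after the three largest causes
```

The three largest causes (development standards, implementation, and ETL) account for 66% of the classified defects.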

Defect Root Causes
Cause and cause breakdown
• Development standards issues
  • Naming conventions
  • Design standards
  • Documentation standards
  • Metadata
• Implementation errors
  • Primary/foreign key problems
  • Inconsistent field lengths
  • Field types
  • Bad data
  • Missing data
• ETL errors
  • Counts off
  • Totals off
  • Failed calculations
  • Failed conversions
  • Unpopulated fields

Defect Root Causes, continued
Cause and cause breakdown
• Database errors
  • Performance
  • Indexes
  • Partitions
  • Tablespace
• Design issues
  • Missing fields
  • Extra fields
  • Missing dimensions
  • Mapping problems
• All other issues
  • Miscellaneous

Conclusions
• A value-based approach focused our test efforts on finding the more serious problems sooner
• Applying agile principles allowed us to minimize wasted time and effort
• Testing identified the development process changes that had the greatest impact on data quality
• New regulatory requirements mean that the ability to test is now a design issue

Summary – Contrasting Test Styles

Old Approach                                          New Approach
Focus on tool – database, data warehouse              Focus on value – data usage in business context
Focus on process – tables, views, stored procedures   Focus on outcome – stars/dimensions/facts
Test plans                                            Test backlogs
Test cases                                            Test targets
Detailed scripts for instructions                     Light scripts as guides for exploration
No special emphasis on team communication             Team communication is vital
