Data Quality Management: Methods and Tools

A Seminar Report submitted in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in Computer Engineering

by
Mr. Tage Nobin

Under the guidance of
Prof. L.D. Netak

DEPARTMENT OF COMPUTER ENGINEERING
DR. BABASAHEB AMBEDKAR TECHNOLOGICAL UNIVERSITY
Lonere-402 103, Tal. Mangaon, Dist. Raigad (MS), INDIA
April 2010
The seminar report entitled Data Quality Management: Methods and Tools submitted by Mr. Tage Nobin (20070653) is approved for the partial fulfillment of the requirement for the award of the degree of Bachelor of Technology in Computer Engineering.
Prof. L.D. Netak
Guide
Dept. of Computer Engineering

External Examiner(s)
1. (Name:
2. (Name:
Prof. L.D. Netak
Head
Dept. of Computer Engineering
Place: Dr. Babasaheb Ambedkar Technological University, Lonere.
Date: 15/05/2010
Acknowledgement

This seminar report is the result of the intense effort of many people, whom I need to thank for making it a reality. I thus express my deep regards to all those who have offered their assistance and suggestions. I am grateful to my seminar guide, Prof. L.D. Netak, for making this work possible. No word of thanks is enough for his mentorship, guidance, support and patience. I must acknowledge the freedom he gave me in pursuing topics that I found interesting. His resourcefulness, influence and keen scientific intuition were also vital to the progress of this work, and for these I am deeply thankful. Finally, I would like to thank all those whose direct and indirect support helped me complete the seminar in time.
Mr. Tage Nobin (20070653)
Abstract

Defective data is one of the most serious problems in the data world. Business success is becoming ever more dependent on the accuracy and integrity of mission-critical data resources. As data volume increases, the question of internal consistency within data becomes paramount, regardless of its fitness for use for any external purpose. Different methods and tools are used to maintain data quality, depending on the conditions and the situation. This report describes the major data quality problems, requirements and common strategies for managing data quality in data-related systems. It also explains the importance of data quality management, with a special focus on its management issues and on the various methods and tools that can be applied comprehensively in the management of data quality.
Contents

1 Introduction
2 What is Data Quality Management
3 Data Quality Definition (Rules and Targets)
3.1 Importance of Data Quality
3.2 Data Quality Attributes
4 Data Quality Management Challenges
5 Design Quality Improvement Process
5.1 Data Quality Management Objective
6 Implement Quality Improvement Process (Methods and Tools)
6.1 Methods
6.1.1 Data Profiling
6.1.2 Data Cleansing
6.1.3 Data Augmentation
6.1.4 Data Integration
6.2 Tools
6.2.1 Data Auditing Tools
6.2.2 Data Cleansing Tools
6.2.3 Data Migration Tools
7 Basic Tools of Data Quality
8 Monitor Data Quality
8.1 Monitoring System
List of Figures
3.1 Data Quality Attributes
4.1 Data Flow
5.1 Radial Cycle of Data Quality Process
6.1 Data Quality Methods
8.1 Monitor System
Chapter 1 Introduction
SIX HUNDRED BILLION DOLLARS ANNUALLY! Yes, that is what poor data quality costs American businesses, according to the Data Warehousing Institute. What about the whole world? Ensuring a high level of data quality is one of the most expensive and time-consuming tasks in data warehousing projects. Data quality management is the field in which all kinds of data, raw or processed, are managed. It is a major field that has yet to be taken up in a serious way. The main topics covered are as follows:
1. What is Data Quality Management?
This is a key first step, as understanding the up-front level of data quality will form the foundation of the domain rules and processes that are put in place. Without an upfront assessment, the ability to effectively implement a data quality strategy will be negatively impacted. From an ongoing perspective, data quality management allows an organization to see how the data quality procedures put in place have improved the quality of the data.
2. Data Quality Definition (Rules and Targets)
Once the initial data quality assessment is complete, the second part of the process involves defining what exactly we mean by data quality. From an ongoing perspective, this phase involves describing the attributes of quality data, which makes the whole process much easier, and performing trend analyses on the data and the rules in place to ensure that the data rules are adhered to and the target stays in focus.

3. Data Quality Management Challenges
Deploying a data quality management program is not easy; there are significant challenges that must be overcome. Herein we discuss the various problems and challenges that may act as obstacles on the way to quality data. Whether of a serious or a mild nature, these challenges need to be overcome in order to develop quality data.

4. Design Quality Improvement Processes
This phase involves designing the data quality management process architecture. With various approaches in the field, we need to choose the way of managing the process that yields data of the finest quality.

5. Implement Quality Improvement Processes (Methods and Tools)
Once the design has been standardized, the next phase of the enhancement process involves the actual implementation of the various methods and tools designed. Since data quality improvement is an iterative process, the rules to manage and regulate it will come in handy.

6. Monitor Data Quality
The ability to monitor the data quality processes is critical, as it provides the organization with a quick snapshot of the health of its data. Through analysis of data quality scorecard results, we have the information needed to confidently make additional modifications to the data quality strategies in place if needed. Conversely, the scorecards and trend analysis results can also confirm that data quality is being effectively addressed within the organization.
Chapter 2 What is Data Quality Management
First of all, we will deal with what exactly 'Data', 'Quality' and 'Management' are. The Oxford dictionary defines them as follows:
• Data - "a collection of facts from which conclusions may be drawn"
• Quality - "the standard of how good something is, measured against other similar things". In the context of data, quality may be defined as "fitness for use".
• Management - "the action of managing something".
In a broad spectrum, data quality management entails the establishment and deployment of roles, responsibilities, policies, and procedures concerning the acquisition, maintenance, dissemination, and disposition of data. The precise definition depends upon the organization. For successful data quality management, the solution must include techniques, processes, methods and tools. The data quality management lifecycle must be clearly defined using continuous as well as iterative frameworks. Data quality must be designed into systems using proven engineering principles, yet it is too often left to chance or given only superficial attention in the design of information systems. While good engineering principles are sometimes applied to software development, data quality is usually left up to the end user. Applying engineering principles to data quality involves understanding the factors that affect the creation and maintenance of quality data. It is helpful to look at data as the output of a data manufacturing process.
Chapter 3 Data Quality Definition (Rules and Targets)
3.1 Importance of Data Quality
First off, data is an essential component of most of today's business processes. In customer-facing functions, it is the foundation for managing customer relationships. Without good data quality, it is difficult to get accurate report metrics, and users' time and effort are wasted. Bad data also incurs costs. One of the biggest risks of bad data quality is that it ultimately inhibits adoption, as users get frustrated and lose trust in the data. We need to recognize that it isn't quality that is expensive; it is the cost of "unquality". Examples of the cost of "unquality" include the cost of sending duplicate promotional materials because customers are duplicated in the database, the opportunity cost of not sending materials to the right customers because the data used to segment customers is flawed, the opportunity cost of not shipping products to a customer because of inaccurate information about inventory levels, and the time spent finding and reconciling data needed to make effective decisions.

Data quality is evidenced by valid and reliable data; therefore, planning in the early stages around a clear concept of the need and definition is well worth the investment of time and resources. Data in a database has no actual value (or even quality); it only has potential value. Data has realized value only when someone uses it to do something useful. As mentioned earlier, there is no fixed global definition of high-quality data. Data quality does not restrict itself to a particular concept; instead it differs as the domain and application change. Whatever the domain, there are certain parameters that must be satisfied in order to make data truly quality data. Contrary to popular belief, quality is not necessarily zero defects. Quality is conformance to valid requirements.
3.2 Data Quality Attributes
To be fit for use, data products must possess all three attributes of quality:

1) Utility - refers to the usefulness of the information for its intended users.
2) Objectivity - refers to whether information is accurate, reliable, and unbiased, and is presented in an accurate, clear, and unbiased manner.
3) Integrity - refers to the security or protection of information from unauthorized access or revision.

All three attributes may be further defined in terms of seven dimensions of data quality:

Relevance - refers to the degree to which our data products provide information that meets the customer's needs.
Accuracy - refers to the difference between an estimate of a parameter and its true value.
Timeliness - refers to the length of time between the reference period of the information and when we deliver the data product to customers.
Accessibility - refers to the ease with which customers can identify, obtain, and use the information in data products.
Interpretability - refers to the availability of documentation to aid customers in understanding and using our data products. This documentation typically includes the underlying concepts; definitions; the methods used to collect, process, and analyze the data; and the limitations imposed by the methods used.
Transparency - refers to providing documentation about the assumptions, methods, and limitations of a data product to allow qualified third parties to reproduce the information, unless prevented by confidentiality or other legal constraints.
Completeness - refers to the degree to which values are present in the attributes that require them.
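Several of these dimensions lend themselves to direct measurement. The following minimal Python sketch computes completeness and timeliness for a hypothetical record layout; the field names, dates and freshness threshold are illustrative assumptions, not part of any standard.

# A minimal sketch of measuring two data quality dimensions,
# completeness and timeliness, over a list of records.
from datetime import date

records = [
    {"customer_id": 1, "email": "a@example.com", "updated": date(2010, 3, 1)},
    {"customer_id": 2, "email": None,            "updated": date(2009, 1, 15)},
    {"customer_id": 3, "email": "c@example.com", "updated": None},
]

def completeness(records, field):
    """Fraction of records in which the required field is present."""
    present = sum(1 for r in records if r.get(field) is not None)
    return present / len(records)

def timeliness(records, field, as_of, max_age_days):
    """Fraction of records whose timestamp is recent enough to be useful."""
    fresh = sum(1 for r in records
                if r.get(field) is not None
                and (as_of - r[field]).days <= max_age_days)
    return fresh / len(records)

print("email completeness:", completeness(records, "email"))      # 0.67
print("update timeliness :", timeliness(records, "updated",
                                        date(2010, 4, 1), 365))    # 0.33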
Figure 3.1: Data Quality Attributes
Chapter 4 Data Quality Management Challenges
Data is impacted by numerous processes, most of which affect its quality to some degree. It is imperative that the issue of data quality be addressed if the data warehouse is to prove beneficial to an organization. The information in the data warehouse has the potential to be used by an organization to generate a greater understanding of its customers, processes, and the organization itself. The potential to increase the usefulness of data by combining it with other data sources is great. But if the underlying data is not accurate, any relationships found in the data warehouse will be misleading. As Wyatt Earp said, "Fast is fine, but accuracy is everything." Resolving data quality problems is often the biggest effort in a data mining study; 50-80 percent of the time in data mining projects is spent on data quality. Just because data is in the computer does not mean it is right.
Figure 4.1: Data Flow

Data/information is not static; it flows between the data collection and usage processes. The main problem is that quality issues can and do arise at all of these stages, which creates the need for continuous end-to-end monitoring. This indeed becomes a herculean task. There are various factors that influence data quality:

1. Data Control
2. Data Age
3. Data Types
4. Device Availability
5. Data Structure
6. Read/Write Management
7. Communication Timing
These factors matter a great deal to the development of quality data. An error in as little as 1 percent of the data may impact findings and results. Numerous problems arise in the wake of data quality; some are small in nature and some big. Whatever their nature, ignoring any such matter may prove to be a costly affair. Some of them are listed below:
- Much of the raw data is of poor quality, because of incorrect data gathering and data operations. This leads to inaccurate assessment of the data.
- As a result, the data is costly to diagnose and assess.
- Consequently, the data becomes costly to repair.
- Many of the costs involved are hidden and hard to quantify, which makes assessment a tough task.
- Data is inconsistent between different systems. Since data flows between different systems, any obstacle to its smooth transition may lead to total data failure.
- Most of the attributes of quality data are extremely difficult, sometimes impossible, to measure.
- The attributes are vague in nature; conventional definitions provide no guidance toward practical improvement of the data.
- The priority of metadata is undermined, although setting standards for metadata is very important.
- Data quality management requires cross-functional cooperation.
- It is perceived to be extremely manpower-intensive.
- There are various other systematic errors that can be attributed to a lack of resources and skills.
Chapter 5 Design Quality Improvement Process
The quality of any statistics disseminated by an agency depends on two aspects: the quality of the statistics received, and the quality of the internal processes for the collection, processing, analysis and dissemination of data and metadata.
5.1 Data Quality Management Objective
Typical objectives of a data quality management program include the following (a sketch of how such rules might be enforced follows the list):

• Eliminate redundant data
• Validate data at input
• Eliminate false null values
• Ensure data values fall within defined domains
• Resolve conflicts in data
• Ensure proper definition and use of data values
• Establish and apply standards
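As a rough illustration of how such objectives translate into executable rules, the following Python sketch checks incoming records against a defined domain, false null values, and redundant identifiers. The field names, domain values and rules are hypothetical assumptions.

# A minimal sketch of enforcing data quality objectives at input time.
VALID_STATUS = {"active", "inactive", "pending"}   # defined domain

def validate(record, seen_ids):
    """Return a list of rule violations for one incoming record."""
    errors = []
    if record.get("status") not in VALID_STATUS:
        errors.append("status outside defined domain")
    if record.get("email") in (None, "", "N/A"):   # false/implicit null
        errors.append("missing or false null email")
    if record.get("id") in seen_ids:               # redundant data
        errors.append("duplicate id")
    return errors

seen = set()
for rec in [{"id": 1, "status": "active", "email": "a@example.com"},
            {"id": 1, "status": "closed", "email": "N/A"}]:
    problems = validate(rec, seen)
    seen.add(rec["id"])
    print(rec["id"], problems or "ok")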
Figure 5.1: Radial Cycle of Data Quality Process
Herein we develop a basic strategy by combining all the above steps: a preliminary problem definition, followed by analysis, improvement and monitoring steps for each problem.
Chapter 6 Implement Quality Improvement Process (Methods and Tools)
Designing the quality management process is clearly not the end of the data quality effort. Just identifying issues does nothing to improve things; the issues need to drive changes that will improve the quality of the data for its eventual users. We see process improvement fundamentally as a way of solving problems. If there is no apparent or latent problem, process improvement is not needed. If there is any problem, however intangible, one or more processes need to be improved to deal with it. Once you sense a problem, good problem-solving technique involves alternating between the levels of thought and experience.
6.1 Methods

The methods fall into four categories:

1. Profiling
2. Cleansing
3. Data Integration/Consolidation
4. Data Augmentation
Figure 6.1: Data Quality Methods
6.1.1 Data Profiling

Data profiling can be defined as the use of analytical techniques on data for the purpose of developing a thorough knowledge of its content, structure and quality. It is a process of developing information about data instead of information from data. The purpose of these statistics may be to:

1. Find out whether existing data can easily be used for other purposes.
2. Improve the ability to search the data by tagging it with keywords, descriptions or assigning it to a category.
3. Give metrics on data quality, including whether the data conforms to particular standards or patterns.
4. Assess the risk involved in integrating data for new applications.
5. Assess whether metadata accurately describes the actual values in the source database.
6. Develop a master data management process for data governance for improving data quality.
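The following minimal Python sketch illustrates the spirit of profiling: it derives statistics about each column (null counts, distinct values, value patterns) rather than information from the data. The column names and the pattern scheme are illustrative assumptions.

# A minimal data profiling sketch over a toy dataset.
import re
from collections import Counter

rows = [
    {"name": "Ada",  "phone": "022-12345"},
    {"name": "Alan", "phone": "12345"},
    {"name": None,   "phone": "022-67890"},
]

def pattern(value):
    """Abstract a value into its shape, e.g. '022-12345' -> '999-99999'."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))

for col in ("name", "phone"):
    values = [r[col] for r in rows]
    non_null = [v for v in values if v is not None]
    print(col,
          "nulls:", values.count(None),
          "distinct:", len(set(non_null)),
          "patterns:", Counter(pattern(v) for v in non_null).most_common())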
6.1.2 Data Cleansing

Data cleansing or data scrubbing is the act of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. Used mainly on databases, the term refers to identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying or deleting this dirty data. After cleansing, a data set will be consistent with other similar data sets in the system. The inconsistencies detected or removed may have been originally caused by different data dictionary definitions of similar entities in different stores, by user entry errors, or by corruption in transmission or storage. Data cleansing differs from data validation in that validation almost invariably means data is rejected from the system at entry time, rather than being processed in batches. The stages of the data cleansing process are:

• Data auditing: The data is audited with the use of statistical methods to detect anomalies and contradictions. This eventually gives an indication of the characteristics of the anomalies and their locations.
• Workflow specification: The detection and removal of anomalies is performed by a sequence of operations on the data known as the workflow. It is specified after the data has been audited and is crucial to achieving the end product of high-quality data. In order to achieve a proper workflow, the causes of the anomalies and errors in the data have to be closely considered. If, for instance, we find that an anomaly is the result of typing errors at the data input stage, the layout of the keyboard can help in identifying possible solutions.
• Workflow execution: In this stage, the workflow is executed after its specification is complete and its correctness is verified. The implementation of the workflow should be efficient even on large sets of data, which inevitably poses a trade-off, because the execution of a data cleansing operation can be computationally expensive.
• Post-processing and controlling: After executing the cleansing workflow, the results are inspected to verify correctness. Data that could not be corrected during execution of the workflow is manually corrected where possible. The result is a new cycle in the data cleansing process, in which the data is audited again to allow the specification of an additional workflow that further cleanses the data by automatic processing.
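To make the four stages concrete, here is a minimal Python sketch that walks one toy anomaly (inconsistent country codes) through auditing, workflow specification, execution, and post-processing. The codes and the fix table are assumptions for illustration.

# A minimal sketch of the four cleansing stages on a toy anomaly.
records = [{"country": "UK"}, {"country": "U.K."}, {"country": "GB"}]

# 1. Auditing: detect values outside the expected code list.
EXPECTED = {"GB", "US", "IN"}
anomalies = [r for r in records if r["country"] not in EXPECTED]
print("audit found", len(anomalies), "anomalous records")

# 2. Workflow specification: a declarative fix for the observed cause
#    (free-text entry instead of standard codes).
FIXES = {"UK": "GB", "U.K.": "GB"}

# 3. Workflow execution: apply the specified operations to every record.
for r in records:
    r["country"] = FIXES.get(r["country"], r["country"])

# 4. Post-processing and controlling: re-audit; anything still anomalous
#    would go to manual correction and a further cleansing cycle.
remaining = [r for r in records if r["country"] not in EXPECTED]
print("after cleansing,", len(remaining), "anomalies remain")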
6.1.3 Data Augmentation

The term data augmentation refers to methods for constructing iterative algorithms via the introduction of unobserved data or latent variables. In general, however, constructing data augmentation schemes that result in both simple and fast algorithms is a matter of art, in that successful strategies vary greatly with the observed-data models.
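As a simplified, deterministic illustration of this idea, the following Python sketch treats missing observations as latent data and alternates between imputing them and re-estimating a mean, in the style of an EM iteration. The data values are made up, and real data augmentation schemes are considerably more elaborate.

# A minimal EM-style sketch: missing values are latent data.
observed = [2.0, 4.0, None, 6.0, None]

mu = 0.0
for _ in range(20):
    # E-step: fill each latent value with its expectation under mu.
    completed = [x if x is not None else mu for x in observed]
    # M-step: re-estimate mu from the completed data.
    mu = sum(completed) / len(completed)
print(round(mu, 4))   # converges to the observed-data mean, 4.0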
6.1.4 Data Integration

Data integration involves combining data residing in different sources and providing users with a unified view of these data. This process becomes significant in a variety of situations, both commercial (when two similar companies need to merge their databases) and scientific (combining research results from different repositories, for example). Data integration appears with increasing frequency as data volume and the need to share existing data explode.
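A minimal Python sketch of the unified-view idea follows: two hypothetical sources with different schemas are merged into one customer view keyed on a shared identifier.

# A minimal data integration sketch: unify two source schemas.
crm    = {101: {"cust_name": "Ada Lovelace", "city": "London"}}
orders = {101: {"last_order": "2010-03-18", "total": 240.0}}

unified = {}
for cid in set(crm) | set(orders):
    unified[cid] = {
        "name":       crm.get(cid, {}).get("cust_name"),
        "city":       crm.get(cid, {}).get("city"),
        "last_order": orders.get(cid, {}).get("last_order"),
        "total":      orders.get(cid, {}).get("total"),
    }
print(unified[101])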
6.2 Tools

It is commonly accepted that data quality tools can be grouped according to the part of a data quality process they cover. Data profiling and analysis assist in detecting data problems. Data transformation, data cleaning, duplicate elimination and data enhancement propose to solve the discovered or previously known data quality problems. Data quality tools generally fall into one of three categories:

1. Auditing
2. Cleansing
3. Migration
6.2.1 Data Auditing Tools
Data auditing tools enhance the accuracy and correctness of the data at the source. These tools generally compare the data in the source database to a set of business rules. When using a source external to the organization, business rules can be determined by using data mining techniques to uncover patterns in the data. Business rules that are internal to the organization should be entered in the early stages of evaluating data sources. Lexical analysis may be used to discover the business sense of words within the data. Data that does not adhere to the business rules can then be modified as necessary.

Data analysis: Activities that encompass the statistical evaluation and logical study of data values and the application of data mining algorithms, in order to define data patterns and rules and to ensure that data does not violate the application domain constraints. Commercial and research tools that provide data analysis techniques include:

Commercial: dfPower, ETLQ, Migration Architect, Trillium, WizWhy
Research: Potter's Wheel, Ken State University Tool
Data profiling: The process of analyzing data sources with respect to the data quality domain, to identify and prioritize data quality problems. Data profiling reports on the completeness of datasets and data records, organizes data problems by importance, outputs the distribution of data quality problems in a dataset, and lists missing values in existing records. Identifying data quality problems before starting a data cleaning project is crucial to ensure the delivery of accurate information. The following commercial and research tools implement data profiling techniques:
Commercial: dfPower, ETLQ, Migration Architect, Trillium, WizWhy
Research: Ken State University Tool
6.2.2 Data Cleansing Tools
Data cleansing tools are used in the intermediate staging area. The tools in this category have been around for a number of years. A data cleansing tool cleans names, addresses and other data that can be compared to an independent source. These tools are responsible for parsing, standardizing, and verifying data against known lists. Data cleansing tools contain features which perform the following functions (a sketch of the parsing and standardization functions follows the tool list below):

• Data parsing (elementizing) - breaks a record into atomic units that can be used in subsequent steps. Parsing includes placing elements of a record into the correct fields.
• Data standardization - converts the data elements to forms that are standard throughout the data warehouse.
• Data correction and verification - matches data against known lists.
• Record matching - determines whether two records represent data on the same subject.
• Data transformation - ensures consistent mapping between source systems and the data warehouse.
• House-holding - combines individual records that have the same address.
• Documenting - records the results of the data cleansing steps in the metadata.

Data cleaning: The act of detecting, removing and/or correcting dirty data. Data cleaning aims not only at cleaning up the data but also at bringing consistency to different sets of data that have been merged from separate databases. Sophisticated software applications are available to clean data using specific functions, rules and look-up tables. In the past, this task was done manually and was therefore subject to human error. The following commercial and research tools implement data cleaning techniques:

Commercial: DataBlade, dfPower, ETLQ, ETI*DataCleanser, Firstlogic, NaDIS, QuickAddress Batch, Sagent, Trillium, WizRule
Research: Ajax, Arktos, FraQL
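As promised above, here is a minimal Python sketch of the parsing (elementizing) and standardization functions; the address format and abbreviation table are illustrative assumptions, far simpler than what commercial tools implement.

# A minimal sketch of parsing and standardizing a free-form address.
ABBREV = {"st": "Street", "rd": "Road", "ave": "Avenue"}

def parse_address(raw):
    """Elementize 'number street, city' into atomic fields."""
    left, city = [p.strip() for p in raw.split(",", 1)]
    number, street = left.split(" ", 1)
    return {"number": number, "street": street, "city": city}

def standardize(addr):
    """Expand abbreviations so all records share one standard form."""
    words = [ABBREV.get(w.lower().rstrip("."), w)
             for w in addr["street"].split()]
    addr["street"] = " ".join(words).title()
    addr["city"] = addr["city"].title()
    return addr

print(standardize(parse_address("42 main st., lonere")))
# {'number': '42', 'street': 'Main Street', 'city': 'Lonere'}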
Duplicate elimination: The process that identifies duplicate records (records referring to the same real entity) and merges them into a single record. Duplicate elimination processes are costly and very time-consuming. They usually require the following steps: (i) standardize format discrepancies; (ii) translate abbreviations or numeric codes; (iii) apply exact and approximate matching rules; and (iv) consolidate duplicate records. Commercial and research tools that provide duplicate elimination techniques include:

Commercial: Centrus Merge/Purge, ChoiceMaker, DataBlade, DeDupe, dfPower, DoubleTake, ETLQ, ETI*DataCleanser, Firstlogic, Identity Search Server, MatchIT, Merge/Purge Plus
Research: Ajax, Flamingo Project
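The following minimal Python sketch illustrates steps (i)-(iv) with a deliberately crude matching rule (exact key after normalization); production tools use much richer approximate matching.

# A minimal duplicate elimination sketch: normalize, match, consolidate.
records = [
    {"name": "J. Smith",   "zip": "402103",  "phone": None},
    {"name": "John Smith", "zip": "402-103", "phone": "12345"},
]

def match_key(r):
    """Normalize surname + zip into a crude match key."""
    surname = r["name"].split()[-1].lower()
    return surname, r["zip"].replace("-", "")

merged = {}
for r in records:
    key = match_key(r)
    if key in merged:   # consolidate: keep the most complete values
        for field, value in r.items():
            merged[key][field] = merged[key][field] or value
    else:
        merged[key] = dict(r)

print(list(merged.values()))   # one consolidated record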
Data enrichment (also known as data enhancement): The process of using additional information from internal or external data sources to improve the quality of input data that was incomplete, unspecific or outdated. Postal address enrichment, geocoding and demographic data additions are typical data enrichment procedures. Commercial and research data enrichment tools include:

Commercial: DataStage, dfPower, ETLQ, Firstlogic, NaDIS, QuickAddress Batch, Sagent, Trillium
Research: Ajax
6.2.3 Data Migration Tools
The third type of tool, the data migration tool, is used to extract data from a source database and migrate it into an intermediate storage area. Migration tools also transfer data from the staging area into the data warehouse. The data migration tool is responsible for converting the data from one platform to another and maps the data from the source to the data warehouse. There can be a great deal of overlap among these tools, and many of the same features are found in tools of each category.
Data transformation: The set of operations (schema/data translation and integration, filtering and aggregation) that source data must undergo to appropriately fit a target schema. Data transformations require metadata, such as data schemas, instance-level data characteristics, and data mappings. Commercial and research tools that can be classified as data transformation tools include:

Commercial: Data Integrator, DataFusion, DataStage, dfPower, ETLQ, Hummingbird ETL, Firstlogic, Informatica ETL, SQL Server, Trillium
Research: Ajax, Arktos, Clio, FraQL, Potter's Wheel, TranScm
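A minimal Python sketch of a metadata-driven transformation follows: a mapping table (the metadata) translates a hypothetical source layout into the target layout, with type conversion and aggregation along the way.

# A minimal schema transformation sketch driven by a mapping table.
SOURCE_TO_TARGET = {"cust_nm": "customer_name", "amt": "amount"}

source_rows = [
    {"cust_nm": "Ada",  "amt": "120.50"},
    {"cust_nm": "Alan", "amt": "80.00"},
    {"cust_nm": "Ada",  "amt": "30.00"},
]

# Translate field names and types, then aggregate per customer.
totals = {}
for row in source_rows:
    target = {SOURCE_TO_TARGET[k]: v for k, v in row.items()}
    name = target["customer_name"]
    totals[name] = totals.get(name, 0.0) + float(target["amount"])
print(totals)   # {'Ada': 150.5, 'Alan': 80.0}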
Chapter 7 Basic Tools of Data Quality
1. Fishbone Diagram
Fishbone diagrams show the causes of a certain event. Common uses of the fishbone diagram are product design and quality defect prevention, where it identifies potential factors causing an overall effect. Each cause or reason for imperfection is a source of variation. Causes are usually grouped into major categories to identify these sources of variation.
2. Flow Chart
A flowchart identifies the sequence of activities or the flow of materials and information in a process. There is no precise format, and the diagram can be drawn simply with boxes, lines, and arrows. Flowcharts help the people involved in the process understand it much better and more objectively by providing a picture of the steps needed to accomplish a task.
3. Histogram and Bar Chart
Histograms provide clues about the characteristics of the parent population from which a sample is taken. Patterns that would be difficult to see in an ordinary table of numbers become apparent. A bar chart is a series of bars representing frequency, e.g. the number of yes/no responses. A histogram:
• Displays large amounts of data that are diﬃcult to interpret in tabular form. • Shows centering, variation, and shape. • Illustrates the underlying distribution of the data. • Provides useful information for predicting future performance.
4. Scatter Diagram
A scatter diagram is a plot of two variables showing whether they are related.

• Supplies the data to confirm a hypothesis that two variables are related.
• Provides both a visual and statistical means to test the strength of a relationship.
• Provides a good follow-up to cause-and-effect diagrams.

5. Run Chart
Run charts show the performance and the variation of a process or some quality or productivity indicator over time in a graphical fashion that is easy to understand and interpret. They also identify process changes and trends over time and show the effects of corrective actions.
• Monitors performance of one or more processes over time to detect trends, shifts, or cycles. • Allows a team to compare performance before and after implementation of a solution to measure its impact. • Focuses attention on truly vital changes in the process.
6. Control Chart
Control charts, also known as Shewhart charts or process-behaviour charts, are tools used in statistical process control to determine whether or not a manufacturing or business process is in a state of statistical control.
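As a minimal illustration, the following Python sketch computes Shewhart-style limits (mean plus or minus 3 standard deviations) from baseline batches and flags later observations that fall outside them; the batch counts are made up.

# A minimal control chart sketch: set limits, then judge new batches.
import statistics

baseline = [4, 5, 3, 6, 4, 5, 4, 5]          # batches used to set limits
mean  = statistics.mean(baseline)
sigma = statistics.pstdev(baseline)
ucl, lcl = mean + 3 * sigma, max(0.0, mean - 3 * sigma)
print(f"limits: LCL={lcl:.2f} UCL={ucl:.2f}")

# New batches are judged against the established limits:
for x in [5, 4, 14]:
    state = "in control" if lcl <= x <= ucl else "OUT OF CONTROL"
    print(x, state)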
7. Process Chart
A process chart is an organized way of recording all the activities performed by a person, by a machine, at a workstation, with a customer, or on materials. Codes can be applied for operations, transport, inspection, delay and storage, recorded against numbered steps, time, distance and step descriptions.
Chapter 8 Monitor Data Quality
Monitoring data quality is an important sub-aspect of the data quality life cycle. It is based on the specified goals and rules, and therefore on the current quality level obtained after the initial analysis carried out on the basis of data profiling and the initial cleansing of the data. Monitoring is not an end in itself; it serves as a sensor for data quality weaknesses before they make themselves felt in the destination system. The monitoring function orients itself towards the defined data quality initiatives and general instructions, as well as any changes that may be required.
8.1 Monitoring System

We can develop a simple three-step monitoring system as a model for monitoring the data quality process. Ultimately, data quality monitoring is based on a well-understood set of metrics which provides important knowledge about the value of the data in use. First of all, these metrics need to be in the right order. Data quality must be tracked, managed, and monitored if it is to improve business efficiency and transparency. Therefore, being able to measure and monitor data quality throughout the lifecycle and to compare the results over time is an essential ingredient in the proactive management of ongoing data quality improvement and data governance.
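A minimal Python sketch of such a sensor follows: one metric (completeness of a required field) is computed per load, recorded for trend analysis, and checked against a target. The metric, target and records are illustrative assumptions.

# A minimal monitoring sketch: track a metric per load against a target.
TARGET = 0.98   # required completeness for the 'email' field

history = []    # (load_id, metric) pairs, the basis for trend analysis

def monitor(load_id, records):
    filled = sum(1 for r in records if r.get("email"))
    metric = filled / len(records)
    history.append((load_id, metric))
    if metric < TARGET:
        print(f"load {load_id}: completeness {metric:.2%} below target")
    return metric

monitor("2010-04-01", [{"email": "a@x.com"}, {"email": None}])
monitor("2010-04-02", [{"email": "a@x.com"}, {"email": "b@x.com"}])
print(history)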
Figure 8.1: Monitor System
Conclusion

This report on data quality management has presented the basic tasks of management in the field of techniques, tools and improvement of data quality. Organizations seeking relief from data problems often turn to technology for help; this is not the most effective solution. Data quality is a behavioral problem, not a technology problem. To solve the data quality problem, organizations need to change user behavior. A comprehensive program based on prevention, detection, correction and accountability is required. Deploying a data quality management program is not an easy task, but the rewards are enormous. Deploying a disciplined approach to managing data as an important asset will better position an organization to improve the productivity of its information workers and to better serve its customers. Strong frameworks and processes are imperative for controlling data quality and for managing data, the most important asset of an organization. Additional validation procedures such as exception analysis and data reconciliation ensure high success rates in migration-related initiatives. The challenges associated with data quality control initiatives can be effectively handled by implementing the recommended framework and process to control data quality. Maintaining data quality is no longer an option, particularly in today's competitive and regulatory climate. With this in place, the six-phase program can be effectively pursued for the management of data quality.
References

1. Thomas Korner, Handbook on Data Quality Assessment Methods and Tools, 3rd Edition, 2005.
2. Yang W. Lee, "Total Data Quality Management: The Case of IRI", 2001.
3. Suzanne M. Embury, "Data Quality Control", 1999.
4. Theodore Johnson, "Data Quality and Data Cleaning: An Overview", 2006.