Data Quality Concepts and Techniques Applied to Taxonomic Databases

by edalcin

Statistic          This document   Scribd average
Pages              266             43
Words              54,391          13,640
Characters         355,216         81,678
Lines              3,319           623
Letters per word   6.53            5.99
Words per line     16.39           21.89
Words per page     204.48          317.21

Document Information

Description

The thesis investigates how data quality concepts and techniques can be applied to taxonomic databases in order to improve the quality of taxonomic information services and systems. Taxonomic data are organised into Taxonomic Data Domains, which provide a standard working framework for the proposed Taxonomic Data Quality Dimensions, a specialised application of conventional Data Quality Dimensions to the Taxonomic Data Quality Domains.

The thesis discusses how data quality in taxonomic databases can be improved, considering conventional Data Cleansing techniques and applying generic data content error patterns to taxonomic data. Techniques for detecting taxonomic errors are explored, with special attention to spelling errors in scientific names.
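
As an illustration of what applying generic content error patterns to scientific names might look like, the following Python sketch checks a name string against a few formatting rules. The rules, the `PATTERN_CHECKS` table and the `check_name` function are hypothetical and assume a simple two-part "Genus epithet" name; they are not taken from the thesis, and real names with authorship, infraspecific ranks or hybrid signs would need richer rules.

```python
import re
from typing import List

# Hypothetical formatting rules for a simple "Genus epithet" string.
# Each rule returns True when the name VIOLATES the expected pattern.
PATTERN_CHECKS = {
    "leading_or_trailing_space": lambda s: s != s.strip(),
    "repeated_whitespace": lambda s: re.search(r"\s{2,}", s) is not None,
    # The first non-space character should be an uppercase letter.
    "genus_not_capitalised": lambda s: re.match(r"\s*[A-Z]", s) is None,
    # The second word, if present, should be entirely lowercase.
    "epithet_not_lowercase": lambda s: len(s.split()) > 1 and not s.split()[1].islower(),
    # Anything other than ASCII letters, whitespace or hyphens is suspicious here.
    "non_letter_characters": lambda s: re.search(r"[^A-Za-z\s-]", s) is not None,
}


def check_name(name: str) -> List[str]:
    """Return the names of all pattern checks that flag this string."""
    return [rule for rule, test in PATTERN_CHECKS.items() if test(name)]
```

For example, `check_name(" homo  Sapiens")` flags the leading space, the doubled space, the lowercase genus and the capitalised epithet, while a well-formed name such as `"Homo sapiens"` passes all five checks.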

The spelling error problem is examined in detail through spelling error detection techniques and algorithms, which are described and analysed. To evaluate the applicability and efficiency of different spelling error detection algorithms, a suite of experimental spelling error detection tools was developed and a set of experiments was performed on a sample of five different taxonomic databases.
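
Edit-distance comparison is one widely used family of spelling error detection techniques. The following Python sketch assumes that approach (Levenshtein distance), which is not necessarily one of the algorithms evaluated in the thesis: it flags pairs of distinct names that lie within a hypothetical threshold of two edits of each other as possible misspellings. `levenshtein`, `suspect_pairs` and `max_distance` are illustrative names, not the thesis's tools.

```python
from itertools import combinations
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def suspect_pairs(names: List[str], max_distance: int = 2) -> List[Tuple[str, str, int]]:
    """Flag pairs of distinct names whose case-insensitive edit distance is small but non-zero."""
    unique = sorted(set(names))
    flagged = []
    for x, y in combinations(unique, 2):
        d = levenshtein(x.lower(), y.lower())
        # d == 0 means the names differ only in letter case; leave that to format checks.
        if 0 < d <= max_distance:
            flagged.append((x, y, d))
    return flagged
```

With the defaults, `suspect_pairs(["Puma concolor", "Puma concolour", "Panthera leo"])` returns `[("Puma concolor", "Puma concolour", 1)]`. Comparing every pair is quadratic in the number of distinct names, so a tool working on a full taxonomic database would probably need a blocking step, for example comparing only names that share a genus.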

The experimental results are analysed from two points of view: that of the algorithms and that of the databases.

Database quality assessment procedures and metrics are discussed in the context of taxonomic databases and the previously introduced concepts of Taxonomic Data Domains and Taxonomic Data Quality Dimensions.
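
Completeness, one of the conventional data quality dimensions, is among the simplest metrics to compute. The Python sketch below assumes records held as dictionaries with hypothetical field names and reports, per field, the fraction of records whose value is non-empty; it illustrates a generic metric and is not the assessment procedure defined in the thesis.

```python
from typing import Dict, List, Optional

# Hypothetical record layout: field name -> string value (or None when missing).
Record = Dict[str, Optional[str]]


def completeness(records: List[Record], fields: List[str]) -> Dict[str, float]:
    """Fraction of records holding a non-empty value for each field."""
    if not records:
        # No records: report zero completeness rather than dividing by zero.
        return {f: 0.0 for f in fields}
    result = {}
    for f in fields:
        # Missing keys, None and whitespace-only strings all count as empty.
        filled = sum(1 for r in records if (r.get(f) or "").strip())
        result[f] = filled / len(records)
    return result
```

For instance, `completeness(records, ["genus", "species", "author"])` yields one score between 0 and 1 per field; treating whitespace-only values as missing is a deliberate choice here.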

Four questions related to Taxonomic Database Quality are discussed, followed by conclusions and recommendations involving information system design and implementation and the processes involved in taxonomic data management and information flow.

Format: PDF, 266 pages


Date Added

02/20/2008

Copyright

Attribution Non-commercial

Comment by robot1984, 07/24/2008:

You can draw up a road map for preparing a test data set for your software from this document.