
Introduction on Data Dictionary

Johannes Paulus “JP” Acuna


Data Architect
Department of Information and Communications Technology
©NGP Project
Revisiting Some Concepts
Data

Re-interpretable representation of information in a formalized manner suitable
for communication, interpretation or processing (ISO/IEC 11179, Metadata
Registries (MDR) standard)

Data is a means of representation; it stands for things other than itself (Chisholm,
2010)

Data is both an interpretation of the objects it represents and an object that must
be interpreted (Sebastian-Coleman, 2013)
Revisiting Some Concepts
Data Management Principles
Revisiting Some Concepts
Data Management is a shared responsibility between the data management
professionals within Information Technology (IT) organizations and the
business data stewards representing the collective interests of data
producers and information consumers.

Data stewards serve as the appointed trustees for data assets.

Data management professionals serve as the expert curators and technical
custodians of these data assets.
Revisiting Some Concepts
Metadata is data about data. This encompasses information about technical and
business processes, data rules and constraints, as well as logical and physical data
structures.

Metadata Management - Planning, implementation and control activities to
enable access to high quality and integrated metadata

Goals
1. Provide organizational understanding of business terms and usage
2. Collect and integrate metadata from diverse sources
3. Provide a standard way to access metadata
4. Ensure metadata quality and security
Stovepipe Systems
Stovepipe System (Voivoda, 2011)
– a computer system whose functionality and processes are
narrowly focused to provide specific data to a specific
recipient

Stovepipes tend to evolve over time and fly under the
radar. They usually go unnoticed until a stakeholder
makes a poor decision based on data that someone in
the organization provided.
Stovepipe Systems
Data becomes stovepiped for many reasons
– if response times are too slow for one department, data may be
held redundantly for performance purposes
– one person or department may require data that is slightly
different than what was already contained in a data store;
geographic considerations may play a role in separating the data
– people download data from a department data store to a
desktop application in order to manipulate the numbers and
then label that spreadsheet a "system" or authoritative source
The Stovepipe System Assessment
Redundancy
– Data redundancy occurs when a data element is
stored in more than one location or database at the
same time
– This creates issues with the reliability of the data
being retrieved
– Quite simply, the question becomes, “which
occurrence of the data is correct?”
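The redundancy check described above can be sketched in a few lines. This is only an illustration, assuming a simple inventory of (data element, data store) pairs is available; the element and store names are hypothetical and do not come from any real system.

```python
from collections import defaultdict

# Hypothetical inventory of (data element, data store) pairs.
# The names are illustrative assumptions, not real systems.
inventory = [
    ("birth_date", "hr_db"),
    ("birth_date", "payroll_db"),
    ("employee_id", "hr_db"),
    ("birth_date", "hr_db"),  # a repeated pair counts only once
]


def find_redundant_elements(pairs):
    """Return {element: sorted stores} for elements held in more than one store."""
    stores_by_element = defaultdict(set)
    for element, store in pairs:
        stores_by_element[element].add(store)
    return {
        element: sorted(stores)
        for element, stores in stores_by_element.items()
        if len(stores) > 1
    }


redundant = find_redundant_elements(inventory)
print(redundant)  # {'birth_date': ['hr_db', 'payroll_db']}
```

Each element that appears in the result is a candidate for the question of which occurrence is correct.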
The Stovepipe System Assessment
Authoritative data
– This is "officially recognized data that can be certified
and provided by an authoritative source"
– In other words, authoritative data is data that your
organization provides and that is accepted by the
consumer as reliable and accurate
– If data is stored redundantly and/or must be
converted in order to be presented, you are
vulnerable to irregularities, mistakes, and errors.
The Stovepipe System Assessment
Stewardship
– Data stewardship is the responsible management
of all aspects of data and related metadata.
– When multiple people update redundant data
elements in multiple data stores, things quickly
spiral out of control.
The Stovepipe System Assessment
Accessibility
– Accessibility addresses the ability or authority to
access, view, and update the data contained in the
data stores.
– Accessibility should be restricted or constrained to
only those persons who need access to the data.
– If one department can update the same data that is
stored but restricted in another department, the data
quickly becomes out of sync and inaccurate.
The Stovepipe System Assessment
Transfer
– Data transfer issues arise when the format of data
stored in the sending entity is not the same as the
format in the receiving entity.
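As a small illustration of a transfer issue, the sketch below converts a date between two formats. It assumes the sending entity stores dates as MM-DD-YYYY (the date format this deck uses as a Data Format example) and that the receiving entity expects YYYY-MM-DD; the receiving format and the function name are assumptions made for the example.

```python
from datetime import datetime

SENDER_FORMAT = "%m-%d-%Y"    # MM-DD-YYYY, assumed format of the sending entity
RECEIVER_FORMAT = "%Y-%m-%d"  # YYYY-MM-DD, assumed format of the receiving entity


def convert_date(value, source_format=SENDER_FORMAT, target_format=RECEIVER_FORMAT):
    """Parse a date string in the sender's format and re-emit it in the receiver's."""
    # strptime raises ValueError when the value does not match source_format,
    # so a mismatched record fails loudly instead of being silently misread.
    parsed = datetime.strptime(value, source_format)
    return parsed.strftime(target_format)


converted = convert_date("03-15-2012")
print(converted)  # 2012-03-15
```

Recording the data format of each element in one place lets both entities agree on the conversion before data is exchanged.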
The Stovepipe System Assessment
Timeliness
– Timeliness addresses the issue of retrieving the required
data in a timeframe that allows the decision maker ample
time to review, analyze, and make the correct decision
based on accurate and reliable information.
– If the data has to be retrieved from several different data
stores and then massaged, converted, and reformatted,
chances are your stakeholder will not receive the
information in a timely manner.
Defining Data Dictionary
Also known as a business glossary.

Any place where business and/or technical terms and definitions are stored.
Typically, data dictionaries are designed to store a limited set of available
metadata, concentrating on the names and definitions relating to the
physical data and related objects of systems implemented or in development.
Defining Data Dictionary

Video Presentation: Data Management - Data Dictionaries


Defining Data Dictionary
The Data Dictionary is a registry of data elements and definitions. It is the
authoritative document containing the standard names and definitions of
available data sets of the Department. This document must contain other
metadata and serve as a single point of reference for executive,
management, technical and external data users.
Defining Data Dictionary

Video Presentation: What is Metadata Management @ 03:52 – 04:52 mark


Defining Data Dictionary
| Column Name | Description | Remarks |
|---|---|---|
| Data Element | Official name of the data element adopted by the Department | Prior approval from steering committee is needed to change or add entries for this column |
| Definition | Official definition of the data element adopted by the Department | Prior approval from steering committee is needed to change or add entries for this column |
| Source of Definition | Source document used to define the data element. This can be Laws, International Conventions/Agreements, Internal Department Orders, and others | Prior approval from steering committee is needed to change or add entries for this column |
| Frequency of Update | How often the data is collected and updated | Prior approval from steering committee is needed to change or add entries for this column |
| Data Type | Data representation. This can be Text, Numeric, Date, Image or other data types identified by the Department | Managed and maintained by the TWG |
| Data Format | Constraint on the data type. Examples are Date [MM-DD-YYYY], Name [First Name, Middle Name, Last Name, Extension Name] | Managed and maintained by the TWG |
| Possible Values | Pre-listing of expected data. Usually applicable for pre-determined selections. Examples are: [1-Yes, 2-No] [1-Male, 2-Female] | Managed and maintained by the TWG |
| Data/System/Application | Name of information system where the data can be found | Managed and maintained by the TWG |
| Database Source | Name of database where the data is stored | Managed and maintained by the TWG |
| Table Name | Explicit name of database table where data can be found | Managed and maintained by the TWG |
| Column Name | Explicit name of table column where data can be found | Managed and maintained by the TWG |
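One way to picture a single row of such a dictionary is as a small record type. The sketch below is a minimal illustration, not the Department's implementation: the class name, field names, and the `accepts` helper are assumptions, and the example values other than the [1-Male, 2-Female] listing are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class DataDictionaryEntry:
    """One data dictionary row; field names mirror the columns listed above."""
    data_element: str
    definition: str
    source_of_definition: str
    frequency_of_update: str
    data_type: str
    data_format: str = ""
    possible_values: dict = field(default_factory=dict)
    system_application: str = ""
    database_source: str = ""
    table_name: str = ""
    column_name: str = ""

    def accepts(self, code):
        """True if no values are pre-listed, or if the code is one of them."""
        return not self.possible_values or code in self.possible_values


# Placeholder entry: only the possible values come from the examples above.
sex = DataDictionaryEntry(
    data_element="Sex",
    definition="(placeholder definition)",
    source_of_definition="(placeholder source)",
    frequency_of_update="(placeholder frequency)",
    data_type="Numeric",
    possible_values={1: "Male", 2: "Female"},
)

print(sex.accepts(1), sex.accepts(3))  # True False
```

Keeping the pre-listed values next to the definition lets any system that reads the dictionary apply the same check.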
Defining Data Dictionary

Video Presentation: Top 5 Things to Capture into a Data Dictionary


Case Reference
PhilHealth Joins International Collaboration on Data Dictionary
Initiative (2012)
https://www.philhealth.gov.ph/news/2012/data_dictionary.html

"Without a data dictionary, confusion and misinterpretations are common
especially among stakeholders who may use the same terms but assign
them different meanings. The HDD is one way to bring them on the same
page," says PhilHealth Chief Information and Technology Executive Dr. Alvin
B. Marcelo.
Case Reference
Open Health Data Dictionary (Open HDD)
http://openhdd.org/index.html

"OpenHDD makes relevant data available for clinical decision-making,
research, public health and quality reporting. More importantly, the
standardized data is reliable—understood by the IT systems that exchange it
and trusted by users who need it." Dr. Arturo Alcantara, Chief, Task Force
Informatics PhilHealth, the Philippines
