
DOCUMENT AND CONTENT MANAGEMENT



Document and Content Management is the control over capture,
storage, access, and use of data and information stored outside
relational databases.
Its strategic and tactical focus overlaps with other data management
functions in addressing the need for data governance,
architecture, security, managed metadata, and data quality for
unstructured data.
Document and Content Management includes two sub-functions:
◦ Document management is the storage, inventory, and control of electronic
and paper documents. Document management encompasses the processes,
techniques, and technologies for controlling and organizing documents and
records, whether stored electronically or on paper
◦ Content management refers to the processes, techniques, and technologies
for organizing, categorizing, and structuring access to information content,
resulting in effective retrieval and reuse. Content management is
particularly important in developing websites and portals, but the
techniques of indexing based on keywords, and organizing based on
taxonomies, can be applied across technology platforms.
Business Driver and Guideline
 The primary business drivers for document and content
management include regulatory compliance, the ability to
respond to litigation and e-discovery requests, and
business continuity requirements.
 In 2009, ARMA International, a not-for-profit professional
association for managing records and information,
published a set of Generally Accepted Recordkeeping
Principles® (GARP) that describes how business
records should be maintained. It also provides a
recordkeeping and information governance framework
with associated metrics. The principles are: accountability,
integrity, protection, compliance, availability, retention,
disposition, and transparency.
Document and Content Management -
Principles
 Everyone in an organization has a role to play in
protecting its future. Everyone must create, use,
retrieve, and dispose of records in accordance with
the established policies and procedures
 Experts in the handling of records and content
should be fully engaged in policy and planning.
Regulatory and best practices can vary significantly
based on industry sector and legal jurisdiction
 Even if records management professionals are not
available to the organization, everyone can be
trained and have an understanding of the issues.
Once trained, business stewards and others can
collaborate on an effective approach to records
management
Document and Content Management
 A document management system is an application used to
track and store electronic documents and electronic
images of paper documents
 Document management systems commonly provide
storage, versioning, security, metadata management,
content indexing, and retrieval capabilities
 A content management system is used to collect, organize,
index, and retrieve information content; storing the
content either as components or whole documents, while
maintaining links between components
 While a document management system may provide
content management functionality over the documents
under its control, a content management system is
essentially independent of where and how the documents
are stored
Document / Record Management
 Document / Record Management is the life cycle
management of the designated significant
documents of the organization
 Records can be:
 Physical, such as documents, memos, contracts, reports, or
microfiche
 Electronic, such as email content, attachments,
and instant messaging
 Content on a website
 Documents on all types of media and hardware
 Data captured in databases of all kinds
 More than 90% of the records created today are
electronic
 Growth in email and instant messaging has made the
management of electronic records critical to an
organization
Document / Record Management
The lifecycle of Document / Record Management
includes:
◦ Identification of existing and newly created documents /
records
◦ Creation, Approval, and Enforcement of documents / records
policies
◦ Classification of documents / records
◦ Documents / Records Retention Policy
◦ Storage: Short and long term storage of physical and electronic
documents / records
◦ Retrieval and Circulation: Allowing access and circulation of
documents / records in accordance with policies, security and
control standards, and legal requirements
◦ Preservation and Disposal: Archiving and destroying
documents / records according to organizational needs, statutes,
and regulations
What is Content?
“It is the information that is setting up competitive
differentiation, not specifically products and
processes. It is the information around both these
things that creates a competitive advantage.”
Enterprise Content
In any organization, content can exist in
a multitude of formats, such as:
◦Images,
◦Text documents,
◦Web pages,
◦Spreadsheets,
◦Presentations,
◦Graphics, Drawings,
◦e-mail, Video, and Multimedia
Content can be defined as corporate
knowledge stored in any of the above formats.
Content Types
Content is mainly of two types:

Structured Content
◦ From emails and instant messages to electronic forms,
XML and even business process data
◦ Alphanumeric information that can be sorted in a Database
◦ Examples: lists, addresses, resumes, Records etc

Unstructured Content
◦ Both “static” and “dynamic”
◦ From scanned images and electronic documents to complex
forms and even rich media
Why is there a need for managing
content?
The Problem with Content
Content is everywhere
◦Voluminous
◦Erroneous
◦Inconsistent
◦Risky
◦Formal/informal
◦Multiple formats
Often uncoordinated with business goals
Content is Everywhere
[Figure: locations where content resides. Labels: e-mail servers, paper files, business systems, document repositories, microfilm, photographs, imaging repositories, local drives, video libraries, file systems, web servers]
Need for Managing Content
 Legislative requirements and audits
 Need to exert control over an abundant
volume of records and documents
 Need to automate business processes
 Need for solutions to help with the
process of authoring and publishing
the information online
The need to manage content raises the next
question:

How to manage the content?

This is where Content Management becomes
Enterprise Content Management
Content Management
Organization, categorization, and structure of data
/ resources so that they can be stored, published,
and reused in multiple ways
Includes data / information, that exists in many
forms and in multiple stages of completion within
its lifecycle
Content management systems manage the content
of a website or intranet through the creation,
editing, storing, organizing, and publishing of
content
Enterprise Content Management consists of the technologies and
tools used to capture, manage, store, preserve and deliver content
across the enterprise

[Figure: ECM components: Capture, Manage, Store, Preserve, Deliver]

The Promise of Content Management


 Managing content effectively means considering its full lifecycle:
 Authoring and Creation
 Review and Approval
 Publishing and Distribution
 Archiving
ECM Process
[Figure: ECM process flow. Labels: capture (scan paper, index images; capture e-mail, web, reports, statements, and transaction content); workflow process (create, edit, review, final; document versions); distribute (retrieve, view, annotate, e-mail, fax, publish to web, present to web); declare record, retention, disposition]
Activities
① Plan for lifecycle management
1) Plan for records management
2) Develop a content strategy
3) Create content handling policies
4) Define content information architecture
② Manage the lifecycle
1) Capture records and content
2) Manage versioning and control
3) Backup and recovery
4) Manage retention and disposal
5) Audit documents / records
③ Publish and deliver content
1) Provide access, search, and retrieval
2) Deliver through acceptable channels
(1) Plan for Lifecycle Management
 Plan document life cycle from creation or receipt, organization
for retrieval, distribution and archiving or disposition
 Develop classification / indexing systems and taxonomies so
that the retrieval of documents is easy
 Create planning and policy around documents and records based on
the value of the data to the organization and as evidence of
business transactions
 Identify the responsible, accountable organizational unit for
managing the documents / records
 Develop and execute a retention plan and policy to archive records,
such as those selected for long-term preservation
 Records are destroyed at the end of their life cycle according to
operational needs, procedures, statutes and regulations
(1.3) Create content handling policies
 Policies codify requirements by describing principles, direction,
and guidelines for action.
 Most document management programs have policies related to:
 Scope and compliance with audits
 Identification and protection of vital records
 Purpose and schedule for retaining records (a.k.a retention
schedule)
 How to respond to information hold orders (special protection
orders); these are requirements for retaining information for a
lawsuit, even if retention schedules have expired
 Requirements for onsite and offsite storage of records
 Use and maintenance of hard drives and shared network drives
 Email management, addressed from a content management
perspective
 Proper destruction methods for records (e.g., with pre-approved
vendors and receipt of destruction certificates)
(1.4) Define Enterprise Taxonomies
(Information Content Architecture)
Process of creating a structure for a body of
information or content
Contains a controlled vocabulary that can help with
navigation and search systems
Content Architecture identifies the links and
relationships between documents and content,
specifies document requirements and attributes and
defines the structure of content in a document or
content management system
(2.1) Capture Records and Content

 Documents can be created within a document management system or
captured via scanners or OCR software
 Electronic documents must be indexed via keywords or text during the
capture process so that the document can be found.
 Once the content has been described by metadata / key word tagging
and classified within the appropriate Information Content Architecture,
it is available for retrieval and use.
 Finding unstructured data can be eased through portal technology
 A document repository enables check-in and check-out features,
versioning, collaboration, comparison, archiving, status state(s),
migration from one storage media to another and disposition
 Document management can support different types of workflows:
 Manual workflows that indicate where the user sends the document
 Rules-based workflow, where rules are created that dictate the flow of the
document within an organization
 Dynamic rules that allow for different workflows based on content
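The rules-based workflow described above can be sketched as an ordered list of conditions, each paired with a destination. This is only an illustration: the `Document` fields, the queue names, and the 10,000 threshold are hypothetical and do not come from any particular document management product.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Document:
    title: str
    doc_type: str
    amount: float = 0.0


# Each rule pairs a condition with the queue that should receive the document.
# Rule order matters: the first matching rule wins.
Rule = Tuple[Callable[[Document], bool], str]

RULES: List[Rule] = [
    (lambda d: d.doc_type == "contract", "legal-review"),
    (lambda d: d.doc_type == "invoice" and d.amount > 10_000, "finance-manager"),
    (lambda d: d.doc_type == "invoice", "accounts-payable"),
]


def route(doc: Document, rules: List[Rule] = RULES,
          default: str = "records-clerk") -> str:
    """Return the queue for a document, or the default queue if no rule matches."""
    for condition, queue in rules:
        if condition(doc):
            return queue
    return default
```

A manual workflow would skip the rule list and let the user choose the queue; a dynamic workflow could build the rule list from the document content at run time.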
(2.3) Backup and Recovery

The document / record management system needs to be
included as part of the overall corporate backup and
recovery activities for all data and information
The document / records manager should be involved in risk
mitigation and management, and business continuity,
especially regarding security for vital records
A vital records program provides the organization with
access to the records necessary to conduct its business
during a disaster and to resume normal business
afterward
(2.4) Manage Retention and Disposal

Defines the period of time during which documents /
records of operational, legal, financial or historical
value must be maintained
Specifies the processes for compliance, and the
methods and schedules for the disposition of
documents / records
Must deal with privacy and data protection issues
Legal and regulatory requirements must be considered
when setting up document record retention schedules
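A retention schedule, combined with the information hold orders mentioned under content handling policies, can be expressed as a small eligibility check. The sketch below is an assumption-laden illustration: the record classes, the retention periods, and the `Record` fields are hypothetical, and real schedules come from legal and regulatory requirements.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict

# Hypothetical retention schedule: record class -> retention period in years.
RETENTION_YEARS: Dict[str, int] = {"invoice": 7, "email": 2, "contract": 10}


@dataclass
class Record:
    record_id: str
    record_class: str
    created: date
    legal_hold: bool = False


def retention_end(record: Record) -> date:
    """Return the date on which the retention period expires."""
    years = RETENTION_YEARS[record.record_class]
    target_year = record.created.year + years
    try:
        return record.created.replace(year=target_year)
    except ValueError:
        # Created on 29 February and the target year is not a leap year.
        return record.created.replace(year=target_year, day=28)


def may_dispose(record: Record, today: date) -> bool:
    """A record under a hold order is never disposable, even after expiry."""
    if record.legal_hold:
        return False
    return today >= retention_end(record)
```

The hold flag is checked first because a hold order overrides the schedule, which matches the policy that information must be retained for a lawsuit even if the retention period has expired.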
(2.5) Audit Document / Records
Document / records management requires auditing on a periodic basis to
ensure that the right information is getting to the right people at the right
time for decision making or performing operational activities
◦ Inventory - Each location in the inventory is uniquely identified
◦ Storage - Storage areas for physical documents / records have adequate space to
accommodate growth
◦ Reliability and Accuracy - Spot checks are executed to confirm that the documents /
records are an adequate reflection of what has been created or received
◦ Classification and Indexing Schemes - Metadata and document file plans are well
described
◦ Access and Retrieval - End users find and retrieve critical information easily
◦ Retention Processes - Retention schedule is structured in a logical way
◦ Disposition Methods - Documents / records are disposed of as recommended
◦ Security and Confidentiality - Breaches of document / record confidentiality and
loss of documents / records are recorded as security incidents and managed
appropriately
◦ Organizational Understanding of Documents / Records Management -
Appropriate training is provided to stakeholders and staff as to the roles and
responsibilities related to document / records management
METADATA MANAGEMENT
Objectives

Introduce the concept of metadata


Introduce metadata management
function
Introduce metadata standards

DATA DEVELOPMENT [2016]


Metadata defined
 Metadata is “data about data”
 Metadata is information about the physical data,
technical and business processes, data rules and
constraints, and logical and physical structures of the
data, as used by an organization. These descriptive
tags describe data (e.g. databases, data elements,
data models), concepts (e.g. business processes,
application systems, software code, technology
infrastructure), and the connections (relationships)
between the data and concepts.
 Metadata helps an organization understand its data,
its systems, and its workflows.



Examples of metadata
Very Poor Metadata
Business Driver
Reliable, well- managed Metadata helps:
 Increase confidence in data by providing context and enabling the
measurement of data quality
 Increase the value of strategic information (e.g., Master Data) by
enabling multiple uses
 Improve operational efficiency by identifying redundant data and
processes
 Prevent the use of out-of-date or incorrect data
 Reduce data-oriented research time
 Improve communication between data consumers and IT professionals
 Create accurate impact analysis thus reducing the risk of project
failure
 Improve time-to-market by reducing system development life-cycle
time
 Reduce training costs and lower the impact of staff turnover through
thorough documentation of data context, history, and origin
 Support regulatory compliance
Business driver
Poorly managed Metadata leads to:
 Redundant data and data management processes
 Replicated and redundant dictionaries, repositories,
and other Metadata storage
 Inconsistent definitions of data elements and risks
associated with data misuse
 Competing and conflicting sources and versions of
Metadata which reduce the confidence of data
consumers
 Doubt about the reliability of Metadata and data
Goals
 Document and manage organizational knowledge of
data-related business terminology in order to ensure
people understand data content and can use data
consistently
 Collect and integrate Metadata from diverse sources to
ensure people understand similarities and differences
between data from different parts of the organization
 Ensure Metadata quality, consistency, currency, and
security
 Provide standard ways to make Metadata accessible to
Metadata consumers (people, systems, and processes)
 Establish or enforce the use of technical Metadata
standards to enable data exchange
Principles
 Organizational commitment: Secure organizational commitment to
Metadata management as part of an overall strategy to manage data as
an enterprise asset.
 Strategy: Develop a Metadata strategy that accounts for how Metadata
will be created, maintained, integrated, and accessed.
 Enterprise perspective: Take an enterprise perspective to ensure
future extensibility, but implement through iterative and incremental
delivery to bring value.
 Socialization: Communicate the necessity of Metadata and the
purpose of each type of Metadata
 Access: Ensure staff members know how to access and use Metadata.
 Quality: Recognize that Metadata is often produced through existing
processes and hold process owners accountable for the quality of
Metadata.
 Audit: Set, enforce, and audit standards for Metadata to simplify
integration and enable use.
Types of metadata
Four major types:
◦Business
◦Technical and operational
◦Process
◦Data stewardship



Types of metadata
Business metadata
 includes the business names and definitions of subject and concept areas,
entities, and attributes; attribute data types and other attribute properties; range
descriptions; calculations; algorithms and business rules; and valid domain
values and their definitions
 Examples:
 Definitions and descriptions of data sets, tables, and columns
 Business rules, transformation rules, calculations, and derivations
 Data models
 Data quality rules and measurement results
 Schedules by which data is updated
 Data provenance and data lineage
 Data standards
 Designations of the system of record for data elements
 Valid value constraints
 Stakeholder contact information (e.g., data owners, data stewards)
 Security/privacy level of data
 Known issues with data
 Data usage notes



Types of metadata
Technical and operational meta-data
 Provides developers and technical users with information about their systems.
 Technical meta-data includes physical database table and column names, column
properties, other database object properties, and data storage.
 Operational meta-data is targeted at IT operations users’ needs, including information
about data movement, source and target systems, batch programs, job frequency,
schedule anomalies, recovery and backup information, archive rules, and usage.
 Examples:
 Physical database table and column names
 Column properties
 Database object properties
 Access permissions
 Data CRUD (create, read, update and delete) rules
 Physical data models, including data table names, keys, and indexes
 Documented relationships between the data models and the physical assets
 ETL job details
 File format schema definitions
 Source-to-target mapping documentation
 Data lineage documentation, including upstream and downstream change impact information



Types of metadata: examples of technical and
operational meta-data (cont.)
 Program and application names and descriptions
 Content update cycle job schedules and dependencies
 Recovery and backup rules
 Data access rights, groups, roles
 Logs of job execution for batch programs
 History of extracts and results
 Schedule anomalies
 Results of audit, balance, control measurements
 Error Logs
 Reports and query access patterns, frequency, and execution time
 Patches and Version maintenance plan and execution, current patching level
 Backup, retention, date created, disaster recovery provisions
 SLA requirements and provisions
 Volumetric and usage patterns
 Data archiving and retention rules, related archives
 Purge criteria
 Data sharing rules and agreements
 Technical roles and responsibilities, contacts
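Data lineage documentation with downstream change impact information, listed among the technical metadata examples, amounts to a directed graph that can be walked. A minimal sketch, assuming lineage is held as a mapping from each asset to the assets that read from it; all asset names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical lineage: each asset maps to the assets that read from it.
DOWNSTREAM: Dict[str, List[str]] = {
    "crm.customer": ["staging.customer"],
    "staging.customer": ["dw.dim_customer"],
    "dw.dim_customer": ["report.churn", "report.revenue"],
    "erp.orders": ["dw.fact_orders"],
    "dw.fact_orders": ["report.revenue"],
}


def impacted_by(asset: str,
                downstream: Dict[str, List[str]] = DOWNSTREAM) -> Set[str]:
    """Return every asset reachable downstream of the given asset."""
    seen: Set[str] = set()
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for child in downstream.get(current, []):
            if child not in seen:  # guards against revisiting shared targets
                seen.add(child)
                queue.append(child)
    return seen
```

Reversing the mapping and running the same walk would give the upstream view, that is, every source a report depends on.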
Types of metadata
Process metadata
 defines and describes the characteristics of other system
elements (processes, business rules, programs, jobs, tools,
etc.)
 Examples of process meta-data include:
 Data stores and data involved.
 Government / regulatory bodies.
 Organization owners and stakeholders.
 Process dependencies and decomposition.
 Process feedback loop documentation.
 Process name.
 Process order and timing.
 Process variations due to input or timing.
 Roles and responsibilities.
 Value chain activities.



Types of metadata
Data stewardship meta-data
 Data about data stewards, stewardship processes, and responsibility
assignments.
 Data stewards assure that data and meta-data are accurate, with high
quality across the enterprise. They establish and monitor sharing of
data.
 Examples:
 Business drivers / goals.
 Data CRUD rules.
 Data definitions - business and technical.
 Data owners.
 Data sharing rules and agreements / contracts.
 Data stewards, roles and responsibilities.
 Data stores and systems involved.
 Data subject areas.
 Data users.
 Government / regulatory bodies.
 Governance organization structure and responsibilities



Metadata for Unstructured Data
Unstructured data is any data that is not in a
database or data file, including documents or other
media data
Metadata classification:
◦ descriptive meta-data: describe a resource for purposes of
discovery and identification
◦ catalog information, thesauri keyword terms.
◦ structural meta-data:
◦ Dublin Core, field structures, format(audio/visual, booklet),
thesauri keyword labels, XML schemas.
◦ administrative meta-data: provides information to help
manage a resource
◦ source(s), integration/update schedule, access rights, page
relationships (e.g. site navigational design).
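The three classes of metadata for unstructured data can be kept apart in a single record so that each class serves its own purpose. The sketch below is illustrative only; the `DocumentMetadata` class and its field names are hypothetical and not part of any standard.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DocumentMetadata:
    # Descriptive: supports discovery and identification.
    descriptive: Dict[str, object] = field(default_factory=dict)
    # Structural: describes format and internal organization.
    structural: Dict[str, object] = field(default_factory=dict)
    # Administrative: helps manage the resource.
    administrative: Dict[str, object] = field(default_factory=dict)

    def search_terms(self) -> List[str]:
        """Build search terms from descriptive metadata only."""
        title_words = str(self.descriptive.get("title", "")).lower().split()
        keywords = [str(k).lower() for k in self.descriptive.get("keywords", [])]
        return sorted(set(title_words + keywords))


handbook = DocumentMetadata(
    descriptive={"title": "Travel Policy", "keywords": ["expenses", "Travel"]},
    structural={"format": "booklet", "schema": "hypothetical-policy.xsd"},
    administrative={"source": "HR", "access": "internal"},
)
```

Only the descriptive part feeds the search terms here, while access rights and update schedules stay in the administrative part where a records manager would look for them.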



Sources of Metadata
Meta-data is everywhere in every data management activity
Sources of metadata:
 Primary sources: anything named in an organization
 Application metadata repositories
 Business glossary
 Business intelligence (BI) tools
 Configuration management tools
 Data dictionaries
 Data integration tools
 Database management and system catalogs
 Data mapping management tools
 Data quality tools
 Directories and catalogs
 Event messaging tools
 Modeling tools and repositories
 Reference data repositories
 Secondary source: other meta-data repositories, accessed using bridge software.
 Service registries
Types of Metadata architecture
 Architectural layers of metadata management solutions/systems:
 metadata creation / sourcing
 metadata integration
 one or more metadata repositories
 metadata delivery
 metadata usage,
 metadata control / management.
 The architecture should provide a single access point for the metadata
repository.
 The access point must supply all related metadata resources
transparently to the user.
 Four technical architectural approaches to building a common meta-data
repository:
 Centralized
 Distributed
 Hybrid
 Bi-directional
Centralized Metadata Architecture
◦ A centralized architecture consists of a single meta-data repository that
contains copies of the live meta-data from the various sources

Advantages of a centralized repository include:
• High availability, since it is independent of the source systems.
• Quick meta-data retrieval, since the repository and the query reside together.
• Resolved database structures that are not affected by the proprietary nature of third party or
commercial systems.
• Extracted meta-data may be transformed or enhanced with additional meta-data that may not
reside in the source system, improving quality.

Some limitations of the centralized approach include:
• Complex processes are necessary to ensure that changes in source meta-data quickly replicate into
the repository.
• Maintenance of a centralized repository can be substantial.
• Extraction could require custom additional modules or middleware.
• Validation and maintenance of customized code can increase the demands on both internal IT staff
and the software vendors.
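The trade-off of the centralized approach, fast lookups against copies that can fall behind the source, can be seen in a few lines. This is a toy sketch under stated assumptions: `CentralRepository` and the sample catalog are hypothetical, and a real repository would use scheduled extraction rather than a manual `load` call.

```python
from typing import Dict


class CentralRepository:
    """Holds copies of source metadata; queries never touch the sources."""

    def __init__(self) -> None:
        self._store: Dict[str, Dict[str, str]] = {}

    def load(self, source_name: str, source_metadata: Dict[str, str]) -> None:
        # Store a copy so later source changes do not leak in unnoticed.
        self._store[source_name] = dict(source_metadata)

    def lookup(self, source_name: str, element: str) -> str:
        return self._store[source_name][element]


crm_catalog = {"cust_id": "Customer identifier"}
repo = CentralRepository()
repo.load("crm", crm_catalog)

crm_catalog["cust_id"] = "Unique customer key"  # the source changes after the load
stale = repo.lookup("crm", "cust_id")           # still the old definition

repo.load("crm", crm_catalog)                   # replication catches up
fresh = repo.lookup("crm", "cust_id")
```

The lookup works even if the source system is offline, which is the availability advantage; the gap between `stale` and `fresh` is the replication limitation.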



Distributed Metadata Architecture
◦ The meta-data management environment maintains the necessary source system
catalogs and lookup information needed to process user queries and searches
effectively.
◦ A common object request broker or similar middleware protocol accesses these source
systems
Advantages of distributed meta-data architecture include:
• Meta-data is always as current and valid as possible.
• Queries are distributed, possibly improving response / process time.
• Meta-data requests from proprietary systems are limited to query processing rather than requiring
a detailed understanding of proprietary data structures, therefore minimizing the implementation
and maintenance effort required.
• Development of automated meta-data query processing is likely simpler, requiring minimal manual
intervention.
• Batch processing is reduced, with no meta-data replication or synchronization processes.

The following limitations exist for distributed architectures:
• No enhancement or standardization of meta-data is necessary between systems.
• Query capabilities are directly affected by the availability of the participating source systems.
• No ability to support user-defined or manually inserted meta-data entries since there is no
repository in which to place these additions.
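In a distributed architecture the access point holds no metadata of its own and forwards each query to the source systems. The sketch below stands in for the middleware with a plain Python class; `MetadataBroker`, `make_source`, and the sample catalogs are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, Optional

# A source is modeled as a function that answers a metadata query live.
Source = Callable[[str], Optional[str]]


def make_source(catalog: Dict[str, str], available: bool = True) -> Source:
    def query(element: str) -> Optional[str]:
        if not available:
            raise ConnectionError("source system is unavailable")
        return catalog.get(element)
    return query


class MetadataBroker:
    """Single access point that queries every source at request time."""

    def __init__(self, sources: Dict[str, Source]) -> None:
        self._sources = sources

    def lookup(self, element: str) -> Dict[str, str]:
        results: Dict[str, str] = {}
        for name, query in self._sources.items():
            try:
                answer = query(element)
            except ConnectionError:
                continue  # an offline source contributes nothing
            if answer is not None:
                results[name] = answer
        return results


broker = MetadataBroker({
    "crm": make_source({"cust_id": "Customer identifier"}),
    "erp": make_source({"cust_id": "Customer number"}, available=False),
})
```

Answers are always current because nothing is copied, but the result for `cust_id` is missing the ERP definition while that system is down, which illustrates the availability limitation.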



Hybrid Metadata Architecture
◦ Metadata still moves directly from the source systems into a repository.
◦ The repository design only accounts for the user-added metadata, the
critical standardized items, and the additions from manual sources.
Advantages:
◦ near-real-time retrieval of meta-data from its source
◦ enhanced meta-data to meet user needs most effectively, when needed.
◦ lowers the effort for manual IT intervention and custom-coded access
functionality to proprietary systems
◦ the meta-data is as current and valid as possible at the time of use, based on
user priorities and requirements
Disadvantages
◦ The availability of the source systems is a limit, because the distributed
nature of the back-end systems handles processing of queries
◦ Additional overhead is required to link those initial results with meta-data
augmentation in the central repository before presenting the result set to
the end user.
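The hybrid approach can be sketched as a live lookup against the sources followed by a merge with user-added metadata held in a small repository. The function name `hybrid_lookup`, the `steward` attribute, and the sample data are hypothetical.

```python
from typing import Callable, Dict, Optional


def hybrid_lookup(
    element: str,
    live_sources: Dict[str, Callable[[str], Optional[str]]],
    augmentations: Dict[str, Dict[str, str]],
) -> Dict[str, Dict[str, str]]:
    """Fetch definitions live, then add repository-held augmentation."""
    result: Dict[str, Dict[str, str]] = {}
    for name, query in live_sources.items():
        definition = query(element)
        if definition is None:
            continue
        entry = {"definition": definition}
        # The linking step: this is the extra overhead of the hybrid approach.
        entry.update(augmentations.get(element, {}))
        result[name] = entry
    return result


# A dict's .get method serves as a stand-in for a live source query.
sources = {"crm": {"cust_id": "Customer identifier"}.get}
user_added = {"cust_id": {"steward": "Sales Operations"}}
answer = hybrid_lookup("cust_id", sources, user_added)
```

The repository here stores only what the sources cannot provide, so the definition stays as current as the source while the steward assignment survives independently of it.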



Bi-directional Metadata Architecture
 allows Metadata to change in any part of the
architecture (source, data integration, user interface)
and then feedback is coordinated from the repository
(broker) into its original source.
Metadata History 1990 - 2008
In the 1990s, some business managers finally began to recognize the value
of metadata repositories
In the mid to late 1990s, efforts began to standardize metadata definition and
exchange between applications in the enterprise
◦ CASE Definition Interchange Facility (CDIF) developed by the Electronics
Industries Alliance (EIA) in 1995
◦ Dublin Core Metadata Elements developed by the Dublin Core Metadata
Initiative (DCMI) in 1995 in Dublin, Ohio.
◦ ISO 11179 standard for Specification and Standardization of Data Elements was
published in 1994 through 1999
The early years of the 21st century saw the update of existing meta-data
repositories for deployment on the web
◦ many data integration vendors began focusing on metadata as an additional
product offering, although not many organizations were using it
In the current decade, focus is expanding on how to incorporate meta-data
beyond the traditional structured sources and include unstructured
sources



Activities
① Define metadata strategy
② Understand metadata requirements
③ Define metadata architecture
1)Create metamodel
2)Apply metadata standards
3)Manage metadata stores
④ Create and maintain metadata
1)Integrate metadata
2)Distribute and deliver metadata
⑤ Query, report, and analyze metadata
(1) Define Metadata Strategy
 A metadata strategy is a statement of direction in meta-
data management by the enterprise.
 The primary focus of the meta-data strategy is to gain an
understanding of and consensus on the organization’s
key business drivers, issues, and information
requirements for the enterprise metadata program.
 The objectives of the strategy define the organization’s
future enterprise meta-data architecture
 Metadata strategy development phases:
 Meta-data Strategy Initiation and Planning
 Conduct Key Stakeholder Interviews
 Assess Existing Meta-data Sources and Information Architecture
 Develop Future Meta-data Architecture
o organization structure, including data governance and stewardship
alignment recommendations; managed meta-data architecture; meta-data
delivery architecture; technical architecture; and security architecture.
 Develop Phased MME (managed meta-data environment)
Implementation Strategy and Plan

(2) Understand Metadata Requirements
Business User Requirements
◦ Business users require improved understanding of the information from operational
and analytical systems
◦ Business users must understand the intent and purpose of meta-data management.
◦ To provide meaningful business requirements, users must be educated about the differences between
data and meta-data
◦ Critical to metadata management success is the establishment of a data governance
organization
◦ Example: what is the meaning of royalty? If it is confidential or considered
competitive information, the metadata should be managed carefully
Technical User Requirements
◦ High-level technical requirement topics include:
◦ Daily feed throughput: size and processing time.
◦ Existing metadata.
◦ Sources – known and unknown.
◦ Targets.
◦ Transformations.
◦ Architecture flow – logical and physical.
◦ Non-standard metadata requirements.
◦ Technical users include Database Administrators (DBAs), Meta-data Specialists and
Architects, IT support staff, and developers

(3) Define Metadata Architecture
 Create meta-model
 Create a data model for the Metadata repository, or meta-model
 Different levels of meta-model may be developed as needed; a
high-level conceptual model, that explains the relationships
between systems, and a lower level meta-model that details the
attributions, to describe the elements and processes of a model.
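A lower-level meta-model can be pictured as a handful of types that describe entities, their attributes, and the relationships between them. The sketch below is one possible shape, not a standard meta-model; the class names, fields, and the `undocumented` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attribute:
    name: str
    data_type: str
    definition: str = ""


@dataclass
class Entity:
    name: str
    system: str
    attributes: List[Attribute] = field(default_factory=list)


@dataclass
class Relationship:
    source: str   # name of the entity the relationship starts from
    target: str   # name of the entity it points to
    kind: str     # e.g. "feeds" or "references"


def undocumented(entity: Entity) -> List[str]:
    """List attributes that still lack a business definition."""
    return [a.name for a in entity.attributes if not a.definition.strip()]


customer = Entity(
    name="Customer",
    system="CRM",
    attributes=[
        Attribute("cust_id", "integer", "Unique customer identifier"),
        Attribute("segment", "string"),
    ],
)
link = Relationship(source="Customer", target="Order", kind="references")
```

The high-level conceptual model would keep only the `Entity` and `Relationship` level, while the lower-level model adds the attribute detail shown here.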
 Apply metadata standards
 The Metadata solution should adhere to the agreed-upon internal
and external standards as identified in the Metadata strategy
 Manage metadata stores
 Implement control activities to manage the Metadata environment.
Control of repositories is control of Metadata movement and
repository updates performed by the Metadata specialist. Control
activities should have data governance oversight.
(3.2) Meta-data Standards Types
Two major types:
◦industry or consensus standards; and
◦international standards.

[Figure: High Level Standards Framework]



(3.2) Meta-data Standards Types
Example of industry standards:
◦ OMG specifications:
◦ Common Warehouse Metamodel (CWM): Specifies the interchange of meta-data among data warehousing,
BI, KM, and portal technologies.
◦ Information Management Metamodel (IMM): The next iteration of CWM
◦ MDC Open Information Model (OIM).
◦ The Extensible Markup Language (XML)
◦ Unified Modeling Language (UML) is the formal specification language for OIM
◦ Structured Query Language (SQL): The query language for OIM.
◦ XML Metadata Interchange (XMI)
◦ Ontology Definition Metamodel (ODM)



(3.2) Meta-data Standards Types
Example of industry standards (cont.):
◦ World Wide Web Consortium (W3C) specifications: W3C has established the RDF (Resource Description
Framework)
◦ Dublin Core
◦ Distributed Management Task Force (DMTF)
◦ Meta-data standards for unstructured data are:
◦ ISO 5964 - Guidelines for the establishment and development of multilingual thesauri.
◦ ISO 2788 - Guidelines for the establishment and development of monolingual thesauri.
◦ ANSI/NISO Z39.1 - American Standard Reference Data and Arrangement of Periodicals.
◦ ISO 704 - Terminology work – Principles and methods
◦ Geospatial standards grew from a global framework called the Global Spatial Data Infrastructure, maintained
by the U.S. Federal Geographic Data Committee (FGDC)
◦ Etc.
Example of international standards:
◦ International Organization for Standardization ISO / IEC 11179, which describes the standardizing and
registering of data elements to make data understandable and shareable
◦ Its purpose is to give concrete guidance on the formulation and maintenance of discrete data element
descriptions and semantic content (meta-data) that is useful in formulating data elements in a consistent,
standard manner



(3.3) Manage Metadata Repositories
A meta-data repository refers to the physical
tables in which the meta-data are stored.
A directory is a type of meta-data store that
limits the meta-data to the location or source of
data in the enterprise.
A glossary typically provides guidance for the use
of terms, and a thesaurus can direct the user
through structural choices involving three
kinds of relationships: equivalence, hierarchy,
and association.
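
The three thesaurus relationships can be illustrated with a small data structure. This is a minimal sketch, not a reference to any particular repository product: the class names `ThesaurusTerm` and `Thesaurus`, the method names, and the sample terms (Customer, Client, Party, Account) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusTerm:
    """One preferred term and its three kinds of relationships (hypothetical model)."""
    label: str
    equivalent: set = field(default_factory=set)  # equivalence: synonyms of this term
    broader: set = field(default_factory=set)     # hierarchy: more general terms
    narrower: set = field(default_factory=set)    # hierarchy: more specific terms
    related: set = field(default_factory=set)     # association: related, non-hierarchical


class Thesaurus:
    def __init__(self):
        self.terms = {}      # preferred label -> ThesaurusTerm
        self._synonyms = {}  # lower-cased synonym -> preferred label

    def _term(self, label):
        # Create the term on first use so relationships can be added in any order.
        return self.terms.setdefault(label, ThesaurusTerm(label))

    def add_equivalence(self, preferred, synonym):
        self._term(preferred).equivalent.add(synonym)
        self._synonyms[synonym.lower()] = preferred

    def add_hierarchy(self, broader, narrower):
        self._term(broader).narrower.add(narrower)
        self._term(narrower).broader.add(broader)

    def add_association(self, a, b):
        self._term(a).related.add(b)
        self._term(b).related.add(a)

    def resolve(self, word):
        """Return the preferred term for a word, or None if it is unknown."""
        for label in self.terms:
            if label.lower() == word.lower():
                return label
        return self._synonyms.get(word.lower())


def build_sample_thesaurus():
    # Sample content is invented for illustration only.
    t = Thesaurus()
    t.add_equivalence("Customer", "Client")   # equivalence
    t.add_hierarchy("Party", "Customer")      # hierarchy
    t.add_association("Customer", "Account")  # association
    return t
```

With this sketch, a user who searches for "client" is directed to the preferred term "Customer", and from there can move up to "Party" or sideways to "Account".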

Control activities include:
◦ Backup, recovery, archive, purging.
◦ Configuration modifications.
◦ Education and training of users and data stewards.
◦ Job scheduling / monitoring.
◦ Load statistic analysis.
◦ Management metrics generation and analysis.
◦ Performance tuning.
◦ Quality assurance, quality control.
◦ Query statistics analysis.
◦ Query / report generation.
◦ Repository administration.
◦ Security management.
◦ Source mapping / movement.
◦ Training on the control activities and query / reporting.
◦ User interface management.
◦ Versioning.
◦ Metadata maintenance for linked data sets (for NoSQL provisioning)
◦ Linking data to internal data acquisition: custom links and job metadata
◦ Licensing for external data sources and feeds
◦ Data enhancement Metadata, e.g., Link to GIS



(4) Create and Maintain Metadata
 Software package vs custom solution
 Meta-data creation and update by authorized users and
programs
 An audit process validates activities and reports
exceptions.
 The quality of meta-data must be trusted. Low-quality meta-data creates:
 Replicated dictionaries / repositories / meta-data storage.
 Inconsistent meta-data.
 Competing sources and versions of meta-data "truth".
 Doubt in the reliability of the meta-data solution systems.
 High-quality meta-data creates:
 Confident, cross-organizational development.
 Consistent understanding of the values of the data resources.
 Meta-data "knowledge" across the organization.



(4.1) Integrate Metadata
Gather and consolidate meta-data from across the enterprise, including meta-data
from data acquired outside the enterprise.
Integrate extracted meta-data from a source meta-data store with other relevant
business and technical meta-data into the meta-data storage facility.
◦ Meta-data can be extracted using adaptors / scanners, bridge applications, or by directly
accessing the meta-data in a source data store.
◦ Adaptors are available with many third party vendor software tools, as well as from the
meta-data integration tool selected. In some cases, adaptors must be developed using the tool
APIs.
Repository scanning can be accomplished in two distinct ways:
1. Proprietary interface: In a single-step scan and load process, a scanner collects the meta-data
from a source system, then directly calls the format-specific loader component to load the
meta-data into the repository. In this process, there is no format-specific file output and the
collection and loading of metadata occurs in a single step.
2. Semi-Proprietary interface: In a two-step process, a scanner collects the metadata from a
source system and outputs it into a format-specific data file. The scanner only produces a
data file that the receiving repository needs to be able to read and load appropriately. The
interface is a more open architecture, as the file is readable by many methods.
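
The difference between the two scanning approaches can be sketched in a few lines. This is only an illustration under stated assumptions: the source system is modeled as a plain dictionary of table names to column names, the repository as a list, and JSON stands in for the "format-specific data file". All function names here (`scan_source`, `load_into_repository`, `single_step_scan`, `scan_to_file`, `load_from_file`) are hypothetical and do not belong to any real metadata tool.

```python
import json
import os
import tempfile


def scan_source(source):
    """Hypothetical scanner: collect table and column names from a source description."""
    return [
        {"table": table, "column": column}
        for table, columns in sorted(source.items())
        for column in columns
    ]


def load_into_repository(repository, records):
    """Hypothetical loader: append meta-data records to the repository."""
    repository.extend(records)
    return len(records)


def single_step_scan(source, repository):
    """Proprietary style: the scanner calls the loader directly, with no file output."""
    return load_into_repository(repository, scan_source(source))


def scan_to_file(source, path):
    """Two-step style, step 1: the scanner only writes a data file (JSON assumed)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(scan_source(source), fh)


def load_from_file(path, repository):
    """Two-step style, step 2: the receiving repository reads and loads the file."""
    with open(path, encoding="utf-8") as fh:
        return load_into_repository(repository, json.load(fh))


def demo():
    # Invented sample source: two tables with two columns each.
    source = {"customer": ["id", "name"], "orders": ["id", "customer_id"]}
    repo_a, repo_b = [], []
    single_step_scan(source, repo_a)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "scan.json")
        scan_to_file(source, path)
        load_from_file(path, repo_b)
    return repo_a, repo_b
```

Both paths end with the same repository content. The two-step path leaves behind a file that any other reader could consume, which is what makes that interface more open.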

A scanning process produces and uses several types of files:
1. Control file: Containing the source structure of the data model.
2. Reuse file: Containing the rules for managing reuse of process loads.
3. Log files: Produced during each phase of the process, one for each scan
/ extract and one for each load cycle.
4. Temporary and backup files: Used during the process or for traceability.
(4.2) Distribute and Deliver Metadata
The meta-data delivery layer is responsible for the
delivery of the meta-data from the repository to
the end users and to any applications or tools that
require meta-data feeds to them.
Some delivery mechanisms:
◦ Meta-data intranet websites for browse, search, query,
reporting, and analysis.
◦ Reports, glossaries, other documents, and websites.
◦ Data warehouses, data marts, and BI tools.
◦ Modeling and software development tools.
◦ Messaging and transactions.
◦ Applications.
◦ External organization interface solutions (e.g., supply
chain solutions).



(5) Query, Report and Analyze Meta-data
Metadata guides:
◦ How we use data assets
◦ In business intelligence (reporting and analysis), business decisions
(operational, tactical, strategic), and business semantics (what we say,
what we mean: "business lingo")
◦ How we manage data assets
◦ Data governance processes use metadata to control and govern.
◦ Information system implementation and delivery uses metadata to add,
change, delete, and access data.
◦ Data integration (operational systems, DW / BI systems) refers to data by
its tags or meta-data to achieve that integration.
◦ Etc.
A meta-data repository must have a front-end
application that supports the search-and-retrieval
functionality required for all this guidance and
management of data assets.



Standard Metadata Metrics
Some suggested metrics on meta-data environments include:
◦ Meta-data Repository Completeness: Compare ideal coverage of the enterprise
meta-data (all artifacts and all instances within scope) to actual coverage.
◦ Meta-data Documentation Quality: Automatic methods include performing
collision logic on two sources, measuring how much they match, and the trend
over time. Another metric would measure the percentage of attributes that
have definitions, trending over time. Manual methods include random or
complete survey, based on enterprise definitions of quality.
◦ Master Data Service Data Compliance: Meta-data on the data services assists
developers in deciding when new development could use an existing service.
◦ Steward Representation / Coverage: Organizational commitment to meta-data
as assessed by the appointment of stewards, coverage across the enterprise for
stewardship, and documentation of the roles in job descriptions.
◦ Meta-data Usage / Reference: User uptake of the meta-data repository
can be measured by simple login counts.
◦ Meta-data Management Maturity: Metrics developed to judge the meta-data
maturity of the enterprise, based on the Capability Maturity Model (CMM)
approach to maturity assessment.
◦ Meta-data Repository Availability: Uptime, processing time (batch and query).
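
Three of the metrics above reduce to simple ratios, and a short sketch may make the arithmetic concrete. The function names and sample values are hypothetical. In particular, `match_rate` is only one possible reading of "collision logic on two sources": it assumes a match means identical definition text for an attribute that appears in both sources.

```python
def completeness(actual_count, ideal_count):
    """Repository completeness: actual coverage as a percentage of ideal coverage."""
    if ideal_count <= 0:
        raise ValueError("ideal_count must be positive")
    return 100.0 * actual_count / ideal_count


def definition_coverage(attributes):
    """Percentage of attributes that have a non-empty definition.

    `attributes` maps an attribute name to its definition text (or None).
    """
    if not attributes:
        return 0.0
    defined = sum(1 for text in attributes.values() if text and text.strip())
    return 100.0 * defined / len(attributes)


def match_rate(source_a, source_b):
    """Percentage of shared attributes whose definitions agree in both sources.

    Assumption: agreement means exactly equal definition text.
    """
    shared = source_a.keys() & source_b.keys()
    if not shared:
        return 0.0
    same = sum(1 for name in shared if source_a[name] == source_b[name])
    return 100.0 * same / len(shared)
```

For example, a repository that holds 150 of 200 in-scope artifacts has a completeness of 75%. Recording each value with its measurement date supports the trend-over-time view that the metrics call for.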
