Using Dublin Core Metadata to Move from Need to Know to Need to Share

Conference Paper · July 2017

All content following this page was uploaded by Kevin Lynch on 30 August 2018.



Proceedings of The 21st World Multi-Conference on Systemics, Cybernetics and Informatics (WMSCI 2017)
Approved for Public Release

Using Dublin Core Metadata to Move from Need to Know to Need to Share

Kevin Lynch and Randall Ramsey


Engineering & Information Technology
Raytheon Corporation
Tucson, AZ USA
{Kevin_J_Lynch, Ramsey}@raytheon.com

ABSTRACT

A large aerospace manufacturer is using the fifteen-element Dublin Core metadata as the basis for its semantic web, a linked data infrastructure that enables efficient and scalable integration of a wide variety of information sources. With mounting cost pressures and global competition, expensive or time-consuming large-scale information harmonization approaches are not an option; value must be delivered quickly in the context of existing information systems. This paper describes the extension of the Dublin Core metadata to provide an internal semantic web for information discovery, navigation and integration. While some technical details are discussed, the focus is on the user experience with a linked data infrastructure, using metadata for tagging, linking, and classification. The value contributed by and derived by users of metadata is described. The simplicity of the approach facilitates a critical shift in information strategy, moving from “Need to Know” to “Need to Share” quickly and cost-effectively, and places specific responsibility on the user community for information sharing. This approach parallels the shift our primary customer has been making in the last decade to increase speed and effectiveness while lowering cost and still maintaining a secure information environment.

Keywords: metadata, Dublin Core, semantic web

I. INTRODUCTION
In many engineering environments today, the user experience is defined by hundreds of applications and interfaces. Although these are arguably necessary for complex aerospace design and manufacturing, the time to navigate between applications is costly for the user, and the time to integrate information behind applications and interfaces is costly and time-consuming for developers and maintainers. Ultimately, the speed of business is defined by the ability to operate on and share information quickly, often in parallel for concurrent engineering efforts. Customers are demanding both increased speed and drastically lower cost to develop and deliver products, and delivering actionable information most quickly and safely remains a core challenge.

The nature of the aerospace industry requires protection of information that is facilitated by a siloed environment: difficult to penetrate, with common authentication but separate authorization mechanisms that are complex and expensive to reconcile. Complicating matters further is a long-standing principle that information should only be shared with those who have a “Need to Know.” This principle manifests itself in enterprise and information architectures, business policies, and in the business culture itself. Whether designing information systems or interacting with individuals, priority has historically been given to securing information first by limiting its access, and secondarily, to sharing that information for increased business benefit. In this decade, however, our primary customer, the US Government, has recognized the importance of sharing information early, and has tried to balance the security needs with costs and benefits, moving from a primary stance of “Need to Know” to a primary stance of “Need to Share.”[1] This represents a significant strategic shift in information management, and while security concerns are still paramount, there is wide recognition that the overall benefit of information sharing clearly outweighs the cost of continuing to compartmentalize information. Users have a responsibility to share information, and information system designers have a responsibility to build those capabilities in their systems.

As part of our efforts to protect, preserve, and share critical information, we have begun to use the Dublin Core as a standard way to define, produce, and consume metadata, across repositories, across disciplines, behind the firewall, and across the web [2]. Metadata is useful for information and application integration, tagging and classification, and security. The Dublin Core addresses the most pressing integration and security needs in a non-disruptive, incremental fashion, and its semantic underpinnings enable us to deploy scalable web-based solutions that are usable by both people and machines.

II. APPROACH WITH DUBLIN CORE
Dublin Core metadata describes resources in a standard way, facilitating information discovery and integration. Resources can be anything, including people, concepts, things, and web pages (in web terminology, a resource is “anything addressable via a URL”). The Dublin Core’s fifteen elements are the result of a decade’s work to identify simple, interoperable, extensible, and modular metadata for resources. The Dublin Core design is flexible and is intended to be used as a basis for cooperation and extension. The Dublin Core is the most oft-used semantic vocabulary, a position that derives directly from its simplicity and wide applicability.
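As a minimal sketch of what such a description can look like, the following Python snippet lists the fifteen element names of the Dublin Core Metadata Element Set, version 1.1, and checks a record against them. The record layout (a dictionary of lists) and the helper name `unknown_elements` are illustrative assumptions, not part of the Dublin Core or of the system described here; the sample values are taken from this paper's own title, authors, and keywords.

```python
# The fifteen element names of the Dublin Core Metadata Element Set, version 1.1.
DUBLIN_CORE_ELEMENTS = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})


def unknown_elements(record):
    """Return the element names in `record` that are not Dublin Core elements."""
    return sorted(set(record) - DUBLIN_CORE_ELEMENTS)


# Hypothetical record layout: each element maps to a list of values,
# and only the elements that are needed are present.
paper = {
    "title": ["Using Dublin Core Metadata to Move from Need to Know to Need to Share"],
    "creator": ["Kevin Lynch", "Randall Ramsey"],
    "subject": ["metadata", "Dublin Core", "semantic web"],
    "type": ["Text"],
}

print(unknown_elements(paper))                         # []
print(unknown_elements({"author": ["Kevin Lynch"]}))   # ['author']
```

A name such as `author`, which is not one of the fifteen elements, is reported rather than silently accepted; an extended element set would have to be declared explicitly.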

All elements are optional, can be repeated, and have no order.

When describing a resource using the Dublin Core, information is attached to that resource in a standard way (“Kevin” “is-author-of” “this-resource”). By using the semantic web underpinnings, one can easily find all places where “Kevin” exists, and all the relationships “Kevin” has to other resources, such as “is-author-of.” A web browser is all that is necessary to exploit a consistently applied Dublin Core metadata library. The semantic web constructs eliminate concerns about the application being used, the repository the data exists in, or the underlying technology that structures the data.

We have extended the Dublin Core model into a core metadata element set. The core consists of 19 metadata terms, composed of essential Dublin Core elements and several specific elements required to support search and access control (depicted in Figure 1). The core elements provide foundational metadata for all controlled assets and must be applied consistently throughout the ecosystem. Application, project, and domain-specific metadata may be included as needed to support business processes but must not replace or conflict with core metadata elements.

Figure 1 – Core metadata element set

The core metadata element set is itself published and discoverable on our internal web so everyone can use it. As metadata is added to objects throughout the enterprise, we are able to consistently discover that information. For instance, the subject element (dc:subject) is used to search and browse metadata elements for topics, and all titles can be searched through the title element (dc:title).

This simple approach, using Dublin Core metadata with semantic web-based representation and access, enables incremental construction of a linked data infrastructure. Linked data describes a method of publishing structured data so that it can be interlinked and become more useful. It builds upon standard web technologies, but rather than using them to serve web pages for human readers, it extends them to share information in a way that it can be processed automatically by computers. Linked data’s connections are identified by strongly typed relationships between data elements (such as “is-author-of”) that enhance discoverability and navigation.

Dublin Core metadata is added to objects whenever they formally enter the information lifecycle. From the user’s perspective, as work products are added, users have three responsibilities:

1) Publish and Protect their work product
   - get it into the shared information environment
2) Link their work product
   - provide context that adds value
3) Share their results
   - share it with others to add velocity to the business

Publishing makes the work product formally available to others. The publishing process assigns a unique identity to the work product. Creating a unique identity is transparent to the user, and results in an immediately useful identity (link) they can see and operate on. The link is permanent, reliable, and web-accessible. This unique object can then be shared, augmented, commented upon, and rated. Unlike the past, users can immediately depend on the link that gets created, in their communications, presentations, and applications.

Linking provides additional context and value to users and others. Linking and tagging are roughly synonymous, in that both tags and links provide additional understanding and usefulness. The difference between links and tags is simply that links are made between objects, and tags are added to objects. Users have the ability to link a new object to an existing object, via a specific context, and they have the ability to add a tag (a string of characters). Tags may be populated from controlled vocabularies, or fully defined by an application, but links between objects will typically require more human judgment and flexibility.

In the linking process, users are presented with a set of suggested auto-assigned tags for the core metadata element set, including their business, their application, and their project (where applicable, depicted in Figure 2). Controlled vocabularies are used for these metadata types, each of which may have a mandatory subset. Decisions are quick and easy for the user to make in terms of tagging and linking, while users also have the ability to add new tags and new relationship links they see the need for. This is an efficient form of information integration, as integration occurs semantically and incrementally, without incurring the huge expense typically associated with large data integration
projects. Each time a relationship is added or an object is linked, context is added and integration is easier and better informed. The relationships suggested by our user community help guide us to the kinds of tagging and linking that are most effective. Specific relationships provide inferencing power on the associated data.

Figure 2 – Layered Metadata Model

Sharing is based on the user’s knowledge, the system’s knowledge of the task and context, and the historical experience of both. We actively promote a “Need to Share” philosophy engendered by the auto-suggestion of people, or groups of people, to share with. The auto-suggestion is a simple prompt to the author asking who else would benefit from the work product they are producing. In the future the auto-suggestion will use previous activity to make increasingly useful sharing suggestions (in the same way Gmail suggests potential participants to include on a topic). The author always has the ability to choose or add people, and the system creates a very low cognitive burden for these decisions (smart, easy, quick). Appropriate protections are always carried with the object throughout the information environment.

III. A DAY IN THE LIFE OF METADATA

Based on its previous success, a customer has asked the Engineering department in the Vehicles business to generate the architectural and cost profile for a variant of an existing vehicle to increase its range by 15%. The original model-based specification and requirements exist (the “SX-2”), and a subset of requirements is identified specific to the customer request. Three deliverables and their associated metadata are described in this section: a performance requirements document, performance-based specifications, and a trade study.

A) Deliverable 1 – Performance Requirements

The new requirements are published into the information environment. Standard metadata from the core metadata element set is applied, including:

| Element | Value | Automated? | Notes |
|---|---|---|---|
| Identifier | http://www.aerospaceco.com/... | Yes | A URI minting scheme is used to generate a unique identifier |
| Title | SX-3 Performance Requirements | No | |
| Type | Text | Yes | |
| Format | application/ms-word | Yes | |
| Language | English | Yes | Default, and based on textual analysis |
| Author | Barry Allen | Yes | Authentication credentials used |
| Publisher | Systems Engineering, Systems Development and Performance | No | This is the group who owns the document at the time it is published. |
| Created | 09/12/15 06:22pm | Yes | |
| Function | Business Development | Yes | Default, based on user authentication |
| Business | Vehicles | Yes | Default, based on user authentication |
| Keywords | SX, SX-2, SX-3 | No | |
| Subject | performance requirements | No | |
| Description | performance requirements for a variant of the SX-2 (the SX-3) | No | |
| Expiration | None | No | Default, suggested based on document type |
| Country | US | No | |
| Export Control | International | No | |
| Security Control | Proprietary | No | |

Figure 3 – Core Metadata for Performance Requirements

Associating metadata such as Author, Business, and Keywords links information with those same Authors, Businesses, and Keywords implicitly. Searching, filtering, and ranking of information become more effective and precise.

We use several vocabularies as sources for additional metadata, including Deliverable (work product), Programs (projects), and Discipline. The Deliverable vocabulary is a specific subset of our internal product development system. The Program vocabulary is available in navigable form down
to a specific contract. The Discipline vocabulary is available as part of our internal talent management system.

| Element | Value | Required? | Automated? | Notes |
|---|---|---|---|---|
| Deliverable | Requirements | Yes | Yes | This is an element chosen by the user from our internal work product vocabulary. |
| Program | SX | No | No | A navigation scheme to the contract level is available. The lowest-level element possible is selected, and the values in the navigation hierarchy above (for instance, product line) are automatically inherited. |
| Contract | SX3-0D45 | No | No | This identifies the specific contract number the deliverable is developed under. |
| Common Elements | Systems Engineering | No | No | |
| Product (Part) name | SX-3 | No | No | The Product (Part) name is the name of the part at the highest level (possibly a component, possibly a product). Note this is not the type of part, component, or product; it is a unique name. |

Figure 4 – Added Metadata for Performance Requirements

Based on the program, deliverable, and discipline, optional sharing suggestions are made. Object similarity is used to determine past sharing suggestions, for instance, whether an object was shared within or across one’s functional boundaries. Sharing suggestions can be based on any element, such as contract and program. While we do not have a tremendous amount of experience in this area in our information systems today due to low overall volume, we can now use the context of the work deliverable to get user feedback as to sharing suggestion quality. The goal is for the user to be able to make quick decisions that are valuable sharing contributions, and to improve both the suggestions and the speed by which information is shared with people who can benefit.

B) Deliverable 2 – Performance-Based Specifications

While the new requirements are being generated, a systems engineer is assigned to assess the key performance parameters affected by the increased range. The key performance parameters and associated constraints are identified at the system level, including a weight constraint. The systems engineer determines whether the existing vehicle can accommodate the necessary changes. The systems engineer produces a performance-based specification based on her analysis, and publishes it into the information environment, concluding the existing system will not accommodate the increased range requirement given its existing weight.

Standard metadata from the core metadata element set is again applied, including:

| Element | Value | Automated? | Notes |
|---|---|---|---|
| Identifier | http://www.aerospaceco.com/... | Yes | A URI minting scheme is used to generate a unique identifier |
| Title | SX-3 Performance-Based Specification | No | |
| Type | Text | Yes | |
| Format | application/ms-word | Yes | |
| Language | English | No | Default |
| Author | Lara Croft | Yes | Authentication credentials used |
| Publisher | Systems Engineering, Systems Development and Performance | No | This is the group who owns the document at the time it is published. |
| Created | 09/17/15 10:40am | Yes | |
| Function | Engineering | No | Default, based on user authentication |
| Business | Vehicles | No | Default, based on user authentication |
| Keywords | SX, SX-2, SX-3 | No | |
| Subject | performance specification | No | |
| Description | performance specification for a variant of the SX-2 (the SX-3) | No | |
| Expiration | None | No | Default, suggested based on document type |
| Country | US | No | |
| Export Control | International | No | |
| Security Control | Proprietary | No | |

Figure 5 – Core Metadata for Performance-Based Specifications
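The core metadata in Figures 3 and 5 can also be read as subject/predicate/object statements about each deliverable. The sketch below, in Python, records a few of those statements as triples and matches them by pattern, which is one simple way to see how shared Author, Business, and Keywords values link documents implicitly. The `urn:example:` identifiers, the `match` helper, and the use of plain element names as predicates are assumptions made for illustration; the paper elides the real URIs and does not describe its query mechanism.

```python
# Hypothetical identifiers: the real URIs are elided in the figures.
REQS = "urn:example:sx3-performance-requirements"
SPEC = "urn:example:sx3-performance-based-specification"

# Each statement is a (subject, predicate, object) triple using values
# from Figure 3 (requirements) and Figure 5 (specification).
triples = {
    (REQS, "Title", "SX-3 Performance Requirements"),
    (REQS, "Author", "Barry Allen"),
    (REQS, "Business", "Vehicles"),
    (REQS, "Keywords", "SX-3"),
    (SPEC, "Title", "SX-3 Performance-Based Specification"),
    (SPEC, "Author", "Lara Croft"),
    (SPEC, "Business", "Vehicles"),
    (SPEC, "Keywords", "SX-3"),
}


def match(subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# Every deliverable tagged with the keyword "SX-3", whoever wrote it.
sx3_docs = [s for s, _, _ in match(predicate="Keywords", obj="SX-3")]

# Every deliverable authored by Lara Croft.
lara_docs = [s for s, _, _ in match(predicate="Author", obj="Lara Croft")]

print(sx3_docs)
print(lara_docs)
```

Both deliverables are returned for the keyword SX-3 even though they have different authors, because the link comes from the shared value rather than from any explicit connection between the two documents.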

Metadata from internal vocabularies is also applied (in Figure 6):

| Element | Value | Required? | Automated? | Notes |
|---|---|---|---|---|
| Deliverable | Trade Studies | Yes | No | This is an element chosen by the user from our internal work product vocabulary. |
| Program | SX | No | No | Suggestion will be made based on textual analysis. A navigation scheme to the contract level is available. Template usage and analysis may also be useful here. |
| Contract | SX3-0D45 | No | No | This identifies the specific contract number the deliverable is developed under. |
| Vehicle System | Primary Structure | No | No | This is one of two lowest-level elements in this vocabulary that applies. We also inherit all values in that hierarchy, including “Vehicle System > Frame.” |
| Discipline | Mechanical Engineering | No | No | |
| Product (Part) type | Vehicle | No | No | The Product (Part) type identifies the type of product or part (for instance, at the highest level, as in this case, the product is the missile). There is a one-to-one correspondence between the “Product (Part) type” element and the “Product (Part) name” element. |

Figure 6 – Added Metadata for Performance-Based Specifications

IV. INTERNAL SEMANTIC WEB
The world wide web operates fairly successfully with largely human-curated links between pages. Like internal enterprise content management systems, information within those pages is not indexed or related in a way that makes it easily reusable across applications. A semantic web has defined data elements that people can easily identify and reuse, each with their own unique identity, fully and consistently addressable using the web. What this means is that elements we see in applications or on web pages have their own web address (identity), history, relationships, and provenance (lineage). As an example, people, programs, products, and parts are each discoverable in the same way, and their information can be combined and reused without ever having to move the data between applications or repositories. Time to action is shortened and cost reduced.

Once a reusable identity (a web address) has been assigned, the information can be browsed, linked, shared, augmented, and used in any application on any device much more easily. Knowledge and context can be added to the data, and more easily discovered with methods that get better over time. Links and tags are continuously added that provide additional context, both for people and machines. This can be done in a decentralized, fully distributed way, without having to agree on everything to have information that is widely available and immediately useful.

A semantic web recognizes the messiness of information, with disagreement, imperfection, incompleteness, and constant change. The semantic web provides the means to give information both identity and permanence, enabling people and machines to use their own perspectives of the underlying data to come to their own conclusions, discover new relationships, and assert those relationships. Perspectives can change, but the underlying objects – including relationships – do not change [4]. This creates a powerful, stable framework for incrementally delivering value as more objects and applications are created, even as systems disappear, terminology changes, and data migrates. While this may sound chaotic, the important insight is that data has intrinsic meaning that can be applied differently in different contexts, and we embrace that diversity. Vocabularies and ontologies are published, accessible, and referenceable. Agreement is not forced; mechanisms are provided for wide use, extension, distinction, and versioning. The world wide web has done this today, and while imperfect, is successful.

Data that exists in today’s engineering information repositories and applications remains in place, while incrementally that data is made globally addressable, referenceable, findable, shareable, and reusable. Every element, every relationship, every element type is discoverable using the semantic web. The Dublin Core metadata is a simple way to begin doing this.

The foundation of the semantic web is the W3C’s successful data sharing standards, RDF (Resource Description Framework), SPARQL (SPARQL Protocol and RDF Query Language), and OWL (Web Ontology Language), all of which are based on HTTP. The semantic web is first and foremost the web, and uses its existing infrastructure and protocols. To understand the power of the semantic approach and the importance of the semantic web, a basic understanding of linked data principles is helpful. Here are the four linked data principles, from Sir Tim Berners-Lee [3]:
1. Use Uniform Resource Identifiers (URIs) as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful Resource Description Framework (RDF) information
4. Include RDF statements that link to other URIs so that they can discover related things

This simple scheme is used to give everything a unique, globally addressable, and meaningful identity, including relationships. For example, when a person or a program encounters the relationship “dc:author” between two objects, they are able to determine what that means, what kinds of objects can have that relationship, and where that relationship exists between other objects in the system. Inferences can be made across objects and relationships, such as who knows whom. This scheme is usable by people and machines.

When every data element has a meaningful identity, data can be disaggregated (put anywhere), and easily recombined and repurposed. Consistency of those identities means web addresses for the data elements don’t change, so that elements are always accessible, with explicit relationships, and those relationships also have their own identities. This means you can easily find every instance of an “is-author-of” relationship, independent of where or how the data is stored.

URI-based object identity and access is the foundation for semantic linking of objects [5]. When objects have identity, they can be linked to unambiguously, and discovered and reused consistently; when they have URI-based identity, discovery and reuse of objects can be performed using the web’s HTTP. The result is a scalable, semantic information repository to which value can be added incrementally. Objects are not restricted to documents, either; they can be people, projects, products, processes, and applications. Objects can be linked in ways not done today, across siloed repositories, adding context and value for our engineers.

The nature of the semantic web is a graph: a connected, navigable graph, where people and machines can traverse relationships, and find everything related to an object or a type of object. Examples of this type of traversal can be found in Google’s Knowledge Graph today [6]. Objects that have identity and relationships can be discovered, and the specific relationships they have with other objects are revealed. A user can navigate by specific term (What objects reference “Kevin”?), by relationship (What objects have an “is-author-of” relationship?), or by both (What objects reference “Kevin” that have an “is-author-of” relationship?). Parts of any graph can be fully distributed and reconstituted. Graph “merging” occurs when an element shares a URI (web address). For instance, metadata about Kevin’s publications can be stored on different machines, and because “Kevin” always has the same URI, graphs merge the information at run-time. There is no additional programming, development or maintenance; the identity is in the data. This can be done for any number of elements across any number of repositories. A graph is made of nodes and edges, and in the semantic web, its most basic construct is a triple: subject, predicate, object (e.g., “Kevin” “is-author-of” “Using Dublin Core Metadata to Move from Need to Know to Need to Share”). In the semantic web, though, each of the subject, predicate, and object can have a URI, so that their meanings are understood and shared by everyone. Note this is also true for relationships, the predicate in a subject/predicate/object triple. The “is-author-of” is the predicate/relationship in the triple above and has its own URI.

V. CONCLUSION
A small set of elements derived from the Dublin Core Metadata Initiative forms the basis for a linked data infrastructure, implemented as a semantic web, and is facilitating a change in users’ perspectives from information siloing to information sharing. Using well-established web standards and protocols, a linked data infrastructure is incrementally being delivered that changes the experience and value contributed and derived by our users. People are better leveraged at the nexus of their interaction with machines to add value, increase speed, and reduce the cost of designing and developing manufactured products for the aerospace industry. Users spend their time more productively leveraging the vast heterogeneity and volume of data sources with their unique perspective. Using the Dublin Core represents a low-cost way to both deliver and continuously improve the digital environment for the user community. The semantic web provides a way to continue extending this environment while maintaining the delicate balance between too much and too little machine intervention, privacy, and security.

ACKNOWLEDGMENT

Special thanks to Forrest Howie, who both guided the development of this effort and made significant contributions to the ‘A Day in the Life of Metadata’ section.

REFERENCES
[1] Best Jr, R. A. (2011). Intelligence information: Need-to-know vs. need-to-share. DIANE Publishing.
[2] Dublin Core Metadata Initiative. (2012). Dublin core metadata element set, version 1.1.
[3] Berners-Lee, T. (2006). Linked Data. http://www.w3.org/DesignIssues/LinkedData.html.
[4] Berners-Lee, T. (1998). Cool URIs don't change.
[5] Sauermann, L., Cyganiak, R., & Völkel, M. (2011). Cool URIs for the semantic web.
[6] Singhal, A. (2012). Introducing the knowledge graph: things, not strings. Official Google blog.