
NIST Special Publication 1500-7

DRAFT NIST Big Data Interoperability Framework:
Volume 7, Standards Roadmap
NIST Big Data Public Working Group
Standards Roadmap Subgroup
Draft Version 2
March 20, 2017
http://dx.doi.org/ZZZZ
NIST Special Publication 1500-7
Information Technology Laboratory

DRAFT NIST Big Data Interoperability Framework:
Volume 7, Standards Roadmap

Draft Version 2
NIST Big Data Public Working Group (NBD-PWG)
Standards Roadmap Subgroup
National Institute of Standards and Technology
Gaithersburg, MD 20899
March 2017
U.S. Department of Commerce
Wilbur L. Ross, Jr., Secretary

National Institute of Standards and Technology
Dr. Kent Rochford, Acting Under Secretary of Commerce for Standards and Technology
and Acting NIST Director
National Institute of Standards and Technology (NIST) Special Publication 1500-7
68 pages (March 20, 2017)
Certain commercial entities, equipment, or materials may be identified in this document in order to describe an experimental procedure or concept adequately. Such identification is not intended to imply recommendation or endorsement by NIST, nor is it intended to imply that the entities, materials, or equipment are necessarily the best available for the purpose.

There may be references in this publication to other publications currently under development by NIST in accordance with its assigned statutory responsibilities. The information in this publication, including concepts and methodologies, may be used by Federal agencies even before the completion of such companion publications. Thus, until each publication is completed, current requirements, guidelines, and procedures, where they exist, remain operative. For planning and transition purposes, Federal agencies may wish to closely follow the development of these new publications by NIST.

Organizations are encouraged to review all draft publications during public comment periods and provide feedback to NIST. All NIST publications are available at http://www.nist.gov/publication-portal.cfm.
Comments on this publication may be submitted to Wo Chang

National Institute of Standards and Technology
Attn: Wo Chang, Information Technology Laboratory
100 Bureau Drive (Mail Stop 8900), Gaithersburg, MD 20899-8930
Email: SP1500comments@nist.gov
Request for Contribution

The NIST Big Data Public Working Group (NBD-PWG) requests contributions to this draft Version 2 of the NIST Big Data Interoperability Framework: Volume 7, Standards Roadmap. All contributions are welcome, especially comments on, or additional content for, the current draft.

The NBD-PWG is actively working to complete Version 2 of the set of nine NBDIF documents. The goals of Version 2 are to enhance the Version 1 content, define general interfaces between the NIST Big Data Reference Architecture (NBDRA) components by aggregating low-level interactions into high-level general interfaces, and demonstrate how the NBDRA can be used.

To contribute to this document, please follow the steps below as soon as possible, but no later than May 1, 2017.

1. Register as a user of the NIST Big Data Portal (https://bigdatawg.nist.gov/newuser.php).
2. Record comments and/or additional content using one of the following methods:
   a. TRACK CHANGES: Make edits to, and comments on, the text directly in this Word document using track changes.
   b. COMMENT TEMPLATE: Capture specific edits using the Comment Template (http://bigdatawg.nist.gov/_uploadfiles/SP1500-1-to-7_comment_template.docx), which includes space for section number, page number, comment, and text edits.
3. Submit the edited file from either method above to SP1500comments@nist.gov with the volume number in the subject line (e.g., Edits for Volume 1).
4. Attend the weekly virtual meetings on Tuesdays for possible presentation and discussion of your submission. Virtual meeting logistics can be found at https://bigdatawg.nist.gov/program.php.

Please be as specific as possible in any comments or edits to the text. Specific edits include, but are not limited to, changes in the current text, additional text further explaining a topic or explaining a new topic, additional references, or comments about the text, topics, or document organization.

The comments and additional content will be reviewed by the subgroup co-chair responsible for the volume in question. Comments and additional content may be presented and discussed by the NBD-PWG during the weekly virtual meetings on Tuesdays.

Three versions are planned for the NBDIF set of documents, with Versions 2 and 3 building on the first. Further explanation of the three planned versions and the information contained therein is included in Section 1.5 of each NBDIF document.

Please contact Wo Chang (wchang@nist.gov) with any questions about the feedback submission process.

Big Data professionals are always welcome to join the NBD-PWG to help craft the work contained in the volumes of the NBDIF. Additional information about the NBD-PWG can be found at http://bigdatawg.nist.gov. Information about the weekly virtual meetings on Tuesdays can be found at https://bigdatawg.nist.gov/program.php.
Reports on Computer Systems Technology

The Information Technology Laboratory (ITL) at NIST promotes the U.S. economy and public welfare by providing technical leadership for the Nation’s measurement and standards infrastructure. ITL develops tests, test methods, reference data, proof of concept implementations, and technical analyses to advance the development and productive use of information technology (IT). ITL’s responsibilities include the development of management, administrative, technical, and physical standards and guidelines for the cost-effective security and privacy of other than national security-related information in Federal information systems. This document reports on ITL’s research, guidance, and outreach efforts in IT and its collaborative activities with industry, government, and academic organizations.
Abstract

Big Data is a term used to describe the large amount of data in the networked, digitized, sensor-laden, information-driven world. While opportunities exist with Big Data, the data can overwhelm traditional technical approaches, and the growth of data is outpacing scientific and technological advances in data analytics. To advance progress in Big Data, the NIST Big Data Public Working Group (NBD-PWG) is working to develop consensus on important, fundamental concepts related to Big Data. The results are reported in the NIST Big Data Interoperability Framework series of volumes. This volume, Volume 7, contains summaries of the work presented in the other six volumes and an investigation of standards related to Big Data.
Keywords

Big Data, reference architecture, System Orchestrator, Data Provider, Big Data Application Provider, Big Data Framework Provider, Data Consumer, Security and Privacy Fabric, Management Fabric, Big Data taxonomy, use cases, Big Data characteristics, Big Data standards
Acknowledgements

This document reflects the contributions and discussions by the membership of the NBD-PWG, co-chaired by Wo Chang (NIST ITL), Nancy Grady (SAIC), Geoffrey Fox (Indiana University), Arnab Roy (Fujitsu), Mark Underwood (Krypton Brothers), David Boyd (InCadence Strategic Solutions), Russell Reinsch (Loci), and Gregor von Laszewski (Indiana University).

The document contains input from members of the NBD-PWG: Technology Roadmap Subgroup, led by Russell Reinsch (CFGIO); Definitions and Taxonomies Subgroup, led by Nancy Grady (SAIC); Use Cases and Requirements Subgroup, led by Geoffrey Fox (Indiana University); Security and Privacy Subgroup, led by Arnab Roy (Fujitsu) and Mark Underwood (Krypton Brothers); and Reference Architecture Subgroup, led by David Boyd (InCadence Strategic Solutions).

NIST SP 1500-7, Version 2 has been collaboratively authored by the NBD-PWG. As of the date of this publication, there are _ NBD-PWG participants from industry, academia, and government. Federal agency participants include the National Archives and Records Administration (NARA), National Aeronautics and Space Administration (NASA), National Science Foundation (NSF), and the U.S. Departments of Agriculture, Commerce, Defense, Energy, Health and Human Services, Homeland Security, Transportation, Treasury, and Veterans Affairs.

NIST acknowledges the specific contributionsa to this volume by the following NBD-PWG members:

A list of contributors to version 2 of this volume will be added here.

The editors for this document were Russell Reinsch and Wo Chang.
a “Contributors” are members of the NIST Big Data Public Working Group who dedicated great effort and substantial time, on a regular basis, to the research and development in support of this document.
DRAFT NIST BIG DATA INTEROPERABILITY FRAMEWORK: VOLUME 7, STANDARDS ROADMAP

TABLE OF CONTENTS

EXECUTIVE SUMMARY..................................................................................................................................... VIII
1. INTRODUCTION.......................................................................................................................................... 1
 1.1 BACKGROUND.................................................................................................................................................1
 1.2 NIST BIG DATA PUBLIC WORKING GROUP..........................................................................................................2
 1.3 SCOPE AND OBJECTIVES OF THE TECHNOLOGY ROADMAP SUBGROUP........................................................................3
 1.4 REPORT PRODUCTION......................................................................................................................................3
 1.5 FUTURE WORK ON THIS VOLUME.......................................................................................................................3
2. BIG DATA ECOSYSTEM................................................................................................................................ 4
 2.1 DEFINITIONS...................................................................................................................................................4
 2.1.1 Big Data Definitions..................................................................................................................................5
 2.1.2 Data Science Definitions...........................................................................................................................5
 2.2 TAXONOMY....................................................................................................................................................6
 2.3 USE CASES.....................................................................................................................................................6
 2.3.1 Mapping Standards to Specific Requirements..........................................................................................9
 2.3.2 Mapping Standards to Use Cases.............................................................................................................9
 2.4 SECURITY AND PRIVACY..................................................................................................................................11
 2.5 REFERENCE ARCHITECTURE SURVEY..................................................................................................................11
 2.6 REFERENCE ARCHITECTURE..............................................................................................................................11
 2.6.1 Overview.................................................................................................................................................11
 2.6.2 NBDRA Conceptual Model......................................................................................................................12
3. BIG DATA STANDARDS.............................................................................................................................. 15
 3.1 EXISTING STANDARDS.....................................................................................................................................16
 3.2 PERSPECTIVES FOR VIEWING STANDARDS...........................................................................................................16
 3.2.1 Functional or Organizational Perspective...............................................................................................17
 3.2.1.1 Types of Functional Perspectives..................................................................................................17
 3.2.1.2 Types of Organizational Perspectives...........................................................................................17
 3.2.2 Product or Non-product Perspective......................................................................................................17
 3.2.2.1 Types of Product Standards..........................................................................................................17
 3.2.2.2 Two Types of Non-product Standards: Technical and Basic..........................................................17
 3.3 MODERNITY OF EXISTING STANDARDS...............................................................................................................18
4. BIG DATA STANDARDS ROADMAP............................................................................................................. 19
 4.1 GAP IN STANDARDS.......................................................................................................................................19
 4.2 AN ERA OF DIVERGENCE FROM PROJECT METHODOLOGY.......................................................................................19
 4.3 PATHWAY TO ADDRESS STANDARDS GAPS.........................................................................................................20
 4.4 AREAS WHERE TECHNOLOGY STANDARDS MAY HAVE SIGNIFICANT IMPACT IN THE FUTURE..........................................20
 4.4.1 Standards Gap 2: Metadata...................................................................................................................21
 4.4.2 Standards Gap 4: Non-relational DB Query, Search and Information Retrieval....................................21
 4.4.3 Standards Gap 11: Data Sharing and Exchange.....................................................................................23
 4.4.4 Standards Gap 13: Visualization and Human Consumption of Data.....................................................23
 4.4.5 Standards Gap 15: Interface Between Relational and Non-relational Storage.....................................23
 4.4.6 Integration..............................................................................................................................................24
5. SANDBOX................................................................................................................................................. 25


 5.1 VARIOUS TOPICS............................................................................................................................................25
 5.1.1 Internet of Things (Area of Impact)........................................................................................................25
 5.1.2 Network Connectivity..............................................................................................................................25
 5.1.3 Complex Event Processing......................................................................................................................26
 5.1.4 iPaaS.......................................................................................................................................................26
 5.1.5 Crowdsourced Mediation.......................................................................................................................26
 5.2 DATA STANDARDS.........................................................................................................................................26
 5.3 LANDSCAPE PERSPECTIVE AND POSSIBILITY FRONTIER.............................................................................................27
 5.3.1 Divergence vs. Convergence Toward Big Data Standards......................................................................27
APPENDIX A: ACRONYMS................................................................................................................................. A-1
APPENDIX B: COLLECTION OF BIG DATA RELATED STANDARDS.........................................................................B-1
APPENDIX C: STANDARDS AND THE NBDRA...................................................................................................... C-1
APPENDIX D: CATEGORIZED STANDARDS......................................................................................................... D-1
APPENDIX E: REFERENCES................................................................................................................................ E-1

LIST OF FIGURES

FIGURE 1: NIST BIG DATA REFERENCE ARCHITECTURE TAXONOMY...........................................................................................6
FIGURE 2: NBDRA CONCEPTUAL MODEL...........................................................................................................................13

LIST OF TABLES

TABLE 1: SEVEN REQUIREMENTS CATEGORIES AND THEIR SPECIFICS...........................................................................................7
TABLE 2: DATA SOURCE REQUIREMENTS-TO-STANDARDS MATRIX..............................................................................................9
TABLE 3: AN EXCERPT OF THE MASTER USE CASE-TO-STANDARDS MATRIX..................................................................................9
TABLE 4: M0165 BREAKOUT SUPPLEMENT.........................................................................................................................10
TABLE 5: MAPPING USE CASE CHARACTERIZATION CATEGORIES TO REFERENCE ARCHITECTURE COMPONENTS AND FABRICS.............12
TABLE 6: EXCERPT OF HEALTHCARE RELATED STANDARDS.......................................................................................................18
TABLE 7: RESULTS FROM POLL ABOUT DATA SCIENCE PROJECT METHODOLOGY.........................................................................20
TABLE B-1: BIG DATA RELATED STANDARDS......................................................................................................................B-1
TABLE C-1: STANDARDS AND THE NBDRA........................................................................................................................C-1
TABLE D-1: CATEGORIZED STANDARDS.............................................................................................................................D-1



EXECUTIVE SUMMARY

This section will be written when the document content is near finalization.


1. INTRODUCTION

1.1 BACKGROUND
There is broad agreement among commercial, academic, and government leaders about the remarkable potential of Big Data to spark innovation, fuel commerce, and drive progress. Big Data is the common term used to describe the deluge of data in today’s networked, digitized, sensor-laden, and information-driven world. The availability of vast data resources carries the potential to answer questions previously out of reach, including the following:

• How can a potential pandemic reliably be detected early enough to intervene?
• Can new materials with advanced properties be predicted before these materials have ever been synthesized?
• How can the current advantage of the attacker over the defender in guarding against cybersecurity threats be reversed?
There is also broad agreement on the ability of Big Data to overwhelm traditional approaches. The growth rates for data volumes, speeds, and complexity are outpacing scientific and technological advances in data analytics, management, transport, and data user spheres.
Despite widespread agreement on the inherent opportunities and current limitations of Big Data, a lack of consensus on some important fundamental questions continues to confuse potential users and stymie progress. These questions include the following:

• How is Big Data defined?
• What attributes define Big Data solutions?
• What is the significance of possessing Big Data?
• How is Big Data different from traditional data environments and related applications?
• What are the essential characteristics of Big Data environments?
• How do these environments integrate with currently deployed architectures?
• What are the central scientific, technological, and standardization challenges that need to be addressed to accelerate the deployment of robust Big Data solutions?
Within this context, on March 29, 2012, the White House announced the Big Data Research and Development Initiative [1]. The initiative’s goals include helping to accelerate the pace of discovery in science and engineering, strengthening national security, and transforming teaching and learning by improving the ability to extract knowledge and insights from large and complex collections of digital data.
Six federal departments and their agencies announced more than $200 million in commitments spread across more than 80 projects, which aim to significantly improve the tools and techniques needed to access, organize, and draw conclusions from huge volumes of digital data. The initiative also challenged industry, research universities, and nonprofits to join with the federal government to make the most of the opportunities created by Big Data.
Motivated by the White House initiative and public suggestions, the National Institute of Standards and Technology (NIST) has accepted the challenge to stimulate collaboration among industry professionals to further the secure and effective adoption of Big Data. As one result of NIST’s Cloud and Big Data Forum, held on January 15–17, 2013, there was strong encouragement for NIST to create a public working group for the development of a Big Data Standards Roadmap. Forum participants noted that this roadmap should define and prioritize Big Data requirements, including interoperability, portability, reusability, extensibility, data usage, analytics, and technology infrastructure. In doing so, the roadmap would accelerate the adoption of the most secure and effective Big Data techniques and technology.
On June 19, 2013, the NIST Big Data Public Working Group (NBD-PWG) was launched with extensive participation by industry, academia, and government from across the nation. The scope of the NBD-PWG involves forming a community of interests from all sectors—including industry, academia, and government—with the goal of developing consensus on definitions, taxonomies, secure reference architectures, security and privacy, and—from these—a standards roadmap. Such a consensus would create a vendor-neutral, technology- and infrastructure-independent framework that would enable Big Data stakeholders to identify and use the best analytics tools for their processing and visualization requirements on the most suitable computing platform and cluster, while also allowing value-added from Big Data service providers.
The NIST Big Data Interoperability Framework will be released in three versions, which correspond to the three stages of the NBD-PWG work. The three stages aim to achieve the following with respect to the NIST Big Data Reference Architecture (NBDRA):

Stage 1: Identify the high-level Big Data reference architecture key components, which are technology, infrastructure, and vendor agnostic.
Stage 2: Define general interfaces between the NBDRA components.
Stage 3: Validate the NBDRA by building Big Data general applications through the general interfaces.
On September 16, 2015, the seven volumes of the NIST Big Data Interoperability Framework V1.0 were published (http://bigdatawg.nist.gov/V1_output_docs.php), each of which addresses a specific key topic resulting from the work of the NBD-PWG. The seven volumes are as follows:

• Volume 1, Definitions
• Volume 2, Taxonomies
• Volume 3, Use Cases and General Requirements
• Volume 4, Security and Privacy
• Volume 5, Architectures White Paper Survey
• Volume 6, Reference Architecture
• Volume 7, Standards Roadmap
Currently, the NBD-PWG is working on Stage 2, with the goals of enhancing the Version 1 content, defining general interfaces between the NBDRA components by aggregating low-level interactions into high-level general interfaces, and demonstrating how the NBDRA can be used. As a result, the following two additional volumes have been identified:

• Volume 8, Reference Architecture Interfaces
• Volume 9, Adoption and Modernization
Potential areas of future work for each volume during Stage 3 are highlighted in Section 1.5 of each volume. The current effort documented in this volume reflects concepts developed within the rapidly evolving field of Big Data.

1.2 NIST BIG DATA PUBLIC WORKING GROUP

This section will be written when the document content is near finalization.


1.3 SCOPE AND OBJECTIVES OF THE TECHNOLOGY ROADMAP SUBGROUP

This section will be written when the document content is near finalization.

1.4 REPORT PRODUCTION

This section will be written when the document content is near finalization.

To achieve high-quality technical content, this document will go through a public comment period along with NIST internal review.

1.5 FUTURE WORK ON THIS VOLUME

This section will be written when the document content is near finalization.


2. BIG DATA ECOSYSTEM

The development of a Big Data reference architecture involves a thorough understanding of current techniques, issues, concerns, and other topics. To this end, the NBD-PWG collected use cases to gain an understanding of current applications of Big Data, conducted a survey of reference architectures to understand commonalities within Big Data architectures in use, developed a taxonomy to understand and organize the information collected, and reviewed existing Big Data relevant technologies and trends. From the collected use cases and architecture survey information,b the NBD-PWG created the NBDRA, which is a high-level conceptual model designed to serve as a tool to facilitate open discussion of the requirements, structures, and operations inherent in Big Data. These NBD-PWG activities were used as input during the development of the entire NIST Big Data Interoperability Framework.

2.1 DEFINITIONS

Section Scope: Point to Volume 1 definitions, highlight several relevant definitions, define new concepts specific to Volume 7 or previously unmarked. (When Volume 7 is done, revisit Volume 1 and decide where the definitions should be placed: Volume 1, Volume 7, or both.)

There are two fundamental concepts in the emerging discipline of Big Data that have been used to represent multiple concepts. These two concepts, Big Data and data science, are broken down into individual terms and concepts in the following subsections. As a basis for discussions of the NBDRA and related standards and measurement technology, associated terminology is defined in subsequent subsections. NIST Big Data Interoperability Framework: Volume 1, Definitions contains additional details and terminology.
Both technical and non-technical audiences need to keep abreast of the rapid changes in the Big Data landscape, as those changes can affect their ability to manage information in effective ways. Consumption of written, audio, or video information on Big Data relies on certain accepted definitions for terms. For non-technical audiences, a method of expressing the Big Data aspects in terms of volume, variety, and velocity, known as the “Vs,” became popular for its ability to frame the somewhat complex concepts of Big Data in simpler, more digestible ways. For technical audiences, however, the Vs do not closely correspond to the complete corpus of terminology used in the field of study.
Tested against the corpus of use, the essential technical characteristics of Big Data can be clustered into four distinct segments:

• heterogeneity and irregularity;
• parallelism;
• real-time analysis; and
• presentation or visualization [1].

The concept of heterogeneity in Big Data includes the ‘variety’ of data types, commonly referred to as structured, semi-structured, or unstructured. Parallelism refers to distributed/horizontal processing of Big Data and includes the ‘volume’ of data. The concept of real-time analysis of Big Data includes the ‘velocity’ of data.

b See NIST Big Data Interoperability Framework Volumes 3, 5, and 6, Version 1, for additional information on the use cases, reference architecture information collection, and development of the NBDRA.


2.1.1 BIG DATA DEFINITIONS

Big Data refers to the inability of traditional data architectures to efficiently handle the new datasets. Characteristics of Big Data that force new architectures are volume (i.e., the size of the dataset) and variety (i.e., data from multiple repositories, domains, or types), and the data-in-motion characteristics of velocity (i.e., rate of flow) and variability (i.e., the change in other characteristics). These characteristics—volume, variety, velocity, and variability—are known colloquially as the ‘Vs’ of Big Data and are further discussed in Section 3.

Each of these characteristics influences the overall design of a Big Data system, resulting in different data system architectures or different data lifecycle process orderings to achieve needed efficiencies. A number of other terms are also used, several of which refer to the analytics process instead of new Big Data characteristics. The following Big Data definitions have been used throughout the seven volumes of the NIST Big Data Interoperability Framework and are fully described in Volume 1.

Big Data consists of extensive datasets—primarily in the characteristics of volume, variety, velocity, and/or variability—that require a scalable architecture for efficient storage, manipulation, and analysis.

The Big Data paradigm consists of the distribution of data systems across horizontally coupled, independent resources to achieve the scalability needed for the efficient processing of extensive datasets.

Veracity refers to the accuracy of the data.

Value refers to the inherent wealth, economic and social, embedded in any dataset.

Volatility refers to the tendency for data structures to change over time.

Validity refers to the appropriateness of the data for its intended use.

2.1.2 DATA SCIENCE DEFINITIONS


In its purest form, data science is the fourth paradigm of science, following theory, experiment, and computational science. The fourth paradigm is a term coined by Dr. Jim Gray in 2007 to refer to the conduct of data analysis as an empirical science, learning directly from the data itself. Data science as a paradigm would refer to the formulation of a hypothesis, the collection of the data—new or pre-existing—to address the hypothesis, and the analytical confirmation or denial of the hypothesis (or the determination that additional information or study is needed). As in any experimental science, the end result could in fact be that the original hypothesis itself needs to be reformulated. The key concept is that data science is an empirical science, performing the scientific process directly on the data. Note that the hypothesis may be driven by a business need, or can be the restatement of a business need in terms of a technical hypothesis.
 Data science is the empirical synthesis of actionable knowledge from raw data through
 the complete data lifecycle process.
 The data science paradigm is the extraction of actionable knowledge directly from data
 through a process of discovery, hypothesis, and hypothesis testing.
While the above definition of the data science paradigm refers to learning directly from data, in the Big Data paradigm this learning must now implicitly involve all steps in the data lifecycle, with analytics being only a subset. Data science can be understood as the activities happening in the data layer of the system architecture to extract knowledge from the raw data.
 The data lifecycle is the set of processes that transform raw data into actionable
 knowledge.


Traditionally, the term analytics has been used as one of the steps in the data lifecycle of collection, preparation, analysis, and action.
 Analytics is the synthesis of knowledge from information.
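As an illustrative sketch only (not part of the Framework definitions, and with made-up data), the lifecycle steps named above can be chained as a simple pipeline:

```python
def collect():
    # Collection: raw data as it arrives (here, hard-coded sensor readings).
    return ["3.1", "4.0", "bad", "5.3"]

def prepare(raw):
    # Preparation: clean and convert; drop unparseable values.
    cleaned = []
    for value in raw:
        try:
            cleaned.append(float(value))
        except ValueError:
            pass
    return cleaned

def analyze(values):
    # Analysis: synthesize knowledge (here, a simple mean) from information.
    return sum(values) / len(values)

def act(mean, threshold=4.0):
    # Action: turn knowledge into a decision.
    return "alert" if mean > threshold else "ok"

result = act(analyze(prepare(collect())))
```

In a Big Data system each of these stages would be distributed and scaled independently, but the ordering of the lifecycle is the same.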

2.2 TAXONOMY
The NBD-PWG Definitions and Taxonomy Subgroup developed a hierarchy of reference architecture components. Additional taxonomy details are presented in the NIST Big Data Interoperability Framework: Volume 2, Taxonomy document.
Figure 1 outlines potential actors for the seven roles developed by the NBD-PWG Definitions and Taxonomy Subgroup, with the name of each role followed by its potential actors.

[Figure 1 is rendered as a graphic in the published document; its role-to-actor content is listed below.]
 System Orchestrator actors: Business Leadership; Consultants; Data Scientists; Information Architects; Software Architects; Security Architects; Privacy Architects; Network Architects
 Big Data Application Provider actors: Application Specialists; Platform Specialists; Consultants
 Big Data Framework Provider actors: In-house Clusters; Data Centers; Cloud Providers
 Data Provider actors: Enterprises; Public Agencies; Researchers and Scientists; Search Engines; Web, FTP, and Other Applications; Network Operators; End Users
 Data Consumer actors: End Users; Researchers; Applications; Systems
 Security and Privacy Fabric actors: Corporate Security Officer; Security Specialist
 Management Fabric actors: In-house Staff; Data Center Management; Cloud Providers

 Figure 1: NIST Big Data Reference Architecture Taxonomy


2.3 USE CASES


A consensus list of Big Data requirements across stakeholders was developed by the NBD-PWG Use Cases and Requirements Subgroup. The development of requirements included gathering and understanding various use cases from the nine diversified areas, or application domains, listed below.
  Government Operation
  Commercial
  Defense
  Healthcare and Life Sciences
  Deep Learning and Social Media


  The Ecosystem for Research
  Astronomy and Physics
  Earth, Environmental, and Polar Science
  Energy
Participants in the NBD-PWG Use Cases and Requirements Subgroup and other interested parties supplied publicly available information for various Big Data architecture examples from the nine application domains, which developed organically from the 51 use cases collected by the Subgroup.
After collection, processing, and review of the use cases, requirements within seven Big Data characteristic categories were extracted from the individual use cases. Requirements are the challenges limiting further use of Big Data. The complete list of requirements extracted from the use cases is presented in the document NIST Big Data Interoperability Framework: Volume 3, Use Cases and General Requirements.
The use case specific requirements were then aggregated to produce high-level, general requirements within seven characteristic categories. The seven categories were as follows:
  Data source requirements (relating to data size, format, rate of growth, at rest, etc.)
  Data transformation provider (i.e., data fusion, analytics)
  Capabilities provider (i.e., software tools, platform tools, hardware resources such as storage and networking)
  Data consumer (i.e., processed results in text, table, visual, and other formats)
  Security and privacy
  Lifecycle management (i.e., curation, conversion, quality check, pre-analytic processing)
  Other requirements

The general requirements, created to be vendor neutral and technology agnostic, are listed below.
 Table 1: Seven Requirements Categories and their Specifics
DATA SOURCE REQUIREMENTS (DSR)
DSR-1 Needs to support reliable real-time, asynchronous, streaming, and batch processing
to collect data from centralized, distributed, and cloud data sources, sensors, or
instruments.
DSR-2 Needs to support slow, bursty, and high-throughput data transmission between data
sources and computing clusters.
DSR-3 Needs to support diversified data content including structured and unstructured
text, document, graph, web, geospatial, compressed, timed, spatial, multimedia,
simulation, and instrumental data.
TRANSFORMATION PROVIDER REQUIREMENTS (TPR)
TPR-1 Needs to support diversified compute-intensive, analytic processing, and machine
learning techniques.
TPR-2 Needs to support batch and real-time analytic processing.
TPR-3 Needs to support processing large diversified data content and modeling.
TPR-4 Needs to support processing data in motion (e.g., streaming, fetching new content,
tracking).
CAPABILITY PROVIDER REQUIREMENTS (CPR)
CPR-1 Needs to support legacy and advanced software packages (software).
CPR-2 Needs to support legacy and advanced computing platforms (platform).

7
DRAFT NIST BIG DATA INTEROPERABILITY FRAMEWORK: VOLUME 7, STANDARDS ROADMAP

CPR-3 Needs to support legacy and advanced distributed computing clusters, co-processors,
and input/output processing (infrastructure).
CPR-4 Needs to support elastic data transmission (networking).
CPR-5 Needs to support legacy, large, and advanced distributed data storage (storage).
CPR-6 Needs to support legacy and advanced executable programming: applications, tools,
utilities, and libraries (software).
DATA CONSUMER REQUIREMENTS (DCR)
DCR-1 Needs to support fast searches (~0.1 seconds) from processed data with high
relevancy, accuracy, and recall.
DCR-2 Needs to support diversified output file formats for visualization, rendering, and
reporting.
DCR-3 Needs to support visual layout for results presentation.
DCR-4 Needs to support rich user interface for access using browser, visualization tools.
DCR-5 Needs to support high-resolution, multi-dimension layer of data visualization.
DCR-6 Needs to support streaming results to clients.
SECURITY AND PRIVACY REQUIREMENTS (SPR)
SPR-1 Needs to protect and preserve security and privacy of sensitive data.
SPR-2 Needs to support sandbox, access control, and multi-level, policy-driven
authentication on protected data.
LIFECYCLE MANAGEMENT REQUIREMENTS (LMR)
LMR-1 Needs to support data quality curation including pre-processing, data clustering,
classification, reduction, and format transformation.
LMR-2 Needs to support dynamic updates on data, user profiles, and links.
LMR-3 Needs to support data lifecycle and long-term preservation policy, including data
provenance.
LMR-4 Needs to support data validation.
LMR-5 Needs to support human annotation for data validation.
LMR-6 Needs to support prevention of data loss or corruption.
LMR-7 Needs to support multi-site archives.
LMR-8 Needs to support persistent identifier and data traceability.
LMR-9 Needs to support standardizing, aggregating, and normalizing data from disparate sources.
OTHER REQUIREMENTS (OR)
OR-1 Needs to support rich user interface from mobile platforms to access processed
results.
OR-2 Needs to support performance monitoring on analytic processing from mobile
platforms.
OR-3 Needs to support rich visual content search and rendering from mobile platforms.
OR-4 Needs to support mobile device data acquisition.
OR-5 Needs to support security across mobile devices.
Additional information about the Subgroup, use case collection, analysis of the use cases, and generation of the use case requirements is presented in the NIST Big Data Interoperability Framework: Volume 3, Use Cases and General Requirements document.
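As a hedged illustration of how one of the general requirements translates into implementation terms, the sketch below shows the combination DSR-1 calls for, batch and streaming collection behind a single interface; the sources and records are hypothetical stand-ins.

```python
from typing import Iterable, Iterator

def batch_source() -> Iterable[dict]:
    # Batch: a finite, already-materialized dataset (stand-in for a file or DB dump).
    return [{"id": 1, "v": 10}, {"id": 2, "v": 20}]

def stream_source(limit: int = 3) -> Iterator[dict]:
    # Streaming: records arrive one at a time (stand-in for a sensor feed).
    for i in range(limit):
        yield {"id": 100 + i, "v": i}

def collect(*sources: Iterable[dict]) -> Iterator[dict]:
    # One collection interface over batch and streaming inputs alike (cf. DSR-1).
    for source in sources:
        for record in source:
            yield record

records = list(collect(batch_source(), stream_source()))
```

Because `collect` only iterates, it is indifferent to whether a source is a finished batch or an ongoing stream; a production collector would add the reliability and back-pressure handling that DSR-1 and DSR-2 imply.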


2.3.1 MAPPING STANDARDS TO SPECIFIC REQUIREMENTS


Section Scope: The objective of this section is to illustrate the intersection of the specific requirements presented in the NBDIF, Volume 3 and many of the standards related to Big Data. Sufficient participation from contributors will be necessary to undertake this significant and important work.
Volume 7 takes the use cases and specific requirements from Volume 3 and maps use cases or requirements to applicable standards. Appendices A, B, and C contain the Volume 7 standards catalog. The following requirements-to-standards and use-case-to-standards matrices start with a slice of the full catalog, consider a range of either a requirement or use case facet, and then map the selected range of use cases or requirements to existing standards. The resulting matrices provide a visual summary of the areas where standards overlap, and highlight gaps in the catalog at year-end 2016.
 Table 2: Data Source Requirements-to-Standards Matrix

Requirement | Requirement Description | Standard Description | Standard
DSR-1 | Fast search | |
DSR-2 | | |
DSR-3 | Visual layout of results for presentation. | Suggested charts and tables for various purposes. | IBCS notation; related: ACRL
DSR-4 | Browser access | | WebRTC
DSR-5 | Layer standard | | ISO 13606
DSR-6 | | |
Related: ACRL standards focus on the intellectual/interpretation competency of the consumer to evaluate the accuracy of imagery, or critique, not the visualizing technology.
Products note (DSR-4): IDL, AVS, OpenDX,

2.3.2 MAPPING STANDARDS TO USE CASES


Section Scope: The objective of this section is to illustrate the intersection of the use cases presented in the NBDIF, Volume 3 and many of the standards related to Big Data. Sufficient participation from contributors will be necessary to undertake this significant and important work.
 Table 3: An Excerpt of the Master Use-Case-to-Standards Matrix

Use Case Number and Type | Use Case Description | Standard Description | Standard
2: Government | Information retrieval / records search in US Census DB | |
6: Commercial | Research DB document recommender, impact forecast | |
8: Commercial | Web search | See Table 4, M0165 breakout supplement below. Query [no SY]: | XPath, XQuery full-text, ELIXIR, XIRQL, XXL
15: Defense | Intelligence data processing | |
34: Research | Graph DB search | |
Taken further, we investigate individual use cases in more detail by dividing the overall task of the use case into its subtask components and activities, and mapping associated standards to the subtasks.

USE CASE 2: IR AND RECORDS SEARCH IN US CENSUS DB

Additional text: contributions are requested for this section.

USE CASE 6: RESEARCH DB DOCUMENT RECOMMENDER; IMPACT FORECAST

Additional text: contributions are requested for this section.

USE CASE 8: WEB SEARCH
General web search subtasks: data access, storage, processing, and consumption.
Subtask #1: Accessing web media. Accessing web media is a complicated task, due to the many different types of websites and content. While there are different scenarios for how this task can be accomplished, there are, among those scenarios, some notable patterns in the ways sites and media are accessed. Generic content websites [having raw data] are typically accessed by web crawlers. Commerce sites and social media networks [having filtered or formatted data supplied via a data provider] are typically accessed through an accelerator, adaptor, or API mechanism. Specific content websites may be accessed either by web crawlers or via an accelerator, adaptor, or API. The acquired data may or may not then go through pre-processing prior to storage; however, unstructured data will eventually be returned to the pre-processing component to be structured [into a format such as JSON].
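A minimal sketch of this pre-processing step, assuming a hypothetical page already fetched by a crawler (the URL and HTML are placeholders, and no network access is shown), extracts the visible text from unstructured HTML and structures it as a JSON document:

```python
import json
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text from raw HTML (a crawler's pre-processing step)."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

def preprocess(url, raw_html):
    # Turn unstructured page content into a structured JSON document.
    parser = TextExtractor()
    parser.feed(raw_html)
    return json.dumps({"url": url, "text": " ".join(parser.chunks)})

page = "<html><body><h1>Title</h1><p>Some content.</p></body></html>"
doc = preprocess("https://example.com/page", page)
```

The resulting JSON is what the storage and indexing subtasks downstream would consume.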
 Table 4: M0165 Breakout Supplement

Current solution | Compute system | Large cloud |
Current solution | Storage | Inverted index |
Current solution | Networking | External most important | SRU, SRW, [CQL], Z39.50; OAI-PMH; SPARQL, REST, href
Current solution | Software | Spark |
Data science | Veracity | Main hubs, authorities |
Data science | Visualization | Page layout is critical. Technical elements inside a site affect content delivery. |
Data science | Data quality | SRank |
Data science | Data types | |
Data science | Data analytics | Crawl, preprocess, index, rank, cluster, recommend. Crawling / collection: connection elements including mentions from other sites | Sitemap.xml, responsive design
Gaps | | Links to user profiles, social data | Schema.org


USE CASE 15: DEFENSE INTELLIGENCE DATA PROCESSING

Additional text: contributions are requested for this section.

USE CASE 34: GRAPH DB SEARCH

Additional text: contributions are requested for this section.

2.4 SECURITY AND PRIVACY


Security and privacy measures for Big Data involve a different approach than those for traditional systems. Big Data is increasingly stored on public cloud infrastructure built from various hardware, operating systems, and analytical software. Traditional security approaches usually addressed small-scale systems holding static data on firewalled and semi-isolated networks. The surge in streaming cloud technology necessitates extremely rapid responses to security issues and threats. 2
Security and privacy considerations are a fundamental aspect of Big Data and affect all components of the NBDRA. This comprehensive influence is depicted in Figure 2 by the grey rectangle marked “Security and Privacy” surrounding all of the reference architecture components. At a minimum, a Big Data reference architecture will provide verifiable compliance with both governance, risk management, and compliance (GRC) and confidentiality, integrity, and availability (CIA) policies, standards, and best practices. Additional information on the processes and outcomes of the NBD-PWG Security and Privacy Subgroup is presented in NIST Big Data Interoperability Framework: Volume 4, Security and Privacy.

2.5 REFERENCE ARCHITECTURE SURVEY


The NBD-PWG Reference Architecture Subgroup conducted the reference architecture survey to advance understanding of the operational intricacies in Big Data and to serve as a tool for developing system-specific architectures using a common reference framework. The Subgroup surveyed currently published Big Data platforms by leading companies or individuals supporting the Big Data framework and analyzed the collected material. This effort revealed a remarkable consistency among Big Data architectures. Survey details, methodology, and conclusions are reported in NIST Big Data Interoperability Framework: Volume 5, Architectures White Paper Survey.

2.6 REFERENCE ARCHITECTURE


2.6.1 OVERVIEW
The goal of the NBD-PWG Reference Architecture Subgroup is to develop an open Big Data reference architecture that facilitates the understanding of the operational intricacies in Big Data. It does not represent the system architecture of a specific Big Data system, but rather is a tool for describing, discussing, and developing system-specific architectures using a common framework of reference. The reference architecture achieves this by providing a generic high-level conceptual model that is an effective tool for discussing the requirements, structures, and operations inherent to Big Data. The model is not tied to any specific vendor products, services, or reference implementation, nor does it define prescriptive solutions that inhibit innovation.
The design of the NBDRA does not address the following:
  Detailed specifications for any organization’s operational systems
  Detailed specifications of information exchanges or services


  Recommendations or standards for integration of infrastructure products
Building on the work from other subgroups, the NBD-PWG Reference Architecture Subgroup evaluated the general requirements formed from the use cases, evaluated the Big Data Taxonomy, performed a reference architecture survey, and developed the NBDRA conceptual model. The NIST Big Data Interoperability Framework: Volume 3, Use Cases and General Requirements document contains details of the Subgroup’s work.
The use case characterization categories from Volume 3, listed below on the left, were used as input in the development of the NBDRA. Some category terms were renamed for use in the NBDRA. Table 5 maps the earlier use case terms directly to NBDRA components and fabrics.
 Table 5: Mapping Use Case Characterization Categories to
 Reference Architecture Components and Fabrics
USE CASE CHARACTERIZATION REFERENCE ARCHITECTURE COMPONENTS AND
CATEGORIES FABRICS
Data sources → Data Provider
Data transformation → Big Data Application Provider
Capabilities → Big Data Framework Provider
Data consumer → Data Consumer
Security and privacy → Security and Privacy Fabric
Lifecycle management → System Orchestrator; Management Fabric
Other requirements → To all components and fabrics

2.6.2 NBDRA CONCEPTUAL MODEL


The NBD-PWG Reference Architecture Subgroup used a variety of inputs from other NBD-PWG subgroups in developing a vendor-neutral, technology- and infrastructure-agnostic conceptual model of Big Data architecture. This conceptual model, the NBDRA, is shown in Figure 2 and represents a Big Data system composed of five logical functional components connected by interoperability interfaces (i.e., services). Two fabrics envelop the components, representing the interwoven nature of management and security and privacy with all five of the components.
The NBDRA is intended to enable system engineers, data scientists, software developers, data architects, and senior decision makers to develop solutions to issues that require diverse approaches due to convergence of Big Data characteristics within an interoperable Big Data ecosystem. It provides a framework to support a variety of business environments, including tightly integrated enterprise systems and loosely coupled vertical industries, by enhancing understanding of how Big Data complements and differs from existing analytics, business intelligence, databases, and systems.


[Figure 2 is rendered as a graphic in the published document. It arranges the five NBDRA components along the Information Value Chain (horizontal axis) and the IT Value Chain (vertical axis): the Data Provider and Data Consumer sit at either end of the information axis; the Big Data Application Provider (with activities Collection, Preparation/Curation, Analytics, Visualization, and Access) sits at the intersection of the axes; the System Orchestrator sits above it on the IT axis; and the Big Data Framework Provider sits below it, layered into Processing (Batch, Interactive, Streaming), Resource Management, Messaging/Communications, Platforms (Indexed Storage, File Systems), and Infrastructures (Networking, Computing, Storage; Virtual and Physical Resources). The Security & Privacy and Management fabrics enclose all five components. Arrows labeled DATA indicate Big Data flow, arrows labeled SW indicate transfer of software tools and algorithms, and Service Use arrows indicate programmable interfaces.]

 Figure 2: NBDRA Conceptual Model

The NBDRA is organized around two axes representing the two Big Data value chains: the information value chain (horizontal axis) and the Information Technology (IT) value chain (vertical axis). Along the information axis, the value is created by data collection, integration, analysis, and applying the results following the value chain. Along the IT axis, the value is created by providing networking, infrastructure, platforms, application tools, and other IT services for hosting and operating Big Data in support of required data applications. At the intersection of both axes is the Big Data Application Provider component, indicating that data analytics and its implementation provide the value to Big Data stakeholders in both value chains. The names of the Big Data Application Provider and Big Data Framework Provider components contain “provider” to indicate that these components provide or implement a specific technical function within the system.
The five main NBDRA components, shown in Figure 2 and discussed in detail in Section 4, represent different technical roles that exist in every Big Data system. These functional components are as follows:
  System Orchestrator
  Data Provider
  Big Data Application Provider
  Big Data Framework Provider
  Data Consumer


The two fabrics shown in Figure 2 encompassing the five functional components are the following:
  Management
  Security and Privacy
These two fabrics provide services and functionality to the five functional components in the areas specific to Big Data and are crucial to any Big Data solution.
The “DATA” arrows in Figure 2 show the flow of data between the system’s main components. Data flows between the components either physically (i.e., by value) or by providing its location and the means to access it (i.e., by reference). The “SW” arrows show transfer of software tools for processing of Big Data in situ. The “Service Use” arrows represent software programmable interfaces. While the main focus of the NBDRA is to represent the run-time environment, all three types of communications or transactions can happen in the configuration phase as well. Manual agreements (e.g., service-level agreements [SLAs]) and human interactions that may exist throughout the system are not shown in the NBDRA.
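The distinction between by-value and by-reference data flow can be sketched as follows; the storage key and contents are hypothetical placeholders, with a dictionary standing in for remote storage.

```python
# "By value": the data itself crosses the interface.
def send_by_value(payload: bytes) -> bytes:
    return payload

# Stand-in for remote object storage (key and contents are made up).
STORE = {"s3://bucket/obj1": b"large dataset"}

# "By reference": only a locator crosses; the consumer resolves it on demand.
def send_by_reference(key: str):
    def resolve() -> bytes:
        return STORE[key]
    return key, resolve

value = send_by_value(b"small result")
ref, resolve = send_by_reference("s3://bucket/obj1")
data = resolve()   # the consumer fetches the bytes only when needed
```

Passing a reference defers (and may avoid) moving a large dataset, which is why Big Data systems often exchange locators rather than payloads.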
The components represent functional roles in the Big Data ecosystem. In system development, actors and roles have the same relationship as in the movies, but system development actors can represent individuals, organizations, software, or hardware. According to the Big Data taxonomy, a single actor can play multiple roles, and multiple actors can play the same role. The NBDRA does not specify the business boundaries between the participating actors or stakeholders, so the roles can either reside within the same business entity or can be implemented by different business entities. Therefore, the NBDRA is applicable to a variety of business environments, from tightly integrated enterprise systems to loosely coupled vertical industries that rely on the cooperation of independent stakeholders. As a result, the notion of internal versus external functional components or roles does not apply to the NBDRA. However, for a specific use case, once the roles are associated with specific business stakeholders, the functional components would be considered internal or external, subject to the use case’s point of view.
The NBDRA supports the representation of stacking or chaining of Big Data systems. For example, a Data Consumer of one system could serve as a Data Provider to the next system down the stack or chain.
The five main components and the two fabrics of the NBDRA are discussed in the NIST Big Data Interoperability Framework: Volume 6, Reference Architecture and Volume 4, Security and Privacy.


3. BIG DATA STANDARDS


Big Data has generated interest in a wide variety of multi-stakeholder, collaborative organizations, including those involved in the de jure standards process, industry consortia, and open source organizations. These organizations may operate differently and focus on different aspects, but they all have a stake in Big Data. Integrating additional Big Data initiatives with ongoing collaborative efforts is a key to success. Identifying which collaborative initiative efforts address architectural requirements, and which requirements are not currently being addressed, is a starting point for building future multi-stakeholder collaborative efforts. Collaborative initiatives include, but are not limited to, the following:
  Subcommittees and working groups of the American National Standards Institute (ANSI)
  Accredited standards development organizations (SDOs; the de jure standards process)
  Industry consortia
  Reference implementations
  Open source implementations
Some of the leading SDOs and industry consortia working on Big Data related standards include the following:
  International Committee for Information Technology Standards (INCITS) and International Organization for Standardization (ISO)—de jure standards process
  Institute of Electrical and Electronics Engineers (IEEE)—de jure standards process
  International Electrotechnical Commission (IEC)
  Internet Engineering Task Force (IETF)
  World Wide Web Consortium (W3C)—industry consortium
  Open Geospatial Consortium (OGC®)—industry consortium
  Organization for the Advancement of Structured Information Standards (OASIS)—industry consortium
  Open Grid Forum (OGF)—industry consortium
The organizations and initiatives referenced in this document do not form an exhaustive list. It is anticipated that as this document is more widely distributed, more standards efforts addressing additional segments of the Big Data mosaic will be identified.
There are a number of government organizations that publish standards relative to their specific problem areas. The U.S. Department of Defense alone maintains hundreds of standards. Many of these are based on other standards (e.g., ISO, IEEE, ANSI) and could be applicable to the Big Data problem space. However, a fair, comprehensive review of these standards would exceed the available document preparation time and may not be of interest to the majority of the audience for this report. Readers interested in domains covered by the government organizations and standards are encouraged to review those standards for applicability to their specific needs.
Open source implementations are providing useful new technology that is being used either directly or as the basis for commercially supported products. These open source implementations are not just individual products; one needs to integrate an ecosystem of products to accomplish one’s goals. Because of this ecosystem complexity, and because of the difficulty of fairly and exhaustively reviewing open source implementations, such implementations are not included in this section. However, it should be noted that those implementations often evolve to become the de facto reference implementations for many technologies.


3.1 EXISTING STANDARDS


Appendix B presents a list of existing standards from the above-listed organizations that are relevant to Big Data and the NBDRA. Determining the relevance of standards to the Big Data domain is challenging since almost all standards in some way deal with data. Whether a standard is relevant to Big Data is generally determined by the impact of Big Data characteristics (i.e., volume, velocity, variety, and veracity) on the standard or, more generally, by the scalability of the standard to accommodate those characteristics. A standard may also be applicable to Big Data depending on the extent to which that standard helps to address one or more of the Big Data characteristics. Finally, a number of standards are also very domain or problem specific and, while they deal with or address Big Data, they support a very specific functional domain; developing even a marginally comprehensive list of such standards would require a massive undertaking involving subject matter experts in each potential problem domain, which is beyond the scope of the NBD-PWG.
Within a single domain, several subdomains for standards may exist. In healthcare and life sciences, for example, there are standards for terminology, devices, interoperability, systems for classification, EHRs, language formats, Internet protocols, and a large ecosystem of accepted biomedical ontology categories for knowledge management, data integration, and decision support and reasoning [healthcare+ppt]. See Table 6.
Additional text needed: Additional text is needed to enhance the discussion of the previous paragraph, which discusses domain-specific standards.
In selecting standards to include in Appendix B, the working group focused on standards that met the following criteria:
  Facilitate interfaces between NBDRA components
  Facilitate the handling of data with one or more Big Data characteristics
  Represent a fundamental function needing to be implemented by one or more NBDRA components
Appendix B represents a portion of potentially applicable standards from a portion of the contributing organizations working in the Big Data domain.
As most standards represent some form of interface between components, the standards table in Appendix C indicates whether the NBDRA component would be an Implementer or a User of the standard. For the purposes of this table, the following definitions were used for Implementer and User.
Implementer: A component is an implementer of a standard if it provides services based on the standard (e.g., a service that accepts Structured Query Language [SQL] commands would be an implementer of that standard) or encodes or presents data based on that standard.
User: A component is a user of a standard if it interfaces to a service via the standard or if it accepts/consumes/decodes data represented by the standard.
While the above definitions provide a reasonable basis, for some standards the difference between implementation and use may be negligible or nonexistent.
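The distinction can be illustrated with a small sketch (Python's bundled SQLite engine is used purely as a convenient example; the table and values are invented for illustration): the SQLite engine is an implementer of the SQL standard because it accepts and executes SQL commands, while the calling code is a user of the standard because it interfaces with the service via SQL.

```python
import sqlite3

# SQLite is an *implementer* of the SQL standard: it accepts and executes
# SQL commands. The code below is a *user* of the standard: it interfaces
# with the service by issuing SQL rather than implementing it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.execute("INSERT INTO readings VALUES ('s1', 3.5), ('s2', 4.0)")

# The user consumes results represented in the standard's terms (rows).
rows = conn.execute("SELECT sensor, value FROM readings ORDER BY sensor").fetchall()
print(rows)  # [('s1', 3.5), ('s2', 4.0)]
conn.close()
```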

3.2 PERSPECTIVES FOR VIEWING STANDARDS

Section Scope: This section views Big Data standards from different perspectives: the functional or organizational perspective, and the product or non-product perspective. In addition, categories need to be defined to break down the standards into subgroups. One standard may fall into several categories. One possible set of categories is compute, network, and storage.

16
DRAFT NIST BIG DATA INTEROPERABILITY FRAMEWORK: VOLUME 7, STANDARDS ROADMAP

3.2.1 FUNCTIONAL OR ORGANIZATIONAL PERSPECTIVE

Standards can be viewed as either functional or organizational. Compliance and standards are often thought of as organizational issues, in contrast to compatibility, interoperability, and portability, which are considered to be functional issues [which face unique dilemmas]. Nevertheless, standards can be viewed as having a functional nature, and…

3.2.1.1 Types of Functional Perspectives

Subsection Scope: Discussion and examples of functional perspectives of standards.

3.2.1.2 Types of Organizational Perspectives

Subsection Scope: Discussion and examples of organizational perspectives of standards.

3.2.2 PRODUCT OR NON-PRODUCT PERSPECTIVE

Standards can also be viewed as either product oriented or what is often referred to as non-product standards. Historically, NIST has provided guidance on basic non-product standards… facilitating a "public good." These non-product standards can be one of two types: technical or basic. In some cases, non-product standards are industry wide, in contrast to component-level product standards.

3.2.2.1 Types of Product Standards

Product standards govern the performance and function of components used in high-tech (HT) products and prescribe procedures to test product development, production, and market transactions. In the United States, businesses have typically developed product standards by reaching voluntary consensus with relevant stakeholders, including firms in the industry, suppliers, and R&D laboratories.

3.2.2.2 Two Types of Non-Product Standards: Technical and Basic

Non-product standards have broader, more general utility than product standards. These standards generally govern the efficiency, operation, and performance of an entire industry. Examples include measurement and test methods, interface standards, scientific and engineering databases, and standard reference materials (Tassey 2015:192). Non-product standards have become increasingly important because many HT products are a complex mix of goods and services.
Technical non-product standards are operational, applied functions and guidelines that govern the interaction or performance of services and products. One example of a technical standard would be the operating system of a major cell phone manufacturer, if that operating system ultimately governs the interface and function of the large number of applications (apps) created by third-party companies and independent developers to increase the functionality or utility of the phone itself.
Basic non-product standards include generic measurement and test methods that are typically derived from fundamental scientific principles, such as the laws of physics. Although these standards have wide applications in industry, firms and even industries tend to underinvest in them because they are expensive and require an extensive and specialized scientific infrastructure. Therefore, basic standards are considered a public good and usually have some degree of public involvement in many developed countries. The National Institute of Standards and Technology provides this function for the United States.
Highest-level classification of standards: Various standards bodies identify the range of standards, each in their own terms. For example, ASTM identifies six types of standards: Classification, Guide, Practice, Specification, Terminology, and Test Method. NIST meanwhile identifies eight types, with various other distinctions. NIST's types are: Basic, Data, Interface, Process, Product, Service, Terminology, and Test and Measurement. Note that, in contrast to NIST's main types, ASTM does not include Interface as a main category type.


Domain / vertical standards:

Within large domains, there are multiple subdomains, often served by different, incompatible standards. In healthcare, for example, standards exist for terminology, language formats, records, systems of classification, and biomedical ontology categories for knowledge management, as well as for devices, internet protocols, and interoperability.

3.3 MODERNITY OF EXISTING STANDARDS

Section Scope: This section discusses the applicability of standards to the current Big Data landscape. Some standards are legacy standards and do not apply to Big Data. They may apply to legacy systems, or they may apply to previous 'eras' of Big Data, what might be referred to as Big Data 1.0, for example [Hadoop / batch processing…].
Interoperability of electronic health records and medical information continues to be a problem area in the healthcare / life science space. Many of the traditional standards in existence today predate modern problems. Examples include ISO 13606, openEHR, HL7 Messaging, LOINC, RxNorm, and SNOMED CT (Clinical Terms).
Table 6: Excerpt of Healthcare-Related Standards

Standard      | Description                            | Type / Class / Synonymous Namespace | Application
HL7 Messaging | Interoperability                       | Messaging                           | HC
ISO 13606     | Message exchange among EMRs            | Messaging                           | HC
LOINC         | Logical observation identifiers        | Terminology; controlled vocabulary  | HC
openEHR       | Message exchange among EMRs            | Messaging                           | EHR related
RxNorm        | Normalized names for clinical drugs    | Terminology; controlled vocabulary  | HC
SNOMED CT     | Systematized nomenclature for medicine | Terminology; controlled vocabulary  | HC
SNOMED is internally inconsistent. FHIR aims to solve key problems by extending RESTful web services, but it is immature.


4. BIG DATA STANDARDS ROADMAP

4.1 GAPS IN STANDARDS

The potential gaps in Big Data standardization are described in this section as broad areas that may be of interest to SDOs, consortia, and readers of this document. The list provided below was produced by an ISO/IEC Joint Technical Committee 1 (JTC1) Study Group on Big Data to serve as a potential guide to ISO in its establishment of Big Data standards activities.3 The 16 potential Big Data standardization gaps identified by the study group describe broad areas that may be of interest to this community. These gaps in standardization activities related to Big Data are in the following areas:
1. Big Data use cases, definitions, vocabulary, and reference architectures (e.g., system, data, platforms, online/offline);
2. Specifications and standardization of metadata, including data provenance;
3. Application models (e.g., batch, streaming);
4. Query languages, including non-relational queries to support diverse data types (e.g., XML, RDF, JSON, multimedia) and Big Data operations (e.g., matrix operations);
5. Domain-specific languages;
6. Semantics of eventual consistency;
7. Advanced network protocols for efficient data transfer;
8. General and domain-specific ontologies and taxonomies for describing data semantics, including interoperation between ontologies;
9. Big Data security and privacy access controls;
10. Remote, distributed, and federated analytics (taking the analytics to the data), including data and processing resource discovery and data mining;
11. Data sharing and exchange;
12. Data storage (e.g., memory storage systems, distributed file systems, data warehouses);
13. Human consumption of the results of Big Data analysis (e.g., visualization);
14. Energy measurement for Big Data;
15. Interfaces between relational (i.e., SQL) and non-relational (i.e., Not Only or No Structured Query Language [NoSQL]) data stores; and
16. Big Data quality and veracity description and management (including MDM).
With respect to the 16 gaps identified above, the scope of Version 2 of Volume 7 is limited to those gaps for which the NBD-PWG has resources available to participate in the discussion and composition of content. A few more items may also be added to the list of 16. As of January 24, 2017, the following areas are highlighted as targets for standards mapping:
• 2. Metadata;
• 4. Non-relational DB query;
• 11. Data sharing and exchange;
• 13. Visualization and human consumption of data; and
• 15. Interface between relational and non-relational storage.


4.2 AN ERA OF DIVERGENCE FROM PROJECT METHODOLOGY

Section Scope: This section will include a discussion of the current time period, in which there is a high level of technological innovation branching in various directions and increasingly not aligned with standard project methodologies. Not only are the technologies diverging; the use of common project management methodologies [/ best practices] is also diverging.
Table 7: Results from Polls about Data Science Project Methodology

Data science project methodology | 2014 poll | 2007 poll
CRISP-DM [86]                    | 43%       | 42%
My own [55]                      | 27.5%     | 19%
SEMMA [17]                       | 8.5%      | 13%
Other, not domain specific [16]  | 8%        | 4%
KDD Process [15]                 | 7.5%      | 7.3%
My organization's [7]            | 3.5%      | 5.3%
A domain-specific method [4]     | 2%        | 4.7%
None [0]                         | 0%        | 4.7%

4.3 PATHWAY TO ADDRESS STANDARDS GAPS

Standards often evolve from the implementation of best practices and approaches, which are proven against real-world applications, or from theory that is tuned to reflect additional variables and conditions uncovered during implementation. In the case of Big Data, most standards are evolving from existing standards modified to address the unique characteristics of Big Data. Like many terms that have come into common usage in the current information age, Big Data has many possible meanings depending on the context from which it is viewed. Big Data discussions are complicated by the lack of accepted definitions, taxonomies, and common reference views. The products of the NBD-PWG are designed specifically to address this lack of consistency. Recognizing this lack of a common framework on which to build standards, ISO/IEC JTC1 has chartered a working group, which first focused on developing common definitions and a reference architecture.
Once established, the definitions and reference architecture formed the basis both for the evolution of existing standards to meet the unique needs of Big Data and for the evaluation of existing implementations and practices as candidates for new Big Data-related standards. In the first case, existing standards efforts may address these gaps either by expanding or adding to the existing standard to accommodate Big Data characteristics or by developing Big Data-unique profiles within the framework of the existing standards. The exponential growth of data is already resulting in the development of new theories addressing topics from the synchronization of data across large distributed computing environments to consistency in high-volume and high-velocity environments. As actual implementations of technologies are proven, reference implementations will evolve based on community-accepted open source efforts.

4.4 AREAS WHERE TECHNOLOGY STANDARDS MAY HAVE SIGNIFICANT IMPACT IN THE FUTURE

Section Scope: Addresses the areas where standards would have the biggest impact.


A number of technology areas are considered to be of significant importance and are expected to have sizeable impacts heading into the next decade. Any list of important technologies will obviously not satisfy every community member, and any line of demarcation between sizeable and lesser impact will arguably remain fuzzy, but for purposes of highlighting key areas that are forecast to be especially impactful on the economy…
Additional text needed: Reference data showing the largest problems: ingest, preparation [~80%], and integration. Transition to data referencing expected solutions: discovery, integration, and BI [see planned-tech png]; the increased usage of search for query functions (integration and search can be discussed as functions). All of this relies on metadata.

4.4.1 STANDARDS GAP 2: METADATA

Additional need: This section will be revised and enhanced.
Metadata is the most significant of the Big Data problems. Metadata is the only way of finding items, yet 80% of data lakes are not applying metadata effectively [Gartner, 2016]. Metadata layers are a way for less technical users to interact with data mining systems, and they facilitate governance. Metadata layers also provide a means of bridging data stored in different locations, such as on premises and in the cloud.
The metadata management field is converging with MDM and, to some extent, with analytics. Metadata management facilitates access control and governance, supports change management, and reduces complexity and the scope of change management. The top use case will likely be data governance [Gartner, 2016]. There is a definite need for innovation in the areas of automating search capabilities, such as semantic enrichment [during load], [inclusion of expert / community enrichment / crowd governance and] machine learning.
Organizations that have existing metadata management systems will need to match any new metadata systems to the existing system, paying special attention to federation and integration issues. Organizations initiating new use cases or projects have much more latitude to investigate a range of potential solutions.
Perhaps a more attainable goal for standards development would be standards supporting interoperability beyond the definition of ontologies or XML, where the investment of labor concentrates on semantic mappings instead of syntactic mappings, in smaller blocks that can be put together to form a larger picture. For example, one could define a way of conveying the semantics of the who, what, where, and when of an event, and of translating an individual user's terms, to create a module that can then be mapped to another standard.
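As a minimal sketch of such a module (all field names and the term mapping below are illustrative assumptions, not drawn from any particular standard), the who/what/where/when semantics of an event can be separated from an individual user's terms by an explicit mapping:

```python
import json

# Hypothetical internal record using one user's own field names.
user_record = {"person": "alice", "action": "login", "site": "portal-3",
               "ts": "2017-03-20T14:02:00Z"}

# Mapping from the user's terms to a shared who/what/where/when module.
# Both the module fields and this mapping are illustrative assumptions.
term_map = {"person": "who", "action": "what", "site": "where", "ts": "when"}

# The resulting module carries the event semantics independently of the
# user's vocabulary and could then be mapped onto another standard.
event_module = {term_map[k]: v for k, v in user_record.items()}
print(json.dumps(event_module, sort_keys=True))
```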

4.4.2 STANDARDS GAP 4: NON-RELATIONAL DB QUERY, SEARCH AND INFORMATION RETRIEVAL

Additional need: This section needs to be revised and possibly enhanced.
Search serves as a function for interfacing with data in both retrieval and analysis use cases. It introduces the promise of a "self-service" extraction capability over multiple sources of unstructured [and structured] Big Data in multiple internal and external locations: accepting natural language; finding and analyzing patterns and statistics; and providing conceptual summaries in consumable, visual formats.
Without going into too much detail, it may be useful to differentiate between the use of search as a data analysis method and the use of search for information retrieval [IR] in the traditional sense [as well as differences between web search for IR and enterprise search for IR]. Footnote for the following four paragraphs:


In an age when one web search engine maintains the mindshare of the American public, it may make sense to describe the significantly different challenges faced in using search for retrieval or analysis of data that resides within an organization's storage repositories. "Civilian" consumers are familiar with the experience of web search technologies, namely instant query expansion, ranking of results, rich snippets, and the occasional KG container. Footnote? Civilians are also familiar with the standard functionality of personal computer file folders for information management. For large enterprises and organizations needing search functionality over documents, challenges persist.
The web search engines of 2016 provide a huge service to citizens but do apply bias [maintain control] over how and what objects are delivered. The surrender of control that citizens willingly trade in exchange for the use of free web search services is widely accepted as a value for the citizen; however, future technologies promise even more value for citizens who will search across the rapidly expanding scale of the World Wide Web. The case in point is commonly referred to as the semantic web.
Current semantic approaches to searching almost all require content indexing as a measure for controlling the enormous corpus of documents that reside online. In attempting to tackle this problem of enormous scale, the automation of content indexes for the semantic web has proven difficult to program. Challenges for the semantic web continue. Two promising approaches for developing the semantic web are ontologies and linked data technologies; however, neither approach has proven to be a complete solution. The standard ontological alternatives, OWL and RDF, which would benefit from the addition of linked data, suffer from an inability to effectively use linked data technology. Reciprocally, linked data technologies suffer from an inability to effectively use ontologies. It is not apparent how standards in these areas would be an asset to the concept of an all-encompassing semantic web, or how they would improve retrieval over that scale of data.
Use of search in data analysis: There has been a steady increase in the use of [logical] search systems as the superior method for information retrieval on data at rest [citation needed]. Generally speaking, analytics search indexes can be constructed more quickly than natural language processing [NLP] search systems; NLP, requiring semi-supervision, can have 20% error rates [Thoughtspot (only an example placeholder which will be replaced with vendor neutral language)]. Currently, Contextual Query Language,4 declarative logic programming languages [Datalog (only an example placeholder which will be replaced with vendor neutral language)], and RDF5 query languages [SPARQL (only an example placeholder which will be replaced with vendor neutral language)] serve as search query language / NoSQL language structure standards.
To discuss: strengths in data acquisition, connectors, and ingest; speed and scale. A particular product's underlying technology will likely be document, metadata, or numerically focused, not all three. Architecturally speaking, indexing is the centerpiece. Metadata provides context. Machine learning can provide enrichment.
After indexing comes query planning / functionality. The age of Big Data has applied downward pressure on the use of standard indexes. Indexes are good for small queries but have three issues: they cause slow loading; ad hoc queries require advance column indexing; and the constant updating required to maintain indexes quickly becomes prohibitively expensive. In one example case, a SQL engine project [HAWQ] dropped indexing in favor of _ when it migrated from its original [Greenplum (only an example placeholder which will be replaced with vendor neutral language)] DB to a distributed file system [HDFS (only an example placeholder which will be replaced with vendor neutral language)]. One open source search technology [Lucene (only an example placeholder which will be replaced with vendor neutral language)] provides an incremental indexing technique that solves some part of this problem.
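A toy sketch of the incremental idea (illustrative only; real engines referenced above are far more sophisticated): an inverted index to which documents can be added one at a time, so that an update touches only the new document's terms rather than rebuilding the whole index.

```python
from collections import defaultdict

class IncrementalIndex:
    """Toy inverted index: adding a document updates only that document's
    own terms, so the index grows incrementally instead of being rebuilt."""
    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document ids

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))

idx = IncrementalIndex()
idx.add(1, "big data standards roadmap")
idx.add(2, "metadata standards for big data")  # incremental update
print(idx.search("standards"))  # [1, 2]
print(idx.search("metadata"))   # [2]
```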
Generally speaking, access and IR functions will remain areas of continual work in progress. In some cases, silo architectures for data are a necessary condition for running an organization, legal and security


reasons being the most obvious. Proprietary, patented access methods are a barrier to building the connectors required for true federated search. Unified information access; UIMA. Unified indexing vs. federation.
Incredibly valuable external data is underused in most search implementations because of the lack of an appropriate architecture. The ideal framework would separate content acquisition from content processing by putting a data buffer (a big copy of the data) between them. With this framework, one could gather data but defer content processing decisions until later. Documents are "pre-joined" when they are processed for indexing, and large, mathematically challenging relevancy algorithms [fuzzy matching, vectors [better than tf-idf] [what about scoring]] and complex search security requirements [encryption] can be run at index time.
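As one concrete instance of a relevancy computation that can be run at index time, the classic TF-IDF weighting can be sketched as follows (the tiny corpus and the plain logarithmic IDF formula are illustrative assumptions; production systems use many variants):

```python
import math
from collections import Counter

# Tiny illustrative corpus: document id -> text.
docs = {1: "big data big standards", 2: "data quality", 3: "standards roadmap"}
N = len(docs)
tokenized = {d: t.split() for d, t in docs.items()}

# Document frequency per term, computed once over the corpus.
df = Counter()
for terms in tokenized.values():
    df.update(set(terms))

def tf_idf(term, doc_id):
    """Term frequency in the document times inverse document frequency."""
    terms = tokenized[doc_id]
    tf = terms.count(term) / len(terms)
    idf = math.log(N / df[term])
    return tf * idf

# "big" occurs twice in doc 1 (tf = 0.5) and appears in 1 of 3 docs.
score = tf_idf("big", 1)
print(round(score, 4))
```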
With this framework, search may become superior to SQL for OLAP and data warehousing. Search is faster, more powerful, scalable, and schema free. Records can be output in XML and JSON and then loaded into a search engine. Fields can be mapped as needed.
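A minimal sketch of that record flow (the XML record, the field mapping, and the target field names are invented for illustration): an XML record is parsed, its fields are mapped to the names a hypothetical search engine expects, and the result is emitted as JSON.

```python
import json
import xml.etree.ElementTree as ET

# A record exported as XML (contents illustrative).
xml_record = "<record><title>Roadmap</title><yr>2017</yr></record>"

# Map source fields to the names the target engine expects (assumed schema).
field_map = {"title": "name", "yr": "year"}

root = ET.fromstring(xml_record)
doc = {field_map[child.tag]: child.text for child in root}

# JSON payload ready to be loaded into a search engine.
payload = json.dumps(doc, sort_keys=True)
print(payload)  # {"name": "Roadmap", "year": "2017"}
```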
Tensions exist between a search system's functional power and its ease of use. Discovery: initially, facets in a sidebar, loaded when a search system returned a result set. Supplement: WAIS "relied on standards for content representation," but now there are hundreds of formats. Open source technology promises the power and flexibility to customize, but comes at the price of being technically demanding, requiring skilled staff to set up and operate [or a third party to maintain].
Compatibility with different ETL techniques is needed, along with standards for connectors to CMS, collaboration apps, web portals, social media apps, CRM, and file systems and databases.
Standards for content processing: compatibility with normalizing techniques, record merging formats, external taxonomies or semantic resources, REGEX, or the use of metadata to support interface navigation functionality.
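As a small illustration of REGEX-based normalization in a content processing pipeline (the raw record formats and the target phone-number layout are assumptions for the example):

```python
import re

# Illustrative raw records with inconsistent phone-number formats.
records = ["Tel: (301) 555-0101", "phone 301.555.0102", "3015550103"]

def normalize_phone(text):
    """Strip everything but digits, then reformat — a common REGEX-based
    normalization step in content processing (sketch only; assumes 10 digits)."""
    digits = re.sub(r"\D", "", text)
    return "{}-{}-{}".format(digits[:3], digits[3:6], digits[6:])

normalized = [normalize_phone(r) for r in records]
print(normalized)  # ['301-555-0101', '301-555-0102', '301-555-0103']
```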
Standards are also needed for describing relationships between different data sources and for maintaining metadata context relationships. Semantic platforms can enhance information discovery and data integration applications. RDF and ontology mapping can provide semantic uniformity: an RDF graph for visualization, an ontology for descriptions of elements.

4.4.3 STANDARDS GAP 11: DATA SHARING AND EXCHANGE

Section Scope: This section will discuss standards gap #11 and the impact standards could have on this area. It might be useful to identify other SDOs tackling data sharing and exchange. Look into the FAIR (Findable, Accessible, Interoperable, and Re-usable) open data principles.

4.4.4 STANDARDS GAP 13: VISUALIZATION AND HUMAN CONSUMPTION OF DATA

Section Scope: This section will discuss standards gap #13 and the impact standards could have on this area.

4.4.5 STANDARDS GAP 15: INTERFACE BETWEEN RELATIONAL AND NON-RELATIONAL STORAGE

Section Scope: This section will discuss standards gap #15 and the impact standards could have on this area.


4.4.6 INTEGRATION

Section Scope: This section will discuss standards for integration.

Integration use cases:
• Data acquisition for data warehouses and analytics applications;
• Supporting MDM [and sharing metadata];
• Supporting governance [potential interoperability with mining, profiling, quality];
• Data migration;
• Intra-organizational data consistency between apps, DWs, and MDM;
• Inter-organizational data sharing; and
• System integration; system consolidation; certified integration interfaces.
Traditional integration focused on the mechanics of moving data to or from different types of data structures. Big Data systems require more attention to related systems such as MDM and data quality. The increased importance of metadata brings increased demand for metadata interfaces that provide non-technical users with functionality for working with metadata.
Message formats for table: EDI and SWIFT.
Shallow integration.


5. ADDITIONAL TOPICS TO BE CONSIDERED

Section Scope: This section is merely a repository for additional items and topics that might be included in the document (but have not yet been proposed to the PWG or have not yet been assigned to a proper section).

5.1 VARIOUS TOPICS

5.1.1 INTERNET OF THINGS (AREA OF IMPACT)

Section Scope: This section could investigate the impact of standards for IoT, including a recognized need in the area of encrypted network traffic.
The expansion of the Internet of Things (IoT) is well underway, and the expanse of the industry is a challenge in itself: the diversity of device types and supporting networks is creating a divergence of compatibility between systems, which works against convergence toward standards. IoT is perhaps the archetypal use case for data in motion.
Barrier: security, including thing-to-cloud security. IoT devices operate in vulnerable environments and are not behind any firewall.
Gaps: encrypted network traffic.
There is also the question of whether IoT device manufacturers have usurped the responsibility of devising standards for the collection of external data.

5.1.2 NETWORK CONNECTIVITY

Internet bandwidth is the primary constraint affecting smart grid power control, cloud computing, data mining, grid computing, NLP, and distributed storage. [constraints png]
New technologies in ICT: Based on patent filings, ICT has the largest numbers. The healthcare industry has exploited 50% of the IPv6-related inventions [398] [cisco]; the automotive industry has exploited 35%. Forty-seven percent of U.S. R&D is in ICT. There were 600 transportation patents in 2014. Europe's adoption of network security legislation is 2.5x higher than that of the U.S.
Networking: 5G WiFi.
• Information-centric networking [named information]: CCN 2009, DONA 2007, PSIRP 2008, NetInf, SAIL 2010, PURSUIT 2010, NDN 2010, ANR Connect 2011.
• Network function virtualization, SDN routers, smart fog, data fabric [NoSQL storage; real-time analytics]. WebRTC browser-to-browser [peer-to-peer] applications.
• Visible light communication.
Computing; network related:
• H.265 streaming video
• VP9 streaming video


• Daala streaming video
• Homomorphic encryption
• Elliptic curve cryptography
• Silicon photonics
• Quantum computing
A great deal of attention has been paid to analytics applications; however, implementers realize that infrastructure is the more difficult data problem. Physically, connectivity [bandwidth] is the most significant constraint. Mesh [+simplification], cell, and internet networks. Location messaging protocols: Foursquare can pinpoint your location, but 911 cannot. Requirements: (1) standardization of location update frequency [CEP]; and (2) exception handling in the event of missing messages, possibly inferring position through the use of external data.
Privacy is a component of the predictive context engine. Process: sources of context are aggregated [aggregator]; near-real-time analysis provides information [analysis]; a decision engine infers an action; and individuals retain control over what information they share.
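The aggregate-analyze-decide process described above can be sketched as follows (all function names, fields, and thresholds are illustrative assumptions; the final filter stands in for the individual's control over what is shared):

```python
# Minimal sketch of the aggregate -> analyze -> decide pipeline, with a
# user-controlled privacy filter. Names and values are illustrative only.

def aggregate(sources):
    """Merge context from multiple sources into one record."""
    merged = {}
    for s in sources:
        merged.update(s)
    return merged

def analyze(context):
    """Toy near-real-time analysis: flag whether the user appears to be moving."""
    return {"moving": context.get("speed_mps", 0) > 1.0, **context}

def decide(info, shared_fields):
    """Infer an action; only fields the individual chose to share leave the engine."""
    action = "send_traffic_alert" if info["moving"] else "no_action"
    return action, {k: info[k] for k in shared_fields if k in info}

sources = [{"speed_mps": 2.5}, {"location": "grid-17"}]
action, shared = decide(analyze(aggregate(sources)), shared_fields=["location"])
print(action, shared)  # send_traffic_alert {'location': 'grid-17'}
```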

5.1.3 COMPLEX EVENT PROCESSING

Section Scope: Related to use case #3.
Complex events: Chronicle Recognition System, SASE. See complex-event-rec.

5.1.4 IPAAS

Lightweight platforms; open source technology for iPaaS delivery.

5.1.5 CROWDSOURCED MEDIATION

Section Scope: Two discourses: primarily, the use of crowd inputs as data sources and for refinement of output information; secondarily, the use of crowd inputs for cleansing processes and data preparation prior to modeling.
Methods for integrating crowdsourcing into computing tasks enable collectives of humans to interact with and participate in distributed projects (in citizen science, for example), so as to extend the scope of computer-driven activities. These methods are HCI / interface dependent.
Machine learning systems are obvious candidates for scalable data transformation tasks, but activities such as data categorization, curation, and identification [/ named entity recognition] are still best served by including humans in the loop [NITRD]. Note the limitations of AI for making preparation processing decisions [and the usage of labeling and sorting turks].
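A minimal sketch of one human-in-the-loop pattern (majority voting over crowd-supplied labels; the items, labels, and the 50% agreement threshold are illustrative assumptions):

```python
from collections import Counter

# Hypothetical crowd labels for entity-recognition candidates.
crowd_labels = {
    "Gaithersburg": ["LOCATION", "LOCATION", "ORG"],
    "NIST": ["ORG", "ORG", "ORG"],
}

def resolve(labels):
    """Accept the crowd's answer only when a clear majority agrees;
    otherwise flag the item for further human review (threshold assumed)."""
    (top, count), = Counter(labels).most_common(1)
    return top if count / len(labels) > 0.5 else "NEEDS_REVIEW"

resolved = {item: resolve(lbls) for item, lbls in crowd_labels.items()}
print(resolved)  # {'Gaithersburg': 'LOCATION', 'NIST': 'ORG'}
```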

5.2 DATA STANDARDS

Section Scope: Data standards; types of available data sets (commercial, open data, etc.); application-level services (DaaS); Open Data and government initiatives.
Data standards cover defined reference data sets, exchange, naming conventions, storage, and more.
Commercial data sets and open marketplaces. Check: data.gov, Premise, Yodlee, Xignite, Plaid, Quandl, Standard Treasury, Validic, Human API.
Markets; scope of classification of: Azure Marketplace, BlueKai, Knoema, Factual, DataMarket.
• Azure Marketplace:
• BlueKai: acquired by Oracle in 2014 [$400M]. Aggregated consumer browsing behavior. Oracle integrated it with marketing automation platforms.
• DataMarket: acquired by Qlik in 2014 [$13M]. Provided search engine support for analyzing market statistics, with a heavy focus on government data. Qlik was acquired by Thoma Bravo for approximately $3B.
• Factual: location data. $62M total funding, $35M series B.
• Knoema: over 500 free databases; visualization tool.
Aggregation APIs. Human-readable mechanisms for sharing. Closed clearinghouses.
Gap #11, Open Data: Sharing improves the value of existing data. Availability and access to data are provided through APIs, with interoperability through customized metadata. Some government-led initiatives in dataset cataloging include data.gov and the National Library of Medicine, but on average, Open Data is a very small percentage of the data sourced by business and industry. Open Data is an area that will benefit from additional standards work.
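As an illustration of interoperability through customized metadata, a dataset catalog entry can be exchanged as a small self-describing record (the fields below are loosely modeled on dataset-catalog vocabularies such as W3C DCAT, but the exact names and the endpoint URL are invented for this example):

```python
import json

# A catalog entry loosely modeled on dataset-catalog vocabularies such as
# W3C DCAT; the fields and the accessURL are illustrative, not normative.
entry = {
    "title": "Hourly Precipitation Observations",
    "publisher": "Example Agency",
    "accessURL": "https://example.gov/api/precip",  # hypothetical endpoint
    "format": "CSV",
    "license": "public-domain",
}

# Round-trip through JSON, as a catalog harvester would exchange it.
serialized = json.dumps(entry, sort_keys=True)
restored = json.loads(serialized)
print(restored["title"])
```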
Collaborations and joint solicitations: CTTBD, NOAA, BD Hubs [may be able to apply some text from chatlog here]. Pilots.
Context information: "Best practices for data sharing platforms today include not only [the retention] and sharing [of] data assets themselves but also sharing the context of data collection, generation and analysis. This process includes metadata strategies but may also include the actual gathering techniques, analytics codes, processes and workflows used to collect and analyze the data." [NITRD]
Gap #8: Annotation systems for workflow registries, similar to business process modeling and notation [BPMN]. Automated, NLP approaches for semantic context information extraction: more than NER, these include identification of [relationships] related relevant information, for the purposes of supporting cross-domain or multi-level annotation systems. Data as a service is an application-level service [in contrast to IaaS].
Information valuation.

5.3 LANDSCAPE PERSPECTIVE AND POSSIBILITY FRONTIER
Section Scope: Sections 4 and 5 describe technology that enables analysis. The roadmap
should investigate how analytics can be tied together and how they can be reused.
Cross-agency frameworks for reducing bureaucratic hurdles. Interoperability requirements. Aggregation:
descriptive data, statistics, correlation, and exploratory visualization.
A shift from centralized stewardship to a decentralized environment where roles become as granular as
each individual who has access. This is tied to the self-service movement in business intelligence and
analytics solutions.

5.3.1 DIVERGENCE VS. CONVERGENCE TOWARD BIG DATA STANDARDS
Section Scope: How new technologies branch off in individual directions, in contrast to how standards
attempt to consolidate; the associated challenges; and how diverging use cases or technologies
eventually align onto standards.

27
DRAFT NIST BIG DATA INTEROPERABILITY FRAMEWORK: VOLUME 7, STANDARDS ROADMAP

Innovation is itself a form of variation. While some sectors or industries, such as construction, are more
likely to attempt to reduce variation, which creates an environment fostering standardization, other
industries and sectors innovate so frequently as to create environments that produce large amounts of
variance and divergence from standardization.
The software industry, a major part of the Big Data phenomenon, is one such divergent industry.
Distributed computing, cloud computing,
In comparison to financial services, for example, [is the data on premise? are the applications on
premise?] The organizational structure of a software firm is decentralized; the interactivity of information
flows in the software industry is decentralizing, which adds to the challenge of developing Big Data standards
[Box]. Regulations.
1140Convergence of problem areas: parallelism; descriptive data;
1141 “[I]ncentivize big data and data science research communities to provide comprehensive documentation
1142on their analysis workflows and related data, driven by metadata standards and annotation systems. Such
1143efforts will encourage greater data reuse and provide a greater return on research investments.” [NITRD]
From 4.2: While it makes sense to discuss the implications of standards in relation to specific vertical
domains, such as healthcare, where various technologies are ultimately applied, Section 5.3 focuses
on the Big Data landscape with attention to the technologies themselves, not the applications of
those technologies. Technology itself is rarely a solution to any situation; it is more likely an
empowering means to improve its users’ effectiveness. For purposes of discussing standards in
technology, ………. From the perspective of underlying mechanisms.
U.S. industries have also developed technical nonproduct standards through a voluntary consensus
approach. [A technical standard is “a document that provides requirements, specifications, guidelines or
characteristics that can be used consistently to ensure that materials, products, processes and services are
fit for their purpose.” The source of this definition is the International Organization for Standardization
(http://www.iso.org/iso/home/standards.htm).]
1155
1156BARRIERS
1157Cloud architectures create challenges for governance.
1158
1159


1160 Appendix A: Acronyms


1161The acronym list will be revised once Volume 7 is closer to completion.
1162AMQP Advanced Message Queuing Protocol
1163ANSI American National Standards Institute
1164API application programming interface
1165AVDL Application Vulnerability Description Language
1166BDAP Big Data Application Provider component
1167BDFP Big Data Framework Provider component
1168BIAS Biometric Identity Assurance Services
1169CGM Computer Graphics Metafile
1170CIA confidentiality, integrity, and availability
1171CMIS Content Management Interoperability Services
1172CPR Capability Provider Requirements
1173DC Data Consumer component
1174DCAT Data Catalog Vocabulary
1175DCR Data Consumer Requirements
1176DOM Document Object Model
1177DP Data Provider component
1178DSML Directory Services Markup Language
1179DSR Data Source Requirements
1180DSS Digital Signature Service
1181EPP Extensible Provisioning Protocol
1182EXI Efficient XML Interchange
1183GeoXACML Geospatial eXtensible Access Control Markup Language
1184GML Geography Markup Language
1185GRC governance, risk management, and compliance
1186HTML HyperText Markup Language
1187IEC International Electrotechnical Commission
1188IEEE Institute of Electrical and Electronics Engineers
1189IETF Internet Engineering Task Force
1190INCITS International Committee for Information Technology Standards
1191ISO International Organization for Standardization


1192IT information technology


1193ITL Information Technology Laboratory
1194ITS Internationalization Tag Set
1195JPEG Joint Photographic Experts Group
1196JSON JavaScript Object Notation
1197JSR Java Specification Request
1198JTC1 Joint Technical Committee 1
1199LMR Lifecycle Management Requirements
1200M Management Fabric
1201MFI Metamodel Framework for Interoperability
1202MOWS Management of Web Services
1203MPEG Moving Picture Experts Group
1204MQTT Message Queuing Telemetry Transport
1205MUWS Management Using Web Services
1207NARA National Archives and Records Administration
1208NASA National Aeronautics and Space Administration
1209NBD-PWG NIST Big Data Public Working Group
1210NCAP Network Capable Application Processor
1211netCDF network Common Data Form
1212NIST National Institute of Standards and Technology
1213NoSQL Not Only or No Structured Query Language
1214NSF National Science Foundation
1215OASIS Organization for the Advancement of Structured Information Standards
1216OData Open Data
1217ODMS On Demand Model Selection
1218OGC Open Geospatial Consortium
1219OpenMI Open Modelling Interface Standard
1220OR Other Requirements
1221OWS Context Web Services Context Document
1222P3P Platform for Privacy Preferences Project
1223PICS Platform for Internet Content Selection
1224PMML Predictive Model Markup Language
1225POWDER Protocol for Web Description Resources
1226RDF Resource Description Framework


1227RFID radio frequency identification


1228RIF Rule Interchange Format
1229RPM Red Hat Package Manager
1230S&P Security and Privacy Fabric
1231SAF Symptoms Automation Framework
1232SAML Security Assertion Markup Language
1233SDOs standards development organizations
1234SFA Simple Features Access
1235SKOS Simple Knowledge Organization System Reference
1236SLAs service-level agreements
1237SML Service Modeling Language
1238SNMP Simple Network Management Protocol
1239SO System Orchestrator component
1240SOAP Simple Object Access Protocol
1241SPR Security and Privacy Requirements
1242SQL Structured Query Language
1243SWE Sensor Web Enablement
1244SWS Search Web Services
1245TEDS Transducer Electronic Data Sheet
1246TJS Table Joining Service
1247TPR Transformation Provider Requirements
1248TR Technical Report
1249UBL Universal Business Language
1250UDDI Universal Description, Discovery and Integration
1251UDP User Datagram Protocol
1252UIMA Unstructured Information Management Architecture
1253UML Unified Modeling Language
1254UOML Unstructured Operation Markup Language
1255W3C World Wide Web Consortium
1256WCPS Web Coverage Processing Service Interface
1257WCS Web Coverage Service
1258WebRTC Web Real-Time Communication
1259WFS Web Feature Service
1260WMS Web Map Service
1261WPS Web Processing Service


1262WS-BPEL Web Services Business Process Execution Language


1263WS-Discovery Web Services Dynamic Discovery
1264WSDL Web Services Description Language
1265WSDM Web Services Distributed Management
1266WS-Federation Web Services Federation Language
1267WSN Web Services Notification
1268XACML eXtensible Access Control Markup Language
1269XDM XPath Data Model
1270X-KISS XML Key Information Service Specification
1271XKMS XML Key Management Specification
1272X-KRSS XML Key Registration Service Specification
1273XMI XML Metadata Interchange
1274XML Extensible Markup Language
1275XSLT Extensible Stylesheet Language Transformations
1276
1277
1278


1279 Appendix B: Collection of Big Data Related Standards


1280The following table contains a collection of standards that pertain to a portion of the Big Data ecosystem. This collection is current, as of the date
1281of publication of Volume 7. It is not an exhaustive list of standards that could relate to Big Data but rather a representative list of the standards that
1282significantly impact some area of the Big Data ecosystem.
1283The standards were chosen based on the following criteria.
1284The text below will be modified to enhance the discussion of how the standards were chosen to be included – criteria, etc.
1285In selecting standards to include in Appendix B, the working group focused on standards that would do the following:
• Facilitate interfaces between NBDRA components
• Facilitate the handling of data with one or more Big Data characteristics
• Represent a fundamental function needing to be implemented by one or more NBDRA components
Appendix B represents a portion of the potentially applicable standards from a portion of the contributing organizations working in the Big Data domain.
1290
1291 Table B-1: Big Data Related Standards

Standard Name/Number Description URL


ISO/IEC 9075-*: ISO/IEC 9075 defines SQL. The scope of SQL is the definition of data structure and the
operations on data stored in that structure. ISO/IEC 9075-1, ISO/IEC 9075-2 and ISO/IEC 9075-11
encompass the minimum requirements of the language. Other parts define extensions.
ISO/IEC Technical Report (TR) 9789: Guidelines for the Organization and Representation of Data
Elements for Data Interchange



ISO/IEC 11179-*: The 11179 standard is a multipart standard for the definition and implementation of
Metadata Registries. The series includes the following parts:
• Part 1: Framework (http://metadata-standards.org/11179/#A1)
• Part 2: Classification (http://metadata-standards.org/11179/#A2)
• Part 3: Registry metamodel and basic attributes (http://metadata-standards.org/11179/#A3)
• Part 4: Formulation of data definitions (http://metadata-standards.org/11179/#A4)
• Part 5: Naming and identification principles (http://metadata-standards.org/11179/#A5)
• Part 6: Registration (http://metadata-standards.org/11179/#A6)
ISO/IEC 10728-* Information Resource Dictionary System Services Interface
ISO/IEC 13249-* Database Languages – SQL Multimedia and Application Packages
ISO/IEC TR 19075-*: This is a series of TRs on SQL related technologies.
• Part 1: XQuery
• Part 2: SQL Support for Time-Related Information
• Part 3: Programs Using the Java Programming Language
• Part 4: Routines and Types Using the Java Programming Language
ISO/IEC 19503 Extensible Markup Language (XML) Metadata Interchange (XMI)
ISO/IEC 19773 Metadata Registries Modules
ISO/IEC TR 20943 Metadata Registry Content Consistency
ISO/IEC 19763-*: Information Technology—Metamodel Framework for Interoperability (MFI). The 19763
standard is a multipart standard that includes the following parts:
• Part 1: Reference model
• Part 3: Metamodel for ontology registration
• Part 5: Metamodel for process model registration
• Part 6: Registry Summary
• Part 7: Metamodel for service registration
• Part 8: Metamodel for role and goal registration
• Part 9: On Demand Model Selection (ODMS) TR
• Part 10: Core model and basic mapping
• Part 12: Metamodel for information model registration
• Part 13: Metamodel for forms registration
• Part 14: Metamodel for dataset registration
• Part 15: Metamodel for data provenance registration



ISO/IEC 9281:1990: Information Technology—Picture Coding Methods
ISO/IEC 10918:1994: Information Technology—Digital Compression and Coding of Continuous-Tone
Still Images
ISO/IEC 11172:1993: Information Technology—Coding of Moving Pictures and Associated Audio for
Digital Storage Media at up to About 1,5 Mbit/s
ISO/IEC 13818:2013: Information Technology—Generic Coding of Moving Pictures and Associated
Audio Information
ISO/IEC 14496:2010: Information Technology—Coding of Audio-Visual Objects
ISO/IEC 15444:2011: Information Technology—JPEG (Joint Photographic Experts Group) 2000 Image
Coding System
ISO/IEC 21000:2003: Information Technology—Multimedia Framework (MPEG [Moving Picture
Experts Group]-21)
ISO 6709:2008 Standard Representation of Geographic Point Location by Coordinates
ISO 19115-* Geographic Metadata
ISO 19110 Geographic Information Feature Cataloging
ISO 19139 Geographic Metadata XML Schema Implementation
ISO 19119 Geographic Information Services
ISO 19157 Geographic Information Data Quality
ISO 19114 Geographic Information—Quality Evaluation Procedures
IEEE 21451-*: Information Technology—Smart transducer interface for sensors and actuators
• Part 1: Network Capable Application Processor (NCAP) information model
• Part 2: Transducer to microprocessor communication protocols and Transducer Electronic Data
Sheet (TEDS) formats
• Part 4: Mixed-mode communication protocols and TEDS formats
• Part 7: Transducer to radio frequency identification (RFID) systems communication protocols and
TEDS formats
IEEE 2200-2012 Standard Protocol for Stream Management in Media Client Devices
ISO/IEC 15408:2009: Information Technology—Security Techniques—Evaluation Criteria for IT Security
ISO/IEC 27010:2012: Information Technology—Security Techniques—Information Security Management
for Inter-Sector and Inter-Organizational Communications



ISO/IEC 27033-1:2009 Information Technology—Security Techniques—Network Security
ISO/IEC TR 14516:2002: Information Technology—Security Techniques—Guidelines for the Use and
Management of Trusted Third Party Services
ISO/IEC 29100:2011: Information Technology—Security Techniques—Privacy Framework
ISO/IEC 9798:2010: Information Technology—Security Techniques—Entity Authentication
ISO/IEC 11770:2010: Information Technology—Security Techniques—Key Management
ISO/IEC 27035:2011: Information Technology—Security Techniques—Information Security Incident
Management
ISO/IEC 27037:2012: Information Technology—Security Techniques—Guidelines for Identification,
Collection, Acquisition and Preservation of Digital Evidence
JSR (Java Specification Request) 221 (developed by the Java Community Process): JDBC™ 4.0
Application Programming Interface (API) Specification
W3C XML: XML 1.0 (Fifth Edition), W3C Recommendation 26 November 2008
W3C Resource Description Framework (RDF): The RDF is a framework for representing information in
the Web. RDF graphs are sets of subject-predicate-object triples, where the elements are used to express
descriptions of resources.
W3C JavaScript Object Notation (JSON)-LD 1.0: JSON-LD 1.0, A JSON-based Serialization for Linked
Data, W3C Recommendation 16 January 2014
W3C Document Object Model (DOM) Level 1 Specification: This series of specifications defines the
DOM, a platform- and language-neutral interface that allows programs and scripts to dynamically access
and update the content, structure and style of HyperText Markup Language (HTML) and XML documents.
W3C XQuery 3.0: The XQuery specifications describe a query language called XQuery, which is designed
to be broadly applicable across many types of XML data sources.
W3C XProc: This specification describes the syntax and semantics of XProc: An XML Pipeline Language,
a language for describing operations to be performed on XML documents.
W3C XML Encryption Syntax and Processing Version 1.1: This specification covers a process for
encrypting data and representing the result in XML.
W3C XML Signature Syntax and Processing Version 1.1: This specification covers XML digital signature
processing rules and syntax. XML Signatures provide integrity, message authentication, and/or signer
authentication services for data of any type, whether located within the XML that includes the signature or
elsewhere.



W3C XPath 3.0: XPath 3.0 is an expression language that allows the processing of values conforming to
the data model defined in [XQuery and XPath Data Model (XDM) 3.0]. The data model provides a tree
representation of XML documents as well as atomic values and sequences that may contain both references
to nodes in an XML document and atomic values.
W3C XSL Transformations (XSLT) Version 2.0: This specification defines the syntax and semantics of
XSLT 2.0, a language for transforming XML documents into other XML documents.
W3C Efficient XML Interchange (EXI) Format 1.0 (Second Edition): This specification covers the EXI
format. EXI is a very compact representation for the XML Information Set that is intended to
simultaneously optimize performance and the utilization of computational resources.
W3C RDF Data Cube Vocabulary: The Data Cube vocabulary provides a means to publish multi-
dimensional data, such as statistics, on the Web using the W3C RDF standard.
W3C Data Catalog Vocabulary (DCAT): DCAT is an RDF vocabulary designed to facilitate
interoperability between data catalogs published on the Web. This document defines the schema and
provides examples for its use.
W3C HTML5, A vocabulary and associated APIs for HTML and XHTML: This specification defines the
5th major revision of the core language of the World Wide Web—HTML.
W3C Internationalization Tag Set (ITS) 2.0: The ITS 2.0 specification enhances the foundation to integrate
automated processing of human language into core Web technologies and concepts that are designed to
foster the automated creation and processing of multilingual Web content.
W3C OWL 2 Web Ontology Language: The OWL 2 Web Ontology Language, informally OWL 2, is an
ontology language for the Semantic Web with formally defined meaning.
W3C Platform for Privacy Preferences (P3P) 1.0: The P3P enables Web sites to express their privacy
practices in a standard format that can be retrieved automatically and interpreted easily by user agents.
W3C Protocol for Web Description Resources (POWDER): POWDER provides a mechanism to describe
and discover Web resources and helps the users to make a decision whether a given resource is of interest.



W3C Provenance: Provenance is information about entities, activities, and people involved in producing a
piece of data or thing, which can be used to form assessments about its quality, reliability or
trustworthiness. The Provenance Family of Documents (PROV) defines a model, corresponding
serializations and other supporting definitions to enable the interoperable interchange of provenance
information in heterogeneous environments such as the Web.
W3C Rule Interchange Format (RIF): RIF is a series of standards for exchanging rules among rule
systems, in particular among Web rule engines.
W3C Service Modeling Language (SML) 1.1: This specification defines SML Version 1.1, used to model
complex services and systems, including their structure, constraints, policies, and best practices.
W3C Simple Knowledge Organization System Reference (SKOS): This document defines SKOS, a
common data model for sharing and linking knowledge organization systems via the Web.
W3C Simple Object Access Protocol (SOAP) 1.2: SOAP is a protocol specification for exchanging
structured information in the implementation of web services in computer networks.
W3C SPARQL 1.1: SPARQL is a language specification for the query and manipulation of linked data in
RDF format.
W3C Web Service Description Language (WSDL) 2.0: This specification describes WSDL Version 2.0, an
XML language for describing Web services.
W3C XML Key Management Specification (XKMS) 2.0: This standard specifies protocols for distributing
and registering public keys, suitable for use in conjunction with the W3C Recommendations for XML
Signature [XML-SIG] and XML Encryption [XML-Enc]. The XKMS comprises two parts:
• The XML Key Information Service Specification (X-KISS)
• The XML Key Registration Service Specification (X-KRSS)
OGC® OpenGIS® Catalogue Services Specification 2.0.2 - ISO Metadata Application Profile: This series
of standards covers Catalogue Services based on ISO 19115/ISO 19119, organized and implemented for
the discovery, retrieval and management of data metadata, services metadata and application metadata.



OGC® OpenGIS® GeoAPI: The GeoAPI Standard defines, through the GeoAPI library, a Java language
API including a set of types and methods which can be used for the manipulation of geographic
information structured following the specifications adopted by Technical Committee 211 of the ISO and by
the OGC®.
OGC® OpenGIS® GeoSPARQL: The OGC® GeoSPARQL standard supports representing and querying
geospatial data on the Semantic Web. GeoSPARQL defines a vocabulary for representing geospatial data
in RDF, and it defines an extension to the SPARQL query language for processing geospatial data.
OGC® OpenGIS® Geography Markup Language (GML) Encoding Standard: The GML is an XML
grammar for expressing geographical features. GML serves as a modeling language for geographic
systems as well as an open interchange format for geographic transactions on the Internet.
OGC® Geospatial eXtensible Access Control Markup Language (GeoXACML) Version 1: The Policy
Language introduced in this document defines a geo-specific extension to the XACML Policy Language,
as defined by the OASIS standard eXtensible Access Control Markup Language (XACML) Version 2.0.
OGC® network Common Data Form (netCDF): netCDF is a set of software libraries and self-describing,
machine-independent data formats that support the creation, access, and sharing of array-oriented scientific
data.
OGC® Open Modelling Interface Standard (OpenMI): The purpose of the OpenMI is to enable the runtime
exchange of data between process simulation models and also between models and other modelling tools
such as databases and analytical and visualization applications.
OGC® OpenSearch Geo and Time Extensions: This OGC standard specifies the Geo and Time extensions
to the OpenSearch query protocol. OpenSearch is a collection of simple formats for the sharing of search
results.
OGC® Web Services Context Document (OWS Context): The OGC® OWS Context was created to allow
a set of configured information resources (service set) to be passed between applications primarily as a
collection of services.
OGC® Sensor Web Enablement (SWE): This series of standards supports interoperability interfaces and
metadata encodings that enable real-time integration of heterogeneous sensor webs. These standards
include a modeling language (SensorML), common data model, and sensor observation, planning, and
alerting service interfaces.
OGC® OpenGIS® Simple Features Access (SFA): Describes the common architecture for simple feature
geometry and is also referenced as ISO 19125. It also implements a profile of the spatial schema described
in ISO 19107:2003.



OGC® OpenGIS® Georeferenced Table Joining Service (TJS) Implementation Standard: This standard is
the specification for a TJS that defines a simple way to describe and exchange tabular data that contains
information about geographic objects.
OGC® OpenGIS® Web Coverage Processing Service Interface (WCPS) Standard: Defines a protocol-
independent language for the extraction, processing, and analysis of multi-dimensional gridded coverages
representing sensor, image, or statistics data.
OGC® OpenGIS® Web Coverage Service (WCS): This document specifies how a WCS offers multi-
dimensional coverage data for access over the Internet. This document specifies a core set of requirements
that a WCS implementation must fulfill.
OGC® Web Feature Service (WFS) 2.0 Interface Standard: The WFS standard provides fine-grained
access to geographic information at the feature and feature property level. This International Standard
specifies discovery operations, query operations, locking operations, transaction operations, and operations
to manage stored, parameterized query expressions.
OGC® OpenGIS® Web Map Service (WMS) Interface Standard: The OpenGIS® WMS Interface
Standard provides a simple HTTP interface for requesting geo-registered map images from one or more
distributed geospatial databases.
OGC® OpenGIS® Web Processing Service (WPS) Interface Standard: The OpenGIS® WPS Interface
Standard provides rules for standardizing how inputs and outputs (requests and responses) are handled for
geospatial processing services, such as polygon overlay. The standard also defines how a client can request
the execution of a process, and how the output from the process is handled. It defines an interface that
facilitates the publishing of geospatial processes and clients’ discovery of and binding to those processes.
OASIS AS4 Profile of ebMS 3.0 v1.0: Standard for business-to-business exchange of messages via a web
service platform.
OASIS Advanced Message Queuing Protocol (AMQP) Version 1.0: The AMQP is an open internet
protocol for business messaging. It defines a binary wire-level protocol that allows for the reliable
exchange of business messages between two parties.
OASIS Application Vulnerability Description Language (AVDL) v1.0: This specification describes a
standard XML format that allows entities (such as applications, organizations, or institutes) to
communicate information regarding web application vulnerabilities.
OASIS Biometric Identity Assurance Services (BIAS) Simple Object Access Protocol (SOAP) Profile
v1.0: This OASIS BIAS profile specifies how to use XML (XML10) defined in ANSI INCITS 442-2010—
BIAS to invoke SOAP-based services that implement BIAS operations.



OASIS Content Management Interoperability Services (CMIS): The CMIS standard defines a domain
model and set of bindings, including Web Services and ReSTful AtomPub, that can be used by applications
to work with one or more Content Management repositories/systems.
OASIS Digital Signature Service (DSS): This specification describes two XML-based request/response
protocols: a signing protocol and a verifying protocol. Through these protocols a client can send
documents (or document hashes) to a server and receive back a signature on the documents; or send
documents (or document hashes) and a signature to a server, and receive back an answer on whether the
signature verifies the documents.
OASIS Directory Services Markup Language (DSML) v2.0: The DSML provides a means for representing
directory structural information as an XML document, and methods for expressing directory queries and
updates (and the results of these operations) as XML documents.
OASIS ebXML Messaging Services: These specifications define a communications-protocol-neutral
method for exchanging electronic business messages as XML.
OASIS ebXML RegRep: ebXML RegRep is a standard defining the service interfaces, protocols and
information model for an integrated registry and repository. The repository stores digital content while the
registry stores metadata that describes the content in the repository.
OASIS ebXML Registry Information Model: The Registry Information Model provides a blueprint or
high-level schema for the ebXML Registry. It provides implementers with information on the type of
metadata that is stored in the Registry as well as the relationships among metadata Classes.
OASIS ebXML Registry Services Specification: An ebXML Registry is an information system that
securely manages any content type and the standardized metadata that describes it. The ebXML Registry
provides a set of services that enable sharing of content and metadata between organizational entities in a
federated environment.
OASIS eXtensible Access Control Markup Language (XACML): The standard defines a declarative access
control policy language implemented in XML and a processing model describing how to evaluate access
requests according to the rules defined in policies.
OASIS Message Queuing Telemetry Transport (MQTT): MQTT is a Client-Server publish/subscribe
messaging transport protocol for constrained environments, such as communication in Machine-to-
Machine and Internet of Things contexts where a small code footprint is required and/or network
bandwidth is at a premium.



The OData Protocol is an application-level protocol for interacting with
data via RESTful interfaces. The protocol supports the description of
OASIS Open Data (OData) Protocol
data models and the editing and querying of data according to those
models.
The OASIS SWS initiative defines a generic protocol for the interaction
OASIS Search Web Services (SWS) required between a client and server for performing searches. SWS
define an Abstract Protocol Definition to describe this interaction.
The SAML defines the syntax and processing semantics of assertions
OASIS Security Assertion Markup made about a subject by a system entity. This specification defines both
Language (SAML) v2.0 the structure of SAML assertions, and an associated set of protocols, in
addition to the processing rules involved in managing a SAML system.
This specification defines a binding of SOAP to user datagrams,
OASIS SOAP-over-UDP (User
including message patterns, addressing requirements, and security
Datagram Protocol) v1.1
considerations.
This specification defines schema for two XML document types:
Package Descriptors and Deployment Descriptors. Package Descriptors
OASIS Solution Deployment define characteristics of a package used to deploy a solution.
Descriptor Specification v1.0 Deployment Descriptors define characteristics of the content of a
solution package, including the requirements that are relevant for
creation, configuration and maintenance of the solution content.
This standard defines reference architecture for the Symptoms
OASIS Symptoms Automation
Automation Framework, a tool in the automatic detection, optimization,
Framework (SAF) Version 1.0
and remediation of operational aspects of complex systems,
The concept of a “service template” is used to specify the “topology”
OASIS Topology and Orchestration (or structure) and “orchestration” (or invocation of management
Specification for Cloud Applications behavior) of IT services. This specification introduces the formal
Version 1.0 description of Service Templates, including their structure, properties,
and behavior.
OASIS Universal Business Language (UBL) v2.1: The OASIS UBL defines a generic XML interchange format for business documents that can be restricted or extended to meet the requirements of particular industries.
OASIS Universal Description, Discovery and Integration (UDDI) v3.0.2: The focus of UDDI is the definition of a set of services supporting the description and discovery of (1) businesses, organizations, and other Web services providers; (2) the Web services they make available; and (3) the technical interfaces which may be used to access those services.
OASIS Unstructured Information Management Architecture (UIMA) v1.0: The UIMA specification defines platform-independent data representations and interfaces for text and multi-modal analytics.

DRAFT NIST BIG DATA INTEROPERABILITY FRAMEWORK: VOLUME 7, STANDARDS ROADMAP



OASIS Unstructured Operation Markup Language (UOML) v1.0: UOML is an interface standard for processing unstructured documents; it plays a role similar to that of SQL for structured data. UOML is expressed in standard XML.
OASIS/W3C WebCGM v2.1: Computer Graphics Metafile (CGM) is an ISO standard, defined by ISO/IEC 8632:1999, for the interchange of 2D vector and mixed vector/raster graphics. WebCGM is a profile of CGM, which adds Web linking and is optimized for Web applications in technical illustration, electronic documentation, geophysical data visualization, and similar fields.
OASIS Web Services Business Process Execution Language (WS-BPEL) v2.0: This standard defines a language for specifying business process behavior based on Web Services. WS-BPEL provides a language for the specification of Executable and Abstract business processes.
OASIS/W3C Web Services Distributed Management (WSDM): Management Using Web Services (MUWS) v1.1: MUWS defines how an IT resource connected to a network provides manageability interfaces such that the IT resource can be managed locally and from remote locations using Web services technologies.
OASIS WSDM: Management of Web Services (MOWS) v1.1: This part of the WSDM specification addresses management of the Web services endpoints using Web services protocols.
OASIS Web Services Dynamic Discovery (WS-Discovery) v1.1: This specification defines a discovery protocol to locate services. The primary scenario for discovery is a client searching for one or more target services.
OASIS Web Services Federation Language (WS-Federation) v1.2: This specification defines mechanisms to allow different security realms to federate, such that authorized access to resources managed in one realm can be provided to security principals whose identities and attributes are managed in other realms.
OASIS Web Services Notification (WSN) v1.3: WSN is a family of related specifications that define a standard Web services approach to notification using a topic-based publish/subscribe pattern.
IETF Simple Network Management Protocol (SNMP) v3: SNMP is a series of IETF-sponsored standards for remote management of system/network resources and transmission of status regarding network resources. The standards include definitions of standard management objects along with security controls.
IETF Extensible Provisioning Protocol (EPP): This IETF series of standards describes an application-layer client-server protocol for the provisioning and management of objects stored in a shared central repository. Specified in XML, the protocol defines generic object management operations and an extensible framework that maps protocol operations to objects.




NCPDP SCRIPT standard: Electronic data exchange standard used in the medication reconciliation process; covers medication history, prescription information, and census updates.
ASTM CCR message: Electronic data exchange standard used in the medication reconciliation process. The Continuity of Care Record (CCR) represents a summary format for the core facts of a patient's dataset.
HITSP C32 HL7 CCD Document: Electronic data exchange standard used in the medication reconciliation process. Summary format for the CCR document structure.
PMML (Predictive Model Markup Language): XML-based data handling. This mature standard defines and enables data modeling, with reliability and scalability for custom deployments; it supports pre- and post-processing and the expression of predictive models.
Add other standards
Dash7: Wireless sensor and actuator protocol for home automation, based on ISO/IEC 18000-7.
H.265: High Efficiency Video Coding (HEVC), MPEG-H Part 2. Potential compression successor to AVC (H.264); used for streaming video.
VP9: Royalty-free codec alternative to HEVC. Successor to VP8 and competitor to H.265; used for streaming video.
Daala: Video coding format for streaming video.

WebRTC: Browser-to-browser communication.

X.509: Public key certificate standard used to secure email and Web communication.

MDX: Multidimensional Expressions; became the de facto standard query language for OLAP.
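Many of the standards listed above (SAML, UBL, UOML, PMML, and others) are XML vocabularies, so their documents can be produced and consumed with ordinary XML tooling. As an illustration only, the following Python sketch parses an abbreviated PMML-style regression model and evaluates it. The fragment follows the general shape of published PMML, but it is invented for this example and is not a complete, schema-valid document:

```python
import xml.etree.ElementTree as ET

# Abbreviated, illustrative PMML-style fragment (not schema-complete).
PMML_DOC = """\
<PMML version="4.3" xmlns="http://www.dmg.org/PMML-4_3">
  <Header description="Example linear model"/>
  <DataDictionary numberOfFields="2">
    <DataField name="x" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

NS = {"pmml": "http://www.dmg.org/PMML-4_3"}

def score(doc: str, x: float) -> float:
    """Evaluate the regression table: intercept + sum(coefficient * value)."""
    table = ET.fromstring(doc).find(".//pmml:RegressionTable", NS)
    result = float(table.get("intercept"))
    for predictor in table.findall("pmml:NumericPredictor", NS):
        result += float(predictor.get("coefficient")) * x
    return result

print(score(PMML_DOC, 3.0))  # 1.5 + 2.0 * 3.0 = 7.5
```

A conforming PMML consumer would also honor the data dictionary, mining schema, and normalization settings; the point here is simply that XML-based standards such as these reduce model exchange to ordinary XML processing.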



Appendix C: Standards and the NBDRA


Section Scope: This section will compare standards to the components of the NBDRA. The text below will be enhanced to further explain the table presented.
As most standards represent some form of interface between components, the standards table in Appendix C indicates whether the NBDRA component would be an Implementer or User of the standard. For the purposes of this table, the following definitions were used for Implementer and User.
 Implementer: A component is an implementer of a standard if it provides services based on the standard (e.g., a service that accepts Structured Query Language [SQL] commands would be an implementer of that standard) or encodes or presents data based on that standard.
 User: A component is a user of a standard if it interfaces to a service via the standard or if it accepts/consumes/decodes data represented by the standard.
While the above definitions provide a reasonable basis for classification, for some standards the difference between implementation and use may be negligible or nonexistent.
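The SQL example in the definitions above can be made concrete. In the sketch below, Python's built-in sqlite3 module supplies an engine that implements the SQL standard (ISO/IEC 9075) by accepting and executing SQL text, while the script itself is a user of the standard, composing SQL and consuming the results. The table and data are invented for illustration:

```python
import sqlite3

# The SQLite engine is an *implementer* of the SQL standard: it accepts
# and executes SQL text. This script is a *user*: it only composes SQL
# statements and consumes the result rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")  # hypothetical schema
conn.executemany(
    "INSERT INTO readings (sensor, value) VALUES (?, ?)",
    [("s1", 1.0), ("s1", 3.0), ("s2", 5.0)],
)

# The entire user-side interaction happens through standard SQL text.
rows = conn.execute(
    "SELECT sensor, AVG(value) FROM readings GROUP BY sensor ORDER BY sensor"
).fetchall()
print(rows)  # [('s1', 2.0), ('s2', 5.0)]
conn.close()
```

A single component can of course play both roles at once, which is one reason many cells in Table C-1 carry the combined I/U marking.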
The NBDRA components are abbreviated in the table header as follows:
 SO = System Orchestrator component
 DP = Data Provider component
 DC = Data Consumer component
 BDAP = Big Data Application Provider component
 BDFP = Big Data Framework Provider component
 S&P = Security and Privacy Fabric
 M = Management Fabric
Table C-1: Standards and the NBDRA
NBDRA Components
Standard Name/Number
SO DP DC BDAP BDFP S&P M
ISO/IEC 9075-* I I/U U I/U U U
ISO/IEC Technical Report (TR) 9789 I/U I/U I/U I/U

ISO/IEC 11179-* I I/U I/U U
ISO/IEC 10728-*
ISO/IEC 13249-* I I/U U I/U
ISO/IEC TR 19075-* I I/U U I/U
ISO/IEC 19503 I I/U U I/U U
ISO/IEC 19773 I I/U U I/U I/U
ISO/IEC TR 20943 I I/U U I/U U U
ISO/IEC 19763-* I I/U U U
ISO/IEC 9281:1990 I U I/U I/U
ISO/IEC 10918:1994 I U I/U I/U
ISO/IEC 11172:1993 I U I/U I/U
ISO/IEC 13818:2013 I U I/U I/U
ISO/IEC 14496:2010 I U I/U I/U
ISO/IEC 15444:2011 I U I/U I/U
ISO/IEC 21000:2003 I U I/U I/U
ISO 6709:2008 I U I/U I/U
ISO 19115-* I U I/U U
ISO 19110 I U I/U
ISO 19139 I U I/U
ISO 19119 I U I/U
ISO 19157 I U I/U U
ISO 19114 I
IEEE 21451-* I U
IEEE 2200-2012 I U I/U
ISO/IEC 15408:2009 U I
ISO/IEC 27010:2012 I U I/U
ISO/IEC 27033-1:2009 I/U I/U I/U I
ISO/IEC TR 14516:2002 U U
ISO/IEC 29100:2011 I
ISO/IEC 9798:2010 I/U U U U I/U

ISO/IEC 11770:2010 I/U U U U I/U
ISO/IEC 27035:2011 U I
ISO/IEC 27037:2012 U I
JSR (Java Specification Request) 221 (developed by the Java Community Process) I/U I/U I/U I/U
W3C XML I/U I/U I/U I/U I/U I/U I/U
W3C Resource Description Framework (RDF) I U I/U I/U
W3C JavaScript Object Notation (JSON)-LD 1.0 I U I/U I/U
W3C Document Object Model (DOM) Level 1 Specification I U I/U I/U
W3C XQuery 3.0 I U I/U I/U
W3C XProc I I U I/U I/U
W3C XML Encryption Syntax and Processing Version 1.1 I U I/U
W3C XML Signature Syntax and Processing Version 1.1 I U I/U
W3C XPath 3.0 I U I/U I/U
W3C XSL Transformations (XSLT) Version 2.0 I U I/U I/U
W3C Efficient XML Interchange (EXI) Format 1.0 (Second Edition) I U I/U
W3C RDF Data Cube Vocabulary I U I/U I/U
W3C Data Catalog Vocabulary (DCAT) I U I/U
W3C HTML5 A vocabulary and associated APIs for HTML and XHTML I U I/U
W3C Internationalization Tag Set (ITS) 2.0 I U I/U I/U
W3C OWL 2 Web Ontology Language I U I/U I/U
W3C Platform for Privacy Preferences (P3P) 1.0 I U I/U I/U
W3C Protocol for Web Description Resources (POWDER) I U I/U
W3C Provenance I U I/U I/U U
W3C Rule Interchange Format (RIF) I U I/U I/U
W3C Service Modeling Language (SML) 1.1 I/U I U I/U
W3C Simple Knowledge Organization System Reference (SKOS) I U I/U
W3C Simple Object Access Protocol (SOAP) 1.2 I U I/U
W3C SPARQL 1.1 I U I/U I/U
W3C Web Service Description Language (WSDL) 2.0 U I U I/U
W3C XML Key Management Specification (XKMS) 2.0 U I U I/U

OGC® OpenGIS® Catalogue Services Specification 2.0.2 - ISO Metadata Application Profile I U I/U
OGC® OpenGIS® GeoAPI I U I/U I/U
OGC® OpenGIS® GeoSPARQL I U I/U I/U
OGC® OpenGIS® Geography Markup Language (GML) Encoding Standard I U I/U I/U
OGC® Geospatial eXtensible Access Control Markup Language (GeoXACML) Version 1 I U I/U I/U I/U
OGC® network Common Data Form (netCDF) I U I/U
OGC® Open Modelling Interface Standard (OpenMI) I U I/U I/U
OGC® OpenSearch Geo and Time Extensions I U I/U I
OGC® Web Services Context Document (OWS Context) I U I/U I
OGC® Sensor Web Enablement (SWE) I U I/U
OGC® OpenGIS® Simple Features Access (SFA) I U I/U I/U
OGC® OpenGIS® Georeferenced Table Joining Service (TJS) Implementation Standard I U I/U I/U
OGC® OpenGIS® Web Coverage Processing Service Interface (WCPS) Standard I U I/U I
OGC® OpenGIS® Web Coverage Service (WCS) I U I/U I
OGC® Web Feature Service (WFS) 2.0 Interface Standard I U I/U I
OGC® OpenGIS® Web Map Service (WMS) Interface Standard I U I/U I
OGC® OpenGIS® Web Processing Service (WPS) Interface Standard I U I/U I
OASIS AS4 Profile of ebMS 3.0 v1.0 I U I/U
OASIS Advanced Message Queuing Protocol (AMQP) Version 1.0 I U U I
OASIS Application Vulnerability Description Language (AVDL) v1.0 I U I U
OASIS Biometric Identity Assurance Services (BIAS) Simple Object Access Protocol (SOAP) Profile v1.0 I U I/U U
OASIS Content Management Interoperability Services (CMIS) I U I/U I
OASIS Digital Signature Service (DSS) I U I/U
OASIS Directory Services Markup Language (DSML) v2.0 I U I/U I
OASIS ebXML Messaging Services I U I/U
OASIS ebXML RegRep I U I/U I
OASIS ebXML Registry Information Model I U I/U

OASIS ebXML Registry Services Specification I U I/U
OASIS eXtensible Access Control Markup Language (XACML) I U I/U I/U I/U
OASIS Message Queuing Telemetry Transport (MQTT) I U I/U
OASIS Open Data (OData) Protocol I U I/U I/U
OASIS Search Web Services (SWS) I U I/U
OASIS Security Assertion Markup Language (SAML) v2.0 I U I/U I/U I/U
OASIS SOAP-over-UDP (User Datagram Protocol) v1.1 I U I/U
OASIS Solution Deployment Descriptor Specification v1.0 U I/U
OASIS Symptoms Automation Framework (SAF) Version 1.0 I/U
OASIS Topology and Orchestration Specification for Cloud Applications Version 1.0 I/U U I I/U
OASIS Universal Business Language (UBL) v2.1 I U I/U U
OASIS Universal Description, Discovery and Integration (UDDI) v3.0.2 I U I/U U
OASIS Unstructured Information Management Architecture (UIMA) v1.0 U I
OASIS Unstructured Operation Markup Language (UOML) v1.0 I U I/U I
OASIS/W3C WebCGM v2.1 I U I/U I
OASIS Web Services Business Process Execution Language (WS-BPEL) v2.0 U I
OASIS/W3C Web Services Distributed Management (WSDM): Management Using Web Services (MUWS) v1.1 U I I U U
OASIS WSDM: Management of Web Services (MOWS) v1.1 U I I U U
OASIS Web Services Dynamic Discovery (WS-Discovery) v1.1 U I U I/U U
OASIS Web Services Federation Language (WS-Federation) v1.2 I U I/U U
OASIS Web Services Notification (WSN) v1.3 I U I/U
IETF Simple Network Management Protocol (SNMP) v3 I I I/U U
IETF Extensible Provisioning Protocol (EPP) U I/U
NCPDP SCRIPT standard . . . . . . .
ASTM CCR message . . . . . . .
HITSP C32 HL7 CCD Document . . . . . . .
PMML Predictive Model Markup Language . . . . . . .
Add the standards that Russell gathered

Appendix D: Categorized Standards


Section Scope: This section could contain several categories to group the standards. The aim of the categories would be to facilitate the selection of standards of interest by the reader. Additional text is needed to briefly describe the categories, explain the creation of the categories, explain the application of categories to the standards, etc.
The categories discussed in Section ___ have been applied to the standards and are presented in the table below.
Table D-1: Categorized Standards

Standard Name/Number Category 1 Category 2 Category 3 Category 4

ISO/IEC 9075-*
ISO/IEC Technical Report (TR) 9789
ISO/IEC 11179-*
ISO/IEC 10728-*
ISO/IEC 13249-*
ISO/IEC TR 19075-*
ISO/IEC 19503
ISO/IEC 19773
ISO/IEC TR 20943
ISO/IEC 19763-*
ISO/IEC 9281:1990
ISO/IEC 10918:1994
ISO/IEC 11172:1993
ISO/IEC 13818:2013
ISO/IEC 14496:2010
ISO/IEC 15444:2011
ISO/IEC 21000:2003
ISO 6709:2008


ISO 19115-*
ISO 19110
ISO 19139
ISO 19119
ISO 19157
ISO 19114
IEEE 21451-*
IEEE 2200-2012
ISO/IEC 15408:2009
ISO/IEC 27010:2012
ISO/IEC 27033-1:2009
ISO/IEC TR 14516:2002
ISO/IEC 29100:2011
ISO/IEC 9798:2010
ISO/IEC 11770:2010
ISO/IEC 27035:2011
ISO/IEC 27037:2012

JSR (Java Specification Request) 221 (developed by the Java Community Process)
W3C XML
W3C Resource Description Framework (RDF)
W3C JavaScript Object Notation (JSON)-LD 1.0
W3C Document Object Model (DOM) Level 1 Specification
W3C XQuery 3.0
W3C XProc
W3C XML Encryption Syntax and Processing Version 1.1
W3C XML Signature Syntax and Processing Version 1.1
W3C XPath 3.0
W3C XSL Transformations (XSLT) Version 2.0
W3C Efficient XML Interchange (EXI) Format 1.0 (Second Edition)


W3C RDF Data Cube Vocabulary


W3C Data Catalog Vocabulary (DCAT)
W3C HTML5 A vocabulary and associated APIs for HTML and XHTML
W3C Internationalization Tag Set (ITS) 2.0
W3C OWL 2 Web Ontology Language
W3C Platform for Privacy Preferences (P3P) 1.0
W3C Protocol for Web Description Resources (POWDER)
W3C Provenance
W3C Rule Interchange Format (RIF)
W3C Service Modeling Language (SML) 1.1
W3C Simple Knowledge Organization System Reference (SKOS)
W3C Simple Object Access Protocol (SOAP) 1.2
W3C SPARQL 1.1
W3C Web Service Description Language (WSDL) 2.0
W3C XML Key Management Specification (XKMS) 2.0
OGC® OpenGIS® Catalogue Services Specification 2.0.2 - ISO Metadata Application Profile
OGC® OpenGIS® GeoAPI
OGC® OpenGIS® GeoSPARQL
OGC® OpenGIS® Geography Markup Language (GML) Encoding Standard
OGC® Geospatial eXtensible Access Control Markup Language (GeoXACML) Version 1
OGC® network Common Data Form (netCDF)
OGC® Open Modelling Interface Standard (OpenMI)
OGC® OpenSearch Geo and Time Extensions
OGC® Web Services Context Document (OWS Context)
OGC® Sensor Web Enablement (SWE)
OGC® OpenGIS® Simple Features Access (SFA)

OGC® OpenGIS® Georeferenced Table Joining Service (TJS) Implementation Standard


OGC® OpenGIS® Web Coverage Processing Service Interface (WCPS) Standard


OGC® OpenGIS® Web Coverage Service (WCS)
OGC® Web Feature Service (WFS) 2.0 Interface Standard
OGC® OpenGIS® Web Map Service (WMS) Interface Standard
OGC® OpenGIS® Web Processing Service (WPS) Interface Standard
OASIS AS4 Profile of ebMS 3.0 v1.0
OASIS Advanced Message Queuing Protocol (AMQP) Version 1.0
OASIS Application Vulnerability Description Language (AVDL) v1.0
OASIS Biometric Identity Assurance Services (BIAS) Simple Object Access Protocol (SOAP) Profile v1.0
OASIS Content Management Interoperability Services (CMIS)
OASIS Digital Signature Service (DSS)
OASIS Directory Services Markup Language (DSML) v2.0
OASIS ebXML Messaging Services
OASIS ebXML RegRep
OASIS ebXML Registry Information Model
OASIS ebXML Registry Services Specification
OASIS eXtensible Access Control Markup Language (XACML)
OASIS Message Queuing Telemetry Transport (MQTT)
OASIS Open Data (OData) Protocol
OASIS Search Web Services (SWS)
OASIS Security Assertion Markup Language (SAML) v2.0
OASIS SOAP-over-UDP (User Datagram Protocol) v1.1
OASIS Solution Deployment Descriptor Specification v1.0
OASIS Symptoms Automation Framework (SAF) Version 1.0

OASIS Topology and Orchestration Specification for Cloud Applications Version 1.0
OASIS Universal Business Language (UBL) v2.1
OASIS Universal Description, Discovery and Integration (UDDI) v3.0.2
OASIS Unstructured Information Management Architecture (UIMA) v1.0


OASIS Unstructured Operation Markup Language (UOML) v1.0


OASIS/W3C WebCGM v2.1

OASIS Web Services Business Process Execution Language (WS-BPEL) v2.0


OASIS/W3C Web Services Distributed Management (WSDM): Management Using Web Services (MUWS) v1.1
OASIS WSDM: Management of Web Services (MOWS) v1.1
OASIS Web Services Dynamic Discovery (WS-Discovery) v1.1
OASIS Web Services Federation Language (WS-Federation) v1.2
OASIS Web Services Notification (WSN) v1.3
IETF Simple Network Management Protocol (SNMP) v3
IETF Extensible Provisioning Protocol (EPP)
NCPDP SCRIPT standard
ASTM CCR message
HITSP C32 HL7 CCD Document
PMML Predictive Model Markup Language
Add standards Russell gathered

Appendix E: References


Additional needs: Additional references are welcome. References will be formatted.

GENERAL RESOURCES
Institute of Electrical and Electronics Engineers (IEEE). https://www.ieee.org/index.html
International Committee for Information Technology Standards (INCITS). http://www.incits.org/
International Electrotechnical Commission (IEC). http://www.iec.ch/
International Organization for Standardization (ISO). http://www.iso.org/iso/home.html
Open Geospatial Consortium (OGC). http://www.opengeospatial.org/
Open Grid Forum (OGF). https://www.ogf.org/ogf/doku.php
Organization for the Advancement of Structured Information Standards (OASIS). https://www.oasis-open.org/
World Wide Web Consortium (W3C). http://www.w3.org/
Authoritative sources: OGAF, Zachman, MOD Architecture, MDM Institute, ITVAL and CoBIT, IFEAD, FSAM, Carnegie Mellon Software Engineering Institute.
1. Frank Farance, Definitions of Big Data, 2015. (CITATION NEEDED)
2. Dale Neef, Digital Exhaust: What Everyone Should Know About Big Data, Digitization and Digitally Driven Innovation. http://searchdatamanagement.techtarget.com/feature/Three-ways-to-build-a-big-data-system
3. IBM developerWorks, Big Data Architecture and Patterns, Part 2. http://www.ibm.com/developerworks/library/bd-archpatterns2/index.html
4. NITRD Subcommittee on Networking and Information Technology Research and Development, Federal Big Data R&D Strategic Plan, May 2016. https://www.nitrd.gov/pubs/bigdatardstrategicplan.pdf
5. Box, The Information Economy: A Study of Five Industries, 2014.
6. Gartner, Magic Quadrant for Metadata Management Solutions.

DOCUMENT REFERENCES

1. The White House Office of Science and Technology Policy, "Big Data is a Big Deal," OSTP Blog, accessed February 21, 2014, http://www.whitehouse.gov/blog/2012/03/29/big-data-big-deal.
2. Cloud Security Alliance, Expanded Top Ten Big Data Security and Privacy Challenges, April 2013. https://downloads.cloudsecurityalliance.org/initiatives/bdwg/Expanded_Top_Ten_Big_Data_Security_and_Privacy_Challenges.pdf
3. "Big Data, Preliminary Report 2014," ISO/IEC JTC1: Information Technology. http://www.iso.org/iso/big_data_report-jtc1.pdf (accessed March 2, 2015), pages 21-23.
4. "Universal, federated or unified search in the land of information silos," KMWorld, accessed May 5, 2016, http://www.kmworld.com/Articles/Editorial/Features/Universal-federated-or-unified-search-in-the-land-of-information-silos-101165.aspx
5. Context Query Language (CQL): http://www.loc.gov/standards/sru/cql/contextSets/
6. Resource Description Framework (RDF): https://www.w3.org/RDF/
