10/27/2020 Document-oriented database - Wikipedia

Document-oriented database
A document-oriented database, or document store, is a computer program designed for
storing, retrieving and managing document-oriented information, also known as semi-structured
data.[1]

Document-oriented databases are one of the main categories of NoSQL databases, and the popularity
of the term "document-oriented database" has grown[2] with the use of the term NoSQL itself. XML
databases are a subclass of document-oriented databases that are optimized to work with XML
documents. Graph databases are similar, but add another layer, the relationship, which allows them
to link documents for rapid traversal.

Document-oriented databases are inherently a subclass of the key-value store, another NoSQL
database concept. The difference lies in the way the data are processed; in a key-value store, the data
are considered to be inherently opaque to the database, whereas a document-oriented system relies
on internal structure in the document in order to extract metadata that the database engine uses for
further optimization. Although in practice the difference is often negligible because of the tooling
that both kinds of system provide,[a] conceptually the document store is designed to offer a richer
experience with modern programming techniques.

Document databases[b] contrast strongly with the traditional relational database (RDB). Relational
databases generally store data in separate tables that are defined by the programmer, and a single
object may be spread across several tables. Document databases store all information for a given
object in a single instance in the database, and every stored object can be different from every other.
This eliminates the need for object-relational mapping while loading data into the database.

Contents
Documents
CRUD operations
Keys
Retrieval
Editing
Organization
Relationship to other databases
Relationship to key-value stores
Relationship to search engines
Relationship to relational databases
Implementations
XML database implementations
See also
Notes
References
Further reading
External links


Documents
The central concept of a document-oriented database is the notion of a document. While each
document-oriented database implementation differs on the details of this definition, in general, they
all assume documents encapsulate and encode data (or information) in some standard format or
encoding. Encodings in use include XML, YAML, JSON, as well as binary forms like BSON.

Documents in a document store are roughly equivalent to the programming concept of an object.
They are not required to adhere to a standard schema, nor will they have all the same sections, slots,
parts or keys. Generally, programs using objects have many different types of objects, and those
objects often have many optional fields. Every object, even those of the same class, can look very
different. Document stores are similar in that they allow different types of documents in a single
store, allow the fields within them to be optional, and often allow them to be encoded using different
encoding systems. For example, the following is a document, encoded in JSON:

{
    "FirstName": "Bob",
    "Address": "5 Oak St.",
    "Hobby": "sailing"
}

A second document might be encoded in XML as:

<contact>
    <firstname>Bob</firstname>
    <lastname>Smith</lastname>
    <phone type="Cell">(123) 555-0178</phone>
    <phone type="Work">(890) 555-0133</phone>
    <address>
        <type>Home</type>
        <street1>123 Back St.</street1>
        <city>Boys</city>
        <state>AR</state>
        <zip>32225</zip>
        <country>US</country>
    </address>
</contact>

These two documents share some structural elements with one another, but each also has unique
elements. The structure, text and other data inside the document are usually referred to as the
document's content and may be referenced via retrieval or editing methods (see below). Unlike a
relational database, where every record contains the same fields and unused fields are left empty,
there are no empty 'fields' in either document (record) in the above example. This approach allows new
information to be added to some records without requiring that every other record in the database
share the same structure.
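
As a rough illustration of the point above, the following Python sketch parses shortened copies of the two example documents and compares which top-level fields each one carries. The variable names are made up for this example, and lower-casing the field names is an assumption made only so the two encodings can be compared.

```python
import json
import xml.etree.ElementTree as ET

# Shortened copies of the two example documents, held as raw text.
json_doc = '{"FirstName": "Bob", "Address": "5 Oak St.", "Hobby": "sailing"}'
xml_doc = """
<contact>
    <firstname>Bob</firstname>
    <lastname>Smith</lastname>
    <phone type="Cell">(123) 555-0178</phone>
    <phone type="Work">(890) 555-0133</phone>
    <address><city>Boys</city><state>AR</state></address>
</contact>
""".strip()

# Top-level field names of each document, lower-cased for comparison.
json_fields = {name.lower() for name in json.loads(json_doc)}
xml_fields = {child.tag.lower() for child in ET.fromstring(xml_doc)}

shared = json_fields & xml_fields      # fields both documents carry
only_json = json_fields - xml_fields   # fields unique to the JSON document
only_xml = xml_fields - json_fields    # fields unique to the XML document

print("shared:", sorted(shared))
print("JSON only:", sorted(only_json))
print("XML only:", sorted(only_xml))
```

Neither document needs a placeholder for the fields it lacks; each simply omits them.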

Document databases typically provide for additional metadata to be associated with and stored along
with the document content. That metadata may be related to facilities the datastore provides for
organizing documents, providing security, or other implementation-specific features.

CRUD operations

The core operations that a document-oriented database supports for documents are similar to other
databases, and while the terminology is not perfectly standardized, most practitioners will recognize
them as CRUD:

Creation (or insertion)
Retrieval (or query, search, read or find)
Update (or edit)
Deletion (or removal)
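
The four operations can be sketched with a toy in-memory store in Python. The `DocumentStore` class and its method names are hypothetical and do not correspond to the API of any particular database; documents are assumed to be plain dictionaries.

```python
import copy
import uuid


class DocumentStore:
    """Toy in-memory document store (hypothetical, for illustration only)."""

    def __init__(self):
        self._docs = {}  # key -> document (a dict)

    def create(self, doc, key=None):
        """Insert a document; generate a key if the caller gives none."""
        key = key or str(uuid.uuid4())
        if key in self._docs:
            raise KeyError(f"duplicate key: {key}")
        self._docs[key] = copy.deepcopy(doc)
        return key

    def read(self, key):
        """Retrieve a copy of the document stored under `key`."""
        return copy.deepcopy(self._docs[key])

    def update(self, key, doc):
        """Replace the document stored under an existing key."""
        if key not in self._docs:
            raise KeyError(key)
        self._docs[key] = copy.deepcopy(doc)

    def delete(self, key):
        """Remove the document stored under `key`."""
        del self._docs[key]

    def __contains__(self, key):
        return key in self._docs


store = DocumentStore()
key = store.create({"FirstName": "Bob", "Hobby": "sailing"})
store.update(key, {"FirstName": "Bob", "Hobby": "rowing"})
updated = store.read(key)
store.delete(key)
```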

Keys

Documents are addressed in the database via a unique key that represents that document. This key is
a simple identifier (or ID), typically a string, a URI, or a path. The key can be used to retrieve the
document from the database. Typically the database retains an index on the key to speed up
document retrieval, and in some cases the key is required to create or insert the document into the
database.

Retrieval

Another defining characteristic of a document-oriented database is that, beyond the simple key-to-
document lookup that can be used to retrieve a document, the database offers an API or query
language that allows the user to retrieve documents based on content (or metadata). For example,
you may want a query that retrieves all the documents with a certain field set to a certain value. The
set of query APIs or query language features available, as well as the expected performance of the
queries, varies significantly from one implementation to another. Likewise, the specific set of
indexing options and configuration that is available varies greatly by implementation.
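
A minimal Python sketch of content-based retrieval follows, assuming documents are dictionaries held in memory. The sample contacts, the `find` and `build_index` helpers, and the dictionary-of-sets index layout are all illustrative assumptions; real systems expose their own query languages and index types.

```python
from collections import defaultdict

# Made-up sample documents; note that "c4" has no Hobby field at all.
docs = {
    "c1": {"FirstName": "Bob", "Hobby": "sailing"},
    "c2": {"FirstName": "Alice", "Hobby": "chess"},
    "c3": {"FirstName": "Carol", "Hobby": "sailing", "City": "Boys"},
    "c4": {"FirstName": "Dan"},
}


def find(docs, field, value):
    """Full scan: keys of all documents whose `field` equals `value`."""
    return sorted(key for key, doc in docs.items() if doc.get(field) == value)


def build_index(docs, field):
    """Secondary index mapping each value of `field` to the keys holding it."""
    index = defaultdict(set)
    for key, doc in docs.items():
        if field in doc:
            index[doc[field]].add(key)
    return index


scan_result = find(docs, "Hobby", "sailing")

# With an index, the same query becomes a lookup instead of a scan.
hobby_index = build_index(docs, "Hobby")
index_result = sorted(hobby_index["sailing"])
```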

It is here that the document store varies most from the key-value store. In theory, the values in a
key-value store are opaque to the store; they are essentially black boxes. Key-value stores may offer
search systems similar to those of a document store, but they have less understanding of the
organization of the content. Document stores use the metadata in the document to classify the content,
allowing them, for instance, to understand that one series of digits is a phone number and another is
a postal code. This allows them to search on those types of data, for instance, for all phone numbers
containing 555, which would ignore the zip code 55555.
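
The phone number example can be sketched in Python as follows. The two sample contacts are made up, and serializing each value with `json.dumps` is only a stand-in for the way a key-value store sees a value as an undifferentiated blob.

```python
import json

# Made-up contacts: c1 has 555 in its phone number, c2 only in its zip code.
contacts = {
    "c1": {"phone": "(123) 555-0178", "zip": "32225"},
    "c2": {"phone": "(890) 444-0133", "zip": "55555"},
}


def opaque_search(store, needle):
    """Key-value view: each value is one blob, so a match anywhere counts."""
    return sorted(k for k, doc in store.items() if needle in json.dumps(doc))


def field_search(store, field, needle):
    """Document view: only the named field is searched."""
    return sorted(k for k, doc in store.items() if needle in doc.get(field, ""))


blob_hits = opaque_search(contacts, "555")
phone_hits = field_search(contacts, "phone", "555")
```

The blob search also returns the contact whose zip code is 55555, while the field-aware search returns only the contact whose phone number contains 555.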

Editing

Document databases typically provide some mechanism for updating or editing the content (or other
metadata) of a document, either by allowing for replacement of the entire document, or of individual
structural pieces of the document.
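
The two styles of editing can be contrasted with a short Python sketch. The helper names `replace_document` and `update_field`, and the convention of addressing a nested field by a list of keys, are assumptions made for this example.

```python
import copy


def replace_document(store, key, new_doc):
    """Whole-document edit: the stored document is swapped out entirely."""
    store[key] = copy.deepcopy(new_doc)


def update_field(store, key, path, value):
    """Partial edit: set one nested field, e.g. path=["address", "zip"]."""
    node = store[key]
    for part in path[:-1]:
        node = node.setdefault(part, {})  # create intermediate levels if absent
    node[path[-1]] = value


store = {"c1": {"firstname": "Bob", "address": {"city": "Boys", "state": "AR"}}}

# The partial update touches only the zip code; other fields are untouched.
update_field(store, "c1", ["address", "zip"], "32225")
after_partial = copy.deepcopy(store["c1"])

# A replacement discards anything not present in the new document.
replace_document(store, "c1", {"firstname": "Bob"})
after_replace = store["c1"]
```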

Organization

Document database implementations offer a variety of ways of organizing documents, including
notions of

Collections: groups of documents, where depending on implementation, a document may be
required to live inside one collection, or may be allowed to live in multiple collections
Tags and non-visible metadata: additional data outside the document content
Directory hierarchies: groups of documents organized in a tree-like structure, typically based on
path or URI

These organizational notions sometimes differ in how far they are logical rather than physical
(e.g. on disk or in memory) representations.
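
One way to picture these notions is the Python sketch below, in which each stored document carries a path and a set of tags next to its content. The `StoredDocument` class, the sample paths and the helper functions are hypothetical; here the first path segment plays the role of a collection.

```python
from dataclasses import dataclass, field


@dataclass
class StoredDocument:
    content: dict                            # the document itself
    path: str                                # position in a directory hierarchy
    tags: set = field(default_factory=set)   # metadata kept outside the content


store = {
    "k1": StoredDocument({"FirstName": "Bob"}, "/contacts/friends/bob", {"address-book"}),
    "k2": StoredDocument({"FirstName": "Ann"}, "/contacts/work/ann", {"address-book", "vip"}),
    "k3": StoredDocument({"Title": "Invoice 7"}, "/billing/invoices/7", {"finance"}),
}


def by_tag(store, tag):
    """Keys of all documents carrying `tag`."""
    return sorted(k for k, doc in store.items() if tag in doc.tags)


def under(store, prefix):
    """Keys of all documents located below the directory `prefix`."""
    prefix = prefix.rstrip("/") + "/"
    return sorted(k for k, doc in store.items() if doc.path.startswith(prefix))


tagged = by_tag(store, "address-book")
in_contacts = under(store, "/contacts")
```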

Relationship to other databases



Relationship to key-value stores

A document-oriented database is a specialized key-value store, which itself is another NoSQL
database category. In a simple key-value store, the document content is opaque. A
document-oriented database provides APIs or a query/update language that exposes the ability to query
or update based on the internal structure in the document. This difference may be minor for users that
do not need the richer query, retrieval, or editing APIs that are typically provided by document
databases. Modern key-value stores often include features for working with metadata, blurring the
line between them and document stores.

Relationship to search engines

Some search engine (also known as information retrieval) systems, such as Elasticsearch, provide
enough of the core operations on documents to fit the definition of a document-oriented database.

Relationship to relational databases

In a relational database, data are first categorized into a number of predefined types, and tables are
created to hold individual entries, or records, of each type. The tables define the data within each
record's fields, meaning that every record in the table has the same overall form. The administrator
also defines the relationships between the tables, and selects certain fields that they believe will be
most commonly used for searching and defines indexes on them. A key concept in the relational
design is that any data that may be repeated is normally placed in its own table, and if these instances
are related to each other, a column is selected to group them together, the foreign key. This design is
known as database normalization.[3]

For example, an address book application will generally need to store the contact name, an optional
image, one or more phone numbers, one or more mailing addresses, and one or more email
addresses. In a canonical relational database, tables would be created for each of these items with
predefined fields for each bit of data: the CONTACT table might include FIRST_NAME,
LAST_NAME and IMAGE columns, while the PHONE_NUMBER table might include
COUNTRY_CODE, AREA_CODE, PHONE_NUMBER and TYPE (home, work, etc.). The
PHONE_NUMBER table also contains a foreign key column, "CONTACT_ID", which holds the
unique ID number assigned to the contact when it was created. In order to recreate the original
contact, the database engine uses the foreign keys to look for the related items across the group of
tables and reconstruct the original data.
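
The relational version of the address book can be sketched with Python's built-in `sqlite3` module. The schema is a simplified assumption: it keeps only a few of the columns named above and uses lower-case table and column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contact (
        contact_id INTEGER PRIMARY KEY,
        first_name TEXT,
        last_name  TEXT
    );
    CREATE TABLE phone_number (
        contact_id   INTEGER REFERENCES contact(contact_id),  -- foreign key
        phone_number TEXT,
        type         TEXT
    );
    INSERT INTO contact VALUES (1, 'Bob', 'Smith');
    INSERT INTO phone_number VALUES (1, '(123) 555-0178', 'Cell');
    INSERT INTO phone_number VALUES (1, '(890) 555-0133', 'Work');
""")

# Recreating the contact means following the foreign key across both tables.
rows = conn.execute(
    """
    SELECT c.first_name, c.last_name, p.type, p.phone_number
    FROM contact AS c
    JOIN phone_number AS p ON p.contact_id = c.contact_id
    WHERE c.contact_id = ?
    ORDER BY p.type
    """,
    (1,),
).fetchall()

# Reassemble the rows into one object, as the application would see it.
contact = {
    "first_name": rows[0][0],
    "last_name": rows[0][1],
    "phones": {kind: number for _, _, kind, number in rows},
}
conn.close()
```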

In contrast, in a document-oriented database there may be no internal structure that maps directly
onto the concept of a table, and the fields and relationships generally don't exist as predefined
concepts. Instead, all of the data for an object is placed in a single document, and stored in the
database as a single entry. In the address book example, the document would contain the contact's
name, image, and any contact info, all in a single record. That entry is accessed through its key, which
allows the database to retrieve and return the document to the application. No additional work is
needed to retrieve the related data; all of this is returned in a single object.

A key difference between the document-oriented and relational models is that the data formats are
not predefined in the document case. In most cases, any sort of document can be stored in any
database, and those documents can change in type and form at any time. If one wishes to add a
COUNTRY_FLAG to a CONTACT, this field can be added to new documents as they are inserted; this
will have no effect on the database or the existing documents already stored. To aid retrieval of
information from the database, document-oriented systems generally allow the administrator to
provide hints to the database to look for certain types of information. These work in a similar fashion
to indexes in the relational case. Most also offer the ability to add additional metadata outside of the
content of the document itself, for instance, tagging entries as being part of an address book, which
allows the programmer to retrieve related types of information, like "all the address book entries".
This provides functionality similar to a table, but separates the concept (categories of data) from its
physical implementation (tables).
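
The COUNTRY_FLAG example can be sketched in Python as follows. The in-memory layout (content plus a set of tags per key), the `insert` helper and the sample values are assumptions made for illustration.

```python
store = {}  # key -> {"content": dict, "tags": set}


def insert(key, content, tags=()):
    """Store a document together with metadata kept outside its content."""
    store[key] = {"content": dict(content), "tags": set(tags)}


insert("c1", {"FIRST_NAME": "Bob"}, tags={"address-book"})
before = dict(store["c1"]["content"])  # snapshot of the existing document

# A later document carries a new field; nothing is declared or migrated first.
insert("c2", {"FIRST_NAME": "Ann", "COUNTRY_FLAG": "US"}, tags={"address-book"})
insert("n1", {"TITLE": "Shopping list"}, tags={"notes"})

# "All the address book entries" is answered from the tag, not from a table.
address_book = sorted(k for k, d in store.items() if "address-book" in d["tags"])
with_flag = sorted(k for k, d in store.items() if "COUNTRY_FLAG" in d["content"])
```

The first contact is unchanged by the arrival of a document with an extra field, and the tag stands in for the grouping that a table would otherwise provide.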

In the classic normalized relational model, objects in the database are represented as separate rows of
data with no inherent structure beyond that given to them as they are retrieved. This leads to
problems when trying to translate programming objects to and from their associated database rows, a
problem known as object-relational impedance mismatch.[4] Document stores more closely, or in
some cases directly, map programming objects into the store. These are often marketed using the
term NoSQL.
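
The near-direct mapping between programming objects and documents can be sketched with Python dataclasses. The `Contact` and `Phone` classes and the two helper functions are hypothetical, and JSON is assumed as the document encoding.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Phone:
    type: str
    number: str


@dataclass
class Contact:
    firstname: str
    lastname: str
    phones: list = field(default_factory=list)


def to_document(contact):
    """Serialize the whole object graph into one JSON document."""
    return json.dumps(asdict(contact))


def from_document(text):
    """Rebuild the object from a single document, with no joins involved."""
    data = json.loads(text)
    phones = [Phone(**p) for p in data.pop("phones")]
    return Contact(phones=phones, **data)


bob = Contact("Bob", "Smith", [Phone("Cell", "(123) 555-0178")])
document = to_document(bob)
restored = from_document(document)
```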

Implementations


| Name | Publisher | License | Languages supported | Notes | RESTful API |
| --- | --- | --- | --- | --- | --- |
| AllegroGraph | Franz, Inc. | Proprietary | Java, Python, Common Lisp, Ruby, Scala, .NET, Perl | The database platform supports document store and graph data models in a single database. Supports JSON, JSON-LD, RDF, full-text search, ACID, two-phase commit, Multi-Master Replication, Prolog and SPARQL. | Yes[5] |
| ArangoDB | ArangoDB | Apache License | C, .NET, Java, Python, Node.js, PHP, Scala, Go, Ruby, Elixir | The database system supports document store as well as key/value and graph data models with one database core and a unified query language AQL (ArangoDB Query Language). | Yes[6] |
| BaseX | BaseX Team | BSD License | Java, XQuery | Support for XML, JSON and binary formats; client-/server based architecture; concurrent structural and full-text searches and updates. | Yes |
| Caché | InterSystems Corporation | Proprietary | Java, C#, Node.js | Commonly used in Health, Business and Government applications. | Yes |
| Cloudant | Cloudant, Inc. | Proprietary | Erlang, Java, Scala, and C | Distributed database service based on BigCouch, the company's open source fork of the Apache-backed CouchDB project. Uses JSON model. | Yes |
| Clusterpoint Database | Clusterpoint Ltd. | Proprietary with free download | JavaScript, SQL, PHP, .NET, Java, Python, Node.js, C, C++ | Distributed document-oriented XML / JSON database platform with ACID-compliant transactions; high-availability data replication and sharding; built-in full-text search engine with relevance ranking; JS/SQL query language; GIS; Available as pay-per-use database as a service or as an on-premise free software download. | Yes |
| Couchbase Server | Couchbase, Inc. | Apache License | C, .NET, Java, Python, Node.js, PHP, SQL, Go, Spring Framework, LINQ | Distributed NoSQL Document Database, JSON model and SQL based Query Language. | Yes[7] |
| CouchDB | Apache Software Foundation | Apache License | Any language that can make HTTP requests | JSON over REST/HTTP with Multi-Version Concurrency Control and limited ACID properties. Uses map and reduce for views and queries.[8] | Yes[9] |
| CrateIO | CRATE Technology GmbH | Apache License | Java | Use familiar SQL syntax for real time distributed queries across a cluster. Based on Lucene / Elasticsearch ecosystem with built-in support for binary objects (BLOBs). | Yes[10] |
| Cosmos DB | Microsoft | Proprietary | .NET, Java, Python, Node.js, JavaScript, SQL | Platform-as-a-Service offering, part of the Microsoft Azure platform. Builds upon and extends the earlier Azure DocumentDB. | Yes |
| DocumentDB | Amazon Web Services | Proprietary online service | various, REST | fully managed MongoDB v3.6-compatible database service | Yes |
| Elasticsearch | Shay Banon | Apache License | Java | JSON, Search engine. | Yes |
| eXist | eXist | LGPL | XQuery, Java | XML over REST/HTTP, WebDAV, Lucene Fulltext search, binary data support, validation, versioning, clustering, triggers, URL rewriting, collections, ACLS, XQuery Update | Yes[11] |
| Informix | IBM | Proprietary, with no-cost editions[12] | Various (Compatible with MongoDB API) | RDBMS with JSON, replication, sharding and ACID compliance. | Yes |
| Jackrabbit | Apache Foundation | Apache License | Java | Java Content Repository implementation | ? |
| Lotus Notes (IBM Lotus Domino) | IBM | Proprietary | LotusScript, Java, Lotus @Formula | MultiValue | Yes |
| MarkLogic | MarkLogic Corporation | Free Developer license or Commercial[13] | Java, JavaScript, Node.js, XQuery, SPARQL, XSLT, C++ | Distributed document-oriented database for JSON, XML, and RDF triples. Built-in full-text search, ACID transactions, high availability and disaster recovery, certified security. | Yes |
| MongoDB | MongoDB, Inc | Server Side Public License for the DBMS, Apache 2 License for the client drivers[14] | C, C++, C#, Java, Perl, PHP, Python, Go, Node.js, Ruby, Rust,[15] Scala[16] | Document database with replication and sharding, BSON store (binary format JSON). | Yes[17][18] |
| MUMPS Database | ? | Proprietary and Affero GPL[19] | MUMPS | Commonly used in health applications. | ? |
| ObjectDatabase++ | Ekky Software | Proprietary | C++, C#, TScript | Binary Native C++ class structures | ? |
| OpenLink Virtuoso | OpenLink Software | GPLv2[1] and proprietary | C++, C#, Java, SPARQL | Middleware and database engine hybrid | Yes |
| OrientDB | Orient Technologies | Apache License | Java | JSON over HTTP, SQL support, ACID transactions | Yes |
| Oracle NoSQL Database | Oracle Corp | Apache and proprietary | C, C#, Java, Python, node.js, Go | Shared nothing, horizontally scalable database with support for schema-less JSON, fixed schema tables, and key/value pairs. Also supports ACID transactions. | Yes |
| PostgreSQL | PostgreSQL | PostgreSQL Free License[20] | C | HStore, JSON store (9.2+), JSON function (9.3+), HStore2 (9.4+), JSONB (9.4+) | No |
| Qizx | Qualcomm | Proprietary | REST, Java, XQuery, XSLT, C, C++, Python | Distributed document-oriented XML database with integrated full-text search; support for JSON, text, and binaries. | Yes |
| ReJSON[21] | Redis Labs | Redis Source Available License | Node.js, Java, Python, Go and all Redis clients.[22] | Native in-memory data type packaged as Redis Module. | ? |
| RethinkDB | ? | Apache License[23] | C++, Python, JavaScript, Ruby, Java | Distributed document-oriented JSON database with replication and sharding. | No |
| SAP HANA | SAP | Proprietary | SQL-like language | ACID transaction supported, JSON only | Yes |
| Sedna | sedna.org | Apache License | C++, XQuery | XML database | No |
| SimpleDB | Amazon Web Services | Proprietary online service | Erlang | | ? |
| Solr | Apache | Apache License | Java | Search engine | Yes |
| TokuMX | Tokutek | GNU Affero General Public License | C++, C#, Go | MongoDB with Fractal Tree indexing | ? |

XML database implementations

Most XML databases are document-oriented databases.

See also
Database theory
Data hierarchy
Data analysis
Full-text search
In-memory database
Internet Message Access Protocol (IMAP)
Machine-Readable Documents
Multi-model database
NoSQL
Object database
Online database
Real-time database
Relational database

Notes

a. To the point that document-oriented and key-value systems can often be interchanged in
operation.
b. And key-value stores in general.

References
1. Drake, Mark (9 August 2019). "A Comparison of NoSQL Database Management Systems and Models" (https://web.archive.org/web/20190813163612/https://www.digitalocean.com/community/tutorials/a-comparison-of-nosql-database-management-systems-and-models). DigitalOcean. Archived from the original (https://www.digitalocean.com/community/tutorials/a-comparison-of-nosql-database-management-systems-and-models) on 13 August 2019. Retrieved 23 August 2019. "Document-oriented databases, or document stores, are NoSQL databases that store data in the form of documents. Document stores are a type of key-value store: each document has a unique identifier — its key — and the document itself serves as the value."
2. "DB-Engines Ranking per database model category" (http://db-engines.com/en/ranking_categories).
3. "Description of the database normalization basics" (https://support.microsoft.com/en-ca/kb/283878). Microsoft.
4. Ambler, Scott. "The Object-Relational Impedance Mismatch" (http://www.agiledata.org/essays/impedanceMismatch.html). Agile Data.
5. "HTTP Protocol for AllegroGraph" (https://franz.com/agraph/support/documentation/current/http-protocol.html).
6. "Multi-model highly available NoSQL database" (https://www.arangodb.com/). ArangoDB.
7. Documentation (http://www.couchbase.com/docs/) Archived (https://web.archive.org/web/20120820182153/http://www.couchbase.com/docs/) 2012-08-20 at the Wayback Machine. Couchbase. Retrieved on 2013-09-18.
8. "Apache CouchDB" (https://web.archive.org/web/20111020074113/http://couchdb.apache.org/docs/overview.html). Apache Couchdb. Archived from the original (http://couchdb.apache.org/) on October 20, 2011.
9. "HTTP_Document_API - Couchdb Wiki" (https://web.archive.org/web/20130301093229/http://wiki.apache.org/couchdb/HTTP_Document_API). Archived from the original (http://wiki.apache.org/couchdb/HTTP_Document_API) on 2013-03-01. Retrieved 2011-10-14.
10. "Crate SQL HTTP Endpoint (Archived copy)" (https://web.archive.org/web/20150622174526/https://crate.io/docs/stable/sql/rest.html). Archived from the original (https://crate.io/docs/stable/sql/rest.html) on 2015-06-22. Retrieved 2015-06-22.
11. eXist-db Open Source Native XML Database (http://exist-db.org). Exist-db.org. Retrieved on 2013-09-18.
12. "Compare the Informix Version 12 editions" (http://www.ibm.com/developerworks/data/library/techarticle/dm-0801doe/). 22 July 2016.
13. "MarkLogic Licensing" (https://web.archive.org/web/20120112032849/http://developer.marklogic.com/licensing). Archived from the original (http://developer.marklogic.com/licensing) on 2012-01-12. Retrieved 2011-12-28.
14. "MongoDB Licensing" (http://www.mongodb.org/about/licensing/).
15. "The New MongoDB Rust Driver" (https://www.mongodb.com/blog/post/the-new-mongodb-rust-driver). MongoDB. Retrieved 2018-02-01.
16. "Community Supported Drivers Reference" (http://docs.mongodb.org/ecosystem/drivers/community-supported-drivers/).
17. "HTTP Interface — MongoDB Ecosystem" (https://docs.mongodb.com/ecosystem/tools/http-interfaces/). MongoDB Docs.
18. "GitHub - mongodb/docs-ecosystem: MongoDB Ecosystem Documentation" (https://github.com/mongodb/docs-ecosystem). June 27, 2019 – via GitHub.
19. "GT.M High end TP database engine" (http://sourceforge.net/projects/fis-gtm/).
20. "PostgreSQL: License" (https://www.postgresql.org/about/licence/). PostgreSQL.
21. Huang, Pengcheng; Wang, Zuofei (2018-02-28). Redis 4.x cookbook : over 80 hand-picked recipes for effective Redis development and administration (https://books.google.com/books?id=VOtODwAAQBAJ&pg=PA316). pp. 316–318. ISBN 9781783988174.
22. "RedisJSON - a JSON data type for Redis" (https://oss.redislabs.com/redisjson/#client-libraries). oss.redislabs.com. Retrieved 18 July 2019.
23. "Transferring copyright to The Linux Foundation, relicensing RethinkDB under ASLv2" (https://github.com/rethinkdb/rethinkdb/commit/b0ec8bc5a874d5241d8af1166d664083edc5f750#diff-97d9303acdfc078a050e61dc5c1a9a76). github.com. Retrieved 27 January 2020.

Further reading
Assaf Arkin. (2007, September 20). Read Consistency: Dumb Databases, Smart Services. (https://web.archive.org/web/20080327222152/http://blog.labnotes.org/2007/09/20/read-consistency-dumb-databases-smart-services/)

External links
DB-Engines Ranking of Document Stores (http://db-engines.com/en/ranking/document+store) by
popularity, updated monthly

Retrieved from "https://en.wikipedia.org/w/index.php?title=Document-oriented_database&oldid=977717966"

This page was last edited on 10 September 2020, at 15:04 (UTC).

Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. By using this
site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia
Foundation, Inc., a non-profit organization.
