Entity Relationships in a Document Database: MapReduce Views for SQL Users


When to Choose a Document Database
You’re using a relational database, but have been relying heavily on denormalization to optimize read performance

You would like to give up consistency in exchange for a high level of concurrency

Your data model is a “fit” for documents (e.g. a CMS)

When Not to Choose a Document Database
Your data fits better in a relational model; SQL is a powerful and mature language for working with relational data sets

Consistency is critical to your application

You haven’t bothered exploring scalability options for your current database
Incremental Map/Reduce

"How fucked is my NoSQL database?" howfuckedismydatabase.com. 2009. http://howfuckedismydatabase.com/nosql/ (24 October 2012).
Entity Relationship Model
Join vs. Collation
SQL Query Joining Publishers and Books
SELECT
  `publisher`.`id`,
  `publisher`.`name`,
  `book`.`title`
FROM `publisher`
FULL OUTER JOIN `book`
  ON `publisher`.`id` = `book`.`publisher_id`
ORDER BY
  `publisher`.`id`,
  `book`.`title`;
Joined Result Set
Publisher (“left”)            Book (“right”)
publisher.id  publisher.name  book.title
oreilly       O'Reilly Media  Building iPhone Apps with HTML, CSS, and JavaScript
oreilly       O'Reilly Media  CouchDB: The Definitive Guide
oreilly       O'Reilly Media  DocBook: The Definitive Guide
oreilly       O'Reilly Media  RESTful Web Services
Collated Result Set
key            id         value
["oreilly",0]  "oreilly"  "O'Reilly Media" (Publisher)
["oreilly",1]  "oreilly"  "Building iPhone Apps with HTML, CSS, and JavaScript" (Book)
["oreilly",1]  "oreilly"  "CouchDB: The Definitive Guide" (Book)
["oreilly",1]  "oreilly"  "DocBook: The Definitive Guide" (Book)
["oreilly",1]  "oreilly"  "RESTful Web Services" (Book)
View Result Sets
Made up of columns and rows

Every row has the same three columns:


• key
• id
• value
Columns can contain a mixture of logical data types
One to Many Relationships
Embedded Entities:
Nest related entities within a document
Embedded Entities
A single document represents the “one” entity

Nested entities (a JSON array) represent the “many” entities

Simplest way to create a one to many relationship


Example: Publisher with Nested Books
{
  "_id":"oreilly",
  "collection":"publisher",
  "name":"O'Reilly Media",
  "books":[
    { "title":"CouchDB: The Definitive Guide" },
    { "title":"RESTful Web Services" },
    { "title":"DocBook: The Definitive Guide" },
    { "title":"Building iPhone Apps with HTML, CSS, and JavaScript" }
  ]
}
Map Function
function(doc) {
  if ("publisher" == doc.collection) {
    emit([doc._id, 0], doc.name);
    for (var i in doc.books) {
      emit([doc._id, 1], doc.books[i].title);
    }
  }
}
Result Set
key            id         value
["oreilly",0]  "oreilly"  "O'Reilly Media"
["oreilly",1]  "oreilly"  "Building iPhone Apps with HTML, CSS, and JavaScript"
["oreilly",1]  "oreilly"  "CouchDB: The Definitive Guide"
["oreilly",1]  "oreilly"  "DocBook: The Definitive Guide"
["oreilly",1]  "oreilly"  "RESTful Web Services"
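The way a map function turns one document into several view rows can be tried outside CouchDB. The following is a minimal sketch, not CouchDB's actual view engine: `runView` is a hypothetical helper, `emit` is passed in as a parameter instead of being a global, and the sort is a simplified stand-in for CouchDB's Unicode-aware collation.

```javascript
// The publisher document from the slide above.
const publisher = {
  _id: "oreilly",
  collection: "publisher",
  name: "O'Reilly Media",
  books: [
    { title: "CouchDB: The Definitive Guide" },
    { title: "RESTful Web Services" },
    { title: "DocBook: The Definitive Guide" },
    { title: "Building iPhone Apps with HTML, CSS, and JavaScript" },
  ],
};

// Same logic as the slide's map function; emit is passed in explicitly
// because this harness has no CouchDB-provided global.
function mapPublisher(doc, emit) {
  if (doc.collection === "publisher") {
    emit([doc._id, 0], doc.name);
    for (const book of doc.books) {
      emit([doc._id, 1], book.title);
    }
  }
}

// Hypothetical helper: run a map function over documents and collect rows.
function runView(mapFn, docs) {
  const rows = [];
  for (const doc of docs) {
    mapFn(doc, (key, value) => rows.push({ key, id: doc._id, value }));
  }
  // Simplified collation: compare [string, number] keys element by element.
  rows.sort((a, b) => {
    if (a.key[0] !== b.key[0]) return a.key[0] < b.key[0] ? -1 : 1;
    return a.key[1] - b.key[1];
  });
  return rows;
}

const embeddedRows = runView(mapPublisher, [publisher]);
for (const row of embeddedRows) {
  console.log(JSON.stringify(row.key), JSON.stringify(row.id), JSON.stringify(row.value));
}
```

The publisher row sorts first because its key ends in 0, while every book row shares the key ["oreilly",1].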
Limitations
Only works if there aren’t a large number of related entities:
• Too many nested entities can result in very large documents
• Slow to transfer between client and server
• Unwieldy to modify
• Time-consuming to index
Related Documents:
Reference an entity by its identifier
Related Documents
A document representing the “one” entity

Separate documents for each “many” entity

Each “many” entity references its related “one” entity by the “one” entity’s document identifier

Makes for smaller documents

Reduces the probability of document update conflicts


Example: Publisher
{
"_id":"oreilly",
"collection":"publisher",
"name":"O'Reilly Media"
}
Example: Related Book
{
"_id":"9780596155896",
"collection":"book",
"title":"CouchDB: The Definitive Guide",
"publisher":"oreilly"
}
Map Function
function(doc) {
  if ("publisher" == doc.collection) {
    emit([doc._id, 0], doc.name);
  }
  if ("book" == doc.collection) {
    emit([doc.publisher, 1], doc.title);
  }
}
Result Set
key            id               value
["oreilly",0]  "oreilly"        "O'Reilly Media"
["oreilly",1]  "9780596155896"  "CouchDB: The Definitive Guide"
["oreilly",1]  "9780596529260"  "RESTful Web Services"
["oreilly",1]  "9780596805791"  "Building iPhone Apps with HTML, CSS, and JavaScript"
["oreilly",1]  "9781565925809"  "DocBook: The Definitive Guide"
Limitations
When retrieving the entity on the “right” side of the relationship,
one cannot include any data from the entity on the “left” side of
the relationship without the use of an additional query

Only works for one to many relationships


Many to Many Relationships
List of Keys:
Reference entities by their identifiers
List of Keys
A document representing each “many” entity on the “left” side of the relationship

Separate documents for each “many” entity on the “right” side of the relationship

Each “many” entity on the “right” side of the relationship maintains a list of document identifiers for its related “many” entities on the “left” side of the relationship
Books and Related Authors
Example: Book
{
"_id":"9780596805029",
"collection":"book",
"title":"DocBook 5: The Definitive Guide"
}
Example: Book
{
"_id":"9781565920514",
"collection":"book",
"title":"Making TeX Work"
}
Example: Book
{
"_id":"9781565925809",
"collection":"book",
"title":"DocBook: The Definitive Guide"
}
Example: Author
{
"_id":"muellner",
"collection":"author",
"name":"Leonard Muellner",
"books":[
"9781565925809"
]
}
Example: Author
{
"_id":"walsh",
"collection":"author",
"name":"Norman Walsh",
"books":[
"9780596805029",
"9781565925809",
"9781565920514"
]
}
Map Function
function(doc) {
  if ("book" == doc.collection) {
    emit([doc._id, 0], doc.title);
  }
  if ("author" == doc.collection) {
    for (var i in doc.books) {
      emit([doc.books[i], 1], doc.name);
    }
  }
}
Result Set
key id value
["9780596805029",0] "9780596805029" "DocBook 5: The Definitive Guide"

["9780596805029",1] "walsh" "Norman Walsh"

["9781565920514",0] "9781565920514" "Making TeX Work"

["9781565920514",1] "walsh" "Norman Walsh"

["9781565925809",0] "9781565925809" "DocBook: The Definitive Guide"

["9781565925809",1] "muellner" "Leonard Muellner"

["9781565925809",1] "walsh" "Norman Walsh"


Authors and Related Books
Map Function
function(doc) {
  if ("author" == doc.collection) {
    emit([doc._id, 0], doc.name);
    for (var i in doc.books) {
      emit([doc._id, 1], {"_id":doc.books[i]});
    }
  }
}
Result Set
key id value
["muellner",0] "muellner" "Leonard Muellner"

["muellner",1] "muellner" {"_id":"9781565925809"}

["walsh",0] "walsh" "Norman Walsh"

["walsh",1] "walsh" {"_id":"9780596805029"}

["walsh",1] "walsh" {"_id":"9781565920514"}

["walsh",1] "walsh" {"_id":"9781565925809"}


Including Docs
include_docs=true
key id value doc (truncated)
["muellner",0] "muellner" … {"name":"Leonard Muellner"}
["muellner",1] "muellner" … {"title":"DocBook: The Definitive Guide"}
["walsh",0] "walsh" … {"name":"Norman Walsh"}
["walsh",1] "walsh" … {"title":"DocBook 5: The Definitive Guide"}
["walsh",1] "walsh" … {"title":"Making TeX Work"}
["walsh",1] "walsh" … {"title":"DocBook: The Definitive Guide"}
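The rows above rely on a CouchDB behavior: with include_docs=true, a row whose value is an object containing an "_id" field gets the document with that identifier attached, instead of the document that emitted the row. The sketch below imitates that lookup in memory; `withIncludedDocs` and the `docsById` map are illustrative names, not part of any CouchDB API.

```javascript
// Documents from the List of Keys example, indexed by _id.
const docsById = new Map([
  ["muellner", { _id: "muellner", collection: "author", name: "Leonard Muellner" }],
  ["9781565925809", { _id: "9781565925809", collection: "book", title: "DocBook: The Definitive Guide" }],
]);

// Two rows as emitted by the "Authors and Related Books" map function.
const viewRows = [
  { key: ["muellner", 0], id: "muellner", value: "Leonard Muellner" },
  { key: ["muellner", 1], id: "muellner", value: { _id: "9781565925809" } },
];

// Attach a doc to each row: the linked document when the value carries an
// _id, otherwise the document that emitted the row.
function withIncludedDocs(rows, docs) {
  return rows.map((row) => {
    const isLink = row.value !== null && typeof row.value === "object" && "_id" in row.value;
    const docId = isLink ? row.value._id : row.id;
    return { ...row, doc: docs.get(docId) ?? null };
  });
}

const resolvedRows = withIncludedDocs(viewRows, docsById);
for (const row of resolvedRows) {
  console.log(JSON.stringify(row.key), JSON.stringify(row.doc));
}
```

The second row has the author's id but carries the book document, which is what makes the author-and-books listing possible in a single query.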
Or, we can reverse the references…
Example: Author
{
"_id":"muellner",
"collection":"author",
"name":"Leonard Muellner"
}
Example: Author
{
"_id":"walsh",
"collection":"author",
"name":"Norman Walsh"
}
Example: Book
{
"_id":"9780596805029",
"collection":"book",
"title":"DocBook 5: The Definitive Guide",
"authors":[
"walsh"
]
}
Example: Book
{
"_id":"9781565920514",
"collection":"book",
"title":"Making TeX Work",
"authors":[
"walsh"
]
}
Example: Book
{
"_id":"9781565925809",
"collection":"book",
"title":"DocBook: The Definitive Guide",
"authors":[
"muellner",
"walsh"
]
}
Map Function
function(doc) {
  if ("author" == doc.collection) {
    emit([doc._id, 0], doc.name);
  }
  if ("book" == doc.collection) {
    for (var i in doc.authors) {
      emit([doc.authors[i], 1], doc.title);
    }
  }
}
Result Set
key id value
["muellner",0] "muellner" "Leonard Muellner"
["muellner",1] "9781565925809" "DocBook: The Definitive Guide"
["walsh",0] "walsh" "Norman Walsh"
["walsh",1] "9780596805029" "DocBook 5: The Definitive Guide"
["walsh",1] "9781565920514" "Making TeX Work"
["walsh",1] "9781565925809" "DocBook: The Definitive Guide"
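A result set like this one can also be summarized with a reduce function and a grouping level. The sketch below approximates, in plain JavaScript, what CouchDB's built-in _count reduce would return for these rows when queried with group_level=2; `countByGroupLevel` is a hypothetical helper, and grouping via JSON.stringify is a simplification of how CouchDB compares keys.

```javascript
// The view rows from the result set above (ids omitted for brevity).
const authorBookRows = [
  { key: ["muellner", 0], value: "Leonard Muellner" },
  { key: ["muellner", 1], value: "DocBook: The Definitive Guide" },
  { key: ["walsh", 0], value: "Norman Walsh" },
  { key: ["walsh", 1], value: "DocBook 5: The Definitive Guide" },
  { key: ["walsh", 1], value: "Making TeX Work" },
  { key: ["walsh", 1], value: "DocBook: The Definitive Guide" },
];

// Count rows per key prefix, keeping the first `level` elements of each key.
function countByGroupLevel(rows, level) {
  const counts = new Map();
  for (const row of rows) {
    const groupKey = JSON.stringify(row.key.slice(0, level));
    counts.set(groupKey, (counts.get(groupKey) || 0) + 1);
  }
  return [...counts].map(([key, value]) => ({ key: JSON.parse(key), value }));
}

const grouped = countByGroupLevel(authorBookRows, 2);
for (const row of grouped) {
  console.log(JSON.stringify(row.key), row.value);
}
```

Grouping on both key elements separates the single author row (second element 0) from the book rows (second element 1), so the count under ["walsh",1] is the number of books by that author.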
Limitations
Queries from the “right” side of the relationship cannot include
any data from entities on the “left” side of the relationship
(without the use of include_docs)

A document representing an entity with lots of relationships could become quite large
Relationship Documents:
Create a document to represent each
individual relationship
Relationship Documents
A document representing each “many” entity on the “left” side of the relationship

Separate documents for each “many” entity on the “right” side of the relationship

Neither the “left” nor the “right” side of the relationship contains any direct reference to the other

For each distinct relationship, a separate document includes the document identifiers for both the “left” and “right” sides of the relationship
Example: Book
{
"_id":"9780596805029",
"collection":"book",
"title":"DocBook 5: The Definitive Guide"
}
Example: Book
{
"_id":"9781565920514",
"collection":"book",
"title":"Making TeX Work"
}
Example: Book
{
"_id":"9781565925809",
"collection":"book",
"title":"DocBook: The Definitive Guide"
}
Example: Author
{
"_id":"muellner",
"collection":"author",
"name":"Leonard Muellner"
}
Example: Author
{
"_id":"walsh",
"collection":"author",
"name":"Norman Walsh"
}
Example:
Relationship Document
{
"_id":"44005f2c",
"collection":"book-author",
"book":"9780596805029",
"author":"walsh"
}
Example:
Relationship Document
{
"_id":"44005f72",
"collection":"book-author",
"book":"9781565920514",
"author":"walsh"
}
Example:
Relationship Document
{
"_id":"44006720",
"collection":"book-author",
"book":"9781565925809",
"author":"muellner"
}
Example:
Relationship Document
{
"_id":"44006b0d",
"collection":"book-author",
"book":"9781565925809",
"author":"walsh"
}
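Relationship documents like the four above are usually created in batches. As a rough sketch, the request body for CouchDB's bulk document endpoint (POST to /{db}/_bulk_docs, with a JSON object holding a "docs" array) could be built as follows. The `relationshipDocs` helper is a made-up name, and the documents deliberately omit "_id" so that the server assigns identifiers.

```javascript
// Book/author pairs taken from the relationship documents above.
const bookAuthorPairs = [
  ["9780596805029", "walsh"],
  ["9781565920514", "walsh"],
  ["9781565925809", "muellner"],
  ["9781565925809", "walsh"],
];

// Build one relationship document per pair. No _id is set, so CouchDB
// would generate one for each document on insert.
function relationshipDocs(pairs) {
  return pairs.map(([book, author]) => ({
    collection: "book-author",
    book,
    author,
  }));
}

// JSON body for a POST to /{db}/_bulk_docs (Content-Type: application/json).
const bulkBody = JSON.stringify({ docs: relationshipDocs(bookAuthorPairs) });
console.log(bulkBody);
```

Sending all relationship documents in one request avoids one HTTP round trip per relationship.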
Books and Related Authors
Map Function
function(doc) {
  if ("book" == doc.collection) {
    emit([doc._id, 0], doc.title);
  }
  if ("book-author" == doc.collection) {
    emit([doc.book, 1], {"_id":doc.author});
  }
}
Result Set
key id value
["9780596805029",0] "9780596805029" "DocBook 5: The Definitive Guide"
["9780596805029",1] "44005f2c" {"_id":"walsh"}
["9781565920514",0] "9781565920514" "Making TeX Work"
["9781565920514",1] "44005f72" {"_id":"walsh"}
["9781565925809",0] "9781565925809" "DocBook: The Definitive Guide"
["9781565925809",1] "44006720" {"_id":"muellner"}
["9781565925809",1] "44006b0d" {"_id":"walsh"}
Including Docs
include_docs=true
key id value doc (truncated)
["9780596805029",0] … … {"title":"DocBook 5: The Definitive Guide"}
["9780596805029",1] … … {"name":"Norman Walsh"}
["9781565920514",0] … … {"title":"Making TeX Work"}
["9781565920514",1] … … {"name":"Norman Walsh"}
["9781565925809",0] … … {"title":"DocBook: The Definitive Guide"}
["9781565925809",1] … … {"name":"Leonard Muellner"}
["9781565925809",1] … … {"name":"Norman Walsh"}
Authors and Related Books
Map Function
function(doc) {
  if ("author" == doc.collection) {
    emit([doc._id, 0], doc.name);
  }
  if ("book-author" == doc.collection) {
    emit([doc.author, 1], {"_id":doc.book});
  }
}
Result Set
key id value
["muellner",0] "muellner" "Leonard Muellner"

["muellner",1] "44006720" {"_id":"9781565925809"}

["walsh",0] "walsh" "Norman Walsh"

["walsh",1] "44005f2c" {"_id":"9780596805029"}

["walsh",1] "44005f72" {"_id":"9781565920514"}

["walsh",1] "44006b0d" {"_id":"9781565925809"}


Including Docs
include_docs=true
key id value doc (truncated)
["muellner",0] … … {"name":"Leonard Muellner"}
["muellner",1] … … {"title":"DocBook: The Definitive Guide"}
["walsh",0] … … {"name":"Norman Walsh"}
["walsh",1] … … {"title":"DocBook 5: The Definitive Guide"}
["walsh",1] … … {"title":"Making TeX Work"}
["walsh",1] … … {"title":"DocBook: The Definitive Guide"}
Limitations
Queries can only contain data from the “left” or “right” side of the
relationship (without the use of include_docs)

Maintaining relationship documents may require more work
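One possible way around the first limitation, without include_docs, is to copy a little data into the relationship document itself. The sketch below assumes a hypothetical `author_name` field that is not in the original examples; the price is that the copy must be updated whenever the author's name changes.

```javascript
// Hypothetical denormalized relationship document: author_name is a copy
// of the author's name and is NOT part of the original slides.
const denormalizedRelationship = {
  _id: "44005f2c",
  collection: "book-author",
  book: "9780596805029",
  author: "walsh",
  author_name: "Norman Walsh",
};

// Variant of the "Books and Related Authors" map function that emits the
// copied name directly; emit is passed in for use outside CouchDB.
function mapBooksWithAuthorNames(doc, emit) {
  if (doc.collection === "book") {
    emit([doc._id, 0], doc.title);
  }
  if (doc.collection === "book-author") {
    emit([doc.book, 1], doc.author_name);
  }
}

const emittedRows = [];
mapBooksWithAuthorNames(denormalizedRelationship, (key, value) => emittedRows.push({ key, value }));
console.log(JSON.stringify(emittedRows));
```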


Doctrine’s Object-Document
Mapper (ODM)
Doctrine CouchDB[1]

1. http://docs.doctrine-project.org/projects/doctrine-couchdb/
Features
Includes a CouchDB client library and ODM

Maps documents using Doctrine’s persistence semantics

Maps CouchDB views to PHP objects

Document conflict resolution support

Includes a write-behind feature for increased performance


Defining an Entity[1]

/** @Document */
class BlogPost
{
/** @Id */
private $id;
/** @Field(type="string") */
private $headline;
/** @Field(type="string") */
private $text;
/** @Field(type="datetime") */
private $publishDate;
// getter/setter here
}
1. http://docs.doctrine-project.org/projects/doctrine-couchdb/en/latest/reference/introduction.html#architecture
Persisting an Entity[1]

// $dm is an instance of Doctrine\ODM\CouchDB\DocumentManager
$blogPost = new BlogPost();
$blogPost->setHeadline("Hello World!");
$blogPost->setText("This is a blog post going to be saved into CouchDB");
$blogPost->setPublishDate(new \DateTime("now"));
$dm->persist($blogPost);
$dm->flush();

1. http://docs.doctrine-project.org/projects/doctrine-couchdb/en/latest/reference/introduction.html#architecture
Querying an Entity[1]

// $dm is an instance of Doctrine\ODM\CouchDB\DocumentManager
$blogPost = $dm->find("MyApp\Document\BlogPost", $theUUID);

1. http://docs.doctrine-project.org/projects/doctrine-couchdb/en/latest/reference/introduction.html#querying
Doctrine MongoDB ODM [1]

1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/
Features
Maps documents using Doctrine’s persistence semantics

Map embedded documents

Map referenced documents

Uses batch inserts

Performs atomic updates


Defining Entities[1]

/** @MappedSuperclass */
abstract class BaseEmployee
{
/** @Id */
private $id;

/** @EmbedOne(targetDocument="Address") */
private $address;

// ...
}
1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview
Defining Entities[1]

/** @Document */
class Employee extends BaseEmployee
{
/** @ReferenceOne(targetDocument="Documents\Manager") */
private $manager;

// ...
}

1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview
Defining Entities[1]

/** @Document */
class Manager extends BaseEmployee
{
/** @ReferenceMany(targetDocument="Documents\Project") */
private $projects = array();

// ...
}

1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview
Defining Entities[1]

/** @EmbeddedDocument */
class Address
{
/** @String */
private $address;
/** @String */
private $city;
/** @String */
private $state;
/** @String */
private $zipcode;
// ...
}
1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview
Defining Entities[1]

/** @Document */
class Project
{
/** @Id */
private $id;
/** @String */
private $name;
public function __construct($name)
{
$this->name = $name;
}
// ...
}
1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview
Persisting Entities[1]

$employee = new Employee();


$employee->setName('Employee');
$employee->setSalary(50000.00);
$employee->setStarted(new \DateTime());

1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview
Persisting Entities[1]

$address = new Address();


$address->setAddress('555 Doctrine Rd.');
$address->setCity('Nashville');
$address->setState('TN');
$address->setZipcode('37209');
$employee->setAddress($address);

1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview
Persisting Entities[1]

$project = new Project('New Project');


$manager = new Manager();
$manager->setName('Manager');
$manager->setSalary(100000.00);
$manager->setStarted(new \DateTime());
$manager->addProject($project);

1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview
Persisting Entities[1]

// $dm is an instance of Doctrine\ODM\MongoDB\DocumentManager
$dm->persist($employee);
$dm->persist($address);
$dm->persist($project);
$dm->persist($manager);
$dm->flush();

1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview
Querying an Entity
// $dm is an instance of Doctrine\ODM\MongoDB\DocumentManager
$manager = $dm->find("Documents\Manager", $theID);
Final Thoughts
Document Databases Compared to Relational Databases
Document databases have no tables (and therefore no columns)

Indexes (views) are queried directly, instead of being used to optimize more generalized queries

Result set columns can contain a mix of logical data types

No built-in concept of relationships between documents

Related entities can be embedded in a document, referenced from a document, or both
Caveats
No referential integrity

No atomic transactions across document boundaries

Some patterns may involve denormalized (i.e. redundant) data

Data inconsistencies are inevitable (i.e. eventual consistency)

Consider the implications of replication: what may seem consistent with one database may not be consistent across nodes (e.g. referencing entities that don’t yet exist on the node)
Additional Techniques
Use the startkey and endkey parameters to retrieve one entity and
its related entities:
startkey=["9781565925809"]&endkey=["9781565925809",{}]

Define a reduce function and use grouping levels

Use UUIDs rather than natural keys for better performance

Use the bulk document API when writing Relationship Documents

When using the List of Keys or Relationship Documents patterns, denormalize data so that you can have data from the “right” and “left” side of the relationship within your query results
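The startkey and endkey values in the first technique must be JSON-encoded and then URL-encoded when they go into a query string. A small sketch of building such a query string is shown below; `relatedRangeQuery` is an illustrative helper name, and the view path it would be appended to depends on your own design document.

```javascript
// Build the query string that selects one entity and its related rows.
// The empty object {} works as an upper bound because objects sort after
// numbers, strings, and arrays in CouchDB's view collation.
function relatedRangeQuery(entityId) {
  const params = new URLSearchParams({
    startkey: JSON.stringify([entityId]),
    endkey: JSON.stringify([entityId, {}]),
  });
  return params.toString();
}

const rangeQuery = relatedRangeQuery("9781565925809");
console.log(rangeQuery);
```

The range starts at the one-element key [id], which sorts before [id,0], and ends at [id,{}], which sorts after [id,1], so both the entity row and all of its related rows fall inside it.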
Cheat Sheet
                 Embedded  Related    List of  Relationship
                 Entities  Documents  Keys     Documents
One to Many      ✓         ✓
Many to Many                          ✓        ✓
<= N* Relations  ✓                    ✓
> N* Relations             ✓                   ✓

* where N is a large number for your system


http://oreilly.com/catalog/9781449303129/ http://oreilly.com/catalog/9781449303433/
Thank You
@BradleyHolt
http://bradley-holt.com
https://joind.in/7040

Copyright © 2011-2012 Bradley Holt. All rights reserved.
