Unstructured Data: November 20 1

UNSTRUCTURED DATA
November 20 1
BY THE END OF THIS LESSON, YOU SHOULD KNOW:
How to model a document NoSQL database.
NOSQL DATA MODELLING VS. RELATIONAL MODELLING
NoSQL data modeling often starts from the application-specific queries as opposed
to relational modelling:
 Relational modeling is typically driven by the structure of available data. The main design theme
is “What answers do I have?”
 NoSQL data modeling is typically driven by application-specific access patterns, i.e. the types of
queries to be supported. The main design theme is “What questions do I have?”
MODELLING TECHNIQUES
Referencing documents.
Embedding documents.
Denormalisation.
Heterogeneous collection.
REFERENCING DOCUMENTS
References store the relationships between data by including links or references from
one document to another.
You can reference another document using its document key. This is similar to
normalisation in a relational database.
Referencing enables document databases to cache, store and retrieve the documents
independently.
Provides better write speed/performance.
Reading may require more round trips to the server.
EXAMPLE
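A minimal sketch of referencing, using plain Python dictionaries in place of database collections. The collection names, keys, and field values are illustrative assumptions, not taken from a real data set. Each read of a dictionary stands in for one round trip to the server.

```python
# Two "collections" modelled as dicts keyed by document key (_id).
# All names and values are hypothetical.
speakers = {
    "spk1": {"_id": "spk1", "name": "Ada Example", "email": "ada@example.org"},
}
sessions = {
    "ses1": {"_id": "ses1", "title": "Document modelling", "speaker_id": "spk1"},
}


def session_with_speaker(session_id):
    """Resolve the reference: one lookup per collection (two round trips)."""
    session = sessions[session_id]              # first read
    speaker = speakers[session["speaker_id"]]   # second read follows the reference
    return {"title": session["title"], "speaker": speaker["name"]}


result = session_with_speaker("ses1")
print(result)
```

The speaker document is stored once, so a change to it is a single write, but showing a session together with its speaker costs two reads.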
REFERENCING DOCUMENTS CAN BE BENEFICIAL…
Key document
 If a document is a key document, it means that it is referenced by many other documents. It is more efficient
and less error prone to reference key documents.
1-to-many relationships (unbounded).

Many-to-many relationships.
Related data changes with differing volatility (speed of change or update).
 If related documents do not have similar volatility (their update, insert, and delete rates differ), then
referencing should be applied.
Data changes or grows a lot.
 If we keep embedding related data in a document, or constantly update the data in it, the document
may keep growing after it is created. This can fragment storage and slow the database down, since MongoDB
may have to move the document to a location with enough space to accommodate it. In such cases
the related documents should be referenced.
EMBEDDING DOCUMENTS
Embedded documents capture relationships between data by storing related data in
a single document structure. You can embed a document in another document by
simply defining an attribute to be an embedded document.
These denormalized data models allow applications to retrieve and manipulate
related data in a single database operation. Embedding enables document
databases to cache, store and retrieve the complex document with embedded
documents as a single piece.
Eliminates the need to retrieve two separate documents and join them.
Provides better read speed/performance.
EXAMPLE
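A minimal sketch of embedding, again with a plain Python dictionary standing in for a collection. The order document, its fields, and its values are hypothetical. The line items live inside the order, so one read returns everything needed to compute the total.

```python
# One "collection" of orders; each order embeds its line items.
# All names and values are hypothetical.
orders = {
    "ord1": {
        "_id": "ord1",
        "customer": "Ada Example",
        "items": [  # embedded documents
            {"sku": "A-100", "qty": 2, "price": 5.0},
            {"sku": "B-200", "qty": 1, "price": 12.5},
        ],
    },
}


def order_total(order_id):
    """A single read retrieves the order and all of its items."""
    doc = orders[order_id]
    return sum(item["qty"] * item["price"] for item in doc["items"])


total = order_total("ord1")
print(total)
```

No join or second lookup is needed, which is the read-speed benefit that embedding provides.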
EMBEDDING CAN BE ADVANTAGEOUS WHEN….
Two data items are often queried together.
 By embedding one document into another, the query performance will be improved since all data will be
stored in the single document. In other words, embedding supports locality.
One data item is dependent on another.
 A document is independent if it can be found using only its own fields. Otherwise the document is dependent.
The dependent document should be embedded in its "parent" document.
1:1 relationship.
 This means that there is no redundancy between the documents, and embedding one document into another is
a natural and efficient way to implement their relationship. This is an easy-to-query structure that also
guarantees consistency when data is updated or removed in these embedded documents.
Similar volatility (speed of change or update).
 If a document that is a candidate for embedding has similar volatility (update, insert, and delete rates are
similar) to the "parent" document, then the document should be embedded. Otherwise the referencing
approach should be used.
1-TO-1 RELATIONSHIPS: REFERENCING
If the address data is frequently retrieved with the name information, then with
referencing your application needs to issue multiple queries to resolve the
reference. The better data model would be to embed the address data in the
patron data.
1-TO-1 RELATIONSHIPS: EMBEDDED
With the embedded data model, your application can retrieve the complete
patron information with one query.
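The two patron models can be sketched side by side as follows. This is an illustration with plain Python dictionaries; the patron and address values are hypothetical, and the returned read count stands in for the number of queries the application would issue.

```python
# Referenced model: patron and address are separate documents.
patrons_ref = {"joe": {"_id": "joe", "name": "Joe Example"}}
addresses = {
    "addr1": {"_id": "addr1", "patron_id": "joe",
              "street": "1 Sample Road", "city": "Sampleton"},
}

# Embedded model: the address lives inside the patron document.
patrons_emb = {
    "joe": {"_id": "joe", "name": "Joe Example",
            "address": {"street": "1 Sample Road", "city": "Sampleton"}},
}


def read_referenced(patron_id):
    """Two reads: the patron, then the address that points back to it."""
    patron = patrons_ref[patron_id]
    address = next(a for a in addresses.values() if a["patron_id"] == patron_id)
    return {"name": patron["name"], "city": address["city"]}, 2


def read_embedded(patron_id):
    """One read returns the patron together with the address."""
    patron = patrons_emb[patron_id]
    return {"name": patron["name"], "city": patron["address"]["city"]}, 1


print(read_referenced("joe"))
print(read_embedded("joe"))
```

Both functions return the same information; only the number of reads differs.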
MANY-TO-MANY RELATIONSHIP: EMBEDDED
MANY-TO-MANY RELATIONSHIP: REFERENCING
When using references, the growth of the relationships determines where to
store the reference.
If the number of books per publisher is small with limited growth, storing the
book references inside the publisher document may sometimes be useful.
Otherwise, if the number of books per publisher is unbounded, this data model
would lead to mutable, growing arrays.
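A sketch of the alternative that avoids a growing array, assuming the reference is stored in each book rather than in the publisher. The collections are plain Python dictionaries, and all identifiers and titles are hypothetical.

```python
# Variant with the references inside the publisher: the array grows
# with every new book, so the publisher document keeps changing.
publisher_with_array = {"_id": "pub1", "name": "Example Press",
                        "books": ["bk1", "bk2"]}

# Variant with the reference inside each book: the publisher document
# stays the same size no matter how many books exist.
publishers = {"pub1": {"_id": "pub1", "name": "Example Press"}}
books = {
    "bk1": {"_id": "bk1", "title": "First Title", "publisher_id": "pub1"},
    "bk2": {"_id": "bk2", "title": "Second Title", "publisher_id": "pub1"},
}


def add_book(book_id, title, publisher_id):
    """Adding a book writes one new document; the publisher is untouched."""
    books[book_id] = {"_id": book_id, "title": title,
                      "publisher_id": publisher_id}


def titles_for(publisher_id):
    """Find a publisher's books by querying on the reference field."""
    return sorted(b["title"] for b in books.values()
                  if b["publisher_id"] == publisher_id)


add_book("bk3", "Third Title", "pub1")
print(titles_for("pub1"))
```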
1-TO-MANY RELATIONSHIPS (UNBOUNDED)
MANY-TO-MANY RELATIONSHIPS
Not efficient: requires two references, first to speaker documents and
second to session documents.
MANY-TO-MANY RELATIONSHIPS
Reference by session, or reference by speaker.
More efficient, requires only one reference.
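A sketch of a many-to-many relationship with a single reference, assuming the reference is kept on the session side. Plain Python dictionaries stand in for the two collections, and the speaker and session data are hypothetical. The opposite direction is answered by querying the reference field instead of storing a second set of references.

```python
speakers = {
    "spk1": {"_id": "spk1", "name": "Ada Example"},
    "spk2": {"_id": "spk2", "name": "Bo Sample"},
}
# Only sessions hold references; speakers store nothing about sessions.
sessions = {
    "ses1": {"_id": "ses1", "title": "Modelling", "speaker_ids": ["spk1", "spk2"]},
    "ses2": {"_id": "ses2", "title": "Indexing", "speaker_ids": ["spk1"]},
}


def speakers_of(session_id):
    """Follow the stored references from a session to its speakers."""
    return [speakers[sid]["name"] for sid in sessions[session_id]["speaker_ids"]]


def sessions_of(speaker_id):
    """Reverse direction: query sessions whose reference list contains the speaker."""
    return sorted(s["title"] for s in sessions.values()
                  if speaker_id in s["speaker_ids"])


print(speakers_of("ses1"))
print(sessions_of("spk1"))
```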
RELATED DATA CHANGES WITH DIFFERING VOLATILITY
Lower volatility
Greater volatility
TWO DATA ITEMS ARE OFTEN QUERIED TOGETHER
ONE DATA ITEM IS DEPENDENT ON ANOTHER
Dependent on Order
1:1 RELATIONSHIP
SIMILAR VOLATILITY
Both email and socialIds do not change very often.
NORMALISED
Query: two reads are needed.
DENORMALISED
Embeds speaker into session with summary information.
If further information about a speaker is needed, only then will it be loaded.
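A sketch of this denormalised layout with plain Python dictionaries. The field names and values are hypothetical. Each session carries a small copy of the speaker's summary, so a listing needs one read, while a change to the copied field has to be written in several places.

```python
# Full speaker documents, loaded only when the details are needed.
speakers = {
    "spk1": {"_id": "spk1", "name": "Ada Example",
             "bio": "Full biography text", "email": "ada@example.org"},
}
# Sessions embed a summary (key and name) of their speaker.
sessions = {
    "ses1": {"_id": "ses1", "title": "Modelling",
             "speaker": {"_id": "spk1", "name": "Ada Example"}},
    "ses2": {"_id": "ses2", "title": "Indexing",
             "speaker": {"_id": "spk1", "name": "Ada Example"}},
}


def session_listing(session_id):
    """One read: the summary is already inside the session."""
    s = sessions[session_id]
    return f'{s["title"]} by {s["speaker"]["name"]}'


def rename_speaker(speaker_id, new_name):
    """Update the speaker and every embedded copy; return documents written."""
    speakers[speaker_id]["name"] = new_name
    written = 1
    for s in sessions.values():
        if s["speaker"]["_id"] == speaker_id:
            s["speaker"]["name"] = new_name
            written += 1
    return written


listing_before = session_listing("ses1")
documents_written = rename_speaker("spk1", "Ada Q. Example")
listing_after = session_listing("ses1")
print(listing_before, documents_written, listing_after)
```

The rename touches three documents here (one speaker and two sessions), which is the write-side cost traded for faster reads.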
NORMALISATION VS. DENORMALISATION
Normalised:
 Requires multiple reads.
 Doesn’t align with instances.
 Provides faster write speed.
Denormalised:
 Requires updates in multiple places.
 Provides faster read speed.
KEY CONSIDERATIONS WITH DATA MODELING
Read and write operations
 When designing a data model for MongoDB, it is important to know your application's access patterns. They help you understand how data will be
created and used, which puts you in a better position to choose the data model design patterns that best fit your application. The main questions are:
 How will your data grow and change over time?
 What is the read/write ratio?
 What kinds of queries will your application perform?
 Are there any concurrency-related constraints you should look at?
Document growth
 Documents can grow when new fields are added, when new elements are added to their array fields, or when they are updated frequently. MongoDB has a
document size limit of 16 MB. With the older MMAPv1 storage engine, MongoDB moves a document that outgrows its allocated space. Such moves are generally slow and can
also fragment the data file in which the document's collection resides.
Atomicity
 All operations that create or change data (e.g., insert, update, delete) are atomic at the level of a single document.
If fields have to be modified together, embedding them in a single document is the simplest way to guarantee atomicity.
MongoDB versions before 4.0 do not support multi-document transactions. Later versions do, but at a higher performance cost than single-document writes.
HOMOGENEOUS COLLECTIONS
One collection per data type.
 Speaker
 Session
 Room
But, this would require three different queries over three different collections.
HETEROGENEOUS COLLECTIONS
Multiple types in a single collection.
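A sketch of a heterogeneous collection as a plain Python list of documents. The `type` discriminator field, the `find` helper, and all values are assumptions made for illustration. A single query over the one collection can return speakers, sessions, and rooms together, or be narrowed to one type.

```python
# One collection holding three kinds of documents, told apart by "type".
conference = [
    {"_id": "spk1", "type": "speaker", "name": "Ada Example"},
    {"_id": "ses1", "type": "session", "title": "Modelling", "room_id": "rm1"},
    {"_id": "rm1", "type": "room", "name": "Hall A", "capacity": 200},
]


def find(collection, **criteria):
    """Return documents whose fields equal all given criteria (hypothetical helper)."""
    return [doc for doc in collection
            if all(doc.get(key) == value for key, value in criteria.items())]


rooms = find(conference, type="room")   # narrowed to one type
everything = find(conference)           # one query, all types
print([doc["_id"] for doc in rooms], len(everything))
```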