
Laboratory work 3. Data Exchange Formats in MongoDB. Data Modeling.

The objective of this laboratory work is to become familiar with data exchange formats
and data representation in MongoDB, as well as with data modeling.
Task of the laboratory work: learn how to translate data models from relational
databases into a document-oriented format.

Theoretical background

Data exchange formats: JSON and BSON.

JSON Data Exchange Format

JSON (short for "JavaScript Object Notation") is a text-based data exchange format
that is widely used for transmitting data between a server and a client, or between different parts
of a distributed system. JSON is based on a subset of the JavaScript programming language and
is characterized by its simplicity, readability, and ease of use. JSON data is represented as
key-value pairs, where keys are strings and values can be strings, numbers, booleans (true or
false), null, arrays, or nested JSON objects. JSON is supported by many programming languages
and has become a de facto standard for data interchange on the web.

JSON is built on two data structures:

• A collection of key/value pairs. In various programming languages, this is
implemented as an object, record, structure, dictionary, hash table, keyed list, or
associative array. The key must be a string, and the value can be any JSON value.
• An ordered collection of values. In many languages, this is implemented as an
array, vector, list, or sequence.
These data structures are universal: virtually all modern programming languages support them
in some form. Since JSON is used for data exchange between different programming languages, it
makes sense to build it on these data structures.

JSON uses the following forms:

• Object: an unordered collection of name/value pairs enclosed in curly braces { }. The
name and value are separated by a colon ':' and pairs are separated by commas.
• Array (one-dimensional): an ordered collection of values with sequential indexes. Arrays are
enclosed in square brackets [ ] and values are separated by commas.
• Value: a string enclosed in double quotes, a number, a boolean value (true or false), an
object, an array, or null. These structures can be nested within each other.
• String: an ordered sequence of zero or more Unicode characters enclosed in double quotes.
Special characters are written with escape sequences that start with a backslash (\), for
example \" for a double quote or \n for a newline. A single character is represented as a
string of length one.
• Name: a string.
A string in JSON is similar to a string in C or Java. Numbers in JSON are also similar to
numbers in C or Java, except that only the decimal format is used (octal and hexadecimal
notations are not allowed). Whitespace can be inserted between any two tokens, but not inside a
token.
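The syntax rules above can be checked with JavaScript's built-in JSON parser. The sketch below assumes a JavaScript runtime such as Node.js; `isValidJson` is a hypothetical helper that simply reports whether `JSON.parse` accepts a piece of text.

```javascript
// Hypothetical helper: true if the built-in parser accepts `text` as JSON.
function isValidJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (err) {
    return false; // JSON.parse throws a SyntaxError on invalid input
  }
}

// Escape sequences start with a backslash: \" is a quote, \u0021 is "!".
const decoded = JSON.parse('"say \\"hi\\"\\u0021"');
console.log(decoded); // say "hi"!

// Whitespace is allowed between tokens...
console.log(isValidJson(' { "a" : [ 1 , 2 ] } ')); // true
// ...but not inside a token such as the literal true.
console.log(isValidJson('{"a": tr ue}')); // false

// Only decimal numbers are allowed: an exponent is fine, hexadecimal is not.
console.log(isValidJson('{"n": 1.5e3}')); // true
console.log(isValidJson('{"n": 0x1F}')); // false

// Strings and names must use straight double quotes.
console.log(isValidJson("{'a': 1}")); // false (single quotes)
console.log(isValidJson('{“a”: 1}')); // false (typographic quotes)
```

The last check matters in practice: JSON copied from a word processor often contains typographic quotes, which no JSON parser accepts.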
The following example shows the JSON representation of an object describing a student:
{
  "name": "John",
  "age": 30,
  "isStudent": false,
  "courses": ["math", "history", "chemistry"],
  "address": {
    "street": "123 Main St",
    "city": "New York",
    "state": "NY",
    "zip": "10001"
  }
}
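To illustrate how a program exchanges this data, the following sketch (assuming a JavaScript runtime such as Node.js) parses the JSON text above into a native object with `JSON.parse` and turns it back into text with `JSON.stringify`.

```javascript
// The JSON text from the example above.
const text = `{
  "name": "John",
  "age": 30,
  "isStudent": false,
  "courses": ["math", "history", "chemistry"],
  "address": {
    "street": "123 Main St",
    "city": "New York",
    "state": "NY",
    "zip": "10001"
  }
}`;

// JSON.parse turns the text into a native JavaScript object.
const student = JSON.parse(text);

// Nested objects and arrays become ordinary properties and arrays.
console.log(student.address.city);   // New York
console.log(student.courses.length); // 3
console.log(typeof student.age);     // number

// JSON.stringify produces compact JSON text again (no extra whitespace).
const compact = JSON.stringify(student);
console.log(compact.startsWith('{"name":"John","age":30,')); // true
```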

In laboratory work No. 1, we considered an example of storing information about phones as
separate documents. Figure 1 shows the "phone" database as a collection of documents.
To load information about phones into MongoDB, the data needs to be represented as JSON
documents. Information about the "Nokia Lumia 920" phone in JSON format looks as follows:
{
  "Name": "NL920",
  "Brand": "Nokia",
  "Model": "Lumia 920",
  "OSFamily": "Windows",
  "OSVersion": "8"
}

Figure 1 - The "phone" database in non-relational form.

The field "Name" contains the name of the document.

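Information about all phones is represented as a collection of documents, one per phone.
Documents in the same collection do not need to have the same set of fields; for example, only
the Samsung document below has a "Display" field: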
{
  "Brand": "Nokia",
  "Model": "Lumia 920",
  "OSFamily": "Windows",
  "OSVersion": "8"
}
{
  "Brand": "Apple",
  "Model": "iPhone 4",
  "OSFamily": "iOS",
  "OSVersion": "4"
}
{
  "Brand": "Samsung",
  "Model": "Galaxy S3",
  "OSFamily": "Android",
  "OSVersion": "4.0 Ice Cream Sandwich",
  "Display": "4.8 HD Super AMOLED"
}
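Because documents in one collection may have different fields, a query simply returns the documents whose fields hold the requested values. The sketch below is plain JavaScript, not MongoDB: it models the "phones" collection as an in-memory array, and `findByFields` is a hypothetical helper that imitates equality matching on top-level fields, roughly like `db.phones.find({...})`.

```javascript
// In-memory stand-in for the "phones" collection (plain JavaScript, no server).
const phones = [
  { Brand: "Nokia", Model: "Lumia 920", OSFamily: "Windows", OSVersion: "8" },
  { Brand: "Apple", Model: "iPhone 4", OSFamily: "iOS", OSVersion: "4" },
  {
    Brand: "Samsung",
    Model: "Galaxy S3",
    OSFamily: "Android",
    OSVersion: "4.0 Ice Cream Sandwich",
    Display: "4.8 HD Super AMOLED", // only this document has a Display field
  },
];

// Hypothetical helper: keep documents whose top-level fields equal every value
// in `filter`. Simplified: no nested fields and no special handling of null.
function findByFields(collection, filter) {
  return collection.filter((doc) =>
    Object.entries(filter).every(([field, value]) => doc[field] === value)
  );
}

// Documents without a Display field simply do not match this filter.
const withDisplay = findByFields(phones, { Display: "4.8 HD Super AMOLED" });
console.log(withDisplay.map((p) => p.Model).join(", ")); // Galaxy S3

const apple = findByFields(phones, { Brand: "Apple" });
console.log(apple.length); // 1
```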

BSON data exchange format

BSON (Binary JSON) is a binary-encoded serialization of JSON-like documents. Storing data in
binary form allows efficient serialization and deserialization in various programming
languages. BSON is designed to be lightweight, easy to traverse, and fast to encode and decode,
which makes it suitable for databases and distributed systems. Note that BSON is not always more
compact than JSON: in some cases it uses more space, because it also stores type tags and length
prefixes.
BSON supports the JSON data types (strings, numbers, boolean values, null, arrays, and nested
documents) and adds types not found in JSON, such as binary data, date/time, ObjectId, 32-bit
and 64-bit integers, and 128-bit decimals.
MongoDB uses BSON both to store documents and to transfer them between clients and the server.
MongoDB drivers convert an application's native objects to BSON and back, so documents reach the
database already in their storage format, without a separate text-parsing step.
For example, the following two JSON documents are encoded in BSON as shown:

{"hello": "world"} →
"\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00"

{"BSON": ["awesome", 5.05, 1986]} →
"\x31\x00\x00\x00\x04BSON\x00\x26\x00\x00\x00\x020\x00\x08\x00\x00\x00awesome\x00\x011\x00\x33\x33\x33\x33\x33\x33\x14\x40\x102\x00\xc2\x07\x00\x00\x00\x00"

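Note:
Strings enclosed in quotes represent terminal symbols and should be interpreted with C
semantics (for example, "\x01" represents the byte 0000 0001). In the second example, "\x020"
means the byte 0x02 followed by the character "0"; likewise, "\x011" is the byte 0x01 followed
by "1", and "\x102" is the byte 0x10 followed by "2".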
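To see where each byte of the BSON examples comes from, here is a minimal encoder sketch in plain JavaScript. It is an illustration only, under the assumption that just five BSON types are needed (string 0x02, 32-bit integer 0x10, double 0x01, embedded document 0x03, array 0x04); real applications use a complete BSON library, such as the `bson` package used by the MongoDB Node.js driver. All function names here are hypothetical.

```javascript
// Minimal BSON encoder sketch: strings, int32, doubles, arrays, nested objects only.
const utf8 = new TextEncoder();

// Concatenate several Uint8Arrays into one.
function concat(parts) {
  const out = new Uint8Array(parts.reduce((n, p) => n + p.length, 0));
  let offset = 0;
  for (const p of parts) {
    out.set(p, offset);
    offset += p.length;
  }
  return out;
}

// int32: 4 bytes, little-endian.
function int32(n) {
  const b = new Uint8Array(4);
  new DataView(b.buffer).setInt32(0, n, true);
  return b;
}

// double: 8-byte IEEE 754 value, little-endian.
function float64(x) {
  const b = new Uint8Array(8);
  new DataView(b.buffer).setFloat64(0, x, true);
  return b;
}

// cstring: UTF-8 bytes followed by a 0x00 terminator (used for field names).
function cstring(s) {
  return concat([utf8.encode(s), Uint8Array.of(0)]);
}

const INT32_MIN = -2147483648;
const INT32_MAX = 2147483647;

// One element = type byte + field name + encoded value.
function element(name, value) {
  if (typeof value === "string") {
    const bytes = cstring(value); // string: int32 byte count (incl. 0x00) + bytes
    return concat([Uint8Array.of(0x02), cstring(name), int32(bytes.length), bytes]);
  }
  if (typeof value === "number" && Number.isInteger(value) &&
      value >= INT32_MIN && value <= INT32_MAX) {
    return concat([Uint8Array.of(0x10), cstring(name), int32(value)]);
  }
  if (typeof value === "number") {
    return concat([Uint8Array.of(0x01), cstring(name), float64(value)]);
  }
  if (Array.isArray(value)) {
    // An array is encoded as a document whose keys are "0", "1", "2", ...
    const asDoc = Object.fromEntries(value.map((v, i) => [String(i), v]));
    return concat([Uint8Array.of(0x04), cstring(name), encodeDocument(asDoc)]);
  }
  if (value !== null && typeof value === "object") {
    return concat([Uint8Array.of(0x03), cstring(name), encodeDocument(value)]);
  }
  throw new TypeError(`unsupported value in field "${name}"`);
}

// Document = int32 total size (including itself) + elements + 0x00.
function encodeDocument(doc) {
  const body = concat(Object.entries(doc).map(([k, v]) => element(k, v)));
  return concat([int32(4 + body.length + 1), body, Uint8Array.of(0)]);
}

// Hex dump for comparison with the byte strings in the examples.
function toHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");
}

console.log(toHex(encodeDocument({ hello: "world" })));
console.log(toHex(encodeDocument({ BSON: ["awesome", 5.05, 1986] })));
```

Reading the output against the examples: 5.05 has no exact integer form, so it becomes a double (type 0x01, bytes 33 33 33 33 33 33 14 40), while 1986 fits in 32 bits and becomes an int32 (type 0x10, bytes c2 07 00 00).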

Data Modeling
Data modeling in MongoDB involves designing the structure and organization of data
within MongoDB collections, which are analogous to tables in relational databases. MongoDB is
a NoSQL database that uses a flexible, document-oriented data model, where data is stored in
BSON (Binary JSON) format.
Here are some key considerations for data modeling in MongoDB:
Denormalization: MongoDB does not enforce a fixed schema by default, allowing for
denormalization, where related data can be embedded within a single document for efficient
retrieval. This reduces the need for joins, but may require duplication of data.
Embedded documents: MongoDB supports nested or embedded documents, allowing
for the modeling of hierarchical relationships between data. This can simplify data retrieval and
improve performance in some cases.
Document-oriented design: MongoDB is optimized for handling documents, which can
be rich in structure and can vary from one document to another. Data modeling should consider
the document-oriented nature of MongoDB and leverage its flexibility for storing data.
Query optimization: Data modeling should consider the types of queries that will be
performed on the data and optimize the document structure accordingly. This may involve
creating indexes, using appropriate data types, and organizing data in a way that aligns with
query patterns.
Scalability: MongoDB is designed to scale horizontally, so data modeling should take
into account the potential for high data volumes and plan for distributed deployments, such as
sharding and replica sets, to ensure scalability and high availability.
Data integrity: While MongoDB allows for flexible data modeling, care should be taken
to ensure data integrity, consistency, and accuracy. Application-level validation and
MongoDB's schema validation rules (collection validators) should be used as needed to maintain
data quality.
Performance considerations: Data modeling should take into consideration
performance aspects, such as the size of documents, the frequency of updates, and the read/write
patterns of the application, to optimize for performance and minimize potential bottlenecks.
Overall, data modeling in MongoDB requires careful consideration of the application's
requirements, query patterns, scalability needs, and performance considerations, while
leveraging the flexibility and document-oriented nature of MongoDB to design an efficient and
effective data model.
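The data-integrity point can be made concrete with a small application-level check run before documents are inserted. This is a plain-JavaScript sketch; the rules (required non-empty string fields Brand and Model, optional string fields OSFamily, OSVersion, Display) and the function name `validatePhone` are hypothetical, chosen to fit the phone example. MongoDB can also enforce similar rules on the server with a `$jsonSchema` validator, which is not shown here.

```javascript
// Hypothetical validation rules for documents in a "phones" collection.
const REQUIRED_STRING_FIELDS = ["Brand", "Model"];
const OPTIONAL_STRING_FIELDS = ["OSFamily", "OSVersion", "Display"];

// Return a list of problems; an empty list means the document passed.
function validatePhone(doc) {
  const errors = [];
  for (const field of REQUIRED_STRING_FIELDS) {
    if (typeof doc[field] !== "string" || doc[field].trim() === "") {
      errors.push(`${field} is required and must be a non-empty string`);
    }
  }
  for (const field of OPTIONAL_STRING_FIELDS) {
    if (field in doc && typeof doc[field] !== "string") {
      errors.push(`${field} must be a string when present`);
    }
  }
  return errors;
}

// Check documents before inserting them, and reject invalid ones.
const ok = validatePhone({ Brand: "Nokia", Model: "Lumia 920", OSVersion: "8" });
const bad = validatePhone({ Brand: "Nokia", OSVersion: 8 });
console.log(ok.length);  // 0
console.log(bad.length); // 2 (missing Model, numeric OSVersion)
```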
Non-relational databases allow designing the model of a domain as a set of objects. In
contrast to relational databases (RDB), where information about one entity is scattered across
different tables, in a non-relational database, it can be stored in a single object.
An important difference between MongoDB and an RDB is that a regular MongoDB query (find) does
not join collections. Related data is either combined in the application's source code, which
typically means executing a second query to find data related to a particular document, or
joined on the server with the $lookup stage of the aggregation pipeline, described below.
To link documents, a document can store the "_id" value of each related document (a
reference).
As an example, let's store information about a phone manufacturer as a separate document:
{
  _id: ObjectId("1"),
  "Name": "Nokia",
  "BrandName": "Nokia",
  "BrandCountry": "Finland"
}
Documents that refer to the "Nokia" document as the manufacturer store a reference to its
"_id" value. A phone document that refers to its manufacturer looks like this:
{
  _id: ObjectId("2"),
  "Name": "L920",
  "Model": "Lumia 920",
  "OSFamily": "Windows",
  "OSVersion": "8",
  "Brand": ObjectId("1")
}

Note that the value of the "Brand" field in the "L920" document and the "_id" field in the
"Nokia" document are the same.
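The "_id" field can hold any unique value. ObjectId("1") and ObjectId("2") are simplified
placeholders used for readability: a real ObjectId is a 12-byte value written as 24 hexadecimal
characters, and one is generated automatically if a document is inserted without an "_id".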
To find all phones manufactured under the "Nokia" brand, you would need to execute a query
specifying the value of its "_id" field.
db.phones.find({ Brand: ObjectId("1") })
If you need to specify more than one related document, you can use
arrays:
"Brand": [ObjectId("1"), ObjectId("3")]
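Since find() does not follow references, the application resolves them itself with a second query. The plain-JavaScript sketch below imitates that two-step lookup with in-memory arrays. Assumptions: the strings "1", "2", ... stand in for ObjectId values, the second brand and the phone "X1" are hypothetical, and `phonesOfBrand` and `brandsOf` are hypothetical helpers. `phonesOfBrand` follows MongoDB's rule that an equality filter on an array field matches when the array contains the value.

```javascript
// In-memory stand-ins for two collections (plain JavaScript, no server).
// Strings "1", "2", ... stand in for ObjectId values.
const brands = [
  { _id: "1", BrandName: "Nokia", BrandCountry: "Finland" },
  { _id: "3", BrandName: "Microsoft", BrandCountry: "USA" }, // hypothetical second brand
];

const phones = [
  { _id: "2", Name: "L920", Model: "Lumia 920", Brand: "1" },    // single reference
  { _id: "5", Name: "X1", Model: "Example", Brand: ["1", "3"] }, // hypothetical, array of references
];

// Query 1: imitates find({ Brand: id }). A scalar field must equal id;
// an array field matches if it contains id.
function phonesOfBrand(brandId) {
  return phones.filter((p) =>
    Array.isArray(p.Brand) ? p.Brand.includes(brandId) : p.Brand === brandId
  );
}

// Query 2: fetch the brand documents referenced by one phone.
function brandsOf(phone) {
  const ids = Array.isArray(phone.Brand) ? phone.Brand : [phone.Brand];
  return brands.filter((b) => ids.includes(b._id));
}

const nokiaPhones = phonesOfBrand("1");
console.log(nokiaPhones.map((p) => p.Name).join(", ")); // L920, X1

const l920Brands = brandsOf(phones[0]);
console.log(l920Brands[0].BrandName); // Nokia
```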
One way to avoid links between documents is to use nested (embedded) documents. For example,
the phone document above can be rewritten with the Nokia company details embedded in it:
{
  "Name": "L920",
  "Model": "Lumia 920",
  "OSFamily": "Windows",
  "OSVersion": "8",
  "Brand": {
    "BrandName": "Nokia",
    "BrandCountry": "Finland"
  }
}

Subdocuments can be used to model one-to-many relationships. To do this, you need to use an
array of nested documents.
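Translating a referenced (relational-style) model into this embedded form is the core task of this laboratory work. The plain-JavaScript sketch below groups phone documents under their brand, producing one brand document with an array of nested phone documents. Assumptions: strings stand in for ObjectId values, the "Phones" field name and the second phone are hypothetical, and `embedPhones` is a hypothetical helper.

```javascript
// Referenced model: each phone stores the _id of its brand.
const brands = [{ _id: "1", BrandName: "Nokia", BrandCountry: "Finland" }];
const phones = [
  { _id: "2", Model: "Lumia 920", OSFamily: "Windows", OSVersion: "8", Brand: "1" },
  { _id: "6", Model: "Lumia 620", OSFamily: "Windows", OSVersion: "8", Brand: "1" }, // hypothetical
];

// Build embedded documents: each brand gets a "Phones" array of its phones,
// and the now-redundant Brand reference is dropped from each nested phone.
function embedPhones(brandDocs, phoneDocs) {
  return brandDocs.map((brand) => ({
    ...brand,
    Phones: phoneDocs
      .filter((p) => p.Brand === brand._id)
      .map(({ Brand, ...phone }) => phone),
  }));
}

const [nokia] = embedPhones(brands, phones);
console.log(JSON.stringify(nokia, null, 2));
```

Embedding fits here because a brand has a moderate number of phones that are usually read together with it; if the list could grow without limit, keeping references would be the safer choice.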
MongoDB provides several ways to model relationships between data:
Embedded Documents:
In MongoDB, you can embed documents within other documents, allowing for the
modeling of one-to-many or many-to-many relationships. For example, consider users and the
comments they write. Instead of keeping comments in a separate "comments" collection, you can
embed them as an array within the user document:

{
  "_id": 1,
  "name": "John",
  "age": 30,
  "comments": [
    {
      "comment_id": 1,
      "text": "Great post!"
    },
    {
      "comment_id": 2,
      "text": "Interesting article!"
    }
  ]
}

In this example, comments are embedded as an array within the user document, creating a
one-to-many relationship between users and comments. You can easily retrieve comments for a
user by querying the "users" collection.

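References:
MongoDB also supports referencing documents from one collection to another. A reference is a
field that stores the "_id" of a document in another collection, similar to a foreign key in
an RDB, although MongoDB does not enforce referential integrity. For example, you can have a
"users" collection and a separate "comments" collection, where each comment has a reference to
the user who made the comment: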
// Users collection
{
  "_id": 1,
  "name": "John",
  "age": 30
}

// Comments collection
{
  "_id": 101,
  "text": "Great post!",
  "user_id": 1
}

In this example, the "comments" collection includes a "user_id" field that references the
"_id" field of the corresponding user in the "users" collection. This creates a many-to-one
relationship between comments and users. To retrieve comments together with their associated
user information, you can either run two queries or use MongoDB's $lookup aggregation stage to
perform a join-like operation between the "comments" and "users" collections:

db.comments.aggregate([
  {
    $lookup: {
      from: "users",
      localField: "user_id",
      foreignField: "_id",
      as: "user_info"
    }
  }
])

This aggregation query retrieves comments from the "comments" collection and, for each
comment, performs a lookup in the "users" collection, matching the comment's "user_id" field
with the "_id" field of the users. The matching user documents are added to the "user_info"
field of the output as an array; the array is empty if no user matches.
The result of the above aggregation query might look like this:

{
  "_id": 101,
  "text": "Great post!",
  "user_id": 1,
  "user_info": [
    {
      "_id": 1,
      "name": "John",
      "age": 30
    }
  ]
}
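To make the semantics of this stage concrete, here is a plain-JavaScript sketch that imitates it; it is not MongoDB's implementation. For every input document it collects all matching documents into an array field, which stays empty when nothing matches, so unmatched comments are still returned, as in a left outer join. Assumptions: `lookup` is a hypothetical function, `from` is an in-memory array rather than a collection name, only scalar equality is handled (the real stage has further rules for arrays and missing fields), and the second comment is hypothetical.

```javascript
// In-memory versions of the two collections from the example.
const users = [{ _id: 1, name: "John", age: 30 }];
const comments = [
  { _id: 101, text: "Great post!", user_id: 1 },
  { _id: 102, text: "Nice!", user_id: 99 }, // hypothetical: refers to no existing user
];

// Imitates { $lookup: { from, localField, foreignField, as } } for scalar fields.
function lookup(input, { from, localField, foreignField, as }) {
  return input.map((doc) => ({
    ...doc,
    // Every matching document is collected; the result is always an array.
    [as]: from.filter((other) => other[foreignField] === doc[localField]),
  }));
}

const result = lookup(comments, {
  from: users,
  localField: "user_id",
  foreignField: "_id",
  as: "user_info",
});
console.log(JSON.stringify(result[0]));
// {"_id":101,"text":"Great post!","user_id":1,"user_info":[{"_id":1,"name":"John","age":30}]}
console.log(result[1].user_info.length); // 0
```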

Hybrid Approach:
You can also use a hybrid approach, combining embedded documents and references,
depending on the specific requirements of your application. For example, you can embed some
related data within a document for efficiency, and use references for other related data that may
have more complex relationships or require frequent updates.
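One common hybrid form keeps a reference to the user but also copies a small, rarely changing field (the user's name) into each comment, so comments can be displayed without a join while the full profile stays in the users collection. The plain-JavaScript sketch below shows the idea; the helper `makeComment` and the `user` sub-document layout are hypothetical.

```javascript
// The full user profile lives in its own collection.
const users = [{ _id: 1, name: "John", age: 30 }];

// Hypothetical helper: a comment stores the user's _id (reference) plus a copy
// of the name (embedded snapshot) for display.
function makeComment(id, text, user) {
  return { _id: id, text, user: { _id: user._id, name: user.name } };
}

const comment = makeComment(101, "Great post!", users[0]);
console.log(comment.user.name); // John (no second query needed for display)

// Details that were not copied (e.g. age) are still fetched through the reference.
const author = users.find((u) => u._id === comment.user._id);
console.log(author.age); // 30
```

The trade-off is the one described above: if a user changes their name, the copy must be updated in every comment that user wrote, so only data that rarely changes is a good candidate for copying.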

It's important to carefully consider the requirements and characteristics of your
application when modeling relationships in MongoDB, and choose the approach that best fits
your use case, taking into account factors such as query patterns, performance, data integrity, and
scalability.

Equipment and materials

To perform the laboratory work, it is recommended to use a personal computer with the
following specifications: 64-bit (x64) processor with a clock speed of 1 GHz or higher (MongoDB
6.0 is available only for 64-bit platforms), RAM - 1 GB or higher, free disk space - at least
1 GB, DirectX 9 graphics device.
Software: Windows 7 or higher operating system, MongoDB 6.0.5.

Safety guidelines
The safety guidelines for carrying out the laboratory work are in line with those generally
accepted for personal computer users. Do not attempt to repair your personal computer or install
and remove software. In case of a malfunction of the personal computer, report it to the
laboratory service staff (operator, administrator). Observe safety regulations when working with
electrical equipment. Do not touch electrical sockets with metallic objects. The user's
workstation should be kept clean. Eating and drinking are not allowed near the personal
computer.

Methodology and procedure for laboratory work

Individual assignments

Follow the requirements in the project description file:

https://moodle.astanait.edu.kz/mod/resource/view.php?id=80453

Individual assignment options:

1. Car trading company.
2. Store specializing in digital cameras.
3. USB gadget manufacturing company.
4. Audio system manufacturing company.
5. Laptop store.
6. Aircraft fleet management company.
7. Software store.
8. Kitchen appliance manufacturer.
9. Motorcycle distributor.
10. Computer peripherals store.
11. Clothing store.
12. Online store for wristwatches.
13. Store specializing in tablet computers.
14. Sports bicycle store.
15. Climate control system installation company.
16. Bank department storing information about bank cardholders.
17. Supplier of DSLR cameras.
18. Jewelry store.
19. Scooter sales company.
20. Shoe store.
21. All-in-one computer manufacturer.
22. Television store.
23. Home appliances supplier.
24. Video game console rental provider.
25. Sports goods store.
26. Personal blog.
27. Social network.
28. Plastic cardholder information collection system.
29. Mobile phone holder information collection system.
30. Online store.
31. Enterprise turnover analysis system.
32. E-learning platform.
33. User preference information collection system.
34. Internet traffic analysis system.
35. E-commerce sales analysis.
36. Transportation company traffic analysis.

Contents of the report and its form

1. The number and title of the laboratory work.
2. The objectives of the laboratory work.
3. Answers to the control questions.
4. Screen forms showing the procedure for carrying out the laboratory work and the
results obtained during its execution.
5. A written report on the execution of the laboratory work is submitted to the teacher.

Questions for defending the report:

Provide definitions for the terms JSON and BSON.
What structures does JSON rely on?
What data representation formats are used in JSON?
Is it possible to reference documents from one MongoDB collection to another? If so,
what mechanisms are used for this?
Can MongoDB store arrays of documents?
Is it possible to use nested documents in MongoDB?

Defense of the work

Before carrying out the laboratory work, each student receives an individual assignment.
The defense of the laboratory work takes place only after the individual assignment has been
completed.
During the defense of the laboratory work, the student:
• Answers control questions;
• Explains the process of completing the individual assignment;
• Explains the results obtained while completing the individual assignment.
The progress of the defense of the laboratory work is monitored by the teacher.

FAQ:
1. What are some considerations to keep in mind when designing data models in
MongoDB?
Answer: Some considerations to keep in mind when designing data models in MongoDB
include understanding the data access patterns, considering the read vs. write trade-offs, selecting
appropriate data types, denormalizing or embedding data for performance, and planning for
future scalability and flexibility.

2. How can you model relationships between entities in MongoDB?

Answer: Relationships between entities in MongoDB can be modeled using either
reference-based or embedded document approaches. The reference-based approach involves storing
references to related data in separate collections and performing lookups using MongoDB's
$lookup aggregation stage (or a second query in the application), while the embedded document
approach involves nesting related data directly within the parent document.

3. When should you use embedding in MongoDB data models?
Answer: Embedding in MongoDB data models should be used when the related data is
small in size, frequently accessed together, and when there are no complex querying or updating
requirements on the embedded data. Embedded arrays should not grow without bound, because a
single BSON document is limited to 16 MB. Embedding can improve read performance by reducing
the need for joins and separate queries.

4. When should you use references in MongoDB data models?

Answer: References in MongoDB data models should be used when the related data is
large in size, accessed infrequently, or when there are complex querying or updating
requirements on the related data. References allow for flexibility in querying and updating
related data independently.

5. How can you optimize data models in MongoDB for write-heavy workloads?
Answer: To optimize data models in MongoDB for write-heavy workloads, you can keep data that
is written together in a single document (a write to a single document is atomic), limit
duplication of frequently changing data (every copy must be updated), create only the indexes
that your queries actually need (every index must be updated on each write), and leverage
MongoDB's sharding capabilities to distribute data and writes across multiple shards for
horizontal scalability.

6. What are some best practices for data modeling in MongoDB?

Answer: Some best practices for data modeling in MongoDB include understanding the
application's data access patterns, considering the performance implications of embedding vs.
referencing data, designing data models based on the read and write requirements, using
appropriate data types, and planning for future scalability and flexibility.
