NoSQL Migration Essentials
CONTENTS
− […] They Thinking?)
− NoSQL Incentives (How Did We Get Here?)
− Additional Resources
UPDATED BY MATTHEW D. GROVES, PRODUCT MARKETING MANAGER
The need to transition from an SQL (relational) to a NoSQL (non-relational) data solution usually arises in a positive business environment: your customers' requirements are causing you to reevaluate your data solution so that you can help them achieve their business objectives. So congratulations, you have come to the right place. When you're finished reading this Refcard, you'll have a solid understanding of the foundations of this transition and of serious paths to consider when planning and implementing your migration from an SQL to a NoSQL database.

ABOUT SQL AND NOSQL
The primary engineering incentive driving the design principles for SQL is minimizing memory and disk usage, while that of NoSQL is improving scalability. Understanding the reasons behind each of these motivations will help set a foundation on which to base a migration from SQL to NoSQL.

NOSQL INCENTIVES (HOW DID WE GET HERE?)
From 1979–2021 (42 years), the number of people using networked computers (the Internet) grew exponentially. During this time, the cost of storage fell to $0.023 per gigabyte, and the motivations of data engineers changed drastically. Internet hosts went from single digits in the late 60s to just over one billion in 2021. The modern incentive of data engineering is scalability, while the cost of storage is now basically an afterthought. The foremost guiding principle within the NoSQL database community is to use keys and indexes on independent data to create efficient data access and, as a result, achieve speed and scalability.

KEY CONCEPTS IN NOSQL MIGRATION
Why is database migration such a critical topic of conversation? To answer that, it is essential to first understand how the issue ties directly to the success of today's businesses.
Since the introduction of SQL, the importance of data has grown exponentially, and in tandem, so has the need to learn and apply the concepts of SQL databases. Academia and industry both found new and compelling uses for data. This naturally led colleges to teach relational principles. As graduates entered the workforce, relational databases […] legacy systems can offer. The loss of market share for these companies has opened a new market for industry solutions that help businesses with legacy systems migrate their data infrastructure to NoSQL and educate engineering teams about implementing it efficiently, in the hopes of maintaining and regaining lost market share.

Making this leap isn't as hard as it might seem. Next, we'll cover the basics (and a comparison) of SQL and NoSQL data modeling.

NOSQL DATA MODELING
The fundamental principles of NoSQL data modeling are the same as those used for SQL. The difference comes down to denormalization vs. normalization of data and the minor shift in the shared approaches that this causes. Figure 1 is a Venn diagram showing a high-level view of data modeling for SQL and NoSQL.

Figure 1

The process of creating an entity relationship diagram, understanding your application's access patterns, and then optimizing data throughput by using indexes is the same for both SQL and NoSQL. So let's focus on understanding the differences between normalization and denormalization.

DENORMALIZING DATA
What Is Normalization? …Just Be Like Everyone Else
The objectives of normalization are to reduce data redundancy and the amount of data storage used, as well as to improve data integrity.

Figure 2

In his paper, "Further Normalization of the Data Base Relational Model," Edgar Codd defined four forms of data normalization:

Table 1
FORM | PURPOSE
3NF | Remove transitive dependencies within the data model
4NF | Identify statistics that could change in the future; make changes to the data model relationships so querying for these statistics is neutral

Following this process results in a set of tables with well-defined constraints that meet the objectives of reducing data redundancy and keeping data integrity intact.

What Is Denormalization? Only Exceptional Data Need Apply!
The objective of denormalization is to design all access patterns in the most efficient way possible. Data access patterns are a main focus, and careful analysis is needed to attain that efficiency. This is done by combining or nesting data into one structure, which can make reads and writes faster and avoids the overhead of joins.

Some people think that NoSQL is schema-less; however, this isn't true. All applications have an inherent logical data structure, which still exists in a NoSQL database; it is just stored implicitly. By taking advantage of this, joins can be reduced, allowing the application to retrieve all necessary information with a single lookup.

Before jumping into the example, let's first review the possible variations of NoSQL data models.

NOSQL MODELS
For SQL data solutions, a table is the core data model. However, there are multiple models for NoSQL data solutions to choose from, and the choice mainly depends on the NoSQL database you are using. The four most popular types are:
1. Key-value pair
2. Column-oriented
3. Graph-based
4. Document-oriented

Many NoSQL databases support two or more of these models. These are known as "multi-model" databases.

Quick note: In a later section, a simple music site is used to demonstrate denormalization. The data used in the models below (band and song names) is taken from that example.
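Before turning to the individual models, the normalization/denormalization contrast described above can be made concrete with the same band and song data. This is a minimal sketch under stated assumptions: plain Python dicts stand in for database tables and documents, and the names `bands`, `songs`, `band_id`, `documents`, and the single `"home"` document are hypothetical, not any product's schema.

```python
# Hypothetical, simplified data; plain dicts stand in for tables and documents.

# Normalized: bands and songs live in separate "tables" linked by band_id.
bands = {1: {"band_name": "Arctic Monkeys"}, 2: {"band_name": "Coldplay"}}
songs = [
    {"song_name": "When the Sun Goes Down", "band_id": 1, "year": 2018, "peak_chart_pos": 1},
    {"song_name": "Yellow", "band_id": 2, "year": 2000, "peak_chart_pos": 2},
]


def home_screen_normalized():
    """Build the Home screen by joining songs to bands on band_id (like a SQL JOIN)."""
    return [
        {
            "song_name": s["song_name"],
            "band_name": bands[s["band_id"]]["band_name"],
            "year": s["year"],
            "peak_chart_pos": s["peak_chart_pos"],
        }
        for s in songs
    ]


# Denormalized: the band name is copied into each entry of one pre-assembled
# document, so a single key lookup returns everything the screen needs.
documents = {
    "home": [
        {"song_name": "When the Sun Goes Down", "band_name": "Arctic Monkeys", "year": 2018, "peak_chart_pos": 1},
        {"song_name": "Yellow", "band_name": "Coldplay", "year": 2000, "peak_chart_pos": 2},
    ]
}


def home_screen_denormalized():
    """One lookup, no join."""
    return documents["home"]


if __name__ == "__main__":
    # Both models answer the same access pattern with the same data.
    print(home_screen_normalized() == home_screen_denormalized())  # True
```

The price of the denormalized version is duplication: if a band name changed, every entry carrying a copy of it would have to be updated, which is the data-integrity trade-off this Refcard returns to in the migration discussion.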
Key-Value Pair
For key-value pair NoSQL databases, the model consists of a key and a value:

Table 2

For more complex data structures, however, the value can be a JSON object, which has greater depth and complexity. Text is typically used to visualize JSON structures.

Document-Oriented
Document-oriented NoSQL databases have a primary key and a JSON value.

Primary Key: Band Name
Value:

This model shows two data structures, Band Name and Song Name.
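As a rough illustration of how the key-value and document models differ, the sketch below uses Python dicts as stand-ins for a store, with Band Name as the key as in the models above. The names `kv_simple`, `kv_json`, `doc_store`, `get_song`, and `find_bands` are hypothetical; real databases expose their own APIs, and a document database would typically use an index rather than the linear scan shown here.

```python
import json

# Key-value model: a key (Band Name) maps to a value (Song Name).
# The store itself does not interpret the value.
kv_simple = {
    "Arctic Monkeys": "When the Sun Goes Down",
    "Coldplay": "Yellow",
}

# For richer data, the value can be a serialized JSON object; in a pure
# key-value store the application parses it after the lookup.
kv_json = {
    "Coldplay": json.dumps({"Song Name": "Yellow", "Year": 2000, "Peak Chart Pos": 2}),
}

# Document model: the primary key (Band Name) maps to a structured JSON
# document whose fields the database itself can query.
doc_store = {
    "Arctic Monkeys": {"Song Name": "When the Sun Goes Down", "Year": 2018, "Peak Chart Pos": 1},
    "Coldplay": {"Song Name": "Yellow", "Year": 2000, "Peak Chart Pos": 2},
}


def get_song(band_name):
    """Key-value access: a single lookup by key."""
    return kv_simple[band_name]


def find_bands(field, value):
    """Document-style query on a field inside the value (a linear scan here)."""
    return [key for key, doc in doc_store.items() if doc.get(field) == value]


if __name__ == "__main__":
    print(get_song("Coldplay"))                     # Yellow
    print(json.loads(kv_json["Coldplay"])["Year"])  # 2000
    print(find_bands("Year", 2018))                 # ['Arctic Monkeys']
```

The design point is where the structure lives: the key-value store only knows keys, while the document store can look inside the value.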
With the base knowledge about NoSQL databases, let's dive into how […]

When creating any data solution, whether SQL or NoSQL, you should consider following these steps:
1. Understand your application – Use the design tools and processes best suited to illustrate your problem domain and its design. For this example, I use wire diagrams.
2. Build an entity relationship diagram – Capture the data structures and the relationships between them.
3. Define your access patterns – Determine the data that should be grouped together for optimal data delivery based on the application's needs.
4. Design your primary keys and indexes – Analyze the best way to design the primary keys and indexes for optimal data delivery.

For this example, our problem domain is a simple music site where users are presented with a list of songs and associated data, as well as links to more information about each song and band.

Figure 3

There are four entities: Band Name, Song Name, Genre, and Band Member. And there are three relationships, Perform, BelongsTo, and […]

The data model for the Home screen:

Table 4
PRIMARY KEY | ATTRIBUTES
HOME SCREEN | SONG NAME | BAND NAME | YEAR | PEAK CHART POS
Yes | When the Sun Goes Down | Arctic Monkeys | 2018 | 1
Yes | Yellow | Coldplay | 2000 | 2

Note that this data structure will be accessed using the Home Screen column; therefore, indexing this column may be necessary for improved performance.

Functionality: Allow users to view song details.

Figure 4

Table 5
BAND NAME - SONG NAME | ALBUM | YEAR | PEAK CHART POS | LYRICS

Figure 6

The model for the Band Details screen:
Table 7
PRIMARY KEY | ATTRIBUTES
HOME SCREEN | SONG NAME | BAND NAME | YEAR | PEAK CHART POS
Yes | When the Sun Goes Down | Arctic Monkeys | 2018 | 1
Yes | Yellow | Coldplay | 2000 | 2
BAND NAME - SONG NAME | ALBUM | PEAK CHART POS | YEAR | LYRICS
Arctic Monkeys - When the Sun Goes Down | Whatever People Say I Am, That's What I'm Not | 1 | 2018 | So who's that girl there? / I wonder what went wrong / So that she had to roam the streets
Coldplay - Yellow | Parachutes | 2 | 2000 | Look at the stars / Look how they shine for you / And everything you do
BAND NAME - DETAILS | BAND MEMBERS | ORIGIN | YEAR | GENRES

While not possible in every NoSQL database, many document databases have recently introduced JOIN capability. With JOIN functionality, you can design your model to take advantage of whichever approach is best: you can denormalize to maximize performance, and you can normalize to maximize data integrity.

4: Design Your Primary Keys and Indexes
In the composite table above, there are separate primary keys for each data structure used. By querying this single table, you can extract different schemas or groups of data. The Home screen's primary key is the Home Screen column. When the Home Screen column is queried, the query looks for data whose Home Screen attribute is set to Yes. Therefore, an index should be created on the Home Screen attribute so that lookups are as fast as possible. The Song Details primary key is a string consisting of the Band Name and Song Name concatenated together. The Band Details primary key is a string consisting of the Band Name followed by "- Details". These primary keys are designed this way to give the application the data it needs as quickly as possible.

COMMON MIGRATION PATHS
In most cases, legacy solutions are unable to meet organizations' scalability objectives. And the engineering teams supporting those legacy applications may not be aware of NoSQL database principles; however, they have a tremendous amount of application and domain knowledge. In short, companies in these situations have two primary […]
[…] business operational, switching the entire data layer over at once isn't reasonable. Depending on the size of the data layer and the number of engineering teams involved, migrating piecemeal may be an option. Often, the areas where the application struggles under the current SQL data layer are already known. Focus your engineering efforts there first, porting the critical code over piecemeal.

[…] through table constraints.

To meet scalability needs, a NoSQL database duplicates data and moves the responsibility for data integrity to the application code. When data needs to be updated in multiple places, it is the application's responsibility to keep all data structures up to date.
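The following minimal Python sketch illustrates that responsibility using the key layout from step 4: Song Details keys are "Band Name - Song Name", Band Details keys are "Band Name - Details", and Home screen entries are keyed by band name with a "Home Screen": "Yes" attribute, as in the Couchbase listing under Additional Resources. An in-memory dict stands in for the database; the dict-based index and the helpers (`put`, `home_screen`, `update_peak_chart_pos`, and so on) are hypothetical, not a Couchbase API.

```python
# Minimal in-memory sketch (hypothetical helpers, not a database API) of the
# single-collection layout from step 4, with application-managed consistency.

collection = {}         # primary key -> document
home_screen_index = {}  # hypothetical secondary index: "Home Screen" value -> set of keys


def song_key(band, song):
    """Song Details key: Band Name and Song Name concatenated."""
    return f"{band} - {song}"


def band_details_key(band):
    """Band Details key: Band Name followed by "- Details"."""
    return f"{band} - Details"


def put(key, doc):
    """Upsert a document and keep the Home Screen index in sync with it."""
    old = collection.get(key)
    if old is not None and "Home Screen" in old:
        home_screen_index.get(old["Home Screen"], set()).discard(key)
    collection[key] = doc
    if "Home Screen" in doc:
        home_screen_index.setdefault(doc["Home Screen"], set()).add(key)


def home_screen():
    """Serve the Home screen from the index instead of scanning every document."""
    return [(key, collection[key]) for key in sorted(home_screen_index.get("Yes", set()))]


def update_peak_chart_pos(band, song, pos):
    """Peak Chart Pos is stored twice (Home screen entry and Song Details document),
    so the application itself must update every copy."""
    home = collection.get(band)
    if home is not None and home.get("Song Name") == song:
        home["Peak Chart Pos"] = pos
    details = collection.get(song_key(band, song))
    if details is not None:
        details["Peak Chart Pos"] = pos


# Sample data from the Home screen and Song Details tables.
put("Arctic Monkeys", {"Home Screen": "Yes", "Song Name": "When the Sun Goes Down",
                       "Year": 2018, "Peak Chart Pos": 1})
put("Coldplay", {"Home Screen": "Yes", "Song Name": "Yellow", "Year": 2000, "Peak Chart Pos": 2})
put(song_key("Arctic Monkeys", "When the Sun Goes Down"),
    {"Album": "Whatever People Say I Am, That's What I'm Not", "Peak Chart Pos": 1, "Year": 2018})
put(song_key("Coldplay", "Yellow"), {"Album": "Parachutes", "Peak Chart Pos": 2, "Year": 2000})

if __name__ == "__main__":
    update_peak_chart_pos("Coldplay", "Yellow", 3)  # hypothetical correction
    print([(band, doc["Peak Chart Pos"]) for band, doc in home_screen()])
    # [('Arctic Monkeys', 1), ('Coldplay', 3)]
    print(collection[song_key("Coldplay", "Yellow")]["Peak Chart Pos"])  # 3
```

Keying Home screen entries by band name mirrors the listing but allows only one Home screen entry per band; a production model would likely need a different key for that access pattern.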
SQL and NoSQL Working Together
If your application's data layer uses multiple databases, you can choose to port one database over to a NoSQL solution while leaving the others alone. This approach works well in a cloud-native environment where multiple microservices are running. Teams should start with the service that is under the most stress and then move on to the next service. When a monolith is involved, a careful process needs to be followed: wall off the part that has the most pressing scaling issues. After this part is decoupled, it starts to look like a microservice, and you can then focus on porting its data layer. Once that is complete, the next piece of the monolith can be addressed, repeating the process.

Convert to NoSQL While Still Using SQL Queries
The market has recognized the need for businesses to migrate to a NoSQL solution. Those currently using an SQL solution also need a bridge that enables them to move incrementally into a full NoSQL database. This method offers three things:
1. Businesses can address their scalability objectives.
2. Engineering teams have time to come up to speed on NoSQL principles.
3. Businesses have more options for addressing and evolving their data solution going forward.

There are solutions available that allow you to convert your SQL schema over to a NoSQL database while continuing to use SQL queries. There are also NoSQL databases that implement "SQL++", which adds capabilities for denormalized data to the familiar query language. This approach may help teams ramp up to NoSQL faster, building on their existing experience and code base. (With the advent of SQL++, the term "NoSQL" can be read as "Not Only SQL": SQL is still an option for interacting with data, but it's not the only one.)

Afterward, the business can operate as normal and address only the areas where scalability has become a critical issue, leaving the rest of its data solution as is. As the business and engineering teams grow more familiar with NoSQL databases and their key principles, leaders can make more informed decisions about how to design the business's overall data architecture.

CONCLUSION
There are areas where a NoSQL database shouldn't be used and another type of database should be considered. For example, suppose an engineering team uses custom queries to discover and explore relationships within its data, which provides the metrics needed for business intelligence efforts. Put simply, this kind of exploratory probing and statistical analysis, which depends on complex custom queries, will execute poorly in a NoSQL database. NoSQL has been designed specifically to address database scalability at the expense of memory use, support for custom queries, and data integrity, just as the SQL database was designed to minimize memory use through the process of normalization. As SQL database technology evolved, so did its capabilities to handle efficient custom queries and support data integrity.

The basic principles surrounding NoSQL databases aren't that different from those of SQL. The key difference is normalizing your data versus denormalizing it. By walking through an application design step by step, you saw how data structures can be organized and then put into a denormalized structure. Finally, I highly encourage you to visit https://couchbase.live/ and interact with the JSON data to truly experience a NoSQL database (see Additional Resources). The lessons learned from the evolution of SQL are being applied to NoSQL in an effort to improve and strengthen its weaker areas, as well as to continue innovating simpler ways for organizations to migrate from SQL to NoSQL.

ADDITIONAL RESOURCES
• "Number of Internet hosts worldwide: 1969–present" – https://en.wikipedia.org/wiki/History_of_the_Internet#/media/File:Internet_Hosts_Count_log.svg
• "Further Normalization of the Data Base Relational Model" – https://forum.thethirdmanifesto.com/wp-content/uploads/asgarosforum/987737/00-efc-further-normalization.pdf
• Couchbase Playground – https://couchbase.live/ (start a sandbox session and paste this code into the "Query Workbench" to create the structures and data)

CREATE COLLECTION tutorial._default.bands;
CREATE COLLECTION tutorial._default.songs;
CREATE COLLECTION tutorial._default.band_details;

INSERT INTO tutorial._default.bands (KEY, VALUE)
VALUES
  ("Arctic Monkeys", {
    "Home Screen": "Yes",
    "Song Name": "When the Sun Goes Down",
    "Year": 2018,
    "Peak Chart Pos": 1
  }),
  ("Coldplay", {
    "Home Screen": "Yes",
    "Song Name": "Yellow",
    "Year": 2000,
    "Peak Chart Pos": 2
  });

INSERT INTO tutorial._default.songs (KEY, VALUE)
VALUES
  ("Arctic Monkeys - When the Sun Goes Down", {
    "Album": "Whatever People Say I Am, That's What I'm Not",
    "Peak Chart Pos": 1,
    "Year": 2018,
    "Lyrics": "So who's that girl there? I wonder what went wrong So that she had to roam the streets"
  }),
  -- (code continues)