Problems in RDBMS:
Relational databases have dominated the database industry for many years, but this way of
storing data is being challenged by new business requirements. With ever-increasing user
growth and the huge amounts of data being generated, relational databases struggle to provide
on-demand scalability. This is because they are primarily engineered to run on a single server
machine, so scaling up has traditionally been the only option.
1) Scaling-Up is Expensive
Scaling up means upgrading your servers by adding more processors, RAM and
hard disks to your machine to handle more load and increase capacity. Whenever your
business has grown enough to max out server capacity, you have to replace or upgrade
your database server machines to accommodate more throughput (reads and writes).
This results in downtime. Also, in "less-than-peak-load" conditions, you are
under-utilizing your server machines, so much of the initial cost is wasted.
2) Downtime
Downtime refers to periods when a system is unavailable.
3) Rigid Data Models
A relational database stores data in a fixed and pre-defined structure. This leads to a rigid
data model where each schema change comes at a cost, because before you begin
development you have to define your data schema in terms of tables and columns.
Any change in the data model requires a change in the schema, which means creating
new columns, defining new relations, reflecting the changes in your application,
discussing them with your database administrators, etc. While possible, this process runs
against agile practice, where new business requirements need a fast implementation.
NoSQL
A NoSQL database provides a highly flexible data model, scales out as your application scales
out, and is designed for high availability with minimal downtime. These advantages can be
addressed under three headings:
● Data Modeling
● Scalability
● High Availability
Data Modeling
Businesses routinely add new demands and features to their applications after
deployment. Such requirement changes are not just feature requests but can be entire
changes in business process as well. Innovation is no longer just a buzzword in any
industry; it is a necessity, and competition demands rapid adaptability to new
requirements.
NoSQL is built to support exactly such fast-moving requirements, whereas a relational
database is inherently slow to absorb such changes due to its rigid data structure. NoSQL
databases are a good fit for agile development.
The diagram shows a single row of user data converted to a JSON document. The column
names become keys inside the JSON document and the values are retained as is.
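The conversion can be sketched in a few lines of JavaScript. The column names and values used
here (id, name, email) are made up for illustration, since the original diagram is not
reproduced in these notes.

```javascript
// Hypothetical relational row: column names in one array, values in another.
const columns = ["id", "name", "email"];
const row = [1, "Alice", "alice@example.com"];

// Each column name becomes a key; each value is kept as is.
function rowToDocument(columnNames, values) {
  const doc = {};
  columnNames.forEach((column, i) => {
    doc[column] = values[i];
  });
  return doc;
}

const userDoc = rowToDocument(columns, row);
console.log(JSON.stringify(userDoc));
```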
In a relational database, by contrast, adding new information to the user data means changing
the table schema, which can touch the whole table (i.e. all of its rows). This hurts
performance on production servers and thereby every other application that depends on them.
NoSQL, on the other hand, is far more flexible. Instead of requiring all the column names to be
defined before data is inserted, a NoSQL database typically imposes no schema restrictions. If
your business demands a change or addition in the data from here on, no problem. Just add the
new fields inside new JSON documents.
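As a minimal sketch of this flexibility, the snippet below keeps documents of different shapes
side by side. A plain array stands in for a collection, and the field names (including phone)
are hypothetical.

```javascript
// An in-memory array stands in for a collection; all field names are hypothetical.
const usersCollection = [];

// Older documents were written before the business asked for a phone number.
usersCollection.push({ id: 1, name: "Alice" });

// Newer documents simply carry the extra field; no schema change is needed first.
usersCollection.push({ id: 2, name: "Bob", phone: "555-0100" });

// Readers must tolerate documents that lack the newer field.
const phones = usersCollection.map((u) => u.phone ?? "unknown");
console.log(phones);
```

The flip side is that the application, rather than the database, becomes responsible for
handling documents that lack newer fields.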
With the help of a NoSQL database, your data schema is flexible, which supports the speed that
agile development demands. Your application can accept new changes as required by the
business without their being defined in advance.
Storing data in the JSON format not only provides schema flexibility, it is also a more natural
way to store data than in rows and columns. For example, a relational database stores data in
pieces. Suppose you need to store a USER object which has multiple Hobbies and also has a
few Shipping Addresses. The relational way to store such information would be as follows.
This data is naturally meant to be stored together, since the values are accessed, used and
created together. Splitting it across rows and columns in several tables is highly structured,
but it is an unnatural fit for data of this kind. The natural way is to store this information
as one unit, grouped together. This allows your applications to access the information without
any complex queries and JOINs.
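The contrast can be sketched as follows. The "tables" are plain arrays and every name and value
(userId, hobby, city and so on) is invented for illustration. The point is only that the
relational layout needs a join-like lookup to rebuild the USER object, while the document
layout already holds it in one piece.

```javascript
// Relational-style storage: one "table" per kind of data, linked by userId.
const users = [{ userId: 1, name: "Alice" }];
const hobbies = [
  { userId: 1, hobby: "chess" },
  { userId: 1, hobby: "hiking" },
];
const addresses = [
  { userId: 1, city: "Pune" },
  { userId: 1, city: "Delhi" },
];

// Reassembling the USER object needs a join-like lookup across all three tables.
function loadUser(userId) {
  const user = users.find((u) => u.userId === userId);
  return {
    ...user,
    hobbies: hobbies.filter((h) => h.userId === userId).map((h) => h.hobby),
    shippingAddresses: addresses
      .filter((a) => a.userId === userId)
      .map((a) => ({ city: a.city })),
  };
}

// Document-style storage: the same data kept together as one document.
const userDocument = {
  userId: 1,
  name: "Alice",
  hobbies: ["chess", "hiking"],
  shippingAddresses: [{ city: "Pune" }, { city: "Delhi" }],
};

// Both routes describe the same user; only the storage layout differs.
console.log(JSON.stringify(loadUser(1)) === JSON.stringify(userDocument));
```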
Such flexibility with fluid data models allows you to mold your data according to your
application needs and also helps you adapt to new changes per requirements of the business.
Scalability
NoSQL databases break with the traditional mindset of storing data at a single location.
Instead, a NoSQL database distributes and stores data over a set of multiple servers. This
distribution of data spreads the load across the database tier. It also means the system can
scale out rather than just scale up.
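One simple way to distribute data over servers is to hash each key and use the result to pick a
node. The sketch below assumes three hypothetical nodes and a toy hash function. It illustrates
the idea only and is not the placement scheme of any particular product.

```javascript
// Hypothetical node names; a real cluster would discover its members dynamically.
const nodes = ["node-a", "node-b", "node-c"];

// Simple deterministic string hash (illustrative only).
function hashKey(key) {
  let hash = 0;
  for (const ch of key) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // keep it an unsigned 32-bit integer
  }
  return hash;
}

// Pick the node responsible for a key.
function nodeFor(key, nodeList) {
  return nodeList[hashKey(key) % nodeList.length];
}

// Group a handful of made-up keys by the node they land on.
const placement = {};
for (const key of ["user:1", "user:2", "user:3", "user:4", "user:5", "user:6"]) {
  const node = nodeFor(key, nodes);
  if (!placement[node]) placement[node] = [];
  placement[node].push(key);
}
console.log(placement);
```

With plain modulo placement, changing the number of nodes moves most keys to a different node,
which is why production systems commonly use schemes such as consistent hashing or range-based
partitioning instead.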
Scaling out without limits
One of the main advantages of a NoSQL database over a relational database is the ability to scale
out. Scaling out means adding more servers to your system to distribute the load. That is, when
your current server machines (application or database) reach their maximum capacity, you do not
stop operations to upgrade your server machines; instead you add more nodes. Such scalability
also saves money by using less-costly commodity hardware rather than expensive multi-core
heavy duty machines.
As seen from the above figure, adding servers only when required helps to keep the operating
cost low. This approach can scale out to very large clusters, and in the ideal case it provides
linear scalability, which means that as you add NoSQL database servers, your transaction
capacity increases proportionally. This helps you to handle huge amounts of data beyond the
limits of a single machine.
High Availability
Other than providing scalability, NoSQL databases also provide high availability out of the box.
NoSQL architecture supports data replication that happens in near real time. If a primary server
goes down, a replica server automatically takes over and your client applications continue to
work seamlessly. There is practically no limit to the number of replicas a node can have.
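The failover behaviour can be sketched as below. The server names and the promotion rule (pick
the first healthy replica) are assumptions made for illustration. Real systems typically choose
the new primary through an election among the surviving members.

```javascript
// Hypothetical replica set: names and roles are made up for illustration.
const replicaSet = [
  { name: "server-1", role: "primary", up: true },
  { name: "server-2", role: "replica", up: true },
  { name: "server-3", role: "replica", up: true },
];

// Return the healthy primary; if it is down, promote the first healthy replica.
function currentPrimary(members) {
  const primary = members.find((m) => m.role === "primary" && m.up);
  if (primary) return primary;
  const candidate = members.find((m) => m.role === "replica" && m.up);
  if (!candidate) throw new Error("no healthy member left");
  candidate.role = "primary";
  return candidate;
}

const before = currentPrimary(replicaSet).name; // "server-1"
replicaSet[0].up = false;                       // simulate a crash of the primary
const after = currentPrimary(replicaSet).name;  // "server-2" has taken over
console.log(before, after);
```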
Furthermore, NoSQL databases allow you to configure your database to span multiple data
centers. This way, if any data center completely crashes, other replica machines can take over.
These replica machines can be configured to serve read requests from client applications.
Typically, the closest replica is automatically selected by the client application based on a
"nearest-path" algorithm.
Features of NoSQL
Non-relational
Schema-free
NoSQL is a non-relational data management system that does not require a fixed schema, avoids
joins, and is easy to scale. NoSQL databases are used for distributed data stores, Big Data and
real-time web apps. For example, companies like Twitter, Facebook and Google collect
terabytes of user data every single day.
NoSQL stands for "Not Only SQL" or "Not SQL." Carlo Strozzi introduced the name NoSQL
in 1998.
A traditional RDBMS uses SQL syntax to store and retrieve data for further insights. A NoSQL
database system, in contrast, encompasses a wide range of database technologies that can store
structured, semi-structured, unstructured and polymorphic data.
Types of NoSQL Databases
Each of these categories has its unique attributes and limitations. No single database is the
best choice for every problem. You should select a database based on your product needs.
Key-Value Pair Based
Data is stored in key/value pairs. This type of database is designed to handle lots of data and
heavy load.
Key-value pair storage databases store data as a hash table where each key is unique, and the
value can be JSON, a BLOB (Binary Large Object), a string, etc.
It is one of the most basic types of NoSQL databases. This kind of NoSQL database is used for
collections, dictionaries, associative arrays, etc. Key-value stores help the developer to store
schema-less data. They work well for shopping cart contents.
Redis, Dynamo and Riak are some examples of key-value stores. Dynamo is the system described
in Amazon's Dynamo paper, and Riak is modeled on that design; Redis is an independent
in-memory store.
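A minimal sketch of the key-value model, using JavaScript's built-in Map as a stand-in for the
store. The key scheme "cart:<userId>" and the cart contents are hypothetical.

```javascript
// A Map stands in for the store: unique keys, opaque values.
const store = new Map();

// The value is a JSON string that the store itself does not inspect.
store.set("cart:42", JSON.stringify({ items: [{ sku: "A1", qty: 2 }] }));

// Reads and writes go through the key only.
const cart = JSON.parse(store.get("cart:42"));
cart.items.push({ sku: "B7", qty: 1 });
store.set("cart:42", JSON.stringify(cart)); // same key: the old value is replaced

const itemCount = JSON.parse(store.get("cart:42")).items.length;
console.log(store.size, itemCount);
```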
Column-based
Column-oriented databases work on columns and are based on the BigTable paper by Google. Every
column is treated separately. Values of a single column are stored contiguously.
They deliver high performance on aggregation queries like SUM, COUNT, AVG, MIN etc., as
the data is readily available in a column.
Column-based NoSQL databases are widely used to manage data warehouses, business
intelligence, CRM, and library card catalogs.
HBase, Cassandra and Hypertable are examples of column-based databases. Amazon DynamoDB is
sometimes listed with them, although Amazon describes it as a key-value and document database.
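The following sketch contrasts the two layouts on a tiny, invented data set. It only illustrates
why an aggregation such as SUM is cheap when a column's values sit together. Real column stores
add compression, column families and on-disk formats that are not shown here.

```javascript
// Row-oriented layout: each record keeps all of its fields together.
const rowStore = [
  { id: 1, region: "north", amount: 120 },
  { id: 2, region: "south", amount: 80 },
  { id: 3, region: "north", amount: 200 },
];

// Column-oriented layout: each column's values sit together in their own array.
const columnStore = {
  id: [1, 2, 3],
  region: ["north", "south", "north"],
  amount: [120, 80, 200],
};

// SUM over rows has to walk every record and pick out one field.
const sumFromRows = rowStore.reduce((total, r) => total + r.amount, 0);

// SUM over a column reads only the one contiguous array it needs.
const sumFromColumn = columnStore.amount.reduce((total, v) => total + v, 0);

console.log(sumFromRows, sumFromColumn);
```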
Document-Oriented
A document-oriented NoSQL DB stores and retrieves data as a key-value pair, but the value part
is stored as a document. The document is stored in JSON or XML format. The value is understood
by the DB and can be queried.
The document type is mostly used for CMS systems, blogging platforms, real-time analytics and
e-commerce applications. It should not be used for complex transactions which require multiple
operations or queries against varying aggregate structures.
Amazon SimpleDB, CouchDB, MongoDB, Riak and Lotus Notes are popular document-oriented
DBMS systems.
Graph-Based
A graph database stores entities as well as the relations amongst those entities. Each entity is
stored as a node, and each relationship as an edge. An edge represents a relationship between
nodes. Every node and edge has a unique identifier.
Compared to a relational database, where tables are loosely connected, a graph database is
multi-relational in nature. Traversing relationships is fast because they are already captured
in the DB, and there is no need to calculate them.
Graph-based databases are mostly used for social networks, logistics and spatial data.
Neo4J, Infinite Graph, OrientDB and FlockDB are some popular graph-based databases.
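A minimal sketch of the node-and-edge model. The identifiers, names and the FOLLOWS edge type
are invented for illustration. The traversal follows stored edges directly instead of computing
relationships with joins.

```javascript
// Nodes and edges each carry a unique identifier.
const graphNodes = new Map([
  ["n1", { name: "Alice" }],
  ["n2", { name: "Bob" }],
  ["n3", { name: "Carol" }],
]);
const graphEdges = [
  { id: "e1", from: "n1", to: "n2", type: "FOLLOWS" },
  { id: "e2", from: "n2", to: "n3", type: "FOLLOWS" },
];

// Adjacency index built once, so traversal just follows stored edges.
const outgoing = new Map();
for (const edge of graphEdges) {
  if (!outgoing.has(edge.from)) outgoing.set(edge.from, []);
  outgoing.get(edge.from).push(edge);
}

// Breadth-first walk: names of all nodes reachable from a start node.
function reachable(startId) {
  const seen = new Set([startId]);
  const queue = [startId];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const edge of outgoing.get(current) ?? []) {
      if (!seen.has(edge.to)) {
        seen.add(edge.to);
        queue.push(edge.to);
      }
    }
  }
  seen.delete(startId);
  return [...seen].map((id) => graphNodes.get(id).name);
}

console.log(reachable("n1"));
```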
CAP Theorem
The CAP theorem is also called Brewer's theorem. It states that it is impossible for a
distributed data store to offer more than two out of the following three guarantees:
● Consistency
● Availability
● Partition Tolerance
Consistency: The data should remain consistent even after the execution of an operation.
This means once data is written, any future read request should return that data. For example,
after updating the order status, all the clients should be able to see the same data.
Availability: The database should always be available and responsive. It should not have
any downtime.
Partition Tolerance: Partition tolerance means that the system should continue to function
even if the communication among the servers is not stable. For example, the servers can be
partitioned into multiple groups which may not communicate with each other. Here, if part of the
database is unavailable, the other parts continue to operate.
BASE: Basically Available, Soft state, Eventual consistency
● Basically available means the DB is available all the time, in the sense of availability in
the CAP theorem
● Soft state means that the system state may change even without input (user input); such
changes are required for eventual consistency
● Eventual consistency means that the system will become consistent over time
Eventual consistency arises from keeping copies of data on multiple machines to get
high availability and scalability. Changes made to any data item on one machine have to
be propagated to the other replicas, and until that happens, reads from different replicas may
return different values.
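Eventual consistency and soft state can be illustrated with two in-memory replicas and a queue
of changes that have not been propagated yet. The key "order:7" and the value "SHIPPED" are
made up, and the synchronous propagate() call stands in for what would be a background process
in a real system.

```javascript
// Two replicas of the same data plus a queue of changes not yet propagated.
const replicaA = new Map();
const replicaB = new Map();
const pending = [];

// A write is applied to replica A at once and queued for replica B.
function write(key, value) {
  replicaA.set(key, value);
  pending.push({ key, value });
}

// Propagation: replay queued changes on replica B, in order.
function propagate() {
  while (pending.length > 0) {
    const { key, value } = pending.shift();
    replicaB.set(key, value);
  }
}

write("order:7", "SHIPPED");
const staleRead = replicaB.get("order:7"); // undefined: B has not caught up yet
propagate();                               // B's state changes with no new client input
const freshRead = replicaB.get("order:7"); // "SHIPPED": the replicas now agree

console.log(staleRead, freshRead);
```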
Advantages of NoSQL
Disadvantages of NoSQL
● No standardization rules
● Limited query capabilities
● Many NoSQL databases do not offer traditional database capabilities, such as consistency
when multiple transactions are performed simultaneously.
● As the volume of data increases, it becomes difficult to maintain unique values for
keys.
● Doesn't work as well with relational data
● Many options are open source, which has made them less popular with some enterprises.
Basic operations on a collection in a document database:
● Inserting
● Deleting
● Updating
● Retrieving
By convention, the database container is referred to as ‘db’. To refer to a collection, you prefix
the collection name with ‘db.’. For example, the collection customer is indicated by ‘db.customer’.
Creating a database
Use the "use" command. If the database does not exist, a new one will be created.
Inserting a single document:
    db.Employee.insert({_id: 10, "EmployeeName": "Smith"})
Inserting several documents with db.books.insert() (or db.books.insertMany()):
    db.books.insert(
      [
        {"book_id": 1298747,
         "title": "Mother Night",
         "author": "Kurt Vonnegut, Jr."},
        {"book_id": 639397,
         "title": "Science and the Modern World",
         "author": "Alfred North Whitehead"},
        {"book_id": 1456701,
         "title": "Foundation and Empire",
         "author": "Isaac Asimov"}
      ]
    )
A sample document:
    {
      _id: 6,
      name: "abc",
      age: 43,
      type: 1,
      status: "A",
      favorites: { food: "pizza", artist: "Picasso" },
      finished: [ 18, 12 ],
      badges: [ "black", "blue" ],
      points: [
        { points: 78, bonus: 8 },
        { points: 57, bonus: 7 }
      ]
    }
Retrieve artist equal to "Picasso"
    db.users.find( { "favorites.artist": "Picasso" } )
Deleting Documents from a Collection
Use the remove() command. The following deletes all documents in the books collection (the
empty query document {} matches every document):
    db.books.remove({})
Updating Documents in a Collection
The update() method requires two parameters to update:
• Document query
• Set of keys and values to update
The following updates the name of the employee whose Employee id is 22:
    db.Employee.update(
      {"Employeeid" : 22},
      {$set: { "EmployeeName" : "NewMartin"}});
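To make the semantics of these commands concrete, here is a small in-memory imitation written
in plain JavaScript. TinyCollection, getPath and matches are hypothetical helpers, not part of
any database driver, and they support only exact-match queries and $set. The starting name
"Martin" is also an assumption.

```javascript
// Resolve a dotted path such as "favorites.artist" inside a document.
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

// True when every key in the query equals the value found at that path.
function matches(doc, query) {
  return Object.entries(query).every(
    ([path, expected]) => getPath(doc, path) === expected
  );
}

// Hypothetical stand-in for a collection; exact-match queries and $set only.
class TinyCollection {
  constructor() {
    this.docs = [];
  }
  insert(docOrDocs) {
    this.docs.push(...[].concat(docOrDocs)); // accepts one document or an array
  }
  find(query = {}) {
    return this.docs.filter((d) => matches(d, query));
  }
  update(query, change) {
    // Only the first matching document is changed in this sketch.
    const target = this.docs.find((d) => matches(d, query));
    if (target) Object.assign(target, change.$set);
  }
  remove(query) {
    this.docs = this.docs.filter((d) => !matches(d, query));
  }
}

const employees = new TinyCollection();
employees.insert([
  { Employeeid: 22, EmployeeName: "Martin" },
  { Employeeid: 10, EmployeeName: "Smith" },
]);
employees.update({ Employeeid: 22 }, { $set: { EmployeeName: "NewMartin" } });
const updatedName = employees.find({ Employeeid: 22 })[0].EmployeeName;

employees.remove({}); // an empty query matches every document
const remaining = employees.find().length;

console.log(updatedName, remaining);
```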
Review Questions:
1. Define a document with respect to document databases.
2. Name two types of formats for storing data in a document database.
3. List at least three syntax rules for JSON objects.
4. Create a sample document for a small appliance with the following attributes:
appliance ID, name, description, height, width, length, and shipping weight. Use the
JSON format.
5. Using the db.books collection, write a command to insert a book to the collection.
6. Using the db.books collection, write a command to remove books by Isaac Asimov.
7. Using the db.books collection, write a command to retrieve all books with quantity greater
than or equal to 20.
8. Which query operator is used to search for values in a single key?