
MongoDB

An overview of NoSQL

COS216
AVINASH SINGH
DEPARTMENT OF COMPUTER SCIENCE
UNIVERSITY OF PRETORIA
Big Data - Overview

 “Big Data” is a recent buzzword


 Although not a new concept, it became popular with recent hardware and
algorithmic advances
 Big data refers to the large volume of data used by businesses on a day-to-day basis
 The amount of data is not the important thing
 Rather how the data is used by the organization
 And how the data is processed
Big Data - V

 Some of the Vs of Big Data


 Volume: refers to the size of the data, in this case a very large amount in the
Petabyte or even Exabyte range
 Velocity: refers to how quickly new incoming data can be processed and analysed.
If data is processed too slowly, it might already be outdated before it can be used.
 Variety: refers to the different data types and files that have to be stored and
analysed. Recently hypermedia (video, audio, images, etc.) is being analysed more
often, rather than just plain text
SQL - Problems

 SQL-based databases have been used for decades


 However, they have problems, especially with modern requirements
 SQL requires the database/table structure to be known before adding data
 SQL has inherent problems storing and processing hypermedia and non-text data
 Although SQL is relatively good with large tables (millions or billions of records), inserting,
indexing, and lookup queries can become slow with very large datasets
 SQL has storage overhead which might consume storage space that could be utilized
better
 Depending on the programming language used to query SQL databases, the data
representation is typically different, requiring the data to be converted before being usable
NoSQL - Overview

 NoSQL can refer to multiple concepts


 Not using SQL at all
 Not only using SQL, but combining other mechanisms with SQL
 Not using relational SQL
 NoSQL databases provide a mechanism for storage and retrieval that is modelled by
means other than the tabular relations used by RDBMSs like MySQL
 NoSQL has been around since the 1960s, but only gained widespread adoption
during the past decade due to large datasets
 Google, Facebook, Twitter, Amazon, etc
NoSQL - Design

 NoSQL systems are designed for:


 Large scale data storage
 Massively parallel data processing
 Processing using a large number of low cost servers
NoSQL - Types

 Four general types of NoSQL databases


 Key-value stores: every item in the database is stored as an attribute name/key with a
corresponding value. For example Amazon Dynamo
 Document databases: pair each key with a complex data structure, known as a document.
Documents can contain many different key-value pairs, arrays, and nested documents. For
example MongoDB
 Wide-column stores: optimized for queries over large datasets. Stores columns of data
together, instead of rows. For example Cassandra and HBase
 Graph databases: store data as graphs showing connections and networks. Often used for
social media where users have multiple relationships with other users. For example: Neo4j
NoSQL - Benefits

 NoSQL has some advantages


 Manage large volumes of rapidly changing structured, semi-structured, and unstructured
data
 Agile sprints, quick schema iterations, and frequent code pushes
 Object-oriented programming that is easy to use and flexible
 Geographically distributed scale-out architecture instead of expensive monolithic
architecture
 The database is split across multiple cheap servers, instead of a single expensive server
MongoDB - Overview

 MongoDB is a document-oriented database which provides high performance, high
availability, and high scalability
 MongoDB has a dynamic schema, which means that you do not have to create a
structure for the database before being able to use it
 Released in 2009
 Free, open-source, and cross-platform
 Written in C/C++ and JavaScript
 Works well with JavaScript clients, such as Node.js
MongoDB - Overview

 MongoDB documents are a set of key-value pairs


 Similar to JavaScript objects
 MongoDB makes use of Binary JSON (BSON)
 BSON is very similar to JSON, but it supports more data types
 MongoDB makes use of the following structure
 Database
 Collections
 Documents
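The point that BSON supports more data types than JSON can be illustrated without a MongoDB server. The sketch below uses only plain JavaScript (no MongoDB driver) and shows that a date does not survive a JSON round trip as a date. The `coin` object and its field names are assumptions made up for illustration.

```javascript
// Plain JavaScript sketch: JSON has no date type, so a Date becomes a string.
// The object and its field names are illustrative assumptions.
const coin = { name: "Bitcoin", listed: new Date("2009-01-03T00:00:00Z") };

// Serialise to JSON text and parse it back again.
const roundTripped = JSON.parse(JSON.stringify(coin));

console.log(coin.listed instanceof Date); // true
console.log(typeof roundTripped.listed);  // "string"
console.log(roundTripped.listed);         // "2009-01-03T00:00:00.000Z"
```

BSON has a dedicated date type, so a value stored as a date can be returned as a date rather than as text.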
MongoDB - Terminology

RDBMS        MongoDB
Database     Database
Table        Collection
Row          Document
Column       Field
Primary Key  Primary Key (the _id field)
MongoDB - Documents

 A document is equivalent to a row in a relational database


 A document has a dynamic schema
 The schema therefore does not have to be specified beforehand
 Different documents in the same collection can also have different fields and structure

{"id" : "X47521", "city" : "Pretoria", "location" : [25.7479, 28.2293]}


MongoDB - Collections

 A collection is a grouping of multiple documents


 The documents are similar, or have a similar purpose, but do not need to have the
same structure or fields
 A collection is equivalent to a table in a relational database

{"id" : "X47521", "city" : "Pretoria", "location" : [25.7479, 28.2293]}


{"id" : "X47561", "city" : "Johannesburg", "location" : [26.2041, 28.0473]}
{"id" : "X43542", "city" : "Durban", "location" : [29.8587, 31.0218]}
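The idea that documents in one collection need not share the same fields can be sketched with an ordinary array of objects. This is only an in-memory stand-in, not how MongoDB stores data, and the `tags` field is a hypothetical addition that does not appear in the slides.

```javascript
// In-memory stand-in for a collection: an array of plain objects.
// It only illustrates the flexible schema, not MongoDB's storage.
const cities = [
  { id: "X47521", city: "Pretoria", location: [25.7479, 28.2293] },
  { id: "X47561", city: "Johannesburg" },              // no location field
  { id: "X43542", city: "Durban", tags: ["coastal"] }, // hypothetical extra field
];

// Collect the distinct field names used across all documents.
const fields = new Set(cities.flatMap((doc) => Object.keys(doc)));
console.log([...fields]); // [ 'id', 'city', 'location', 'tags' ]

// A check on a field simply skips the documents that lack it.
const withLocation = cities.filter((doc) => "location" in doc);
console.log(withLocation.map((doc) => doc.city)); // [ 'Pretoria' ]
```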
MongoDB - Processes

 The following programs are available in MongoDB

Name         Description
mongod       Daemon/server for MongoDB that has to run in the background
mongo        The client used to connect to the server and execute queries
mongoimport  Import JSON documents into MongoDB
mongoexport  Export JSON documents from MongoDB
MongoDB - Connection

 First start the MongoDB server (mongod)


 Use the client to connect, similar to MySQL
mongo -u <user> -p <pass> --host <host> --port <port>
 Or connect to localhost if no parameters given
mongo
 Further details on the server, client, import, and export commands are not covered here
 Read up on these if you need them
MongoDB – Data Types

Type        Description
String      UTF-8 strings
Boolean     True/False values
Integer     32-bit or 64-bit integers
Double      Floating-point numbers
Arrays      Arrays of any of the other types
Timestamps  Unix timestamp with an ordinal for operations within the same second
Date        Dates and times
Object      Embedded documents
Code        JavaScript code
Regex       Regular expressions
MongoDB – CRUD Operations

Operation            Mongo Call                   Description
Create Database      use mydb                     Selects a database (it is created when data is first stored)
Create Collection    db.mycollection.insert(doc)  Creates the collection if it does not exist,
or Document                                       otherwise adds the document to the collection
Read                 db.mycollection.find(…)      Queries the specified collection
Update               db.mycollection.update(…)    Updates documents in the specified collection
Delete               db.mycollection.remove(…)    Removes specific documents
Delete               db.mycollection.drop()       Drops the specified collection

In the shell, db refers to the database selected with use
MongoDB - Insert

 Insert documents into a collection

db.myCollection.insert({name : "Bitcoin", price : 8521.32});

 Insert multiple documents at the same time

db.myCollection.insert([
{name : "Bitcoin", price : 8521.32},
{name : "Ethereum", price : 625.32}
]);
MongoDB - Find

 Search a collection for documents where a field equals a specific value


db.myCollection.find({name : "Bitcoin"});

 Search values that are less than

db.myCollection.find({price :{$lt : 5200}});

 Other operators are available: less than ($lt), less than or equal ($lte), greater than
($gt), greater than or equal ($gte), not equal ($ne)
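To make the meaning of these operators concrete, here is a minimal in-memory sketch of how equality and comparison conditions could be evaluated against a document. The `matches` helper and the `comparators` table are hypothetical names, not part of MongoDB, and the sketch ignores arrays, nested documents, and every operator not listed above.

```javascript
// Minimal in-memory sketch of equality and comparison matching.
// matches() and comparators are hypothetical helpers, not MongoDB APIs.
const comparators = {
  $lt: (a, b) => a < b,
  $lte: (a, b) => a <= b,
  $gt: (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $ne: (a, b) => a !== b,
};

function matches(doc, query) {
  // Every key in the query must be satisfied by the document.
  return Object.entries(query).every(([field, condition]) => {
    const value = doc[field];
    if (condition !== null && typeof condition === "object") {
      // Operator form, e.g. {price : {$lt : 5200}}
      return Object.entries(condition).every(([op, operand]) =>
        comparators[op](value, operand)
      );
    }
    // Plain form, e.g. {name : "Bitcoin"}
    return value === condition;
  });
}

const coins = [
  { name: "Bitcoin", price: 8521.32 },
  { name: "Ethereum", price: 625.32 },
];

const byName = coins.filter((c) => matches(c, { name: "Bitcoin" }));
const cheap = coins.filter((c) => matches(c, { price: { $lt: 5200 } }));
console.log(byName.map((c) => c.name)); // [ 'Bitcoin' ]
console.log(cheap.map((c) => c.name));  // [ 'Ethereum' ]
```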
MongoDB - Find

 Search according to multiple criteria (AND – all must match)

db.myCollection.find({name : "Bitcoin", price : {$lt : 10000}});

 Note that each key can only appear once in a query document
 Hence, when searching for a range (e.g. price greater than 500 AND less than 1000),
the key cannot be repeated
 Instead, place both operators in one condition, e.g. {price : {$gt : 500, $lt : 1000}},
or combine separate conditions with $and
 More operators and combinations can be found in MongoDB’s docs
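A short sketch of why a key cannot be repeated, followed by two query shapes that do express a range. It runs as plain JavaScript without a server, and the variable names are illustrative only.

```javascript
// Plain JavaScript: a repeated key in an object literal keeps only the last value.
const broken = { price: { $gt: 500 }, price: { $lt: 1000 } };
console.log(Object.keys(broken));   // [ 'price' ]
console.log("$gt" in broken.price); // false, the first condition was silently lost

// Both operators in one sub-document, so the key appears only once.
const range = { price: { $gt: 500, $lt: 1000 } };

// The same range written with $and, one condition per array entry.
const rangeWithAnd = { $and: [{ price: { $gt: 500 } }, { price: { $lt: 1000 } }] };

console.log(Object.keys(range.price)); // [ '$gt', '$lt' ]
console.log(rangeWithAnd.$and.length); // 2
```

Either object could then be passed to `db.myCollection.find(...)` as the query document.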
MongoDB - Find

 Search according to either criteria (OR – at least one must match)

db.myCollection.find({$or : [
{name : "Bitcoin"},
{name : "Ethereum"}
]});
MongoDB - Sort

 Sort results based on one or more fields


 Sort in ascending order (1) or descending order (-1)

db.myCollection.find({price : {$lt : 10000}})
.sort({price : -1, name : 1});
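The effect of a multi-field sort specification can be sketched in memory. The `sortBy` helper is hypothetical and the sample documents (including the Litecoin entry) are made up for illustration; the sketch only shows how the order of keys sets the sort priority.

```javascript
// In-memory sketch of multi-field sorting with 1 (ascending) and -1 (descending).
// sortBy() is a hypothetical helper, not a MongoDB function.
function sortBy(docs, spec) {
  const keys = Object.entries(spec); // the order of the keys sets the sort priority
  return [...docs].sort((a, b) => {
    for (const [field, direction] of keys) {
      if (a[field] < b[field]) return -direction;
      if (a[field] > b[field]) return direction;
    }
    return 0; // equal on every sort field
  });
}

// Made-up sample data with a tie on price.
const coins = [
  { name: "Litecoin", price: 625.32 },
  { name: "Bitcoin", price: 8521.32 },
  { name: "Ethereum", price: 625.32 },
];

const sorted = sortBy(coins, { price: -1, name: 1 });
console.log(sorted.map((c) => c.name)); // [ 'Bitcoin', 'Ethereum', 'Litecoin' ]
```

The two coins with the same price are ordered by name, because `name` is the second key in the specification.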
MongoDB - Projection

 Determine which fields to return from the collection


 Fields can be shown (1) or hidden (0)
 Shown and hidden fields cannot be mixed in one projection, except for _id

db.myCollection.find({price : {$lt : 10000}},
{price : 1, _id : 0});
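What a projection that lists shown fields does to a single document can be sketched in memory. The `project` helper and the `symbol` field are hypothetical. The sketch covers only the inclusion form, where `_id` is returned unless it is explicitly hidden.

```javascript
// In-memory sketch of an inclusion projection.
// project() is a hypothetical helper, not a MongoDB function.
function project(doc, spec) {
  const result = {};
  // _id is returned by default unless it is explicitly hidden with 0.
  if (spec._id !== 0 && "_id" in doc) result._id = doc._id;
  for (const [field, show] of Object.entries(spec)) {
    if (field !== "_id" && show === 1 && field in doc) result[field] = doc[field];
  }
  return result;
}

// Made-up sample document.
const coin = { _id: 1, name: "Bitcoin", symbol: "BTC", price: 8521.32 };

console.log(project(coin, { price: 1 }));         // { _id: 1, price: 8521.32 }
console.log(project(coin, { price: 1, _id: 0 })); // { price: 8521.32 }
```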
MongoDB - Update

 Update the values of specific fields


 First find documents and then set the values
 Multi determines if all matches should be updated, or only the first one

db.myCollection.update({name : "Bitcoin"},
{$set : {price : 8652.23}},
{multi : false});
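The difference between `multi : false` and `multi : true` can be sketched in memory. The `update` helper is hypothetical, matches on simple equality only, and returns the number of documents it processed; the `source` field and sample data are made up.

```javascript
// In-memory sketch of update with $set and the multi option.
// update() is a hypothetical helper; it matches on simple equality only.
function update(docs, filter, change, options = {}) {
  let processed = 0;
  for (const doc of docs) {
    const isMatch = Object.entries(filter).every(([k, v]) => doc[k] === v);
    if (!isMatch) continue;
    Object.assign(doc, change.$set); // overwrite or add the listed fields
    processed += 1;
    if (!options.multi) break;       // without multi, stop after the first match
  }
  return processed;
}

// Made-up sample data with two documents that share a name.
const coins = [
  { name: "Bitcoin", price: 8521.32, source: "A" },
  { name: "Bitcoin", price: 8500, source: "B" },
  { name: "Ethereum", price: 625.32, source: "A" },
];

// multi : false changes only the first matching document.
const first = update(coins, { name: "Bitcoin" }, { $set: { price: 8652.23 } }, { multi: false });
console.log(first, coins.map((c) => c.price)); // 1 [ 8652.23, 8500, 625.32 ]

// multi : true changes every matching document.
const all = update(coins, { name: "Bitcoin" }, { $set: { price: 8652.23 } }, { multi: true });
console.log(all, coins.map((c) => c.price)); // 2 [ 8652.23, 8652.23, 625.32 ]
```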
MongoDB - Remove

 Remove documents from a collection

 Remove all documents

db.myCollection.remove({});

 Remove specific documents

db.myCollection.remove({name : "Bitcoin"});
MongoDB - Aggregation

 MongoDB supports a range of aggregated queries


 $sum – calculate the sum
 $avg – calculate the mean
 $min – find the minimum value
 $max – find the maximum value
MongoDB - Aggregation

 Aggregated queries have the following syntax


 The queries are pipelined from one to the other

db.myCollection.aggregate([
{ $match : <document criteria> }, // Limits data before grouping
{ $group : <group specification> }, // Grouping data
{ $match : <group criteria> }, // Limits results after grouping
{ $sort : <sort specification> }, // Sorts grouped data
{ $out : <collection> } // The results are written to a collection (must be the last stage)
]);
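The pipelining of stages can be sketched with ordinary array operations, where the output of each step feeds the next. This is an in-memory illustration under assumed sample data, not MongoDB's implementation, and it leaves out the final `$out` stage because there is no collection to write to.

```javascript
// In-memory sketch of a match -> group -> match -> sort pipeline.
// Sample data and thresholds are illustrative assumptions.
const trades = [
  { name: "Bitcoin", price: 8521.32 },
  { name: "Bitcoin", price: 8765.23 },
  { name: "Ethereum", price: 625.32 },
  { name: "Ethereum", price: 640 },
  { name: "Ripple", price: 0.85 },
];

// Stage 1 ($match): keep documents with a price above 0.5.
const matched = trades.filter((t) => t.price > 0.5);

// Stage 2 ($group with $max): one result per name, holding the highest price.
const groups = new Map();
for (const t of matched) {
  const current = groups.get(t.name);
  groups.set(t.name, current === undefined ? t.price : Math.max(current, t.price));
}
const grouped = [...groups].map(([name, price]) => ({ _id: name, _price: price }));

// Stage 3 ($match on groups): keep groups whose maximum exceeds 500.
const filtered = grouped.filter((g) => g._price > 500);

// Stage 4 ($sort): highest price first.
const result = filtered.sort((a, b) => b._price - a._price);

console.log(result.map((g) => g._id)); // [ 'Bitcoin', 'Ethereum' ]
```

Placing the first filter before the grouping step reduces the number of documents that have to be grouped, which is the reason for limiting data early in a pipeline.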
MongoDB - Aggregation

 Example: find the highest price recorded for each coin

db.myCollection.aggregate([
{
$group : {_id : "$name", _price : {$max : "$price"}}
}
])

 Result (one document per coin name)

{"_id" : "Bitcoin", "_price" : 8765.23}


MongoDB - Aggregation

 Example: find the total market cap of all coins
 $group requires an _id; using null places all documents in a single group

db.myCollection.aggregate([
{
$group : {_id : null, _cap : {$sum : "$marketcap"}}
}
])
 Result

{"_id" : null, "_cap" : 390776273498}
MongoDB - Aggregation

 Example: find all coins above $5000 and sort them in descending order according to
price

db.myCollection.aggregate([
{$group : {_id : "$name", _price : {$max : "$price"}}},
{$match : {_price : {$gt : 5000}}},
{$sort : {_price : -1}},
{$out : "CoinResults"}
]);
