
Advanced Data Organization

Lecture VIII
MBA(DSDA) 2023-25, SCIT
ADO
Session Plan
1. Introduction to ADO, Data, Big Data, Time-Series Data, Spatial Data, Graph Data, Streaming Data, Session Plan, COs (0.5 Session)
2. Features of Databases. Structured/Semi-Structured/Unstructured Data, SQL DBs, NoSQL DBs, NewSQL DBs, ACID, CAP and BASE Properties, Distributed Databases. (0.5 Session)
3. Journey from RDBMS to NoSQL: BigTable, DynamoDB, HBase, Cassandra, VoltDB. (2 Sessions)
4. MongoDB (in Detail) (3-4 Sessions)
5. Neo4j (in Detail) (2 Sessions)
6. Time-Series DB (if time permits) (1 Session)
7. Data Lakes and Data Quality Management (1 Session)
ADO
MongoDB
1. Document Data Concept/Model
2. MongoDB platform
3. Basic Commands
4. CRUD
5. Data Types
6. File Import/Export
7. GridFS
8. Collection
9. Time-Series
10. Collection - Validation
11. Spatial Features & Complex Queries
MongoDB
GeoSpatial Features

{
   Name: { first: "abc", last: "efg" },
   Profession: "Teaching",
   house: [ -95.3253, 45.7895 ]   // legacy coordinate pair: [longitude, latitude]
}
MongoDB
GeoSpatial Features

{
   Name: { first: "abc", last: "efg" },
   Profession: "Teaching",
   house: {                                 // GeoJSON object
      type: "Point",                        // type names are case-sensitive
      coordinates: [ -95.3253, 45.7895 ]    // [longitude, latitude]
   }
}
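The only change between the legacy layout and the GeoJSON layout is the wrapper: the legacy pair becomes the `coordinates` of a GeoJSON `Point`, with longitude first. A minimal sketch of that conversion in plain JavaScript follows; the helper name `toGeoJSONPoint` is hypothetical (not a MongoDB API), and the range checks reflect the longitude/latitude bounds that a `2dsphere` index accepts.

```javascript
// Hypothetical helper: wrap a legacy [longitude, latitude] pair in a GeoJSON Point.
function toGeoJSONPoint(pair) {
  if (!Array.isArray(pair) || pair.length !== 2) {
    throw new Error("expected [longitude, latitude]");
  }
  const [lng, lat] = pair;
  // GeoJSON always lists longitude first, then latitude.
  if (typeof lng !== "number" || lng < -180 || lng > 180) {
    throw new RangeError(`longitude out of range: ${lng}`);
  }
  if (typeof lat !== "number" || lat < -90 || lat > 90) {
    throw new RangeError(`latitude out of range: ${lat}`);
  }
  // The GeoJSON type name is case-sensitive: "Point", not "point".
  return { type: "Point", coordinates: [lng, lat] };
}

// Convert the legacy document from the first GeoSpatial Features slide.
const legacy = {
  Name: { first: "abc", last: "efg" },
  Profession: "Teaching",
  house: [-95.3253, 45.7895],
};
const converted = { ...legacy, house: toGeoJSONPoint(legacy.house) };
console.log(JSON.stringify(converted.house));
// {"type":"Point","coordinates":[-95.3253,45.7895]}
```

Swapping the two numbers by mistake usually still passes the range checks (both values here lie within ±90), so the order has to be checked against the source data.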
MongoDB
Spatial Features
• GeoJSON Objects
https://www.mongodb.com/docs/manual/reference/geojson/
• GeoSpatial Indexes
https://www.mongodb.com/docs/manual/core/indexes/index-types/index-geospatial/#std-label-geospatial-index
• GeoSpatial Queries
https://www.mongodb.com/docs/manual/geospatial-queries/
https://www.mongodb.com/docs/manual/reference/method/db.createCollection/#mongodb-method-db.createCollection
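As a sketch of how the index and query pages fit together, the snippet below builds the index key and two query filters as plain JavaScript objects that could be passed to mongosh. The collection name `people` is an assumption; the field `house` follows the example document. `$near` needs a geospatial index and, with a GeoJSON centre, takes `$maxDistance` in metres; `$geoWithin` with `$centerSphere` takes its radius in radians.

```javascript
// Hypothetical collection "people"; field "house" holds a GeoJSON Point.
// In mongosh the index would be created with:
//   db.people.createIndex({ house: "2dsphere" })
const indexKey = { house: "2dsphere" };

// $near: documents within maxMeters of [lng, lat], returned nearest first.
function nearFilter(lng, lat, maxMeters) {
  return {
    house: {
      $near: {
        $geometry: { type: "Point", coordinates: [lng, lat] },
        $maxDistance: maxMeters, // metres, because the centre is a GeoJSON point
      },
    },
  };
}

// $geoWithin + $centerSphere: documents inside a circle, in no particular order.
// The radius is in radians, so divide kilometres by the Earth's radius (~6378.1 km).
function withinFilter(lng, lat, radiusKm) {
  return {
    house: {
      $geoWithin: { $centerSphere: [[lng, lat], radiusKm / 6378.1] },
    },
  };
}

const q1 = nearFilter(-95.3253, 45.7895, 5000);
const q2 = withinFilter(-95.3253, 45.7895, 5);
console.log(JSON.stringify(q1));
// In mongosh: db.people.find(q1) or db.people.find(q2)
```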
MongoDB
Aggregation Pipeline
An aggregation pipeline consists of one or more stages that
process documents:
• Each stage performs an operation on the input
documents. For example, a stage can filter documents,
group documents, and calculate values.
• The documents that are output from a stage are passed to
the next stage.
• An aggregation pipeline can return results for groups of
documents. For example, return the total, average,
maximum, and minimum values.
https://www.mongodb.com/docs/manual/core/aggregation-pipeline/#std-label-aggregation-pipeline
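The chaining described in the bullets can be modelled in plain JavaScript, purely as an illustration and not as MongoDB's implementation: treat each stage as a function from an array of documents to a new array, and fold the stages over the input so each stage's output becomes the next stage's input. The stage functions and sample data below are made up.

```javascript
// Illustrative model: a pipeline is a list of stage functions applied in order.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

// Three illustrative stages: filter, group-and-calculate, sort.
const filterStage = (docs) => docs.filter((d) => d.qty > 0);
const groupStage = (docs) => {
  const totals = new Map();
  for (const d of docs) totals.set(d.item, (totals.get(d.item) ?? 0) + d.qty);
  return [...totals].map(([item, total]) => ({ _id: item, total }));
};
const sortStage = (docs) => [...docs].sort((a, b) => b.total - a.total);

const sample = [
  { item: "a", qty: 2 },
  { item: "b", qty: 5 },
  { item: "a", qty: 4 },
  { item: "b", qty: 0 },
];
const out = runPipeline(sample, [filterStage, groupStage, sortStage]);
console.log(JSON.stringify(out)); // [{"_id":"a","total":6},{"_id":"b","total":5}]
```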
MongoDB
Aggregation Pipeline
db.orders.aggregate( [
   // Stage 1: Filter pizza order documents by pizza size
   {
      $match: { size: "medium" }
   },
   // Stage 2: Group remaining documents by pizza name and calculate total quantity
   {
      $group: { _id: "$name", totalQuantity: { $sum: "$quantity" } }
   }
] )
https://www.mongodb.com/docs/manual/core/aggregation-pipeline/#std-label-aggregation-pipeline
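To see what the two stages compute, here is an in-memory equivalent in plain JavaScript. The order documents are made up for illustration; only the field names (`name`, `size`, `quantity`) come from the pipeline.

```javascript
// Illustrative data: field names match the pipeline, values are invented.
const orders = [
  { name: "Pepperoni", size: "small", quantity: 19 },
  { name: "Pepperoni", size: "medium", quantity: 20 },
  { name: "Cheese", size: "medium", quantity: 50 },
  { name: "Vegan", size: "medium", quantity: 10 },
  { name: "Cheese", size: "medium", quantity: 5 },
];

// Stage 1 ($match): keep only medium pizzas.
const medium = orders.filter((o) => o.size === "medium");

// Stage 2 ($group): one output document per name, summing quantity.
const byName = new Map();
for (const o of medium) {
  byName.set(o.name, (byName.get(o.name) ?? 0) + o.quantity);
}
const result = [...byName].map(([name, total]) => ({ _id: name, totalQuantity: total }));

console.log(JSON.stringify(result));
```

Note that `$group` does not guarantee the order of its output documents; add a `$sort` stage if order matters.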
MongoDB
Aggregation Pipeline
db.orders.aggregate( [
   // Stage 1: Filter pizza order documents by date range
   {
      $match:
      {
         "date": { $gte: new ISODate( "2020-01-30" ), $lt: new ISODate( "2022-01-30" ) }
      }
   },
   // Stage 2: Group remaining documents by date and calculate results
   {
      $group:
      {
         _id: { $dateToString: { format: "%Y-%m-%d", date: "$date" } },
         totalOrderValue: { $sum: { $multiply: [ "$price", "$quantity" ] } },
         averageOrderQuantity: { $avg: "$quantity" }
      }
   },
   // Stage 3: Sort documents by totalOrderValue in descending order
   {
      $sort: { totalOrderValue: -1 }
   }
] )
https://www.mongodb.com/docs/manual/core/aggregation-pipeline/#std-label-aggregation-pipeline
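The three stages can be traced by hand with a plain-JavaScript equivalent. The orders below are invented for illustration; the grouping key reproduces `$dateToString` with format `%Y-%m-%d`, which formats in UTC unless a `timezone` is given.

```javascript
// Illustrative data: field names match the pipeline, values are invented.
const orders = [
  { price: 19, quantity: 10, date: new Date("2021-03-13T08:14:30Z") },
  { price: 20, quantity: 20, date: new Date("2021-03-13T09:13:24Z") },
  { price: 12, quantity: 30, date: new Date("2021-03-17T09:22:12Z") },
  { price: 8, quantity: 10, date: new Date("2023-01-01T10:00:00Z") }, // outside the range
];
const from = new Date("2020-01-30T00:00:00Z");
const to = new Date("2022-01-30T00:00:00Z");

// Stage 1 ($match): from <= date < to.
const inRange = orders.filter((o) => o.date >= from && o.date < to);

// Stage 2 ($group): key is the UTC calendar day; sum price * quantity, average quantity.
const groups = new Map();
for (const o of inRange) {
  const day = o.date.toISOString().slice(0, 10); // "YYYY-MM-DD"
  const g = groups.get(day) ?? { total: 0, qtySum: 0, count: 0 };
  g.total += o.price * o.quantity;
  g.qtySum += o.quantity;
  g.count += 1;
  groups.set(day, g);
}
const grouped = [...groups].map(([day, g]) => ({
  _id: day,
  totalOrderValue: g.total,
  averageOrderQuantity: g.qtySum / g.count,
}));

// Stage 3 ($sort): totalOrderValue descending.
grouped.sort((a, b) => b.totalOrderValue - a.totalOrderValue);
console.log(JSON.stringify(grouped));
```

For 2021-03-13 this gives 19 × 10 + 20 × 20 = 590 and an average quantity of 15; for 2021-03-17 it gives 360 and 30.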
MongoDB
Join Operation

https://www.mongodb.com/docs/manual/core/aggregation-pipeline/#std-label-aggregation-pipeline
MongoDB
Join Operation
{
$lookup:
{
from: <collection to join>,
localField: <field from the input documents>,
foreignField: <field from the documents of the "from" collection>,
as: <output array field>
}
}

https://www.mongodb.com/docs/manual/reference/operator/aggregation/lookup/#:~:text=%24lookup%20performs%20an%20equality%20match%20on%20the
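`$lookup` performs an equality match and behaves like a left outer join: every input document is kept, and the `as` field holds an array of matching documents (empty if none match). A simplified in-memory model is sketched below. It is an illustration only: here `from` is an array rather than a collection name, and MongoDB's special handling of array-valued and missing fields is ignored. The `orders`/`inventory` data are made up.

```javascript
// Simplified model of $lookup's equality match (illustration only).
function lookup(input, { from, localField, foreignField, as }) {
  return input.map((doc) => ({
    ...doc,
    [as]: from.filter((f) => f[foreignField] === doc[localField]),
  }));
}

// Invented sample collections.
const orders = [
  { _id: 1, item: "almonds", qty: 2 },
  { _id: 2, item: "pecans", qty: 1 },
  { _id: 3, item: "cashews", qty: 4 },
];
const inventory = [
  { sku: "almonds", instock: 120 },
  { sku: "pecans", instock: 70 },
  { sku: "pecans", instock: 15 },
];

const joined = lookup(orders, {
  from: inventory,
  localField: "item",
  foreignField: "sku",
  as: "inventory_docs",
});
console.log(JSON.stringify(joined.map((d) => [d.item, d.inventory_docs.length])));
// [["almonds",1],["pecans",2],["cashews",0]]

// The corresponding mongosh stage would be:
// db.orders.aggregate([{ $lookup: { from: "inventory", localField: "item",
//                                   foreignField: "sku", as: "inventory_docs" } }])
```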
MongoDB
UnionWith

{ $unionWith: { coll: "<collection>", pipeline: [ <stage1>, ... ] } } // Include documents from the specified collection after running them through the pipeline

{ $unionWith: "<collection>" } // Include all documents from the specified collection

https://www.mongodb.com/docs/manual/reference/operator/aggregation/unionWith/#mongodb-pipeline-pipe.-unionWith.
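`$unionWith` appends the documents of another collection to the current stream and keeps duplicates, much like SQL `UNION ALL`. The sketch below models that in plain JavaScript; the `sales2022`/`sales2023` collections are invented, and the optional `pipeline` is represented as a list of stage functions rather than MongoDB stage documents.

```javascript
// Simplified model: append the other collection's documents, optionally
// passing them through their own pipeline first. Duplicates are kept.
function unionWith(input, other, pipeline = []) {
  const extra = pipeline.reduce((docs, stage) => stage(docs), other);
  return [...input, ...extra];
}

// Invented sample collections.
const sales2022 = [{ store: "A", total: 100 }, { store: "B", total: 80 }];
const sales2023 = [{ store: "A", total: 120 }, { store: "C", total: 0 }];

// Models: { $unionWith: { coll: "sales2023", pipeline: [ { $match: { total: { $gt: 0 } } } ] } }
const combined = unionWith(sales2022, sales2023, [(docs) => docs.filter((d) => d.total > 0)]);
console.log(combined.length); // 3
```

Store "A" appears twice in the result because `$unionWith` does not deduplicate; use a later `$group` if unique keys are needed.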
MongoDB
Example – Social Media Platform

Show the last twenty posts with an “important” rating from all users in reverse chronological order. Each returned document should contain the text, the time of the post and the associated user’s name and country.
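One possible pipeline for this task, written as a plain JavaScript array that could be passed to `db.posts.aggregate(...)` in mongosh. The slide does not give a schema, so the collection names `posts` and `users` and the fields `rating`, `date`, `owner_id`, `text`, `name` and `country` are all assumptions.

```javascript
// Hypothetical schema: posts { text, date, rating, owner_id }, users { _id, name, country }.
const lastImportantPosts = [
  // Keep only posts rated "important".
  { $match: { rating: "important" } },
  // Newest first, then keep the twenty most recent.
  { $sort: { date: -1 } },
  { $limit: 20 },
  // Join each post with its author; the result is an array field "owner".
  { $lookup: { from: "users", localField: "owner_id", foreignField: "_id", as: "owner" } },
  // Turn the one-element array into an embedded document.
  { $unwind: "$owner" },
  // Return only the requested fields.
  { $project: { _id: 0, text: 1, date: 1, name: "$owner.name", country: "$owner.country" } },
];
// In mongosh: db.posts.aggregate(lastImportantPosts)
console.log(lastImportantPosts.map((stage) => Object.keys(stage)[0]).join(" -> "));
// $match -> $sort -> $limit -> $lookup -> $unwind -> $project
```

`$limit` sits before `$lookup` so that only twenty posts are joined. `$unwind` drops any post whose author is not found unless `preserveNullAndEmptyArrays: true` is set.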
MongoDB
Complex Queries

GM Road Casualty Accidents

Table 1 - Accidents:
Contains general information about the accidents, including when and where they occurred, the severity of the accident, and various factors that may have contributed to the accident (e.g. road conditions, weather, lighting). Important variables in this table include the accident index, year, severity, number of vehicles involved, number of casualties, location (Easting and Northing), and various details about the road (e.g. road class, speed limit, junction details).

Table 2 - Casualties:
Provides more specific information about the casualties involved in each accident. Variables include the accident index, year, vehicle reference number, casualty number, casualty class, sex of casualty, age band of casualty, and severity of injury. Other variables indicate whether the casualty was a pedestrian, passenger, or driver, and whether the casualty was at work at the time of the accident.
MongoDB
Complex Queries
GM Road Casualty Accidents

Table 3 - Vehicles:
Provides information about the vehicles involved in each accident. Variables include the accident index, year, vehicle reference number, vehicle type, manoeuvre, skidding, and whether the vehicle hit an object on or off the carriageway. Other variables describe the driver, such as sex and age band, as well as the journey purpose and whether the vehicle was registered in a foreign country.
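All three tables carry the accident index, so once loaded into MongoDB they can be related with `$lookup`. A sketch, assuming collections named `accidents`, `casualties` and `vehicles` and a shared field `Accident_Index` (these names are hypothetical; adjust them to the imported data). If the index repeats across years in your copy of the data, the join should also match on year, for example with the `let`/`pipeline` form of `$lookup`.

```javascript
// Hypothetical names: collections accidents, casualties, vehicles sharing "Accident_Index".
const accidentWithDetails = [
  // Attach every casualty record of the accident.
  { $lookup: { from: "casualties", localField: "Accident_Index", foreignField: "Accident_Index", as: "casualties" } },
  // Attach every vehicle record of the accident.
  { $lookup: { from: "vehicles", localField: "Accident_Index", foreignField: "Accident_Index", as: "vehicles" } },
  // Count the joined records; $lookup always produces arrays, so $size is safe here.
  { $addFields: { casualtyCount: { $size: "$casualties" }, vehicleCount: { $size: "$vehicles" } } },
];
// In mongosh: db.accidents.aggregate(accidentWithDetails)
console.log(accidentWithDetails.length); // 3
```

Comparing `casualtyCount` and `vehicleCount` with the accident's own "number of casualties" and "number of vehicles" fields is one way to spot records that failed to join.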
MongoDB
Complex Queries
GM Road Casualty Accidents
Query in English
Provide insights on casualties and accidents by
weather conditions
MongoDB
Complex Queries
GM Road Casualty Accidents
Provide insights on casualties and accidents by weather conditions
Query in MongoDB Query Language
MongoDB
Complex Queries
GM Road Casualty Accidents
Query in English
Provide insights on casualties and accidents by
weather conditions
Calculate the average number of casualties per accident for each weather condition in the dataset and present the results sorted by total accidents and then by average casualty rate, in descending order.
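One way to express this request as a pipeline is sketched below, together with a plain-JavaScript mirror of the same logic for checking a few rows by hand. It is not the query shown on the slides; the field names `Weather_Conditions` and `Number_of_Casualties` on the `accidents` collection are assumptions.

```javascript
// Hypothetical field names on the accidents collection.
const casualtiesByWeather = [
  {
    $group: {
      _id: "$Weather_Conditions",
      totalAccidents: { $sum: 1 },
      totalCasualties: { $sum: "$Number_of_Casualties" },
      avgCasualtiesPerAccident: { $avg: "$Number_of_Casualties" },
    },
  },
  // Sort by accident count, breaking ties by the average casualty rate.
  { $sort: { totalAccidents: -1, avgCasualtiesPerAccident: -1 } },
];
// In mongosh: db.accidents.aggregate(casualtiesByWeather)

// Plain-JavaScript mirror of the pipeline (assumes every row has a numeric casualty count).
function summarize(accidents) {
  const groups = new Map();
  for (const a of accidents) {
    const g = groups.get(a.Weather_Conditions) ?? { totalAccidents: 0, totalCasualties: 0 };
    g.totalAccidents += 1;
    g.totalCasualties += a.Number_of_Casualties;
    groups.set(a.Weather_Conditions, g);
  }
  return [...groups]
    .map(([weather, g]) => ({
      _id: weather,
      ...g,
      avgCasualtiesPerAccident: g.totalCasualties / g.totalAccidents,
    }))
    .sort((x, y) =>
      y.totalAccidents - x.totalAccidents ||
      y.avgCasualtiesPerAccident - x.avgCasualtiesPerAccident);
}
```

In MongoDB, `$avg` skips missing or non-numeric values while `$sum: 1` counts every document, so the average can differ from `totalCasualties / totalAccidents` if some records lack a casualty count; the JavaScript mirror assumes clean data.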
MongoDB
Complex Queries
GM Road Casualty Accidents
Query in English
Provide insights on casualties and accidents by
weather conditions

Results of the Query


MongoDB
Complex Queries

• Each member of the group needs to create ONE unique complex query: first in English (2 Marks), then in MongoDB Query Language (5 Marks × Complexity).
Query Complexity: 1 (Complex), 0.75 (Semi-Complex), 0.5 (Simple)
