Project 2 Mongo

Francesco Valente
Politecnico di Torino
Student id: s291486
s291486@studenti.polito.it

Abstract—This report covers the Advanced Databases project on MongoDB. It first describes the ingestion and curation of a data collection, and then presents all the queries performed on the MongoDB database built from the curated collection.

I. PROJECT OVERVIEW

The data collection proposed for the project contains information about movies and their associated reviews on IMDB and Rotten Tomatoes. Some attributes of the collection use data types that are not suitable for analysis, so it is important to convert them to the correct types after importing the collection into MongoDB and before querying the database.
After the data curation, the project consists of executing different queries on MongoDB: standard queries, and queries based on the aggregation framework and the map-reduce framework.

II. DATA INGESTION AND CURATION

The database created for the project is called "imdb" and contains a single collection named "movies".
The movies collection contains several attributes, each associated with a data type. Three of these attributes have an incorrect data type: imdb.votes, year and lastupdated.
An incorrect data type can produce wrong results in the subsequent queries, or make certain kinds of queries impossible. For this reason it is important to run some functions on the database that update all documents, converting those attributes to the correct data type.
The functions used to update the collection are:

A. imdb.votes

db.movies.find({
    "imdb.votes": {$exists: true}
}).forEach(
    function(doc) {
        var int_value =
            new NumberInt(doc.imdb.votes);
        db.movies.updateOne({
            "_id": doc._id
        }, {
            "$set": {
                "imdb.votes" : int_value
            }
        });
    }
);

B. year

db.movies.find({
    "year": {$exists: true}
}).forEach(
    function(doc) {
        var int_value =
            new NumberInt(doc.year);
        db.movies.updateOne({
            "_id": doc._id
        }, {
            "$set": {
                "year" : int_value
            }
        });
    }
);

C. lastupdated

db.movies.find({
    "lastupdated": {$exists: true}
}).forEach(
    function(doc) {
        var date_value =
            new Date(doc.lastupdated);
        db.movies.updateOne({
            "_id": doc._id
        }, {
            "$set": {
                "lastupdated" : date_value
            }
        });
    }
);

III. STANDARD QUERIES RESOLUTION

All four standard queries were executed in the graphical interface offered by MongoDB Compass. For each query, only the code of the individual parts that make up the query tab (FILTER, PROJECT, SORT) is reported below.

A. Standard query 1

Find all the movies which have been scored higher than 4.5 on Rotten Tomatoes. Sort the results in ascending order of release date.

FILTER {"tomatoes":{$exists:true},
        "released":{$exists:true},
        "tomatoes.viewer.rating":{$gt:4.5}}
SORT {released:1}

B. Standard query 2

Find the movies that have been written by 3 writers and directed by 2 directors.

FILTER {directors: {$size: 2},
        writers: {$size: 3}}

C. Standard query 3

For the movies that belong to the "Drama" genre and were produced in the USA, show their plot, duration (runtime), and title. Order the results by descending duration.

FILTER {genres: {
            $elemMatch: {$eq: "Drama"}
        },
        countries: {
            $elemMatch: {$eq: "USA"}
        }
       }
PROJECT {plot: 1, runtime: 1, title: 1}
SORT {runtime: -1}

D. Standard query 4

Find the movies satisfying all the following conditions: they were published between 1900 and 1910, have an IMDB rating higher than 9.0, and contain the fullplot attribute. In the results, show the publication year and the length of the full plot in number of characters. Sort the results in ascending order of IMDB rating.

FILTER {released: {
            $gt: new Date("1900"),
            $lt: new Date("1911")},
        "imdb.rating": {$gt: 9.0},
        fullplot: {$exists: true}}
PROJECT {year:1,
         fullplotLen: {
             $strLenCP: "$fullplot"}
        }
SORT {"imdb.rating": 1}

IV. AGGREGATION FRAMEWORK AND MAP-REDUCE

This section shows the solutions for the queries that require the aggregation pipeline and map-reduce.
The aggregation queries are listed first:

A. Aggregation query 1

[{
    $match: {
        "tomatoes.viewer.rating" : {
            $exists: true
        }
    }
},{
    $group: {
        _id: "$year",
        avg_rating: {
            $avg: "$tomatoes.viewer.rating"
        }
    }
}]

B. Aggregation query 2

[{
    $match: {
        "countries" : {
            $elemMatch: {
                $eq: "Italy"
            }
        },
        "directors": {
            $exists: true
        }
    }
},{
    $group: {
        _id: null,
        total_directors: {
            $avg: {
                $size: "$directors"
            }
        }
    }
}]

C. Aggregation query 3

[{
    $match: {
        "imdb.rating": {
            $exists: true,
            $type: "number"
        }
    }
},{
    $unwind: {
        path: "$genres"
    }
},{
    $group: {
        _id: "$genres",
        avg_pub_year: {
            $avg: "$year"
        },
        max_imdb_score: {
            $max: "$imdb.rating"
        }
    }
}]
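
To illustrate what the $match, $unwind and $group stages of aggregation query 3 compute, the following is a minimal plain JavaScript sketch that runs without a database. The sample documents and the function name genreStats are hypothetical and only mimic the shape of the movies collection. The typeof check is only an approximation of $type: "number". It is not how MongoDB evaluates the pipeline internally.

```javascript
// Hypothetical sample documents shaped like the movies collection.
const sampleMovies = [
  { title: "A", year: 1990, genres: ["Drama", "Crime"], imdb: { rating: 8.0 } },
  { title: "B", year: 2000, genres: ["Drama"], imdb: { rating: 6.5 } },
  { title: "C", year: 2010, genres: ["Crime"], imdb: { rating: "" } }, // non-numeric rating
];

// Hypothetical helper mirroring the three pipeline stages.
function genreStats(movies) {
  // $match: keep only documents whose imdb.rating is a number.
  const matched = movies.filter(
    (m) => m.imdb && typeof m.imdb.rating === "number"
  );

  // $unwind: one output document per element of the genres array;
  // documents with a missing or empty array produce no output.
  const unwound = matched.flatMap((m) =>
    (m.genres || []).map((g) => ({ ...m, genres: g }))
  );

  // $group: per genre, accumulate what $avg and $max need.
  const groups = new Map();
  for (const m of unwound) {
    const g = groups.get(m.genres) || { yearSum: 0, n: 0, max: -Infinity };
    g.yearSum += m.year;
    g.n += 1;
    g.max = Math.max(g.max, m.imdb.rating);
    groups.set(m.genres, g);
  }

  return [...groups].map(([genre, g]) => ({
    _id: genre,
    avg_pub_year: g.yearSum / g.n,
    max_imdb_score: g.max,
  }));
}

console.log(genreStats(sampleMovies));
```

In this sample, movie "A" contributes to both the Drama and the Crime group because $unwind duplicates it once per genre, while movie "C" is discarded by the type check before any grouping happens.
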
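
D. Aggregation query 4

[{
    $unwind: {
        path: "$directors"
    }
},{
    $group: {
        _id: "$directors",
        tot_films: {
            $sum: 1
        }
    }
},{
    $sort: {
        tot_films: -1
    }
}]

MongoDB provides not only the aggregation pipeline but also another method to perform aggregation: map-reduce.
The solutions to the proposed queries are the following:

E. Map-reduce query 1

db.movies.mapReduce(
    function(){
        if(this.year){
            // emit a count of 1 per movie
            emit(this.year, 1)
        }
    },
    function(key, values){
        // reduce may be called more than once for the same key,
        // so partial counts are summed instead of returning values.length
        return Array.sum(values)
    },
    // the shell helper requires an output target; inline returns the results directly
    {out: {inline: 1}}
)

F. Map-reduce query 2

db.movies.mapReduce(
    function(){
        if(this.writers && this.title){
            // emit a partial sum and count so that reduce can be re-applied safely
            emit(this.writers.length, {
                sum: String(this.title).split(" ").length,
                count: 1
            })
        }
    },
    function(key, values){
        var acc = {sum: 0, count: 0};
        values.forEach(function(v){
            acc.sum += v.sum;
            acc.count += v.count;
        });
        return acc;
    },
    {
        out: {inline: 1},
        // the average is computed once per key, after all reductions
        finalize: function(key, value){
            return value.sum / value.count;
        }
    }
)

G. Map-reduce query 3

db.movies.mapReduce(
    function(){
        if(this.languages){
            this.languages.forEach(
                language => emit(language, 1)
            )
        }
    },
    function(key, values){
        return Array.sum(values)
    },
    {out: {inline: 1}}
)

V. INTEREST QUERY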
